Full_Name: Lutz Prechelt
Version: 2.4.1
OS: Windows XP
Submission from: (NULL) (160.45.111.67)

I stack a number of data frames using rbind. Each of these data frames has a column 'authorname', which is a factor, and a column author = unclass(authorname), which serves as piecewise pseudonyms. When I use rbind to stack these data frames, R warns about invalid factor levels and inserts all NAs in the author column.

The reason appears to be that rbind.data.frame checks for the presence of levels, not for class==factor, when deciding what to handle as a factor:

    if (!is.null(levels(xj))) {

I find this behavior surprising, hence dangerous, and it is not documented. Rather, the documentation says:

"The 'rbind' data frame method takes the classes of the columns from the first data frame, and matches columns by name (rather than by position). Factors have their levels expanded as necessary [...]"

The behavior has bitten me fairly hard: I searched for the origin of the warning in all the wrong places and found the real one only after about 3 hours. (I still have not understood _why_ it results in that warning.)

I believe rbind.data.frame should be fixed so that it ignores a levels attribute when the object does not have the factor class. The alternative would be to add a warning to the documentation that 'unclass' on factors is insufficient if users want to avoid factor handling in rbind.

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
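
A minimal sketch of the situation described in the report, using hypothetical data (the author names and the frames d1 and d2 are invented for illustration and are not from the original submission). It makes no claim about what rbind prints in any particular R version. It only shows that unclass() leaves the levels attribute in place, and that as.integer() is one possible way to get bare codes that a levels-based check cannot mistake for a factor.

```r
# Hypothetical data: names and frames are invented for illustration.
authorname <- factor(c("smith", "jones", "smith"))

# unclass() drops the class but keeps the levels attribute.
author <- unclass(authorname)
is.factor(author)               # FALSE
levels(author)                  # "jones" "smith"

# as.integer() returns the bare integer codes with no attributes.
author_plain <- as.integer(authorname)
is.null(levels(author_plain))   # TRUE

# Frames built from the plain codes carry no levels on 'author',
# so stacking them treats that column as ordinary integers.
d1 <- data.frame(authorname = authorname, author = author_plain)
d2 <- data.frame(authorname = factor(c("miller", "smith")),
                 author = as.integer(factor(c("miller", "smith"))))
stacked <- rbind(d1, d2)
```

Because the codes are assigned per data frame, the same integer can stand for different authors in d1 and d2, which matches the "piecewise pseudonyms" use in the report.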