Stavros Macrakis wrote: > The documentation of as.data.frame is not explicit about how it generates > column names for the simple vector case, but it seems to use the character > form of the quoted argument, e.g. > > names(as.data.frame(1:3)) > [1] "1:3" > > But there is a strange case: > > names(as.data.frame(c("a"))) > [1] "if (stringsAsFactors) factor(x) else x" > >
gosh! you don't even need the c(): names(as.data.frame('')) # same as above i thought you don''t even need the '', but then you're served with the following highly informative message: names(as.data.frame()) # Error in as.data.frame() : # element 1 is empty; # the part of the args list of 'is.null' being evaluated was: # (x) which actually comes from as.data.frame(). > I feel fairly comfortable calling this a bug, though there is no explicit > specification. > maybe there is none so that it can always be claimed that you deal with an intentional, but not (yet) documented feature, rather than a bug. let's investigate this feature. in names(as.data.frame('a')) as.data.frame is generic, 'a' is character, thus as.data.frame.character(x, ...) is called with x = 'a'. here's the code for as.data.frame.character: function (x, ..., stringsAsFactors = default.stringsAsFactors()) as.data.frame.vector(if (stringsAsFactors) factor(x) else x, ...) and the as.data.frame.vector it calls: function (x, row.names = NULL, optional = FALSE, ...) { nrows <- length(x) nm <- paste(deparse(substitute(x), width.cutoff = 500L), collapse = " ") if (is.null(row.names)) { if (nrows == 0L) row.names <- character(0L) else if (length(row.names <- names(x)) == nrows && !any(duplicated(row.names))) { } else row.names <- .set_row_names(nrows) } names(x) <- NULL value <- list(x) if (!optional) names(value) <- nm attr(value, "row.names") <- row.names class(value) <- "data.frame" value } watch carefully: nm = paste(deparse(substitute(x)), width.cutoff=500L), that is: nm = paste("if (stringsAsFactors) factor(x) else x", width.cutoff=500L) x = factor('a'), row.names==NULL, names(x)==NULL, and nrows = 1, and thus row.names = .set_row_names(1) = c(NA, -1) (interesting; see .set_row_names). and then we have: x = factor('a') # the input names(x) = NULL value = list(x) # value == list(factor('a')) names(value) = "if (stringsAsFactors) factor(x) else x" # the value of nm attr(value, 'row.names') = c(NA, -1) # the value of row.names class(value) = 'data.frame' value here you go: as some say, the answer is always in the code. that's how ugly hacks with deparse/substitute lead r core developers to produce ugly bugs. very useful, indeed. > There is another strange case which I don't understand. > > The specification of 'optional' is: > > optional: logical. If 'TRUE', setting row names and converting column > names (to syntactic names: see 'make.names') is optional. > > I am not sure what this means and why it is useful. In practice, it seems > to produce a structure of class data.frame which exhibits some very odd > behavior: > > >> d <- as.data.frame(c("a"),optional=TRUE) >> class(d) >> > [1] "data.frame" > >> d >> > structure("a", class = "AsIs") <<< where does this > column name come from? > 1 a' > gosh... rtfc, again; code as above, but this time optional=TRUE so names(value) = nm does not apply: x = factor('a') # the input names(x) = NULL value = list(x) # value == list(factor('a')) attr(value, 'row.names') = c(NA, -1) # the value of row.names class(value) = 'data.frame' value here you go. >> names(d) >> > NULL <<< not from names() > yes, because it was explicitly set to NULL, second line above. >> dput(d) >> > structure(list(structure(1L, .Label = "a", class = "factor")), row.names = > c(NA, > -1L), class = "data.frame") <<< and it doesn't show up in dput > yes, because there are no names there! it's format.data.frame, called from print.data.frame, called from print(value), that makes up this column name; rtfc. seems like there's a need for post-implementation design. for the desserts, here's another curious, somewhat related example: data = data.frame(1) row.names(data) = TRUE data # X1 # TRUE 1 as.data.frame(1, row.names=TRUE) # Error in attr(value, "row.names") <- row.names : # row names must be 'character' or 'integer', not 'logical' probably not a bug, because ?as.data.frame says: " row.names: 'NULL' or a character vector giving the row names for the data frame. Missing values are not allowed. " so it's rather a design flaw. much harder to fix in r. best, vQ ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel