Hello, I think there is an inconsistency in the handling of the compact form of the row.names attributes.
When n is the number of rows of a data.frame, the compact form is c(NA_integer_,-n), as in: > d <- data.frame(x=1:10) > .Internal(inspect(d)) @104f174a8 19 VECSXP g0c1 [OBJ,NAM(2),ATT] (len=1, tl=0) @103a7dc60 13 INTSXP g0c4 [] (len=10, tl=0) 1,2,3,4,5,... ATTRIB: @104959380 02 LISTSXP g0c0 [] TAG: @100823078 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "names" (has value) @104f17748 16 STRSXP g0c1 [NAM(2)] (len=1, tl=0) @10085c678 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "x" TAG: @10082d060 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "row.names" (has value) @104f0e898 13 INTSXP g0c1 [] (len=2, tl=0) -2147483648,-10 TAG: @100823548 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "class" (has value) @104f0e8c8 16 STRSXP g0c1 [NAM(1)] (len=1, tl=0) @1008a7e60 09 CHARSXP g1c2 [MARK,gp=0x61,ATT] [ASCII] [cached] "data.frame" But then, -10 becomes 10: > d2 <- structure( d, class = "data.frame" ) > .Internal(inspect(d2)) @104f08898 19 VECSXP g0c1 [OBJ,NAM(2),ATT] (len=1, tl=0) @103a7dc60 13 INTSXP g0c4 [NAM(2)] (len=10, tl=0) 1,2,3,4,5,... ATTRIB: @104956150 02 LISTSXP g0c0 [] TAG: @100823078 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "names" (has value) @104f087a8 16 STRSXP g0c1 [] (len=1, tl=0) @10085c678 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "x" TAG: @10082d060 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "row.names" (has value) @104f088c8 13 INTSXP g0c1 [] (len=2, tl=0) -2147483648,10 TAG: @100823548 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "class" (has value) @104f08838 16 STRSXP g0c1 [] (len=1, tl=0) @1008a7e60 09 CHARSXP g1c2 [MARK,gp=0x61,ATT] [ASCII] [cached] "data.frame" This happens in row_names_gets (attrib.c), here: if(OK_compact) { /* we hide the length in an impossible integer vector */ PROTECT(val = allocVector(INTSXP, 2)); INTEGER(val)[0] = NA_INTEGER; INTEGER(val)[1] = n; ans = installAttrib(vec, R_RowNamesSymbol, val); UNPROTECT(1); return ans; } I believe it should be INTEGER(val)[1] = -n; for consistency. BTW, perhaps structure should be internalized to prevent special handling of row.names when it does not make sense. Here is structure: structure function (.Data, ...) { attrib <- list(...) if (length(attrib)) { specials <- c(".Dim", ".Dimnames", ".Names", ".Tsp", ".Label") replace <- c("dim", "dimnames", "names", "tsp", "levels") m <- match(names(attrib), specials) ok <- (!is.na(m) & m) names(attrib)[ok] <- replace[m[ok]] if ("factor" %in% attrib[["class", exact = TRUE]] && typeof(.Data) == "double") storage.mode(.Data) <- "integer" attributes(.Data) <- c(attributes(.Data), attrib) } return(.Data) } When I do structure( d, class = "data.frame" ), eventually this line is executed: attributes(.Data) <- c(attributes(.Data), attrib) So, first attributes are retrieved, and because of R special handling of row.names, it gets promoted to 1:n in getAttrib. Then, we want to set the row.names attribute, special handling again in setAttrib, leading to row_names_gets, R actually loops over the attribute to check if it is of the form 1:n, and if it is it brings back the compact form (making a mistake along the way). This looks like a waste of resources. Romain ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel