Romain Francois wrote:
> Wacek Kusnierczyk wrote:
>> redirected to r-devel, because there are implementational details of
>> [.data.frame discussed here. spoiler: at the bottom there is a fairly
>> interesting performance result.
>>
>> Romain Francois wrote:
>>
>>> Hi,
>>>
>>> This is a bug I think. [.data.frame treats its arguments differently
>>> depending on the number of arguments.
>>>
>>
>> you might want to hesitate a bit before you say that something in r is a
>> bug, if only because it drives certain people mad. r is a carefully
>> tested software, and [.data.frame is such a basic function that if what
>> you talk about were a bug, it wouldn't have persisted until now.
>>
> I did hesitate, and would be prepared to look the other way if someone
> shows me proper evidence that this makes sense.
>
> > d <- data.frame( x = 1:10, y = 1:10, z = 1:10 )
> > d[ j=1 ]
>     x  y  z
> 1   1  1  1
> 2   2  2  2
> 3   3  3  3
> 4   4  4  4
> 5   5  5  5
> 6   6  6  6
> 7   7  7  7
> 8   8  8  8
> 9   9  9  9
> 10 10 10 10
>
> "If a single index is supplied, it is interpreted as indexing the list
> of columns". Clearly this does not happen here, and this is because
> NextMethod gets confused.

obviously.  it seems that there is a bug here, and that it results from
the lack of a clear design specification.

>
> I have not looked at your implementation in detail, but it misses array
> indexing, as in:

yes;  i didn't take it into consideration, but (still without detailed
analysis) i guess it should not be difficult to extend the code to
handle this.

>
> > d <- data.frame( x = 1:10, y = 1:10, z = 1:10 )
> > m <- cbind( 5:7, 1:3 )
> > m
>      [,1] [,2]
> [1,]    5    1
> [2,]    6    2
> [3,]    7    3
> > d[m]
> [1] 5 6 7
> > subdf( d, m )
> Error in subdf(d, m) : undefined columns selected

this should be easy to handle by checking if i is a matrix and then
indexing by its first column as i and the second as j.

>
> "Matrix indexing using '[' is not recommended, and barely
> supported. For extraction, 'x' is first coerced to a matrix. For
> replacement a logical matrix (only) can be used to select the
> elements to be replaced in the same way as for a matrix."

yes, here's how it's done (original comment):

    if(is.matrix(i))
        return(as.matrix(x)[i])  # desperate measures

and i can easily add this to my code, at virtually no additional
expense.

it's probably not a good idea to convert x to a matrix:  x would often
hold much more data than the index matrix m, so it's presumably much
more efficient, on average, to fiddle with i instead.

there are some potentially confusing issues here:

    m = cbind(8:10, 1:3)

    d[m]
    # 3-element vector, as you could expect

    d[t(m)]
    # 6-element vector

t(m) has dimensionality inappropriate for matrix indexing (it has 3
columns), so it gets flattened into a vector;  however, it does not work
as in the case of a single vector index, where columns would be
selected:

    d[as.vector(t(m))]
    # error: undefined columns selected

i think it would be more appropriate to raise an error in a case like
d[t(m)].

furthermore, if a matrix is used in a two-index form, the matrix is
flattened again and is used to select rows (not elements, as in
d[t(m)]).
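
a minimal sketch of the "fiddle with i instead" idea from above, under stated assumptions:  matidx is a hypothetical name (it is neither part of subdf nor of base r), it only accepts a two-column index matrix of (row, column) positions, and it makes no claim about how [.data.frame should be implemented.  it picks the requested cells one at a time, so the data frame itself is never converted to a matrix.

```r
# hypothetical helper, for illustration only
matidx <- function(x, m) {
    # assumption: m is a two-column matrix of (row, column) positions
    stopifnot(is.data.frame(x), is.matrix(m), ncol(m) == 2L)
    # take row m[k, 1] of column m[k, 2], one cell at a time
    cells <- lapply(seq_len(nrow(m)),
                    function(k) x[[ m[k, 2L] ]][ m[k, 1L] ])
    # unlist() combines the selected cells only, not whole columns
    unlist(cells)
}

d <- data.frame(x = 1:10, y = 1:10, z = 1:10)
m <- cbind(5:7, 1:3)
matidx(d, m)
# 5 6 7, the same values as d[m]
```

since only the selected cells are combined, columns that are not indexed have no influence on the type of the result.  the loop over the rows of m is written for clarity, not speed.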

note also that the help page says that "for extraction, 'x' is first
coerced to a matrix".  it fails to explain that if *two* indices are
used, of which at least one is a matrix, no coercion is done.  that is,
the matrix is again flattened into a vector, but here [.data.frame
forgets that it was a matrix (unlike in d[t(m)]):

    is(d[m])
    # a character vector, matrix indexing

    is(d[t(m)])
    # a character vector, vector indexing of elements, not columns

    is(d[m,])
    # a data frame, row indexing

and finally, since d[m] converts x (i.e., d) to a matrix before the
indexing, the values in some columns of d may get coerced to another
type:

    d[,2] = as.character(d[,2])

    is(d[,1])
    # integer vector

    is(d[,2])
    # character vector

    is(d[1:2, 1])
    # integer vector

    is(d[cbind(1:2, 1)])
    # character vector

for what it's worth, i think matrix indexing of data frames should be
dropped:

    d[m]
    # error: ...

and if one needs it, it's as simple as

    as.matrix(d)[m]

where the conversion of d to a matrix is explicit.

on the side, [.data.frame is able to index matrices:

    '[.data.frame'(as.matrix(d), m)
    # same as as.matrix(d)[m]

which is, so to speak, nonsense, since '[.data.frame' is designed
specifically to handle data frames;  i'd expect an error to be raised
here (or a warning, at the very least).

to summarize, the fact that subdf does not handle matrix indices is not
an issue.  anyway, thanks for the comment!

best,
vQ

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel