On Wed, 9 Dec 2009, William Dunlap wrote:
Here are some differences between the current and proposed
split.data.frame.
Adding 'drop=FALSE' fixes this case. See the inline correction below.
Chuck
d<-data.frame(Matrix=I(matrix(1:10, ncol=2)),
Named=c(one=1,two=2,three=3,four=4,five=5),
row.names=as.character(1001:1005))
group<-c("A","B","A","A","B")
split.data.frame(d,group)
$A
     Matrix.1 Matrix.2 Named
1001        1        6     1
1003        3        8     3
1004        4        9     4

$B
     Matrix.1 Matrix.2 Named
1002        2        7     2
1005        5       10     5
mysplit.data.frame(d,group) # lost row.names and 2nd column of Matrix
[1] "processing data.frame"
$A
     Matrix Named
[1,]      1     1
[2,]      3     3
[3,]      4     4

$B
     Matrix Named
[1,]      2     2
[2,]      5     5
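For readers following along, here is a minimal sketch of what 'drop=FALSE' changes when a single column is pulled out of the same d as above. The names bare, kept and mat are illustrative only and do not appear in the proposed function.

```r
d <- data.frame(Matrix = I(matrix(1:10, ncol = 2)),
                Named = c(one = 1, two = 2, three = 3, four = 4, five = 5),
                row.names = as.character(1001:1005))

bare <- d[, "Named"]                # default drop=TRUE: a bare numeric vector
kept <- d[, "Named", drop = FALSE]  # still a one-column data frame
mat  <- d[, "Matrix"]               # the embedded matrix, no longer inside a data frame

is.data.frame(bare)   # FALSE
is.data.frame(kept)   # TRUE
rownames(kept)        # "1001" "1002" "1003" "1004" "1005"
dim(mat)              # 5 2
```

Since split() dispatches on the class of its first argument, the one-column data frame should go through split.data.frame() and keep its row names, whereas the bare vector or matrix falls through to the default method.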
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
-----Original Message-----
From: r-devel-boun...@r-project.org
[mailto:r-devel-boun...@r-project.org] On Behalf Of
pengyu...@gmail.com
Sent: Wednesday, December 09, 2009 2:10 PM
To: r-de...@stat.math.ethz.ch
Cc: r-b...@r-project.org
Subject: [Rd] split() is slow on data.frame (PR#14123)
Please see the following code for a runtime comparison between
split() and mysplit.data.frame() (they do the same thing
semantically). mysplit.data.frame() is a fix of split() in terms of
performance. Could somebody include this fix (with possible checks
for corner cases) in a future version of R, and let me know when it
has been included?
m=300000
n=6
k=30000
set.seed(0)
x=replicate(n,rnorm(m))
f=sample(1:k, size=m, replace=TRUE)
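mysplit.data.frame<-function(x,f) {
  print('processing data.frame')
  # Split each column separately; v[[j]] holds the per-group pieces of column j.
  v=lapply(
    1:dim(x)[[2]]
    , function(i) {
      split(x[,i],f)
      # [Chuck] Change the line above to:
      #   split(x[,i,drop=FALSE],f)
    }
  )
  # For each group, bind that group's pieces from every column back together.
  w=lapply(
    seq(along=v[[1]])
    , function(i) {
      result=do.call(
        cbind
        , lapply(v,
          function(vj) {
            vj[[i]]
          }
        )
      )
      colnames(result)=colnames(x)
      return(result)
    }
  )
  names(w)=names(v[[1]])
  return(w)
}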
system.time(split(as.data.frame(x),f))
system.time(mysplit.data.frame(as.data.frame(x),f))
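Before timing at full size, the two approaches can be checked against each other on a small input. This is only a sketch under stated assumptions: it uses lapply(x, split, f) as a compact stand-in for the per-column loop in mysplit.data.frame(), a deterministic grouping vector instead of sample(), and the hypothetical names by_col, pieces, ref and same.

```r
set.seed(0)
x <- as.data.frame(replicate(3, rnorm(20)))
f <- rep(1:4, times = 5)          # deterministic groups, 5 rows each

# Split every column on its own: a list over columns, each a list over groups.
by_col <- lapply(x, split, f)

# Rebuild one piece per group by binding that group's vectors column-wise.
pieces <- lapply(names(by_col[[1]]), function(g) {
  do.call(cbind, lapply(by_col, `[[`, g))
})
names(pieces) <- names(by_col[[1]])

# Reference result from the stock data frame method.
ref <- split(x, f)

# Compare the values group by group; the pieces are matrices, ref holds data frames.
same <- all(mapply(function(a, b) all(a == as.matrix(b)), pieces, ref))
same
```

The values agree, but the column-wise result is a list of plain matrices without the original row names, which is the kind of difference Bill's example above points at.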
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901