Re: [Rd] fast version of split.data.frame or conversion from data.frame to list of its rows

2012-05-03 Thread Jeff Ryan
A bit late and possibly tangential. The mmap package has something called struct() which is really a row-wise array of heterogeneous columns. As Simon and others have pointed out, R has no way to handle this natively, but mmap does provide a very measurable performance gain by orienting rows

Re: [Rd] fast version of split.data.frame or conversion from data.frame to list of its rows

2012-05-01 Thread Matthew Dowle
Antonio Piccolboni antonio at piccolboni.info writes: Hi, I was wondering if there is anything more efficient than split to do the kind of conversion in the subject. If I create a data frame as in system.time({fd = data.frame(x=1:2000, y = rnorm(2000), id = paste("x", 1:2000, sep = ""))})

Re: [Rd] fast version of split.data.frame or conversion from data.frame to list of its rows

2012-05-01 Thread Prof Brian Ripley
On 01/05/2012 00:28, Antonio Piccolboni wrote: Hi, I was wondering if there is anything more efficient than split to do the kind of conversion in the subject. If I create a data frame as in system.time({fd = data.frame(x=1:2000, y = rnorm(2000), id = paste("x", 1:2000, sep = ""))}) user system

Re: [Rd] fast version of split.data.frame or conversion from data.frame to list of its rows

2012-05-01 Thread Antonio Piccolboni
It seems like people need to hear more context, happy to provide it. I am implementing a serialization format (typedbytes, HADOOP-1722 if people want the gory details) to make R and Hadoop interoperate better (RHadoop project, package rmr). It is a row-first format and it's already implemented as

Re: [Rd] fast version of split.data.frame or conversion from data.frame to list of its rows

2012-05-01 Thread Simon Urbanek
On May 1, 2012, at 1:26 PM, Antonio Piccolboni anto...@piccolboni.info wrote: It seems like people need to hear more context, happy to provide it. I am implementing a serialization format (typedbytes, HADOOP-1722 if people want the gory details) to make R and Hadoop interoperate better

Re: [Rd] fast version of split.data.frame or conversion from data.frame to list of its rows

2012-05-01 Thread Antonio Piccolboni
On Tue, May 1, 2012 at 11:29 AM, Simon Urbanek simon.urba...@r-project.org wrote: On May 1, 2012, at 1:26 PM, Antonio Piccolboni anto...@piccolboni.info wrote: It seems like people need to hear more context, happy to provide it. I am implementing a serialization format (typedbytes,

[Rd] fast version of split.data.frame or conversion from data.frame to list of its rows

2012-04-30 Thread Antonio Piccolboni
Hi, I was wondering if there is anything more efficient than split to do the kind of conversion in the subject. If I create a data frame as in system.time({fd = data.frame(x=1:2000, y = rnorm(2000), id = paste("x", 1:2000, sep = ""))}) user system elapsed 0.004 0.000 0.004 and then I try to
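
For readers skimming the thread, here is a minimal sketch of the conversion being asked about. It is an illustration only, not code taken from any of the messages. The names rows_split and rows_map are hypothetical, and stringsAsFactors = FALSE is an assumption added so that id stays a character column. The first approach uses split() on a unique per-row key. The second walks the columns in parallel with Map(), which may be cheaper because it builds plain lists rather than one data frame per row. Timings will depend on the machine and the R version.

```r
# Data frame modelled on the one in the question; the quotes around "x"
# and stringsAsFactors = FALSE are assumptions made for this sketch.
fd <- data.frame(x = 1:2000, y = rnorm(2000),
                 id = paste("x", 1:2000, sep = ""),
                 stringsAsFactors = FALSE)

# Baseline: split() on a unique key returns one single-row data frame per row.
rows_split <- split(fd, seq_len(nrow(fd)))

# Alternative: iterate over the columns in parallel and build one plain
# named list per row. This is equivalent to
#   Map(list, x = fd$x, y = fd$y, id = fd$id)
rows_map <- do.call(Map, c(list(f = list), as.list(fd)))
rows_map <- unname(rows_map)

# Compare timings on your own machine (no particular result is claimed here).
print(system.time(split(fd, seq_len(nrow(fd)))))
print(system.time(do.call(Map, c(list(f = list), as.list(fd)))))
```

Each element of rows_map is a list with components x, y and id, so rows_map[[1]]$id gives the id of the first row. Whether a list of lists or a list of single-row data frames is the better target depends on what the downstream serialization code expects.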