On Wed, 23 Aug 2006, Damien Moore wrote:

>
> Thomas Lumley < [EMAIL PROTECTED] > wrote:
>
>> I have written most of a bigglm() function where the data= argument is a
>> function with a single argument 'reset'. When called with reset=FALSE the
>> function should return another chunk of data, or NULL if no data are
>> available, and when called with reset=TRUE it should go back to the
>> beginning of the data. I don't think this is too inelegant.
>
> Yes, that does sound like a pretty elegant solution. It would be even 
> more so if you could offer a default implementation of the data_function 
> that simply passes chunks of large X and y matrices held in memory.

I have done that for data frames.
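A default of that sort could look roughly like the following sketch (this is not the actual bigglm() code; `make_chunk_reader` and `chunksize` are names invented here for illustration). It returns a closure that hands out successive row-chunks of an in-memory data frame, returns NULL when exhausted, and rewinds when called with reset=TRUE:

```r
## Sketch (hypothetical, not the real implementation): a data function
## obeying the reset protocol described above, closing over a data frame.
make_chunk_reader <- function(df, chunksize = 1000) {
  pos <- 1                              # next row to serve
  function(reset = FALSE) {
    if (reset) {
      pos <<- 1                         # go back to the start of the data
      return(NULL)
    }
    if (pos > nrow(df)) return(NULL)    # no data left
    chunk <- df[pos:min(pos + chunksize - 1, nrow(df)), , drop = FALSE]
    pos <<- pos + chunksize
    chunk
  }
}
```

The closure keeps its position in `pos` via `<<-`, so the caller never copies or duplicates the full data frame; only the extracted chunk is a copy.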

> (ideally you would just intialize the data_function to reference the X 
> and y data to avoid duplicating it, don't know if that's possible in R.)

The part that is extracted is a copy. The whole thing isn't copied, 
though.

The chunk would have to be a copy if it were an R matrix because matrices 
are stored in contiguous column-major format and a chunk won't be 
contiguous. I think an implementation that uses precomputed design 
matrices would want to be written in C and call the incremental QR 
decomposition routines row by row.  The reason for working in chunks in R 
is to allow model.frame and model.matrix to work reasonably efficiently, 
and they aren't needed if you already have the design matrix.
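For concreteness, a fitting routine would drive the reset protocol with a loop like this sketch (names here are placeholders; `process_chunk` stands in for the real incremental update step, and an iterative fit such as IRLS would simply run this scan once per iteration):

```r
## Sketch of consuming the reset-protocol data function: rewind,
## then pull chunks until NULL signals the end of the data.
scan_all_chunks <- function(datafun, process_chunk) {
  datafun(reset = TRUE)                 # rewind to the start of the data
  repeat {
    chunk <- datafun(reset = FALSE)
    if (is.null(chunk)) break           # NULL means no more data
    process_chunk(chunk)                # e.g. update an incremental QR
  }
}
```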

> how long before it's ready? :)

Depends on how many more urgent things intervene.

        -thomas

Thomas Lumley                   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]       University of Washington, Seattle

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.