Thanks,

http://cran.r-project.org/doc/manuals/R-ints.html#Future-directions

Normally I'd take more time to digest these things before commenting, but
a few points struck me right away. First, using floating point (double)
as a replacement for int strikes me as going the wrong way: to get
predictable behaviour you usually try to tell the compiler you have
ints rather than a floating type it is free to "round," and that's before
considering any performance issue. The other thing is that scaling
should not just be a matter of "make everything bigger," since growth in
data needs and in computer resources is not uniform.
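To put concrete numbers on that (my own illustration in R, not something
from the manual): a double can only count consecutive integers exactly up
to 2^53, after which it quietly rounds, whereas an int stops hard at 2^31 - 1.

    ## My own illustration of the two limits in play: the 32-bit int
    ## index ceiling versus the range over which a double is still exact.
    .Machine$integer.max   # 2147483647, i.e. 2^31 - 1
    2^53                   # last point where consecutive integers are exact in a double
    (2^53 + 1) == 2^53     # TRUE -- beyond 2^53 the double silently "rounds"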

My first thought on these constraints and resource issues is to consider
a paged data frame, depending on where exactly the 32-bit int constraint
is imposed. A random-access data structure does not always get accessed
randomly; often access is purely sequential. Further down the road, it
would be nice if algorithms were implemented in a block mode, or could
communicate their access patterns to the data structure, or at least
tell it to prefetch the pieces that will be needed soon.
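To make the block-mode idea concrete, here is a rough sketch in plain R of
the kind of chunked, one-block-at-a-time pass I have in mind (the file name,
chunk size, and column "x" are all made up for illustration; packages like
ff and bigmemory manage this sort of thing far more cleverly):

    ## Sketch of a block-oriented pass over a large delimited file,
    ## holding only one chunk in memory at a time.
    con <- file("big_data.csv", open = "r")                 # hypothetical file
    col_names <- strsplit(readLines(con, n = 1), ",")[[1]]  # header line
    running_sum <- 0
    running_n   <- 0
    repeat {
      chunk <- tryCatch(
        read.csv(con, header = FALSE, col.names = col_names, nrows = 100000),
        error = function(e) NULL)                           # error => no lines left
      if (is.null(chunk) || nrow(chunk) == 0) break
      running_sum <- running_sum + sum(chunk$x)             # assumes a numeric column "x"
      running_n   <- running_n + nrow(chunk)
    }
    close(con)
    running_sum / running_n                                 # blockwise mean of "x"

The point being that the data structure could, in principle, do this kind of
chunking and prefetching behind the scenes rather than every analysis
reinventing it.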

I'm thinking mostly along the lines of material I've seen from Intel, such
as (the first things I could find on their site, as I have not looked in
detail in quite a while):

http://www.google.com/search?hl=en&source=hp&q=site%3Aintel.com+performance+optimization

Once you get past thrashing virtual memory, you would also like to preserve
hit rates in the lower levels of the memory cache hierarchy. These are
probably not just niceties, at least where VM is concerned: I have
personally seen implementation-related speed issues make simple analyses
impractical.
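As a toy illustration of the locality point (my own timing experiment,
nothing Intel-specific): R stores matrices column-major, so walking down a
column touches contiguous memory while walking across a row strides through
it, and the difference usually shows up directly in timings.

    ## Toy demonstration of cache/locality effects from plain R.
    ## Column-wise access is contiguous; row-wise access is strided.
    m <- matrix(rnorm(4e6), nrow = 2000, ncol = 2000)
    system.time(for (j in seq_len(ncol(m))) sum(m[, j]))   # contiguous, usually faster
    system.time(for (i in seq_len(nrow(m))) sum(m[i, ]))   # strided, usually slower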


> Subject: RE: arbitrary size data frame or other structs, curious about issues
> involved.
> From: jayemer...@gmail.com
> To: marchy...@hotmail.com; r-devel@r-project.org
> 
> Mike,
> 
> 
> Neither bigmemory nor ff are "drop in" solutions -- though useful,
> they are primarily for data storage and management and allowing
> convenient access to subsets of the data.  Direct analysis of the full
> objects via most R functions is not possible.  There are many issues
> that could be discussed here (and have, previously), including the use
> of 32-bit integer indexing.  There is a nice section "Future
> Directions" in the R Internals manual that you might want to look at.
> 
> Jay
> 
> 
> -------------------------------------  Original message:
> 
> We keep getting questions on r-help about memory limits  and
> I was curious to know what issues are involved in making
> common classes like dataframe work with disk and intelligent
> swapping? That is, sure you can always rely on OS for VM
> but in theory it should be possible to make a data structure
> that somehow knows what pieces you will access next and
> can keep those somewhere fast. Now of course algorithms
> "should" act locally and be block oriented but in any case
> could communicate with data structures on upcoming
> access patterns, see a few ms into the future and have the
> right stuff prefetched.
> 
> I think things like "bigmemory" exist but perhaps one
> issue was that this could not just drop in for data.frame
> or does it already solve all the problems?
> 
> Is memory management just a non-issue or is there something
> that needs to be done  to make large data structures work well?
> 
> 
> -- 
> John W. Emerson (Jay)
> Associate Professor of Statistics
> Department of Statistics
> Yale University
> http://www.stat.yale.edu/~jay
                                          