What's wrong with SparkR? I never heard of either Spark or SparkR. For on-disk dataframes there is a package called 'ff'. I looked into using it; it works well, but there are some drawbacks with the implementation. I think that it should be possible to mmap an object from disk and use it as a vector, but 'ff' is doing something else:
https://github.com/edwindj/ffbase/issues/52

I think you'd need something called a "weak reference" to do this properly:

http://homepage.divms.uiowa.edu/~luke/R/references/weakfinex.html

I don't know what SparkR is doing under the hood.

Then again I was mostly interested in having large data sets which persist across R sessions, while Juan seems to be interested in supporting data which doesn't fit in RAM. But if something doesn't fit in RAM, it can be swapped out to disk by the OS, no? So I'm not sure why you'd want a special interface for that situation, aside from giving the programmer more control.

Thanks,

Frederick

On Mon, Sep 04, 2017 at 07:43:50AM -0500, Dirk Eddelbuettel wrote:
>
> On 4 September 2017 at 11:35, Suzen, Mehmet wrote:
> | It is not needed. There is a large community of developer using SparkR.
> | https://spark.apache.org/docs/latest/sparkr.html
> | It does exactly what you want.
>
> I hope you are not going to mail a sparkr commercial to this list every day.
> As the count is now at two, this may be an excellent good time to stop it.
>
> Dirk
>
> --
> http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel