2013/12/9 Nick Pentreath <[email protected]>:
> This is a cool idea. And it is fairly straightforward. I hacked up an
> illustration this evening: https://gist.github.com/MLnick/7880766
>
> The better approach would be to amend the sklearn svmlight code to accept
> iterables of strings in addition to file handles, and then pretty much no
> additional code should be required (though since that part is in Cython I am
> not sure, I'm just assuming it should work by eyeballing for now).

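To make the "iterables of strings" idea concrete, here is a minimal pure-Python sketch. It is not the scikit-learn loader (that part is in Cython) and not the code from the gist. The function name `parse_svmlight_lines` is hypothetical, and skipping `qid` fields is an assumption made only to keep the sketch short. It shows a parser that consumes any iterable of svmlight-formatted lines, such as the lines of one partition in the Spark setting, rather than a file handle:

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_svmlight_lines(
    lines: Iterable[str],
) -> Iterator[Tuple[float, Dict[int, float]]]:
    """Yield (label, {feature_index: value}) for each non-empty svmlight line.

    Hypothetical helper for illustration only; not part of scikit-learn.
    """
    for raw in lines:
        # Drop a trailing "# ..." comment and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        label, *pairs = line.split()
        features: Dict[int, float] = {}
        for pair in pairs:
            index, value = pair.split(":", 1)
            if index == "qid":
                # Assumption of this sketch: query ids are ignored.
                continue
            features[int(index)] = float(value)
        yield float(label), features


# Any iterable of strings works: a list, a generator, or an open text file.
rows = list(parse_svmlight_lines(["1 1:0.5 3:2.0", "-1 2:1.5  # comment", ""]))
```

Because the parser is a generator, it never needs the whole input in memory, which is the property that matters when the lines arrive from a distributed collection instead of a local file.
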
Indeed. A related evolution of the svmlight loader, though not directly useful
for the Spark integration, is seekable chunk reading with byte offsets:

https://github.com/scikit-learn/scikit-learn/pull/935

Still, you might want to keep that piece of code in mind.

> Olivier, might it make sense to put on the Wiki page for the event a few
> ideas of what to look at / tackle? Not sure if this is usually done or
> helpful etc for these events.

Yes, sure, please feel free to go ahead.

-- 
Olivier

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
