2013/11/27 Nick Pentreath <[email protected]>:
> CC'ing Spark Dev list
>
> I have been thinking about this for quite a while and would really love to
> see this happen.
>
> Most of my pipeline ends up in Scala/Spark these days - which I love, but it
> is partly because I am reliant on custom Hadoop input formats that are just
> way easier to use from Scala/Java - but I still use Python a lot for data
> analysis and interactive work. There is some good stuff happening with
> Breeze in Scala and MLlib in Spark (and IScala) but the breadth just doesn't
> compare as yet - not to mention IPython and plotting!
>
> There is a PR that was just merged into PySpark to allow arbitrary
> serialization protocols between the Java and Python layers. I hope to try to
> use this to allow PySpark users to pull data from arbitrary Hadoop
> InputFormats with minimum fuss. This I believe will open the way for many
> (including me!) to use PySpark directly for virtually all distributed data
> processing without "needing" to use Java
> (https://github.com/apache/incubator-spark/pull/146)
> (http://mail-archives.apache.org/mod_mbox/incubator-spark-dev/201311.mbox/browser).
This is very interesting, thanks for the heads up.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
