On Aug 15, 2013, at 7:13 PM, Lijie Xu <[email protected]> wrote:

> 3) MLBase may require Spark to provide some new features for implementing
> some specific algorithms. Is there any? Or you have added some new
> fundamental features which are not supported in Spark-0.7?
On this particular aspect, we actually have a few small changes in 0.8 that are required in MLlib -- one is an improvement to the semantics of takeSample to allow over-sampling an RDD, and one is exposing each RDD's storage level as a public API so we can check whether it's cached and warn you otherwise. So it would be better to run this over 0.8 than 0.7. That said, you might be able to port many algorithms back to 0.7. The plan is to release 0.8 this month, so it won't be too far away.

Matei
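
For readers unfamiliar with the term, "over-sampling" here means asking for more elements than the dataset holds, which only makes sense when sampling with replacement. The sketch below illustrates that idea on a plain Python list. It is not Spark's implementation: the function take_sample and its parameter names are hypothetical stand-ins chosen for illustration, and capping the result when sampling without replacement is an assumption of this sketch.

```python
import random


def take_sample(items, with_replacement, num, seed):
    """Hypothetical stand-in for a takeSample-style call, on a plain list."""
    rng = random.Random(seed)
    if not items:
        return []
    if with_replacement:
        # Each draw is independent, so num may exceed len(items).
        return [rng.choice(items) for _ in range(num)]
    # Without replacement, at most len(items) elements can come back
    # (an assumption of this sketch, not a statement about Spark).
    return rng.sample(items, min(num, len(items)))


data = [1, 2, 3]
oversampled = take_sample(data, True, 10, seed=42)   # 10 draws from 3 items
capped = take_sample(data, False, 10, seed=42)       # capped at 3 items
print(len(oversampled), len(capped))
```

With replacement, the request for 10 elements is honored even though the list has only 3; without replacement, the result is limited to the 3 available elements.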
