Could you post your script to reproduce the results (and also how to generate the dataset)? That will help us investigate it.
On Fri, Dec 5, 2014 at 8:40 AM, Jaonary Rabarisoa <jaon...@gmail.com> wrote:
> Hmm, here I use Spark in local mode on my laptop with 8 cores. The data is
> on my local filesystem. Even though there is an overhead due to the
> distributed computation, I found the difference between the runtimes of
> the two implementations really, really huge. Is there a benchmark on how
> well the algorithm implemented in MLlib performs?
>
> On Fri, Dec 5, 2014 at 4:56 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>> Spark has much more overhead, since it's set up to distribute the
>> computation. Julia isn't distributed, and so has no such overhead in a
>> completely in-core implementation. You generally use Spark when you
>> have a problem large enough to warrant distributing, or your data
>> already lives in a distributed store like HDFS.
>>
>> But it's also possible you're not configuring the implementations the
>> same way, yes. There's not enough info here really to say.
>>
>> On Fri, Dec 5, 2014 at 9:50 AM, Jaonary Rabarisoa <jaon...@gmail.com>
>> wrote:
>> > Hi all,
>> >
>> > I'm trying to run clustering with the k-means algorithm. The size of
>> > my data set is about 240k vectors of dimension 384.
>> >
>> > Solving the problem with the k-means available in Julia (kmeans++)
>> >
>> > http://clusteringjl.readthedocs.org/en/latest/kmeans.html
>> >
>> > takes about 8 minutes on a single core.
>> >
>> > Solving the same problem with Spark's kmeans|| takes more than 1.5
>> > hours with 8 cores!
>> >
>> > Either they don't implement the same algorithm, or I don't understand
>> > how the k-means in Spark works. Is my data not big enough to take full
>> > advantage of Spark? At least, I expected a comparable runtime.
>> >
>> > Cheers,
>> >
>> > Jao
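For anyone comparing the two, a minimal single-core Lloyd's k-means in plain Python (standard library only) shows the kind of in-core baseline the Julia run represents. This is an illustrative sketch, not the Clustering.jl or MLlib implementation, and the toy data and function name are made up for the example:

```python
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Plain Lloyd's k-means: random init, then assign/update until stable."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for i, cl in enumerate(clusters):
            if cl:
                new_centers.append(tuple(sum(xs) / len(cl) for xs in zip(*cl)))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers

# Two well-separated 2-D blobs; k=2 should recover their means.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
        (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
print(sorted(kmeans(data, 2)))
```

MLlib's kmeans|| differs mainly in its parallel initialization and in paying per-iteration scheduling and shuffle costs, which is why comparing it against an in-core implementation on one laptop mostly measures that overhead rather than the algorithm itself.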