Re: Mahout on Spark?

2014-02-19 Thread Dmitriy Lyubimov
Unfortunately, methinks the prospects of something like a Mahout/MLlib merge seem very unlikely, due to vastly diverged approaches to the basics of linear algebra (and other things). Just like one cannot grow a single tree out of two trunks -- not easily, anyway. It is fairly easy to port (and

Re: Mahout on Spark?

2014-02-19 Thread Sean Owen
Agree that 'merging' is so infeasible as to not make sense. Mahout has been ML on M/R and that's its thing, which seems fine. IMHO this project has been hurt by an active unwillingness to define scope, and by pretending it's helpful to have little bits of lots of ideas and technologies. I also

Re: Mahout on Spark?

2014-02-19 Thread Dmitriy Lyubimov
PS: I am moving along with a cost optimizer for Spark-backed DRMs on some multiplicative pipelines, capable of figuring out different cost-based rewrites, and an R-like DSL that mixes in-core and distributed matrix representations and blocks, but it is painfully slow; I am really only doing it like a couple

Re: Mahout on Spark?

2014-02-19 Thread Gokhan Capan
I imagine Mahout offering users an option to select from different execution engines (just as we currently do by offering M/R or sequential options), starting with Spark. I am not sure what changes are needed in the codebase, though. Maybe following MLI (or the like) and implementing some

Cluster Dumper in 0.9

2014-02-19 Thread Bikash Gupta
Hi, after running the cluster dumper on k-means output I am getting only the keys of the sequence file. The options I provided to the cluster dumper are: -i cluster-*-final of Kmeans -o Output File -p clusteredPoint -of CSV. Is there something that I am missing? Note: I am using sequential mode. -- Regards Bikash

Re: Mahout on Spark?

2014-02-19 Thread Sean Owen
To set expectations appropriately, I think it's important to point out this is completely infeasible short of a total rewrite, and I can't imagine that will happen. It may not be obvious if you haven't looked at the code how completely dependent on M/R it is. You can swap out M/R and Spark if you

Re: Mahout on Spark?

2014-02-19 Thread Sebastian Schelter
Completely agree with Sean's statement.
On 02/19/2014 01:52 PM, Sean Owen wrote:
> To set expectations appropriately, I think it's important to point out this is completely infeasible short of a total rewrite, and I can't imagine that will happen. It may not be obvious if you haven't looked at

Re: Cluster Dumper in 0.9

2014-02-19 Thread Bikash Gupta
I am running the cluster dumper. After extracting the output from the cluster dump I transpose the rows to columns, hence I call this class directly from my Java code. Code: ClusterDumper.main(new String[] { buildOption(DefaultOptionCreator.INPUT_OPTION), seqFileDir,