Unfortunately, methinks the prospects of something like a Mahout/MLlib merge
are very slim, because the two have diverged so far in their approach to the
basics of linear algebra (and other things). It is like trying to grow a
single tree out of two trunks -- not easy, anyway.
It is fairly easy to port (and
Agree that 'merging' is so infeasible as to not make sense. Mahout has
been ML on M/R and that is its thing, which seems fine. IMHO this
project has been hurt by an active unwillingness to define scope, and
by pretending it's helpful to have little bits of lots of ideas and
technologies.
I also
PS: I am moving along on a cost optimizer for Spark-backed DRMs on some
multiplicative pipelines, one that is capable of figuring out different
cost-based rewrites, and on an R-like DSL that mixes in-core and distributed
matrix representations and blocks. But it is painfully slow; I am really only
doing it like a couple
I imagine Mahout offering users the option to select from different
execution engines (just as we currently do by offering M/R or sequential
options), starting with Spark. I am not sure what changes are needed in
the codebase, though. Maybe following MLI (or the like) and
implementing some
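A minimal sketch of the engine-selection idea above, in plain Java. Everything here is hypothetical and for illustration only: EngineSelectionSketch, the Engine enum, and the ClusteringJob interface are not Mahout classes, and the sketch says nothing about how much of the existing M/R-bound code could actually sit behind such an interface.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Mahout.
public class EngineSelectionSketch {

    /** Engines a user could choose from; SPARK is the proposed addition. */
    public enum Engine { SEQUENTIAL, MAPREDUCE, SPARK }

    /** One algorithm, with one implementation per engine. */
    public interface ClusteringJob {
        String run(String inputPath);
    }

    private final Map<Engine, ClusteringJob> jobs = new EnumMap<>(Engine.class);

    /** Registers the implementation to use for a given engine. */
    public void register(Engine engine, ClusteringJob job) {
        jobs.put(engine, job);
    }

    /** Dispatches to the chosen engine, failing loudly if it has no implementation. */
    public String run(Engine engine, String inputPath) {
        ClusteringJob job = jobs.get(engine);
        if (job == null) {
            throw new UnsupportedOperationException("No implementation for " + engine);
        }
        return job.run(inputPath);
    }

    public static void main(String[] args) {
        EngineSelectionSketch sketch = new EngineSelectionSketch();
        // Placeholder implementations; real ones would wrap actual jobs.
        sketch.register(Engine.SEQUENTIAL, path -> "sequential run over " + path);
        sketch.register(Engine.MAPREDUCE, path -> "M/R run over " + path);

        // Engine name comes from the command line, defaulting to sequential.
        Engine chosen = args.length > 0
                ? Engine.valueOf(args[0].toUpperCase())
                : Engine.SEQUENTIAL;
        System.out.println(sketch.run(chosen, "input-dir"));
    }
}
```

The dispatch itself is the easy part; the open question raised in the thread is whether the algorithms can be expressed against such an interface at all.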
Hi,
After running the cluster dumper on the K-means output, I am getting only
the key of the sequence file.
The options provided to the cluster dumper are:
-i cluster-*-final of Kmeans -o Output File -p
clusteredPoint -of CSV
Is there something I am missing?
Note: I am using sequential mode.
--
Regards
Bikash
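For readers following along, the options listed in the question above can be written out as an argument array. This is only a sketch: ClusterDumperArgsSketch and buildArgs are hypothetical helper names, the paths in main are placeholders, and only the four flags quoted in the question (-i, -o, -p, -of) are used. It does not call Mahout and does not diagnose the problem.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; it only assembles the flags quoted in the question.
public class ClusterDumperArgsSketch {

    /** Builds the argument array from the four options named in the question. */
    public static String[] buildArgs(String finalClustersDir, String outputFile,
                                     String pointsDir, String outputFormat) {
        List<String> args = new ArrayList<>();
        args.add("-i");
        args.add(finalClustersDir);   // the cluster-*-final directory of the K-means run
        args.add("-o");
        args.add(outputFile);         // where the dump is written
        args.add("-p");
        args.add(pointsDir);          // directory holding the clustered points
        args.add("-of");
        args.add(outputFormat);       // CSV in the question
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        // Placeholder paths, not taken from the original message.
        String[] args = buildArgs("kmeans-out/final-clusters", "dump.csv",
                                  "kmeans-out/points", "CSV");
        System.out.println(String.join(" ", args));
    }
}
```

Printing the array before handing it to the tool makes it easy to check that each flag is followed by the value that was intended.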
To set expectations appropriately, I think it's important to point out
this is completely infeasible short of a total rewrite, and I can't
imagine that will happen. It may not be obvious if you haven't looked
at the code how completely dependent on M/R it is.
You can swap out M/R and Spark if you
Completely agree with Sean's statement.
On 02/19/2014 01:52 PM, Sean Owen wrote:
To set expectations appropriately, I think it's important to point out
this is completely infeasible short of a total rewrite, and I can't
imagine that will happen. It may not be obvious if you haven't looked
at
I am running cluster dumper
After extracting the output from the cluster dump I transpose the rows to
columns, so I call this class directly from my Java code.
Code:
ClusterDumper.main(new String[] {
buildOption(DefaultOptionCreator.INPUT_OPTION),seqFileDir,