Ted/Peter,
Thanks for the response.
This is exactly what I am trying to achieve. Maybe I was not able to
put my questions clearly.
I am clustering on a few variables of Customer/User (except their
customer_id/user_id) and storing the customer_id/user_id list in a
separate place.
Question) What is
On Tuesday, February 18, 2014 3:37 AM, Bikash Gupta bikash.gupt...@gmail.com
wrote:
Suneel,
Thanks for the information.
I am using 0.7 packaged with CDH.
On Tue, Feb 18, 2014 at 2:14 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:
On Tuesday, February 18, 2014 3:37 AM, Bikash Gupta
bikash.gupt...@gmail.com wrote:
Bikash,
Don't use that version. Use a more recent release. We can't help that
Cloudera has an old version.
On Tue, Feb 18, 2014 at 1:26 AM, Bikash Gupta bikash.gupt...@gmail.com wrote:
Yeah Ted, seems there is a major change in 0.9.
In 0.9 I found that clusteredPoint data are getting written as
Pair<Key, Vector> rather than only Vector. It's good.
Thanks to everyone for answering an unframed question correctly :)
On Tue, Feb 18, 2014 at 7:36 PM, Ted Dunning ted.dunn...@gmail.com
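The Pair<Key, Vector> layout mentioned above can be illustrated with a toy sketch in plain Python (this is not Mahout code; the helper names are made up): each clustered point keeps its original key alongside its vector, grouped under the winning cluster id, instead of the bare vector alone.

```python
import math

def nearest(centroids, vector):
    """Index of the centroid closest to `vector` (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(centroids[i], vector))

def cluster_points(centroids, keyed_vectors):
    """Map cluster id -> list of (point_key, vector) pairs."""
    out = {}
    for key, vec in keyed_vectors:
        out.setdefault(nearest(centroids, vec), []).append((key, vec))
    return out

centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [("user_1", (0.5, 0.2)), ("user_2", (9.8, 10.1))]
print(cluster_points(centroids, points))
# {0: [('user_1', (0.5, 0.2))], 1: [('user_2', (9.8, 10.1))]}
```

The point of the layout is in the output above: you can recover which original record (e.g. customer_id) landed in which cluster without keeping a separate side list.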
Hello again, and sorry to bother you with this once again,
I'm having a bit of trouble. My CSV files are just full of numbers (doubles).
Each line looks something like this: 2.4135,1.1120. I'm not sure if this makes
a big difference. But when I try to do step #2, I can't seem to figure out
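For what it's worth, parsing lines like 2.4135,1.1120 into numeric vectors can be sketched in plain Python (an illustration only, not Mahout's CSV-to-SequenceFile path; the function name is made up):

```python
def parse_csv_vectors(lines):
    """Parse lines like '2.4135,1.1120' into lists of floats."""
    vectors = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            vectors.append([float(field) for field in line.split(",")])
        except ValueError as exc:
            raise ValueError(f"bad value on line {lineno}: {line!r}") from exc
    return vectors

print(parse_csv_vectors(["2.4135,1.1120", "0.5,3.0"]))
# [[2.4135, 1.112], [0.5, 3.0]]
```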
FYI, CDH5 includes version 0.8 + patches. But 0.9 should work fine
with CDH4. You do have to build with the Hadoop 2.x profile, as usual.
On Tue, Feb 18, 2014 at 2:06 PM, Ted Dunning ted.dunn...@gmail.com wrote:
Thanks Sean.
I will check how to support 0.9 with CDH4.
However 0.9 has solved my problem.
On Tue, Feb 18, 2014 at 7:45 PM, Sean Owen sro...@gmail.com wrote:
I try to run an example and get the following error:
Feb 18, 2014 4:31:28 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
WARNING: job_local_0001
java.lang.NoSuchFieldError: LUCENE_43
    at org.apache.mahout.common.lucene.AnalyzerUtils.createAnalyzer(AnalyzerUtils.java:35)
at
You definitely don't have to mess with hadoop source.
On Tuesday, February 18, 2014 10:28 AM, Stamatis Rapanakis
stamrapana...@gmail.com wrote:
Streaming KMeans runs with a single reducer that runs Ball KMeans and hence the
slow performance that you have been experiencing.
How did you come up with -km 63000?
Given that you would like 10,000 clusters (= k) and have 2,000,000 datapoints
(= n), k * ln(n) = 10,000 * ln(2 * 10^6) ≈ 145,087.
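The k * ln(n) rule of thumb quoted above is quick to verify with a plain-Python sketch (the helper name is made up):

```python
import math

def streaming_kmeans_sketch_size(k, n):
    """Rule-of-thumb count of intermediate sketch clusters: k * ln(n)."""
    return round(k * math.log(n))

# 10,000 target clusters over 2,000,000 points:
print(streaming_kmeans_sketch_size(10_000, 2_000_000))  # 145087
```

By that estimate, passing -km 63000 undershoots the suggested sketch size by more than half.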
The Apache Mahout PMC is pleased to announce the release of Mahout 0.9.
Mahout's goal is to build scalable machine learning libraries focused
primarily in the areas of collaborative filtering (recommenders),
clustering and classification (known collectively as the 3Cs), as well as the
necessary
Just wondering what the future of Mahout is. We are seeing new stuff from
0xdata and Skytree, and Spark is also designed for in-memory iterative
analysis. What about Mahout? Will Mahout run on top of Spark in the future?
Thanks,
Ying Liao
I am very eager to know the same from the community.
Thanks for bringing it up.
--Harshit
On Tue, Feb 18, 2014 at 1:08 PM, Ying Liao yliao...@gmail.com wrote:
In general, if you are interested in machine learning: there is
already a machine-learning-specific initiative on Spark called MLbase
(http://www.mlbase.org/),
and GraphX (http://amplab.github.io/graphx/) for GraphLab-style ML.
On Tue, Feb 18, 2014 at 1:14 PM, Harshit Bapna
Spark provides a lower-level ML library called MLlib. MLI / MLBase is
built on top of this and includes some high-level abstractions similar in
nature to distributed matrices / dataframes. But it's still pretty new and
rough at this point (https://github.com/amplab/MLI).
MLlib already provides (
On Tue, Feb 18, 2014 at 1:58 PM, Nick Pentreath nick.pentre...@gmail.com wrote:
My (admittedly heavily biased) view is Spark is a superior platform overall
for ML. If the two communities can work together to leverage the strengths
of Spark, and the large amount of good stuff in Mahout (as well
Yes, this is a popular initiative.
On Tue, Feb 18, 2014 at 1:08 PM, Ying Liao yliao...@gmail.com wrote:
I know the Spark/MLlib devs can occasionally be quite set in their ways of
doing certain things, but we'd welcome as many Mahout devs as possible to work
together.
It may be too late, but perhaps a GSoC project to look at a port of some stuff
like the co-occurrence recommender and streaming k-means?
I'm also convinced that Spark is a superior platform for executing
distributed ML algorithms. We've had a discussion about a change from
Hadoop to another platform some time ago, but at that point in time it
was not clear which of the upcoming dataflow processing systems (Spark,
Hyracks,