I spent a week trying to get Hadoop to work on Windows 7, and then gave up.
Did you manage to run Hadoop on Windows? Do the Hadoop tests (e.g. wordcount) work?
http://en.wikisource.org/wiki/User:Fkorning/Code/Hadoop-on-Cygwin has
lots of details about this.
Some of the possible problems are Cygwin
Hi,
Yes, I'm using the Mahout and Hadoop libraries on Windows.
My cluster output is not written to HDFS but to the local filesystem.
Thanks to Cygwin I am able to run Unix commands in order to run Mahout on
Windows.
I changed the PATH on Windows as well.
I didn't test whether wordcount works, because I am using only
This is the case:
https://issues.apache.org/jira/browse/MAHOUT-973
The bug exists in Mahout 0.6 and was fixed in Mahout 0.7.
I also used the workaround of setting a high value for --maxDFPercent
(I guess the number of documents in the corpus is enough).
Maybe it would be good to fix it in 0.6 as
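The --maxDFPercent workaround mentioned above can be sketched as a seq2sparse invocation; the input and output paths below are placeholders, and this is a sketch against the Mahout 0.6 CLI of that era, not a verified command line.

```shell
# Hypothetical paths; --maxDFPercent 100 keeps terms no matter how many
# documents they appear in, sidestepping the MAHOUT-973 pruning bug in 0.6.
mahout seq2sparse \
  -i /path/to/sequence-files \
  -o /path/to/sparse-vectors \
  --maxDFPercent 100
```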
I already generated the points directory when I ran clustering (k-means in my case).
But for the moment I can't generate the cluster dump because of an error on this line:
ClusterDumper.readPoints(new Path("output/kmeans/clusters-0"), 2, conf);
The second parameter is a double, but it wants an int, and it does not accept an int.
I don't know why ClusterDumper is not working, but I can give an
alternate solution.
Use ClusterOutputPostProcessor (clusterpp), on the clusters-*-final
directory. https://cwiki.apache.org/MAHOUT/top-down-clustering.html
It will arrange the vectors in respective directories. However, it will
I just succeeded in making my app work. I had to use
ClusterDumperWriter.getTopFeatures(arg1, arg2, arg3), and that gave me the top
words in human-readable format :D
-Original Message-
From: Paritosh Ranjan [mailto:pran...@xebia.com]
Sent: Tuesday, August 7, 2012 10:32
To: user@mahout.apache.org
Hi All,
We have developed an auto tagging system for our micro-blogging platform.
Here is what we have done:
The purpose of the system was to look for tags in an article automatically
when someone posts a link on our micro-blogging site. The goal was to allow
us to follow a tag instead (in
Nice stuff. And glad that Mahout was able to help!
On Tue, Aug 7, 2012 at 7:37 AM, SAMIK CHAKRABORTY sam...@gmail.com wrote:
Hi All,
We have developed an auto tagging system for our micro-blogging platform.
Here is what we have done:
The purpose of the system was to look for tags in an
Hi,
I would like to know how I can deal with multiple preference values
for the same (user, item) pair from a machine learning perspective.
That means I have more than one rating from a user u for an item i
available.
Of course, using any kind of average (maybe also taking date information
As far as I remember, Mahout overrides older preference values with the
newest one.
On Tue, Aug 7, 2012 at 2:14 PM, Dominik Lahmann
dominik.lahm...@fu-berlin.de wrote:
Hi,
I would like to know how I can deal with multiple preference values
for the same (user, item)-pair from a machine
It depends on what the values really mean. If they are something like
ratings, using the most recent version makes most sense. (This is what the
implementations do now.) If they are some kind of sampled reading it might
make sense to take an average. If the input is based on observed activity,
it
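The two policies discussed above — keeping the most recent rating versus averaging all of them — can be sketched in plain Java. The class and method names here are illustrative only, not Mahout APIs; Mahout's actual behavior of keeping the newest value happens inside its data-model loading.

```java
import java.util.*;

// Sketch of two ways to collapse duplicate (user, item) ratings:
// "newest wins" (what the thread says Mahout does) vs. averaging.
public class PreferenceMerge {

    // One raw rating event: (user, item, value, timestamp). Illustrative type.
    public static final class Event {
        public final long user, item, timestamp;
        public final double value;
        public Event(long user, long item, double value, long timestamp) {
            this.user = user; this.item = item;
            this.value = value; this.timestamp = timestamp;
        }
    }

    // Keep only the most recent value per (user, item) pair.
    public static Map<String, Double> newestWins(List<Event> events) {
        Map<String, Event> latest = new HashMap<>();
        for (Event e : events) {
            String key = e.user + ":" + e.item;
            Event cur = latest.get(key);
            if (cur == null || e.timestamp > cur.timestamp) latest.put(key, e);
        }
        Map<String, Double> out = new HashMap<>();
        for (Map.Entry<String, Event> en : latest.entrySet())
            out.put(en.getKey(), en.getValue().value);
        return out;
    }

    // Alternative: average every observed value per (user, item) pair.
    public static Map<String, Double> average(List<Event> events) {
        Map<String, double[]> acc = new HashMap<>(); // key -> {sum, count}
        for (Event e : events) {
            double[] a = acc.computeIfAbsent(e.user + ":" + e.item,
                                             k -> new double[2]);
            a[0] += e.value; a[1] += 1;
        }
        Map<String, Double> out = new HashMap<>();
        for (Map.Entry<String, double[]> en : acc.entrySet())
            out.put(en.getKey(), en.getValue()[0] / en.getValue()[1]);
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event(1, 10, 2.0, 100),   // older rating for the pair
            new Event(1, 10, 4.0, 200),   // newer rating for the same pair
            new Event(1, 11, 5.0, 150));
        System.out.println("newest: " + newestWins(events).get("1:10")); // 4.0
        System.out.println("avg:    " + average(events).get("1:10"));    // 3.0
    }
}
```

As the reply notes, which policy fits depends on what the values mean: ratings favor "newest wins", sampled readings favor the average.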
I have used the same steps to create the dictionary and vector output from
Solr using the *lucene.vector* command.
Is there any way to pull only the latest changes from Solr and create vectors?
Later, how do we run the clustering algorithms using these incremental vector
files? Can you shed some light on this?
Hi Jake,
Today I submitted the diff. It is available at
https://issues.apache.org/jira/browse/MAHOUT-1051
Thanks for the advice
On Tue, Aug 7, 2012 at 1:06 AM, Jake Mannix jake.man...@gmail.com wrote:
Sounds great Gokhan!
On Mon, Aug 6, 2012 at 2:53 PM, Gokhan Capan gkhn...@gmail.com
Hello Yuval,
Thanks for the link.
But I am sure I am using version 0.7. I will double-check it
Pavel
From: Yuval Feinstein [yuv...@citypath.com]
Sent: August 7, 2012 11:08
To: user@mahout.apache.org
Subject: Re: Seq2sparse example produces bad TFIDF
Hello,
I am trying to run the KMeans example on 15,000,000 documents (seq2sparse output).
There are 1,000 clusters, a 200,000-term dictionary, and documents of 3-10 terms
(titles). seq2sparse produces 200 files of 80 MB each.
My job failed with a Java heap space error. The 1st iteration passes while the 2nd
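A common first step for a heap-space failure like this is to raise the per-task JVM heap. The sketch below uses the pre-YARN Hadoop property from the era of this thread; all paths, the heap size, and the iteration count are placeholders, not values from the original message.

```shell
# Hypothetical paths; -Dmapred.child.java.opts raises each map/reduce task's
# JVM heap (Hadoop 0.20/1.x-era property). The k-means tasks must hold all
# 1,000 cluster centroids in memory during an iteration.
mahout kmeans \
  -Dmapred.child.java.opts=-Xmx2048m \
  -i /path/to/sparse-vectors/tfidf-vectors \
  -c /path/to/initial-clusters \
  -o /path/to/kmeans-output \
  -k 1000 -x 10
```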