Re: SparseVector, VectorWriter VectorIterable

2013-09-10 Thread Michael Wechner
You might want to have a look at core/src/main/java/org/apache/mahout/vectorizer/SparseVectorsFromSequenceFiles.java. Regarding http://people.apache.org/~isabel/mahout_site/mahout-matrix/apidocs/org/apache/mahout/matrix/SparseVector.html it seems that this class does not exist anymore inside

Question re cluster-reuters.sh

2013-09-10 Thread Michael Wechner
Hi, I have tried to follow/execute the steps described at https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line but had trouble doing so, because for example org.apache.lucene.benchmark.utils.ExtractReuters does not seem to be contained

Re: Question re cluster-reuters.sh

2013-09-10 Thread Suneel Marthi
Michael, 1. build-reuters.sh is to be retired, use cluster-reuters.sh instead. 2. You are correct, the script does what's been described in the wiki link. From: Michael Wechner michael.wech...@wyona.com To: mahout-u...@apache.org Sent: Tuesday, September

Re: Question re cluster-reuters.sh

2013-09-10 Thread Michael Wechner
Thanks, Suneel, for confirming. Will try to understand it better and then probably post more questions. Thanks. On 10.09.13 16:26, Suneel Marthi wrote: Michael, 1. build-reuters.sh is to be retired, use cluster-reuters.sh instead. 2. You are correct, the script does what's been described in

Re: running mahout on Hadoop 2.0.0-cdh4.3.1

2013-09-10 Thread Sean Owen
You are trying to run on Hadoop 2, and Mahout only works with Hadoop 1 and related branches. This won't work. However, the CDH distributions also come in an 'mr1' flavor that stands a much better chance of working with something that is built for Hadoop 1. Use 2.0.0-mr1-4.3.1 instead. (PS 4.3.2 and

Re: running mahout on Hadoop 2.0.0-cdh4.3.1

2013-09-10 Thread Parimi Rohit
Thanks, Sean. Will look into that. Rohit On Tue, Sep 10, 2013 at 1:32 PM, Sean Owen sro...@gmail.com wrote: You are trying to run on Hadoop 2, and Mahout only works with Hadoop 1 and related branches. This won't work. However, the CDH distributions also come in an 'mr1' flavor that stands a

Tuning parameters for ALS-WR

2013-09-10 Thread Parimi Rohit
Hi All, I was wondering if there is any experimental design to tune the parameters of the ALS algorithm in Mahout, so that we can compare its recommendations with recommendations from another algorithm. My datasets have implicit data and I would like to use the following design for tuning the ALS

running mahout on Hadoop 2.0.0-cdh4.3.1

2013-09-10 Thread Parimi Rohit
Hi All, I am used to running Mahout (mahout-core-0.9-SNAPSHOT-job.jar) in the Apache Hadoop environment; however, we had to switch to the Cloudera distribution. When I try to run the item-based collaborative filtering job (org.apache.mahout.cf.taste.hadoop.item.RecommenderJob) in the Cloudera

SVD, how are the missing values treated?

2013-09-10 Thread Yang
In the simple equation describing SVD, A = USV, I guess the original matrix A has to have every value filled so that the calculation can be carried out, right? But the Mahout package described here: https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction

Re: SVD, how are the missing values treated?

2013-09-10 Thread Dmitriy Lyubimov
On Tue, Sep 10, 2013 at 5:48 PM, Yang tedd...@gmail.com wrote: in the simple equation describing SVD: A = USV I guess the original matrix A has to have every value filled, so that mathematics will be able to carry out the calculation, right? No. A may be sparse, where 0 elements are
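
To illustrate the point that A may be sparse, here is a minimal sketch in plain Python. It is not Mahout code. The reply above is cut off, so the sketch assumes the usual sparse-storage convention that any entry not stored is an actual 0. The function name `top_singular_value` and the list-of-dicts row layout are hypothetical, chosen only for this example. It estimates the largest singular value by power iteration on A^T A, touching only the stored entries.

```python
import math


def top_singular_value(rows, n_cols, iterations=200):
    """Estimate the largest singular value of a sparse matrix A.

    rows is a list of {column_index: value} dicts, one per matrix row.
    Entries that are not stored are assumed to be 0 (an assumption of
    this sketch, not a statement about Mahout's implementation).
    """
    # Start from a unit vector; it must not be orthogonal to the top
    # right singular vector, which a uniform vector usually is not.
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    sigma = 0.0
    for _ in range(iterations):
        # u = A v, computed from stored entries only.
        u = [sum(val * v[j] for j, val in row.items()) for row in rows]
        # w = A^T u, again skipping the implicit zeros.
        w = [0.0] * n_cols
        for u_i, row in zip(u, rows):
            for j, val in row.items():
                w[j] += val * u_i
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        # For a unit v, ||A^T A v|| converges to sigma squared.
        sigma = math.sqrt(norm)
        v = [x / norm for x in w]
    return sigma


# A 2x2 diagonal matrix [[3, 0], [0, 4]] with the zeros left unstored.
sparse_rows = [{0: 3.0}, {1: 4.0}]
estimate = top_singular_value(sparse_rows, n_cols=2)
```

The calculation never needs a value for the unstored cells, which is why a matrix with most entries absent can still be decomposed under this convention.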

Re: Tuning parameters for ALS-WR

2013-09-10 Thread Ted Dunning
You definitely need to separate into three sets. Another way to put it is that with cross validation, any learning algorithm needs to have test data withheld from it. The remaining data is training data to be used by the learning algorithm. Some training algorithms such as the one that you
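
The three-set separation described above can be sketched as follows. This is a minimal plain-Python illustration, not Mahout code: the function name `three_way_split`, the 10% fractions, the fixed seed, and the (user, item, preference) triples are all assumptions made for the example. The test set stays withheld from learning, the validation set is used to compare parameter settings, and the remainder is training data.

```python
import random


def three_way_split(ratings, validation_fraction=0.1, test_fraction=0.1, seed=42):
    """Shuffle the ratings and split them into training, validation and test sets."""
    shuffled = list(ratings)
    # A seeded generator keeps the split reproducible across tuning runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_validation = int(len(shuffled) * validation_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_validation]
    training = shuffled[n_test + n_validation:]
    return training, validation, test


# Hypothetical implicit-feedback data: (user, item, preference) triples.
ratings = [(user, item, 1.0) for user in range(10) for item in range(10)]
training, validation, test = three_way_split(ratings)
```

A tuning loop would fit each candidate parameter setting on `training`, score it on `validation`, and evaluate only the chosen setting on `test`, so the figure used to compare against another algorithm comes from data that played no part in the tuning.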