RE: Mahout v0.9 is not working with 2.2.0-cdh5.0.0-beta-1

2014-03-31 Thread Phan, Truong Q
Yes, I did rebuild it.
oracle@bpdevdmsdbs01: /ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9 - $ mvn clean install -Dhadoop2.version=2.2.0-cdh5.0.0-beta-1 -DskipTests=true
[INFO] Scanning for projects...
[INFO]

Re: Profiling with visualvm

2014-03-31 Thread Mahmood Naderan
I tried YourKit, and a CPU sampling analysis shows only three threads!
org.apache.hadoop.mapred.LocalJobRunner$Job.run()
org.apache.mahout.driver.MahoutDriver.main(String[])
java.lang.Thread.run()
I am trying to view something like http://www.yourkit.com/docs/yjp2013/help/cpu_intro.jsp

RE: Mahout v0.9 is not working with 2.2.0-cdh5.0.0-beta-1

2014-03-31 Thread Sean Owen
But you have a bunch of Hadoop 0.20 jars on your classpath! Definitely a problem. Those should not be there. On Mar 31, 2014 7:09 AM, Phan, Truong Q troung.p...@team.telstra.com wrote: Yes, I did rebuild it. oracle@bpdevdmsdbs01:
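One way to check for the kind of mix-up Sean describes is to list the Hadoop jars on the classpath and group them by version. The following is only a minimal sketch: the jar naming pattern, the helper names `hadoop_versions` and `has_mixed_majors`, and the example paths in the usage note are assumptions for illustration and are not taken from the thread.

```python
import os
import re
from collections import defaultdict

# Assumed naming pattern: hadoop-<module>-<version>.jar, for example
# hadoop-core-0.20.2.jar or hadoop-common-2.2.0-cdh5.0.0-beta-1.jar.
HADOOP_JAR = re.compile(r"^(hadoop-[a-z-]+?)-(\d+\.\d+[\w.\-]*)\.jar$")


def hadoop_versions(classpath, sep=os.pathsep):
    """Group Hadoop jar file names found on a classpath string by version."""
    by_version = defaultdict(list)
    for entry in classpath.split(sep):
        name = os.path.basename(entry.strip())
        match = HADOOP_JAR.match(name)
        if match:
            by_version[match.group(2)].append(name)
    return dict(by_version)


def has_mixed_majors(by_version):
    """Return True if the jars span more than one major version (e.g. 0.x and 2.x)."""
    majors = {version.split(".")[0] for version in by_version}
    return len(majors) > 1
```

For example, calling `hadoop_versions` on a hypothetical classpath string such as `/opt/lib/hadoop-core-0.20.2.jar:/opt/lib/hadoop-common-2.2.0-cdh5.0.0-beta-1.jar` with `sep=":"` groups the two jars under different versions, and `has_mixed_majors` then reports the conflict.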

Re: Fuzzy KMeans fails on reuters corpus with 4GB max heap size

2014-03-31 Thread tuxdna
What else could I do to avoid the problem? Another question is whether this can be resolved by using a later version of Mahout. I ran the same example with Mahout 0.9 and it works fine for me. Regards, Saleem

Re: (help!) Can someone scan this

2014-03-31 Thread Jay Vyas
FYI, I eventually got this working. I'm not sure what the fix was, but here is all the stuff I tried (some combination of the below must have got it working):
- created log4j.properties files and made sure all the necessary properties were there
- exported some of the usual hadoop HOME and HADOOP_CONF

Recommendation thresholds

2014-03-31 Thread Jay Vyas
Hi again mahout! What is the lowest that we can set a threshold in the item recommender? I'd like to set it low enough to guarantee output, to confirm that my recommender actually worked structurally, and then start tightening it up. But with --threshold=.0001 I still get no results.

Using split without partitioning the data to train/test

2014-03-31 Thread Mahmood Naderan
Hi, In an old Mahout, I used wikipediaDataSetCreator on an input to create the training data
    mahout wikipediaDataSetCreator -i wiki-tr/chunks -o tr-input -c labels.txt
and then fed the tr-input to the trainclassifier using
    mahout trainclassifier -i tr-input -o wikimodel
Now, in

Re: Using split without partitioning the data to train/test

2014-03-31 Thread Suneel Marthi
Sent from my iPhone On Mar 31, 2014, at 4:20 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote: Hi, In an old Mahout, I used wikipediaDataSetCreator on an input to create the training data mahout wikipediaDataSetCreator -i wiki-tr/chunks -o tr-input -c labels.txt and then

Re: Using split without partitioning the data to train/test

2014-03-31 Thread Mahmood Naderan
Yeah, you are right. I have to ignore that command. Regards, Mahmood On Monday, March 31, 2014 6:56 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Sent from my iPhone On Mar 31, 2014, at 4:20 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote: Hi, In an old Mahout, I used

Difference between CiMapper and ClusterIterator

2014-03-31 Thread Frank Scholten
Hi all, I noticed in the CIMapper that the policy.update() call is done in the setup of the mapper, while in the ClusterIterator it is called for every vector in the iteration. In the sequential version there is only a single policy while in the MR version we will get a policy per mapper. Which
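The difference Frank describes can be illustrated with a toy model that only counts how often `update()` is called in each mode. This is a sketch under stated assumptions, not Mahout code: `CountingPolicy`, `sequential_pass` and `mapreduce_pass` are hypothetical names, the real method's arguments are omitted, and the way vectors are split across mappers is invented for the example.

```python
class CountingPolicy:
    """Hypothetical stand-in for a clustering policy; it only counts update() calls."""

    def __init__(self):
        self.updates = 0

    def update(self):
        self.updates += 1


def sequential_pass(vectors):
    """One shared policy, updated for every vector (as described for ClusterIterator)."""
    policy = CountingPolicy()
    for _ in vectors:
        policy.update()
    return [policy.updates]


def mapreduce_pass(vectors, num_mappers):
    """One policy per mapper, updated once in setup (as described for CIMapper)."""
    counts = []
    for mapper_id in range(num_mappers):
        policy = CountingPolicy()
        policy.update()  # called once, in the mapper's setup step
        for _ in vectors[mapper_id::num_mappers]:
            pass  # vectors are processed without further update() calls
        counts.append(policy.updates)
    return counts
```

With six vectors, the sequential pass makes six calls on a single policy, while two mappers each hold their own policy and call `update()` once.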

Amazon EMR updating Mahout

2014-03-31 Thread Andrew Musselman
The EMR team told me that as requested they'll upgrade their default AMI to use Mahout 0.9 in their next release scheduled for April 7. Best Andrew