Profiling with visualvm

2014-03-30 Thread Mahmood Naderan
Hi, I profiled the Mahout command with VisualVM and saw many threads. Some of them belong to the profiler and others are communication threads. Interestingly, the main thread is always in the sleeping state! The attached thread dump shows that the owner is Mahout.

Re: Profiling with visualvm

2014-03-30 Thread Sean Owen
Profiled what exactly, a Hadoop job? If you profile a client, you aren't learning anything about the work, but just that the client process is blocked waiting for Hadoop jobs to complete. On Mar 30, 2014 10:08 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote: Hi, I profiled the Mahout command
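Sean's point can be seen in miniature. The sketch below is a hypothetical Java stand-in, not Mahout or Hadoop code: a worker thread plays the role of the remote job, and a client thread does nothing but poll it with `Thread.sleep`, roughly the way a job client waiting on a cluster might (that the Mahout driver waits this way is an assumption here). Sampling the client's `Thread.State` shows it spends its time in `TIMED_WAITING`, which a profiler attached to the client would report as sleeping.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration: a client that only waits on work done elsewhere. */
public class PollingClientDemo {

    /**
     * Starts a stand-in "job" and a client thread that polls it, then samples the
     * client's state. Returns how many samples found it in TIMED_WAITING (sleeping).
     */
    public static int countSleepingSamples(int samples) throws Exception {
        CountDownLatch release = new CountDownLatch(1);
        ExecutorService cluster = Executors.newSingleThreadExecutor();

        // Stand-in for the remote job: it blocks until the latch is released.
        Future<?> job = cluster.submit(() -> {
            release.await();
            return null;
        });

        // Stand-in for the client: no real work, it just polls for completion.
        Thread client = new Thread(() -> {
            try {
                while (!job.isDone()) {
                    Thread.sleep(50);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "client-main");
        client.start();

        int sleeping = 0;
        for (int i = 0; i < samples; i++) {
            Thread.sleep(20);
            if (client.getState() == Thread.State.TIMED_WAITING) {
                sleeping++;
            }
        }

        release.countDown(); // let the "job" finish so the client exits
        client.join();
        cluster.shutdown();
        return sleeping;
    }

    public static void main(String[] args) throws Exception {
        int n = 20;
        System.out.println(countSleepingSamples(n) + " of " + n + " samples found the client sleeping");
    }
}
```

A profiler attached to such a client learns almost nothing about the job itself; to see where the work is done, it would have to reach the JVMs that actually run the tasks.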

Re: Number of features for ALS

2014-03-30 Thread Niklas Ekvall
Hi, my name is Niklas Ekvall, and I have an implementation of the recommender algorithm from the paper "Large-scale Parallel Collaborative Filtering for the Netflix Prize". Now I'm wondering how to choose the number of features and lambda. Could any of you explain a stepwise strategy to choose or

Re: Profiling with visualvm

2014-03-30 Thread Mahmood Naderan
Profiled what exactly, a Hadoop job? As soon as I run /mahout testclassifier -m wikipediamodel -d wikipediainput, I see an org.apache.mahout.driver.MahoutDriver entry in VisualVM, and then I open it. Regards, Mahmood

Re: Number of features for ALS

2014-03-30 Thread Sebastian Schelter
Use k-fold cross-validation or hold-out tests for estimating the quality of different parameter combinations. --sebastian On 03/30/2014 11:53 AM, Niklas Ekvall wrote: Hi, My name is Niklas Ekvall and I have a implementation of the recommender algorithm Large-scale Parallel Collaborative
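Sebastian's suggestion can be sketched in Java as a grid search scored by k-fold cross-validation. This is a minimal sketch under stated assumptions: the `Trainer` and `Predictor` interfaces are hypothetical hooks (not Mahout's API) where an ALS implementation, such as the one Niklas describes, would plug in; RMSE is used as the per-fold error only because it is simple; and the candidate grids for the number of features and lambda are up to the user. The data are shuffled once with a fixed seed so every parameter combination is scored on the same folds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Sketch: choose (numFeatures, lambda) by k-fold cross-validation. */
public class AlsGridSearch {

    /** One observed (user, item, rating) triple. */
    public record Rating(int user, int item, double value) {}

    /** A trained model that predicts a rating. */
    public interface Predictor {
        double predict(int user, int item);
    }

    /** Hypothetical hook: train an ALS (or any) model with the given parameters. */
    public interface Trainer {
        Predictor train(List<Rating> train, int numFeatures, double lambda);
    }

    /** Best parameter combination found and its mean cross-validated RMSE. */
    public record Best(int numFeatures, double lambda, double rmse) {}

    /** Mean RMSE over k folds for one parameter combination. */
    static double crossValidatedRmse(List<Rating> data, int k, Trainer trainer,
                                     int numFeatures, double lambda, long seed) {
        if (data.size() < k) {
            throw new IllegalArgumentException("need at least k ratings");
        }
        List<Rating> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed)); // identical folds for every combination
        double total = 0;
        for (int fold = 0; fold < k; fold++) {
            List<Rating> train = new ArrayList<>();
            List<Rating> test = new ArrayList<>();
            for (int i = 0; i < shuffled.size(); i++) {
                // Every k-th rating (offset by the fold index) is held out.
                (i % k == fold ? test : train).add(shuffled.get(i));
            }
            Predictor model = trainer.train(train, numFeatures, lambda);
            double sq = 0;
            for (Rating r : test) {
                double err = model.predict(r.user(), r.item()) - r.value();
                sq += err * err;
            }
            total += Math.sqrt(sq / test.size());
        }
        return total / k;
    }

    /** Tries every (numFeatures, lambda) pair and keeps the lowest cross-validated error. */
    public static Best gridSearch(List<Rating> data, int k, Trainer trainer,
                                  int[] featureGrid, double[] lambdaGrid, long seed) {
        Best best = null;
        for (int f : featureGrid) {
            for (double lambda : lambdaGrid) {
                double rmse = crossValidatedRmse(data, k, trainer, f, lambda, seed);
                if (best == null || rmse < best.rmse()) {
                    best = new Best(f, lambda, rmse);
                }
            }
        }
        return best;
    }
}
```

A hold-out test is the special case of scoring a single train/test split instead of k of them. Once a combination is chosen, the model would normally be retrained on all of the data with those parameters.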

Re: Number of features for ALS

2014-03-30 Thread Niklas Ekvall
Hello Sebastian, could you explain this in more depth or point me to an article that covers the subject? Best regards, Niklas 2014-03-30 20:50 GMT+02:00 Sebastian Schelter s...@apache.org: Use k-fold cross-validation or hold-out tests for estimating the quality of different parameter

Re: Number of features for ALS

2014-03-30 Thread Ted Dunning
Niklas, http://en.wikipedia.org/wiki/Cross-validation_(statistics) http://statweb.stanford.edu/~tibs/sta306b/cvwrong.pdf On Sun, Mar 30, 2014 at 12:41 PM, Niklas Ekvall niklas.ekv...@gmail.com wrote: Hello Sebastian, could you do a deeper explanation or refer to any article that handle the

text dictionary errors from ClusterDumper

2014-03-30 Thread Bob Morris
After running CanopyDriver.run on some 4-dimensional DenseVectors, I'm passing a handcrafted text dictionary to ClusterDumper, declared as dictionary type text. The dictionary looks like this, with each entry line holding the dimension and the feature name separated by a tab: 4 0 recordedBy 1

Re: Number of features for ALS

2014-03-30 Thread Pat Ferrel
It seems most people agree that ranking matters more than rating in most recommender deployments. RMSE was used for a long time with cross-validation (partly because it was Netflix's choice during the competition), but it is really a measure of total rating error. In the past we’ve
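The difference between rating error and ranking quality shows up even on a toy example. The Java sketch below uses made-up ratings for one user and compares RMSE with precision@k, one common ranking metric chosen here for illustration (the post does not say which ranking measure it has in mind). Treating items rated 4 or more as "relevant" is likewise an assumption of the example.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Sketch: a model can win on RMSE and still rank worse. */
public class RatingVsRanking {

    /** Root-mean-square error over the items in {@code actual}. */
    public static double rmse(Map<String, Double> predicted, Map<String, Double> actual) {
        double sq = 0;
        for (Map.Entry<String, Double> e : actual.entrySet()) {
            double err = predicted.get(e.getKey()) - e.getValue();
            sq += err * err;
        }
        return Math.sqrt(sq / actual.size());
    }

    /** Fraction of the k highest-scored items that are in the relevant set. */
    public static double precisionAtK(Map<String, Double> scores, Set<String> relevant, int k) {
        List<String> topK = scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
        long hits = topK.stream().filter(relevant::contains).count();
        return (double) hits / k;
    }

    public static void main(String[] args) {
        // Made-up held-out ratings for one user; items rated 4 or more count as relevant.
        Map<String, Double> actual = Map.of("a", 5.0, "b", 4.0, "c", 2.0, "d", 1.0);
        Set<String> relevant = Set.of("a", "b");
        // Model X: values fairly close, but "c" is ranked above "b".
        Map<String, Double> modelX = Map.of("a", 4.4, "b", 3.4, "c", 3.5, "d", 1.4);
        // Model Y: values far off, but the order is correct.
        Map<String, Double> modelY = Map.of("a", 3.0, "b", 2.9, "c", 1.0, "d", 0.5);

        System.out.printf("X: RMSE %.3f, precision@2 %.2f%n",
                rmse(modelX, actual), precisionAtK(modelX, relevant, 2));
        System.out.printf("Y: RMSE %.3f, precision@2 %.2f%n",
                rmse(modelY, actual), precisionAtK(modelY, relevant, 2));
    }
}
```

Model X has the lower RMSE (about 0.88 versus 1.27) yet puts only one relevant item in its top two, while model Y gets both, so an RMSE-driven choice would pick the worse recommender here.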

Mahout v0.9 is not working with 2.2.0-cdh5.0.0-beta-1

2014-03-30 Thread Phan, Truong Q
Hi, Does Mahout v0.9 support Cloudera Hadoop v5 (2.2.0-cdh5.0.0-beta-1)? I have managed to install and run all test cases under Mahout v0.9 without any issues. Please see below for evidence from the test cases. However, I have had no success running the example from

Re: Mahout v0.9 is not working with 2.2.0-cdh5.0.0-beta-1

2014-03-30 Thread Andrew Musselman
Have you rebuilt Mahout for your version? We're not supporting Hadoop version two yet. See here for some direction: http://mail-archives.us.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCANg8BGD8Cm_=ESecQQ5mDL+6ybbNrR1Ce7i=pkuimxmcktw...@mail.gmail.com%3E On Mar 30, 2014, at 7:28 PM, Phan,

Re: Number of features for ALS

2014-03-30 Thread Ted Dunning
Yeah... what Pat said. Off-line evaluations are difficult. At most, they provide directional guidance to be refined using live A/B testing. Of course, A/B testing of recommenders comes with a new set of tricky issues like different recommenders learning from each other. On Sun, Mar 30, 2014 at