Re: Decision Forest/Partial Implementation TestForest Error

2012-09-06 Thread Abdelhakim Deneche
Hi Nick, This is not a memory problem; the classifier tries to load the trained forest but is getting some unexpected values. This problem has never occurred before! Could the forest files be corrupted? Try training the forest once again, and this time use the sequential classifier (don't use

Error running RecommenderJob using mahout-core-0.5-cdh3u4-job.jar

2012-09-06 Thread tmefrt
Hi all, I'm trying to test item recommendation using the command hadoop jar /usr/lib/mahout/mahout-core-0.5-cdh3u4-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=/user/etl_user/itemrecco/in_file.txt -Dmapred.output.dir=/user/etl_user/itemreccooutput Input

Re: PCA doc question for devs:

2012-09-06 Thread Pat Ferrel
OK, thanks. The SSVD JUnit test with U*Sigma completes fine. On Sep 5, 2012, at 5:37 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Pat, no. With SSVD you need just US, not US^-1 (or U*Sigma in other notation). This is your dimensionally reduced output of your original document matrix you've
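Dmitriy's point can be written out as a short derivation. Assume the full SVD A = U Sigma V^T of the document matrix (documents as rows), and let V_k hold the first k right singular vectors. Projecting the documents onto V_k gives exactly U_k Sigma_k, with no inverse of Sigma anywhere. SSVD only approximates these factors, so in practice the equality holds approximately:

$$
A = U \Sigma V^\top
\;\Longrightarrow\;
A V_k = U \Sigma \left(V^\top V_k\right)
      = U \Sigma \begin{bmatrix} I_k \\ 0 \end{bmatrix}
      = U_k \Sigma_k .
$$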

Re: Error running RecommenderJob using mahout-core-0.5-cdh3u4-job.jar

2012-09-06 Thread Lee Carroll
-Dmapred.output.dir=/user/etl_user/itemreccooutput Should that be -Dmapred.output.dir=/user/etl_user/itemrecco/output instead? On 6 September 2012 02:40, tmefrt gkodu...@yahoo.com wrote: Hi All I'm trying to test the item recommendation. using the command hadoop jar

Re: Error running RecommenderJob using mahout-core-0.5-cdh3u4-job.jar

2012-09-06 Thread Sean Owen
-D arguments are arguments to the JVM, not the program. This needs to go in the HADOOP_OPTS env variable if using the hadoop binary. On Thu, Sep 6, 2012 at 8:05 AM, Lee Carroll lee.a.carr...@googlemail.com wrote: -Dmapred.output.dir=/user/etl_user/itemreccooutput should that be

RE: Error running RecommenderJob using mahout-core-0.5-cdh3u4-job.jar

2012-09-06 Thread A Geek
Hi, I just went through the log and found this error message: Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: temp/similarityMatrix at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231) Can

Re: Error running RecommenderJob using mahout-core-0.5-cdh3u4-job.jar

2012-09-06 Thread Sean Owen
This is just a follow-on error since the intermediate result was not created for the next stage. This is not the problem, nor is the output directory. It is as I said, the -D args. On Thu, Sep 6, 2012 at 9:45 AM, A Geek dw...@live.com wrote: hi, I just went through the log and found this error

Re: Decision Forest/Partial Implementation TestForest Error

2012-09-06 Thread Nick Jordan
Same problem with the sequential classifier. My guess is that this corruption is happening because of that particular setting as it is the only thing that I'm changing, but I have no idea how to investigate further. Nick On Thu, Sep 6, 2012 at 2:22 AM, Abdelhakim Deneche adene...@gmail.com

SGD Based Recommender Contribution Proposal

2012-09-06 Thread Gokhan Capan
Dear Mahout community, I would like to introduce a set of tools for recommender systems that I implemented as part of my MSc thesis. The work was inspired by our conversations on the user list, and I tried to keep it consistent with the existing Taste framework for a possible contribution to Mahout. The library
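The proposal doesn't describe its algorithms, so the following is only a generic sketch of the best-known SGD recommender technique: regularized matrix factorization trained by stochastic gradient descent. It is not Gokhan's library and not a Mahout API. The function names, the toy ratings, and the hyperparameters (k, lr, reg, epochs) are all illustrative assumptions.

```python
import math
import random


def predict(p, q, u, i):
    """Predicted rating: dot product of user u's and item i's latent factors."""
    return sum(a * b for a, b in zip(p[u], q[i]))


def train_mf(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=500, seed=42):
    """Factorize a sparse rating matrix as P (users x k) times Q^T (k x items).

    ratings: list of (user, item, rating) triples with 0-based integer ids.
    """
    rng = random.Random(seed)
    p = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)  # copy so shuffling doesn't mutate the caller's list
    for _ in range(epochs):
        rng.shuffle(data)  # visit ratings in a new random order each epoch
        for u, i, r in data:
            err = r - predict(p, q, u, i)
            for f in range(k):
                pu, qi = p[u][f], q[i][f]
                # gradient step on squared error with L2 regularization
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q


def rmse(p, q, ratings):
    """Root-mean-square error of the model on the given ratings."""
    return math.sqrt(sum((r - predict(p, q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))


# Hypothetical toy data: 4 users, 3 items, ratings on a 1..5 scale.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 2.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]
p0, q0 = train_mf(ratings, 4, 3, epochs=0)  # untrained baseline
p, q = train_mf(ratings, 4, 3)              # trained model
```

Each update touches only one user row and one item row, which is what makes the approach attractive for streaming or very large rating sets.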

Simple Result Interpretation Question

2012-09-06 Thread Thomas, Sebastien
Hi community, I am new to Mahout and am looking for some hints. I am running itemsimilarity on about 8 million users and 32 items. My output file (with the format: item1, item2, similarity) is basically telling me that all my items are similar (if my interpretation is right). For

RE: Simple Result Interpretation Question

2012-09-06 Thread Thomas, Sebastien
Thanks for your reply! But all the others give me pretty similar results. Pearson: -0.14 < similarity < 0.12; Uncentered_cosine: 0.79 < similarity < 0.85; Tanimoto: 0.001 < similarity < 0.2; Loglikelihood: 0.8 < similarity < 0.99. Thanks -----Original Message----- From: Sean Owen [mailto:sro...@gmail.com] Sent:
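For reference, here is a sketch of how a log-likelihood similarity can be computed from co-occurrence counts. The entropy-based G² statistic is Dunning's log-likelihood ratio. The final mapping 1 - 1/(1 + LLR) is an assumption about how Mahout's LogLikelihoodSimilarity turns the ratio into a score, so check it against the source of the version you run. Under that assumption, the sketch suggests why values of 0.8 to 0.99 need not be surprising: the LLR grows with the counts, so with millions of users even modest associations are pushed toward 1.

```python
import math


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy, as used in the G^2 (log-likelihood ratio) test.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table.

    k11: users with both items, k12: item A only, k21: item B only, k22: neither.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp tiny negative round-off


def llr_similarity(both, only_a, only_b, neither):
    # Assumed mapping of the unbounded LLR onto [0, 1).
    return 1.0 - 1.0 / (1.0 + log_likelihood_ratio(both, only_a, only_b, neither))
```

Note that the same proportions scaled up by 100x give a 100x larger LLR and therefore a similarity much closer to 1.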

Re: Simple Result Interpretation Question

2012-09-06 Thread Sean Owen
This could be entirely correct -- it depends on your data. None of these values are wrong or surprising per se. On Thu, Sep 6, 2012 at 4:54 PM, Thomas, Sebastien sebastien.tho...@disney.com wrote: Thanks for your reply! But all the others give me pretty similar results. Pearson:

Re: Simple Result Interpretation Question

2012-09-06 Thread John Conwell
I'm curious: with 8 million users and only 32 products, your data might not be sparse enough (I never thought that would be a problem). You might have enough users who purchased a high enough percentage of your products that you end up with every item being recommended for every other item. On Thu,
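John's density argument can be illustrated with a toy example. It uses hypothetical boolean purchase data, with each item represented as the set of user ids who bought it. When most users have bought most items, any two items share most of their users, so overlap-based measures come out high for every pair.

```python
import math


def tanimoto(a, b):
    """Jaccard/Tanimoto coefficient of two sets of user ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def boolean_cosine(a, b):
    """Cosine similarity of two 0/1 user vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Dense catalog: each of two items was bought by 90 of 100 users.
dense_x, dense_y = set(range(0, 90)), set(range(10, 100))
# Sparse catalog: each of two items was bought by 5 of 100 users.
sparse_x, sparse_y = set(range(0, 5)), set(range(3, 8))

print(tanimoto(dense_x, dense_y), boolean_cosine(dense_x, dense_y))
print(tanimoto(sparse_x, sparse_y), boolean_cosine(sparse_x, sparse_y))
```

With 32 items and 8 million users, the dense case is the one to worry about. Only the actual per-item user counts can confirm it, though.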

Re: SGD Based Recommender Contribution Proposal

2012-09-06 Thread Gokhan Capan
By the way, I want to mention that my thesis advisor is Ozgur Yilmazel, who is a founding member of the Mahout project. I conducted this study, and kept the implementation easy to integrate with Mahout, under his guidance. On Thu, Sep 6, 2012 at 6:04 PM, Gokhan Capan gkhn...@gmail.com wrote: Dear Mahout

RE: Simple Result Interpretation Question

2012-09-06 Thread Thomas, Sebastien
OK, great! Thank you... that's what I wanted to know. -----Original Message----- From: Sean Owen [mailto:sro...@gmail.com] Sent: Thursday, September 06, 2012 12:07 PM To: user@mahout.apache.org Subject: Re: Simple Result Interpretation Question This could be entirely correct -- it depends on

Doing dimensionality reduction with SSVD and Lanczos

2012-09-06 Thread Pat Ferrel
When using Lanczos, the recommendation is to use the clean eigenvectors as a distributed row matrix; call it V. Then A-hat = A^t V^t, per the clusterdump tests DSVD and DSVD2. Dmitriy and Ted recommend doing the following when using SSVD: A-hat = US. When using PCA it's also preferable to use --uHalfSigma to
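Here is a quick numeric check of the SSVD form A-hat = US in plain Python, with no libraries. It builds a small hypothetical matrix from a known SVD and confirms that projecting its rows onto the right singular vectors reproduces U*Sigma. It checks only the U*Sigma form. It does not check the --uHalfSigma (U*Sigma^0.5) output or the Lanczos formula above.

```python
import math


def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def allclose(a, b, tol=1e-12):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical 3x2 "document matrix" built from a known SVD:
# U has orthonormal columns, V is a 2x2 rotation, Sigma holds the singular values.
c, s = math.cos(0.3), math.sin(0.3)
U = [[c, -s], [s, c], [0.0, 0.0]]
cv, sv = math.cos(1.1), math.sin(1.1)
V = [[cv, -sv], [sv, cv]]
Sigma = [[3.0, 0.0], [0.0, 1.0]]

A = matmul(matmul(U, Sigma), transpose(V))  # A = U Sigma V^T

projected = matmul(A, V)    # each row (document) expressed in the V basis
u_sigma = matmul(U, Sigma)  # the U*Sigma output recommended for SSVD
```

If only the first k columns of V are kept, the same product yields the k-dimensional reduced rows.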

Re: SSVD error

2012-09-06 Thread Pat Ferrel
To reiterate the situation: in local mode, using the local file system, SSVD dies with a file-not-found error. In pseudo-cluster mode, using HDFS, SSVD runs correctly on the same data. All the rest of the analysis pipeline works fine in either mode. I am using local mode to debug my surrounding code.

Re: SSVD error

2012-09-06 Thread Dmitriy Lyubimov
I doubt it fails in local mode, because its unit tests are run in local mode. Except for the number of reducers, everything else works there just the same. That said, you can disable DistributedCache in some cases using SSVDSolver#setBroadcast(false) (in spite of what the javadoc says,

Re: SSVD error

2012-09-06 Thread Dmitriy Lyubimov
Actually, it turns out the unit tests explicitly do setBroadcast(false)... that's probably why. But I don't remember ever having this problem, or being motivated by anything to set it to false... Hm.. On Thu, Sep 6, 2012 at 11:37 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: I don't believe it doesn't work

Cluster Dumper

2012-09-06 Thread Whitmore, Mattie
A while back I was looking at how the clustering algorithm seemed to be filtering data. Now I'm wondering if clusterdumper just isn't working properly. I ran clusterdumper on 100 sample points, and the number of points within each cluster does not equal the number the dumper says there should
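One way to check this independently is sketched below, under assumptions. Export each point's cluster assignment as (cluster id, point id) pairs, and put the per-cluster sizes the dumper reports into a dict. Then compare the two. Both input formats are hypothetical, since clusterdump's actual output format isn't shown in this message, and the function name is illustrative.

```python
from collections import Counter


def find_count_mismatches(assignments, reported_sizes):
    """Return {cluster_id: (reported, actual)} for clusters whose counts disagree.

    assignments: iterable of (cluster_id, point_id) pairs (hypothetical export).
    reported_sizes: {cluster_id: n} as reported by the dumper (hypothetical).
    """
    actual = Counter(cluster_id for cluster_id, _point in assignments)
    mismatches = {}
    for cluster_id in set(actual) | set(reported_sizes):
        reported = reported_sizes.get(cluster_id, 0)
        found = actual.get(cluster_id, 0)
        if reported != found:
            mismatches[cluster_id] = (reported, found)
    return mismatches
```

Summing the actual counts is also worthwhile. A total below the 100 input points would mean some points are not being assigned or written out at all.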

Re: Doing dimensionality reduction with SSVD and Lanczos

2012-09-06 Thread Dmitriy Lyubimov
On Thu, Sep 6, 2012 at 10:17 AM, Pat Ferrel p...@occamsmachete.com wrote: When using Laczos the recommendation is to use clean eigen vectors as a distributed row matrix--call it V. A-hat = A^t V^t this per the clusterdump tests DSVD and DSVD2. I am not quite sure where this comes from. (for

Re: Error running RecommenderJob using mahout-core-0.5-cdh3u4-job.jar

2012-09-06 Thread Tmefrt
I tried running the following: mahout recommenditembased --input /user/etl_user/itemrecco --output /user/etl_user/itemreccooutput --usersFile /user/etl_user/users.txt It gets stuck at the same job, with the same error.

Re: SSVD error

2012-09-06 Thread Pat Ferrel
When I set SSVDSolver#setBroadcast(false) it works in the debugger locally. This solves my problem, thanks for sticking with me! So much easier to debug now! BTW I guess it's no surprise that when I use the file system for input data the clusterdump unit test (modified to do SSVD + kmeans)

Re: SGD Based Recommender Contribution Proposal

2012-09-06 Thread Ted Dunning
This sounds pretty exciting. Beyond that, it is hard to say much. Can you say a bit more about how you would see introducing the code into Mahout? On Thu, Sep 6, 2012 at 9:14 AM, Gokhan Capan gkhn...@gmail.com wrote: By the way, I want to mention that my thesis is advised by Ozgur Yilmazel,

Should I be using OnlineLogisticRegression?

2012-09-06 Thread Mike Burba
This is a newbie question from someone who is just getting familiar with Mahout and machine learning. I bought and have read Mahout in Action, and I'm trying to apply the concepts to some real-world data (i.e., not in the examples). The problem I am trying to solve is a classification problem, so I
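For orientation, here is the core idea behind "online" logistic regression: a per-example stochastic gradient update on the log-likelihood. This is a minimal sketch and not Mahout's OnlineLogisticRegression class, which adds further machinery (such as a regularizing prior and learning-rate schedules). The function names, learning rate, and toy data are illustrative assumptions.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_online_logistic(examples, n_features, lr=0.5, epochs=200):
    """Per-example SGD for logistic regression.

    examples: list of (features, label) pairs with label in {0, 1}.
    """
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            g = y - p  # gradient of the log-likelihood w.r.t. the linear score
            b += lr * g
            w = [wi + lr * g * xi for wi, xi in zip(w, x)]
    return w, b


def predict_proba(w, b, x):
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


# Hypothetical single-feature data, already on a 0..1 scale.
examples = [([0.0], 0), ([0.2], 0), ([0.8], 1), ([1.0], 1)]
w, b = train_online_logistic(examples, n_features=1)
```

Because every step multiplies the learning rate by the raw feature values, features spanning wildly different ranges make a single learning rate hard to choose.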

Re: Should I be using OnlineLogisticRegression?

2012-09-06 Thread Diederik van Liere
- My (6) predictor variables are all numeric; some of the variables range from 0...5, others range from 0...1,000,000. Have you tried rescaling your predictor variables so they have the same range? Diederik
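Rescaling to a common range can be as simple as min-max scaling each column onto [0, 1]. Here is a minimal sketch, with illustrative function names and data; standardizing to zero mean and unit variance is a common alternative.

```python
def min_max_scale(column):
    """Rescale a numeric column linearly onto [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: nothing to spread out
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def scale_columns(rows):
    """Apply min_max_scale independently to each column of a row-major table."""
    columns = [min_max_scale(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


# Hypothetical rows: one variable on 0..5, another on 0..1,000,000.
rows = [[0, 0], [5, 1_000_000], [2.5, 250_000]]
scaled = scale_columns(rows)
```

In a real pipeline, compute the min and max on the training data and reuse them for new data, so that training and scoring see the same transformation.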

Re: Should I be using OnlineLogisticRegression?

2012-09-06 Thread Ted Dunning
Try transforming them as well, likely with a log if they are positive and have heavily skewed values. Can you suck the data into R and paste in the results of summary(x)? (assuming you put the data into the variable x). This should look something like: summary(x) v1 v2
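A rough Python counterpart of that suggestion is sketched below. It uses log(1 + x) rather than a plain log, a swapped-in variant chosen because the ranges above start at 0. The summary function mimics the six numbers R's summary() prints for a numeric vector. The quartiles use the "inclusive" method of Python's statistics.quantiles, which should match R's default quantile definition. The function names and data are illustrative.

```python
import math
import statistics


def log_transform(column):
    """log(1 + x): compresses large positive values and keeps 0 at 0."""
    return [math.log1p(v) for v in column]


def summary(column):
    """Six-number summary in the spirit of R's summary() for a numeric vector."""
    q1, median, q3 = statistics.quantiles(column, n=4, method="inclusive")
    return {
        "Min.": min(column),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.mean(column),
        "3rd Qu.": q3,
        "Max.": max(column),
    }


# Hypothetical heavily skewed variable.
raw = [0, 10, 100, 1_000, 1_000_000]
print(summary(raw))
print(summary(log_transform(raw)))
```

A mean far above the median, as in the raw column here, is the typical sign of skew that a log transform helps with.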

Re: Error Running Collaborative Filtering Job

2012-09-06 Thread tmefrt
I see you tried restarting the job where it failed... Did you get an answer as to why the job was failing in the middle of the main job, and what was the conclusion?