Hi Nick,
This is not a memory problem: the classifier tries to load the trained forest
but gets some unexpected values. This problem never occurred before!
Could the forest files be corrupted?
Try training the forest once again, and this time use the sequential classifier
(don't use
Hi All
I'm trying to test item recommendation using the command
hadoop jar /usr/lib/mahout/mahout-core-0.5-cdh3u4-job.jar
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
-Dmapred.input.dir=/user/etl_user/itemrecco/in_file.txt
-Dmapred.output.dir=/user/etl_user/itemreccooutput
Input
OK, thanks
The SSVD junit test with U*Sigma completes fine.
On Sep 5, 2012, at 5:37 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
Pat,
No,
With SSVD you need just US (or U*Sigma in other notation), not US^-1.
This is your dimensionally reduced output of your original document
matrix you've
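To make the "US" notation concrete, here is a minimal sketch in plain Java (toy arrays, not Mahout's SSVDSolver API): multiplying U by the diagonal matrix of singular values just scales each column j of U by sigma[j], and the rows of the result are the dimensionally reduced representations of the original document rows.

```java
// Minimal sketch (plain arrays, not Mahout's API) of forming U * Sigma:
// each column j of U is scaled by the singular value sigma[j].
public class UTimesSigma {
    static double[][] uTimesSigma(double[][] u, double[] sigma) {
        double[][] us = new double[u.length][sigma.length];
        for (int i = 0; i < u.length; i++) {
            for (int j = 0; j < sigma.length; j++) {
                us[i][j] = u[i][j] * sigma[j];  // column-wise scaling by sigma[j]
            }
        }
        return us;
    }

    public static void main(String[] args) {
        double[][] u = {{1.0, 0.0}, {0.0, 1.0}, {0.5, 0.5}};
        double[] sigma = {4.0, 2.0};
        double[][] us = uTimesSigma(u, sigma);
        System.out.println(us[2][0] + " " + us[2][1]); // prints "2.0 1.0"
    }
}
```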
-Dmapred.output.dir=/user/etl_user/itemreccooutput
should that be
-Dmapred.output.dir=/user/etl_user/itemrecco/output
On 6 September 2012 02:40, tmefrt gkodu...@yahoo.com wrote:
Hi All
I'm trying to test item recommendation using the command
hadoop jar
-D arguments are arguments to the JVM, not the program. This needs to
go in the HADOOP_OPTS env variable if using the hadoop binary.
On Thu, Sep 6, 2012 at 8:05 AM, Lee Carroll
lee.a.carr...@googlemail.com wrote:
-Dmapred.output.dir=/user/etl_user/itemreccooutput
should that be
hi, I just went through the log and found this error msg: Exception in thread
main org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
does not exist: temp/similarityMatrix at
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
Can
This is just a follow-on error since the intermediate result was not
created for the next stage. This is not the problem, nor is the output
directory. It is as I said, the -D args.
On Thu, Sep 6, 2012 at 9:45 AM, A Geek dw...@live.com wrote:
hi, I just went through the log and found this error
Same problem with the sequential classifier. My guess is that this
corruption is happening because of that particular setting, as it is
the only thing I'm changing, but I have no idea how to
investigate further.
Nick
On Thu, Sep 6, 2012 at 2:22 AM, Abdelhakim Deneche adene...@gmail.com
Dear Mahout community,
I would like to introduce a set of tools for recommender systems that I
implemented as part of my MSc thesis. This is inspired by our
conversations on the user list, and I tried to keep it close to the existing
Taste framework for possible contribution to Mahout.
The library
Hi community,
I am new to Mahout and I am looking for some hints. I am running
itemsimilarity with about 8 million users and 32 items. My output file
(with the format: item1, item2, similarity) is basically telling me that all
my items are similar (if my interpretation is right). For
Thanks for your reply! But all the others give me pretty similar results.
Pearson: -0.14 <= similarity <= 0.12
Uncentered_cosine: 0.79 <= similarity <= 0.85
Tanimoto: 0.001 <= similarity <= 0.2
Log-likelihood: 0.8 <= similarity <= 0.99
Thanks
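For reference, the Tanimoto figures above come from set overlap: the Tanimoto coefficient between two items is |A ∩ B| / |A ∪ B| over the sets of users who interacted with each item. A minimal sketch (plain Java, a toy stand-in rather than Mahout's TanimotoCoefficientSimilarity) shows where the range of values comes from:

```java
import java.util.HashSet;
import java.util.Set;

// Toy sketch of the Tanimoto coefficient between two items, computed over
// the sets of users who interacted with each item (not Mahout's own code).
public class Tanimoto {
    static double tanimoto(Set<Integer> a, Set<Integer> b) {
        Set<Integer> inter = new HashSet<>(a);
        inter.retainAll(b);                 // |A ∩ B|
        Set<Integer> union = new HashSet<>(a);
        union.addAll(b);                    // |A ∪ B|
        return (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        // Two items bought by heavily overlapping user sets:
        Set<Integer> itemA = Set.of(1, 2, 3, 4, 5, 6, 7, 8, 9);
        Set<Integer> itemB = Set.of(2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(tanimoto(itemA, itemB)); // prints "0.8"
    }
}
```

Heavy overlap between the user sets drives the coefficient toward 1 and disjoint sets drive it toward 0, which is why the observed ranges depend entirely on the data, as Sean notes.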
-Original Message-
From: Sean Owen [mailto:sro...@gmail.com]
Sent:
This could be entirely correct -- it depends on your data. None of
these values are wrong or surprising per se.
On Thu, Sep 6, 2012 at 4:54 PM, Thomas, Sebastien
sebastien.tho...@disney.com wrote:
Thanks for your reply! But all the others give me pretty similar results.
Pearson:
I'm curious: with 8 million users and only 32 products, your data might not
be sparse enough (never thought that would be a problem). You might have
enough users who purchased a high enough percentage of your products that
you end up with an every-item-to-every-item recommendation.
On Thu,
By the way, I want to mention that my thesis is advised by Ozgur Yilmazel,
who is a founding member of the Mahout project. I conducted this study and
kept the implementation easy to integrate into Mahout under his guidance.
On Thu, Sep 6, 2012 at 6:04 PM, Gokhan Capan gkhn...@gmail.com wrote:
Dear Mahout
Ok great! Thank you... that’s what I wanted to know.
-Original Message-
From: Sean Owen [mailto:sro...@gmail.com]
Sent: Thursday, September 06, 2012 12:07 PM
To: user@mahout.apache.org
Subject: Re: Simple Result Interpretation Question
This could be entirely correct -- it depends on
When using Lanczos the recommendation is to use the clean eigenvectors as a
distributed row matrix--call it V.
A-hat = A^t V^t, per the clusterdump tests DSVD and DSVD2.
Dmitriy and Ted recommend when using SSVD to do:
A-hat = US
When using PCA it's also preferable to use --uHalfSigma to
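If, as the flag name suggests, --uHalfSigma emits U scaled by the square roots of the singular values (this reading of the flag is an assumption, not confirmed in the thread), the scaling would look like this minimal sketch:

```java
// Sketch (an assumption based on the flag name, not Mahout's code) of
// forming U * Sigma^(1/2): each column j of U is scaled by sqrt(sigma[j]).
public class UHalfSigma {
    static double[][] uHalfSigma(double[][] u, double[] sigma) {
        double[][] out = new double[u.length][sigma.length];
        for (int i = 0; i < u.length; i++) {
            for (int j = 0; j < sigma.length; j++) {
                out[i][j] = u[i][j] * Math.sqrt(sigma[j]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] u = {{4.0, 0.0}};
        double[] sigma = {4.0, 9.0};
        System.out.println(uHalfSigma(u, sigma)[0][0]); // prints "8.0"
    }
}
```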
To reiterate the situation. In local mode using the local file system SSVD dies
with a file not found. In pseudo-cluster mode using hdfs SSVD on the same data
it runs correctly. All the rest of the analysis pipeline works fine in either
mode. I am using local mode to debug my surrounding code.
I don't believe it fails in local mode, because its unit tests
are run in local mode. With the exception of the # of reducers, everything
else works there just the same.
That said, you can disable DistributedCache in some cases using
SSVDSolver#setBroadcast(false). (in spite of what javadoc says,
Actually, it turns out the unit tests explicitly do setBroadcast(false)...
here's probably why. But I never remember having this problem or being
motivated to set it false by anything... Hm..
On Thu, Sep 6, 2012 at 11:37 AM, Dmitriy Lyubimov dlie...@gmail.com wrote:
I don't believe it doesn't work
A while back I was looking at how the clustering algorithm seemed to
filter data. Now I'm wondering if clusterdumper just isn't working properly.
I ran clusterdumper with 100 sample points, and the number of points within
each cluster does not equal the number the dumper says there should
On Thu, Sep 6, 2012 at 10:17 AM, Pat Ferrel p...@occamsmachete.com wrote:
When using Lanczos the recommendation is to use the clean eigenvectors as a
distributed row matrix--call it V.
A-hat = A^t V^t, per the clusterdump tests DSVD and DSVD2.
I am not quite sure where this comes from. (for
Tried running below
mahout recommenditembased --input /user/etl_user/itemrecco --output
/user/etl_user/itemreccooutput --usersFile /user/etl_user/users.txt
Stuck at same job, and same error.
When I set SSVDSolver#setBroadcast(false) it works in the debugger locally.
This solves my problem, thanks for sticking with me! So much easier to debug
now!
BTW I guess it's no surprise that when I use the file system for input data the
clusterdump unit test (modified to do SSVD + kmeans)
This sounds pretty exciting. Beyond that, it is hard to say much.
Can you say a bit more about how you would see introducing the code into
Mahout?
On Thu, Sep 6, 2012 at 9:14 AM, Gokhan Capan gkhn...@gmail.com wrote:
By the way, I want to mention that my thesis is advised by Ozgur Yilmazel,
This is a newbie question from someone who is just getting familiar with
Mahout and machine learning.
I bought and have read Mahout In Action, and I'm trying to apply the
concepts to some real-world data (i.e., not in the examples).
The problem I am trying to solve is a classification problem, so I
- My (6) predictor variables are all numeric; some of the variables range
from 0...5, others range from 0...1,000,000.
Have you tried rescaling your predictor variables so they have the same range?
Diederik
Try transforming them as well, likely with a log if they are positive and
have heavily skewed values.
Can you suck the data into R and paste in the results of summary(x)?
(assuming you put the data into the variable x). This should look
something like:
summary(x)
v1 v2
I see you tried restarting the job where it failed... Did you get an answer
as to why the job was failing in the middle of the main job, and what the
conclusion was?