I ran your example file with the current trunk and got results. Can you try to upgrade or are you bound to 0.7? If the latter is the case, I can rerun the test with 0.7.

--sebastian


On 01/21/2014 05:35 PM, Quentin-Gabriel Thurier wrote:
I'm using mahout-examples-0.7-cdh4.5.0-job.jar locally. I also tried on EMR
(with mahout-examples-0.8-job.jar this time) on 3000 tracks and got empty
result files there too. Should I send you the dataset at your Apache address
(it is only 140 KB)?

Quentin


2014/1/21 Sebastian Schelter:

Hmm, strange. Which version of mahout are you using? Do you run the 1200
tracks job locally or on a cluster? Can you share your input file (in
private)?

--sebastian



On 01/21/2014 02:34 PM, Quentin-Gabriel Thurier wrote:

Hi Sebastian

I tested the job on a tiny example (50 tracks):

mahout itemsimilarity --input input/msd_sample/mahout5 --output
output/mahout5 --similarityClassname SIMILARITY_EUCLIDEAN_DISTANCE
--booleanData false --maxSimilaritiesPerItem 1

*1st row of the output:

-2135949055     -335737401      0.09939478338891584

*related rows from the input:

1,-2135949055,230.42567
2,-2135949055,0.0
3,-2135949055,0.0
4,-2135949055,-3.96
5,-2135949055,-1.0
6,-2135949055,96.897
1,-335737401,222.35384
2,-335737401,0.0
3,-335737401,0.0
4,-335737401,-5.232
5,-335737401,-1.0
6,-335737401,100.812

This is correct:
1/(1+sqrt((230.42567-222.35384)^2+(-3.96-(-5.232))^2+(96.897-100.812)^2))
= 0.09939483
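
The hand calculation can be reproduced with a short script. This is only a
sketch: it assumes the similarity is 1/(1+d), where d is the Euclidean
distance (the square root of the summed squared differences), because that
is what reproduces the 0.0993948 value above. The function name
euclidean_similarity is made up for illustration and is not part of Mahout.

```python
import math


def euclidean_similarity(a, b):
    """Return 1 / (1 + d), with d the Euclidean distance between a and b.

    Assumption: this mirrors what SIMILARITY_EUCLIDEAN_DISTANCE reports;
    it is inferred from the numbers in this thread, not from Mahout's code.
    """
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + d)


# Feature values 1..6 for the two tracks quoted above.
track_a = [230.42567, 0.0, 0.0, -3.96, -1.0, 96.897]    # track -2135949055
track_b = [222.35384, 0.0, 0.0, -5.232, -1.0, 100.812]  # track -335737401

similarity = euclidean_similarity(track_a, track_b)
print(similarity)  # roughly 0.0994, close to the job output
```

The small difference from the job output in the last digits is presumably
down to floating-point precision.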

I don't have any exception except the usual warning : WARN
mapred.JobClient: Use GenericOptionsParser for parsing the arguments.
Applications should implement Tool for the same.

Then I took 1200 tracks (the previous 50 are included in the 1200): the job
doesn't fail, but part-r-00000 is empty. As before, I only get the warning,
and the input looks like:

1,524572804,192.522
2,524572804,0.0
3,524572804,0.0
4,524572804,-5.902
5,524572804,-1.0
6,524572804,123.756
1,-1821170097,269.81833
2,-1821170097,0.0
3,-1821170097,0.0
4,-1821170097,-13.496
5,-1821170097,0.26586103
6,-1821170097,86.643

Quentin


2014/1/21 Sebastian Schelter:

  Hi Quentin,

Have you checked the log to ensure that you don't get any exceptions
during the computation?

Could you test the job with a tiny example where you can calculate the
result by hand?

Can you share an input file on which this job fails?

--sebastian


On 01/21/2014 11:22 AM, Quentin-Gabriel Thurier wrote:

I'm running into a few problems with Mahout that I can't sort out.

The context is that I'm trying to calculate pairwise Euclidean distances
between music tracks based on 6 audio features per track. My input for the
Mahout job is a text file which looks like this:

feature_id,track_id,feature_value
<integer>,<integer>,<double>
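
To check an input file by hand, the rows can be regrouped into one feature
vector per track. The following is a rough sketch, not part of the job: the
helper name load_track_vectors is hypothetical, and it assumes that feature
ids run from 1 to 6, that the file has no header row, and that missing
features default to 0.0.

```python
from collections import defaultdict


def load_track_vectors(lines, n_features=6):
    """Group 'feature_id,track_id,feature_value' rows by track id.

    Assumptions (not confirmed by Mahout): feature ids are 1..n_features,
    there is no header row, and absent features are treated as 0.0.
    """
    vectors = defaultdict(lambda: [0.0] * n_features)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        feature_id, track_id, value = line.split(",")
        vectors[int(track_id)][int(feature_id) - 1] = float(value)
    return dict(vectors)


# Rows in the same shape as the job input.
sample_rows = [
    "1,524572804,192.522",
    "2,524572804,0.0",
    "3,524572804,0.0",
    "4,524572804,-5.902",
    "5,524572804,-1.0",
    "6,524572804,123.756",
    "1,-1821170097,269.81833",
    "2,-1821170097,0.0",
]

vectors = load_track_vectors(sample_rows)
for track_id, features in vectors.items():
    print(track_id, features)
```

With the vectors in this form it is easy to spot tracks with missing or
malformed features before handing the file to the job.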

This command works locally for fewer than 600 tracks (based on
mahout-core-0.7-cdh4.5.0-job.jar):

mahout itemsimilarity --input input/msd_sample/mahout --output
output/mahout --similarityClassname
SIMILARITY_EUCLIDEAN_DISTANCE --booleanData false
--maxSimilaritiesPerItem 1

But for more tracks I get an empty part-r-00000 file. I tried decreasing
the --threshold parameter, but I still don't get any results.

I also tried to launch the job on AWS EMR with the equivalent input for
3000 tracks (based on mahout-core-0.8-job.jar):

org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob
--input s3n://hadoop-filrouge/input/msd-sample/mahout
--output s3n://hadoop-filrouge/output/mahout/01202014-itemsimilarity
--similarityClassname SIMILARITY_EUCLIDEAN_DISTANCE --booleanData false
--maxSimilaritiesPerItem 1

The job runs successfully, but I get 17 empty part-r-000xx files.

I'm totally stuck right now and I'm running out of ideas to fix this
issue. So if anybody has even a small idea of what is going on, that
would really help.

Many thanks,







