The point is that I'm using the Cloudera pseudo-distributed distribution
and I think mahout-core-0.7-cdh4.5.0-job.jar is the most up-to-date Mahout
version for CDH4.
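
For anyone who wants to reproduce the hand check on the 50-track example
quoted below, here is a minimal Python sketch. It assumes the similarity is
1/(1 + d), where d is the Euclidean distance (square root of the summed
squared differences) over the six features. That assumption matches the job
output for this pair, but it has not been verified against the Mahout source
here. The name euclidean_similarity is hypothetical and only for illustration.

```python
import math

# Feature rows copied from the 50-track example quoted below,
# stored as feature_id -> feature_value.
track_a = {1: 230.42567, 2: 0.0, 3: 0.0, 4: -3.96, 5: -1.0, 6: 96.897}    # track -2135949055
track_b = {1: 222.35384, 2: 0.0, 3: 0.0, 4: -5.232, 5: -1.0, 6: 100.812}  # track -335737401


def euclidean_similarity(a, b):
    """Return 1 / (1 + d), with d the Euclidean distance over shared features.

    Assumed formula: chosen because it reproduces the value in the job
    output, not because it was read from the Mahout code.
    """
    shared = a.keys() & b.keys()
    distance = math.sqrt(sum((a[k] - b[k]) ** 2 for k in shared))
    return 1.0 / (1.0 + distance)


similarity = euclidean_similarity(track_a, track_b)
print(similarity)  # roughly 0.0994, close to the first row of the job output
```

Leaving out the square root gives about 0.012 instead, so the root is needed
to match the value Mahout reports.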


2014/1/21 Sebastian Schelter <[email protected]>

> I ran your example file with the current trunk and got results. Can you
> try to upgrade or are you bound to 0.7? If the latter is the case, I can
> rerun the test with 0.7.
>
> --sebastian
>
>
>
> On 01/21/2014 05:35 PM, Quentin-Gabriel Thurier wrote:
>
>> I'm using mahout-examples-0.7-cdh4.5.0-job.jar locally. But I also tried
>> on EMR (with mahout-examples-0.8-job.jar this time) on 3000 tracks and
>> got empty result files there too. Should I send you the dataset at your
>> apache address (it is only 140 KB)?
>>
>> Quentin
>>
>>
>> 2014/1/21 Sebastian Schelter <[email protected]>
>>
>>> Hmm, strange. Which version of mahout are you using? Do you run the 1200
>>> tracks job locally or on a cluster? Can you share your input file (in
>>> private)?
>>>
>>> --sebastian
>>>
>>>
>>>
>>> On 01/21/2014 02:34 PM, Quentin-Gabriel Thurier wrote:
>>>
>>>> Hi Sebastian,
>>>>
>>>> I tested the job on a tiny example (50 tracks) :
>>>>
>>>> mahout itemsimilarity --input input/msd_sample/mahout5 --output
>>>> output/mahout5 --similarityClassname SIMILARITY_EUCLIDEAN_DISTANCE
>>>> --booleanData false --maxSimilaritiesPerItem 1
>>>>
>>>> *1st row of the output:
>>>>
>>>> -2135949055     -335737401      0.09939478338891584
>>>>
>>>> *related rows from the input:
>>>>
>>>> 1,-2135949055,230.42567
>>>> 2,-2135949055,0.0
>>>> 3,-2135949055,0.0
>>>> 4,-2135949055,-3.96
>>>> 5,-2135949055,-1.0
>>>> 6,-2135949055,96.897
>>>> 1,-335737401,222.35384
>>>> 2,-335737401,0.0
>>>> 3,-335737401,0.0
>>>> 4,-335737401,-5.232
>>>> 5,-335737401,-1.0
>>>> 6,-335737401,100.812
>>>>
>>>> This is correct:
>>>> 1/(1+sqrt((230.42567-222.35384)^2+(-3.96--5.232)^2+(96.897-100.812)^2))
>>>> = 0.09939483
>>>>
>>>> I don't get any exceptions, only the usual warning: WARN
>>>> mapred.JobClient: Use GenericOptionsParser for parsing the arguments.
>>>> Applications should implement Tool for the same.
>>>>
>>>> When I then take 1200 tracks (the previous 50 are included), the job
>>>> doesn't fail but part-r-00000 is empty. As before, I only get the
>>>> warning, and the input looks like:
>>>>
>>>> 1,524572804,192.522
>>>> 2,524572804,0.0
>>>> 3,524572804,0.0
>>>> 4,524572804,-5.902
>>>> 5,524572804,-1.0
>>>> 6,524572804,123.756
>>>> 1,-1821170097,269.81833
>>>> 2,-1821170097,0.0
>>>> 3,-1821170097,0.0
>>>> 4,-1821170097,-13.496
>>>> 5,-1821170097,0.26586103
>>>> 6,-1821170097,86.643
>>>>
>>>> Quentin
>>>>
>>>>
>>>> 2014/1/21 Sebastian Schelter <[email protected]>
>>>>
>>>>> Hi Quentin,
>>>>>
>>>>> Have you checked the log to ensure that you don't get any exceptions
>>>>> during the computation?
>>>>>
>>>>> Could you test the job with a tiny example where you can calculate the
>>>>> result by hand?
>>>>>
>>>>> Can you share an input file on which this job fails?
>>>>>
>>>>> --sebastian
>>>>>
>>>>>
>>>>> On 01/21/2014 11:22 AM, Quentin-Gabriel Thurier wrote:
>>>>>
>>>>>> I'm running into a few problems with Mahout that I can't sort out.
>>>>>>
>>>>>> The context is that I'm trying to calculate pairwise Euclidean
>>>>>> distances between music tracks based on 6 audio features per track.
>>>>>> My input for the mahout job is a text file which looks like this:
>>>>>>
>>>>>> feature_id,track_id,feature_value
>>>>>> <integer>,<integer>,<double>
>>>>>>
>>>>>> This command works locally for fewer than 600 tracks (based on
>>>>>> mahout-core-0.7-cdh4.5.0-job.jar):
>>>>>>
>>>>>> mahout itemsimilarity --input input/msd_sample/mahout --output
>>>>>> output/mahout --similarityClassname
>>>>>> SIMILARITY_EUCLIDEAN_DISTANCE --booleanData false
>>>>>> --maxSimilaritiesPerItem 1
>>>>>>
>>>>>> But for more tracks I get an empty part-r-00000 file. I tried to
>>>>>> decrease the --threshold parameter but I still don't get any results.
>>>>>>
>>>>>> I also tried to launch the job on AWS EMR with the equivalent input
>>>>>> for 3000 tracks (based on mahout-core-0.8-job.jar):
>>>>>>
>>>>>> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob
>>>>>> --input s3n://hadoop-filrouge/input/msd-sample/mahout
>>>>>> --output s3n://hadoop-filrouge/output/mahout/01202014-itemsimilarity
>>>>>> --similarityClassname SIMILARITY_EUCLIDEAN_DISTANCE
>>>>>> --booleanData false --maxSimilaritiesPerItem 1
>>>>>>
>>>>>> The job runs successfully but I get 17 empty part-r-000xx files.
>>>>>>
>>>>>> I'm totally stuck right now and I'm running out of ideas to fix this
>>>>>> issue. So if anybody has even a rough idea of what is going on, that
>>>>>> would really help.
>>>>>>
>>>>>> Many thanks,
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
