The point is that I'm using the Cloudera pseudo-distributed distribution, and I think mahout-core-0.7-cdh4.5.0-job.jar is the most up-to-date Mahout version for CDH4.
2014/1/21 Sebastian Schelter <[email protected]>

> I ran your example file with the current trunk and got results. Can you
> try to upgrade, or are you bound to 0.7? If the latter is the case, I can
> rerun the test with 0.7.
>
> --sebastian
>
> On 01/21/2014 05:35 PM, Quentin-Gabriel Thurier wrote:
>
>> I'm using mahout-examples-0.7-cdh4.5.0-job.jar locally. But I tried on
>> EMR (with mahout-examples-0.8-job.jar this time) with 3000 tracks, and I
>> also got empty result files. Should I send you the dataset at your Apache
>> address (it is only 140 KB)?
>>
>> Quentin
>>
>> 2014/1/21 Sebastian Schelter <[email protected]>
>>
>>> Hmm, strange. Which version of Mahout are you using? Do you run the
>>> 1200-track job locally or on a cluster? Can you share your input file
>>> (in private)?
>>>
>>> --sebastian
>>>
>>> On 01/21/2014 02:34 PM, Quentin-Gabriel Thurier wrote:
>>>
>>>> Hi Sebastian,
>>>>
>>>> I tested the job on a tiny example (50 tracks):
>>>>
>>>> mahout itemsimilarity --input input/msd_sample/mahout5 --output
>>>> output/mahout5 --similarityClassname SIMILARITY_EUCLIDEAN_DISTANCE
>>>> --booleanData false --maxSimilaritiesPerItem 1
>>>>
>>>> * 1st row of the output:
>>>>
>>>> -2135949055 -335737401 0.09939478338891584
>>>>
>>>> * Related rows from the input:
>>>>
>>>> 1,-2135949055,230.42567
>>>> 2,-2135949055,0.0
>>>> 3,-2135949055,0.0
>>>> 4,-2135949055,-3.96
>>>> 5,-2135949055,-1.0
>>>> 6,-2135949055,96.897
>>>> 1,-335737401,222.35384
>>>> 2,-335737401,0.0
>>>> 3,-335737401,0.0
>>>> 4,-335737401,-5.232
>>>> 5,-335737401,-1.0
>>>> 6,-335737401,100.812
>>>>
>>>> This is correct:
>>>> 1/(1+sqrt((230.42567-222.35384)^2+(-3.96-(-5.232))^2+(96.897-100.812)^2))
>>>> = 0.09939483
>>>>
>>>> I don't get any exception, only the usual warning: WARN
>>>> mapred.JobClient: Use GenericOptionsParser for parsing the arguments.
>>>> Applications should implement Tool for the same.
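The hand check in the 02:34 PM message can be reproduced in a few lines. The value Mahout reports is consistent with a similarity of 1/(1 + d), where d is the Euclidean distance, i.e. the square root of the summed squared differences (about 82.1 here); plugging in the sum itself would give roughly 0.012 instead of 0.0994. A minimal Python sketch, assuming each track is a dict keyed by feature_id; `euclidean_similarity` is a hypothetical helper name, not a Mahout API:

```python
import math

# Feature vectors copied from the input rows quoted above (feature_id -> value).
track_a = {1: 230.42567, 2: 0.0, 3: 0.0, 4: -3.96, 5: -1.0, 6: 96.897}    # -2135949055
track_b = {1: 222.35384, 2: 0.0, 3: 0.0, 4: -5.232, 5: -1.0, 6: 100.812}  # -335737401


def euclidean_similarity(a, b):
    """Return 1 / (1 + Euclidean distance) over the union of feature ids.

    Assumption: a feature missing from one vector counts as 0.0.
    """
    keys = set(a) | set(b)
    distance = math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))
    return 1.0 / (1.0 + distance)


print(euclidean_similarity(track_a, track_b))
```

This prints a value close to 0.09939483; the last few digits differ slightly from Mahout's 0.09939478338891584, presumably because of floating-point rounding in how the job stores and combines the values.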
>>>>
>>>> Then I took 1200 tracks (the previous 50 are included in the 1200).
>>>> The job doesn't fail, but part-r-00000 is empty. As before, I only get
>>>> the warning, and the input looks like this:
>>>>
>>>> 1,524572804,192.522
>>>> 2,524572804,0.0
>>>> 3,524572804,0.0
>>>> 4,524572804,-5.902
>>>> 5,524572804,-1.0
>>>> 6,524572804,123.756
>>>> 1,-1821170097,269.81833
>>>> 2,-1821170097,0.0
>>>> 3,-1821170097,0.0
>>>> 4,-1821170097,-13.496
>>>> 5,-1821170097,0.26586103
>>>> 6,-1821170097,86.643
>>>>
>>>> Quentin
>>>>
>>>> 2014/1/21 Sebastian Schelter <[email protected]>
>>>>
>>>>> Hi Quentin,
>>>>>
>>>>> Have you checked the log to ensure that you don't get any exceptions
>>>>> during the computation?
>>>>>
>>>>> Could you test the job with a tiny example where you can calculate the
>>>>> result by hand?
>>>>>
>>>>> Can you share an input file on which this job fails?
>>>>>
>>>>> --sebastian
>>>>>
>>>>> On 01/21/2014 11:22 AM, Quentin-Gabriel Thurier wrote:
>>>>>
>>>>>> I've run into a few problems with Mahout that I can't sort out.
>>>>>>
>>>>>> The context is that I'm trying to calculate pairwise Euclidean
>>>>>> distances between music tracks based on 6 audio features per track.
>>>>>> My input for the Mahout job is a text file which looks like this:
>>>>>>
>>>>>> feature_id,track_id,feature_value
>>>>>> <integer>,<integer>,<double>
>>>>>>
>>>>>> This command works locally for fewer than 600 tracks (based on
>>>>>> mahout-core-0.7-cdh4.5.0-job.jar):
>>>>>>
>>>>>> mahout itemsimilarity --input input/msd_sample/mahout --output
>>>>>> output/mahout --similarityClassname SIMILARITY_EUCLIDEAN_DISTANCE
>>>>>> --booleanData false --maxSimilaritiesPerItem 1
>>>>>>
>>>>>> But for more tracks I get an empty part-r-0000 file. I tried to
>>>>>> decrease the --threshold parameter, but I still don't get any result.
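The itemsimilarity job reads each input line as userID,itemID,preference, so in this file the feature_id sits in the user position and the track_id in the item position, and each track becomes an item vector over the six features. Before running the job on a larger input, a quick sanity check is to pivot the triples per track, count the distinct tracks, and flag any track that lacks one of the six features. A minimal Python sketch, assuming a comma-separated file with no header row; `load_tracks` and `EXPECTED_FEATURES` are hypothetical names:

```python
import csv
import io
from collections import defaultdict

EXPECTED_FEATURES = 6  # six audio features per track, as described in the thread


def load_tracks(lines):
    """Pivot feature_id,track_id,feature_value rows into {track_id: {feature_id: value}}."""
    tracks = defaultdict(dict)
    for row in csv.reader(lines):
        if not row:
            continue  # ignore blank lines
        feature_id, track_id, value = int(row[0]), int(row[1]), float(row[2])
        tracks[track_id][feature_id] = value
    return dict(tracks)


# The twelve sample rows from the 1200-track input quoted in this thread.
SAMPLE = """\
1,524572804,192.522
2,524572804,0.0
3,524572804,0.0
4,524572804,-5.902
5,524572804,-1.0
6,524572804,123.756
1,-1821170097,269.81833
2,-1821170097,0.0
3,-1821170097,0.0
4,-1821170097,-13.496
5,-1821170097,0.26586103
6,-1821170097,86.643
"""

tracks = load_tracks(io.StringIO(SAMPLE))
incomplete = [t for t, feats in tracks.items() if len(feats) != EXPECTED_FEATURES]
print(len(tracks), "tracks;", len(incomplete), "with missing features")
```

On a real file, the same helper can be called as `load_tracks(open(path, newline=""))`, where `path` is whatever local copy of the input you have.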
>>>>>>
>>>>>> I also tried to launch the job on AWS EMR with the equivalent input
>>>>>> for 3000 tracks (based on mahout-core-0.8-job.jar):
>>>>>>
>>>>>> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob
>>>>>> --input s3n://hadoop-filrouge/input/msd-sample/mahout --output
>>>>>> s3n://hadoop-filrouge/output/mahout/01202014-itemsimilarity
>>>>>> --similarityClassname SIMILARITY_EUCLIDEAN_DISTANCE --booleanData false
>>>>>> --maxSimilaritiesPerItem 1
>>>>>>
>>>>>> The job runs successfully, but I get 17 empty part-r-000xx files.
>>>>>>
>>>>>> I'm totally stuck right now and I'm running out of ideas to fix this
>>>>>> issue. So if anybody has even a small idea of what is going on, that
>>>>>> could really help.
>>>>>>
>>>>>> Many thanks,
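Because the dataset is small (about 140 KB for 3000 tracks), a brute-force reference can show roughly what non-empty output should contain. Every track here has all six features, so every pair of tracks has a defined distance, and with --maxSimilaritiesPerItem 1 one would expect each track to get one neighbour unless something filters the pairs (the job's --threshold option, for example). A minimal sketch, assuming Python 3.8+ for `math.dist` and tracks stored as tuples in feature_id order; `TRACKS` and `top1_similar` are hypothetical names, and the four tracks are the sample rows quoted in this thread. It is a reference for comparison, not a replica of Mahout's distributed computation:

```python
import math

# Six audio features per track, in feature_id order 1..6
# (values copied from the sample rows quoted in this thread).
TRACKS = {
    -2135949055: (230.42567, 0.0, 0.0, -3.96, -1.0, 96.897),
    -335737401: (222.35384, 0.0, 0.0, -5.232, -1.0, 100.812),
    524572804: (192.522, 0.0, 0.0, -5.902, -1.0, 123.756),
    -1821170097: (269.81833, 0.0, 0.0, -13.496, 0.26586103, 86.643),
}


def top1_similar(tracks):
    """Brute-force reference for --maxSimilaritiesPerItem 1.

    For every track, return (most similar other track, 1 / (1 + distance)).
    Compares all n * (n - 1) / 2 pairs, so it only suits small inputs.
    """
    best = {}
    ids = list(tracks)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            sim = 1.0 / (1.0 + math.dist(tracks[a], tracks[b]))
            # Offer each pair to both tracks; keep the higher similarity.
            for x, y in ((a, b), (b, a)):
                if x not in best or sim > best[x][1]:
                    best[x] = (y, sim)
    return best


for track, (neighbour, sim) in top1_similar(TRACKS).items():
    print(track, neighbour, sim)
```

Each printed line has the same "track track similarity" shape as the output row quoted earlier, which makes it easy to compare against the part-r-* files once the job produces results.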
