Hello Tomomichi,

I think it's computationally less expensive and programmatically easier to recalculate all of the similarities from scratch.
--sebastian

On 08.07.2013 13:17, Tomomichi Takiguchi wrote:
> Hello,
>
> I'd like to calculate product similarity scores from the following
> INPUT FILE.
> The INPUT FILE has around 300 million lines, and I want to run
> ItemSimilarityJob on it every day.
> (The command line details are below.)
>
> The number of lines in the INPUT FILE is increasing every day.
> I don't want to recompute over the whole INPUT FILE because that
> calculation takes a lot of time.
>
> Could you please advise me on how to process only the incremental data
> in the INPUT FILE without taking much time?
> Is it possible to calculate incremental data with ItemSimilarityJob?
>
> Thanks
>
>
> [Command]
> -----------------------------------------------------------------
> hadoop jar /usr/lib/mahout/mahout-core-0.7-cdh4.2.1-job.jar
> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob
> -i INPUT_FILE_PATH
> -o OUTPUT_FILE_PATH
> --tempDir TEMP_DIR
> -b TRUE
> -m 100000
> -s SIMILARITY_TANIMOTO_COEFFICIENT
> -----------------------------------------------------------------
>
> [INPUT FILE]
>
> UserID,ProductID
> --------------------
> 1,10000
> 1,10010
> 1,10020
> 2,20000
> 2,20020
> 3,20000
> 3,10010
> 4,20000
> 4,11000
> 4,22000
> ....
> --------------------
>
> [OUTPUT FILE]
>
> ProductID,ProductID,Similarity score
> -------------------------------------
> 10000 10010 0.003048780487804878
> 10000 10020 0.0035335689045936395
> 20000 20020 0.0027624309392265192
> 20000 22000 0.018518518518518517
> ....
> -------------------------------------
>
>
> Regards
> Tomomichi Takiguchi
>
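For reference, the Tanimoto coefficient that SIMILARITY_TANIMOTO_COEFFICIENT computes for a pair of items is the Jaccard index over the sets of users who interacted with each item: |A ∩ B| / |A ∪ B|. Here is a minimal plain-Python sketch of that formula applied to the small sample INPUT FILE above. This is an illustration of the metric, not Mahout's actual MapReduce implementation, and the tiny sample won't reproduce the scores shown in the OUTPUT FILE (those presumably come from the full 300-million-line data set).

```python
# Tanimoto (Jaccard) item-item similarity on the sample input,
# computed naively in memory -- for illustration only.
from collections import defaultdict
from itertools import combinations

SAMPLE = """1,10000
1,10010
1,10020
2,20000
2,20020
3,20000
3,10010
4,20000
4,11000
4,22000"""

# Map each item to the set of users who saw it.
users_by_item = defaultdict(set)
for line in SAMPLE.splitlines():
    user, item = line.split(",")
    users_by_item[item].add(user)

def tanimoto(a, b):
    """|users(a) & users(b)| / |users(a) | users(b)|."""
    inter = len(users_by_item[a] & users_by_item[b])
    union = len(users_by_item[a] | users_by_item[b])
    return inter / union

# Print every item pair with a nonzero score,
# e.g. items 10000 and 10010 share user 1 out of users {1, 3} -> 0.5.
for a, b in combinations(sorted(users_by_item), 2):
    score = tanimoto(a, b)
    if score > 0:
        print(a, b, score)
```

The quadratic pass over all item pairs is exactly why recomputation over the full input is expensive at 300 million lines, which is the trade-off the advice above accepts in exchange for simplicity.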
