Hello Tomomichi,

I think it's computationally less expensive and programmatically easier to recalculate all of the similarities from scratch.
--sebastian

On 08.07.2013 13:17, Tomomichi Takiguchi wrote:
> Hello,
>
> I'd like to calculate product similarity scores from the following
> INPUT FILE.
> The INPUT FILE has around 300 million lines, and I want to run
> ItemSimilarityJob on it every day.
> (The command line details are below.)
>
> The number of lines in the INPUT FILE is increasing every day.
> I don't want to recompute over the whole INPUT FILE because that
> calculation takes a lot of time.
>
> Could you please advise me on how to process only the incremental data
> in the INPUT FILE without taking much time?
> Is it possible to calculate incremental data with ItemSimilarityJob?
>
> Thanks
>
>
> [Command]
> -----------------------------------------------------------------
> hadoop jar /usr/lib/mahout/mahout-core-0.7-cdh4.2.1-job.jar
> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob
> -i INPUT_FILE_PATH
> -o OUTPUT_FILE_PATH
> --tempDir TEMP_DIR
> -b TRUE
> -m 100000
> -s SIMILARITY_TANIMOTO_COEFFICIENT
> -----------------------------------------------------------------
>
> [INPUT FILE]
>
> UserID,ProductID
> --------------------
> 1,10000
> 1,10010
> 1,10020
> 2,20000
> 2,20020
> 3,20000
> 3,10010
> 4,20000
> 4,11000
> 4,22000
> ....
> --------------------
>
> [OUTPUT FILE]
>
> ProductID,ProductID,Similarity score
> -------------------------------------
> 10000 10010 0.003048780487804878
> 10000 10020 0.0035335689045936395
> 20000 20020 0.0027624309392265192
> 20000 22000 0.018518518518518517
> ....
> -------------------------------------
>
>
> Regards
> Tomomichi Takiguchi
>
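For reference, the Tanimoto coefficient that SIMILARITY_TANIMOTO_COEFFICIENT computes for a pair of items is the Jaccard index over the sets of users who interacted with each item: |A ∩ B| / |A ∪ B|. Here is a minimal plain-Python sketch of that formula applied to the small sample INPUT FILE above. This is an illustration of the metric, not Mahout's actual MapReduce implementation, and the tiny sample won't reproduce the scores shown in the OUTPUT FILE (those presumably come from the full 300-million-line data set).

```python
# Tanimoto (Jaccard) item-item similarity on the sample input,
# computed naively in memory -- for illustration only.
from collections import defaultdict
from itertools import combinations

SAMPLE = """1,10000
1,10010
1,10020
2,20000
2,20020
3,20000
3,10010
4,20000
4,11000
4,22000"""

# Map each item to the set of users who saw it.
users_by_item = defaultdict(set)
for line in SAMPLE.splitlines():
    user, item = line.split(",")
    users_by_item[item].add(user)

def tanimoto(a, b):
    """|users(a) & users(b)| / |users(a) | users(b)|."""
    inter = len(users_by_item[a] & users_by_item[b])
    union = len(users_by_item[a] | users_by_item[b])
    return inter / union

# Print every item pair with a nonzero score,
# e.g. items 10000 and 10010 share user 1 out of users {1, 3} -> 0.5.
for a, b in combinations(sorted(users_by_item), 2):
    score = tanimoto(a, b)
    if score > 0:
        print(a, b, score)
```

The quadratic pass over all item pairs is exactly why recomputation over the full input is expensive at 300 million lines, which is the trade-off the advice above accepts in exchange for simplicity.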
