Hi Sebastian

Many thanks for your advice.
I understand. I will recalculate over the whole data set.

Regards
Takiguchi



2013/7/10 Sebastian Schelter <[email protected]>

> Hello Tomomichi,
>
> I think it's computationally less expensive and programmatically easier
> to recalculate all of the similarities.
>
> --sebastian
>
> On 08.07.2013 13:17, Tomomichi Takiguchi wrote:
> > Hello,
> >
> > I'd like to calculate the score of product similarity from the following
> > INPUT FILE.
> > The number of lines in the INPUT FILE is around 300 million.
> > I want to run ItemSimilarityJob on the INPUT FILE every day.
> > (The detail of command line is as follows.)
> >
> > The number of lines in the INPUT FILE is increasing every day.
> > I don't want to recalculate over the whole INPUT FILE because it takes a
> > lot of time to process the data.
> >
> > Could you please advise me how to process only the incremental data in the
> > INPUT FILE without taking much time?
> > Is it possible to process incremental data with ItemSimilarityJob?
> >
> > Thanks
> >
> >
> > [Command]
> > -----------------------------------------------------------------
> > hadoop jar /usr/lib/mahout/mahout-core-0.7-cdh4.2.1-job.jar
> > org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob
> > -i INPUT_FILE_PATH
> > -o OUTPUT_FILE_PATH
> > --tempDir TEMP_DIR
> > -b TRUE
> > -m 100000
> > -s SIMILARITY_TANIMOTO_COEFFICIENT
> > -----------------------------------------------------------------
> >
> > [INPUT FILE]
> >
> > UserID,ProductID
> > --------------------
> > 1,10000
> > 1,10010
> > 1,10020
> > 2,20000
> > 2,20020
> > 3,20000
> > 3,10010
> > 4,20000
> > 4,11000
> > 4,22000
> > ....
> > --------------------
> >
> > [OUTPUT FILE]
> >
> > ProductID   ProductID   Similarity score
> > -------------------------------------
> > 10000   10010   0.003048780487804878
> > 10000   10020   0.0035335689045936395
> > 20000   20020   0.0027624309392265192
> > 20000   22000   0.018518518518518517
> > ....
> > -------------------------------------
> >
> >
> > Regards
> > Tomomichi Takiguchi
> >
>
>
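
For readers of this thread, a small sketch of what the Tanimoto coefficient measures on boolean (user, product) data may help show why new input lines affect existing scores: each added line can change both the intersection and the union of the user sets for every pair involving that product. This is a plain in-memory illustration in Python, not Mahout's implementation; the function name `tanimoto_similarities` and the `SAMPLE` constant are hypothetical, and the data is just the ten sample lines quoted above.

```python
from collections import defaultdict
from itertools import combinations

# The ten sample lines from the [INPUT FILE] section of the thread.
SAMPLE = """1,10000
1,10010
1,10020
2,20000
2,20020
3,20000
3,10010
4,20000
4,11000
4,22000"""


def tanimoto_similarities(lines):
    """Return {(item_a, item_b): score} for item pairs sharing at least one user.

    Assumes each line is "userID,productID" (boolean preference data).
    The score is |users(a) & users(b)| / |users(a) | users(b)|.
    """
    users_by_item = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user_id, item_id = line.split(",")
        users_by_item[item_id].add(user_id)

    similarities = {}
    # Naive all-pairs loop: fine for a toy example, not for 300 million lines.
    for item_a, item_b in combinations(sorted(users_by_item), 2):
        common = len(users_by_item[item_a] & users_by_item[item_b])
        if common == 0:
            continue  # skip pairs with no co-occurring user
        union = len(users_by_item[item_a] | users_by_item[item_b])
        similarities[(item_a, item_b)] = common / union
    return similarities


if __name__ == "__main__":
    for (item_a, item_b), score in tanimoto_similarities(SAMPLE.splitlines()).items():
        print(f"{item_a}\t{item_b}\t{score}")
```

On these ten lines alone, products 10000 and 10010 score 0.5 (one shared user out of two distinct users). The scores in the quoted OUTPUT FILE differ because they were computed from the full data set, not from this excerpt.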
