Hi,

After ParallelAlsJob, I'm now trying the parallel item-based recommender
job. Here are some questions.

   1. I specified a userFile that contains 110000 distinct users, but the
   output contains recommendations for more than that, nearly 130000
   users. Why is this?
   2. How is the threshold value chosen in practice? For example, I'm
   using boolean data and LogLikelihood.
   3. The job runs slowly, nearly 8h on 48M datapoints. By default every job
   has only one reducer, which is the slowest part. How should I choose and
   set the number of reducers to make it faster? For example, the last job,
   PartialMultiplyMapper-Reducer, takes 7h and its reducer takes 5h. On the
   same data, ParallelAls finishes in 1.5h with the threaded version.

Thanks!

-- 
*JU Han*

UTC   -  Université de Technologie de Compiègne
*GI06 - Fouille de Données et Décisionnel*

+33 0619608888
