Hi,

After ParallelAlsJob, I'm now trying the parallel item-based recommender job. Here are some questions.
1. I specified a userFile that contains 110,000 different users, but the output contains recommendations for more than that, nearly 130,000 users. Why is this?

2. How is the threshold value chosen in real cases? For example, I'm using boolean data and LogLikelihood.

3. The job runs slowly: nearly 8h on 48M data points. By default every job has only one reducer, and the reducer is the slowest part. How should I choose and set the number of reducers to make it faster? For example, the last job, PartialMultiplyMapper-Reducer, takes 7h, of which its reducer takes 5h. On the same data, ParallelAls finishes in 1.5h with the threaded version.

Thanks!

--
JU Han
UTC - Université de Technologie de Compiègne
GI06 - Fouille de Données et Décisionnel
+33 0619608888
