Re: how to understand the parameter "maxSimilaritiesPerItem"

张玉东 Thu, 08 Sep 2011 04:52:06 -0700

OK, I understand this point. But if the top similar items have already been
chosen in this step, is it still necessary to select the top
"maxSimilaritiesPerItem" items again in the "mostSimilarItems" job?


-----Original Message-----
From: Sebastian Schelter [mailto:[email protected]] 
Sent: September 8, 2011 19:42
To: [email protected]
Subject: Re: how to understand the parameter "maxSimilaritiesPerItem"

The code snippet is invoked in a job that uses "Secondary Sort", which
means that the reducer sees the "entries" in descending order (highest
similarity first). That's why we only need to process the first ones.

--sebastian
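A minimal plain-Java sketch of the ordering described above, assuming the secondary sort delivers each row's entries sorted by similarity, highest first. Entry, SECONDARY_SORT and topPerRow are hypothetical names for illustration only; they are not Mahout's SimilarityMatrixEntryKey or its Hadoop comparators. Once the entries arrive in that order, keeping the first maxSimilaritiesPerRow entries per row yields the highest-similarity items, not a random subset.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {

  // Hypothetical stand-in for one (row, column, similarity) matrix entry.
  public record Entry(int row, int col, double val) {}

  // Row ascending, then similarity descending: the order the secondary sort
  // is assumed to deliver to the reducer.
  static final Comparator<Entry> SECONDARY_SORT =
      Comparator.comparingInt(Entry::row)
          .thenComparing(Comparator.comparingDouble(Entry::val).reversed());

  // Sorts the entries, then keeps the first maxSimilaritiesPerRow entries seen
  // for each row, mirroring the reducer loop that stops after that many.
  public static Map<Integer, List<Entry>> topPerRow(List<Entry> entries, int maxSimilaritiesPerRow) {
    List<Entry> sorted = new ArrayList<>(entries);
    sorted.sort(SECONDARY_SORT);
    Map<Integer, List<Entry>> result = new LinkedHashMap<>();
    for (Entry e : sorted) {
      List<Entry> kept = result.computeIfAbsent(e.row(), r -> new ArrayList<>());
      if (kept.size() < maxSimilaritiesPerRow) {
        kept.add(e);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Entry> entries = List.of(
        new Entry(0, 1, 0.2), new Entry(0, 2, 0.9), new Entry(0, 3, 0.5),
        new Entry(1, 0, 0.2), new Entry(1, 2, 0.7));
    // Print the columns kept for each row.
    topPerRow(entries, 2).forEach((row, kept) ->
        System.out.println(row + " -> " + kept.stream().map(Entry::col).toList()));
  }
}
```

Running main prints "0 -> [2, 3]" and "1 -> [2, 0]": for each row only the two most similar columns survive, because the sort puts them first.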

On 08.09.2011 13:38, 张玉东 wrote:
> Hello,
> In the ItemSimilarityJob, the parameter "maxSimilaritiesPerItem" is first
> used in the 7th map/reduce job, "asMatrix", as follows:
> 
>     protected void reduce(SimilarityMatrixEntryKey key,
>                           Iterable<DistributedRowMatrix.MatrixEntryWritable> entries,
>                           Context ctx) throws IOException, InterruptedException {
>       // Holds at most maxSimilaritiesPerRow entries for the current row
>       RandomAccessSparseVector temporaryVector =
>           new RandomAccessSparseVector(Integer.MAX_VALUE, maxSimilaritiesPerRow);
>       int similaritiesSet = 0;
>       for (DistributedRowMatrix.MatrixEntryWritable entry : entries) {
>         temporaryVector.setQuick(entry.getCol(), entry.getVal());
>         // Stop once maxSimilaritiesPerRow entries have been taken
>         if (++similaritiesSet == maxSimilaritiesPerRow) {
>           break;
>         }
>       }
>       // Convert to a sequential-access vector and emit it as this row of the matrix
>       SequentialAccessSparseVector vector =
>           new SequentialAccessSparseVector(temporaryVector);
>       ctx.write(new IntWritable(key.getRow()), new VectorWritable(vector));
>     }
> 
> I am confused about whether all of the other similar items are written into
> the matrix for each item. If only some of them (no more than
> maxSimilaritiesPerItem) are written, how are they selected? At random?
> Thanks.
> 
> yudong
> 
>
