The code snippet is invoked in a job that uses "Secondary Sort", which
means the reducer sees the entries in descending order of similarity
value. That's why it only needs to keep the first maxSimilaritiesPerRow
of them: they are guaranteed to be the most similar ones, nothing is
selected randomly.
--sebastian
On 08.09.2011 13:38, 张玉东 wrote:
> Hello,
> In ItemSimilarityJob, the parameter "maxSimilaritiesPerItem" is first
> used in the seventh map/reduce job, "asMatrix", as follows:
>
> protected void reduce(SimilarityMatrixEntryKey key,
>     Iterable<DistributedRowMatrix.MatrixEntryWritable> entries,
>     Context ctx) throws IOException, InterruptedException {
>   RandomAccessSparseVector temporaryVector =
>       new RandomAccessSparseVector(Integer.MAX_VALUE, maxSimilaritiesPerRow);
>   int similaritiesSet = 0;
>   for (DistributedRowMatrix.MatrixEntryWritable entry : entries) {
>     temporaryVector.setQuick(entry.getCol(), entry.getVal());
>     if (++similaritiesSet == maxSimilaritiesPerRow) {
>       break;
>     }
>   }
>   SequentialAccessSparseVector vector =
>       new SequentialAccessSparseVector(temporaryVector);
>   ctx.write(new IntWritable(key.getRow()), new VectorWritable(vector));
> }
>
> I am confused about whether, for each item, all the other similar items
> are written into the matrix, or only some of them. If only a subset (at
> most maxSimilaritiesPerItem) is written, how are they selected? Randomly?
> Thanks.
>
> yudong