The code snippet is invoked in a job that uses "Secondary Sort", which
means the reducer sees the entries in descending order of similarity
value. That's why it only needs to keep the first maxSimilaritiesPerRow
of them: they are guaranteed to be the most similar ones, nothing is
selected randomly.
--sebastian
On 08.09.2011 13:38, 张玉东 wrote:
> Hello,
> In ItemSimilarityJob, the parameter "maxSimilaritiesPerItem" is first
> used in the seventh map/reduce job, "asMatrix", as follows:
>
> protected void reduce(SimilarityMatrixEntryKey key,
>     Iterable<DistributedRowMatrix.MatrixEntryWritable> entries,
>     Context ctx) throws IOException, InterruptedException {
>   RandomAccessSparseVector temporaryVector =
>       new RandomAccessSparseVector(Integer.MAX_VALUE, maxSimilaritiesPerRow);
>   int similaritiesSet = 0;
>   for (DistributedRowMatrix.MatrixEntryWritable entry : entries) {
>     temporaryVector.setQuick(entry.getCol(), entry.getVal());
>     if (++similaritiesSet == maxSimilaritiesPerRow) {
>       break;
>     }
>   }
>   SequentialAccessSparseVector vector =
>       new SequentialAccessSparseVector(temporaryVector);
>   ctx.write(new IntWritable(key.getRow()), new VectorWritable(vector));
> }
>
> I am confused about whether, for each item, all the other similar items
> are written into the matrix, or only some of them. If only a subset (at
> most maxSimilaritiesPerItem) is written, how are they selected? Randomly?
> Thanks.
>
> yudong