Hello,
In the ItemSimilarityJob, the parameter "maxSimilaritiesPerItem" is first
used in the reducer of the 7th map/reduce job, "asMatrix" (where it is named
maxSimilaritiesPerRow):

    protected void reduce(SimilarityMatrixEntryKey key,
                          Iterable<DistributedRowMatrix.MatrixEntryWritable> entries,
                          Context ctx) throws IOException, InterruptedException {
      RandomAccessSparseVector temporaryVector =
          new RandomAccessSparseVector(Integer.MAX_VALUE, maxSimilaritiesPerRow);
      int similaritiesSet = 0;
      for (DistributedRowMatrix.MatrixEntryWritable entry : entries) {
        temporaryVector.setQuick(entry.getCol(), entry.getVal());
        if (++similaritiesSet == maxSimilaritiesPerRow) {
          break;
        }
      }
      SequentialAccessSparseVector vector =
          new SequentialAccessSparseVector(temporaryVector);
      ctx.write(new IntWritable(key.getRow()), new VectorWritable(vector));
    }

I am not sure whether, for each item, all other items with a similarity value
are written into the matrix. If only some of them (at most
maxSimilaritiesPerItem) are written, how are they selected? At random?
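
To make the question concrete, here is a minimal sketch of how I read the
loop. It is plain Java with no Mahout classes, and the class and method names
(TruncationSketch, keepFirst) are made up for illustration. As far as I can
tell, the loop keeps whichever entries come first in iteration order, so the
result depends on how the entries are ordered when they reach the reducer,
and that ordering is the part I do not understand:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TruncationSketch {

    // Hypothetical helper: keeps the first `max` elements in iteration order,
    // using the same counter-and-break pattern as the reducer quoted above.
    static List<Integer> keepFirst(Iterable<Integer> columns, int max) {
        List<Integer> kept = new ArrayList<>();
        int set = 0;
        for (Integer col : columns) {
            kept.add(col);
            if (++set == max) {
                break;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Same four columns, two different arrival orders, limit of 2.
        System.out.println(keepFirst(Arrays.asList(7, 3, 9, 1), 2)); // prints [7, 3]
        System.out.println(keepFirst(Arrays.asList(1, 9, 3, 7), 2)); // prints [1, 9]
    }
}
```

So the same set of entries gives a different truncated row depending only on
the order in which they arrive.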
Thanks.

yudong

