Hello,
In ItemSimilarityJob, the parameter "maxSimilaritiesPerItem" is first
used in the 7th map/reduce job, "asMatrix", in the following reducer:

    protected void reduce(SimilarityMatrixEntryKey key,
                          Iterable<DistributedRowMatrix.MatrixEntryWritable> entries,
                          Context ctx) throws IOException, InterruptedException {
      RandomAccessSparseVector temporaryVector =
          new RandomAccessSparseVector(Integer.MAX_VALUE, maxSimilaritiesPerRow);
      int similaritiesSet = 0;
      for (DistributedRowMatrix.MatrixEntryWritable entry : entries) {
        temporaryVector.setQuick(entry.getCol(), entry.getVal());
        if (++similaritiesSet == maxSimilaritiesPerRow) {
          break;
        }
      }
      SequentialAccessSparseVector vector =
          new SequentialAccessSparseVector(temporaryVector);
      ctx.write(new IntWritable(key.getRow()), new VectorWritable(vector));
    }

I am not sure whether all the other items that have a similarity value are
written into the matrix row for each item. If only some of them (at most
maxSimilaritiesPerItem) are written, how are they selected? At random?
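
To make the question concrete, here is a minimal standalone sketch of how I read the loop. It uses plain Java collections instead of the Mahout classes; the Entry class and the keepFirst method are made up for illustration and are not part of Mahout. As far as I can tell, the entries kept are simply the first maxSimilaritiesPerRow that the reducer iterates over, so the selection depends entirely on the order in which they arrive:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TruncationSketch {

    // Hypothetical stand-in for DistributedRowMatrix.MatrixEntryWritable:
    // just a column index and a similarity value.
    static final class Entry {
        final int col;
        final double val;

        Entry(int col, double val) {
            this.col = col;
            this.val = val;
        }
    }

    // Mirrors the loop in the reducer: copy entries in iteration order
    // and stop once maxSimilaritiesPerRow of them have been stored.
    static Map<Integer, Double> keepFirst(Iterable<Entry> entries, int maxSimilaritiesPerRow) {
        Map<Integer, Double> row = new LinkedHashMap<>();
        int similaritiesSet = 0;
        for (Entry entry : entries) {
            row.put(entry.col, entry.val);
            if (++similaritiesSet == maxSimilaritiesPerRow) {
                break;
            }
        }
        return row;
    }

    public static void main(String[] args) {
        List<Entry> arrivalOrder = Arrays.asList(
            new Entry(7, 0.2), new Entry(3, 0.9), new Entry(5, 0.6));
        // Unsorted input: columns 7 and 3 are kept, column 5 is dropped.
        System.out.println(keepFirst(arrivalOrder, 2).keySet());

        List<Entry> sorted = new ArrayList<>(arrivalOrder);
        sorted.sort((a, b) -> Double.compare(b.val, a.val));
        // Sorted by descending similarity: columns 3 and 5 are kept.
        System.out.println(keepFirst(sorted, 2).keySet());
    }
}
```

If the entries reach the reducer sorted by descending similarity, this would keep the most similar items; the reducer code above does not show whether that is the case.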
Thanks.
yudong