OK, I understand this point. But if the top similar items have already been chosen in this step, is it still necessary to select the top "maxSimilaritiesPerItem" items in the "mostSimilarItems" job?

-----Original Message-----
From: Sebastian Schelter [mailto:[email protected]]
Sent: September 8, 2011 19:42
To: [email protected]
Subject: Re: how to understand the parameter "maxSimilaritiesPerItem"

The code snippet is invoked in a job that uses "Secondary Sort", which means that the reducer sees the "entries" in descending order. That's why we only need to process the first ones.

--sebastian

On 08.09.2011 13:38, 张玉东 wrote:
> Hello,
> In the ItemSimilarityJob, the parameter "maxSimilaritiesPerItem" is first
> used in the 7th map/reduce job "asMatrix" as
>
>   protected void reduce(SimilarityMatrixEntryKey key,
>       Iterable<DistributedRowMatrix.MatrixEntryWritable> entries,
>       Context ctx) throws IOException, InterruptedException {
>     RandomAccessSparseVector temporaryVector =
>         new RandomAccessSparseVector(Integer.MAX_VALUE, maxSimilaritiesPerRow);
>     int similaritiesSet = 0;
>     for (DistributedRowMatrix.MatrixEntryWritable entry : entries) {
>       temporaryVector.setQuick(entry.getCol(), entry.getVal());
>       if (++similaritiesSet == maxSimilaritiesPerRow) {
>         break;
>       }
>     }
>     SequentialAccessSparseVector vector =
>         new SequentialAccessSparseVector(temporaryVector);
>     ctx.write(new IntWritable(key.getRow()), new VectorWritable(vector));
>   }
>
> I am unsure whether all the other items with a similarity are written
> into the matrix for each item or not. If only some of them (no more than
> maxSimilaritiesPerItem) are written, then how are they selected? Randomly?
> Thanks.
>
> yudong
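
For readers following the thread, here is a minimal, self-contained sketch of why breaking out of the loop early is enough once the entries arrive sorted. It does not use Hadoop or Mahout: the class `TopKSketch`, the `Entry` type, and the method `keepFirstK` are hypothetical names made up for illustration, and the in-memory sort in `main` only stands in for the ordering that the secondary sort is said to provide. It also assumes that "descending order" means descending by similarity value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopKSketch {

    /** Hypothetical stand-in for a (column, similarity) matrix entry. */
    public static final class Entry {
        public final int col;
        public final double val;

        public Entry(int col, double val) {
            this.col = col;
            this.val = val;
        }
    }

    /**
     * Keeps the first k entries of an iterable that is ALREADY sorted by
     * descending similarity. Because of that ordering, the first k entries
     * are the k most similar ones, so no further comparison is needed.
     */
    public static Map<Integer, Double> keepFirstK(Iterable<Entry> sortedDescending, int k) {
        Map<Integer, Double> row = new LinkedHashMap<>();
        if (k <= 0) {
            return row;
        }
        int similaritiesSet = 0;
        for (Entry entry : sortedDescending) {
            row.put(entry.col, entry.val);
            if (++similaritiesSet == k) {
                break; // everything after this point has lower similarity
            }
        }
        return row;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry(7, 0.2));
        entries.add(new Entry(3, 0.9));
        entries.add(new Entry(5, 0.6));

        // Assumption: this sort imitates what the secondary sort would do
        // before the reducer sees the values.
        entries.sort(Comparator.comparingDouble((Entry e) -> e.val).reversed());

        // Only the two most similar columns (3 and 5) are kept.
        System.out.println(keepFirstK(entries, 2));
    }
}
```

If the input were not sorted, the same loop would keep an arbitrary subset, which is the "Random?" case raised in the original question.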
