I tried with a bigger/denser dataset, but still no output. Here's what I
noticed:
In the MergeVectorsReducer, I see the following:
    @Override
    protected void reduce(IntWritable row, Iterable<VectorWritable> partialVectors, Context ctx)
        throws IOException, InterruptedException {
      Vector partialVector = Vectors.merge(partialVectors);
      if (row.get() == NORM_VECTOR_MARKER) {
        Vectors.write(partialVector, normsPath, ctx.getConfiguration());
      } else if (row.get() == MAXVALUE_VECTOR_MARKER) {
        Vectors.write(partialVector, maxValuesPath, ctx.getConfiguration());
      } else if (row.get() == NUM_NON_ZERO_ENTRIES_VECTOR_MARKER) {
        Vectors.write(partialVector, numNonZeroEntriesPath, ctx.getConfiguration(), true);
      } else {
        ctx.write(row, new VectorWritable(partialVector));
      }
    }
There's nothing coming out of this method. Where is the output supposed to
go? In other words, what path does this resolve to:
    normsPath = new Path(ctx.getConfiguration().get(NORMS_PATH));
There are 150 rows going into this reducer and nothing is coming out. Where
is the output supposed to go under the tmp directory? I see the following in
HDFS:
-rw-r--r--   3 root supergroup   7 2012-06-06 21:57 /user/XXX/tmp/maxValues.bin
-rw-r--r--   3 root supergroup   7 2012-06-06 21:57 /user/XXX/tmp/norms.bin
-rw-r--r--   3 root supergroup   7 2012-06-06 21:57 /user/XXX/tmp/numNonZeroEntries.bin
drwxrwxrwx   - root supergroup   0 2012-06-06 21:57 /user/XXX/tmp/pairwiseSimilarity
drwxrwxrwx   - root supergroup   0 2012-06-06 21:55 /user/XXX/tmp/prepareRatingMatrix
drwxrwxrwx   - root supergroup   0 2012-06-06 21:58 /user/XXX/tmp/similarityMatrix
drwxrwxrwx   - root supergroup   0 2012-06-06 21:57 /user/XXX/tmp/weights
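
To make my reading of that branching concrete, here is a minimal plain-Java
sketch of how I think the rows are routed. It is not Mahout code: the class
name, the marker values, and the mapping of each path variable to a .bin file
name are my own assumptions for illustration. Only the three marker rows are
diverted to side files; every other row goes through the ctx.write branch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative stand-in for the reducer's routing logic; not Mahout code. */
final class MergeDispatchSketch {

    // ASSUMPTION: placeholder marker values; the real constants live in Mahout.
    static final int NORM_VECTOR_MARKER = Integer.MIN_VALUE;
    static final int MAXVALUE_VECTOR_MARKER = Integer.MIN_VALUE + 1;
    static final int NUM_NON_ZERO_ENTRIES_VECTOR_MARKER = Integer.MIN_VALUE + 2;

    // Stands in for Vectors.write(...): side file name -> merged vector.
    // ASSUMPTION: the file names are guessed from the HDFS listing.
    final Map<String, double[]> sideFiles = new LinkedHashMap<>();

    // Stands in for ctx.write(...): row ids sent to the regular job output.
    final List<Integer> emittedRows = new ArrayList<>();

    void reduce(int row, double[] mergedVector) {
        if (row == NORM_VECTOR_MARKER) {
            sideFiles.put("norms.bin", mergedVector);
        } else if (row == MAXVALUE_VECTOR_MARKER) {
            sideFiles.put("maxValues.bin", mergedVector);
        } else if (row == NUM_NON_ZERO_ENTRIES_VECTOR_MARKER) {
            sideFiles.put("numNonZeroEntries.bin", mergedVector);
        } else {
            // Ordinary rows never touch the side files.
            emittedRows.add(row);
        }
    }

    public static void main(String[] args) {
        MergeDispatchSketch sketch = new MergeDispatchSketch();
        sketch.reduce(NORM_VECTOR_MARKER, new double[] {1.0});
        sketch.reduce(MAXVALUE_VECTOR_MARKER, new double[] {2.0});
        sketch.reduce(NUM_NON_ZERO_ENTRIES_VECTOR_MARKER, new double[] {3.0});
        for (int row = 0; row < 150; row++) {
            sketch.reduce(row, new double[] {0.5});
        }
        System.out.println("side files: " + sketch.sideFiles.keySet());
        System.out.println("rows emitted: " + sketch.emittedRows.size());
    }
}
```

If that reading is right, the 150 ordinary rows should end up in whatever
output directory the job was configured with, not in the three .bin files.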
On Wed, Jun 6, 2012 at 10:20 AM, Sean Owen <[email protected]> wrote:
> Just make, say, a completely dense fake data set over 1000 users and items.
> Something will come out.
> On Jun 6, 2012 6:11 PM, "Something Something" <[email protected]>
> wrote:
>
> > Hmm... that's what I am thinking... something is amiss! A few lines from
> > the files are pasted above. The pattern is fairly similar. Is there a
> > place where I can upload part of my file for someone else to try?
> >
> > OR BETTER YET - can someone provide a small file that always returns a
> > few similarities? Is a file such as this included in the source?
> >
> > Thanks for the help.
> >
> > On Wed, Jun 6, 2012 at 9:01 AM, Sean Owen <[email protected]> wrote:
> >
> > > That sounds like plenty of data -- doubting that's any issue. Is it
> > > very sparse? Meaning many items exist just for one user? It's really
> > > sparseness that might produce few or no similarities.
> > >
> > > I think something else is at work here but don't know off the top of
> > > my head based on the info so far.
> > >
> > > Yes it is always the same hash function -- top 8 bytes of the MD5
> > > hash. Same input means same output.
> > >
> > > Sean
> > >
> > > On Wed, Jun 6, 2012 at 4:57 PM, Something Something
> > > <[email protected]> wrote:
> > > > The input size was about 6 million, so I was expecting to find some
> > > > similarities. Anyway, I have started a test with the real dataset that
> > > > contains 700 million lines. We shall see how that goes. One quick
> > > > question, though:
> > > >
> > > > I am using MemoryIDMigrator to convert UserIds from String to Long as
> > > > follows:
> > > >
> > > > static UpdatableIDMigrator migrator = new MemoryIDMigrator();
> > > > <some code omitted here...>
> > > > migrator.toLongID(strUserID);
> > > >
> > > > Question: If I pass the same userId multiple times to this method, I am
> > > > guaranteed to get the same 'Long' number back, correct?
> > >
> >
>
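
For my own notes on the ID question quoted above, here is a minimal sketch of
what "top 8 bytes of the MD5 hash" could look like in plain Java. This is not
Mahout's implementation: the class and method names, the UTF-8 encoding, and
the big-endian byte order are my assumptions. The point is only that the
result depends on nothing but the input string, so the same userId always maps
to the same long.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper; illustrates a deterministic String -> long mapping. */
final class Md5IdSketch {

    static long toLongId(String stringId) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            // ASSUMPTION: UTF-8 encoding of the input string.
            byte[] digest = md5.digest(stringId.getBytes(StandardCharsets.UTF_8));
            long id = 0L;
            // Pack the first 8 of the 16 digest bytes into a long, big-endian.
            for (int i = 0; i < 8; i++) {
                id = (id << 8) | (digest[i] & 0xFFL);
            }
            return id;
        } catch (NoSuchAlgorithmException e) {
            // Every standard Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        long first = toLongId("user-42");
        long second = toLongId("user-42");
        System.out.println("same id both times: " + (first == second));
    }
}
```

Two different strings can in principle collide on the same 8 bytes, but a
given string never produces two different values.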