Just make, say, a completely dense fake data set over 1000 users and items.
Some similarities are bound to come out of that.
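A minimal sketch of such a fake data set, assuming the plain `userID,itemID,preference` CSV layout that Mahout's `FileDataModel` reads. The class name `DenseFakeData`, the random preference range of roughly 1 to 5, and the fixed seed are illustrative choices, not anything from this thread. Every user rates every item, so 1000 x 1000 gives one million lines and every pair of users or items overlaps completely.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;

// Hypothetical helper: writes a completely dense "userID,itemID,preference" CSV
// in which every user rates every item.
public class DenseFakeData {

    // Emits numUsers * numItems lines; preferences are random floats in roughly [1, 5].
    static void write(Writer out, int numUsers, int numItems, long seed) {
        Random random = new Random(seed); // fixed seed so runs are reproducible
        try {
            for (int user = 1; user <= numUsers; user++) {
                for (int item = 1; item <= numItems; item++) {
                    float pref = 1.0f + 4.0f * random.nextFloat();
                    out.write(user + "," + item + "," + pref + "\n");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "dense.csv");
        try (Writer writer = Files.newBufferedWriter(path)) {
            write(writer, 1000, 1000, 42L);
        }
        System.out.println("Wrote 1000 x 1000 = 1000000 preferences to " + path);
    }
}
```

Since nothing here is sparse, a job that still reports no similarities on this file points at the setup rather than the data.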
On Jun 6, 2012 6:11 PM, "Something Something" <[email protected]>
wrote:

> Hmm... that's what I am thinking... something is amiss!  A few lines from
> the files are pasted above.  The pattern is fairly similar.  Is there a
> place where I can upload part of my file for someone else to try?
>
> OR BETTER YET - Can someone provide a small file that always returns a few
> similarities?  Is such a file included in the source?
>
> Thanks for the help.
>
> On Wed, Jun 6, 2012 at 9:01 AM, Sean Owen <[email protected]> wrote:
>
> > That sounds like plenty of data -- doubting that's any issue. Is it
> > very sparse? Meaning many items exist just for one user? It's really
> > sparseness that might produce few or no similarities.
> >
> > I think something else is at work here but don't know off the top of
> > my head based on the info so far.
> >
> > Yes, it is always the same hash function -- the top 8 bytes of the MD5
> > hash. Same input means same output.
> >
> > Sean
> >
> > On Wed, Jun 6, 2012 at 4:57 PM, Something Something
> > <[email protected]> wrote:
> > > The input size was about 6 million so I was expecting to find some
> > > similarities.  Anyway, I have started a test with the real dataset that
> > > contains 700 million lines.  We shall see how that goes.  One quick
> > > question, though:
> > >
> > > I am using MemoryIDMigrator to convert UserIds from String to Long as
> > > follows:
> > >
> > >    static UpdatableIDMigrator migrator = new MemoryIDMigrator();
> > > <some code omitted here...>
> > >    migrator.toLongID(strUserID);
> > >
> > > Question:  If I pass the same userId multiple times to this method, I am
> > > guaranteed to get the same 'Long' number back, correct?
> >
>
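To illustrate why the same String always maps to the same long, here is a minimal sketch of "take the top 8 bytes of the MD5 hash" using only the JDK. It assumes UTF-8 encoding of the ID and big-endian packing of the first 8 digest bytes; both are assumptions, so the exact numbers may not match what Mahout's migrator produces. The class name `Md5LongId` is hypothetical. The point it shows is that the mapping is a pure function of the input string, with no state involved.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration of hashing a String ID to a long via MD5.
public class Md5LongId {

    // Packs the first 8 bytes of MD5(stringID) into a long, most significant byte first.
    static long toLongID(String stringID) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(stringID.getBytes(StandardCharsets.UTF_8));
            long id = 0L;
            for (int i = 0; i < 8; i++) {
                id = (id << 8) | (digest[i] & 0xFFL); // mask to avoid sign extension
            }
            return id;
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long first = toLongID("user-42");
        long second = toLongID("user-42");
        System.out.println("Same input, same output: " + (first == second));
    }
}
```

Because only 64 of MD5's 128 bits are kept, two different strings can in principle collide, but for a few million IDs that is very unlikely.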

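To test Sean's point that sparseness (many items that only one user has) can produce few or no similarities, a rough check is to count how many items appear with exactly one distinct user. This is a minimal sketch assuming a comma-separated `userID,itemID[,preference]` file; the class and method names are hypothetical, and since it holds everything in memory it is meant for a sample of the file, not all 700 million lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sparseness check for a "userID,itemID[,preference]" CSV sample.
public class SparsenessCheck {

    // Returns {number of distinct items, number of items with exactly one distinct user}.
    static long[] singleUserItems(Reader input) {
        Map<String, Set<String>> usersByItem = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(",");
                if (fields.length < 2) {
                    continue; // skip blank or malformed lines
                }
                usersByItem.computeIfAbsent(fields[1].trim(), k -> new HashSet<>())
                           .add(fields[0].trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        long singles = usersByItem.values().stream().filter(u -> u.size() == 1).count();
        return new long[] {usersByItem.size(), singles};
    }

    public static void main(String[] args) throws IOException {
        Reader input = args.length > 0
                ? Files.newBufferedReader(Paths.get(args[0]))
                : new StringReader("u1,i1\nu2,i1\nu1,i2\nu3,i3\n");
        long[] counts = singleUserItems(input);
        System.out.println(counts[1] + " of " + counts[0] + " items have a single user");
    }
}
```

If most items come back with a single user, item-item similarities have almost nothing to work with.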