Just make, say, a completely dense fake data set over 1000 users and items. Some similarities will come out of that.

On Jun 6, 2012 6:11 PM, "Something Something" <[email protected]> wrote:
> Hmm... that's what I am thinking... something is amiss! A few lines from
> the files are pasted above. The pattern is fairly similar. Is there a
> place where I can upload part of my file for someone else to try?
>
> OR BETTER YET - Can someone provide a small file that always returns a few
> similarities? Is a file such as this included in the source?
>
> Thanks for the help.
>
> On Wed, Jun 6, 2012 at 9:01 AM, Sean Owen <[email protected]> wrote:
>
> > That sounds like plenty of data -- I doubt that's the issue. Is it
> > very sparse, meaning many items exist for just one user? It's really
> > sparseness that might produce few or no similarities.
> >
> > I think something else is at work here, but I can't tell off the top of
> > my head from the info so far.
> >
> > Yes, it is always the same hash function -- the top 8 bytes of the MD5
> > hash. Same input means same output.
> >
> > Sean
> >
> > On Wed, Jun 6, 2012 at 4:57 PM, Something Something
> > <[email protected]> wrote:
> > > The input size was about 6 million, so I was expecting to find some
> > > similarities. Anyway, I have started a test with the real dataset that
> > > contains 700 million lines. We shall see how that goes. One quick
> > > question, though:
> > >
> > > I am using MemoryIDMigrator to convert UserIds from String to Long as
> > > follows:
> > >
> > >     static UpdatableIDMigrator migrator = new MemoryIDMigrator();
> > >     <some code omitted here...>
> > >     migrator.toLongID(strUserID);
> > >
> > > Question: If I pass the same userId multiple times to this method, am I
> > > guaranteed to get the same 'Long' number back?
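One way to act on the suggestion to build a small, completely dense test file is the Java sketch below. It is a minimal sketch under stated assumptions: the class name `DenseFakeData`, the output file name `dense-fake.csv` and the random 1-5 preference values are hypothetical, and it assumes the input is plain comma-separated `userID,itemID,preference` lines. Because every user rates every item, no item is seen by just one user, which is the kind of sparseness suspected above of producing few or no similarities. The helper `singleUserItemFraction` reports that share for any file in the same format, so it can also be run against part of the real data.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/** Hypothetical helper: builds a fully dense ratings file and measures sparseness. */
public class DenseFakeData {

  /** One "userID,itemID,preference" line for every (user, item) pair. */
  static List<String> denseRatings(int numUsers, int numItems, long seed) {
    Random random = new Random(seed); // fixed seed so the file is reproducible
    List<String> lines = new ArrayList<>(numUsers * numItems);
    for (int user = 1; user <= numUsers; user++) {
      for (int item = 1; item <= numItems; item++) {
        int preference = 1 + random.nextInt(5); // assumed rating scale 1..5
        lines.add(user + "," + item + "," + preference);
      }
    }
    return lines;
  }

  /** Fraction of distinct items that appear for exactly one user. */
  static double singleUserItemFraction(List<String> lines) {
    Map<String, Set<String>> usersPerItem = new HashMap<>();
    for (String line : lines) {
      String[] fields = line.split(",");
      // fields[0] = user ID, fields[1] = item ID
      usersPerItem.computeIfAbsent(fields[1], k -> new HashSet<>()).add(fields[0]);
    }
    if (usersPerItem.isEmpty()) {
      return 0.0;
    }
    long singles = usersPerItem.values().stream().filter(users -> users.size() == 1).count();
    return (double) singles / usersPerItem.size();
  }

  public static void main(String[] args) throws IOException {
    List<String> lines = denseRatings(1000, 1000, 42L);
    Path out = Paths.get("dense-fake.csv"); // hypothetical output path
    Files.write(out, lines, StandardCharsets.UTF_8);
    System.out.println("wrote " + lines.size() + " ratings to " + out);
    System.out.println("single-user item fraction: " + singleUserItemFraction(lines));
  }
}
```

If the same job returns no similarities even on the dense file, that points away from the data and toward the setup; if it does return some, sparseness in the real input becomes the more likely explanation.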

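The answer about the ID mapping (the top 8 bytes of the MD5 hash, so the same input always gives the same output) can be illustrated with a small Java sketch. `Md5IdSketch` is a hypothetical class written for illustration, not Mahout's `MemoryIDMigrator`; it assumes "top 8 bytes" means the first 8 bytes of the digest read big-endian, and it keeps a `HashMap` only so that the long ID can be turned back into the original string.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an in-memory String-to-long ID mapper (not Mahout's code). */
public class Md5IdSketch {

  // Reverse map so long IDs can be turned back into the original strings.
  private final Map<Long, String> longToString = new HashMap<>();

  /** Hashes the string with MD5 and packs the first 8 digest bytes into a long. */
  static long hashToLong(String stringID) {
    MessageDigest md5;
    try {
      md5 = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 is required on every Java platform", e);
    }
    byte[] digest = md5.digest(stringID.getBytes(StandardCharsets.UTF_8));
    long result = 0L;
    for (int i = 0; i < 8; i++) {
      // Big-endian: digest[0] ends up as the most significant byte.
      result = (result << 8) | (digest[i] & 0xFF);
    }
    return result;
  }

  /** Deterministic: the same string always maps to the same long. */
  long toLongID(String stringID) {
    long id = hashToLong(stringID);
    longToString.put(id, stringID); // remember the mapping for reverse lookup
    return id;
  }

  String toStringID(long longID) {
    return longToString.get(longID);
  }

  public static void main(String[] args) {
    Md5IdSketch migrator = new Md5IdSketch();
    long first = migrator.toLongID("user-42");  // hypothetical user ID
    long second = migrator.toLongID("user-42");
    System.out.println(first == second);            // prints true
    System.out.println(migrator.toStringID(first)); // prints user-42
  }
}
```

Because the long ID is computed purely from the string, no stored state is needed to get a repeatable answer; the map only matters for going back from long to string. As with any 64-bit hash, two different strings could in principle collide, though that is very unlikely.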