Hmm... that's what I am thinking: something is amiss! A few lines from the files are pasted above. The pattern is fairly similar. Is there a place where I can upload part of my file for someone else to try?
Or better yet, can someone provide a small file that always returns a few similarities? Is a file like this included in the source?

Thanks for the help.

On Wed, Jun 6, 2012 at 9:01 AM, Sean Owen <[email protected]> wrote:
> That sounds like plenty of data -- doubting that's any issue. Is it
> very sparse? Meaning many items exist just for one user? It's really
> sparseness that might produce few or no similarities.
>
> I think something else is at work here but don't know off the top of
> my head based on the info so far.
>
> Yes it is always the same hash function -- top 8 bytes of the MD5
> hash. Same input means same output.
>
> Sean
>
> On Wed, Jun 6, 2012 at 4:57 PM, Something Something
> <[email protected]> wrote:
> > The input size was about 6 Million so I was expecting to find some
> > similarities. Anyway, I have started a test with the real dataset that
> > contains 700 million lines. We shall see how that goes. One quick
> > question, though:
> >
> > I am using MemoryIDMigrator to convert UserIds from String to Long as
> > follows:
> >
> > static UpdatableIDMigrator migrator = new MemoryIDMigrator();
> > <some code omitted here...>
> > migrator.toLongID(strUserID);
> >
> > Question: If I pass the same userId multiple times to this method, I am
> > guaranteed to get the same 'Long' number back, correct?
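
For anyone following along, here is a minimal sketch of the idea Sean describes in the quoted reply: take the top 8 bytes of the MD5 hash of the string ID and pack them into a long. This is not Mahout's own code. The class name IdHashSketch, the standalone toLongID method, the UTF-8 encoding, and the big-endian byte order are all assumptions made for illustration. It only shows why the same input should always give the same output.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; not the Mahout implementation.
public class IdHashSketch {

    // Packs the first 8 bytes of the MD5 digest of the ID into a long.
    // Encoding (UTF-8) and byte order (big-endian) are assumptions.
    static long toLongID(String id) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(id.getBytes(StandardCharsets.UTF_8));
            long hash = 0L;
            for (int i = 0; i < 8; i++) {
                // Mask to avoid sign extension when widening the byte.
                hash = (hash << 8) | (digest[i] & 0xFFL);
            }
            return hash;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long first = toLongID("user-42");
        long second = toLongID("user-42");
        // The hash has no hidden state, so repeated calls agree.
        System.out.println(first == second); // prints "true"
    }
}
```

Since the mapping depends only on the input string, calling it repeatedly with the same user ID, or from a different JVM, should give the same long. That suggests the ID conversion is probably not what is causing the missing similarities.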
