That sounds like plenty of data, so I doubt that's the issue. Is it very sparse, meaning many items exist for just one user? It's really sparseness that might produce few or no similarities.
I think something else is at work here but don't know off the top of my head based on the info so far.

Yes it is always the same hash function -- top 8 bytes of the MD5 hash. Same input means same output.

Sean

On Wed, Jun 6, 2012 at 4:57 PM, Something Something <[email protected]> wrote:
> The input size was about 6 Million so I was expecting to find some
> similarities. Anyway, I have started a test with the real dataset that
> contains 700 million lines. We shall see how that goes. One quick
> question, though:
>
> I am using MemoryIDMigrator to convert UserIds from String to Long as
> follows:
>
> static UpdatableIDMigrator migrator = new MemoryIDMigrator();
> <some code omitted here...>
> migrator.toLongID(strUserID);
>
> Question: If I pass the same userId multiple times to this method, I am
> guaranteed to get the same 'Long' number back, correct?
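
On the toLongID question above, here is a minimal sketch of what "top 8 bytes of the MD5 hash" can look like, written with only the JDK. The class name IdHashSketch is made up for illustration, and the UTF-8 encoding and big-endian byte order are assumptions on my part, not a copy of Mahout's code. The point is only that nothing here depends on state, so the same string always maps to the same long.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-alone sketch; not the Mahout class itself.
class IdHashSketch {

    // Maps a string ID to a long built from the first 8 bytes of its MD5 digest.
    static long toLongID(String stringID) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            // Assumption: the string is encoded as UTF-8 before hashing.
            byte[] digest = md5.digest(stringID.getBytes(StandardCharsets.UTF_8));
            long hash = 0L;
            // Pack the first 8 of the 16 digest bytes, most significant byte first.
            for (int i = 0; i < 8; i++) {
                hash = (hash << 8) | (digest[i] & 0xFFL);
            }
            return hash;
        } catch (NoSuchAlgorithmException e) {
            // Every compliant Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        long first = toLongID("user-42");
        long second = toLongID("user-42");
        System.out.println(first == second); // prints "true"
    }
}
```

Two different strings could in principle collide on the same long, but with 64 bits that is very unlikely even at 700 million lines.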
