That sounds like plenty of data, so I doubt size is the issue. Is it
very sparse, meaning many items occur for just one user? It's really
sparseness that might produce few or no similarities.

I think something else is at work here, but I can't say what off the
top of my head based on the info so far.

Yes, it is always the same hash function: the top 8 bytes of the MD5
hash of the string. Same input means same output.
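
To illustrate, here is a minimal sketch of that kind of hash in plain
Java. It is not copied from the Mahout source: the class name
Md5IdSketch, the UTF-8 encoding, and the big-endian packing of the
first 8 bytes are my assumptions, so the exact numbers may differ from
what the migrator returns. The point is only that the result depends on
nothing but the input string.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration class, not part of Mahout.
public class Md5IdSketch {

    // Maps a string ID to a long built from the first 8 bytes of its MD5 digest.
    // Assumption: UTF-8 encoding, most significant byte first.
    public static long toLongID(String stringID) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(stringID.getBytes(StandardCharsets.UTF_8));
            long hash = 0L;
            for (int i = 0; i < 8; i++) {
                // Shift the accumulated value left and append the next byte, unsigned.
                hash = (hash << 8) | (digest[i] & 0xFFL);
            }
            return hash;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long first = toLongID("user-42");
        long second = toLongID("user-42");
        // No state is involved, so repeated calls agree.
        System.out.println(first == second);
    }
}
```

Because the value is a pure function of the string, it also stays the
same across JVMs and runs. Two different strings could in principle
collide, but with 64 bits that is very unlikely at this scale.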

Sean

On Wed, Jun 6, 2012 at 4:57 PM, Something Something
<[email protected]> wrote:
> The input size was about 6 million so I was expecting to find some
> similarities.  Anyway, I have started a test with the real dataset that
> contains 700 million lines.  We shall see how that goes.  One quick
> question, though:
>
> I am using MemoryIDMigrator to convert UserIds from String to Long as
> follows:
>
>    static UpdatableIDMigrator migrator = new MemoryIDMigrator();
> <some code omitted here...>
>    migrator.toLongID(strUserID);
>
> Question:  If I pass the same userId multiple times to this method, I am
> guaranteed to get the same 'Long' number back, correct?
