[ 
https://issues.apache.org/jira/browse/S4-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174929#comment-13174929
 ] 

Matthieu Morel commented on S4-30:
----------------------------------

Thanks a lot, Quoc, that's very useful!

What happens is that the calculated hash value is truncated to a _signed_ 32-bit
number (contrary to what I initially assumed).
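A minimal Java sketch of that effect. It assumes the 32-bit result is narrowed to `int` before the modulo, and the class and method names are hypothetical, not S4's API. A value in [0, 2^32) with bit 31 set becomes negative after the `(int)` cast, and Java's `%` keeps the sign of the dividend, so the partition id comes out negative.

```java
public final class TruncationDemo {

    // Keep only the low 32 bits of a 64-bit hash.
    // As a long, the result is always in [0, 2^32 - 1].
    static long truncate(long rv) {
        return rv & 0xffffffffL;
    }

    // Hypothetical partitioner that narrows the hash to int before the modulo.
    // A truncated value >= 2^31 wraps to a negative int, and Java's %
    // takes the sign of the dividend, so the partition id can be negative.
    static int partitionViaInt(long hash, int partitionCount) {
        int h = (int) hash;
        return h % partitionCount;
    }

    public static void main(String[] args) {
        long hash = truncate(0x123456789abcdef0L); // low 32 bits are 0x9abcdef0
        System.out.println("as long: " + hash);                       // non-negative
        System.out.println("as int:  " + (int) hash);                 // bit 31 set, so negative
        System.out.println("partition: " + partitionViaInt(hash, 7)); // negative partition id
    }
}
```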

I'm not exactly sure about the rationale for truncating to 32 bits, and I don't
see an optimized way to make sure we get a positive value when casting to int.
Maybe somebody has one?

In the meantime, we could simply use Math.abs (slower, but correct!) and 
probably replace:
{code}return rv & 0xffffffffL;{code}

with 

{code}return Math.abs((int)(rv & 0xffffffffL));{code}

...so that we make sure we have a positive value when we cast to an integer.
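One caveat with the `Math.abs` variant: `Math.abs(Integer.MIN_VALUE)` returns `Integer.MIN_VALUE` itself, because +2^31 does not fit in an `int`. A truncated hash of exactly `0x80000000` would therefore still come out negative. The sketch below compares it with clearing the sign bit (`& 0x7fffffff`), one branch-free alternative that is offered here as a suggestion, not what the comment proposes. Method names are hypothetical.

```java
public final class NonNegativeHash {

    // Variant proposed in the comment: cast the low 32 bits to int, then take abs.
    static int viaAbs(long rv) {
        return Math.abs((int) (rv & 0xffffffffL));
    }

    // Alternative: keep only the low 31 bits, so the result is always in [0, 2^31 - 1].
    static int viaMask(long rv) {
        return (int) (rv & 0x7fffffffL);
    }

    public static void main(String[] args) {
        long edge = 0x80000000L; // low 32 bits are exactly Integer.MIN_VALUE
        System.out.println(viaAbs(edge));  // Integer.MIN_VALUE: still negative
        System.out.println(viaMask(edge)); // 0
        // Either result can then be reduced with % partitionCount,
        // but only the masked one is guaranteed non-negative for every input.
    }
}
```

Both variants fold the 32-bit range onto 31 bits (abs merges x with -x, the mask merges x with x + 2^31), so neither loses more distribution quality than the other.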

We might also add regression tests such as those from Twitter's utility library:
https://github.com/twitter/util/blob/master/util-hashing/src/test/scala/com/twitter/hashing/KeyHasherSpec.scala
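A regression check in that spirit could start with the property that failed in this report: each reported key must map to a partition in [0, partitionCount). The sketch below is a hypothetical standalone version, not S4's code. It assumes an FNV-1 64-bit hash over the key's characters with the standard FNV offset basis and prime, truncated with `rv & 0xffffffffL` and kept as a `long`, and applies the report's partitioner formula to that `long`. The class name and partition count are illustrative.

```java
public final class PartitionRegressionCheck {

    // Standard 64-bit FNV offset basis and prime.
    static final long FNV_64_INIT = 0xcbf29ce484222325L;
    static final long FNV_64_PRIME = 0x100000001b3L;

    // FNV-1 (multiply, then XOR) over the key's UTF-16 chars,
    // truncated to 32 bits. As a long the result is never negative.
    static long hash(String key) {
        long rv = FNV_64_INIT;
        for (int i = 0; i < key.length(); i++) {
            rv *= FNV_64_PRIME;
            rv ^= key.charAt(i);
        }
        return rv & 0xffffffffL;
    }

    // Same formula as DefaultPartitioner in the report, applied to the long hash.
    static int partition(String key, int partitionCount) {
        return (int) (hash(key) % partitionCount);
    }

    public static void main(String[] args) {
        // Keys taken from the bug report.
        String[] keys = {"118+18233", "118+17360", "118+17258", "118+18147", "118+18121"};
        int partitionCount = 4; // hypothetical cluster size
        for (String key : keys) {
            int p = partition(key, partitionCount);
            if (p < 0 || p >= partitionCount) {
                throw new AssertionError("key " + key + " mapped to partition " + p);
            }
        }
        System.out.println("all keys map to a valid partition");
    }
}
```

Because the truncated hash stays a non-negative `long` in this sketch, `%` cannot yield a negative id; narrowing it to `int` before the modulo is what would reintroduce the bug.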
                
> DefaultHasher hashes keys to negative number
> --------------------------------------------
>
>                 Key: S4-30
>                 URL: https://issues.apache.org/jira/browse/S4-30
>             Project: Apache S4
>          Issue Type: Bug
>    Affects Versions: 0.4
>         Environment: All - Windows and Linux
>            Reporter: Quoc Nguyen
>            Priority: Blocker
>
> DefaultHasher uses HashAlgorithm hashAlgorithm = HashAlgorithm.FNV1_64_HASH;
> which hashes key strings such as 118+18233, 118+17360, 118+17258, 118+18147
> and 118+18121 (and many more) to negative values. The DefaultPartitioner
> (int partitionId = (int) (hasher.hash(stringValue) % partitionCount);) then
> maps such keys to an incorrect partition.
> Workaround:
> None. Whenever the stream contains such keys, they are dropped because the
> partitioner cannot assign them to a valid partition.

--
This message is automatically generated by JIRA.