[ https://issues.apache.org/jira/browse/CODEC-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979251#comment-16979251 ]
Alex Herbert commented on CODEC-267: ------------------------------------ Another bug related to sign extension is in hash128() > MurmurHash3.hash32() does not process trailing bytes as unsigned > ---------------------------------------------------------------- > > Key: CODEC-267 > URL: https://issues.apache.org/jira/browse/CODEC-267 > Project: Commons Codec > Issue Type: Bug > Affects Versions: 1.13 > Reporter: Claude Warren > Assignee: Alex Herbert > Priority: Major > > The hash32() algorithm processes blocks of 4 bytes. Trailing bytes of 1, 2 or > 3 that are negative are not masked to unsigned leading to an error. > This test passes using data generated from the Python mmh3 library which > calls the MurmurHash3 c++ code (modified for Python): > {code:java} > /** > * Test to demonstrate the errors in > * {@link MurmurHash3#hash32(byte[], int, int, int)} > * if the final 1, 2, or 3 bytes are negative. > */ > @Test > public void testHash32With1TrailingSignedByteIsInvalid() { > // Generate test data: > // import mmh3 > // import numpy as np > // mmh3.hash(np.uint8([-1])) > // mmh3.hash(np.uint8([0, -1])) > // mmh3.hash(np.uint8([0, 0, -1])) > // mmh3.hash(np.uint8([-1, 0])) > // mmh3.hash(np.uint8([-1, 0, 0])) > // mmh3.hash(np.uint8([0, -1, 0])) > Assert.assertNotEquals(-43192051, MurmurHash3.hash32(new byte[] {-1}, 0, > 1, 0)); > Assert.assertNotEquals(-582037868, MurmurHash3.hash32(new byte[] {0, -1}, > 0, 1, 0)); > Assert.assertNotEquals(922088087, MurmurHash3.hash32(new byte[] {0, 0, > -1}, 0, 1, 0)); > Assert.assertNotEquals(-1309567588, MurmurHash3.hash32(new byte[] {-1, > 0}, 0, 1, 0)); > Assert.assertNotEquals(-363779670, MurmurHash3.hash32(new byte[] {-1, 0, > 0}, 0, 1, 0)); > Assert.assertNotEquals(-225068062, MurmurHash3.hash32(new byte[] {0, -1, > 0}, 0, 1, 0)); > } > {code} > This test passes with {{assertEquals}} when the code is fixed to apply > masking to the final 3 bytes: > {code:java} > case 3: > k1 ^= (data[index + 2] & 0xff) << 16; > case 2: > k1 ^= (data[index + 1] & 0xff) << 8; > case 1: > k1 ^= (data[index] & 0xff); > {code} > Fixing this error will be a behavioural change. > It is recommended to leave this method alone and implement a new hash32x86 > method that should match the {{MurmurHash3_x86_32}} method from the c++ > source code. > -- This message was sent by Atlassian Jira (v8.3.4#803005)