[jira] [Updated] (CASSANDRA-2975) Upgrade MurmurHash to version 3
[ https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Lindauer updated CASSANDRA-2975: -- Attachment: 0001-Convert-BloomFilter-to-use-MurmurHash-v3-instead-of-.patch Murmur3 changes Upgrade MurmurHash to version 3 --- Key: CASSANDRA-2975 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Brian Lindauer Assignee: Brian Lindauer Priority: Trivial Labels: lhf Fix For: 1.1 Attachments: 0001-Convert-BloomFilter-to-use-MurmurHash-v3-instead-of-.patch, 0002-Backwards-compatibility-with-files-using-Murmur2-blo.patch MurmurHash version 3 was finalized on June 3. It provides an enormous speedup and increased robustness over version 2, which is implemented in Cassandra. Information here: http://code.google.com/p/smhasher/ The reference implementation is here: http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136r=136 I have already done the work to port the (public domain) reference implementation to Java in the MurmurHash class and updated the BloomFilter class to use the new implementation: https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93 Apart from the faster hash time, the new version only requires one call to hash() rather than 2, since it returns 128 bits of hash instead of 64. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2975) Upgrade MurmurHash to version 3
[ https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Lindauer updated CASSANDRA-2975: -- Attachment: 0002-Backwards-compatibility-with-files-using-Murmur2-blo.patch Murmur3-Murmur2 backwards compatibility Upgrade MurmurHash to version 3 --- Key: CASSANDRA-2975 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Brian Lindauer Assignee: Brian Lindauer Priority: Trivial Labels: lhf Fix For: 1.1 Attachments: 0001-Convert-BloomFilter-to-use-MurmurHash-v3-instead-of-.patch, 0002-Backwards-compatibility-with-files-using-Murmur2-blo.patch MurmurHash version 3 was finalized on June 3. It provides an enormous speedup and increased robustness over version 2, which is implemented in Cassandra. Information here: http://code.google.com/p/smhasher/ The reference implementation is here: http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136r=136 I have already done the work to port the (public domain) reference implementation to Java in the MurmurHash class and updated the BloomFilter class to use the new implementation: https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93 Apart from the faster hash time, the new version only requires one call to hash() rather than 2, since it returns 128 bits of hash instead of 64. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2975) Upgrade MurmurHash to version 3
[ https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073832#comment-13073832 ] Brian Lindauer commented on CASSANDRA-2975: --- You weren't kidding about compatibility with old data files not being simple. It actually turned out to be fairly major surgery. The original changes just to support Mumur3 are here: https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93 The additional proposed changes to support backward compatibility are at: https://github.com/lindauer/cassandra/commit/9d7479675752a07732f434b307be6642d8b3e85f I can't say I'm completely satisfied with these changes. It feels like we should unify with LegacyBloomFilter now that there are 3 versions. It also feels like all of the places where a serializer is selected based on a Descriptor version/flag could be moved under one roof, where callers just pass the Descriptor and it returns the correct serializer instance. But, not being too familiar with Cassandra, I was trying to be minimally invasive for fear of breaking something. All of the tests pass, but I haven't added any tests, such as making sure that old files can still be read in. Like I said, I'm not very familiar with Cassandra, so you should review these changes carefully. (I'm sure you would anyway.) Upgrade MurmurHash to version 3 --- Key: CASSANDRA-2975 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.8.3 Reporter: Brian Lindauer Priority: Trivial Labels: lhf MurmurHash version 3 was finalized on June 3. It provides an enormous speedup and increased robustness over version 2, which is implemented in Cassandra. Information here: http://code.google.com/p/smhasher/ The reference implementation is here: http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136r=136 I have already done the work to port the (public domain) reference implementation to Java in the MurmurHash class and updated the BloomFilter class to use the new implementation: https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93 Apart from the faster hash time, the new version only requires one call to hash() rather than 2, since it returns 128 bits of hash instead of 64. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2975) Upgrade MurmurHash to version 3
[ https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073207#comment-13073207 ] Brian Lindauer commented on CASSANDRA-2975: --- Surprising, but yes. It's dramatically faster. The MurmurHash author reports a 50% speedup over v2 at http://code.google.com/p/smhasher/wiki/MurmurHash3. I ran my own simple benchmark on the Java version comparing the existing MurmurHash.hash64() function to the MurmurHash.hash3_x64_128() I added and found an even larger advantage. The improvement is so huge that I wonder a little bit if there isn't a flaw in my test, but here it is: {code:java} start = System.currentTimeMillis(); long[] reta = {0, 0}; ByteBuffer buf = strToByteBuffer(key); for (int i=0; icnt; i++) { buf.clear(); reta = MurmurHash.hash3_x64_128(buf, 0, key.length(), (int) reta[0]); } end = System.currentTimeMillis(); System.err.println(Ran v3 + cnt + times in + (end - start) + ms.); {code} Similarly for v2. Output: {code} Ran v2 1 times in 19993 ms. Ran v3 1 times in 3104 ms. {code} FWIW, I also ran some tests where I generated random strings and seeds and submitted them to both the reference implementation and the Java port and found no differences. Upgrade MurmurHash to version 3 --- Key: CASSANDRA-2975 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.8.3 Reporter: Brian Lindauer Priority: Trivial MurmurHash version 3 was finalized on June 3. It provides an enormous speedup and increased robustness over version 2, which is implemented in Cassandra. Information here: http://code.google.com/p/smhasher/ The reference implementation is here: http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136r=136 I have already done the work to port the (public domain) reference implementation to Java in the MurmurHash class and updated the BloomFilter class to use the new implementation: https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93 Apart from the faster hash time, the new version only requires one call to hash() rather than 2, since it returns 128 bits of hash instead of 64. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2975) Upgrade MurmurHash to version 3
[ https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073215#comment-13073215 ] Brian Lindauer commented on CASSANDRA-2975: --- The test didn't seem to produce any errors: {code} [junit] Testsuite: org.apache.cassandra.utils.LongBloomFilterTest [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 75.354 sec [junit] [junit] Testsuite: org.apache.cassandra.utils.LongLegacyBloomFilterTest [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 76.545 sec {code} I'll have to familiarize myself more with Cassandra before dealing with the other issue you pointed out. I've just been using the BloomFilter class and only have a very general understanding of the overall system. Upgrade MurmurHash to version 3 --- Key: CASSANDRA-2975 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.8.3 Reporter: Brian Lindauer Priority: Trivial Labels: lhf MurmurHash version 3 was finalized on June 3. It provides an enormous speedup and increased robustness over version 2, which is implemented in Cassandra. Information here: http://code.google.com/p/smhasher/ The reference implementation is here: http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136r=136 I have already done the work to port the (public domain) reference implementation to Java in the MurmurHash class and updated the BloomFilter class to use the new implementation: https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93 Apart from the faster hash time, the new version only requires one call to hash() rather than 2, since it returns 128 bits of hash instead of 64. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2975) Upgrade MurmurHash to version 3
[ https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073310#comment-13073310 ] Brian Lindauer commented on CASSANDRA-2975: --- Summary: {code} Mean FP rates for version 2: LongBloomFilterTest: 0.997967059178744 LongLegacyBloomFilterTest: 0.997908061594203 Mean FP rates for version 3: LongBloomFilterTest: 0.998045621980676 LongLegacyBloomFilterTest: 0.9988639 {code} Details: {code} Version 2: [echo] running long tests [junit] WARNING: multiple versions of ant detected in path for junit [junit] jar:file:/usr/share/ant/lib/ant.jar!/org/apache/tools/ant/Project.class [junit] and jar:file:/Users/jbl/git/cassandra/build/lib/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class [junit] Testsuite: org.apache.cassandra.utils.LongBloomFilterTest [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 106.213 sec [junit] [junit] - Standard Error - [junit] fp_ratio = 0.9973043478260869 [junit] fp_ratio = 0.9965793478260869 [junit] fp_ratio = 0.9996123188405797 [junit] fp_ratio = 1.0004746376811595 [junit] fp_ratio = 0.998409420289855 [junit] fp_ratio = 0.9920978260869565 [junit] fp_ratio = 0.9979420289855072 [junit] fp_ratio = 0.9940797101449276 [junit] fp_ratio = 0.9983913043478261 [junit] fp_ratio = 1.0006159420289855 [junit] fp_ratio = 1.362318840579 [junit] fp_ratio = 1.615942028985 [junit] - --- mean = 0.997967059178744 [junit] Testsuite: org.apache.cassandra.utils.LongLegacyBloomFilterTest [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 61.721 sec [junit] [junit] - Standard Error - [junit] fp_ratio = 0.998095652173913 [junit] fp_ratio = 0.9982576086956522 [junit] fp_ratio = 0.999159420289855 [junit] fp_ratio = 1.0001340579710145 [junit] fp_ratio = 1.0011557971014493 [junit] fp_ratio = 0.9967717391304348 [junit] fp_ratio = 0.9955978260869566 [junit] fp_ratio = 0.9989673913043479 [junit] fp_ratio = 0.9966231884057971 [junit] fp_ratio = 0.9973514492753623 [junit] fp_ratio = 0.9969855072463768 [junit] fp_ratio = 0.9957971014492754 [junit] - --- mean = 0.997908061594203 Version 3: [echo] running long tests [junit] WARNING: multiple versions of ant detected in path for junit [junit] jar:file:/usr/share/ant/lib/ant.jar!/org/apache/tools/ant/Project.class [junit] and jar:file:/Users/jbl/git/cassandra/build/lib/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class [junit] Testsuite: org.apache.cassandra.utils.LongBloomFilterTest [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 75.994 sec [junit] [junit] - Standard Error - [junit] fp_ratio = 0.9986532608695652 [junit] fp_ratio = 0.997158695652174 [junit] fp_ratio = 0.9995797101449275 [junit] fp_ratio = 0.9995 [junit] fp_ratio = 0.9984565217391305 [junit] fp_ratio = 0.9987101449275362 [junit] fp_ratio = 0.9979528985507247 [junit] fp_ratio = 0.9998224637681159 [junit] fp_ratio = 0.9938876811594203 [junit] fp_ratio = 0.9993623188405797 [junit] fp_ratio = 0.9953369565217391 [junit] fp_ratio = 0.9981268115942029 [junit] - --- mean = 0.998045621980676 [junit] Testsuite: org.apache.cassandra.utils.LongLegacyBloomFilterTest [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 60.999 sec [junit] [junit] - Standard Error - [junit] fp_ratio = 0.998095652173913 [junit] fp_ratio = 0.9983760869565217 [junit] fp_ratio = 0.9993043478260869 [junit] fp_ratio = 0.9996159420289855 [junit] fp_ratio = 0.9980217391304348 [junit] fp_ratio = 1.0016920289855074 [junit] fp_ratio = 0.9953623188405797 [junit] fp_ratio = 0.9968188405797102 [junit] fp_ratio = 0.9947173913043478 [junit] fp_ratio = 1.000695652173913 [junit] fp_ratio = 1.0030760869565218 [junit] fp_ratio = 1.0005905797101449 [junit] - --- mean = 0.9988639 {code} Upgrade MurmurHash to version 3 --- Key: CASSANDRA-2975 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.8.3 Reporter: Brian Lindauer Priority: Trivial Labels: lhf MurmurHash version 3 was finalized on June 3. It provides an enormous speedup and increased robustness over version 2, which is implemented in Cassandra.
[jira] [Created] (CASSANDRA-2975) Upgrade MurmurHash to version 3
Upgrade MurmurHash to version 3 --- Key: CASSANDRA-2975 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.8.3 Reporter: Brian Lindauer Priority: Trivial MurmurHash version 3 was finalized on June 3. It provides an enormous speedup and increased robustness over version 2, which is implemented in Cassandra. Information here: http://code.google.com/p/smhasher/ The reference implementation is here: http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136r=136 I have already done the work to port the (public domain) reference implementation to Java in the MurmurHash class and updated the BloomFilter class to use the new implementation: https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93 Apart from the faster hash time, the new version only requires one call to hash() rather than 2, since it returns 128 bits of hash instead of 64. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira