[jira] [Updated] (CASSANDRA-2975) Upgrade MurmurHash to version 3

2011-09-08 Thread Brian Lindauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Lindauer updated CASSANDRA-2975:
--

Attachment: 0001-Convert-BloomFilter-to-use-MurmurHash-v3-instead-of-.patch

Murmur3 changes

 Upgrade MurmurHash to version 3
 ---

 Key: CASSANDRA-2975
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brian Lindauer
Assignee: Brian Lindauer
Priority: Trivial
  Labels: lhf
 Fix For: 1.1

 Attachments: 
 0001-Convert-BloomFilter-to-use-MurmurHash-v3-instead-of-.patch, 
 0002-Backwards-compatibility-with-files-using-Murmur2-blo.patch


 MurmurHash version 3 was finalized on June 3. It provides an enormous speedup 
 and increased robustness over version 2, which is implemented in Cassandra. 
 Information here:
 http://code.google.com/p/smhasher/
 The reference implementation is here:
 http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136r=136
 I have already done the work to port the (public domain) reference 
 implementation to Java in the MurmurHash class and updated the BloomFilter 
 class to use the new implementation:
 https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93
 Apart from the faster hash time, the new version only requires one call to 
 hash() rather than 2, since it returns 128 bits of hash instead of 64.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2975) Upgrade MurmurHash to version 3

2011-09-08 Thread Brian Lindauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Lindauer updated CASSANDRA-2975:
--

Attachment: 0002-Backwards-compatibility-with-files-using-Murmur2-blo.patch

Murmur3-Murmur2 backwards compatibility

 Upgrade MurmurHash to version 3
 ---

 Key: CASSANDRA-2975
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brian Lindauer
Assignee: Brian Lindauer
Priority: Trivial
  Labels: lhf
 Fix For: 1.1

 Attachments: 
 0001-Convert-BloomFilter-to-use-MurmurHash-v3-instead-of-.patch, 
 0002-Backwards-compatibility-with-files-using-Murmur2-blo.patch


 MurmurHash version 3 was finalized on June 3. It provides an enormous speedup 
 and increased robustness over version 2, which is implemented in Cassandra. 
 Information here:
 http://code.google.com/p/smhasher/
 The reference implementation is here:
 http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136r=136
 I have already done the work to port the (public domain) reference 
 implementation to Java in the MurmurHash class and updated the BloomFilter 
 class to use the new implementation:
 https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93
 Apart from the faster hash time, the new version only requires one call to 
 hash() rather than 2, since it returns 128 bits of hash instead of 64.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2975) Upgrade MurmurHash to version 3

2011-08-01 Thread Brian Lindauer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073832#comment-13073832
 ] 

Brian Lindauer commented on CASSANDRA-2975:
---

You weren't kidding about compatibility with old data files not being simple. 
It actually turned out to be fairly major surgery. The original changes just to 
support Mumur3 are here:

https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93

The additional proposed changes to support backward compatibility are at:

https://github.com/lindauer/cassandra/commit/9d7479675752a07732f434b307be6642d8b3e85f

I can't say I'm completely satisfied with these changes. It feels like we 
should unify with LegacyBloomFilter now that there are 3 versions. It also 
feels like all of the places where a serializer is selected based on a 
Descriptor version/flag could be moved under one roof, where callers just pass 
the Descriptor and it returns the correct serializer instance. But, not being 
too familiar with Cassandra, I was trying to be minimally invasive for fear of 
breaking something.

All of the tests pass, but I haven't added any tests, such as making sure that 
old files can still be read in. Like I said, I'm not very familiar with 
Cassandra, so you should review these changes carefully. (I'm sure you would 
anyway.)


 Upgrade MurmurHash to version 3
 ---

 Key: CASSANDRA-2975
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.3
Reporter: Brian Lindauer
Priority: Trivial
  Labels: lhf

 MurmurHash version 3 was finalized on June 3. It provides an enormous speedup 
 and increased robustness over version 2, which is implemented in Cassandra. 
 Information here:
 http://code.google.com/p/smhasher/
 The reference implementation is here:
 http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136r=136
 I have already done the work to port the (public domain) reference 
 implementation to Java in the MurmurHash class and updated the BloomFilter 
 class to use the new implementation:
 https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93
 Apart from the faster hash time, the new version only requires one call to 
 hash() rather than 2, since it returns 128 bits of hash instead of 64.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2975) Upgrade MurmurHash to version 3

2011-07-30 Thread Brian Lindauer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073207#comment-13073207
 ] 

Brian Lindauer commented on CASSANDRA-2975:
---

Surprising, but yes. It's dramatically faster. The MurmurHash author reports a 
50% speedup over v2 at http://code.google.com/p/smhasher/wiki/MurmurHash3. I 
ran my own simple benchmark on the Java version comparing the existing 
MurmurHash.hash64() function to the MurmurHash.hash3_x64_128() I added and 
found an even larger advantage. The improvement is so huge that I wonder a 
little bit if there isn't a flaw in my test, but here it is:

{code:java}
start = System.currentTimeMillis();
long[] reta = {0, 0};
ByteBuffer buf = strToByteBuffer(key);
for (int i=0; icnt; i++)
{
  buf.clear();
  reta = MurmurHash.hash3_x64_128(buf, 0, key.length(), (int) reta[0]);
}
end = System.currentTimeMillis();
System.err.println(Ran v3  + cnt +  times in  + (end - start) +  ms.);
{code}

Similarly for v2.

Output:
{code}
Ran v2 1 times in 19993 ms.
Ran v3 1 times in 3104 ms.
{code}

FWIW, I also ran some tests where I generated random strings and seeds and 
submitted them to both the reference implementation and the Java port and found 
no differences.

 Upgrade MurmurHash to version 3
 ---

 Key: CASSANDRA-2975
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.3
Reporter: Brian Lindauer
Priority: Trivial

 MurmurHash version 3 was finalized on June 3. It provides an enormous speedup 
 and increased robustness over version 2, which is implemented in Cassandra. 
 Information here:
 http://code.google.com/p/smhasher/
 The reference implementation is here:
 http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136r=136
 I have already done the work to port the (public domain) reference 
 implementation to Java in the MurmurHash class and updated the BloomFilter 
 class to use the new implementation:
 https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93
 Apart from the faster hash time, the new version only requires one call to 
 hash() rather than 2, since it returns 128 bits of hash instead of 64.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2975) Upgrade MurmurHash to version 3

2011-07-30 Thread Brian Lindauer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073215#comment-13073215
 ] 

Brian Lindauer commented on CASSANDRA-2975:
---

The test didn't seem to produce any errors:

{code}
[junit] Testsuite: org.apache.cassandra.utils.LongBloomFilterTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 75.354 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.utils.LongLegacyBloomFilterTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 76.545 sec
{code}

I'll have to familiarize myself more with Cassandra before dealing with the 
other issue you pointed out. I've just been using the BloomFilter class and 
only have a very general understanding of the overall system.

 Upgrade MurmurHash to version 3
 ---

 Key: CASSANDRA-2975
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.3
Reporter: Brian Lindauer
Priority: Trivial
  Labels: lhf

 MurmurHash version 3 was finalized on June 3. It provides an enormous speedup 
 and increased robustness over version 2, which is implemented in Cassandra. 
 Information here:
 http://code.google.com/p/smhasher/
 The reference implementation is here:
 http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136r=136
 I have already done the work to port the (public domain) reference 
 implementation to Java in the MurmurHash class and updated the BloomFilter 
 class to use the new implementation:
 https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93
 Apart from the faster hash time, the new version only requires one call to 
 hash() rather than 2, since it returns 128 bits of hash instead of 64.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2975) Upgrade MurmurHash to version 3

2011-07-30 Thread Brian Lindauer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073310#comment-13073310
 ] 

Brian Lindauer commented on CASSANDRA-2975:
---

Summary:

{code}
Mean FP rates for version 2:
LongBloomFilterTest: 0.997967059178744
LongLegacyBloomFilterTest: 0.997908061594203

Mean FP rates for version 3:
LongBloomFilterTest: 0.998045621980676
LongLegacyBloomFilterTest: 0.9988639
{code}


Details:

{code}
Version 2:
 [echo] running long tests
[junit] WARNING: multiple versions of ant detected in path for junit 
[junit]  
jar:file:/usr/share/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and 
jar:file:/Users/jbl/git/cassandra/build/lib/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Testsuite: org.apache.cassandra.utils.LongBloomFilterTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 106.213 sec
[junit] 
[junit] - Standard Error -
[junit] fp_ratio = 0.9973043478260869
[junit] fp_ratio = 0.9965793478260869
[junit] fp_ratio = 0.9996123188405797
[junit] fp_ratio = 1.0004746376811595
[junit] fp_ratio = 0.998409420289855
[junit] fp_ratio = 0.9920978260869565
[junit] fp_ratio = 0.9979420289855072
[junit] fp_ratio = 0.9940797101449276
[junit] fp_ratio = 0.9983913043478261
[junit] fp_ratio = 1.0006159420289855
[junit] fp_ratio = 1.362318840579
[junit] fp_ratio = 1.615942028985
[junit] -  ---
mean = 0.997967059178744

[junit] Testsuite: org.apache.cassandra.utils.LongLegacyBloomFilterTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 61.721 sec
[junit] 
[junit] - Standard Error -
[junit] fp_ratio = 0.998095652173913
[junit] fp_ratio = 0.9982576086956522
[junit] fp_ratio = 0.999159420289855
[junit] fp_ratio = 1.0001340579710145
[junit] fp_ratio = 1.0011557971014493
[junit] fp_ratio = 0.9967717391304348
[junit] fp_ratio = 0.9955978260869566
[junit] fp_ratio = 0.9989673913043479
[junit] fp_ratio = 0.9966231884057971
[junit] fp_ratio = 0.9973514492753623
[junit] fp_ratio = 0.9969855072463768
[junit] fp_ratio = 0.9957971014492754
[junit] -  ---
mean  = 0.997908061594203


Version 3:
 [echo] running long tests
[junit] WARNING: multiple versions of ant detected in path for junit 
[junit]  
jar:file:/usr/share/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and 
jar:file:/Users/jbl/git/cassandra/build/lib/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Testsuite: org.apache.cassandra.utils.LongBloomFilterTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 75.994 sec
[junit] 
[junit] - Standard Error -
[junit] fp_ratio = 0.9986532608695652
[junit] fp_ratio = 0.997158695652174
[junit] fp_ratio = 0.9995797101449275
[junit] fp_ratio = 0.9995
[junit] fp_ratio = 0.9984565217391305
[junit] fp_ratio = 0.9987101449275362
[junit] fp_ratio = 0.9979528985507247
[junit] fp_ratio = 0.9998224637681159
[junit] fp_ratio = 0.9938876811594203
[junit] fp_ratio = 0.9993623188405797
[junit] fp_ratio = 0.9953369565217391
[junit] fp_ratio = 0.9981268115942029
[junit] -  ---
mean  = 0.998045621980676

[junit] Testsuite: org.apache.cassandra.utils.LongLegacyBloomFilterTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 60.999 sec
[junit] 
[junit] - Standard Error -
[junit] fp_ratio = 0.998095652173913
[junit] fp_ratio = 0.9983760869565217
[junit] fp_ratio = 0.9993043478260869
[junit] fp_ratio = 0.9996159420289855
[junit] fp_ratio = 0.9980217391304348
[junit] fp_ratio = 1.0016920289855074
[junit] fp_ratio = 0.9953623188405797
[junit] fp_ratio = 0.9968188405797102
[junit] fp_ratio = 0.9947173913043478
[junit] fp_ratio = 1.000695652173913
[junit] fp_ratio = 1.0030760869565218
[junit] fp_ratio = 1.0005905797101449
[junit] -  ---
mean  = 0.9988639
{code}

 Upgrade MurmurHash to version 3
 ---

 Key: CASSANDRA-2975
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.3
Reporter: Brian Lindauer
Priority: Trivial
  Labels: lhf

 MurmurHash version 3 was finalized on June 3. It provides an enormous speedup 
 and increased robustness over version 2, which is implemented in Cassandra. 
 

[jira] [Created] (CASSANDRA-2975) Upgrade MurmurHash to version 3

2011-07-29 Thread Brian Lindauer (JIRA)
Upgrade MurmurHash to version 3
---

 Key: CASSANDRA-2975
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.3
Reporter: Brian Lindauer
Priority: Trivial


MurmurHash version 3 was finalized on June 3. It provides an enormous speedup 
and increased robustness over version 2, which is implemented in Cassandra. 
Information here:

http://code.google.com/p/smhasher/

The reference implementation is here:
http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136r=136

I have already done the work to port the (public domain) reference 
implementation to Java in the MurmurHash class and updated the BloomFilter 
class to use the new implementation:

https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93

Apart from the faster hash time, the new version only requires one call to 
hash() rather than 2, since it returns 128 bits of hash instead of 64.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira