[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime

2015-11-23 Thread Ariel Weisberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-8614:
--
Component/s: Local Write-Read Paths
 Compaction

> Select optimal CRC32 implementation at runtime
> --
>
> Key: CASSANDRA-8614
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8614
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Local Write-Read Paths
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>  Labels: performance
> Attachments: 8614.patch, CRC32.class, Sample.java
>
>
> JDK 8 has support for an intrinsic for CRC32 that runs at 12-13 gigabytes/sec 
> per core in my quick and dirty test. PureJavaCRC32 is < 800 megabytes/sec if 
> I recall and it has a lookup table that evicts random cache lines every time 
> it runs.
> In order to capture the benefit of that when it is available we can select a 
> CRC32 implementation at startup in a static block.
> If JDK 8 is not what is running we can fall back to the existing 
> PureJavaCRC32 implementation.
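
The selection described above can be sketched roughly as follows. This is a minimal illustration and not the code in 8614.patch: `Crc32Selector` and `TableCrc32` are hypothetical names, and `TableCrc32` is a simple table-driven stand-in for PureJavaCRC32. The probe assumes that the presence of `CRC32.update(ByteBuffer)`, which was added in Java 8, is a good enough signal that the faster JDK implementation is available.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical sketch: choose a CRC32 implementation once, in a static block.
final class Crc32Selector {

    // Stand-in for the pure Java fallback (NOT the real PureJavaCRC32).
    static final class TableCrc32 implements Checksum {
        private static final int[] TABLE = new int[256];
        static {
            for (int n = 0; n < 256; n++) {
                int c = n;
                for (int k = 0; k < 8; k++) {
                    c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
                }
                TABLE[n] = c;
            }
        }

        private int crc = 0xFFFFFFFF;

        @Override public void update(int b) {
            crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >>> 8);
        }

        @Override public void update(byte[] buf, int off, int len) {
            for (int i = off; i < off + len; i++) {
                update(buf[i]);
            }
        }

        @Override public long getValue() {
            return (~crc) & 0xFFFFFFFFL;
        }

        @Override public void reset() {
            crc = 0xFFFFFFFF;
        }
    }

    // Decided once at class initialization.
    static final boolean JDK8_CRC32_AVAILABLE;
    static {
        boolean found;
        try {
            // Assumption: this method only exists on Java 8 and later.
            CRC32.class.getMethod("update", ByteBuffer.class);
            found = true;
        } catch (NoSuchMethodException e) {
            found = false;
        }
        JDK8_CRC32_AVAILABLE = found;
    }

    static Checksum create() {
        if (JDK8_CRC32_AVAILABLE) {
            return new CRC32();
        }
        return new TableCrc32();
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        Checksum c = create();
        c.update(data, 0, data.length);
        System.out.println(c.getClass().getName() + " -> " + Long.toHexString(c.getValue()));
    }
}
```

Callers only ever see `Checksum`, so which implementation was picked stays an internal detail of the factory.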



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime

2015-02-11 Thread Ariel Weisberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-8614:
--
Attachment: CRC32.class



[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime

2015-01-25 Thread Ariel Weisberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-8614:
--
Attachment: 8614.patch

New patch addressing a few things.

I had to put in a fake Checksum class to compile against with the JDK 8 methods 
for byte buffers. javac is surprisingly OK with just pointing at a source file. 
Fixed formatting and added a test to make sure JDK detection is actually 
detecting and giving up the goodness.
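
For illustration only, here is a different way to reach the JDK 8 byte buffer method without a compile-time stub: look it up with a `MethodHandle` at class load. This is not what the patch does (the patch compiles against a fake Checksum class); `ByteBufferCrc` and its methods are hypothetical names. The only JDK facts relied on are that `java.util.zip.CRC32.update(ByteBuffer)` exists from Java 8 onward and that `MethodHandles.publicLookup().findVirtual` throws `NoSuchMethodException` when the method is absent.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical sketch: detect and call CRC32.update(ByteBuffer) reflectively.
final class ByteBufferCrc {

    // Null when the running JDK has no CRC32.update(ByteBuffer).
    private static final MethodHandle UPDATE_BB = lookup();

    private static MethodHandle lookup() {
        try {
            return MethodHandles.publicLookup().findVirtual(
                    CRC32.class, "update",
                    MethodType.methodType(void.class, ByteBuffer.class));
        } catch (NoSuchMethodException | IllegalAccessException e) {
            return null;
        }
    }

    static boolean hasByteBufferUpdate() {
        return UPDATE_BB != null;
    }

    // Checksums the remaining bytes without moving the caller's position.
    static long checksum(ByteBuffer buffer) {
        CRC32 crc = new CRC32();
        ByteBuffer view = buffer.duplicate();
        if (UPDATE_BB != null) {
            try {
                UPDATE_BB.invokeExact(crc, view);
            } catch (Throwable t) {
                throw new RuntimeException(t);
            }
        } else {
            // Fallback path: copy into a heap array first.
            byte[] copy = new byte[view.remaining()];
            view.get(copy);
            crc.update(copy, 0, copy.length);
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(9);
        direct.put("123456789".getBytes(java.nio.charset.StandardCharsets.US_ASCII));
        direct.flip();
        System.out.println("update(ByteBuffer) present: " + hasByteBufferUpdate());
        System.out.println("crc = " + Long.toHexString(checksum(direct)));
    }
}
```

A detection test along the lines mentioned above could then assert that `hasByteBufferUpdate()` is true whenever the build is run on Java 8.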

I think this should go in so we at least get it for the commit log. It looks 
like Adler32 is not fast in JDK 8 on Linux. It is inexplicably fast on OS X, 
running at the same speed as CRC32.

I don't have an explanation for the funky performance numbers on OS X. On Linux 
I get the expected behavior where disabling the intrinsic is slow and switching 
to JDK 7 is slow.

I will create a separate ticket for discussion of the right way to replace 
Adler32 with CRC32 in SSTables.



[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime

2015-01-25 Thread Ariel Weisberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-8614:
--
Reviewer: Benedict



[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime

2015-01-25 Thread Ariel Weisberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-8614:
--
Attachment: (was: 8614.patch)



[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime

2015-01-16 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-8614:

Assignee: Ariel Weisberg  (was: Benedict)



[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime

2015-01-14 Thread Ariel Weisberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-8614:
--
Attachment: Sample.java

JMH benchmark attached. For small sizes the JDK 8 CRC32 is not as far ahead, 
but at a kilobyte it is many times faster. It also doesn't evict random cache 
lines, so the impact could be larger than what the micro benchmark shows.

For large sizes it indeed does 13 gigabytes/sec which is pretty crazy.

There is a performance delta between direct and non-direct byte buffers in 
favor of direct byte buffers; in the one case I looked at, direct was 2x faster.

{noformat}
 [java] Benchmark                                       (byteSize)   Mode  Samples         Score         Error  Units
 [java] o.a.c.t.m.Sample.CRC32Array                            128  thrpt        6  13905041.788 ±  598179.976  ops/s
 [java] o.a.c.t.m.Sample.PureJavaCrc32                         128  thrpt        6  10525663.252 ±  507525.667  ops/s

 [java] o.a.c.t.m.Sample.CRC32Array                            512  thrpt        6  14571599.254 ± 8930061.376  ops/s
 [java] o.a.c.t.m.Sample.PureJavaCrc32                         512  thrpt        6   2835430.274 ±   92029.259  ops/s

 [java] o.a.c.t.m.Sample.CRC32Array                           1024  thrpt        6   8337714.641 ± 3988493.638  ops/s
 [java] o.a.c.t.m.Sample.PureJavaCrc32                        1024  thrpt        6   1428928.434 ±   31709.319  ops/s

 [java] o.a.c.t.m.Sample.CRC32Array                        1048576  thrpt        6     12364.723 ±     344.434  ops/s
 [java] o.a.c.t.m.Sample.PureJavaCrc32                     1048576  thrpt        6      1412.017 ±      89.214  ops/s

 [java] o.a.c.t.m.Sample.CRC32ByteBuffer                       128  thrpt        6  15925509.375 ±  779733.985  ops/s
 [java] o.a.c.t.m.Sample.PureJavaCrc32ByteBuffer               128  thrpt        6  10446360.681 ±  599847.210  ops/s

 [java] o.a.c.t.m.Sample.CRC32ByteBuffer                       512  thrpt        6  10906108.722 ±  346735.334  ops/s
 [java] o.a.c.t.m.Sample.PureJavaCrc32ByteBuffer               512  thrpt        6   2873179.754 ±  140004.771  ops/s

 [java] o.a.c.t.m.Sample.CRC32ByteBuffer                      1024  thrpt        6   6582936.616 ± 2219292.645  ops/s
 [java] o.a.c.t.m.Sample.PureJavaCrc32ByteBuffer              1024  thrpt        6   1440343.345 ±   42303.806  ops/s

 [java] o.a.c.t.m.Sample.CRC32ByteBuffer                   1048576  thrpt        6     12555.846 ±     514.918  ops/s
 [java] o.a.c.t.m.Sample.PureJavaCrc32ByteBuffer           1048576  thrpt        6      1414.886 ±      58.363  ops/s

 [java] o.a.c.t.m.Sample.CRC32ByteBufferDirect                 128  thrpt        6  31786603.552 ± 2000265.643  ops/s
 [java] o.a.c.t.m.Sample.PureJavaCrc32ByteBufferDirect         128  thrpt        6   9169128.441 ±  296419.993  ops/s

 [java] o.a.c.t.m.Sample.CRC32ByteBufferDirect                 512  thrpt        6  15768165.220 ±  589215.966  ops/s
 [java] o.a.c.t.m.Sample.PureJavaCrc32ByteBufferDirect         512  thrpt        6   2614215.362 ±  171099.973  ops/s

 [java] o.a.c.t.m.Sample.CRC32ByteBufferDirect                1024  thrpt        6   9846566.689 ±  447235.143  ops/s
 [java] o.a.c.t.m.Sample.PureJavaCrc32ByteBufferDirect        1024  thrpt        6   1327731.561 ±   41147.584  ops/s

 [java] o.a.c.t.m.Sample.CRC32ByteBufferDirect             1048576  thrpt        6     12467.127 ±     543.952  ops/s
 [java] o.a.c.t.m.Sample.PureJavaCrc32ByteBufferDirect     1048576  thrpt        6      1333.941 ±      20.311  ops/s

 [java] o.a.c.t.m.Sample.CRC32ByteBufferDirectWrapped          128  thrpt        6  30545863.214 ± 2669919.886  ops/s
 [java] o.a.c.t.m.Sample.CRC32ByteBufferDirectWrapped          512  thrpt        6  14929967.141 ± 1596223.606  ops/s
 [java] o.a.c.t.m.Sample.CRC32ByteBufferDirectWrapped         1024  thrpt        6   9408037.238 ±  564849.404  ops/s
 [java] o.a.c.t.m.Sample.CRC32ByteBufferDirectWrapped      1048576  thrpt        6     12020.464 ±     417.515  ops/s
 [java] o.a.c.t.m.Sample.CRC32ByteBufferWrapped                128  thrpt        6  12996481.274 ± 9216253.478  ops/s
 [java] o.a.c.t.m.Sample.CRC32ByteBufferWrapped                512  thrpt        6   9632311.965 ± 4249496.365  ops/s
 [java] o.a.c.t.m.Sample.CRC32ByteBufferWrapped               1024  thrpt        6   7068335.746 ± 2112734.871  ops/s
 [java] o.a.c.t.m.Sample.CRC32ByteBufferWrapped            1048576  thrpt        6     12580.275 ±     838.737  ops/s
{noformat}
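
To relate the ops/s scores to the gigabytes/sec figure, multiply each score by the bytes checksummed per operation. The small helper below is only a worked-arithmetic sketch (`ThroughputMath` is a hypothetical name, and it assumes one benchmark op checksums exactly `byteSize` bytes). For the 1048576-byte rows, CRC32Array comes out at about 12.97 GB/s and PureJavaCrc32 at about 1.48 GB/s, using decimal gigabytes.

```java
// Hypothetical helper: convert a JMH throughput score into GB/s.
final class ThroughputMath {

    // opsPerSecond: JMH "Score" column; bytesPerOp: the byteSize parameter.
    static double gigabytesPerSecond(double opsPerSecond, long bytesPerOp) {
        return opsPerSecond * bytesPerOp / 1e9;
    }

    public static void main(String[] args) {
        long oneMiB = 1048576L;
        System.out.printf("CRC32Array:    %.2f GB/s%n", gigabytesPerSecond(12364.723, oneMiB));
        System.out.printf("PureJavaCrc32: %.2f GB/s%n", gigabytesPerSecond(1412.017, oneMiB));
    }
}
```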


[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime

2015-01-13 Thread Ariel Weisberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-8614:
--
Attachment: CRC32FactoryTest.java
8614.patch

Compiles on Java 7 and when run on Java 8 you get the intrinsic. There is a 
test case to validate that the two checksum implementations behave the same.
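
A check of that kind might look like the sketch below. It is not the attached CRC32FactoryTest.java: `Crc32EquivalenceCheck` is a hypothetical name, and a bit-at-a-time reference CRC32 is used here as the second implementation in place of PureJavaCrc32 so that the example is self-contained. It compares both over random arrays, offsets, and lengths.

```java
import java.util.Random;
import java.util.zip.CRC32;

// Hypothetical sketch: verify two CRC32 implementations agree on random input.
final class Crc32EquivalenceCheck {

    // Bit-at-a-time CRC32 (reflected polynomial 0xEDB88320), used as reference.
    static long bitwiseCrc32(byte[] buf, int off, int len) {
        int crc = 0xFFFFFFFF;
        for (int i = off; i < off + len; i++) {
            crc ^= buf[i] & 0xFF;
            for (int k = 0; k < 8; k++) {
                crc = (crc & 1) != 0 ? (crc >>> 1) ^ 0xEDB88320 : crc >>> 1;
            }
        }
        return ~crc & 0xFFFFFFFFL;
    }

    // Returns how many random inputs produced different checksums.
    static int countMismatches(long seed, int rounds) {
        Random random = new Random(seed);
        int mismatches = 0;
        for (int round = 0; round < rounds; round++) {
            byte[] data = new byte[random.nextInt(4096) + 1];
            random.nextBytes(data);
            int off = random.nextInt(data.length);
            int len = random.nextInt(data.length - off + 1);

            CRC32 jdk = new CRC32();
            jdk.update(data, off, len);
            if (jdk.getValue() != bitwiseCrc32(data, off, len)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(42L, 1000));
    }
}
```

Using a fixed seed keeps a failure reproducible; varying offsets and lengths exercises the partial-array path as well as whole-array updates.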





[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime

2015-01-13 Thread Ariel Weisberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-8614:
--
Attachment: (was: CRC32FactoryTest.java)



[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime

2015-01-13 Thread Ariel Weisberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-8614:
--
Attachment: (was: 8614.patch)



[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime

2015-01-13 Thread Ariel Weisberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-8614:
--
Attachment: 8614.patch

Patch including all/missing files
