[jira] [Commented] (CASSANDRA-7039) DirectByteBuffer compatible LZ4 methods

2014-11-26 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227043#comment-14227043
 ] 

Adrien Grand commented on CASSANDRA-7039:
-

The new 1.3.0 release now supports (de)compression on top of the ByteBuffer API.

 DirectByteBuffer compatible LZ4 methods
 ---

 Key: CASSANDRA-7039
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7039
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Branimir Lambov
Priority: Minor
  Labels: performance
 Fix For: 3.0


 As we move more things off-heap, it's becoming more and more essential to be 
 able to use DirectByteBuffer (or native pointers) in various places. 
 Unfortunately LZ4 doesn't currently support this operation, despite being JNI 
 based. This means we have to perform unnecessary copies to de/compress 
 data from DBB, and we can also stall GC, as any JNI method operating over a 
 java array using GetPrimitiveArrayCritical enters a critical section that 
 prevents GC for its duration. This means STWs will be at least as long as any 
 running compression/decompression (and no GC will happen until they complete, 
 so it's additive).
 We should temporarily fork (and then resubmit upstream) jpountz-lz4 to 
 support operating over a native pointer, so that we can pass a DBB or a raw 
 pointer we have allocated ourselves. This will help improve performance when 
 flushing the new offheap memtables, as well as enable us to implement 
 CASSANDRA-6726 and finish CASSANDRA-4338.
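
 The extra copy described above can be illustrated with a minimal sketch that 
 uses only the JDK. The class and method names (DirectBufferCopyDemo, 
 toHeapArray) are hypothetical and not part of Cassandra or lz4-java. It shows 
 that a direct buffer exposes no backing array, so any API that only accepts 
 byte[] forces a copy onto the heap first.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class DirectBufferCopyDemo {

    /**
     * Returns the readable bytes of buf as a byte[]. A heap buffer that spans
     * its whole backing array is returned as-is (shared, no copy); anything
     * else, including every direct buffer, has to be copied.
     */
    static byte[] toHeapArray(ByteBuffer buf) {
        if (buf.hasArray() && buf.arrayOffset() == 0 && buf.position() == 0
                && buf.remaining() == buf.array().length) {
            return buf.array();
        }
        byte[] copy = new byte[buf.remaining()];
        buf.duplicate().get(copy); // duplicate() keeps the caller's position unchanged
        return copy;
    }

    public static void main(String[] args) {
        byte[] payload = "off-heap memtable bytes".getBytes(StandardCharsets.UTF_8);
        ByteBuffer direct = ByteBuffer.allocateDirect(payload.length);
        direct.put(payload);
        direct.flip();
        System.out.println("direct=" + direct.isDirect() + " hasArray=" + direct.hasArray());
        System.out.println(new String(toHeapArray(direct), StandardCharsets.UTF_8));
    }
}
```

 A compressor that accepts ByteBuffer (or a raw pointer) directly would make 
 the copy in toHeapArray unnecessary for off-heap data.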



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-4338) Experiment with direct buffer in SequentialWriter

2013-09-26 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779098#comment-13779098
 ] 

Adrien Grand commented on CASSANDRA-4338:
-

Interesting, I was wondering whether people actually need to compress from/to 
byte buffers. Now that I know that some do, I can try to move this issue 
forward.

 Experiment with direct buffer in SequentialWriter
 -

 Key: CASSANDRA-4338
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4338
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Marcus Eriksson
Priority: Minor
  Labels: performance
 Fix For: 2.1

 Attachments: 4338-gc.tar.gz, gc-4338-patched.png, gc-trunk.png


 Using a direct buffer instead of a heap-based byte[] should let us avoid a 
 copy into native memory when we flush the buffer.
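
 As a rough sketch of the idea (not the actual SequentialWriter code), the 
 following JDK-only example stages data in a reusable direct buffer and hands 
 it to a FileChannel. The class name DirectBufferWriter and its write method 
 are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical writer, for illustration only.
final class DirectBufferWriter {

    /** Writes data to path, staging it in a direct buffer of bufferSize bytes. */
    static void write(Path path, byte[] data, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        ByteBuffer buffer = ByteBuffer.allocateDirect(bufferSize);
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            int offset = 0;
            while (offset < data.length) {
                // Fill the buffer with the next chunk of input.
                int chunk = Math.min(buffer.remaining(), data.length - offset);
                buffer.put(data, offset, chunk);
                offset += chunk;
                // Flush the buffer; write() may need several calls to drain it.
                buffer.flip();
                while (buffer.hasRemaining()) {
                    channel.write(buffer);
                }
                buffer.clear();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("direct-writer", ".bin");
        write(tmp, new byte[10_000], 4096);
        System.out.println("bytes written: " + Files.size(tmp));
        Files.delete(tmp);
    }
}
```

 With a heap buffer, the channel implementation typically has to copy the 
 bytes into native memory before the write; the direct buffer is already there.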

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-5862) Switch to adler checksum for sstables

2013-09-24 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776112#comment-13776112
 ] 

Adrien Grand commented on CASSANDRA-5862:
-

bq. What is the gain from switching to xxHash ? [...] Specifically, I'm 
interested in hash diversity.

xxHash passes all tests of the SMHasher test suite 
(http://code.google.com/p/smhasher/wiki/SMHasher; see scores and speed of some 
common hash functions at http://code.google.com/p/xxhash/), which includes 
many tests, hash diversity among them.

 Switch to adler checksum for sstables
 -

 Key: CASSANDRA-5862
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5862
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: T Jake Luciani
 Fix For: 2.0.1

 Attachments: 5862.txt


 Adler is significantly faster than CRC32: 
 http://java-performance.info/java-crc32-and-adler32/
 (Adler is weaker for short inputs, so we should leave the commitlog alone, as 
 it checksums each mutation individually.)
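
 For reference, both checksums mentioned above ship with the JDK behind the 
 java.util.zip.Checksum interface, so switching between them is mostly a matter 
 of which instance is created. A minimal sketch follows; the ChecksumDemo class 
 and its checksum helper are hypothetical, and this is not a benchmark.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical helper, for illustration only.
final class ChecksumDemo {

    /** Computes the checksum of data with the given algorithm. */
    static long checksum(Checksum algo, byte[] data) {
        algo.reset();
        algo.update(data, 0, data.length);
        return algo.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("adler32=%08x crc32=%08x%n",
                checksum(new Adler32(), data),
                checksum(new CRC32(), data));
    }
}
```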



[jira] [Commented] (CASSANDRA-5862) Switch to adler checksum for sstables

2013-09-19 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771671#comment-13771671
 ] 

Adrien Grand commented on CASSANDRA-5862:
-

I just learned about this issue and was wondering if you had considered XXHash 
(http://code.google.com/p/xxhash/) for checksumming. I am a little biased since 
I wrote the JNI bindings and Java ports of XXHash, but the benchmarks show 
interesting results compared to other hash/checksum implementations: 
http://jpountz.github.io/lz4-java/1.2.0/xxhash-benchmark/. Just beware that 
the fastest impl of XXHash depends on the size of the input: for large inputs 
(> 1024 bytes), the JNI one is faster, while on smaller inputs (<= 1024 bytes), 
the Java port using the Unsafe API is faster, probably because of the JNI 
overhead.

You can find the sources of the benchmark at 
https://github.com/jpountz/jvm-checksum-benchmark.

 Switch to adler checksum for sstables
 -

 Key: CASSANDRA-5862
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5862
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: T Jake Luciani
 Fix For: 2.0.1

 Attachments: 5862.txt


 Adler is significantly faster than CRC32: 
 http://java-performance.info/java-crc32-and-adler32/
 (Adler is weaker for short inputs, so we should leave the commitlog alone, as 
 it checksums each mutation individually.)



[jira] [Commented] (CASSANDRA-5862) Switch to adler checksum for sstables

2013-09-19 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772131#comment-13772131
 ] 

Adrien Grand commented on CASSANDRA-5862:
-

I ran this benchmark several weeks ago and I'm not sure but I think it was with 
JDK7. I'll run it again to be sure...

 Switch to adler checksum for sstables
 -

 Key: CASSANDRA-5862
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5862
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: T Jake Luciani
 Fix For: 2.0.1

 Attachments: 5862.txt


 Adler is significantly faster than CRC32: 
 http://java-performance.info/java-crc32-and-adler32/
 (Adler is weaker for short inputs, so we should leave the commitlog alone, as 
 it checksums each mutation individually.)



[jira] [Commented] (CASSANDRA-5862) Switch to adler checksum for sstables

2013-09-19 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772295#comment-13772295
 ] 

Adrien Grand commented on CASSANDRA-5862:
-

I ran the benchmark again with Java 6 and 7, here is the raw data:

 - Java 6u45: 
https://microbenchmarks.appspot.com/runs/7340dcbf-d314-4ac4-bcb4-806f2e7f6f7b#r:scenario.benchmarkSpec.parameters.size,scenario.benchmarkSpec.parameters.checksum
 - Java 7u21: 
https://microbenchmarks.appspot.com/runs/4e7ed509-e2ea-4405-bcb6-06a119a06d2b#r:scenario.benchmarkSpec.parameters.size,scenario.benchmarkSpec.parameters.checksum

There are indeed performance differences for CRC32 and Adler32 between Java 6 
and 7 but in both cases, XXHash performs even better.

 Switch to adler checksum for sstables
 -

 Key: CASSANDRA-5862
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5862
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: T Jake Luciani
 Fix For: 2.0.1

 Attachments: 5862.txt


 Adler is significantly faster than CRC32: 
 http://java-performance.info/java-crc32-and-adler32/
 (Adler is weaker for short inputs, so we should leave the commitlog alone, as 
 it checksums each mutation individually.)



[jira] [Commented] (CASSANDRA-5862) Switch to adler checksum for sstables

2013-09-19 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772392#comment-13772392
 ] 

Adrien Grand commented on CASSANDRA-5862:
-

Indeed, this is what it suggests.

 Switch to adler checksum for sstables
 -

 Key: CASSANDRA-5862
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5862
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: T Jake Luciani
 Fix For: 2.0.1

 Attachments: 5862.txt


 Adler is significantly faster than CRC32: 
 http://java-performance.info/java-crc32-and-adler32/
 (Adler is weaker for short inputs, so we should leave the commitlog alone, as 
 it checksums each mutation individually.)



[jira] [Commented] (CASSANDRA-5038) LZ4Compressor

2013-06-10 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679396#comment-13679396
 ] 

Adrien Grand commented on CASSANDRA-5038:
-

bq. One benefit to LZ4 is that the unsafe implementation isn't much slower than 
the native version; if we end up going to an incompatible version (or people 
run on some platform that Snappy/LZ4 native hasn't been created for), they 
won't have nearly the slowdown that Snappy has (or at least had, not sure if 
they've addressed that).

For your information, [~cowtowncoder] ran his JVM compressor benchmark again 
recently, and indeed, it shows good performance numbers for both the native and 
the Java impl of LZ4: https://twitter.com/cowtowncoder/status/343881969697951744

 LZ4Compressor
 -

 Key: CASSANDRA-5038
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5038
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Assignee: Adrien Grand
Priority: Minor
 Fix For: 1.2.2

 Attachments: CASSANDRA-5038.patch, CASSANDRA-5038.patch, 
 CASSANDRA-5038.patch, LZ4Compressor.java, lz4-java.jar


 LZ4 is a new compression algo that's ~2x faster than Snappy.
 [~jpountz] has written a nice java port which includes a misc.Unsafe version 
 that performs >= our java snappy version.
 Details at http://blog.jpountz.net/post/28092106032/wow-lz4-is-fast
 The nice thing is this should work with java7 and be more portable.
 We can also fall back to the pure java impl.



[jira] [Updated] (CASSANDRA-5038) LZ4Compressor

2013-02-10 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated CASSANDRA-5038:


Attachment: CASSANDRA-5038.patch

Updated patch (should apply on top of branch cassandra-1.2):
 - upgraded lz4 to version 1.1.0
 - added copyright headers and license files

After having applied the patch, you need to download the LZ4 jar from 
http://repo1.maven.org/maven2/net/jpountz/lz4/lz4/1.1.0/lz4-1.1.0.jar and put 
it under the lib directory.

 LZ4Compressor
 -

 Key: CASSANDRA-5038
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5038
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Priority: Minor
 Fix For: 1.2.2

 Attachments: CASSANDRA-5038.patch, CASSANDRA-5038.patch, 
 CASSANDRA-5038.patch, LZ4Compressor.java, lz4-java.jar


 LZ4 is a new compression algo that's ~2x faster than Snappy.
 [~jpountz] has written a nice java port which includes a misc.Unsafe version 
 that performs >= our java snappy version.
 Details at http://blog.jpountz.net/post/28092106032/wow-lz4-is-fast
 The nice thing is this should work with java7 and be more portable.
 We can also fall back to the pure java impl.



[jira] [Commented] (CASSANDRA-5038) LZ4Compressor

2013-02-09 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575307#comment-13575307
 ] 

Adrien Grand commented on CASSANDRA-5038:
-

I finally managed to deploy an artifact with pre-built JNI bindings for some 
major platforms (win32/amd64, linux/i386, linux/amd64 and darwin/x86_64) to 
Maven Central (http://repo1.maven.org/maven2/net/jpountz/lz4/lz4/1.1.0/). This 
should help provide better performance on these platforms (see compression and 
decompression benchmarks: 
http://jpountz.github.com/lz4-java/1.1.0/lz4-compression-benchmark/ 
http://jpountz.github.com/lz4-java/1.1.0/lz4-decompression-benchmark/).

 LZ4Compressor
 -

 Key: CASSANDRA-5038
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5038
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Priority: Minor
 Fix For: 1.2.2

 Attachments: CASSANDRA-5038.patch, CASSANDRA-5038.patch, 
 LZ4Compressor.java, lz4-java.jar


 LZ4 is a new compression algo that's ~2x faster than Snappy.
 [~jpountz] has written a nice java port which includes a misc.Unsafe version 
 that performs >= our java snappy version.
 Details at http://blog.jpountz.net/post/28092106032/wow-lz4-is-fast
 The nice thing is this should work with java7 and be more portable.
 We can also fall back to the pure java impl.



[jira] [Commented] (CASSANDRA-5038) LZ4Compressor

2013-02-04 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570382#comment-13570382
 ] 

Adrien Grand commented on CASSANDRA-5038:
-

The Maven/JNI issue is that the JAR deployed to Maven central doesn't contain 
JNI bindings to the C impl, so the fastest available impl will be the Java one 
based on sun.misc.Unsafe, which should already be as fast as snappy-java. I'm 
working on deploying a JAR packaged with JNI bindings to Maven Central but it 
might take some time, so I think we could commit this patch as-is. When I 
manage to deploy the JNI bindings to Maven Central, all we will need to do is 
to upgrade the JAR packaged with Cassandra. What do you think?

 LZ4Compressor
 -

 Key: CASSANDRA-5038
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5038
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Priority: Minor
 Fix For: 1.2.2

 Attachments: CASSANDRA-5038.patch, CASSANDRA-5038.patch, 
 LZ4Compressor.java, lz4-java.jar


 LZ4 is a new compression algo that's ~2x faster than Snappy.
 [~jpountz] has written a nice java port which includes a misc.Unsafe version 
 that performs >= our java snappy version.
 Details at http://blog.jpountz.net/post/28092106032/wow-lz4-is-fast
 The nice thing is this should work with java7 and be more portable.
 We can also fall back to the pure java impl.



[jira] [Updated] (CASSANDRA-5038) LZ4Compressor

2013-01-12 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated CASSANDRA-5038:


Attachment: CASSANDRA-5038.patch

cassandra-dtest's configuration_test.py and cql_tests.py passed successfully 
with LZ4Compressor (after performing a sed -i -e 
's/SnappyCompressor/LZ4Compressor/g' on the test files). Unfortunately, some 
other tests mentioning compression (such as counter_tests.py or 
sstable_generation_loading_test.py) didn't pass, but they didn't pass with 
Snappy either, so this is probably unrelated?

 LZ4Compressor
 -

 Key: CASSANDRA-5038
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5038
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Priority: Minor
 Fix For: 1.2.2

 Attachments: CASSANDRA-5038.patch, CASSANDRA-5038.patch, 
 LZ4Compressor.java, lz4-java.jar


 LZ4 is a new compression algo that's ~2x faster than Snappy.
 [~jpountz] has written a nice java port which includes a misc.Unsafe version 
 that performs >= our java snappy version.
 Details at http://blog.jpountz.net/post/28092106032/wow-lz4-is-fast
 The nice thing is this should work with java7 and be more portable.
 We can also fall back to the pure java impl.



[jira] [Commented] (CASSANDRA-5038) LZ4Compressor

2013-01-09 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13548875#comment-13548875
 ] 

Adrien Grand commented on CASSANDRA-5038:
-

No, it doesn't. But I've been contacted by people who are willing to help me 
build and test on various platforms so I may be able to perform a new release 
quickly that will include JNI bindings. Additionally I opened a few tickets for 
bugs with the JNI bindings that need to be fixed or documented 
(https://github.com/jpountz/lz4-java/issues?state=open). But the format of the 
JNI version is the same as the format from the Java version so I think we can 
start working on a patch with this JAR.

 LZ4Compressor
 -

 Key: CASSANDRA-5038
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5038
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Priority: Minor
 Fix For: 1.2.2

 Attachments: CASSANDRA-5038.patch, LZ4Compressor.java, lz4-java.jar


 LZ4 is a new compression algo that's ~2x faster than Snappy.
 [~jpountz] has written a nice java port which includes a misc.Unsafe version 
 that performs >= our java snappy version.
 Details at http://blog.jpountz.net/post/28092106032/wow-lz4-is-fast
 The nice thing is this should work with java7 and be more portable.
 We can also fall back to the pure java impl.



[jira] [Updated] (CASSANDRA-5038) LZ4Compressor

2013-01-08 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated CASSANDRA-5038:


Attachment: CASSANDRA-5038.patch

I performed a first release of lz4-java yesterday, here is a patch with tests 
that adds support for lz4 compression. You need to add the lz4 JAR to the lib 
directory 
(http://repo1.maven.org/maven2/net/jpountz/lz4/lz4/1.0.0/lz4-1.0.0.jar) before 
applying it.

This patch is very similar to Jake's Compressor impl, and in particular I copied 
the {{supportedOptions}} method to return {{crc_check_chance}}. What does 
declaring this option as supported change? ({{DeflateCompressor}} and 
{{SnappyCompressor}} just return an empty set.)

Does Cassandra have integration tests that we could try to run with this new 
compressor?

 LZ4Compressor
 -

 Key: CASSANDRA-5038
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5038
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Priority: Minor
 Fix For: 1.2.2

 Attachments: CASSANDRA-5038.patch, LZ4Compressor.java, lz4-java.jar


 LZ4 is a new compression algo that's ~2x faster than Snappy.
 [~jpountz] has written a nice java port which includes a misc.Unsafe version 
 that performs >= our java snappy version.
 Details at http://blog.jpountz.net/post/28092106032/wow-lz4-is-fast
 The nice thing is this should work with java7 and be more portable.
 We can also fall back to the pure java impl.



[jira] [Commented] (CASSANDRA-5038) LZ4Compressor

2012-12-06 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13514636#comment-13514636
 ] 

Adrien Grand commented on CASSANDRA-5038:
-

Hi Jake! I'm very happy that you are considering adding LZ4 support to 
Cassandra!

The native impl is indeed very very fast, and the unsafe java impl is ~48% 
faster at compressing data and ~1% slower at decompressing data than the Snappy 
impl you use according to my last benchmarks on the calgary corpus. If you run 
your own benchmarks, I'd be very interested in knowing the results.

I think the fallback to pure Java can be a plus: ideally Cassandra should first 
try to load the native impl (if the JAR includes bindings for the host 
architecture), then the unsafe java impl (if the JVM supports sun.misc.Unsafe) 
and then the pure Java impl (which is unfortunately ~40% slower at compressing 
and ~30% slower at decompressing than the unsafe impl).
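
The loading order described above could look roughly like the sketch below. It 
is generic JDK-only code, not the lz4-java API: the FastestAvailable class, the 
firstAvailable method and the three placeholder suppliers are all hypothetical, 
and the assumption is that a missing native binding or a missing 
sun.misc.Unsafe surfaces as a LinkageError or RuntimeException when the 
implementation is instantiated.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper, for illustration only.
final class FastestAvailable {

    /** Returns the first candidate that can be instantiated, trying them in order. */
    static <T> T firstAvailable(List<Supplier<T>> candidates) {
        Throwable last = null;
        for (Supplier<T> candidate : candidates) {
            try {
                return candidate.get();
            } catch (LinkageError | RuntimeException e) {
                // For example UnsatisfiedLinkError when the JAR has no binding
                // for the host architecture; remember it and try the next one.
                last = e;
            }
        }
        throw new IllegalStateException("no implementation available", last);
    }

    public static void main(String[] args) {
        // Placeholders standing in for the native, unsafe and pure Java impls.
        Supplier<String> nativeImpl = () -> {
            throw new UnsatisfiedLinkError("no JNI binding for this platform");
        };
        Supplier<String> unsafeImpl = () -> "unsafe";
        Supplier<String> safeImpl = () -> "safe";
        System.out.println(firstAvailable(Arrays.asList(nativeImpl, unsafeImpl, safeImpl)));
    }
}
```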

I noticed in your patch that you are using the unknown size version of the 
decompressor. It can unfortunately be slightly slower than the normal 
decompressor. On large blocks, it can be faster to write the original length at 
the beginning of the compressed streams to be able to use the normal 
decompressor to decompress data.

I'm currently working on releasing this stuff, so hopefully it will soon be 
easier for Cassandra to use.

 LZ4Compressor
 -

 Key: CASSANDRA-5038
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5038
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Priority: Minor
 Fix For: 1.2.1

 Attachments: LZ4Compressor.java, lz4-java.jar


 LZ4 is a new compression algo that's ~2x faster than Snappy.
 [~jpountz] has written a nice java port which includes a misc.Unsafe version 
 that performs >= our java snappy version.
 Details at http://blog.jpountz.net/post/28092106032/wow-lz4-is-fast
 The nice thing is this should work with java7 and be more portable.
 We can also fall back to the pure java impl.



[jira] [Commented] (CASSANDRA-5038) LZ4Compressor

2012-12-06 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13525993#comment-13525993
 ] 

Adrien Grand commented on CASSANDRA-5038:
-

bq. Cool, yeah I'm not sure if we can use the known size decompressor, does 
it have to be exact or can it be upper bounded? We know from the block size the 
max compressed length.

It needs to be exact, or decompression will fail. An option to be able to use 
it is to write the original length as an int (or better as a variable-length 
int) before the compressed bytes. Upon decompression, first read the original 
length and then use this original length to call the known size decompressor.
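
To make the framing concrete, here is a minimal JDK-only sketch. Since LZ4 is 
not part of the JDK, java.util.zip.Deflater/Inflater is used purely as a 
stand-in codec (Inflater itself does not need the exact size); the point is 
only the layout, a 4-byte original length followed by the compressed bytes. The 
LengthPrefixedBlock class and its method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical framing helper; Deflater/Inflater stand in for LZ4.
final class LengthPrefixedBlock {

    /** Returns [4-byte big-endian original length][compressed bytes]. */
    static byte[] compress(byte[] src) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(ByteBuffer.allocate(4).putInt(src.length).array(), 0, 4);
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(src);
            deflater.finish();
            byte[] chunk = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
        } finally {
            deflater.end();
        }
        return out.toByteArray();
    }

    /** Reads the original length first, then decompresses into an exactly sized array. */
    static byte[] decompress(byte[] framed) throws DataFormatException {
        int originalLength = ByteBuffer.wrap(framed).getInt();
        if (originalLength < 0) {
            throw new DataFormatException("negative length prefix");
        }
        byte[] dest = new byte[originalLength];
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(framed, 4, framed.length - 4);
            int n = 0;
            while (n < dest.length && !inflater.finished()) {
                int read = inflater.inflate(dest, n, dest.length - n);
                if (read == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported input");
                }
                n += read;
            }
            if (n != originalLength) {
                throw new DataFormatException("length prefix does not match data");
            }
        } finally {
            inflater.end();
        }
        return dest;
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] src = new byte[64 * 1024]; // highly compressible block of zeros
        byte[] framed = compress(src);
        System.out.println(framed.length + " framed bytes, restored " + decompress(framed).length);
    }
}
```

A variable-length int would shrink the prefix for small blocks, at the cost of 
slightly more involved encoding and decoding.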

bq.  I'd suggest you add a simple way for us to pick the best compressor for 
our node.

This is what the LZ4Factory#defaultInstance (I should probably rename it to 
fastestInstance) aims at doing but it only tries unsafe then safe right now. 
I'll try to add support for the native impl soon.

Another feature of these compressors you might be interested in is that you can 
provide them with an output buffer of any length and they will succeed only if 
they managed to generate an output which is small enough (and they will fail as 
soon as they know they won't make it). So for example, you could decide to 
write the raw bytes instead of the compressed bytes if LZ4 didn't manage to 
compress your data by more than 10%:

{code}
  final int maxAcceptableCompressedLength = originalLength * 90 / 100;
  try {
    dest[0] = 0; // means compressed
    final int compressedLength = compressor.compress(src, 0, originalLength,
        dest, 1, maxAcceptableCompressedLength);
    return 1 + compressedLength;
  } catch (LZ4Exception e) {
    dest[0] = 1; // means not compressed
    System.arraycopy(src, 0, dest, 1, originalLength);
    return 1 + originalLength;
  }
{code}
(Only the native LZ4 HC impl doesn't support this feature.)


 LZ4Compressor
 -

 Key: CASSANDRA-5038
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5038
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Priority: Minor
 Fix For: 1.2.1

 Attachments: LZ4Compressor.java, lz4-java.jar


 LZ4 is a new compression algo that's ~2x faster than Snappy.
 [~jpountz] has written a nice java port which includes a misc.Unsafe version 
 that performs >= our java snappy version.
 Details at http://blog.jpountz.net/post/28092106032/wow-lz4-is-fast
 The nice thing is this should work with java7 and be more portable.
 We can also fall back to the pure java impl.
