Re: Simple Compression Idea

2011-01-31 Thread Stephen Connolly
On 31 January 2011 04:41, David G. Boney dbon...@semanticartifacts.com wrote: I propose a simple idea for compression using a compressed string datatype. The compressed string datatype could be implemented for column family keys by creating a compressed string ordered partitioner. The

Re: Simple Compression Idea

2011-01-31 Thread David G. Boney
In Cassandra, strings are stored as UTF-8. In arithmetic coding compression, the modeling is separate from the coding. A standard arrangement is to have a 0-order model, the frequencies of individual bytes; a 1-order model, the frequencies of two-byte occurrences; and a 2-order model, the frequencies of three
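
To make the modeling step concrete, here is a minimal sketch of the frequency tables such models would hold. It is an illustration only, not code from this thread or from Cassandra: the function name build_context_model and the sample string are assumptions, and the coding stage of arithmetic coding is left out entirely.

```python
from collections import Counter, defaultdict


def build_context_model(data: bytes, order: int) -> dict:
    """Count how often each byte follows each `order`-byte context.

    order=0 gives plain byte frequencies (the context is empty),
    order=1 gives two-byte frequencies, order=2 three-byte frequencies.
    Hypothetical helper, named here for illustration only.
    """
    model = defaultdict(Counter)
    for i in range(order, len(data)):
        context = data[i - order:i]  # b"" when order == 0
        model[context][data[i]] += 1
    return model


# Strings are UTF-8 bytes, as in the message above.
sample = "abracadabra".encode("utf-8")
order0 = build_context_model(sample, 0)
order1 = build_context_model(sample, 1)
order2 = build_context_model(sample, 2)

print(order0[b""][ord("a")])   # how many times the byte "a" occurs
print(dict(order1[b"a"]))      # which bytes follow "a", and how often
print(dict(order2[b"ab"]))     # which bytes follow "ab", and how often
```

An arithmetic coder would turn these counts into probabilities and would usually fall back from the higher-order table to a lower-order one when a context has not been seen.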

Re: Simple Compression Idea

2011-01-31 Thread David G. Boney
Is the partitioner the only code that does comparisons on the keys of a column family? What about get_range_slices()? Does it only use the partitioner's comparison method? - Sincerely, David G. Boney dbon...@semanticartifacts.com http://www.semanticartifacts.com On Jan 31, 2011,

Re: Simple Compression Idea

2011-01-31 Thread Terje Marthinussen
There is a lot of overhead in the serialized data itself (just have a look at an sstable file). It would be great to be able to compress at the byte array level rather than the string level. Regards, Terje On 1 Feb 2011, at 03:15, David G. Boney dbon...@semanticartifacts.com wrote: In Cassandra,

Re: Simple Compression Idea

2011-01-31 Thread Mike Malone
I don't see anything inherently wrong with your proposal; it would almost definitely be beneficial in certain scenarios. We use what could be called static compression (Golomb-esque encodings) for some data types on our Cassandra clusters. It's useful for representing things like full precision
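
For readers unfamiliar with Golomb-style codes, the sketch below shows Rice coding, the power-of-two special case of Golomb coding. The message does not say which encoding is actually used on those clusters, so this is only an assumed illustration; rice_encode, rice_decode, and the parameter k are hypothetical names.

```python
def rice_encode(n: int, k: int) -> str:
    """Encode a non-negative integer as a Rice code (Golomb code with M = 2**k).

    The quotient n >> k is written in unary (that many 1s, then a 0),
    followed by the remainder in exactly k binary digits.
    """
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    quotient = n >> k
    remainder = n & ((1 << k) - 1)
    remainder_bits = format(remainder, "b").zfill(k) if k else ""
    return "1" * quotient + "0" + remainder_bits


def rice_decode(bits: str, k: int) -> int:
    """Invert rice_encode for a single code word."""
    quotient = bits.index("0")  # length of the unary run of 1s
    start = quotient + 1
    remainder = int(bits[start:start + k], 2) if k else 0
    return (quotient << k) | remainder


code = rice_encode(9, 2)
print(code, rice_decode(code, 2))
```

The code is static in the sense that k is fixed ahead of time rather than learned from the data, so small values get short code words without any per-value model.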

Simple Compression Idea

2011-01-30 Thread David G. Boney
I propose a simple idea for compression using a compressed string datatype. The compressed string datatype could be implemented for column family keys by creating a compressed string ordered partitioner. The compressed string ordered partitioner works by decompressing the string and then
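
A rough sketch of the idea, under two loudly stated assumptions: that the step after decompression is an ordinary comparison of the plain strings (the message is cut off at that point), and that zlib can stand in for whatever string compressor would really be used. This is not Cassandra partitioner code; compress_key and compare_compressed are hypothetical names.

```python
import zlib
from functools import cmp_to_key


def compress_key(key: str) -> bytes:
    """Store a key in compressed form (zlib is only a stand-in here)."""
    return zlib.compress(key.encode("utf-8"))


def compare_compressed(a: bytes, b: bytes) -> int:
    """Order two compressed keys by their decompressed UTF-8 bytes.

    Returns a negative, zero, or positive number, like a comparator.
    """
    x = zlib.decompress(a)
    y = zlib.decompress(b)
    return (x > y) - (x < y)


keys = [compress_key(k) for k in ["pear", "apple", "mango"]]
ordered = sorted(keys, key=cmp_to_key(compare_compressed))
decoded = [zlib.decompress(k).decode("utf-8") for k in ordered]
print(decoded)
```

Comparing the compressed bytes directly would generally not preserve the original order, which is why the comparator decompresses first; the cost is one decompression per key per comparison.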