Re: Storing pre-sorted data

2011-10-21 Thread Stephen Connolly
Well you could use a DOUBLE value to encode relative positions... first item in list gets key 1.0 insert before first item -> key[first]/2.0; append after last item -> key[last]*2.0; insert after non-final item -> (key[n]+key[n+1])/2.0 Using double precision should give you quite a space to fit i

Re: Storing pre-sorted data

2011-10-21 Thread Matthias Pfau
Hi David, yes, what we are working on could be referenced as "encrypted database service". Thanks for your insights. We will continue to work on this topic! Kind regards Matthias On 10/21/2011 02:31 AM, David Jeske wrote: If I understand you correctly, you are saying that you will never have

Re: Storing pre-sorted data

2011-10-20 Thread David Jeske
> > > 2) If a single key, would adding a file/block/record-level encryption to >> Cassandra solve this problem? If not, why not? Is there something >> special about your encryption methods? >> > > There is nothing special about our encryption methods but will never be > able to encrypt or decrypt

Re: Storing pre-sorted data

2011-10-18 Thread Matthias Pfau
David, thanks for sharing these ideas. We basically implemented it by using longs and dividing the long-namespace into multiple parts on each insert. As you already described, the biggest problem with this approach is, that we are not able to simply invoke get(x) because the indexes of the lis

Re: Storing pre-sorted data

2011-10-18 Thread Matthias Pfau
Hi David, encrypting and Decrypting data in cassandra is not an option to us. However, I read your elaboration on best practices for adding a custom feature to cassandra with a lot of interest. Thank you! Please see below for the answers to your questions. Kind regards Matthias On 10/18/2011

Re: Storing pre-sorted data

2011-10-18 Thread David Jeske
On Tue, Oct 18, 2011 at 12:14 AM, Matthias Pfau wrote: > we want to sort completely on the client-side (where the data is > encrypted). But that requires an "insert at offset X" operation. We would > always use CL QUORUM and client side synchronisation. > You can do "insert at offset X"... just

Re: Storing pre-sorted data

2011-10-18 Thread Matthias Pfau
Aaron, we want to sort completely on the client-side (where the data is encrypted). But that requires an "insert at offset X" operation. We would always use CL QUORUM and client side synchronisation. However, it seems to be not be a good idea to add such a feature to cassandra as everyone usi

Re: Storing pre-sorted data

2011-10-17 Thread David Jeske
On Mon, Oct 17, 2011 at 2:39 AM, Matthias Pfau wrote: > We would be very happy if cassandra would give us an option to maintain the > sort order on our own (application logic). That is why it would be > interesting to hear from any of the developers if it would be easily > possible to add such a

Re: Storing pre-sorted data

2011-10-17 Thread aaron morton
Sort order is determined by the Comparator, which is an implementation of the o.a.c.db.marshal.AbstractType class. If you wish to order column (names) in a row based on an opaque (to cassandra) byte value you can create your own implementation. You would then need to decrypt and compare colum

Re: Storing pre-sorted data

2011-10-17 Thread Matthias Pfau
Thanks for that hint! However, it seems like soundex is a very language specific algorithm (US English). We have to get into this topic further... Kind regards Matthias On 10/13/2011 10:43 PM, Stephen Connolly wrote: Then just use a soundex function on the first word in the text... that will s

Re: Storing pre-sorted data

2011-10-17 Thread Matthias Pfau
David, thanks for your nice summary on this topic. We would be very happy if cassandra would give us an option to maintain the sort order on our own (application logic). That is why it would be interesting to hear from any of the developers if it would be easily possible to add such a feature

Re: Storing pre-sorted data

2011-10-15 Thread David Jeske
Logically, whether you use cassandra or not, there is some "physics" of sorted order structures which you should understand and dictate what is possible. In order to keep data sorted, a database needs to be able to see the proper sort-order of the data "all the time" not just at insertion or query

Re: Storing pre-sorted data

2011-10-13 Thread Stephen Connolly
Then just use a soundex function on the first word in the text... that will shrink it sufficiently and give nice buckets in near sequential order (http://en.wikipedia.org/wiki/Soundex) On 13 October 2011 21:21, Matthias Pfau wrote: > Hi Stephen, > we are hashing the first 8 byte (8 US-ASCII chara

Re: Storing pre-sorted data

2011-10-13 Thread Matthias Pfau
Hi Stephen, we are hashing the first 8 byte (8 US-ASCII characters) of text that has been written by humans. Wouldn't it be easy for the attacker to do a dictionary attack on this text, especially if he knows the language of the text? Kind regards Matthias On 10/13/2011 08:20 PM, Stephen Con

Re: Storing pre-sorted data

2011-10-13 Thread Stephen Connolly
in theory, however they have less than 32 bits of entropy from which they can do that, leaving them with at least 32 more bits of combinations to try... that's 2 billion or so... must be a big dictionary - Stephen --- Sent from my Android phone, so random spelling mistakes, random nonsense words

Re: Storing pre-sorted data

2011-10-13 Thread Matthias Pfau
Hi Stephen, this sounds very reasonable. But wouldn't this enable an attacker to execute dictionary attacks in order to "decrypt" the first 8 bytes of the plain text? Kind regards Matthias On 10/13/2011 05:03 PM, Stephen Connolly wrote: It wouldn't be unencrypted... which is the point you u

Re: Storing pre-sorted data

2011-10-13 Thread Matthias Pfau
Hi Zach, thanks for your additional input. You are absolutely right: The long namespace should be big enough. We are going to insert up to 2^32 values into the list. We only need support for get(index), insert(index) and remove(index) while get and insert will be used very often. Remove is al

Re: Storing pre-sorted data

2011-10-13 Thread Stephen Connolly
It wouldn't be unencrypted... which is the point you use a one way linear hash function to take the first, say 8 bytes, of unencrypted data and turn it into 4 bytes of a sort prefix. You've used lost half the data in the process, so effectively each bit is an OR of two bits and you can only infer

Re: Storing pre-sorted data

2011-10-13 Thread Zach Richardson
Matthias, Answers below. On Thu, Oct 13, 2011 at 9:03 AM, Matthias Pfau wrote: > Hi Zach, > thanks for that good idea. Unfortunately, our list needs to be rewritten > often because our data is far away from being evenly distributed. This shouldn't be a problem if you use long's. If you were to

Re: Storing pre-sorted data

2011-10-13 Thread Matthias Pfau
Hi Zach, thanks for that good idea. Unfortunately, our list needs to be rewritten often because our data is far away from being evenly distributed. However, we could get this under control but there is a more severe problem: Random access is very hard to implement on a structure with undefine

Re: Storing pre-sorted data

2011-10-13 Thread Zach Richardson
Matthias, This is an interesting problem. I would consider using long's as the column type, where your column names are evenly distributed longs in sort order when you first write your list out. So if you have items A and C with the long column names 1000 and 2000, and then you have to insert B,

Re: Storing pre-sorted data

2011-10-13 Thread Matthias Pfau
Hi Stephen, this is a great idea but unfortunately doesn't work for us either as we can not store the data in an unencrypted form. Kind regards Matthias On 10/12/2011 07:42 PM, Stephen Connolly wrote: could you prefix the data with 3-4 bytes of a linear hash of the unencypted data? it wouldn'

Re: Storing pre-sorted data

2011-10-12 Thread Stephen Connolly
could you prefix the data with 3-4 bytes of a linear hash of the unencypted data? it wouldn't be a perfect sort, but you'd have less of a range to query to get the sorted values? - Stephen --- Sent from my Android phone, so random spelling mistakes, random nonsense words and other nonsense are a

Re: Storing pre-sorted data

2011-10-12 Thread Matthias Pfau
Unfortunately, that is not an option as we have to store the data in an compressed and encrypted and therefore binary and non-sortable form. On 10/12/2011 06:39 PM, David McNelis wrote: Is it an option to not convert the data to binary prior to inserting into Cassandra? Also, how large are the

Re: Storing pre-sorted data

2011-10-12 Thread David McNelis
Is it an option to not convert the data to binary prior to inserting into Cassandra? Also, how large are the strings you're sorting? If its viable to not convert to binary before writing to Cassandra, and you use one of the string based column ordering techniques (utf8, ascii, for example), then

Storing pre-sorted data

2011-10-12 Thread Matthias Pfau
Hi there, we are currently building a prototype based on cassandra and came into problems on implementing sorted lists containing millions of items. The special thing about the items of our lists is, that cassandra is not able to sort them as the data is stored in a binary format which is not