Hi Cassandra users,
I'm wondering if there are any best practices for handling large keys (>= 64 KByte).
I'm aware that there is a Cassandra restriction for this [1]. However,
my application requires that some keys may be >= 64 KByte. I'm currently
trying a simple hash-table solution:
//key BLOB may be >= 64 KByte
CREATE TABLE kv (hashedKey VARINT, key BLOB, value BLOB, PRIMARY KEY (hashedKey));
That is, only hash values of keys are indexed. If I need to search
for a key, I do:
V search (K key) {
    //compute hash for key
    int hashedKey = computeHash(key);
    //retrieve the key stored under this hash from Cassandra
    K key_with_same_hash = getKeyWithHash(hashedKey);
    while (key_with_same_hash != key) {
        //collision: compute the next candidate hash (needs the current
        //hash as input, otherwise the loop would never advance)
        hashedKey = resolveHashCollision(key, hashedKey);
        //retrieve the key stored under this new hash
        key_with_same_hash = getKeyWithHash(hashedKey);
    }
    //found the hash under which this key is stored, now retrieve its value
    return getValueWithHash(hashedKey);
}
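To make the idea concrete, here is an in-memory sketch of that lookup loop in plain Java. The two HashMaps stand in for the Cassandra table (hashedKey -> key and hashedKey -> value), collision resolution is modeled as simple linear probing (advance to the next hash value), and all helper names are made up for illustration:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// In-memory sketch of the hashed-key lookup; the two maps stand in for
// the Cassandra rows. All names here are placeholders, not real APIs.
public class HashedKeyStore {
    private final Map<Integer, byte[]> hashToKey = new HashMap<>();
    private final Map<Integer, byte[]> hashToValue = new HashMap<>();

    static int computeHash(byte[] key) {
        return Arrays.hashCode(key);
    }

    // Open addressing: on a collision, simply probe the next slot.
    static int resolveHashCollision(int hashedKey) {
        return hashedKey + 1;
    }

    public void put(byte[] key, byte[] value) {
        int h = computeHash(key);
        // probe until we hit a free slot or the slot already holding this key
        while (hashToKey.containsKey(h) && !Arrays.equals(hashToKey.get(h), key)) {
            h = resolveHashCollision(h);
        }
        hashToKey.put(h, key);
        hashToValue.put(h, value);
    }

    // Assumes the key was stored before; an absent key would probe forever,
    // so a real implementation needs a termination check.
    public byte[] search(byte[] key) {
        int h = computeHash(key);
        byte[] stored = hashToKey.get(h);
        while (!Arrays.equals(stored, key)) {
            h = resolveHashCollision(h);
            stored = hashToKey.get(h);
        }
        return hashToValue.get(h);
    }
}
```

Against Cassandra, put and search would of course issue one SELECT/INSERT per probe instead of touching a local map.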
I'm aware that I could also use other hash collision resolution strategies,
most notably one that uses a map as an additional data structure:
//the key2value map holds all keys with this hashedKey
//the key2value map holds all keys with this hashedKey
CREATE TABLE kv2 (hashedKey VARINT, key2value map<BLOB, BLOB>, PRIMARY KEY
(hashedKey));
However, as far as I understand, Cassandra and CQL would completely
materialize the key2value map for each lookup by hashedKey. This is not
so cool ...
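Here is a small in-memory model of that concern (again plain Java with made-up names; the outer map stands in for the table, the inner map for the key2value collection):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the map-based variant: one row per hashedKey, whose key2value
// map holds every colliding (key, value) pair. Reading the row pulls the
// whole map back to the client, which is the materialization cost above.
public class MapBasedStore {
    private final Map<Integer, Map<String, byte[]>> rows = new HashMap<>();

    public void put(int hashedKey, String key, byte[] value) {
        rows.computeIfAbsent(hashedKey, h -> new HashMap<>()).put(key, value);
    }

    // Models a "SELECT key2value ... WHERE hashedKey = ?": the client
    // receives the complete map even though it needs a single entry.
    public byte[] search(int hashedKey, String key) {
        Map<String, byte[]> wholeMap = rows.get(hashedKey); // entire map transferred
        return wholeMap == null ? null : wholeMap.get(key);
    }
}
```

With a decent hash function the inner maps stay tiny (only true collisions land in the same row), so the materialization may be acceptable in practice.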
I was also considering splitting the key into 64 KByte fragments and
storing them in a tree, e.g., a binary search tree or a trie.
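The fragmentation step itself is straightforward; a sketch (each fragment could then be stored under, say, a (keyId, fragmentIndex) row, but that schema is just an assumption on my part):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split an oversized key into chunks below Cassandra's 64 KByte
// limit, and reassemble them on read.
public class KeyFragmenter {
    static final int FRAGMENT_SIZE = 64 * 1024 - 1; // stay under the 64 KB limit

    static List<byte[]> split(byte[] key) {
        List<byte[]> fragments = new ArrayList<>();
        for (int off = 0; off < key.length; off += FRAGMENT_SIZE) {
            int len = Math.min(FRAGMENT_SIZE, key.length - off);
            byte[] fragment = new byte[len];
            System.arraycopy(key, off, fragment, 0, len);
            fragments.add(fragment);
        }
        return fragments;
    }

    static byte[] join(List<byte[]> fragments) {
        int total = fragments.stream().mapToInt(f -> f.length).sum();
        byte[] key = new byte[total];
        int off = 0;
        for (byte[] f : fragments) {
            System.arraycopy(f, 0, key, off, f.length);
            off += f.length;
        }
        return key;
    }
}
```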
Does anyone have experience with this kind of problem?
Thanks for your help
Andreas
[1] http://wiki.apache.org/cassandra/FAQ#max_key_size