Hi Cassandra users,

I'm wondering if there are any best practices for using large keys (>= 64 KByte). I'm aware that Cassandra imposes a restriction on key size [1]. However, my application requires that some keys may be >= 64 KByte. I'm currently trying a simple hash-table approach:

// key BLOB may be >= 64 KByte
CREATE TABLE big_keys (hashedKey varint, key blob, value blob, PRIMARY KEY (hashedKey));

That is, only hash values of keys are indexed. If I need to search for a key, I do:

V search(K key) {

    // compute hash for key
    int hashedKey = computeHash(key);
    // retrieve the key stored under this hash from Cassandra
    K keyWithSameHash = getKeyWithHash(hashedKey);

    while (!keyWithSameHash.equals(key)) {
        // compute the next candidate hash (e.g., by linear probing)
        hashedKey = resolveHashCollision(hashedKey);
        // retrieve the key stored under this new hash
        keyWithSameHash = getKeyWithHash(hashedKey);
    }

    // found the correct hash for this key, now retrieve its value
    return getValueWithHash(hashedKey);
}

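A minimal, runnable sketch of this open-addressing lookup, with a plain Python dict standing in for the Cassandra table (the names `compute_hash`, `resolve_hash_collision`, `put`, and `search`, and the choice of linear probing, are my assumptions, not a fixed API):

```python
import hashlib

# Simulated Cassandra table: hashedKey -> (key, value).
# In a real deployment these would be CQL reads/writes.
table = {}

def compute_hash(key: bytes) -> int:
    # Truncate a SHA-256 digest to a fixed-width integer hash.
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")

def resolve_hash_collision(hashed_key: int) -> int:
    # Linear probing: try the next hash slot.
    return hashed_key + 1

def put(key: bytes, value: bytes) -> None:
    hashed_key = compute_hash(key)
    # Probe until we find the slot for this key, or a free slot.
    while hashed_key in table and table[hashed_key][0] != key:
        hashed_key = resolve_hash_collision(hashed_key)
    table[hashed_key] = (key, value)

def search(key: bytes) -> bytes:
    hashed_key = compute_hash(key)
    # Probe until the stored key matches (assumes the key is present).
    while table[hashed_key][0] != key:
        hashed_key = resolve_hash_collision(hashed_key)
    return table[hashed_key][1]

big_key = b"x" * (128 * 1024)   # a key well over 64 KByte
put(big_key, b"some value")
print(search(big_key))          # b'some value'
```

Note that each probe step costs one round trip to Cassandra, so a hash function with a low collision rate matters here.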
I'm aware that I could also use other hash-collision resolution strategies, most notably ones that use a map as an additional data structure:

// the key2value map holds all keys (and their values) with this hashedKey
CREATE TABLE big_key_maps (hashedKey varint, key2value map<blob, blob>, PRIMARY KEY (hashedKey));

However, as far as I understand, Cassandra and CQL would completely materialize the key2value map for each lookup by hashedKey. This is not so cool ...

I was also considering splitting the key into 64 KByte fragments and storing them in a tree, e.g., a binary search tree or a trie.
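The fragmentation step itself could be sketched like this (fragment size and function names are my assumptions; how the fragments are then arranged into a tree is left open):

```python
# Stay strictly below Cassandra's 64 KByte limit per stored value.
FRAGMENT_SIZE = 64 * 1024 - 1

def fragment(key: bytes) -> list:
    # Split an oversized key into ordered fragments, each small enough
    # to store individually (e.g., as tree nodes keyed by fragment index).
    return [key[i:i + FRAGMENT_SIZE] for i in range(0, len(key), FRAGMENT_SIZE)]

def reassemble(fragments: list) -> bytes:
    # Concatenating the fragments in order restores the original key.
    return b"".join(fragments)

big_key = b"k" * (200 * 1024)
parts = fragment(big_key)
assert all(len(p) < 64 * 1024 for p in parts)
assert reassemble(parts) == big_key
```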

Does anyone have experience with this kind of problem?

Thanks for your help
Andreas

[1] http://wiki.apache.org/cassandra/FAQ#max_key_size
