Hi Cassandra users,
I'm wondering if there are any best practices for handling large keys (>= 64 KByte).
I'm aware that there is a Cassandra restriction for this [1]. However,
my application requires that some keys may be >= 64 KByte. I'm currently
trying a simple hash-table solution:
//key BLOB may be >= 64 KByte
CREATE TABLE kv (hashedKey VARINT, key BLOB, value BLOB, PRIMARY KEY (hashedKey));
That is, only hash values of keys are indexed. If I need to search
for a key, I do:
V search (K key) {
    //compute hash for key
    int hashedKey = computeHash(key);
    //retrieve the key stored under this hash from Cassandra
    K key_with_same_hash = getKeyWithHash(hashedKey);
    while (key_with_same_hash != key) {
        //collision: compute the next candidate hash (needs the current
        //hash as input, otherwise the loop would never advance)
        hashedKey = resolveHashCollision(key, hashedKey);
        //retrieve the key stored under this new hash
        key_with_same_hash = getKeyWithHash(hashedKey);
    }
    //found the hash under which this key is stored, now retrieve its value
    return getValueWithHash(hashedKey);
}
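To make the idea concrete, here is an in-memory sketch of that lookup loop in plain Java. The two HashMaps stand in for the Cassandra table (hashedKey -> key and hashedKey -> value), collision resolution is modeled as simple linear probing (advance to the next hash value), and all helper names are made up for illustration:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// In-memory sketch of the hashed-key lookup; the two maps stand in for
// the Cassandra rows. All names here are placeholders, not real APIs.
public class HashedKeyStore {
    private final Map<Integer, byte[]> hashToKey = new HashMap<>();
    private final Map<Integer, byte[]> hashToValue = new HashMap<>();

    static int computeHash(byte[] key) {
        return Arrays.hashCode(key);
    }

    // Open addressing: on a collision, simply probe the next slot.
    static int resolveHashCollision(int hashedKey) {
        return hashedKey + 1;
    }

    public void put(byte[] key, byte[] value) {
        int h = computeHash(key);
        // probe until we hit a free slot or the slot already holding this key
        while (hashToKey.containsKey(h) && !Arrays.equals(hashToKey.get(h), key)) {
            h = resolveHashCollision(h);
        }
        hashToKey.put(h, key);
        hashToValue.put(h, value);
    }

    // Assumes the key was stored before; an absent key would probe forever,
    // so a real implementation needs a termination check.
    public byte[] search(byte[] key) {
        int h = computeHash(key);
        byte[] stored = hashToKey.get(h);
        while (!Arrays.equals(stored, key)) {
            h = resolveHashCollision(h);
            stored = hashToKey.get(h);
        }
        return hashToValue.get(h);
    }
}
```

Against Cassandra, put and search would of course issue one SELECT/INSERT per probe instead of touching a local map.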
I'm aware that I could also use other hash collision resolution strategies,
most notably one that uses a map as an additional data structure:
//the key2value map holds all keys with this hashedKey
//the key2value map holds all keys with this hashedKey
CREATE TABLE kv2 (hashedKey VARINT, key2value map<BLOB, BLOB>, PRIMARY KEY
(hashedKey));
However, as far as I understand, Cassandra and CQL would completely
materialize the key2value map for each lookup by hashedKey. This is not
so cool ...
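Here is a small in-memory model of that concern (again plain Java with made-up names; the outer map stands in for the table, the inner map for the key2value collection):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the map-based variant: one row per hashedKey, whose key2value
// map holds every colliding (key, value) pair. Reading the row pulls the
// whole map back to the client, which is the materialization cost above.
public class MapBasedStore {
    private final Map<Integer, Map<String, byte[]>> rows = new HashMap<>();

    public void put(int hashedKey, String key, byte[] value) {
        rows.computeIfAbsent(hashedKey, h -> new HashMap<>()).put(key, value);
    }

    // Models a "SELECT key2value ... WHERE hashedKey = ?": the client
    // receives the complete map even though it needs a single entry.
    public byte[] search(int hashedKey, String key) {
        Map<String, byte[]> wholeMap = rows.get(hashedKey); // entire map transferred
        return wholeMap == null ? null : wholeMap.get(key);
    }
}
```

With a decent hash function the inner maps stay tiny (only true collisions land in the same row), so the materialization may be acceptable in practice.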
I was also considering splitting the key into 64 KByte fragments and
storing them in a tree, e.g., a binary search tree or a trie.
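The fragmentation step itself is straightforward; a sketch (each fragment could then be stored under, say, a (keyId, fragmentIndex) row, but that schema is just an assumption on my part):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split an oversized key into chunks below Cassandra's 64 KByte
// limit, and reassemble them on read.
public class KeyFragmenter {
    static final int FRAGMENT_SIZE = 64 * 1024 - 1; // stay under the 64 KB limit

    static List<byte[]> split(byte[] key) {
        List<byte[]> fragments = new ArrayList<>();
        for (int off = 0; off < key.length; off += FRAGMENT_SIZE) {
            int len = Math.min(FRAGMENT_SIZE, key.length - off);
            byte[] fragment = new byte[len];
            System.arraycopy(key, off, fragment, 0, len);
            fragments.add(fragment);
        }
        return fragments;
    }

    static byte[] join(List<byte[]> fragments) {
        int total = fragments.stream().mapToInt(f -> f.length).sum();
        byte[] key = new byte[total];
        int off = 0;
        for (byte[] f : fragments) {
            System.arraycopy(f, 0, key, off, f.length);
            off += f.length;
        }
        return key;
    }
}
```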
Does anyone have experience with this kind of problem?
Thanks for your help
Andreas
[1] http://wiki.apache.org/cassandra/FAQ#max_key_size