Hi all,
I am using C* 1.2.4 with CQL3 and Astyanax to consume a large amount of
user-based data (around 50-100K / sec). Requests come in keyed by user cookie,
which I then need to link to a user (as users can change their cookies). This
is done using a link table:
CREATE TABLE cookie_user_lookup (
    cookie TEXT PRIMARY KEY,
    user_id BIGINT,
    creation_time TIMESTAMP
) WITH
    compression = {'crc_check_chance': 0.1, 'sstable_compression': 'LZ4Compressor'} AND
    compaction = {'class': 'LeveledCompactionStrategy'} AND
    gc_grace_seconds = 86400;
As I said, I am handling a large number of these per second and wanted to get
your take on how best to do the lookup. I see three options:
* Serially fetch 1 by 1. The latency is very low at 0.1 ms, but at that rate a
single serial caller tops out at about 10K lookups / sec, well short of the
50-100K / sec I need. This is too slow.
* Serially fetch 1 by 1 but on separate threads. This would require a very
large number of concurrent connections (unless I change to DataStax's binary
protocol) as well as threads. This seems heavy.
* Batch fetch. This is what I'm doing now: I build one very large query of the
form "select * from cookie_user_lookup where cookie in (a, b, c, ...)". I am
actually fetching around 10K cookies at a time and getting a response time in
my cluster of around 100 ms. This is very acceptable, but I wanted to get
everyone's take, as I have seen messages about this "starving" the request
pool. Note that I'm running the HSHA RPC server and am rarely seeing any reads
waiting.
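
To make the batch option concrete, here is a minimal sketch of how the cookie
list could be split into several smaller IN queries instead of one 10K-element
query. It is in Python for brevity and is not my actual Astyanax code; the
helper names (chunked, build_in_queries) and the chunk size of 100 are
hypothetical, and it only builds the CQL strings and parameter lists without
talking to a cluster.

```python
from typing import Iterator, List, Sequence, Tuple


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_in_queries(
    cookies: Sequence[str], chunk_size: int = 100
) -> List[Tuple[str, List[str]]]:
    """Return one (cql, params) pair per chunk, using '?' bind markers.

    chunk_size=100 is an arbitrary illustrative value, not a recommendation.
    """
    queries = []
    for chunk in chunked(cookies, chunk_size):
        markers = ", ".join("?" for _ in chunk)
        cql = (
            "SELECT cookie, user_id FROM cookie_user_lookup "
            f"WHERE cookie IN ({markers})"
        )
        queries.append((cql, list(chunk)))
    return queries
```

Each (cql, params) pair could then be issued as its own request, so the size of
any single IN list stays bounded. Whether that actually helps with the
"starving" concern is exactly what I am unsure about.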
I appreciate your input!