I don't think there is an easy "answer" to this...

A possible approach, based upon the implied dimensions of the problem,
would be to maintain a Bloom filter over the "words" of each user, stored
under a partition key with the user as the clustering key. A single query
would then efficiently yield the list of users that "may" match, and other
techniques could be used to refine that list down to actual matches.
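
To make the idea concrete, here is a rough client-side sketch in Python.
It is not tied to any Cassandra API: the names BloomFilter,
candidate_users, num_bits and num_hashes are made up for illustration,
and the sizes are guesses that would need tuning for roughly 10000 words
per user. It only shows the "may match" step; the candidates it returns
still have to be checked against the real rows.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter over strings (illustration only)."""

    def __init__(self, num_bits=16384, num_hashes=5):
        # Sizes are assumptions; tune them to the expected word count.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from one SHA-256 digest
        # using double hashing: pos_i = h1 + i * h2 (mod num_bits).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def candidate_users(filters, wanted_words):
    """Return users whose filter reports at least one of the wanted words.

    `filters` maps user -> BloomFilter. The result may contain false
    positives, so it still has to be refined against the actual rows.
    """
    return [
        user
        for user, bloom in filters.items()
        if any(bloom.might_contain(word) for word in wanted_words)
    ]


if __name__ == "__main__":
    filters = {"alice": BloomFilter(), "bob": BloomFilter()}
    for word in ("a", "ab"):
        filters["alice"].add(word)
    print(candidate_users(filters, ["ab", "zz"]))
```

A Bloom filter never reports a stored word as absent, so no real match is
lost; the price is the occasional false positive, which is why the second
refinement pass is needed.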

ml


On Thu, Jul 31, 2014 at 10:44 AM, Philo Yang <ud1...@gmail.com> wrote:

> Hi all,
>
> I have a cluster of 2.0.6 and one of my tables is like this:
> CREATE TABLE word (
>   user text,
>   word text,
>   flag double,
>   PRIMARY KEY (user, word)
> )
>
> Each "user" has about 10000 "word" rows per node. I have a requirement to
> select all rows where user='someuser' and word is in a large set whose
> size is about 1000.
>
> In the C* documentation, it is not recommended to use "select ... in" like:
>
> select * from word where user='someuser' and word in ('a','b','aa','ab',...)
>
> So now I select all rows where user='someuser' and filter them on the
> client rather than in C*. Of course, I use the Datastax Java Driver to page
> the result set with setFetchSize(1000). Is this the best way? I found that
> the system's load is high because of the large range query. Should I change
> to selecting only one row at a time and issue 1000 selects?
>
> just like:
> select * from word where user='someuser' and word = 'a';
> select * from word where user='someuser' and word = 'b';
> select * from word where user='someuser' and word = 'c';
> .....
>
> Which method will cause lower pressure on Cassandra cluster?
>
> Thanks,
> Philo Yang
>
>
