Duplicate result of get_indexed_slices, depending on indexClause.count

sam_ Thu, 14 Apr 2011 23:43:34 -0700

Hi All,

I have been using Cassandra 0.7.2 and 0.7.4 with the Thrift API (from Java).


I noticed that when I query a Column Family with indexed columns,
get_indexed_slices sometimes returns a duplicate row, depending on the
number of rows in the CF and the value that I set in IndexClause.count.
It also depends on the order of the rows in the CF.

For example consider the following CF that I call Attributes:

create column family Attributes with comparator=UTF8Type
        and column_metadata=[
                {column_name: range_id, validation_class: LongType, index_type: KEYS},
                {column_name: attr_key, validation_class: UTF8Type, index_type: KEYS},
                {column_name: attr_val, validation_class: BytesType, index_type: KEYS}
        ];

And suppose I have the following rows in the CF:

key         range_id    attr_key    attr_val
"1/@1/0",   1,          "A",        "1"
"1/5/0",    1,          "B",        "1000"
"3/@1/0",   2,          "A",        "1"
"3/5/0",    2,          "B",        "1001"
"5/@1/0",   3,          "A",        "2"
"5/5/0",    3,          "B",        "1002"
"7/@1/0",   4,          "A",        "2"
"7/5/0",    4,          "B",        "1003"

Now if I have a query with IndexClause like this (in pseudo code):

attr_key == "A" AND attr_val == "1"

with indexClause.count = 4;

Then I will get the rows with the following keys from get_indexed_slices:

"1/@1/0", "3/@1/0", "3/@1/0"

The last key is a duplicate!
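
For comparison, here is a small sketch of the result I would expect. It is
only a plain-Java model of the eight sample rows above and does not call
Cassandra or Thrift; the class and method names (ExpectedIndexedSlice,
matchingKeys) are made up for illustration, attr_val is modelled as a String
rather than bytes, and the rows are assumed to be visited in the order listed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the example query; hypothetical names, no Thrift calls.
final class ExpectedIndexedSlice {

    // One row of the Attributes CF from the table above.
    static final class Row {
        final String key;
        final long rangeId;
        final String attrKey;
        final String attrVal; // BytesType in the real CF, a String here for brevity

        Row(String key, long rangeId, String attrKey, String attrVal) {
            this.key = key;
            this.rangeId = rangeId;
            this.attrKey = attrKey;
            this.attrVal = attrVal;
        }
    }

    // The eight sample rows, in the order they are listed in the post.
    static List<Row> sampleRows() {
        return Arrays.asList(
            new Row("1/@1/0", 1, "A", "1"),
            new Row("1/5/0",  1, "B", "1000"),
            new Row("3/@1/0", 2, "A", "1"),
            new Row("3/5/0",  2, "B", "1001"),
            new Row("5/@1/0", 3, "A", "2"),
            new Row("5/5/0",  3, "B", "1002"),
            new Row("7/@1/0", 4, "A", "2"),
            new Row("7/5/0",  4, "B", "1003"));
    }

    // Keys of rows where attr_key == attrKey AND attr_val == attrVal,
    // returning at most `count` keys (the role of indexClause.count).
    static List<String> matchingKeys(List<Row> rows, String attrKey,
                                     String attrVal, int count) {
        List<String> keys = new ArrayList<String>();
        for (Row row : rows) {
            if (keys.size() >= count) {
                break;
            }
            if (row.attrKey.equals(attrKey) && row.attrVal.equals(attrVal)) {
                keys.add(row.key);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // prints [1/@1/0, 3/@1/0]
        System.out.println(matchingKeys(sampleRows(), "A", "1", 4));
    }
}
```

So with count = 4 I would expect exactly two keys, each appearing once.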

This is very sensitive to the order of rows in the CF, the number of rows,
and the value you set in indexClause.count. I noticed that the issue can
appear when the number of rows in the CF is twice indexClause.count (as in
the example above: 8 rows, count = 4), again depending on the order of rows
in the CF!

This seems to be a bug, and it occurs in both 0.7.2 and 0.7.4.

Is there a solution to this problem? 
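
As a possible stopgap, the duplicates could be filtered out on the client.
The following is only a sketch under the assumption that the row keys have
already been decoded to Strings; KeyDeduplicator and dedupe are hypothetical
names, not part of the Thrift API, and this hides the symptom rather than
fixing the underlying problem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Client-side workaround sketch; hypothetical helper, not a Cassandra API.
final class KeyDeduplicator {

    // Drops repeated keys while keeping the first occurrence of each key
    // in its original position (LinkedHashSet preserves insertion order).
    static List<String> dedupe(List<String> keys) {
        return new ArrayList<String>(new LinkedHashSet<String>(keys));
    }

    public static void main(String[] args) {
        // The keys I actually got back from get_indexed_slices.
        List<String> returned = Arrays.asList("1/@1/0", "3/@1/0", "3/@1/0");
        // prints [1/@1/0, 3/@1/0]
        System.out.println(dedupe(returned));
    }
}
```

Note that after de-duplication the result may contain fewer rows than
indexClause.count even when more matching rows exist, so this does not
replace a real fix.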

Many Thanks,
Sam





--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Duplicate-result-of-get-indexed-slices-depending-on-indexClause-count-tp6275394p6275394.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.
