Re: AW: problems while TimeUUIDType-index-querying with two expressions

2011-03-17 Thread Aaron Morton
Good work.

Aaron

On 17/03/2011, at 4:37 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Thanks for tracking that down, Roland.  I've created
 https://issues.apache.org/jira/browse/CASSANDRA-2347 to fix this.
 
 On Wed, Mar 16, 2011 at 10:37 AM, Roland Gude roland.g...@yoochoose.com 
 wrote:
 I have applied the suggested changes in my local source tree and ran all
 my test cases (the supplied ones as well as those with real data).

 They work now.
 
 
 
 From: Roland Gude [mailto:roland.g...@yoochoose.com]
 Sent: Wednesday, 16 March 2011 16:29
 To: user@cassandra.apache.org
 Subject: AW: AW: problems while TimeUUIDType-index-querying with two
 expressions
 
 
 
 While debugging, I found something that might be the issue (please
 correct me if I am wrong):

 In ColumnFamilyStore.java, lines 1597 to 1613 contain the code that checks
 whether a column satisfies an index expression.

 In line 1608 it compares the value of the column with the value given in
 the expression.

 For this comparison it uses the comparator of the column family, while it
 should use the comparator of the column's validation class.
 
 
 
     private static boolean satisfies(ColumnFamily data, IndexClause clause, IndexExpression first)
     {
         for (IndexExpression expression : clause.expressions)
         {
             // (we can skip first since we already know it's satisfied)
             if (expression == first)
                 continue;
             // check column data vs expression
             IColumn column = data.getColumn(expression.column_name);
             if (column == null)
                 return false;
             int v = data.getComparator().compare(column.value(), expression.value);
             if (!satisfies(v, expression.op))
                 return false;
         }
         return true;
     }
 
 
 
 
 
 Line 1608 should be changed from:

     int v = data.getComparator().compare(column.value(), expression.value);

 to

     int v = data.metadata().getValueValidator(expression.column_name).compare(column.value(), expression.value);
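
 To illustrate the difference between the two lines, here is a minimal,
 self-contained sketch. It does not use Cassandra's classes: the names
 ValueComparisonSketch, BYTES, TIME_UUID, compareWithNameComparator and
 compareWithValueValidator are made up for this example, and the two
 comparators only approximate what BytesType and TimeUUIDType do. It assumes
 a column family whose names are ordered as TimeUUIDs while the values are
 plain bytes. This would also fit the observation that a single expression
 works: the first expression is skipped, so the value comparison only runs
 for the second one.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Comparator;
import java.util.Map;
import java.util.UUID;

// Illustrative stand-in only; none of these names exist in Cassandra.
class ValueComparisonSketch {

    // Unsigned byte-wise ordering, in the spirit of BytesType.
    static final Comparator<byte[]> BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0)
                return cmp;
        }
        return Integer.compare(a.length, b.length);
    };

    // Strict ordering in the spirit of TimeUUIDType: compares by timestamp
    // and refuses anything that is not a 16-byte version 1 UUID.
    static final Comparator<byte[]> TIME_UUID =
            (a, b) -> Long.compare(timestampOf(a), timestampOf(b));

    static long timestampOf(byte[] raw) {
        if (raw.length != 16)
            throw new IllegalArgumentException("expected 16 bytes, got " + raw.length);
        ByteBuffer buf = ByteBuffer.wrap(raw);
        UUID uuid = new UUID(buf.getLong(), buf.getLong());
        if (uuid.version() != 1)
            throw new IllegalArgumentException("not a version 1 UUID");
        return uuid.timestamp();
    }

    // Assumption: column NAMES in this column family are ordered as TimeUUIDs.
    static final Comparator<byte[]> NAME_COMPARATOR = TIME_UUID;

    // Mirrors the original line: values are compared with the name comparator.
    static int compareWithNameComparator(byte[] columnValue, byte[] expressionValue) {
        return NAME_COMPARATOR.compare(columnValue, expressionValue);
    }

    // Mirrors the proposed line: values are compared with the per-column
    // validator, falling back to plain bytes if none is registered.
    static int compareWithValueValidator(Map<String, Comparator<byte[]>> validators,
                                         String columnName,
                                         byte[] columnValue,
                                         byte[] expressionValue) {
        return validators.getOrDefault(columnName, BYTES).compare(columnValue, expressionValue);
    }

    public static void main(String[] args) {
        byte[] stored = "mandator-42".getBytes(StandardCharsets.UTF_8);
        byte[] wanted = "mandator-42".getBytes(StandardCharsets.UTF_8);
        Map<String, Comparator<byte[]>> validators =
                Collections.singletonMap("mandator", BYTES);

        // Equal values compare as equal with the value validator.
        System.out.println(compareWithValueValidator(validators, "mandator", stored, wanted));

        // The name comparator rejects values that are not TimeUUIDs.
        try {
            compareWithNameComparator(stored, wanted);
        } catch (IllegalArgumentException e) {
            System.out.println("name comparator rejected the value: " + e.getMessage());
        }
    }
}
```

 The sketch only shows why the choice of comparator matters; the actual fix
 is the one-line change quoted above.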
 
 
 
 
 
 
 
 greetings roland
 
 
 
 
 
 From: Roland Gude [mailto:roland.g...@yoochoose.com]
 Sent: Wednesday, 16 March 2011 14:50
 To: user@cassandra.apache.org
 Subject: AW: AW: problems while TimeUUIDType-index-querying with two
 expressions
 
 
 
 Hi Aaron,
 
 
 
 Now I am completely confused.

 The code that did not work for days now works, like a miracle, even
 against the unpatched Cassandra 0.7.3, but the test case still does not.

 There seems to be some randomness in whether it works or not (which is a bad
 sign, I think). I will debug a little deeper into this and report anything I
 find.
 
 
 
 Greetings,
 
 roland
 
 
 
 From: aaron morton [mailto:aa...@thelastpickle.com]
 Sent: Wednesday, 16 March 2011 01:15
 To: user@cassandra.apache.org
 Subject: Re: AW: problems while TimeUUIDType-index-querying with two
 expressions
 
 
 
 I have attached a patch
 to https://issues.apache.org/jira/browse/CASSANDRA-2328

 Can you give it a try? You should now get an InvalidRequestException when
 you send an invalid name or value in the query expression.
 
 
 
 Aaron
 
 
 
 On 16 Mar 2011, at 10:30, aaron morton wrote:
 
 
 
 I will have the Jira I created finished soon. It's a legitimate issue: we
 should be validating the column names and values when a get_indexed_slices()
 request is sent. The error in your original email shows that.
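
 As a rough idea of what such an up-front check could look like, here is a
 small standalone sketch. It is not the patch attached to the ticket and does
 not use Cassandra's classes: IndexExpressionValidationSketch,
 validateTimeUuidName and the InvalidRequest exception are hypothetical names
 for this example (the real InvalidRequestException is a different class).
 The assumption is that a TimeUUID column name must be exactly 16 bytes and
 carry UUID version 1.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Illustrative only; these names are not part of Cassandra or Hector.
class IndexExpressionValidationSketch {

    // Hypothetical stand-in for the InvalidRequestException mentioned in the thread.
    static class InvalidRequest extends RuntimeException {
        InvalidRequest(String message) {
            super(message);
        }
    }

    // Rejects a column name before any comparison is attempted.
    static void validateTimeUuidName(byte[] name) {
        if (name == null || name.length != 16)
            throw new InvalidRequest("TimeUUID column name must be exactly 16 bytes");
        ByteBuffer buf = ByteBuffer.wrap(name);
        UUID uuid = new UUID(buf.getLong(), buf.getLong());
        if (uuid.version() != 1)
            throw new InvalidRequest("expected a version 1 UUID, got version " + uuid.version());
    }

    // Encodes a UUID as its 16 raw bytes (most significant bits first).
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    public static void main(String[] args) {
        // Hypothetical version 1 UUID, used only for this example.
        UUID name = UUID.fromString("11111111-2222-1333-8444-555555555555");
        validateTimeUuidName(toBytes(name)); // accepted

        try {
            validateTimeUuidName("mandator".getBytes(StandardCharsets.UTF_8));
        } catch (InvalidRequest e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

 Failing the request early like this gives the client a clear error instead
 of an exception that only shows up in the server log.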
 
 
 
 WRT your code example. You are using the TimeUUID Validator for the column
 name when creating the index expression, but are using a string serialiser
 for the value...
 
     IndexedSlicesQuery<String, UUID, String> indexQuery = HFactory
         .createIndexedSlicesQuery(keyspace, stringSerializer,
             UUID_SERIALIZER, stringSerializer);
     indexQuery.addEqualsExpression(MANDATOR_UUID, mandator);
 
 But your schema says it is a bytes type...

 column_metadata=[{column_name: --1000--,
 validation_class: BytesType, index_name: mandatorIndex, index_type: KEYS},
 {column_name: 0001--1000--, validation_class:
 BytesType, index_name: useridIndex, index_type: KEYS}];

 On 15 Mar 2011, at 22:41,
 
 
 
 Once I have the patch, can you apply it and run your test again?

 You may also want to ask on the Hector list whether it automagically checks
 that you are using the correct types when creating an IndexedSlicesQuery.
 
 
 
 Aaron
 
 
 
 Roland Gude wrote:
 
 
 
 I forgot to attach the source code; here it is.
 
 
 
 From: Roland Gude [mailto:roland.g...@yoochoose.com]
 Sent: Tuesday, 15 March 2011 10:39
 To: user@cassandra.apache.org
 Subject: AW: problems while TimeUUIDType-index-querying with two expressions
 
 
 
 Actually, it's not the column values that should be UUIDs in our case, but the
 column keys. The CF uses

Re: AW: problems while TimeUUIDType-index-querying with two expressions

2011-03-16 Thread Jonathan Ellis
Thanks for tracking that down, Roland.  I've created
https://issues.apache.org/jira/browse/CASSANDRA-2347 to fix this.

On Wed, Mar 16, 2011 at 10:37 AM, Roland Gude roland.g...@yoochoose.com wrote:
 I have applied the suggested changes in my local source tree and ran all
 my test cases (the supplied ones as well as those with real data).

 They work now.



 From: Roland Gude [mailto:roland.g...@yoochoose.com]
 Sent: Wednesday, 16 March 2011 16:29
 To: user@cassandra.apache.org
 Subject: AW: AW: problems while TimeUUIDType-index-querying with two
 expressions



 While debugging, I found something that might be the issue (please
 correct me if I am wrong):

 In ColumnFamilyStore.java, lines 1597 to 1613 contain the code that checks
 whether a column satisfies an index expression.

 In line 1608 it compares the value of the column with the value given in
 the expression.

 For this comparison it uses the comparator of the column family, while it
 should use the comparator of the column's validation class.



     private static boolean satisfies(ColumnFamily data, IndexClause clause, IndexExpression first)
     {
         for (IndexExpression expression : clause.expressions)
         {
             // (we can skip first since we already know it's satisfied)
             if (expression == first)
                 continue;
             // check column data vs expression
             IColumn column = data.getColumn(expression.column_name);
             if (column == null)
                 return false;
             int v = data.getComparator().compare(column.value(), expression.value);
             if (!satisfies(v, expression.op))
                 return false;
         }
         return true;
     }





 Line 1608 should be changed from:

     int v = data.getComparator().compare(column.value(), expression.value);

 to

     int v = data.metadata().getValueValidator(expression.column_name).compare(column.value(), expression.value);







 greetings roland





 From: Roland Gude [mailto:roland.g...@yoochoose.com]
 Sent: Wednesday, 16 March 2011 14:50
 To: user@cassandra.apache.org
 Subject: AW: AW: problems while TimeUUIDType-index-querying with two
 expressions



 Hi Aaron,



 Now I am completely confused.

 The code that did not work for days now works, like a miracle, even
 against the unpatched Cassandra 0.7.3, but the test case still does not.

 There seems to be some randomness in whether it works or not (which is a bad
 sign, I think). I will debug a little deeper into this and report anything I
 find.



 Greetings,

 roland



 From: aaron morton [mailto:aa...@thelastpickle.com]
 Sent: Wednesday, 16 March 2011 01:15
 To: user@cassandra.apache.org
 Subject: Re: AW: problems while TimeUUIDType-index-querying with two
 expressions



 I have attached a patch
 to https://issues.apache.org/jira/browse/CASSANDRA-2328

 Can you give it a try? You should now get an InvalidRequestException when
 you send an invalid name or value in the query expression.



 Aaron



 On 16 Mar 2011, at 10:30, aaron morton wrote:



 I will have the Jira I created finished soon. It's a legitimate issue: we
 should be validating the column names and values when a get_indexed_slices()
 request is sent. The error in your original email shows that.



 WRT your code example. You are using the TimeUUID Validator for the column
 name when creating the index expression, but are using a string serialiser
 for the value...

     IndexedSlicesQuery<String, UUID, String> indexQuery = HFactory
         .createIndexedSlicesQuery(keyspace, stringSerializer,
             UUID_SERIALIZER, stringSerializer);
     indexQuery.addEqualsExpression(MANDATOR_UUID, mandator);

 But your schema says it is a bytes type...

 column_metadata=[{column_name: --1000--,
 validation_class: BytesType, index_name: mandatorIndex, index_type: KEYS},
 {column_name: 0001--1000--, validation_class:
 BytesType, index_name: useridIndex, index_type: KEYS}];

 On 15 Mar 2011, at 22:41,



 Once I have the patch, can you apply it and run your test again?

 You may also want to ask on the Hector list whether it automagically checks
 that you are using the correct types when creating an IndexedSlicesQuery.



 Aaron



 Roland Gude wrote:



 I forgot to attach the source code; here it is.



 From: Roland Gude [mailto:roland.g...@yoochoose.com]
 Sent: Tuesday, 15 March 2011 10:39
 To: user@cassandra.apache.org
 Subject: AW: problems while TimeUUIDType-index-querying with two expressions



 Actually, it's not the column values that should be UUIDs in our case, but the
 column keys. The CF uses TimeUUID ordering and the values are just some
 byte arrays. Even after changing the code to use UUIDSerializer instead of
 serializing the UUIDs manually, the issue still exists.



 As far as I can see

AW: problems while TimeUUIDType-index-querying with two expressions

2011-03-15 Thread Roland Gude
Actually, it's not the column values that should be UUIDs in our case, but the
column keys. The CF uses TimeUUID ordering and the values are just some
byte arrays. Even after changing the code to use UUIDSerializer instead of
serializing the UUIDs manually, the issue still exists.

As far as I can see, there is nothing wrong with the IndexExpression.
Using two index expressions with key=TimeUUID and value=anything does not work.
Using one index expression (either of the two) alone works fine.

I refactored Johannes's code into a JUnit test case. It needs the cluster
configured as described in Johannes's mail.
There are three cases: two with one of the index expressions each, and one
with both index expressions. The one with both index expressions will never
finish, and you will see the exception in the Cassandra logs.

Bye,
roland

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Tuesday, 15 March 2011 07:54
To: user@cassandra.apache.org
Cc: Juergen Link; Roland Gude; her...@datastax.com
Subject: Re: problems while TimeUUIDType-index-querying with two expressions

Perfectly reasonable, created 
https://issues.apache.org/jira/browse/CASSANDRA-2328

Aaron
On 15 Mar 2011, at 16:52, Jonathan Ellis wrote:


Sounds like we should send an InvalidRequestException then.

On Mon, Mar 14, 2011 at 8:06 PM, aaron morton aa...@thelastpickle.com wrote:

It's failing when comparing two TimeUUID values because one of them is not
properly formatted. In this case it's comparing a stored value with the
value passed in the get_indexed_slices() query expression.
I'm going to assume it's the value passed for the expression.
When you create the IndexedSlicesQuery, this is incorrect:

    IndexedSlicesQuery<String, byte[], byte[]> indexQuery = HFactory
        .createIndexedSlicesQuery(keyspace,
            stringSerializer, bytesSerializer, bytesSerializer);

Use a UUIDSerializer for the last param and then pass the UUID you want to
build the expression with, rather than the string/byte thing you are passing.
Hope that helps.
Aaron
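
As a small illustration of why the encoding matters, the sketch below
contrasts the 16 raw bytes of a UUID with the bytes of its string form. It
uses only the JDK, not Hector's serializers; the class name
UuidEncodingSketch, its helper methods and the example UUID are made up for
this illustration. A comparator that expects 16-byte TimeUUIDs cannot make
sense of the 36-byte string form.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Illustrative only; Hector's UUIDSerializer is not used here.
class UuidEncodingSketch {

    // The binary form: 16 bytes, most significant bits first.
    static byte[] asUuidBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // The textual form serialized as a string: 36 bytes of characters.
    static byte[] asStringBytes(UUID uuid) {
        return uuid.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Decodes the 16-byte binary form back into a UUID.
    static UUID fromUuidBytes(byte[] raw) {
        if (raw.length != 16)
            throw new IllegalArgumentException("expected 16 bytes, got " + raw.length);
        ByteBuffer buf = ByteBuffer.wrap(raw);
        return new UUID(buf.getLong(), buf.getLong());
    }

    public static void main(String[] args) {
        // Hypothetical version 1 UUID, used only for this example.
        UUID id = UUID.fromString("11111111-2222-1333-8444-555555555555");

        System.out.println(asUuidBytes(id).length);   // prints 16
        System.out.println(asStringBytes(id).length); // prints 36

        // The binary form round-trips; the string bytes are rejected.
        System.out.println(fromUuidBytes(asUuidBytes(id)).equals(id)); // prints true
        try {
            fromUuidBytes(asStringBytes(id));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```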
On 15 Mar 2011, at 04:17, Johannes Hoerle wrote:

Hi all,

In order to improve our queries, we started to use IndexedSlicesQuery from
the Hector project (https://github.com/zznate/hector-examples). I followed
the instructions for creating an IndexedSlicesQuery with
GetIndexedSlices.java.
I created the corresponding CF within a keyspace called Keyspace1
(create keyspace Keyspace1;) with:
create column family Indexed1 with column_type='Standard' and
comparator='UTF8Type' and keys_cached=20 and read_repair_chance=1.0 and
rows_cached=2 and column_metadata=[{column_name: birthdate,
validation_class: LongType, index_name: dateIndex, index_type:
KEYS},{column_name: birthmonth, validation_class: LongType, index_name:
monthIndex, index_type: KEYS}];
and the example GetIndexedSlices.java worked fine.

Output of CF Indexed1:
---
[default@Keyspace1] list Indexed1;
Using default limit of 100
---
RowKey: fake_key_12
=> (column=birthdate, value=1974, timestamp=1300110485826059)
=> (column=birthmonth, value=0, timestamp=1300110485826060)
=> (column=fake_column_0, value=66616b655f76616c75655f305f3132,
timestamp=1300110485826056)
=> (column=fake_column_1, value=66616b655f76616c75655f315f3132,
timestamp=1300110485826057)
=> (column=fake_column_2, value=66616b655f76616c75655f325f3132,
timestamp=1300110485826058)
---
RowKey: fake_key_8
=> (column=birthdate, value=1974, timestamp=1300110485826039)
=> (column=birthmonth, value=8, timestamp=1300110485826040)
=> (column=fake_column_0, value=66616b655f76616c75655f305f38,
timestamp=1300110485826036)
=> (column=fake_column_1, value=66616b655f76616c75655f315f38,
timestamp=1300110485826037)
=> (column=fake_column_2, value=66616b655f76616c75655f325f38,
timestamp=1300110485826038)
---



Now to the problem:
As we have a different column format in our cluster (using TimeUUIDType as
the comparator in the CF definition), I adapted the application to our schema
on a cassandra-0.7.3 cluster.
We use a manually defined UUID for a mandator id index
(--1000--) and another one for a userid index
(0001--1000--). It can be created with:
create column family ByUser with column_type='Standard' and
comparator='TimeUUIDType' and keys_cached=20 and read_repair_chance=1.0
and rows_cached=2 and column_metadata=[{column_name:
--1000--, validation_class: BytesType,
index_name: mandatorIndex, index_type: KEYS}, {column_name:
0001--1000--, validation_class: BytesType,
index_name: useridIndex, index_type: KEYS}];


In the cluster, using cassandra-cli, it looks like this:

[default@Keyspace1] describe keyspace;
Keyspace: Keyspace1:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
Replication Factor: 1
  Column Families:
ColumnFamily: ByUser
  Columns sorted by: