Re: AW: problems while TimeUUIDType-index-querying with two expressions
Good work.

Aaron

On 17/03/2011, at 4:37 PM, Jonathan Ellis jbel...@gmail.com wrote:

Thanks for tracking that down, Roland. I've created https://issues.apache.org/jira/browse/CASSANDRA-2347 to fix this.

On Wed, Mar 16, 2011 at 10:37 AM, Roland Gude roland.g...@yoochoose.com wrote:

I have applied the suggested changes in my local source tree and ran all my test cases (the supplied ones as well as those with real data). They work now.

From: Roland Gude [mailto:roland.g...@yoochoose.com]
Sent: Wednesday, 16 March 2011 16:29
To: user@cassandra.apache.org
Subject: AW: AW: problems while TimeUUIDType-index-querying with two expressions

While debugging this I found something that might be the issue (please correct me if I am wrong):

In ColumnFamilyStore.java, lines 1597 to 1613 contain the code that checks whether a column satisfies an index expression. Line 1608 compares the value of the column with the value given in the expression. For this comparison it uses the comparator of the column family, while it should use the comparator of the column's validation class.

    private static boolean satisfies(ColumnFamily data, IndexClause clause, IndexExpression first)
    {
        for (IndexExpression expression : clause.expressions)
        {
            // (we can skip first since we already know it's satisfied)
            if (expression == first)
                continue;
            // check column data vs expression
            IColumn column = data.getColumn(expression.column_name);
            if (column == null)
                return false;
            int v = data.getComparator().compare(column.value(), expression.value);
            if (!satisfies(v, expression.op))
                return false;
        }
        return true;
    }

Line 1608 should be changed from:

    int v = data.getComparator().compare(column.value(), expression.value);

to

    int v = data.metadata().getValueValidator(expression.column_name).compare(column.value(), expression.value);

Greetings,
roland

From: Roland Gude [mailto:roland.g...@yoochoose.com]
Sent: Wednesday, 16 March 2011 14:50
To: user@cassandra.apache.org
Subject: AW: AW: problems while TimeUUIDType-index-querying with two expressions

Hi Aaron,

now I am completely confused. The code that did not work for days now works, like a miracle, even against the unpatched Cassandra 0.7.3, but the test case still does not… There seems to be some randomness in whether it works or not (which is a bad sign, I think)… I will debug a little deeper into this and report anything I find.

Greetings,
roland

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Wednesday, 16 March 2011 01:15
To: user@cassandra.apache.org
Subject: Re: AW: problems while TimeUUIDType-index-querying with two expressions

Have attached a patch to https://issues.apache.org/jira/browse/CASSANDRA-2328

Can you give it a try? You should now get an InvalidRequestException when you send an invalid name or value in the query expression.

Aaron

On 16 Mar 2011, at 10:30, aaron morton wrote:

Will have the Jira I created finished soon. It's a legitimate issue: we should be validating the column names and values when a get_indexed_slices() request is sent. The error in your original email shows that.

WRT your code example: you are using the TimeUUID validator for the column name when creating the index expression, but are using a string serialiser for the value...

    IndexedSlicesQuery<String, UUID, String> indexQuery = HFactory
        .createIndexedSlicesQuery(keyspace, stringSerializer, UUID_SERIALIZER, stringSerializer);
    indexQuery.addEqualsExpression(MANDATOR_UUID, mandator);

But your schema is saying it is a bytes type...

    column_metadata=[{column_name: --1000--, validation_class: BytesType, index_name: mandatorIndex, index_type: KEYS},
                     {column_name: 0001--1000--, validation_class: BytesType, index_name: useridIndex, index_type: KEYS}];

Once I have the patch, can you apply it and run your test again? You may also want to ask on the Hector list whether it automagically checks that you are using the correct types when creating an IndexedSlicesQuery.

Aaron

On 15 Mar 2011, at 22:41, Roland Gude wrote:

Forgot to attach the source code… here it comes

From: Roland Gude [mailto:roland.g...@yoochoose.com]
Sent: Tuesday, 15 March 2011 10:39
To: user@cassandra.apache.org
Subject: AW: problems while TimeUUIDType-index-querying with two expressions

Actually it's not the column values that should be UUIDs in our case, but the column keys. The CF uses
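To illustrate the comparator mix-up Roland describes in this thread (values being compared with the column family's name comparator instead of the column's value validator), here is a minimal, self-contained Java sketch. It is not Cassandra code: the class `ValidatorVsComparatorSketch`, the two comparators and the `matches` helper are hypothetical stand-ins, and the strict 16-byte check is an assumption about how a TimeUUID-style comparator might reject arbitrary byte values. How the real comparator behaves on malformed input may differ.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.UUID;

// Hypothetical stand-ins only: none of these names are Cassandra or Hector APIs.
class ValidatorVsComparatorSketch {

    // Decodes 16 raw bytes into a UUID and rejects anything else.
    static UUID toUuid(byte[] raw) {
        if (raw.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + raw.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(raw);
        return new UUID(buf.getLong(), buf.getLong());
    }

    // Stand-in for the column family's NAME comparator (TimeUUID ordering).
    // UUID.timestamp() is only defined for version-1 (time-based) UUIDs.
    static final Comparator<byte[]> TIME_UUID_ORDER =
        (a, b) -> Long.compare(toUuid(a).timestamp(), toUuid(b).timestamp());

    // Stand-in for the column's VALUE validator (plain unsigned byte ordering).
    static final Comparator<byte[]> BYTES_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    // Equality check in the spirit of satisfies(): stored value vs. expression value.
    static boolean matches(Comparator<byte[]> order, byte[] stored, byte[] wanted) {
        return order.compare(stored, wanted) == 0;
    }

    public static void main(String[] args) {
        // Made-up column value; in the thread the values are arbitrary byte arrays.
        byte[] stored = "mandator-42".getBytes(StandardCharsets.UTF_8);
        byte[] wanted = "mandator-42".getBytes(StandardCharsets.UTF_8);

        // Comparing values with the value validator works.
        System.out.println(matches(BYTES_ORDER, stored, wanted));

        // Comparing the same values with the name comparator fails here,
        // because the values are not 16-byte UUIDs.
        try {
            matches(TIME_UUID_ORDER, stored, wanted);
        } catch (IllegalArgumentException e) {
            System.out.println("name comparator rejected the value: " + e.getMessage());
        }
    }
}
```

In this sketch the fix amounts to passing the value ordering instead of the name ordering to `matches`, which mirrors the one-line change proposed for `satisfies()`.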
AW: problems while TimeUUIDType-index-querying with two expressions
Actually it's not the column values that should be UUIDs in our case, but the column keys. The CF uses TimeUUID ordering and the values are just some byte arrays. Even after changing the code to use UUIDSerializer instead of serializing the UUIDs manually, the issue still exists. As far as I can see, there is nothing wrong with the IndexExpression.

Using two index expressions with key=TimeUUID and value=anything does not work. Using one index expression (either of the two) alone works fine.

I refactored Johannes's code into a JUnit test case. It needs the cluster configured as described in Johannes's mail. There are three cases: two with one of the index expressions each, and one with both index expressions. The one with both index expressions will never finish, and you will see the exception in the Cassandra logs.

Bye,
roland

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Tuesday, 15 March 2011 07:54
To: user@cassandra.apache.org
Cc: Juergen Link; Roland Gude; her...@datastax.com
Subject: Re: problems while TimeUUIDType-index-querying with two expressions

Perfectly reasonable, created https://issues.apache.org/jira/browse/CASSANDRA-2328

Aaron

On 15 Mar 2011, at 16:52, Jonathan Ellis wrote:

Sounds like we should send an InvalidRequestException then.

On Mon, Mar 14, 2011 at 8:06 PM, aaron morton aa...@thelastpickle.com wrote:

It's failing when comparing two TimeUUID values because one of them is not properly formatted. In this case it's comparing a stored value with the value passed in the get_indexed_slices() query expression. I'm going to assume it's the value passed for the expression.

When you create the IndexedSlicesQuery, this is incorrect:

    IndexedSlicesQuery<String, byte[], byte[]> indexQuery = HFactory
        .createIndexedSlicesQuery(keyspace, stringSerializer, bytesSerializer, bytesSerializer);

Use a UUIDSerializer for the last param and then pass the UUID you want when building the expression, rather than the string/byte thing you are passing.

Hope that helps.
Aaron

On 15 Mar 2011, at 04:17, Johannes Hoerle wrote:

Hi all,

in order to improve our queries, we started to use IndexedSlicesQuery from the Hector project (https://github.com/zznate/hector-examples). I followed the instructions for creating an IndexedSlicesQuery with GetIndexedSlices.java. I created the corresponding CF within a keyspace called Keyspace1 (create keyspace Keyspace1;) with:

    create column family Indexed1
        with column_type='Standard'
        and comparator='UTF8Type'
        and keys_cached=20
        and read_repair_chance=1.0
        and rows_cached=2
        and column_metadata=[{column_name: birthdate, validation_class: LongType, index_name: dateIndex, index_type: KEYS},
                             {column_name: birthmonth, validation_class: LongType, index_name: monthIndex, index_type: KEYS}];

and the example GetIndexedSlices.java worked fine.

Output of CF Indexed1:

    ---
    [default@Keyspace1] list Indexed1;
    Using default limit of 100
    ---
    RowKey: fake_key_12
    => (column=birthdate, value=1974, timestamp=1300110485826059)
    => (column=birthmonth, value=0, timestamp=1300110485826060)
    => (column=fake_column_0, value=66616b655f76616c75655f305f3132, timestamp=1300110485826056)
    => (column=fake_column_1, value=66616b655f76616c75655f315f3132, timestamp=1300110485826057)
    => (column=fake_column_2, value=66616b655f76616c75655f325f3132, timestamp=1300110485826058)
    ---
    RowKey: fake_key_8
    => (column=birthdate, value=1974, timestamp=1300110485826039)
    => (column=birthmonth, value=8, timestamp=1300110485826040)
    => (column=fake_column_0, value=66616b655f76616c75655f305f38, timestamp=1300110485826036)
    => (column=fake_column_1, value=66616b655f76616c75655f315f38, timestamp=1300110485826037)
    => (column=fake_column_2, value=66616b655f76616c75655f325f38, timestamp=1300110485826038)
    ---

Now to the problem: as we have another column format in our cluster (using TimeUUIDType as comparator in the CF definition), I adapted the application to our schema on a cassandra-0.7.3 cluster.
We use a manually defined UUID for a mandator id index (--1000--) and another one for a userid index (0001--1000--). It can be created with:

    create column family ByUser
        with column_type='Standard'
        and comparator='TimeUUIDType'
        and keys_cached=20
        and read_repair_chance=1.0
        and rows_cached=2
        and column_metadata=[{column_name: --1000--, validation_class: BytesType, index_name: mandatorIndex, index_type: KEYS},
                             {column_name: 0001--1000--, validation_class: BytesType, index_name: useridIndex, index_type: KEYS}];

which looks like this in the cluster using cassandra-cli:

    [default@Keyspace1] describe keyspace;
    Keyspace: Keyspace1:
      Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
        Replication Factor: 1
      Column Families:
        ColumnFamily: ByUser
          Columns sorted by: