RE: Secondary index issue, unable to query for records that should be there

2011-11-08 Thread Nate Sammons
This is against a single server, not a cluster.  Replication factor for the 
keyspace is set to 1, CL is the default for Hector, which I think is QUORUM.
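
For reference, QUORUM means a strict majority of the replicas, so with a replication factor of 1 it waits for exactly one replica, the same as ONE. A minimal sketch of that arithmetic (the function name `quorum` is just for illustration and is not part of Hector or Cassandra):

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: a strict majority of the replicas."""
    return replication_factor // 2 + 1


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: QUORUM waits for {quorum(rf)} replica(s)")
```

On a single node with RF=1, the choice between ONE and QUORUM should therefore make no difference to which rows are visible.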

I'm trying to get a simple test together that shows this.  Does anyone know if 
multiple indexes like this are efficient?

Thanks,

-nate



Re: Secondary index issue, unable to query for records that should be there

2011-11-08 Thread Jake Luciani
Hi Nate,

Could you try running it with debug enabled on the logs? It will give more
insight into what's going on.

-Jake




-- 
http://twitter.com/tjake


RE: Secondary index issue, unable to query for records that should be there

2011-11-08 Thread Nate Sammons
Here is a simple test that shows the problem.  My setup is:


-  DSE 1.0.3 on Ubuntu 11.04, JDK 1.6.0_29 on x86_64, installed from
   the DataStax Debian repo (yesterday)

-  Hector 1.0-1 (from Maven)

Attached is a CLI file to create the keyspace and CF, and a java file to insert 
data and do some queries.


This creates the following CF:

create column family IndexTest with
  key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = [
  {column_name:year, validation_class:IntegerType, index_type: KEYS},
  {column_name:month, validation_class:IntegerType, index_type: KEYS},
  {column_name:day, validation_class:IntegerType, index_type: KEYS},
  {column_name:hour, validation_class:IntegerType, index_type: KEYS},
  {column_name:minute, validation_class:IntegerType, index_type: KEYS},
  {column_name:data, validation_class:UTF8Type}
  ];


Then inserts 5 rows per minute value, with the following values for 
year/month/day/hour/minute:

Year: 2011
Month: 1, 2
Day: 1-15
Hour: 1-23
Minute: 1-59

For a total of 203,550 rows.  For queries, it picks some known values for 
year/month/day/hour/minute at random and looks for rows; there should be 5 rows 
per combination.

Row keys are of the form YEAR-MONTH-DAY-HOUR-MINUTE--NUM (where NUM is 1-5), 
with a double hyphen before NUM.
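
As a sanity check on the row count, here is a small Python sketch that enumerates the same combinations. It is a re-creation for illustration, not the attached Java file; the double hyphen before NUM is taken from the keys that appear in the CLI output in this message.

```python
from itertools import product

YEARS = [2011]
MONTHS = [1, 2]
DAYS = range(1, 16)      # 1-15
HOURS = range(1, 24)     # 1-23
MINUTES = range(1, 60)   # 1-59
NUMS = range(1, 6)       # 5 rows per minute value


def row_keys():
    """Yield one row key per year/month/day/hour/minute/num combination."""
    for y, mo, d, h, mi, n in product(YEARS, MONTHS, DAYS, HOURS, MINUTES, NUMS):
        yield f"{y}-{mo}-{d}-{h}-{mi}--{n}"


keys = list(row_keys())
print(len(keys))  # 1 * 2 * 15 * 23 * 59 * 5 = 203550
```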


Now once that data is inserted, using the CLI I can find records such as the 
following:


[default@Test] get IndexTest[2011-1-8-18-30--1];
=> (column=data, 
value=xvktwirapi0qs0ta29w9rchbdc2omsuv0k2chjqp9pmaodlj9ngecllaa8eq3nnx66p591b2a06mry4rpsvkd54ji5pbxikpc6mxj4czi4nuuxgoasibjd5yk65hdtqe8a0uq3yxnw81dgq6hkx8wnbs177rwo51xtkwuhwizoc0gul92pvo6tfivjgdschd9fjzfu4v1d1uxhih3argr1mp4i1h6fqybfv2utlzdzzqczq3ruu90647prrnqwdw1zqmd46ia175a929ltx2hoz8sv6rs817zm2myhp3wekfk3flnuniqgtpth7g5fns8q3oc8qde5btivt1j99gc1h2kxjbek1p448t1hs91lh9r6yrg1douj53sn7d81bnwp4nnbmz01dbr46fae1b9ter0zljet2nl1x751no6pdt64k2mdh0un01gerfihak6vn0wdvgzuv9soji3pwgnffkw2zvm5q0jlp1uf9nmy7gzswydpxwtvc35c6jw64d,
 timestamp=1320769482652005)
=> (column=day, value=8, timestamp=1320769482652002)
=> (column=hour, value=18, timestamp=1320769482652003)
=> (column=minute, value=30, timestamp=1320769482652004)
=> (column=month, value=1, timestamp=1320769482652001)
=> (column=year, value=2011, timestamp=1320769482652000)
Returned 6 results.


However, a query on the indexed columns for that same record returns nothing:

[default@Test] get IndexTest where year=2011 and month=1 and day=8 and hour=18 
and minute=30;

0 Row Returned.
[default@Test] get IndexTest where year=2011 and month=1 and day=8 and hour=18;

0 Row Returned.
[default@Test] get IndexTest where year=2011 and month=1 and day=8;

0 Row Returned.
[default@Test] get IndexTest where year=2011 and month=1;


Similar results using CQLSH:

cqlsh> select * from IndexTest where year=2011 and month=1 and day=8 and hour=18 and minute=30;
cqlsh> select * from IndexTest where year=2011 and month=1 and day=8 and hour=18;
cqlsh> select * from IndexTest where year=2011 and month=1 and day=8;

(no results in any of those cases).




However, some data does show up through CQL (I omitted the column data for 
brevity):

[default@Test] get IndexTest where year=2011 and month=2 and day=8 and hour=18 
and minute=30;
---
RowKey: 2011-2-8-18-30--1
---
RowKey: 2011-2-8-18-30--4
---
RowKey: 2011-2-8-18-30--5
---
RowKey: 2011-2-8-18-30--2
---
RowKey: 2011-2-8-18-30--3

5 Rows Returned.


So it seems like (in this case) month=1 is not working, but month=2 does work 
(along with the other parts of the expression).  I haven't repeated this enough 
times to be sure that it is always the case, but it seems to be.


When running those queries using Hector, in the debugger the QueryResult's 
get() method returns null where it should return rows.



Thanks,

-nate




RE: Secondary index issue, unable to query for records that should be there

2011-11-08 Thread Nate Sammons
I restarted with logging turned up to DEBUG, and after quite a bit of logging 
during startup, I re-ran a query:


get IndexTest where year=2011 and month=1 and day=14 and hour=18 and minute=49;


produced the following in the log:

DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,823 CassandraServer.java (line 728) 
scan
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,824 StorageProxy.java (line 1017) 
restricted ranges for query [-1,-1] are 
[[-1,160425280223280959086247334056682279392], 
(160425280223280959086247334056682279392,-1]]
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,824 StorageProxy.java (line 1104) 
scan ranges are 
[-1,160425280223280959086247334056682279392],(160425280223280959086247334056682279392,-1]
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,825 ReadCallback.java (line 77) 
Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,826 StorageProxy.java (line 1131) 
reading org.apache.cassandra.db.IndexScanCommand@7bc203c from 
natebookpro/127.0.1.1
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 96) 
Primary scan clause is minute
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 109) 
Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 151) 
Scanning index 'IndexTest.minute EQ 49' starting with
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 CollationController.java (line 
189) collectAllData
DEBUG [ReadStage:46] 2011-11-08 10:19:21,828 KeysSearcher.java (line 163) 
fetched null
DEBUG [ReadStage:46] 2011-11-08 10:19:21,828 IndexScanVerbHandler.java (line 
46) Sending RangeSliceReply{rows=} to 808@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:21] 2011-11-08 10:19:21,829 
ResponseVerbHandler.java (line 44) Processing response on a callback from 
808@natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,829 ReadCallback.java (line 77) 
Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,830 StorageProxy.java (line 1131) 
reading org.apache.cassandra.db.IndexScanCommand@6a25a21d from 
natebookpro/127.0.1.1
DEBUG [ReadStage:47] 2011-11-08 10:19:21,831 KeysSearcher.java (line 96) 
Primary scan clause is minute
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 109) 
Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 151) 
Scanning index 'IndexTest.minute EQ 49' starting with
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 CollationController.java (line 
189) collectAllData
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 163) 
fetched null
DEBUG [ReadStage:47] 2011-11-08 10:19:21,833 IndexScanVerbHandler.java (line 
46) Sending RangeSliceReply{rows=} to 809@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:22] 2011-11-08 10:19:21,834 
ResponseVerbHandler.java (line 44) Processing response on a callback from 
809@natebookpro/127.0.1.1



Whereas a direct read of a key using get IndexTest[2011-1-14-18-49--1]; 
produced a result, and the following in the logs:

DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,153 CassandraServer.java (line 323) 
get_slice
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 StorageProxy.java (line 623) 
Command/ConsistencyLevel is SliceFromReadCommand(table='Test', 
key='323031312d312d31342d31382d34392d2d31', 
column_parent='QueryPath(columnFamilyName='IndexTest', superColumnName='null', 
columnName='null')', start='', finish='', reversed=false, count=100)/ONE
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 ReadCallback.java (line 77) 
Blockfor/repair is 1/true; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 StorageProxy.java (line 639) 
reading data locally
DEBUG [ReadStage:37] 2011-11-08 10:11:20,160 StorageProxy.java (line 792) 
LocalReadRunnable reading SliceFromReadCommand(table='Test', 
key='323031312d312d31342d31382d34392d2d31', 
column_parent='QueryPath(columnFamilyName='IndexTest', superColumnName='null', 
columnName='null')', start='', finish='', reversed=false, count=100)
DEBUG [ReadStage:37] 2011-11-08 10:11:20,161 CollationController.java (line 
189) collectAllData
DEBUG [ReadStage:37] 2011-11-08 10:11:20,162 SliceQueryFilter.java (line 123) 
collecting 0 of 100: data:false:512@1320769510502017
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) 
collecting 1 of 100: day:false:4@1320769510502014
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) 
collecting 2 of 100: hour:false:4@1320769510502015
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) 
collecting 3 of 100: minute:false:4@1320769510502016
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) 
collecting 4 of 100: month:false:4@1320769510502013

RE: Secondary index issue, unable to query for records that should be there

2011-11-08 Thread Nate Sammons
Note that I had identical behavior using a fresh download of Cassandra 1.0.2 as 
of today.

Thanks,

-nate



RE: Secondary index issue, unable to query for records that should be there

2011-11-08 Thread Nate Sammons
Interesting...  if I switch the columns to be UTF8 instead of integers, like 
this:

create column family IndexTest with
  key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = [
  {column_name:year, validation_class:UTF8Type, index_type: KEYS},
  {column_name:month, validation_class:UTF8Type, index_type: KEYS},
  {column_name:day, validation_class:UTF8Type, index_type: KEYS},
  {column_name:hour, validation_class:UTF8Type, index_type: KEYS},
  {column_name:minute, validation_class:UTF8Type, index_type: KEYS},
  {column_name:data, validation_class:UTF8Type}
  ];


And change the Hector code to use setString(...) instead of setInteger(...).

Then everything works fine.  Is there a CQL bug with respect to non-string 
columns?
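
One thing that may be worth checking (a guess, not a confirmed diagnosis): in the DEBUG output posted earlier, entries such as `day:false:4@...` appear to show a value size of 4 bytes, which would fit a fixed-width 32-bit integer. Cassandra's IntegerType is an arbitrary-precision integer, so a client that encodes a literal like 1 in its shortest form would send a single byte instead. The sketch below only shows that the two encodings differ in length for the same number; whether that difference is what breaks the index lookup is not established in this thread. The helper names are made up for illustration.

```python
import struct


def fixed_int32(n: int) -> bytes:
    """4-byte big-endian two's complement, as a fixed-width int encoding."""
    return struct.pack(">i", n)


def minimal_varint(n: int) -> bytes:
    """Shortest big-endian two's complement encoding of n."""
    # Bits needed excluding the sign bit; negatives use the complement.
    bits = n.bit_length() if n >= 0 else (~n).bit_length()
    return n.to_bytes(bits // 8 + 1, "big", signed=True)


for value in (1, 2, 49, 2011):
    print(value, fixed_int32(value).hex(), minimal_varint(value).hex())
```

Both forms decode to the same number, but their raw bytes are not equal, so any lookup that compares raw bytes would treat them as different values.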


Thanks,

-nate




Secondary index issue, unable to query for records that should be there

2011-11-07 Thread Nate Sammons
Hello,

I'm experimenting with Cassandra (DataStax Enterprise 1.0.3), and I've got a CF 
with several secondary indexes to try out some options.  Right now I have the 
following to create my CF using the CLI:

create column family MyTest with
  key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = [
  -- absolute timestamp for this message, also indexed year/month/day/hour/minute
  -- index these as they are low cardinality
  {column_name:messageTimestamp, validation_class:LongType},
  {column_name:messageYear, validation_class:IntegerType, index_type: KEYS},
  {column_name:messageMonth, validation_class:IntegerType, index_type: KEYS},
  {column_name:messageDay, validation_class:IntegerType, index_type: KEYS},
  {column_name:messageHour, validation_class:IntegerType, index_type: KEYS},
  {column_name:messageMinute, validation_class:IntegerType, index_type: KEYS},

... other non-indexed columns defined

  ];


So when I insert data, I calculate a year/month/day/hour/minute and set these 
values on a Hector ColumnFamilyUpdater instance and update that way.  Then 
later I can query from the command line with CQL such as:

get MyTest where messageYear=2011 and messageMonth=6 and 
messageDay=1 and messageHour=13 and messageMinute=44;

etc.  This generally works; however, at some point queries that I know should 
return data no longer return any rows.

So for instance, part way through my test (inserting 250K rows), I can query 
for what should be there and get data back such as the above query, but later 
that same query returns 0 rows.  Similarly, with fewer clauses in the 
expression, like this:

get MyTest where messageYear=2011 and messageMonth=6;

Will also return 0 rows.


???
Any idea what could be going wrong?  I'm not getting any exceptions in my 
client during the write, and I don't see anything in the logs (no errors 
anyway).



A second question - is what I'm doing insane?  I'm not sure that performance on 
CQL queries with multiple indexed columns is good (does Cassandra intelligently 
use all available indexes on these queries?)
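
On this second question, the DEBUG output posted elsewhere in this thread ("Primary scan clause is minute", then "Expanding slice filter to entire row to cover additional expressions") suggests that one index is scanned and the remaining clauses are applied as filters on the rows it returns, not that all indexes are intersected. The Python sketch below illustrates that general strategy only; it is not Cassandra's code, and the rule used to pick the primary clause (fewest matching rows) is an assumption made for the example.

```python
from collections import defaultdict


def build_index(rows, column):
    """Map each value of `column` to the set of row keys holding it."""
    index = defaultdict(set)
    for key, columns in rows.items():
        if column in columns:
            index[columns[column]].add(key)
    return index


def indexed_query(rows, indexes, clauses):
    """Scan one index, then filter the candidate rows on every clause."""
    # Assumption: use the clause whose index entry lists the fewest rows.
    primary = min(clauses, key=lambda c: len(indexes[c].get(clauses[c], ())))
    candidates = indexes[primary].get(clauses[primary], set())
    return sorted(
        key for key in candidates
        if all(rows[key].get(c) == v for c, v in clauses.items())
    )


rows = {
    "2011-1-8-18-30--1": {"year": 2011, "month": 1, "day": 8, "hour": 18, "minute": 30},
    "2011-2-8-18-30--1": {"year": 2011, "month": 2, "day": 8, "hour": 18, "minute": 30},
    "2011-2-8-18-31--1": {"year": 2011, "month": 2, "day": 8, "hour": 18, "minute": 31},
}
indexes = {c: build_index(rows, c) for c in ("year", "month", "day", "hour", "minute")}
print(indexed_query(rows, indexes, {"year": 2011, "month": 2, "minute": 30}))
```

Under this strategy the cost depends mostly on how many rows the primary index entry returns, so a low-cardinality column such as year helps little when it ends up as the only clause.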



Thanks,

-nate


Re: Secondary index issue, unable to query for records that should be there

2011-11-07 Thread Riyad Kalla
Nate, is this all against a single Cassandra server, or do you have a ring
setup? If you do have a ring setup, what is your replication factor set to?
Also, what ConsistencyLevel are you writing with when storing the values?

-R
