[jira] [Updated] (CASSANDRA-6137) CQL3 SELECT IN CLAUSE inconsistent

2013-10-23 Thread Constance Eustace (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Constance Eustace updated CASSANDRA-6137:
-

Reproduced In: 2.0.1, 1.2.8  (was: 1.2.8, 2.0.1)
  Description: 
Possible Resolution:

What seems to be key is to run a nodetool compact (possibly a nodetool flush) 
after schema drops / schema creations / schema truncates and invalidate the 
caches. This seems to align the data for new inserts/updates. From my 
reproduction tests, I have been unable to generate the database corruption if 
nodetool flush, nodetool compact, nodetool keycacheinvalidate (we have turned 
off rowcache due to other bugs). Then, even after running a more stressful test 
with 10x the inserts and five separate concurrent update threads the corruption 
did not appear. 

So I believe this is a tentative fix to this issue...

--

Problem Encountered:

We are encountering inconsistent results from CQL3 queries with column keys 
using IN clause in WHERE. This has been reproduced in cqlsh and the jdbc driver.

Rowkey is e_entid
Column key is p_prop

This returns roughly 21 rows for 21 column keys that match p_prop.

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB';

These three queries each return one row for the requested single column key in 
the IN clause:

SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:complete:count');
SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:all:count');
SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:fail:count');

This query returns ONLY ONE ROW (one column key), not three as I would expect 
from the three-column-key IN clause:

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:complete:count','urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');

This query does return two rows however for the requested two column keys:

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in (

'urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');





cqlsh describe table internal_submission.entity_job;

CREATE TABLE entity_job (
  e_entid text,
  p_prop text,
  describes text,
  dndcondition text,
  e_entlinks text,
  e_entname text,
  e_enttype text,
  ingeststatus text,
  ingeststatusdetail text,
  p_flags text,
  p_propid text,
  p_proplinks text,
  p_storage text,
  p_subents text,
  p_val text,
  p_vallang text,
  p_vallinks text,
  p_valtype text,
  p_valunit text,
  p_vars text,
  partnerid text,
  referenceid text,
  size int,
  sourceip text,
  submitdate bigint,
  submitevent text,
  userid text,
  version text,
  PRIMARY KEY (e_entid, p_prop)
) WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='NONE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

CREATE INDEX internal_submission__JobDescribesIDX ON entity_job (describes);

CREATE INDEX internal_submission__JobDNDConditionIDX ON entity_job 
(dndcondition);

CREATE INDEX internal_submission__JobIngestStatusIDX ON entity_job 
(ingeststatus);

CREATE INDEX internal_submission__JobIngestStatusDetailIDX ON entity_job 
(ingeststatusdetail);

CREATE INDEX internal_submission__JobReferenceIDIDX ON entity_job (referenceid);

CREATE INDEX internal_submission__JobUserIDX ON 

[jira] [Updated] (CASSANDRA-6137) CQL3 SELECT IN CLAUSE inconsistent

2013-10-23 Thread Constance Eustace (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Constance Eustace updated CASSANDRA-6137:
-

Description: 
Possible Resolution:

What seems to be key is to run a nodetool compact (possibly a nodetool flush) 
after schema drops / schema creations / schema truncates and invalidate the 
caches. This seems to align the data for new inserts/updates. From my 
reproduction tests, I have been unable to generate the database corruption if 
nodetool flush, nodetool compact, nodetool keycacheinvalidate (we have turned 
off rowcache due to other bugs). Then, even after running a more stressful test 
with 10x the inserts and five separate concurrent update threads the corruption 
did not appear. 

So I believe this is a tentative fix to this issue...

--

Problem Encountered:

We are encountering inconsistent results from CQL3 queries with column keys 
using IN clause in WHERE. This has been reproduced in cqlsh and the jdbc 
driver. Specifically, we are doing queries to pull a subset of column keys for 
a specific row key. 

We detect this corruption by selecting all the column keys for a row, and then 
trying different subsets of column keys in WHERE columnkey IN (column key 
subset list). We see some of these column key subset queries not return all 
the column keys, even though the select-all-column-keys query finds them. 

It seems to appear when there is a large amount of raw insertion work 
(non-updates / new ingested data) combined with simultaneous updates to 
existing data. EDIT: this also seems to only happen with mass insert+updates 
after schema changes / drops / table creation / table truncation. See the 
Possible Resolution section above.

--

Details:

Rowkey is e_entid
Column key is p_prop

This returns roughly 21 rows for 21 column keys that match p_prop.

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB';

These three queries each return one row for the requested single column key in 
the IN clause:

SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:complete:count');
SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:all:count');
SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:fail:count');

This query returns ONLY ONE ROW (one column key), not three as I would expect 
from the three-column-key IN clause:

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:complete:count','urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');

This query does return two rows however for the requested two column keys:

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in (

'urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');





cqlsh describe table internal_submission.entity_job;

CREATE TABLE entity_job (
  e_entid text,
  p_prop text,
  describes text,
  dndcondition text,
  e_entlinks text,
  e_entname text,
  e_enttype text,
  ingeststatus text,
  ingeststatusdetail text,
  p_flags text,
  p_propid text,
  p_proplinks text,
  p_storage text,
  p_subents text,
  p_val text,
  p_vallang text,
  p_vallinks text,
  p_valtype text,
  p_valunit text,
  p_vars text,
  partnerid text,
  referenceid text,
  size int,
  sourceip text,
  submitdate bigint,
  submitevent text,
  userid text,
  version text,
  PRIMARY KEY (e_entid, p_prop)
) WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.10 AND
  

[jira] [Updated] (CASSANDRA-6137) CQL3 SELECT IN CLAUSE inconsistent

2013-10-23 Thread Constance Eustace (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Constance Eustace updated CASSANDRA-6137:
-

Description: 
Possible Resolution:

What seems to be key is to run a nodetool compact (possibly a nodetool flush) 
after schema drops / schema creations / schema truncates and invalidate the 
caches. This seems to align the data for new inserts/updates. From my 
reproduction tests, I have been unable to generate the database corruption if 
nodetool flush, nodetool compact, nodetool keycacheinvalidate (we have turned 
off rowcache due to other bugs). Then, even after running a more stressful test 
with 10x the inserts and five separate concurrent update threads the corruption 
did not appear. 

So I believe this is a tentative fix to this issue... in general, after any 
manipulation to the schema, you should run nodetool compact and 
keycacheinvalidate. I have not tested if a general compact on all keyspaces and 
tables versus a more specific compact on the affected keyspace and/or keyspace 
tables is all that is necessary (compact can be a very expensive operation). 

--

Problem Encountered:

We are encountering inconsistent results from CQL3 queries with column keys 
using IN clause in WHERE. This has been reproduced in cqlsh and the jdbc 
driver. Specifically, we are doing queries to pull a subset of column keys for 
a specific row key. 

We detect this corruption by selecting all the column keys for a row, and then 
trying different subsets of column keys in WHERE columnkey IN (column key 
subset list). We see some of these column key subset queries not return all 
the column keys, even though the select-all-column-keys query finds them. 

It seems to appear when there is a large amount of raw insertion work 
(non-updates / new ingested data) combined with simultaneous updates to 
existing data. EDIT: this also seems to only happen with mass insert+updates 
after schema changes / drops / table creation / table truncation. See the 
Possible Resolution section above.

--

Details:

Rowkey is e_entid
Column key is p_prop

This returns roughly 21 rows for 21 column keys that match p_prop.

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB';

These three queries each return one row for the requested single column key in 
the IN clause:

SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:complete:count');
SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:all:count');
SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:fail:count');

This query returns ONLY ONE ROW (one column key), not three as I would expect 
from the three-column-key IN clause:

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:complete:count','urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');

This query does return two rows however for the requested two column keys:

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in (

'urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');





cqlsh describe table internal_submission.entity_job;

CREATE TABLE entity_job (
  e_entid text,
  p_prop text,
  describes text,
  dndcondition text,
  e_entlinks text,
  e_entname text,
  e_enttype text,
  ingeststatus text,
  ingeststatusdetail text,
  p_flags text,
  p_propid text,
  p_proplinks text,
  p_storage text,
  p_subents text,
  p_val text,
  p_vallang text,
  p_vallinks text,
  p_valtype text,
  p_valunit text,
  p_vars text,
  partnerid text,
  referenceid text,
  size int,
  sourceip text,
  

[jira] [Updated] (CASSANDRA-6137) CQL3 SELECT IN CLAUSE inconsistent

2013-10-15 Thread Constance Eustace (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Constance Eustace updated CASSANDRA-6137:
-

  Description: 
I am elevating this to Critical after doing some trace and reproducing in 
several environments. No one has commented on this bug from the cassandra team, 
and I view unreliable/corrupted data a pretty big deal. We are considering 
pulling cassandra and using something else.


--

We are encountering inconsistent results from CQL3 queries with column keys 
using IN clause in WHERE. This has been reproduced in cqlsh and the jdbc driver.

Rowkey is e_entid
Column key is p_prop

This returns roughly 21 rows for 21 column keys that match p_prop.

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB';

These three queries each return one row for the requested single column key in 
the IN clause:

SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:complete:count');
SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:all:count');
SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:fail:count');

This query returns ONLY ONE ROW (one column key), not three as I would expect 
from the three-column-key IN clause:

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:complete:count','urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');

This query does return two rows however for the requested two column keys:

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in (

'urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');





cqlsh describe table internal_submission.entity_job;

CREATE TABLE entity_job (
  e_entid text,
  p_prop text,
  describes text,
  dndcondition text,
  e_entlinks text,
  e_entname text,
  e_enttype text,
  ingeststatus text,
  ingeststatusdetail text,
  p_flags text,
  p_propid text,
  p_proplinks text,
  p_storage text,
  p_subents text,
  p_val text,
  p_vallang text,
  p_vallinks text,
  p_valtype text,
  p_valunit text,
  p_vars text,
  partnerid text,
  referenceid text,
  size int,
  sourceip text,
  submitdate bigint,
  submitevent text,
  userid text,
  version text,
  PRIMARY KEY (e_entid, p_prop)
) WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='NONE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

CREATE INDEX internal_submission__JobDescribesIDX ON entity_job (describes);

CREATE INDEX internal_submission__JobDNDConditionIDX ON entity_job 
(dndcondition);

CREATE INDEX internal_submission__JobIngestStatusIDX ON entity_job 
(ingeststatus);

CREATE INDEX internal_submission__JobIngestStatusDetailIDX ON entity_job 
(ingeststatusdetail);

CREATE INDEX internal_submission__JobReferenceIDIDX ON entity_job (referenceid);

CREATE INDEX internal_submission__JobUserIDX ON entity_job (userid);

CREATE INDEX internal_submission__JobVersionIDX ON entity_job (version);

---

My suspicion is that the three-column-key IN Clause is translated (improperly 
or not) to a two-column key range with the assumption that the third column key 
is present in that range, but it isn't...

---

We have tried: nodetool cache invalidations, start/stop of cassandra. Those 

[jira] [Updated] (CASSANDRA-6137) CQL3 SELECT IN CLAUSE inconsistent

2013-10-15 Thread Constance Eustace (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Constance Eustace updated CASSANDRA-6137:
-

Description: 
I am elevating this to Critical after doing some trace and reproducing in 
several environments. No one has commented on this bug from the cassandra team, 
and I view unreliable/corrupted data a pretty big deal. We are considering 
pulling cassandra and using something else.

We have the data state reproduced locally in an environment that we can set 
TRACE logging, attach a debugger, etc. Some guidance as to where to look would 
be greatly appreciated.


--

We are encountering inconsistent results from CQL3 queries with column keys 
using IN clause in WHERE. This has been reproduced in cqlsh and the jdbc driver.

Rowkey is e_entid
Column key is p_prop

This returns roughly 21 rows for 21 column keys that match p_prop.

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB';

These three queries each return one row for the requested single column key in 
the IN clause:

SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:complete:count');
SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:all:count');
SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:fail:count');

This query returns ONLY ONE ROW (one column key), not three as I would expect 
from the three-column-key IN clause:

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:complete:count','urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');

This query does return two rows however for the requested two column keys:

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in (

'urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');





cqlsh describe table internal_submission.entity_job;

CREATE TABLE entity_job (
  e_entid text,
  p_prop text,
  describes text,
  dndcondition text,
  e_entlinks text,
  e_entname text,
  e_enttype text,
  ingeststatus text,
  ingeststatusdetail text,
  p_flags text,
  p_propid text,
  p_proplinks text,
  p_storage text,
  p_subents text,
  p_val text,
  p_vallang text,
  p_vallinks text,
  p_valtype text,
  p_valunit text,
  p_vars text,
  partnerid text,
  referenceid text,
  size int,
  sourceip text,
  submitdate bigint,
  submitevent text,
  userid text,
  version text,
  PRIMARY KEY (e_entid, p_prop)
) WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='NONE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

CREATE INDEX internal_submission__JobDescribesIDX ON entity_job (describes);

CREATE INDEX internal_submission__JobDNDConditionIDX ON entity_job 
(dndcondition);

CREATE INDEX internal_submission__JobIngestStatusIDX ON entity_job 
(ingeststatus);

CREATE INDEX internal_submission__JobIngestStatusDetailIDX ON entity_job 
(ingeststatusdetail);

CREATE INDEX internal_submission__JobReferenceIDIDX ON entity_job (referenceid);

CREATE INDEX internal_submission__JobUserIDX ON entity_job (userid);

CREATE INDEX internal_submission__JobVersionIDX ON entity_job (version);

---

My suspicion is that the three-column-key IN Clause is translated (improperly 
or not) to a two-column key range with the assumption that 

[jira] [Updated] (CASSANDRA-6137) CQL3 SELECT IN CLAUSE inconsistent

2013-10-15 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-6137:
--

Priority: Minor  (was: Critical)

I got lost in the update soup.  Did you ever get to here's how to reproduce 
or are we still at yep, it's broke on my machine?



 CQL3 SELECT IN CLAUSE inconsistent
 --

 Key: CASSANDRA-6137
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6137
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Ubuntu AWS Cassandra 2.0.1 SINGLE NODE on EBS RAID 
 storage
 OSX Cassandra 1.2.8 on SSD storage
Reporter: Constance Eustace
Priority: Minor
 Fix For: 1.2.8, 2.0.1


 I am elevating this to Critical after doing some trace and reproducing in 
 several environments. No one has commented on this bug from the cassandra 
 team, and I view unreliable/corrupted data a pretty big deal. We are 
 considering pulling cassandra and using something else.
 We have the data state reproduced locally in an environment that we can set 
 TRACE logging, attach a debugger, etc. Some guidance as to where to look 
 would be greatly appreciated.
 --
 We are encountering inconsistent results from CQL3 queries with column keys 
 using IN clause in WHERE. This has been reproduced in cqlsh and the jdbc 
 driver.
 Rowkey is e_entid
 Column key is p_prop
 This returns roughly 21 rows for 21 column keys that match p_prop.
 cqlsh SELECT 
 e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
  FROM internal_submission.Entity_Job WHERE e_entid = 
 '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB';
 These three queries each return one row for the requested single column key 
 in the IN clause:
 SELECT 
 e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
  FROM internal_submission.Entity_Job WHERE e_entid = 
 '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
 ('urn:bby:pcm:job:ingest:content:complete:count');
 SELECT 
 e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
  FROM internal_submission.Entity_Job WHERE e_entid = 
 '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
 ('urn:bby:pcm:job:ingest:content:all:count');
 SELECT 
 e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
  FROM internal_submission.Entity_Job WHERE e_entid = 
 '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
 ('urn:bby:pcm:job:ingest:content:fail:count');
 This query returns ONLY ONE ROW (one column key), not three as I would expect 
 from the three-column-key IN clause:
 cqlsh SELECT 
 e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
  FROM internal_submission.Entity_Job WHERE e_entid = 
 '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
 ('urn:bby:pcm:job:ingest:content:complete:count','urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');
 This query does return two rows however for the requested two column keys:
 cqlsh SELECT 
 e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
  FROM internal_submission.Entity_Job WHERE e_entid = 
 '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in (  
   
 'urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');
 cqlsh describe table internal_submission.entity_job;
 CREATE TABLE entity_job (
   e_entid text,
   p_prop text,
   describes text,
   dndcondition text,
   e_entlinks text,
   e_entname text,
   e_enttype text,
   ingeststatus text,
   ingeststatusdetail text,
   p_flags text,
   p_propid text,
   p_proplinks text,
   p_storage text,
   p_subents text,
   p_val text,
   p_vallang text,
   p_vallinks text,
   p_valtype text,
   p_valunit text,
   p_vars text,
   partnerid text,
   referenceid text,
   size int,
   sourceip text,
   submitdate bigint,
   submitevent text,
   userid text,
   version text,
   PRIMARY KEY (e_entid, p_prop)
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='NONE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'LZ4Compressor'};
 CREATE INDEX 

[jira] [Updated] (CASSANDRA-6137) CQL3 SELECT IN CLAUSE inconsistent

2013-10-15 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-6137:
--

Reproduced In: 2.0.1, 1.2.8
Fix Version/s: (was: 1.2.8)
   (was: 2.0.1)

 CQL3 SELECT IN CLAUSE inconsistent
 --

 Key: CASSANDRA-6137
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6137
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Ubuntu AWS Cassandra 2.0.1 SINGLE NODE on EBS RAID 
 storage
 OSX Cassandra 1.2.8 on SSD storage
Reporter: Constance Eustace
Priority: Minor

 I am elevating this to Critical after doing some trace and reproducing in 
 several environments. No one has commented on this bug from the cassandra 
 team, and I view unreliable/corrupted data a pretty big deal. We are 
 considering pulling cassandra and using something else.
 We have the data state reproduced locally in an environment that we can set 
 TRACE logging, attach a debugger, etc. Some guidance as to where to look 
 would be greatly appreciated.
 --
 We are encountering inconsistent results from CQL3 queries with column keys 
 using IN clause in WHERE. This has been reproduced in cqlsh and the jdbc 
 driver.
 Rowkey is e_entid
 Column key is p_prop
 This returns roughly 21 rows for 21 column keys that match p_prop.
 cqlsh SELECT 
 e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
  FROM internal_submission.Entity_Job WHERE e_entid = 
 '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB';
 These three queries each return one row for the requested single column key 
 in the IN clause:
 SELECT 
 e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
  FROM internal_submission.Entity_Job WHERE e_entid = 
 '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
 ('urn:bby:pcm:job:ingest:content:complete:count');
 SELECT 
 e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
  FROM internal_submission.Entity_Job WHERE e_entid = 
 '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
 ('urn:bby:pcm:job:ingest:content:all:count');
 SELECT 
 e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
  FROM internal_submission.Entity_Job WHERE e_entid = 
 '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
 ('urn:bby:pcm:job:ingest:content:fail:count');
 This query returns ONLY ONE ROW (one column key), not three as I would expect 
 from the three-column-key IN clause:
 cqlsh SELECT 
 e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
  FROM internal_submission.Entity_Job WHERE e_entid = 
 '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
 ('urn:bby:pcm:job:ingest:content:complete:count','urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');
 This query does return two rows however for the requested two column keys:
 cqlsh SELECT 
 e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
  FROM internal_submission.Entity_Job WHERE e_entid = 
 '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in (  
   
 'urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');
 cqlsh describe table internal_submission.entity_job;
 CREATE TABLE entity_job (
   e_entid text,
   p_prop text,
   describes text,
   dndcondition text,
   e_entlinks text,
   e_entname text,
   e_enttype text,
   ingeststatus text,
   ingeststatusdetail text,
   p_flags text,
   p_propid text,
   p_proplinks text,
   p_storage text,
   p_subents text,
   p_val text,
   p_vallang text,
   p_vallinks text,
   p_valtype text,
   p_valunit text,
   p_vars text,
   partnerid text,
   referenceid text,
   size int,
   sourceip text,
   submitdate bigint,
   submitevent text,
   userid text,
   version text,
   PRIMARY KEY (e_entid, p_prop)
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='NONE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'LZ4Compressor'};
 CREATE INDEX internal_submission__JobDescribesIDX ON entity_job (describes);
 CREATE INDEX 

[jira] [Updated] (CASSANDRA-6137) CQL3 SELECT IN CLAUSE inconsistent

2013-10-07 Thread Constance Eustace (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Constance Eustace updated CASSANDRA-6137:
-

Description: 
We are encountering inconsistent results from CQL3 queries with column keys 
using IN clause in WHERE. This has been reproduced in cqlsh and the jdbc driver.

Rowkey is e_entid
Column key is p_prop

This returns roughly 21 rows for 21 column keys that match p_prop.

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB';

These three queries each return one row for the requested single column key in 
the IN clause:

SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:complete:count');
SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:all:count');
SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:fail:count');

This query returns ONLY ONE ROW (one column key), not three as I would expect 
from the three-column-key IN clause:

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:complete:count','urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');

This query does return two rows however for the requested two column keys:

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in (

'urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');





cqlsh describe table internal_submission.entity_job;

CREATE TABLE entity_job (
  e_entid text,
  p_prop text,
  describes text,
  dndcondition text,
  e_entlinks text,
  e_entname text,
  e_enttype text,
  ingeststatus text,
  ingeststatusdetail text,
  p_flags text,
  p_propid text,
  p_proplinks text,
  p_storage text,
  p_subents text,
  p_val text,
  p_vallang text,
  p_vallinks text,
  p_valtype text,
  p_valunit text,
  p_vars text,
  partnerid text,
  referenceid text,
  size int,
  sourceip text,
  submitdate bigint,
  submitevent text,
  userid text,
  version text,
  PRIMARY KEY (e_entid, p_prop)
) WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='NONE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

CREATE INDEX internal_submission__JobDescribesIDX ON entity_job (describes);

CREATE INDEX internal_submission__JobDNDConditionIDX ON entity_job 
(dndcondition);

CREATE INDEX internal_submission__JobIngestStatusIDX ON entity_job 
(ingeststatus);

CREATE INDEX internal_submission__JobIngestStatusDetailIDX ON entity_job 
(ingeststatusdetail);

CREATE INDEX internal_submission__JobReferenceIDIDX ON entity_job (referenceid);

CREATE INDEX internal_submission__JobUserIDX ON entity_job (userid);

CREATE INDEX internal_submission__JobVersionIDX ON entity_job (version);

---

My suspicion is that the three-column-key IN Clause is translated (improperly 
or not) to a two-column key range with the assumption that the third column key 
is present in that range, but it isn't...


  was:
We are encountering inconsistent results from CQL3 queries with column keys 
using IN clause in WHERE. This has been reproduced in cqlsh.

Rowkey is e_entid
Column key is p_prop

This returns roughly 21 rows for 21 column keys that match p_prop.

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 

[jira] [Updated] (CASSANDRA-6137) CQL3 SELECT IN CLAUSE inconsistent

2013-10-03 Thread Constance Eustace (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Constance Eustace updated CASSANDRA-6137:
-

Environment: Ubuntu AWS Cassandra 2.0.1 SINGLE NODE  (was: Ubuntu AWS 
Cassandra 2.0.1)

 CQL3 SELECT IN CLAUSE inconsistent
 --

 Key: CASSANDRA-6137
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6137
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Ubuntu AWS Cassandra 2.0.1 SINGLE NODE
Reporter: Constance Eustace
 Fix For: 2.0.1


 We are encountering inconsistent results from CQL3 queries with column keys 
 using IN clause in WHERE. This has been reproduced in cqlsh.
 Rowkey is e_entid
 Column key is p_prop
 This returns roughly 21 rows for 21 column keys that match p_prop.
 cqlsh SELECT 
 e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
  FROM internal_submission.Entity_Job WHERE e_entid = 
 '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB';
 These three queries each return one row for the requested single column key 
 in the IN clause:
 SELECT 
 e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
  FROM internal_submission.Entity_Job WHERE e_entid = 
 '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
 ('urn:bby:pcm:job:ingest:content:complete:count');
 SELECT 
 e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
  FROM internal_submission.Entity_Job WHERE e_entid = 
 '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
 ('urn:bby:pcm:job:ingest:content:all:count');
 SELECT 
 e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
  FROM internal_submission.Entity_Job WHERE e_entid = 
 '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
 ('urn:bby:pcm:job:ingest:content:fail:count');
 This query returns ONLY ONE ROW (one column key), not three as I would expect 
 from the three-column-key IN clause:
 cqlsh SELECT 
 e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
  FROM internal_submission.Entity_Job WHERE e_entid = 
 '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
 ('urn:bby:pcm:job:ingest:content:complete:count','urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');
 This query does return two rows however for the requested two column keys:
 cqlsh SELECT 
 e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
  FROM internal_submission.Entity_Job WHERE e_entid = 
 '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in (  
   
 'urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');
 cqlsh describe table internal_submission.entity_job;
 CREATE TABLE entity_job (
   e_entid text,
   p_prop text,
   describes text,
   dndcondition text,
   e_entlinks text,
   e_entname text,
   e_enttype text,
   ingeststatus text,
   ingeststatusdetail text,
   p_flags text,
   p_propid text,
   p_proplinks text,
   p_storage text,
   p_subents text,
   p_val text,
   p_vallang text,
   p_vallinks text,
   p_valtype text,
   p_valunit text,
   p_vars text,
   partnerid text,
   referenceid text,
   size int,
   sourceip text,
   submitdate bigint,
   submitevent text,
   userid text,
   version text,
   PRIMARY KEY (e_entid, p_prop)
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='NONE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'LZ4Compressor'};
 CREATE INDEX internal_submission__JobDescribesIDX ON entity_job (describes);
 CREATE INDEX internal_submission__JobDNDConditionIDX ON entity_job 
 (dndcondition);
 CREATE INDEX internal_submission__JobIngestStatusIDX ON entity_job 
 (ingeststatus);
 CREATE INDEX internal_submission__JobIngestStatusDetailIDX ON entity_job 
 (ingeststatusdetail);
 CREATE INDEX internal_submission__JobReferenceIDIDX ON entity_job 
 (referenceid);
 CREATE INDEX internal_submission__JobUserIDX ON entity_job (userid);
 CREATE INDEX internal_submission__JobVersionIDX ON entity_job (version);
 ---
 My suspicion is that the three-column-key IN Clause is translated (improperly 
 or not) to a two-column key range with the assumption 

[jira] [Updated] (CASSANDRA-6137) CQL3 SELECT IN CLAUSE inconsistent

2013-10-02 Thread Constance Eustace (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Constance Eustace updated CASSANDRA-6137:
-

Component/s: Core
Description: 
We are encountering inconsistent results from CQL3 queries with column keys 
using IN clause in WHERE. This has been reproduced in cqlsh.

Rowkey is e_entid
Column key is p_prop

This returns roughly 21 rows for 21 column keys that match p_prop.

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB';

These three queries each return one row for the requested single column key in 
the IN clause:

SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:complete:count');
SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:all:count');
SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:fail:count');

This query returns ONLY ONE ROW (one column key), not three as I would expect 
from the three-column-key IN clause:

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
('urn:bby:pcm:job:ingest:content:complete:count','urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');

This query does return two rows however for the requested two column keys:

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid = 
'845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in (

'urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');





cqlsh describe table internal_submission.entity_job;

CREATE TABLE entity_job (
  e_entid text,
  p_prop text,
  describes text,
  dndcondition text,
  e_entlinks text,
  e_entname text,
  e_enttype text,
  ingeststatus text,
  ingeststatusdetail text,
  p_flags text,
  p_propid text,
  p_proplinks text,
  p_storage text,
  p_subents text,
  p_val text,
  p_vallang text,
  p_vallinks text,
  p_valtype text,
  p_valunit text,
  p_vars text,
  partnerid text,
  referenceid text,
  size int,
  sourceip text,
  submitdate bigint,
  submitevent text,
  userid text,
  version text,
  PRIMARY KEY (e_entid, p_prop)
) WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='NONE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

CREATE INDEX internal_submission__JobDescribesIDX ON entity_job (describes);

CREATE INDEX internal_submission__JobDNDConditionIDX ON entity_job 
(dndcondition);

CREATE INDEX internal_submission__JobIngestStatusIDX ON entity_job 
(ingeststatus);

CREATE INDEX internal_submission__JobIngestStatusDetailIDX ON entity_job 
(ingeststatusdetail);

CREATE INDEX internal_submission__JobReferenceIDIDX ON entity_job (referenceid);

CREATE INDEX internal_submission__JobUserIDX ON entity_job (userid);

CREATE INDEX internal_submission__JobVersionIDX ON entity_job (version);

---

My suspicion is that the three-column-key IN Clause is translated (improperly 
or not) to a two-column key range with the assumption that the third column key 
is present in that range, but it isn't...


  was:

We are encountering inconsistent results from CQL3 queries with column keys 
using IN clause in WHERE. This has been reproduced in cqlsh.

Rowkey is e_entid
Column key is p_prop

This returns roughly 21 rows for 21 column keys that match p_prop.

cqlsh SELECT 
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
 FROM internal_submission.Entity_Job WHERE e_entid =