[jira] [Commented] (CASSANDRA-8062) IllegalArgumentException passing blob as tuple value element in list

2014-10-15 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173200#comment-14173200
 ] 

Bill Mitchell commented on CASSANDRA-8062:
--

Great, Tyler.  I applied patch 8062.txt to my copy of the 2.1 source and it 
worked like a champ.  

 IllegalArgumentException passing blob as tuple value element in list
 

 Key: CASSANDRA-8062
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8062
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Windows 7, DataStax 2.1.0 Cassandra server, Java 
 cassandra-driver-2.1.1 
Reporter: Bill Mitchell
Assignee: Tyler Hobbs
 Fix For: 2.1.1

 Attachments: 8062.txt


 I am using the same table schema as described in earlier reports, e.g., 
 CASSANDRA-7105:
 {code}
 CREATE TABLE sr (siteid uuid, listid bigint, partition int, createdate 
 timestamp, emailcrypt blob, emailaddr text, properties text, removedate 
 timestamp, removeimportid bigint,
 PRIMARY KEY ((siteid, listid, partition), createdate, emailcrypt)
 ) WITH CLUSTERING ORDER BY (createdate DESC, emailcrypt DESC);
 {code}
 I am trying to take advantage of the new Tuple support to issue a query to 
 request multiple rows in a single wide row by (createdate,emailcrypt) pair.  
 I declare a new TupleType that covers the clustering columns and then issue 
 an IN predicate against a list of these values:
 {code}
 private static final TupleType dateEmailTupleType = 
 TupleType.of(DataType.timestamp(), DataType.blob());
 ...
 List<TupleValue> partitionKeys = new ArrayList<TupleValue>(recipKeys.size());
 ...
 BoundStatement boundStatement = new BoundStatement(preparedStatement);
 boundStatement = boundStatement.bind(siteID, partition, listID);
 boundStatement.setList(3, partitionKeys);
 {code}
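 As an aside, a minimal sketch of how such a tuple list might be populated with 
 the 2.1 Java driver (the RecipKey element type and its accessors are 
 hypothetical, shown only to make the binding step concrete):
 {code}
 List<TupleValue> partitionKeys = new ArrayList<TupleValue>(recipKeys.size());
 for (RecipKey key : recipKeys) {
     // One tuple per (createdate, emailcrypt) clustering pair:
     // a java.util.Date for the timestamp, a ByteBuffer for the blob.
     partitionKeys.add(dateEmailTupleType.newValue(key.getCreateDate(), key.getEmailCrypt()));
 }
 {code}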
 When I issue a SELECT against this table, the server fails apparently trying 
 to break apart the list values:
 {code}
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,312 Message.java:420 - 
 Received: PREPARE SELECT emailCrypt, emailAddr, removeDate, removeImportID, 
 properties FROM sr WHERE siteID = ? AND partition = ? AND listID = ? AND ( 
 createDate, emailCrypt ) IN ? ;, v=2
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,323 Tracing.java:157 - 
 request complete
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,323 Message.java:433 - 
 Responding: RESULT PREPARED a18ff9151e8bd3b13b48a0ba56ecb784 
 [siteid(testdb_1412536748414, sr), 
 org.apache.cassandra.db.marshal.UUIDType][partition(testdb_1412536748414, 
 sr), org.apache.cassandra.db.marshal.Int32Type][listid(testdb_1412536748414, 
 sr), 
 org.apache.cassandra.db.marshal.LongType][in(createdate,emailcrypt)(testdb_1412536748414,
  sr), 
 org.apache.cassandra.db.marshal.ListType(org.apache.cassandra.db.marshal.TupleType(org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.TimestampType),org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.BytesType)))]
  (resultMetadata=[emailcrypt(testdb_1412536748414, sr), 
 org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.BytesType)][emailaddr(testdb_1412536748414,
  sr), 
 org.apache.cassandra.db.marshal.UTF8Type][removedate(testdb_1412536748414, 
 sr), 
 org.apache.cassandra.db.marshal.TimestampType][removeimportid(testdb_1412536748414,
  sr), 
 org.apache.cassandra.db.marshal.LongType][properties(testdb_1412536748414, 
 sr), org.apache.cassandra.db.marshal.UTF8Type]), v=2
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,363 Message.java:420 - 
 Received: EXECUTE a18ff9151e8bd3b13b48a0ba56ecb784 with 4 values at 
 consistency QUORUM, v=2
 DEBUG [SharedPool-Worker-2] 2014-10-05 14:20:15,380 Message.java:420 - 
 Received: EXECUTE a18ff9151e8bd3b13b48a0ba56ecb784 with 4 values at 
 consistency QUORUM, v=2
 DEBUG [SharedPool-Worker-5] 2014-10-05 14:20:15,402 Message.java:420 - 
 Received: EXECUTE a18ff9151e8bd3b13b48a0ba56ecb784 with 4 values at 
 consistency QUORUM, v=2
 ERROR [SharedPool-Worker-5] 2014-10-05 14:20:16,125 ErrorMessage.java:218 - 
 Unexpected exception during request
 java.lang.IllegalArgumentException: null
   at java.nio.Buffer.limit(Unknown Source) ~[na:1.7.0_25]
   at 
 org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:539) 
 ~[apache-cassandra-2.1.0.jar:2.1.0]
   at 
 org.apache.cassandra.serializers.CollectionSerializer.readValue(CollectionSerializer.java:122)
  ~[apache-cassandra-2.1.0.jar:2.1.0]
   at 
 org.apache.cassandra.serializers.ListSerializer.deserializeForNativeProtocol(ListSerializer.java:87)
  ~[apache-cassandra-2.1.0.jar:2.1.0]
   at 
 org.apache.cassandra.serializers.ListSerializer.deserializeForNativeProtocol(ListSerializer.java:27)
  

[jira] [Commented] (CASSANDRA-8062) IllegalArgumentException passing blob as tuple value element in list

2014-10-11 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168418#comment-14168418
 ] 

Bill Mitchell commented on CASSANDRA-8062:
--

Still using the same 2.1 sources I downloaded last weekend, I ran Cassandra in 
debug mode while also running my JUnit test.  On first glance, this appears to 
be an issue on the server side.  

Looking in the log above, I see on the PREPARE statement a trailing v=2 which 
makes me think it is specifying protocol version 2.  When I step back up the 
debug stack, Buffer.limit was passed a newLimit of 2097192, 
ByteBufferUtil.readBytes(:543) was passed a length of 2097152, 
CollectionSerializer.readValue(:122) was passed a version of 3, so it retrieved 
a four byte int value.  If I back up from the now current position of 40, I see 
in input, input[36]=0, input[37]=32, input[38]=0, input[39]=0.  This has the 
appearance of parsing a 4 byte size when it was passed a 2 byte size.  Stepping 
up two more levels to CollectionSerializer.deserialize(ByteBuffer)(:48), one 
sees where it forces the version number to 3, apparently thinking that only 
internal ByteBuffers are passed through this path, and these would all be v3.  

Of course, it may still be a driver bug, if the driver must use protocol v3 to 
pass such a statement.
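
The arithmetic fits that reading: bytes 00 20 00 00 parsed as a single 4-byte 
int give 0x00200000 = 2097152, exactly the length readBytes was passed.  A 
standalone sketch (plain JDK, not Cassandra code) of the v2/v3 size-encoding 
mismatch:
{code}
import java.nio.ByteBuffer;

public class SizeMismatchDemo {
    public static void main(String[] args) {
        // Native protocol v2 encodes collection sizes as 2-byte shorts,
        // e.g. an element count of 0x0020 (32) followed by 0x0000.
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {0x00, 0x20, 0x00, 0x00});
        System.out.println(buf.getShort() & 0xFFFF);  // 32 -- the v2 reading
        System.out.println(buf.getShort() & 0xFFFF);  // 0
        buf.rewind();
        // v3 encodes sizes as 4-byte ints; forcing version 3 makes the
        // same bytes parse as one huge length.
        System.out.println(buf.getInt());             // 2097152 (0x00200000)
    }
}
{code}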

 IllegalArgumentException passing blob as tuple value element in list
 

 Key: CASSANDRA-8062
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8062
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Windows 7, DataStax 2.1.0 Cassandra server, Java 
 cassandra-driver-2.1.1 
Reporter: Bill Mitchell
Assignee: Tyler Hobbs

 I am using the same table schema as described in earlier reports, e.g., 
 CASSANDRA-7105:
 {code}
 CREATE TABLE sr (siteid uuid, listid bigint, partition int, createdate 
 timestamp, emailcrypt blob, emailaddr text, properties text, removedate 
 timestamp, removeimportid bigint,
 PRIMARY KEY ((siteid, listid, partition), createdate, emailcrypt)
 ) WITH CLUSTERING ORDER BY (createdate DESC, emailcrypt DESC);
 {code}
 I am trying to take advantage of the new Tuple support to issue a query to 
 request multiple rows in a single wide row by (createdate,emailcrypt) pair.  
 I declare a new TupleType that covers the clustering columns and then issue 
 an IN predicate against a list of these values:
 {code}
 private static final TupleType dateEmailTupleType = 
 TupleType.of(DataType.timestamp(), DataType.blob());
 ...
 List<TupleValue> partitionKeys = new ArrayList<TupleValue>(recipKeys.size());
 ...
 BoundStatement boundStatement = new BoundStatement(preparedStatement);
 boundStatement = boundStatement.bind(siteID, partition, listID);
 boundStatement.setList(3, partitionKeys);
 {code}
 When I issue a SELECT against this table, the server fails apparently trying 
 to break apart the list values:
 {code}
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,312 Message.java:420 - 
 Received: PREPARE SELECT emailCrypt, emailAddr, removeDate, removeImportID, 
 properties FROM sr WHERE siteID = ? AND partition = ? AND listID = ? AND ( 
 createDate, emailCrypt ) IN ? ;, v=2
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,323 Tracing.java:157 - 
 request complete
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,323 Message.java:433 - 
 Responding: RESULT PREPARED a18ff9151e8bd3b13b48a0ba56ecb784 
 [siteid(testdb_1412536748414, sr), 
 org.apache.cassandra.db.marshal.UUIDType][partition(testdb_1412536748414, 
 sr), org.apache.cassandra.db.marshal.Int32Type][listid(testdb_1412536748414, 
 sr), 
 org.apache.cassandra.db.marshal.LongType][in(createdate,emailcrypt)(testdb_1412536748414,
  sr), 
 org.apache.cassandra.db.marshal.ListType(org.apache.cassandra.db.marshal.TupleType(org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.TimestampType),org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.BytesType)))]
  (resultMetadata=[emailcrypt(testdb_1412536748414, sr), 
 org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.BytesType)][emailaddr(testdb_1412536748414,
  sr), 
 org.apache.cassandra.db.marshal.UTF8Type][removedate(testdb_1412536748414, 
 sr), 
 org.apache.cassandra.db.marshal.TimestampType][removeimportid(testdb_1412536748414,
  sr), 
 org.apache.cassandra.db.marshal.LongType][properties(testdb_1412536748414, 
 sr), org.apache.cassandra.db.marshal.UTF8Type]), v=2
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,363 Message.java:420 - 
 Received: EXECUTE a18ff9151e8bd3b13b48a0ba56ecb784 with 4 values at 
 consistency QUORUM, v=2
 DEBUG [SharedPool-Worker-2] 2014-10-05 14:20:15,380 Message.java:420 - 
 Received: EXECUTE 

[jira] [Comment Edited] (CASSANDRA-8062) IllegalArgumentException passing blob as tuple value element in list

2014-10-11 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168418#comment-14168418
 ] 

Bill Mitchell edited comment on CASSANDRA-8062 at 10/12/14 12:27 AM:
-

Still using the same 2.1 sources I downloaded last weekend, I ran Cassandra in 
debug mode while also running my JUnit test.  On first glance, this appears to 
be an issue on the server side.  

Looking in the log above, I see on the PREPARE statement a trailing v=2 which 
makes me think it is specifying protocol version 2.  When I step back up the 
debug stack, Buffer.limit was passed a newLimit of 2097192, 
ByteBufferUtil.readBytes(:543) was passed a length of 2097152, 
CollectionSerializer.readValue(:122) was passed a version of 3, so it retrieved 
a four byte int value.  If I back up from the now current position of 40, I see 
in input, input[36]=0, input[37]=32, input[38]=0, input[39]=0.  This has the 
appearance of parsing a 4 byte size when it was passed a 2 byte size.  Stepping 
up two more levels to CollectionSerializer.deserialize(ByteBuffer)(:48), one 
sees where it forces the version number to 3, apparently thinking that only 
internal ByteBuffers are passed through this path, and these would all be v3.  

Of course, it may still be a driver bug, if the driver must use protocol v3 to 
pass such a statement.  Looking at the driver interface, though, 
Cluster.Builder accepts only protocol versions 1 or 2.
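
For illustration, the driver call in question presumably looks like this in the 
2.1-era API (a sketch from memory; withProtocolVersion takes a plain int here):
{code}
// Cluster.Builder.withProtocolVersion rejects values other than 1 or 2
// in this driver version, so protocol v3 cannot be requested.
Cluster cluster = Cluster.builder()
                         .addContactPoint("127.0.0.1")
                         .withProtocolVersion(2)
                         .build();
{code}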


was (Author: wtmitchell3):
Still using the same 2.1 sources I downloaded last weekend, I ran Cassandra in 
debug mode while also running my JUnit test.  On first glance, this appears to 
be an issue on the server side.  

Looking in the log above, I see on the PREPARE statement a trailing v=2 which 
makes me think it is specifying protocol version 2.  When I step back up the 
debug stack, Buffer.limit was passed a newLimit of 2097192, 
ByteBufferUtil.readBytes(:543) was passed a length of 2097152, 
CollectionSerializer.readValue(:122) was passed a version of 3, so it retrieved 
a four byte int value.  If I back up from the now current position of 40, I see 
in input, input[36]=0, input[37]=32, input[38]=0, input[39]=0.  This has the 
appearance of parsing a 4 byte size when it was passed a 2 byte size.  Stepping 
up two more levels to CollectionSerializer.deserialize(ByteBuffer)(:48), one 
sees where it forces the version number to 3, apparently thinking that only 
internal ByteBuffers are passed through this path, and these would all be v3.  

Of course, it may still be a driver bug, if the driver must use protocol v3 to 
pass such a statement.  Looking at the driver interface, though, the 
ClusterBuilder accepts only protocol versions 1 or 2.

 IllegalArgumentException passing blob as tuple value element in list
 

 Key: CASSANDRA-8062
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8062
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Windows 7, DataStax 2.1.0 Cassandra server, Java 
 cassandra-driver-2.1.1 
Reporter: Bill Mitchell
Assignee: Tyler Hobbs

 I am using the same table schema as described in earlier reports, e.g., 
 CASSANDRA-7105:
 {code}
 CREATE TABLE sr (siteid uuid, listid bigint, partition int, createdate 
 timestamp, emailcrypt blob, emailaddr text, properties text, removedate 
 timestamp, removeimportid bigint,
 PRIMARY KEY ((siteid, listid, partition), createdate, emailcrypt)
 ) WITH CLUSTERING ORDER BY (createdate DESC, emailcrypt DESC);
 {code}
 I am trying to take advantage of the new Tuple support to issue a query to 
 request multiple rows in a single wide row by (createdate,emailcrypt) pair.  
 I declare a new TupleType that covers the clustering columns and then issue 
 an IN predicate against a list of these values:
 {code}
 private static final TupleType dateEmailTupleType = 
 TupleType.of(DataType.timestamp(), DataType.blob());
 ...
 List<TupleValue> partitionKeys = new ArrayList<TupleValue>(recipKeys.size());
 ...
 BoundStatement boundStatement = new BoundStatement(preparedStatement);
 boundStatement = boundStatement.bind(siteID, partition, listID);
 boundStatement.setList(3, partitionKeys);
 {code}
 When I issue a SELECT against this table, the server fails apparently trying 
 to break apart the list values:
 {code}
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,312 Message.java:420 - 
 Received: PREPARE SELECT emailCrypt, emailAddr, removeDate, removeImportID, 
 properties FROM sr WHERE siteID = ? AND partition = ? AND listID = ? AND ( 
 createDate, emailCrypt ) IN ? ;, v=2
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,323 Tracing.java:157 - 
 request complete
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,323 

[jira] [Comment Edited] (CASSANDRA-8062) IllegalArgumentException passing blob as tuple value element in list

2014-10-11 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168418#comment-14168418
 ] 

Bill Mitchell edited comment on CASSANDRA-8062 at 10/12/14 12:26 AM:
-

Still using the same 2.1 sources I downloaded last weekend, I ran Cassandra in 
debug mode while also running my JUnit test.  On first glance, this appears to 
be an issue on the server side.  

Looking in the log above, I see on the PREPARE statement a trailing v=2 which 
makes me think it is specifying protocol version 2.  When I step back up the 
debug stack, Buffer.limit was passed a newLimit of 2097192, 
ByteBufferUtil.readBytes(:543) was passed a length of 2097152, 
CollectionSerializer.readValue(:122) was passed a version of 3, so it retrieved 
a four byte int value.  If I back up from the now current position of 40, I see 
in input, input[36]=0, input[37]=32, input[38]=0, input[39]=0.  This has the 
appearance of parsing a 4 byte size when it was passed a 2 byte size.  Stepping 
up two more levels to CollectionSerializer.deserialize(ByteBuffer)(:48), one 
sees where it forces the version number to 3, apparently thinking that only 
internal ByteBuffers are passed through this path, and these would all be v3.  

Of course, it may still be a driver bug, if the driver must use protocol v3 to 
pass such a statement.  Looking at the driver interface, though, the 
ClusterBuilder accepts only protocol versions 1 or 2.


was (Author: wtmitchell3):
Still using the same 2.1 sources I downloaded last weekend, I ran Cassandra in 
debug mode while also running my JUnit test.  On first glance, this appears to 
be an issue on the server side.  

Looking in the log above, I see on the PREPARE statement a trailing v=2 which 
makes me think it is specifying protocol version 2.  When I step back up the 
debug stack, Buffer.limit was passed a newLimit of 2097192, 
ByteBufferUtil.readBytes(:543) was passed a length of 2097152, 
CollectionSerializer.readValue(:122) was passed a version of 3, so it retrieved 
a four byte int value.  If I back up from the now current position of 40, I see 
in input, input[36]=0, input[37]=32, input[38]=0, input[39]=0.  This has the 
appearance of parsing a 4 byte size when it was passed a 2 byte size.  Stepping 
up two more levels to CollectionSerializer.deserialize(ByteBuffer)(:48), one 
sees where it forces the version number to 3, apparently thinking that only 
internal ByteBuffers are passed through this path, and these would all be v3.  

Of course, it may still be a driver bug, if the driver must use protocol v3 to 
pass such a statement.

 IllegalArgumentException passing blob as tuple value element in list
 

 Key: CASSANDRA-8062
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8062
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Windows 7, DataStax 2.1.0 Cassandra server, Java 
 cassandra-driver-2.1.1 
Reporter: Bill Mitchell
Assignee: Tyler Hobbs

 I am using the same table schema as described in earlier reports, e.g., 
 CASSANDRA-7105:
 {code}
 CREATE TABLE sr (siteid uuid, listid bigint, partition int, createdate 
 timestamp, emailcrypt blob, emailaddr text, properties text, removedate 
 timestamp, removeimportid bigint,
 PRIMARY KEY ((siteid, listid, partition), createdate, emailcrypt)
 ) WITH CLUSTERING ORDER BY (createdate DESC, emailcrypt DESC);
 {code}
 I am trying to take advantage of the new Tuple support to issue a query to 
 request multiple rows in a single wide row by (createdate,emailcrypt) pair.  
 I declare a new TupleType that covers the clustering columns and then issue 
 an IN predicate against a list of these values:
 {code}
 private static final TupleType dateEmailTupleType = 
 TupleType.of(DataType.timestamp(), DataType.blob());
 ...
 List<TupleValue> partitionKeys = new ArrayList<TupleValue>(recipKeys.size());
 ...
 BoundStatement boundStatement = new BoundStatement(preparedStatement);
 boundStatement = boundStatement.bind(siteID, partition, listID);
 boundStatement.setList(3, partitionKeys);
 {code}
 When I issue a SELECT against this table, the server fails apparently trying 
 to break apart the list values:
 {code}
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,312 Message.java:420 - 
 Received: PREPARE SELECT emailCrypt, emailAddr, removeDate, removeImportID, 
 properties FROM sr WHERE siteID = ? AND partition = ? AND listID = ? AND ( 
 createDate, emailCrypt ) IN ? ;, v=2
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,323 Tracing.java:157 - 
 request complete
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,323 Message.java:433 - 
 Responding: RESULT PREPARED a18ff9151e8bd3b13b48a0ba56ecb784 
 

[jira] [Commented] (CASSANDRA-7105) SELECT with IN on final column of composite and compound primary key fails

2014-10-09 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165105#comment-14165105
 ] 

Bill Mitchell commented on CASSANDRA-7105:
--

Although I did not validate this when the fixed 2.0 driver was released, I have 
now verified that my test does not fail with the 2.1.1 driver.  

 SELECT with IN on final column of composite and compound primary key fails
 --

 Key: CASSANDRA-7105
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7105
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: DataStax Cassandra 2.0.7
 Windows dual-core laptop
Reporter: Bill Mitchell
Assignee: Sylvain Lebresne
 Fix For: 1.2.17

 Attachments: 7105-v2.txt, 7105.txt


 I have a failing sequence where I specify an IN constraint on the final int 
 column of the composite primary key and an IN constraint on the final String 
 column of the compound primary key and no rows are returned, when rows should 
 be returned.  
 {noformat}
 CREATE TABLE IF NOT EXISTS sr2 (siteID TEXT, partition INT, listID BIGINT, 
 emailAddr TEXT, emailCrypt TEXT, createDate TIMESTAMP, removeDate TIMESTAMP, 
 removeImportID BIGINT, properties TEXT, PRIMARY KEY ((siteID, listID, 
 partition), createDate, emailCrypt) ) WITH CLUSTERING ORDER BY (createDate 
 DESC, emailCrypt DESC)  AND compression = {'sstable_compression' : 
 'SnappyCompressor'} AND compaction = {'class' : 
 'SizeTieredCompactionStrategy'};
 insert into sr2 (siteID, listID, partition, emailAddr, emailCrypt, 
 createDate) values ('4ca4f79e-3ab2-41c5-ae42-c7009736f1d5', 34, 1, 'xyzzy', 
 '5fe7719229092cdde4526afbc65c900c', '2014-04-28T14:05:59.236-0500');
 insert into sr2 (siteID, listID, partition, emailAddr, emailCrypt, 
 createDate) values ('4ca4f79e-3ab2-41c5-ae42-c7009736f1d5', 34, 2, 'noname', 
 '97bf28af2ca9c498d6e47237bb8680bf', '2014-04-28T14:05:59.236-0500');
 select emailCrypt, emailAddr from sr2 where siteID = 
 '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition = 2 and 
 createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt = 
 '97bf28af2ca9c498d6e47237bb8680bf';
  emailcrypt                       | emailaddr
 ----------------------------------+-----------
  97bf28af2ca9c498d6e47237bb8680bf |    noname
 (1 rows)
 select emailCrypt, emailAddr  from sr2 where siteID = 
 '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition = 1 and 
 createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt = 
 '5fe7719229092cdde4526afbc65c900c';
  emailcrypt                       | emailaddr
 ----------------------------------+-----------
  5fe7719229092cdde4526afbc65c900c |     xyzzy
 (1 rows)
 select emailCrypt, emailAddr from sr2 where siteID = 
 '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition IN (1,2) 
 and createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
 ('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');
 (0 rows)
 cqlsh:test_multiple_in> select * from sr2;
  siteid                               | listid | partition | createdate                                | emailcrypt                       | emailaddr | properties | removedate | removeimportid
 --------------------------------------+--------+-----------+--------------------------------------------+----------------------------------+-----------+------------+------------+----------------
  4ca4f79e-3ab2-41c5-ae42-c7009736f1d5 |     34 |         2 | 2014-04-28 14:05:59 Central Daylight Time | 97bf28af2ca9c498d6e47237bb8680bf |    noname |       null |       null |           null
  4ca4f79e-3ab2-41c5-ae42-c7009736f1d5 |     34 |         1 | 2014-04-28 14:05:59 Central Daylight Time | 5fe7719229092cdde4526afbc65c900c |     xyzzy |       null |       null |           null
 (2 rows)
 select emailCrypt, emailAddr from sr2 where siteID = 
 '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition IN (1,2) 
 and createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
 ('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');
 (0 rows)
 select emailCrypt, emailAddr from sr2 where siteID = 
 '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition = 1 and 
 createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
 ('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');
 (0 rows)
 select emailCrypt, emailAddr from sr2 where siteID = 
 '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition = 2 and 
 createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
 ('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');
 (0 rows)
 select emailCrypt, emailAddr from sr2 where siteID = 
 '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and 

[jira] [Comment Edited] (CASSANDRA-8062) IllegalArgumentException passing blob as tuple value element in list

2014-10-07 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161776#comment-14161776
 ] 

Bill Mitchell edited comment on CASSANDRA-8062 at 10/7/14 11:28 AM:


Thank you for your response, Robert.  That was one of my concerns when I read 
the comments around TupleType in the Java driver: that it might be restricted 
to columns declared as tuples, and not apply to tuples passed as parameters. 
 

I had the impression when reading about the tuple syntax in the WHERE predicate 
that the intent was to make this generally available in the 2.1 release.  This 
would seem to be an oversight in the Java driver interface: we have a CQL 
statement that is syntactically correct, but there is no method, at least none 
that I've uncovered, that supports passing bound parameters for such a 
statement once it has been prepared.  


was (Author: wtmitchell3):
Thank you for your response, Robert.  That was one of my concerns when I read 
the comments around TupleType in the Java driver: that it might be restricted 
to columns declared as tuples, and not apply to tuples passed as parameters. 
 

I certainly had the impression when reading about the tuple syntax in the WHERE 
predicate that the intent was to make this generally available in the 2.1 
release.  This would seem to be an oversight in the Java driver interface: that 
we have a CQL statement that is syntactically correct, but there is no method, 
at least none that I've uncovered, that supports passing bound parameters for 
such a statement once it has been prepared.  

 IllegalArgumentException passing blob as tuple value element in list
 

 Key: CASSANDRA-8062
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8062
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Windows 7, DataStax 2.1.0 Cassandra server, Java 
 cassandra-driver-2.1.1 
Reporter: Bill Mitchell

 I am using the same table schema as described in earlier reports, e.g., 
 CASSANDRA-7105:
 {code}
 CREATE TABLE sr (siteid uuid, listid bigint, partition int, createdate 
 timestamp, emailcrypt blob, emailaddr text, properties text, removedate 
 timestamp, removeimportid bigint,
 PRIMARY KEY ((siteid, listid, partition), createdate, emailcrypt)
 ) WITH CLUSTERING ORDER BY (createdate DESC, emailcrypt DESC);
 {code}
 I am trying to take advantage of the new Tuple support to issue a query to 
 request multiple rows in a single wide row by (createdate,emailcrypt) pair.  
 I declare a new TupleType that covers the clustering columns and then issue 
 an IN predicate against a list of these values:
 {code}
 private static final TupleType dateEmailTupleType = 
 TupleType.of(DataType.timestamp(), DataType.blob());
 ...
 List<TupleValue> partitionKeys = new ArrayList<TupleValue>(recipKeys.size());
 ...
 BoundStatement boundStatement = new BoundStatement(preparedStatement);
 boundStatement = boundStatement.bind(siteID, partition, listID);
 boundStatement.setList(3, partitionKeys);
 {code}
 When I issue a SELECT against this table, the server fails apparently trying 
 to break apart the list values:
 {code}
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,312 Message.java:420 - 
 Received: PREPARE SELECT emailCrypt, emailAddr, removeDate, removeImportID, 
 properties FROM sr WHERE siteID = ? AND partition = ? AND listID = ? AND ( 
 createDate, emailCrypt ) IN ? ;, v=2
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,323 Tracing.java:157 - 
 request complete
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,323 Message.java:433 - 
 Responding: RESULT PREPARED a18ff9151e8bd3b13b48a0ba56ecb784 
 [siteid(testdb_1412536748414, sr), 
 org.apache.cassandra.db.marshal.UUIDType][partition(testdb_1412536748414, 
 sr), org.apache.cassandra.db.marshal.Int32Type][listid(testdb_1412536748414, 
 sr), 
 org.apache.cassandra.db.marshal.LongType][in(createdate,emailcrypt)(testdb_1412536748414,
  sr), 
 org.apache.cassandra.db.marshal.ListType(org.apache.cassandra.db.marshal.TupleType(org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.TimestampType),org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.BytesType)))]
  (resultMetadata=[emailcrypt(testdb_1412536748414, sr), 
 org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.BytesType)][emailaddr(testdb_1412536748414,
  sr), 
 org.apache.cassandra.db.marshal.UTF8Type][removedate(testdb_1412536748414, 
 sr), 
 org.apache.cassandra.db.marshal.TimestampType][removeimportid(testdb_1412536748414,
  sr), 
 org.apache.cassandra.db.marshal.LongType][properties(testdb_1412536748414, 
 sr), org.apache.cassandra.db.marshal.UTF8Type]), v=2
 DEBUG 

[jira] [Commented] (CASSANDRA-8062) IllegalArgumentException passing blob as tuple value element in list

2014-10-07 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161776#comment-14161776
 ] 

Bill Mitchell commented on CASSANDRA-8062:
--

Thank you for your response, Robert.  That was one of my concerns when I read 
the comments around TupleType in the Java driver: that it might be restricted 
to columns declared as tuples, and not apply to tuples passed as parameters. 
 

I certainly had the impression when reading about the tuple syntax in the WHERE 
predicate that the intent was to make this generally available in the 2.1 
release.  This would seem to be an oversight in the Java driver interface: that 
we have a CQL statement that is syntactically correct, but there is no method, 
at least none that I've uncovered, that supports passing bound parameters for 
such a statement once it has been prepared.  

 IllegalArgumentException passing blob as tuple value element in list
 

 Key: CASSANDRA-8062
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8062
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Windows 7, DataStax 2.1.0 Cassandra server, Java 
 cassandra-driver-2.1.1 
Reporter: Bill Mitchell

 I am using the same table schema as described in earlier reports, e.g., 
 CASSANDRA-7105:
 {code}
 CREATE TABLE sr (siteid uuid, listid bigint, partition int, createdate 
 timestamp, emailcrypt blob, emailaddr text, properties text, removedate 
 timestamp, removeimportid bigint,
 PRIMARY KEY ((siteid, listid, partition), createdate, emailcrypt)
 ) WITH CLUSTERING ORDER BY (createdate DESC, emailcrypt DESC);
 {code}
 I am trying to take advantage of the new Tuple support to issue a query to 
 request multiple rows in a single wide row by (createdate,emailcrypt) pair.  
 I declare a new TupleType that covers the clustering columns and then issue 
 an IN predicate against a list of these values:
 {code}
 private static final TupleType dateEmailTupleType = 
 TupleType.of(DataType.timestamp(), DataType.blob());
 ...
 List<TupleValue> partitionKeys = new ArrayList<TupleValue>(recipKeys.size());
 ...
 BoundStatement boundStatement = new BoundStatement(preparedStatement);
 boundStatement = boundStatement.bind(siteID, partition, listID);
 boundStatement.setList(3, partitionKeys);
 {code}
 When I issue a SELECT against this table, the server fails apparently trying 
 to break apart the list values:
 {code}
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,312 Message.java:420 - 
 Received: PREPARE SELECT emailCrypt, emailAddr, removeDate, removeImportID, 
 properties FROM sr WHERE siteID = ? AND partition = ? AND listID = ? AND ( 
 createDate, emailCrypt ) IN ? ;, v=2
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,323 Tracing.java:157 - 
 request complete
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,323 Message.java:433 - 
 Responding: RESULT PREPARED a18ff9151e8bd3b13b48a0ba56ecb784 
 [siteid(testdb_1412536748414, sr), 
 org.apache.cassandra.db.marshal.UUIDType][partition(testdb_1412536748414, 
 sr), org.apache.cassandra.db.marshal.Int32Type][listid(testdb_1412536748414, 
 sr), 
 org.apache.cassandra.db.marshal.LongType][in(createdate,emailcrypt)(testdb_1412536748414,
  sr), 
 org.apache.cassandra.db.marshal.ListType(org.apache.cassandra.db.marshal.TupleType(org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.TimestampType),org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.BytesType)))]
  (resultMetadata=[emailcrypt(testdb_1412536748414, sr), 
 org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.BytesType)][emailaddr(testdb_1412536748414,
  sr), 
 org.apache.cassandra.db.marshal.UTF8Type][removedate(testdb_1412536748414, 
 sr), 
 org.apache.cassandra.db.marshal.TimestampType][removeimportid(testdb_1412536748414,
  sr), 
 org.apache.cassandra.db.marshal.LongType][properties(testdb_1412536748414, 
 sr), org.apache.cassandra.db.marshal.UTF8Type]), v=2
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,363 Message.java:420 - 
 Received: EXECUTE a18ff9151e8bd3b13b48a0ba56ecb784 with 4 values at 
 consistency QUORUM, v=2
 DEBUG [SharedPool-Worker-2] 2014-10-05 14:20:15,380 Message.java:420 - 
 Received: EXECUTE a18ff9151e8bd3b13b48a0ba56ecb784 with 4 values at 
 consistency QUORUM, v=2
 DEBUG [SharedPool-Worker-5] 2014-10-05 14:20:15,402 Message.java:420 - 
 Received: EXECUTE a18ff9151e8bd3b13b48a0ba56ecb784 with 4 values at 
 consistency QUORUM, v=2
 ERROR [SharedPool-Worker-5] 2014-10-05 14:20:16,125 ErrorMessage.java:218 - 
 Unexpected exception during request
 java.lang.IllegalArgumentException: null
   at java.nio.Buffer.limit(Unknown Source) ~[na:1.7.0_25]
   at 
 

[jira] [Commented] (CASSANDRA-8062) IllegalArgumentException passing blob as tuple value element in list

2014-10-06 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161301#comment-14161301
 ] 

Bill Mitchell commented on CASSANDRA-8062:
--

The basic CQL statement works fine through the CQL Shell.  The difference is 
that the CQL Shell passes the statement as text, whereas with the Java driver a 
prepared statement was used with bound parameters, so the request should be 
passed in a binary format.  I therefore expect the issue is a protocol 
formatting issue, on one side or the other.  

Using the 2.1 branch I downloaded from github yesterday, I built a later 
server, and the failure mode is the same; only the line numbers are slightly 
different.
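
To make the two paths concrete, a sketch of the difference using the driver 2.1 
API (the CQL text is abbreviated, and session plus the bound values are as in 
the snippets above):
{code}
// Text path, as the CQL Shell does it: the whole statement, literals
// included, travels as a CQL string and is parsed server-side.
session.execute("SELECT ... FROM sr WHERE ... AND (createDate, emailCrypt) IN ((...), (...))");

// Prepared path, as the failing test does it: bound values travel in
// the native protocol's binary format, so an encoding mismatch shows
// up only here.
PreparedStatement ps = session.prepare(
    "SELECT ... FROM sr WHERE siteID = ? AND partition = ? AND listID = ? AND (createDate, emailCrypt) IN ?");
BoundStatement bs = ps.bind(siteID, partition, listID);
bs.setList(3, partitionKeys);
session.execute(bs);
{code}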

 IllegalArgumentException passing blob as tuple value element in list
 

 Key: CASSANDRA-8062
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8062
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Windows 7, DataStax 2.1.0 Cassandra server, Java 
 cassandra-driver-2.1.1 
Reporter: Bill Mitchell

 I am using the same table schema as described in earlier reports, e.g., 
 CASSANDRA-7105:
 {code}
 CREATE TABLE sr (siteid uuid, listid bigint, partition int, createdate 
 timestamp, emailcrypt blob, emailaddr text, properties text, removedate 
 timestamp, removeimportid bigint,
 PRIMARY KEY ((siteid, listid, partition), createdate, emailcrypt)
 ) WITH CLUSTERING ORDER BY (createdate DESC, emailcrypt DESC);
 {code}
 I am trying to take advantage of the new Tuple support to issue a query to 
 request multiple rows in a single wide row by (createdate,emailcrypt) pair.  
 I declare a new TupleType that covers the clustering columns and then issue 
 an IN predicate against a list of these values:
 {code}
 private static final TupleType dateEmailTupleType = 
 TupleType.of(DataType.timestamp(), DataType.blob());
 ...
 List<TupleValue> partitionKeys = new ArrayList<TupleValue>(recipKeys.size());
 ...
 BoundStatement boundStatement = new BoundStatement(preparedStatement);
 boundStatement = boundStatement.bind(siteID, partition, listID);
 boundStatement.setList(3, partitionKeys);
 {code}
 When I issue a SELECT against this table, the server fails apparently trying 
 to break apart the list values:
 {code}
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,312 Message.java:420 - 
 Received: PREPARE SELECT emailCrypt, emailAddr, removeDate, removeImportID, 
 properties FROM sr WHERE siteID = ? AND partition = ? AND listID = ? AND ( 
 createDate, emailCrypt ) IN ? ;, v=2
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,323 Tracing.java:157 - 
 request complete
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,323 Message.java:433 - 
 Responding: RESULT PREPARED a18ff9151e8bd3b13b48a0ba56ecb784 
 [siteid(testdb_1412536748414, sr), 
 org.apache.cassandra.db.marshal.UUIDType][partition(testdb_1412536748414, 
 sr), org.apache.cassandra.db.marshal.Int32Type][listid(testdb_1412536748414, 
 sr), 
 org.apache.cassandra.db.marshal.LongType][in(createdate,emailcrypt)(testdb_1412536748414,
  sr), 
 org.apache.cassandra.db.marshal.ListType(org.apache.cassandra.db.marshal.TupleType(org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.TimestampType),org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.BytesType)))]
  (resultMetadata=[emailcrypt(testdb_1412536748414, sr), 
 org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.BytesType)][emailaddr(testdb_1412536748414,
  sr), 
 org.apache.cassandra.db.marshal.UTF8Type][removedate(testdb_1412536748414, 
 sr), 
 org.apache.cassandra.db.marshal.TimestampType][removeimportid(testdb_1412536748414,
  sr), 
 org.apache.cassandra.db.marshal.LongType][properties(testdb_1412536748414, 
 sr), org.apache.cassandra.db.marshal.UTF8Type]), v=2
 DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,363 Message.java:420 - 
 Received: EXECUTE a18ff9151e8bd3b13b48a0ba56ecb784 with 4 values at 
 consistency QUORUM, v=2
 DEBUG [SharedPool-Worker-2] 2014-10-05 14:20:15,380 Message.java:420 - 
 Received: EXECUTE a18ff9151e8bd3b13b48a0ba56ecb784 with 4 values at 
 consistency QUORUM, v=2
 DEBUG [SharedPool-Worker-5] 2014-10-05 14:20:15,402 Message.java:420 - 
 Received: EXECUTE a18ff9151e8bd3b13b48a0ba56ecb784 with 4 values at 
 consistency QUORUM, v=2
 ERROR [SharedPool-Worker-5] 2014-10-05 14:20:16,125 ErrorMessage.java:218 - 
 Unexpected exception during request
 java.lang.IllegalArgumentException: null
   at java.nio.Buffer.limit(Unknown Source) ~[na:1.7.0_25]
   at 
 org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:539) 
 ~[apache-cassandra-2.1.0.jar:2.1.0]
   at 
 

[jira] [Created] (CASSANDRA-8062) IllegalArgumentException passing blob as tuple value element in list

2014-10-05 Thread Bill Mitchell (JIRA)
Bill Mitchell created CASSANDRA-8062:


 Summary: IllegalArgumentException passing blob as tuple value 
element in list
 Key: CASSANDRA-8062
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8062
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Windows 7, DataStax 2.1.0 Cassandra server, Java 
cassandra-driver-2.1.1 
Reporter: Bill Mitchell


I am using the same table schema as described in earlier reports, e.g., 
CASSANDRA-7105:
CREATE TABLE sr (siteid uuid, listid bigint, partition int, createdate 
timestamp, emailcrypt blob, emailaddr text, properties text, removedate 
timestamp, removeimportid bigint,
PRIMARY KEY ((siteid, listid, partition), createdate, emailcrypt)
) WITH CLUSTERING ORDER BY (createdate DESC, emailcrypt DESC);

I am trying to take advantage of the new Tuple support to issue a query to 
request multiple rows in a single wide row by (createdate,emailcrypt) pair.  I 
declare a new TupleType that covers the clustering columns and then issue an IN 
predicate against a list of these values:
private static final TupleType dateEmailTupleType = 
TupleType.of(DataType.timestamp(), DataType.blob());
...
List<TupleValue> partitionKeys = new ArrayList<TupleValue>(recipKeys.size());
...
BoundStatement boundStatement = new BoundStatement(preparedStatement);
boundStatement = boundStatement.bind(siteID, partition, listID);
boundStatement.setList(3, partitionKeys);

When I issue a SELECT against this table, the server fails apparently trying to 
break apart the list values:
DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,312 Message.java:420 - 
Received: PREPARE SELECT emailCrypt, emailAddr, removeDate, removeImportID, 
properties FROM sr WHERE siteID = ? AND partition = ? AND listID = ? AND ( 
createDate, emailCrypt ) IN ? ;, v=2
DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,323 Tracing.java:157 - request 
complete
DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,323 Message.java:433 - 
Responding: RESULT PREPARED a18ff9151e8bd3b13b48a0ba56ecb784 
[siteid(testdb_1412536748414, sr), 
org.apache.cassandra.db.marshal.UUIDType][partition(testdb_1412536748414, sr), 
org.apache.cassandra.db.marshal.Int32Type][listid(testdb_1412536748414, sr), 
org.apache.cassandra.db.marshal.LongType][in(createdate,emailcrypt)(testdb_1412536748414,
 sr), 
org.apache.cassandra.db.marshal.ListType(org.apache.cassandra.db.marshal.TupleType(org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.TimestampType),org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.BytesType)))]
 (resultMetadata=[emailcrypt(testdb_1412536748414, sr), 
org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.BytesType)][emailaddr(testdb_1412536748414,
 sr), 
org.apache.cassandra.db.marshal.UTF8Type][removedate(testdb_1412536748414, sr), 
org.apache.cassandra.db.marshal.TimestampType][removeimportid(testdb_1412536748414,
 sr), 
org.apache.cassandra.db.marshal.LongType][properties(testdb_1412536748414, sr), 
org.apache.cassandra.db.marshal.UTF8Type]), v=2
DEBUG [SharedPool-Worker-1] 2014-10-05 14:20:15,363 Message.java:420 - 
Received: EXECUTE a18ff9151e8bd3b13b48a0ba56ecb784 with 4 values at consistency 
QUORUM, v=2
DEBUG [SharedPool-Worker-2] 2014-10-05 14:20:15,380 Message.java:420 - 
Received: EXECUTE a18ff9151e8bd3b13b48a0ba56ecb784 with 4 values at consistency 
QUORUM, v=2
DEBUG [SharedPool-Worker-5] 2014-10-05 14:20:15,402 Message.java:420 - 
Received: EXECUTE a18ff9151e8bd3b13b48a0ba56ecb784 with 4 values at consistency 
QUORUM, v=2
ERROR [SharedPool-Worker-5] 2014-10-05 14:20:16,125 ErrorMessage.java:218 - 
Unexpected exception during request
java.lang.IllegalArgumentException: null
at java.nio.Buffer.limit(Unknown Source) ~[na:1.7.0_25]
at 
org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:539) 
~[apache-cassandra-2.1.0.jar:2.1.0]
at 
org.apache.cassandra.serializers.CollectionSerializer.readValue(CollectionSerializer.java:122)
 ~[apache-cassandra-2.1.0.jar:2.1.0]
at 
org.apache.cassandra.serializers.ListSerializer.deserializeForNativeProtocol(ListSerializer.java:87)
 ~[apache-cassandra-2.1.0.jar:2.1.0]
at 
org.apache.cassandra.serializers.ListSerializer.deserializeForNativeProtocol(ListSerializer.java:27)
 ~[apache-cassandra-2.1.0.jar:2.1.0]
at 
org.apache.cassandra.serializers.CollectionSerializer.deserialize(CollectionSerializer.java:48)
 ~[apache-cassandra-2.1.0.jar:2.1.0]
at 
org.apache.cassandra.db.marshal.AbstractType.compose(AbstractType.java:66) 
~[apache-cassandra-2.1.0.jar:2.1.0]
at 
org.apache.cassandra.cql3.Tuples$InValue.fromSerialized(Tuples.java:249) 
~[apache-cassandra-2.1.0.jar:2.1.0]
at org.apache.cassandra.cql3.Tuples$InMarker.bind(Tuples.java:394) 

[jira] [Commented] (CASSANDRA-6875) CQL3: select multiple CQL rows in a single partition using IN

2014-05-26 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009075#comment-14009075
 ] 

Bill Mitchell commented on CASSANDRA-6875:
--

To try this out, I cobbled up a test case by accessing the TupleType directly 
on the client side, as this feature is not yet supported in the Java driver.  
My approach was to serialize my two ordering column values, then use 
TupleType.buildValue() to concatenate them into a single ByteBuffer, build a 
List of all these, then use serialize on a ListType<ByteBuffer> instance to get 
a single ByteBuffer representing the entire list, and bind that using 
setBytesUnsafe().  I'm not totally sure of all this, but it seems reasonable.  

My SELECT statement syntax followed the first of the three Tyler suggested: ... 
WHERE (c1, c2) IN ?, as this allows the statement to be prepared only once, 
irrespective of the number of compound keys provided.  

What I saw was the following traceback on the server:
14/05/26 14:33:09 ERROR messages.ErrorMessage: Unexpected exception during 
request
java.util.NoSuchElementException
at 
java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:396)
at java.util.LinkedHashMap$ValueIterator.next(LinkedHashMap.java:409)
at 
org.apache.cassandra.cql3.statements.SelectStatement.buildMultiColumnInBound(SelectStatement.java:941)
at 
org.apache.cassandra.cql3.statements.SelectStatement.buildBound(SelectStatement.java:814)
at 
org.apache.cassandra.cql3.statements.SelectStatement.getRequestedBound(SelectStatement.java:977)
at 
org.apache.cassandra.cql3.statements.SelectStatement.makeFilter(SelectStatement.java:444)
at 
org.apache.cassandra.cql3.statements.SelectStatement.getSliceCommands(SelectStatement.java:340)
at 
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:210)
at 
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:61)
at 
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:158)
at 
org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:309)
at 
org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:132)
at 
org.apache.cassandra.transport.Message$Dispatcher.messageReceived(Message.java:304)

Stepping through the code, it appears to have analyzed my statement correctly.  
In buildMultiColumnInBound, splitInValues contains 1426 tuples, which is the 
number I intended to pass.  The names parameter identifies two columns, 
createdate and emailcrypt.  The loop executes twice, but on the third iteration 
there are no more elements in names, thus the exception. 

Moving the construction of the iterator within the loop fixed my Exception.  
The code still looks suspect, though, as it calculates a bound b based on 
whether the first column is reversed, then uses bound, not b, in the following 
statement.  I've not researched which would be correct, as this appears closely 
related to the fix Sylvain just developed for CASSANDRA-7105.   

{code}
TreeSet<ByteBuffer> inValues = new TreeSet<ByteBuffer>(isReversed ?
        cfDef.cfm.comparator.reverseComparator : cfDef.cfm.comparator);
for (List<ByteBuffer> components : splitInValues)
{
    ColumnNameBuilder nameBuilder = builder.copy();
    for (ByteBuffer component : components)
        nameBuilder.add(component);

    Iterator<CFDefinition.Name> iter = names.iterator();
    Bound b = isReversed == isReversedType(iter.next()) ? bound : Bound.reverse(bound);
    inValues.add((bound == Bound.END && nameBuilder.remainingCount() > 0)
                 ? nameBuilder.buildAsEndOfRange() : nameBuilder.build());
}
return new ArrayList<ByteBuffer>(inValues);
{code}  
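
The exhaustion is easy to reproduce in isolation (plain JDK sketch, not 
Cassandra code): a single iterator over the two column names survives exactly 
two next() calls, so the third tuple in the outer loop throws.
{code}
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorExhaustionDemo {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("createdate", "emailcrypt");
        // Buggy shape: iterator created once, outside the per-tuple loop.
        Iterator<String> iter = names.iterator();
        for (int tuple = 0; tuple < 3; tuple++) {
            // The third call throws NoSuchElementException, matching the
            // buildMultiColumnInBound traceback above.
            System.out.println(iter.next());
        }
    }
}
{code}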

 CQL3: select multiple CQL rows in a single partition using IN
 -

 Key: CASSANDRA-6875
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6875
 Project: Cassandra
  Issue Type: Bug
  Components: API
Reporter: Nicolas Favre-Felix
Assignee: Tyler Hobbs
Priority: Minor
 Fix For: 2.0.9, 2.1 rc1


 In the spirit of CASSANDRA-4851 and to bring CQL to parity with Thrift, it is 
 important to support reading several distinct CQL rows from a given partition 
 using a distinct set of coordinates for these rows within the partition.
 CASSANDRA-4851 introduced a range scan over the multi-dimensional space of 
 clustering keys. We also need to support a multi-get of CQL rows, 
 potentially using the IN keyword to define a set of clustering keys to 
 fetch at once.
 (reusing the same example:)
 Consider the following table:
 {code}
 CREATE TABLE test (
   k int,
   c1 int,
   c2 int,
  

[jira] [Comment Edited] (CASSANDRA-6875) CQL3: select multiple CQL rows in a single partition using IN

2014-05-26 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009075#comment-14009075
 ] 

Bill Mitchell edited comment on CASSANDRA-6875 at 5/26/14 8:52 PM:
---

To try this out, I cobbled up a test case by accessing the TupleType directly 
on the client side, as this feature is not yet supported in the Java driver.  
My approach was to serialize my two ordering column values, then use 
TupleType.buildValue() to concatenate them into a single ByteBuffer, build a 
List of all these, then use serialize on a ListType<ByteBuffer> instance to get 
a single ByteBuffer representing the entire list, and bind that using 
setBytesUnsafe().  I'm not totally sure of all this, but it seems reasonable.  

My SELECT statement syntax followed the first of the three Tyler suggested: ... 
WHERE (c1, c2) IN ?, as this allows the statement to be prepared only once, 
irrespective of the number of compound keys provided.  

What I saw was the following traceback on the server:
14/05/26 14:33:09 ERROR messages.ErrorMessage: Unexpected exception during 
request
java.util.NoSuchElementException
at 
java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:396)
at java.util.LinkedHashMap$ValueIterator.next(LinkedHashMap.java:409)
at 
org.apache.cassandra.cql3.statements.SelectStatement.buildMultiColumnInBound(SelectStatement.java:941)
at 
org.apache.cassandra.cql3.statements.SelectStatement.buildBound(SelectStatement.java:814)
at 
org.apache.cassandra.cql3.statements.SelectStatement.getRequestedBound(SelectStatement.java:977)
at 
org.apache.cassandra.cql3.statements.SelectStatement.makeFilter(SelectStatement.java:444)
at 
org.apache.cassandra.cql3.statements.SelectStatement.getSliceCommands(SelectStatement.java:340)
at 
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:210)
at 
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:61)
at 
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:158)
at 
org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:309)
at 
org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:132)
at 
org.apache.cassandra.transport.Message$Dispatcher.messageReceived(Message.java:304)

Stepping through the code, it appears to have analyzed my statement correctly.  
In buildMultiColumnInBound, splitInValues contains 1426 tuples, which is the 
number I intended to pass.  The names parameter identifies two columns, 
createdate and emailcrypt.  The loop executes twice, but on the third iteration 
there are no more elements in names, thus the exception. 

Moving the construction of the iterator within the loop fixed my Exception.  
The code still looks suspect, though, as it calculates a bound b based on 
whether the first column is reversed, then uses bound, not b, in the following 
statement.  I've not researched which would be correct, as this appears closely 
related to the fix Sylvain just developed for CASSANDRA-7105.  In my test case, 
where the columns were declared as DESC, the code as written did return all the 
expected rows. 

{code}
TreeSet<ByteBuffer> inValues = new TreeSet<ByteBuffer>(isReversed ?
        cfDef.cfm.comparator.reverseComparator : cfDef.cfm.comparator);
for (List<ByteBuffer> components : splitInValues)
{
    ColumnNameBuilder nameBuilder = builder.copy();
    for (ByteBuffer component : components)
        nameBuilder.add(component);

    Iterator<CFDefinition.Name> iter = names.iterator();
    Bound b = isReversed == isReversedType(iter.next()) ? bound : Bound.reverse(bound);
    inValues.add((bound == Bound.END && nameBuilder.remainingCount() > 0)
                 ? nameBuilder.buildAsEndOfRange() : nameBuilder.build());
}
return new ArrayList<ByteBuffer>(inValues);
{code}  


was (Author: wtmitchell3):
To try this out, I cobbled up a test case by accessing the TupleType directly 
on the client side, as this feature is not yet supported in the Java driver.  
My approach was to serialize my two ordering column values, then use 
TupleType.buildValue() to concatenate them into a single ByteBuffer, build a 
List of all these, then use serialize on a ListType<ByteBuffer> instance to get 
a single ByteBuffer representing the entire list, and bind that using 
setBytesUnsafe().  I'm not totally sure of all this, but it seems reasonable.  

My SELECT statement syntax followed the first of the three Tyler suggested: ... 
WHERE (c1, c2) IN ?, as this allows the statement to be prepared only once, 
irrespective of the number of compound keys provided.  

What I saw was the following traceback on the server:
14/05/26 14:33:09 ERROR messages.ErrorMessage: Unexpected 

[jira] [Comment Edited] (CASSANDRA-6875) CQL3: select multiple CQL rows in a single partition using IN

2014-05-26 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009075#comment-14009075
 ] 

Bill Mitchell edited comment on CASSANDRA-6875 at 5/27/14 2:16 AM:
---

To try this out, I cobbled up a test case by accessing the TupleType directly 
on the client side, as this feature is not yet supported in the Java driver.  
My approach was to serialize my two ordering column values, then use 
TupleType.buildValue() to concatenate them into a single ByteBuffer, build a 
List of all these, then use serialize on a ListType<ByteBuffer> instance to get 
a single ByteBuffer representing the entire list, and bind that using 
setBytesUnsafe().  I'm not totally sure of all this, but it seems reasonable.  

My SELECT statement syntax followed the first of the three Tyler suggested: ... 
WHERE (c1, c2) IN ?, as this allows the statement to be prepared only once, 
irrespective of the number of compound keys provided.  

What I saw was the following traceback on the server:
14/05/26 14:33:09 ERROR messages.ErrorMessage: Unexpected exception during 
request
java.util.NoSuchElementException
at 
java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:396)
at java.util.LinkedHashMap$ValueIterator.next(LinkedHashMap.java:409)
at 
org.apache.cassandra.cql3.statements.SelectStatement.buildMultiColumnInBound(SelectStatement.java:941)
at 
org.apache.cassandra.cql3.statements.SelectStatement.buildBound(SelectStatement.java:814)
at 
org.apache.cassandra.cql3.statements.SelectStatement.getRequestedBound(SelectStatement.java:977)
at 
org.apache.cassandra.cql3.statements.SelectStatement.makeFilter(SelectStatement.java:444)
at 
org.apache.cassandra.cql3.statements.SelectStatement.getSliceCommands(SelectStatement.java:340)
at 
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:210)
at 
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:61)
at 
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:158)
at 
org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:309)
at 
org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:132)
at 
org.apache.cassandra.transport.Message$Dispatcher.messageReceived(Message.java:304)

Stepping through the code, it appears to have analyzed my statement correctly.  
In buildMultiColumnInBound, splitInValues contains 1426 tuples, which is the 
number I intended to pass.  The names parameter identifies two columns, 
createdate and emailcrypt.  The loop executes twice, but on the third iteration 
there are no more elements in names, thus the exception. 

Moving the construction of the iterator within the loop fixed my Exception.  
The code still looks suspect, though, as it calculates a bound b based on 
whether the first column is reversed, then uses bound, not b, in the following 
statement.  I've not researched which would be correct, as this appears closely 
related to the fix Sylvain just developed for CASSANDRA-7105.  In my test case, 
where the columns were declared as DESC, the code as fixed below did return all 
the expected rows. 

{code}
TreeSet<ByteBuffer> inValues = new TreeSet<ByteBuffer>(isReversed ?
        cfDef.cfm.comparator.reverseComparator : cfDef.cfm.comparator);
for (List<ByteBuffer> components : splitInValues)
{
    ColumnNameBuilder nameBuilder = builder.copy();
    for (ByteBuffer component : components)
        nameBuilder.add(component);

    Iterator<CFDefinition.Name> iter = names.iterator();
    Bound b = isReversed == isReversedType(iter.next()) ? bound : Bound.reverse(bound);
    inValues.add((bound == Bound.END && nameBuilder.remainingCount() > 0)
                 ? nameBuilder.buildAsEndOfRange() : nameBuilder.build());
}
return new ArrayList<ByteBuffer>(inValues);
{code}  

P.S. I changed my test configuration to declare the ordering columns as ASC 
instead of DESC and reran the tests.  There was no failure with the code as 
changed.  So apparently comparing bound, and not b, against Bound.END works 
fine, which should mean that both iter and b can be dropped, as sketched below.  
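
With iter and b dropped, the loop body would reduce to something like this (a 
sketch, not a tested patch):
{code}
for (List<ByteBuffer> components : splitInValues)
{
    ColumnNameBuilder nameBuilder = builder.copy();
    for (ByteBuffer component : components)
        nameBuilder.add(component);

    inValues.add((bound == Bound.END && nameBuilder.remainingCount() > 0)
            ? nameBuilder.buildAsEndOfRange() : nameBuilder.build());
}
{code}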


was (Author: wtmitchell3):
To try this out, I cobbled up a test case by accessing the TupleType directly 
on the client side, as this feature is not yet supported in the Java driver.  
My approach was to serialize my two ordering column values, then use 
TupleType.buildValue() to concatenate them into a single ByteBuffer, build a 
List of all these, then use serialize on a ListType<ByteBuffer> instance to get 
a single ByteBuffer representing the entire list, and bind that using 
setBytesUnsafe().  I'm not totally sure of all this, but it seems reasonable.  

My SELECT statement syntax followed 

[jira] [Commented] (CASSANDRA-7105) SELECT with IN on final column of composite and compound primary key fails

2014-05-25 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008353#comment-14008353
 ] 

Bill Mitchell commented on CASSANDRA-7105:
--

I removed Dave's patch and relocated the patch Sylvain developed into the 2.0 
branch I downloaded a week ago.  I ran my test case several times, with random 
generation of the email addresses while varying the number of partitions, and 
observed no failures.  

 SELECT with IN on final column of composite and compound primary key fails
 --

 Key: CASSANDRA-7105
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7105
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: DataStax Cassandra 2.0.7
 Windows dual-core laptop
Reporter: Bill Mitchell
Assignee: Sylvain Lebresne
 Fix For: 1.2.17

 Attachments: 7105-v2.txt, 7105.txt


 I have a failing sequence where I specify an IN constraint on the final int 
 column of the composite primary key and an IN constraint on the final String 
 column of the compound primary key and no rows are returned, when rows should 
 be returned.  
 {noformat}
 CREATE TABLE IF NOT EXISTS sr2 (siteID TEXT, partition INT, listID BIGINT, 
 emailAddr TEXT, emailCrypt TEXT, createDate TIMESTAMP, removeDate TIMESTAMP, 
 removeImportID BIGINT, properties TEXT, PRIMARY KEY ((siteID, listID, 
 partition), createDate, emailCrypt) ) WITH CLUSTERING ORDER BY (createDate 
 DESC, emailCrypt DESC)  AND compression = {'sstable_compression' : 
 'SnappyCompressor'} AND compaction = {'class' : 
 'SizeTieredCompactionStrategy'};
 insert into sr2 (siteID, listID, partition, emailAddr, emailCrypt, 
 createDate) values ('4ca4f79e-3ab2-41c5-ae42-c7009736f1d5', 34, 1, 'xyzzy', 
 '5fe7719229092cdde4526afbc65c900c', '2014-04-28T14:05:59.236-0500');
 insert into sr2 (siteID, listID, partition, emailAddr, emailCrypt, 
 createDate) values ('4ca4f79e-3ab2-41c5-ae42-c7009736f1d5', 34, 2, 'noname', 
 '97bf28af2ca9c498d6e47237bb8680bf', '2014-04-28T14:05:59.236-0500');
 select emailCrypt, emailAddr from sr2 where siteID = 
 '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition = 2 and 
 createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt = 
 '97bf28af2ca9c498d6e47237bb8680bf';
  emailcrypt   | emailaddr
 --+---
  97bf28af2ca9c498d6e47237bb8680bf |noname
 (1 rows)
 select emailCrypt, emailAddr  from sr2 where siteID = 
 '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition = 1 and 
 createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt = 
 '5fe7719229092cdde4526afbc65c900c';
  emailcrypt   | emailaddr
 --+---
  5fe7719229092cdde4526afbc65c900c | xyzzy
 (1 rows)
 select emailCrypt, emailAddr from sr2 where siteID = 
 '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition IN (1,2) 
 and createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
 ('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');
 (0 rows)
 cqlsh:test_multiple_in select * from sr2;
  siteid   | listid | partition | createdate   
 | emailcrypt | emailaddr| 
 properties | removedate | re
 moveimportid
 --++---+--++--+++---
 -
  4ca4f79e-3ab2-41c5-ae42-c7009736f1d5 | 34 | 2 | 2014-04-28 
 14:05:59Central Daylight Time | noname | 97bf28af2ca9c498d6e47237bb8680bf 
 |   null |   null |
 null
  4ca4f79e-3ab2-41c5-ae42-c7009736f1d5 | 34 | 1 | 2014-04-28 
 14:05:59Central Daylight Time |  xyzzy | 5fe7719229092cdde4526afbc65c900c 
 |   null |   null |
 null
 (2 rows)
 select emailCrypt, emailAddr from sr2 where siteID = 
 '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition IN (1,2) 
 and createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
 ('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');
 (0 rows)
 select emailCrypt, emailAddr from sr2 where siteID = 
 '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition = 1 and 
 createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
 ('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');
 (0 rows)
 select emailCrypt, emailAddr from sr2 where siteID = 
 '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition = 2 and 
 createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
 

[jira] [Commented] (CASSANDRA-7105) SELECT with IN on final column of composite and compound primary key fails

2014-05-18 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001249#comment-14001249
 ] 

Bill Mitchell commented on CASSANDRA-7105:
--

Thank you for taking a look at this, Dave, as I was not yet ambitious enough to 
delve into this.  

I tried applying the attached 7105.txt patch to the current 2.0 branch (which 
is labelled 2.0.9).  The code in 2.0 is slightly different, but I'm assuming 
the fix is parallel:
{code}
private List<ByteBuffer> buildBound(Bound bound,
                                    Collection<CFDefinition.Name> names,
                                    Restriction[] restrictions,
                                    boolean isReversed,
                                    ColumnNameBuilder builder,
                                    List<ByteBuffer> variables) throws InvalidRequestException
{
    ...
        s.add((b == Bound.END && copy.remainingCount() > 0) ?
              copy.buildAsEndOfRange() : copy.build());
    }
    this.isReversed ^= isReversedType(name);
    return new ArrayList<ByteBuffer>(s);
}
{code}
It appears to me that we have only a partial fix to this problem.

Going back to the initial problem description where there were two IN operators 
in the predicate, the SELECT specifying the two partition id values and two 
emailcrypt values should return 2 rows.  The initial fault was that no rows 
were returned.  With the patch, one row was returned, not two, returning the 
value from partition 1.  

Trying to explore what might be happening, I added a third row:  
insert into sr2 (siteID, listID, partition, emailAddr, emailCrypt, createDate) 
values ('4ca4f79e-3ab2-41c5-ae42-c7009736f1d5', 34, 1, 'noname2', 
'98bf28af2ca9c498d6e47237bb8680c0', '2014-04-28T14:05:59.236-0500');

When I did a select requesting the three email values from the two partitions, 
it now returned only the first two and not the third.  If I specified only a 
single value, partition IN (1), it returned only 1 row, not the now expected 
two.  

I then added another row into a new partition 3.  This time (perhaps it was an 
error on my part) I included both the insert and the subsequent select in a 
single request in DataStax DevCenter:
insert into sr2 (siteID, listID, partition, emailAddr, emailCrypt, createDate) 
values ('4ca4f79e-3ab2-41c5-ae42-c7009736f1d5', 34, 3, 'noname3', 
'99bf28af2ca9c498d6e47237bb8680c1', '2014-04-28T14:05:59.236-0500');
select emailCrypt, emailAddr from sr2 where siteID = 
'4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition IN (1,2) 
and createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
('5fe7719229092cdde4526afbc65c900c','99bf28af2ca9c498d6e47237bb8680c1','97bf28af2ca9c498d6e47237bb8680bf','98bf28af2ca9c498d6e47237bb8680c0');

This timed out, with an exception in the Cassandra output window:
{code}
14/05/18 17:36:33 ERROR service.CassandraDaemon: Exception in thread Thread[ReadStage:137,5,main]
java.lang.AssertionError: Added column does not sort as the last column
    at org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:115)
    at org.apache.cassandra.db.ColumnFamily.addColumn(ColumnFamily.java:116)
    at org.apache.cassandra.db.ColumnFamily.addIfRelevant(ColumnFamily.java:110)
    at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:205)
    at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
    at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
    at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
    at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
    at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1541)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1370)
    at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:327)
    at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65)
    at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1348)
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1912)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
{code}

If it helps in locating the problem, in my full application test, I am also 
seeing a problem where a row is being returned that does not 

[jira] [Resolved] (CASSANDRA-7099) Concurrent instances of same Prepared Statement seeing intermingled result sets

2014-04-30 Thread Bill Mitchell (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Mitchell resolved CASSANDRA-7099.
--

Resolution: Not a Problem

This problem would seem to be my fault.  

In the normal, non-parallel case, one can cheat.  One can bind a 
PreparedStatement, execute it, process its result set, then bind a different 
parameter value and execute the same BoundStatement again.  

This does not work when the result set size exceeds the fetch size.  The 
initial segments are all fetched fine, but the Java driver apparently uses the 
BoundStatement to distinguish the queries.  If one executes the same 
BoundStatement object, with different values, to generate multiple result sets, 
the Java driver or Cassandra gets quite confused as to which results to return 
to which query.  

Building distinct BoundStatement objects and executing each just once avoids 
the confusion.  
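
In code, the safe pattern is simply this (a sketch against the 2.0 Java driver; 
everything but the driver classes is made up for illustration):
{code}
import java.util.ArrayList;
import java.util.List;

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

static List<ResultSet> queryAllPartitions(Session session, PreparedStatement ps,
                                          String s, long l, int partitionCount)
{
    List<ResultSet> results = new ArrayList<ResultSet>();
    for (int partition = 0; partition < partitionCount; partition++)
    {
        // A fresh BoundStatement per execution: never rebind and re-execute
        // the same object while its ResultSet may still be paging.
        BoundStatement bs = ps.bind(s, l, partition);
        results.add(session.execute(bs));
    }
    return results;
}
{code}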

 Concurrent instances of same Prepared Statement seeing intermingled result 
 sets
 ---

 Key: CASSANDRA-7099
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7099
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cassandra 2.0.7 with single node cluster
 Windows dual-core laptop
 DataStax Java driver 2.0.1
Reporter: Bill Mitchell

 I have a schema in which a wide row is partitioned into smaller rows.  (See 
 CASSANDRA-6826, CASSANDRA-6825 for more detail on this schema.)  In this 
 case, I randomly assigned the rows across the partitions based on the first 
 four hex digits of a hash value modulo the number of partitions.  
 Occasionally I need to retrieve the rows in order of insertion irrespective 
 of the partitioning.  Cassandra, of course, does not support this when paging 
 by fetch size is enabled, so I am issuing a query against each of the 
 partitions to obtain their rows in order, and merging the results:
 SELECT l, partition, cd, rd, ec, ea FROM sr WHERE s = ?, l = ?, partition = ? 
 ORDER BY cd ASC, ec ASC ALLOW FILTERING;
 These parallel queries are all instances of a single PreparedStatement.  
 What I saw was identical values from multiple queries, which by construction 
 should never happen, and after further investigation, discovered that rows 
 from partition 5 are being returned in the result set for the query against 
 another partition, e.g., 1.  This was so unbelievable that I added diagnostic 
 code in my test case to detect this:
 After reading 167 rows, returned partition 5 does not match query partition 4
 The merge logic works fine and delivers correct results when I use LIMIT to 
 avoid fetch size paging.  Even if there were a bug there, it is hard to see 
 how any client error explains ResultSet.one() returning a row whose values 
 don't match the constraints in that ResultSet's query.
 I'm not sure of the exact significance of 167, as I have configured the 
 queryFetchSize for the cluster to 1000, and in this merge logic I divide that 
 by the number of partitions, 7, so the fetchSize for each of these parallel 
 queries was set to 142.  I suspect this is being treated as a minimum 
 fetchSize, and the driver or server is rounding this up to fill a 
 transmission block.  When I prime the pump, issuing the query against each of 
 the partitions, the initial contents of the result sets are correct.  The 
 failure appears after we advance two of these queries to the next page.
 Although I had been experimenting with fetchMoreResults() for prefetching, I 
 disabled that to isolate this problem, so that is not a factor.   
 I have not yet tried preparing separate instances of the query, as I already 
 have common logic to cache and reuse already prepared statements.
 I have not proven that it is a server bug and not a Java driver bug, but on 
 first glance it was not obvious how the Java driver might associate the 
 responses with the wrong requests.  Were that happening, one would expect to 
 see the right overall collection of rows, just to the wrong queries, and not 
 duplicates, which is what I saw.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7099) Concurrent instances of same Prepared Statement seeing intermingled result sets

2014-04-30 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985608#comment-13985608
 ] 

Bill Mitchell commented on CASSANDRA-7099:
--

My thought was that, if the Java driver were more clever, it might be possible 
to use the ResultSet to determine the correlation id when paging in more 
results, instead of the Statement.  But there may be reasons why it wants to 
assume the Statement parameters have not changed, e.g., to avoid having to copy 
the bound parameters if it needs these to generate the later paged requests.  

--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7099) Concurrent instances of same Prepared Statement seeing intermingled result sets

2014-04-28 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983044#comment-13983044
 ] 

Bill Mitchell commented on CASSANDRA-7099:
--

I should clarify, as the title is misleading, that I was seeing more than 
intermingled results.  Intermingled suggests that the results from query 2 came 
back to query 1 and vice versa.  What I saw was the same results being returned 
to two different queries -- something that might happen if, say, there were a 
query results buffer based on PreparedStatement id without looking at the bound 
parameters, so that the second query thought the results were already 
calculated and grabbed up the results from the first.  

--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (CASSANDRA-7099) Concurrent instances of same Prepared Statement seeing intermingled result sets

2014-04-28 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983044#comment-13983044
 ] 

Bill Mitchell edited comment on CASSANDRA-7099 at 4/28/14 2:33 PM:
---

I should clarify, as the title is misleading, that I was seeing more than 
intermingled results.  Intermingled suggests that the results from query 2 came 
back to query 1 and vice versa.  What I saw was the same results being returned 
to two different queries -- something that might happen if, say, there were a 
query results buffer based on PreparedStatement id ignoring the bound 
parameters, so that the second query thought the results were already 
calculated and grabbed up the results from the first.  


was (Author: wtmitchell3):
I should clarify, as the title is misleading, that I was seeing more than 
intermingled results.  Intermingled suggests that the results from query 2 came 
back to query 1 and vice versa.  What I saw was the same results being returned 
to two different queries -- something that might happen if, say, there were a 
query results buffer based on PreparedStatement id without looking at the bound 
parameters, so that the second query thought the results were already 
calculated and grabbed up the results from the first.  

--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (CASSANDRA-7105) SELECT with IN on final column of composite and compound primary key fails

2014-04-28 Thread Bill Mitchell (JIRA)
Bill Mitchell created CASSANDRA-7105:


 Summary: SELECT with IN on final column of composite and compound 
primary key fails
 Key: CASSANDRA-7105
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7105
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: DataStax Cassandra 2.0.7
Windows dual-core laptop
Reporter: Bill Mitchell


I have a failing sequence where I specify an IN constraint on the final int 
column of the composite primary key and an IN constraint on the final String 
column of the compound primary key and no rows are returned, when rows should 
be returned.  
{noformat}
CREATE TABLE IF NOT EXISTS sr2 (siteID TEXT, partition INT, listID BIGINT, 
emailAddr TEXT, emailCrypt TEXT, createDate TIMESTAMP, removeDate TIMESTAMP, 
removeImportID BIGINT, properties TEXT, PRIMARY KEY ((siteID, listID, 
partition), createDate, emailCrypt) ) WITH CLUSTERING ORDER BY (createDate 
DESC, emailCrypt DESC)  AND compression = {'sstable_compression' : 
'SnappyCompressor'} AND compaction = {'class' : 'SizeTieredCompactionStrategy'};
insert into sr2 (siteID, listID, partition, emailAddr, emailCrypt, createDate) 
values ('4ca4f79e-3ab2-41c5-ae42-c7009736f1d5', 34, 1, 'xyzzy', 
'5fe7719229092cdde4526afbc65c900c', '2014-04-28T14:05:59.236-0500');
insert into sr2 (siteID, listID, partition, emailAddr, emailCrypt, createDate) 
values ('4ca4f79e-3ab2-41c5-ae42-c7009736f1d5', 34, 2, 'noname', 
'97bf28af2ca9c498d6e47237bb8680bf', '2014-04-28T14:05:59.236-0500');
select emailCrypt, emailAddr from sr2 where siteID = 
'4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition = 2 and 
createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt = 
'97bf28af2ca9c498d6e47237bb8680bf';

 emailcrypt   | emailaddr
--+---
 97bf28af2ca9c498d6e47237bb8680bf |noname

(1 rows)

select emailCrypt, emailAddr  from sr2 where siteID = 
'4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition = 1 and 
createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt = 
'5fe7719229092cdde4526afbc65c900c';


 emailcrypt   | emailaddr
--+---
 5fe7719229092cdde4526afbc65c900c | xyzzy

(1 rows)

select emailCrypt, emailAddr from sr2 where siteID = 
'4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition IN (1,2) 
and createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');

(0 rows)

cqlsh:test_multiple_in select * from sr2;

 siteid   | listid | partition | createdate 
  | emailcrypt | emailaddr| 
properties | removedate | re
moveimportid
--++---+--++--+++---
-
 4ca4f79e-3ab2-41c5-ae42-c7009736f1d5 | 34 | 2 | 2014-04-28 
14:05:59Central Daylight Time | noname | 97bf28af2ca9c498d6e47237bb8680bf | 
  null |   null |
null
 4ca4f79e-3ab2-41c5-ae42-c7009736f1d5 | 34 | 1 | 2014-04-28 
14:05:59Central Daylight Time |  xyzzy | 5fe7719229092cdde4526afbc65c900c | 
  null |   null |
null

(2 rows)

select emailCrypt, emailAddr from sr2 where siteID = 
'4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition IN (1,2) 
and createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');

(0 rows)

select emailCrypt, emailAddr from sr2 where siteID = 
'4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition = 1 and 
createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');

(0 rows)

select emailCrypt, emailAddr from sr2 where siteID = 
'4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition = 2 and 
createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');

(0 rows)

select emailCrypt, emailAddr from sr2 where siteID = 
'4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition IN (1,2) 
and createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');

(0 rows)

cqlsh:test_multiple_in select emailCrypt, emailAddr from sr2 where siteID = 
'4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition IN (1,2) 
and createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
('97bf28af2ca9c498d6e47237bb8680bf');

 emailcrypt   | emailaddr
--+---
 

[jira] [Updated] (CASSANDRA-7105) SELECT with IN on final column of composite and compound primary key fails

2014-04-28 Thread Bill Mitchell (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Mitchell updated CASSANDRA-7105:
-

Description: 
I have a failing sequence where I specify an IN constraint on the final int 
column of the composite primary key and an IN constraint on the final String 
column of the compound primary key and no rows are returned, when rows should 
be returned.  
{noformat}
CREATE TABLE IF NOT EXISTS sr2 (siteID TEXT, partition INT, listID BIGINT, 
emailAddr TEXT, emailCrypt TEXT, createDate TIMESTAMP, removeDate TIMESTAMP, 
removeImportID BIGINT, properties TEXT, PRIMARY KEY ((siteID, listID, 
partition), createDate, emailCrypt) ) WITH CLUSTERING ORDER BY (createDate 
DESC, emailCrypt DESC)  AND compression = {'sstable_compression' : 
'SnappyCompressor'} AND compaction = {'class' : 'SizeTieredCompactionStrategy'};
insert into sr2 (siteID, listID, partition, emailAddr, emailCrypt, createDate) 
values ('4ca4f79e-3ab2-41c5-ae42-c7009736f1d5', 34, 1, 'xyzzy', 
'5fe7719229092cdde4526afbc65c900c', '2014-04-28T14:05:59.236-0500');
insert into sr2 (siteID, listID, partition, emailAddr, emailCrypt, createDate) 
values ('4ca4f79e-3ab2-41c5-ae42-c7009736f1d5', 34, 2, 'noname', 
'97bf28af2ca9c498d6e47237bb8680bf', '2014-04-28T14:05:59.236-0500');
select emailCrypt, emailAddr from sr2 where siteID = 
'4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition = 2 and 
createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt = 
'97bf28af2ca9c498d6e47237bb8680bf';

 emailcrypt   | emailaddr
--+---
 97bf28af2ca9c498d6e47237bb8680bf |noname

(1 rows)

select emailCrypt, emailAddr  from sr2 where siteID = 
'4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition = 1 and 
createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt = 
'5fe7719229092cdde4526afbc65c900c';


 emailcrypt   | emailaddr
--+---
 5fe7719229092cdde4526afbc65c900c | xyzzy

(1 rows)

select emailCrypt, emailAddr from sr2 where siteID = 
'4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition IN (1,2) 
and createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');

(0 rows)

cqlsh:test_multiple_in select * from sr2;

 siteid   | listid | partition | createdate 
  | emailcrypt | emailaddr| 
properties | removedate | re
moveimportid
--++---+--++--+++---
-
 4ca4f79e-3ab2-41c5-ae42-c7009736f1d5 | 34 | 2 | 2014-04-28 
14:05:59Central Daylight Time | noname | 97bf28af2ca9c498d6e47237bb8680bf | 
  null |   null |
null
 4ca4f79e-3ab2-41c5-ae42-c7009736f1d5 | 34 | 1 | 2014-04-28 
14:05:59Central Daylight Time |  xyzzy | 5fe7719229092cdde4526afbc65c900c | 
  null |   null |
null

(2 rows)

select emailCrypt, emailAddr from sr2 where siteID = 
'4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition IN (1,2) 
and createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');

(0 rows)

select emailCrypt, emailAddr from sr2 where siteID = 
'4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition = 1 and 
createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');

(0 rows)

select emailCrypt, emailAddr from sr2 where siteID = 
'4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition = 2 and 
createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');

(0 rows)

select emailCrypt, emailAddr from sr2 where siteID = 
'4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition IN (1,2) 
and createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');

(0 rows)

cqlsh:test_multiple_in select emailCrypt, emailAddr from sr2 where siteID = 
'4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition IN (1,2) 
and createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
('97bf28af2ca9c498d6e47237bb8680bf');

 emailcrypt   | emailaddr
--+---
 97bf28af2ca9c498d6e47237bb8680bf |noname

(1 rows)

cqlsh:test_multiple_in select emailCrypt, emailAddr from sr2 where siteID = 
'4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition IN (1,2) 
and createDate = '2014-04-28T14:05:59.236-0500' and 

[jira] [Commented] (CASSANDRA-7105) SELECT with IN on final column of composite and compound primary key fails

2014-04-28 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983883#comment-13983883
 ] 

Bill Mitchell commented on CASSANDRA-7105:
--

As was suggested in the query results above, the first IN is not essential to 
this problem.  Issuing the query against each partition separately with an 
equality relation, specifying IN only for the final column, returns no rows.  


[jira] [Comment Edited] (CASSANDRA-7105) SELECT with IN on final column of composite and compound primary key fails

2014-04-28 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983883#comment-13983883
 ] 

Bill Mitchell edited comment on CASSANDRA-7105 at 4/29/14 2:25 AM:
---

As was suggested in the query results above, the first IN is not essential to 
this problem.  Issuing the query against each partition separately with an 
equality relation, specifying IN only for the final column, returns no rows.  

When I changed the schema so the CLUSTERING ORDER was ASC instead of DESC, the 
queries worked fine.  So apparently the problem lies with the IN relation 
combined with order DESC on the column.


was (Author: wtmitchell3):
As was suggested in the query results above, the first IN is not essential to 
this problem.  Issuing the query against each partition separately with an 
equality relation, specifying IN only for the final column, returns no rows.  


[jira] [Created] (CASSANDRA-7099) Concurrent instances of same Prepared Statement seeing intermingled result sets

2014-04-27 Thread Bill Mitchell (JIRA)
Bill Mitchell created CASSANDRA-7099:


 Summary: Concurrent instances of same Prepared Statement seeing 
intermingled result sets
 Key: CASSANDRA-7099
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7099
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cassandra 2.0.7 with single node cluster
Windows dual-core laptop
DataStax Java driver 2.0.1
Reporter: Bill Mitchell


I have a schema in which a wide row is partitioned into smaller rows.  (See 
CASSANDRA-6826, CASSANDRA-6825 for more detail on this schema.)  In this case, 
I randomly assigned the rows across the partitions based on the first four hex 
digits of a hash value modulo the number of partitions.  
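
(The assignment was along these lines; a hypothetical sketch, not the actual 
test code:)
{code}
// Derive the partition from the first four hex digits of the hash value.
static int partitionFor(String hashHex, int partitionCount)
{
    return Integer.parseInt(hashHex.substring(0, 4), 16) % partitionCount;
}
{code}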

Occasionally I need to retrieve the rows in order of insertion irrespective of 
the partitioning.  Cassandra, of course, does not support this when paging by 
fetch size is enabled, so I am issuing a query against each of the partitions 
to obtain their rows in order, and merging the results:

SELECT l, partition, cd, rd, ec, ea FROM sr WHERE s = ?, l = ?, partition = ? 
ORDER BY cd ASC, ec ASC ALLOW FILTERING;

These parallel queries are all instances of a single PreparedStatement.  

What I saw was identical values from multiple queries, which by construction 
should never happen, and after further investigation, discovered that rows from 
partition 5 are being returned in the result set for the query against another 
partition, e.g., 1.  This was so unbelievable that I added diagnostic code in 
my test case to detect this:

After reading 167 rows, returned partition 5 does not match query partition 4

The merge logic works fine and delivers correct results when I use LIMIT to 
avoid fetch size paging.  Even if there were a bug there, it is hard to see how 
any client error explains ResultSet.one() returning a row whose values don't 
match the constraints in that ResultSet's query.

I'm not sure of the exact significance of 167, as I have configured the 
queryFetchSize for the cluster to 1000, and in this merge logic I divide that 
by the number of partitions, 7, so the fetchSize for each of these parallel 
queries was set to 142.  I suspect this is being treated as a minimum 
fetchSize, and the driver or server is rounding this up to fill a transmission 
block.  When I prime the pump, issuing the query against each of the 
partitions, the initial contents of the result sets are correct.  The failure 
appears after we advance two of these queries to the next page.
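
(For reference, the per-query page size was set along these lines with the 2.0 
driver; a sketch, with the arithmetic described above and my own variable names:)
{code}
BoundStatement bs = preparedStatement.bind(s, l, partition);
bs.setFetchSize(1000 / 7);          // cluster queryFetchSize split across the 7 partition queries = 142
ResultSet rs = session.execute(bs); // later pages are fetched as the iterator advances
{code}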

Although I had been experimenting with fetchMoreResults() for prefetching, I 
disabled that to isolate this problem, so that is not a factor.   

I have not yet tried preparing separate instances of the query, as I already 
have common logic to cache and reuse already prepared statements.

I have not proven that it is a server bug and not a Java driver bug, but on 
first glance it was not obvious how the Java driver might associate the 
responses with the wrong requests.  Were that happening, one would expect to 
see the right overall collection of rows, just to the wrong queries, and not 
duplicates, which is what I saw.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-04-27 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982679#comment-13982679
 ] 

Bill Mitchell commented on CASSANDRA-6826:
--

Thank you for calling my attention to its release; the last time I checked the 
DataStax site, I did not yet see it, and once again I forgot to check the 
Apache site directly.  

Although in my first 2.0.7 tests I saw a failure, I was trying something new: 
using fetchMoreResults now that fetchSize was supposed to be fixed.  Further 
testing has convinced me that these failures are new issues, different from 
this report.  The specific test that failed for me above works in 2.0.7, so, 
yes, I believe this problem is fixed. 

 Query returns different number of results depending on fetchsize
 

 Key: CASSANDRA-6826
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6826
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad-core Windows 7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
Assignee: Sylvain Lebresne

 I issue a query across the set of partitioned wide rows for one logical row, 
 where s, l, and partition specify the composite primary key for the row:
 SELECT ec, ea, rd FROM sr WHERE s = ? and partition IN ? and l = ? ALLOW 
 FILTERING;
 If I set fetchSize to only 1000 when the Cluster is configured, the query 
 sometimes does not return all the results.  In the particular case I am 
 chasing, it returns a total of 98586 rows.  If I increase the fetchsize to 
 10, all the 9 actual rows are returned.  This suggests there is some 
 problem with fetchsize re-establishing the position on the next segment of 
 the result set, at least when multiple partitions are being accessed.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6841) ConcurrentModificationException in commit-log-writer after local schema reset

2014-04-25 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981084#comment-13981084
 ] 

Bill Mitchell commented on CASSANDRA-6841:
--

Although not the author, I ran into this symptom of the 
ConcurrentModificationException at line 309 in the CommitLogAllocator more than 
once this week on 2.0.6, on an 8-core Windows desktop.  After downloading and 
installing 2.0.7 yesterday, I have not seen this problem reappear.

 ConcurrentModificationException in commit-log-writer after local schema reset
 -

 Key: CASSANDRA-6841
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6841
 Project: Cassandra
  Issue Type: Bug
 Environment: Linux 3.2.0 (Debian Wheezy) Cassandra 2.0.6, Oracle JVM 
 1.7.0_51
 Almost default cassandra.yaml (IPs and cluster name changed)
 This is the 2nd node in a 2-node ring. It has ~2500 keyspaces and very low 
 traffic. (Only new keyspaces see reads and writes.)
Reporter: Pas
Assignee: Benedict
Priority: Minor
 Fix For: 1.2.17, 2.0.7, 2.1 beta2

 Attachments: 6841.12.txt


 {code}
  INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,013 
 MigrationManager.java (line 329) Starting local schema reset...
  INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,016 
 ColumnFamilyStore.java (line 785) Enqueuing flush of 
 Memtable-local@394448776(114/1140 serialized/live bytes, 3 ops)
  INFO [FlushWriter:6] 2014-03-12 11:37:54,016 Memtable.java (line 331) 
 Writing Memtable-local@394448776(114/1140 serialized/live bytes, 3 ops)
  INFO [FlushWriter:6] 2014-03-12 11:37:54,182 Memtable.java (line 371) 
 Completed flushing 
 /var/lib/cassandra/data/system/local/system-local-jb-398-Data.db (145 bytes) 
 for commitlog position ReplayPosition(segmentId=1394620057452, 
 position=33159822)
  INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,185 
 ColumnFamilyStore.java (line 785) Enqueuing flush of 
 Memtable-local@1087210140(62/620 serialized/live bytes, 1 ops)
  INFO [FlushWriter:6] 2014-03-12 11:37:54,185 Memtable.java (line 331) 
 Writing Memtable-local@1087210140(62/620 serialized/live bytes, 1 ops)
  INFO [FlushWriter:6] 2014-03-12 11:37:54,357 Memtable.java (line 371) 
 Completed flushing 
 /var/lib/cassandra/data/system/local/system-local-jb-399-Data.db (96 bytes) 
 for commitlog position ReplayPosition(segmentId=1394620057452, 
 position=33159959)
  INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,361 
 ColumnFamilyStore.java (line 785) Enqueuing flush of 
 Memtable-local@768887091(62/620 serialized/live bytes, 1 ops)
  INFO [FlushWriter:6] 2014-03-12 11:37:54,361 Memtable.java (line 331) 
 Writing Memtable-local@768887091(62/620 serialized/live bytes, 1 ops)
  INFO [FlushWriter:6] 2014-03-12 11:37:54,516 Memtable.java (line 371) 
 Completed flushing 
 /var/lib/cassandra/data/system/local/system-local-jb-400-Data.db (96 bytes) 
 for commitlog position ReplayPosition(segmentId=1394620057452, 
 position=33160096)
  INFO [CompactionExecutor:38] 2014-03-12 11:37:54,517 CompactionTask.java 
 (line 115) Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-jb-398-Data.db'),
  
 SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-jb-400-Data.db'),
  
 SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-jb-399-Data.db'),
  
 SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-jb-397-Data.db')]
  INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,519 
 ColumnFamilyStore.java (line 785) Enqueuing flush of 
 Memtable-local@271993477(62/620 serialized/live bytes, 1 ops)
  INFO [FlushWriter:6] 2014-03-12 11:37:54,519 Memtable.java (line 331) 
 Writing Memtable-local@271993477(62/620 serialized/live bytes, 1 ops)
  INFO [FlushWriter:6] 2014-03-12 11:37:54,794 Memtable.java (line 371) 
 Completed flushing 
 /var/lib/cassandra/data/system/local/system-local-jb-401-Data.db (96 bytes) 
 for commitlog position ReplayPosition(segmentId=1394620057452, 
 position=33160233)
  INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,799 
 MigrationManager.java (line 357) Local schema reset is complete.
  INFO [CompactionExecutor:38] 2014-03-12 11:37:54,848 CompactionTask.java 
 (line 275) Compacted 4 sstables to 
 [/var/lib/cassandra/data/system/local/system-local-jb-402,].  6,099 bytes to 
 5,821 (~95% of original) in 330ms = 0.016822MB/s.  4 total partitions merged 
 to 1.  Partition merge counts were {4:1, }
  INFO [OptionalTasks:1] 2014-03-12 11:37:55,110 ColumnFamilyStore.java (line 
 785) Enqueuing flush of 
 Memtable-schema_columnfamilies@106276050(181506/509164 serialized/live bytes, 
 3276 ops)
  INFO [FlushWriter:6] 2014-03-12 

[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-03-28 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951706#comment-13951706
 ] 

Bill Mitchell commented on CASSANDRA-6826:
--

I started working on a smaller test case, but competing time pressures at work 
put that effort on hold.  In the meantime, I was able to work around this 
problem by using LIMIT instead of fetch, iterating over the partitions, and 
using a compound comparison in the WHERE clause to establish position for the 
next query.  This prompted me to open JAVA-295, as I had to abandon the 
QueryBuilder in order to construct this WHERE clause.  
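
The workaround query had roughly this shape (a sketch; the bind markers after 
the compound comparison carry the (cd, ec) position of the last row already 
consumed):
{code}
SELECT ec, ea, rd FROM sr
 WHERE s = ? AND l = ? AND partition = ?
   AND (cd, ec) > (?, ?)
 ORDER BY cd ASC, ec ASC
 LIMIT 1000;
{code}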

When Cassandra 2.0.7 comes out, I will check if the fix to CASSANDRA-6825 also 
fixes all the issues I found with the SELECT.

 Query returns different number of results depending on fetchsize
 

 Key: CASSANDRA-6826
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6826
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad-core Windows 7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
Assignee: Sylvain Lebresne

 I issue a query across the set of partitioned wide rows for one logical row, 
 where s, l, and partition specify the composite primary key for the row:
 SELECT ec, ea, rd FROM sr WHERE s = ? and partition IN ? and l = ? ALLOW 
 FILTERING;
 If I set fetchSize to only 1000 when the Cluster is configured, the query 
 sometimes does not return all the results.  In the particular case I am 
 chasing, it returns a total of 98586 rows.  If I increase the fetchsize to 
 10, all the 9 actual rows are returned.  This suggests there is some 
 problem with fetchsize re-establishing the position on the next segment of 
 the result set, at least when multiple partitions are being accessed.  
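
For context, the fetch size in question can be set either globally when the Cluster is built or per statement; a sketch assuming the 2.0-era Java driver API, with the contact point and sizes illustrative:

{code}
// Global default fetch size, applied when the Cluster is configured:
Cluster cluster = Cluster.builder()
    .addContactPoint("127.0.0.1")
    .withQueryOptions(new QueryOptions().setFetchSize(1000))
    .build();

// Per-statement override, e.g. to verify a large fetch size returns all rows:
Statement stmt = new SimpleStatement(
    "SELECT ec, ea, rd FROM sr WHERE s = 5 AND partition IN (0, 1, 2, 3, 4, 5)" +
    " AND l = 11 ALLOW FILTERING");
stmt.setFetchSize(100000);
ResultSet rows = session.execute(stmt);
{code}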



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows

2014-03-21 Thread Bill Mitchell (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Mitchell updated CASSANDRA-6825:
-

Attachment: testdb_1395372407904.zip

 COUNT(*) with WHERE not finding all the matching rows
 -

 Key: CASSANDRA-6825
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6825
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad core Windows7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
Assignee: Tyler Hobbs
 Attachments: cassandra.log, selectpartitions.zip, 
 selectrowcounts.txt, testdb_1395372407904.zip, testdb_1395372407904.zip


 Investigating another problem, I needed to do COUNT(*) on the several 
 partitions of a table immediately after a test case ran, and I discovered 
 that count(*) on the full table and on each of the partitions returned 
 different counts.  
 In this particular case, SELECT COUNT(*) FROM sr LIMIT 1000000; returned the 
 expected count from the test, 99999 rows.  The composite primary key splits 
 the logical row into six distinct partitions, and when I issue a query asking 
 for the total across all six partitions, the returned result is only 83999.  
 Drilling down, I find that SELECT * from sr WHERE s = 5 AND l = 11 AND 
 partition = 0; returns 30,000 rows, but a SELECT COUNT(*) with the identical 
 WHERE predicate reports only 14,000. 
 This is failing immediately after running a single small test, such that 
 there are only two SSTables, sr-jb-1 and sr-jb-2.  Compaction never needed to 
 run.  
 In selectrowcounts.txt is a copy of the cqlsh output showing the incorrect 
 count(*) results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows

2014-03-21 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942881#comment-13942881
 ] 

Bill Mitchell commented on CASSANDRA-6825:
--

Tyler, you use an interesting word, flush.  After running a test with a 
different database name, I went back and looked at the first keyspace, as I did 
not drain the node before zipping the file the first time.  A third SSTable had 
now been written.  See the larger .zip file I have attached.  When I try the 
same statements through cqlsh, a SELECT * FROM sr WHERE ... AND partition = 2 
now shows 20,000 rows, but SELECT COUNT(*) FROM sr WHERE ... AND partition=2 
still returns a count of 10,000.  So the count is still incorrect.  

 COUNT(*) with WHERE not finding all the matching rows
 -

 Key: CASSANDRA-6825
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6825
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad core Windows7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
Assignee: Tyler Hobbs
 Attachments: cassandra.log, selectpartitions.zip, 
 selectrowcounts.txt, testdb_1395372407904.zip, testdb_1395372407904.zip


 Investigating another problem, I needed to do COUNT(*) on the several 
 partitions of a table immediately after a test case ran, and I discovered 
 that count(*) on the full table and on each of the partitions returned 
 different counts.  
 In this particular case, SELECT COUNT(*) FROM sr LIMIT 1000000; returned the 
 expected count from the test, 99999 rows.  The composite primary key splits 
 the logical row into six distinct partitions, and when I issue a query asking 
 for the total across all six partitions, the returned result is only 83999.  
 Drilling down, I find that SELECT * from sr WHERE s = 5 AND l = 11 AND 
 partition = 0; returns 30,000 rows, but a SELECT COUNT(*) with the identical 
 WHERE predicate reports only 14,000. 
 This is failing immediately after running a single small test, such that 
 there are only two SSTables, sr-jb-1 and sr-jb-2.  Compaction never needed to 
 run.  
 In selectrowcounts.txt is a copy of the cqlsh output showing the incorrect 
 count(*) results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-03-21 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943037#comment-13943037
 ] 

Bill Mitchell commented on CASSANDRA-6826:
--

It is worth noting that, when I first reported this problem, the difference 
between the expected and actual number of rows returned was 1413, a rather 
odd number.  So far, on 2.0.6, I have seen differences that are always a 
multiple of 10,000, matching the behavior in CASSANDRA-6825.  So it may indeed 
be, as Sylvain suggested, that CASSANDRA-6748 fixed one problem that I was 
seeing when I first reported this, but that the one test was hitting two 
problems, depending on timing and other issues, and now only CASSANDRA-6825 
remains.

 Query returns different number of results depending on fetchsize
 

 Key: CASSANDRA-6826
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6826
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad-core Windows 7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
Assignee: Sylvain Lebresne

 I issue a query across the set of partitioned wide rows for one logical row, 
 where s, l, and partition specify the composite primary key for the row:
 SELECT ec, ea, rd FROM sr WHERE s = ? and partition IN ? and l = ? ALLOW 
 FILTERING;
 If I set fetchSize to only 1000 when the Cluster is configured, the query 
 sometimes does not return all the results.  In the particular case I am 
 chasing, it returns a total of 98586 rows.  If I increase the fetchsize to 
 100000, all the 99999 actual rows are returned.  This suggests there is some 
 problem with fetchsize re-establishing the position on the next segment of 
 the result set, at least when multiple partitions are being accessed.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows

2014-03-21 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942881#comment-13942881
 ] 

Bill Mitchell edited comment on CASSANDRA-6825 at 3/21/14 7:09 PM:
---

Tyler, you used an interesting word, flush.  After running a test with a 
different database name, I went back and looked at the first keyspace, as I did 
not drain the node before zipping the file the first time.  A third SSTable had 
now been written.  See the larger .zip file I have attached.  When I try the 
same statements through cqlsh, a SELECT * FROM sr WHERE ... AND partition = 2 
now shows 20,000 rows, but SELECT COUNT(*) FROM sr WHERE ... AND partition=2 
still returns a count of 10,000.  So the count is still incorrect.  


was (Author: wtmitchell3):
Tyler, you use an interesting word, flush.  After running a test with a 
different database name, I went back and looked at the first keyspace, as I did 
not drain the node before zipping the file the first time.  A third SSTable had 
now been written.  See the larger .zip file I have attached.  When I try the 
same statements through cqlsh, a SELECT * FROM sr WHERE ... AND partition = 2 
now shows 20,000 rows, but SELECT COUNT(*) FROM sr WHERE ... AND partition=2 
still returns a count of 10,000.  So the count is still incorrect.  

 COUNT(*) with WHERE not finding all the matching rows
 -

 Key: CASSANDRA-6825
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6825
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad core Windows7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
Assignee: Tyler Hobbs
 Attachments: cassandra.log, selectpartitions.zip, 
 selectrowcounts.txt, testdb_1395372407904.zip, testdb_1395372407904.zip


 Investigating another problem, I needed to do COUNT(*) on the several 
 partitions of a table immediately after a test case ran, and I discovered 
 that count(*) on the full table and on each of the partitions returned 
 different counts.  
 In this particular case, SELECT COUNT(*) FROM sr LIMIT 1000000; returned the 
 expected count from the test, 99999 rows.  The composite primary key splits 
 the logical row into six distinct partitions, and when I issue a query asking 
 for the total across all six partitions, the returned result is only 83999.  
 Drilling down, I find that SELECT * from sr WHERE s = 5 AND l = 11 AND 
 partition = 0; returns 30,000 rows, but a SELECT COUNT(*) with the identical 
 WHERE predicate reports only 14,000. 
 This is failing immediately after running a single small test, such that 
 there are only two SSTables, sr-jb-1 and sr-jb-2.  Compaction never needed to 
 run.  
 In selectrowcounts.txt is a copy of the cqlsh output showing the incorrect 
 count(*) results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows

2014-03-21 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943586#comment-13943586
 ] 

Bill Mitchell commented on CASSANDRA-6825:
--

As it happens, I have that info handy as my JUnit testcase includes it in the 
log4j output:


CREATE TABLE testdb_1395374703023.sr (
siteid text,
listid bigint,
partition int,
createdate timestamp,
emailcrypt text,
emailaddr text,
properties text,
removedate timestamp,
PRIMARY KEY ((siteid, listid, partition), createdate, emailcrypt)
) WITH CLUSTERING ORDER BY (createdate DESC, emailcrypt ASC)
   AND read_repair_chance = 0.1
   AND dclocal_read_repair_chance = 0.0
   AND replicate_on_write = true
   AND gc_grace_seconds = 864000
   AND bloom_filter_fp_chance = 0.01
   AND caching = 'KEYS_ONLY'
   AND comment = ''
   AND compaction = { 'class' : 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy' }
   AND compression = { 'sstable_compression' : 
'org.apache.cassandra.io.compress.SnappyCompressor' };

(siteID was a BIGINT until recently when the schema was changed to TEXT to 
match the use of siteID elsewhere in the product.  I had not thought to 
represent our Java String as a Cassandra UUID.)

 COUNT(*) with WHERE not finding all the matching rows
 -

 Key: CASSANDRA-6825
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6825
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad core Windows7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
Assignee: Tyler Hobbs
 Attachments: cassandra.log, selectpartitions.zip, 
 selectrowcounts.txt, testdb_1395372407904.zip, testdb_1395372407904.zip


 Investigating another problem, I needed to do COUNT(*) on the several 
 partitions of a table immediately after a test case ran, and I discovered 
 that count(*) on the full table and on each of the partitions returned 
 different counts.  
 In this particular case, SELECT COUNT(*) FROM sr LIMIT 1000000; returned the 
 expected count from the test, 99999 rows.  The composite primary key splits 
 the logical row into six distinct partitions, and when I issue a query asking 
 for the total across all six partitions, the returned result is only 83999.  
 Drilling down, I find that SELECT * from sr WHERE s = 5 AND l = 11 AND 
 partition = 0; returns 30,000 rows, but a SELECT COUNT(*) with the identical 
 WHERE predicate reports only 14,000. 
 This is failing immediately after running a single small test, such that 
 there are only two SSTables, sr-jb-1 and sr-jb-2.  Compaction never needed to 
 run.  
 In selectrowcounts.txt is a copy of the cqlsh output showing the incorrect 
 count(*) results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows

2014-03-20 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942232#comment-13942232
 ] 

Bill Mitchell commented on CASSANDRA-6825:
--

I can confirm the problem is still there in 2.0.6.  As I was verifying that I 
could still reproduce CASSANDRA-6826, I checked for the COUNT(*) issue too.  In 
one of the table's six partitions, a COUNT(*) reported 10,000 rows, but if I did 
a SELECT * in either ascending or descending order, cqlsh printed 20,000 rows.  
Would it help if I zipped up the data directory containing the table after the 
problem appeared?  Or would you need other information from the system 
directory, too, to see how the data is recorded? That might help in isolating 
how the problem arises.

 COUNT(*) with WHERE not finding all the matching rows
 -

 Key: CASSANDRA-6825
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6825
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad core Windows7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
Assignee: Russ Hatch
 Attachments: cassandra.log, selectpartitions.zip, selectrowcounts.txt


 Investigating another problem, I needed to do COUNT(*) on the several 
 partitions of a table immediately after a test case ran, and I discovered 
 that count(*) on the full table and on each of the partitions returned 
 different counts.  
 In this particular case, SELECT COUNT(*) FROM sr LIMIT 1000000; returned the 
 expected count from the test, 99999 rows.  The composite primary key splits 
 the logical row into six distinct partitions, and when I issue a query asking 
 for the total across all six partitions, the returned result is only 83999.  
 Drilling down, I find that SELECT * from sr WHERE s = 5 AND l = 11 AND 
 partition = 0; returns 30,000 rows, but a SELECT COUNT(*) with the identical 
 WHERE predicate reports only 14,000. 
 This is failing immediately after running a single small test, such that 
 there are only two SSTables, sr-jb-1 and sr-jb-2.  Compaction never needed to 
 run.  
 In selectrowcounts.txt is a copy of the cqlsh output showing the incorrect 
 count(*) results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-03-20 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942250#comment-13942250
 ] 

Bill Mitchell commented on CASSANDRA-6826:
--

No doubt.  At the moment, though, the test case is embedded in a full 
application, as I mentioned to Joshua (CASSANDRA-6736).  Stripping that 
application down so that the test case did not carry with it so much 
proprietary code is a couple of days of work, and I'm not sure when I will get 
to it.  Even worse, when I first encountered this problem, it appeared only in 
a maven clean install of the whole project and not when the test case 
was run by itself.  This last week, though, it would intermittently appear and 
disappear when I repeated the test unchanged, without doing the complete maven 
build.  So it may be that a reduced version, when I have a chance to strip it 
down, will show the same anomaly.

 Query returns different number of results depending on fetchsize
 

 Key: CASSANDRA-6826
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6826
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad-core Windows 7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
Assignee: Sylvain Lebresne

 I issue a query across the set of partitioned wide rows for one logical row, 
 where s, l, and partition specify the composite primary key for the row:
 SELECT ec, ea, rd FROM sr WHERE s = ? and partition IN ? and l = ? ALLOW 
 FILTERING;
 If I set fetchSize to only 1000 when the Cluster is configured, the query 
 sometimes does not return all the results.  In the particular case I am 
 chasing, it returns a total of 98586 rows.  If I increase the fetchsize to 
 100000, all the 99999 actual rows are returned.  This suggests there is some 
 problem with fetchsize re-establishing the position on the next segment of 
 the result set, at least when multiple partitions are being accessed.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows

2014-03-20 Thread Bill Mitchell (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Mitchell updated CASSANDRA-6825:
-

Attachment: testdb_1395372407904.zip

 COUNT(*) with WHERE not finding all the matching rows
 -

 Key: CASSANDRA-6825
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6825
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad core Windows7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
Assignee: Tyler Hobbs
 Attachments: cassandra.log, selectpartitions.zip, 
 selectrowcounts.txt, testdb_1395372407904.zip


 Investigating another problem, I needed to do COUNT(*) on the several 
 partitions of a table immediately after a test case ran, and I discovered 
 that count(*) on the full table and on each of the partitions returned 
 different counts.  
 In this particular case, SELECT COUNT(*) FROM sr LIMIT 1000000; returned the 
 expected count from the test, 99999 rows.  The composite primary key splits 
 the logical row into six distinct partitions, and when I issue a query asking 
 for the total across all six partitions, the returned result is only 83999.  
 Drilling down, I find that SELECT * from sr WHERE s = 5 AND l = 11 AND 
 partition = 0; returns 30,000 rows, but a SELECT COUNT(*) with the identical 
 WHERE predicate reports only 14,000. 
 This is failing immediately after running a single small test, such that 
 there are only two SSTables, sr-jb-1 and sr-jb-2.  Compaction never needed to 
 run.  
 In selectrowcounts.txt is a copy of the cqlsh output showing the incorrect 
 count(*) results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows

2014-03-20 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942757#comment-13942757
 ] 

Bill Mitchell commented on CASSANDRA-6825:
--

I've attached a testdb_1395372407904.zip of the data/testdb_1395372407904 
directory after the test ran.  After the test completed, I did select * from sr 
and it returned 100000 rows:

cqlsh:testdb_1395372407904> select count(*) from sr limit 1000000;

 count
--------
 100000

(1 rows)

When I did a select count(*) for each of the six partitions, they total only 
90000:

cqlsh:testdb_1395372407904> select count(*) from sr where siteID = '4CA4F79E-3AB2-41C5-AE42-C7009736F1D5' and listID = 24 and partition = 0 LIMIT 1000000;

 count
-------
 20000

(1 rows)

cqlsh:testdb_1395372407904> select count(*) from sr where siteID = '4CA4F79E-3AB2-41C5-AE42-C7009736F1D5' and listID = 24 and partition = 1 LIMIT 1000000;

 count
-------
 20000

(1 rows)

cqlsh:testdb_1395372407904> select count(*) from sr where siteID = '4CA4F79E-3AB2-41C5-AE42-C7009736F1D5' and listID = 24 and partition = 2 LIMIT 1000000;

 count
-------
 10000

(1 rows)

cqlsh:testdb_1395372407904> select count(*) from sr where siteID = '4CA4F79E-3AB2-41C5-AE42-C7009736F1D5' and listID = 24 and partition = 3 LIMIT 1000000;

 count
-------
 10000

(1 rows)

cqlsh:testdb_1395372407904> select count(*) from sr where siteID = '4CA4F79E-3AB2-41C5-AE42-C7009736F1D5' and listID = 24 and partition = 4 LIMIT 1000000;

 count
-------
 10000

(1 rows)

cqlsh:testdb_1395372407904> select count(*) from sr where siteID = '4CA4F79E-3AB2-41C5-AE42-C7009736F1D5' and listID = 24 and partition = 5 LIMIT 1000000;

 count
-------
 20000

(1 rows)

As it turns out, the 10,000 rows not counted were all from partition=2, and have 
a createDate identical except in the milliseconds to 10,000 rows that do appear. 
 The common key values of the presumably uncounted rows (as they are the rows 
that did not return on the SELECT query, CASSANDRA-6826) are 
siteID=4CA4F79E-3AB2-41C5-AE42-C7009736F1D5,listID=24,partition=2,createDate=2014-03-20T22:27:26.457-0500.
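
For reference, the per-partition check above reduces to a short loop with the Java driver; a sketch using the column names from this schema, with the LIMIT mirroring the cqlsh queries:

{code}
// Sum COUNT(*) over the six partitions and compare with the whole-table count.
PreparedStatement perPartition = session.prepare(
    "SELECT COUNT(*) FROM sr WHERE siteID = ? AND listID = ?" +
    " AND partition = ? LIMIT 1000000");
long total = 0;
for (int p = 0; p < 6; p++) {
    total += session.execute(
        perPartition.bind("4CA4F79E-3AB2-41C5-AE42-C7009736F1D5", 24L, p))
        .one().getLong(0);
}
// 'total' should equal the full-table COUNT(*), but here it comes up short.
{code}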
 


 COUNT(*) with WHERE not finding all the matching rows
 -

 Key: CASSANDRA-6825
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6825
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad core Windows7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
Assignee: Tyler Hobbs
 Attachments: cassandra.log, selectpartitions.zip, 
 selectrowcounts.txt, testdb_1395372407904.zip


 Investigating another problem, I needed to do COUNT(*) on the several 
 partitions of a table immediately after a test case ran, and I discovered 
 that count(*) on the full table and on each of the partitions returned 
 different counts.  
 In this particular case, SELECT COUNT(*) FROM sr LIMIT 1000000; returned the 
 expected count from the test, 99999 rows.  The composite primary key splits 
 the logical row into six distinct partitions, and when I issue a query asking 
 for the total across all six partitions, the returned result is only 83999.  
 Drilling down, I find that SELECT * from sr WHERE s = 5 AND l = 11 AND 
 partition = 0; returns 30,000 rows, but a SELECT COUNT(*) with the identical 
 WHERE predicate reports only 14,000. 
 This is failing immediately after running a single small test, such that 
 there are only two SSTables, sr-jb-1 and sr-jb-2.  Compaction never needed to 
 run.  
 In selectrowcounts.txt is a copy of the cqlsh output showing the incorrect 
 count(*) results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-03-18 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940106#comment-13940106
 ] 

Bill Mitchell commented on CASSANDRA-6826:
--

Following Sylvain's suggestion that something about the nulls might be 
affecting the problem, I tried changing the schema.  On my dual-core laptop, 
where the final column is null but not set explicitly null on INSERT, the 
SELECT * is returning a total of 90000 rows where 100000 are expected.  
After changing the name of the column to begin with an a, so that the nullable 
column is no longer last, the SELECT * returns a total of 80000 rows, where 
100000 are expected.  If I try the same query from cqlsh, where there is no 
limit on fetchSize, all the expected rows are returned.  

So, at least in this one experiment, changing the schema by changing the order 
of the columns affected the behavior.  This could, of course, be merely 
coincidental, some timing issue.  
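
For concreteness, a sketch of the two schema variants being compared, as CQL issued through the driver; the renamed column ard is a hypothetical stand-in for "the column renamed to begin with an a":

{code}
// Variant A: the nullable column rd sorts last among the non-key columns.
session.execute(
    "CREATE TABLE sr_a (s bigint, l bigint, partition int, cd timestamp," +
    " ec text, ea text, properties text, rd timestamp," +
    " PRIMARY KEY ((s, l, partition), cd, ec))" +
    " WITH CLUSTERING ORDER BY (cd DESC, ec ASC)");

// Variant B: identical except the nullable column is renamed so it no longer
// sorts last alphabetically among the non-key columns.
session.execute(
    "CREATE TABLE sr_b (s bigint, l bigint, partition int, cd timestamp," +
    " ec text, ea text, properties text, ard timestamp," +
    " PRIMARY KEY ((s, l, partition), cd, ec))" +
    " WITH CLUSTERING ORDER BY (cd DESC, ec ASC)");
{code}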

 Query returns different number of results depending on fetchsize
 

 Key: CASSANDRA-6826
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6826
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad-core Windows 7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
Assignee: Sylvain Lebresne

 I issue a query across the set of partitioned wide rows for one logical row, 
 where s, l, and partition specify the composite primary key for the row:
 SELECT ec, ea, rd FROM sr WHERE s = ? and partition IN ? and l = ? ALLOW 
 FILTERING;
 If I set fetchSize to only 1000 when the Cluster is configured, the query 
 sometimes does not return all the results.  In the particular case I am 
 chasing, it returns a total of 98586 rows.  If I increase the fetchsize to 
 100000, all the 99999 actual rows are returned.  This suggests there is some 
 problem with fetchsize re-establishing the position on the next segment of 
 the result set, at least when multiple partitions are being accessed.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-03-18 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940108#comment-13940108
 ] 

Bill Mitchell commented on CASSANDRA-6826:
--

I tried a different experiment.  I used a different algorithm to compute the 
partition value when the rows are INSERTed.  In the failing case, I was 
inserting a block of 10,000 rows with an identical partition value (in 20 
batches of 500 each), then choosing another partition value for the next block 
of 10,000.  

I changed the partition calculation to randomly assign the partition value, so 
that rows were written across all the partition values in each block.  With 
this algorithm, no failure was observed, even though internally I grouped the 
inserts by partition value into distinct batches, to take advantage of 
CASSANDRA-6737.  Because of the random assignment of partition values, odds are 
the partition boundaries no longer align with the fetchSize.   
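
To make the two insertion patterns concrete, a sketch under the stated assumptions (100,000 rows, six partitions, 20 batches of 500 per block); nextRows(), writeBatch(), bufferRow(), and flushBatches() are hypothetical helpers:

{code}
// Failing pattern: 10,000 consecutive rows share one partition value,
// written as 20 batches of 500, before moving on to the next partition.
for (int block = 0; block < 10; block++) {
    int partition = block % 6;                 // one partition value per block
    for (int batch = 0; batch < 20; batch++) {
        writeBatch(partition, nextRows(500));  // hypothetical helpers
    }
}

// Passing pattern: each row gets a random partition value, still grouped by
// partition into distinct batches (per CASSANDRA-6737), so the partition
// boundaries no longer line up with the fetchSize.
Random rnd = new Random();
for (int i = 0; i < 100000; i++) {
    bufferRow(rnd.nextInt(6), nextRows(1));    // hypothetical helpers
}
flushBatches();
{code}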

 Query returns different number of results depending on fetchsize
 

 Key: CASSANDRA-6826
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6826
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad-core Windows 7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
Assignee: Sylvain Lebresne

 I issue a query across the set of partitioned wide rows for one logical row, 
 where s, l, and partition specify the composite primary key for the row:
 SELECT ec, ea, rd FROM sr WHERE s = ? and partition IN ? and l = ? ALLOW 
 FILTERING;
 If I set fetchSize to only 1000 when the Cluster is configured, the query 
 sometimes does not return all the results.  In the particular case I am 
 chasing, it returns a total of 98586 rows.  If I increase the fetchsize to 
 100000, all the 99999 actual rows are returned.  This suggests there is some 
 problem with fetchsize re-establishing the position on the next segment of 
 the result set, at least when multiple partitions are being accessed.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-03-12 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931677#comment-13931677
 ] 

Bill Mitchell commented on CASSANDRA-6826:
--

Sylvain, thanks for pointing that out.  The last time I looked for the new 
version, I could not find it.  I will download it right away.  No doubt this is 
the same problem as CASSANDRA-6748.  

 Query returns different number of results depending on fetchsize
 

 Key: CASSANDRA-6826
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6826
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad-core Windows 7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
Assignee: Sylvain Lebresne

 I issue a query across the set of partitioned wide rows for one logical row, 
 where s, l, and partition specify the composite primary key for the row:
 SELECT ec, ea, rd FROM sr WHERE s = ? and partition IN ? and l = ? ALLOW 
 FILTERING;
 If I set fetchSize to only 1000 when the Cluster is configured, the query 
 sometimes does not return all the results.  In the particular case I am 
 chasing, it returns a total of 98586 rows.  If I increase the fetchsize to 
 100000, all the 99999 actual rows are returned.  This suggests there is some 
 problem with fetchsize re-establishing the position on the next segment of 
 the result set, at least when multiple partitions are being accessed.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-03-12 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932398#comment-13932398
 ] 

Bill Mitchell commented on CASSANDRA-6826:
--

Much as I found Sylvain's suggestion plausible, no, it does not explain this 
problem.  After installing the Apache Cassandra 2.0.6 build, the first time I 
tried this, it still failed.  

Unfortunately, the problem is data or timing dependent.  After seeing the 
failure on 2.0.6, I changed the test case to write all the rows into one 
partition, and that worked, so I changed it back to distributing the rows over 
6 partitions, and this time that worked, too.  So we were lucky that the 
first time I tried this, the failure did appear.  

(I should have noticed that CASSANDRA-6748 appeared only when a column was 
explicitly set to null.  That was the behavior of my code about two weeks ago, 
before I discovered the issues around having a large number of tombstones in a 
wide row.)  

 Query returns different number of results depending on fetchsize
 

 Key: CASSANDRA-6826
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6826
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad-core Windows 7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
Assignee: Sylvain Lebresne

 I issue a query across the set of partitioned wide rows for one logical row, 
 where s, l, and partition specify the composite primary key for the row:
 SELECT ec, ea, rd FROM sr WHERE s = ? and partition IN ? and l = ? ALLOW 
 FILTERING;
 If I set fetchSize to only 1000 when the Cluster is configured, the query 
 sometimes does not return all the results.  In the particular case I am 
 chasing, it returns a total of 98586 rows.  If I increase the fetchsize to 
 100000, all the 99999 actual rows are returned.  This suggests there is some 
 problem with fetchsize re-establishing the position on the next segment of 
 the result set, at least when multiple partitions are being accessed.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows

2014-03-10 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926446#comment-13926446
 ] 

Bill Mitchell commented on CASSANDRA-6825:
--

After shortening the column names, the schema is: CREATE TABLE sr (s bigint, l 
bigint, partition int, cd timestamp, ec text, ea text, properties text, rd 
timestamp, PRIMARY KEY ((s, l, partition), cd, ec)) WITH CLUSTERING ORDER BY 
(cd DESC, ec ASC).

 COUNT(*) with WHERE not finding all the matching rows
 -

 Key: CASSANDRA-6825
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6825
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad core Windows7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
Assignee: Russ Hatch
 Attachments: cassandra.log, selectpartitions.zip, selectrowcounts.txt


 Investigating another problem, I needed to do COUNT(*) on the several 
 partitions of a table immediately after a test case ran, and I discovered 
 that count(*) on the full table and on each of the partitions returned 
 different counts.  
 In this particular case, SELECT COUNT(*) FROM sr LIMIT 1000000; returned the 
 expected count from the test, 99999 rows.  The composite primary key splits 
 the logical row into six distinct partitions, and when I issue a query asking 
 for the total across all six partitions, the returned result is only 83999.  
 Drilling down, I find that SELECT * from sr WHERE s = 5 AND l = 11 AND 
 partition = 0; returns 30,000 rows, but a SELECT COUNT(*) with the identical 
 WHERE predicate reports only 14,000. 
 This is failing immediately after running a single small test, such that 
 there are only two SSTables, sr-jb-1 and sr-jb-2.  Compaction never needed to 
 run.  
 In selectrowcounts.txt is a copy of the cqlsh output showing the incorrect 
 count(*) results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6736) Windows7 AccessDeniedException on commit log

2014-03-10 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13929908#comment-13929908
 ] 

Bill Mitchell commented on CASSANDRA-6736:
--

Josh, by accident I logged into my development machine from home this evening, 
found it had stopped with the COMMIT_LOG_ALLOCATOR exception, and 
coincidentally noticed that the Kaspersky virus scan was still running.  This 
suggests that this may never have been a C* issue, but rather interference from 
the antivirus software -- a hypothesis consistent with this issue appearing 
only on my development machine at work and never on my laptop.  So, unless someone 
else reports a similar symptom, I suggest we close this out.  

 Windows7 AccessDeniedException on commit log 
 -

 Key: CASSANDRA-6736
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6736
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Windows 7, quad core, 8GB RAM, single Cassandra node, 
 Cassandra 2.0.5 with leakdetect patch from CASSANDRA-6283
Reporter: Bill Mitchell
Assignee: Joshua McKenzie
 Attachments: 2014-02-18-22-16.log


 Similar to the data file deletion of CASSANDRA-6283, under heavy load with 
 logged batches, I am seeing a problem where the Commit log cannot be deleted:
  ERROR [COMMIT-LOG-ALLOCATOR] 2014-02-18 22:15:58,252 CassandraDaemon.java 
 (line 192) Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
  FSWriteError in C:\Program Files\DataStax 
 Community\data\commitlog\CommitLog-3-1392761510706.log
   at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:120)
   at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:150)
   at 
 org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:217)
   at 
 org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
   at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
   at java.lang.Thread.run(Unknown Source)
 Caused by: java.nio.file.AccessDeniedException: C:\Program Files\DataStax 
 Community\data\commitlog\CommitLog-3-1392761510706.log
   at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
   at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
   at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
   at sun.nio.fs.WindowsFileSystemProvider.implDelete(Unknown Source)
   at sun.nio.fs.AbstractFileSystemProvider.delete(Unknown Source)
   at java.nio.file.Files.delete(Unknown Source)
   at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:116)
   ... 5 more
 (Attached in 2014-02-18-22-16.log is a larger excerpt from the cassandra.log.)
 In this particular case, I was trying to do 100 million inserts into two 
 tables in parallel, one with a single wide row and one with narrow rows, and 
 the error appeared after inserting 43,151,232 rows.  So it does take a while 
 to trip over this timing issue.  
 It may be aggravated by the size of the batches. This test was writing 10,000 
 rows to each table in a batch.  
 When I try switching the same test from using a logged batch to an unlogged 
 batch, no such failure appears. So the issue could be related to the use 
 of large, logged batches, or it could be that unlogged batches just change 
 the probability of failure.  
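
As background, only the batch type is being toggled here; a minimal sketch with illustrative tables and values:

{code}
// Logged (atomic) batch: goes through the batch log.
session.execute(
    "BEGIN BATCH " +
    "INSERT INTO wide (k, c, v) VALUES (1, 1, 'a'); " +
    "INSERT INTO narrow (k, v) VALUES (1, 'a'); " +
    "APPLY BATCH");

// Unlogged batch: the same statements, but the batch log is skipped.
session.execute(
    "BEGIN UNLOGGED BATCH " +
    "INSERT INTO wide (k, c, v) VALUES (2, 2, 'b'); " +
    "INSERT INTO narrow (k, v) VALUES (2, 'b'); " +
    "APPLY BATCH");
{code}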



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows

2014-03-08 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13924860#comment-13924860
 ] 

Bill Mitchell commented on CASSANDRA-6825:
--

If it helps in reproducing it, unlike my earlier report in CASSANDRA-6736, this 
failure and that of CASSANDRA-6826 appear in a small volume test, less than 
100,000 rows total.  This lower number was being run in a JUnit test as part of 
a maven build of a complete product, such that the test keyspace and tables 
were created, but the row insertion did not begin until 9 minutes later.  So 
Cassandra is not noting these as high-volume activity, and the row width is not 
large enough to provoke incremental compaction, or in fact any compaction 
whatsoever.  

 COUNT(*) with WHERE not finding all the matching rows
 -

 Key: CASSANDRA-6825
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6825
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad core Windows7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
Assignee: Russ Hatch
 Attachments: cassandra.log, selectpartitions.zip, selectrowcounts.txt


 Investigating another problem, I needed to do COUNT(*) on the several 
 partitions of a table immediately after a test case ran, and I discovered 
 that count(*) on the full table and on each of the partitions returned 
 different counts.  
 In this particular case, SELECT COUNT(*) FROM sr LIMIT 1000000; returned the 
 expected count from the test, 99999 rows.  The composite primary key splits 
 the logical row into six distinct partitions, and when I issue a query asking 
 for the total across all six partitions, the returned result is only 83999.  
 Drilling down, I find that SELECT * from sr WHERE s = 5 AND l = 11 AND 
 partition = 0; returns 30,000 rows, but a SELECT COUNT(*) with the identical 
 WHERE predicate reports only 14,000. 
 This is failing immediately after running a single small test, such that 
 there are only two SSTables, sr-jb-1 and sr-jb-2.  Compaction never needed to 
 run.  
 In selectrowcounts.txt is a copy of the cqlsh output showing the incorrect 
 count(*) results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows

2014-03-07 Thread Bill Mitchell (JIRA)
Bill Mitchell created CASSANDRA-6825:


 Summary: COUNT(*) with WHERE not finding all the matching rows
 Key: CASSANDRA-6825
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6825
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad core Windows7 x64
Cassandra 2.0.5
Reporter: Bill Mitchell
 Attachments: selectrowcounts.txt

Investigating another problem, I needed to do COUNT(*) on the several 
partitions of a table immediately after a test case ran, and I discovered that 
count(*) on the full table and on each of the partitions returned different 
counts.  

In this particular case, SELECT COUNT(*) FROM sr LIMIT 1000000; returned the 
expected count from the test, 99999 rows.  The composite primary key splits the 
logical row into six distinct partitions, and when I issue a query asking for 
the total across all six partitions, the returned result is only 83999.  
Drilling down, I find that SELECT * from sr WHERE s = 5 AND l = 11 AND 
partition = 0; returns 30,000 rows, but a SELECT COUNT(*) with the identical 
WHERE predicate reports only 14,000. 

This is failing immediately after running a single small test, such that there 
are only two SSTables, sr-jb-1 and sr-jb-2.  Compaction never needed to run.  

In selectrowcounts.txt is a copy of the cqlsh output showing the incorrect 
count(*) results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows

2014-03-07 Thread Bill Mitchell (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Mitchell updated CASSANDRA-6825:
-

Attachment: cassandra.log

I've also attached the cassandra.log from the period during which the test was 
running.  

 COUNT(*) with WHERE not finding all the matching rows
 -

 Key: CASSANDRA-6825
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6825
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad core Windows7 x64
 Cassandra 2.0.5
Reporter: Bill Mitchell
 Attachments: cassandra.log, selectrowcounts.txt


 Investigating another problem, I needed to do COUNT(*) on the several 
 partitions of a table immediately after a test case ran, and I discovered 
 that count(*) on the full table and on each of the partitions returned 
 different counts.  
 In this particular case, SELECT COUNT(*) FROM sr LIMIT 1000000; returned the 
 expected count from the test, 99999 rows.  The composite primary key splits 
 the logical row into six distinct partitions, and when I issue a query asking 
 for the total across all six partitions, the returned result is only 83999.  
 Drilling down, I find that SELECT * from sr WHERE s = 5 AND l = 11 AND 
 partition = 0; returns 30,000 rows, but a SELECT COUNT(*) with the identical 
 WHERE predicate reports only 14,000. 
 This is failing immediately after running a single small test, such that 
 there are only two SSTables, sr-jb-1 and sr-jb-2.  Compaction never needed to 
 run.  
 In selectrowcounts.txt is a copy of the cqlsh output showing the incorrect 
 count(*) results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows

2014-03-07 Thread Bill Mitchell (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Mitchell updated CASSANDRA-6825:
-

Attachment: selectpartitions.zip

In selectpartitions.txt, one can see the full selects for each of the separate 
partition values.

 COUNT(*) with WHERE not finding all the matching rows
 -

 Key: CASSANDRA-6825
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6825
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad core Windows7 x64
 Cassandra 2.0.5
Reporter: Bill Mitchell
 Attachments: cassandra.log, selectpartitions.zip, selectrowcounts.txt


 Investigating another problem, I needed to do COUNT(*) on the several 
 partitions of a table immediately after a test case ran, and I discovered 
 that count(*) on the full table and on each of the partitions returned 
 different counts.  
 In this particular case, SELECT COUNT(*) FROM sr LIMIT 1000000; returned the 
 expected count from the test, 99999 rows.  The composite primary key splits 
 the logical row into six distinct partitions, and when I issue a query asking 
 for the total across all six partitions, the returned result is only 83999.  
 Drilling down, I find that SELECT * from sr WHERE s = 5 AND l = 11 AND 
 partition = 0; returns 30,000 rows, but a SELECT COUNT(*) with the identical 
 WHERE predicate reports only 14,000. 
 This is failing immediately after running a single small test, such that 
 there are only two SSTables, sr-jb-1 and sr-jb-2.  Compaction never needed to 
 run.  
 In selectrowcounts.txt is a copy of the cqlsh output showing the incorrect 
 count(*) results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows

2014-03-07 Thread Bill Mitchell (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Mitchell updated CASSANDRA-6825:
-

Environment: 
quad core Windows7 x64, single node cluster
Cassandra 2.0.5

  was:
quad core Windows7 x64
Cassandra 2.0.5


 COUNT(*) with WHERE not finding all the matching rows
 -

 Key: CASSANDRA-6825
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6825
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad core Windows7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
 Attachments: cassandra.log, selectpartitions.zip, selectrowcounts.txt


 Investigating another problem, I needed to do COUNT(*) on the several 
 partitions of a table immediately after a test case ran, and I discovered 
 that count(*) on the full table and on each of the partitions returned 
 different counts.  
 In this particular case, SELECT COUNT(*) FROM sr LIMIT 1000000; returned the 
 expected count from the test, 99999 rows.  The composite primary key splits 
 the logical row into six distinct partitions, and when I issue a query asking 
 for the total across all six partitions, the returned result is only 83999.  
 Drilling down, I find that SELECT * from sr WHERE s = 5 AND l = 11 AND 
 partition = 0; returns 30,000 rows, but a SELECT COUNT(*) with the identical 
 WHERE predicate reports only 14,000. 
 This is failing immediately after running a single small test, such that 
 there are only two SSTables, sr-jb-1 and sr-jb-2.  Compaction never needed to 
 run.  
 In selectrowcounts.txt is a copy of the cqlsh output showing the incorrect 
 count(*) results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows

2014-03-07 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13924480#comment-13924480
 ] 

Bill Mitchell edited comment on CASSANDRA-6825 at 3/7/14 11:26 PM:
---

Yes.  I've added that to the environment description.  

The partitioning is in the schema and in the code for when we move this to a 
non-trivial cluster.


was (Author: wtmitchell3):
Yes.  I've added that to the environment description.  

 COUNT(*) with WHERE not finding all the matching rows
 -

 Key: CASSANDRA-6825
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6825
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad core Windows7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
 Attachments: cassandra.log, selectpartitions.zip, selectrowcounts.txt


 Investigating another problem, I needed to do COUNT(*) on the several 
 partitions of a table immediately after a test case ran, and I discovered 
 that count(*) on the full table and on each of the partitions returned 
 different counts.  
 In this particular case, SELECT COUNT(*) FROM sr LIMIT 1000000; returned the 
 expected count from the test, 99999 rows.  The composite primary key splits 
 the logical row into six distinct partitions, and when I issue a query asking 
 for the total across all six partitions, the returned result is only 83999.  
 Drilling down, I find that SELECT * from sr WHERE s = 5 AND l = 11 AND 
 partition = 0; returns 30,000 rows, but a SELECT COUNT(*) with the identical 
 WHERE predicate reports only 14,000. 
 This is failing immediately after running a single small test, such that 
 there are only two SSTables, sr-jb-1 and sr-jb-2.  Compaction never needed to 
 run.  
 In selectrowcounts.txt is a copy of the cqlsh output showing the incorrect 
 count(*) results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows

2014-03-07 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13924480#comment-13924480
 ] 

Bill Mitchell commented on CASSANDRA-6825:
--

Yes.  I've added that to the environment description.  

 COUNT(*) with WHERE not finding all the matching rows
 -

 Key: CASSANDRA-6825
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6825
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad core Windows7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
 Attachments: cassandra.log, selectpartitions.zip, selectrowcounts.txt


 Investigating another problem, I needed to do COUNT(*) on the several 
 partitions of a table immediately after a test case ran, and I discovered 
 that count(*) on the full table and on each of the partitions returned 
 different counts.  
 In this particular case, SELECT COUNT(*) FROM sr LIMIT 1000000; returned the 
 expected count from the test, 99999 rows.  The composite primary key splits 
 the logical row into six distinct partitions, and when I issue a query asking 
 for the total across all six partitions, the returned result is only 83999.  
 Drilling down, I find that SELECT * from sr WHERE s = 5 AND l = 11 AND 
 partition = 0; returns 30,000 rows, but a SELECT COUNT(*) with the identical 
 WHERE predicate reports only 14,000. 
 This is failing immediately after running a single small test, such that 
 there are only two SSTables, sr-jb-1 and sr-jb-2.  Compaction never needed to 
 run.  
 In selectrowcounts.txt is a copy of the cqlsh output showing the incorrect 
 count(*) results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-03-07 Thread Bill Mitchell (JIRA)
Bill Mitchell created CASSANDRA-6826:


 Summary: Query returns different number of results depending on 
fetchsize
 Key: CASSANDRA-6826
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6826
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad-core Windows 7 x64, single node cluster
Cassandra 2.0.5
Reporter: Bill Mitchell


I issue a query across the set of partitioned wide rows for one logical row, 
where s, l, and partition specify the composite primary key for the row:
SELECT ec, ea, rd FROM sr WHERE s = ? and partition IN ? and l = ? ALLOW 
FILTERING;

If I set fetchSize to only 1000 when the Cluster is configured, the query 
sometimes does not return all the results.  In the particular case I am 
chasing, it returns a total of 98586 rows.  If I increase the fetchsize to 
100000, all the 99999 actual rows are returned.  This suggests there is some 
problem with fetchsize re-establishing the position on the next segment of the 
result set, at least when multiple partitions are being accessed.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-03-07 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13924702#comment-13924702
 ] 

Bill Mitchell commented on CASSANDRA-6826:
--

It is conceivable that this problem and CASSANDRA-6825 are related, in that 
they were uncovered together.  I came across the behavior described in 
CASSANDRA-6825 while trying to analyze the test failure caused by this problem.  

 Query returns different number of results depending on fetchsize
 

 Key: CASSANDRA-6826
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6826
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad-core Windows 7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
Assignee: Sylvain Lebresne

 I issue a query across the set of partitioned wide rows for one logical row, 
 where s, l, and partition specify the composite primary key for the row:
 SELECT ec, ea, rd FROM sr WHERE s = ? and partition IN ? and l = ? ALLOW 
 FILTERING;
 If I set fetchSize to only 1000 when the Cluster is configured, the query 
 sometimes does not return all the results.  In the particular case I am 
 chasing, it returns a total of 98586 rows.  If I increase the fetchsize to 
 100000, all the 99999 actual rows are returned.  This suggests there is some 
 problem with fetchsize re-establishing the position on the next segment of 
 the result set, at least when multiple partitions are being accessed.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6736) Windows7 AccessDeniedException on commit log

2014-03-04 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919566#comment-13919566
 ] 

Bill Mitchell commented on CASSANDRA-6736:
--

That's a bit awkward, as the JUnit test cases live within the framework of a 
complete application, and stripping it down to something more minimal that 
does not carry all of our application framework will be a little work.  

As I've already built a 2.0.5 version with the leak detect patch, if you want 
me to try a version with a larger test or diagnostic patch, that would be 
straightforward.  

I'll drop you an email to explore other options.

 Windows7 AccessDeniedException on commit log 
 -

 Key: CASSANDRA-6736
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6736
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Windows 7, quad core, 8GB RAM, single Cassandra node, 
 Cassandra 2.0.5 with leakdetect patch from CASSANDRA-6283
Reporter: Bill Mitchell
Assignee: Joshua McKenzie
 Attachments: 2014-02-18-22-16.log


 Similar to the data file deletion of CASSANDRA-6283, under heavy load with 
 logged batches, I am seeing a problem where the Commit log cannot be deleted:
  ERROR [COMMIT-LOG-ALLOCATOR] 2014-02-18 22:15:58,252 CassandraDaemon.java 
 (line 192) Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
  FSWriteError in C:\Program Files\DataStax 
 Community\data\commitlog\CommitLog-3-1392761510706.log
   at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:120)
   at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:150)
   at 
 org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:217)
   at 
 org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
   at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
   at java.lang.Thread.run(Unknown Source)
 Caused by: java.nio.file.AccessDeniedException: C:\Program Files\DataStax 
 Community\data\commitlog\CommitLog-3-1392761510706.log
   at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
   at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
   at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
   at sun.nio.fs.WindowsFileSystemProvider.implDelete(Unknown Source)
   at sun.nio.fs.AbstractFileSystemProvider.delete(Unknown Source)
   at java.nio.file.Files.delete(Unknown Source)
   at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:116)
   ... 5 more
 (Attached in 2014-02-18-22-16.log is a larger excerpt from the cassandra.log.)
 In this particular case, I was trying to do 100 million inserts into two 
 tables in parallel, one with a single wide row and one with narrow rows, and 
 the error appeared after inserting 43,151,232 rows.  So it does take a while 
 to trip over this timing issue.  
 It may be aggravated by the size of the batches. This test was writing 10,000 
 rows to each table in a batch.  
 When I try switching the same test from using a logged batch to an unlogged 
 batch, no such failure appears. So the issue could be related to the use 
 of large, logged batches, or it could be that unlogged batches just change 
 the probability of failure.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6736) Windows7 AccessDeniedException on commit log

2014-03-01 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917188#comment-13917188
 ] 

Bill Mitchell commented on CASSANDRA-6736:
--

Off and on, I've been following the commentary in CASSANDRA-6283, but I opened 
this report as my issue seems to be a different situation and path through the 
code.  In particular:
1.  I'm not seeing any reports from the leak detection patch, except in the log 
cited in CASSANDRA-6721.
2.  As I'm working in a test environment, I'm not dealing with any node repair 
issues.  With only a single node I reboot it when it hangs, which clears the 
locks.  
3.  As these are only test cases, and I am frequently changing the schema, I 
am deleting and recreating the keyspaces; so I would not notice any lingering 
data files.  And I gave up after CASSANDRA-6721, and changed my test 
environment to use a new unique keyspace name, except when I really want to 
exercise multiple runs against the same database.  
4.  As these are only tests, I disabled snapshots in the cassandra.yaml, so I'm 
not seeing those locks.
5.  Similarly, after CASSANDRA-6721, I simply disabled saved key caching, so 
I'm not seeing any issues around those files.  
All of which may mean I've avoided the issues mentioned in CASSANDRA-6283 and 
am hitting a different set of issues.  

I can confirm that the issue here is not exclusive to the use of logged 
batches.  On Thursday I ran into the same COMMIT_LOG_ALLOCATOR failure using 
smaller, overlapped unlogged batches to the two tables (breaking the larger 
batch down into smaller segments, and overlapping the one segment against one 
table, while the next segment is applied to the other table).  So the use of 
large, logged batches just makes the problem likely enough for me to see it 
fairly consistently. 

I have seen the file deletion failure in a couple of other situations where it 
seemed a secondary result from an earlier failure.  If it helps, I will 
describe these below.   





--
This message was sent by Atlassian JIRA

[jira] [Commented] (CASSANDRA-6736) Windows7 AccessDeniedException on commit log

2014-03-01 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917195#comment-13917195
 ] 

Bill Mitchell commented on CASSANDRA-6736:
--

The first time I came across the deleteWithConfirm failure on one of the Index 
files, the previous run had been trying the SnappyCompressor on the table with 
lots of small rows, and that run died with a Java heap space error during 
garbage collection:

  INFO [ScheduledTasks:1] 2014-02-16 01:31:12,160 GCInspector.java (line 116) 
GC for ConcurrentMarkSweep: 1458 ms for 3 collections, 124495112 used; max is 
2130051072
 ERROR [ReadStage:7] 2014-02-16 01:31:13,820 CassandraDaemon.java (line 192) 
Exception in thread Thread[ReadStage:7,5,main]
 java.lang.OutOfMemoryError: Java heap space
at 
org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:353)
at 
org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
at 
org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355)
at 
org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:124)
at 
org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:85)
at org.apache.cassandra.db.Column$1.computeNext(Column.java:75)
at org.apache.cassandra.db.Column$1.computeNext(Column.java:64)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at 
com.google.common.collect.AbstractIterator.next(AbstractIterator.java:153)
at 
org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:379)
at 
org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.fetchMoreData(IndexedSliceReader.java:332)
at 
org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:145)
at 
org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:45)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at 
org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82)
at 
org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:157)
at 
org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:140)
at 
org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:144)
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:123)
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:97)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at 
org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:185)
at 
org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
at 
org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
at 
org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
at 
org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
at 
org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
at 
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1560)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1379)
 INFO [StorageServiceShutdownHook] 2014-02-16 01:31:13,953 ThriftServer.java 
(line 141) Stop listening to thrift clients
  INFO [StorageServiceShutdownHook] 2014-02-16 01:31:14,037 Server.java (line 
181) Stop listening for CQL clients

Soon after restart, 1.5 million rows into the test, I ran into the 
deleteWithConfirm failure:

  INFO [FlushWriter:5] 2014-02-16 08:53:57,084 Memtable.java (line 380) 
Completed flushing; nothing needed to be retained.  Commitlog position was 
ReplayPosition(segmentId=1392560341059, position=6159762)
 ERROR [NonPeriodicTasks:1] 2014-02-16 08:53:57,969 CassandraDaemon.java (line 
192) Exception in thread Thread[NonPeriodicTasks:1,5,main]
 FSWriteError in C:\Program Files\DataStax 
Community\data\data\testdb\sr\testdb-sr-jb-117-Index.db
at 
org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:120)
at 
org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:106)


[jira] [Commented] (CASSANDRA-6736) Windows7 AccessDeniedException on commit log

2014-03-01 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917201#comment-13917201
 ] 

Bill Mitchell commented on CASSANDRA-6736:
--

Trying to be polite, I started using Drain to shut down Cassandra before 
rebooting the machine.  In one case, this provoked numerous "ThreadPoolExecutor 
has shutdown" messages underneath the compactor:
  INFO [RMI TCP Connection(2095)-127.0.0.1] 2014-02-24 08:34:23,743 
StorageService.java (line 947) DRAINING: starting drain process
  INFO [RMI TCP Connection(2095)-127.0.0.1] 2014-02-24 08:34:23,783 
ThriftServer.java (line 141) Stop listening to thrift clients
  INFO [RMI TCP Connection(2095)-127.0.0.1] 2014-02-24 08:34:24,980 Server.java 
(line 181) Stop listening for CQL clients
  INFO [RMI TCP Connection(2095)-127.0.0.1] 2014-02-24 08:34:24,980 
Gossiper.java (line 1251) Announcing shutdown
  INFO [RMI TCP Connection(2095)-127.0.0.1] 2014-02-24 08:34:27,001 
MessagingService.java (line 665) Waiting for messaging service to quiesce
  INFO [RMI TCP Connection(2095)-127.0.0.1] 2014-02-24 08:34:27,040 
ColumnFamilyStore.java (line 784) Enqueuing flush of 
Memtable-sr@1217138300(1825983/4411193 serialized/live bytes, 29946 ops)
  INFO [RMI TCP Connection(2095)-127.0.0.1] 2014-02-24 08:34:27,040 
ColumnFamilyStore.java (line 784) Enqueuing flush of 
Memtable-etol@703118381(2963818/46129889 serialized/live bytes, 68926 ops)
  INFO [FlushWriter:272] 2014-02-24 08:34:27,040 Memtable.java (line 333) 
Writing Memtable-sr@1217138300(1825983/4411193 serialized/live bytes, 29946 ops)
  INFO [RMI TCP Connection(2095)-127.0.0.1] 2014-02-24 08:34:27,054 
ColumnFamilyStore.java (line 784) Enqueuing flush of 
Memtable-events@899982591(188/1880 serialized/live bytes, 7 ops)
  INFO [RMI TCP Connection(2095)-127.0.0.1] 2014-02-24 08:34:27,075 
ColumnFamilyStore.java (line 784) Enqueuing flush of 
Memtable-events_timeline@1379706298(16/160 serialized/live bytes, 1 ops)
  INFO [FlushWriter:273] 2014-02-24 08:34:27,075 Memtable.java (line 333) 
Writing Memtable-etol@703118381(2963818/46129889 serialized/live bytes, 68926 
ops)
  INFO [ACCEPT-localhost/127.0.0.1] 2014-02-24 08:34:27,144 
MessagingService.java (line 875) MessagingService has terminated the accept() 
thread
  INFO [FlushWriter:272] 2014-02-24 08:34:27,411 Memtable.java (line 373) 
Completed flushing C:\Program Files\DataStax 
Community\data\data\testdb_1393207231382\sr\testdb_1393207231382-sr-jb-473-Data.db
 (428854 bytes) for commitlog position ReplayPosition(segmentId=1393178353775, 
position=18771262)
  INFO [FlushWriter:272] 2014-02-24 08:34:27,411 Memtable.java (line 333) 
Writing Memtable-events@899982591(188/1880 serialized/live bytes, 7 ops)
  INFO [FlushWriter:273] 2014-02-24 08:34:27,932 Memtable.java (line 373) 
Completed flushing C:\Program Files\DataStax 
Community\data\data\testdb_1393207231382\etol\testdb_1393207231382-etol-jb-1563-Data.db
 (1012805 bytes) for commitlog position ReplayPosition(segmentId=1393178353775, 
position=18771262)
  INFO [FlushWriter:273] 2014-02-24 08:34:27,933 Memtable.java (line 333) 
Writing Memtable-events_timeline@1379706298(16/160 serialized/live bytes, 1 ops)
  INFO [FlushWriter:272] 2014-02-24 08:34:28,366 Memtable.java (line 373) 
Completed flushing C:\Program Files\DataStax 
Community\data\data\OpsCenter\events\OpsCenter-events-jb-32-Data.db (184 bytes) 
for commitlog position ReplayPosition(segmentId=1393178353775, 
position=18771262)
  INFO [FlushWriter:273] 2014-02-24 08:34:28,456 Memtable.java (line 373) 
Completed flushing C:\Program Files\DataStax 
Community\data\data\OpsCenter\events_timeline\OpsCenter-events_timeline-jb-39-Data.db
 (47 bytes) for commitlog position ReplayPosition(segmentId=1393178353775, 
position=18771262)
  INFO [RMI TCP Connection(2095)-127.0.0.1] 2014-02-24 08:34:28,457 
ColumnFamilyStore.java (line 784) Enqueuing flush of 
Memtable-compaction_history@814197203(1725/19675 serialized/live bytes, 45 ops)
  INFO [FlushWriter:272] 2014-02-24 08:34:28,458 Memtable.java (line 333) 
Writing Memtable-compaction_history@814197203(1725/19675 serialized/live bytes, 
45 ops)
  INFO [RMI TCP Connection(2095)-127.0.0.1] 2014-02-24 08:34:28,458 
ColumnFamilyStore.java (line 784) Enqueuing flush of 
Memtable-sstable_activity@446592137(13500/207410 serialized/live bytes, 1442 
ops)
  INFO [FlushWriter:273] 2014-02-24 08:34:28,458 Memtable.java (line 333) 
Writing Memtable-sstable_activity@446592137(13500/207410 serialized/live bytes, 
1442 ops)
  INFO [FlushWriter:273] 2014-02-24 08:34:28,732 Memtable.java (line 373) 
Completed flushing C:\Program Files\DataStax 
Community\data\data\system\sstable_activity\system-sstable_activity-jb-428-Data.db
 (4072 bytes) for commitlog position ReplayPosition(segmentId=1393178353775, 
position=18771471)
  INFO [FlushWriter:272] 2014-02-24 08:34:28,761 Memtable.java (line 373) 
Completed flushing C:\Program 

[jira] [Commented] (CASSANDRA-6736) Windows7 AccessDeniedException on commit log

2014-02-22 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13909439#comment-13909439
 ] 

Bill Mitchell commented on CASSANDRA-6736:
--

{quote}Is this reproducible?{quote}

In a statistical sense, yes, I have encountered this failure repeatedly when 
trying to insert 100 million rows into two tables, one with a single wide row, 
and the other with narrow rows.  The failure does not happen at exactly the 
same point, which suggests it is a timing problem.  The test does use randomly 
generated data, so one cannot completely rule out a data dependency.  There are 
also reads intermixed with the inserts, as for each segment of 10,000 rows, it 
checks for duplicate values and removes them before inserting the new, unique 
values.   

Test history of the 100 million insert test:
2014-02-04 passed
2014-02-07 passed
2014-02-10 failed after 16,707,724 rows (changing the second table to have a 
few wide rows)
2014-02-10 failed after 6,215,471 rows
2014-02-12 failed after 21,038,110 rows (after installing 2.0.5)
2014-02-12 failed after 63,397,406 rows
2014-02-14 failed after 33,974,034 rows
2014-02-16 passed, using unlogged batch instead of logged batch
2014-02-18 failed after 43,151,232 rows, using logged batch
2014-02-20 failed after 54,263,560 rows
2014-02-21 passed, logged batch but with only 1,000 rows inserted in each table 
instead of 10,000

The failures were observed on 2.0.3, 2.0.4, and 2.0.5, so they are not 
restricted to the most recent build.  I don't have a hypothesis for why the 
test passed on 02-04 and 02-07.  

I tried the reduced batch size of 1000 pairs of rows under the hypothesis that 
the failure has something to do with the large batch size causing a large 
commit log that needs to be compacted, and cannot just be marked as complete.  
Of course, one success does not necessarily mean that the problem cannot happen 
with the smaller batch size; it may just change the timing such that one is 
less likely to hit the failure.  
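
As a rough sketch of that experiment (the helper and its types are mine, not the 
actual test code), the same bound statements can be issued as a series of smaller 
logged batches instead of one 10,000-row batch:
{code}
// Hedged sketch: replay prepared bound statements as logged batches of 'chunk'
// rows each (e.g. 1,000) rather than one large batch, per the hypothesis above.
import java.util.List;
import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Session;

public class ChunkedBatches {
    static void insertInChunks(Session session, List<BoundStatement> rows, int chunk) {
        for (int i = 0; i < rows.size(); i += chunk) {
            BatchStatement batch = new BatchStatement(BatchStatement.Type.LOGGED);
            for (BoundStatement bs : rows.subList(i, Math.min(i + chunk, rows.size())))
                batch.add(bs);
            session.execute(batch);   // each smaller batch is written and acknowledged on its own
        }
    }
}
{code}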




--
This message was sent by Atlassian JIRA

[jira] [Commented] (CASSANDRA-6721) READ-STAGE: IllegalArgumentException when re-reading wide row immediately upon creation

2014-02-19 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905481#comment-13905481
 ] 

Bill Mitchell commented on CASSANDRA-6721:
--

Thank you for the suggestion on the saved_caches.  That was helpful, as 
there were entries there for testdb, even after the database was dropped.  They 
were timestamped at 1419 yesterday with a repeat of the failure seen in the 
attached 2014-02-18-13-45 log.   Removing these and restarting the server gave 
it a chance to forget about their data so that it was not applied to a new 
instance of the keyspace with the same name.  

As these two logs also displayed the LEAK finalizer message from the 
leakdetect.patch, and the earlier failures did not, it is still possible that 
they represent a different, earlier failure.  I will need to make runs over 
several days to see if this problem reappears, checking meanwhile to see if 
there are entries in saved_caches before the test begins. 

The first time I tried this on a larger test, it ran into a similar 
cross-keyspace contamination that I figured out a few days ago, where 
transactions left in the commit log from an earlier run are replayed after the 
new keyspace is created.  


[jira] [Created] (CASSANDRA-6736) Windows7 AccessDeniedException on commit log

2014-02-19 Thread Bill Mitchell (JIRA)
Bill Mitchell created CASSANDRA-6736:


 Summary: Windows7 AccessDeniedException on commit log 
 Key: CASSANDRA-6736
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6736
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Windows 7, quad core, 8GB RAM, single Cassandra node, 
Cassandra 2.0.5 with leakdetect patch from CASSANDRA-6283
Reporter: Bill Mitchell
 Attachments: 2014-02-18-22-16.log

Similar to the data file deletion of CASSANDRA-6283, under heavy load with 
logged batches, I am seeing a problem where the Commit log cannot be deleted:
 ERROR [COMMIT-LOG-ALLOCATOR] 2014-02-18 22:15:58,252 CassandraDaemon.java 
(line 192) Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
 FSWriteError in C:\Program Files\DataStax 
Community\data\commitlog\CommitLog-3-1392761510706.log
at 
org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:120)
at 
org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:150)
at 
org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:217)
at 
org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.lang.Thread.run(Unknown Source)
Caused by: java.nio.file.AccessDeniedException: C:\Program Files\DataStax 
Community\data\commitlog\CommitLog-3-1392761510706.log
at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
at sun.nio.fs.WindowsFileSystemProvider.implDelete(Unknown Source)
at sun.nio.fs.AbstractFileSystemProvider.delete(Unknown Source)
at java.nio.file.Files.delete(Unknown Source)
at 
org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:116)
... 5 more
(Attached in 2014-02-18-22-16.log is a larger excerpt from the cassandra.log.)

In this particular case, I was trying to do 100 million inserts into two tables 
in parallel, one with a single wide row and one with narrow rows, and the error 
appeared after inserting 43,151,232 rows.  So it does take a while to trip over 
this timing issue.  

It may be aggravated by the size of the batches. This test was writing 10,000 
rows to each table in a batch.  

When I switch the same test from using a logged batch to an unlogged batch, no 
such failure appears.  So the issue could be related to the use of large, 
logged batches, or it could be that unlogged batches just change the 
probability of failure.  
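
As background on the AccessDeniedException itself, here is a small self-contained 
demo (my own construction, not taken from this report) of the Windows behavior in 
play: while a memory-mapped buffer over a file is still reachable, Windows keeps 
the file pinned and java.nio deletes typically fail:
{code}
// Hedged demo: on Windows, deleting a file that is still memory-mapped typically
// fails with java.nio.file.AccessDeniedException until the mapping is released.
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class MappedDeleteDemo {
    public static void main(String[] args) throws Exception {
        Path log = Files.createTempFile("CommitLog-demo", ".log");
        try (RandomAccessFile raf = new RandomAccessFile(log.toFile(), "rw")) {
            raf.setLength(1024);
            MappedByteBuffer map = raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, 1024);
            map.put(0, (byte) 1);   // the live mapping now pins the file on Windows
            Files.delete(log);      // Windows: AccessDeniedException; POSIX: succeeds
        }
    }
}
{code}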






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (CASSANDRA-6721) READ-STAGE: IllegalArgumentException when re-reading wide row immediately upon creation

2014-02-18 Thread Bill Mitchell (JIRA)
Bill Mitchell created CASSANDRA-6721:


 Summary: READ-STAGE: IllegalArgumentException when re-reading wide 
row immediately upon creation  
 Key: CASSANDRA-6721
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6721
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Windows 7 x64, 8GB memory, single Cassandra node, Java 
1.7.0_45
Reporter: Bill Mitchell


In my test case, I am writing a wide row to one table, ordering the columns in 
reverse chronological order, newest to oldest, by insertion time.  A simplified 
version of the schema:
CREATE TABLE IF NOT EXISTS sr (s BIGINT, p INT, l BIGINT, ec TEXT, createDate 
TIMESTAMP, k BIGINT, properties TEXT, PRIMARY KEY ((s, p, l), createDate, ec) ) 
WITH CLUSTERING ORDER BY (createDate DESC) AND compression = 
{'sstable_compression' : 'LZ4Compressor'} ;

Intermittently, after inserting 1,000,000 or 10,000,000 or more rows, when my 
test immediately turns around and tries to read this partition in its entirety, 
the client times out on the read and the Cassandra log looks like the following:

java.lang.RuntimeException: java.lang.IllegalArgumentException
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1935)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Unknown Source)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:55)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:64)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:82)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
at 
org.apache.cassandra.db.marshal.AbstractType$3.compare(AbstractType.java:77)
at 
org.apache.cassandra.db.marshal.AbstractType$3.compare(AbstractType.java:74)
at 
org.apache.cassandra.utils.MergeIterator$Candidate.compareTo(MergeIterator.java:152)
at 
org.apache.cassandra.utils.MergeIterator$Candidate.compareTo(MergeIterator.java:129)
at java.util.PriorityQueue.siftUpComparable(Unknown Source)
at java.util.PriorityQueue.siftUp(Unknown Source)
at java.util.PriorityQueue.offer(Unknown Source)
at java.util.PriorityQueue.add(Unknown Source)
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.init(MergeIterator.java:90)
at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:46)
at 
org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:120)
at 
org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
at 
org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
at 
org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
at 
org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
at 
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1560)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1379)
at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:327)
at 
org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65)
at 
org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1396)
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1931)
... 3 more

I have seen the same failure whether I use the LZ4Compressor or the 
SnappyCompressor, so it is not dependent on the choice of compression. 
When compression is disabled, the log is similar, differing slightly in the 
details.  The exception is then:
 java.io.IOError: java.io.IOException: mmap segment underflow; remaining is 
10778639 but 876635247 requested

At least in this case of no compression, although the read test failed when run 
immediately after the data was written, running just the read tests again later 
succeeded, which suggests this is a problem with a cached version of the data, 
as the underlying file itself is not corrupted.

The attached 2014-02-15 and 2014-02-17-21-05 files show the initial failure 
with LZ4Compressor.  The 2014-02-17-22-05 file shows the log from the 
uncompressed test.
In all of these, the log includes the message 
CompactionController.java (line 192) Compacting large row testdb/sr:5:1:6 
(1079784915 bytes) incrementally.  
This may be coincidental, as it turns out, as I may be seeing the same issue on 
a table with narrow rows and a large number of composite primary keys.  See the 
attached log 2014-02-18-13-45. 
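
As an aside on what the top of that read-stage trace means: java.nio.Buffer.limit 
throws IllegalArgumentException whenever the requested limit exceeds the buffer's 
capacity, which is consistent with AbstractCompositeType.getBytes being handed a 
length prefix larger than the bytes actually available.  A tiny illustration (my 
own, not from the logs):
{code}
// Hedged illustration: Buffer.limit rejects a limit beyond capacity with
// IllegalArgumentException -- the same exception at the top of the trace above.
import java.nio.ByteBuffer;

public class LimitUnderflowDemo {
    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(16);
        bb.limit(876635247);   // far beyond capacity, like the "876635247 requested" log line
    }
}
{code}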

[jira] [Updated] (CASSANDRA-6721) READ-STAGE: IllegalArgumentException when re-reading wide row immediately upon creation

2014-02-18 Thread Bill Mitchell (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Mitchell updated CASSANDRA-6721:
-

Attachment: 2014-02-18-13-45.txt
2014-02-17-22-05.txt
2014-02-17-21-05.txt
2014-02-15.txt

[jira] [Updated] (CASSANDRA-6720) Implement support for Log4j DOMConfigurator for Cassandra daemon

2014-02-18 Thread Bill Mitchell (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Mitchell updated CASSANDRA-6720:
-

Attachment: (was: 2014-02-15.txt)

 Implement support for Log4j DOMConfigurator for Cassandra daemon
 --

 Key: CASSANDRA-6720
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6720
 Project: Cassandra
  Issue Type: Improvement
  Components: Config, Core
Reporter: Nikolai Grigoriev
Priority: Trivial

 Currently CassandraDaemon explicitly uses PropertyConfigurator to load log4j 
 settings if log4j.defaultInitOverride is set to true, which is done by 
 default. This does not allow using a log4j XML configuration file, because 
 that requires DOMConfigurator to be used in a similar fashion. The only way 
 to use one is to change the value of the log4j.defaultInitOverride property 
 in the startup script.
 Here is the background on why I think it might be useful to support the XML 
 configuration, even if you hate XML ;)
 I wanted to ship my Cassandra logs to Logstash and I have been using 
 SocketAppender. But then I discovered that any issue with the Logstash log4j 
 server results in significant performance degradation for Cassandra, as the 
 logger blocks. I was able to easily reproduce the problem with a separate 
 test. It seems the obvious solution was to use AsyncAppender in front of 
 SocketAppender, which eliminates the blocking. However, AsyncAppender can 
 only be configured via DOMConfigurator, at least in Log4j 1.2.
 I think it does not hurt to make a little change to support both 
 configuration types, in a way similar to Spring's Log4jConfigurer:
 {code}
 public static void initLogging(String location, long refreshInterval) throws FileNotFoundException {
     String resolvedLocation = SystemPropertyUtils.resolvePlaceholders(location);
     File file = ResourceUtils.getFile(resolvedLocation);
     if (!file.exists()) {
         throw new FileNotFoundException("Log4j config file [" + resolvedLocation + "] not found");
     }
     if (resolvedLocation.toLowerCase().endsWith(XML_FILE_EXTENSION)) {
         DOMConfigurator.configureAndWatch(file.getAbsolutePath(), refreshInterval);
     }
     else {
         PropertyConfigurator.configureAndWatch(file.getAbsolutePath(), refreshInterval);
     }
 }
 {code}
 I would be happy to submit the change unless there are any objections.
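 For reference, the AsyncAppender-in-front-of-SocketAppender arrangement can also 
 be sketched programmatically against the plain Log4j 1.2 API; this is my own 
 minimal sketch of the idea (host and port are illustrative), not part of the 
 proposed patch:
 {code}
 // Hedged sketch, Log4j 1.2: wrap a (possibly slow) SocketAppender in an AsyncAppender
 // so logging calls enqueue to a buffer instead of blocking on the network.
 import org.apache.log4j.AsyncAppender;
 import org.apache.log4j.Logger;
 import org.apache.log4j.net.SocketAppender;

 public class AsyncSocketLogging {
     public static void main(String[] args) {
         SocketAppender socket = new SocketAppender("logstash.example.com", 4560); // example host/port
         AsyncAppender async = new AsyncAppender();
         async.addAppender(socket);                 // events are handed off via an internal buffer
         Logger.getRootLogger().addAppender(async);
         Logger.getRootLogger().info("routed through the async buffer");
     }
 }
 {code}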



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (CASSANDRA-6720) Implement support for Log4j DOMConfigurator for Cassandra daemon

2014-02-18 Thread Bill Mitchell (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Mitchell updated CASSANDRA-6720:
-

Attachment: 2014-02-18-13-45.txt
2014-02-17-22-05.txt
2014-02-17-21-05.txt
2014-02-15.txt




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (CASSANDRA-6720) Implement support for Log4j DOMConfigurator for Cassandra daemon

2014-02-18 Thread Bill Mitchell (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Mitchell updated CASSANDRA-6720:
-

Attachment: (was: 2014-02-17-21-05.txt)




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (CASSANDRA-6720) Implement support for Log4j DOMConfigurator for Cassandra daemon

2014-02-18 Thread Bill Mitchell (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Mitchell updated CASSANDRA-6720:
-

Attachment: (was: 2014-02-18-13-45.txt)




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (CASSANDRA-6720) Implement support for Log4j DOMConfigurator for Cassandra daemon

2014-02-18 Thread Bill Mitchell (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Mitchell updated CASSANDRA-6720:
-

Attachment: (was: 2014-02-17-22-05.txt)




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (CASSANDRA-6721) READ-STAGE: IllegalArgumentException when re-reading wide row immediately upon creation

2014-02-18 Thread Bill Mitchell (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Mitchell updated CASSANDRA-6721:
-

Description: 
In my test case, I am writing a wide row to one table, ordering the columns in 
reverse chronogical order, newest to oldest, by insertion time.  A simplified 
version of the schema:
CREATE TABLE IF NOT EXISTS sr (s BIGINT, p INT, l BIGINT, ec TEXT, createDate 
TIMESTAMP, k BIGINT, properties TEXT, PRIMARY KEY ((s, p, l), createDate, ec) ) 
WITH CLUSTERING ORDER BY (createDate DESC) AND compression = 
{'sstable_compression' : 'LZ4Compressor'} 

Intermittently, after inserting 1,000,000 or 10,000,000 or more rows, when my 
test immediately turns around and tries to read this partition in its entirety, 
the client times out on the read and the Cassandra log looks like the following:

java.lang.RuntimeException: java.lang.IllegalArgumentException
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1935)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Unknown Source)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:55)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:64)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:82)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
at 
org.apache.cassandra.db.marshal.AbstractType$3.compare(AbstractType.java:77)
at 
org.apache.cassandra.db.marshal.AbstractType$3.compare(AbstractType.java:74)
at 
org.apache.cassandra.utils.MergeIterator$Candidate.compareTo(MergeIterator.java:152)
at 
org.apache.cassandra.utils.MergeIterator$Candidate.compareTo(MergeIterator.java:129)
at java.util.PriorityQueue.siftUpComparable(Unknown Source)
at java.util.PriorityQueue.siftUp(Unknown Source)
at java.util.PriorityQueue.offer(Unknown Source)
at java.util.PriorityQueue.add(Unknown Source)
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.init(MergeIterator.java:90)
at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:46)
at 
org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:120)
at 
org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
at 
org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
at 
org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
at 
org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
at 
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1560)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1379)
at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:327)
at 
org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65)
at 
org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1396)
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1931)
... 3 more

I have seen the same failure whether I use the LZ4Compressor or the 
SnappyCompressor, so it is not dependent on the choice of compression. 
When compression is disabled, the log is similar, differing slightly in the 
details.  The exception is then:
{code}
java.io.IOError: java.io.IOException: mmap segment underflow; remaining is 10778639 but 876635247 requested
{code}
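
For reference, the uncompressed variant of the table corresponds to something like the following (a sketch, assuming the 2.0-era compression syntax in which an empty sstable_compression disables compression):
{code}
ALTER TABLE sr WITH compression = {'sstable_compression' : ''};
{code}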

At least in this case of no compression, although the read test failed when run 
immediately after the data was written, running just the read tests again later 
succeeded, which suggests this is a problem with a cached version of the data, 
as the underlying file itself is not corrupted.

The attached 2014-02-15 and 2014-02-17-21-05 files show the initial failure 
with LZ4Compressor.  The 2014-02-17-22-05 file shows the log from the 
uncompressed test.
In all of these, the log includes the message:
{code}
CompactionController.java (line 192) Compacting large row testdb/sr:5:1:6 (1079784915 bytes) incrementally
{code}

As it turns out, this may be coincidental, as I may be seeing the same issue on 
a table with narrow rows and a large number of composite primary keys.  See the 
attached log 2014-02-18-13-45. 


  was:
In my test case, I am writing a wide row to one table, ordering the columns in 
reverse chronological order, newest to oldest, by insertion time.  A 

[jira] [Updated] (CASSANDRA-6721) READ-STAGE: IllegalArgumentException when re-reading wide row immediately upon creation

2014-02-18 Thread Bill Mitchell (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Mitchell updated CASSANDRA-6721:
-

Environment: Windows 7 x64 dual core, 8GB memory, single Cassandra node, 
Java 1.7.0_45  (was: Windows 7 x64, 8GB memory, single Cassandra node, Java 
1.7.0_45)

 READ-STAGE: IllegalArgumentException when re-reading wide row immediately 
 upon creation  
 -

 Key: CASSANDRA-6721
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6721
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Windows 7 x64 dual core, 8GB memory, single Cassandra 
 node, Java 1.7.0_45
Reporter: Bill Mitchell
 Attachments: 2014-02-15.txt, 2014-02-17-21-05.txt, 
 2014-02-17-22-05.txt, 2014-02-18-13-45.txt



[jira] [Commented] (CASSANDRA-6721) READ-STAGE: IllegalArgumentException when re-reading wide row immediately upon creation

2014-02-18 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905043#comment-13905043
 ] 

Bill Mitchell commented on CASSANDRA-6721:
--

Generally my test case will drop the table and recreate it, to accommodate 
changes in the schema as I experiment with compression, key organization, etc.  
One would expect, though, that dropping the table would also clear the key 
cache.  

Several days ago, I did sometimes have to clear out the data directories, but 
those problems seem to have gone away with the upgrade to 2.0.5 -- it cleans up 
better on restart.  
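
For what it is worth, the key and row caches can also be dropped by hand between runs, which would help confirm or rule out the cached-data theory (assuming the nodetool that ships with the same 2.0 install):
{code}
nodetool invalidatekeycache
nodetool invalidaterowcache
{code}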

 READ-STAGE: IllegalArgumentException when re-reading wide row immediately 
 upon creation  
 -

 Key: CASSANDRA-6721
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6721
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Windows 7 x64 dual core, 8GB memory, single Cassandra 
 node, Java 1.7.0_45
Reporter: Bill Mitchell
 Attachments: 2014-02-15.txt, 2014-02-17-21-05.txt, 
 2014-02-17-22-05.txt, 2014-02-18-13-45.txt



[jira] [Comment Edited] (CASSANDRA-6721) READ-STAGE: IllegalArgumentException when re-reading wide row immediately upon creation

2014-02-18 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905043#comment-13905043
 ] 

Bill Mitchell edited comment on CASSANDRA-6721 at 2/19/14 2:37 AM:
---

Generally my test case will drop the table and recreate it, to accommodate 
changes in the schema as I experiment with compression, key organization, etc.  
I do allow a few seconds' delay after the drop, for cleanup, before the CREATE 
TABLE, to avoid some of the past known issues with re-creating a table with 
the same name.  One would expect that dropping the table would also clear the 
key cache.  

Several days ago, I did sometimes have to clear out the data directories, but 
those problems seem to have gone away with the upgrade to 2.0.5 -- it cleans up 
better on restart.  
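
A minimal sketch of that drop, wait, recreate pattern, assuming the DataStax Java driver used in my tests (the contact point, keyspace name, and five-second pause are illustrative, not the actual harness):
{code}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class RecreateTable {
    public static void main(String[] args) throws InterruptedException {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("testdb");

        session.execute("DROP TABLE IF EXISTS sr");
        // Allow a few seconds for the drop to settle before re-creating a
        // table with the same name, per the comment above.
        Thread.sleep(5000);
        session.execute("CREATE TABLE IF NOT EXISTS sr (s BIGINT, p INT, l BIGINT, ec TEXT, "
                + "createDate TIMESTAMP, k BIGINT, properties TEXT, "
                + "PRIMARY KEY ((s, p, l), createDate, ec)) "
                + "WITH CLUSTERING ORDER BY (createDate DESC)");

        cluster.close();
    }
}
{code}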


was (Author: wtmitchell3):
Generally my test case will drop the table and recreate it, to accommodate 
changes in the schema as I experiment with compression, key organization, etc.  
One would expect, though, that dropping the table would also clear the key 
cache.  

Several days ago, I did sometimes have to clear out the data directories, but 
those problems seem to have gone away with the upgrade to 2.0.5 -- it cleans up 
better on restart.  

 READ-STAGE: IllegalArgumentException when re-reading wide row immediately 
 upon creation  
 -

 Key: CASSANDRA-6721
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6721
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Windows 7 x64 dual core, 8GB memory, single Cassandra 
 node, Java 1.7.0_45
Reporter: Bill Mitchell
 Attachments: 2014-02-15.txt, 2014-02-17-21-05.txt, 
 2014-02-17-22-05.txt, 2014-02-18-13-45.txt

