[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-04-27 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982679#comment-13982679
 ] 

Bill Mitchell commented on CASSANDRA-6826:
--

Thank you for calling my attention to its release; the last time I checked the 
DataStax site, I did not yet see it, and once again I forgot to check the 
Apache site directly.  

Although in my first 2.0.7 tests I saw a failure, I was trying something new, 
to use fetchMoreResults now that fetchSize was supposed to be fixed.  Further 
testing has convinced me that these failures are new issues, different from 
this report.  The specific test that failed for me above works in 2.0.7, so, 
yes, I believe this problem is fixed. 

 Query returns different number of results depending on fetchsize
 

 Key: CASSANDRA-6826
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6826
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: quad-core Windows 7 x64, single node cluster
 Cassandra 2.0.5
Reporter: Bill Mitchell
Assignee: Sylvain Lebresne

 I issue a query across the set of partitioned wide rows for one logical row, 
 where s, l, and partition specify the composite primary key for the row:
 SELECT ec, ea, rd FROM sr WHERE s = ? and partition IN ? and l = ? ALLOW 
 FILTERING;
 If I set fetchSize to only 1000 when the Cluster is configured, the query 
 sometimes does not return all the results.  In the particular case I am 
 chasing, it returns a total of 98586 rows.  If I increase the fetchsize to 
 10, all the 9 actual rows are returned.  This suggests there is some 
 problem with fetchsize re-establishing the position on the next segment of 
 the result set, at least when multiple partitions are being accessed.  
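
The invariant at stake can be written as a small check: the total number of rows a paged read returns should not depend on the page size. What follows is a minimal, self-contained Java sketch of that check against an in-memory stand-in for the partitions. It does not use the Cassandra driver, the partition count and row counts are arbitrary, and the names PagedCountCheck, countPaged, and sample are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model: each inner list stands in for one partition's rows.
class PagedCountCheck {

    // Reads every partition page by page and returns the total number of rows seen.
    // The resume position is the pair (partition index, offset within that partition).
    static int countPaged(List<List<String>> partitions, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        int total = 0;
        int part = 0;
        int offset = 0;
        while (part < partitions.size()) {
            int roomInPage = fetchSize;
            // Fill one page, crossing partition boundaries when a partition runs out.
            while (roomInPage > 0 && part < partitions.size()) {
                List<String> rows = partitions.get(part);
                int take = Math.min(roomInPage, rows.size() - offset);
                total += take;
                offset += take;
                roomInPage -= take;
                if (offset == rows.size()) {
                    part++;
                    offset = 0;
                }
            }
        }
        return total;
    }

    // Builds partitionCount partitions of rowsPerPartition placeholder rows each.
    static List<List<String>> sample(int partitionCount, int rowsPerPartition) {
        List<List<String>> partitions = new ArrayList<>();
        for (int p = 0; p < partitionCount; p++) {
            List<String> rows = new ArrayList<>();
            for (int r = 0; r < rowsPerPartition; r++) {
                rows.add("p" + p + "-r" + r);
            }
            partitions.add(rows);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<List<String>> data = sample(6, 2500);
        int small = countPaged(data, 1000);
        int large = countPaged(data, 100000);
        System.out.println("fetchSize 1000: " + small + ", fetchSize 100000: " + large);
        if (small != large) {
            throw new AssertionError("row count depends on fetch size");
        }
    }
}
```

Against a real cluster, the same comparison would be made by running the SELECT once with a small fetch size and once with a fetch size larger than the result set, then comparing the two counts.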



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-04-24 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979734#comment-13979734
 ] 

Sylvain Lebresne commented on CASSANDRA-6826:
-

[~wtmitchell3] Did you have time to check if you could reproduce on 2.0.7, now 
that it's out?


[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-03-28 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951706#comment-13951706
 ] 

Bill Mitchell commented on CASSANDRA-6826:
--

I started working on a smaller testcase, but competing time pressures at work 
put that effort on hold.  In the meantime, I was able to work around this 
problem by using LIMIT instead of fetch, iterating over the partitions, and 
using a compound comparison in the WHERE clause to establish position for the 
next query.  This prompted me to open JAVA-295, as I had to abandon the 
QueryBuilder in order to construct this WHERE clause.  
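
The workaround described above amounts to manual (keyset) paging: for each partition, issue a LIMIT query, remember the last clustering value seen, and start the next query strictly after it. A minimal Java sketch follows, using an in-memory sorted map per partition instead of a real session. The single integer clustering key, the page sizes, and every name here (KeysetPagingSketch, queryPage, readAll, partition) are assumptions for illustration, since the thread does not show the actual schema or code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class KeysetPagingSketch {

    // Stand-in for a query of the shape
    //   SELECT ... WHERE <partition key> = ? AND <clustering key> > ? LIMIT ?
    // Returns up to 'limit' clustering keys strictly greater than 'after'
    // (null means "start from the beginning of the partition").
    static List<Integer> queryPage(NavigableMap<Integer, String> partition,
                                   Integer after, int limit) {
        NavigableMap<Integer, String> tail =
                (after == null) ? partition : partition.tailMap(after, false);
        List<Integer> page = new ArrayList<>();
        for (Integer key : tail.keySet()) {
            if (page.size() == limit) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    // Iterates over the partitions one at a time, paging within each by LIMIT.
    static List<Integer> readAll(List<NavigableMap<Integer, String>> partitions, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        List<Integer> all = new ArrayList<>();
        for (NavigableMap<Integer, String> partition : partitions) {
            Integer last = null;
            while (true) {
                List<Integer> page = queryPage(partition, last, limit);
                all.addAll(page);
                if (page.size() < limit) {
                    break; // short (or empty) page: this partition is exhausted
                }
                last = page.get(page.size() - 1); // position for the next query
            }
        }
        return all;
    }

    // Builds one partition holding 'rows' consecutive clustering keys from 'firstKey'.
    static NavigableMap<Integer, String> partition(int firstKey, int rows) {
        NavigableMap<Integer, String> map = new TreeMap<>();
        for (int i = 0; i < rows; i++) {
            map.put(firstKey + i, "row-" + (firstKey + i));
        }
        return map;
    }

    public static void main(String[] args) {
        List<NavigableMap<Integer, String>> partitions = new ArrayList<>();
        partitions.add(partition(0, 2500));
        partitions.add(partition(10000, 2500));
        partitions.add(partition(20000, 2500));
        System.out.println(readAll(partitions, 1000).size());
    }
}
```

Because each query restates its own starting position in the WHERE clause, this approach does not rely on the server resuming a paged result set, at the cost of one extra query per partition whenever the partition size is an exact multiple of the limit.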

When Cassandra 2.0.7 comes out, I will check if the fix to CASSANDRA-6825 also 
fixes all the issues I found with the SELECT.


[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-03-21 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943037#comment-13943037
 ] 

Bill Mitchell commented on CASSANDRA-6826:
--

It is worth noting that, when I first reported this problem, the difference 
between the expected and actual number of rows returned was 1413, a rather 
odd number.  So far, on 2.0.6, I have seen differences that are always a 
multiple of 10,000, matching the behavior in CASSANDRA-6825.  So it may indeed 
be, as Sylvain suggested, that CASSANDRA-6748 fixed one problem, that I was 
seeing when I first reported this, but that the one test was hitting two 
problems, depending on timing and other issues, and now only CASSANDRA-6825 
remains.


[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-03-20 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942250#comment-13942250
 ] 

Bill Mitchell commented on CASSANDRA-6826:
--

No doubt.  At the moment, though, the test case is embedded in a full 
application, as I mentioned to Joshua (CASSANDRA-6736).  Stripping that 
application down so that the test case did not carry with it so much 
proprietary code is a couple of days of work, and I'm not sure when I will get 
to it.  Even worse, when I first encountered this problem, it appeared only in 
a maven remove clean install of the whole project and not when the test case 
was run by itself.  This last week, though, it would intermittently appear and 
disappear when I repeated the test unchanged, without doing the maven complete 
build.  So it may be that a reduced version, when I have a chance to strip it 
down, will show the same anomaly.


[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-03-19 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940294#comment-13940294
 ] 

Sylvain Lebresne commented on CASSANDRA-6826:
-

If you can reproduce easily enough, some code that reproduces the problem would 
truly be the best thing to help with this.


[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-03-18 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940106#comment-13940106
 ] 

Bill Mitchell commented on CASSANDRA-6826:
--

Following Sylvain's suggestion that something about the nulls might be 
affecting the problem, I tried changing the schema.  On my dual-core laptop, 
where the final column is null but not set explicitly null on INSERT, the 
SELECT * is returning a total of 9 rows where 10 are expected.  
Changing the name of the column to begin with an a, so the nullable column is 
no longer last, the SELECT * is returning a total of 8 rows, where 10 
are expected.  If I try the same query from cqlsh, where there is no limit on 
fetchSize, all the expected rows are returned.  

So, at least in this one experiment, changing the schema by changing the order 
of the columns affected the behavior.  This could, of course, be merely 
coincidental, some timing issue.  


[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-03-18 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940108#comment-13940108
 ] 

Bill Mitchell commented on CASSANDRA-6826:
--

I tried a different experiment.  I used a different algorithm to compute the 
partition value when the rows are INSERTed.  In the failing case, I was 
inserting a block of 10000 rows with an identical partition value (in 20 
batches of 500 each), then choosing another partition value for the next block 
of 10000.  

I changed the partition calculation to randomly assign the partition value, so 
that rows were written across all the partition values in each block.  With 
this algorithm, no failure was observed, even though internally I grouped the 
inserts by partition value into distinct batches, to take advantage of 
CASSANDRA-6737.  Because of the random assignment of partition values, odds are 
the partition boundaries no longer align with the fetchSize.   
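
The alignment argument can be made concrete with a little arithmetic. With 20 batches of 500 rows per block, each partition holds exactly 10000 rows, so with the fetchSize of 1000 from the issue description every partition boundary also falls on a page boundary. With random assignment, the partition sizes are unlikely to be exact multiples of the fetch size. The Java sketch below counts how many interior partition boundaries coincide with a page boundary. It is an illustration only: the names (BoundaryAlignmentSketch, alignedBoundaries, blockSizes, randomSizes) are hypothetical, and whether alignment actually triggers the failure was not established in this thread.

```java
import java.util.Random;

class BoundaryAlignmentSketch {

    // Counts interior partition boundaries (in result order) that fall exactly
    // on a multiple of fetchSize, i.e. where a page ends as a partition ends.
    static int alignedBoundaries(int[] partitionSizes, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        int aligned = 0;
        long cumulative = 0;
        // The end of the last partition is the end of the result set, so skip it.
        for (int i = 0; i < partitionSizes.length - 1; i++) {
            cumulative += partitionSizes[i];
            if (cumulative % fetchSize == 0) {
                aligned++;
            }
        }
        return aligned;
    }

    // Block assignment: every partition receives exactly rowsPerBlock rows.
    static int[] blockSizes(int partitions, int rowsPerBlock) {
        int[] sizes = new int[partitions];
        for (int i = 0; i < partitions; i++) {
            sizes[i] = rowsPerBlock;
        }
        return sizes;
    }

    // Random assignment: each row goes to a uniformly chosen partition.
    static int[] randomSizes(int partitions, int totalRows, long seed) {
        Random random = new Random(seed);
        int[] sizes = new int[partitions];
        for (int i = 0; i < totalRows; i++) {
            sizes[random.nextInt(partitions)]++;
        }
        return sizes;
    }

    public static void main(String[] args) {
        int fetchSize = 1000;
        int[] blocks = blockSizes(6, 20 * 500);
        int[] scattered = randomSizes(6, 6 * 20 * 500, 42L);
        System.out.println("block assignment:  " + alignedBoundaries(blocks, fetchSize));
        System.out.println("random assignment: " + alignedBoundaries(scattered, fetchSize));
    }
}
```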


[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-03-12 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931569#comment-13931569
 ] 

Sylvain Lebresne commented on CASSANDRA-6826:
-

[~wtmitchell3] Did you have a chance to test against 2.0.6 yet, and if not, 
would it be possible for you to give it a shot? I'm wondering if that couldn't 
be CASSANDRA-6748.


[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-03-12 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931677#comment-13931677
 ] 

Bill Mitchell commented on CASSANDRA-6826:
--

Sylvain, thank you for pointing this out to me.  The last time I looked for 
the new version, I did not find it.  I will download it right away.  No doubt 
this is the same problem as CASSANDRA-6748.  


[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-03-12 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932398#comment-13932398
 ] 

Bill Mitchell commented on CASSANDRA-6826:
--

Much as I found Sylvain's suggestion plausible, no, it does not explain this 
problem.  After installing the Apache Cassandra 2.0.6 build, the first time I 
tried this, it still failed.  

Unfortunately, the problem is data or timing dependent.  After seeing the 
failure on 2.0.6, I changed the test case to write all the rows into one 
partition, and that worked, so I changed it back to distributing the rows over 
6 partitions, and this time that worked, too.  So we were lucky that the 
first time I tried this, the failure did appear.  

(I should have noticed that CASSANDRA-6748 appeared only when a column was 
explicitly set to null.  That was the behavior of my code about two weeks ago, 
before I discovered the issues around having a large number of tombstones in a 
wide row.)  


[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize

2014-03-07 Thread Bill Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924702#comment-13924702
 ] 

Bill Mitchell commented on CASSANDRA-6826:
--

It is conceivable that this problem and CASSANDRA-6825 are related, in that 
they were uncovered together.  I came across the behavior described in 
CASSANDRA-6825 trying to analyze the test failure caused by this problem.  
