[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize
[ https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982679#comment-13982679 ]

Bill Mitchell commented on CASSANDRA-6826:

Thank you for calling my attention to its release; the last time I checked the DataStax site, I did not yet see it, and once again I forgot to check the Apache site directly.

Although in my first 2.0.7 tests I saw a failure, I was trying something new, to use fetchMoreResults now that fetchSize was supposed to be fixed. Further testing has convinced me that these failures are new issues, different from this report. The specific test that failed for me above works in 2.0.7, so, yes, I believe this problem is fixed.

Query returns different number of results depending on fetchsize

Key: CASSANDRA-6826
URL: https://issues.apache.org/jira/browse/CASSANDRA-6826
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: quad-core Windows 7 x64, single node cluster Cassandra 2.0.5
Reporter: Bill Mitchell
Assignee: Sylvain Lebresne

I issue a query across the set of partitioned wide rows for one logical row, where s, l, and partition specify the composite primary key for the row:

    SELECT ec, ea, rd FROM sr WHERE s = ? and partition IN ? and l = ? ALLOW FILTERING;

If I set fetchSize to only 1000 when the Cluster is configured, the query sometimes does not return all the results. In the particular case I am chasing, it returns a total of 98586 rows. If I increase the fetchsize to 10, all the 9 actual rows are returned. This suggests there is some problem with fetchsize re-establishing the position on the next segment of the result set, at least when multiple partitions are being accessed.

--
This message was sent by Atlassian JIRA (v6.2#6252)
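The description above suspects that the position is not re-established correctly when a page ends and the next one begins, particularly across partitions. As an illustration of that invariant only, here is a minimal in-memory sketch in Java. It is not Cassandra's or the driver's paging code: the class PagingSketch, its Position and Page types, and the generated row names are all hypothetical. It models a resume position as (partition index, row offset) and checks that the total row count does not depend on the fetch size.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of paging over several partitions (not Cassandra code). */
class PagingSketch {

    /** Resume position: the partition being read and how many of its rows were already returned. */
    static final class Position {
        final int partitionIndex;
        final int rowOffset;

        Position(int partitionIndex, int rowOffset) {
            this.partitionIndex = partitionIndex;
            this.rowOffset = rowOffset;
        }
    }

    /** One page of rows plus the position the next page starts from (null when exhausted). */
    static final class Page {
        final List<String> rows;
        final Position next;

        Page(List<String> rows, Position next) {
            this.rows = rows;
            this.next = next;
        }
    }

    /** Returns up to fetchSize rows starting at the given position, crossing partition boundaries. */
    static Page fetchPage(List<List<String>> partitions, Position start, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        List<String> rows = new ArrayList<>();
        int p = start.partitionIndex;
        int offset = start.rowOffset;
        while (p < partitions.size() && rows.size() < fetchSize) {
            List<String> partition = partitions.get(p);
            if (offset < partition.size()) {
                rows.add(partition.get(offset));
                offset++;
            } else {
                // Current partition is exhausted: continue at the start of the next one.
                p++;
                offset = 0;
            }
        }
        // Skip partitions with nothing left so that a null position means "no more rows".
        while (p < partitions.size() && offset >= partitions.get(p).size()) {
            p++;
            offset = 0;
        }
        Position next = p < partitions.size() ? new Position(p, offset) : null;
        return new Page(rows, next);
    }

    /** Pages through everything and returns the total number of rows seen. */
    static int countAll(List<List<String>> partitions, int fetchSize) {
        int total = 0;
        Position position = new Position(0, 0);
        while (position != null) {
            Page page = fetchPage(partitions, position, fetchSize);
            total += page.rows.size();
            position = page.next;
        }
        return total;
    }

    /** Builds partitionCount partitions, each holding rowsPerPartition synthetic rows. */
    static List<List<String>> buildPartitions(int partitionCount, int rowsPerPartition) {
        List<List<String>> partitions = new ArrayList<>();
        for (int p = 0; p < partitionCount; p++) {
            List<String> rows = new ArrayList<>();
            for (int r = 0; r < rowsPerPartition; r++) {
                rows.add("p" + p + "-r" + r);
            }
            partitions.add(rows);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = buildPartitions(6, 25);
        for (int fetchSize : new int[] {1, 7, 25, 1000}) {
            System.out.println("fetchSize=" + fetchSize + " rows=" + countAll(partitions, fetchSize));
        }
    }
}
```

A page boundary that coincides exactly with a partition boundary (fetch size 25 with 25 rows per partition in the main method) is the case the sketch handles explicitly by advancing the position before returning it.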
[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize
[ https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979734#comment-13979734 ]

Sylvain Lebresne commented on CASSANDRA-6826:

[~wtmitchell3] Did you have time to check whether you can reproduce this on 2.0.7, now that it's out?
[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize
[ https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951706#comment-13951706 ]

Bill Mitchell commented on CASSANDRA-6826:

I started working on a smaller test case, but competing time pressures at work put that effort on hold.

In the meantime, I was able to work around this problem by using LIMIT instead of fetchSize, iterating over the partitions, and using a compound comparison in the WHERE clause to establish the position for the next query. This prompted me to open JAVA-295, as I had to abandon the QueryBuilder in order to construct this WHERE clause.

When Cassandra 2.0.7 comes out, I will check whether the fix to CASSANDRA-6825 also fixes all the issues I found with the SELECT.
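The workaround in the comment above (LIMIT in place of fetchSize, one partition at a time, and a comparison in the WHERE clause to resume after the last row seen) can be sketched against an in-memory model. This is a hedged illustration, not the reporter's code: the class LimitPagingSketch, the single integer clustering key, and the helper names are assumptions, since the actual clustering columns of the table are not given in the thread. The method queryAfter stands in for a query of the shape "WHERE partition = ? AND key > ? LIMIT ?".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical model of manual paging with LIMIT and a "greater than last key" predicate. */
class LimitPagingSketch {

    /**
     * Stands in for "SELECT ... WHERE partition = ? AND key > ? LIMIT ?" on one partition.
     * A null lastKey means "start from the beginning of the partition".
     */
    static List<Integer> queryAfter(NavigableMap<Integer, String> partition, Integer lastKey, int limit) {
        NavigableMap<Integer, String> remaining =
                lastKey == null ? partition : partition.tailMap(lastKey, false);
        List<Integer> keys = new ArrayList<>();
        for (Integer key : remaining.keySet()) {
            if (keys.size() == limit) {
                break;
            }
            keys.add(key);
        }
        return keys;
    }

    /** Iterates over the partitions one by one, re-issuing the query until a short page comes back. */
    static int countAll(List<NavigableMap<Integer, String>> partitions, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        int total = 0;
        for (NavigableMap<Integer, String> partition : partitions) {
            Integer lastKey = null;
            while (true) {
                List<Integer> keys = queryAfter(partition, lastKey, limit);
                total += keys.size();
                if (keys.size() < limit) {
                    break; // Fewer rows than requested: this partition is exhausted.
                }
                // Remember the last key seen so the next query resumes strictly after it.
                lastKey = keys.get(keys.size() - 1);
            }
        }
        return total;
    }

    /** Builds partitionCount partitions, each with rowsPerPartition synthetic rows keyed 0..n-1. */
    static List<NavigableMap<Integer, String>> buildPartitions(int partitionCount, int rowsPerPartition) {
        List<NavigableMap<Integer, String>> partitions = new ArrayList<>();
        for (int p = 0; p < partitionCount; p++) {
            NavigableMap<Integer, String> rows = new TreeMap<>();
            for (int key = 0; key < rowsPerPartition; key++) {
                rows.put(key, "p" + p + "-k" + key);
            }
            partitions.add(rows);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<NavigableMap<Integer, String>> partitions = buildPartitions(6, 40);
        System.out.println("rows counted with LIMIT 10: " + countAll(partitions, 10));
    }
}
```

Because the resume position is carried explicitly in the query predicate, the client never depends on server-side paging state; the cost is one extra query per partition when the row count is an exact multiple of the limit.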
[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize
[ https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943037#comment-13943037 ]

Bill Mitchell commented on CASSANDRA-6826:

It is worth noting that, when I first reported this problem, the difference between the expected and actual number of rows returned was 1413, a rather odd number. So far, on 2.0.6, I have seen differences that are always a multiple of 10,000, matching the behavior in CASSANDRA-6825.

So it may indeed be, as Sylvain suggested, that CASSANDRA-6748 fixed one problem, the one I was seeing when I first reported this, but that the one test was hitting two problems, depending on timing and other factors, and now only CASSANDRA-6825 remains.
[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize
[ https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942250#comment-13942250 ]

Bill Mitchell commented on CASSANDRA-6826:

No doubt. At the moment, though, the test case is embedded in a full application, as I mentioned to Joshua (CASSANDRA-6736). Stripping that application down so that the test case does not carry so much proprietary code with it is a couple of days of work, and I'm not sure when I will get to it.

Even worse, when I first encountered this problem, it appeared only in a maven clean install of the whole project and not when the test case was run by itself. This last week, though, it would intermittently appear and disappear when I repeated the test unchanged, without doing the complete maven build. So it may be that a reduced version, when I have a chance to strip it down, will show the same anomaly.
[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize
[ https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940294#comment-13940294 ]

Sylvain Lebresne commented on CASSANDRA-6826:

If you can reproduce this easily enough, some code that reproduces it would truly be the best thing to help with this.
[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize
[ https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940106#comment-13940106 ]

Bill Mitchell commented on CASSANDRA-6826:

Following Sylvain's suggestion that something about the nulls might be affecting the problem, I tried changing the schema.

On my dual-core laptop, where the final column is null but not explicitly set to null on INSERT, the SELECT * is returning a total of 9 rows where 10 are expected. After changing the name of the column to begin with an "a", so the nullable column is no longer last, the SELECT * is returning a total of 8 rows, where 10 are expected. If I try the same query from cqlsh, where there is no limit on fetchSize, all the expected rows are returned.

So, at least in this one experiment, changing the schema by changing the order of the columns affected the behavior. This could, of course, be merely coincidental, the result of some timing issue.
[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize
[ https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940108#comment-13940108 ]

Bill Mitchell commented on CASSANDRA-6826:

I tried a different experiment. I used a different algorithm to compute the partition value when the rows are INSERTed.

In the failing case, I was inserting a block of 10,000 rows with identical partition values (in 20 batches of 500 each), then choosing another partition value for the next block of 10,000. I changed the partition calculation to randomly assign the partition value, so that rows were written across all the partition values in each block.

With this algorithm, no failure was observed, even though internally I grouped the inserts by partition value into distinct batches, to take advantage of CASSANDRA-6737. Because of the random assignment of partition values, odds are the partition boundaries no longer align with the fetchSize.
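The two insertion strategies compared in the comment above can be sketched as follows. This is an assumption-laden illustration rather than the reporter's code: the class PartitionAssignmentSketch and its methods are hypothetical, the block size is taken as 20 x 500 rows as the batch counts in the comment imply, and the count of six partitions is borrowed from another comment in this thread. The sketch assigns a partition to each row either block by block or at random, then groups row indexes by partition into batches of at most 500, as the comment describes doing for CASSANDRA-6737.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/** Hypothetical sketch of block-wise versus random partition assignment for inserted rows. */
class PartitionAssignmentSketch {

    /** Block strategy: every run of blockSize consecutive rows shares one partition value. */
    static int blockPartition(int rowIndex, int blockSize, int partitionCount) {
        return (rowIndex / blockSize) % partitionCount;
    }

    /** Random strategy: each row independently gets one of the partition values. */
    static int randomPartition(Random random, int partitionCount) {
        return random.nextInt(partitionCount);
    }

    /** Groups row indexes by partition and splits each group into batches of at most batchSize. */
    static Map<Integer, List<List<Integer>>> batchesByPartition(int[] partitionOfRow, int batchSize) {
        Map<Integer, List<List<Integer>>> result = new TreeMap<>();
        for (int row = 0; row < partitionOfRow.length; row++) {
            List<List<Integer>> batches =
                    result.computeIfAbsent(partitionOfRow[row], k -> new ArrayList<>());
            // Start a new batch when there is none yet or the current one is full.
            if (batches.isEmpty() || batches.get(batches.size() - 1).size() == batchSize) {
                batches.add(new ArrayList<>());
            }
            batches.get(batches.size() - 1).add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        int rowCount = 30_000;
        int blockSize = 20 * 500; // assumed from "20 batches of 500 each"
        int partitionCount = 6;   // assumed; another comment mentions 6 partitions

        int[] blockAssignment = new int[rowCount];
        int[] randomAssignment = new int[rowCount];
        Random random = new Random(42);
        for (int row = 0; row < rowCount; row++) {
            blockAssignment[row] = blockPartition(row, blockSize, partitionCount);
            randomAssignment[row] = randomPartition(random, partitionCount);
        }

        System.out.println("block strategy partitions used: "
                + batchesByPartition(blockAssignment, 500).keySet());
        System.out.println("random strategy partitions used: "
                + batchesByPartition(randomAssignment, 500).keySet());
    }
}
```

With the block strategy, each partition's row count is an exact multiple of the block size, so partition boundaries can coincide with page boundaries; with random assignment that alignment becomes unlikely, which is the explanation the comment offers for the failure disappearing.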
[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize
[ https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931569#comment-13931569 ]

Sylvain Lebresne commented on CASSANDRA-6826:

[~wtmitchell3] Did you have a chance to test against 2.0.6 yet, and if not, would it be possible for you to give it a shot? I'm wondering whether this could be CASSANDRA-6748.
[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize
[ https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931677#comment-13931677 ]

Bill Mitchell commented on CASSANDRA-6826:

Sylvain, thank you for pointing this out to me. The last time I looked for the new version, I did not find it. I will download it right away. No doubt this is the same problem as CASSANDRA-6748.
[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize
[ https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932398#comment-13932398 ]

Bill Mitchell commented on CASSANDRA-6826:

Much as I found Sylvain's suggestion plausible, no, it does not explain this problem. After installing the Apache Cassandra 2.0.6 build, the first time I tried this, it still failed.

Unfortunately, the problem is data or timing dependent. After seeing the failure on 2.0.6, I changed the test case to write all the rows into one partition, and that worked, so I changed it back to distributing the rows over 6 partitions, and this time that worked, too. So we were lucky that the first time I tried this, the failure did appear.

(I should have noticed that CASSANDRA-6748 appeared only when a column was explicitly set to null. That was the behavior of my code about two weeks ago, before I discovered the issues around having a large number of tombstones in a wide row.)
[jira] [Commented] (CASSANDRA-6826) Query returns different number of results depending on fetchsize
[ https://issues.apache.org/jira/browse/CASSANDRA-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924702#comment-13924702 ]

Bill Mitchell commented on CASSANDRA-6826:

It is conceivable that this problem and CASSANDRA-6825 are related, in that they were uncovered together. I came across the behavior described in CASSANDRA-6825 while trying to analyze the test failure caused by this problem.