[jira] [Commented] (CASSANDRA-7099) Concurrent instances of same Prepared Statement seeing intermingled result sets

2014-04-30 Thread Sylvain Lebresne (JIRA)

[https://issues.apache.org/jira/browse/CASSANDRA-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985654#comment-13985654]

Sylvain Lebresne commented on CASSANDRA-7099:
-

bq. it might be possible to use the ResultSet to determine the correlation id 
when paging in more results

FYI, the driver does need to send the full query (including bound parameters) 
for every page, not just an ID. This is not specific to the Java driver; this 
is how paging works in the protocol, and it is done so that pages can be 
fetched from a different coordinator than the one that served the first page. 
That said, it's probably possible to make it easier, driver side, to reuse a 
BoundStatement safely, or at least to clarify in the documentation when it is 
or isn't safe to do so. But that's a driver concern, so let's keep any further 
discussion on the driver mailing list/JIRA.
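A minimal, self-contained model of the paging scheme described above may help show why reusing one statement object is risky. It is not driver or server code: `MutableStatement`, `PageRequest`, `FakeServer` and `NaivePager` are hypothetical names, and the model assumes (as this thread suggests) that the client rebuilds each next-page request, bound values included, from the statement object rather than from a copy taken at execution time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a reusable, mutable bound statement.
final class MutableStatement {
    final String queryId;
    int partition; // the only bound value in this model

    MutableStatement(String queryId) { this.queryId = queryId; }
}

// Each page request is self-describing: query id, bound value, paging state.
record PageRequest(String queryId, int partition, int pagingState, int fetchSize) {}

record Page(List<String> rows, int nextPagingState) {}

// Stateless "coordinator": any node could serve any page, because the
// request carries everything needed to resume the query.
final class FakeServer {
    private final Map<Integer, List<String>> rowsByPartition;

    FakeServer(Map<Integer, List<String>> rowsByPartition) {
        this.rowsByPartition = rowsByPartition;
    }

    Page execute(PageRequest req) {
        List<String> all = rowsByPartition.getOrDefault(req.partition(), List.of());
        int from = Math.min(all.size(), req.pagingState());
        int to = Math.min(all.size(), from + req.fetchSize());
        return new Page(new ArrayList<>(all.subList(from, to)), to);
    }
}

// A result set that rebuilds each page request from the *shared* statement,
// so it sees whatever was bound most recently, not what it was executed with.
final class NaivePager {
    private final MutableStatement stmt;
    private final FakeServer server;
    private final int fetchSize;
    private int pagingState = 0;

    NaivePager(MutableStatement stmt, FakeServer server, int fetchSize) {
        this.stmt = stmt;
        this.server = server;
        this.fetchSize = fetchSize;
    }

    List<String> nextPage() {
        PageRequest req = new PageRequest(stmt.queryId, stmt.partition, pagingState, fetchSize);
        Page page = server.execute(req);
        pagingState = page.nextPagingState();
        return page.rows();
    }
}

final class SharedStatementDemo {
    public static void main(String[] args) {
        FakeServer server = new FakeServer(Map.of(
            4, List.of("p4-r0", "p4-r1", "p4-r2", "p4-r3"),
            5, List.of("p5-r0", "p5-r1", "p5-r2", "p5-r3")));
        MutableStatement shared = new MutableStatement("select-by-partition");

        shared.partition = 4;
        NaivePager q4 = new NaivePager(shared, server, 2);
        System.out.println(q4.nextPage()); // [p4-r0, p4-r1]  (first page is correct)

        shared.partition = 5;              // same object rebound for the next query
        NaivePager q5 = new NaivePager(shared, server, 2);
        System.out.println(q5.nextPage()); // [p5-r0, p5-r1]

        System.out.println(q4.nextPage()); // [p5-r2, p5-r3]  (partition 5 rows in query 4)
        System.out.println(q5.nextPage()); // [p5-r2, p5-r3]  (and duplicated across queries)
    }
}
```

Under those assumptions, the first pages come back correct and the corruption only appears once a query advances to its next page, with rows from one partition showing up in another query and duplicated across queries, which matches the symptoms in the report. With the real driver, binding a separate `BoundStatement` per concurrent query (each `PreparedStatement.bind(...)` call creates a new one) keeps the queries independent.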

> Concurrent instances of same Prepared Statement seeing intermingled result 
> sets
> ---
>
> Key: CASSANDRA-7099
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7099
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Cassandra 2.0.7 with single node cluster
> Windows dual-core laptop
> DataStax Java driver 2.0.1
>Reporter: Bill Mitchell
>
> I have a schema in which a wide row is partitioned into smaller rows.  (See 
> CASSANDRA-6826, CASSANDRA-6825 for more detail on this schema.)  In this 
> case, I randomly assigned the rows across the partitions based on the first 
> four hex digits of a hash value modulo the number of partitions.  
> Occasionally I need to retrieve the rows in order of insertion irrespective 
> of the partitioning.  Cassandra, of course, does not support this when paging 
> by fetch size is enabled, so I am issuing a query against each of the 
> partitions to obtain their rows in order, and merging the results:
> SELECT l, partition, cd, rd, ec, ea FROM sr WHERE s = ? AND l = ? AND partition = ? 
> ORDER BY cd ASC, ec ASC ALLOW FILTERING;
> These parallel queries are all instances of a single PreparedStatement.  
> What I saw was identical values from multiple queries, which by construction 
> should never happen, and after further investigation, discovered that rows 
> from partition 5 are being returned in the result set for the query against 
> another partition, e.g., 1.  This was so unbelievable that I added diagnostic 
> code in my test case to detect this:
> After reading 167 rows, returned partition 5 does not match query partition 4
> The merge logic works fine and delivers correct results when I use LIMIT to 
> avoid fetch size paging.  Even if there were a bug there, it is hard to see 
> how any client error explains ResultSet.one() returning a row whose values 
> don't match the constraints in that ResultSet's query.
> I'm not sure of the exact significance of 167, as I have configured the 
> queryFetchSize for the cluster to 1000, and in this merge logic I divide that 
> by the number of partitions, 7, so the fetchSize for each of these parallel 
> queries was set to 142.  I suspect this is being treated as a minimum 
> fetchSize, and the driver or server is rounding this up to fill a 
> transmission block.  When I prime the pump, issuing the query against each of 
> the partitions, the initial contents of the result sets are correct.  The 
> failure appears after we advance two of these queries to the next page.
> Although I had been experimenting with fetchMoreResults() for prefetching, I 
> disabled that to isolate this problem, so that is not a factor.   
> I have not yet tried preparing separate instances of the query, as I already 
> have common logic to cache and reuse already prepared statements.
> I have not proven that it is a server bug and not a Java driver bug, but at 
> first glance it was not obvious how the Java driver might associate the 
> responses with the wrong requests.  Were that happening, one would expect to 
> see the right overall collection of rows, just delivered to the wrong 
> queries, and not duplicates, which is what I saw.
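The merge approach in the report, one ordered query per partition merged on the client, is a standard k-way merge. The sketch below is a hypothetical reconstruction, not the reporter's code: `Row` carries only the partition and the two ORDER BY columns, each input list stands for one partition's result set, and the merge raises the same kind of diagnostic the report describes when a row's partition does not match the partition it was queried for. It also spells out the fetch-size arithmetic (1000 / 7 with integer division is 142).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical row: just the partition column and the two ORDER BY columns.
record Row(int partition, long cd, long ec) {}

final class PartitionMerge {
    // Matches "ORDER BY cd ASC, ec ASC", which each per-partition query already returns.
    static final Comparator<Row> ORDER =
        Comparator.comparingLong(Row::cd).thenComparingLong(Row::ec);

    // Splitting a total fetch budget across partitions: 1000 / 7 == 142 (integer division).
    static int perPartitionFetchSize(int totalFetchSize, int partitions) {
        return totalFetchSize / partitions;
    }

    // Current head of one partition's stream plus the partition it was queried for.
    private record Cursor(int queriedPartition, Row head, Iterator<Row> rest) {}

    // K-way merge: streams.get(p) is the (already ordered) result of the query for partition p.
    static List<Row> merge(List<? extends Iterable<Row>> streams) {
        Comparator<Cursor> byHead = Comparator.comparing(Cursor::head, ORDER);
        PriorityQueue<Cursor> heap = new PriorityQueue<>(byHead);
        for (int p = 0; p < streams.size(); p++) {
            Iterator<Row> it = streams.get(p).iterator();
            if (it.hasNext()) {
                heap.add(new Cursor(p, it.next(), it));
            }
        }
        List<Row> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            // Sanity check in the spirit of the report: a row must come from the queried partition.
            if (c.head().partition() != c.queriedPartition()) {
                throw new IllegalStateException("After reading " + merged.size()
                    + " rows, returned partition " + c.head().partition()
                    + " does not match query partition " + c.queriedPartition());
            }
            merged.add(c.head());
            if (c.rest().hasNext()) {
                heap.add(new Cursor(c.queriedPartition(), c.rest().next(), c.rest()));
            }
        }
        return merged;
    }
}
```

A heap keeps each step at O(log k) for k partitions. Because every per-partition query already returns rows in (cd, ec) order, the merged output is globally ordered, provided each stream really contains only its own partition's rows.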



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7099) Concurrent instances of same Prepared Statement seeing intermingled result sets

2014-04-30 Thread Bill Mitchell (JIRA)

[https://issues.apache.org/jira/browse/CASSANDRA-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985608#comment-13985608]

Bill Mitchell commented on CASSANDRA-7099:
--

My thought was that, if the Java driver were more clever, it might be possible 
to use the ResultSet to determine the correlation id when paging in more 
results, instead of the Statement.  But there may be reasons why it wants to 
assume the Statement parameters have not changed, e.g., to avoid having to copy 
the bound parameters if it needs these to generate the later paged requests.  
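The alternative Bill describes, taking the bound values from the result set rather than from the statement, amounts to snapshotting them once at execution time. The sketch below illustrates that idea with hypothetical names (`ReusableStatement`, `BoundSnapshot`, `SnapshotPager`) and an in-memory map standing in for the server; it is not how the driver is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical mutable statement that application code may rebind at any time.
final class ReusableStatement {
    int partition;
}

// Immutable copy of what was bound when the query was first executed.
record BoundSnapshot(int partition) {}

// A pager that copies the bound values once, at execute time, and uses that copy
// (plus its own paging position) for every later page.
final class SnapshotPager {
    private final BoundSnapshot bound;
    private final Map<Integer, List<String>> table; // stand-in for the server
    private final int fetchSize;
    private int position = 0;

    SnapshotPager(ReusableStatement stmt, Map<Integer, List<String>> table, int fetchSize) {
        this.bound = new BoundSnapshot(stmt.partition); // the copy of the bound parameters
        this.table = table;
        this.fetchSize = fetchSize;
    }

    List<String> nextPage() {
        List<String> rows = table.getOrDefault(bound.partition(), List.of());
        int to = Math.min(rows.size(), position + fetchSize);
        List<String> page = new ArrayList<>(rows.subList(position, to));
        position = to;
        return page;
    }
}
```

The trade-off is the one Bill anticipates: each pager has to hold its own copy of the bound values for as long as it may fetch more pages, in exchange for being immune to later rebinding of the shared statement.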



[jira] [Commented] (CASSANDRA-7099) Concurrent instances of same Prepared Statement seeing intermingled result sets

2014-04-30 Thread Jack Krupansky (JIRA)

[https://issues.apache.org/jira/browse/CASSANDRA-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985529#comment-13985529]

Jack Krupansky commented on CASSANDRA-7099:
---

It may have been your mistake, but could C* or the driver have detected the 
difficulty and reported an error?
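One way such misuse could in principle be detected: if a statement counted how many times it had been rebound, a result set could remember that count at execution time and refuse to fetch further pages after a rebind, turning silent wrong results into an error. The sketch below only illustrates that idea; `VersionedStatement` and `CheckedPager` are hypothetical, and nothing in this thread says the driver does this.

```java
// Hypothetical statement that counts how many times it has been rebound.
final class VersionedStatement {
    private int partition;
    private int version = 0;

    void bind(int partition) {
        this.partition = partition;
        version++;
    }

    int partition() { return partition; }
    int version() { return version; }
}

// A pager that fails fast once its statement has been rebound after execution.
final class CheckedPager {
    private final VersionedStatement stmt;
    private final int versionAtExecute;
    private int pagesFetched = 0;

    CheckedPager(VersionedStatement stmt) {
        this.stmt = stmt;
        this.versionAtExecute = stmt.version();
    }

    // Returns the partition the next page would be requested for (the "request" in this model).
    int nextPagePartition() {
        if (stmt.version() != versionAtExecute) {
            throw new IllegalStateException(
                "statement was rebound after execution; refusing to fetch page " + (pagesFetched + 1));
        }
        pagesFetched++;
        return stmt.partition();
    }
}
```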



[jira] [Commented] (CASSANDRA-7099) Concurrent instances of same Prepared Statement seeing intermingled result sets

2014-04-28 Thread Bill Mitchell (JIRA)

[https://issues.apache.org/jira/browse/CASSANDRA-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983044#comment-13983044]

Bill Mitchell commented on CASSANDRA-7099:
--

I should clarify, as the title is misleading, that I was seeing more than 
intermingled results.  Intermingled suggests that the results from query 2 came 
back to query 1 and vice versa.  What I saw was the same results being returned 
to two different queries, something that might happen if, say, there were a 
query results buffer keyed on the PreparedStatement id without regard to the 
bound parameters, so that the second query thought the results were already 
calculated and picked up the results of the first.
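The distinction drawn here can be shown with a purely hypothetical result cache (nothing in this thread establishes that Cassandra has one): if the cache key is only the prepared statement's id, a second query with different bound values receives the first query's rows, so the same results reach two queries rather than being swapped between them. Including the bound values in the key avoids that. `ResultCache` and its `includeBoundValues` flag are illustrative names only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical result cache, used only to illustrate the failure mode described above.
final class ResultCache {
    // Correct key: statement id together with the bound value(s).
    record Key(String statementId, int partition) {}

    private final Map<Object, List<String>> cache = new HashMap<>();
    private final boolean includeBoundValues;

    ResultCache(boolean includeBoundValues) {
        this.includeBoundValues = includeBoundValues;
    }

    List<String> get(String statementId, int partition, IntFunction<List<String>> compute) {
        Object key = includeBoundValues
            ? new Key(statementId, partition)
            : statementId; // flawed: ignores the bound parameters entirely
        return cache.computeIfAbsent(key, k -> compute.apply(partition));
    }
}
```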
