[jira] [Comment Edited] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2017-10-11 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200354#comment-16200354
 ] 

Alex Petrov edited comment on CASSANDRA-10786 at 10/11/17 2:28 PM:
---

Committed to trunk with 
[922dbdb658b1693973926026b213153d05b4077c|https://github.com/apache/cassandra/commit/922dbdb658b1693973926026b213153d05b4077c]

Big thanks to everyone for help! 
Follow-up blocker issue created: [CASSANDRA-13951]

UPDATE: Don't worry about the dtest failures; the Python patch is about to be 
committed. It was synchronised off Jira.


was (Author: ifesdjeen):
Committed to trunk with 
[922dbdb658b1693973926026b213153d05b4077c|https://github.com/apache/cassandra/commit/922dbdb658b1693973926026b213153d05b4077c]

Big thanks to everyone for help! 
Follow-up blocker issue created: [CASSANDRA-13951]

> Include hash of result set metadata in prepared statement id
> 
>
> Key: CASSANDRA-10786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10786
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: CQL
>Reporter: Olivier Michallat
>Assignee: Alex Petrov
>Priority: Minor
>  Labels: client-impacting, doc-impacting, protocolv5
> Fix For: 4.x
>
>
> *_Initial description:_*
> This is a follow-up to CASSANDRA-7910, which was about invalidating a 
> prepared statement when the table is altered, to force clients to update 
> their local copy of the metadata.
> There's still an issue if multiple clients are connected to the same host. 
> The first client to execute the query after the cache was invalidated will 
> receive an UNPREPARED response, re-prepare, and update its local metadata. 
> But other clients might miss it entirely (the MD5 hasn't changed), and they 
> will keep using their old metadata. For example:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, 
> clientA and clientB both have a cache of the metadata (columns b and c) 
> locally
> # column a gets added to the table, C* invalidates its cache entry
> # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, 
> re-prepares on the fly and updates its local metadata to (a, b, c)
> # prepared statement is now in C*’s cache again, with the same md5 abc123
> # clientB sends an EXECUTE request for id abc123. Because the cache has been 
> populated again, the query succeeds. But clientB still has not updated its 
> metadata, it’s still (b,c)
> One solution that was suggested is to include a hash of the result set 
> metadata in the md5. This way the md5 would change at step 3, and any client 
> using the old md5 would get an UNPREPARED, regardless of whether another 
> client already reprepared.
> -
> *_Resolution (2017/02/13):_*
> The following changes were made to native protocol v5:
> - the PREPARED response includes {{result_metadata_id}}, a hash of the result 
> set metadata.
> - every EXECUTE message must provide {{result_metadata_id}} in addition to 
> the prepared statement id. If it doesn't match the current one on the server, 
> it means the client is operating on a stale schema.
> - to notify the client, the server returns a ROWS response with a new 
> {{Metadata_changed}} flag, the new {{result_metadata_id}} and the updated 
> result metadata (this overrides the {{No_metadata}} flag, even if the client 
> had requested it)
> - the client updates its copy of the result metadata before it decodes the 
> results.
> So the scenario above would now look like:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, and 
> result set (b, c) that hashes to cde456
> # column a gets added to the table, C* does not invalidate its cache entry, 
> but only updates the result set to (a, b, c) which hashes to fff789
> # client sends an EXECUTE request for (statementId=abc123, resultId=cde456) 
> and skip_metadata flag
> # cde456!=fff789, so C* responds with ROWS(..., no_metadata=false, 
> metadata_changed=true, new_metadata_id=fff789,col specs for (a,b,c))
> # client updates its column specifications, and will send the next execute 
> queries with (statementId=abc123, resultId=fff789)
> This works the same with multiple clients.
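
The v5 flow from the resolution above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not the Cassandra or driver implementation: the {{Server}} and {{Client}} classes, the {{metadata_id}} helper and the dictionary-shaped responses are all hypothetical names invented for this sketch, and MD5 over the column names merely stands in for whatever hash the server actually computes.

```python
import hashlib
from dataclasses import dataclass, field


def metadata_id(columns):
    """Hypothetical stand-in for the server's hash of the result set metadata."""
    return hashlib.md5(",".join(columns).encode()).hexdigest()


@dataclass
class Server:
    columns: list                 # current result columns of the prepared SELECT *
    statement_id: str = "abc123"  # prepared statement id; not changed by ALTER

    def prepare(self):
        # PREPARED response: statement id, result_metadata_id, column specs.
        return self.statement_id, metadata_id(self.columns), list(self.columns)

    def alter_add_column(self, name):
        # Schema change: the cache entry stays, only the result metadata changes.
        self.columns.insert(0, name)

    def execute(self, statement_id, result_metadata_id):
        assert statement_id == self.statement_id
        current = metadata_id(self.columns)
        row = {c: f"value_of_{c}" for c in self.columns}
        if result_metadata_id != current:
            # Stale client: set metadata_changed and send the new id and column
            # specs, even though the client asked to skip metadata.
            return {"metadata_changed": True, "new_metadata_id": current,
                    "columns": list(self.columns), "row": row}
        return {"metadata_changed": False, "row": row}


@dataclass
class Client:
    statement_id: str = ""
    result_id: str = ""
    columns: list = field(default_factory=list)

    def prepare(self, server):
        self.statement_id, self.result_id, self.columns = server.prepare()

    def execute(self, server):
        response = server.execute(self.statement_id, self.result_id)
        if response["metadata_changed"]:
            # Update the local metadata before decoding the results.
            self.result_id = response["new_metadata_id"]
            self.columns = response["columns"]
        return [response["row"][c] for c in self.columns]


server = Server(columns=["b", "c"])
client_a, client_b = Client(), Client()
client_a.prepare(server)
client_b.prepare(server)
server.alter_add_column("a")       # result metadata is now (a, b, c)
rows_a = client_a.execute(server)  # client_a is notified and updates
rows_b = client_b.execute(server)  # client_b is notified independently
```

Because the staleness check compares the id each client sends against the server's current one, clientB is corrected on its own next EXECUTE regardless of what clientA did earlier.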



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2017-08-07 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117297#comment-16117297
 ] 

Jeff Jirsa edited comment on CASSANDRA-10786 at 8/7/17 9:56 PM:


We shouldn't be committing anything into the database that relies on a 
{{-SNAPSHOT}} of a third party library.

-I'm hard (binding)- -1- -on any patch that does so. It's one thing to rely on 
third party drivers, it's quite a different thing to rely on a SNAPSHOT build.-

Please open a "blocker" JIRA with fixver=4.0 to make sure this gets 
reverted before release. 



was (Author: jjirsa):
We shouldn't be committing anything into the database that relies on a 
{{-SNAPSHOT}} of a third party library.

I'm hard (binding) -1 on any patch that does so. It's one thing to rely on 
third party drivers, it's quite a different thing to rely on a SNAPSHOT build. 





[jira] [Comment Edited] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2017-08-07 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117297#comment-16117297
 ] 

Jeff Jirsa edited comment on CASSANDRA-10786 at 8/7/17 9:10 PM:


We shouldn't be committing anything into the database that relies on a 
{{-SNAPSHOT}} of a third party library.

I'm hard (binding) -1 on any patch that does so. It's one thing to rely on 
third party drivers, it's quite a different thing to rely on a SNAPSHOT build. 



was (Author: jjirsa):
We shouldn't be committing anything into the database that relies on a 
{{-SNAPSHOT}} of a third party library.





[jira] [Comment Edited] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2017-08-07 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117190#comment-16117190
 ] 

Alex Petrov edited comment on CASSANDRA-10786 at 8/7/17 8:18 PM:
-

Thank you a lot for the review! 

I completely agree that we should not have {{SNAPSHOT}} in tree, but 
it's kind of a chicken-and-egg problem, since the driver relies on Cassandra for 
tests and vice versa. I'll talk to the {{java-driver}} maintainers to make sure 
we find a proper solution. The last time we talked, we agreed that it makes 
sense to have a more or less final version of the Cassandra patch first. 
cc [~adutra] [~omichallat] (could you comment on the best way to 
proceed? The current patch is based off the 3.4 driver, iirc).

We might also want to wait until the {{python}} driver implements this feature, 
since otherwise {{dtests}} that rely on {{v5}} can't pass. 
cc [~andrew.tolbert] (could you comment as well: is it somewhere on the radar 
for you? It might make sense to make sure the python driver has it before we merge)




was (Author: ifesdjeen):
I completely agree that we should not have {{SNAPSHOT}} in tree, but 
it's kind of a chicken-and-egg problem, since the driver relies on Cassandra for 
tests and vice versa. I'll talk to the {{java-driver}} maintainers to make sure 
we find a proper solution. The last time we talked, we agreed that it makes 
sense to have a more or less final version of the Cassandra patch first. 
cc [~adutra] [~omichallat] (could you comment on the best way to 
proceed? The current patch is based off the 3.4 driver, iirc).

We might also want to wait until the {{python}} driver implements this feature, 
since otherwise {{dtests}} that rely on {{v5}} can't pass. 
cc [~andrew.tolbert] (could you comment as well: is it somewhere on the radar 
for you? It might make sense to make sure the python driver has it before we merge)




[jira] [Comment Edited] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2017-07-25 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099604#comment-16099604
 ] 

Alex Petrov edited comment on CASSANDRA-10786 at 7/25/17 7:09 AM:
--

Thank you for the review! Sorry it took me so long to [address your 
comments|https://github.com/ifesdjeen/cassandra/commit/7c57d45ccb559e8c1c6291a6c978402921b1a609],
 but it did require a couple of takes as I'm not very happy with the way things 
are coupled right now. One of the complications is that {{SelectStatement}} 
returns {{ResultMetadata}} through {{Selection}}, which in my opinion quite 
unfortunately couples CQL internals (query processing) with something that CQL 
returns (result set). I have left the refactoring out of this issue and will 
submit it later in a separate patch as a proposal. 

I did address the issue with preparing {{UPDATE}} statements with LWTs, 
{{BATCH}} and {{LIST}} queries. I have also added 
[dtest|https://github.com/riptano/cassandra-dtest/compare/master...ifesdjeen:10786-dtest]
 for those. Sidenote: until the python driver backing our dtest suite lands, 
some of the dtests with v5 had to be disabled since they interpret {{SELECT}} 
statements incorrectly due to the new flag and metadata id.


was (Author: ifesdjeen):
Thank you for the review! Sorry it took me so long to [fix your 
comments|https://github.com/ifesdjeen/cassandra/commit/7c57d45ccb559e8c1c6291a6c978402921b1a609],
 but it did require a couple of takes as I'm not very happy with the way things 
are coupled right now. One of the complications is that {{SelectStatement}} 
returns {{ResultMetadata}} through {{Selection}}, which in my opinion quite 
unfortunately couples CQL internals (query processing) with something that CQL 
returns (result set). I have left the refactoring out of this issue and will 
submit it later in a separate patch as a proposal. 

I did address the issue with preparing {{UPDATE}} statements with LWTs, 
{{BATCH}} and {{LIST}} queries. I have also added 
[dtest|https://github.com/riptano/cassandra-dtest/compare/master...ifesdjeen:10786-dtest]
 for those. Sidenote: until the python driver backing our dtest suite lands, 
some of the dtests with v5 had to be disabled since they interpret {{SELECT}} 
statements incorrectly due to the new flag and metadata id.


[jira] [Comment Edited] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2017-07-25 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099604#comment-16099604
 ] 

Alex Petrov edited comment on CASSANDRA-10786 at 7/25/17 7:09 AM:
--

Thank you for the review! Sorry it took me so long to [fix your 
comments|https://github.com/ifesdjeen/cassandra/commit/7c57d45ccb559e8c1c6291a6c978402921b1a609],
 but it did require a couple of takes as I'm not very happy with the way things 
are coupled right now. One of the complications is that {{SelectStatement}} 
returns {{ResultMetadata}} through {{Selection}}, which in my opinion quite 
unfortunately couples CQL internals (query processing) with something that CQL 
returns (result set). I have left the refactoring out of this issue and will 
submit it later in a separate patch as a proposal. 

I did address the issue with preparing {{UPDATE}} statements with LWTs, 
{{BATCH}} and {{LIST}} queries. I have also added 
[dtest|https://github.com/riptano/cassandra-dtest/compare/master...ifesdjeen:10786-dtest]
 for those. Sidenote: until the python driver backing our dtest suite lands, 
some of the dtests with v5 had to be disabled since they interpret {{SELECT}} 
statements incorrectly due to the new flag and metadata id.


was (Author: ifesdjeen):
Thank you for the review! Sorry it took me so long to fix your comments, but it 
did require a couple of takes as I'm not very happy with the way things are 
coupled right now. One of the complications is that {{SelectStatement}} returns 
{{ResultMetadata}} through {{Selection}}, which in my opinion quite 
unfortunately couples CQL internals (query processing) with something that CQL 
returns (result set). I have left the refactoring out of this issue and will 
submit it later in a separate patch as a proposal. 

I did address the issue with preparing {{UPDATE}} statements with LWTs, 
{{BATCH}} and {{LIST}} queries. I have also added 
[dtest|https://github.com/riptano/cassandra-dtest/compare/master...ifesdjeen:10786-dtest]
 for those. Sidenote: until the python driver backing our dtest suite lands, 
some of the dtests with v5 had to be disabled since they interpret {{SELECT}} 
statements incorrectly due to the new flag and metadata id.


[jira] [Comment Edited] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2017-02-14 Thread Olivier Michallat (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866365#comment-15866365
 ] 

Olivier Michallat edited comment on CASSANDRA-10786 at 2/14/17 6:42 PM:


There are a couple of minor issues in {{native_protocol_v5.spec}}. In the ROWS 
response metadata:
* if both paging state and new metadata id are present, the paging state comes 
first, not second
* the metadata_changed flag is 0x0008, not 0x0005
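
The two corrections above can be illustrated with a minimal sketch of parsing the start of the ROWS result metadata. It is not the driver's decoder. The function name {{parse_rows_metadata_header}} is hypothetical; the {{0x0008}} value and the paging-state-first ordering come from this comment, while the other flag values and the length encodings (a 4-byte signed length for the paging state, a 2-byte length for the metadata id) are assumptions taken from the published native protocol spec and should be checked against {{native_protocol_v5.spec}}.

```python
import struct

# 0x0008 is the value given in the comment; the other flag values are assumed
# from the published native protocol spec.
GLOBAL_TABLES_SPEC = 0x0001
HAS_MORE_PAGES = 0x0002
NO_METADATA = 0x0004
METADATA_CHANGED = 0x0008


def parse_rows_metadata_header(buf):
    """Parse <flags><columns_count>[<paging_state>][<new_metadata_id>]."""
    flags, columns_count = struct.unpack_from(">ii", buf, 0)
    offset = 8
    paging_state = None
    new_metadata_id = None
    if flags & HAS_MORE_PAGES:
        # Paging state comes first: 4-byte signed length, then the payload
        # (assumed encoding; a negative length is treated as "no value").
        (length,) = struct.unpack_from(">i", buf, offset)
        offset += 4
        if length >= 0:
            paging_state = buf[offset:offset + length]
            offset += length
    if flags & METADATA_CHANGED:
        # New metadata id comes second: 2-byte length, then the payload
        # (assumed encoding).
        (length,) = struct.unpack_from(">H", buf, offset)
        offset += 2
        new_metadata_id = buf[offset:offset + length]
        offset += length
    # offset now points at the column specifications that follow.
    return flags, columns_count, paging_state, new_metadata_id, offset


# Example frame fragment with both optional fields present.
example = (
    struct.pack(">ii", HAS_MORE_PAGES | METADATA_CHANGED, 3)
    + struct.pack(">i", 2) + b"\x01\x02"      # paging state
    + struct.pack(">H", 3) + b"\xff\xf7\x89"  # new metadata id
)
parsed = parse_rows_metadata_header(example)
```

Reading the two optional fields in the opposite order would misinterpret the paging state length as a metadata id length, which is why the ordering in the spec matters to driver authors.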


was (Author: omichallat):
There are a couple of minor issues in {{native_protocol_v5.spec}}:
* the paging state is before the new metadata id, not after
* the metadata_changed flag is 0x0008, not 0x0005

> Include hash of result set metadata in prepared statement id
> 
>
> Key: CASSANDRA-10786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10786
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: CQL
>Reporter: Olivier Michallat
>Assignee: Alex Petrov
>Priority: Minor
>  Labels: client-impacting, doc-impacting, protocolv5
> Fix For: 3.11.x
>
>
> *_Initial description:_*
> This is a follow-up to CASSANDRA-7910, which was about invalidating a 
> prepared statement when the table is altered, to force clients to update 
> their local copy of the metadata.
> There's still an issue if multiple clients are connected to the same host. 
> The first client to execute the query after the cache was invalidated will 
> receive an UNPREPARED response, re-prepare, and update its local metadata. 
> But other clients might miss it entirely (the MD5 hasn't changed), and they 
> will keep using their old metadata. For example:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, 
> clientA and clientB both have a cache of the metadata (columns b and c) 
> locally
> # column a gets added to the table, C* invalidates its cache entry
> # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, 
> re-prepares on the fly and updates its local metadata to (a, b, c)
> # prepared statement is now in C*’s cache again, with the same md5 abc123
> # clientB sends an EXECUTE request for id abc123. Because the cache has been 
> populated again, the query succeeds. But clientB still has not updated its 
> metadata, it’s still (b,c)
> One solution that was suggested is to include a hash of the result set 
> metadata in the md5. This way the md5 would change at step 3, and any client 
> using the old md5 would get an UNPREPARED, regardless of whether another 
> client already reprepared.
> -
> *_Resolution (2017/02/13):_*
> The following changes were made to native protocol v5:
> - the PREPARED response includes {{result_metadata_id}}, a hash of the result 
> set metadata.
> - every EXECUTE message must provide {{result_metadata_id}} in addition to 
> the prepared statement id. If it doesn't match the current one on the server, 
> it means the client is operating on a stale schema.
> - to notify the client, the server returns a ROWS response with a new 
> {{Metadata_changed}} flag, the new {{result_metadata_id}} and the updated 
> result metadata (this overrides the {{No_metadata}} flag, even if the client 
> had requested it)
> - the client updates its copy of the result metadata before it decodes the 
> results.
> So the scenario above would now look like:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, and 
> result set (b, c) that hashes to cde456
> # column a gets added to the table, C* does not invalidate its cache entry, 
> but only updates the result set to (a, b, c) which hashes to fff789
> # client sends an EXECUTE request for (statementId=abc123, resultId=cde456) 
> and skip_metadata flag
> # cde456!=fff789, so C* responds with ROWS(..., no_metadata=false, 
> metadata_changed=true, new_metadata_id=fff789,col specs for (a,b,c))
> # client updates its column specifications, and will send the next execute 
> queries with (statementId=abc123, resultId=fff789)
> This works the same with multiple clients.
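
The resolved flow can be sketched as a toy simulation. This is only an illustration under stated assumptions: the {{Server}} and {{Client}} classes, the {{metadata_id}} helper, and hashing just the column names with MD5 are all hypothetical and not taken from the committed patch or from any driver.

```python
import hashlib


def metadata_id(columns):
    """Hypothetical stand-in for result_metadata_id: MD5 over the column names."""
    return hashlib.md5(",".join(columns).encode("utf-8")).hexdigest()


class Server:
    """Toy coordinator holding a single prepared statement with a fixed id."""

    def __init__(self, columns):
        self.columns = list(columns)

    def prepare(self):
        # PREPARED response: statement id, result_metadata_id, column specs.
        return "abc123", metadata_id(self.columns), list(self.columns)

    def add_column(self, name):
        # Schema change: the cache entry stays, only the result set changes.
        self.columns.insert(0, name)

    def execute(self, statement_id, result_metadata_id):
        current = metadata_id(self.columns)
        rows = [tuple(range(len(self.columns)))]
        if result_metadata_id != current:
            # Stale client: set Metadata_changed and resend the metadata.
            return {"metadata_changed": True, "new_metadata_id": current,
                    "columns": list(self.columns), "rows": rows}
        return {"metadata_changed": False, "rows": rows}


class Client:
    """Toy driver that caches the statement id, metadata id and column specs."""

    def __init__(self, server):
        self.server = server
        self.statement_id, self.result_id, self.columns = server.prepare()

    def execute(self):
        response = self.server.execute(self.statement_id, self.result_id)
        if response["metadata_changed"]:
            # Update the local copy before decoding the rows.
            self.result_id = response["new_metadata_id"]
            self.columns = response["columns"]
        return [dict(zip(self.columns, row)) for row in response["rows"]]
```

In this sketch each client is corrected on its own next EXECUTE, independently of what other clients have done, and the statement id never changes.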



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2016-06-17 Thread Andy Tolbert (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337240#comment-15337240
 ] 

Andy Tolbert edited comment on CASSANDRA-10786 at 6/17/16 11:54 PM:


{quote}
What's our normal process for replacing the bundled Java driver when there's a 
protocol version change? Andy Tolbert any preferences on how to handle the 
review of the driver changes?
{quote}

Good question. I recall that with protocol v4 and with C* 3.0 there was a 
period of time when a pre-release driver jar was put in lib, which I think is 
reasonable. Maybe there is a better way to do this; I will discuss it with the 
team on Monday.  There was [a 
PR|https://github.com/datastax/java-driver/pull/675] for reviewing this, but we 
will need to update it with the commit from [~ifesdjeen]'s 
[branch|https://github.com/ifesdjeen/java-driver/tree/10786-v5].





[jira] [Comment Edited] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2016-06-03 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314138#comment-15314138
 ] 

Alex Petrov edited comment on CASSANDRA-10786 at 6/3/16 1:56 PM:
-

This feature might actually be much more useful than it initially appeared to 
me, although that only occurred to me during the implementation. For example, 
when the table gets altered and the statements get re-prepared under the hood, 
unaffected queries will not need to go through the long re-prepare path: 
clients will be able to submit their queries, and if the metadata doesn't 
indicate any change, they'll just get the results.

I'm mostly done with the patch for both server and client, will think about 
possible corner cases I didn't consider and submit it on Monday if all goes 
right.

I just have a couple of questions on the {{v4}} vs {{v5}} protocol change. 
The Cassandra server would work with the older {{v4}} version as well as with 
{{v5}}. We're just using a different invalidation strategy and have very small 
changes on the protocol side (new flag and re-sent metadata). The flag would 
still be sent (of course, changing that is trivial); I'm just wondering if we 
usually do so, since flags do not have any overlap there. Same question on the 
driver side: do we support "all" versions simultaneously? Is there any test 
matrix for that?
 
One more thing: in the current implementation I have omitted the prepared 
statement id in the {{Rows}} response. I've checked both the Java and Python 
drivers, and in both cases these IDs are directly available there, since the 
metadata might get skipped and the driver needs a way to query it, so it's more 
or less always around. I think we can safely skip it there.




[jira] [Comment Edited] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2016-05-19 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291106#comment-15291106
 ] 

Robert Stupp edited comment on CASSANDRA-10786 at 5/19/16 1:48 PM:
---

Oh, right. We invalidate a pstmt when one of its dependencies changes - so my 
thinking was too complicated.

Another possible way to solve the opt-in/long-hash problem would be to just add 
another identifier, which is the hash over the result set metadata. So, the 
current ID would stay as it is and we add a _fingerprint_ to _Prepared_ 
response and _Execute_ request.

For native_protocol_v5.spec, section _4.2.5.4. Prepared_ would contain:
{code}
-  is [short bytes] representing the prepared query ID.
-  is [short bytes] representing the metadata hash.
-  is composed of:
{code}
And the body for _4.1.6 Execute_ would be 
{{}}.

To handle the situation when that result-set-metadata-fingerprint does not 
match, there are two options IMO.
# The coordinator could reply with a new error code (near to 0x2500, 
Unprepared) telling the client that the result set metadata no longer matches 
and the statement needs to be prepared again.
# We just send out the result set metadata with the _Rows_ response in case the 
metadata has changed / does not match the fingerprint.

The second option would also work around a race condition that could arise with 
a new error code during schema changes: some nodes may already use the new 
result set metadata while others still use the old one. It would also save one 
roundtrip. It probably makes the code on the client a bit more complex, but I 
think it's worth paying that price in order to prevent this race condition 
(and _prepare storm_).
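
A minimal sketch of the separate identifier, assuming the fingerprint is an MD5 over the column specifications of the result set. The function names, the (keyspace, table, column, type) tuple layout, and the way the existing statement id is derived are assumptions made for illustration; they are not taken from the spec or from a patch.

```python
import hashlib


def statement_id(keyspace, query):
    # Existing identifier (assumed here to depend only on keyspace and query
    # text), so it does not change when the table is altered.
    return hashlib.md5((keyspace + query).encode("utf-8")).hexdigest()


def fingerprint(column_specs):
    """Hash over the result set metadata.

    column_specs is a list of (keyspace, table, column, type) string tuples.
    """
    digest = hashlib.md5()
    for spec in column_specs:
        for field in spec:
            encoded = field.encode("utf-8")
            # Length-prefix every field so that ("ab", "c") and ("a", "bc")
            # cannot collide by simple concatenation.
            digest.update(len(encoded).to_bytes(4, "big"))
            digest.update(encoded)
    return digest.hexdigest()


before = [("ks", "t", "b", "int"), ("ks", "t", "c", "text")]
after = [("ks", "t", "a", "int")] + before

# The statement id is stable across the schema change; the fingerprint is not.
same_id = statement_id("ks", "SELECT * FROM t")
changed = fingerprint(before) != fingerprint(after)
```

With the second option above, the coordinator would compare the fingerprint sent with _Execute_ against its current one and attach the new metadata to the _Rows_ response only when they differ.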




[jira] [Comment Edited] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2016-05-17 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286345#comment-15286345
 ] 

Alex Petrov edited comment on CASSANDRA-10786 at 5/17/16 10:37 AM:
---

I have a patch to fix the described behaviour. In order to verify whether or 
not the patch works, I also had to come up with a patch for the client that 
would work.

After adding the result metadata to the md5, the client would fall into an 
infinite re-prepare loop, because the metadata on the client is never updated 
and the new ID of the re-prepared statement is not saved anywhere: 
  * (1) try running the query
  * fail with {{UNPREPARED}}
  * re-prepare (although without saving the new ID)
  * go to (1)
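
The loop above can be reproduced with a toy model. This is a sketch under assumptions: the {{Server}} class, the {{run}} helper and the way the id is derived from the query text plus column names are hypothetical, and the attempt cap exists only so the example terminates.

```python
import hashlib


def statement_id(query, columns):
    # Proposed id: hash over the query text and the result set metadata.
    payload = query + "|" + ",".join(columns)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


class Unprepared(Exception):
    """Raised when the server does not know the statement id it was sent."""


class Server:
    def __init__(self, query, columns):
        self.query = query
        self.columns = list(columns)

    def prepare(self):
        return statement_id(self.query, self.columns)

    def execute(self, stmt_id):
        if stmt_id != statement_id(self.query, self.columns):
            raise Unprepared(stmt_id)
        return "rows"


def run(server, stmt_id, save_new_id, max_attempts=5):
    """Return (result, attempts); result is None if every attempt failed."""
    for attempt in range(1, max_attempts + 1):
        try:
            return server.execute(stmt_id), attempt
        except Unprepared:
            new_id = server.prepare()
            if save_new_id:
                # Without this assignment the next attempt reuses the stale id.
                stmt_id = new_id
    return None, max_attempts
```

When the re-prepared id is discarded, every attempt fails with {{UNPREPARED}}; when it is saved, the second attempt succeeds.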

I first tried to transparently swap the request when getting {{UNPREPARED}}, 
although that would mean that the original "outdated" prepared query would 
re-prepare no matter how many times it runs (even though the results it returns 
would be correct). On the other hand, making changes in the {{BoundStatement}} 
itself (or in its {{PreparedStatement}}) might lead to undesired behaviour on 
the client side or even be prone to races (when a request is sent assuming one 
id and metadata, and the response comes back after they got swapped).

I put the failing test on ignore for now until we decide what should be done 
on the driver side in this case.

I've created a corresponding [java driver 
issue|https://datastax-oss.atlassian.net/browse/JAVA-1196]


