[jira] [Commented] (CASSANDRA-8483) Support streaming results

2014-12-15 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246694#comment-14246694
 ] 

Aleksey Yeschenko commented on CASSANDRA-8483:
--

You mean, with more isolation, not with isolation, right? Can't realistically 
isolate a query over a large partition, or, worse, SELECT * FROM foo;

 Support streaming results
 -

 Key: CASSANDRA-8483
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8483
 Project: Cassandra
  Issue Type: Sub-task
  Components: Core
Reporter: Benedict
 Fix For: 3.0


 Currently we stream the number of rows back to the client before serializing, 
 which means we need to know how many there are before doing so, which means 
 materializing the entire resultset. We currently get around this with paging 
 which attempts to restrict the amount of materialization done in any step, 
 but supporting streaming entire result sets in one native transport action 
 without materializing them all upfront would remove the need for paging in 
 many cases, and would permit resultsets to be streamed _with isolation_, 
 which most users probably don't realise is broken by paging.
 We can't use this change yet, but the sooner support for this is introduced 
 to the protocol, the more likely it is clients will be able to make use of 
 streaming reads once we're actually able to deliver them.
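
The framing problem described above can be sketched as follows. This is a minimal illustration, not the actual native protocol encoding: the function names, the 4-byte big-endian length prefix, and the negative-length end marker are all assumptions made for the example.

```python
import struct
from typing import Iterable, Iterator, List

# Hypothetical end-of-stream marker: a negative row length.
END_OF_STREAM = struct.pack(">i", -1)


def encode_count_prefixed(rows: Iterable[bytes]) -> bytes:
    """Count-first framing: the row count precedes the rows, so the
    whole result set must be materialized before anything is sent."""
    materialized = list(rows)  # forces full materialization
    parts = [struct.pack(">i", len(materialized))]
    for row in materialized:
        parts.append(struct.pack(">i", len(row)) + row)
    return b"".join(parts)


def encode_streamed(rows: Iterable[bytes]) -> Iterator[bytes]:
    """Streamed framing: each row is emitted as soon as it is produced,
    and a terminator replaces the up-front count."""
    for row in rows:
        yield struct.pack(">i", len(row)) + row
    yield END_OF_STREAM


def decode_streamed(chunks: Iterable[bytes]) -> List[bytes]:
    """Client side of the streamed framing: read rows until the marker."""
    data = b"".join(chunks)
    rows, offset = [], 0
    while True:
        (length,) = struct.unpack_from(">i", data, offset)
        offset += 4
        if length < 0:  # end-of-stream marker reached
            return rows
        rows.append(data[offset:offset + length])
        offset += length
```

With the streamed variant the sender holds only one row at a time, which is the property the ticket is after; the cost is that the client no longer knows the row count in advance.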



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8483) Support streaming results

2014-12-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246701#comment-14246701
 ] 

Benedict commented on CASSANDRA-8483:
-

Isolation over a given partition; isolation across partitions isn't ever 
offered.

If you want isolation over a large partition you pay the cost of preventing 
reclamation of expired sstables until the read completes, but that is all. If 
you expect the partition to be so large that's a problem, you're probably doing 
it wrong, but can always fall back to paging.



[jira] [Commented] (CASSANDRA-8483) Support streaming results

2014-12-15 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246719#comment-14246719
 ] 

Aleksey Yeschenko commented on CASSANDRA-8483:
--

For CLs other than ONE, too (which is relevant, with QUORUM being the 
predominant CL, and only becoming more popular as we go more and more 
mainstream)?

FTR, I'm not saying this is a bad idea, or that it hasn't been considered 
before. Just saying that we will not get isolation here, without deep changes 
elsewhere.



[jira] [Commented] (CASSANDRA-8483) Support streaming results

2014-12-15 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246722#comment-14246722
 ] 

Sylvain Lebresne commented on CASSANDRA-8483:
-

Not entirely convinced we should start with the native protocol. There are a 
fair number of obstacles to actually streaming results internally: 
CASSANDRA-8100 is a first one, but more generally we'd probably need deeper 
changes to the inter-node protocol to actually stream results; digest queries 
are another part that is not trivially amenable to streaming in its current 
state, etc. All of this can of course be changed, but I wonder how quickly we 
can get there in practice, or even if the effort is really worth the trouble 
(not saying it's not, saying it's a question worth asking). In that context, 
starting by complicating the native protocol (making the life of all driver 
authors harder) doesn't necessarily feel to me like a good first step, and I'd 
rather wait until we're almost there internally (which will definitely not be 
3.0) before considering this.



[jira] [Commented] (CASSANDRA-8483) Support streaming results

2014-12-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246723#comment-14246723
 ] 

Benedict commented on CASSANDRA-8483:
-

Yes, I agree we'll need a lot of changes elsewhere, but the goal here is only 
to introduce the capability to the clients so they're ready well ahead of time, 
and so we can hopefully taper off new protocol versions.

We do currently offer isolation for CL > ONE if you read from a single 
partition, since we corroborate that all sources have the same digest and only 
actually return 
the results from a single replica that is read with isolation. Doing so for a 
streaming read would certainly be more difficult, but I don't think any (or 
much) more so than the streaming reads themselves, which will need to deal with 
streaming digests and read repair anyway.
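
A rough sketch of the digest check described above, under simplifying assumptions: rows are opaque byte strings, one replica returns full data while the others return only a digest, and a mismatch is reported rather than repaired. The function names and the choice of MD5 are illustrative only and are not taken from the Cassandra code base.

```python
import hashlib
from typing import List, Optional, Sequence


def digest(rows: Sequence[bytes]) -> bytes:
    """Hash a partition's rows; each row is length-prefixed so that row
    boundaries affect the digest."""
    h = hashlib.md5()
    for row in rows:
        h.update(len(row).to_bytes(4, "big"))
        h.update(row)
    return h.digest()


def coordinate_read(
    data_replica_rows: Sequence[bytes],
    other_replica_digests: Sequence[bytes],
) -> Optional[List[bytes]]:
    """Return the data replica's rows only if every other replica's digest
    matches; return None on a mismatch (where a real coordinator would
    fall back to a full read and read repair)."""
    expected = digest(data_replica_rows)
    if all(d == expected for d in other_replica_digests):
        return list(data_replica_rows)
    return None
```

Because the returned rows come from a single replica's consistent view, a matching digest means the client sees one isolated snapshot of the partition. A streaming variant would need to compute and compare digests incrementally, which this sketch does not attempt.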



[jira] [Commented] (CASSANDRA-8483) Support streaming results

2014-12-15 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246733#comment-14246733
 ] 

Sylvain Lebresne commented on CASSANDRA-8483:
-

bq. but the goal here is only to introduce the capability to the clients so 
they're ready well ahead of time

To clarify, I understood this was the intent. But I'm saying that there is 
enough internal work to do, imo, that I don't think it's wise to introduce 
complexity for driver authors that might, perhaps, never actually be used, or 
only after a long time. I'm fine brainstorming on how it could be done, but I'd 
rather not include it in v4 of the protocol.



[jira] [Commented] (CASSANDRA-8483) Support streaming results

2014-12-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246735#comment-14246735
 ] 

Benedict commented on CASSANDRA-8483:
-

bq. or even if the effort is really worth the trouble

Well, streaming reads come up regularly as a topic, and they make a lot of 
sense for keeping GC requirements constant regardless of workload. However, if 
we decide we don't want to introduce them in future, this ticket is definitely 
superfluous. I think we would be mistaken not to target a world where all 
workloads require a static amount of resources to service, both to reduce total 
heap requirements and, just as importantly, to avoid hunting through heap dumps 
just to tell the user there are problems with their data model.



[jira] [Commented] (CASSANDRA-8483) Support streaming results

2014-12-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246736#comment-14246736
 ] 

Benedict commented on CASSANDRA-8483:
-

bq. To clarify, I understood this was the intent

To clarify, I was responding to Aleksey; we just hit a race condition (or two) 
:)



[jira] [Commented] (CASSANDRA-8483) Support streaming results

2014-12-15 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246739#comment-14246739
 ] 

Aleksey Yeschenko commented on CASSANDRA-8483:
--

bq. I think we would be mistaken not to target a world where all workloads 
require a static amount of resources to service, to reduce total heap 
requirements

No disagreement here.
