[jira] [Commented] (CASSANDRA-6167) Add end-slice termination predicate

2016-03-11 Thread Pavel Trukhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191520#comment-15191520
 ] 

Pavel Trukhanov commented on CASSANDRA-6167:


{code}LIMIT UNTIL condition {code}?

> Add end-slice termination predicate
> ---
>
> Key: CASSANDRA-6167
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6167
> Project: Cassandra
> Issue Type: Improvement
> Components: CQL
> Reporter: Tupshin Harper
> Priority: Minor
> Labels: cql
>
> When performing storage-engine slices, it would sometimes be beneficial 
> to have the slice terminate for reasons other than the number of columns or 
> min/max cell name.
> Since we are able to look at the contents of each cell as we read it, this is 
> potentially doable with very little overhead. 
> Probably more challenging than the storage-engine implementation itself is 
> coming up with appropriate CQL syntax (Thrift, should we decide to support 
> it, would be trivial).
> Two possibilities are:
> 1) special where function:
> SELECT pk,event from cf WHERE pk IN (1,5,10,11) AND 
> partition_predicate({predicate})
> or a bigger language change, but one I think I prefer, more like:
> 2) SELECT pk,event from cf where pk IN (1,5,10,11) UNTIL PARTITION event 
> {predicate}
> Neither feels perfect, but I do like the fact that the second one at least 
> clearly states what it is intended to do.
> By using "UNTIL PARTITION", we could re-use the UNTIL keyword to handle other 
> kinds of early termination of selects that the coordinator might be able to 
> do, such as stopping the retrieval of additional rows from shards after a 
> particular criterion is met.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6167) Add end-slice termination predicate

2016-03-11 Thread Pavel Trukhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191514#comment-15191514
 ] 

Pavel Trukhanov commented on CASSANDRA-6167:


As I understand it, it's kind of like a limit, but controllable by the user. How 
about using the LIMIT keyword, but with a condition rather than an integer constant?



[jira] [Commented] (CASSANDRA-6167) Add end-slice termination predicate

2016-03-08 Thread Pavel Trukhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185556#comment-15185556
 ] 

Pavel Trukhanov commented on CASSANDRA-6167:


I wanted to vote for the issue, but I'm not sure about the whole idea.

So please consider this comment a vote for this or something similar:

bq. filtering based on the value of a static column by comparison to a 
non-static column



[jira] [Commented] (CASSANDRA-6167) Add end-slice termination predicate

2015-03-28 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14385542#comment-14385542
 ] 

Robert Stupp commented on CASSANDRA-6167:
-

Ping on this one.
Now that we have CASSANDRA-4914 and CASSANDRA-8053, it could be possible to add 
some kind of row limit per CQL partition key or end aggregate calculation. 
WDYT?



[jira] [Commented] (CASSANDRA-6167) Add end-slice termination predicate

2014-02-14 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901224#comment-13901224
 ] 

Sylvain Lebresne commented on CASSANDRA-6167:
-

Playing devil's advocate here, but why wouldn't you just store the last 
aggregated value in a separate table? Granted, that assumes you know the last 
aggregation value, which in theory means 2 reads, but in practice it doesn't 
sound particularly hard for clients to cache that last aggregated value (of 
course, you'd want to refresh that cached value at some frequency, but that can 
be done in the background easily enough).

Because my main problem with that example is that it sounds a lot like a hack. 
If I store floats, I want evtval to be a float, not some string that I abuse to 
store an aggregation in the middle of other stuff (because that's fairly 
error-prone for any consumer of the table that doesn't care about the 
pre-computed aggregation). I really don't think we should promote such ways. 
Note that I understand it's just an example, but it doesn't feel to me like we 
should add such a thing without a bunch of non-hacky examples of it being useful.

Also, there is CASSANDRA-4914. Once we have that, you'd want to use it for 
aggregation. Even if you still want to do the incremental aggregation like in 
your example, you'll still really want to use CASSANDRA-4914 to aggregate the 
values 'since last aggregation'. And I don't really see how this idea could 
cleanly coexist with CASSANDRA-4914 (while it's trivial if you just 
store/cache the aggregation separately).



[jira] [Commented] (CASSANDRA-6167) Add end-slice termination predicate

2014-02-14 Thread Tupshin Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901454#comment-13901454
 ] 

Tupshin Harper commented on CASSANDRA-6167:
---

The example above shows how this feature would allow for efficient client-side 
aggregation without having to do two round trips from the client. What you 
describe is a variant that I use today. However, even caching the last 
aggregation value is clearly insufficient in a massively distributed 
environment. The assumption here is that you are likely to have had another 
process do the last aggregation, so it is almost always necessary to do two 
round trips with the current approach.

I share your dislike of the string abuse hack, and am very open to other 
suggestions.

It is quite possible that this ticket should somehow be subsumed into 
CASSANDRA-4914, not as an aggregate function that could be implemented, but 
instead by extending CASSANDRA-4914 to include partition slice termination 
functions. It would be very incorrect to assume that this ticket would only be 
used for aggregation. There are a lot of cases where I would want to slice 
backwards in time until event X occurred, where event X is determined by the 
value of the event itself.

So, if 4914 became "Custom aggregate, filtering, and slice termination 
functions in CQL", then I would be all on board. But filtering is certainly not 
going to be sufficient, as it implies that the node would still have to read the 
entire partition (or explicitly specified portion of the partition), which is 
exactly the opposite of the goal here.




[jira] [Commented] (CASSANDRA-6167) Add end-slice termination predicate

2014-02-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901459#comment-13901459
 ] 

Benedict commented on CASSANDRA-6167:
-

It seems like this could be achieved today with static columns. Perhaps we 
could permit filtering based on the value of a static column by comparison to a 
non-static column, so that you have a static (time, aggregate), and you select 
from non-static all values > time; compose with aggregate and update. Of 
course, updating the value would need to be done via CAS to be safe.

Note I'm not endorsing this, but this certainly seems less bad than abusing the 
type of the value column.
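
As a rough illustration of the static-column idea above, here is a minimal Python sketch that simulates a single partition in memory. It is not Cassandra code: the `Partition` class and the `cas_aggregate`, `read_total`, and `roll_up` helpers are hypothetical names invented for this sketch, and the compare-and-set is only an in-process stand-in for a conditional update. The event values are borrowed from the t6167 example elsewhere in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """In-memory stand-in for one partition (hypothetical, for illustration)."""
    agg_time: int = 0        # "static" value: last timestamp folded into agg_value
    agg_value: float = 0.0   # "static" value: aggregate of events with ts <= agg_time
    events: dict = field(default_factory=dict)  # non-static rows: ts -> value

    def cas_aggregate(self, expected_time, new_time, new_value):
        """Update the static pair only if agg_time is still expected_time."""
        if self.agg_time != expected_time:
            return False
        self.agg_time, self.agg_value = new_time, new_value
        return True


def read_total(p):
    """Select non-static values newer than the static time and compose them."""
    newer = {ts: v for ts, v in p.events.items() if ts > p.agg_time}
    return p.agg_value + sum(newer.values()), newer


def roll_up(p, up_to):
    """Fold events up to `up_to` into the static aggregate via compare-and-set."""
    seen_time, seen_value = p.agg_time, p.agg_value
    folded = sum(v for ts, v in p.events.items() if seen_time < ts <= up_to)
    return p.cas_aggregate(seen_time, up_to, seen_value + folded)


p = Partition(events={1: 2.1, 2: 1.3, 3: 1.7, 5: 0.3, 6: -1.4, 7: 0.5})
rolled = roll_up(p, up_to=3)            # folds the events at ts 1..3
total, newer = read_total(p)            # composes the aggregate with ts 5..7
stale = p.cas_aggregate(0, 9, 0.0)      # a writer holding an old agg_time loses
```

In this simulation a writer that read an outdated static time fails the compare-and-set and would have to re-read before retrying, which is the safety property the CAS remark refers to.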



[jira] [Commented] (CASSANDRA-6167) Add end-slice termination predicate

2014-02-14 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901477#comment-13901477
 ] 

Sylvain Lebresne commented on CASSANDRA-6167:
-

bq. it is almost always necessary to do two round trips with the current 
approach.

I suppose it's not entirely relevant to the overall scope but I'm curious, why 
is that (assuming you can cache the last computed aggregation and that whatever 
background thread you have can update that value)?

bq. Perhaps we could permit filtering based on the value of a static column by 
comparison to a non-static column

I thought about that too, and I agree that at least in this example it sounds a 
lot less bad. Though arguably that may not cover the full scope of 'termination 
predicate'. It might be good enough in practice, or useful anyway in its own 
right, I don't know.

bq. It would be very incorrect to assume that this ticket would only be used 
for aggregation.

I understand. I guess what I'm saying is that I'm really not convinced by the 
actual proposition here, and that one sign that it might not be the way to 
go would be not being able to come up with at least one convincing example that 
doesn't feel hackish. But I don't really disagree on the underlying idea of 
exposing finer ways to terminate slices :)



[jira] [Commented] (CASSANDRA-6167) Add end-slice termination predicate

2014-02-14 Thread Tupshin Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901513#comment-13901513
 ] 

Tupshin Harper commented on CASSANDRA-6167:
---

That certainly has some appeal. Linking to CASSANDRA-6561 for reference.

On the other hand, I'm increasingly a fan of Scala/Haskell-style Either types, 
and if/when we get custom types, I would consider implementing this feature in 
terms of such a disjoint union.



[jira] [Commented] (CASSANDRA-6167) Add end-slice termination predicate

2014-02-14 Thread Tupshin Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901526#comment-13901526
 ] 

Tupshin Harper commented on CASSANDRA-6167:
---

Sylvain: given the complexity and ambiguity about the best approach for my 
example above, I'll just return to the simple case:
A partition represents a slice of time-series events for a particular source.
The client wants to know if value = X appeared in the last 10,000 events (or 
since time Y). Outlier detection is one reason. There are others.
Currently we would need to either do a single large slice and retrieve all 
10,000 events even if the value was found in the first few, or get smaller 
batches and keep retrieving additional batches until the target value is found.
No matter what the syntax or implementation is, I'm quite convinced of the 
utility of being able to short-circuit reading from a partition.
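
To make the trade-off above concrete, here is a small Python sketch that counts rows read and round trips for the batched approach versus a hypothetical server-side short circuit. Everything here is an assumption for illustration: the partition is a plain list ordered newest-first, and `paged_search` and `short_circuit_search` are invented names, not Cassandra or driver APIs.

```python
def paged_search(events, target, limit, page_size):
    """Fetch fixed-size batches until the target shows up or the limit is hit.

    Returns (found, rows_read, round_trips).
    """
    rows_read = 0
    round_trips = 0
    end = min(limit, len(events))
    for start in range(0, end, page_size):
        page = events[start:min(start + page_size, end)]
        round_trips += 1
        rows_read += len(page)
        if target in page:
            return True, rows_read, round_trips
    return False, rows_read, round_trips


def short_circuit_search(events, target, limit):
    """Single request in which the read stops at the first matching row.

    Returns (found, rows_read, round_trips).
    """
    rows_read = 0
    for value in events[:limit]:
        rows_read += 1
        if value == target:
            return True, rows_read, 1
    return False, rows_read, 1


# 10,000 events, newest first; the value we look for sits at position 2500.
events = list(range(10_000))
paged = paged_search(events, target=2500, limit=10_000, page_size=1_000)
early = short_circuit_search(events, target=2500, limit=10_000)
```

With these made-up numbers the batched search reads 3,000 rows over three round trips, while the short-circuit read stops after 2,501 rows in one. When the target is absent, both end up reading all 10,000 rows.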



[jira] [Commented] (CASSANDRA-6167) Add end-slice termination predicate

2014-02-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901528#comment-13901528
 ] 

Benedict commented on CASSANDRA-6167:
-

This new example doesn't seem to be a good argument for it. If you're 
attempting outlier detection, you expect it *not* to be present, and so will 
ordinarily read the whole 10k events. So it does not save you anything.



[jira] [Commented] (CASSANDRA-6167) Add end-slice termination predicate

2014-02-13 Thread Tupshin Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901052#comment-13901052
 ] 

Tupshin Harper commented on CASSANDRA-6167:
---

Adding an example for how this could be used for efficient client-side 
implemented aggregation.

Assume a CQL table with the following structure:
CREATE TABLE t6167 (
  uid text,
  evtts int,
  evtval text,
  PRIMARY KEY (uid, evtts)
) WITH CLUSTERING ORDER BY (evtts DESC)
(In a real system, evtts would probably be a timeuuid to avoid risk of 
collisions)
Assume data in that table for a single partition looks like
 uid | evtts | evtval
-----+-------+--------
   1 |     7 |    0.5
   1 |     6 |   -1.4
   1 |     5 |    0.3
   1 |     4 |   s5.1
   1 |     3 |    1.7
   1 |     2 |    1.3
   1 |     1 |    2.1
(Ignore the monotonically increasing timestamps. They are here only for 
simplicity. Timeuuids would not do this, of course.)


So this structure is used to write new floats (summation is only used as an 
example of arbitrary aggregation).

Source events will write values such as 2.1 and 1.3.
At read time, the logic would be as follows:
You would get a slice of the partition from the most recent event (hence the 
DESC ordering) back to either the beginning of the partition or the most 
recently written summation value (e.g. s5.1 and s4.5).
With the syntax from option 2 above (and using % as a wildcard), you would get
 SELECT uid,evtts,evtval from t6167 where uid=1 and evtts < NOW() UNTIL 
PARTITION evtval='s%' 
Assuming NOW() = 8, and the above data, this would return:
 uid | evtts | evtval
-----+-------+--------
   1 |     7 |    0.5
   1 |     6 |   -1.4
   1 |     5 |    0.3
   1 |     4 |   s5.1
At that point, the client would return evtval 4.5 by doing client-side 
aggregation of those 4 rows. 
Then, optionally, if enough time had elapsed since the last aggregation column 
to rule out out-of-order delivery (a business rule), that same reader 
thread would write back a new aggregation value at an appropriate timestamp 
lagging behind the current time by the potential out-of-order delivery window.
These writes would be inherently idempotent, and hence free of race conditions, 
and could easily be tuned for delivery window and aggregation frequency to suit 
various workloads.
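
The read path described above can be sketched in Python as follows. This only simulates the proposed behaviour on an in-memory list: `slice_until` and `aggregate` are hypothetical helpers written for this illustration, and the "s" prefix handling mirrors the string-marker convention of the example rather than any real Cassandra feature.

```python
def slice_until(rows, stop):
    """Yield rows in order, ending with the first row for which stop(row) is true.

    The matching row is included, as in the example result set.
    """
    for row in rows:
        yield row
        if stop(row):
            return


def aggregate(rows):
    """Sum evtval across rows, treating an 's'-prefixed value as a stored sum."""
    total = 0.0
    for _uid, _evtts, evtval in rows:
        number = evtval[1:] if evtval.startswith("s") else evtval
        total += float(number)
    return total


# Partition uid=1 in clustering order (evtts DESC), as in the example data.
partition = [
    (1, 7, "0.5"),
    (1, 6, "-1.4"),
    (1, 5, "0.3"),
    (1, 4, "s5.1"),
    (1, 3, "1.7"),
    (1, 2, "1.3"),
    (1, 1, "2.1"),
]

now = 8  # stand-in for NOW() in the example
candidates = (row for row in partition if row[1] < now)
rows = list(slice_until(candidates, lambda row: row[2].startswith("s")))
total = aggregate(rows)
```

Here `rows` holds the four rows from evtts 7 down to the s5.1 marker, and `total` comes out at approximately 4.5, matching the value in the example. The three older rows are never visited.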
