[jira] [Commented] (CASSANDRA-4536) Ability for CQL3 to list partition keys

2013-08-21 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13746123#comment-13746123
 ] 

Sylvain Lebresne commented on CASSANDRA-4536:
-

lgtm, +1


(I've created CASSANDRA-5912 as a follow-up in case we want to get fancy and 
optimize that further. Probably not a priority, though.)

 Ability for CQL3 to list partition keys
 ---

 Key: CASSANDRA-4536
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4536
 Project: Cassandra
  Issue Type: New Feature
  Components: API
Affects Versions: 1.1.0
Reporter: Jonathan Ellis
Assignee: Aleksey Yeschenko
Priority: Minor
  Labels: cql3
 Fix For: 2.0.1

 Attachments: 4536.txt, cassandra-4536_1.1.0.patch, 
 cassandra-4536_1.2.2.patch, cassandra-4536_1.2.5.patch


 It can be useful to know the set of in-use partition keys (storage engine row 
 keys).  One example given to me was where application data was modeled as a 
 few 10s of 1000s of wide rows, where the app required presenting these rows 
 to the user sorted based on information in the partition key.  The partition 
 count is small enough to do the sort client-side in memory, which is what the 
 app did with the Thrift API--a range slice with an empty columns list.
 This was a problem when migrating to CQL3.  {{SELECT mykey FROM mytable}} 
 includes all the logical rows, which makes the resultset too large to make 
 this a reasonable approach, even with paging.
 One way to add support would be to allow DISTINCT in the special case of 
 {{SELECT DISTINCT mykey FROM mytable}}.
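
 To make the size difference concrete, here is a toy Python model of a table 
 with wide partitions. It is not Cassandra code: the table contents and the 
 two helper functions are hypothetical and only mimic what each query would 
 return.

```python
# Toy model: each partition key maps to its logical (CQL3) rows.
# The data and helper names are hypothetical, for illustration only.
table = {
    "sensor-a": [{"ts": 1, "v": 10}, {"ts": 2, "v": 11}, {"ts": 3, "v": 12}],
    "sensor-b": [{"ts": 1, "v": 20}, {"ts": 2, "v": 21}],
}


def select_key(table):
    """Mimics SELECT mykey FROM mytable: one result per logical row."""
    return [key for key, rows in table.items() for _ in rows]


def select_distinct_key(table):
    """Mimics SELECT DISTINCT mykey FROM mytable: one result per partition."""
    return [key for key, rows in table.items() if rows]


print(len(select_key(table)))           # 5
print(len(select_distinct_key(table)))  # 2
```

 With a few 10s of 1000s of partitions that each hold many logical rows, the 
 first result grows with the row count while the second grows only with the 
 partition count.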

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4536) Ability for CQL3 to list partition keys

2013-07-05 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700642#comment-13700642
 ] 

Sylvain Lebresne commented on CASSANDRA-4536:
-

The main missing part I think is that we should handle composite partition 
keys. Meaning that we should allow DISTINCT only if it's on *all* the (CQL3) 
partition key columns, but there may be more than one.

Also, in SelectStatement.process(), it seems we only allow CQL3 tables. Why not 
just move the isDistinct block at the beginning to include all cases?

Other minor remarks/nits:
* In the parser, I'd rather just have K_DISTINCT optional in front of normal 
selectClause and do validation later in SelectStatement, rather than having a 
special selectDistinctClause (partly to keep the parser simpler, but also 
because we can return better error messages that way). We'd need to support 
distinct on multiple columns for composite partition keys anyway.
* In makeFilter(): there's a ColumnSlice.ALL_COLUMNS_ARRAY to shorten that 
further. Also, we don't care about reversed, so since reversed slices are 
slightly slower, let's never reverse.
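
The rule about composite partition keys, checked after parsing rather than in 
the grammar, could look roughly like the sketch below. It is written in Python 
for brevity and is not the patch's code: validate_distinct and its arguments 
are hypothetical names, and the error messages are only examples of the more 
specific feedback that late validation makes possible.

```python
def validate_distinct(selected_columns, partition_key_columns):
    """Raise ValueError unless the DISTINCT selection is exactly the full
    set of partition key columns. Hypothetical helper, not Cassandra code."""
    selected = set(selected_columns)
    required = set(partition_key_columns)

    extra = selected - required
    if extra:
        raise ValueError(
            "SELECT DISTINCT only allows partition key columns, got: "
            + ", ".join(sorted(extra))
        )

    missing = required - selected
    if missing:
        raise ValueError(
            "SELECT DISTINCT must include all partition key columns, missing: "
            + ", ".join(sorted(missing))
        )


# A composite partition key (k1, k2): both columns must be selected.
validate_distinct(["k1", "k2"], ["k1", "k2"])  # passes silently
```

Selecting only k1, or adding a clustering or regular column, raises an error 
that names the offending columns.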

 Ability for CQL3 to list partition keys
 ---

 Key: CASSANDRA-4536
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4536
 Project: Cassandra
  Issue Type: New Feature
  Components: API
Affects Versions: 1.1.0
Reporter: Jonathan Ellis
Assignee: dan jatnieks
Priority: Minor
  Labels: cql3
 Fix For: 1.2.7

 Attachments: cassandra-4536_1.1.0.patch, cassandra-4536_1.2.2.patch, 
 cassandra-4536_1.2.5.patch





[jira] [Commented] (CASSANDRA-4536) Ability for CQL3 to list partition keys

2012-11-07 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493022#comment-13493022
 ] 

Sylvain Lebresne commented on CASSANDRA-4536:
-

Let me note that in CQL3 a row that has no live columns doesn't exist, so we 
can't really implement this with a range slice having an empty columns list. 
Instead we should do a range slice with a full-row slice predicate with a count 
of 1, to make sure we do have a live column before including the partition key. 
The downside is that this won't 'just read the index file'.
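
As a rough illustration of that idea (a toy Python model, not the storage 
engine): each partition is a list of cells flagged live or deleted, and a key 
is reported only if a full-row slice limited to one live cell returns 
something. The data layout and both function names are assumptions made for 
this sketch.

```python
from itertools import islice


def slice_live_cells(cells, count=1):
    """Full-row slice that stops after `count` live cells (toy model)."""
    return list(islice((c for c in cells if c["live"]), count))


def list_partition_keys(partitions):
    """Return only the keys that have at least one live cell."""
    return [
        key
        for key, cells in partitions.items()
        if slice_live_cells(cells, count=1)
    ]


partitions = {
    "k1": [{"name": "a", "live": False}, {"name": "b", "live": True}],
    "k2": [{"name": "a", "live": False}],  # only tombstones: row doesn't exist
    "k3": [],                              # no cells at all
}
print(list_partition_keys(partitions))  # ['k1']
```

Note that the check has to look at cell data for every key, which is why it 
can't be answered from the index alone.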

In the longer run, it should be possible to optimize that further, if we 
consider it worth it, by adding one bit of information per key in the sstable 
index saying 'is there at least one live column for that key in that sstable'. 
We could even add that bit per key without increasing the on-disk index size 
by using the first bit of the key position (since we use it as a signed long, 
the first bit is unused).
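
A minimal sketch of the bit trick, assuming the key position is stored as a 
big-endian 64-bit value that is never negative. The entry layout and the 
function names here are hypothetical; this does not reproduce the actual 
sstable index format.

```python
import struct

LIVE_FLAG = 1 << 63            # top (sign) bit of a 64-bit value
POSITION_MASK = LIVE_FLAG - 1  # lower 63 bits hold the real position


def pack_entry(position, has_live_column):
    """Store the flag in the otherwise unused top bit of the position."""
    if not 0 <= position <= POSITION_MASK:
        raise ValueError("position must fit in 63 bits")
    raw = position | (LIVE_FLAG if has_live_column else 0)
    return struct.pack(">Q", raw)


def unpack_entry(data):
    """Return (position, has_live_column) from an 8-byte entry."""
    (raw,) = struct.unpack(">Q", data)
    return raw & POSITION_MASK, bool(raw & LIVE_FLAG)


entry = pack_entry(4096, True)
print(len(entry))           # 8
print(unpack_entry(entry))  # (4096, True)
```

The entry stays 8 bytes, but a reader that still interprets it as a plain 
signed long would see a negative position whenever the flag is set, so every 
reader would have to mask the bit first.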

 Ability for CQL3 to list partition keys
 ---

 Key: CASSANDRA-4536
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4536
 Project: Cassandra
  Issue Type: New Feature
  Components: API
Affects Versions: 1.1.0
Reporter: Jonathan Ellis
Priority: Minor
  Labels: cql3
 Fix For: 1.2.1


