[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes and range scans of small tables

2013-11-13 Thread Jeremiah Jordan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Jordan updated CASSANDRA-1337:
---

Description: 
currently, we read the indexed rows (and rows for a range scan) from the first 
node (in partitioner order); if that does not have enough matching rows, we 
read the rows from the next, and so forth.

we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
parallel, such that we have a high chance of getting enough rows w/o having to 
do another round of queries (but, if our estimate is incorrect, we do need to 
loop and do more rounds until we have enough data or we have fetched from each 
node).


  was:
currently, we read the indexed rows from the first node (in partitioner order); 
if that does not have enough matching rows, we read the rows from the next, and 
so forth.

we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
parallel, such that we have a high chance of getting enough rows w/o having to 
do another round of queries (but, if our estimate is incorrect, we do need to 
loop and do more rounds until we have enough data or we have fetched from each 
node).



 parallelize fetching rows for low-cardinality indexes and range scans of 
 small tables
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Tyler Hobbs
Priority: Minor
 Fix For: 2.1

 Attachments: 1137-bugfix.patch, 1337-v4.patch, 1337-v5.patch, 
 1337.patch, 
 ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt,
  CASSANDRA-1337.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows (and rows for a range scan) from the 
 first node (in partitioner order); if that does not have enough matching 
 rows, we read the rows from the next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes and range scans of small tables

2013-11-13 Thread Jeremiah Jordan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Jordan updated CASSANDRA-1337:
---

Summary: parallelize fetching rows for low-cardinality indexes and range 
scans of small tables  (was: parallelize fetching rows for low-cardinality 
indexes)

 parallelize fetching rows for low-cardinality indexes and range scans of 
 small tables
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Tyler Hobbs
Priority: Minor
 Fix For: 2.1

 Attachments: 1137-bugfix.patch, 1337-v4.patch, 1337-v5.patch, 
 1337.patch, 
 ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt,
  CASSANDRA-1337.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2013-10-04 Thread Tyler Hobbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs updated CASSANDRA-1337:
---

Attachment: (was: 0001-Concurrent-range-and-2ary-index-subqueries.patch)

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Tyler Hobbs
Priority: Minor
 Fix For: 2.1

 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, 
 ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt,
  CASSANDRA-1337.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2013-10-04 Thread Tyler Hobbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs updated CASSANDRA-1337:
---

Attachment: 1337-v5.patch

1337-v5.patch includes your cleanup (with the fix I mentioned) and adds a 10% 
margin to the estimate along with switching to Math.ceil().

Added dest is here: 
https://github.com/thobbs/cassandra-dtest/tree/CASSANDRA-1337

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Tyler Hobbs
Priority: Minor
 Fix For: 2.1

 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, 
 ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt,
  CASSANDRA-1337.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2013-10-04 Thread Tyler Hobbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs updated CASSANDRA-1337:
---

Attachment: (was: 1337-v5.patch)

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Tyler Hobbs
Priority: Minor
 Fix For: 2.1

 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, 
 ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt,
  CASSANDRA-1337.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2013-10-04 Thread Tyler Hobbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs updated CASSANDRA-1337:
---

Attachment: 1337-v5.patch

1337-v5.patch includes your cleanup (along with the minor fix I mentioned), 
adds a 10% margin, and uses Math.ceil().

Added dtest is here: 
https://github.com/thobbs/cassandra-dtest/tree/CASSANDRA-1337

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Tyler Hobbs
Priority: Minor
 Fix For: 2.1

 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, 
 1337-v5.patch, 
 ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt,
  CASSANDRA-1337.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2013-10-02 Thread Tyler Hobbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs updated CASSANDRA-1337:
---

Attachment: 0001-Concurrent-range-and-2ary-index-subqueries.patch

Patch 0001 is a somewhat re-worked version of David's patch against trunk.

Breaking down estimates by the filter type was neglecting all non-compact cql3 
queries (and some compact ones).  A simpler way to estimate the number of 
results is to break down the queries into four groups:
* Secondary index query: use the mean columns from the index CF (one column per 
data row or cql3 data row)
* Non-cql3 range query: use the estimated keys
* cql3 range query, compact (compact storage or single-component primary key): 
use the estimated keys
* cql3 range query, non-compact: use (estimated_keys * mean_columns) / 
(number_of_defined_columns)

The last case is the least accurate.  When collections are involved, it will 
overestimate the number of cql3 rows that will be returned, meaning additional 
ranges may need to be queried, but I think this is an acceptable optimization 
degradation.

Another change from David's patch is that if an insufficient number of results 
are fetched by the first round, the concurrency factor will be recalculated 
based on the results we've seen so far instead of simply being set to 1.

I wasn't sure where to add tests for this; would dtests be the best place?

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Tyler Hobbs
Priority: Minor
 Fix For: 2.1

 Attachments: 0001-Concurrent-range-and-2ary-index-subqueries.patch, 
 1137-bugfix.patch, 1337.patch, 1337-v4.patch, 
 ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt,
  CASSANDRA-1337.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2013-10-02 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-1337:
--

Reviewer: Jonathan Ellis  (was: Sylvain Lebresne)

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Tyler Hobbs
Priority: Minor
 Fix For: 2.1

 Attachments: 0001-Concurrent-range-and-2ary-index-subqueries.patch, 
 1137-bugfix.patch, 1337.patch, 1337-v4.patch, 
 ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt,
  CASSANDRA-1337.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2013-05-22 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-1337:
--

Fix Version/s: (was: 2.0)
   2.1

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 2.1

 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, 
 ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt,
  CASSANDRA-1337.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2013-01-25 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-1337:
--

Assignee: (was: David Alves)

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 2.0

 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, 
 ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt,
  CASSANDRA-1337.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2013-01-07 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-1337:
--

Fix Version/s: (was: 1.2.1)
   2.0

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: David Alves
Priority: Minor
 Fix For: 2.0

 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, 
 ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt,
  CASSANDRA-1337.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2012-11-08 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-1337:
--

Fix Version/s: (was: 1.2.1)
   1.2.0 rc1

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: David Alves
Priority: Minor
 Fix For: 1.2.0 rc1

 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, 
 ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt,
  CASSANDRA-1337.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2012-11-08 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-1337:
--

Fix Version/s: (was: 1.2.0 rc1)
   1.2.1

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: David Alves
Priority: Minor
 Fix For: 1.2.1

 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, 
 ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt,
  CASSANDRA-1337.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2012-09-02 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-1337:
--

Reviewer: slebresne  (was: vijay2...@yahoo.com)

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: David Alves
Priority: Minor
 Fix For: 1.2.1

 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, 
 ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt,
  CASSANDRA-1337.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2012-09-01 Thread David Alves (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Alves updated CASSANDRA-1337:
---

Attachment: 1337-v4.patch

Corrected 2ndary index cfactor calc and (slightly) optimized gathering rows.

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: David Alves
Priority: Minor
 Fix For: 1.2.1

 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, 
 ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt,
  CASSANDRA-1337.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2012-08-31 Thread David Alves (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Alves updated CASSANDRA-1337:
---

Attachment: 1337.patch

Clean rehash that addressed Sylvain's (very helpful comments) including 
implementing for the CQL3 case. It estimated concurrency factor the following 
ways:

- Primary Indexes + Thrift - divides cfs by RF
- 2ndary indexes + Thrift - uses the mean col count of the most selective index 
to estimate the number of keys
- CQL3 + IdentityFilter - uses the estimated keys + mean col count to estimate 
cols per node
- CQL3 + Names filter - assumes cols with names are present and uses estimated 
keys to calculate cols per node
- CQL3 - Other filters - as sylvain mentioned because we have no idea on the 
selectivity of the col filter we cannot estimate how many cols will be returned 
per node so we revert to concurrecy factor = 1.

Reimplemented parallel the parallel execution part to make it a lot cleaner IMO 
(previous implementation was adapting sequential execution which made it 
difficult to read)

cql_test.py dtest is failing in the same place as trunk ,need to look into it 
to make sure Sylvain's dtest passes

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: David Alves
Priority: Minor
 Fix For: 1.2.1

 Attachments: 1137-bugfix.patch, 1337.patch, 
 ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt,
  CASSANDRA-1337.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2012-08-16 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-1337:
--

 Priority: Minor  (was: Major)
Fix Version/s: (was: 1.2.0)
   1.2.1

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: David Alves
Priority: Minor
 Fix For: 1.2.1

 Attachments: 
 0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, 
 1137-bugfix.patch, CASSANDRA-1337.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2012-07-27 Thread David Alves (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Alves updated CASSANDRA-1337:
---

Attachment: 1137-bugfix.patch

patch that addresses the bugs raised by sylvain. (StoragProxyTest and 
cql_test.py both pass) namely:
- local path counts as one less handler
- enough check moven out of the remote branch
- estimatedKeysPerRange take into account replication factor
- columns.maxIsColumns sets concurrency to 1

still working on the dtest that proves (or disproves that this works) 

I'd like to move the rest of the issues raised by sylvain to another ticket.

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: David Alves
 Fix For: 1.2

 Attachments: 
 0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, 
 1137-bugfix.patch, CASSANDRA-1337.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2012-07-26 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-1337:


Priority: Major  (was: Minor)

(changing the priority to major because basically I think that currently it 
does more harm than good, and we should either fix it before release or revert 
it)

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: David Alves
 Fix For: 1.2

 Attachments: 
 0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, 
 CASSANDRA-1337.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2012-06-09 Thread David Alves (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Alves updated CASSANDRA-1337:
---

Attachment: CASSANDRA-1337.patch

direct rework of Jake Luciani's original patch applied to trunk. untested. just 
to get the discussion going

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: David Alves
Priority: Minor
 Fix For: 1.2

 Attachments: 
 0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, 
 CASSANDRA-1337.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2012-06-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-1337:
--

Reviewer: vijay2...@yahoo.com

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: David Alves
Priority: Minor
 Fix For: 1.2

 Attachments: 
 0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, 
 CASSANDRA-1337.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2011-02-17 Thread Brandon Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-1337:


Fix Version/s: (was: 0.7.2)
   0.7.3

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: T Jake Luciani
Priority: Minor
 Fix For: 0.7.3

 Attachments: 
 0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2011-01-12 Thread T Jake Luciani (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

T Jake Luciani updated CASSANDRA-1337:
--

Attachment: 0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: T Jake Luciani
Priority: Minor
 Fix For: 0.7.1

 Attachments: 
 0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2010-12-28 Thread T Jake Luciani (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

T Jake Luciani updated CASSANDRA-1337:
--

Remaining Estimate: 24h
 Original Estimate: 24h

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: T Jake Luciani
Priority: Minor
 Fix For: 0.7.1

   Original Estimate: 24h
  Remaining Estimate: 24h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2010-12-28 Thread T Jake Luciani (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

T Jake Luciani updated CASSANDRA-1337:
--

Remaining Estimate: 8h  (was: 24h)
 Original Estimate: 8h  (was: 24h)

 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: T Jake Luciani
Priority: Minor
 Fix For: 0.7.1

   Original Estimate: 8h
  Remaining Estimate: 8h

 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes

2010-09-17 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-1337:
--

Description: 
currently, we read the indexed rows from the first node (in partitioner order); 
if that does not have enough matching rows, we read the rows from the next, and 
so forth.

we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
parallel, such that we have a high chance of getting enough rows w/o having to 
do another round of queries (but, if our estimate is incorrect, we do need to 
loop and do more rounds until we have enough data or we have fetched from each 
node).


  was:
currently, we read the indexed rows from the first node (in partitioner order); 
if that does not have enough matching rows, we read the rows from the next, and 
so forth.

we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
parallel, such that we have a high chance of getting enough rows w/o having to 
do another query (but, if our estimate is incorrect, we do need to loop and do 
a 2nd query).



 parallelize fetching rows for low-cardinality indexes
 -

 Key: CASSANDRA-1337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 0.7.1


 currently, we read the indexed rows from the first node (in partitioner 
 order); if that does not have enough matching rows, we read the rows from the 
 next, and so forth.
 we should use the statistics fom CASSANDRA-1155 to query multiple nodes in 
 parallel, such that we have a high chance of getting enough rows w/o having 
 to do another round of queries (but, if our estimate is incorrect, we do need 
 to loop and do more rounds until we have enough data or we have fetched from 
 each node).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.