[jira] [Commented] (CASSANDRA-4803) CFRR wide row iterators improvements

2012-11-21 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13501886#comment-13501886
 ] 

Jonathan Ellis commented on CASSANDRA-4803:
---

Without trying it, I think it would be TApplicationException.WRONG_METHOD_NAME.

 CFRR wide row iterators improvements
 

 Key: CASSANDRA-4803
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4803
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.1.0
Reporter: Piotr Kołaczkowski
Assignee: Piotr Kołaczkowski
 Fix For: 1.1.7

 Attachments: 
 0004-Better-token-range-wrap-around-handling-in-CFIF-CFRR.patch, 
 0006-Code-cleanup-refactoring-in-CFRR.-Fixed-bug-with-mis.patch, 
 0007-Fallback-to-describe_splits-in-case-describe_splits_.patch


 {code}
  public float getProgress()
 {
 // TODO this is totally broken for wide rows
 // the progress is likely to be reported slightly off the actual but 
 close enough
 float progress = ((float) iter.rowsRead() / totalRowCount);
 return progress  1.0F ? 1.0F : progress;
 }
 {code}
 The problem is iter.rowsRead() does not return the number of rows read from 
 the wide row iterator, but returns number of *columns* (every row is counted 
 multiple times). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4803) CFRR wide row iterators improvements

2012-11-21 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13501885#comment-13501885
 ] 

Jonathan Ellis commented on CASSANDRA-4803:
---

Is there a more specific exception we can catch besides TException?

 CFRR wide row iterators improvements
 

 Key: CASSANDRA-4803
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4803
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.1.0
Reporter: Piotr Kołaczkowski
Assignee: Piotr Kołaczkowski
 Fix For: 1.1.7

 Attachments: 
 0004-Better-token-range-wrap-around-handling-in-CFIF-CFRR.patch, 
 0006-Code-cleanup-refactoring-in-CFRR.-Fixed-bug-with-mis.patch, 
 0007-Fallback-to-describe_splits-in-case-describe_splits_.patch


 {code}
  public float getProgress()
 {
 // TODO this is totally broken for wide rows
 // the progress is likely to be reported slightly off the actual but 
 close enough
 float progress = ((float) iter.rowsRead() / totalRowCount);
 return progress  1.0F ? 1.0F : progress;
 }
 {code}
 The problem is iter.rowsRead() does not return the number of rows read from 
 the wide row iterator, but returns number of *columns* (every row is counted 
 multiple times). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4803) CFRR wide row iterators improvements

2012-11-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13502291#comment-13502291
 ] 

Piotr Kołaczkowski commented on CASSANDRA-4803:
---

[~jbellis] right, I change it.

 CFRR wide row iterators improvements
 

 Key: CASSANDRA-4803
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4803
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.1.0
Reporter: Piotr Kołaczkowski
Assignee: Piotr Kołaczkowski
 Fix For: 1.1.7

 Attachments: 
 0004-Better-token-range-wrap-around-handling-in-CFIF-CFRR.patch, 
 0006-Code-cleanup-refactoring-in-CFRR.-Fixed-bug-with-mis.patch, 
 0007-Fallback-to-describe_splits-in-case-describe_splits_.patch


 {code}
  public float getProgress()
 {
 // TODO this is totally broken for wide rows
 // the progress is likely to be reported slightly off the actual but 
 close enough
 float progress = ((float) iter.rowsRead() / totalRowCount);
 return progress  1.0F ? 1.0F : progress;
 }
 {code}
 The problem is iter.rowsRead() does not return the number of rows read from 
 the wide row iterator, but returns number of *columns* (every row is counted 
 multiple times). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4803) CFRR wide row iterators improvements

2012-11-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13501242#comment-13501242
 ] 

Piotr Kołaczkowski commented on CASSANDRA-4803:
---

0007 - this is for rolling upgrade. When you upgrade one node and the other 
nodes don't have describe_splits_ex yet, starting a hadoop job on a newly 
upgraded node fails.

As for 0004 / 0006 fixes - I agree. Let's move them to a separate ticket. 

 CFRR wide row iterators improvements
 

 Key: CASSANDRA-4803
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4803
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.1.0
Reporter: Piotr Kołaczkowski
Assignee: Piotr Kołaczkowski
 Fix For: 1.1.7

 Attachments: 
 0004-Better-token-range-wrap-around-handling-in-CFIF-CFRR.patch, 
 0006-Code-cleanup-refactoring-in-CFRR.-Fixed-bug-with-mis.patch, 
 0007-Fallback-to-describe_splits-in-case-describe_splits_.patch


 {code}
  public float getProgress()
 {
 // TODO this is totally broken for wide rows
 // the progress is likely to be reported slightly off the actual but 
 close enough
 float progress = ((float) iter.rowsRead() / totalRowCount);
 return progress  1.0F ? 1.0F : progress;
 }
 {code}
 The problem is iter.rowsRead() does not return the number of rows read from 
 the wide row iterator, but returns number of *columns* (every row is counted 
 multiple times). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4803) CFRR wide row iterators improvements

2012-11-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13501243#comment-13501243
 ] 

Piotr Kołaczkowski commented on CASSANDRA-4803:
---

BTW: I don't get email notifications from ASF Jira. It is because I registered 
long, long time ago and my email is obsolete. How to change my email to a newer 
one? I can't see an option in the user profile for doing that. 

 CFRR wide row iterators improvements
 

 Key: CASSANDRA-4803
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4803
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.1.0
Reporter: Piotr Kołaczkowski
Assignee: Piotr Kołaczkowski
 Fix For: 1.1.7

 Attachments: 
 0004-Better-token-range-wrap-around-handling-in-CFIF-CFRR.patch, 
 0006-Code-cleanup-refactoring-in-CFRR.-Fixed-bug-with-mis.patch, 
 0007-Fallback-to-describe_splits-in-case-describe_splits_.patch


 {code}
  public float getProgress()
 {
 // TODO this is totally broken for wide rows
 // the progress is likely to be reported slightly off the actual but 
 close enough
 float progress = ((float) iter.rowsRead() / totalRowCount);
 return progress  1.0F ? 1.0F : progress;
 }
 {code}
 The problem is iter.rowsRead() does not return the number of rows read from 
 the wide row iterator, but returns number of *columns* (every row is counted 
 multiple times). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4803) CFRR wide row iterators improvements

2012-11-17 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13499405#comment-13499405
 ] 

Jonathan Ellis commented on CASSANDRA-4803:
---

0006: Can you split out the bug fix and rebase?  Refactoring is fine but let's 
keep it separate from bug fixes.

0007: I'm unclear what this is useful for, anyone running a 1.1 recent enough 
to have this patch, would also have describe_splits_ex, no?

 CFRR wide row iterators improvements
 

 Key: CASSANDRA-4803
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4803
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.1.0
Reporter: Piotr Kołaczkowski
Assignee: Piotr Kołaczkowski
 Fix For: 1.1.7

 Attachments: 
 0004-Better-token-range-wrap-around-handling-in-CFIF-CFRR.patch, 
 0006-Code-cleanup-refactoring-in-CFRR.-Fixed-bug-with-mis.patch, 
 0007-Fallback-to-describe_splits-in-case-describe_splits_.patch


 {code}
  public float getProgress()
 {
 // TODO this is totally broken for wide rows
 // the progress is likely to be reported slightly off the actual but 
 close enough
 float progress = ((float) iter.rowsRead() / totalRowCount);
 return progress  1.0F ? 1.0F : progress;
 }
 {code}
 The problem is iter.rowsRead() does not return the number of rows read from 
 the wide row iterator, but returns number of *columns* (every row is counted 
 multiple times). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4803) CFRR wide row iterators improvements

2012-11-17 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13499404#comment-13499404
 ] 

Jonathan Ellis commented on CASSANDRA-4803:
---

bq. what about virtual nodes in 1.2? Do we insist that split may not span more 
than one contiguous token range?

That's kind of orthogonal to wrapping ranges per se -- you'll still only have a 
single [virtual] node whose range wraps.  So vnodes won't make that worse.  
Moreover, you're still going to need two scans at the disk level since a 
wrapping range won't be contiguous there.  (Currently wrapping ranges are split 
by StorageProxy.getRestrictedRanges but this may change for CASSANDRA-4858.)  
Doing an extra Thrift or CQL query is negligible overhead compared to the 
actual scan.

Finally, getRestrictedRanges *will* split it up into scan-per-vnode which I 
agree is something we should fix but I don't think this patch does it.  As an 
optimization I don't think it's something we should block 1.2.0 for.  Should we 
split this into a separate ticket?

 CFRR wide row iterators improvements
 

 Key: CASSANDRA-4803
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4803
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.1.0
Reporter: Piotr Kołaczkowski
Assignee: Piotr Kołaczkowski
 Fix For: 1.1.7

 Attachments: 
 0004-Better-token-range-wrap-around-handling-in-CFIF-CFRR.patch, 
 0006-Code-cleanup-refactoring-in-CFRR.-Fixed-bug-with-mis.patch, 
 0007-Fallback-to-describe_splits-in-case-describe_splits_.patch


 {code}
  public float getProgress()
 {
 // TODO this is totally broken for wide rows
 // the progress is likely to be reported slightly off the actual but 
 close enough
 float progress = ((float) iter.rowsRead() / totalRowCount);
 return progress  1.0F ? 1.0F : progress;
 }
 {code}
 The problem is iter.rowsRead() does not return the number of rows read from 
 the wide row iterator, but returns number of *columns* (every row is counted 
 multiple times). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4803) CFRR wide row iterators improvements

2012-11-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13490531#comment-13490531
 ] 

Piotr Kołaczkowski commented on CASSANDRA-4803:
---

#04 - what about virtual nodes in 1.2? Do we insist that split may not span 
more than one contiguous token range? It will be harder to avoid too small 
splits. And too small split = bigger task book-keeping overhead.

 CFRR wide row iterators improvements
 

 Key: CASSANDRA-4803
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4803
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.1.0
Reporter: Piotr Kołaczkowski
Assignee: Piotr Kołaczkowski
 Fix For: 1.1.7, 1.2.0

 Attachments: 0001-Wide-row-iterator-counts-rows-not-columns.patch, 
 0002-Fixed-bugs-in-describe_splits.-CFRR-uses-row-counts-.patch, 
 0003-Fixed-get_paged_slice-memtable-and-sstable-column-it.patch, 
 0004-Better-token-range-wrap-around-handling-in-CFIF-CFRR.patch, 
 0005-Fixed-handling-of-start_key-end_token-in-get_range_s.patch, 
 0006-Code-cleanup-refactoring-in-CFRR.-Fixed-bug-with-mis.patch


 {code}
  public float getProgress()
 {
 // TODO this is totally broken for wide rows
 // the progress is likely to be reported slightly off the actual but 
 close enough
 float progress = ((float) iter.rowsRead() / totalRowCount);
 return progress  1.0F ? 1.0F : progress;
 }
 {code}
 The problem is iter.rowsRead() does not return the number of rows read from 
 the wide row iterator, but returns number of *columns* (every row is counted 
 multiple times). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4803) CFRR wide row iterators improvements

2012-11-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13490548#comment-13490548
 ] 

Piotr Kołaczkowski commented on CASSANDRA-4803:
---

Hold on with applying patch 2 for a while. We just discovered it breaks running 
hive queries while doing rolling upgrade. There is a need for falling back to 
old describe_splits method if describe_splits_ex is not found.

 CFRR wide row iterators improvements
 

 Key: CASSANDRA-4803
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4803
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.1.0
Reporter: Piotr Kołaczkowski
Assignee: Piotr Kołaczkowski
 Fix For: 1.1.7, 1.2.0

 Attachments: 0001-Wide-row-iterator-counts-rows-not-columns.patch, 
 0002-Fixed-bugs-in-describe_splits.-CFRR-uses-row-counts-.patch, 
 0003-Fixed-get_paged_slice-memtable-and-sstable-column-it.patch, 
 0004-Better-token-range-wrap-around-handling-in-CFIF-CFRR.patch, 
 0005-Fixed-handling-of-start_key-end_token-in-get_range_s.patch, 
 0006-Code-cleanup-refactoring-in-CFRR.-Fixed-bug-with-mis.patch


 {code}
  public float getProgress()
 {
 // TODO this is totally broken for wide rows
 // the progress is likely to be reported slightly off the actual but 
 close enough
 float progress = ((float) iter.rowsRead() / totalRowCount);
 return progress  1.0F ? 1.0F : progress;
 }
 {code}
 The problem is iter.rowsRead() does not return the number of rows read from 
 the wide row iterator, but returns number of *columns* (every row is counted 
 multiple times). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4803) CFRR wide row iterators improvements

2012-10-24 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13483522#comment-13483522
 ] 

Jonathan Ellis commented on CASSANDRA-4803:
---

Not sure about 04 -- I'm a fan of the simplifications we get from letting CFRR 
only need to deal with non-wrapping splits.

 CFRR wide row iterators improvements
 

 Key: CASSANDRA-4803
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4803
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.1.0
Reporter: Piotr Kołaczkowski
Assignee: Piotr Kołaczkowski
 Fix For: 1.1.7, 1.2.0 beta 2

 Attachments: 0001-Wide-row-iterator-counts-rows-not-columns.patch, 
 0002-Fixed-bugs-in-describe_splits.-CFRR-uses-row-counts-.patch, 
 0003-Fixed-get_paged_slice-memtable-and-sstable-column-it.patch, 
 0004-Better-token-range-wrap-around-handling-in-CFIF-CFRR.patch, 
 0005-Fixed-handling-of-start_key-end_token-in-get_range_s.patch, 
 0006-Code-cleanup-refactoring-in-CFRR.-Fixed-bug-with-mis.patch


 {code}
  public float getProgress()
 {
 // TODO this is totally broken for wide rows
 // the progress is likely to be reported slightly off the actual but 
 close enough
 float progress = ((float) iter.rowsRead() / totalRowCount);
 return progress  1.0F ? 1.0F : progress;
 }
 {code}
 The problem is iter.rowsRead() does not return the number of rows read from 
 the wide row iterator, but returns number of *columns* (every row is counted 
 multiple times). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira