[jira] [Commented] (CASSANDRA-4803) CFRR wide row iterators improvements
[ https://issues.apache.org/jira/browse/CASSANDRA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501886#comment-13501886 ]

Jonathan Ellis commented on CASSANDRA-4803:
---

Without trying it, I think it would be TApplicationException.WRONG_METHOD_NAME.

CFRR wide row iterators improvements

Key: CASSANDRA-4803
URL: https://issues.apache.org/jira/browse/CASSANDRA-4803
Project: Cassandra
Issue Type: Bug
Components: Hadoop
Affects Versions: 1.1.0
Reporter: Piotr Kołaczkowski
Assignee: Piotr Kołaczkowski
Fix For: 1.1.7
Attachments: 0004-Better-token-range-wrap-around-handling-in-CFIF-CFRR.patch, 0006-Code-cleanup-refactoring-in-CFRR.-Fixed-bug-with-mis.patch, 0007-Fallback-to-describe_splits-in-case-describe_splits_.patch

{code}
public float getProgress()
{
    // TODO this is totally broken for wide rows
    // the progress is likely to be reported slightly off the actual but close enough
    float progress = ((float) iter.rowsRead() / totalRowCount);
    return progress > 1.0F ? 1.0F : progress;
}
{code}

The problem is that iter.rowsRead() does not return the number of rows read from the wide row iterator; it returns the number of *columns* (every row is counted multiple times).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
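The row-versus-column miscount described in the issue can be illustrated with a small, self-contained sketch. It is not the attached patch: the class WideRowProgressSketch and its methods onColumn, rowsRead and columnsRead are hypothetical names, and it assumes that all columns of one row are delivered contiguously, so a change of row key marks a new row.

```java
/** Hypothetical sketch: track rows separately from columns while paging a wide row. */
final class WideRowProgressSketch {
    private final long totalRowCount; // estimated number of rows in the split
    private String lastRowKey;        // key of the row currently being read
    private long rowsRead;            // distinct rows seen so far
    private long columnsRead;         // columns seen so far (what the buggy counter reported)

    WideRowProgressSketch(long totalRowCount) {
        this.totalRowCount = totalRowCount;
    }

    /** Call once for every column handed out by the wide row iterator. */
    void onColumn(String rowKey) {
        columnsRead++;
        // Assumption: columns of a row arrive back to back, so a new key means a new row.
        if (!rowKey.equals(lastRowKey)) {
            rowsRead++;
            lastRowKey = rowKey;
        }
    }

    long rowsRead() { return rowsRead; }

    long columnsRead() { return columnsRead; }

    float getProgress() {
        if (totalRowCount <= 0) {
            return 0.0F; // sketch choice: report no progress when the estimate is unusable
        }
        float progress = (float) rowsRead / totalRowCount;
        return progress > 1.0F ? 1.0F : progress;
    }

    public static void main(String[] args) {
        WideRowProgressSketch p = new WideRowProgressSketch(4);
        for (String key : new String[] {"a", "a", "a", "b"}) {
            p.onColumn(key);
        }
        System.out.println("columns=" + p.columnsRead() + " rows=" + p.rowsRead()
                + " progress=" + p.getProgress());
    }
}
```

Dividing the column count by totalRowCount would report the split as finished after four columns here, although only two of the four rows have been read.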
[jira] [Commented] (CASSANDRA-4803) CFRR wide row iterators improvements
[ https://issues.apache.org/jira/browse/CASSANDRA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501885#comment-13501885 ]

Jonathan Ellis commented on CASSANDRA-4803:
---

Is there a more specific exception we can catch besides TException?
[jira] [Commented] (CASSANDRA-4803) CFRR wide row iterators improvements
[ https://issues.apache.org/jira/browse/CASSANDRA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502291#comment-13502291 ]

Piotr Kołaczkowski commented on CASSANDRA-4803:
---

[~jbellis] Right, I'll change it.
[jira] [Commented] (CASSANDRA-4803) CFRR wide row iterators improvements
[ https://issues.apache.org/jira/browse/CASSANDRA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501242#comment-13501242 ]

Piotr Kołaczkowski commented on CASSANDRA-4803:
---

0007: this is for rolling upgrades. When you upgrade one node and the other nodes don't have describe_splits_ex yet, starting a Hadoop job on the newly upgraded node fails.

As for the 0004 / 0006 fixes: I agree. Let's move them to a separate ticket.
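The rolling-upgrade fallback discussed in this thread can be sketched as follows. This is only an illustration under stated assumptions, not the 0007 patch: RpcException stands in for Thrift's TApplicationException so that the example compiles without libthrift, SplitClient is a hypothetical two-method view of the client, and MISSING_METHOD is a placeholder type code. Which TApplicationException type a server really reports for an unknown method should be verified against a live node.

```java
import java.util.Arrays;
import java.util.List;

final class SplitFallbackSketch {
    /** Stand-in for Thrift's TApplicationException; only the type code matters here. */
    static final class RpcException extends Exception {
        // Placeholder code: check which type your server returns for a missing method.
        static final int MISSING_METHOD = 1;
        final int type;

        RpcException(int type, String message) {
            super(message);
            this.type = type;
        }
    }

    /** Hypothetical minimal client view: the newer call and the legacy call. */
    interface SplitClient {
        List<String> describeSplitsEx(String cf, String start, String end, int keysPerSplit)
                throws RpcException;

        List<String> describeSplits(String cf, String start, String end, int keysPerSplit)
                throws RpcException;
    }

    /** Try the newer call first; fall back only when the node does not know the method. */
    static List<String> splitsWithFallback(SplitClient client, String cf, String start,
                                           String end, int keysPerSplit) throws RpcException {
        try {
            return client.describeSplitsEx(cf, start, end, keysPerSplit);
        } catch (RpcException e) {
            if (e.type != RpcException.MISSING_METHOD) {
                throw e; // any other failure is a real error, so do not mask it
            }
            return client.describeSplits(cf, start, end, keysPerSplit);
        }
    }

    public static void main(String[] args) throws RpcException {
        // Simulates a node that has not been upgraded yet.
        SplitClient oldNode = new SplitClient() {
            public List<String> describeSplitsEx(String cf, String s, String e, int n)
                    throws RpcException {
                throw new RpcException(RpcException.MISSING_METHOD, "unknown method");
            }

            public List<String> describeSplits(String cf, String s, String e, int n) {
                return Arrays.asList("0", "100");
            }
        };
        System.out.println(splitsWithFallback(oldNode, "cf", "0", "100", 1000));
    }
}
```

Checking the type code rather than catching every exception keeps genuine failures (timeouts, unavailable nodes) visible instead of silently retrying them against the legacy method.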
[jira] [Commented] (CASSANDRA-4803) CFRR wide row iterators improvements
[ https://issues.apache.org/jira/browse/CASSANDRA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501243#comment-13501243 ]

Piotr Kołaczkowski commented on CASSANDRA-4803:
---

BTW: I don't get email notifications from ASF Jira. This is because I registered a long, long time ago and my email address is obsolete. How can I change my email to a newer one? I can't see an option for that in the user profile.
[jira] [Commented] (CASSANDRA-4803) CFRR wide row iterators improvements
[ https://issues.apache.org/jira/browse/CASSANDRA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499405#comment-13499405 ]

Jonathan Ellis commented on CASSANDRA-4803:
---

0006: Can you split out the bug fix and rebase? Refactoring is fine but let's keep it separate from bug fixes.

0007: I'm unclear what this is useful for; anyone running a 1.1 recent enough to have this patch would also have describe_splits_ex, no?
[jira] [Commented] (CASSANDRA-4803) CFRR wide row iterators improvements
[ https://issues.apache.org/jira/browse/CASSANDRA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499404#comment-13499404 ]

Jonathan Ellis commented on CASSANDRA-4803:
---

bq. what about virtual nodes in 1.2? Do we insist that a split may not span more than one contiguous token range?

That's kind of orthogonal to wrapping ranges per se -- you'll still only have a single [virtual] node whose range wraps. So vnodes won't make that worse.

Moreover, you're still going to need two scans at the disk level since a wrapping range won't be contiguous there. (Currently wrapping ranges are split by StorageProxy.getRestrictedRanges but this may change for CASSANDRA-4858.) Doing an extra Thrift or CQL query is negligible overhead compared to the actual scan.

Finally, getRestrictedRanges *will* split it up into scan-per-vnode, which I agree is something we should fix, but I don't think this patch does it. As an optimization, I don't think it's something we should block 1.2.0 for. Should we split this into a separate ticket?
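The point about a wrapping range needing two scans can be illustrated with a short sketch. It is a simplification under loud assumptions and does not reproduce StorageProxy.getRestrictedRanges: tokens are plain long values, a range is the half-open interval (start, end], MIN_TOKEN is a hypothetical minimum token, and MIN_TOKEN used as an end bound is taken to mean "up to the end of the ring".

```java
import java.util.ArrayList;
import java.util.List;

final class WrapAroundSketch {
    /** Hypothetical minimum token; as an end bound it means "up to the end of the ring". */
    static final long MIN_TOKEN = 0L;

    /** Token range (start, end]; tokens are simplified to long for illustration. */
    static final class TokenRange {
        final long start;
        final long end;

        TokenRange(long start, long end) {
            this.start = start;
            this.end = end;
        }

        /** A range wraps when it does not end after its start on the ring. */
        boolean wraps() {
            return start >= end;
        }

        @Override
        public String toString() {
            return "(" + start + ", " + end + "]";
        }
    }

    /** Split a wrapping range into at most two ranges that can each be scanned contiguously. */
    static List<TokenRange> unwrap(TokenRange r) {
        List<TokenRange> out = new ArrayList<TokenRange>(2);
        if (!r.wraps() || r.end == MIN_TOKEN) {
            out.add(r); // already contiguous: a plain range, or one that runs to the ring's end
            return out;
        }
        out.add(new TokenRange(r.start, MIN_TOKEN)); // from start to the end of the ring
        out.add(new TokenRange(MIN_TOKEN, r.end));   // from the beginning of the ring to end
        return out;
    }

    public static void main(String[] args) {
        System.out.println(unwrap(new TokenRange(100L, 20L)));
        System.out.println(unwrap(new TokenRange(10L, 20L)));
    }
}
```

Under these assumptions a record reader only ever sees non-wrapping ranges, at the cost of one extra query for the single range that crosses the ring boundary.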
[jira] [Commented] (CASSANDRA-4803) CFRR wide row iterators improvements
[ https://issues.apache.org/jira/browse/CASSANDRA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490531#comment-13490531 ]

Piotr Kołaczkowski commented on CASSANDRA-4803:
---

#04 - what about virtual nodes in 1.2? Do we insist that a split may not span more than one contiguous token range? That would make it harder to avoid splits that are too small, and splits that are too small mean more task bookkeeping overhead.

CFRR wide row iterators improvements

Key: CASSANDRA-4803
URL: https://issues.apache.org/jira/browse/CASSANDRA-4803
Project: Cassandra
Issue Type: Bug
Components: Hadoop
Affects Versions: 1.1.0
Reporter: Piotr Kołaczkowski
Assignee: Piotr Kołaczkowski
Fix For: 1.1.7, 1.2.0
Attachments: 0001-Wide-row-iterator-counts-rows-not-columns.patch, 0002-Fixed-bugs-in-describe_splits.-CFRR-uses-row-counts-.patch, 0003-Fixed-get_paged_slice-memtable-and-sstable-column-it.patch, 0004-Better-token-range-wrap-around-handling-in-CFIF-CFRR.patch, 0005-Fixed-handling-of-start_key-end_token-in-get_range_s.patch, 0006-Code-cleanup-refactoring-in-CFRR.-Fixed-bug-with-mis.patch
[jira] [Commented] (CASSANDRA-4803) CFRR wide row iterators improvements
[ https://issues.apache.org/jira/browse/CASSANDRA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490548#comment-13490548 ]

Piotr Kołaczkowski commented on CASSANDRA-4803:
---

Please hold off on applying patch 2 for a while. We just discovered that it breaks running Hive queries during a rolling upgrade. We need to fall back to the old describe_splits method if describe_splits_ex is not found.
[jira] [Commented] (CASSANDRA-4803) CFRR wide row iterators improvements
[ https://issues.apache.org/jira/browse/CASSANDRA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483522#comment-13483522 ]

Jonathan Ellis commented on CASSANDRA-4803:
---

Not sure about 04 -- I'm a fan of the simplifications we get from letting CFRR only need to deal with non-wrapping splits.

CFRR wide row iterators improvements

Key: CASSANDRA-4803
URL: https://issues.apache.org/jira/browse/CASSANDRA-4803
Project: Cassandra
Issue Type: Bug
Components: Hadoop
Affects Versions: 1.1.0
Reporter: Piotr Kołaczkowski
Assignee: Piotr Kołaczkowski
Fix For: 1.1.7, 1.2.0 beta 2
Attachments: 0001-Wide-row-iterator-counts-rows-not-columns.patch, 0002-Fixed-bugs-in-describe_splits.-CFRR-uses-row-counts-.patch, 0003-Fixed-get_paged_slice-memtable-and-sstable-column-it.patch, 0004-Better-token-range-wrap-around-handling-in-CFIF-CFRR.patch, 0005-Fixed-handling-of-start_key-end_token-in-get_range_s.patch, 0006-Code-cleanup-refactoring-in-CFRR.-Fixed-bug-with-mis.patch