Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11395 )
Change subject: KUDU-686 (part 2/2): use DeltaPreparer in DeltaFileIterator ...................................................................... Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/11395/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11395/2//COMMIT_MSG@7 PS2, Line 7: use DeltaPreparer > Can you talk more about how this changes the logic? Or do you view this as It's a refactor at heart, though it (obviously) affects the division of labor between PrepareBatch() and ApplyUpdates(). I mentioned that below where I talk about flame graphs. At the end of the day the DeltaFileIterator still does the same thing it always did: read deltas from a file, decode them, and apply the appropriate ones during a scan. It just does it more efficiently. I actually don't want to mention diff scans in these two patches because they stand on their own merits: they improve scan performance when projecting multiple columns by reducing CPU load incurred via unnecessary delta decoding. Bringing diff scans into this will just muddy the waters. Instead, I'll tie back to these patches in the diff scan patch series. http://gerrit.cloudera.org:8080/#/c/11395/2//COMMIT_MSG@9 PS2, Line 9: leverage DeltaPreparer > why? just code reuse? get rid of Todd's TODO in the code? See the JIRA and below: this refactor improves performance by getting rid of the decoding-heavy approach taken by the "visitor" pattern previously used by DeltaFileIterator. http://gerrit.cloudera.org:8080/#/c/11395/2//COMMIT_MSG@25 PS2, Line 25: entralize as > Can you be more specific about what we think these improvements are? The rest of the sentence explains: it does away with a bunch of unnecessary delta decoding. There's more information in the JIRA. I'll add some more color here but the commit message is already quite detailed (by my standards) and in the interest of brevity I don't want to duplicate too much from the JIRA. http://gerrit.cloudera.org:8080/#/c/11395/2/src/kudu/tablet/delta_store.h File src/kudu/tablet/delta_store.h: http://gerrit.cloudera.org:8080/#/c/11395/2/src/kudu/tablet/delta_store.h@258 PS2, Line 258: re irrelevan > nit: doc this parameter Done http://gerrit.cloudera.org:8080/#/c/11395/2/src/kudu/tablet/delta_store.cc File src/kudu/tablet/delta_store.cc: http://gerrit.cloudera.org:8080/#/c/11395/2/src/kudu/tablet/delta_store.cc@57 PS2, Line 57: // Returns whether a mutation at 'ts' is relevant under 'snap'. > Mind documenting the interface to this helper? It's not obvious what the me Done http://gerrit.cloudera.org:8080/#/c/11395/2/src/kudu/tablet/deltamemstore.cc File src/kudu/tablet/deltamemstore.cc: http://gerrit.cloudera.org:8080/#/c/11395/2/src/kudu/tablet/deltamemstore.cc@248 PS2, Line 248: // TODO(adar): is this correct? > Mind adding a comment with the thought process behind this bookkeeping / cl Not sure what you mean. This isn't bookkeeping or cleanup; it's an optimization to skip any additional deltas for this row once we've established that we're done with the row. I added the TODO because this seemed like a brain-dead optimization that should have been done previously, so I was wondering if perhaps it's an incorrect thing to do. See https://gerrit.cloudera.org/c/11394/2/src/kudu/tablet/deltamemstore.cc#b268 for the old code here. -- To view, visit http://gerrit.cloudera.org:8080/11395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I87de52092262c4b42c1bd5107f9139edfc3888b5 Gerrit-Change-Number: 11395 Gerrit-PatchSet: 3 Gerrit-Owner: Adar Dembo <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Wed, 12 Sep 2018 19:40:18 +0000 Gerrit-HasComments: Yes
