Lars Volker has posted comments on this change. Change subject: IMPALA-5210: Count rows and collection items in parquet scanner separately ......................................................................
Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/7776/2/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 964: auto update_read_count = MakeScopeExitTrigger([&] { > Done. If RETURN_IF_ERROR or num_tuples_mismatch branch is taken the value o Yeah, it changes the semantics to something like "Number of successfully committed items/rows" but I think that's ok, given it keeps the code simpler. PS2, Line 978: continue_execution > Reverted to original code. But is it good practice to add a dead initializa You're right, I think it's better to move it into the loop here. If someone tries to use it without initializing it, the compiler would issue a warning, too. Apologies for the back and forth. http://gerrit.cloudera.org:8080/#/c/7776/3/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 1008: num_rows_read += scratch_batch_->num_tuples; You could use COUNTER_SET here and get rid of num_rows_read altogether. PS3, Line 1009: int num_r How about removing the initialization at the beginning of the method and using COUNTER_SET here instead? This way it is also more in line with the variable comment in the .h file. http://gerrit.cloudera.org:8080/#/c/7776/3/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: Line 470: /// Number of scanners that end up doing no reads because their splits don't overlap Can you describe the scope of this in the comment, e.g. "Total number of collection items read by this scanner." If you switch the scope to the scanner's lifetime you should also initialize it. PS3, Line 551: d of the collection has coll_items_read_counter_. http://gerrit.cloudera.org:8080/#/c/7776/2/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: PS2, Line 242: /*num_coll_item_read*/ > The parameter doesn't exist now. Can we add that to the style guide wiki pa Sure, feel free to add it here: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536 I think it could also pass as "consistency with the surrounding code", though that's harder to spot. http://gerrit.cloudera.org:8080/#/c/7776/3/be/src/exec/scan-node.h File be/src/exec/scan-node.h: Line 153: /// # collection items read from the scanner Can you explain here that this is across nested collections, so that A = {B, C} would actually result in three items being read? -- To view, visit http://gerrit.cloudera.org:8080/7776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7f6efddaea18507482940f5bdab7326b6482b067 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi Wang <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Tianyi Wang <[email protected]> Gerrit-HasComments: Yes
