Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12168 )
Change subject: IMPALA-6503: Support reading complex types from ORC format files ...................................................................... Patch Set 12: (12 comments) It's a big change so it's taking a while to get my head around! http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/hdfs-orc-scanner.cc@363 PS9, Line 363: // We only deal with collection columns (ARRAY/MAP) and primitive columns here. > Tim will correct me if I'm wrong, but I guess he was also thinking about th Yep that's the rule - use braces if the whole thing doesn't fit on a single line. http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/hdfs-orc-scanner.cc@198 PS12, Line 198: td:: nit:std:: not needed since we import it into the namespace with common/names.h http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/hdfs-orc-scanner.cc@316 PS12, Line 316: VLOG_FILE << "Add ORC column " << node->getMaximumColumnId() << " for empty tuple " Can we remove these VLOG_FILE statements here and below? It could get quite noisy. I think we could either remove them or make them VLOG(3) since they're mainly for debugging I think? http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/hdfs-orc-scanner.cc@380 PS12, Line 380: list<uint64_t>& selected_type_ids) { Can this argument be const? http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/hdfs-orc-scanner.cc@593 PS12, Line 593: // We're going to free the previous batch. Clear the reference first. How do we know that we're finished processing the previous batch? What happens if we exited the while() loop because row_batch() was at capacity? Don't we lose some rows? http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/hdfs-orc-scanner.cc@638 PS12, Line 638: hat not materializing tuples grammar: "that are not materializing tuples" http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/hdfs-orc-scanner.cc@638 PS12, Line 638: materializtion typo: materialization http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/hdfs-orc-scanner.cc@688 PS12, Line 688: OrcComplexColumnReader& complex_col_reade Pass in complex_col_reader by pointer since it's mutable. http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/orc-column-readers.cc File be/src/exec/orc-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/orc-column-readers.cc@133 PS12, Line 133: VLOG_FILE << "Created reader for " << PrintNode(orc_type) << ": slot_desc_=" VLOG_FILE is also probably too verbose here - VLOG(3) would be more appropriate. http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/orc-column-readers.cc@170 PS12, Line 170: TryAllocate This is minor, but TryAllocateUnaligned() is better for strings - it is marginally more efficient and avoids wasting space. http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/orc-column-readers.cc@219 PS12, Line 219: VLOG_FILE same comment about logging http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/runtime/descriptors.cc@65 PS12, Line 65: const int SchemaPathConstants::ARRAY_ITEM = 0; I think it's better to define the constant value in the header and just instantiate it here, i.e. revert the change in the header and make these like: const int SchemaPathConstants::ARRAY_ITEM; That just means the constant value can be inlined in callers. -- To view, visit http://gerrit.cloudera.org:8080/12168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790 Gerrit-Change-Number: 12168 Gerrit-PatchSet: 12 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 12 Feb 2019 00:08:33 +0000 Gerrit-HasComments: Yes
