Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12168 )

Change subject: IMPALA-6503: Support reading complex types from ORC format files
......................................................................


Patch Set 12:

(12 comments)

It's a big change so it's taking a while to get my head around!

http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/hdfs-orc-scanner.cc@363
PS9, Line 363:     // We only deal with collection columns (ARRAY/MAP) and 
primitive columns here.
> Tim will correct me if I'm wrong, but I guess he was also thinking about th
Yep that's the rule - use braces if the whole thing doesn't fit on a single 
line.


http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/hdfs-orc-scanner.cc@198
PS12, Line 198: td::
nit:std:: not needed since we import it into the namespace with common/names.h


http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/hdfs-orc-scanner.cc@316
PS12, Line 316:     VLOG_FILE << "Add ORC column " << 
node->getMaximumColumnId() << " for empty tuple "
Can we remove these VLOG_FILE statements here and below? It could get quite 
noisy. I think we could either remove them or make them VLOG(3) since they're 
mainly for debugging I think?


http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/hdfs-orc-scanner.cc@380
PS12, Line 380:     list<uint64_t>& selected_type_ids) {
Can this argument be const?


http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/hdfs-orc-scanner.cc@593
PS12, Line 593:   // We're going to free the previous batch. Clear the 
reference first.
How do we know that we're finished processing the previous batch? What happens 
if we exited the while() loop because row_batch() was at capacity? Don't we 
lose some rows?


http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/hdfs-orc-scanner.cc@638
PS12, Line 638: hat not materializing tuples
grammar: "that are not materializing tuples"


http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/hdfs-orc-scanner.cc@638
PS12, Line 638: materializtion
typo: materialization


http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/hdfs-orc-scanner.cc@688
PS12, Line 688: OrcComplexColumnReader& complex_col_reade
Pass in complex_col_reader by pointer since it's mutable.


http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/orc-column-readers.cc@133
PS12, Line 133:   VLOG_FILE << "Created reader for " << PrintNode(orc_type) << 
": slot_desc_="
VLOG_FILE is also probably too verbose here - VLOG(3) would be more appropriate.


http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/orc-column-readers.cc@170
PS12, Line 170: TryAllocate
This is minor, but TryAllocateUnaligned() is better for strings - it is 
marginally more efficient and avoids wasting space.


http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/exec/orc-column-readers.cc@219
PS12, Line 219: VLOG_FILE
same comment about logging


http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

http://gerrit.cloudera.org:8080/#/c/12168/12/be/src/runtime/descriptors.cc@65
PS12, Line 65: const int SchemaPathConstants::ARRAY_ITEM = 0;
I think it's better to define the constant value in the header and just 
instantiate it here, i.e. revert the change in the header and make these like:

  const int SchemaPathConstants::ARRAY_ITEM;

That just means the constant value can be inlined in callers.



--
To view, visit http://gerrit.cloudera.org:8080/12168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790
Gerrit-Change-Number: 12168
Gerrit-PatchSet: 12
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 12 Feb 2019 00:08:33 +0000
Gerrit-HasComments: Yes

Reply via email to