Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12168 )

Change subject: IMPALA-6503: Support reading complex types from ORC
......................................................................


Patch Set 25:

(2 comments)

Todd, I just added some more replies. Do you have time to keep reviewing these 
codes? If so, I'd like to create a JIRA and a patch for refactoring the 
comments and addressing some minor changes (add DCHECKs, resolve "nit"s).

http://gerrit.cloudera.org:8080/#/c/12168/18/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/12168/18/be/src/exec/orc-column-readers.cc@46
PS18, Line 46:     switch (slot_desc->type().type) {
> Yes. There's a reason why we switch on Impala slot type here. Will sumarize
About why we switch on impala slot type here but switch on ORC type above for 
complex types:
- An OrcColumnReader should be the same type corresponding to the ORC node in 
its constructor. So it's reasonable to switch on ORC type above.
- For primitive types, it's been verified in OrcSchemaResolver::ResolveColumn 
that the ORC type can be converted to the slot type.
- Some column readers are template classes (e.g. OrcIntColumnReader). The 
template variable depends on the slot type. If we switch on the ORC type, we 
should still check the slot type to create the reader. So codes will be very 
ugly and verbose like:


switch (node->getKind()) {
  ...
  case orc::TypeKind::BYTE:
    if (slot_desc->type().type == TYPE_TINYINT) {
      reader = new OrcIntColumnReader<int8_t>(node, slot_desc, scanner);
    } else if (slot_desc->type().type == TYPE_SMALLINT) {
      reader = new OrcIntColumnReader<int16_t>(node, slot_desc, scanner);
    } else if (slot_desc->type().type == TYPE_INT) {
      reader = new OrcIntColumnReader<int32_t>(node, slot_desc, scanner);
    } else if (slot_desc->type().type == TYPE_BIGINT) {
      reader = new OrcIntColumnReader<int64_t>(node, slot_desc, scanner);
    } else {
        DCHECK(false);
    }
    break;
  case orc::TypeKind::SHORT:
    if (slot_desc->type().type == TYPE_SMALLINT) {
      reader = new OrcIntColumnReader<int16_t>(node, slot_desc, scanner);
    } else if (slot_desc->type().type == TYPE_INT) {
      reader = new OrcIntColumnReader<int32_t>(node, slot_desc, scanner);
    } else if (slot_desc->type().type == TYPE_BIGINT) {
      reader = new OrcIntColumnReader<int64_t>(node, slot_desc, scanner);
    } else {
        DCHECK(false);
    }
    break;
  case orc::TypeKind::INT:
    if (slot_desc->type().type == TYPE_INT) {
      reader = new OrcIntColumnReader<int32_t>(node, slot_desc, scanner);
    } else if (slot_desc->type().type == TYPE_BIGINT) {
      reader = new OrcIntColumnReader<int64_t>(node, slot_desc, scanner);
    } else {
        DCHECK(false);
    }
    break;
  case orc::TypeKind::LONG:
    if (slot_desc->type().type == TYPE_BIGINT) {
      reader = new OrcIntColumnReader<int64_t>(node, slot_desc, scanner);
    } else {
        DCHECK(false);
    }
    break;
  case ...
}

Hence, we switch on impala slot type here.


http://gerrit.cloudera.org:8080/#/c/12168/18/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/12168/18/be/src/exec/orc-metadata-utils.cc@27
PS18, Line 27:   SchemaPath path;
             :   paths->push_back(path);
             :   DCHECK_EQ(root.getKind(), orc::TypeKind::STRUCT);
             :   int num_columns = root.getSubtypeCount();
             :   for (int i = 0; i < num_columns; ++i) {
             :     path.push_back(i + num_partition_keys);
             :     BuildSchemaPath(*root.getSubtype(i), &path, paths);
             :     path.pop_back();
             :   }
> it seems like this code is the same handling as the 'STRUCT' case below in
No, they're different. Here we're dealing with table level columns. The column 
indices in this level should add on 'num_partition_keys'. For columns on other 
levels, the column index always starts at 0.

I'd like to not refactor them into the same code path, so readers can 
distinguish these two cases easily. Maybe I should add some more comments :)



--
To view, visit http://gerrit.cloudera.org:8080/12168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790
Gerrit-Change-Number: 12168
Gerrit-PatchSet: 25
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 11 Mar 2019 03:43:14 +0000
Gerrit-HasComments: Yes

Reply via email to