Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/12168 )
Change subject: IMPALA-6503: Support reading complex types from ORC ...................................................................... Patch Set 25: (2 comments) Todd, I just added some more replies. Do you have time to keep reviewing these codes? If so, I'd like to create a JIRA and a patch for refactoring the comments and addressing some minor changes (add DCHECKs, resolve "nit"s). http://gerrit.cloudera.org:8080/#/c/12168/18/be/src/exec/orc-column-readers.cc File be/src/exec/orc-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/12168/18/be/src/exec/orc-column-readers.cc@46 PS18, Line 46: switch (slot_desc->type().type) { > Yes. There's a reason why we switch on Impala slot type here. Will sumarize About why we switch on impala slot type here but switch on ORC type above for complex types: - An OrcColumnReader should be the same type corresponding to the ORC node in its constructor. So it's reasonable to switch on ORC type above. - For primitive types, it's been verified in OrcSchemaResolver::ResolveColumn that the ORC type can be converted to the slot type. - Some column readers are template classes (e.g. OrcIntColumnReader). The template variable depends on the slot type. If we switch on the ORC type, we should still check the slot type to create the reader. So codes will be very ugly and verbose like: switch (node->getKind()) { ... case orc::TypeKind::BYTE: if (slot_desc->type().type == TYPE_TINYINT) { reader = new OrcIntColumnReader<int8_t>(node, slot_desc, scanner); } else if (slot_desc->type().type == TYPE_SMALLINT) { reader = new OrcIntColumnReader<int16_t>(node, slot_desc, scanner); } else if (slot_desc->type().type == TYPE_INT) { reader = new OrcIntColumnReader<int32_t>(node, slot_desc, scanner); } else if (slot_desc->type().type == TYPE_BIGINT) { reader = new OrcIntColumnReader<int64_t>(node, slot_desc, scanner); } else { DCHECK(false); } break; case orc::TypeKind::SHORT: if (slot_desc->type().type == TYPE_SMALLINT) { reader = new OrcIntColumnReader<int16_t>(node, slot_desc, scanner); } else if (slot_desc->type().type == TYPE_INT) { reader = new OrcIntColumnReader<int32_t>(node, slot_desc, scanner); } else if (slot_desc->type().type == TYPE_BIGINT) { reader = new OrcIntColumnReader<int64_t>(node, slot_desc, scanner); } else { DCHECK(false); } break; case orc::TypeKind::INT: if (slot_desc->type().type == TYPE_INT) { reader = new OrcIntColumnReader<int32_t>(node, slot_desc, scanner); } else if (slot_desc->type().type == TYPE_BIGINT) { reader = new OrcIntColumnReader<int64_t>(node, slot_desc, scanner); } else { DCHECK(false); } break; case orc::TypeKind::LONG: if (slot_desc->type().type == TYPE_BIGINT) { reader = new OrcIntColumnReader<int64_t>(node, slot_desc, scanner); } else { DCHECK(false); } break; case ... } Hence, we switch on impala slot type here. http://gerrit.cloudera.org:8080/#/c/12168/18/be/src/exec/orc-metadata-utils.cc File be/src/exec/orc-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/12168/18/be/src/exec/orc-metadata-utils.cc@27 PS18, Line 27: SchemaPath path; : paths->push_back(path); : DCHECK_EQ(root.getKind(), orc::TypeKind::STRUCT); : int num_columns = root.getSubtypeCount(); : for (int i = 0; i < num_columns; ++i) { : path.push_back(i + num_partition_keys); : BuildSchemaPath(*root.getSubtype(i), &path, paths); : path.pop_back(); : } > it seems like this code is the same handling as the 'STRUCT' case below in No, they're different. Here we're dealing with table level columns. The column indices in this level should add on 'num_partition_keys'. For columns on other levels, the column index always starts at 0. I'd like to not refactor them into the same code path, so readers can distinguish these two cases easily. Maybe I should add some more comments :) -- To view, visit http://gerrit.cloudera.org:8080/12168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790 Gerrit-Change-Number: 12168 Gerrit-PatchSet: 25 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Mon, 11 Mar 2019 03:43:14 +0000 Gerrit-HasComments: Yes
