Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/9134 )
Change subject: IMPALA-5717: Support for reading ORC data files ...................................................................... Patch Set 5: (4 comments) I've refactored the previous patch. Main changes include * shrink the size of ORC test files * refactor nested types test * refactor types check and add tests for type conversions * refactor exceptions * add support for Char(N), Varchar(N) and add tests * correct codec for orc tests (orc/none -> orc/def/block) http://gerrit.cloudera.org:8080/#/c/9134/5/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/9134/5/be/src/exec/hdfs-orc-scanner.cc@567 PS5, Line 567: // TODO: combine this with the Parquet implementation > Can we do this as part of this patch? I'd prefer to avoid adding duplicate Not quite familiar with the LLVM Codegen codes. I've simply tried extracting these two functions and parquet scanner's into hdfs-scanner. However, the impalad cannot launch after that... The failure is I0321 06:25:05.327670 1320 status.cc:125] Failed to find function _ZN6impala18HdfsParquetScanner17EvalRuntimeFilterEiPNS_8TupleRowE @ 0x16b63bd impala::Status::Status() @ 0x1bd20b1 impala::LlvmCodeGen::InitializeLlvm() @ 0x19c46e6 ImpaladMain() @ 0x1658c10 main @ 0x7fa14a578f45 __libc_start_main @ 0x1658a81 (unknown) F0321 06:25:05.327700 1320 llvm-codegen.cc:448] Check failed: execution_engine_ == nullptr Must Close() before destruction Looks like Codegen functions are not initialized correctly. Maybe codegen logics like HdfsParquetScanner::CodegenEvalRuntimeFilters should be extracted too. Is it trivial to fix this? If we extract the EvalRuntimeFilter codes into hdfs-scanner, does that mean we can support RuntimeFilter in all file formats? If so we could create a JIRA for this complicate refactoring. http://gerrit.cloudera.org:8080/#/c/9134/5/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: http://gerrit.cloudera.org:8080/#/c/9134/5/be/src/exec/hdfs-scanner.h@326 PS5, Line 326: /// Size of the file footer. This is a guess. If this value is too little, we will > Mention ORC and Parquet? E.g. "Size of the file footer for ORC and Parquet" Done http://gerrit.cloudera.org:8080/#/c/9134/5/testdata/LineItemMultiBlock/lineitem_orc_multiblock_one_stripe.orc File testdata/LineItemMultiBlock/lineitem_orc_multiblock_one_stripe.orc: PS5: > These files are pretty big - they'll increase the size of the Impala repo q Done http://gerrit.cloudera.org:8080/#/c/9134/5/testdata/bin/load_nested.py File testdata/bin/load_nested.py: http://gerrit.cloudera.org:8080/#/c/9134/5/testdata/bin/load_nested.py@297 PS5, Line 297: def load_orc(): > That's really helpful! Thanks, Joe! Done -- To view, visit http://gerrit.cloudera.org:8080/9134 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4 Gerrit-Change-Number: 9134 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Wed, 21 Mar 2018 14:12:54 +0000 Gerrit-HasComments: Yes
