Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9134 )

Change subject: IMPALA-5717: Support for reading ORC data files
......................................................................


Patch Set 5:

(4 comments)

I've refactored the previous patch. Main changes include
* shrink the size of ORC test files
* refactor nested types test
* refactor types check and add tests for type conversions
* refactor exceptions
* add support for Char(N), Varchar(N) and add tests
* correct codec for orc tests (orc/none -> orc/def/block)

http://gerrit.cloudera.org:8080/#/c/9134/5/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/9134/5/be/src/exec/hdfs-orc-scanner.cc@567
PS5, Line 567: // TODO: combine this with the Parquet implementation
> Can we do this as part of this patch? I'd prefer to avoid adding duplicate
Not quite familiar with the LLVM Codegen codes. I've simply tried extracting 
these two functions and parquet scanner's into hdfs-scanner. However, the 
impalad cannot launch after that... The failure is

 I0321 06:25:05.327670  1320 status.cc:125] Failed to find function
 _ZN6impala18HdfsParquetScanner17EvalRuntimeFilterEiPNS_8TupleRowE
    @          0x16b63bd  impala::Status::Status()
    @          0x1bd20b1  impala::LlvmCodeGen::InitializeLlvm()
    @          0x19c46e6  ImpaladMain()
    @          0x1658c10  main
    @     0x7fa14a578f45  __libc_start_main
    @          0x1658a81  (unknown)
 F0321 06:25:05.327700  1320 llvm-codegen.cc:448] Check failed: 
execution_engine_ == nullptr Must Close() before destruction

Looks like Codegen functions are not initialized correctly. Maybe codegen 
logics like HdfsParquetScanner::CodegenEvalRuntimeFilters should be extracted 
too. Is it trivial to fix this? If we extract the EvalRuntimeFilter codes into 
hdfs-scanner, does that mean we can support RuntimeFilter in all file formats? 
If so we could create a JIRA for this complicate refactoring.


http://gerrit.cloudera.org:8080/#/c/9134/5/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/9134/5/be/src/exec/hdfs-scanner.h@326
PS5, Line 326:   /// Size of the file footer.  This is a guess.  If this value 
is too little, we will
> Mention ORC and Parquet? E.g. "Size of the file footer for ORC and Parquet"
Done


http://gerrit.cloudera.org:8080/#/c/9134/5/testdata/LineItemMultiBlock/lineitem_orc_multiblock_one_stripe.orc
File testdata/LineItemMultiBlock/lineitem_orc_multiblock_one_stripe.orc:

PS5:
> These files are pretty big - they'll increase the size of the Impala repo q
Done


http://gerrit.cloudera.org:8080/#/c/9134/5/testdata/bin/load_nested.py
File testdata/bin/load_nested.py:

http://gerrit.cloudera.org:8080/#/c/9134/5/testdata/bin/load_nested.py@297
PS5, Line 297: def load_orc():
> That's really helpful! Thanks, Joe!
Done



--
To view, visit http://gerrit.cloudera.org:8080/9134
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4
Gerrit-Change-Number: 9134
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Wed, 21 Mar 2018 14:12:54 +0000
Gerrit-HasComments: Yes

Reply via email to