Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9134 )

Change subject: IMPALA-5717: Support for reading ORC data files
......................................................................


Patch Set 5:

(1 comment)

Thanks for your detailed comments, Tim! Wrapping the Status in the exception is 
quite a good idea! I'm working on bug fixes for the ORC library these days, so 
please expect slow updates from me.
I'll refactor the code and add support and tests for VARCHAR and CHAR.

http://gerrit.cloudera.org:8080/#/c/9134/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9134/5//COMMIT_MSG@19
PS5, Line 19:  - Most of the end-to-end tests can run on ORC format.
> Can you enable orc in test_scanners_fuzz.py and run that in a loop for a wh
Sorry, I should have mentioned this in the commit message. I enabled 
test_scanners_fuzz.py in my local branch and found many bugs in the ORC lib 
when reading corrupt files, namely ORC-311, ORC-312, ORC-313, ORC-314, ORC-317, 
and ORC-319. Some of my PRs have been merged and some are under review. The 
stack trace you pasted is due to ORC-313.

The ORC reader in version 1.4.3-release is not robust enough for 
test_scanners_fuzz.py, so I ended up disabling this test. I think that's OK for 
now since the random corruption test is too strict. We can enable it when ORC 
releases a new version.

At Hulu, we've had Impala-ORC (impala-2.5 on ORC-1.2.3) deployed in production 
for more than half a year. Luckily, we haven't encountered corrupt files that 
crash Impala. So I think we can compromise on this until a new ORC version is 
released.



--
To view, visit http://gerrit.cloudera.org:8080/9134
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4
Gerrit-Change-Number: 9134
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 15 Mar 2018 12:20:55 +0000
Gerrit-HasComments: Yes
