Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/9134 )
Change subject: IMPALA-5717: Support for reading ORC data files
......................................................................

Patch Set 5:

(1 comment)

Thanks for your detailed comments, Tim! Wrapping the Status in the exception is a good idea. I'm working on bug fixes for the ORC library these days, so please expect slow updates from me. I'll refactor the code and add support and tests for VARCHAR and CHAR.

http://gerrit.cloudera.org:8080/#/c/9134/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9134/5//COMMIT_MSG@19
PS5, Line 19: - Most of the end-to-end tests can run on ORC format.
> Can you enable orc in test_scanners_fuzz.py and run that in a loop for a wh

Sorry, I should have mentioned this in the commit message. I enabled test_scanners_fuzz.py in my local branch and found many bugs in the ORC library when reading corrupt files: ORC-311, ORC-312, ORC-313, ORC-314, ORC-317, and ORC-319. Some of my PRs have been merged and some are under review. The stack trace you pasted is due to ORC-313.

The ORC reader in version 1.4.3-release is not robust enough for test_scanners_fuzz.py, so I ended up disabling this test. I think that's OK for now, since the random-corruption test is too strict. We can enable it when ORC releases a new version.

At Hulu, we've had Impala-ORC (impala-2.5 on ORC-1.2.3) deployed in production for more than half a year. Luckily, we haven't encountered any corrupt files that crash Impala. So I think we can compromise on this until a new ORC version is released.
--
To view, visit http://gerrit.cloudera.org:8080/9134
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4
Gerrit-Change-Number: 9134
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 15 Mar 2018 12:20:55 +0000
Gerrit-HasComments: Yes
