Happy to report back that this is really a parquet-cpp issue and not
something in Drill. Kudos to Deepak Majeti for finding that we did not
set the dictionary_page_offset in the C++ code.
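For context, here is a minimal sketch of why a missing dictionary_page_offset matters. It assumes a reader that finds the start of a column chunk the way parquet-mr's ColumnChunkMetaData.getStartingPos() does: use the dictionary page offset if it is set and comes before the first data page, otherwise start at the first data page. The struct ColumnChunkOffsets and the function column_chunk_start are hypothetical simplifications, not parquet-cpp or Drill APIs, and whether Drill 1.8.0 takes exactly this path is an assumption.

```cpp
#include <cassert>   // used by the usage checks below
#include <cstdint>
#include <optional>

// Hypothetical, simplified stand-in for the offsets held in a Parquet
// ColumnMetaData (the real struct is Thrift-generated; there,
// dictionary_page_offset is optional and data_page_offset is required).
struct ColumnChunkOffsets {
  std::optional<int64_t> dictionary_page_offset;  // empty if the writer never set it
  int64_t data_page_offset;                       // offset of the first data page
};

// Byte position where a reader starts reading the column chunk.
// Modeled on the check in parquet-mr's getStartingPos(): prefer the
// dictionary page offset when it is set (> 0) and precedes the first
// data page; otherwise fall back to the first data page.
int64_t column_chunk_start(const ColumnChunkOffsets& m) {
  if (m.dictionary_page_offset && *m.dictionary_page_offset > 0 &&
      *m.dictionary_page_offset < m.data_page_offset) {
    return *m.dictionary_page_offset;
  }
  return m.data_page_offset;
}
```

Suppose the dictionary page sits at byte 4 and the first data page at byte 100. Then column_chunk_start({4, 100}) returns 4. If the writer leaves dictionary_page_offset unset, column_chunk_start({std::nullopt, 100}) returns 100, so such a reader would begin at the data page and never see the dictionary it needs to decode the dictionary-encoded values.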
Uwe
On 07.09.16 21:08, Kunal Khatua wrote:
Hi Uwe
I believe you're using the latest Apache Drill 1.8.0. From a quick look at the
stack trace, it appears to be a potential bug in Drill's interpretation of
dictionary-encoded data.
One way to verify that your C++ implementation of Parquet is correct would be
to generate your data without dictionary encoding and then check whether Drill
can read that.
Regards
Kunal
On Wed 7-Sep-2016 5:30:32 AM, Uwe Korn wrote:
Hello,
I'm currently checking the correctness of our C++ implementation of
Parquet and noticed that I cannot load these files in Drill. Although
this is probably a bug in the C++ implementation, I don't understand
what causes the error. Using the Java parquet-tools, I can read these
files. I'm using Apache Drill 1.8.0 on OS X.
I've posted the error output from Drill and the parquet file as a gist:
https://gist.github.com/xhochy/d4441a5ff2025b877df43fecd4466a11
If anyone could take a quick look at this and tell me why Drill cannot
read the file, it would really help me fix the parquet-cpp issues.
Kind Regards,
Uwe