So, working with MapR support, we tried that with Impala, but it didn't
produce the desired results because the output file worked fine in Drill.
Theory: the evil file is created in MapReduce using a different writer
than Impala uses. Impala can read the evil file, but when it writes it
uses its own writer, "fixing" the issue on the fly. Thus, Drill can't
read the evil file, but if we try to reduce it with Impala, the file is
no longer evil. Consider it... chaotic neutral... (for all you D&D fans)

I'd ideally love to extract just the badness, but I'm on the phone now
with MapR support to figure out HOW; that is the question at hand.

John

On Fri, May 27, 2016 at 10:09 AM, Ted Dunning <[email protected]> wrote:

> On Thu, May 26, 2016 at 8:50 PM, John Omernik <[email protected]> wrote:
>
> > So, if we have a known "bad" Parquet file (I use quotes, because
> remember,
> > Impala queries this file just fine) created in Map Reduce, with a column
> > causing Array Index Out of Bounds problems with a BIGINT typed column.
> What
> > would your next steps be to troubleshoot?
> >
>
> I would start reducing the size of the evil file.
>
> If you have a tool that can query the bad parquet and write a new one
> (sounds like Impala might do here) then selecting just the evil column is a
> good first step.
>
> After that, I would start bisecting to find a small range that still causes
> the problem. There may not be such a range, but it is a good thing to try.
>
> At that point, you could easily have the problem down to a few kilobytes of
> data that can be used in a unit test.
>
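Ted's bisection step can be sketched as a generic search over row ranges. This is just a sketch: `triggers_bug` is a hypothetical callback that would materialize rows [lo, hi) of the evil column (for example via a LIMIT/OFFSET query against the extracted file) and report whether Drill still throws the ArrayIndexOutOfBounds error:

```python
def find_minimal_bad_range(lo, hi, triggers_bug):
    """Narrow the half-open row range [lo, hi) down to a minimal slice
    that still triggers the bug, by repeatedly testing each half."""
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if triggers_bug(lo, mid):
            hi = mid          # bug reproduces in the lower half
        elif triggers_bug(mid, hi):
            lo = mid          # bug reproduces in the upper half
        else:
            break             # bug needs rows from both halves; stop here
    return lo, hi

# Illustration with a fake predicate: pretend row 12345 is the bad one.
lo, hi = find_minimal_bad_range(0, 100000, lambda a, b: a <= 12345 < b)
# → (12345, 12346)
```

If the failure depends on rows from both halves (e.g. a page boundary), the loop stops early with a larger range, which is still a smaller reproduction than the whole file.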
