The new Parquet reader, the complex reader, is disabled by default. You can
enable it by setting the following option to true:

store.parquet.use_new_reader
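
For example, the option can be toggled per session, and the query plan shows
which scan the planner chose (the file path below is hypothetical; this is a
sketch, not output from the thread):

```sql
-- enable the complex ("new") Parquet reader for this session only
ALTER SESSION SET `store.parquet.use_new_reader` = true;

-- inspect the physical plan to see how the Parquet scan is set up
EXPLAIN PLAN FOR SELECT * FROM dfs.`/data/evil.parquet`;
```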



On Sat, May 28, 2016 at 4:56 AM, John Omernik <j...@omernik.com> wrote:

> I remember reading that Drill uses two readers: one for certain cases (I
> think flat structures) and the other for complex structures. A. Am I
> remembering correctly? B. If so, can I determine via the plan or something
> which is being used? And C. Can I force Drill to try the other reader?
>
> On Saturday, May 28, 2016, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> > The Parquet user/dev mailing list might be helpful here. They have a real
> > stake in making sure that all readers/writers can work together. The
> > problem here really does sound like there is a borderline case that isn't
> > handled as well in Drill's special-purpose Parquet reader as in the
> > normal readers.
> >
> >
> >
> >
> >
> > > On Fri, May 27, 2016 at 7:23 PM, John Omernik <j...@omernik.com> wrote:
> >
> > > So working with MapR support we tried that with Impala, but it didn't
> > > produce the desired results because the output file worked fine in
> > > Drill. Theory: the evil file is created in MapReduce using a different
> > > writer than Impala uses. Impala can read the evil file, but when it
> > > writes it uses its own writer, "fixing" the issue on the fly. Thus,
> > > Drill can't read the evil file, but if we try to reduce it with Impala,
> > > the file is no longer evil; consider it... chaotic neutral... (for all
> > > you D&D fans).
> > >
> > > I'd ideally love to extract the badness, but I'm on the phone now with
> > > MapR support to figure out HOW; that is the question at hand.
> > >
> > > John
> > >
> > > On Fri, May 27, 2016 at 10:09 AM, Ted Dunning <ted.dunn...@gmail.com>
> > > wrote:
> > >
> > > > On Thu, May 26, 2016 at 8:50 PM, John Omernik <j...@omernik.com>
> > > > wrote:
> > > >
> > > > > So, if we have a known "bad" Parquet file (I use quotes because,
> > > > > remember, Impala queries this file just fine) created in MapReduce,
> > > > > with a BIGINT-typed column that causes Array Index Out of Bounds
> > > > > problems. What would your next steps be to troubleshoot?
> > > > >
> > > >
> > > > I would start reducing the size of the evil file.
> > > >
> > > > If you have a tool that can query the bad parquet and write a new one
> > > > (it sounds like Impala might do here), then selecting just the evil
> > > > column is a good first step.
> > > >
> > > > After that, I would start bisecting to find a small range that still
> > > > causes the problem. There may not be such a range, but it is a good
> > > > thing to try.
> > > >
> > > > At that point, you could easily have the problem down to a few
> > > > kilobytes of data that can be used in a unit test.
> > > >
> > >
> >
>
>
> --
> Sent from my iThing
>
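
Ted's bisect-the-file approach above can be sketched generically. Assuming a
black-box predicate `triggers_bug(rows)` that returns True when a slice of
rows still reproduces the failure (both names are hypothetical, and the sketch
assumes the failure comes from one contiguous region of records):

```python
def shrink_to_failing_range(rows, triggers_bug):
    """Binary-search a small contiguous slice of `rows` that still fails.

    Assumes the bug is caused by records inside one contiguous region, so
    any slice containing that region still triggers it.
    """
    start, end = 0, len(rows)

    # Shrink from the right: smallest `end` such that rows[start:end] fails.
    lo, hi = start, end
    while lo < hi:
        mid = (lo + hi) // 2
        if triggers_bug(rows[start:mid]):
            hi = mid
        else:
            lo = mid + 1
    end = lo

    # Shrink from the left: largest `start` such that rows[start:end] fails.
    lo, hi = start, end
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if triggers_bug(rows[mid:end]):
            lo = mid
        else:
            hi = mid - 1
    start = lo
    return start, end


# Example: a single "evil" record at index 42 in 100 rows.
rows = list(range(100))
bad = shrink_to_failing_range(rows, lambda s: 42 in s)
```

In practice the predicate would write the candidate slice back out as Parquet
(e.g. with a reader/writer pair that handles the file) and rerun the failing
query against it.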



-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>

