Stefan,

Does your source data contain varchar columns? We've seen cases where
Drill isn't as efficient as it could be when the Parquet writer is handling
variable-length columns.

-- Zelaine

On Fri, May 13, 2016 at 9:26 AM, Stefan Sedich <stefan.sed...@gmail.com>
wrote:

> Thanks for getting back to me so fast!
>
> I was just playing with that now. I went up to 8GB and still ran into it,
> and I'm trying to go higher to see if I can find the sweet spot; I only
> have 16GB of total RAM on this laptop :)
>
> Is this an expected amount of memory for a table that isn't overly huge
> (16 million rows, 6 columns of integers)? Even now, with a 12GB heap, it
> seems to have filled up again.
>
>
>
> Thanks
>
> On Fri, May 13, 2016 at 9:20 AM Jason Altekruse <ja...@dremio.com> wrote:
>
> > I could not find this mentioned anywhere in the docs, but it has come up
> > a few times on the list. While we have made a number of efforts to move
> > our interactions with the Parquet library to off-heap memory (which we
> > use everywhere else in the engine during processing), the version of the
> > writer we are using still buffers a non-trivial amount of data in heap
> > memory when writing Parquet files. Try raising your JVM heap memory in
> > drill-env.sh on startup and see if that prevents the out-of-memory issue.
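> >
> > As a rough example, you can bump the heap in conf/drill-env.sh before
> > restarting. The variable names below are the ones I recall from recent
> > builds, so double-check them against the drill-env.sh that ships with
> > your version:
> >
> >     # conf/drill-env.sh
> >     # JVM heap used by the Drillbit (the Parquet writer buffers here)
> >     DRILL_HEAP="8G"
> >     # off-heap (direct) memory used by the rest of the engine
> >     DRILL_MAX_DIRECT_MEMORY="10G"
> >
> > and then restart with bin/drillbit.sh restart.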
> >
> > Jason Altekruse
> > Software Engineer at Dremio
> > Apache Drill Committer
> >
> > On Fri, May 13, 2016 at 9:07 AM, Stefan Sedich <stefan.sed...@gmail.com>
> > wrote:
> >
> > > Just trying to do a CTAS on a Postgres table. It is not huge, only
> > > 16-odd million rows, but I end up with an out-of-memory error after a
> > > while.
> > >
> > > Unable to handle out of memory condition in FragmentExecutor.
> > >
> > > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > >
> > >
> > > Is there a way to avoid this without needing to do the CTAS on a
> > > subset of my table?
> > >
> >
>
