Hi Reid,

Am I right that the *defaultRowFetchSize=10000* property in the URL solves that OOM issue? If so, it could be useful to have this information in the Drill docs [1].
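For the docs, a corresponding RDBMS storage plugin config might look roughly like the sketch below. This is an illustration only: the host, database, and credentials are placeholders, and it assumes (per the findings quoted below) that `defaultRowFetchSize` alone is enough to make the driver stream with a cursor.

```json
{
  "type": "jdbc",
  "driver": "org.postgresql.Driver",
  "url": "jdbc:postgresql://myhost.mydomain.com/mydb?defaultRowFetchSize=10000",
  "username": "user",
  "password": "secret",
  "enabled": true
}
```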
[1] https://drill.apache.org/docs/rdbms-storage-plugin/

Kind regards
Vitalii

On Tue, Aug 14, 2018 at 4:17 PM Reid Thompson <[email protected]> wrote:
> Using the below parameters in the URL and looking in the defined logfile
> indicates that the fetch size is being set to 10000, as expected.
>
> Just to note that it appears that the param defaultRowFetchSize sets the
> fetch size and signifies that a cursor should be used. It is different
> from the originally noted defaultFetchSize param, and it appears that
> PostgreSQL doesn't require useCursorFetch=true or
> defaultAutoCommit=false.
>
> ...snip...
> "url": "jdbc:postgresql://myhost.mydomain.com/mydb?useCursorFetch=true&defaultAutoCommit=false&loggerLevel=TRACE&loggerFile=/tmp/jdbc.log&defaultRowFetchSize=10000",
> ...snip...
>
> On Tue, 2018-08-14 at 07:26 -0400, Reid Thompson wrote:
> > Attempting with the below still fails.
> > Looking at pg_stat_activity, it doesn't appear that a cursor is being
> > created. It's still attempting to pull all the data at once.
> >
> > thanks,
> > reid
> >
> > On Mon, 2018-08-13 at 14:18 -0400, Reid Thompson wrote:
> > > Vitalii,
> > >
> > > OK, thanks. I had found that report [1], but didn't note the option
> > > related to defaultAutoCommit.
> > >
> > > [1] https://issues.apache.org/jira/browse/DRILL-4177
> > >
> > > So, something along the lines of:
> > >
> > > ..snip..
> > > "url": "jdbc:postgresql://myhost.mydomain.com/ateb?useCursorFetch=true&defaultFetchSize=10000&defaultAutoCommit=false",
> > > ..snip..
> > >
> > > thanks,
> > > reid
> > >
> > > On Mon, 2018-08-13 at 20:33 +0300, Vitalii Diravka wrote:
> > > > [EXTERNAL SOURCE]
> > > >
> > > > Hi Reid,
> > > >
> > > > It looks like your issue is similar to DRILL-4177 [1].
> > > > That one was related to a MySQL connection; it looks like the same
> > > > issue occurs with PostgreSQL.
> > > > Looking at the Postgres documentation, the code needs to explicitly
> > > > set the connection autocommit mode to false, e.g.
> > > > conn.setAutoCommit(false) [2]. For a data size of 10 million rows
> > > > plus, this is a must.
> > > >
> > > > You could disable "Auto Commit" as a session option [3],
> > > > or do it within the plugin config URL with the following property:
> > > > defaultAutoCommit=false [4]
> > > >
> > > > [1] https://issues.apache.org/jira/browse/DRILL-4177
> > > > [2] https://jdbc.postgresql.org/documentation/93/query.html#fetchsize-example
> > > > [3] https://www.postgresql.org/docs/9.3/static/ecpg-sql-set-autocommit.html
> > > > [4] https://jdbc.postgresql.org/documentation/head/ds-cpds.html
> > > >
> > > > Kind regards
> > > > Vitalii
> > > >
> > > > On Mon, Aug 13, 2018 at 3:03 PM Reid Thompson <[email protected]> wrote:
> > > > > My standalone host is configured with 16GB RAM, 8 CPUs. Using
> > > > > drill-embedded (single-host standalone), I am attempting to pull
> > > > > data from PostgreSQL tables to Parquet files via CTAS. Smaller
> > > > > datasets work fine, but larger data sets fail (for example ~11GB)
> > > > > with "java.lang.OutOfMemoryError: GC overhead limit exceeded". Can
> > > > > someone advise on how to get past this?
> > > > >
> > > > > Is there a way to have Drill stream this data from PostgreSQL to
> > > > > Parquet files on disk, or does the data set have to be completely
> > > > > loaded into memory before it can be written to disk? The
> > > > > documentation indicates that Drill will spill to disk to avoid
> > > > > memory issues, so I had hoped that it would be straightforward to
> > > > > extract from the DB to disk.
> > > > >
> > > > > Should I not be attempting this via CTAS? What are the other
> > > > > options?
> > > > >
> > > > > thanks,
> > > > > reid
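For reference, the cursor-based fetch pattern from the PostgreSQL JDBC docs that the thread discusses can be sketched as follows. This is a minimal sketch, not Drill's internal code: the connection URL, credentials, and table name are hypothetical, and it requires a live PostgreSQL server plus the pgJDBC driver on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CursorFetchSketch {
    public static void main(String[] args) throws SQLException {
        // Placeholder URL and credentials.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://myhost.mydomain.com/mydb", "user", "secret")) {
            // pgJDBC only uses a server-side cursor when autocommit is off...
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement()) {
                // ...and a non-zero fetch size is set; rows then arrive in
                // batches of 10000 instead of being buffered all at once.
                st.setFetchSize(10000);
                try (ResultSet rs = st.executeQuery("SELECT * FROM big_table")) {
                    long rows = 0;
                    while (rs.next()) {
                        rows++; // process each row here
                    }
                    System.out.println("rows: " + rows);
                }
            }
            conn.commit();
        }
    }
}
```

Setting `defaultRowFetchSize` in the connection URL is effectively a way to get the `setFetchSize` half of this behavior without code changes, which is what makes it usable from a Drill plugin config.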
