Attempting with the config below still fails.
Looking at pg_stat_activity, it doesn't appear that a cursor is being
created; it's still attempting to pull all the data at once.
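
For reference, the kind of check this refers to -- a rough sketch via JDBC
(credentials are placeholders); each backend's current statement shows up in
pg_stat_activity:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PgStatActivityCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder credentials; same database the CTAS reads from.
        Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://myhost.mydomain.com/ateb", "user", "password");
        Statement st = conn.createStatement();
        // Show what each session on this database is currently executing.
        ResultSet rs = st.executeQuery(
            "SELECT pid, state, query FROM pg_stat_activity" +
            " WHERE datname = current_database()");
        while (rs.next()) {
            System.out.printf("%d  %s  %s%n",
                rs.getInt("pid"), rs.getString("state"), rs.getString("query"));
        }
        rs.close();
        st.close();
        conn.close();
    }
}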

thanks,
reid
On Mon, 2018-08-13 at 14:18 -0400, Reid Thompson wrote:
> Vitalii,
> 
> Ok, thanks, I had found that report, but didn't note the option related
> to defaultAutoCommit. 
> > [1] https://issues.apache.org/jira/browse/DRILL-4177
> 
> 
> so, something along the lines of
> 
> ..snip..
>   "url": 
> "jdbc:postgresql://myhost.mydomain.com/ateb?useCursorFetch=true&defaultFetchSize=10000&defaultAutoCommit=false",
> ..snip..
> 
> 
> thanks,
> reid
> 
> On Mon, 2018-08-13 at 20:33 +0300, Vitalii Diravka wrote:
> >  
> > Hi Reid,
> > 
> > Looks like your issue is similar to DRILL-4177 [1].
> > It was related to a MySQL connection; a similar issue appears to exist with
> > PostgreSQL.
> > Looking at the Postgres documentation, the code needs to explicitly set the
> > connection autocommit mode to false, e.g. conn.setAutoCommit(false) [2].
> > For data sizes of 10 million rows plus, this is a must.
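> > 
> > In code this is essentially the fetchsize example from [2]; a minimal sketch
> > (the connection details, table name, and fetch size below are placeholders):
> > 
> > import java.sql.Connection;
> > import java.sql.DriverManager;
> > import java.sql.ResultSet;
> > import java.sql.Statement;
> > 
> > public class CursorFetchSketch {
> >     public static void main(String[] args) throws Exception {
> >         // Placeholder connection details.
> >         Connection conn = DriverManager.getConnection(
> >             "jdbc:postgresql://myhost.mydomain.com/ateb", "user", "password");
> >         // Cursor-based fetching only happens when autocommit is off
> >         // and a fetch size is set; otherwise the driver buffers the
> >         // whole result set in memory.
> >         conn.setAutoCommit(false);
> >         Statement st = conn.createStatement();
> >         st.setFetchSize(10000);
> >         ResultSet rs = st.executeQuery("SELECT * FROM big_table"); // placeholder table
> >         while (rs.next()) {
> >             // process one row at a time
> >         }
> >         rs.close();
> >         st.close();
> >         conn.close();
> >     }
> > }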
> > 
> > You could disable the "Auto Commit" option as a session option [3],
> > or do it within the plugin config URL with the following property:
> > defaultAutoCommit=false [4]
> > 
> > [1] https://issues.apache.org/jira/browse/DRILL-4177
> > [2] https://jdbc.postgresql.org/documentation/93/query.html#fetchsize-example
> > [3] https://www.postgresql.org/docs/9.3/static/ecpg-sql-set-autocommit.html
> > [4] https://jdbc.postgresql.org/documentation/head/ds-cpds.html
> > 
> > Kind regards
> > Vitalii
> > 
> > 
> > On Mon, Aug 13, 2018 at 3:03 PM Reid Thompson <[email protected]> 
> > wrote:
> > > My standalone host is configured with 16GB RAM and 8 CPUs.  Using
> > > drill-embedded (single-host standalone), I am attempting to pull data
> > > from PostgreSQL tables to parquet files via CTAS.  Smaller datasets work
> > > fine, but larger data sets (for example, ~11GB) fail with
> > > "java.lang.OutOfMemoryError: GC overhead limit exceeded".  Can someone
> > > advise on how to get past this?
> > > 
> > > Is there a way to have Drill stream this data from PostgreSQL to parquet
> > > files on disk, or does the data set have to be completely loaded into
> > > memory before it can be written to disk?  The documentation indicates
> > > that Drill will spill to disk to avoid memory issues, so I had hoped
> > > that it would be straightforward to extract from the DB to disk.
> > > 
> > > Should I not be attempting this via CTAS?  What are the other options?
> > > 
> > > 
> > > thanks,
> > > reid
> > > 
