Hi Reid,

Am I right that the *defaultRowFetchSize=10000* property in the URL solves that OOM issue? If so, it could be useful to have this information in the Drill docs [1].
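For the docs, a corresponding RDBMS storage plugin config might look roughly like the sketch below. This is an illustration only: the host, database, and credentials are placeholders, and it assumes (per the findings quoted below) that `defaultRowFetchSize` alone is enough to make the driver stream with a cursor.

```json
{
  "type": "jdbc",
  "driver": "org.postgresql.Driver",
  "url": "jdbc:postgresql://myhost.mydomain.com/mydb?defaultRowFetchSize=10000",
  "username": "user",
  "password": "secret",
  "enabled": true
}
```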
[1] https://drill.apache.org/docs/rdbms-storage-plugin/

Kind regards
Vitalii

On Tue, Aug 14, 2018 at 4:17 PM Reid Thompson <[email protected]> wrote:
> Using the below parameters in the URL and looking in the defined logfile
> indicates that the fetch size is being set to 10000, as expected.
>
> Just to note that it appears that the param defaultRowFetchSize sets the
> fetch size and signifies that a cursor should be used. It is different
> from the originally noted defaultFetchSize param, and it appears that
> PostgreSQL doesn't require useCursorFetch=true or
> defaultAutoCommit=false.
>
> ...snip...
> "url": "jdbc:postgresql://myhost.mydomain.com/mydb?useCursorFetch=true&defaultAutoCommit=false&loggerLevel=TRACE&loggerFile=/tmp/jdbc.log&defaultRowFetchSize=10000",
> ...snip...
>
> On Tue, 2018-08-14 at 07:26 -0400, Reid Thompson wrote:
> > Attempting with the below still fails.
> > Looking at pg_stat_activity, it doesn't appear that a cursor is being
> > created. It's still attempting to pull all the data at once.
> >
> > thanks,
> > reid
> >
> > On Mon, 2018-08-13 at 14:18 -0400, Reid Thompson wrote:
> > > Vitalii,
> > >
> > > OK, thanks. I had found that report [1], but didn't note the option
> > > related to defaultAutoCommit.
> > >
> > > [1] https://issues.apache.org/jira/browse/DRILL-4177
> > >
> > > So, something along the lines of:
> > >
> > > ..snip..
> > > "url": "jdbc:postgresql://myhost.mydomain.com/ateb?useCursorFetch=true&defaultFetchSize=10000&defaultAutoCommit=false",
> > > ..snip..
> > >
> > > thanks,
> > > reid
> > >
> > > On Mon, 2018-08-13 at 20:33 +0300, Vitalii Diravka wrote:
> > > > [EXTERNAL SOURCE]
> > > >
> > > > Hi Reid,
> > > >
> > > > It looks like your issue is similar to DRILL-4177 [1].
> > > > That one was related to a MySQL connection; it looks like the same
> > > > issue occurs with PostgreSQL.
> > > > Looking at the Postgres documentation, the code needs to explicitly
> > > > set the connection autocommit mode to false, e.g.
> > > > conn.setAutoCommit(false) [2]. For a data size of 10 million rows
> > > > plus, this is a must.
> > > >
> > > > You could disable "Auto Commit" as a session option [3],
> > > > or do it within the plugin config URL with the following property:
> > > > defaultAutoCommit=false [4]
> > > >
> > > > [1] https://issues.apache.org/jira/browse/DRILL-4177
> > > > [2] https://jdbc.postgresql.org/documentation/93/query.html#fetchsize-example
> > > > [3] https://www.postgresql.org/docs/9.3/static/ecpg-sql-set-autocommit.html
> > > > [4] https://jdbc.postgresql.org/documentation/head/ds-cpds.html
> > > >
> > > > Kind regards
> > > > Vitalii
> > > >
> > > > On Mon, Aug 13, 2018 at 3:03 PM Reid Thompson <[email protected]> wrote:
> > > > > My standalone host is configured with 16GB RAM, 8 CPUs. Using
> > > > > drill-embedded (single-host standalone), I am attempting to pull
> > > > > data from PostgreSQL tables to Parquet files via CTAS. Smaller
> > > > > datasets work fine, but larger data sets fail (for example ~11GB)
> > > > > with "java.lang.OutOfMemoryError: GC overhead limit exceeded". Can
> > > > > someone advise on how to get past this?
> > > > >
> > > > > Is there a way to have Drill stream this data from PostgreSQL to
> > > > > Parquet files on disk, or does the data set have to be completely
> > > > > loaded into memory before it can be written to disk? The
> > > > > documentation indicates that Drill will spill to disk to avoid
> > > > > memory issues, so I had hoped that it would be straightforward to
> > > > > extract from the DB to disk.
> > > > >
> > > > > Should I not be attempting this via CTAS? What are the other
> > > > > options?
> > > > >
> > > > > thanks,
> > > > > reid
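For reference, the cursor-based fetch pattern from the PostgreSQL JDBC docs that the thread discusses can be sketched as follows. This is a minimal sketch, not Drill's internal code: the connection URL, credentials, and table name are hypothetical, and it requires a live PostgreSQL server plus the pgJDBC driver on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CursorFetchSketch {
    public static void main(String[] args) throws SQLException {
        // Placeholder URL and credentials.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://myhost.mydomain.com/mydb", "user", "secret")) {
            // pgJDBC only uses a server-side cursor when autocommit is off...
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement()) {
                // ...and a non-zero fetch size is set; rows then arrive in
                // batches of 10000 instead of being buffered all at once.
                st.setFetchSize(10000);
                try (ResultSet rs = st.executeQuery("SELECT * FROM big_table")) {
                    long rows = 0;
                    while (rs.next()) {
                        rows++; // process each row here
                    }
                    System.out.println("rows: " + rows);
                }
            }
            conn.commit();
        }
    }
}
```

Setting `defaultRowFetchSize` in the connection URL is effectively a way to get the `setFetchSize` half of this behavior without code changes, which is what makes it usable from a Drill plugin config.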
