That setting is for off-heap memory. The earlier case hit the heap memory limit.
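For reference, a minimal sketch of the two knobs involved, with
illustrative values and assuming the sys.options layout of the 1.x
builds: the planner option can be inspected and changed from any SQL
client, while the heap itself is set through DRILL_HEAP in
conf/drill-env.sh and takes effect on drillbit restart.

    -- current planning limit (the 256 MB default, stored in bytes)
    SELECT name, num_val
    FROM   sys.options
    WHERE  name = 'planner.memory_limit';

    -- raise it cluster-wide to 512 MB
    ALTER SYSTEM SET `planner.memory_limit` = 536870912;

Given that the trace further down is a Java heap OOM thrown during
planning, raising DRILL_HEAP is the change more likely to help.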
> On Sep 1, 2016, at 11:36 AM, Zelaine Fong <[email protected]> wrote:
>
> One other thing ... have you tried tuning the planner.memory_limit
> parameter? Based on the earlier stack trace, you're hitting a memory
> limit during query planning, so tuning this parameter should help. The
> default is 256 MB.
>
> -- Zelaine
>
> On Thu, Sep 1, 2016 at 11:21 AM, rahul challapalli
> <[email protected]> wrote:
>
>> While planning we use heap memory, and 2GB of heap should be
>> sufficient for what you mentioned. This looks like a bug to me. Can
>> you raise a JIRA for it? It would also be super helpful if you could
>> attach the data set used.
>>
>> Rahul
>>
>> On Wed, Aug 31, 2016 at 9:14 AM, Oscar Morante <[email protected]>
>> wrote:
>>
>>> Sure,
>>> This is what I remember:
>>>
>>> * Failure
>>>   - embedded mode on my laptop
>>>   - drill memory: 2Gb/4Gb (heap/direct)
>>>   - cpu: 4 cores (+hyperthreading)
>>>   - `planner.width.max_per_node=6`
>>>
>>> * Success
>>>   - AWS cluster, 2x c3.8xlarge
>>>   - drill memory: 16Gb/32Gb
>>>   - cpu: limited by kubernetes to 24 cores
>>>   - `planner.width.max_per_node=23`
>>>
>>> I'm too busy right now to test again, but I'll try to provide better
>>> info as soon as I can.
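(Side note for readers of the archive: `planner.width.max_per_node` is
an ordinary session/system option, so the two configurations above can
be reproduced from any SQL client. A minimal sketch, using the laptop
value from the failure case:

    ALTER SESSION SET `planner.width.max_per_node` = 6;

It caps the number of minor fragments Drill runs per node, so it mainly
governs execution parallelism; it shouldn't change how much heap the
planner itself needs.)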
>>> On Wed, Aug 31, 2016 at 05:38:53PM +0530, Khurram Faraaz wrote:
>>>
>>>> Can you please share the number of cores on the setup where the
>>>> query hung, compared to the number of cores on the setup where the
>>>> query went through successfully? And the memory details from the
>>>> two scenarios.
>>>>
>>>> Thanks,
>>>> Khurram
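(Another aside: each drillbit reports its configured limits through the
sys tables, so, assuming the sys.memory layout of the 1.x builds, the
memory half of that comparison can be pulled from any SQL client:

    SELECT hostname, heap_max, direct_max
    FROM   sys.memory;

Core counts aren't exposed there as far as I know, so those still have
to come from the OS.)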
>>>> On Wed, Aug 31, 2016 at 4:50 PM, Oscar Morante <[email protected]>
>>>> wrote:
>>>>
>>>>> For the record, I think this was just bad memory configuration
>>>>> after all. I retested on bigger machines and everything seems to
>>>>> be working fine.
>>>>>
>>>>> On Tue, Aug 09, 2016 at 10:46:33PM +0530, Khurram Faraaz wrote:
>>>>>
>>>>>> Oscar, can you please report a JIRA with the required steps to
>>>>>> reproduce the OOM error? That way someone from the Drill team can
>>>>>> take a look and investigate.
>>>>>>
>>>>>> For others interested, here is the stack trace:
>>>>>>
>>>>>> 2016-08-09 16:51:14,280 [285642de-ab37-de6e-a54c-378aaa4ce50e:foreman]
>>>>>> ERROR o.a.drill.common.CatastrophicFailure - Catastrophic Failure
>>>>>> Occurred, exiting. Information message: Unable to handle out of
>>>>>> memory condition in Foreman.
>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>   at java.util.Arrays.copyOfRange(Arrays.java:2694) ~[na:1.7.0_111]
>>>>>>   at java.lang.String.<init>(String.java:203) ~[na:1.7.0_111]
>>>>>>   at java.lang.StringBuilder.toString(StringBuilder.java:405) ~[na:1.7.0_111]
>>>>>>   at org.apache.calcite.util.Util.newInternal(Util.java:785) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>>>   at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:251) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>>>   at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>>>   at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>>>   at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:404) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>   at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:343) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>   at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:240) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>   at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:290) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>   at org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan(ExplainHandler.java:61) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>   at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:94) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>   at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:978) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:257) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_111]
>>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_111]
>>>>>>   at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
>>>>>>
>>>>>> Thanks,
>>>>>> Khurram
>>>>>>
>>>>>> On Tue, Aug 9, 2016 at 7:46 PM, Oscar Morante <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Yeah, when I uncomment only the `upload_date` lines (a dir0
>>>>>>> alias), explain succeeds within ~30s. Enabling any of the other
>>>>>>> lines triggers the failure.
>>>>>>>
>>>>>>> This is a log with the `upload_date` lines and `usage <> 'Test'`
>>>>>>> enabled:
>>>>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e
>>>>>>>
>>>>>>> The client times out around here (~1.5 hours):
>>>>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e#file-drillbit-log-L178
>>>>>>>
>>>>>>> And it still keeps running for a while until it dies (~2.5 hours):
>>>>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e#file-drillbit-log-L178
>>>>>>>
>>>>>>> The memory settings for this test were:
>>>>>>>
>>>>>>>     DRILL_HEAP="4G"
>>>>>>>     DRILL_MAX_DIRECT_MEMORY="8G"
>>>>>>>
>>>>>>> This is on a laptop with 16G, and I should probably lower it, but
>>>>>>> it seems a bit excessive for such a small query. I think I got
>>>>>>> the same results on a 2-node cluster with 8/16; I'm going to try
>>>>>>> again on the cluster to make sure.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Oscar
>>>>>>>
>>>>>>> On Tue, Aug 09, 2016 at 04:13:17PM +0530, Khurram Faraaz wrote:
>>>>>>>
>>>>>>>> You mentioned "*But if I uncomment the where clause then it runs
>>>>>>>> for a couple of hours until it runs out of memory.*"
>>>>>>>>
>>>>>>>> Can you please share the OutOfMemory details from drillbit.log
>>>>>>>> and the value of DRILL_MAX_DIRECT_MEMORY?
>>>>>>>>
>>>>>>>> Can you also try keeping just the line `upload_date =
>>>>>>>> '2016-08-01'` in your where clause, and check whether the
>>>>>>>> explain succeeds?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Khurram
>>>>>>>>
>>>>>>>> On Tue, Aug 9, 2016 at 4:00 PM, Oscar Morante
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi there,
>>>>>>>>>
>>>>>>>>> I've been stuck with this for a while and I'm not sure if I'm
>>>>>>>>> running into a bug or just doing something very wrong.
>>>>>>>>>
>>>>>>>>> I have this stripped-down version of my query:
>>>>>>>>> https://gist.github.com/spacepluk/9ab1e1a0cfec6f0efb298f023f4c805b
>>>>>>>>>
>>>>>>>>> The data is just a single file with one record (1.5K).
>>>>>>>>>
>>>>>>>>> Without changing anything, explain takes ~1 sec on my machine.
>>>>>>>>> But if I uncomment the where clause, then it runs for a couple
>>>>>>>>> of hours until it runs out of memory.
>>>>>>>>>
>>>>>>>>> Also, if I uncomment the where clause *and* take out the join,
>>>>>>>>> then it takes around 30s to plan.
>>>>>>>>>
>>>>>>>>> Any ideas?
>>>>>>>>> Thanks!
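PS for anyone landing here from the archives: the full query is in the
gist above, but the shape under discussion is roughly the following
(the path and every column except `upload_date` and `usage` are made up
for illustration):

    SELECT dir0 AS upload_date,
           t.`usage`,
           t.other_columns
    FROM   dfs.`/path/to/partitioned/data` t
    WHERE  dir0 = '2016-08-01'    -- partition filter on the dir0 alias
      AND  t.`usage` <> 'Test';   -- predicate that triggered the slow planning

It was this kind of where clause combined with a join that sent the
planner into the hours-long search described above.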
