For the record, I think this was just bad memory configuration after all. I retested on bigger machines and everything seems to be working fine.

On Tue, Aug 09, 2016 at 10:46:33PM +0530, Khurram Faraaz wrote:
Oscar, can you please file a JIRA with the steps required to reproduce
the OOM error? That way someone from the Drill team can take a look and
investigate.

For others interested, here is the stack trace.

2016-08-09 16:51:14,280 [285642de-ab37-de6e-a54c-378aaa4ce50e:foreman] ERROR o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, exiting. Information message: Unable to handle out of memory condition in Foreman.
java.lang.OutOfMemoryError: Java heap space
       at java.util.Arrays.copyOfRange(Arrays.java:2694) ~[na:1.7.0_111]
       at java.lang.String.<init>(String.java:203) ~[na:1.7.0_111]
       at java.lang.StringBuilder.toString(StringBuilder.java:405) ~[na:1.7.0_111]
       at org.apache.calcite.util.Util.newInternal(Util.java:785) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
       at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:251) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
       at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
       at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
       at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:404) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
       at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:343) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
       at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:240) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
       at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:290) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
       at org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan(ExplainHandler.java:61) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
       at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:94) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
       at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:978) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
       at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:257) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_111]
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_111]
       at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]

Thanks,
Khurram

On Tue, Aug 9, 2016 at 7:46 PM, Oscar Morante <[email protected]> wrote:

Yeah, when I uncomment only the `upload_date` lines (a dir0 alias),
explain succeeds within ~30s.  Enabling any of the other lines triggers the
failure.
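
For illustration, a minimal sketch of the where-clause variants being
described (table and column names are made up; the actual query is in the
gist from the original message):

    SELECT t.*
    FROM dfs.`/data/events` t
    WHERE t.dir0 = '2016-08-01'    -- i.e. the `upload_date` alias of dir0; explain finishes in ~30s
    --  AND t.`usage` <> 'Test'    -- uncommenting this (or any other) predicate triggers the failure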

This is a log with the `upload_date` lines and `usage <> 'Test'` enabled:
https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e

The client times out around here (~1.5 hours):
https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e#file-drillbit-log-L178

And it still keeps running for a while until it dies (~2.5 hours):
https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e#file-drillbit-log-L178

The memory settings for this test were:

   DRILL_HEAP="4G"
   DRILL_MAX_DIRECT_MEMORY="8G"
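
(For context: these are the values usually set in conf/drill-env.sh.)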

This is on a laptop with 16G and I should probably lower these, but it seems
a bit excessive for such a small query.  And I think I got the same results
on a 2-node cluster with 8/16.  I'm gonna try again on the cluster to make
sure.

Thanks,
Oscar


On Tue, Aug 09, 2016 at 04:13:17PM +0530, Khurram Faraaz wrote:

You mentioned: "But if I uncomment the where clause then it runs for a
couple of hours until it runs out of memory."

Can you please share the OutOfMemory details from drillbit.log and the
value of DRILL_MAX_DIRECT_MEMORY?

Can you also try retaining just the `upload_date = '2016-08-01'` condition
in your where clause and check whether the explain succeeds?

Thanks,
Khurram

On Tue, Aug 9, 2016 at 4:00 PM, Oscar Morante <[email protected]> wrote:

Hi there,
I've been stuck with this for a while and I'm not sure if I'm running into
a bug or I'm just doing something very wrong.

I have this stripped-down version of my query:
https://gist.github.com/spacepluk/9ab1e1a0cfec6f0efb298f023f4c805b

The data is just a single file with one record (1.5K).

Without changing anything, explain takes ~1 sec on my machine.  But if I
uncomment the where clause then it runs for a couple of hours until it runs
out of memory.

Also if I uncomment the where clause *and* take out the join, then it
takes around 30s to plan.
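
For readers following along, the overall shape being described is roughly
this (all table/column names are made up for illustration; the real
stripped-down query is in the gist above):

    SELECT a.*, b.extra_field
    FROM dfs.`/data/events` a
    JOIN dfs.`/data/lookup` b ON a.id = b.id   -- with the join removed, planning takes ~30s
    WHERE a.dir0 = '2016-08-01'                -- aliased as `upload_date` in the real query
      AND a.`usage` <> 'Test'                  -- whole where clause commented out: explain ~1s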

Any ideas?
Thanks!




--
Oscar Morante
"Self-education is, I firmly believe, the only kind of education there is."
                                                         -- Isaac Asimov.
