RE: Drill error with large sort

2016-02-26 Thread Paul Friedman
Thanks for this, these parameters fixed it!

---Paul


-Original Message-
From: Abdel Hakim Deneche [mailto:adene...@maprtech.com]
Sent: Thursday, February 25, 2016 5:32 PM
To: user <user@drill.apache.org>
Subject: Re: Drill error with large sort

Not so short answer:

In Drill 1.5 (I assume you are using 1.5) we have an improved allocator that
better tracks how much memory each operator is using. In your case it seems
that the data has very wide columns, which are causing the Sort to choke on
the very first batch of data (1024 records taking up 224 MB!) because that's
far more than its memory limit (around 178 MB in your particular case).
Drill uses a fancy equation to compute this limit, and increasing the option
from my previous email (quoted below) will raise it. More details here:

http://drill.apache.org/docs/configuring-drill-memory/
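
For example, here is a minimal way to try that from sqlline or any SQL
client (a sketch only, assuming Drill 1.5 defaults; the option value is
given in bytes):

  -- Default is 2147483648 bytes (2 GB). The ~178 MB allocator limit in
  -- your error is consistent with that 2 GB being split across the sort
  -- instances Drill plans for (2147483648 / 12 ≈ 178956970).
  ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 4294967296;  -- 4 GB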

On Thu, Feb 25, 2016 at 5:26 PM, Abdel Hakim Deneche <adene...@maprtech.com>
wrote:

> Short answer:
>
> increase the value of planner.memory.max_query_memory_per_node; by default
> it's set to 2 GB, so try setting it to 4 GB or even 8 GB. This should get
> the query to pass.
>
> On Thu, Feb 25, 2016 at 5:24 PM, Jeff Maass <jma...@cccis.com> wrote:
>
>>
>> If you are open to changing the query:
>>   # try removing the functions on the 5th column
>>   # is there any way you could further limit the query?
>>   # does the query finish if you add a limit / top clause? (see the
>>     sketch below)
>>   # what do the logs say?
>>
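>> For instance, a quick sanity check could look like this (a sketch only,
>> reusing Paul's table and one of his filters; the limit lets Drill use a
>> top-N sort, which typically needs far less memory):
>>
>>   select probe_id, provider_id, is_moving, mode
>>   from dfs.`/home/paul/data`
>>   where provider_id = '343'
>>   order by probe_id, start_time
>>   limit 100;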
>> 
>> From: Paul Friedman <paul.fried...@streetlightdata.com>
>> Sent: Thursday, February 25, 2016 7:07:12 PM
>> To: user@drill.apache.org
>> Subject: Drill error with large sort
>>
>> I’ve got a query reading from a large directory of parquet files (41
>> GB) and I’m consistently getting this error:
>>
>> Error: RESOURCE ERROR: One or more nodes ran out of memory while
>> executing the query.
>>
>> Unable to allocate sv2 for 1023 records, and not enough batchGroups
>> to spill.
>>
>> batchGroups.size 0
>> spilledBatchGroups.size 0
>> allocated memory 224287987
>> allocator limit 178956970
>> Fragment 0:0
>>
>> [Error Id: 878d604c-4656-4a5a-8b46-ff38a6ae020d on
>> chai.dev.streetlightdata.com:31010] (state=,code=0)
>>
>> Direct memory is set to 48GB and heap is 8GB.
>>
>> The query is:
>>
>>   select probe_id, provider_id, is_moving, mode,
>>          cast(convert_to(points, 'JSON') as varchar(1))
>>   from dfs.`/home/paul/data`
>>   where
>>     start_lat between 24.4873780449008 and 60.0108911181433 and
>>     start_lon between -139.065890469841 and -52.8305074899881 and
>>     provider_id = '343' and
>>     mod(abs(hash(probe_id)), 100) = 0
>>   order by probe_id, start_time;
>>
>> I’m also using the “example” drill-override configuration.
>>
>> Any help would be appreciated.
>>
>> Thanks.
>>
>> ---Paul
>>
>
>
>
> --
>
> Abdelhakim Deneche
>
> Software Engineer
>
>   <http://www.mapr.com/>
>
>
> Now Available - Free Hadoop On-Demand Training
> <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
>



-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>


Drill error with large sort

2016-02-25 Thread Paul Friedman
I’ve got a query reading from a large directory of parquet files (41 GB)
and I’m consistently getting this error:

Error: RESOURCE ERROR: One or more nodes ran out of memory while executing
the query.

Unable to allocate sv2 for 1023 records, and not enough batchGroups to
spill.

batchGroups.size 0
spilledBatchGroups.size 0
allocated memory 224287987
allocator limit 178956970
Fragment 0:0

[Error Id: 878d604c-4656-4a5a-8b46-ff38a6ae020d on
chai.dev.streetlightdata.com:31010] (state=,code=0)

Direct memory is set to 48GB and heap is 8GB.

The query is:

  select probe_id, provider_id, is_moving, mode,
         cast(convert_to(points, 'JSON') as varchar(1))
  from dfs.`/home/paul/data`
  where
    start_lat between 24.4873780449008 and 60.0108911181433 and
    start_lon between -139.065890469841 and -52.8305074899881 and
    provider_id = '343' and
    mod(abs(hash(probe_id)), 100) = 0
  order by probe_id, start_time;

I’m also using the “example” drill-override configuration.

Any help would be appreciated.

Thanks.

---Paul


RE: Help with error message...

2016-02-10 Thread Paul Friedman
Hmmm...  I'm not sure that's it.

I can also reproduce this by connecting sqlline from the same machine to the
IP address of the machine (no network/firewall/etc).  It appears there's a
timeout SOMEWHERE but I'm at a loss to find it - and the PostgreSQL instance
on the same machine ISN'T having this problem.

---Paul


-Original Message-
From: Nirav Shah [mailto:nirav.s...@games24x7.com]
Sent: Tuesday, February 09, 2016 9:07 PM
To: user@drill.apache.org
Subject: RE: Help with error message...

I had the same issue. The servers were on AWS on the same LAN, meaning a
very remote chance of disconnection, but we finally found there were packet
drops.
On Feb 10, 2016 3:39 AM, "Paul Friedman" <paul.fried...@streetlightdata.com>
wrote:

> Thanks for the reply.  Since the 2 machines are on the same LAN (no
> firewall in between), does the Drill JDBC driver (or drill-embedded
> server) have any timeouts which can be increased?
>
> Interestingly, the client side (JDBC) doesn't notice that the server
> side
> (Drill-embedded) has disconnected.
>
> ---Paul
>
>
> -Original Message-
> From: Nirav Shah [mailto:nirav.s...@games24x7.com]
> Sent: Tuesday, February 09, 2016 11:38 AM
> To: user@drill.apache.org
> Subject: Re: Help with error message...
>
> From the logs it looks like a network drop between the nodes.
> If it fails at an exact time, say 10 minutes, then check the firewall
> settings.
> On Feb 10, 2016 12:27 AM, "Paul Friedman"
> <paul.fried...@streetlightdata.com>
> wrote:
>
> > Hello...
> >
> > I'm executing a long-running Drill (1.4) query (4-10 mins) called via
> > JDBC from Talend, and sometimes I'm seeing an error stack like this
> > (see below).
> >
> > The query is a select statement with an order by against a directory
> > of Parquet files which were produced by Spark.  Probably half the
> > time it succeeds and returns the expected results, but often it's
> > erroring out as below.
> >
> > Can you help with any insights?
> >
> > Thanks in advance.
> >
> > ---Paul
> >
> > ...
> > 2016-02-08 16:47:47,275
> > [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:1:0]
> > INFO
> > o.a.d.e.w.fragment.FragmentExecutor -
> > 2946cbe3-e73d-2ed4-da60-76c1bd799372:1:0: State change requested
> > RUNNING
> > -->
> > FINISHED
> > 2016-02-08 16:47:47,276
> > [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:1:0]
> > INFO
> > o.a.d.e.w.f.FragmentStatusReporter -
> > 2946cbe3-e73d-2ed4-da60-76c1bd799372:1:0: State to report: FINISHED
> > 2016-02-08 16:48:25,496 [UserServer-1] INFO
> > o.a.d.e.w.fragment.FragmentExecutor -
> > 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested
> > RUNNING
> > -->
> > FAILED
> > 2016-02-08 16:48:25,778
> > [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:0:0]
> > INFO
> > o.a.d.e.w.fragment.FragmentExecutor -
> > 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested
> > FAILED --> FAILED
> > 2016-02-08 16:48:25,779 [UserServer-1] INFO
> > o.a.d.e.w.fragment.FragmentExecutor -
> > 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested
> > FAILED --> FAILED
> > 2016-02-08 16:48:25,779 [CONTROL-rpc-event-queue] INFO
> > o.a.d.e.w.fragment.FragmentExecutor -
> > 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested
> > FAILED --> CANCELLATION_REQUESTED
> > 2016-02-08 16:48:25,779 [CONTROL-rpc-event-queue] WARN
> > o.a.d.e.w.fragment.FragmentExecutor -
> > 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: Ignoring unexpected state
> > transition FAILED --> CANCELLATION_REQUESTED
> > 2016-02-08 16:48:25,779
> > [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:0:0]
> > INFO
> > o.a.d.e.w.fragment.FragmentExecutor -
> > 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested
> > FAILED --> FAILED
> > 2016-02-08 16:48:25,780
> > [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:0:0]
> > INFO
> > o.a.d.e.w.fragment.FragmentExecutor -
> > 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested
> > FAILED --> FINISHED
> > 2016-02-08 16:48:25,781 [UserServer-1] WARN
> > o.a.d.exec.rpc.RpcExceptionHandler - Exception occurred with closed
> > channel.
> > Connection: /172.20.20.154:31010 <--> /172.20.20.157:64101 (user
> > client)
> > java.nio.channels.ClosedChannelException: null
> > 2016-02-08 16:48:25,783
> > [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:0:0]
> > ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR:
> > ChannelClosedException: Channel closed /172.20.20.154:3

Help with error message...

2016-02-09 Thread Paul Friedman
Hello...

I'm executing a long-running Drill (1.4) query (4-10 mins) called via JDBC
from Talend, and sometimes I'm seeing an error stack like this (see below).

The query is a select statement with an order by against a directory of
Parquet files which were produced by Spark.  Probably half the time it
succeeds and returns the expected results, but often it's erroring out as
below.

Can you help with any insights?

Thanks in advance.

---Paul

...
2016-02-08 16:47:47,275 [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:1:0] INFO
o.a.d.e.w.fragment.FragmentExecutor -
2946cbe3-e73d-2ed4-da60-76c1bd799372:1:0: State change requested RUNNING -->
FINISHED
2016-02-08 16:47:47,276 [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:1:0] INFO
o.a.d.e.w.f.FragmentStatusReporter -
2946cbe3-e73d-2ed4-da60-76c1bd799372:1:0: State to report: FINISHED
2016-02-08 16:48:25,496 [UserServer-1] INFO
o.a.d.e.w.fragment.FragmentExecutor -
2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested RUNNING -->
FAILED
2016-02-08 16:48:25,778 [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:0:0] INFO
o.a.d.e.w.fragment.FragmentExecutor -
2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested FAILED -->
FAILED
2016-02-08 16:48:25,779 [UserServer-1] INFO
o.a.d.e.w.fragment.FragmentExecutor -
2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested FAILED -->
FAILED
2016-02-08 16:48:25,779 [CONTROL-rpc-event-queue] INFO
o.a.d.e.w.fragment.FragmentExecutor -
2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested FAILED -->
CANCELLATION_REQUESTED
2016-02-08 16:48:25,779 [CONTROL-rpc-event-queue] WARN
o.a.d.e.w.fragment.FragmentExecutor -
2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: Ignoring unexpected state
transition FAILED --> CANCELLATION_REQUESTED
2016-02-08 16:48:25,779 [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:0:0] INFO
o.a.d.e.w.fragment.FragmentExecutor -
2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested FAILED -->
FAILED
2016-02-08 16:48:25,780 [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:0:0] INFO
o.a.d.e.w.fragment.FragmentExecutor -
2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested FAILED -->
FINISHED
2016-02-08 16:48:25,781 [UserServer-1] WARN
o.a.d.exec.rpc.RpcExceptionHandler - Exception occurred with closed channel.
Connection: /172.20.20.154:31010 <--> /172.20.20.157:64101 (user client)
java.nio.channels.ClosedChannelException: null
2016-02-08 16:48:25,783 [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:0:0]
ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR:
ChannelClosedException: Channel closed /172.20.20.154:31010 <-->
/172.20.20.157:64101.

Fragment 0:0

[Error Id: 2f075631-fb49-4feb-b39d-cbe89083a2ee on
chai.dev.streetlightdata.com:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
ChannelClosedException: Channel closed /172.20.20.154:31010 <-->
/172.20.20.157:64101.

Fragment 0:0

[Error Id: 2f075631-fb49-4feb-b39d-cbe89083a2ee on
chai.dev.streetlightdata.com:31010]
at
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
~[drill-common-1.4.0.jar:1.4.0]
at
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321)
[drill-java-exec-1.4.0.jar:1.4.0]
at
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184)
[drill-java-exec-1.4.0.jar:1.4.0]
at
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290)
[drill-java-exec-1.4.0.jar:1.4.0]
at
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
[drill-common-1.4.0.jar:1.4.0]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[na:1.8.0_66]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_66]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
Caused by: org.apache.drill.exec.rpc.ChannelClosedException: Channel closed
/172.20.20.154:31010 <--> /172.20.20.157:64101.
at
org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:175)
~[drill-rpc-1.4.0.jar:1.4.0]
at
org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:151)
~[drill-rpc-1.4.0.jar:1.4.0]
at
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
~[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
~[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
~[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
~[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
~[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at

RE: Help with error message...

2016-02-09 Thread Paul Friedman
Thanks for the reply.  Since the 2 machines are on the same LAN (no firewall
in between), does the Drill JDBC driver (or drill-embedded server) have any
timeouts which can be increased?

Interestingly, the client side (JDBC) doesn't notice that the server side
(Drill-embedded) has disconnected.

---Paul


-Original Message-
From: Nirav Shah [mailto:nirav.s...@games24x7.com]
Sent: Tuesday, February 09, 2016 11:38 AM
To: user@drill.apache.org
Subject: Re: Help with error message...

From the logs it looks like a network drop between the nodes.
If it fails at an exact time, say 10 minutes, then check the firewall
settings.
On Feb 10, 2016 12:27 AM, "Paul Friedman"
<paul.fried...@streetlightdata.com>
wrote:

> Hello...
>
> I'm executing a long-running Drill (1.4) query (4-10 mins) called via
> JDBC from Talend, and sometimes I'm seeing an error stack like this
> (see below).
>
> The query is a select statement with an order by against a directory
> of Parquet files which were produced by Spark.  Probably half the time
> it succeeds and returns the expected results, but often it's erroring
> out as below.
>
> Can you help with any insights?
>
> Thanks in advance.
>
> ---Paul
>
> ...
> 2016-02-08 16:47:47,275
> [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:1:0]
> INFO
> o.a.d.e.w.fragment.FragmentExecutor -
> 2946cbe3-e73d-2ed4-da60-76c1bd799372:1:0: State change requested
> RUNNING
> -->
> FINISHED
> 2016-02-08 16:47:47,276
> [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:1:0]
> INFO
> o.a.d.e.w.f.FragmentStatusReporter -
> 2946cbe3-e73d-2ed4-da60-76c1bd799372:1:0: State to report: FINISHED
> 2016-02-08 16:48:25,496 [UserServer-1] INFO
> o.a.d.e.w.fragment.FragmentExecutor -
> 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested
> RUNNING
> -->
> FAILED
> 2016-02-08 16:48:25,778
> [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:0:0]
> INFO
> o.a.d.e.w.fragment.FragmentExecutor -
> 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested
> FAILED --> FAILED
> 2016-02-08 16:48:25,779 [UserServer-1] INFO
> o.a.d.e.w.fragment.FragmentExecutor -
> 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested
> FAILED --> FAILED
> 2016-02-08 16:48:25,779 [CONTROL-rpc-event-queue] INFO
> o.a.d.e.w.fragment.FragmentExecutor -
> 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested
> FAILED --> CANCELLATION_REQUESTED
> 2016-02-08 16:48:25,779 [CONTROL-rpc-event-queue] WARN
> o.a.d.e.w.fragment.FragmentExecutor -
> 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: Ignoring unexpected state
> transition FAILED --> CANCELLATION_REQUESTED
> 2016-02-08 16:48:25,779
> [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:0:0]
> INFO
> o.a.d.e.w.fragment.FragmentExecutor -
> 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested
> FAILED --> FAILED
> 2016-02-08 16:48:25,780
> [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:0:0]
> INFO
> o.a.d.e.w.fragment.FragmentExecutor -
> 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested
> FAILED --> FINISHED
> 2016-02-08 16:48:25,781 [UserServer-1] WARN
> o.a.d.exec.rpc.RpcExceptionHandler - Exception occurred with closed
> channel.
> Connection: /172.20.20.154:31010 <--> /172.20.20.157:64101 (user
> client)
> java.nio.channels.ClosedChannelException: null
> 2016-02-08 16:48:25,783
> [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:0:0]
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR:
> ChannelClosedException: Channel closed /172.20.20.154:31010 <-->
> /172.20.20.157:64101.
>
> Fragment 0:0
>
> [Error Id: 2f075631-fb49-4feb-b39d-cbe89083a2ee on
> chai.dev.streetlightdata.com:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
> ChannelClosedException: Channel closed /172.20.20.154:31010 <-->
> /172.20.20.157:64101.
>
> Fragment 0:0
>
> [Error Id: 2f075631-fb49-4feb-b39d-cbe89083a2ee on
> chai.dev.streetlightdata.com:31010]
> at
>
> org.apache.drill.common.exceptions.UserException$Builder.build(UserExc
> eption.java:534)
> ~[drill-common-1.4.0.jar:1.4.0]
> at
>
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(Fr
> agmentExecutor.java:321)
> [drill-java-exec-1.4.0.jar:1.4.0]
> at
>
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentE
> xecutor.java:184)
> [drill-java-exec-1.4.0.jar:1.4.0]
> at
>
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecu
> tor.java:290)
> [drill-java-exec-1.4.0.jar:1.4.0]
> at
>
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.
> java:38)
> [drill-common-1.4.0.jar:1.4.0]
> a