This is what I have so far... I can provide more complete logs on a
one-on-one basis.

The cluster was completely mine, with fresh logs. I ran a CTAS query on a
large table that has over 100 fields. This query works well in other cases;
however, here I was experimenting with the block size, both in MapR FS and
in Drill's Parquet writer. I had successfully tested 512m on each, but this
case was different. Here are the facts in this setup:

- No compression in MapR FS - using standard Parquet Snappy compression
- MapR FS block size 1024m
- Parquet block size 1024m
- The query ends up disappearing from the profiles page

- The UI page listing bits shows only 4 bits even though 5 drillbit processes
are running (the node 03 process is still running, but its bit is no longer
listed). A quick sys.drillbits check over REST is sketched right after this
list.

- Error (copied below) from the REST API (a stripped-down sketch of the
script making the call is included after the error output)

- No output in stdout or stderr on node 3. Only two nodes actually had
"Parquet Writing" logs; the other three showed no issues or errors on stdout.

- I have the standard log files, gclogs, the profile.json (captured before it
disappeared), and the physical plan. Only the pieces that looked potentially
at issue are included here.

- The node 3 GC log shows a bunch of "Full GC (Allocation Failure)" pauses
that take 4 seconds or more (in other cases where I've seen this, the pause
time can balloon to 8 secs or more). A small scan of these pauses is sketched
after the gclog excerpt below.

- The node 3 output log shows some very long RPC round trips. Perhaps the GC
pauses are blocking RPC communication and creating a snowball effect? A
matching scan for the slow-RPC warnings is sketched after the drillbit log
excerpt below.
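
Since the UI and the running processes disagree, a quick way to see which
bits Drill still knows about is to query sys.drillbits over REST. This is
only a sketch: the credentials are placeholders, and it assumes the same
form-based login the cluster uses.

#!/usr/bin/env python
# Sketch: list the drillbits Drill still has registered (sys.drillbits) so
# they can be compared against the drillbit processes actually running.
import requests

BASE = "https://drillprod.marathonprod.zeta:20000"  # endpoint from the output below

session = requests.Session()
# Drill's web auth is form-based (Jetty j_security_check); placeholder creds.
session.post(BASE + "/j_security_check",
             data={"j_username": "drilluser", "j_password": "changeme"},
             verify=False)

resp = session.post(BASE + "/query.json",
                    json={"queryType": "SQL",
                          "query": "SELECT * FROM sys.drillbits"},
                    verify=False)
for row in resp.json().get("rows", []):
    print(row)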


Other logs can be provided on request if people are interested; I just
didn't want to flood the whole list with all of them.


Thanks!


John






REST Error:

./load_day.py 2016-05-09

Drill Rest Endpoint: https://drillprod.marathonprod.zeta:20000

Day: 2016-05-09

/usr/lib/python2.7/site-packages/urllib3/connectionpool.py:769:
InsecureRequestWarning: Unverified HTTPS request is being made. Adding
certificate verification is strongly advised. See:
https://urllib3.readthedocs.org/en/latest/security.html

  InsecureRequestWarning)

Authentication successful

Error encountered: 500

{

  "errorMessage" : "SYSTEM ERROR: ForemanException: One more more nodes
lost connectivity during query.  Identified nodes were
[atl1ctuzeta03:20001].\n\n\n[Error Id: d7dd0120-f7c0-44ef-ac54-29c746b26354
on atl1ctuzeta01:20001"

}
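
For reference, what load_day.py does is roughly the following. This is a
stripped-down sketch, not the real script: the credentials, schema, and table
names are placeholders, and the real script builds the CTAS from the day
passed on the command line.

#!/usr/bin/env python
# Stripped-down sketch of the load_day.py flow (illustrative names only).
import requests

BASE = "https://drillprod.marathonprod.zeta:20000"
DAY = "2016-05-09"

session = requests.Session()

# Drill's web auth is form-based (Jetty), so log in once and keep the cookie.
session.post(BASE + "/j_security_check",
             data={"j_username": "drilluser", "j_password": "changeme"},
             verify=False)  # hence the InsecureRequestWarning above

def run_query(sql):
    # Drill REST API: POST /query.json with {"queryType": "SQL", "query": ...}
    resp = session.post(BASE + "/query.json",
                        json={"queryType": "SQL", "query": sql},
                        verify=False)
    resp.raise_for_status()
    return resp.json()

# As far as I know each REST call gets its own session, so a session option
# would not stick between calls; the block size is set at the system level.
run_query("ALTER SYSTEM SET `store.parquet.block-size` = 1073741824")

# The CTAS itself: Parquet to Parquet, 100+ columns, one day per run.
run_query("CREATE TABLE dfs.out.`day_%s` AS SELECT * FROM dfs.raw.`%s`"
          % (DAY.replace("-", "_"), DAY))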


Possible issues in the node 3 log:


2016-06-14 17:25:27,860 [289fc208-7266-6a81-73a1-709efff6c412:frag:1:90]
INFO  o.a.d.e.w.f.FragmentStatusReporter -
289fc208-7266-6a81-73a1-709efff6c412:1:90: State to report: RUNNING

2016-06-14 17:25:27,871 [289fc208-7266-6a81-73a1-709efff6c412:frag:1:70]
INFO  o.a.d.e.w.fragment.FragmentExecutor -
289fc208-7266-6a81-73a1-709efff6c412:1:70: State change requested
AWAITING_ALLOCATION --> RUNNING

2016-06-14 17:25:27,871 [289fc208-7266-6a81-73a1-709efff6c412:frag:1:70]
INFO  o.a.d.e.w.f.FragmentStatusReporter -
289fc208-7266-6a81-73a1-709efff6c412:1:70: State to report: RUNNING

2016-06-14 17:43:41,869 [BitServer-4] WARN
o.a.d.exec.rpc.control.ControlClient - Message of mode RESPONSE of rpc type
1 took longer than 500ms.  Actual duration was 4192ms.

2016-06-14 17:45:36,720 [CONTROL-rpc-event-queue] INFO
o.a.d.e.w.fragment.FragmentExecutor -
289fc208-7266-6a81-73a1-709efff6c412:1:0: State change requested RUNNING
--> CANCELLATION_REQUESTED

2016-06-14 17:45:45,740 [CONTROL-rpc-event-queue] INFO
o.a.d.e.w.f.FragmentStatusReporter -
289fc208-7266-6a81-73a1-709efff6c412:1:0: State to report:
CANCELLATION_REQUESTED

2016-06-14 17:46:15,318 [BitServer-3] WARN
o.a.d.exec.rpc.control.ControlServer - Message of mode REQUEST of rpc type
6 took longer than 500ms.  Actual duration was 55328ms.

2016-06-14 17:46:36,057 [CONTROL-rpc-event-queue] INFO
o.a.d.e.w.fragment.FragmentExecutor -
289fc208-7266-6a81-73a1-709efff6c412:1:5: State change requested RUNNING
--> CANCELLATION_REQUESTED

2016-06-14 17:46:44,620 [CONTROL-rpc-event-queue] INFO
o.a.d.e.w.f.FragmentStatusReporter -
289fc208-7266-6a81-73a1-709efff6c412:1:5: State to report:
CANCELLATION_REQUESTED

2016-06-14 17:47:01,393 [BitServer-3] WARN
o.a.d.exec.rpc.control.ControlServer - Message of mode REQUEST of rpc type
6 took longer than 500ms.  Actual duration was 29781ms.

2016-06-14 17:47:09,463 [CONTROL-rpc-event-queue] INFO
o.a.d.e.w.fragment.FragmentExecutor -
289fc208-7266-6a81-73a1-709efff6c412:1:10: State change requested RUNNING
--> CANCELLATION_REQUESTED

2016-06-14 17:47:26,967 [CONTROL-rpc-event-queue] INFO
o.a.d.e.w.f.FragmentStatusReporter -
289fc208-7266-6a81-73a1-709efff6c412:1:10: State to report:
CANCELLATION_REQUESTED

2016-06-14 17:47:55,593 [BitServer-3] WARN
o.a.d.exec.rpc.control.ControlServer - Message of mode REQUEST of rpc type
6 took longer than 500ms.  Actual duration was 46130ms.

2016-06-14 17:48:04,497 [CONTROL-rpc-event-queue] INFO
o.a.d.e.w.fragment.FragmentExecutor -
289fc208-7266-6a81-73a1-709efff6c412:1:15: State change requested RUNNING
--> CANCELLATION_REQUESTED

2016-06-14 17:48:12,742 [BitServer-4] WARN
o.a.d.exec.rpc.control.ControlClient - Message of mode RESPONSE of rpc type
1 took longer than 500ms.  Actual duration was 4236ms.

2016-06-14 17:48:42,328 [CONTROL-rpc-event-queue] INFO
o.a.d.e.w.f.FragmentStatusReporter -
289fc208-7266-6a81-73a1-709efff6c412:1:15: State to report:
CANCELLATION_REQUESTED

2016-06-14 17:49:36,351 [BitServer-4] WARN
o.a.d.exec.rpc.control.ControlClient - Message of mode RESPONSE of rpc type
1 took longer than 500ms.  Actual duration was 4260ms.

2016-06-14 17:49:36,351 [BitServer-3] WARN
o.a.d.exec.rpc.control.ControlServer - Message of mode REQUEST of rpc type
6 took longer than 500ms.  Actual duration was 91854ms.

2016-06-14 17:50:35,273 [CONTROL-rpc-event-queue] INFO
o.a.d.e.w.fragment.FragmentExecutor -
289fc208-7266-6a81-73a1-709efff6c412:1:20: State change requested RUNNING
--> CANCELLATION_REQUESTED

2016-06-14 17:50:39,322 [289fc208-7266-6a81-73a1-709efff6c412:frag:1:15]
INFO  o.a.d.e.w.fragment.FragmentExecutor -
289fc208-7266-6a81-73a1-709efff6c412:1:15: State change requested
CANCELLATION_REQUESTED --> FAILED

2016-06-14 17:50:51,546 [CONTROL-rpc-event-queue] INFO
o.a.d.e.w.f.FragmentStatusReporter -
289fc208-7266-6a81-73a1-709efff6c412:1:20: State to report:
CANCELLATION_REQUESTED

2016-06-14 17:51:36,905 [BitServer-4] WARN
o.a.d.exec.rpc.control.ControlClient - Message of mode RESPONSE of rpc type
1 took longer than 500ms.  Actual duration was 4426ms.

2016-06-14 17:52:30,805 [BitServer-3] WARN
o.a.d.exec.rpc.control.ControlServer - Message of mode REQUEST of rpc type
6 took longer than 500ms.  Actual duration was 98767ms.

2016-06-14 17:52:47,042 [BitServer-4] WARN
o.a.d.exec.rpc.control.ControlClient - Message of mode RESPONSE of rpc type
1 took longer than 500ms.  Actual duration was 12041ms.
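
To pull these slow control-RPC warnings out of the (un-wrapped) drillbit.log,
something along these lines works; the path and the 5-second threshold are
arbitrary choices on my part, nothing Drill-specific:

#!/usr/bin/env python
# Sketch: list the "took longer than 500ms" control-RPC warnings, worst first.
import re
import sys

SLOW_RPC = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).*"
    r"took longer than \d+ms\.\s+Actual duration was (?P<ms>\d+)ms")

def slow_rpcs(path, threshold_ms=5000):
    hits = []
    with open(path) as f:
        for line in f:
            m = SLOW_RPC.search(line)
            if m and int(m.group("ms")) >= threshold_ms:
                hits.append((int(m.group("ms")), m.group("ts")))
    return sorted(hits, reverse=True)

if __name__ == "__main__":
    log = sys.argv[1] if len(sys.argv) > 1 else "drillbit.log"
    for ms, ts in slow_rpcs(log):
        print("%8d ms at %s" % (ms, ts))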

Possible issues in the node 3 gclog:


1819.137: [Full GC (Allocation Failure)  23G->23G(24G), 4.0657064 secs]

1823.205: [GC concurrent-mark-abort]

1823.221: [GC pause (G1 Evacuation Pause) (young)-- 23G->23G(24G),
0.0382934 secs]

1823.271: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
23G->23G(24G), 0.0250215 secs]

1823.296: [GC concurrent-root-region-scan-start]

1823.296: [GC concurrent-root-region-scan-end, 0.0000105 secs]

1823.296: [GC concurrent-mark-start]

1823.308: [Full GC (Allocation Failure)  23G->23G(24G), 4.3719713 secs]

1827.694: [GC pause (G1 Evacuation Pause) (young)-- 23G->23G(24G),
0.0312666 secs]

1827.727: [GC concurrent-mark-abort]

1827.735: [GC pause (G1 Evacuation Pause) (young) 23G->23G(24G), 0.0212359
secs]

1827.766: [Full GC (Allocation Failure)  23G->23G(24G), 3.9308980 secs]

1831.710: [GC pause (G1 Evacuation Pause) (young)-- 23G->23G(24G),
0.0298229 secs]

1831.750: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
23G->23G(24G), 0.0268410 secs]

1831.777: [GC concurrent-root-region-scan-start]

1831.777: [GC concurrent-root-region-scan-end, 0.0000212 secs]

1831.777: [GC concurrent-mark-start]

1831.789: [Full GC (Allocation Failure)  23G->23G(24G), 3.9250410 secs]
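
And a similar rough scan for the gclog above, mainly to show how much
wall-clock time goes into full GCs that reclaim essentially nothing
(23G -> 23G of a 24G heap). The file name is illustrative:

#!/usr/bin/env python
# Sketch: summarize Full GC (Allocation Failure) pauses from a G1 gclog.
import re
import sys

FULL_GC = re.compile(
    r"^(?P<t>\d+\.\d+): \[Full GC \(Allocation Failure\)\s+"
    r"(?P<before>\d+)G->(?P<after>\d+)G\((?P<cap>\d+)G\), (?P<secs>[\d.]+) secs\]")

def summarize(path):
    pauses = []
    with open(path) as f:
        for line in f:
            m = FULL_GC.match(line)
            if m:
                pauses.append((float(m.group("t")), float(m.group("secs")),
                               int(m.group("before")), int(m.group("after")),
                               int(m.group("cap"))))
    print("full GCs: %d, total pause: %.1f s"
          % (len(pauses), sum(p[1] for p in pauses)))
    for t, secs, before, after, cap in pauses:
        print("  at %9.3fs: %.1f s pause, reclaimed %dG (heap now %dG of %dG)"
              % (t, secs, before - after, after, cap))

if __name__ == "__main__":
    summarize(sys.argv[1] if len(sys.argv) > 1 else "drillbit.gc.log")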



On Mon, Jun 13, 2016 at 6:54 PM, John Omernik <[email protected]> wrote:

> Yep, I will create all clean logs tomorrow run the query that caused it.
> Thanks.  John
>
>
> On Monday, June 13, 2016, Parth Chandra <[email protected]> wrote:
>
>> Yes, we can discuss this on the hangout.
>> You're right, there are two issues -
>>   Limiting memory usage to a maximum limit should be the goal of every
>> component. We are not there yet with Drill though.
>>   Getting an Out of Memory Error and having the Drillbit become
>> unresponsive is something we should rarely see as either the Drill
>> allocator or the JVM successfully catch the condition. Can you grep your
>> logs so we can see if that indeed is what happened?
>>
>>
>>
>> On Mon, Jun 13, 2016 at 4:27 PM, John Omernik <[email protected]> wrote:
>>
>> > I'd like to talk about that on the hangout.  Drill should do better at
>> > failing with a clean oom error rather then having a bit go unresponsive.
>> > Can just that bit be restarted to return to a copacetic state? As an
>> admin,
>> > if this is the case, how do I find this bit?
>> >
>> > Other than adding RAM, are there any query tuning settings that could
>> help
>> > prevent the unresponsive bit? ( I see this as two issues, the memory
>> > settings for the 1024m block size CTAS and the how can we prevent a bit
>> > from going unresponsive? )
>> > On Jun 13, 2016 6:19 PM, "Parth Chandra" <[email protected]> wrote:
>> >
>> > The only time I've seen a drillbit get unresponsive is when you run out
>> of
>> > Direct memory. Did you see any 'Out of Memory Error' in your logs? If
>> you
>> > see those then you need to increase the Direct memory setting for the
>> JVM.
>> > ( DRILL_MAX_DIRECT_MEMORY in drill-env.sh)
>> >
>> >
>> >
>> >
>> > On Mon, Jun 13, 2016 at 4:10 PM, John Omernik <[email protected]> wrote:
>> >
>> > > The 512m block size worked.  My issue with the 1024m block size was on
>> > the
>> > > writing using a CTAS.... that's where my nodes got into a bad
>> > state....thus
>> > > I am wondering what setting on drill would be the right setting to
>> help
>> > > node memory pressures on a CTAs using 1024m block size
>> > > On Jun 13, 2016 6:06 PM, "Parth Chandra" <[email protected]>
>> wrote:
>> > >
>> > > In general, you want to make the Parquet block size and the HDFS block
>> > size
>> > > the same. A Parquet block size that is larger than the HDFS block size
>> > can
>> > > split a Parquet block ( i.e. row_group ) across nodes and that will
>> > > severely affect performance as data reads will no longer be local.
>> 512 MB
>> > > is a pretty good setting.
>> > >
>> > > Note that you need to ensure the Parquet block size in the source file
>> > > which (maybe) was produced outside of Drill. So you will need to make
>> the
>> > > change in the application used to write the Parquet file.
>> > >
>> > > If you're using Drill to write the source file as well then, of
>> course,
>> > the
>> > > block size setting will be used by the writer.
>> > >
>> > > If you're using the new reader, then there is really no knob you have
>> to
>> > > tweak. Is parquet-tools able to read the file(s)?
>> > >
>> > >
>> > >
>> > > On Mon, Jun 13, 2016 at 1:59 PM, John Omernik <[email protected]>
>> wrote:
>> > >
>> > > > I am doing some performance testing, and per the Impala
>> documentation,
>> > I
>> > > am
>> > > > trying to use a block size of 1024m in both Drill and MapR FS.
>> When I
>> > > set
>> > > > the MFS block size to 512 and the Drill (default) block size I saw
>> some
>> > > > performance improvements, and wanted to try the 1024 to see how it
>> > > worked,
>> > > > however, my query hung and I got into that "bad state" where the
>> nodes
>> > > are
>> > > > not responding right and I have to restart my whole cluster (This
>> > really
>> > > > bothers me that a query can make the cluster be unresponsive)
>> > > >
>> > > > That said, what memory settings can I tweak to help the query work.
>> > This
>> > > is
>> > > > quite a bit of data, a CTAS from Parquet to Parquet, 100-130G of
>> data
>> > per
>> > > > data (I am doing a day at a time), 103 columns.   I have to use the
>> > > > "use_new_reader" option due to my other issues, but other than that
>> I
>> > am
>> > > > just setting the block size on MFS and then updating the block size
>> in
>> > > > Drill, and it's dying. Since this is a simple CTAS (no sort) which
>> > > settings
>> > > > can be beneficial for what is happening here?
>> > > >
>> > > > Thanks
>> > > >
>> > > > John
>> > > >
>> > >
>> >
>>
>
>
> --
> Sent from my iThing
>
