[jira] [Resolved] (DRILL-3898) No space error during external sort does not cancel the query

2016-09-19 Thread Boaz Ben-Zvi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi resolved DRILL-3898.
---------------------------------
       Resolution: Fixed
    Fix Version/s: (was: Future)
                   1.9.0

Commit ID: 140304d47daf8b18c72a0ab8f39c67d9d8a3031d

The code fix: "widen" the exception catch (from IOException to Throwable) so 
that it also covers the FSError thrown by the Hadoop FS I/O call; in addition, 
ignore a repeat of the same exception while closing the new group (which 
triggers a flush()).  This produces a relatively "clean and clear" error 
message instead of a Java stack dump (which is more appropriate for a software 
bug).
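As a sketch of the shape of that fix (function names are hypothetical, not Drill's actual code): the spill path captures any error from the write, suppresses a repeated failure from closing the group, and raises a single clean message.

```python
def merge_and_spill(write_batches, close_group):
    """Sketch of the widened error handling (hypothetical names, not
    Drill's code): capture any error from the spill write, ignore a
    repeat of the failure while closing the new group (close triggers
    a flush()), and surface one clean message instead of a stack dump."""
    first_error = None
    try:
        write_batches()               # may fail with "No space left on device"
    except BaseException as e:        # widened catch: was IOException-only
        first_error = e
    finally:
        try:
            close_group()             # triggers a flush(); may fail the same way
        except BaseException as e:
            if first_error is None:   # ignore a repeat of the same failure
                first_error = e
    if first_error is not None:
        raise RuntimeError(f"External sort spill failed: {first_error}") from first_error
```

With a `close_group` that raises OSError("No space left on device"), the caller sees one RuntimeError carrying that message rather than a leaked FSError.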

 

> No space error during external sort does not cancel the query
> --------------------------------------------------------------
>
> Key: DRILL-3898
> URL: https://issues.apache.org/jira/browse/DRILL-3898
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.2.0, 1.8.0
>Reporter: Victoria Markman
>Assignee: Boaz Ben-Zvi
> Fix For: 1.9.0
>
> Attachments: drillbit.log, sqlline_3898.ver_1_8.log
>
>
> While verifying DRILL-3732, I ran into a new problem.
> I think Drill somehow loses track of the out-of-disk exception and does not 
> cancel the rest of the query, which results in an NPE:
> Reproduction is the same as in DRILL-3732:
> {code}
> 0: jdbc:drill:schema=dfs> create table store_sales_20(ss_item_sk, 
> ss_customer_sk, ss_cdemo_sk, ss_hdemo_sk, s_sold_date_sk, ss_promo_sk) 
> partition by (ss_promo_sk) as
> . . . . . . . . . . . . >  select 
> . . . . . . . . . . . . >  case when columns[2] = '' then cast(null as 
> varchar(100)) else cast(columns[2] as varchar(100)) end,
> . . . . . . . . . . . . >  case when columns[3] = '' then cast(null as 
> varchar(100)) else cast(columns[3] as varchar(100)) end,
> . . . . . . . . . . . . >  case when columns[4] = '' then cast(null as 
> varchar(100)) else cast(columns[4] as varchar(100)) end, 
> . . . . . . . . . . . . >  case when columns[5] = '' then cast(null as 
> varchar(100)) else cast(columns[5] as varchar(100)) end, 
> . . . . . . . . . . . . >  case when columns[0] = '' then cast(null as 
> varchar(100)) else cast(columns[0] as varchar(100)) end, 
> . . . . . . . . . . . . >  case when columns[8] = '' then cast(null as 
> varchar(100)) else cast(columns[8] as varchar(100)) end
> . . . . . . . . . . . . >  from 
> . . . . . . . . . . . . >   `store_sales.dat` ss 
> . . . . . . . . . . . . > ;
> Error: SYSTEM ERROR: NullPointerException
> Fragment 1:16
> [Error Id: 0ae9338d-d04f-4b4a-93aa-a80d13cedb29 on atsqa4-133.qa.lab:31010] 
> (state=,code=0)
> {code}
> This exception in drillbit.log should have triggered query cancellation:
> {code}
> 2015-10-06 17:01:34,463 [WorkManager-2] ERROR 
> o.apache.drill.exec.work.WorkManager - 
> org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception.
> org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:226)
>  ~[hadoop-common-2.5.1-mapr-1503.jar:na]
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) 
> ~[na:1.7.0_71]
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) 
> ~[na:1.7.0_71]
> at java.io.FilterOutputStream.close(FilterOutputStream.java:157) 
> ~[na:1.7.0_71]
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  ~[hadoop-common-2.5.1-mapr-1503.jar:na]
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) 
> ~[hadoop-common-2.5.1-mapr-1503.jar:na]
> at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:400)
>  ~[hadoop-common-2.5.1-mapr-1503.jar:na]
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  ~[hadoop-common-2.5.1-mapr-1503.jar:na]
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) 
> ~[hadoop-common-2.5.1-mapr-1503.jar:na]
> at 
> org.apache.drill.exec.physical.impl.xsort.BatchGroup.close(BatchGroup.java:152)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
> at 
> org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:44) 
> ~[drill-common-1.2.0.jar:1.2.0]
> at 
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:553)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
> at 
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:362)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
> at 
> 

[jira] [Resolved] (DRILL-4771) Drill should avoid doing the same join twice if count(distinct) exists

2016-09-19 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai resolved DRILL-4771.
---------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.9.0

Closed with commit: 229571533bce1e37395d9675ea804ee97b1a2362

> Drill should avoid doing the same join twice if count(distinct) exists
> -----------------------------------------------------------------------
>
> Key: DRILL-4771
> URL: https://issues.apache.org/jira/browse/DRILL-4771
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
> Fix For: 1.9.0
>
>
> When the query has one distinct aggregate and one or more non-distinct 
> aggregates, the planner need not produce the join-based plan; we can 
> generate multi-phase aggregates instead. Another approach would be to use 
> grouping sets, but Drill does not support grouping sets and currently falls 
> back to the join-based plan (see the plan below):
> {code}
> select emp.empno, count(*), avg(distinct dept.deptno) 
> from sales.emp emp inner join sales.dept dept 
> on emp.deptno = dept.deptno 
> group by emp.empno
> LogicalProject(EMPNO=[$0], EXPR$1=[$1], EXPR$2=[$3])
>   LogicalJoin(condition=[IS NOT DISTINCT FROM($0, $2)], joinType=[inner])
> LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
>   LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
> LogicalJoin(condition=[=($7, $9)], joinType=[inner])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> LogicalAggregate(group=[{0}], EXPR$2=[AVG($1)])
>   LogicalAggregate(group=[{0, 1}])
> LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
>   LogicalJoin(condition=[=($7, $9)], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}
> The more efficient plan should look like this:
> {code}
> select emp.empno, count(*), avg(distinct dept.deptno) 
> from sales.emp emp inner join sales.dept dept 
> on emp.deptno = dept.deptno 
> group by emp.empno
> LogicalAggregate(group=[{0}], EXPR$1=[SUM($2)], EXPR$2=[AVG($1)])
>   LogicalAggregate(group=[{0, 1}], EXPR$1=[COUNT()])
> LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
>   LogicalJoin(condition=[=($7, $9)], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}
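The equivalence of the two plans can be spot-checked in plain SQL: aggregate once by (empno, deptno) over a single join, then sum the counts and average the now-distinct deptnos. A toy check against SQLite (made-up data, not Drill):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE emp(empno INT, deptno INT);
    CREATE TABLE dept(deptno INT);
    INSERT INTO emp VALUES (1,10),(1,20),(2,10),(2,10);
    INSERT INTO dept VALUES (10),(20);
""")

# Original shape: one join, count(*) plus avg(distinct) in one GROUP BY.
original = con.execute("""
    SELECT emp.empno, COUNT(*), AVG(DISTINCT dept.deptno)
    FROM emp JOIN dept ON emp.deptno = dept.deptno
    GROUP BY emp.empno ORDER BY emp.empno""").fetchall()

# Rewritten shape: inner aggregate on (empno, deptno) over a single join;
# the outer aggregate sums the counts and averages the distinct deptnos.
rewritten = con.execute("""
    SELECT empno, SUM(cnt), AVG(deptno)
    FROM (SELECT emp.empno AS empno, dept.deptno AS deptno, COUNT(*) AS cnt
          FROM emp JOIN dept ON emp.deptno = dept.deptno
          GROUP BY emp.empno, dept.deptno)
    GROUP BY empno ORDER BY empno""").fetchall()

print(original)   # [(1, 2, 15.0), (2, 2, 10.0)]
print(rewritten)  # identical: the single-join form computes the same result
```

The join runs once instead of twice, which is exactly what the multi-phase plan above achieves.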



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Suggest topics for hangout tomorrow (9/20)

2016-09-19 Thread Aman Sinha
I'll start the hangout tomorrow at the usual time.  I don't have a set
agenda yet but if there are any topics folks wish to discuss, please
respond on this thread such that others who might be interested can also
join.

Thanks.


[jira] [Created] (DRILL-4896) After a failed CTAS, the table both exists and does not exist

2016-09-19 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-4896:
-------------------------------

 Summary: After a failed CTAS, the table both exists and does not 
exist
 Key: DRILL-4896
 URL: https://issues.apache.org/jira/browse/DRILL-4896
 Project: Apache Drill
  Issue Type: Improvement
  Components:  Server
Affects Versions: 1.8.0
Reporter: Boaz Ben-Zvi


  After a CTAS failed (due to no space left on the storage device), incomplete 
Parquet files were left behind.  A subsequent CTAS for the same table name 
fails with "table exists", while a subsequent DROP on the same table name 
fails with "table does not exist".

  A possible enhancement: allow DROP to clean up such a corrupted table.

0: jdbc:drill:zk=local> create table `/drill/spill/tt1` as
. . . . . . . . . . . >  select
. . . . . . . . . . . >case when columns[2] = '' then cast(null as 
varchar(100)) else cast(columns[2] as varchar(100)) end,
. . . . . . . . . . . >case when columns[3] = '' then cast(null as 
varchar(100)) else cast(columns[3] as varchar(100)) end,
. . . . . . . . . . . >case when columns[4] = '' then cast(null as 
varchar(100)) else cast(columns[4] as varchar(100)) end, 
. . . . . . . . . . . >case when columns[5] = '' then cast(null as 
varchar(100)) else cast(columns[5] as varchar(100)) end, 
. . . . . . . . . . . >case when columns[0] = '' then cast(null as 
varchar(100)) else cast(columns[0] as varchar(100)) end, 
. . . . . . . . . . . >case when columns[8] = '' then cast(null as 
varchar(100)) else cast(columns[8] as varchar(100)) end
. . . . . . . . . . . > FROM 
dfs.`/Users/boazben-zvi/data/store_sales/store_sales.dat`;
Exception in thread "drill-executor-4" org.apache.hadoop.fs.FSError: 
java.io.IOException: No space left on device
... 39 more
Error: SYSTEM ERROR: IOException: The file being written is in an invalid 
state. Probably caused by an error thrown previously. Current state: COLUMN

Fragment 0:0

[Error Id: de84c212-2400-4a08-a15c-8e3adb5ec774 on 10.250.57.63:31010] 
(state=,code=0)
0: jdbc:drill:zk=local> create table `/drill/spill/tt1` as select * from 
dfs.`/Users/boazben-zvi/data/store_sales/store_sales.dat`;
Error: VALIDATION ERROR: A table or view with given name [/drill/spill/tt1] 
already exists in schema [dfs.tmp]


[Error Id: 0ef99a15-9d67-49ad-87fb-023105dece3c on 10.250.57.63:31010] 
(state=,code=0)
0: jdbc:drill:zk=local> drop table `/drill/spill/tt1` ;
Error: DATA_WRITE ERROR: Failed to drop table: File /drill/spill/tt1 does not 
exist


[Error Id: c22da79f-ecbd-423c-b5b2-4eae7d1263d7 on 10.250.57.63:31010] 
(state=,code=0)
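The inconsistent state can be illustrated with a toy sketch (the checks below are hypothetical stand-ins, not Drill's actual validation logic): the "already exists" test only looks for the directory, while dropping requires the files inside it to be well-formed.

```python
import os
import tempfile

def is_complete(path):
    """A valid Parquet file ends with the 4-byte magic 'PAR1'."""
    with open(path, "rb") as fh:
        fh.seek(-4, os.SEEK_END)
        return fh.read(4) == b"PAR1"

def ctas_sees_table(path):
    """Hypothetical CREATE-side check: only asks if the directory exists."""
    return os.path.isdir(path)

def drop_sees_table(path):
    """Hypothetical DROP-side check: demands every file be well-formed."""
    return all(is_complete(os.path.join(path, f)) for f in os.listdir(path))

table = os.path.join(tempfile.mkdtemp(), "tt1")
os.makedirs(table)
with open(os.path.join(table, "0_0_0.parquet"), "wb") as f:
    f.write(b"PAR1\x00\x01")   # truncated file: the write died before the footer

print(ctas_sees_table(table))  # True  -> CTAS fails with "table exists"
print(drop_sees_table(table))  # False -> DROP fails with "does not exist"
```

A cleanup-capable DROP would only need the weaker directory-level check before deleting.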




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] drill pull request #588: Added test cases

2016-09-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/588


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-09-19 Thread ssriniva123
Github user ssriniva123 commented on the issue:

https://github.com/apache/drill/pull/518
  
Apologies for getting back on this thread late; I got tied up with some 
issues at work.

Paul,
The JSON parser is not just a tokenizer; it keeps track of the JSON 
structure and understands various aspects of it, like the root and 
array/object context, and all parsing is done under that context.

- We cannot keep track of {} accurately. For example, the counting JSON 
processor calls parser.skipChildren(), which tries to skip to the end of the 
JSON object, but this can roll over to the next line when the malformed value 
is in the bottom-most JSON sub-object - see the example below (missing " in 
the last JSON structure). The JsonReader behaves the same way.

{"balance": 1000.0,"num": 100,"is_vip": true,"name": 
"foo3","curr":{"denom":"pound","test":{"value  :false}}}

- One possible solution is to rewind the input source to reset the stream, 
but this is not recommended, and there is no guarantee that all streams 
support mark/reset semantics.

Given where we are, I think the proposed solution works well for almost 
all malformed JSON inputs.
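The rollover is easy to reproduce with a stripped-down record skipper (a sketch of the failure mode only, not the Jackson or Drill code): a brace counter that honors strings sails past the record boundary once a closing quote is missing, because the rest of the input, including the next record, is consumed as string content.

```python
def skip_record(text, start):
    """Skip one JSON object by counting braces while honoring strings.
    Returns the index just past the object, or -1 if the input ran out."""
    depth = 0
    in_string = False
    i = start
    while i < len(text):
        c = text[i]
        if in_string:
            if c == "\\":
                i += 1            # skip the escaped character
            elif c == '"':
                in_string = False
        elif c == '"':
            in_string = True
        elif c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return -1                     # record boundary was lost

good = '{"a": {"v": false}}\n{"b": 1}'
bad  = '{"a": {"v : false}}\n{"b": 1}'   # missing closing quote on "v"

print(skip_record(good, 0))  # 19: stops cleanly at the end of record one
print(skip_record(bad, 0))   # -1: rolled over the newline into record two
```

Once the boundary is lost there is nothing to resynchronize on without rewinding the stream, which is the mark/reset problem described above.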





---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #591: DRILL-3898 : Sort spill was modified to catch all e...

2016-09-19 Thread Ben-Zvi
GitHub user Ben-Zvi opened a pull request:

https://github.com/apache/drill/pull/591

DRILL-3898 :  Sort spill was modified to catch all errors, ignore rep…

…eated errors while closing the new group and issue a more detailed error 
message.

See PR history in #585

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Ben-Zvi/drill DRILL-3898

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/591.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #591


commit 48846355b42714925586b549eab11de62c365a5c
Author: Boaz Ben-Zvi 
Date:   2016-09-09T23:36:03Z

DRILL-3898 :  Sort spill was modified to catch all errors, ignore repeated 
errors while closing the new group and issue a more detailed error message.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #585: DRILL-3898 : Sort spill was modified to catch all e...

2016-09-19 Thread Ben-Zvi
Github user Ben-Zvi closed the pull request at:

https://github.com/apache/drill/pull/585


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: ZK lost connectivity issue on large cluster

2016-09-19 Thread François Méthot
Hi Sudheesh,

  If I add a selection filter so that no rows are returned, the same problem
occurs. I also simplified the query to include only a few integer columns.

That particular data repo is ~200+ billion records spread over ~50,000
parquet files.

We have other CSV data repos that are 100x smaller and do not trigger
this issue.


+ Is atsqa4-133.qa.lab [1] the Foreman node for the query in this case?
There is also a bizarre case where the node that is reported as lost is the
node itself.
Yes, the stack trace is from the ticket. It did occur once or twice (in
the many, many attempts) that it was the node itself.

+ Is there a spike in memory usage of the Drillbit that is the Foreman for
the query (process memory, not just heap)?
We don't notice any unusual spike; each node gets busy in the same range
when the query is running.

I tried running with 8GB/20GB and 4GB/24GB heap/off-heap and did not see any
improvement.


We will upgrade from 1.7 to 1.8 before going ahead with more investigation.

Thanks a lot.


On Mon, Sep 19, 2016 at 1:19 PM, Sudheesh Katkam 
wrote:

> Hi Francois,
>
> A simple query with only projections is not an “ideal” use case, since
> Drill is bound by how fast the client can consume records. There are 1000
> scanners sending data to 1 client (vs far fewer scanners sending data in
> the 12 node case).
>
> This might increase the load on the Drillbit that is the Foreman for the
> query. In the query profile, the scanners should be spending a lot more
> time “waiting” to send records to the client (via root fragment).
> + Is atsqa4-133.qa.lab [1] the Foreman node for the query in this case?
> There is also a bizarre case where the node that is reported as lost is the
> node itself.
> + Is there a spike in memory usage of the Drillbit that is the Foreman for
> the query (process memory, not just heap)?
>
> Regarding the warnings ...
>
> > 2016-09-19 13:31:56,866 [BitServer-7] WARN
> > o.a.d.exec.rpc.control.ControlServer - Message of mode REQUEST of rpc
> type
> > 6 took longer than 500 ms. Actual Duration was 16053ms.
>
>
> RPC type 6 is a cancellation request; DRILL-4766 [2] should help in this
> case, which is resolved in the latest version of Drill. So as Chun
> suggested, upgrade the cluster to the latest version of Drill.
>
> > 2016-09-19 14:15:33,357 [BitServer-4] WARN
> > o.a.d.exec.rpc.control.ControlClient - Message of mode RESPONSE of rpc
> type
> > 1 took longer than 500 ms. Actual Duration was 981ms.
>
> I am surprised that responses are taking that long to handle.
> + Are both messages on the same Drillbit?
>
> The other warnings can be ignored.
>
> Thank you,
> Sudheesh
>
> [1] I just realized that atsqa4-133.qa.lab is in one of our test
> environments :)
> [2] https://issues.apache.org/jira/browse/DRILL-4766
>
> > On Sep 19, 2016, at 9:15 AM, François Méthot 
> wrote:
> >
> > Hi Sudheesh,
> >
> >
> > + Does the query involve any aggregations or filters? Or is this a select
> > query with only projections?
> > Simple query with only projections
> >
> > + Any suspicious timings in the query profile?
> > Nothing specially different than our working query on our small cluster.
> >
> > + Any suspicious warning messages in the logs around the time of failure
> on
> > any of the drillbits? Specially on atsqa4-133.qa.lab? Specially this one
> > (“..” are place holders):
> >  Message of mode .. of rpc type .. took longer than ..ms.  Actual
> duration
> > was ..ms.
> >
> > Well we do see this warning on the failing node (on my last test), I
> found
> > this WARNING in our log for the past month for pretty much every node I
> > checked.
> > 2016-09-19 13:31:56,866 [BitServer-7] WARN
> > o.a.d.exec.rpc.control.ControlServer - Message of mode REQUEST of rpc
> type
> > 6 took longer than 500 ms. Actual Duration was 16053ms.
> > 2016-09-19 14:15:33,357 [BitServer-4] WARN
> > o.a.d.exec.rpc.control.ControlClient - Message of mode RESPONSE of rpc
> type
> > 1 took longer than 500 ms. Actual Duration was 981ms.
> >
> > We really appreciate your help. I will dig in the source code for when
> and
> > why this error happen.
> >
> >
> > Francois
> >
> > P.S.:
> > We do see this also:
> > 2016-09-19 14:48:23,444 [drill-executor-9] WARN
> > o.a.d.exec.rpc.control.WorkEventBus - Fragment ..:1:2 not found in
> the
> > work bus.
> > 2016-09-19 14:48:23,444 [drill-executor-11] WARN
> > o.a.d.exec.rpc.control.WorkEventBus - Fragment :1:222 not found in
> the
> > work bus.
> > 2016-09-19 14:48:23,444 [drill-executor-12] WARN
> > o.a.d.exec.rpc.control.WorkEventBus - Fragment :1:442 not found in
> the
> > work bus.
> > 2016-09-19 14:48:23,444 [drill-executor-10] WARN
> > o.a.d.exec.rpc.control.WorkEventBus - Fragment :1:662 not found in
> the
> > work bus.
> >
> >
> >
> >
> > On Sun, Sep 18, 2016 at 2:57 PM, Sudheesh Katkam 
> > wrote:
> >
> >> Hi Francois,
> >>
> 

Re: ZK lost connectivity issue on large cluster

2016-09-19 Thread Sudheesh Katkam
One more interesting thing, and another guess at resolving the problem:

> P.S.:
> We do see this also:
> 2016-09-19 14:48:23,444 [drill-executor-9] WARN
> o.a.d.exec.rpc.control.WorkEventBus - Fragment ..:1:2 not found in the
> work bus.
> 2016-09-19 14:48:23,444 [drill-executor-11] WARN
> o.a.d.exec.rpc.control.WorkEventBus - Fragment :1:222 not found in the
> work bus.
> 2016-09-19 14:48:23,444 [drill-executor-12] WARN
> o.a.d.exec.rpc.control.WorkEventBus - Fragment :1:442 not found in the
> work bus.
> 2016-09-19 14:48:23,444 [drill-executor-10] WARN
> o.a.d.exec.rpc.control.WorkEventBus - Fragment :1:662 not found in the
> work bus.

+ Are there 220 nodes in your cluster?

If that is the case and these fragments are scanners, they seem to be assigned 
in a round-robin fashion (and not based on data locality?). If so, please 
upgrade to the latest version, where DRILL-4446 [1] is resolved.
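The difference between the two assignment policies can be sketched like this (made-up structures, not Drill's planner code):

```python
def assign_round_robin(files, nodes):
    """Ignore replica locations entirely; deal files out in order."""
    return {f["name"]: nodes[i % len(nodes)] for i, f in enumerate(files)}

def assign_by_locality(files, nodes):
    """Prefer a node holding a replica of the file; round-robin as fallback."""
    assignments, fallback = {}, 0
    for f in files:
        local = [n for n in f["hosts"] if n in nodes]
        if local:
            assignments[f["name"]] = local[0]
        else:
            assignments[f["name"]] = nodes[fallback % len(nodes)]
            fallback += 1
    return assignments

files = [{"name": "a.parquet", "hosts": ["n2"]},
         {"name": "b.parquet", "hosts": ["n3"]},
         {"name": "c.parquet", "hosts": ["n9"]}]   # replica on a non-Drill node
nodes = ["n1", "n2", "n3"]

print(assign_round_robin(files, nodes))
# a -> n1, b -> n2, c -> n3: two of three scans read remote data
print(assign_by_locality(files, nodes))
# a -> n2, b -> n3 (local reads), c -> n1 (fallback only when no replica is local)
```

With tens of thousands of files, the locality-blind policy turns most scans into remote reads, which is why the round-robin symptom matters on a large cluster.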

Thank you,
Sudheesh

[1] https://issues.apache.org/jira/browse/DRILL-4446 


> On Sep 19, 2016, at 10:19 AM, Sudheesh Katkam  wrote:
> 
> Hi Francois,
> 
> A simple query with only projections is not an “ideal” use case, since Drill 
> is bound by how fast the client can consume records. There are 1000 scanners 
> sending data to 1 client (vs far fewer scanners sending data in the 12 node 
> case).
> 
> This might increase the load on the Drillbit that is the Foreman for the 
> query. In the query profile, the scanners should be spending a lot more time 
> “waiting” to send records to the client (via root fragment).
> + Is atsqa4-133.qa.lab [1] the Foreman node for the query in this case? There 
> is also a bizarre case where the node that is reported as lost is the node 
> itself.
> + Is there a spike in memory usage of the Drillbit that is the Foreman for 
> the query (process memory, not just heap)?
> 
> Regarding the warnings ...
> 
>> 2016-09-19 13:31:56,866 [BitServer-7] WARN
>> o.a.d.exec.rpc.control.ControlServer - Message of mode REQUEST of rpc type
>> 6 took longer than 500 ms. Actual Duration was 16053ms.
> 
> 
> RPC type 6 is a cancellation request; DRILL-4766 [2] should help in this 
> case, which is resolved in the latest version of Drill. So as Chun suggested, 
> upgrade the cluster to the latest version of Drill.
> 
>> 2016-09-19 14:15:33,357 [BitServer-4] WARN
>> o.a.d.exec.rpc.control.ControlClient - Message of mode RESPONSE of rpc type
>> 1 took longer than 500 ms. Actual Duration was 981ms.
> 
> I am surprised that responses are taking that long to handle.
> + Are both messages on the same Drillbit?
> 
> The other warnings can be ignored.
> 
> Thank you,
> Sudheesh
> 
> [1] I just realized that atsqa4-133.qa.lab is in one of our test environments 
> :)
> [2] https://issues.apache.org/jira/browse/DRILL-4766 
> 
> 
>> On Sep 19, 2016, at 9:15 AM, François Méthot wrote:
>> 
>> Hi Sudheesh,
>> 
>> 
>> + Does the query involve any aggregations or filters? Or is this a select
>> query with only projections?
>> Simple query with only projections
>> 
>> + Any suspicious timings in the query profile?
>> Nothing specially different than our working query on our small cluster.
>> 
>> + Any suspicious warning messages in the logs around the time of failure on
>> any of the drillbits? Specially on atsqa4-133.qa.lab? Specially this one
>> (“..” are place holders):
>>  Message of mode .. of rpc type .. took longer than ..ms.  Actual duration
>> was ..ms.
>> 
>> Well we do see this warning on the failing node (on my last test), I found
>> this WARNING in our log for the past month for pretty much every node I
>> checked.
>> 2016-09-19 13:31:56,866 [BitServer-7] WARN
>> o.a.d.exec.rpc.control.ControlServer - Message of mode REQUEST of rpc type
>> 6 took longer than 500 ms. Actual Duration was 16053ms.
>> 2016-09-19 14:15:33,357 [BitServer-4] WARN
>> o.a.d.exec.rpc.control.ControlClient - Message of mode RESPONSE of rpc type
>> 1 took longer than 500 ms. Actual Duration was 981ms.
>> 
>> We really appreciate your help. I will dig in the source code for when and
>> why this error happen.
>> 
>> 
>> Francois
>> 
>> P.S.:
>> We do see this also:
>> 2016-09-19 14:48:23,444 [drill-executor-9] WARN
>> o.a.d.exec.rpc.control.WorkEventBus - Fragment ..:1:2 not found in the
>> work bus.
>> 2016-09-19 14:48:23,444 [drill-executor-11] WARN
>> o.a.d.exec.rpc.control.WorkEventBus - Fragment :1:222 not found in the
>> work bus.
>> 2016-09-19 14:48:23,444 [drill-executor-12] WARN
>> o.a.d.exec.rpc.control.WorkEventBus - Fragment :1:442 not found in the
>> work bus.
>> 2016-09-19 14:48:23,444 [drill-executor-10] WARN
>> o.a.d.exec.rpc.control.WorkEventBus - Fragment :1:662 not found in the
>> work bus.
>> 
>> 
>> 
>> 
>> On Sun, Sep 18, 2016 at 2:57 PM, Sudheesh Katkam wrote:

Re: ZK lost connectivity issue on large cluster

2016-09-19 Thread Sudheesh Katkam
Hi Francois,

A simple query with only projections is not an “ideal” use case, since Drill is 
bound by how fast the client can consume records. There are 1000 scanners 
sending data to 1 client (vs far fewer scanners sending data in the 12 node 
case).

This might increase the load on the Drillbit that is the Foreman for the query. 
In the query profile, the scanners should be spending a lot more time “waiting” 
to send records to the client (via root fragment).
+ Is atsqa4-133.qa.lab [1] the Foreman node for the query in this case? There 
is also a bizarre case where the node that is reported as lost is the node 
itself.
+ Is there a spike in memory usage of the Drillbit that is the Foreman for the 
query (process memory, not just heap)?

Regarding the warnings ...

> 2016-09-19 13:31:56,866 [BitServer-7] WARN
> o.a.d.exec.rpc.control.ControlServer - Message of mode REQUEST of rpc type
> 6 took longer than 500 ms. Actual Duration was 16053ms.


RPC type 6 is a cancellation request; DRILL-4766 [2] should help in this case, 
which is resolved in the latest version of Drill. So as Chun suggested, upgrade 
the cluster to the latest version of Drill.

> 2016-09-19 14:15:33,357 [BitServer-4] WARN
> o.a.d.exec.rpc.control.ControlClient - Message of mode RESPONSE of rpc type
> 1 took longer than 500 ms. Actual Duration was 981ms.

I am surprised that responses are taking that long to handle.
+ Are both messages on the same Drillbit?

The other warnings can be ignored.

Thank you,
Sudheesh

[1] I just realized that atsqa4-133.qa.lab is in one of our test environments :)
[2] https://issues.apache.org/jira/browse/DRILL-4766 


> On Sep 19, 2016, at 9:15 AM, François Méthot  wrote:
> 
> Hi Sudheesh,
> 
> 
> + Does the query involve any aggregations or filters? Or is this a select
> query with only projections?
> Simple query with only projections
> 
> + Any suspicious timings in the query profile?
> Nothing specially different than our working query on our small cluster.
> 
> + Any suspicious warning messages in the logs around the time of failure on
> any of the drillbits? Specially on atsqa4-133.qa.lab? Specially this one
> (“..” are place holders):
>  Message of mode .. of rpc type .. took longer than ..ms.  Actual duration
> was ..ms.
> 
> Well we do see this warning on the failing node (on my last test), I found
> this WARNING in our log for the past month for pretty much every node I
> checked.
> 2016-09-19 13:31:56,866 [BitServer-7] WARN
> o.a.d.exec.rpc.control.ControlServer - Message of mode REQUEST of rpc type
> 6 took longer than 500 ms. Actual Duration was 16053ms.
> 2016-09-19 14:15:33,357 [BitServer-4] WARN
> o.a.d.exec.rpc.control.ControlClient - Message of mode RESPONSE of rpc type
> 1 took longer than 500 ms. Actual Duration was 981ms.
> 
> We really appreciate your help. I will dig in the source code for when and
> why this error happen.
> 
> 
> Francois
> 
> P.S.:
> We do see this also:
> 2016-09-19 14:48:23,444 [drill-executor-9] WARN
> o.a.d.exec.rpc.control.WorkEventBus - Fragment ..:1:2 not found in the
> work bus.
> 2016-09-19 14:48:23,444 [drill-executor-11] WARN
> o.a.d.exec.rpc.control.WorkEventBus - Fragment :1:222 not found in the
> work bus.
> 2016-09-19 14:48:23,444 [drill-executor-12] WARN
> o.a.d.exec.rpc.control.WorkEventBus - Fragment :1:442 not found in the
> work bus.
> 2016-09-19 14:48:23,444 [drill-executor-10] WARN
> o.a.d.exec.rpc.control.WorkEventBus - Fragment :1:662 not found in the
> work bus.
> 
> 
> 
> 
> On Sun, Sep 18, 2016 at 2:57 PM, Sudheesh Katkam 
> wrote:
> 
>> Hi Francois,
>> 
>> More questions..
>> 
>>> + Can you share the query profile?
>>>  I will sum it up:
>>> It is a select on 18 columns: 9 string, 9 integers.
>>> Scan is done on 13862 parquet files spread over 1000 fragments.
>>> Fragments are spread across 215 nodes.
>> 
>> So ~5 leaf fragments (or scanners) per Drillbit seems fine.
>> 
>> + Does the query involve any aggregations or filters? Or is this a select
>> query with only projections?
>> + Any suspicious timings in the query profile?
>> + Any suspicious warning messages in the logs around the time of failure
>> on any of the drillbits? Specially on atsqa4-133.qa.lab? Specially this one
>> (“..” are place holders):
>>  Message of mode .. of rpc type .. took longer than ..ms.  Actual
>> duration was ..ms.
>> 
>> Thank you,
>> Sudheesh
>> 
>>> On Sep 15, 2016, at 11:27 AM, François Méthot 
>> wrote:
>>> 
>>> Hi Sudheesh,
>>> 
>>> + How many zookeeper servers in the quorum?
>>> The quorum has 3 servers, everything looks healthy
>>> 
>>> + What is the load on atsqa4-133.qa.lab when this happens? Any other
>>> applications running on that node? How many threads is the Drill process
>>> using?
The load on the failing node (8 cores) is 14 when Drill is running, which is
nothing out of the ordinary according to our 

Re: ZK lost connectivity issue on large cluster

2016-09-19 Thread François Méthot
Hi Sudheesh,


+ Does the query involve any aggregations or filters? Or is this a select
query with only projections?
Simple query with only projections

+ Any suspicious timings in the query profile?
Nothing specially different than our working query on our small cluster.

+ Any suspicious warning messages in the logs around the time of failure on
any of the drillbits? Specially on atsqa4-133.qa.lab? Specially this one
(“..” are place holders):
  Message of mode .. of rpc type .. took longer than ..ms.  Actual duration
was ..ms.

Well we do see this warning on the failing node (on my last test), I found
this WARNING in our log for the past month for pretty much every node I
checked.
2016-09-19 13:31:56,866 [BitServer-7] WARN
o.a.d.exec.rpc.control.ControlServer - Message of mode REQUEST of rpc type
6 took longer than 500 ms. Actual Duration was 16053ms.
2016-09-19 14:15:33,357 [BitServer-4] WARN
o.a.d.exec.rpc.control.ControlClient - Message of mode RESPONSE of rpc type
1 took longer than 500 ms. Actual Duration was 981ms.

We really appreciate your help. I will dig in the source code for when and
why this error happen.


Francois

P.S.:
We do see this also:
2016-09-19 14:48:23,444 [drill-executor-9] WARN
o.a.d.exec.rpc.control.WorkEventBus - Fragment ..:1:2 not found in the
work bus.
2016-09-19 14:48:23,444 [drill-executor-11] WARN
o.a.d.exec.rpc.control.WorkEventBus - Fragment :1:222 not found in the
work bus.
2016-09-19 14:48:23,444 [drill-executor-12] WARN
o.a.d.exec.rpc.control.WorkEventBus - Fragment :1:442 not found in the
work bus.
2016-09-19 14:48:23,444 [drill-executor-10] WARN
o.a.d.exec.rpc.control.WorkEventBus - Fragment :1:662 not found in the
work bus.




On Sun, Sep 18, 2016 at 2:57 PM, Sudheesh Katkam 
wrote:

> Hi Francois,
>
> More questions..
>
> > + Can you share the query profile?
> >   I will sum it up:
> >  It is a select on 18 columns: 9 string, 9 integers.
> >  Scan is done on 13862 parquet files spread over 1000 fragments.
> >  Fragments are spread across 215 nodes.
>
> So ~5 leaf fragments (or scanners) per Drillbit seems fine.
>
> + Does the query involve any aggregations or filters? Or is this a select
> query with only projections?
> + Any suspicious timings in the query profile?
> + Any suspicious warning messages in the logs around the time of failure
> on any of the drillbits? Specially on atsqa4-133.qa.lab? Specially this one
> (“..” are place holders):
>   Message of mode .. of rpc type .. took longer than ..ms.  Actual
> duration was ..ms.
>
> Thank you,
> Sudheesh
>
> > On Sep 15, 2016, at 11:27 AM, François Méthot 
> wrote:
> >
> > Hi Sudheesh,
> >
> > + How many zookeeper servers in the quorum?
> > The quorum has 3 servers, everything looks healthy
> >
> > + What is the load on atsqa4-133.qa.lab when this happens? Any other
> > applications running on that node? How many threads is the Drill process
> > using?
> > The load on the failing node (8 cores) is 14 when Drill is running, which
> > is nothing out of the ordinary according to our admin.
> > HBase is also running.
> > planner.width.max_per_node is set to 8
> >
> > + When running the same query on 12 nodes, is the data size same?
> > Yes
> >
> > + Can you share the query profile?
> >   I will sum it up:
> >  It is a select on 18 columns: 9 string, 9 integers.
> >  Scan is done on 13862 parquet files spread over 1000 fragments.
> >  Fragments are spread across 215 nodes.
> >
> >
> > We are in process of increasing our Zookeeper session timeout config to
> see
> > if it helps.
> >
> > thanks
> >
> > Francois
> >
> >
> >
> >
> >
> >
> >
> > On Wed, Sep 14, 2016 at 4:40 PM, Sudheesh Katkam 
> > wrote:
> >
> >> Hi Francois,
> >>
> >> Few questions:
> >> + How many zookeeper servers in the quorum?
> >> + What is the load on atsqa4-133.qa.lab when this happens? Any other
> >> applications running on that node? How many threads is the Drill process
> >> using?
> >> + When running the same query on 12 nodes, is the data size same?
> >> + Can you share the query profile?
> >>
> >> This may not be the right thing to do, but for now, If the cluster is
> >> heavily loaded, increase the zk timeout.
> >>
> >> Thank you,
> >> Sudheesh
> >>
> >>> On Sep 14, 2016, at 11:53 AM, François Méthot 
> >> wrote:
> >>>
> >>> We are running 1.7.
> >>> The log were taken from the jira tickets.
> >>>
> >>> We will try out 1.8 soon.
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, Sep 14, 2016 at 2:52 PM, Chun Chang 
> wrote:
> >>>
>  Looks like you are running 1.5. I believe there are some work done in
> >> that
>  area and the newer release should behave better.
> 
>  On Wed, Sep 14, 2016 at 11:43 AM, François Méthot <
> fmetho...@gmail.com>
>  wrote:
> 
> > Hi,
> >
> > We are trying to find a solution/workaround to issue:
> >
> > 2016-01-28 16:36:14,367 [Curator-ServiceCache-0]