[jira] [Updated] (DRILL-6143) Make Fragment Runner's RPC Timeout a SystemOption

2018-02-08 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6143:
--
Reviewer: Arina Ielchiieva

> Make Fragment Runner's RPC Timeout a SystemOption
> -
>
> Key: DRILL-6143
> URL: https://issues.apache.org/jira/browse/DRILL-6143
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.13.0
>
>
> Queries sporadically fail on some clusters with the following error:
> {code}
> oadd.org.apache.drill.common.exceptions.UserRemoteException: CONNECTION 
> ERROR: Exceeded timeout (25000) while waiting send intermediate work 
> fragments to remote nodes. Sent 5 and only heard response back from 4 nodes.
> {code}
> This error happens because the FragmentsRunner has a hardcoded timeout, 
> RPC_WAIT_IN_MSECS_PER_FRAGMENT, which is set to 5 seconds. Increasing the 
> timeout to 10 seconds resolved the sporadic failures that were observed. The 
> default should be raised to 10 seconds, and the timeout should also be made 
> configurable via the SystemOptionManager.
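
A minimal sketch of the proposed change, using hypothetical names (this is not
Drill's actual SystemOptionManager API): the hardcoded constant becomes a
default that an option lookup can override.

```java
import java.util.HashMap;
import java.util.Map;

public class FragmentRpcTimeout {
    // Hypothetical option name; Drill's real option registry works differently.
    static final String RPC_TIMEOUT_OPTION = "exec.rpc.fragrunner.timeout";

    // Proposed default: 10 s instead of the hardcoded 5 s per fragment.
    static final long DEFAULT_RPC_WAIT_IN_MSECS_PER_FRAGMENT = 10_000L;

    // Stand-in for a SystemOptionManager lookup with a fallback default.
    static long rpcWaitMillis(Map<String, Long> options) {
        return options.getOrDefault(RPC_TIMEOUT_OPTION,
                DEFAULT_RPC_WAIT_IN_MSECS_PER_FRAGMENT);
    }

    public static void main(String[] args) {
        Map<String, Long> opts = new HashMap<>();
        System.out.println(rpcWaitMillis(opts)); // 10000 (default)
        opts.put(RPC_TIMEOUT_OPTION, 30_000L);
        System.out.println(rpcWaitMillis(opts)); // 30000 (overridden)
    }
}
```

With a lookup like this, an administrator could raise the timeout on slow
clusters without a rebuild, which is the point of the ticket.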



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6143) Make Fragment Runner's RPC Timeout a SystemOption

2018-02-08 Thread Timothy Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357932#comment-16357932
 ] 

Timothy Farkas commented on DRILL-6143:
---

There seem to be cases where a drillbit does not send a response to the 
FragmentsRunner even with a larger timeout. There is likely another issue that 
can cause a response never to be sent in some cases; I will create a separate 
ticket for it.






[jira] [Updated] (DRILL-6143) Make Fragment Runner's RPC Timeout a SystemOption

2018-02-08 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6143:
--
Summary: Make Fragment Runner's RPC Timeout a SystemOption  (was: Queries 
Fail Due To Aggressive Hardcoded RPC Timeout)






[jira] [Commented] (DRILL-6143) Queries Fail Due To Aggressive Hardcoded RPC Timeout

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357929#comment-16357929
 ] 

ASF GitHub Bot commented on DRILL-6143:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1119
  
@arina-ielchiieva 







[jira] [Commented] (DRILL-6143) Queries Fail Due To Aggressive Hardcoded RPC Timeout

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357927#comment-16357927
 ] 

ASF GitHub Bot commented on DRILL-6143:
---

GitHub user ilooner opened a pull request:

https://github.com/apache/drill/pull/1119

DRILL-6143: Made FragmentsRunner's rpc timeout larger to reduce random 
failures and made it configurable as a SystemOption.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilooner/drill DRILL-6143

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1119.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1119


commit d918265c4d5caee11c0b707ff49a49c547c8dc8a
Author: Timothy Farkas 
Date:   2018-02-08T23:25:59Z

DRILL-6143: Made FragmentsRunner's rpc timeout larger to reduce random 
failures and made it configurable as a SystemOption.









[jira] [Updated] (DRILL-6003) Unit test TestDynamicUDFSupport.testLazyInitWhenDynamicUdfSupportIsDisabled fails with FUNCTION ERROR: Failure reading Function class.

2018-02-08 Thread Parth Chandra (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Parth Chandra updated DRILL-6003:
-
Labels:   (was: ready-to-commit)

> Unit test TestDynamicUDFSupport.testLazyInitWhenDynamicUdfSupportIsDisabled 
> fails with FUNCTION ERROR: Failure reading Function class.
> --
>
> Key: DRILL-6003
> URL: https://issues.apache.org/jira/browse/DRILL-6003
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.12.0, 1.13.0
>Reporter: Abhishek Girish
>Assignee: Timothy Farkas
>Priority: Major
>
> {code}
> 14:05:23.170 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: 0 
> B(1 B), h: 229.7 MiB(1.1 GiB), nh: 187.0 KiB(73.2 MiB)): 
> testLazyInitWhenDynamicUdfSupportIsDisabled(org.apache.drill.TestDynamicUDFSupport)
> org.apache.drill.exec.rpc.RpcException: 
> org.apache.drill.common.exceptions.UserRemoteException: FUNCTION ERROR: 
> Failure reading Function class.
> Function Class com.drill.udf.CustomLowerFunction
> Fragment 0:0
> [Error Id: 1d6ea0e5-fd65-4622-924d-d196defaedc8 on 10.10.104.57:31010]
>   at 
> org.apache.drill.exec.rpc.RpcException.mapException(RpcException.java:60) 
> ~[drill-rpc-1.12.0.jar:1.12.0]
>   at 
> org.apache.drill.exec.client.DrillClient$ListHoldingResultsListener.getResults(DrillClient.java:865)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.client.DrillClient.runQuery(DrillClient.java:567) 
> ~[classes/:na]
>   at 
> org.apache.drill.test.BaseTestQuery.testRunAndReturn(BaseTestQuery.java:338) 
> ~[test-classes/:na]
>   at 
> org.apache.drill.test.BaseTestQuery$ClassicTestServices.testRunAndReturn(BaseTestQuery.java:276)
>  ~[test-classes/:na]
>   at 
> org.apache.drill.test.DrillTestWrapper.testRunAndReturn(DrillTestWrapper.java:830)
>  ~[test-classes/:na]
>   at 
> org.apache.drill.test.DrillTestWrapper.compareUnorderedResults(DrillTestWrapper.java:484)
>  ~[test-classes/:na]
>   at 
> org.apache.drill.test.DrillTestWrapper.run(DrillTestWrapper.java:147) 
> ~[test-classes/:na]
>   at org.apache.drill.test.TestBuilder.go(TestBuilder.java:139) 
> ~[test-classes/:na]
>   at 
> org.apache.drill.TestDynamicUDFSupport.testLazyInitWhenDynamicUdfSupportIsDisabled(TestDynamicUDFSupport.java:506)
>  ~[test-classes/:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[na:1.7.0_131]
>   at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_131]
>   at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_131]
> org.apache.drill.common.exceptions.UserRemoteException: FUNCTION ERROR: 
> Failure reading Function class.
> Function Class com.drill.udf.CustomLowerFunction
> Fragment 0:0
> [Error Id: 1d6ea0e5-fd65-4622-924d-d196defaedc8 on 10.10.104.57:31010]
>   at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:468) 
> ~[classes/:na]
>   at 
> org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:102) 
> ~[classes/:na]
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274) 
> ~[drill-rpc-1.12.0.jar:1.12.0]
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:244) 
> ~[drill-rpc-1.12.0.jar:1.12.0]
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
>  ~[netty-codec-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
>  ~[netty-handler-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> 

[jira] [Commented] (DRILL-6003) Unit test TestDynamicUDFSupport.testLazyInitWhenDynamicUdfSupportIsDisabled fails with FUNCTION ERROR: Failure reading Function class.

2018-02-08 Thread Parth Chandra (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357918#comment-16357918
 ] 

Parth Chandra commented on DRILL-6003:
--

Removing the ready-to-commit label. The issue seems to be occurring and the PR 
referred to in this issue is already closed.

 


[jira] [Assigned] (DRILL-6146) UNION with empty input on any one side returns incorrect results

2018-02-08 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-6146:


Assignee: Vitalii Diravka

> UNION with empty input on any one side returns incorrect results
> 
>
> Key: DRILL-6146
> URL: https://issues.apache.org/jira/browse/DRILL-6146
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.12.0
>Reporter: Khurram Faraaz
>Assignee: Vitalii Diravka
>Priority: Major
>
> When any one side of the UNION has an empty file as input, Drill returns 
> incorrect results.
>  
> Table t3 has no rows. Postgres returns 1 as the result for both queries, 
> whereas Drill does not.
>  
> {noformat}
> postgres=# create table t3(id int, name varchar(25));
> CREATE TABLE 
> postgres=# select * from (values(1)) t union select id from t3;
>        1
>  
> postgres=# select id from t3 union select * from (values(1)) t;
>   1
>  {noformat}
>  
>  
> Results from Drill 1.12.0-mapr; note that Drill returns 1 as the result 
> after the union.
> The directory named empty_JSON_f contains a single empty JSON file (the file 
> has no content).
>  
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select * from (values(1)) UNION select id from 
> empty_JSON_f;
> +-+
> | EXPR$0  |
> +-+
> | 1       |
> +-+
> 1 row selected (2.272 seconds){noformat}
> However, in this query Drill returns null and loses the value 1 from the 
> right-hand side after the union, which does not seem correct:
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select id from empty_JSON_f UNION select * from 
> (values(1));
> +---+
> |  id   |
> +---+
> | null  |
> +---+
> 1 row selected (0.33 seconds){noformat}
>  





[jira] [Commented] (DRILL-6144) Make directMemory amount configurable in tests

2018-02-08 Thread Timothy Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357751#comment-16357751
 ] 

Timothy Farkas commented on DRILL-6144:
---

[~paul-rogers] The test that hangs on Travis is 
TestVariableWidthWriter#testRestartRow, which you already have an issue for.

The test that hangs on Jenkins is org.apache.drill.TestTpchPlanning#tpch18. 
Note this test does not hang on the master branch, only on my personal branch. 
But the changes I've made are small, and the hang stops occurring when I 
increase the amount of direct memory.

> Make directMemory amount configurable in tests
> --
>
> Key: DRILL-6144
> URL: https://issues.apache.org/jira/browse/DRILL-6144
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> Lately unit tests on Travis as well as our Jenkins server have been hanging 
> indefinitely. I have not identified the root cause, but increasing direct 
> memory resolves the issue. In order to unblock any further work until the 
> root cause is identified I would like to make the amount of direct memory we 
> use for our builds configurable.
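
One common way to expose such a knob in a Maven build is a property
interpolated into the surefire argLine. This is a sketch under assumed names
(the directMemoryMb property is hypothetical, not necessarily what Drill's
pom.xml uses), though -XX:MaxDirectMemorySize is the standard JVM flag:

```xml
<properties>
  <!-- Hypothetical property; override per build with -DdirectMemoryMb=... -->
  <directMemoryMb>4096</directMemoryMb>
</properties>

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Give forked test JVMs the configured amount of direct memory. -->
    <argLine>-XX:MaxDirectMemorySize=${directMemoryMb}M</argLine>
  </configuration>
</plugin>
```

With a property like this, a hanging build can be retried with more direct
memory (e.g. mvn test -DdirectMemoryMb=8192) without editing the pom.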





[jira] [Updated] (DRILL-6144) Make directMemory amount configurable in tests

2018-02-08 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6144:
--
Summary: Make directMemory amount configurable in tests  (was: Unit tests 
hang indefinitely make directMemory amount configurable in tests)






[jira] [Updated] (DRILL-6144) Unit tests hang indefinitely make directMemory amount configurable in tests

2018-02-08 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6144:
--
Description: 
Lately unit tests on Travis as well as our Jenkins server have been hanging 
indefinitely. I have not identified the root cause, but increasing direct 
memory resolves the issue. In order to unblock any further work until the root 
cause is identified I would like to make the amount of direct memory we use for 
our builds configurable.




  was:
Lately unit tests on Travis as well as our Jenkins server have been hanging 
indefinitely. I have not identified the root cause, but increasing direct 
memory resolves the issue. In order to unblock any further work until the root 
cause is identified I would like to update the amount of direct memory we use 
for our builds.










[jira] [Updated] (DRILL-6144) Unit tests hang indefinitely make directMemory amount configurable in tests

2018-02-08 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6144:
--
Summary: Unit tests hang indefinitely make directMemory amount configurable 
in tests  (was: Unit tests hang indefinitely)






[jira] [Updated] (DRILL-6144) Unit tests hang indefinitely

2018-02-08 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6144:
--
Description: 
Lately unit tests on Travis as well as our Jenkins server have been hanging 
indefinitely. I have not identified the root cause, but increasing direct 
memory resolves the issue. In order to unblock any further work until the root 
cause is identified I would like to update the amount of direct memory we use 
for our builds.




  was:Lately unit tests on Travis as well as our Jenkins server have been 
hanging indefinitely. I have not identified the root cause, but increasing 
direct memory resolves the issue. In order to unblock any further work until 
the root cause is identified I would like to update the amount of direct memory 
we use for our builds.


> Unit tests hang indefinitely
> 
>
> Key: DRILL-6144
> URL: https://issues.apache.org/jira/browse/DRILL-6144
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> Lately unit tests on Travis as well as our Jenkins server have been hanging 
> indefinitely. I have not identified the root cause, but increasing direct 
> memory resolves the issue. In order to unblock any further work until the 
> root cause is identified I would like to update the amount of direct memory 
> we use for our builds.





[jira] [Updated] (DRILL-6144) Unit tests hang indefinitely

2018-02-08 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6144:
--
Labels: ready-to-commit  (was: )






[jira] [Commented] (DRILL-6128) Wrong Result with Nested Loop Join

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357696#comment-16357696
 ] 

ASF GitHub Bot commented on DRILL-6128:
---

Github user sohami commented on the issue:

https://github.com/apache/drill/pull/1109
  
Updated the comment and rebased on top of latest master


> Wrong Result with Nested Loop Join
> --
>
> Key: DRILL-6128
> URL: https://issues.apache.org/jira/browse/DRILL-6128
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>Priority: Major
>
> Nested Loop Join produces wrong results if there are multiple batches on the 
> right side. It builds an ExpandableHyperContainer to hold all the right-side 
> batches. Then, for each record on the left side, it evaluates the condition 
> against all records on the right side and emits the output if the condition 
> is satisfied. The main loop inside 
> [populateOutgoingBatch|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinTemplate.java#L106]
>  calls *doEval* with the correct indexes to evaluate records on both sides. 
> In the generated code of *doEval*, for some reason, a right shift of 16 is 
> applied to the rightBatchIndex (sample shared below).
> {code:java}
> public boolean doEval(int leftIndex, int rightBatchIndex, int 
> rightRecordIndexWithinBatch)
>  throws SchemaChangeException
> {
>   {
>IntHolder out3 = new IntHolder();
>{
>  out3 .value = vv0 .getAccessor().get((leftIndex));
>}
>IntHolder out7 = new IntHolder();
>{
>  out7 .value =  
>  
> vv4[((rightBatchIndex)>>>16)].getAccessor().get(((rightRecordIndexWithinBatch)&
>  65535));
>}
> ..
> ..
> }{code}
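
For context, Drill's hyper-vector addressing packs a batch index into the
upper 16 bits of an int and the in-batch record index into the lower 16 bits.
The sketch below (hypothetical helper names, not Drill code) illustrates why
applying >>> 16 to a plain batch index, as the generated code above does,
selects batch 0 for any batch index below 65536:

```java
public class CompoundIndex {
    // Pack a batch index (upper 16 bits) and a record index (lower 16 bits).
    static int encode(int batchIndex, int recordIndex) {
        return (batchIndex << 16) | (recordIndex & 0xFFFF);
    }

    static int batchOf(int compound)  { return compound >>> 16; }
    static int recordOf(int compound) { return compound & 0xFFFF; }

    public static void main(String[] args) {
        int compound = encode(1, 0); // second batch, first record
        System.out.println(batchOf(compound)); // 1: correct for a packed index

        // The generated doEval receives a plain batch index, not a packed one,
        // so shifting it right by 16 collapses every batch index < 65536 to 0:
        int plainBatchIndex = 1;
        System.out.println(plainBatchIndex >>> 16); // 0: wrong batch selected
    }
}
```

The shift is only valid when the argument is a packed compound index; applied
to a plain batch index it silently redirects every lookup to the first batch.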
>  
> When the loop is processing the second batch, inside the eval method the 
> right-shifted index becomes 0, and the condition is evaluated against the 
> first right batch again. So if there is more than one batch (up to 65535) on 
> the right side, doEval will always use the first batch for condition 
> evaluation. But the output data will be based on the correct batch, so there 
> will be issues like OutOfBound and WrongData. Cases can be:
> Let's say: *rightBatchIndex*: index of right batch to consider, 
> *rightRecordIndexWithinBatch*: index of record in right batch at 
> rightBatchIndex
> 1) The first right batch arrives with zero records and OK_NEW_SCHEMA (say, 
> because of a filter in the operator tree), and the next right batch has > 0 
> records. When we call doEval for the second batch (*rightBatchIndex = 1*) and 
> the first record in it (i.e. *rightRecordIndexWithinBatch = 0*), the 
> evaluation will actually happen against the first batch (since 
> *rightBatchIndex >>> 16 = 0*). Accessing the record at 
> *rightRecordIndexWithinBatch* in the first batch then throws an 
> *IndexOutOfBoundsException*, since the first batch has no records.
> 2) Let's say there are 2 batches on the right side: the first batch contains 
> 3 records (with id_right=1/2/3) and the 2nd batch also contains 3 records 
> (with id_right=10/20/30). There is 1 batch on the left side with 3 records 
> (with id_left=1/2/3). In this case the NestedLoopJoin (with an equality 
> condition) ends up producing 6 records instead of 3. It produces the first 3 
> records from matches between the left records and the first right batch. But 
> while processing the 2nd right batch it again evaluates id_left=id_right 
> against the first batch, finds matches again, and produces another 3 records. 
> *Example:*
> *Left Batch Data:*
>  
> {code:java}
> Batch1:
> {
>  "id_left": 1,
>  "cost_left": 11,
>  "name_left": "item11"
> }
> {
>  "id_left": 2,
>  "cost_left": 21,
>  "name_left": "item21"
> }
> {
>  "id_left": 3,
>  "cost_left": 31,
>  "name_left": "item31"
> }{code}
>  
> *Right Batch Data:*
>  
> {code:java}
> Batch 1:
> {
>  "id_right": 1,
>  "cost_right": 10,
>  "name_right": "item1"
> }
> {
>  "id_right": 2,
>  "cost_right": 20,
>  "name_right": "item2"
> }
> {
>  "id_right": 3,
>  "cost_right": 30,
>  "name_right": "item3"
> }
> {code}
>  
>  
> {code:java}
> Batch 2:
> {
>  "id_right": 4,
>  "cost_right": 40,
>  "name_right": "item4"
> }
> {
>  "id_right": 4,
>  "cost_right": 40,
>  "name_right": "item4"
> }
> {
>  "id_right": 4,
>  "cost_right": 40,
>  "name_right": "item4"
> }{code}
>  
> *Produced output:*
> {code:java}
> {
>  "id_left": 1,
>  "cost_left": 11,
>  "name_left": "item11",
>  "id_right": 1,
>  "cost_right": 10,
>  "name_right": "item1"
> }
> {
>  "id_left": 1,
>  "cost_left": 11,
>  "name_left": "item11",
>  "id_right": 4,
>  "cost_right": 40,
>  "name_right": "item4"
> }
> {
>  "id_left": 2,
>  "cost_left": 21,
>  "name_left": "item21",
>  "id_right": 2, 
>  
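To make the failure mode above concrete, the composite right-side index can be sketched like this (an illustrative sketch only; the class and method names are mine, not Drill's actual NestedLoopJoin code — the quote above only establishes that the batch index lives in the bits above 16):

```java
// Illustrative sketch: a right-side record address packed into one int,
// upper bits = batch index (recovered with >>> 16), lower 16 bits = record
// index within that batch. The bug described above occurs when the batch
// index is dropped and evaluation always uses batch 0.
public class CompositeIndex {

    static int encode(int batchIndex, int recordIndex) {
        // pack batch index into the high bits, record index into the low 16
        return (batchIndex << 16) | (recordIndex & 0xFFFF);
    }

    static int batchIndex(int composite) {
        return composite >>> 16;
    }

    static int recordIndex(int composite) {
        return composite & 0xFFFF;
    }

    public static void main(String[] args) {
        int c = encode(1, 0); // second batch, first record
        System.out.println(batchIndex(c));  // 1
        System.out.println(recordIndex(c)); // 0
    }
}
```

With this encoding, evaluating record (batch 1, record 0) against batch 0 — as in case 1 above — reads past the end of an empty batch.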

[jira] [Commented] (DRILL-6115) SingleMergeExchange is not scaling up when many minor fragments are allocated for a query.

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357680#comment-16357680
 ] 

ASF GitHub Bot commented on DRILL-6115:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1110#discussion_r167089878
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/PhysicalVisitor.java
 ---
@@ -55,10 +56,10 @@
 
 
   public RETURN visitExchange(Exchange exchange, EXTRA value) throws EXCEP;
+  public RETURN visitSingleMergeExchange(SingleMergeExchange exchange, 
EXTRA value) throws EXCEP;
--- End diff --

The same question as for `PrelVisitor.java`. Is it necessary to have 
separate `visitSingleMergeExchange`?


> SingleMergeExchange is not scaling up when many minor fragments are allocated 
> for a query.
> --
>
> Key: DRILL-6115
> URL: https://issues.apache.org/jira/browse/DRILL-6115
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.12.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.13.0
>
> Attachments: Enhancing Drill to multiplex ordered merge exchanges.docx
>
>
> SingleMergeExchange is created when a global order is required in the output. 
> The following query produces the SingleMergeExchange.
> {code:java}
> 0: jdbc:drill:zk=local> explain plan for select L_LINENUMBER from 
> dfs.`/drill/tables/lineitem` order by L_LINENUMBER;
> +--+--+
> | text | json |
> +--+--+
> | 00-00 Screen
> 00-01 Project(L_LINENUMBER=[$0])
> 00-02 SingleMergeExchange(sort0=[0])
> 01-01 SelectionVectorRemover
> 01-02 Sort(sort0=[$0], dir0=[ASC])
> 01-03 HashToRandomExchange(dist0=[[$0]])
> 02-01 Scan(table=[[dfs, /drill/tables/lineitem]], 
> groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec 
> [tableName=maprfs:///drill/tables/lineitem, condition=null], 
> columns=[`L_LINENUMBER`], maxwidth=15]])
> {code}
> On a 10 node cluster, if the table is huge, Drill can spawn many minor 
> fragments which are all merged on a single node with one merge receiver. 
> Doing so creates a lot of memory pressure on the receiver node and also an 
> execution bottleneck. To address this issue, the merge receiver should be a 
> multiphase merge receiver. 
> Ideally, for a large cluster, one can introduce tree merges so that merging can 
> be done in parallel. But as a first step I think it is better to use the 
> existing infrastructure for multiplexing operators to generate an OrderedMux, 
> so that all the minor fragments pertaining to one drillbit are merged 
> and the merged data is sent across to the receiver operator.
> For example, on a 10 node cluster where each node processes 14 minor fragments, 
> the current code merges 140 minor fragments at the receiver, whereas 
> the proposed version has two levels of merges: a parallel 14-way merge in each 
> drillbit, and 10 minor fragments merged at the receiver node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
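The core operation behind the OrderedMux proposal in the issue above — combining several locally sorted streams into one sorted stream — can be sketched as a priority-queue k-way merge. This is a minimal illustration; the class and structure are mine, not Drill's OrderedMuxExchange implementation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative k-way merge: each "stream" stands in for a sorted minor
// fragment output. One such merge per drillbit reduces the receiver's
// fan-in from (nodes * fragments) streams to one stream per node.
public class KWayMerge {

    static List<Integer> merge(List<List<Integer>> sortedStreams) {
        // queue entries: {value, streamId, offsetWithinStream}
        PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt(a -> a[0]));
        for (int s = 0; s < sortedStreams.size(); s++) {
            if (!sortedStreams.get(s).isEmpty()) {
                pq.add(new int[]{sortedStreams.get(s).get(0), s, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            int[] head = pq.poll();
            out.add(head[0]);
            List<Integer> stream = sortedStreams.get(head[1]);
            int next = head[2] + 1;
            if (next < stream.size()) {
                // advance the stream the smallest element came from
                pq.add(new int[]{stream.get(next), head[1], next});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> streams = Arrays.asList(
            Arrays.asList(1, 4, 7), Arrays.asList(2, 5), Arrays.asList(3, 6, 8));
        System.out.println(merge(streams)); // [1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```

Each merge costs O(n log k) for k input streams, which is why a 14-way merge per node followed by a 10-way merge at the receiver is cheaper for the receiver than a single 140-way merge.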


[jira] [Commented] (DRILL-6115) SingleMergeExchange is not scaling up when many minor fragments are allocated for a query.

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357681#comment-16357681
 ] 

ASF GitHub Bot commented on DRILL-6115:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1110#discussion_r167088606
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/PrelVisitor.java
 ---
@@ -35,7 +38,9 @@
   public RETURN visitScan(ScanPrel prel, EXTRA value) throws EXCEP;
   public RETURN visitJoin(JoinPrel prel, EXTRA value) throws EXCEP;
   public RETURN visitProject(ProjectPrel prel, EXTRA value) throws EXCEP;
-
+  public RETURN visitHashToRandomExchange(HashToRandomExchangePrel prel, 
EXTRA value) throws EXCEP;
--- End diff --

Are 3 new methods necessary? Can `visitExchange` delegate to `prel` or use 
instance of? 





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6115) SingleMergeExchange is not scaling up when many minor fragments are allocated for a query.

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357683#comment-16357683
 ] 

ASF GitHub Bot commented on DRILL-6115:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1110#discussion_r167090991
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/OrderedMuxExchange.java
 ---
@@ -0,0 +1,58 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.config;
+
+import java.util.List;
+
+import org.apache.drill.exec.physical.MinorFragmentEndpoint;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.physical.base.Receiver;
+import org.apache.drill.common.logical.data.Order.Ordering;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+
+/**
+ * OrderedMuxExchange is a version of MuxExchange where the incoming batches are
+ * sorted and a merge operation is performed to produce a sorted stream as output.
+ */
+@JsonTypeName("ordered-mux-exchange")
+public class OrderedMuxExchange extends AbstractMuxExchange {
+  private final List<Ordering> orderings;
+
+  public OrderedMuxExchange(@JsonProperty("child") PhysicalOperator child, List<Ordering> orderings) {
--- End diff --

Json annotation for orderings?





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6115) SingleMergeExchange is not scaling up when many minor fragments are allocated for a query.

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357682#comment-16357682
 ] 

ASF GitHub Bot commented on DRILL-6115:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1110#discussion_r167092037
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/TestSortSpillWithException.java
 ---
@@ -59,6 +59,7 @@ public static void setup() throws Exception {
 ClusterFixtureBuilder builder = ClusterFixture.builder(dirTestWatcher)
 .configProperty(ExecConstants.EXTERNAL_SORT_SPILL_THRESHOLD, 1) // 
Unmanaged
 .configProperty(ExecConstants.EXTERNAL_SORT_SPILL_GROUP_SIZE, 1) 
// Unmanaged
+.configProperty(ExecConstants.EXTERNAL_SORT_MAX_MEMORY, 10 * 1024 
* 1024) //use less memory for sorting.
--- End diff --

Why is the change necessary?





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6115) SingleMergeExchange is not scaling up when many minor fragments are allocated for a query.

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357679#comment-16357679
 ] 

ASF GitHub Bot commented on DRILL-6115:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1110#discussion_r167091413
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/OrderedMuxExchange.java
 ---
@@ -0,0 +1,58 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.config;
+
+import java.util.List;
+
+import org.apache.drill.exec.physical.MinorFragmentEndpoint;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.physical.base.Receiver;
+import org.apache.drill.common.logical.data.Order.Ordering;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+
+/**
+ * OrderedMuxExchange is a version of MuxExchange where the incoming batches are
+ * sorted and a merge operation is performed to produce a sorted stream as output.
+ */
+@JsonTypeName("ordered-mux-exchange")
+public class OrderedMuxExchange extends AbstractMuxExchange {
+  private final List<Ordering> orderings;
+
+  public OrderedMuxExchange(@JsonProperty("child") PhysicalOperator child, List<Ordering> orderings) {
+    super(child);
+    this.orderings = orderings;
+  }
+
+  @Override
+  public Receiver getReceiver(int minorFragmentId) {
+createSenderReceiverMapping();
+
+    List<MinorFragmentEndpoint> senders = receiverToSenderMapping.get(minorFragmentId);
+    if (senders == null || senders.size() <= 0) {
--- End diff --

Add debug level info for `receiverToSenderMapping` and minorFragmentId.



[jira] [Commented] (DRILL-6139) Travis CI hangs on TestVariableWidthWriter#testRestartRow

2018-02-08 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357662#comment-16357662
 ] 

Paul Rogers commented on DRILL-6139:


[~timothyfarkas], if you see tests that have unclosed allocators, please file 
one or more JIRA entries. In general, the test *should* fail with an exception 
about an unclosed allocator. When I see these, the individual test cases pass, 
but the test itself fails due to an exception in the test wrap-up block.

At first glance, it is not clear how memory could impact these tests since they 
generally allocate only one large vector, and the whole point of the code under 
test is to ensure that such vectors don't exceed 16 MB.

But, given that we run many test cases in the same JVM under Maven (and thus 
Travis), perhaps there is some cumulative effect that we need to figure out.
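The "fail loudly on an unclosed allocator" behavior described above can be sketched as a test-scoped resource whose close() throws when memory is still outstanding. This is a hedged illustration only; the class and method names are mine, not Drill's actual BufferAllocator API:

```java
// Illustrative sketch: a leak-tracking resource. Closing it with bytes still
// outstanding throws, so a leak surfaces as a failure in the test wrap-up
// block rather than as silently accumulating memory across test cases.
public class TrackingAllocator implements AutoCloseable {
    private long outstanding = 0;   // bytes allocated but not yet released
    private boolean closed = false;

    public long buffer(long bytes) {
        outstanding += bytes;
        return bytes;
    }

    public void release(long bytes) {
        outstanding -= bytes;
    }

    @Override
    public void close() {
        closed = true;
        if (outstanding != 0) {
            throw new IllegalStateException("Unreleased bytes: " + outstanding);
        }
    }

    public boolean isClosed() {
        return closed;
    }
}
```

A test would typically open such a resource in a try-with-resources block so the leak check runs even when the test body itself throws.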

> Travis CI hangs on TestVariableWidthWriter#testRestartRow
> -
>
> Key: DRILL-6139
> URL: https://issues.apache.org/jira/browse/DRILL-6139
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Boaz Ben-Zvi
>Assignee: Paul Rogers
>Priority: Major
>
> The Travis CI fails (probably hangs, then times out) in the following test:
> {code:java}
> Running org.apache.drill.test.rowSet.test.DummyWriterTest Running 
> org.apache.drill.test.rowSet.test.DummyWriterTest#testDummyScalar Running 
> org.apache.drill.test.rowSet.test.DummyWriterTest#testDummyMap Tests run: 2, 
> Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.109 sec - in 
> org.apache.drill.test.rowSet.test.DummyWriterTest Running 
> org.apache.drill.test.rowSet.test.TestVariableWidthWriter Running 
> org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testSkipNulls 
> Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testWrite 
> Running 
> org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testFillEmpties 
> Running 
> org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testRollover 
> Running 
> org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testSizeLimit 
> Running 
> org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testRolloverWithEmpties
>  Running 
> org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testRestartRow 
> Killed
>  
> Results : 
> Tests run: 1554, Failures: 0, Errors: 0, Skipped: 66{code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6144) Unit tests hang indefinitely

2018-02-08 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357656#comment-16357656
 ] 

Paul Rogers commented on DRILL-6144:


Do we have a list of tests that have hung? Perhaps we can detect a pattern.

> Unit tests hang indefinitely
> 
>
> Key: DRILL-6144
> URL: https://issues.apache.org/jira/browse/DRILL-6144
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.13.0
>
>
> Lately unit tests on Travis as well as our Jenkins server have been hanging 
> indefinitely. I have not identified the root cause, but increasing direct 
> memory resolves the issue. In order to unblock any further work until the 
> root cause is identified I would like to update the amount of direct memory 
> we use for our builds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6144) Unit tests hang indefinitely

2018-02-08 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6144:
--
Affects Version/s: 1.13.0
 Reviewer: Boaz Ben-Zvi
Fix Version/s: 1.13.0




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6144) Unit tests hang indefinitely

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357643#comment-16357643
 ] 

ASF GitHub Bot commented on DRILL-6144:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1118
  
@Ben-Zvi Please review





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6144) Unit tests hang indefinitely

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357642#comment-16357642
 ] 

ASF GitHub Bot commented on DRILL-6144:
---

GitHub user ilooner opened a pull request:

https://github.com/apache/drill/pull/1118

DRILL-6144: Increase direct memory for unit tests

Travis builds, as well as unit tests on our Jenkins server, have been 
hanging. It is not clear what the underlying cause is, but increasing direct 
memory resolves the issue. 

One possible root cause is that unit tests do leak direct memory by 
creating allocators which allocate memory and are then never closed. It will 
take a while to find a true fix for the issue and resolve it, so to unblock our 
work for now I would like to increase direct memory available for the build.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilooner/drill DRILL-6144

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1118.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1118


commit 3c85f1dd5062cefc9e10fcbf6babeec5aa2dac8f
Author: Timothy Farkas 
Date:   2018-02-08T22:03:41Z

DRILL-6144: Increase direct memory for unit tests on Travis and build 
servers to prevent unit tests from hanging.







--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357630#comment-16357630
 ] 

ASF GitHub Bot commented on DRILL-6032:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1101
  
@Ben-Zvi I have addressed review comments. All unit tests and functional 
tests (except for typical random failures) are passing. I have included a 
commit for DRILL-6144 in this PR in order to fix other issues that were 
breaking our builds and validate that these changes pass all tests. After these 
changes are +1 I will remove the commit for DRILL-6144. I will open a separate 
PR to merge the changes in DRILL-6144.


> Use RecordBatchSizer to estimate size of columns in HashAgg
> ---
>
> Key: DRILL-6032
> URL: https://issues.apache.org/jira/browse/DRILL-6032
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.13.0
>
>
> We need to use the RecordBatchSize to estimate the size of columns in the 
> Partition batches created by HashAgg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6146) UNION with empty input on any one side returns incorrect results

2018-02-08 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-6146:
-

 Summary: UNION with empty input on any one side returns incorrect 
results
 Key: DRILL-6146
 URL: https://issues.apache.org/jira/browse/DRILL-6146
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.12.0
Reporter: Khurram Faraaz


When any one side of the UNION has an empty file as input, Drill returns 
incorrect results.
 
Table t3 does not have any rows. Postgres returns 1 as the result for both 
queries, whereas Drill does not.
 
{noformat}
postgres=# create table t3(id int, name varchar(25));
CREATE TABLE 
postgres=# select * from (values(1)) t union select id from t3;
       1
 
postgres=# select id from t3 union select * from (values(1)) t;
  1
 {noformat}
 
 
Results from Drill 1.12.0-mapr; note we return 1 as the result after the union.
We have a directory named empty_JSON_f, and it contains a single empty JSON 
file (the JSON file has no content in it).
 
{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from (values(1)) UNION select id from 
empty_JSON_f;
+-+
| EXPR$0  |
+-+
| 1       |
+-+
1 row selected (2.272 seconds){noformat}
However, in this query we return null and lose the value 1 from the right-hand 
side after the union, which doesn't seem correct: 
{noformat}
0: jdbc:drill:schema=dfs.tmp> select id from empty_JSON_f UNION select * from 
(values(1));
+---+
|  id   |
+---+
| null  |
+---+
1 row selected (0.33 seconds){noformat}
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357524#comment-16357524
 ] 

ASF GitHub Bot commented on DRILL-5902:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1113
  
K thx @vrozov 


> Regression: Queries encounter random failure due to RPC connection timed out
> 
>
> Key: DRILL-5902
> URL: https://issues.apache.org/jira/browse/DRILL-5902
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Vlad Rozov
>Priority: Critical
> Attachments: 261230f7-e3b9-0cee-22d8-921cb56e3e12.sys.drill, 
> node196.drillbit.log
>
>
> Multiple random failures (25) occurred with the latest 
> Functional-Baseline-88.193 run.  Here is a sample query:
> {noformat}
> /root/drillAutomation/prasadns14/framework/resources/Functional/window_functions/multiple_partitions/q27.sql
> -- Kitchen sink
> -- Use all supported functions
> select
> rank()  over W,
> dense_rank()over W,
> percent_rank()  over W,
> cume_dist() over W,
> avg(c_integer + c_integer)  over W,
> sum(c_integer/100)  over W,
> count(*)over W,
> min(c_integer)  over W,
> max(c_integer)  over W,
> row_number()over W
> from
> j7
> where
> c_boolean is not null
> window  W as (partition by c_bigint, c_date, c_time, c_boolean order by 
> c_integer)
> {noformat}
> From the logs:
> {noformat}
> 2017-10-23 04:14:36,536 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:1 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:5 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:9 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:13 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:17 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:21 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler 
> - Dropping request for early fragment termination for path 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:1:25 -> 
> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> {noformat}
> {noformat}
> 2017-10-23 04:14:53,941 [UserServer-1] INFO  
> o.a.drill.exec.rpc.user.UserServer - RPC connection /10.10.88.196:31010 <--> 
> /10.10.88.193:38281 (user server) timed out.  Timeout was set to 30 seconds. 
> Closing connection.
> 2017-10-23 04:14:53,952 [UserServer-1] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested RUNNING --> 
> FAILED
> 2017-10-23 04:14:53,952 [261230f8-2698-15b2-952f-d4ade8d6b180:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested FAILED --> 
> FINISHED
> 2017-10-23 04:14:53,956 [UserServer-1] WARN  
> o.apache.drill.exec.rpc.RequestIdMap - Failure while attempting to fail rpc 
> response.
> java.lang.IllegalArgumentException: Self-suppression not permitted
> at java.lang.Throwable.addSuppressed(Throwable.java:1043) 
> ~[na:1.7.0_45]
> at 
> org.apache.drill.common.DeferredException.addException(DeferredException.java:88)
>  ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.common.DeferredException.addThrowable(DeferredException.java:97)

[jira] [Commented] (DRILL-6139) Travis CI hangs on TestVariableWidthWriter#testRestartRow

2018-02-08 Thread Timothy Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357522#comment-16357522
 ] 

Timothy Farkas commented on DRILL-6139:
---

[~Paul.Rogers] I didn't know there was already a JIRA for this; I also created 
https://issues.apache.org/jira/browse/DRILL-6144 for unit test hangs. However, I 
observed the hangs in unit tests other than this one, and not just on Travis. 
After I increased direct memory to 5 GB on Jenkins and 4.6 GB on Travis, the 
issue disappeared. I am going to open a PR with the memory increase. If your 
investigation turns up a fix, we can reduce the memory on our builds once it 
goes in.

Also, as a side note, I noticed some unit tests create allocators that are 
never closed. I'm hazy on the details, but I believe MiniPlanUnitTestBase is 
one of the offending test classes.

> Travis CI hangs on TestVariableWidthWriter#testRestartRow
> -
>
> Key: DRILL-6139
> URL: https://issues.apache.org/jira/browse/DRILL-6139
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Boaz Ben-Zvi
>Assignee: Paul Rogers
>Priority: Major
>
> The Travis CI fails (probably hangs, then times out) in the following test:
> {code:java}
> Running org.apache.drill.test.rowSet.test.DummyWriterTest Running 
> org.apache.drill.test.rowSet.test.DummyWriterTest#testDummyScalar Running 
> org.apache.drill.test.rowSet.test.DummyWriterTest#testDummyMap Tests run: 2, 
> Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.109 sec - in 
> org.apache.drill.test.rowSet.test.DummyWriterTest Running 
> org.apache.drill.test.rowSet.test.TestVariableWidthWriter Running 
> org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testSkipNulls 
> Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testWrite 
> Running 
> org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testFillEmpties 
> Running 
> org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testRollover 
> Running 
> org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testSizeLimit 
> Running 
> org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testRolloverWithEmpties
>  Running 
> org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testRestartRow 
> Killed
>  
> Results : 
> Tests run: 1554, Failures: 0, Errors: 0, Skipped: 66{code}
>  
>  





[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out

2018-02-08 Thread Vlad Rozov (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357489#comment-16357489
 ] 

Vlad Rozov commented on DRILL-5902:
---

The connection is forcibly terminated by a Drillbit (the foreman) due to the 
naive flow control that Drill implements. Drill uses an "idle" timeout of 15 
seconds to detect "bad" connections: when request processing is done on the 
thread that handles connection communication and takes longer than 15 seconds, 
the connection is considered "bad" and is forcibly terminated. In this case, 
while processing a cancellation request, Drill writes to the query profile, 
which can take longer than 15 seconds (especially if many profiles have 
already been written to the profiles directory). To fix the issue, foreman 
cancellation is processed asynchronously. 
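The asynchronous-cancellation fix described above can be illustrated with a minimal sketch. The class and method names here are hypothetical, not Drill's actual code: the RPC event-loop thread hands the slow profile write to a separate executor and returns immediately, so the connection never appears idle long enough to trip the 15-second timeout.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: offloading slow cancellation work from the RPC thread.
public class AsyncCancellation {
  private final ExecutorService worker = Executors.newSingleThreadExecutor();
  private final AtomicBoolean profileWritten = new AtomicBoolean(false);

  // Called on the RPC event-loop thread; returns immediately instead of
  // blocking for the duration of the profile write.
  public void onCancelRequest() {
    worker.submit(this::writeProfileSlowly);
  }

  // Stand-in for the slow part: writing the query profile to disk.
  private void writeProfileSlowly() {
    try {
      Thread.sleep(50);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    profileWritten.set(true);
  }

  // Waits for the background work to finish (for the demo only).
  public boolean awaitProfile(long millis) throws InterruptedException {
    worker.shutdown();
    return worker.awaitTermination(millis, TimeUnit.MILLISECONDS)
        && profileWritten.get();
  }

  public static void main(String[] args) throws Exception {
    AsyncCancellation c = new AsyncCancellation();
    long start = System.nanoTime();
    c.onCancelRequest(); // RPC thread is free again almost instantly
    long blockedMs = (System.nanoTime() - start) / 1_000_000;
    System.out.println("RPC thread blocked for ~" + blockedMs + " ms");
    System.out.println("profile written: " + c.awaitProfile(2000));
  }
}
```

The design choice mirrors the comment: the connection-handling thread must never do work proportional to the number of stored profiles.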

> Regression: Queries encounter random failure due to RPC connection timed out
> 
>
> Key: DRILL-5902
> URL: https://issues.apache.org/jira/browse/DRILL-5902
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Vlad Rozov
>Priority: Critical
> Attachments: 261230f7-e3b9-0cee-22d8-921cb56e3e12.sys.drill, 
> node196.drillbit.log
>
>

[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357478#comment-16357478
 ] 

ASF GitHub Bot commented on DRILL-5902:
---

Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1113
  
@ilooner DRILL-6143 is not related to DRILL-5902. DRILL-6143 requires a 
separate RCA. See DRILL-5902 for details.


> Regression: Queries encounter random failure due to RPC connection timed out
> 
>
> Key: DRILL-5902
> URL: https://issues.apache.org/jira/browse/DRILL-5902
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Vlad Rozov
>Priority: Critical
> Attachments: 261230f7-e3b9-0cee-22d8-921cb56e3e12.sys.drill, 
> node196.drillbit.log
>
>

[jira] [Commented] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357459#comment-16357459
 ] 

ASF GitHub Bot commented on DRILL-5902:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1113
  
@vrozov Can you provide a brief description of the issue? I have recently 
filed https://issues.apache.org/jira/browse/DRILL-6143 and I want to verify 
that these are two separate issues. 

DRILL-6143 causes a premature timeout when fragments are sent to drillbits 
in the FragmentsRunner. The issue you fixed here seems to involve a timeout 
when a query is cancelled. So my initial guess is that these two issues are 
unrelated. Please let me know if they are related.
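The FragmentsRunner-style wait that DRILL-6143 describes can be sketched as follows. The names and the shortened timeout are illustrative, not Drill's actual code (the real constant, RPC_WAIT_IN_MSECS_PER_FRAGMENT, is hardcoded at 5 seconds): the total timeout scales with the number of fragments sent, and on expiry the error reports how many acknowledgements arrived.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of waiting for per-fragment acknowledgements with a
// timeout proportional to the number of fragments sent.
public class FragmentAckWait {
  /** Returns null on success, or an error message naming the missing acks. */
  public static String waitForAcks(CountDownLatch acks, int sent, long perFragmentMs)
      throws InterruptedException {
    long timeout = perFragmentMs * sent;
    if (acks.await(timeout, TimeUnit.MILLISECONDS)) {
      return null; // all remote nodes acknowledged in time
    }
    long heard = sent - acks.getCount();
    return "Exceeded timeout (" + timeout + ") while waiting send intermediate "
        + "work fragments to remote nodes. Sent " + sent
        + " and only heard response back from " + heard + " nodes.";
  }

  public static void main(String[] args) throws InterruptedException {
    CountDownLatch acks = new CountDownLatch(5);
    for (int i = 0; i < 4; i++) {
      acks.countDown(); // only 4 of 5 nodes respond
    }
    System.out.println(waitForAcks(acks, 5, 10)); // short timeout for the demo
  }
}
```

Making the per-fragment timeout a system option, as the JIRA proposes, would amount to replacing the constant with a value read from the SystemOptionManager.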


> Regression: Queries encounter random failure due to RPC connection timed out
> 
>
> Key: DRILL-5902
> URL: https://issues.apache.org/jira/browse/DRILL-5902
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Vlad Rozov
>Priority: Critical
> Attachments: 261230f7-e3b9-0cee-22d8-921cb56e3e12.sys.drill, 
> node196.drillbit.log
>
>

[jira] [Commented] (DRILL-6089) Validate That Planner Does Not Assume HashJoin Preserves Ordering for FS, MaprDB, or Hive

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357433#comment-16357433
 ] 

ASF GitHub Bot commented on DRILL-6089:
---

Github user HanumathRao commented on the issue:

https://github.com/apache/drill/pull/1117
  
@ilooner The changes look fine to me. However, as discussed offline, I couldn't 
reproduce a plan that sorts one of the inputs of a HashJoin. This might be 
because the HashJoin prule does not ask for those traits in the first place. 
Still, I agree this fix is a safeguard in case such a plan shows up in some 
scenario.

Thanks for the fix. 


> Validate That Planner Does Not Assume HashJoin Preserves Ordering for FS, 
> MaprDB, or Hive
> -
>
> Key: DRILL-6089
> URL: https://issues.apache.org/jira/browse/DRILL-6089
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.13.0
>
>
> Explanation provided by Boaz:
> (As explained in the design document) The new "automatic spill" feature of 
> the Hash-Join operator may cause (if spilling occurs) the rows from the 
> left/probe side to be returned in a different order than their incoming order 
> (due to splitting the rows into partitions).
> Currently the Drill planner assumes that left-order is preserved by the 
> Hash-Join operator; therefore, if nothing changes, a query relying on that 
> order may return wrong results (when the Hash-Join spills).
> A fix is needed. Here are few options (ordered from the simpler down to the 
> most complex):
>  # Change the order rule in the planner. Thus whenever an order is needed 
> above (downstream) the Hash-Join, the planner would add a sort operator. That 
> would be a big execution time waste.
>  # When the planner needs the left-order above the Hash-Join, it may assess 
> the size of the right/build side (need statistics). If the right side is 
> small enough, the planner would set an option for the runtime to avoid 
> spilling, hence preserving the left-side order. In case spilling becomes 
> necessary, the code would return an error (possibly with a message suggesting 
> setting some special option and retrying; the special option would add a sort 
> operator and allow the hash-join to spill).
>  # When generating the code for the fragment above the Hash-Join (where 
> left-order should be maintained) - at code-gen time check if the hash-join 
> below spilled, and if so, add a sort operator. (Nothing like that exists in 
> Drill now, so it may be complicated).





[jira] [Updated] (DRILL-6129) Query fails on nested data type schema change

2018-02-08 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6129:
-
Labels: ready-to-commit  (was: )

> Query fails on nested data type schema change
> -
>
> Key: DRILL-6129
> URL: https://issues.apache.org/jira/browse/DRILL-6129
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - CLI
>Affects Versions: 1.10.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> Use-Case -
>  * Assume two parquet files with similar schemas except for a nested column
>  * Schema file1
>  ** int64 field1
>  ** optional group field2
>  *** optional group field2.1 (LIST)
>   repeated group list
>  * optional group element
>  ** optional int64 child_field
>  * Schema file2
>  ** int64 field1
>  ** optional group field2
>  *** optional group field2.1 (LIST)
>   repeated group list
>  * optional group element
>  ** optional group child_field
>  *** optional int64 child_field_f1
>  *** optional int64 child_field_f1
>  * Essentially child_field changed from an int64 to a group of fields
>  
> Observed Query Failure
> select * from ;
> Error: Unexpected RuntimeException: java.lang.IllegalArgumentException: The 
> field $bits$(UINT1:REQUIRED) doesn't match the provided metadata major_type {
>   minor_type: MAP
>   mode: REQUIRED
> Note that selecting one file at a time succeeds which seems to indicate the 
> issue has to do with the schema change logic. 
>  
>  





[jira] [Updated] (DRILL-6138) Move RecordBatchSizer to org.apache.drill.exec.record package

2018-02-08 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6138:
-
Reviewer: Boaz Ben-Zvi

> Move RecordBatchSizer to org.apache.drill.exec.record package
> -
>
> Key: DRILL-6138
> URL: https://issues.apache.org/jira/browse/DRILL-6138
> Project: Apache Drill
>  Issue Type: Task
>  Components: Execution - Flow
>Affects Versions: 1.12.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> Move RecordBatchSizer from org.apache.drill.exec.physical.impl.spill package 
> to org.apache.drill.exec.record package.
> Minor refactoring - change columnSizes from list to map. 





[jira] [Commented] (DRILL-6145) Implement Hive MapR-DB JSON handler.

2018-02-08 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357161#comment-16357161
 ] 

Chunhui Shi commented on DRILL-6145:


Should this Jira be created in Hive project?

> Implement Hive MapR-DB JSON handler. 
> -
>
> Key: DRILL-6145
> URL: https://issues.apache.org/jira/browse/DRILL-6145
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive, Storage - MapRDB
>Affects Versions: 1.12.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Critical
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Similar to "hive-hbase-storage-handler" to support querying MapR-DB Hive's 
> external tables it is necessary to add "hive-maprdb-json-handler".
> Use case:
>  # Create a table MapR-DB JSON table:
> {code}
> _> mapr dbshell_
> _maprdb root:> create /tmp/table/json_  (make sure /tmp/table exists)
> {code}
> -- insert data
> {code}
> insert /tmp/table/json --value '\{"_id":"movie002" , "title":"Developers 
> on the Edge", "studio":"Command Line Studios"}'
> insert /tmp/table/json --id movie003 --value '\{"title":"The Golden 
> Master", "studio":"All-Nighter"}'
> {code} 
>  #  Create a Hive external table:
> {code}
> CREATE EXTERNAL TABLE mapr_db_json_hive_tbl ( 
> movie_id string, title string, studio string) 
> STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' 
> TBLPROPERTIES("maprdb.table.name" = "/tmp/table/json","maprdb.column.id" = 
> "movie_id");
> {code}
>  
>  #  Use hive schema to query this table:
> {code}
> select * from hive.mapr_db_json_hive_tbl;
> {code}





[jira] [Commented] (DRILL-6128) Wrong Result with Nested Loop Join

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357118#comment-16357118
 ] 

ASF GitHub Bot commented on DRILL-6128:
---

Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/1109
  
+1 .  Thanks for adding the unit tests. 


> Wrong Result with Nested Loop Join
> --
>
> Key: DRILL-6128
> URL: https://issues.apache.org/jira/browse/DRILL-6128
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>Priority: Major
>
> Nested Loop Join produces wrong results if there are multiple batches on the 
> right side. It builds an ExpandableHyperContainer to hold all the right-side 
> batches. Then, for each record on the left side, it evaluates the condition 
> against all records on the right side and emits the output if the condition 
> is satisfied. The main loop inside 
> [populateOutgoingBatch|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinTemplate.java#L106]
>  calls *doEval* with the correct indexes to evaluate records on both sides. 
> In the generated code of *doEval*, a right shift of 16 is applied to the 
> rightBatchIndex (sample shared below).
> {code:java}
> public boolean doEval(int leftIndex, int rightBatchIndex, int 
> rightRecordIndexWithinBatch)
>  throws SchemaChangeException
> {
>   {
>IntHolder out3 = new IntHolder();
>{
>  out3 .value = vv0 .getAccessor().get((leftIndex));
>}
>IntHolder out7 = new IntHolder();
>{
>  out7 .value =  
>  
> vv4[((rightBatchIndex)>>>16)].getAccessor().get(((rightRecordIndexWithinBatch)&
>  65535));
>}
> ..
> ..
> }{code}
>  
> When the loop is processing the second batch, the right-shifted index inside 
> *doEval* becomes 0, so the condition is evaluated against the first right 
> batch again. Thus, with more than one batch (up to 65535) on the right side, 
> doEval always uses the first batch for condition evaluation, while the output 
> data comes from the correct batch, leading to issues such as out-of-bounds 
> accesses and wrong data. Possible cases (where *rightBatchIndex* is the index 
> of the right batch to consider and *rightRecordIndexWithinBatch* is the index 
> of the record within that batch):
> 1) The first right batch arrives with zero records and OK_NEW_SCHEMA (say, 
> because of a filter in the operator tree), and the next right batch has > 0 
> records. When doEval is called for the second batch (*rightBatchIndex = 1*) 
> and its first record (*rightRecordIndexWithinBatch = 0*), evaluation actually 
> uses the first batch (since *rightBatchIndex >>> 16 = 0*). Accessing the 
> record at *rightRecordIndexWithinBatch* in the first batch then throws an 
> *IndexOutOfBoundsException*, since the first batch has no records.
> 2) Say there are 2 batches on the right side: the first contains 3 records 
> (with id_right=1/2/3) and the second also contains 3 records (with 
> id_right=10/20/30), while the left side has 1 batch with 3 records (with 
> id_left=1/2/3). The NestedLoopJoin (with an equality condition) then produces 
> 6 records instead of 3: it produces the first 3 records by matching left 
> records against the first right batch, but while processing the second right 
> batch it evaluates id_left=id_right against the first batch again, finds the 
> same matches, and produces another 3 records. *Example:*
> *Left Batch Data:*
>  
> {code:java}
> Batch1:
> {
>  "id_left": 1,
>  "cost_left": 11,
>  "name_left": "item11"
> }
> {
>  "id_left": 2,
>  "cost_left": 21,
>  "name_left": "item21"
> }
> {
>  "id_left": 3,
>  "cost_left": 31,
>  "name_left": "item31"
> }{code}
>  
> *Right Batch Data:*
>  
> {code:java}
> Batch 1:
> {
>  "id_right": 1,
>  "cost_right": 10,
>  "name_right": "item1"
> }
> {
>  "id_right": 2,
>  "cost_right": 20,
>  "name_right": "item2"
> }
> {
>  "id_right": 3,
>  "cost_right": 30,
>  "name_right": "item3"
> }
> {code}
>  
>  
> {code:java}
> Batch 2:
> {
>  "id_right": 4,
>  "cost_right": 40,
>  "name_right": "item4"
> }
> {
>  "id_right": 4,
>  "cost_right": 40,
>  "name_right": "item4"
> }
> {
>  "id_right": 4,
>  "cost_right": 40,
>  "name_right": "item4"
> }{code}
>  
> *Produced output:*
> {code:java}
> {
>  "id_left": 1,
>  "cost_left": 11,
>  "name_left": "item11",
>  "id_right": 1,
>  "cost_right": 10,
>  "name_right": "item1"
> }
> {
>  "id_left": 1,
>  "cost_left": 11,
>  "name_left": "item11",
>  "id_right": 4,
>  "cost_right": 40,
>  "name_right": "item4"
> }
> {
>  "id_left": 2,
>  "cost_left": 21,
>  "name_left": "item21"
>  "id_right": 2, 
>  "cost_right": 20,
>  

[jira] [Commented] (DRILL-6128) Wrong Result with Nested Loop Join

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357112#comment-16357112
 ] 

ASF GitHub Bot commented on DRILL-6128:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1109#discussion_r166981670
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinTemplate.java ---
@@ -117,9 +117,12 @@ private int populateOutgoingBatch(JoinRelType joinType, int outputIndex) {
   for (; nextRightBatchToProcess < rightCounts.size(); nextRightBatchToProcess++) {
 int rightRecordCount = rightCounts.get(nextRightBatchToProcess);
 // for every record in right batch
+final int currentRightBatchIndex = nextRightBatchToProcess << 16;
 for (; nextRightRecordToProcess < rightRecordCount; nextRightRecordToProcess++) {

-  if (doEval(nextLeftRecordToProcess, nextRightBatchToProcess, nextRightRecordToProcess)) {
+  // Since right container is a hyper container, in doEval generated code it expects the
--- End diff --

Minor: this comment ideally could be moved up to the statement where you do 
the left shift since that's the main change. 
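
For context, the indexing convention the generated code relies on (batch index in the upper 16 bits of a composite int, record index in the lower 16 bits) can be sketched standalone. This is a minimal illustration, not Drill's actual code; the class and method names are hypothetical:

```java
// Sketch of the composite-index convention used by generated doEval code for
// hyper containers: upper 16 bits = batch index, lower 16 bits = record index.
public class HyperIndex {

    // Pack a batch index and a record index into one composite int
    // (this is what the `nextRightBatchToProcess << 16` fix produces).
    static int compose(int batchIndex, int recordIndex) {
        return (batchIndex << 16) | (recordIndex & 0xFFFF);
    }

    // Recover the batch index, as the generated code does with >>> 16.
    static int batchOf(int composite) {
        return composite >>> 16;
    }

    // Recover the record index, as the generated code does with & 65535.
    static int recordOf(int composite) {
        return composite & 0xFFFF;
    }

    public static void main(String[] args) {
        // The bug: passing the raw batch number 1 where a composite value is
        // expected makes batchOf() return 0, i.e. the first batch again.
        System.out.println(batchOf(1));              // 0 -> wrong batch
        System.out.println(batchOf(compose(1, 0)));  // 1 -> correct batch
    }
}
```

Passing a raw batch index where the composite is expected collapses every batch number below 65536 to batch 0, which is exactly why doEval kept re-evaluating against the first right batch.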


> Wrong Result with Nested Loop Join
> --
>
> Key: DRILL-6128
> URL: https://issues.apache.org/jira/browse/DRILL-6128
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>Priority: Major
>
> Nested Loop Join produces wrong results if there are multiple batches on the 
> right side. It builds an ExpandableHyperContainer to hold all the right-side 
> batches. Then, for each record on the left side, it evaluates the condition 
> against all records on the right side and emits the output if the condition 
> is satisfied. The main loop inside 
> [populateOutgoingBatch|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinTemplate.java#L106]
>  calls *doEval* with the correct indexes to evaluate records on both sides. 
> In the generated code of *doEval*, for some reason, a right shift of 16 is 
> applied to the rightBatchIndex (sample shared below).
> {code:java}
> public boolean doEval(int leftIndex, int rightBatchIndex, int 
> rightRecordIndexWithinBatch)
>  throws SchemaChangeException
> {
>   {
>IntHolder out3 = new IntHolder();
>{
>  out3 .value = vv0 .getAccessor().get((leftIndex));
>}
>IntHolder out7 = new IntHolder();
>{
>  out7 .value = vv4[((rightBatchIndex)>>>16)].getAccessor().get(((rightRecordIndexWithinBatch)& 65535));
>}
> ..
> ..
> }{code}
>  
> When the actual loop is processing the second batch, inside the eval method 
> the index after the right shift becomes 0 and it ends up evaluating the 
> condition w.r.t. the first right batch again. So if there is more than one 
> batch (up to 65535) on the right side, doEval will always consider the first 
> batch for condition evaluation. But the output data will be based on the 
> correct batch, so there will be issues like OutOfBound and WrongData. Cases 
> can be:
> Let's say: *rightBatchIndex*: index of the right batch to consider, 
> *rightRecordIndexWithinBatch*: index of the record in the right batch at 
> rightBatchIndex
> 1) The first right batch comes with zero data and with OK_NEW_SCHEMA (let's 
> say because of a filter in the operator tree). The next right batch has > 0 
> records. So when we call doEval for the second batch (*rightBatchIndex = 1*) 
> and the first record in it (i.e. *rightRecordIndexWithinBatch = 0*), the 
> actual evaluation will happen using the first batch (since *rightBatchIndex 
> >>> 16 = 0*). On accessing the record at *rightRecordIndexWithinBatch* in the 
> first batch it will throw an *IndexOutOfBoundsException* since the first 
> batch has no records.
> 2) Let's say there are 2 batches on the right side. Also let's say the first 
> batch contains 3 records (with id_right=1/2/3) and the 2nd batch also 
> contains 3 records (with id_right=10/20/30). Also let's say there is 1 batch 
> on the left side with 3 records (with id_left=1/2/3). Then in this case the 
> NestedLoopJoin (with equality condition) will end up producing 6 records 
> instead of 3. It produces the first 3 records based on matches between left 
> records and first right batch records. But while processing the 2nd right 
> batch it will evaluate id_left=id_right against the first batch instead, will 
> again find matches, and will produce another 3 records. *Example:*
> *Left Batch Data:*
>  
> {code:java}
> Batch1:
> {
>  "id_left": 1,
>  "cost_left": 11,
>  "name_left": "item11"
> }
> {
>  "id_left": 2,
>  "cost_left": 21,
>  "name_left": "item21"
> }
> {
>  "id_left": 3,
>  "cost_left": 31,
>  

[jira] [Created] (DRILL-6145) Implement Hive MapR-DB JSON handler.

2018-02-08 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-6145:
--

 Summary: Implement Hive MapR-DB JSON handler. 
 Key: DRILL-6145
 URL: https://issues.apache.org/jira/browse/DRILL-6145
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Hive, Storage - MapRDB
Affects Versions: 1.12.0
Reporter: Vitalii Diravka
Assignee: Vitalii Diravka
 Fix For: 1.13.0


Similar to the "hive-hbase-storage-handler", to support querying MapR-DB Hive 
external tables it is necessary to add a "hive-maprdb-json-handler".

Use case:

 # Create a MapR-DB JSON table:
{code}
_> mapr dbshell_

_maprdb root:> create /tmp/table/json_  (make sure /tmp/table exists)
{code}
-- insert data
{code}
insert /tmp/table/json --value '\{"_id":"movie002" , "title":"Developers on 
the Edge", "studio":"Command Line Studios"}'

insert /tmp/table/json --id movie003 --value '\{"title":"The Golden 
Master", "studio":"All-Nighter"}'

{code} 
 #  Create a Hive external table:
{code}
CREATE EXTERNAL TABLE mapr_db_json_hive_tbl ( 
movie_id string, title string, studio string) 
STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' 
TBLPROPERTIES("maprdb.table.name" = "/tmp/table/json","maprdb.column.id" = 
"movie_id");
{code}
 
 #  Use the hive schema to query this table:
{code}
select * from hive.mapr_db_json_hive_tbl;
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6145) Implement Hive MapR-DB JSON handler.

2018-02-08 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-6145:
---
Labels: doc-impacting  (was: )

> Implement Hive MapR-DB JSON handler. 
> -
>
> Key: DRILL-6145
> URL: https://issues.apache.org/jira/browse/DRILL-6145
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive, Storage - MapRDB
>Affects Versions: 1.12.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Critical
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Similar to the "hive-hbase-storage-handler", to support querying MapR-DB Hive 
> external tables it is necessary to add a "hive-maprdb-json-handler".
> Use case:
>  # Create a MapR-DB JSON table:
> {code}
> _> mapr dbshell_
> _maprdb root:> create /tmp/table/json_  (make sure /tmp/table exists)
> {code}
> -- insert data
> {code}
> insert /tmp/table/json --value '\{"_id":"movie002" , "title":"Developers 
> on the Edge", "studio":"Command Line Studios"}'
> insert /tmp/table/json --id movie003 --value '\{"title":"The Golden 
> Master", "studio":"All-Nighter"}'
> {code} 
>  #  Create a Hive external table:
> {code}
> CREATE EXTERNAL TABLE mapr_db_json_hive_tbl ( 
> movie_id string, title string, studio string) 
> STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' 
> TBLPROPERTIES("maprdb.table.name" = "/tmp/table/json","maprdb.column.id" = 
> "movie_id");
> {code}
>  
>  #  Use the hive schema to query this table:
> {code}
> select * from hive.mapr_db_json_hive_tbl;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)