date:20180427

[jira] [Commented] (DRILL-6364) WebUI does not cleanly handle shutdown and state toggling when Drillbits go on and offline

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457253#comment-16457253
 ] 

ASF GitHub Bot commented on DRILL-6364:
---

Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/1241
  
@sohami / @arina-ielchiieva, can you review this? The change is not 
extensive and is fairly straightforward.


> WebUI does not cleanly handle shutdown and state toggling when Drillbits go 
> on and offline
> --
>
> Key: DRILL-6364
> URL: https://issues.apache.org/jira/browse/DRILL-6364
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.14.0
>
>
> When the webpage is loaded the first time, the shutdown button is enabled by 
> default, which might not be correct, since scenarios like HTTPS do not 
> support this for remote bits (i.e. the user needs to navigate to that node's 
> UI to shut it down). 
> Similarly, when a previously unseen Drillbit comes online, the node is not 
> rendered until the user refreshes the page. 
> Lastly, if the node from which the UI page was served goes down, the status 
> of the rest of the cluster is no longer updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-6364) WebUI does not cleanly handle shutdown and state toggling when Drillbits go on and offline

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457251#comment-16457251
 ] 

ASF GitHub Bot commented on DRILL-6364:
---

Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/1241
  
Screenshot of when UI node `kk127` goes down. The UI's JavaScript logic 
queries the other Drillbits in the list (in this case, `kk128`) and discovers 
two previously unseen Drillbits, `kk130` and `kk129`, listed in the order in 
which they were discovered in the cluster. State changes are marked 
correctly, with shutdown buttons disabled.
A prompt in the form of an orange refresh button near the Drillbit count 
indicates the need to refresh. Alternatively, one of the other nodes can be 
used to pop out a new WebUI.


![image](https://user-images.githubusercontent.com/4335237/39389539-681fed40-4a3e-11e8-92f7-6d5e717e0881.png)

 



[jira] [Commented] (DRILL-6364) WebUI does not cleanly handle shutdown and state toggling when Drillbits go on and offline

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457232#comment-16457232
 ] 

ASF GitHub Bot commented on DRILL-6364:
---

GitHub user kkhatua opened a pull request:

https://github.com/apache/drill/pull/1241

DRILL-6364: Handle Cluster Info in WebUI when existing/new bits restart

As a follow-up to DRILL-6289, the following improvements have been made:
1. When loading the page for the first time, the WebUI enables the shutdown 
button without actually checking the state of the Drillbits.
   The ideal behaviour is to disable the button until the state is 
verified. **[Done]**
   _If a Drillbit is confirmed down (i.e. not in the `/state` response), it is 
marked as OFFLINE and the button is disabled._
2. When shutting down the current Drillbit, the WebUI no longer has access to 
the cluster state. 
   The ideal behaviour here is to fetch the state from any of the 
other Drillbits to update the status. **[Done]**
   _With the current Drillbit down, the other bits are queried for 
cluster state info and the page is updated accordingly._
3. When a new, previously unseen Drillbit comes up, the WebUI never 
renders it because the table is statically generated during the first page load. 
   The ideal behaviour is to append to the table on discovery of a 
new node. **[Done]**
   _The new Drillbit info is injected and a prompt appears to refresh the 
page to re-populate any missing info. This also works with feature (2) 
mentioned above._

The only Java code change was to have the state response carry the address 
and http-port as a tuple, instead of the user-port (which seems never to be 
used).
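
The three behaviours listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the pull request: the actual WebUI logic is JavaScript, and the class `ClusterStateSketch`, its fields, the `fetchState` callback (standing in for an HTTP GET of a node's `/state` endpoint), the `UNVERIFIED` placeholder state, and the rule that only `ONLINE` bits get an enabled button are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of the WebUI state handling; not Drill's actual code. */
class ClusterStateSketch {
  /** One row of the Drillbit table; field names are illustrative. */
  static final class Row {
    String state = "UNVERIFIED";      // nothing is assumed before the first response
    boolean shutdownEnabled = false;  // (1) button stays disabled until verified
  }

  /** Rows keyed by "address:httpPort", in the order they were first seen. */
  final Map<String, Row> rows = new LinkedHashMap<>();
  /** Set when a previously unseen Drillbit was injected into the table. */
  boolean refreshPrompt = false;

  ClusterStateSketch(Collection<String> initialBits) {
    for (String bit : initialBits) {
      rows.put(bit, new Row());
    }
  }

  /**
   * Polls the serving node first, then (2) falls back to the other known bits.
   * fetchState returns a map of "address:httpPort" to state, or null if the
   * queried node is unreachable. Returns false if no node answered.
   */
  boolean poll(String servingBit, Function<String, Map<String, String>> fetchState) {
    List<String> candidates = new ArrayList<>();
    candidates.add(servingBit);
    for (String bit : rows.keySet()) {
      if (!bit.equals(servingBit)) {
        candidates.add(bit);
      }
    }
    for (String candidate : candidates) {
      Map<String, String> clusterState = fetchState.apply(candidate);
      if (clusterState != null) {
        apply(clusterState);
        return true;
      }
    }
    return false;
  }

  private void apply(Map<String, String> clusterState) {
    // Known bits: anything missing from the response is marked OFFLINE.
    for (Map.Entry<String, Row> entry : rows.entrySet()) {
      String state = clusterState.get(entry.getKey());
      Row row = entry.getValue();
      row.state = (state == null) ? "OFFLINE" : state;
      row.shutdownEnabled = "ONLINE".equals(row.state); // assumed rule
    }
    // (3) Unseen bits: append a row and ask the user to refresh.
    for (Map.Entry<String, String> entry : clusterState.entrySet()) {
      if (!rows.containsKey(entry.getKey())) {
        Row row = new Row();
        row.state = entry.getValue();
        rows.put(entry.getKey(), row); // button stays disabled until refresh
        refreshPrompt = true;
      }
    }
  }
}
```

Keying the rows by address and HTTP port mirrors the tuple that the state response carries after this change; everything else in the sketch is a simplification.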

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kkhatua/drill DRILL-6364

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1241.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1241


commit ab3e8619c6259803eb362be290a3a3605839a194
Author: Kunal Khatua 
Date:   2018-04-27T23:27:45Z

DRILL-6364: Handle Cluster Info in WebUI when existing/new bits restart

As a follow-up to DRILL-6289, the following improvements have been made:
1. When loading the page for the first time, the WebUI enables the shutdown 
button without actually checking the state of the Drillbits.
   The ideal behaviour is to disable the button until the state is 
verified. [Done]
   If a Drillbit is confirmed down (i.e. not in the `/state` response), it is 
marked as OFFLINE and the button is disabled.
2. When shutting down the current Drillbit, the WebUI no longer has access to 
the cluster state. 
   The ideal behaviour here is to fetch the state from any of the 
other Drillbits to update the status. [Done]
   With the current Drillbit down, the other bits are queried for cluster 
state info and the page is updated accordingly.
3. When a new, previously unseen Drillbit comes up, the WebUI never 
renders it because the table is statically generated during the first page load. 
   The ideal behaviour is to append to the table on discovery of a 
new node. [Done]
   The new Drillbit info is injected and a prompt appears to refresh the 
page to re-populate any missing info. This also works with feature (2) 
mentioned above.

The only Java code change was to have the state response carry the address 
and http-port as a tuple, instead of the user-port (which is never used).





[jira] [Created] (DRILL-6364) WebUI does not cleanly handle shutdown and state toggling when Drillbits go on and offline

2018-04-27 Thread Kunal Khatua (JIRA)

Kunal Khatua created DRILL-6364:
---

 Summary: WebUI does not cleanly handle shutdown and state toggling 
when Drillbits go on and offline
 Key: DRILL-6364
 URL: https://issues.apache.org/jira/browse/DRILL-6364
 Project: Apache Drill
  Issue Type: Bug
  Components: Web Server
Reporter: Kunal Khatua
Assignee: Kunal Khatua
 Fix For: 1.14.0


When the webpage is loaded the first time, the shutdown button is enabled by 
default, which might not be correct, since scenarios like HTTPS do not 
support this for remote bits (i.e. the user needs to navigate to that node's UI 
to shut it down). 

Similarly, when a previously unseen Drillbit comes online, the node is not 
rendered until the user refreshes the page. 

Lastly, if the node from which the UI page was served goes down, the status 
of the rest of the cluster is no longer updated.




[jira] [Commented] (DRILL-6242) Output format for nested date, time, timestamp values in an object hierarchy

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457203#comment-16457203
 ] 

ASF GitHub Bot commented on DRILL-6242:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1184
  
Just a quick reminder that the current "JSON Map" returned for a map column 
in JDBC was very likely done so that calling `toString()` in `sqlline` produces 
something like this: `{"c":"foo"}`. I realize this is a very obscure point; but 
worth keeping in mind to avoid bugs from `sqlline` users...


> Output format for nested date, time, timestamp values in an object hierarchy
> 
>
> Key: DRILL-6242
> URL: https://issues.apache.org/jira/browse/DRILL-6242
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.12.0
>Reporter: Jiang Wu
>Assignee: Jiang Wu
>Priority: Major
> Fix For: 1.14.0
>
>
> Some storage systems (MapR-DB, MongoDB, etc.) have hierarchical objects that 
> contain nested fields of date, time, and timestamp types.  When a query returns 
> these objects, the nested date, time, and timestamp values are output as 
> the internal object (org.joda.time.DateTime) rather than the logical 
> data value.
> For example, suppose that in MongoDB we have a single object that looks like 
> this:
> {code:java}
> > db.test.findOne();
> {
>     "_id" : ObjectId("5aa8487d470dd39a635a12f5"),
>     "name" : "orange",
>     "context" : {
>         "date" : ISODate("2018-03-13T21:52:54.940Z"),
>         "user" : "jack"
>     }
> }
> {code}
> Then connect Drill to the above MongoDB storage, and run the following query 
> within Drill:
> {code:java}
> > select t.context.`date`, t.context from test t; 
> +------------+---------+
> | EXPR$0     | context |
> +------------+---------+
> | 2018-03-13 | {"date":{"dayOfYear":72,"year":2018,"dayOfMonth":13,"dayOfWeek":2,"era":1,"millisOfDay":78774940,"weekOfWeekyear":11,"weekyear":2018,"monthOfYear":3,"yearOfEra":2018,"yearOfCentury":18,"centuryOfEra":20,"millisOfSecond":940,"secondOfMinute":54,"secondOfDay":78774,"minuteOfHour":52,"minuteOfDay":1312,"hourOfDay":21,"zone":{"fixed":true,"id":"UTC"},"millis":1520977974940,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"afterNow":false,"beforeNow":true,"equalNow":false},"user":"jack"} |
> {code}
> We can see from the above output that when the date field is retrieved as a 
> top-level column, Drill outputs a logical date value.  But when the same 
> field is within an object hierarchy, Drill outputs the internal object used 
> to hold the date value.
> The expected output displays the date field the same way whether it is a 
> top-level column or nested within an object hierarchy:
> {code:java}
> > select t.context.`date`, t.context from test t; 
> +------------+-------------------------------------+
> | EXPR$0     | context                             |
> +------------+-------------------------------------+
> | 2018-03-13 | {"date":"2018-03-13","user":"jack"} |
> {code}
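
One way to picture the expected behaviour is a recursive walk that replaces temporal objects nested inside a map or list with their logical string form before the value is displayed. The sketch below is not Drill's implementation: the class `NestedDateSketch` and method `toLogical` are hypothetical, and it uses `java.time` types in place of the Joda-Time objects named in the issue so that it stays self-contained.

```java
import java.time.temporal.Temporal;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; illustrates the idea only. */
class NestedDateSketch {
  /**
   * Returns a deep copy of the value in which every date, time, or timestamp
   * (any java.time Temporal) is replaced by its ISO-8601 string.
   */
  static Object toLogical(Object value) {
    if (value instanceof Temporal) {
      return value.toString(); // e.g. LocalDate prints as 2018-03-13
    }
    if (value instanceof Map) {
      Map<String, Object> copy = new LinkedHashMap<>();
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
        copy.put(String.valueOf(entry.getKey()), toLogical(entry.getValue()));
      }
      return copy;
    }
    if (value instanceof List) {
      List<Object> copy = new ArrayList<>();
      for (Object element : (List<?>) value) {
        copy.add(toLogical(element));
      }
      return copy;
    }
    return value; // strings, numbers, etc. pass through unchanged
  }
}
```

Because the walk builds new containers rather than mutating the input, the original map keeps its typed values, at the cost of a deep copy per access.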




[jira] [Commented] (DRILL-5917) Ban org.json:json library in Drill

2018-04-27 Thread Vlad Rozov (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457196#comment-16457196
 ] 

Vlad Rozov commented on DRILL-5917:
---

?

> Ban org.json:json library in Drill
> --
>
> Key: DRILL-5917
> URL: https://issues.apache.org/jira/browse/DRILL-5917
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.11.0
>Reporter: Arina Ielchiieva
>Assignee: Vlad Rozov
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
> Attachments: image.png
>
>
> Apache Drill has dependencies on json.org lib indirectly from two libraries:
> com.mapr.hadoop:maprfs:jar:5.2.1-mapr
> com.mapr.fs:mapr-hbase:jar:5.2.1-mapr
> {noformat}
> [INFO] org.apache.drill.contrib:drill-format-mapr:jar:1.12.0-SNAPSHOT
> [INFO] +- com.mapr.hadoop:maprfs:jar:5.2.1-mapr:compile
> [INFO] |  \- org.json:json:jar:20080701:compile
> [INFO] \- com.mapr.fs:mapr-hbase:jar:5.2.1-mapr:compile
> [INFO]    \- (org.json:json:jar:20080701:compile - omitted for duplicate)
> {noformat}
> We need to make sure that these libs no longer bring in a dependency on the 
> org.json:json lib, and ban this lib in the main pom.xml file.
> The issue is critical since an Apache release won't happen until we make sure 
> the org.json:json lib is not used (https://www.apache.org/legal/resolved.html).




[jira] [Updated] (DRILL-5917) Ban org.json:json library in Drill

2018-04-27 Thread David Capwell (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-5917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated DRILL-5917:
-
Attachment: image.png


[jira] [Commented] (DRILL-6242) Output format for nested date, time, timestamp values in an object hierarchy

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457182#comment-16457182
 ] 

ASF GitHub Bot commented on DRILL-6242:
---

Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1184
  
```
What do you mean by "Json representation"? 
```
Sorry, my mistake, got all tangled up. 
```
 we may want to further translate the Local [Date|Time|DateTime] objects 
inside the Map|List to java.sql.[Date|Time|Timestamp] upon access. But to do 
that inside the SqlAccessor, you would need to deep copy the Map|List and build 
another version with the date|time translated into java.sql.date|time.
```
That is what I thought you wanted to get to. If the current state is 
something you can work with, then great.  I can review the final changes once 
you're done and merge them as well. 
Let's move the other discussion to another thread or JIRA.



[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457120#comment-16457120
 ] 

ASF GitHub Bot commented on DRILL-6348:
---

Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1237
  
IMO, it would be good to understand what other operators do as well. For 
example, what do the Project or Filter operators do? Do they take ownership of 
incoming batches? And if they do, when is the ownership taken?

I do not suggest that we change how Sender and Receiver control **all** 
aspects of communication, at least not as part of this JIRA/PR. The difference 
between my approach and yours is whether or not UnorderedReceiver and other 
receivers are pass-through operators. My view is that receivers are not 
pass-through operators; they are buffering operators, as they receive batches 
from the network and buffer them until downstream operators are ready to 
consume those batches. In your view, receivers are pass-through operators that 
get batches from the fragment queue or some other queue and pass them 
downstream. As there is no wait and no processing between getting a batch from 
the fragment queue and passing it to the next operator, I don't see why a 
receiver needs to take ownership. 


> Unordered Receiver does not report its memory usage
> ---
>
> Key: DRILL-6348
> URL: https://issues.apache.org/jira/browse/DRILL-6348
> Project: Apache Drill
>  Issue Type: Task
>  Components: Execution - Flow
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>
> The Drill Profile functionality doesn't show any memory usage for the 
> Unordered Receiver operator. This is problematic when analyzing OOM 
> conditions since we cannot account for all of a query's memory usage. This 
> Jira is to fix memory reporting for the Unordered Receiver operator.




[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457069#comment-16457069
 ] 

ASF GitHub Bot commented on DRILL-6348:
---

Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1237#discussion_r184807153
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java
 ---
@@ -149,25 +149,32 @@ private RawFragmentBatch getNextBatch() throws IOException {
 }
   }
 
+  private RawFragmentBatch getNextNotEmptyBatch() throws IOException {
+RawFragmentBatch batch;
+try {
+  stats.startWait();
--- End diff --

I see; I will then fix any such occurrences when the opportunity presents 
itself, as I have seen both patterns in the Drill code base.



[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457056#comment-16457056
 ] 

ASF GitHub Bot commented on DRILL-6348:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1237#discussion_r184804819
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java
 ---
@@ -149,25 +149,32 @@ private RawFragmentBatch getNextBatch() throws IOException {
 }
   }
 
+  private RawFragmentBatch getNextNotEmptyBatch() throws IOException {
+RawFragmentBatch batch;
+try {
+  stats.startWait();
--- End diff --

it may throw `AssertException` now and other exceptions may be added in the 
future.



[jira] [Updated] (DRILL-6307) Handle empty batches in record batch sizer correctly

2018-04-27 Thread Padma Penumarthy (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Padma Penumarthy updated DRILL-6307:

Labels: ready-to-commit  (was: )

> Handle empty batches in record batch sizer correctly
> 
>
> Key: DRILL-6307
> URL: https://issues.apache.org/jira/browse/DRILL-6307
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> When we get an empty batch, the record batch sizer calculates the row width as 
> zero. In that case, we do not do accounting and memory allocation correctly for 
> outgoing batches. 
> For example, in merge join, for a left outer join, if the right side batch is 
> empty, we still have to include the right side columns as null in the outgoing 
> batch. 
> Say the first batch is empty. Then, for the outgoing batch, we allocate empty 
> vectors with zero capacity.  When we read the next batch with data, we will end 
> up going through the realloc loop. If we use a right side row width of 0 in the 
> outgoing row width calculation, the number of rows we calculate will be higher, 
> and later, when we get a non-empty batch, we might exceed the memory limits. 
> One possible workaround/solution: allocate memory based on the std size for an 
> empty input batch, and use the allocation width as the width of the batch in 
> the number-of-rows calculation. 
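
The arithmetic behind the proposed workaround can be sketched as below. This is a simplified illustration, not the record batch sizer's API: the class `BatchSizingSketch`, the method `outgoingRowCount`, its parameters, and the `MAX_ROWS` cap are all assumptions made for the example.

```java
/** Hypothetical sizing helper; not Drill's RecordBatchSizer. */
class BatchSizingSketch {
  /** Assumed upper bound on rows per outgoing batch (illustrative value). */
  static final int MAX_ROWS = 65_536;

  /**
   * Number of rows that fit in the outgoing batch budget.
   *
   * @param budgetBytes   memory budget for one outgoing batch
   * @param leftRowWidth  measured row width of the left input, in bytes
   * @param rightRowWidth measured row width of the right input; 0 for an empty batch
   * @param rightStdWidth width to assume for the right side when its batch is empty
   */
  static int outgoingRowCount(long budgetBytes, int leftRowWidth,
                              int rightRowWidth, int rightStdWidth) {
    // Fall back to the standard width when the sizer saw an empty batch.
    int effectiveRight = rightRowWidth > 0 ? rightRowWidth : rightStdWidth;
    long outgoingWidth = (long) leftRowWidth + effectiveRight;
    if (outgoingWidth <= 0) {
      return MAX_ROWS; // nothing to size against; use the cap
    }
    long rows = budgetBytes / outgoingWidth;
    return (int) Math.max(1, Math.min(MAX_ROWS, rows));
  }
}
```

With a 1000-byte budget and a 10-byte left row, treating an empty right batch as width 0 yields 100 rows. If the next right batch then carries 40-byte rows, those 100 rows would need 5000 bytes. Assuming a 40-byte standard width up front yields 20 rows, which stays within the budget.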




[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456910#comment-16456910
 ] 

ASF GitHub Bot commented on DRILL-6348:
---

Github user sachouche commented on the issue:

https://github.com/apache/drill/pull/1237
  
That was not my intention, as my current change aims to describe the 
system the way it is. 

@parthchandra, any feedback?



[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456899#comment-16456899
 ] 

ASF GitHub Bot commented on DRILL-6348:
---

Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1237
  
@sachouche I'd suggest moving the discussion to the dev list, as the topic of 
batch ownership is beyond the scope of a PR review (code changes).



[jira] [Commented] (DRILL-6327) Update unary operators to handle IterOutcome.EMIT

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456879#comment-16456879
 ] 

ASF GitHub Bot commented on DRILL-6327:
---

Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1240
  
+1. Very nicely done.


> Update unary operators to handle IterOutcome.EMIT
> -
>
> Key: DRILL-6327
> URL: https://issues.apache.org/jira/browse/DRILL-6327
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Parth Chandra
>Assignee: Sorabh Hamirwasia
>Priority: Major
>
> IterOutcome.EMIT is a new state introduced by the Lateral Join 
> implementation. All operators need to be updated to handle it.
> This Jira is to track the subtask of updating the unary operators (derived 
> from AbstractSingleRecordBatch).
>  




[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456839#comment-16456839
 ] 

ASF GitHub Bot commented on DRILL-6348:
---

Github user sachouche commented on the issue:

https://github.com/apache/drill/pull/1237
  
@vrozov,

**What are we trying to solve / improve**
- Drill is currently not properly reporting memory held in a Fragment's 
receive queues
- This makes it hard to analyze OOM conditions

This is what I want to see:
- Every operator reporting on the resources it is currently using (needed)
- Fragment-held resources (other than the ones already reported by the 
child operators)
- Drillbit-level resources (metadata caches, web server, ...)
- I am OK with reaching this goal incrementally

**Data Exchange Logistics**
- Ideally, the data exchange fabric should be decoupled from the Drill 
Receive / Send operators
- The fabric should handle all aspects of pre-fetch / pressuring 
and so forth
- It will tune to the speed of producers / consumers when writing / reading 
data from it
- This infrastructure should have its own resource management and reporting 
capabilities

**Operator-based Reporting**
- Receive and Send operators should not worry about batches they haven't 
consumed yet
- Doing so is counterproductive, as the Data Exchange fabric will interpret 
a "drain" operation as the operator "needing" more data. 
- For example, the merge-receiver should not be managing the receive 
queues; it should only advertise its pattern of data consumption and let the 
exchange fabric figure out the rest. 

The main difference between the two approaches is that, essentially, you are 
advocating for Receive and Send operators to control all aspects of 
communication, whereas I am advocating for decoupling such aspects.



[jira] [Updated] (DRILL-6202) Deprecate usage of IndexOutOfBoundsException to re-alloc vectors

2018-04-27 Thread Vlad Rozov (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Rozov updated DRILL-6202:
--
Labels: ready-to-commit  (was: )

> Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
> 
>
> Key: DRILL-6202
> URL: https://issues.apache.org/jira/browse/DRILL-6202
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> As bounds checking may be enabled or disabled, using 
> IndexOutOfBoundsException to resize vectors is unreliable. It works only when 
> bounds checking is enabled.
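
To illustrate the alternative, here is a minimal sketch of a write path that checks capacity explicitly instead of catching IndexOutOfBoundsException. `GrowableIntVectorSketch` is a hypothetical class written for this example (a plain Java array stands in for the vector's buffer); it is not the change made for this issue.

```java
import java.util.Arrays;

/** Hypothetical growable vector; a plain int[] stands in for the underlying buffer. */
final class GrowableIntVectorSketch {
  private int[] data;

  GrowableIntVectorSketch(int initialCapacity) {
    this.data = new int[Math.max(1, initialCapacity)];
  }

  /** Writes value at index, growing the buffer first when index is past the capacity. */
  void setSafe(int index, int value) {
    if (index >= data.length) {
      int newCapacity = data.length;
      while (newCapacity <= index) {
        newCapacity *= 2; // double until the index fits
      }
      data = Arrays.copyOf(data, newCapacity);
    }
    data[index] = value;
  }

  int get(int index) {
    return data[index];
  }

  int capacity() {
    return data.length;
  }

  public static void main(String[] args) {
    GrowableIntVectorSketch vector = new GrowableIntVectorSketch(2);
    vector.setSafe(5, 42);
    System.out.println(vector.get(5) + " at capacity " + vector.capacity());
  }
}
```

Because the resize decision is an ordinary comparison, it behaves the same whether or not the buffer performs bounds checks.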




[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456760#comment-16456760
 ] 

ASF GitHub Bot commented on DRILL-6348:
---

Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1237#discussion_r184747702
  
--- Diff: 
exec/memory/base/src/main/java/org/apache/drill/exec/memory/AllocationManager.java
 ---
@@ -253,10 +261,12 @@ public boolean transferBalance(final BufferLedger target) {
   target.historicalLog.recordEvent("incoming(from %s)", owningLedger.allocator.name);
 }
 
-boolean overlimit = target.allocator.forceAllocate(size);
+// Release first to handle the case where the current and target allocators were part of the same
+// parent / child tree.
 allocator.releaseBytes(size);
+boolean allocationFit = target.allocator.forceAllocate(size);
--- End diff --

- The change of order is an optimization for a parent / child relationship: 
if we don't release first, we could unnecessarily go over the memory budget 
(double counting).
- The force-alloc() / free() failures should never happen under normal 
conditions; when they do, the best thing to do is to exit. I still prefer not 
to promote the target allocator until it is 100% successful.
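
The double-counting argument can be illustrated with a minimal sketch. `SketchAllocator` is a hypothetical, simplified accountant, not Drill's allocator API; it only assumes that bytes charged to a child are also charged to its parent, and it records the parent's peak usage while bytes move between two siblings of the same tree.

```java
/** Hypothetical, simplified accountant; not Drill's real allocator API. */
final class SketchAllocator {
  private final SketchAllocator parent; // null for the root
  private long allocated;
  private long peak;

  SketchAllocator(SketchAllocator parent) {
    this.parent = parent;
  }

  /** Charges size bytes to this allocator and to every ancestor, ignoring limits. */
  void forceAllocate(long size) {
    for (SketchAllocator a = this; a != null; a = a.parent) {
      a.allocated += size;
      a.peak = Math.max(a.peak, a.allocated);
    }
  }

  /** Returns size bytes from this allocator and from every ancestor. */
  void releaseBytes(long size) {
    for (SketchAllocator a = this; a != null; a = a.parent) {
      a.allocated -= size;
    }
  }

  long peak() {
    return peak;
  }
}

final class TransferOrderDemo {
  /** Moves size bytes between two siblings and returns the shared parent's peak usage. */
  static long parentPeak(boolean releaseFirst, long size) {
    SketchAllocator parent = new SketchAllocator(null);
    SketchAllocator source = new SketchAllocator(parent);
    SketchAllocator target = new SketchAllocator(parent);
    source.forceAllocate(size);
    if (releaseFirst) {
      source.releaseBytes(size);
      target.forceAllocate(size);
    } else {
      target.forceAllocate(size);
      source.releaseBytes(size);
    }
    return parent.peak();
  }

  public static void main(String[] args) {
    System.out.println("release first:  " + parentPeak(true, 100));
    System.out.println("allocate first: " + parentPeak(false, 100));
  }
}
```

With allocate-first ordering the parent briefly carries the same bytes twice; with release-first ordering its peak stays at the size of the buffer.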





[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456757#comment-16456757
 ] 

ASF GitHub Bot commented on DRILL-6348:
---

Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1237#discussion_r184727914
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java
 ---
@@ -149,25 +149,32 @@ private RawFragmentBatch getNextBatch() throws IOException {
 }
   }
 
+  private RawFragmentBatch getNextNotEmptyBatch() throws IOException {
+RawFragmentBatch batch;
+try {
+  stats.startWait();
--- End diff --

OK, good point; I have seen both practices used within the Drill code. 
That said, I don't think this is a big deal, since I don't see startWait() 
failing: it merely reads the nano time.
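
For reference, a minimal sketch of the two placements, assuming that "both practices" means calling startWait() inside versus just before the try block. `WaitStatsSketch` is a hypothetical stand-in for the operator stats object, not Drill's class. Starting the timer before the try means the finally block only ever undoes a start that succeeded.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for an operator's wait-time statistics. */
final class WaitStatsSketch {
  private long waitStart;
  private long totalWaitNanos;
  private boolean waiting;

  void startWait() {
    waitStart = System.nanoTime();
    waiting = true;
  }

  void stopWait() {
    if (!waiting) {
      throw new IllegalStateException("stopWait() without startWait()");
    }
    totalWaitNanos += System.nanoTime() - waitStart;
    waiting = false;
  }

  boolean isWaiting() {
    return waiting;
  }

  long totalWaitNanos() {
    return totalWaitNanos;
  }

  /** Starts the timer before the try so that finally always pairs with a successful start. */
  <T> T timed(Supplier<T> blockingCall) {
    startWait();
    try {
      return blockingCall.get();
    } finally {
      stopWait();
    }
  }

  public static void main(String[] args) {
    WaitStatsSketch stats = new WaitStatsSketch();
    String batch = stats.timed(() -> "batch");
    System.out.println(batch + " waited " + stats.totalWaitNanos() + " ns");
  }
}
```

When startWait() cannot throw, as argued above, both placements behave the same.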


[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456758#comment-16456758
 ] 

ASF GitHub Bot commented on DRILL-6348:
---

Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1237#discussion_r184730050
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/RawFragmentBatch.java 
---
@@ -77,4 +83,46 @@ public long getByteCount() {
   public boolean isAckSent() {
 return ackSent.get();
   }
+
+  /**
+   * Transfer ownership of this DrillBuf to the target allocator. This is done for better memory
+   * accounting (that is, the operator should be charged with the body's Drillbuf memory).
+   *
+   * NOTES -
+   * <ul>
+   * <li>This operation is a NOOP when a) the current allocator (associated with the DrillBuf) is not the
+   * owning allocator or b) the target allocator is already the owner</li>
+   * <li>When transfer happens, a new RawFragmentBatch instance is allocated; this is done for proper
+   * DrillBuf reference count accounting</li>
+   * <li>The RPC handling code caches a reference to this RawFragmentBatch object instance; release()
+   * calls should be routed to the previous DrillBuf</li>
+   * </ul>
+   *
+   * @param targetAllocator target allocator
+   * @return a new {@link RawFragmentBatch} object instance on success (where the buffer ownership has
+   * been switched to the target allocator); otherwise this operation is a NOOP (current instance
+   * returned)
+   */
+  public RawFragmentBatch transferBodyOwnership(BufferAllocator targetAllocator) {
+if (body == null) {
+  return this; // NOOP
+}
+
+if (!body.getLedger().isOwningLedger()
+ || body.getLedger().isOwner(targetAllocator)) {
+
+  return this;
+}
+
+int writerIndex   = body.writerIndex();
+TransferResult transferResult = body.transferOwnership(targetAllocator);
+
+// Set the index and increment reference count
+transferResult.buffer.writerIndex(writerIndex);
+
+// Clear the current Drillbuffer since caller will perform release() on the new one
+body.release();
+
+return new RawFragmentBatch(getHeader(), transferResult.buffer, getSender(), false);
--- End diff --

We can take up such an enhancement as part of another JIRA, as any 
changes within the RPC layer have to be thoroughly tested.


[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456759#comment-16456759
 ] 

ASF GitHub Bot commented on DRILL-6348:
---

Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1237#discussion_r184728292
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java
 ---
@@ -149,25 +149,32 @@ private RawFragmentBatch getNextBatch() throws IOException {
 }
   }
 
+  private RawFragmentBatch getNextNotEmptyBatch() throws IOException {
+RawFragmentBatch batch;
+try {
+  stats.startWait();
+  batch = getNextBatch();
+
+  // skip over empty batches. we do this since these are basically control messages.
+  while (batch != null && batch.getHeader().getDef().getRecordCount() == 0
--- End diff --

Ignore this comment as I thought you were releasing the returned batch.


[jira] [Commented] (DRILL-6272) Remove binary jars files from source distribution

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456657#comment-16456657
 ] 

ASF GitHub Bot commented on DRILL-6272:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1225
  
@vrozov the PR now contains two commits: 
1. jmockit and mockito upgrade (DRILL-6363);
2. maven-embedder usage for unit tests (using the latest version, as you 
suggested) (DRILL-6272).
Please review.


> Remove binary jars files from source distribution
> -
>
> Key: DRILL-6272
> URL: https://issues.apache.org/jira/browse/DRILL-6272
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Vlad Rozov
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.14.0
>
>
> Per [~vrozov] the source distribution contains binary jar files under 
> exec/java-exec/src/test/resources/jars




[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456626#comment-16456626
 ] 

ASF GitHub Bot commented on DRILL-6348:
---

Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1237#discussion_r184726839
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java
 ---
@@ -201,6 +208,11 @@ public IterOutcome next() {
   context.getExecutorState().fail(ex);
   return IterOutcome.STOP;
 } finally {
+
+  if (batch != null) {
+batch.release();
+batch = null;
--- End diff --

The point of this pattern is that if you would like to continue using this 
object then be prepared to know what can and what cannot be used.



[jira] [Commented] (DRILL-6281) Refactor TimedRunnable

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456620#comment-16456620
 ] 

ASF GitHub Bot commented on DRILL-6281:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1238#discussion_r184724657
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/TimedCallable.java ---
@@ -0,0 +1,266 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Objects;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CancellationException;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import org.apache.drill.common.exceptions.UserException;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.base.Stopwatch;
+import com.google.common.util.concurrent.MoreExecutors;
+import com.google.common.util.concurrent.ThreadFactoryBuilder;
+
+/**
+ * Class used to allow parallel executions of tasks in a simplified way. Also maintains and reports timings of task completion.
+ * TODO: look at switching to fork join.
+ * @param <V> The time value that will be returned when the task is executed.
+ */
+public abstract class TimedCallable<V> implements Callable<V> {
+  private static final Logger logger = LoggerFactory.getLogger(TimedCallable.class);
+
+  private static long TIMEOUT_PER_RUNNABLE_IN_MSECS = 15000;
+
+  private volatile long startTime = 0;
+  private volatile long executionTime = -1;
+
+  private static class FutureMapper<V> implements Function<Future<V>, V> {
+int count;
+Throwable throwable = null;
+
+private void setThrowable(Throwable t) {
+  if (throwable == null) {
+throwable = t;
+  } else {
+throwable.addSuppressed(t);
+  }
+}
+
+@Override
+public V apply(Future<V> future) {
+  Preconditions.checkState(future.isDone());
+  if (!future.isCancelled()) {
+try {
+  count++;
+  return future.get();
+} catch (InterruptedException e) {
+  // there is no wait as we are getting result from the completed/done future
+  logger.error("Unexpected exception", e);
+  throw UserException.internalError(e)
+  .message("Unexpected exception")
+  .build(logger);
+} catch (ExecutionException e) {
+  setThrowable(e.getCause());
+}
+  } else {
+setThrowable(new CancellationException());
+  }
+  return null;
+}
+  }
+
+  private static class Statistics<V> implements Consumer<TimedCallable<V>> {
+final long start = System.nanoTime();
+final Stopwatch watch = Stopwatch.createStarted();
+long totalExecution;
+long maxExecution;
+int count;
+int startedCount;
+private int doneCount;
+// measure thread creation times
+long earliestStart;
+long latestStart;
+long totalStart;
+
+@Override
+public void accept(TimedCallable<V> task) {
+  count++;
+  long threadStart = task.getStartTime(TimeUnit.NANOSECONDS) - start;
+  if (threadStart >= 0) {
+startedCount++;
+earliestStart = Math.min(earliestStart, threadStart);
+latestStart = Math.max(latestStart, threadStart);
+

[jira] [Commented] (DRILL-5797) Use more often the new parquet reader

2018-04-27 Thread Oleksandr Kalinin (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456529#comment-16456529
 ] 

Oleksandr Kalinin commented on DRILL-5797:
--

Just for the record, further debugging shows how a complex column sneaks into 
ReadState:

(1) `ParquetRecordReader.setup()` triggers ParquetSchema 
buildSchema/loadParquetSchema for column mapping
(2) `ParquetSchema.loadParquetSchema()` uses `ParquetSchema.fieldSelected()` 
for column matching
(3) fieldSelected() takes a MaterializedField as an argument and uses its 
getName() method for column name comparison. For column B.A it returns A.
(4) As a result, column B.A of the file is positively matched to column A and 
is added to selectedColumnMetadata in the ParquetSchema, which is then 
passed to ReadState
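
The false match in steps (3) and (4) can be reproduced with a minimal sketch. `ColumnMatchSketch` and its methods are hypothetical and use plain strings rather than Drill's MaterializedField; the sketch only assumes that the comparison sees the last path segment of the column name.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of leaf-name matching; not Drill's ParquetSchema code. */
final class ColumnMatchSketch {
  /** Returns the last segment of a dotted path, e.g. "B.A" gives "A". */
  static String leafName(String path) {
    int dot = path.lastIndexOf('.');
    return dot < 0 ? path : path.substring(dot + 1);
  }

  /** Leaf-only comparison, as described in step (3). */
  static boolean matchesByLeaf(String fileColumn, List<String> projected) {
    return projected.contains(leafName(fileColumn));
  }

  /** Full-path comparison, shown only for contrast. */
  static boolean matchesByFullPath(String fileColumn, List<String> projected) {
    return projected.contains(fileColumn);
  }

  public static void main(String[] args) {
    List<String> projected = Arrays.asList("A");
    System.out.println("leaf match for B.A:      " + matchesByLeaf("B.A", projected));
    System.out.println("full-path match for B.A: " + matchesByFullPath("B.A", projected));
  }
}
```

With a projection of only `A`, the leaf-only comparison also selects the nested column `B.A`, which is how a complex column ends up among the selected columns.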

> Use more often the new parquet reader
> -
>
> Key: DRILL-5797
> URL: https://issues.apache.org/jira/browse/DRILL-5797
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Oleksandr Kalinin
>Priority: Major
> Fix For: 1.14.0
>
>
> The choice between the regular parquet reader and the optimized one is based 
> on the types of columns present in the file; the columns actually read by 
> the query are not taken into account. We can slightly increase the number of 
> cases where the optimized reader is used by checking whether the projected 
> columns are simple or not.
> This is an optimization until the fast parquet reader can handle complex 
> structures.



[jira] [Updated] (DRILL-6363) Upgrade jmockit and mockito libs

2018-04-27 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6363:

Description: 
Change Jmockit from 
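{noformat}
<dependency>
  <groupId>com.googlecode.jmockit</groupId>
  <artifactId>jmockit</artifactId>
  <version>1.3</version>
  <scope>test</scope>
</dependency>
{noformat}
to
{noformat}
<dependency>
  <groupId>org.jmockit</groupId>
  <artifactId>jmockit</artifactId>
  <version>1.39</version>
  <scope>test</scope>
</dependency>
{noformat}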

Change Mockito core version from 1.9.5 to 2.18.3.

  was:
JMOCKIT
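{noformat}
<dependency>
  <groupId>com.googlecode.jmockit</groupId>
  <artifactId>jmockit</artifactId>
  <version>1.3</version>
  <scope>test</scope>
</dependency>
{noformat}
to
{noformat}
<dependency>
  <groupId>org.jmockit</groupId>
  <artifactId>jmockit</artifactId>
  <version>1.39</version>
  <scope>test</scope>
</dependency>
{noformat}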

Change Mockito core version from 1.9.5 to 2.18.3.


> Upgrade jmockit and mockito libs
> 
>
> Key: DRILL-6363
> URL: https://issues.apache.org/jira/browse/DRILL-6363
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.13.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
>
> Change Jmockit from 
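> {noformat}
> <dependency>
>   <groupId>com.googlecode.jmockit</groupId>
>   <artifactId>jmockit</artifactId>
>   <version>1.3</version>
>   <scope>test</scope>
> </dependency>
> {noformat}
> to
> {noformat}
> <dependency>
>   <groupId>org.jmockit</groupId>
>   <artifactId>jmockit</artifactId>
>   <version>1.39</version>
>   <scope>test</scope>
> </dependency>
> {noformat}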
> Change Mockito core version from 1.9.5 to 2.18.3.




[jira] [Updated] (DRILL-6363) Upgrade jmockit and mockito libs

2018-04-27 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6363:

Description: 
JMOCKIT
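{noformat}
<dependency>
  <groupId>com.googlecode.jmockit</groupId>
  <artifactId>jmockit</artifactId>
  <version>1.3</version>
  <scope>test</scope>
</dependency>
{noformat}
to
{noformat}
<dependency>
  <groupId>org.jmockit</groupId>
  <artifactId>jmockit</artifactId>
  <version>1.39</version>
  <scope>test</scope>
</dependency>
{noformat}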

Change Mockito core version from 1.9.5 to 2.18.3.

> Upgrade jmockit and mockito libs
> 
>
> Key: DRILL-6363
> URL: https://issues.apache.org/jira/browse/DRILL-6363
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.13.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
>
> JMOCKIT
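> {noformat}
> <dependency>
>   <groupId>com.googlecode.jmockit</groupId>
>   <artifactId>jmockit</artifactId>
>   <version>1.3</version>
>   <scope>test</scope>
> </dependency>
> {noformat}
> to
> {noformat}
> <dependency>
>   <groupId>org.jmockit</groupId>
>   <artifactId>jmockit</artifactId>
>   <version>1.39</version>
>   <scope>test</scope>
> </dependency>
> {noformat}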
> Change Mockito core version from 1.9.5 to 2.18.3.




[jira] [Created] (DRILL-6363) Upgrade jmockit and mockito libs

2018-04-27 Thread Arina Ielchiieva (JIRA)

Arina Ielchiieva created DRILL-6363:
---

 Summary: Upgrade jmockit and mockito libs
 Key: DRILL-6363
 URL: https://issues.apache.org/jira/browse/DRILL-6363
 Project: Apache Drill
  Issue Type: Task
Affects Versions: 1.13.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
 Fix For: 1.14.0







[jira] [Commented] (DRILL-6281) Refactor TimedRunnable

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456515#comment-16456515
 ] 

ASF GitHub Bot commented on DRILL-6281:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1238#discussion_r184701824
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/TimedCallable.java ---

[jira] [Commented] (DRILL-6281) Refactor TimedRunnable

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456481#comment-16456481
 ] 

ASF GitHub Bot commented on DRILL-6281:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1238#discussion_r184694930
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/TimedCallable.java ---

[jira] [Commented] (DRILL-6281) Refactor TimedRunnable

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456472#comment-16456472
 ] 

ASF GitHub Bot commented on DRILL-6281:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1238#discussion_r184693553
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/TimedCallable.java ---
@@ -0,0 +1,266 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Objects;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CancellationException;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import org.apache.drill.common.exceptions.UserException;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.base.Stopwatch;
+import com.google.common.util.concurrent.MoreExecutors;
+import com.google.common.util.concurrent.ThreadFactoryBuilder;
+
+/**
+ * Class used to allow parallel executions of tasks in a simplified way. Also maintains and reports timings of task completion.
+ * TODO: look at switching to fork join.
+ * @param <V> The time value that will be returned when the task is executed.
+ */
+public abstract class TimedCallable<V> implements Callable<V> {
+  private static final Logger logger = LoggerFactory.getLogger(TimedCallable.class);
+
+  private static long TIMEOUT_PER_RUNNABLE_IN_MSECS = 15000;
+
+  private volatile long startTime = 0;
+  private volatile long executionTime = -1;
+
+  private static class FutureMapper<V> implements Function<Future<V>, V> {
+    int count;
+    Throwable throwable = null;
+
+    private void setThrowable(Throwable t) {
+      if (throwable == null) {
+        throwable = t;
+      } else {
+        throwable.addSuppressed(t);
+      }
+    }
+
+    @Override
+    public V apply(Future<V> future) {
+      Preconditions.checkState(future.isDone());
+      if (!future.isCancelled()) {
+        try {
+          count++;
+          return future.get();
+        } catch (InterruptedException e) {
+          // there is no wait as we are getting result from the completed/done future
+          logger.error("Unexpected exception", e);
+          throw UserException.internalError(e)
+              .message("Unexpected exception")
+              .build(logger);
+        } catch (ExecutionException e) {
+          setThrowable(e.getCause());
+        }
+      } else {
+        setThrowable(new CancellationException());
+      }
+      return null;
+    }
+  }
+
+  private static class Statistics<V> implements Consumer<TimedCallable<V>> {
+    final long start = System.nanoTime();
+    final Stopwatch watch = Stopwatch.createStarted();
+    long totalExecution;
+    long maxExecution;
+    int count;
+    int startedCount;
+    private int doneCount;
+    // measure thread creation times
+    long earliestStart;
+    long latestStart;
+    long totalStart;
+
+    @Override
+    public void accept(TimedCallable<V> task) {
+      count++;
+      long threadStart = task.getStartTime(TimeUnit.NANOSECONDS) - start;
+      if (threadStart >= 0) {
+        startedCount++;
+        earliestStart = Math.min(earliestStart, threadStart);
+        latestStart = Math.max(latestStart, threadStart);
+
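
The quoted diff is cut off at this point. Its Javadoc describes a class that runs tasks in parallel and records task timings, and FutureMapper keeps the first failure while attaching later ones as suppressed exceptions. The following is a minimal sketch of that general pattern using only the JDK. It is not the code under review: the names TimedTaskSketch, Timed, runAll, and executionNanos are hypothetical, and wrapping the collected failure in an ExecutionException is an assumption made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not the TimedCallable from the pull request.
final class TimedTaskSketch {

  /** Wraps a Callable and records how long call() took, in nanoseconds. */
  static final class Timed<V> implements Callable<V> {
    private final Callable<V> delegate;
    private volatile long executionNanos = -1; // -1 means "has not finished"

    Timed(Callable<V> delegate) {
      this.delegate = delegate;
    }

    @Override
    public V call() throws Exception {
      long begin = System.nanoTime();
      try {
        return delegate.call();
      } finally {
        executionNanos = System.nanoTime() - begin;
      }
    }

    long executionNanos() {
      return executionNanos;
    }
  }

  /**
   * Runs all tasks on a fixed pool. The first failure becomes the cause of the
   * thrown exception; later failures are attached to it as suppressed.
   */
  static <V> List<V> runAll(List<Timed<V>> tasks, int parallelism, long timeoutMillis)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    try {
      // invokeAll cancels any task that has not completed when the timeout expires.
      List<Future<V>> futures = pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
      List<V> results = new ArrayList<>();
      Throwable failure = null;
      for (Future<V> future : futures) {
        Throwable current = null;
        if (future.isCancelled()) {
          current = new CancellationException("task did not finish before the timeout");
        } else {
          try {
            results.add(future.get()); // does not block: the future is already done
          } catch (ExecutionException e) {
            current = e.getCause();
          }
        }
        if (current != null) {
          if (failure == null) {
            failure = current;
          } else {
            failure.addSuppressed(current);
          }
        }
      }
      if (failure != null) {
        throw new ExecutionException(failure);
      }
      return results;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    List<Timed<String>> tasks = new ArrayList<>();
    tasks.add(new Timed<String>(() -> "a"));
    tasks.add(new Timed<String>(() -> "b"));
    System.out.println(runAll(tasks, 2, 15000));
    for (Timed<String> task : tasks) {
      System.out.println(task.executionNanos() + " ns");
    }
  }
}
```

Results come back in submission order because invokeAll returns its futures in the same order as the task list.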

[jira] [Commented] (DRILL-6281) Refactor TimedRunnable

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456468#comment-16456468
 ] 

ASF GitHub Bot commented on DRILL-6281:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1238#discussion_r184693216
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/TimedCallable.java ---
@@ -0,0 +1,258 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Objects;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CancellationException;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import org.apache.drill.common.exceptions.UserException;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.base.Stopwatch;
+import com.google.common.util.concurrent.MoreExecutors;
+import com.google.common.util.concurrent.ThreadFactoryBuilder;
+
+/**
+ * Class used to allow parallel executions of tasks in a simplified way. Also maintains and reports timings of task completion.
+ * TODO: look at switching to fork join.
+ * @param <V> The time value that will be returned when the task is executed.
+ */
+public abstract class TimedCallable<V> implements Callable<V> {
+  private static final Logger logger = LoggerFactory.getLogger(TimedCallable.class);
+
+  private static long TIMEOUT_PER_RUNNABLE_IN_MSECS = 15000;
+
+  private volatile long startTime = 0;
+  private volatile long executionTime = -1;
+
+  private static class FutureMapper<V> implements Function<Future<V>, V> {
+    int count;
+    Throwable throwable = null;
+
+    private void setThrowable(Throwable t) {
+      if (throwable == null) {
+        throwable = t;
+      } else {
+        throwable.addSuppressed(t);
+      }
+    }
+
+    @Override
+    public V apply(Future<V> future) {
+      Preconditions.checkState(future.isDone());
+      if (!future.isCancelled()) {
+        try {
+          count++;
+          return future.get();
+        } catch (InterruptedException e) {
+          // there is no wait as we are getting result from the completed/done future
+          logger.error("Unexpected exception", e);
+          throw UserException.internalError(e)
+              .message("Unexpected exception")
+              .build(logger);
+        } catch (ExecutionException e) {
+          setThrowable(e.getCause());
+        }
+      } else {
+        setThrowable(new CancellationException());
+      }
+      return null;
+    }
+  }
+
+  private static class Statistics<V> implements Consumer<TimedCallable<V>> {
+    final long start = System.nanoTime();
+    final Stopwatch watch = Stopwatch.createStarted();
+    long totalExecution = 0;
+    long maxExecution = 0;
+    int startedCount = 0;
+    private int doneCount = 0;
+    // measure thread creation times
+    long earliestStart = Long.MAX_VALUE;
+    long latestStart = 0;
+    long totalStart = 0;
+
+    @Override
+    public void accept(TimedCallable<V> task) {
+      long threadStart = task.getStartTime(TimeUnit.NANOSECONDS) - start;
+      if (threadStart >= 0) {
+        startedCount++;
+        earliestStart = Math.min(earliestStart, threadStart);
+        latestStart = Math.max(latestStart, threadStart);

[jira] [Commented] (DRILL-6281) Refactor TimedRunnable

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456461#comment-16456461
 ] 

ASF GitHub Bot commented on DRILL-6281:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1238#discussion_r184691926
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/TimedCallable.java ---
@@ -0,0 +1,258 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Objects;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CancellationException;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import org.apache.drill.common.exceptions.UserException;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.base.Stopwatch;
+import com.google.common.util.concurrent.MoreExecutors;
+import com.google.common.util.concurrent.ThreadFactoryBuilder;
+
+/**
+ * Class used to allow parallel executions of tasks in a simplified way. Also maintains and reports timings of task completion.
+ * TODO: look at switching to fork join.
+ * @param <V> The time value that will be returned when the task is executed.
+ */
+public abstract class TimedCallable<V> implements Callable<V> {
+  private static final Logger logger = LoggerFactory.getLogger(TimedCallable.class);
+
+  private static long TIMEOUT_PER_RUNNABLE_IN_MSECS = 15000;
+
+  private volatile long startTime = 0;
+  private volatile long executionTime = -1;
+
+  private static class FutureMapper<V> implements Function<Future<V>, V> {
+    int count;
+    Throwable throwable = null;
+
+    private void setThrowable(Throwable t) {
+      if (throwable == null) {
+        throwable = t;
+      } else {
+        throwable.addSuppressed(t);
+      }
+    }
+
+    @Override
+    public V apply(Future<V> future) {
+      Preconditions.checkState(future.isDone());
+      if (!future.isCancelled()) {
+        try {
+          count++;
+          return future.get();
+        } catch (InterruptedException e) {
+          // there is no wait as we are getting result from the completed/done future
+          logger.error("Unexpected exception", e);
+          throw UserException.internalError(e)
+              .message("Unexpected exception")
+              .build(logger);
+        } catch (ExecutionException e) {
+          setThrowable(e.getCause());
+        }
+      } else {
+        setThrowable(new CancellationException());
+      }
+      return null;
+    }
+  }
+
+  private static class Statistics<V> implements Consumer<TimedCallable<V>> {
+    final long start = System.nanoTime();
+    final Stopwatch watch = Stopwatch.createStarted();
+    long totalExecution = 0;
+    long maxExecution = 0;
+    int startedCount = 0;
+    private int doneCount = 0;
+    // measure thread creation times
+    long earliestStart = Long.MAX_VALUE;
+    long latestStart = 0;
+    long totalStart = 0;
+
+    @Override
+    public void accept(TimedCallable<V> task) {
+      long threadStart = task.getStartTime(TimeUnit.NANOSECONDS) - start;
+      if (threadStart >= 0) {
+        startedCount++;
+        earliestStart = Math.min(earliestStart, threadStart);
+        latestStart = Math.max(latestStart, threadStart);
+

[jira] [Commented] (DRILL-6331) Parquet filter pushdown does not support the native hive reader

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456452#comment-16456452
 ] 

ASF GitHub Bot commented on DRILL-6331:
---

Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1214
  
When moving files around please preserve the history of modifications done 
to the file.


> Parquet filter pushdown does not support the native hive reader
> ---
>
> Key: DRILL-6331
> URL: https://issues.apache.org/jira/browse/DRILL-6331
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.13.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.14.0
>
>
> Initially HiveDrillNativeParquetGroupScan was based mainly on HiveScan; the 
> core difference between them was
> that HiveDrillNativeParquetScanBatchCreator created a ParquetRecordReader 
> instead of a HiveReader.
> This allowed Hive Parquet files to be read with Drill's native Parquet reader 
> but did not expose Hive data to Drill optimizations
> such as filter push down, limit push down, and count to direct scan.
> The Hive code had to be refactored to use the same interfaces as 
> ParquetGroupScan in order to be exposed to such optimizations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (DRILL-6345) Add LOG10 function implementation

2018-04-27 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-6345:
---
Labels: ready-to-commit  (was: )

> Add LOG10 function implementation
> -
>
> Key: DRILL-6345
> URL: https://issues.apache.org/jira/browse/DRILL-6345
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Reporter: Volodymyr Tkach
>Assignee: Volodymyr Tkach
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Add LOG10 function implementation.




[jira] [Commented] (DRILL-6331) Parquet filter pushdown does not support the native hive reader

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456171#comment-16456171
 ] 

ASF GitHub Bot commented on DRILL-6331:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1214


> Parquet filter pushdown does not support the native hive reader
> ---
>
> Key: DRILL-6331
> URL: https://issues.apache.org/jira/browse/DRILL-6331
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.13.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.14.0
>
>
> Initially HiveDrillNativeParquetGroupScan was based mainly on HiveScan; the 
> core difference between them was
> that HiveDrillNativeParquetScanBatchCreator created a ParquetRecordReader 
> instead of a HiveReader.
> This allowed Hive Parquet files to be read with Drill's native Parquet reader 
> but did not expose Hive data to Drill optimizations
> such as filter push down, limit push down, and count to direct scan.
> The Hive code had to be refactored to use the same interfaces as 
> ParquetGroupScan in order to be exposed to such optimizations.




[jira] [Commented] (DRILL-6342) Parquet filter pushdown doesn't work in case of filtering fields inside arrays of complex fields

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456170#comment-16456170
 ] 

ASF GitHub Bot commented on DRILL-6342:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1231


> Parquet filter pushdown doesn't work in case of filtering fields inside 
> arrays of complex fields
> 
>
> Key: DRILL-6342
> URL: https://issues.apache.org/jira/browse/DRILL-6342
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Anton Gozhiy
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
> Attachments: Complex_data.tar.gz
>
>
> *Data:*
>  Complex_data data set is attached
> *Query:*
> {code:sql}
> explain plan for select * from dfs.tmp.`Complex_data` t where 
> t.list_of_complex_fields[2].nested_field is true
> {code}
> *Expected result:*
> numFiles=2
> Statistics of the file that shouldn't be scanned:
> {noformat}
> list_of_complex_fields:
> .nested_field:   BOOLEAN UNCOMPRESSED DO:0 FPO:497 
> SZ:41/41/1.00 VC:3 ENC:PLAIN,RLE ST:[min: false, max: false, num_nulls: 0]
> {noformat}
> *Actual result:*
> numFiles=3
> I.e., filter pushdown does not work.
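
The expected result in the report follows from the column statistics: a chunk whose maximum BOOLEAN value is false cannot contain a row that satisfies `is true`, so the file holding it can be dropped from the scan. A minimal sketch of that check is below. The class and method names (BooleanStatsPruningSketch, BooleanStats, canSkipForIsTrue) are hypothetical and are not Drill or Parquet APIs; the sketch assumes the usual ordering false < true for BOOLEAN statistics.

```java
// Hypothetical illustration only; these are not Drill or Parquet classes.
final class BooleanStatsPruningSketch {

  /** Min/max/null-count statistics for one BOOLEAN column chunk. */
  static final class BooleanStats {
    final boolean min;
    final boolean max;
    final long numNulls;

    BooleanStats(boolean min, boolean max, long numNulls) {
      this.min = min;
      this.max = max;
      this.numNulls = numNulls;
    }
  }

  /**
   * Returns true when no row in the chunk can satisfy "column IS TRUE".
   * With false < true, max == false means every non-null value is false,
   * and null values never satisfy IS TRUE.
   */
  static boolean canSkipForIsTrue(BooleanStats stats) {
    return !stats.max;
  }

  public static void main(String[] args) {
    // Statistics quoted in the report: min false, max false, num_nulls 0.
    BooleanStats reported = new BooleanStats(false, false, 0);
    System.out.println(canSkipForIsTrue(reported));
  }
}
```

For the statistics quoted in the report the check returns true, which matches the expected plan of numFiles=2 rather than 3.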




[jira] [Updated] (DRILL-6360) Document the typeof() function

2018-04-27 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6360:

Fix Version/s: 1.14.0

> Document the typeof() function
> --
>
> Key: DRILL-6360
> URL: https://issues.apache.org/jira/browse/DRILL-6360
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Bridget Bevens
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.14.0
>
>
> Drill has a {{typeof()}} function that returns the data type (but not mode) 
> of a column. It was discussed on the dev list recently. However, a search of 
> the Drill web site, and a scan by hand, failed to turn up documentation about 
> the function.
> As a general suggestion, it would be great to have an alphabetical list of all 
> functions so we don't have to hunt all over the site to find which functions 
> are available.




[jira] [Updated] (DRILL-6360) Document the typeof() function

2018-04-27 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6360:

Labels: doc-impacting  (was: )

> Document the typeof() function
> --
>
> Key: DRILL-6360
> URL: https://issues.apache.org/jira/browse/DRILL-6360
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Bridget Bevens
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.14.0
>
>
> Drill has a {{typeof()}} function that returns the data type (but not mode) 
> of a column. It was discussed on the dev list recently. However, a search of 
> the Drill web site, and a scan by hand, failed to turn up documentation about 
> the function.
> As a general suggestion, it would be great to have an alphabetical list of all 
> functions so we don't have to hunt all over the site to find which functions 
> are available.




[jira] [Commented] (DRILL-6281) Refactor TimedRunnable

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456106#comment-16456106
 ] 

ASF GitHub Bot commented on DRILL-6281:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1238#discussion_r184622525
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/TimedCallable.java ---
@@ -0,0 +1,258 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Objects;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CancellationException;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import org.apache.drill.common.exceptions.UserException;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.base.Stopwatch;
+import com.google.common.util.concurrent.MoreExecutors;
+import com.google.common.util.concurrent.ThreadFactoryBuilder;
+
+/**
+ * Class used to allow parallel executions of tasks in a simplified way. Also maintains and reports timings of task completion.
+ * TODO: look at switching to fork join.
+ * @param <V> The time value that will be returned when the task is executed.
+ */
+public abstract class TimedCallable<V> implements Callable<V> {
+  private static final Logger logger = LoggerFactory.getLogger(TimedCallable.class);
+
+  private static long TIMEOUT_PER_RUNNABLE_IN_MSECS = 15000;
+
+  private volatile long startTime = 0;
+  private volatile long executionTime = -1;
+
+  private static class FutureMapper<V> implements Function<Future<V>, V> {
+    int count;
+    Throwable throwable = null;
+
+    private void setThrowable(Throwable t) {
+      if (throwable == null) {
+        throwable = t;
+      } else {
+        throwable.addSuppressed(t);
+      }
+    }
+
+    @Override
+    public V apply(Future<V> future) {
+      Preconditions.checkState(future.isDone());
+      if (!future.isCancelled()) {
+        try {
+          count++;
+          return future.get();
+        } catch (InterruptedException e) {
+          // there is no wait as we are getting result from the completed/done future
+          logger.error("Unexpected exception", e);
+          throw UserException.internalError(e)
+              .message("Unexpected exception")
+              .build(logger);
+        } catch (ExecutionException e) {
+          setThrowable(e.getCause());
+        }
+      } else {
+        setThrowable(new CancellationException());
+      }
+      return null;
+    }
+  }
+
+  private static class Statistics<V> implements Consumer<TimedCallable<V>> {
+    final long start = System.nanoTime();
+    final Stopwatch watch = Stopwatch.createStarted();
+    long totalExecution = 0;
+    long maxExecution = 0;
+    int startedCount = 0;
+    private int doneCount = 0;
+    // measure thread creation times
+    long earliestStart = Long.MAX_VALUE;
+    long latestStart = 0;
+    long totalStart = 0;
+
+    @Override
+    public void accept(TimedCallable<V> task) {
+      long threadStart = task.getStartTime(TimeUnit.NANOSECONDS) - start;
+      if (threadStart >= 0) {
+        startedCount++;
+        earliestStart = Math.min(earliestStart, threadStart);
+        latestStart = Math.max(latestStart, threadStart);

[jira] [Updated] (DRILL-6281) Refactor TimedRunnable

2018-04-27 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6281:

Labels:   (was: ready-to-commit)

> Refactor TimedRunnable
> --
>
> Key: DRILL-6281
> URL: https://issues.apache.org/jira/browse/DRILL-6281
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Major
> Fix For: 1.14.0
>
>





[jira] [Updated] (DRILL-6281) Refactor TimedRunnable

2018-04-27 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6281:

Labels: ready-to-commit  (was: )

> Refactor TimedRunnable
> --
>
> Key: DRILL-6281
> URL: https://issues.apache.org/jira/browse/DRILL-6281
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>





[jira] [Updated] (DRILL-6281) Refactor TimedRunnable

2018-04-27 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6281:

Fix Version/s: 1.14.0

> Refactor TimedRunnable
> --
>
> Key: DRILL-6281
> URL: https://issues.apache.org/jira/browse/DRILL-6281
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>





[jira] [Commented] (DRILL-6345) Add LOG10 function implementation

2018-04-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456102#comment-16456102
 ] 

ASF GitHub Bot commented on DRILL-6345:
---

Github user vladimirtkach commented on the issue:

https://github.com/apache/drill/pull/1230
  
@vvysotskyi made changes according to your remarks


> Add LOG10 function implementation
> -
>
> Key: DRILL-6345
> URL: https://issues.apache.org/jira/browse/DRILL-6345
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Reporter: Volodymyr Tkach
>Assignee: Volodymyr Tkach
>Priority: Major
> Fix For: 1.14.0
>
>
> Add LOG10 function implementation.



