Re: [GitHub] drill issue #1237: DRILL-6348: Fixed code so that Unordered Receiver reports...

2018-04-27 Thread salim achouche
Correction for Example II, as Drill uses a single thread per pipeline (a
batch is fully processed before the next one is; only the receiving of batches
can happen concurrently). Using batch identifiers for more clarity:
- t0: (fragment, opr-1, opr-2) = ([b1], [], [])
- t1: (fragment, opr-1, opr-2) = ([b2], [b1], [])
- t2: (fragment, opr-1, opr-2) = ([b3,b2], [], [b1])
- subsequent snapshots:
     (fragment, opr-1, opr-2) = ([b3], [b2], [])
     (fragment, opr-1, opr-2) = ([b3], [], [b2])
     (fragment, opr-1, opr-2) = ([], [b3], [])
     (fragment, opr-1, opr-2) = ([], [], [b3])

The point remains the same: the change of ownership for pass-through
operators remains valid, as it doesn't inflate the resource allocation reported
for any given time snapshot.
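
For anyone following along, here is a toy, self-contained Java simulation of the
snapshots above (illustrative only, not Drill code; the queue and operator names
are made up). Because each batch is owned by exactly one holder at any instant,
transferring ownership never double-counts memory in a snapshot:

```java
// Illustrative only -- a toy model of the snapshots above: each batch is owned
// by exactly one holder at a time, so moving ownership downstream never
// inflates the total reported for a snapshot.
import java.util.ArrayDeque;
import java.util.Deque;

public class OwnershipSnapshotDemo {
  static final long BATCH_SIZE = 8L * 1024 * 1024; // 8 MB per batch

  public static void main(String[] args) {
    Deque<String> fragmentQueue = new ArrayDeque<>();
    Deque<String> opr1 = new ArrayDeque<>();
    Deque<String> opr2 = new ArrayDeque<>();

    fragmentQueue.add("b1");
    print("t0", fragmentQueue, opr1, opr2);   // ([b1], [], [])

    opr1.add(fragmentQueue.poll());           // ownership moves fragment -> opr-1
    fragmentQueue.add("b2");                  // next batch is received concurrently
    print("t1", fragmentQueue, opr1, opr2);   // ([b2], [b1], [])

    opr2.add(opr1.poll());                    // ownership moves opr-1 -> opr-2
    fragmentQueue.add("b3");
    print("t2", fragmentQueue, opr1, opr2);   // ([b2, b3], [], [b1])
  }

  static void print(String t, Deque<String> f, Deque<String> o1, Deque<String> o2) {
    long total = (f.size() + o1.size() + o2.size()) * BATCH_SIZE;
    System.out.printf("%s: fragment=%s opr-1=%s opr-2=%s total=%d bytes%n",
        t, f, o1, o2, total);
  }
}
```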


On Sat, Apr 28, 2018 at 12:42 AM, salim achouche 
wrote:

> Another point, I don't see a functional benefit from avoiding a change of
> ownership for pass-through operators. Consider the following use-cases:
>
> Example I -
> - Single batch of size 8MB is received at time t0 and then is passed
> through a set of pass-through operators
> - At time t1 it is owned by operator Opr1, at time t2 by operator Opr2, and so
> forth
> - Assume we report memory usage at time t0 - t2; this is what will be seen
> - t0: (fragment, opr-1, opr-2) = (8MB, 0, 0)
> - t1: (fragment, opr-1, opr-2) = (0, 8MB, 0)
> - t2: (fragment, opr-1, opr-2) = (0, 0, 8MB)
>
> Example II -
> - Multiple batches of size 8MB are received at times t0 - t2 and then are
> passed through a set of pass-through operators
> - At time t1 a batch is owned by operator Opr1, at time t2 by operator Opr2,
> and so forth
> - Assume we report memory usage at times t0 - t2; this is what will be seen
> - t0: (fragment, opr-1, opr-2) = (8MB, 0, 0)
> - t1: (fragment, opr-1, opr-2) = (8MB, 8MB, 0)
> - t2: (fragment, opr-1, opr-2) = (8MB, 8MB, 8MB)
>
>
> The key thing is that we clarify our reporting metrics so that users do
> not draw the wrong conclusions.
>
> Regards,
> Salim
>
> On Fri, Apr 27, 2018 at 11:47 PM, salim achouche 
> wrote:
>
>> Vlad,
>>
>> - My understanding is that operators need to take ownership of incoming
>> buffers (using the vector method transferTo())
>>
>> - My view is not that receivers are pass-through; instead, I feel that
>> sender & receiver operators should focus on their business logic
>>
>> - It just happens that the unordered-receiver does very little
>> (deserializes the batch through the BatchLoader)
>>
>> - Contrast this with the merge-receiver which needs to consume data from
>> multiple inputs to provide ordered batches
>>
>> - The operator implementation will dictate how many batches are consumed
>> (this should have nothing to do with communication concerns)
>>
>> - Intricacies of buffering, acking, back-pressuring, etc. are ideally left
>> to a communication module
>>
>>
>> My intent is to consistently report on resource usage (I am fine if we
>> exclude pass-through operators as long as we do it consistently). The next
>> enhancement that I am planning is to report on the fragment's buffered
>> batches. This will enable us to account for such resources when analyzing
>> memory usage.
>>
>> On Fri, Apr 27, 2018 at 9:50 PM, vrozov  wrote:
>>
>>> Github user vrozov commented on the issue:
>>>
>>> https://github.com/apache/drill/pull/1237
>>>
>>> IMO, it will be good to understand what other operators do as well.
>>> For example what Project or Filter operators do. Do they take ownership of
>>> incoming batches? And if they do, when is the ownership taken?
>>>
>>> I do not suggest that we change how Sender and Receiver control
>>> **all** aspects of communication, at least not as part of this JIRA/PR. The
>>> difference in my and your approach is whether or not UnorderedReceiver and
>>> other receivers are pass-through operators. My view is that receivers are
>>> not pass-through operators and they are buffering operators as they receive
>>> batches from the network and buffer them before downstream operators are
>>> ready to consume those batches. In your view, receivers are pass-through
>>> operators that get batches from the fragment queue or some other queue and pass
>>> them downstream. As there is no wait and no processing between getting a
>>> batch from the fragment queue and passing it to the next operator, I don't see
>>> why a receiver needs to take ownership.
>>>
>>>
>>> ---
>>>
>>
>>
>


Re: [GitHub] drill issue #1237: DRILL-6348: Fixed code so that Unordered Receiver reports...

2018-04-27 Thread salim achouche
Another point, I don't see a functional benefit from avoiding a change of
ownership for pass-through operators. Consider the following use-cases:

Example I -
- Single batch of size 8MB is received at time t0 and then is passed
through a set of pass-through operators
- At time t1 it is owned by operator Opr1, at time t2 by operator Opr2, and so
forth
- Assume we report memory usage at time t0 - t2; this is what will be seen
- t0: (fragment, opr-1, opr-2) = (8MB, 0, 0)
- t1: (fragment, opr-1, opr-2) = (0, 8MB, 0)
- t2: (fragment, opr-1, opr-2) = (0, 0, 8MB)

Example II -
- Multiple batches of size 8MB are received at times t0 - t2 and then are
passed through a set of pass-through operators
- At time t1 a batch is owned by operator Opr1, at time t2 by operator Opr2,
and so forth
- Assume we report memory usage at times t0 - t2; this is what will be seen
- t0: (fragment, opr-1, opr-2) = (8MB, 0, 0)
- t1: (fragment, opr-1, opr-2) = (8MB, 8MB, 0)
- t2: (fragment, opr-1, opr-2) = (8MB, 8MB, 8MB)


The key thing is that we clarify our reporting metrics so that users do not
draw the wrong conclusions.

Regards,
Salim

On Fri, Apr 27, 2018 at 11:47 PM, salim achouche 
wrote:

> Vlad,
>
> - My understanding is that operators need to take ownership of incoming
> buffers (using the vector method transferTo())
>
> - My view is not that receivers are pass-through; instead, I feel that
> sender & receiver operators should focus on their business logic
>
> - It just happens that the unordered-receiver does very little
> (deserializes the batch through the BatchLoader)
>
> - Contrast this with the merge-receiver which needs to consume data from
> multiple inputs to provide ordered batches
>
> - The operator implementation will dictate how many batches are consumed
> (this should have nothing to do with communication concerns)
>
> - Intricacies of buffering, acking, back-pressuring, etc. are ideally left
> to a communication module
>
>
> My intent is to consistently report on resource usage (I am fine if we
> exclude pass-through operators as long as we do it consistently). The next
> enhancement that I am planning is to report on the fragment's buffered
> batches. This will enable us to account for such resources when analyzing
> memory usage.
>
> On Fri, Apr 27, 2018 at 9:50 PM, vrozov  wrote:
>
>> Github user vrozov commented on the issue:
>>
>> https://github.com/apache/drill/pull/1237
>>
>> IMO, it will be good to understand what other operators do as well.
>> For example what Project or Filter operators do. Do they take ownership of
>> incoming batches? And if they do, when is the ownership taken?
>>
>> I do not suggest that we change how Sender and Receiver control
>> **all** aspects of communication, at least not as part of this JIRA/PR. The
>> difference in my and your approach is whether or not UnorderedReceiver and
>> other receivers are pass-through operators. My view is that receivers are
>> not pass-through operators and they are buffering operators as they receive
>> batches from the network and buffer them before downstream operators are
>> ready to consume those batches. In your view, receivers are pass-through
>> operators that get batches from the fragment queue or some other queue and pass
>> them downstream. As there is no wait and no processing between getting a
>> batch from the fragment queue and passing it to the next operator, I don't see
>> why a receiver needs to take ownership.
>>
>>
>> ---
>>
>
>


[GitHub] drill issue #1241: DRILL-6364: Handle Cluster Info in WebUI when existing/ne...

2018-04-27 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/1241
  
@sohami / @arina-ielchiieva, can you review this? The change is not 
extensive and is fairly straightforward.


---


[GitHub] drill issue #1241: DRILL-6364: Handle Cluster Info in WebUI when existing/ne...

2018-04-27 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/1241
  
Screenshot of when UI node `kk127` goes down. The UI's javascript logic 
queries other Drillbits in the list (in this case, `kk128`) and discovers two 
previously unseen Drillbits, `kk130` and `kk129`, listed in the order in which 
they were discovered in the cluster. State changes are marked correctly, with 
shutdown buttons disabled.
A prompt in the form of an orange refresh button near the Drillbit count 
indicates the need to refresh. Alternatively, one of the other nodes can be 
used to pop out a new WebUI.


![image](https://user-images.githubusercontent.com/4335237/39389539-681fed40-4a3e-11e8-92f7-6d5e717e0881.png)

 


---


[GitHub] drill pull request #1241: DRILL-6364: Handle Cluster Info in WebUI when exis...

2018-04-27 Thread kkhatua
GitHub user kkhatua opened a pull request:

https://github.com/apache/drill/pull/1241

DRILL-6364: Handle Cluster Info in WebUI when existing/new bits restart

As a follow up to DRILL-6289, the following improvements have been done:
1. When loading the page for the first time, the WebUI enables the shutdown 
button without actually checking the state of the Drillbits.
   The ideal behaviour should be to disable the button till the state is 
verified. **[Done]**
   _If a Drillbit is confirmed down (i.e. not in `/state` response), it is 
marked as OFFLINE and button is disabled._
2. When shutting down the current Drillbit, the WebUI no longer has access to 
the cluster state. 
   The ideal behaviour here should be to fetch the state from any of the 
other Drillbits to update the status. **[Done]**
   _With the current Drillbit down, the other Drillbits are queried for 
cluster state info and the status is updated accordingly._
3. When a new, previously unseen Drillbit comes up, the WebUI will never 
render it because the table is statically generated during the first page load. 
   The ideal behaviour should be to append to the table on discovery of a 
new node. **[Done]**
   _The new Drillbit info is injected and a prompt appears to refresh the 
page to re-populate any missing info. This also works with feature (2) 
mentioned above._

The only Java code change was to have the state response carry the address 
and http-port as a tuple, instead of the user-port (which does not appear to be 
used).
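
For illustration only (the class and map layout below are hypothetical, not
taken from the patch), the kind of address/http-port keyed state lookup
described above could look like this:

```java
// Hypothetical sketch -- not the actual patch code: cluster state keyed by
// "<address>:<http-port>" so the WebUI can link to another Drillbit's web
// server; a bit missing from the /state response is rendered as OFFLINE.
import java.util.LinkedHashMap;
import java.util.Map;

public class DrillbitStateSketch {
  public static void main(String[] args) {
    Map<String, String> state = new LinkedHashMap<>();
    state.put("kk127:8047", "ONLINE");
    state.put("kk128:8047", "ONLINE");

    // A Drillbit absent from the /state response gets OFFLINE and a disabled
    // shutdown button.
    String queried = "kk129:8047";
    System.out.println(queried + " -> " + state.getOrDefault(queried, "OFFLINE"));
  }
}
```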

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kkhatua/drill DRILL-6364

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1241.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1241


commit ab3e8619c6259803eb362be290a3a3605839a194
Author: Kunal Khatua 
Date:   2018-04-27T23:27:45Z

DRILL-6364: Handle Cluster Info in WebUI when existing/new bits restart

As a follow up to DRILL-6289, the following improvements have been done:
1. When loading the page for the first time, the WebUI enables the shutdown 
button without actually checking the state of the Drillbits.
   The ideal behaviour should be to disable the button till the state is 
verified. [Done]
   If a Drillbit is confirmed down (i.e. not in `/state` response), it is 
marked as OFFLINE and button is disabled.
2. When shutting down the current Drillbit, the WebUI no longer has access to 
the cluster state. 
   The ideal behaviour here should be to fetch the state from any of the 
other Drillbits to update the status. [Done]
   With the current Drillbit down, the other Drillbits are queried for cluster 
state info and the status is updated accordingly.
3. When a new, previously unseen Drillbit comes up, the WebUI will never 
render it because the table is statically generated during the first page load. 
   The ideal behaviour should be to append to the table on discovery of a 
new node. [Done]
   The new Drillbit info is injected and a prompt appears to refresh the 
page to re-populate any missing info. This also works with feature (2) 
mentioned above.

The only Java code change was to have the state response carry the address 
and http-port as a tuple, instead of the user-port (which is never used).




---


[jira] [Created] (DRILL-6364) WebUI does not cleanly handle shutdown and state toggling when Drillbits go on and offline

2018-04-27 Thread Kunal Khatua (JIRA)
Kunal Khatua created DRILL-6364:
---

 Summary: WebUI does not cleanly handle shutdown and state toggling 
when Drillbits go on and offline
 Key: DRILL-6364
 URL: https://issues.apache.org/jira/browse/DRILL-6364
 Project: Apache Drill
  Issue Type: Bug
  Components: Web Server
Reporter: Kunal Khatua
Assignee: Kunal Khatua
 Fix For: 1.14.0


When the webpage is loaded the first time, the shutdown button is enabled by 
default, which might not be correct, since scenarios like HTTPS, etc. do not 
support this for remote bits (i.e. the user needs to navigate to that node's UI 
to shut it down). 

Similarly, when a previously unseen Drillbit comes online, the node will not be 
rendered until the page is refreshed by the user. 

Lastly, if the node from which the UI page was served goes down, the status 
for the rest of the cluster is not updated.





[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

2018-04-27 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1184
  
```
What do you mean by "Json representation"? 
```
Sorry, my mistake, got all tangled up. 
```
 we may want to further translate the Local [Date|Time|DateTime] objects 
inside the Map|List to java.sql.[Date|Time|Timestamp] upon access. But to do 
that inside the SqlAccessor, you would need to deep copy the Map|List and build 
another version with the date|time translated into java.sql.date|time.
```
That is what I thought you wanted to get to. If the current state is 
something you can work with, then great.  I can review the final changes once 
you're done and merge them as well. 
Let's move the other discussion to another thread or JIRA.
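
As a rough illustration of the translation being discussed (this is not the
actual SqlAccessor code; the method and structure are assumptions), a deep copy
that converts java.time values inside a Map or List into their java.sql
equivalents could look like:

```java
// Illustrative sketch only -- not SqlAccessor code. Deep-copies a Map/List and
// translates java.time values into their java.sql equivalents on access.
import java.sql.Timestamp;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DateTimeTranslationSketch {
  static Object translate(Object value) {
    if (value instanceof LocalDate) {
      return java.sql.Date.valueOf((LocalDate) value);
    }
    if (value instanceof LocalTime) {
      return java.sql.Time.valueOf((LocalTime) value);
    }
    if (value instanceof LocalDateTime) {
      return Timestamp.valueOf((LocalDateTime) value);
    }
    if (value instanceof Map) {
      Map<Object, Object> copy = new LinkedHashMap<>();
      for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
        copy.put(e.getKey(), translate(e.getValue()));
      }
      return copy;
    }
    if (value instanceof List) {
      List<Object> copy = new ArrayList<>();
      for (Object item : (List<?>) value) {
        copy.add(translate(item));
      }
      return copy;
    }
    return value; // all other types pass through unchanged
  }

  public static void main(String[] args) {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("created", LocalDateTime.of(2018, 4, 27, 23, 27));
    System.out.println(translate(row)); // {created=2018-04-27 23:27:00.0}
  }
}
```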


---


[GitHub] drill issue #1237: DRILL-6348: Fixed code so that Unordered Receiver reports...

2018-04-27 Thread vrozov
Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1237
  
IMO, it will be good to understand what other operators do as well. For 
example what Project or Filter operators do. Do they take ownership of incoming 
batches? And if they do, when is the ownership taken?

I do not suggest that we change how Sender and Receiver control **all** 
aspects of communication, at least not as part of this JIRA/PR. The difference 
in my and your approach is whether or not UnorderedReceiver and other receivers 
are pass-through operators. My view is that receivers are not pass-through 
operators and they are buffering operators as they receive batches from the 
network and buffer them before downstream operators are ready to consume those 
batches. In your view, receivers are pass-through operators that get batches 
from the fragment queue or some other queue and pass them downstream. As there 
is no wait and no processing between getting a batch from the fragment queue and 
passing it to the next operator, I don't see why a receiver needs to take 
ownership. 
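
To make the two views concrete, a deliberately simplified contrast
(hypothetical types, not Drill's operator API):

```java
// Toy contrast of the two views (hypothetical types, not Drill's operator API).
import java.util.ArrayDeque;
import java.util.Queue;

public class ReceiverViewsSketch {
  static class Batch {}

  // Pass-through view: next() only forwards what the fragment queue already
  // holds, so the receiver itself is never charged for buffered memory.
  static Batch passThroughNext(Queue<Batch> fragmentQueue) {
    return fragmentQueue.poll();
  }

  // Buffering view: batches arriving from the network sit inside the receiver
  // until downstream asks for them, so the receiver owns that memory.
  static class BufferingReceiver {
    final Queue<Batch> buffered = new ArrayDeque<>();
    void onArrival(Batch b) { buffered.add(b); }   // driven by the network layer
    Batch next() { return buffered.poll(); }
  }

  public static void main(String[] args) {
    Queue<Batch> fragmentQueue = new ArrayDeque<>();
    fragmentQueue.add(new Batch());
    System.out.println(passThroughNext(fragmentQueue) != null);  // true

    BufferingReceiver receiver = new BufferingReceiver();
    receiver.onArrival(new Batch());
    System.out.println(receiver.next() != null);                 // true
  }
}
```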


---


[GitHub] drill pull request #1237: DRILL-6348: Fixed code so that Unordered Receiver ...

2018-04-27 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1237#discussion_r184807153
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java
 ---
@@ -149,25 +149,32 @@ private RawFragmentBatch getNextBatch() throws IOException {
     }
   }
 
+  private RawFragmentBatch getNextNotEmptyBatch() throws IOException {
+    RawFragmentBatch batch;
+    try {
+      stats.startWait();
--- End diff --

I see; I will then fix any such occurrences when the opportunity presents 
itself, as I have seen both patterns in the Drill code base.


---


[GitHub] drill pull request #1237: DRILL-6348: Fixed code so that Unordered Receiver ...

2018-04-27 Thread vrozov
Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1237#discussion_r184804819
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java
 ---
@@ -149,25 +149,32 @@ private RawFragmentBatch getNextBatch() throws IOException {
     }
   }
 
+  private RawFragmentBatch getNextNotEmptyBatch() throws IOException {
+    RawFragmentBatch batch;
+    try {
+      stats.startWait();
--- End diff --

It may throw `AssertException` now, and other exceptions may be added in the 
future.


---


[GitHub] drill issue #1237: DRILL-6348: Fixed code so that Unordered Receiver reports...

2018-04-27 Thread sachouche
Github user sachouche commented on the issue:

https://github.com/apache/drill/pull/1237
  
That was not my intention as my current change aimed at describing the 
system the way it is. 

@parthchandra, any feedback?


---


[GitHub] drill issue #1240: DRILL-6327: Update unary operators to handle IterOutcome....

2018-04-27 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1240
  
+1. Very nicely done.


---


[GitHub] drill issue #1237: DRILL-6348: Fixed code so that Unordered Receiver reports...

2018-04-27 Thread sachouche
Github user sachouche commented on the issue:

https://github.com/apache/drill/pull/1237
  
@vrozov,

**What are we trying to solve / improve**
- Drill is currently not properly reporting the memory held in a fragment's 
receive queues
- This makes it hard to analyze OOM conditions
This is what I want to see:
- Every operator reporting on the resources it is currently using (needed)
- Fragment-held resources (other than the ones already reported by the 
child operators)
- Drillbit-level resources (metadata caches, web server, ..)
- I am OK with reaching this goal incrementally

**Data Exchange Logistics**
- Ideally, the data exchange fabric should be decoupled from the Drill 
Receive / Send operators
- The fabric should handle all aspects of pre-fetching / back-pressure 
and so forth
- It will tune to the speed of producers / consumers when writing / reading 
data from it
- This infrastructure should have its own resource management and reporting 
capabilities

**Operator-based Reporting**
- Receive and Send operators should not worry about batches they haven't 
consumed yet
- Doing so is counterproductive, as the data exchange fabric will interpret 
a "drain" operation as the operator "needing" more data. 
- For example, the merge-receiver should not be managing the receive 
queues; it should only advertise its pattern of data consumption and let the 
exchange fabric figure out the rest. 

The main difference between the two approaches is that, essentially, you are 
advocating for Receive and Send operators to control all aspects of 
communication, whereas I am advocating for decoupling such aspects.
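
A minimal sketch of the decoupling described above, using hypothetical
interfaces rather than Drill's actual classes (the fabric owns buffering,
acking, back-pressure and its own accounting; the receiver only asks for the
next batch):

```java
// Hypothetical interfaces -- not Drill's actual design.
import java.util.ArrayDeque;
import java.util.Optional;
import java.util.Queue;

public class ExchangeFabricSketch {
  interface Batch { long sizeInBytes(); }

  // Owned by the communication layer: buffering, acks, back-pressure and its
  // own resource accounting would live behind this boundary.
  interface ExchangeFabric {
    Optional<Batch> poll(int minorFragmentId);   // non-blocking fetch
    long bufferedBytes(int minorFragmentId);     // reported separately from operators
  }

  // A receive operator focused purely on its business logic: it only asks the
  // fabric for the next batch and never manages the receive queues itself.
  static class ReceiverSketch {
    private final ExchangeFabric fabric;
    private final int minorFragmentId;

    ReceiverSketch(ExchangeFabric fabric, int minorFragmentId) {
      this.fabric = fabric;
      this.minorFragmentId = minorFragmentId;
    }

    Optional<Batch> next() {
      return fabric.poll(minorFragmentId);
    }
  }

  public static void main(String[] args) {
    Queue<Batch> queue = new ArrayDeque<>();
    queue.add(() -> 8L << 20);                   // one buffered 8 MB batch
    ExchangeFabric fabric = new ExchangeFabric() {
      @Override public Optional<Batch> poll(int id) { return Optional.ofNullable(queue.poll()); }
      @Override public long bufferedBytes(int id) {
        return queue.stream().mapToLong(Batch::sizeInBytes).sum();
      }
    };
    ReceiverSketch receiver = new ReceiverSketch(fabric, 0);
    System.out.println("batch size = " + receiver.next().map(Batch::sizeInBytes).orElse(0L));
  }
}
```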


---


[GitHub] drill pull request #1237: DRILL-6348: Fixed code so that Unordered Receiver ...

2018-04-27 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1237#discussion_r184747702
  
--- Diff: 
exec/memory/base/src/main/java/org/apache/drill/exec/memory/AllocationManager.java
 ---
@@ -253,10 +261,12 @@ public boolean transferBalance(final BufferLedger target) {
       target.historicalLog.recordEvent("incoming(from %s)", owningLedger.allocator.name);
     }
 
-    boolean overlimit = target.allocator.forceAllocate(size);
+    // Release first to handle the case where the current and target allocators were part of the same
+    // parent / child tree.
     allocator.releaseBytes(size);
+    boolean allocationFit = target.allocator.forceAllocate(size);
--- End diff --

- The change of order is an optimization for the parent / child relationship: 
if we don't release first, we could unnecessarily go over the memory budget 
(double counting).
- The force-alloc() / free() failures should never happen under normal 
conditions; when they do, the best thing to do is to exit. I still prefer not 
to promote the target allocator until the transfer is 100% successful.
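
A toy model of the double-counting concern (hypothetical allocator, not Drill's
AllocationManager): with a shared parent limit, allocating on the target before
releasing on the source briefly counts the same bytes twice and can trip the
limit spuriously.

```java
// Toy model only -- demonstrates why releasing before force-allocating avoids
// transient double counting under a shared parent budget.
public class TransferOrderSketch {
  static class ParentBudget {
    final long limit;
    long used;
    ParentBudget(long limit) { this.limit = limit; }
    boolean allocate(long bytes) { used += bytes; return used <= limit; }
    void release(long bytes) { used -= bytes; }
  }

  public static void main(String[] args) {
    long size = 8L << 20;                               // 8 MB batch being transferred
    ParentBudget parent = new ParentBudget(10L << 20);  // 10 MB shared limit
    parent.allocate(size);                              // source child already holds 8 MB

    // Allocate-then-release: 16 MB counted for a moment -> exceeds the 10 MB limit.
    boolean fitBefore = parent.allocate(size);
    parent.release(size);
    System.out.println("allocate first fits: " + fitBefore);   // false

    // Release-then-allocate: never more than 8 MB counted -> fits.
    parent.release(size);
    boolean fitAfter = parent.allocate(size);
    System.out.println("release first fits: " + fitAfter);     // true
  }
}
```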




---


[GitHub] drill pull request #1237: DRILL-6348: Fixed code so that Unordered Receiver ...

2018-04-27 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1237#discussion_r184727914
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java
 ---
@@ -149,25 +149,32 @@ private RawFragmentBatch getNextBatch() throws IOException {
     }
   }
 
+  private RawFragmentBatch getNextNotEmptyBatch() throws IOException {
+    RawFragmentBatch batch;
+    try {
+      stats.startWait();
--- End diff --

OK, good point; I have seen both practices in the Drill code. Though I don't 
think this is a big deal, as I don't see startWait() failing since it merely 
invokes nanoTime().
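
For reference, a self-contained sketch of the wait-timing pattern under
discussion (names are illustrative, not Drill's OperatorStats API): calling
startWait() before the try means a failure there never triggers an unmatched
stopWait(), while stopWait() in the finally still guarantees the wait is
recorded.

```java
// Illustrative sketch of the start/stop wait-timing pattern (made-up names).
public class WaitStatsSketch {
  static class WaitStats {
    private long waitStartNanos;
    private long totalWaitNanos;

    void startWait() { waitStartNanos = System.nanoTime(); }
    void stopWait()  { totalWaitNanos += System.nanoTime() - waitStartNanos; }
    long totalWaitNanos() { return totalWaitNanos; }
  }

  static String fetchNextBatch(WaitStats stats) throws InterruptedException {
    stats.startWait();            // cheap: just records a nanoTime() snapshot
    try {
      Thread.sleep(5);            // stand-in for blocking on the fragment queue
      return "batch";
    } finally {
      stats.stopWait();           // always paired with the startWait() above
    }
  }

  public static void main(String[] args) throws InterruptedException {
    WaitStats stats = new WaitStats();
    fetchNextBatch(stats);
    System.out.println("waited ~" + stats.totalWaitNanos() / 1_000_000 + " ms");
  }
}
```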


---


[GitHub] drill pull request #1237: DRILL-6348: Fixed code so that Unordered Receiver ...

2018-04-27 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1237#discussion_r184730050
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/RawFragmentBatch.java 
---
@@ -77,4 +83,46 @@ public long getByteCount() {
   public boolean isAckSent() {
     return ackSent.get();
   }
+
+  /**
+   * Transfer ownership of this DrillBuf to the target allocator. This is done for better memory
+   * accounting (that is, the operator should be charged with the body's DrillBuf memory).
+   *
+   * NOTES -
+   * <ul>
+   * <li>This operation is a NOOP when a) the current allocator (associated with the DrillBuf) is not the
+   * owning allocator or b) the target allocator is already the owner</li>
+   * <li>When transfer happens, a new RawFragmentBatch instance is allocated; this is done for proper
+   * DrillBuf reference count accounting</li>
+   * <li>The RPC handling code caches a reference to this RawFragmentBatch object instance; release()
+   * calls should be routed to the previous DrillBuf</li>
+   * </ul>
+   *
+   * @param targetAllocator target allocator
+   * @return a new {@link RawFragmentBatch} object instance on success (where the buffer ownership has
+   *         been switched to the target allocator); otherwise this operation is a NOOP (current instance
+   *         returned)
+   */
+  public RawFragmentBatch transferBodyOwnership(BufferAllocator targetAllocator) {
+    if (body == null) {
+      return this; // NOOP
+    }
+
+    if (!body.getLedger().isOwningLedger()
+        || body.getLedger().isOwner(targetAllocator)) {
+
+      return this;
+    }
+
+    int writerIndex = body.writerIndex();
+    TransferResult transferResult = body.transferOwnership(targetAllocator);
+
+    // Set the index and increment reference count
+    transferResult.buffer.writerIndex(writerIndex);
+
+    // Clear the current DrillBuf since the caller will perform release() on the new one
+    body.release();
+
+    return new RawFragmentBatch(getHeader(), transferResult.buffer, getSender(), false);
--- End diff --

We can take up such an enhancement as part of another JIRA, as any changes 
within the RPC layer have to be thoroughly tested.


---


[GitHub] drill pull request #1237: DRILL-6348: Fixed code so that Unordered Receiver ...

2018-04-27 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1237#discussion_r184728292
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java
 ---
@@ -149,25 +149,32 @@ private RawFragmentBatch getNextBatch() throws IOException {
     }
   }
 
+  private RawFragmentBatch getNextNotEmptyBatch() throws IOException {
+    RawFragmentBatch batch;
+    try {
+      stats.startWait();
+      batch = getNextBatch();
+
+      // skip over empty batches. we do this since these are basically control messages.
+      while (batch != null && batch.getHeader().getDef().getRecordCount() == 0
--- End diff --

Ignore this comment as I thought you were releasing the returned batch.


---


[GitHub] drill issue #1225: DRILL-6272: Refactor dynamic UDFs and function initialize...

2018-04-27 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1225
  
@vrozov, the PR now contains two commits: 
1. jmockit and mockito upgrade (DRILL-6363);
2. maven-embedder usage for unit tests (used latest version as you 
suggested) (DRILL-6272).
Please review.


---


[GitHub] drill pull request #1237: DRILL-6348: Fixed code so that Unordered Receiver ...

2018-04-27 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1237#discussion_r184726839
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java
 ---
@@ -201,6 +208,11 @@ public IterOutcome next() {
       context.getExecutorState().fail(ex);
       return IterOutcome.STOP;
     } finally {
+
+      if (batch != null) {
+        batch.release();
+        batch = null;
--- End diff --

The point of this pattern is that if you would like to continue using this 
object, you should be prepared to know what can and what cannot be used.
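
A tiny self-contained illustration of the release-and-null pattern from the
diff above (toy Batch class, not Drill's):

```java
// Toy illustration: nulling the reference after release() makes any accidental
// later use fail fast instead of touching freed memory.
public class ReleaseAndNullSketch {
  static class Batch {
    private boolean released;
    void release() { released = true; }
    int recordCount() {
      if (released) { throw new IllegalStateException("use after release"); }
      return 0;
    }
  }

  public static void main(String[] args) {
    Batch batch = new Batch();
    try {
      System.out.println("records: " + batch.recordCount());
    } finally {
      if (batch != null) {
        batch.release();
        batch = null;      // any later access now fails fast with an NPE
      }
    }
    System.out.println("reference cleared: " + (batch == null));  // true
  }
}
```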


---


[GitHub] drill pull request #1238: DRILL-6281: Refactor TimedRunnable

2018-04-27 Thread vrozov
Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1238#discussion_r184724657
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/TimedCallable.java ---
@@ -0,0 +1,266 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Objects;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CancellationException;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import org.apache.drill.common.exceptions.UserException;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.base.Stopwatch;
+import com.google.common.util.concurrent.MoreExecutors;
+import com.google.common.util.concurrent.ThreadFactoryBuilder;
+
+/**
+ * Class used to allow parallel executions of tasks in a simplified way. 
Also maintains and reports timings of task completion.
+ * TODO: look at switching to fork join.
+ * @param <V> The time value that will be returned when the task is executed.
+ */
+public abstract class TimedCallable<V> implements Callable<V> {
+  private static final Logger logger = 
LoggerFactory.getLogger(TimedCallable.class);
+
+  private static long TIMEOUT_PER_RUNNABLE_IN_MSECS = 15000;
+
+  private volatile long startTime = 0;
+  private volatile long executionTime = -1;
+
+  private static class FutureMapper implements Function {
+int count;
+Throwable throwable = null;
+
+private void setThrowable(Throwable t) {
+  if (throwable == null) {
+throwable = t;
+  } else {
+throwable.addSuppressed(t);
+  }
+}
+
+@Override
+public V apply(Future future) {
+  Preconditions.checkState(future.isDone());
+  if (!future.isCancelled()) {
+try {
+  count++;
+  return future.get();
+} catch (InterruptedException e) {
+  // there is no wait as we are getting result from the 
completed/done future
+  logger.error("Unexpected exception", e);
+  throw UserException.internalError(e)
+  .message("Unexpected exception")
+  .build(logger);
+} catch (ExecutionException e) {
+  setThrowable(e.getCause());
+}
+  } else {
+setThrowable(new CancellationException());
+  }
+  return null;
+}
+  }
+
+  private static class Statistics implements Consumer 
{
+final long start = System.nanoTime();
+final Stopwatch watch = Stopwatch.createStarted();
+long totalExecution;
+long maxExecution;
+int count;
+int startedCount;
+private int doneCount;
+// measure thread creation times
+long earliestStart;
+long latestStart;
+long totalStart;
+
+@Override
+public void accept(TimedCallable task) {
+  count++;
+  long threadStart = task.getStartTime(TimeUnit.NANOSECONDS) - start;
+  if (threadStart >= 0) {
+startedCount++;
+earliestStart = Math.min(earliestStart, threadStart);
+latestStart = Math.max(latestStart, threadStart);
+totalStart += threadStart;
+long executionTime = task.getExecutionTime(TimeUnit.NANOSECONDS);
+if (executionTime != -1) {
+  doneCount++;
+  totalExecution += executionTime;
+  

[jira] [Created] (DRILL-6363) Upgrade jmockit and mockito libs

2018-04-27 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-6363:
---

 Summary: Upgrade jmockit and mockito libs
 Key: DRILL-6363
 URL: https://issues.apache.org/jira/browse/DRILL-6363
 Project: Apache Drill
  Issue Type: Task
Affects Versions: 1.13.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
 Fix For: 1.14.0








[GitHub] drill pull request #1238: DRILL-6281: Refactor TimedRunnable

2018-04-27 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1238#discussion_r184701824
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/TimedCallable.java ---
@@ -0,0 +1,266 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Objects;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CancellationException;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import org.apache.drill.common.exceptions.UserException;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.base.Stopwatch;
+import com.google.common.util.concurrent.MoreExecutors;
+import com.google.common.util.concurrent.ThreadFactoryBuilder;
+
+/**
+ * Class used to allow parallel executions of tasks in a simplified way. 
Also maintains and reports timings of task completion.
+ * TODO: look at switching to fork join.
+ * @param <V> The time value that will be returned when the task is executed.
+ */
+public abstract class TimedCallable<V> implements Callable<V> {
+  private static final Logger logger = 
LoggerFactory.getLogger(TimedCallable.class);
+
+  private static long TIMEOUT_PER_RUNNABLE_IN_MSECS = 15000;
+
+  private volatile long startTime = 0;
+  private volatile long executionTime = -1;
+
+  private static class FutureMapper implements Function {
+int count;
+Throwable throwable = null;
+
+private void setThrowable(Throwable t) {
+  if (throwable == null) {
+throwable = t;
+  } else {
+throwable.addSuppressed(t);
+  }
+}
+
+@Override
+public V apply(Future future) {
+  Preconditions.checkState(future.isDone());
+  if (!future.isCancelled()) {
+try {
+  count++;
+  return future.get();
+} catch (InterruptedException e) {
+  // there is no wait as we are getting result from the 
completed/done future
+  logger.error("Unexpected exception", e);
+  throw UserException.internalError(e)
+  .message("Unexpected exception")
+  .build(logger);
+} catch (ExecutionException e) {
+  setThrowable(e.getCause());
+}
+  } else {
+setThrowable(new CancellationException());
+  }
+  return null;
+}
+  }
+
+  private static class Statistics implements Consumer 
{
+final long start = System.nanoTime();
+final Stopwatch watch = Stopwatch.createStarted();
+long totalExecution;
+long maxExecution;
+int count;
+int startedCount;
+private int doneCount;
+// measure thread creation times
+long earliestStart;
+long latestStart;
+long totalStart;
+
+@Override
+public void accept(TimedCallable task) {
+  count++;
+  long threadStart = task.getStartTime(TimeUnit.NANOSECONDS) - start;
+  if (threadStart >= 0) {
+startedCount++;
+earliestStart = Math.min(earliestStart, threadStart);
+latestStart = Math.max(latestStart, threadStart);
+totalStart += threadStart;
+long executionTime = task.getExecutionTime(TimeUnit.NANOSECONDS);
+if (executionTime != -1) {
+  doneCount++;
+  totalExecution += executionTime;
+ 

[GitHub] drill pull request #1238: DRILL-6281: Refactor TimedRunnable

2018-04-27 Thread vrozov
Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1238#discussion_r184694930
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/TimedCallable.java ---
@@ -0,0 +1,266 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Objects;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CancellationException;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import org.apache.drill.common.exceptions.UserException;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.base.Stopwatch;
+import com.google.common.util.concurrent.MoreExecutors;
+import com.google.common.util.concurrent.ThreadFactoryBuilder;
+
+/**
+ * Class used to allow parallel executions of tasks in a simplified way. 
Also maintains and reports timings of task completion.
+ * TODO: look at switching to fork join.
+ * @param <V> The time value that will be returned when the task is executed.
+ */
+public abstract class TimedCallable<V> implements Callable<V> {
+  private static final Logger logger = 
LoggerFactory.getLogger(TimedCallable.class);
+
+  private static long TIMEOUT_PER_RUNNABLE_IN_MSECS = 15000;
+
+  private volatile long startTime = 0;
+  private volatile long executionTime = -1;
+
+  private static class FutureMapper implements Function {
+int count;
+Throwable throwable = null;
+
+private void setThrowable(Throwable t) {
+  if (throwable == null) {
+throwable = t;
+  } else {
+throwable.addSuppressed(t);
+  }
+}
+
+@Override
+public V apply(Future future) {
+  Preconditions.checkState(future.isDone());
+  if (!future.isCancelled()) {
+try {
+  count++;
+  return future.get();
+} catch (InterruptedException e) {
+  // there is no wait as we are getting result from the 
completed/done future
+  logger.error("Unexpected exception", e);
+  throw UserException.internalError(e)
+  .message("Unexpected exception")
+  .build(logger);
+} catch (ExecutionException e) {
+  setThrowable(e.getCause());
+}
+  } else {
+setThrowable(new CancellationException());
+  }
+  return null;
+}
+  }
+
+  private static class Statistics implements Consumer 
{
+final long start = System.nanoTime();
+final Stopwatch watch = Stopwatch.createStarted();
+long totalExecution;
+long maxExecution;
+int count;
+int startedCount;
+private int doneCount;
+// measure thread creation times
+long earliestStart;
+long latestStart;
+long totalStart;
+
+@Override
+public void accept(TimedCallable task) {
+  count++;
+  long threadStart = task.getStartTime(TimeUnit.NANOSECONDS) - start;
+  if (threadStart >= 0) {
+startedCount++;
+earliestStart = Math.min(earliestStart, threadStart);
+latestStart = Math.max(latestStart, threadStart);
+totalStart += threadStart;
+long executionTime = task.getExecutionTime(TimeUnit.NANOSECONDS);
+if (executionTime != -1) {
+  doneCount++;
+  totalExecution += executionTime;
+  

[GitHub] drill pull request #1238: DRILL-6281: Refactor TimedRunnable

2018-04-27 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1238#discussion_r184693216
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/TimedCallable.java ---
@@ -0,0 +1,258 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Objects;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CancellationException;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import org.apache.drill.common.exceptions.UserException;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.base.Stopwatch;
+import com.google.common.util.concurrent.MoreExecutors;
+import com.google.common.util.concurrent.ThreadFactoryBuilder;
+
+/**
+ * Class used to allow parallel executions of tasks in a simplified way. 
Also maintains and reports timings of task completion.
+ * TODO: look at switching to fork join.
+ * @param <V> The time value that will be returned when the task is executed.
+ */
+public abstract class TimedCallable<V> implements Callable<V> {
+  private static final Logger logger = 
LoggerFactory.getLogger(TimedCallable.class);
+
+  private static long TIMEOUT_PER_RUNNABLE_IN_MSECS = 15000;
+
+  private volatile long startTime = 0;
+  private volatile long executionTime = -1;
+
+  private static class FutureMapper implements Function {
+int count;
+Throwable throwable = null;
+
+private void setThrowable(Throwable t) {
+  if (throwable == null) {
+throwable = t;
+  } else {
+throwable.addSuppressed(t);
+  }
+}
+
+@Override
+public V apply(Future future) {
+  Preconditions.checkState(future.isDone());
+  if (!future.isCancelled()) {
+try {
+  count++;
+  return future.get();
+} catch (InterruptedException e) {
+  // there is no wait as we are getting result from the 
completed/done future
+  logger.error("Unexpected exception", e);
+  throw UserException.internalError(e)
+  .message("Unexpected exception")
+  .build(logger);
+} catch (ExecutionException e) {
+  setThrowable(e.getCause());
+}
+  } else {
+setThrowable(new CancellationException());
+  }
+  return null;
+}
+  }
+
+  private static class Statistics implements Consumer 
{
+final long start = System.nanoTime();
+final Stopwatch watch = Stopwatch.createStarted();
+long totalExecution = 0;
+long maxExecution = 0;
+int startedCount = 0;
+private int doneCount = 0;
+// measure thread creation times
+long earliestStart = Long.MAX_VALUE;
+long latestStart = 0;
+long totalStart = 0;
+
+@Override
+public void accept(TimedCallable task) {
+  long threadStart = task.getStartTime(TimeUnit.NANOSECONDS) - start;
+  if (threadStart >= 0) {
+startedCount++;
+earliestStart = Math.min(earliestStart, threadStart);
+latestStart = Math.max(latestStart, threadStart);
+totalStart += threadStart;
+long executionTime = task.getExecutionTime(TimeUnit.NANOSECONDS);
+if (executionTime != -1) {
+  doneCount++;
+  totalExecution += executionTime;
+

[GitHub] drill pull request #1238: DRILL-6281: Refactor TimedRunnable

2018-04-27 Thread vrozov
Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1238#discussion_r184691926
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/TimedCallable.java ---
@@ -0,0 +1,258 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Objects;
+import java.util.concurrent.Callable;
+import java.util.concurrent.CancellationException;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import org.apache.drill.common.exceptions.UserException;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.base.Stopwatch;
+import com.google.common.util.concurrent.MoreExecutors;
+import com.google.common.util.concurrent.ThreadFactoryBuilder;
+
+/**
+ * Class used to allow parallel executions of tasks in a simplified way. 
Also maintains and reports timings of task completion.
+ * TODO: look at switching to fork join.
+ * @param <V> The time value that will be returned when the task is executed.
+ */
+public abstract class TimedCallable<V> implements Callable<V> {
+  private static final Logger logger = 
LoggerFactory.getLogger(TimedCallable.class);
+
+  private static long TIMEOUT_PER_RUNNABLE_IN_MSECS = 15000;
+
+  private volatile long startTime = 0;
+  private volatile long executionTime = -1;
+
+  private static class FutureMapper implements Function {
+int count;
+Throwable throwable = null;
+
+private void setThrowable(Throwable t) {
+  if (throwable == null) {
+throwable = t;
+  } else {
+throwable.addSuppressed(t);
+  }
+}
+
+@Override
+public V apply(Future future) {
+  Preconditions.checkState(future.isDone());
+  if (!future.isCancelled()) {
+try {
+  count++;
+  return future.get();
+} catch (InterruptedException e) {
+  // there is no wait as we are getting result from the 
completed/done future
+  logger.error("Unexpected exception", e);
+  throw UserException.internalError(e)
+  .message("Unexpected exception")
+  .build(logger);
+} catch (ExecutionException e) {
+  setThrowable(e.getCause());
+}
+  } else {
+setThrowable(new CancellationException());
+  }
+  return null;
+}
+  }
+
+  private static class Statistics implements Consumer 
{
+final long start = System.nanoTime();
+final Stopwatch watch = Stopwatch.createStarted();
+long totalExecution = 0;
+long maxExecution = 0;
+int startedCount = 0;
+private int doneCount = 0;
+// measure thread creation times
+long earliestStart = Long.MAX_VALUE;
+long latestStart = 0;
+long totalStart = 0;
+
+@Override
+public void accept(TimedCallable task) {
+  long threadStart = task.getStartTime(TimeUnit.NANOSECONDS) - start;
+  if (threadStart >= 0) {
+startedCount++;
+earliestStart = Math.min(earliestStart, threadStart);
+latestStart = Math.max(latestStart, threadStart);
+totalStart += threadStart;
+long executionTime = task.getExecutionTime(TimeUnit.NANOSECONDS);
+if (executionTime != -1) {
+  doneCount++;
+  totalExecution += executionTime;
+  

[GitHub] drill issue #1214: DRILL-6331: Revisit Hive Drill native parquet implementat...

2018-04-27 Thread vrozov
Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1214
  
When moving files around, please preserve the history of modifications done 
to the file.


---


[GitHub] drill pull request #1214: DRILL-6331: Revisit Hive Drill native parquet impl...

2018-04-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1214


---


[GitHub] drill pull request #1231: DRILL-6342: Fix schema path unIndexed method to re...

2018-04-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1231


---


[GitHub] drill issue #1230: DRILL-6345: DRILL Query fails on Function LOG10

2018-04-27 Thread vladimirtkach
Github user vladimirtkach commented on the issue:

https://github.com/apache/drill/pull/1230
  
@vvysotskyi made changes according to your remarks


---