[GitHub] drill issue #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-28 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/921
  
My suggestion is to go ahead and commit this PR with Jyothsna's change to 
disable the "graceful" part of the graceful shutdown. The feature will be in 
the product and people can turn it on to play with it. Then, @dvjyothsna should 
file a JIRA for the improvements we discussed (enabling by default for true 
servers, disabling for embedded.)


---


[GitHub] drill issue #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-28 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/921
  
Since we don't want to hold the release any more. @paul-rogers should we 
address this issue in the next release or move this PR to the next release? I 
am in the favor of the second.


---


[GitHub] drill issue #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-27 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/921
  
Interesting issue! Let's think about this a bit.

In a production system, we do want the grace period; it is an essential 
part of the graceful shutdown procedure.

However, if we are doing a non-graceful shutdown, the grace is unneeded.

Also, if the cluster contains only one node (as in most unit tests), there 
is nothing to wait for, so the grace period is not needed. The same is true in 
an embedded Drillbit for Sqlline.

So, can we provide a solution that handles these cases rather than simply 
turning off the grace period always?

If using the local cluster coordinator, say, then no grace is needed. If 
using ZK, but there is only one Drillbit, no grace is needed. (There is a race 
condition, but may be OK.)

Or, if we detect we are embedded, no grace period.

Then, also, if we are doing a graceful shutdown, we need the grace. But, if 
we are doing a "classic" shutdown, no grace is needed.

The result should be that the grace period is used only in production 
servers, only when doing a graceful shutdown.

There are probably some details to simplify, but I hope the above 
communicates the idea. Can we make this work?


---


[GitHub] drill issue #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-27 Thread dvjyothsna
Github user dvjyothsna commented on the issue:

https://github.com/apache/drill/pull/921
  
This was due to grace_period during shutdown of drillbits. Fixed it by 
changing grace_period = 0 in .conf


---


[GitHub] drill issue #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-27 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/921
  
@dvjyothsna all tests has passed. Though there is significant performance 
degradation.
Before your changes unit test run takes ~ 33 minutes, after ~ 1 hour 18 
minutes. 
For example, exec package:
before `[INFO] exec/Java Execution Engine . SUCCESS 
[17:14 min]`,
after `[INFO] exec/Java Execution Engine . SUCCESS 
[56:45 min]`.
Since you did not add that many unit tests, could you please explain the 
reason of such performance degradation and ideally fix it.
Attached two unit tests outputs in Jira.



---


[GitHub] drill issue #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-26 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/921
  
There are unit test failures with this fix. Full output is attached into 
Jira. Please check and fix.
```
  TestMetadataProvider.columns:251 expected:<92> but was:<93>
  
DrillSeparatePlanningTest.testMultiMinorFragmentComplexQuery:122->getResultsHelper:219
 expected:<1> but was:<0>
  
DrillSeparatePlanningTest.testSingleFragmentQuery:88->getResultsHelper:219 
expected:<1> but was:<0>
  
DrillSeparatePlanningTest.testMultiMinorFragmentSimpleQuery:105->getResultsHelper:219
 expected:<1> but was:<0>
```


---


[GitHub] drill issue #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-26 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/921
  
+1, LGTM.


---


[GitHub] drill issue #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-25 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/921
  
Is there any ETA for the fix?


---


[GitHub] drill issue #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-22 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/921
  
@dvjyothsna  also please remove `e.printStackTrace();` from the code, I see 
4 occurrences.


---


[GitHub] drill issue #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-22 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/921
  
@paul-rogers is right, there is something wrong with the protos @dvjyothsna 
has generated.
I took master branch and generated protos. No changes so these extra 
changes we see are wrong. Also I have created branch where I have added changes 
in protos from this PR [1] and from the diff it can be seen that no extra 
changes as well [2].

@dvjyothsna please address this issue.
If it helps:
I have installed protoc:
```
protoc --version
libprotoc 2.5.0
```
and generated using the following:
```
cd ~/git_repo/drill/protocol
mvn process-sources -P proto-compile
```
[1] https://github.com/arina-ielchiieva/drill/tree/DRILL-4386-proto
[2] 
https://github.com/arina-ielchiieva/drill/commit/07af545ab27b77e5f86002d4ce0f54b3dd371343


---


[GitHub] drill issue #921: DRILL-4286 Graceful shutdown of drillbit

2017-10-20 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/921
  
@dvjyothsna is there any update on the status for this PR? Should it be 
targeted for the 1.12.0 release?


---


[GitHub] drill issue #921: DRILL-4286 Graceful shutdown of drillbit

2017-09-13 Thread dvjyothsna
Github user dvjyothsna commented on the issue:

https://github.com/apache/drill/pull/921
  
@weijietong Please refer to comments under JIRA - 
https://issues.apache.org/jira/browse/DRILL-4286


---


[GitHub] drill issue #921: DRILL-4286 Graceful shutdown of drillbit

2017-08-27 Thread weijietong
Github user weijietong commented on the issue:

https://github.com/apache/drill/pull/921
  
@dvjyothsna thanks for your contribution. Some objections please consider, 
the shutdown process should be quick and allow fault occurred. Google calls it 
lame duck model. If your cluster is large and has high concurrent queries,the 
grace model will let the shutdown process and the service unavailable time both 
too long. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---