[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-12-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708357#comment-16708357
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

vdiravka commented on issue #1560: Fixed regression caused by DRILL-6039
URL: https://github.com/apache/drill/pull/1560#issuecomment-444010551
 
 
   @dvjyothsna Please follow the usual naming convention for PRs and commits. It allows the PR 
link to be added to the Jira ticket automatically, for example:
   > DRILL-6877: NPE when starting Drill on Windows


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> drillbit.sh graceful_stop does not wait for fragments to complete before 
> stopping the drillbit
> --
>
> Key: DRILL-6039
> URL: https://issues.apache.org/jira/browse/DRILL-6039
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.3.0
>Reporter: Krystal
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> git.commit.id.abbrev=eb0c403
> I have a 3-node cluster with a drillbit running on each node.  I kicked off a 
> long-running query.  In the middle of the query, I ran "./drillbit.sh 
> graceful_stop" on one of the non-foreman nodes.  The node stopped within a 
> few seconds and the query failed with the error:
> Error: SYSTEM ERROR: IOException: Filesystem closed
> Fragment 4:15



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-12-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708033#comment-16708033
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on issue #1560: Fixed regression caused by DRILL-6039
URL: https://github.com/apache/drill/pull/1560#issuecomment-443929243
 
 
   @sohami Please review.




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-12-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708025#comment-16708025
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna opened a new pull request #1560: Fixed regression caused by 
DRILL-6039
URL: https://github.com/apache/drill/pull/1560
 
 
   




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703751#comment-16703751
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

asfgit closed pull request #1536: DRILL-6039: Fixed drillbit.sh script to do 
graceful shutdown
URL: https://github.com/apache/drill/pull/1536
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/distribution/src/resources/drill-config.sh b/distribution/src/resources/drill-config.sh
index d23788b006a..a4686c50354 100644
--- a/distribution/src/resources/drill-config.sh
+++ b/distribution/src/resources/drill-config.sh
@@ -334,6 +334,7 @@ fi
 # provided in drill-env.sh.
 
 export DRILL_PID_DIR=${DRILL_PID_DIR:-$DRILL_HOME}
+export GRACEFUL_SIGFILE=${GRACEFUL_SIGFILE:-"graceful"}
 
 # Prepare log file prefix and the main Drillbit log file.
 
diff --git a/distribution/src/resources/drillbit.sh b/distribution/src/resources/drillbit.sh
index 88d56c8a14f..5ad87b15cdb 100755
--- a/distribution/src/resources/drillbit.sh
+++ b/distribution/src/resources/drillbit.sh
@@ -87,6 +87,7 @@ export args
 
 # Set default scheduling priority
 DRILL_NICENESS=${DRILL_NICENESS:-0}
+GRACEFUL_FILE=$DRILL_PID_DIR/$GRACEFUL_SIGFILE
 
 waitForProcessEnd()
 {
@@ -94,11 +95,19 @@ waitForProcessEnd()
   commandName=$2
   kill_drillbit=$3
   processedAt=`date +%s`
+  triggered_shutdown=false
   origcnt=${DRILL_STOP_TIMEOUT:-120}
   while kill -0 $pidKilled > /dev/null 2>&1;
do
  echo -n "."
  sleep 1;
+ #Incase of graceful shutdown, create graceful file and wait till the process ends.
+ if [ "$kill_drillbit" = false ]; then
+   if [ "$triggered_shutdown" = false ]; then
+ touch $GRACEFUL_FILE
+ triggered_shutdown=true
+   fi
+ fi
  if [ "$kill_drillbit" = true ] ; then
 # if process persists more than $DRILL_STOP_TIMEOUT (default 120 sec) no mercy
 if [ $(( `date +%s` - $processedAt )) -gt $origcnt ]; then
@@ -125,6 +134,15 @@ check_before_start()
   exit 1
 fi
   fi
+   #remove any previous uncleaned graceful file
+  if [ -f "$GRACEFUL_FILE" ]; then
+rm $GRACEFUL_FILE
+rm_status=$?
+if [ $rm_status -ne 0 ];then
+echo "Error: Failed to remove $GRACEFUL_FILE!"
+exit $rm_status
+fi
+  fi
 }
 
 check_after_start(){
@@ -204,7 +222,9 @@ stop_bit ( )
 if kill -0 $pidToKill > /dev/null 2>&1; then
   echo "Stopping $command"
   echo "`date` Terminating $command pid $pidToKill" >> "$DRILLBIT_LOG_PATH"
-  kill $pidToKill > /dev/null 2>&1
+  if [ $kill_drillbit = true ]; then
+kill $pidToKill > /dev/null 2>&1
+  fi
   waitForProcessEnd $pidToKill $command $kill_drillbit
   retval=0
 else
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java b/exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
index dd1c5f19faf..a0c63ab6dcc 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
@@ -17,6 +17,13 @@
  */
 package org.apache.drill.exec.server;
 
+import java.io.IOException;
+import java.nio.file.FileSystems;
+import java.nio.file.Path;
+import java.nio.file.StandardWatchEventKinds;
+import java.nio.file.WatchEvent;
+import java.nio.file.WatchKey;
+import java.nio.file.WatchService;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicInteger;
 
@@ -90,6 +97,8 @@
   private DrillbitStateManager stateManager;
   private boolean quiescentMode;
   private boolean forcefulShutdown = false;
+  GracefulShutdownThread gracefulShutdownThread;
+  private boolean interruptPollShutdown = true;
 
   public void setQuiescentMode(boolean quiescentMode) {
 this.quiescentMode = quiescentMode;
@@ -212,6 +221,8 @@ public void run() throws Exception {
 drillbitContext.startRM();
 
 Runtime.getRuntime().addShutdownHook(new ShutdownThread(this, new StackTrace()));
+gracefulShutdownThread = new GracefulShutdownThread(this, new StackTrace());
+gracefulShutdownThread.start();
 logger.info("Startup completed ({} ms).", w.elapsed(TimeUnit.MILLISECONDS));
   }
 
@@ -291,6 +302,11 @@ public synchronized void close() {
 
 logger.info("Shutdown completed ({} ms).", w.elapsed(TimeUnit.MILLISECONDS) );
 stateManager.setState(DrillbitState.SHUTDOWN);
+// Interrupt GracefulShutdownThread since Drillbit close is not called from it.
+if (interruptPollShutdown) {
+  gracefulShutdownThread.interrupt();
+}
+
   }
 
   private void javaPropertiesToSystemOptions() {
@@ -335,6 +351,55 @@ private void javaPropertiesToSystemOptions() {

[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699599#comment-16699599
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on issue #1536: DRILL-6039: Fixed drillbit.sh script to do 
graceful shutdown
URL: https://github.com/apache/drill/pull/1536#issuecomment-441804936
 
 
   Thank you Sorabh! Squashed all the commits.




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695286#comment-16695286
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235547490
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
 ##
 @@ -335,6 +351,50 @@ private void javaPropertiesToSystemOptions() {
 }
   }
 
+
+  // Polls for graceful file to check if graceful shutdown is triggered from the script.
+  private static class PollShutdownThread extends Thread {
+
+private final Drillbit drillbit;
+private final StackTrace stackTrace;
+
+public PollShutdownThread(final Drillbit drillbit, final StackTrace stackTrace) {
+  this.drillbit = drillbit;
+  this.stackTrace = stackTrace;
+}
+
+@Override
+public void run () {
+  try {
+pollShutdown(drillbit);
+  } catch (Exception e) {
+throw new RuntimeException("Caught exception while polling for shutdown\n" + stackTrace, e);
+  }
 
 Review comment:
   Changed it




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695288#comment-16695288
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235547582
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
 ##
 @@ -335,6 +351,50 @@ private void javaPropertiesToSystemOptions() {
 }
   }
 
+
+  // Polls for graceful file to check if graceful shutdown is triggered from the script.
+  private static class PollShutdownThread extends Thread {
+
+private final Drillbit drillbit;
+private final StackTrace stackTrace;
+
+public PollShutdownThread(final Drillbit drillbit, final StackTrace stackTrace) {
+  this.drillbit = drillbit;
+  this.stackTrace = stackTrace;
+}
+
+@Override
+public void run () {
+  try {
+pollShutdown(drillbit);
+  } catch (Exception e) {
+throw new RuntimeException("Caught exception while polling for shutdown\n" + stackTrace, e);
+  }
+}
+
+private void pollShutdown(Drillbit drillbit) throws IOException, InterruptedException {
+  final Path path = FileSystems.getDefault().getPath(System.getenv("DRILL_PID_DIR"));
+  final String file = System.getenv("GRACEFUL_SIGFILE");
+  boolean triggered_shutdown = false;
+  try (final WatchService watchService = FileSystems.getDefault().newWatchService()) {
+path.register(watchService, StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_CREATE);
+while (!triggered_shutdown) {
+  final WatchKey wk = watchService.take();
+  for (WatchEvent event : wk.pollEvents()) {
+final Path changed = (Path) event.context();
 
 Review comment:
   Added a null check for event.context().
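
A minimal sketch of the kind of guard this comment refers to, not the code from the PR: `event.context()` may be null (for example on an `OVERFLOW` event), so the check happens before the cast. The class name `GracefulFileWatchSketch` and the helpers `isSignalFile` and `awaitSignalFile` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

// Hypothetical illustration; not part of Drill.
final class GracefulFileWatchSketch {

  // Returns true only when the event context is a Path whose last element
  // equals the signal file name. A null context fails the instanceof test,
  // so no NullPointerException can occur on the cast.
  static boolean isSignalFile(Object context, String signalFileName) {
    if (!(context instanceof Path)) {
      return false;
    }
    Path fileName = ((Path) context).getFileName();
    return fileName != null && fileName.toString().equals(signalFileName);
  }

  // Blocks until the signal file is created or modified inside dir.
  static void awaitSignalFile(Path dir, String signalFileName)
      throws IOException, InterruptedException {
    try (WatchService watchService = FileSystems.getDefault().newWatchService()) {
      dir.register(watchService,
          StandardWatchEventKinds.ENTRY_CREATE,
          StandardWatchEventKinds.ENTRY_MODIFY);
      while (true) {
        WatchKey key = watchService.take(); // blocks; throws if interrupted
        for (WatchEvent<?> event : key.pollEvents()) {
          if (isSignalFile(event.context(), signalFileName)) {
            return;
          }
        }
        // reset() re-arms the key; false means the directory is gone.
        if (!key.reset()) {
          throw new IOException("Watched directory is no longer accessible: " + dir);
        }
      }
    }
  }
}
```

Keeping the match logic in a separate helper makes the null handling testable without touching the file system.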




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695291#comment-16695291
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235547885
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
 ##
 @@ -335,6 +351,50 @@ private void javaPropertiesToSystemOptions() {
 }
   }
 
+
+  // Polls for graceful file to check if graceful shutdown is triggered from the script.
+  private static class PollShutdownThread extends Thread {
+
+private final Drillbit drillbit;
+private final StackTrace stackTrace;
+
+public PollShutdownThread(final Drillbit drillbit, final StackTrace stackTrace) {
+  this.drillbit = drillbit;
+  this.stackTrace = stackTrace;
+}
+
+@Override
+public void run () {
+  try {
+pollShutdown(drillbit);
+  } catch (Exception e) {
+throw new RuntimeException("Caught exception while polling for shutdown\n" + stackTrace, e);
+  }
+}
+
+private void pollShutdown(Drillbit drillbit) throws IOException, InterruptedException {
+  final Path path = FileSystems.getDefault().getPath(System.getenv("DRILL_PID_DIR"));
+  final String file = System.getenv("GRACEFUL_SIGFILE");
+  boolean triggered_shutdown = false;
+  try (final WatchService watchService = FileSystems.getDefault().newWatchService()) {
+path.register(watchService, StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_CREATE);
+while (!triggered_shutdown) {
+  final WatchKey wk = watchService.take();
+  for (WatchEvent event : wk.pollEvents()) {
+final Path changed = (Path) event.context();
+if (changed.endsWith(file)) {
+  drillbit.interruptPollShutdown = false;
+  triggered_shutdown = true;
+  drillbit.close();
+  wk.cancel();
 
 Review comment:
   Done.




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695292#comment-16695292
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235547910
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
 ##
 @@ -335,6 +351,50 @@ private void javaPropertiesToSystemOptions() {
 }
   }
 
+
+  // Polls for graceful file to check if graceful shutdown is triggered from the script.
+  private static class PollShutdownThread extends Thread {
+
+private final Drillbit drillbit;
+private final StackTrace stackTrace;
+
+public PollShutdownThread(final Drillbit drillbit, final StackTrace stackTrace) {
+  this.drillbit = drillbit;
+  this.stackTrace = stackTrace;
+}
+
+@Override
+public void run () {
+  try {
+pollShutdown(drillbit);
+  } catch (Exception e) {
+throw new RuntimeException("Caught exception while polling for shutdown\n" + stackTrace, e);
+  }
+}
+
+private void pollShutdown(Drillbit drillbit) throws IOException, InterruptedException {
 
 Review comment:
   Done.




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695290#comment-16695290
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235547864
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
 ##
 @@ -335,6 +351,50 @@ private void javaPropertiesToSystemOptions() {
 }
   }
 
+
+  // Polls for graceful file to check if graceful shutdown is triggered from the script.
+  private static class PollShutdownThread extends Thread {
 
 Review comment:
   Changed it. Initially, it was GracefulShutdownThread but I thought people 
might be confused since it only gracefully shuts down from the script.




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695287#comment-16695287
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235547528
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
 ##
 @@ -291,6 +302,11 @@ public synchronized void close() {
 
 logger.info("Shutdown completed ({} ms).", w.elapsed(TimeUnit.MILLISECONDS) );
 stateManager.setState(DrillbitState.SHUTDOWN);
+// Interrupt the polling for shutdown since shutdown is triggered from WebUI.
 
 Review comment:
   Done.




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695281#comment-16695281
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235547065
 
 

 ##
 File path: distribution/src/resources/drillbit.sh
 ##
 @@ -192,6 +201,10 @@ start_bit ( )
   echo $procId > $pidFile # Yeah, $pidFile is a file, $procId is the pid...
   echo $! > $pidFile
   sleep 1
+  #remove any previous uncleaned graceful file
+  if [ -f "$GRACEFUL_FILE" ]; then
+rm $GRACEFUL_FILE
 
 Review comment:
   The error thrown by rm is currently logged to console.




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695117#comment-16695117
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

sohami commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235502954
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
 ##
 @@ -291,6 +302,11 @@ public synchronized void close() {
 
 logger.info("Shutdown completed ({} ms).", w.elapsed(TimeUnit.MILLISECONDS) );
 stateManager.setState(DrillbitState.SHUTDOWN);
+// Interrupt the polling for shutdown since shutdown is triggered from WebUI.
 
 Review comment:
   `Interrupt GracefulShutdownThread since Drillbit close is not called from it`
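
To illustrate why the interrupt matters, here is a hedged, self-contained sketch rather than Drill's code: a watcher thread blocked on a wait (a `CountDownLatch` stands in for `WatchService.take()`) exits only when something interrupts it, which is what `close()` has to do when shutdown was started from somewhere other than that thread. All names below are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not part of Drill.
final class InterruptibleWatcherSketch {

  // Stand-in for a thread that blocks until a shutdown signal arrives.
  static final class WatcherThread extends Thread {
    final CountDownLatch started = new CountDownLatch(1);
    // Never counted down in this sketch, so await() blocks until interrupted.
    private final CountDownLatch signal = new CountDownLatch(1);
    volatile boolean sawInterrupt = false;

    @Override
    public void run() {
      started.countDown();
      try {
        signal.await(); // blocks much like WatchService.take()
      } catch (InterruptedException e) {
        // Shutdown was requested from elsewhere; exit quietly.
        sawInterrupt = true;
      }
    }
  }

  // Starts the watcher, interrupts it the way a close() called from another
  // thread would, and reports whether the watcher exited because of it.
  static boolean interruptEndsWatcher() {
    WatcherThread watcher = new WatcherThread();
    watcher.setDaemon(true);
    watcher.start();
    try {
      if (!watcher.started.await(5, TimeUnit.SECONDS)) {
        return false;
      }
      watcher.interrupt();
      watcher.join(5000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !watcher.isAlive() && watcher.sawInterrupt;
  }
}
```

Without the `interrupt()` call the watcher would stay blocked after the rest of the process had finished closing.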




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695115#comment-16695115
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

sohami commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235491872
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
 ##
 @@ -335,6 +351,50 @@ private void javaPropertiesToSystemOptions() {
 }
   }
 
+
+  // Polls for graceful file to check if graceful shutdown is triggered from the script.
+  private static class PollShutdownThread extends Thread {
 
 Review comment:
   Rename to `GracefulShutdownThread`




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695118#comment-16695118
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

sohami commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235501156
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
 ##
 @@ -335,6 +351,50 @@ private void javaPropertiesToSystemOptions() {
 }
   }
 
+
+  // Polls for graceful file to check if graceful shutdown is triggered from the script.
+  private static class PollShutdownThread extends Thread {
+
+    private final Drillbit drillbit;
+    private final StackTrace stackTrace;
+
+    public PollShutdownThread(final Drillbit drillbit, final StackTrace stackTrace) {
+      this.drillbit = drillbit;
+      this.stackTrace = stackTrace;
+    }
+
+    @Override
+    public void run () {
+      try {
+        pollShutdown(drillbit);
+      } catch (Exception e) {
+        throw new RuntimeException("Caught exception while polling for shutdown\n" + stackTrace, e);
+      }
+    }
+
+    private void pollShutdown(Drillbit drillbit) throws IOException, InterruptedException {
+      final Path path = FileSystems.getDefault().getPath(System.getenv("DRILL_PID_DIR"));
+      final String file = System.getenv("GRACEFUL_SIGFILE");
+      boolean triggered_shutdown = false;
+      try (final WatchService watchService = FileSystems.getDefault().newWatchService()) {
+        path.register(watchService, StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_CREATE);
+        while (!triggered_shutdown) {
+          final WatchKey wk = watchService.take();
+          for (WatchEvent event : wk.pollEvents()) {
+            final Path changed = (Path) event.context();
 
 Review comment:
   `event.context()` can be null (see
[here](https://docs.oracle.com/javase/7/docs/api/java/nio/file/WatchEvent.html#context())).
Please add a check for it.
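
A minimal sketch of such a check, not the code from the PR. The class `WatchEventContextCheck`, the helpers `isGracefulFileEvent` and `stubEvent`, and the file name `.graceful` are hypothetical names used only for illustration. It treats an `OVERFLOW` event, or any event whose context is null or not a `Path`, as a non-match instead of casting blindly.

```java
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;

public class WatchEventContextCheck {

  // Returns true only when the event carries a non-null Path context that
  // ends with the expected file name.
  static boolean isGracefulFileEvent(WatchEvent<?> event, String gracefulFileName) {
    // OVERFLOW events are allowed to carry a null context.
    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
      return false;
    }
    final Object context = event.context();
    // instanceof is false for null, so this also covers the null case.
    if (!(context instanceof Path)) {
      return false;
    }
    return ((Path) context).endsWith(gracefulFileName);
  }

  // Hypothetical helper: builds an ENTRY_CREATE event with the given context
  // so the check can be exercised without a real WatchService.
  static WatchEvent<Path> stubEvent(final Path context) {
    return new WatchEvent<Path>() {
      @Override
      public WatchEvent.Kind<Path> kind() {
        return StandardWatchEventKinds.ENTRY_CREATE;
      }

      @Override
      public int count() {
        return 1;
      }

      @Override
      public Path context() {
        return context;
      }
    };
  }
}
```

Inside the polling loop, the cast `(Path) event.context()` would then be replaced by a call such as `isGracefulFileEvent(event, file)`.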




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695116#comment-16695116
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

sohami commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235505271
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
 ##
 @@ -335,6 +351,50 @@ private void javaPropertiesToSystemOptions() {
 }
   }
 
+
+  // Polls for graceful file to check if graceful shutdown is triggered from the script.
+  private static class PollShutdownThread extends Thread {
+
+    private final Drillbit drillbit;
+    private final StackTrace stackTrace;
+
+    public PollShutdownThread(final Drillbit drillbit, final StackTrace stackTrace) {
+      this.drillbit = drillbit;
+      this.stackTrace = stackTrace;
+    }
+
+    @Override
+    public void run () {
+      try {
+        pollShutdown(drillbit);
+      } catch (Exception e) {
+        throw new RuntimeException("Caught exception while polling for shutdown\n" + stackTrace, e);
+      }
 
 Review comment:
   I think you have to catch `InterruptedException` and consume it after
logging rather than throwing a `RuntimeException` for it. When
`Drillbit.close` is called from another path it will interrupt this thread,
and we should not throw a `RuntimeException` in that case.
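
A rough sketch of that handling, under stated assumptions: the class `InterruptAwarePoller`, its `poll` stand-in, the `endedByInterrupt` flag and the `demo` helper are hypothetical, and `java.util.logging` is used only to keep the example self-contained (Drill's own logger is not shown here). An interrupt is logged and consumed, while an `IOException` is still rethrown.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.logging.Logger;

public class InterruptAwarePoller extends Thread {

  private static final Logger logger =
      Logger.getLogger(InterruptAwarePoller.class.getName());

  // Released once run() has been entered, so demo() interrupts a running thread.
  private final CountDownLatch started = new CountDownLatch(1);

  // Set when the thread ends because it was interrupted.
  volatile boolean endedByInterrupt = false;

  @Override
  public void run() {
    started.countDown();
    try {
      poll();
    } catch (InterruptedException e) {
      // Expected when shutdown is driven from another path: log and return quietly.
      logger.info("Interrupted while polling for shutdown, exiting");
      endedByInterrupt = true;
    } catch (IOException e) {
      // Genuine failures are still surfaced.
      throw new RuntimeException("Caught exception while polling for shutdown", e);
    }
  }

  // Hypothetical stand-in for the blocking watch loop: blocks until interrupted.
  private void poll() throws IOException, InterruptedException {
    new CountDownLatch(1).await();
  }

  // Starts the poller, interrupts it, and reports whether it exited quietly.
  static boolean demo() {
    final InterruptAwarePoller poller = new InterruptAwarePoller();
    poller.start();
    try {
      poller.started.await();
      poller.interrupt();
      poller.join(5000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !poller.isAlive() && poller.endedByInterrupt;
  }
}
```

With this shape, an interrupt issued by a close path ends the thread without an uncaught `RuntimeException`.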




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695120#comment-16695120
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

sohami commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235490503
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
 ##
 @@ -335,6 +351,50 @@ private void javaPropertiesToSystemOptions() {
 }
   }
 
+
+  // Polls for graceful file to check if graceful shutdown is triggered from the script.
+  private static class PollShutdownThread extends Thread {
+
+    private final Drillbit drillbit;
+    private final StackTrace stackTrace;
+
+    public PollShutdownThread(final Drillbit drillbit, final StackTrace stackTrace) {
+      this.drillbit = drillbit;
+      this.stackTrace = stackTrace;
+    }
+
+    @Override
+    public void run () {
+      try {
+        pollShutdown(drillbit);
+      } catch (Exception e) {
+        throw new RuntimeException("Caught exception while polling for shutdown\n" + stackTrace, e);
+      }
+    }
+
+    private void pollShutdown(Drillbit drillbit) throws IOException, InterruptedException {
 
 Review comment:
   Please rename the variables to something descriptive rather than using the
type as the name. For example:
   `path-->drillPidDirPath`
   `file-->gracefulFileName`




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695119#comment-16695119
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

sohami commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235488412
 
 

 ##
 File path: distribution/src/resources/drillbit.sh
 ##
 @@ -192,6 +201,10 @@ start_bit ( )
   echo $procId > $pidFile # Yeah, $pidFile is a file, $procId is the pid...
   echo $! > $pidFile
   sleep 1
+  #remove any previous uncleaned graceful file
+  if [ -f "$GRACEFUL_FILE" ]; then
+rm $GRACEFUL_FILE
 
 Review comment:
   We should check for an error while removing the graceful file; if the
removal fails, echo an error message and exit.




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695114#comment-16695114
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

sohami commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235491237
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
 ##
 @@ -335,6 +351,50 @@ private void javaPropertiesToSystemOptions() {
 }
   }
 
+
+  // Polls for graceful file to check if graceful shutdown is triggered from the script.
+  private static class PollShutdownThread extends Thread {
+
+    private final Drillbit drillbit;
+    private final StackTrace stackTrace;
+
+    public PollShutdownThread(final Drillbit drillbit, final StackTrace stackTrace) {
+      this.drillbit = drillbit;
+      this.stackTrace = stackTrace;
+    }
+
+    @Override
+    public void run () {
+      try {
+        pollShutdown(drillbit);
+      } catch (Exception e) {
+        throw new RuntimeException("Caught exception while polling for shutdown\n" + stackTrace, e);
+      }
+    }
+
+    private void pollShutdown(Drillbit drillbit) throws IOException, InterruptedException {
+      final Path path = FileSystems.getDefault().getPath(System.getenv("DRILL_PID_DIR"));
+      final String file = System.getenv("GRACEFUL_SIGFILE");
+      boolean triggered_shutdown = false;
+      try (final WatchService watchService = FileSystems.getDefault().newWatchService()) {
+        path.register(watchService, StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_CREATE);
+        while (!triggered_shutdown) {
+          final WatchKey wk = watchService.take();
+          for (WatchEvent event : wk.pollEvents()) {
+            final Path changed = (Path) event.context();
+            if (changed.endsWith(file)) {
+              drillbit.interruptPollShutdown = false;
+              triggered_shutdown = true;
+              drillbit.close();
+              wk.cancel();
 
 Review comment:
   Put `wk.cancel()` in a `finally` block to handle the case where
`drillbit.close()` throws an exception.
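
One way this could look, as a sketch rather than the actual change: the class `CancelInFinally` and the helpers `closeThenCancel` and `demo` are hypothetical, and a generic `AutoCloseable` stands in for the `Drillbit`. The key is cancelled in `finally`, so it is released even when `close()` fails, and the failure still propagates to the caller.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public class CancelInFinally {

  // Closes the resource and cancels the watch key even if close() throws.
  static void closeThenCancel(AutoCloseable resource, WatchKey key) throws Exception {
    try {
      resource.close();
    } finally {
      key.cancel();
    }
  }

  // Registers a temporary directory, forces close() to fail, and reports
  // whether the key was cancelled anyway.
  static boolean demo() {
    try {
      final Path dir = Files.createTempDirectory("graceful-demo");
      try (WatchService watchService = FileSystems.getDefault().newWatchService()) {
        final WatchKey key = dir.register(watchService, StandardWatchEventKinds.ENTRY_CREATE);
        try {
          closeThenCancel(() -> { throw new IllegalStateException("close failed"); }, key);
        } catch (Exception expected) {
          // The failure from close() still reaches the caller.
        }
        return !key.isValid();
      } finally {
        Files.delete(dir);
      }
    } catch (IOException e) {
      return false;
    }
  }
}
```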




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693943#comment-16693943
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235218184
 
 

 ##
 File path: distribution/src/resources/drill-config.sh
 ##
 @@ -334,6 +334,7 @@ fi
 # provided in drill-env.sh.
 
 export DRILL_PID_DIR=${DRILL_PID_DIR:-$DRILL_HOME}
+export GRACEFUL_FILE_SUFFIX=GRACEFUL_SIGFILE
 
 Review comment:
   Changed it.




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693817#comment-16693817
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235181475
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
 ##
 @@ -285,6 +286,10 @@ public synchronized void close() {
   if (storeProvider != profileStoreProvider) {
 AutoCloseables.close(profileStoreProvider);
   }
+  File f = new File(System.getenv("DRILL_PID_DIR") + "/.graceful");
 
 Review comment:
   Added the file name (GRACEFUL_SIGFILE) in drill-config.sh.




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693830#comment-16693830
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235184369
 
 

 ##
 File path: distribution/src/resources/drill-config.sh
 ##
 @@ -334,6 +334,7 @@ fi
 # provided in drill-env.sh.
 
 export DRILL_PID_DIR=${DRILL_PID_DIR:-$DRILL_HOME}
+export GRACEFUL_FILE_SUFFIX=GRACEFUL_SIGFILE
 
 Review comment:
   I thought you asked me to change the name of the file graceful to 
GRACEFUL_SIGFILE. Let me change it.




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693827#comment-16693827
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

kkhatua commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235183590
 
 

 ##
 File path: distribution/src/resources/drill-config.sh
 ##
 @@ -334,6 +334,7 @@ fi
 # provided in drill-env.sh.
 
 export DRILL_PID_DIR=${DRILL_PID_DIR:-$DRILL_HOME}
+export GRACEFUL_FILE_SUFFIX=GRACEFUL_SIGFILE
 
 Review comment:
   Where is the value for `GRACEFUL_SIGFILE` defined? Provide a default if not 
set
   ```
   export GRACEFUL_FILE_SUFFIX=${GRACEFUL_SIGFILE:-"graceful"}
   ```
   




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693825#comment-16693825
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

kkhatua commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235183590
 
 

 ##
 File path: distribution/src/resources/drill-config.sh
 ##
 @@ -334,6 +334,7 @@ fi
 # provided in drill-env.sh.
 
 export DRILL_PID_DIR=${DRILL_PID_DIR:-$DRILL_HOME}
+export GRACEFUL_FILE_SUFFIX=GRACEFUL_SIGFILE
 
 Review comment:
   Where is the value for `GRACEFUL_SIGFILE` defined? Provide a default if not 
set
   ```
   export FOO=${FOO:-"value"}
   ```
   




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693821#comment-16693821
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235181928
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java
 ##
 @@ -442,6 +467,14 @@ public void run() {
   // StatusThread is started
   final Controller controller = dContext.getController();
   final DrillbitEndpoint localBitEndPoint = dContext.getEndpoint();
+  try {
+pollShutdown(drillbit);
+  } catch (IOException e) {
+e.printStackTrace();
 
 Review comment:
   Changed it.




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693820#comment-16693820
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235181861
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java
 ##
 @@ -433,6 +442,22 @@ public void run() {
 public StatusThread() {
   // assume this thread is created by a non-daemon thread
   setName("WorkManager.StatusThread");
+  }
+  private void pollShutdown(Drillbit drillbit) throws IOException, InterruptedException {
+  final Path path = FileSystems.getDefault().getPath(System.getenv("DRILL_PID_DIR"));
+  try (final WatchService watchService = FileSystems.getDefault().newWatchService()) {
+path.register(watchService, StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_CREATE);
+while (true) {
+  final WatchKey wk = watchService.take();
 
 Review comment:
   @sohami I have made the changes based on your comments. Please let me know 
if you have any comments.




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693819#comment-16693819
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235181676
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
 ##
 @@ -285,6 +286,10 @@ public synchronized void close() {
   if (storeProvider != profileStoreProvider) {
 AutoCloseables.close(profileStoreProvider);
   }
+  File f = new File(System.getenv("DRILL_PID_DIR") + "/.graceful");
+  if (f.exists()) {
+f.delete();
 
 Review comment:
   Deletion of the file is done in drillbit.sh start




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693816#comment-16693816
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235181291
 
 

 ##
 File path: distribution/src/resources/drillbit.sh
 ##
 @@ -94,11 +94,24 @@ waitForProcessEnd()
   commandName=$2
   kill_drillbit=$3
   processedAt=`date +%s`
+  triggered_shutdown=false
+  FILE=$DRILL_PID_DIR/.graceful
   origcnt=${DRILL_STOP_TIMEOUT:-120}
   while kill -0 $pidKilled > /dev/null 2>&1;
do
  echo -n "."
  sleep 1;
+ #Incase of graceful shutdown, create .graceful file and wait till it is deleted to trigger kill command.
+ if [ "$kill_drillbit" = false ]; then
+   if [ "$triggered_shutdown" = false ]; then
+ touch $DRILL_PID_DIR/.graceful
+ triggered_shutdown=true
+   else
+ if [ ! -f "$FILE" ]; then
+   kill $pidKilled > /dev/null 2>&1;
 
 Review comment:
   Removed this code; `drillbit.sh start` will delete the signal file.




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693814#comment-16693814
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235180886
 
 

 ##
 File path: distribution/src/resources/drillbit.sh
 ##
 @@ -94,11 +94,24 @@ waitForProcessEnd()
   commandName=$2
   kill_drillbit=$3
   processedAt=`date +%s`
+  triggered_shutdown=false
+  FILE=$DRILL_PID_DIR/.graceful
 
 Review comment:
   Changed the name. Should I make it hidden or not?




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693702#comment-16693702
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

kkhatua commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235140653
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
 ##
 @@ -285,6 +286,10 @@ public synchronized void close() {
   if (storeProvider != profileStoreProvider) {
 AutoCloseables.close(profileStoreProvider);
   }
+  File f = new File(System.getenv("DRILL_PID_DIR") + "/.graceful");
 
 Review comment:
   You could provide this as an env var (full path: 
`DRILL_PID_DIR+"/"+GRACEFUL_SIGFILE`) within `drill-config.sh` (ref: 
https://github.com/apache/drill/blob/master/distribution/src/resources/drill-config.sh#L336)
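
   A minimal sketch of how the Java side might resolve the signal file from 
the environment rather than hard-coding `/.graceful`. It assumes that 
`GRACEFUL_SIGFILE` holds only the file name and that `.graceful` is the 
fallback; the class and method names are hypothetical and not part of the PR. 
Returning an empty `Optional` when `DRILL_PID_DIR` is unset avoids building a 
path such as `null/.graceful`.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper; names are illustrative and not taken from the Drill code base. */
final class GracefulFileLocation {
  // Assumed fallback name, matching the hidden file used in the diff under review.
  static final String DEFAULT_SIGFILE = ".graceful";

  /** Resolves the signal file from an environment map; empty when DRILL_PID_DIR is unset. */
  public static Optional<Path> resolve(Map<String, String> env) {
    String pidDir = env.get("DRILL_PID_DIR");
    if (pidDir == null || pidDir.isEmpty()) {
      return Optional.empty();
    }
    // Assumption: GRACEFUL_SIGFILE carries a file name, not a full path.
    String name = env.getOrDefault("GRACEFUL_SIGFILE", DEFAULT_SIGFILE);
    return Optional.of(Paths.get(pidDir, name));
  }
}
```

   In a real process the call would be `GracefulFileLocation.resolve(System.getenv())`; 
taking the map as a parameter only keeps the sketch easy to test.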
   




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693701#comment-16693701
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

kkhatua commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235138498
 
 

 ##
 File path: distribution/src/resources/drillbit.sh
 ##
 @@ -94,11 +94,24 @@ waitForProcessEnd()
   commandName=$2
   kill_drillbit=$3
   processedAt=`date +%s`
+  triggered_shutdown=false
+  FILE=$DRILL_PID_DIR/.graceful
 
 Review comment:
   I think making it hidden is practical, since it only serves the purpose of 
signalling and is not expected to stay around or be visible. 
   Also, @dvjyothsna, it might be worth changing the variable name to 
something like `GRACEFUL_SIGFILE`.




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693700#comment-16693700
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

kkhatua commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235139192
 
 

 ##
 File path: distribution/src/resources/drillbit.sh
 ##
 @@ -94,11 +94,24 @@ waitForProcessEnd()
   commandName=$2
   kill_drillbit=$3
   processedAt=`date +%s`
+  triggered_shutdown=false
+  FILE=$DRILL_PID_DIR/.graceful
   origcnt=${DRILL_STOP_TIMEOUT:-120}
   while kill -0 $pidKilled > /dev/null 2>&1;
do
  echo -n "."
  sleep 1;
+ #Incase of graceful shutdown, create .graceful file and wait till it is deleted to trigger kill command.
+ if [ "$kill_drillbit" = false ]; then
+   if [ "$triggered_shutdown" = false ]; then
+ touch $DRILL_PID_DIR/.graceful
+ triggered_shutdown=true
+   else
+ if [ ! -f "$FILE" ]; then
+   kill $pidKilled > /dev/null 2>&1;
 
 Review comment:
   @sohami I think she is just playing it safe by cleaning up. 
   A suggestion: for all the `kill` commands, pass the signal explicitly as 
one of the arguments.




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691369#comment-16691369
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

arina-ielchiieva commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r234524071
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java
 ##
 @@ -442,6 +467,14 @@ public void run() {
   // StatusThread is started
   final Controller controller = dContext.getController();
   final DrillbitEndpoint localBitEndPoint = dContext.getEndpoint();
+  try {
+pollShutdown(drillbit);
+  } catch (IOException e) {
+e.printStackTrace();
 
 Review comment:
   I think we should not use `e.printStackTrace();` but rather do proper error 
handling or logging.
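
   One way this could look, as a sketch only: the exception goes to a logger 
with a message, and the interrupt status is restored instead of being 
swallowed. `java.util.logging` is used here purely to keep the example 
self-contained (the real class would use whatever logger it already has), and 
`ShutdownPoller` is a hypothetical stand-in for the PR's 
`pollShutdown(drillbit)` call.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

final class ShutdownPollLogging {
  private static final Logger logger = Logger.getLogger(ShutdownPollLogging.class.getName());

  /** Hypothetical stand-in for the PR's pollShutdown(drillbit). */
  interface ShutdownPoller {
    void poll() throws IOException, InterruptedException;
  }

  /** Runs the poller; returns true on normal completion, false if it failed or was interrupted. */
  public static boolean runPoller(ShutdownPoller poller) {
    try {
      poller.poll();
      return true;
    } catch (IOException e) {
      // Log with context and the full exception instead of e.printStackTrace().
      logger.log(Level.WARNING, "Polling for graceful shutdown failed", e);
      return false;
    } catch (InterruptedException e) {
      // Restore the interrupt status so callers further up can still observe it.
      Thread.currentThread().interrupt();
      logger.log(Level.FINE, "Interrupted while polling for graceful shutdown", e);
      return false;
    }
  }
}
```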




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689908#comment-16689908
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r234318353
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
 ##
 @@ -285,6 +286,10 @@ public synchronized void close() {
   if (storeProvider != profileStoreProvider) {
 AutoCloseables.close(profileStoreProvider);
   }
+  File f = new File(System.getenv("DRILL_PID_DIR") + "/.graceful");
 
 Review comment:
   Can we get that config param in the script?




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689793#comment-16689793
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

sohami commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r234070327
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
 ##
 @@ -285,6 +286,10 @@ public synchronized void close() {
   if (storeProvider != profileStoreProvider) {
 AutoCloseables.close(profileStoreProvider);
   }
+  File f = new File(System.getenv("DRILL_PID_DIR") + "/.graceful");
 
 Review comment:
   Provide this as a configuration parameter, with `graceful` as the default value.




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689795#comment-16689795
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

sohami commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r233586484
 
 

 ##
 File path: distribution/src/resources/drillbit.sh
 ##
 @@ -94,11 +94,24 @@ waitForProcessEnd()
   commandName=$2
   kill_drillbit=$3
   processedAt=`date +%s`
+  triggered_shutdown=false
+  FILE=$DRILL_PID_DIR/.graceful
 
 Review comment:
   Let's not make it a hidden file.




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689794#comment-16689794
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

sohami commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r234070509
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
 ##
 @@ -285,6 +286,10 @@ public synchronized void close() {
   if (storeProvider != profileStoreProvider) {
 AutoCloseables.close(profileStoreProvider);
   }
+  File f = new File(System.getenv("DRILL_PID_DIR") + "/.graceful");
+  if (f.exists()) {
+f.delete();
 
 Review comment:
   Check the return value of `f.delete()` and log a warning if the deletion fails.
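
   A small sketch of that check, assuming a warning is all that is wanted when 
the deletion fails. The class and method names are hypothetical, and 
`java.util.logging` stands in for the project's own logger so that the 
example compiles on its own.

```java
import java.io.File;
import java.util.logging.Logger;

final class SignalFileCleanup {
  private static final Logger logger = Logger.getLogger(SignalFileCleanup.class.getName());

  /** Deletes the signal file if present; returns true when no file needed deleting or the delete succeeded. */
  public static boolean deleteSignalFile(File f) {
    if (!f.exists()) {
      return true; // nothing to clean up
    }
    boolean deleted = f.delete();
    if (!deleted) {
      // File.delete() reports failure only through its return value, so surface it here.
      logger.warning("Unable to delete graceful-shutdown file " + f.getAbsolutePath());
    }
    return deleted;
  }
}
```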




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689792#comment-16689792
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

sohami commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r234069987
 
 

 ##
 File path: distribution/src/resources/drillbit.sh
 ##
 @@ -94,11 +94,24 @@ waitForProcessEnd()
   commandName=$2
   kill_drillbit=$3
   processedAt=`date +%s`
+  triggered_shutdown=false
+  FILE=$DRILL_PID_DIR/.graceful
   origcnt=${DRILL_STOP_TIMEOUT:-120}
   while kill -0 $pidKilled > /dev/null 2>&1;
do
  echo -n "."
  sleep 1;
+ #Incase of graceful shutdown, create .graceful file and wait till it is deleted to trigger kill command.
+ if [ "$kill_drillbit" = false ]; then
+   if [ "$triggered_shutdown" = false ]; then
+ touch $DRILL_PID_DIR/.graceful
+ triggered_shutdown=true
+   else
+ if [ ! -f "$FILE" ]; then
+   kill $pidKilled > /dev/null 2>&1;
 
 Review comment:
   This shouldn't be required, since the Drillbit process should exit after 
`close`. If it does not, we should find out why.




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689791#comment-16689791
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

sohami commented on a change in pull request #1536: DRILL-6039: Fixed 
drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r234301751
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java
 ##
 @@ -433,6 +442,22 @@ public void run() {
 public StatusThread() {
   // assume this thread is created by a non-daemon thread
   setName("WorkManager.StatusThread");
+  }
+  private void pollShutdown(Drillbit drillbit) throws IOException, InterruptedException {
+  final Path path = FileSystems.getDefault().getPath(System.getenv("DRILL_PID_DIR"));
+  try (final WatchService watchService = FileSystems.getDefault().newWatchService()) {
+path.register(watchService, StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_CREATE);
+while (true) {
+  final WatchKey wk = watchService.take();
 
 Review comment:
   This design is not correct and has issues.
   
   - You are using the `WorkManager` `StatusThread` to register a 
`WatchService` and then making a blocking `take()` call, which leaves this 
thread stuck until the `WatchService` sees an event. Hence `StatusThread` 
will not be able to perform its actual job of sending the status of 
`runningFragments` to its `Foreman`.
   - The `pollShutdown` method, which runs in `StatusThread`, calls 
`Drillbit.close()`. `Drillbit.close()` in turn calls close on `WorkManager`, 
which calls `interrupt` on `StatusThread`. There is a cycle here. What does 
it mean for a thread to call interrupt on itself? Usually interrupt is used 
by another thread to wake up a blocking thread. I think the interrupt call 
will set the interrupt flag of the status thread; then, when it comes out of 
`pollShutdown` and calls sleep, it will hit an interrupted exception. It 
might work, but I don't like the idea of `WorkManager`, which is part of 
Drillbit, having a thread that calls `Drillbit.close()`.
   
   Please create a separate thread within the Drillbit class itself to do 
this. In the `Drillbit::close()` method you still need to `interrupt` this 
new thread for cases when `close` is called from some other path, but you can 
maintain a flag that indicates whether `Thread.interrupt()` should be called.
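
   The suggestion above might be sketched roughly as follows. This is not the 
PR's code: `GracefulWatchSketch` is a hypothetical stand-in for `Drillbit`, 
the latch exists only so the behaviour can be observed, and the real 
`close()` would do far more. The blocking `take()` runs on a dedicated 
thread, and the `closeRequestedByWatcher` flag lets `close()` skip 
interrupting the watcher when the watcher itself triggered the close.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for Drillbit; only the shutdown-watching part is sketched. */
final class GracefulWatchSketch implements AutoCloseable {
  private final Path pidDir;
  private final String sigFileName;
  private final CountDownLatch closed = new CountDownLatch(1);
  private WatchService watchService;
  private Thread watcher;
  // True when close() was triggered by the watcher thread, so close() skips the self-interrupt.
  private volatile boolean closeRequestedByWatcher;

  GracefulWatchSketch(Path pidDir, String sigFileName) {
    this.pidDir = pidDir;
    this.sigFileName = sigFileName;
  }

  /** Registers the watch on the caller's thread, then starts the dedicated watcher thread. */
  public void start() throws IOException {
    watchService = pidDir.getFileSystem().newWatchService();
    pidDir.register(watchService,
        StandardWatchEventKinds.ENTRY_CREATE, StandardWatchEventKinds.ENTRY_MODIFY);
    watcher = new Thread(this::watch, "graceful-shutdown-watcher");
    watcher.setDaemon(true);
    watcher.start();
  }

  private void watch() {
    try (WatchService ws = watchService) {
      while (true) {
        WatchKey key = ws.take(); // blocks only this thread, not a status thread
        for (WatchEvent<?> event : key.pollEvents()) {
          WatchEvent.Kind<?> kind = event.kind();
          if (kind == StandardWatchEventKinds.OVERFLOW) {
            continue;
          }
          if (sigFileName.equals(String.valueOf(event.context()))) {
            closeRequestedByWatcher = true;
            close();
            return;
          }
        }
        if (!key.reset()) {
          return; // the watched directory is no longer accessible
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // close() was called from some other path
    } catch (IOException e) {
      // Closing the WatchService failed; there is nothing further to release in this sketch.
    }
  }

  @Override
  public synchronized void close() {
    if (!closeRequestedByWatcher && watcher != null) {
      watcher.interrupt(); // wake the watcher out of take()
    }
    closed.countDown();
  }

  public boolean awaitClosed(long timeout, TimeUnit unit) throws InterruptedException {
    return closed.await(timeout, unit);
  }
}
```

   Registering the watch before the thread starts is a choice made for this 
sketch so that a file created right after `start()` is not missed; whether 
the real fix needs that ordering is an open question.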




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-11-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684388#comment-16684388
 ] 

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna opened a new pull request #1536: DRILL-6039: Fixed drillbit.sh 
script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536
 
 
   @sohami please review




[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-06-15 Thread Krystal (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514293#comment-16514293
 ] 

Krystal commented on DRILL-6039:


Running "drillbit.sh graceful_stop" from the command line against parquet 
files still fails: it does not wait for fragments to finish.  Interestingly, 
the problem does not occur when shutting down the drillbit from the WebUI.  
The drillbit.log does not show any memory leaks.  Here is the stack trace:
{code:java}
Error: SYSTEM ERROR: IOException: Filesystem closed

Fragment 1:10

(org.apache.drill.common.exceptions.DrillRuntimeException) Error in parquet 
record reader.
Message: Failure in setting up reader
Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root {
  optional int64 l_orderkey;
  optional int64 l_partkey;
  optional int64 l_suppkey;
  optional int32 l_linenumber;
  optional double l_quantity;
  optional double l_extendedprice;
  optional double l_discount;
  optional double l_tax;
  optional binary l_returnflag (UTF8);
  optional binary l_linestatus (UTF8);
  optional int32 l_shipdate (DATE);
  optional int32 l_commitdate (DATE);
  optional int32 l_receiptdate (DATE);
  optional binary l_shipinstruct (UTF8);
  optional binary l_shipmode (UTF8);
  optional binary l_comment (UTF8);
}
, metadata: {drill.version=1.7.0-SNAPSHOT}}, blocks: [BlockMetaData{9785551, 
1338587809 [ColumnMetaData{SNAPPY [l_orderkey] INT64  [PLAIN, BIT_PACKED, RLE], 
4}, ColumnMetaData{SNAPPY [l_partkey] INT64  [PLAIN, BIT_PACKED, RLE], 
15273019}, ColumnMetaData{SNAPPY [l_suppkey] INT64  [PLAIN, BIT_PACKED, RLE], 
73277460}, ColumnMetaData{SNAPPY [l_linenumber] INT32  [PLAIN, BIT_PACKED, 
RLE], 124321400}, ColumnMetaData{SNAPPY [l_quantity] DOUBLE  [PLAIN, 
BIT_PACKED, RLE], 132087986}, ColumnMetaData{SNAPPY [l_extendedprice] DOUBLE  
[PLAIN, BIT_PACKED, RLE], 151838465}, ColumnMetaData{SNAPPY [l_discount] DOUBLE 
 [PLAIN, BIT_PACKED, RLE], 208270450}, ColumnMetaData{SNAPPY [l_tax] DOUBLE  
[PLAIN, BIT_PACKED, RLE], 227351535}, ColumnMetaData{SNAPPY [l_returnflag] 
BINARY  [PLAIN, BIT_PACKED, RLE], 245574230}, ColumnMetaData{SNAPPY 
[l_linestatus] BINARY  [PLAIN, BIT_PACKED, RLE], 254814472}, 
ColumnMetaData{SNAPPY [l_shipdate] INT32  [PLAIN, BIT_PACKED, RLE], 260500185}, 
ColumnMetaData{SNAPPY [l_commitdate] INT32  [PLAIN, BIT_PACKED, RLE], 
290097700}, ColumnMetaData{SNAPPY [l_receiptdate] INT32  [PLAIN, BIT_PACKED, 
RLE], 319358270}, ColumnMetaData{SNAPPY [l_shipinstruct] BINARY  [PLAIN, 
BIT_PACKED, RLE], 348982057}, ColumnMetaData{SNAPPY [l_shipmode] BINARY  
[PLAIN, BIT_PACKED, RLE], 370125048}, ColumnMetaData{SNAPPY [l_comment] BINARY  
[PLAIN, BIT_PACKED, RLE], 392116052}]}]}
    
org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleException():316
    
org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.setup():300
    org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas():335
    org.apache.drill.exec.physical.impl.ScanBatch.internalNext():222
    org.apache.drill.exec.physical.impl.ScanBatch.next():274
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
    org.apache.drill.exec.record.AbstractRecordBatch.next():164
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
    org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():80
    org.apache.drill.exec.record.AbstractRecordBatch.next():164
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():134
    org.apache.drill.exec.record.AbstractRecordBatch.next():164
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.test.generated.StreamingAggregatorGen42.doWork():187
    org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():194
    org.apache.drill.exec.record.AbstractRecordBatch.next():164
    org.apache.drill.exec.physical.impl.BaseRootExec.next():105
    org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
    org.apache.drill.exec.physical.impl.BaseRootExec.next():95
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():233
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1633
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
    

[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-06-12 Thread Pritesh Maker (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510437#comment-16510437
 ] 

Pritesh Maker commented on DRILL-6039:
--

[~knguyen], can you verify this now that DRILL-6252 is resolved?

> drillbit.sh graceful_stop does not wait for fragments to complete before 
> stopping the drillbit
> --
>
> Key: DRILL-6039
> URL: https://issues.apache.org/jira/browse/DRILL-6039
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.3.0
> Reporter: Krystal
> Assignee: Venkata Jyothsna Donapati
> Priority: Major
> Fix For: 1.14.0
>
>
> git.commit.id.abbrev=eb0c403
> I have a 3-node cluster with a drillbit running on each node. I kicked off a 
> long-running query. In the middle of the query, I ran "./drillbit.sh 
> graceful_stop" on one of the non-foreman nodes. The node stopped within a 
> few seconds and the query failed with the error:
> Error: SYSTEM ERROR: IOException: Filesystem closed
> Fragment 4:15





[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-03-16 Thread Pritesh Maker (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402447#comment-16402447
 ] 

Pritesh Maker commented on DRILL-6039:
--

This should be tested after DRILL-6252 is addressed.



[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-03-15 Thread Venkata Jyothsna Donapati (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401054#comment-16401054
 ] 

Venkata Jyothsna Donapati commented on DRILL-6039:
--

This happened only when querying Parquet files. It looks like the underlying 
issue is https://issues.apache.org/jira/browse/DRILL-6252, and graceful 
shutdown has nothing to do with it.
