[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708357#comment-16708357 ]

ASF GitHub Bot commented on DRILL-6039:
---

vdiravka commented on issue #1560: Fixed regression caused by DRILL-6039
URL: https://github.com/apache/drill/pull/1560#issuecomment-444010551

@dvjyothsna Please follow the usual naming convention for PRs and commits. It allows the PR link to be added to the Jira ticket automatically.
> DRILL-6877: NPE when starting Drill on Windows

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
> --
>
> Key: DRILL-6039
> URL: https://issues.apache.org/jira/browse/DRILL-6039
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.3.0
> Reporter: Krystal
> Assignee: Venkata Jyothsna Donapati
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.15.0
>
> git.commit.id.abbrev=eb0c403
> I have a 3-node cluster with drillbits running on each node. I kicked off a
> long-running query. In the middle of the query, I did a "./drillbit.sh
> graceful_stop" on one of the non-foreman nodes. The node was stopped within a
> few seconds and the query failed with error:
> Error: SYSTEM ERROR: IOException: Filesystem closed
> Fragment 4:15

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708033#comment-16708033 ]

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on issue #1560: Fixed regression caused by DRILL-6039
URL: https://github.com/apache/drill/pull/1560#issuecomment-443929243

@sohami Please review.
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708025#comment-16708025 ]

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna opened a new pull request #1560: Fixed regression caused by DRILL-6039
URL: https://github.com/apache/drill/pull/1560
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703751#comment-16703751 ]

ASF GitHub Bot commented on DRILL-6039:
---

asfgit closed pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/distribution/src/resources/drill-config.sh b/distribution/src/resources/drill-config.sh
index d23788b006a..a4686c50354 100644
--- a/distribution/src/resources/drill-config.sh
+++ b/distribution/src/resources/drill-config.sh
@@ -334,6 +334,7 @@ fi
 # provided in drill-env.sh.
 export DRILL_PID_DIR=${DRILL_PID_DIR:-$DRILL_HOME}
+export GRACEFUL_SIGFILE=${GRACEFUL_SIGFILE:-"graceful"}
 # Prepare log file prefix and the main Drillbit log file.
diff --git a/distribution/src/resources/drillbit.sh b/distribution/src/resources/drillbit.sh
index 88d56c8a14f..5ad87b15cdb 100755
--- a/distribution/src/resources/drillbit.sh
+++ b/distribution/src/resources/drillbit.sh
@@ -87,6 +87,7 @@ export args
 # Set default scheduling priority
 DRILL_NICENESS=${DRILL_NICENESS:-0}
+GRACEFUL_FILE=$DRILL_PID_DIR/$GRACEFUL_SIGFILE
 waitForProcessEnd()
 {
@@ -94,11 +95,19 @@ waitForProcessEnd()
   commandName=$2
   kill_drillbit=$3
   processedAt=`date +%s`
+  triggered_shutdown=false
   origcnt=${DRILL_STOP_TIMEOUT:-120}
   while kill -0 $pidKilled > /dev/null 2>&1; do
     echo -n "."
     sleep 1;
+    #Incase of graceful shutdown, create graceful file and wait till the process ends.
+    if [ "$kill_drillbit" = false ]; then
+      if [ "$triggered_shutdown" = false ]; then
+        touch $GRACEFUL_FILE
+        triggered_shutdown=true
+      fi
+    fi
     if [ "$kill_drillbit" = true ] ; then
       # if process persists more than $DRILL_STOP_TIMEOUT (default 120 sec) no mercy
       if [ $(( `date +%s` - $processedAt )) -gt $origcnt ]; then
@@ -125,6 +134,15 @@ check_before_start()
       exit 1
     fi
   fi
+  #remove any previous uncleaned graceful file
+  if [ -f "$GRACEFUL_FILE" ]; then
+    rm $GRACEFUL_FILE
+    rm_status=$?
+    if [ $rm_status -ne 0 ];then
+      echo "Error: Failed to remove $GRACEFUL_FILE!"
+      exit $rm_status
+    fi
+  fi
 }
 check_after_start(){
@@ -204,7 +222,9 @@ stop_bit ( )
     if kill -0 $pidToKill > /dev/null 2>&1; then
       echo "Stopping $command"
       echo "`date` Terminating $command pid $pidToKill" >> "$DRILLBIT_LOG_PATH"
-      kill $pidToKill > /dev/null 2>&1
+      if [ $kill_drillbit = true ]; then
+        kill $pidToKill > /dev/null 2>&1
+      fi
       waitForProcessEnd $pidToKill $command $kill_drillbit
       retval=0
     else
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java b/exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
index dd1c5f19faf..a0c63ab6dcc 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
@@ -17,6 +17,13 @@
  */
 package org.apache.drill.exec.server;
+import java.io.IOException;
+import java.nio.file.FileSystems;
+import java.nio.file.Path;
+import java.nio.file.StandardWatchEventKinds;
+import java.nio.file.WatchEvent;
+import java.nio.file.WatchKey;
+import java.nio.file.WatchService;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicInteger;
@@ -90,6 +97,8 @@
   private DrillbitStateManager stateManager;
   private boolean quiescentMode;
   private boolean forcefulShutdown = false;
+  GracefulShutdownThread gracefulShutdownThread;
+  private boolean interruptPollShutdown = true;
   public void setQuiescentMode(boolean quiescentMode) {
     this.quiescentMode = quiescentMode;
@@ -212,6 +221,8 @@ public void run() throws Exception {
     drillbitContext.startRM();
     Runtime.getRuntime().addShutdownHook(new ShutdownThread(this, new StackTrace()));
+    gracefulShutdownThread = new GracefulShutdownThread(this, new StackTrace());
+    gracefulShutdownThread.start();
     logger.info("Startup completed ({} ms).", w.elapsed(TimeUnit.MILLISECONDS));
   }
@@ -291,6 +302,11 @@ public synchronized void close() {
     logger.info("Shutdown completed ({} ms).", w.elapsed(TimeUnit.MILLISECONDS) );
     stateManager.setState(DrillbitState.SHUTDOWN);
+    // Interrupt GracefulShutdownThread since Drillbit close is not called from it.
+    if (interruptPollShutdown) {
+      gracefulShutdownThread.interrupt();
+    }
+
   }
   private void javaPropertiesToSystemOptions() {
@@ -335,6 +351,55 @@ private void javaPropertiesToSystemOptions() {
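
The diff in this message is cut off before the body of GracefulShutdownThread. As a rough illustration of the mechanism the patch describes (the stop script touches a signal file in DRILL_PID_DIR, and a thread inside the drillbit watches that directory), here is a minimal standalone sketch. The class name SentinelFileWatcher, the method awaitSentinel, the timeout parameter, and the up-front Files.exists check are assumptions made for this example; they are not taken from the Drill source.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Drill: waits for a signal file to appear in a directory.
class SentinelFileWatcher {

  /**
   * Blocks until a file named sentinelName is created or modified in dir,
   * or until timeoutMillis elapses. Returns true if the sentinel was seen.
   */
  static boolean awaitSentinel(Path dir, String sentinelName, long timeoutMillis)
      throws IOException, InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
      dir.register(watcher,
          StandardWatchEventKinds.ENTRY_CREATE,
          StandardWatchEventKinds.ENTRY_MODIFY);
      // Assumption for this sketch: a file that already exists counts as a request.
      // This also covers a file created just before the directory was registered.
      if (Files.exists(dir.resolve(sentinelName))) {
        return true;
      }
      while (true) {
        long remaining = deadline - System.nanoTime();
        if (remaining <= 0) {
          return false;
        }
        WatchKey key = watcher.poll(remaining, TimeUnit.NANOSECONDS);
        if (key == null) {
          return false; // timed out without any event
        }
        for (WatchEvent<?> event : key.pollEvents()) {
          Object context = event.context(); // may be null, e.g. for OVERFLOW
          if (context instanceof Path && ((Path) context).endsWith(sentinelName)) {
            return true;
          }
        }
        if (!key.reset()) {
          return false; // the directory is no longer accessible
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Path dir = Files.createTempDirectory("graceful-demo");
    // Stand-in for `touch $GRACEFUL_FILE` in the stop script.
    Files.createFile(dir.resolve("graceful"));
    System.out.println("shutdown requested: " + awaitSentinel(dir, "graceful", 1000));
  }
}
```

Because this sketch treats an existing file as a request, a stale signal file left behind by an earlier run would trigger it immediately. That is presumably why the script change above removes any leftover file in check_before_start.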
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699599#comment-16699599 ]

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on issue #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#issuecomment-441804936

Thank you Sorabh! Squashed all the commits.
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695286#comment-16695286 ]

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235547490

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ##
@@ -335,6 +351,50 @@ private void javaPropertiesToSystemOptions() {
     }
   }
+
+  // Polls for graceful file to check if graceful shutdown is triggered from the script.
+  private static class PollShutdownThread extends Thread {
+
+    private final Drillbit drillbit;
+    private final StackTrace stackTrace;
+
+    public PollShutdownThread(final Drillbit drillbit, final StackTrace stackTrace) {
+      this.drillbit = drillbit;
+      this.stackTrace = stackTrace;
+    }
+
+    @Override
+    public void run () {
+      try {
+        pollShutdown(drillbit);
+      } catch (Exception e) {
+        throw new RuntimeException("Caught exception while polling for shutdown\n" + stackTrace, e);
+      }

Review comment:
Changed it
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695288#comment-16695288 ]

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235547582

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ##
@@ -335,6 +351,50 @@ private void javaPropertiesToSystemOptions() {
     }
   }
+
+  // Polls for graceful file to check if graceful shutdown is triggered from the script.
+  private static class PollShutdownThread extends Thread {
+
+    private final Drillbit drillbit;
+    private final StackTrace stackTrace;
+
+    public PollShutdownThread(final Drillbit drillbit, final StackTrace stackTrace) {
+      this.drillbit = drillbit;
+      this.stackTrace = stackTrace;
+    }
+
+    @Override
+    public void run () {
+      try {
+        pollShutdown(drillbit);
+      } catch (Exception e) {
+        throw new RuntimeException("Caught exception while polling for shutdown\n" + stackTrace, e);
+      }
+    }
+
+    private void pollShutdown(Drillbit drillbit) throws IOException, InterruptedException {
+      final Path path = FileSystems.getDefault().getPath(System.getenv("DRILL_PID_DIR"));
+      final String file = System.getenv("GRACEFUL_SIGFILE");
+      boolean triggered_shutdown = false;
+      try (final WatchService watchService = FileSystems.getDefault().newWatchService()) {
+        path.register(watchService, StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_CREATE);
+        while (!triggered_shutdown) {
+          final WatchKey wk = watchService.take();
+          for (WatchEvent event : wk.pollEvents()) {
+            final Path changed = (Path) event.context();

Review comment:
Added a null check for event.context().
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695291#comment-16695291 ]

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235547885

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ##
@@ -335,6 +351,50 @@ private void javaPropertiesToSystemOptions() {
     }
   }
+
+  // Polls for graceful file to check if graceful shutdown is triggered from the script.
+  private static class PollShutdownThread extends Thread {
+
+    private final Drillbit drillbit;
+    private final StackTrace stackTrace;
+
+    public PollShutdownThread(final Drillbit drillbit, final StackTrace stackTrace) {
+      this.drillbit = drillbit;
+      this.stackTrace = stackTrace;
+    }
+
+    @Override
+    public void run () {
+      try {
+        pollShutdown(drillbit);
+      } catch (Exception e) {
+        throw new RuntimeException("Caught exception while polling for shutdown\n" + stackTrace, e);
+      }
+    }
+
+    private void pollShutdown(Drillbit drillbit) throws IOException, InterruptedException {
+      final Path path = FileSystems.getDefault().getPath(System.getenv("DRILL_PID_DIR"));
+      final String file = System.getenv("GRACEFUL_SIGFILE");
+      boolean triggered_shutdown = false;
+      try (final WatchService watchService = FileSystems.getDefault().newWatchService()) {
+        path.register(watchService, StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_CREATE);
+        while (!triggered_shutdown) {
+          final WatchKey wk = watchService.take();
+          for (WatchEvent event : wk.pollEvents()) {
+            final Path changed = (Path) event.context();
+            if (changed.endsWith(file)) {
+              drillbit.interruptPollShutdown = false;
+              triggered_shutdown = true;
+              drillbit.close();
+              wk.cancel();

Review comment:
Done.
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695292#comment-16695292 ]

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235547910

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ##
@@ -335,6 +351,50 @@ private void javaPropertiesToSystemOptions() {
     }
   }
+
+  // Polls for graceful file to check if graceful shutdown is triggered from the script.
+  private static class PollShutdownThread extends Thread {
+
+    private final Drillbit drillbit;
+    private final StackTrace stackTrace;
+
+    public PollShutdownThread(final Drillbit drillbit, final StackTrace stackTrace) {
+      this.drillbit = drillbit;
+      this.stackTrace = stackTrace;
+    }
+
+    @Override
+    public void run () {
+      try {
+        pollShutdown(drillbit);
+      } catch (Exception e) {
+        throw new RuntimeException("Caught exception while polling for shutdown\n" + stackTrace, e);
+      }
+    }
+
+    private void pollShutdown(Drillbit drillbit) throws IOException, InterruptedException {

Review comment:
Done.
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695290#comment-16695290 ]

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235547864

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ##
@@ -335,6 +351,50 @@ private void javaPropertiesToSystemOptions() {
     }
   }
+
+  // Polls for graceful file to check if graceful shutdown is triggered from the script.
+  private static class PollShutdownThread extends Thread {

Review comment:
Changed it. Initially, it was GracefulShutdownThread, but I thought people might be confused since it only gracefully shuts down from the script.
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695287#comment-16695287 ]

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235547528

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ##
@@ -291,6 +302,11 @@ public synchronized void close() {
     logger.info("Shutdown completed ({} ms).", w.elapsed(TimeUnit.MILLISECONDS) );
     stateManager.setState(DrillbitState.SHUTDOWN);
+    // Interrupt the polling for shutdown since shutdown is triggered from WebUI.

Review comment:
Done.
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695281#comment-16695281 ]

ASF GitHub Bot commented on DRILL-6039:
---

dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235547065

## File path: distribution/src/resources/drillbit.sh ##
@@ -192,6 +201,10 @@ start_bit ( )
   echo $procId > $pidFile # Yeah, $pidFile is a file, $procId is the pid...
   echo $! > $pidFile
   sleep 1
+  #remove any previous uncleaned graceful file
+  if [ -f "$GRACEFUL_FILE" ]; then
+    rm $GRACEFUL_FILE

Review comment:
The error reported by rm is currently logged to the console.
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695117#comment-16695117 ]

ASF GitHub Bot commented on DRILL-6039:
---

sohami commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235502954

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ##
@@ -291,6 +302,11 @@ public synchronized void close() {
     logger.info("Shutdown completed ({} ms).", w.elapsed(TimeUnit.MILLISECONDS) );
     stateManager.setState(DrillbitState.SHUTDOWN);
+    // Interrupt the polling for shutdown since shutdown is triggered from WebUI.

Review comment:
`Interrupt GracefulShutdownThread since Drillbit close is not called from it`
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695115#comment-16695115 ]

ASF GitHub Bot commented on DRILL-6039:
---

sohami commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235491872

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ##
@@ -335,6 +351,50 @@ private void javaPropertiesToSystemOptions() {
     }
   }
+
+  // Polls for graceful file to check if graceful shutdown is triggered from the script.
+  private static class PollShutdownThread extends Thread {

Review comment:
Rename to `GracefulShutdownThread`
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695118#comment-16695118 ]

ASF GitHub Bot commented on DRILL-6039:
---
sohami commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235501156

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ##
@@ -335,6 +351,50 @@ private void javaPropertiesToSystemOptions() {
     }
   }
+
+  // Polls for graceful file to check if graceful shutdown is triggered from the script.
+  private static class PollShutdownThread extends Thread {
+
+    private final Drillbit drillbit;
+    private final StackTrace stackTrace;
+
+    public PollShutdownThread(final Drillbit drillbit, final StackTrace stackTrace) {
+      this.drillbit = drillbit;
+      this.stackTrace = stackTrace;
+    }
+
+    @Override
+    public void run () {
+      try {
+        pollShutdown(drillbit);
+      } catch (Exception e) {
+        throw new RuntimeException("Caught exception while polling for shutdown\n" + stackTrace, e);
+      }
+    }
+
+    private void pollShutdown(Drillbit drillbit) throws IOException, InterruptedException {
+      final Path path = FileSystems.getDefault().getPath(System.getenv("DRILL_PID_DIR"));
+      final String file = System.getenv("GRACEFUL_SIGFILE");
+      boolean triggered_shutdown = false;
+      try (final WatchService watchService = FileSystems.getDefault().newWatchService()) {
+        path.register(watchService, StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_CREATE);
+        while (!triggered_shutdown) {
+          final WatchKey wk = watchService.take();
+          for (WatchEvent event : wk.pollEvents()) {
+            final Path changed = (Path) event.context();

Review comment:
`event.context()` can be null (see [here](https://docs.oracle.com/javase/7/docs/api/java/nio/file/WatchEvent.html#context())). Please add a check for it.
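A minimal sketch of the kind of null check the review comment above asks for. It is not the code that was committed for DRILL-6039; the class name `GracefulFileEvents` and the method `isGracefulFileEvent` are hypothetical and exist only for illustration. The sketch relies on the documented behaviour that `WatchEvent.context()` may return null, and it also skips `OVERFLOW` events, whose context is implementation specific.

```java
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;

// Hypothetical helper, not part of Drill: decides whether a watch event
// refers to the graceful-shutdown signal file.
final class GracefulFileEvents {

  private GracefulFileEvents() {
  }

  static boolean isGracefulFileEvent(WatchEvent<?> event, String gracefulFileName) {
    final WatchEvent.Kind<?> kind = event.kind();
    // OVERFLOW means events were lost; its context is not a usable path.
    if (kind == StandardWatchEventKinds.OVERFLOW) {
      return false;
    }
    final Object context = event.context();
    // instanceof is false for null, so this single test covers the null case
    // the reviewer pointed out as well as any non-Path context.
    if (!(context instanceof Path)) {
      return false;
    }
    return ((Path) context).endsWith(gracefulFileName);
  }
}
```

Inside the polling loop, the cast `(Path) event.context()` would then be replaced by a call such as `isGracefulFileEvent(event, gracefulFileName)`, so a null context is ignored instead of causing a `NullPointerException`.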
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695116#comment-16695116 ]

ASF GitHub Bot commented on DRILL-6039:
---
sohami commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235505271

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ##
@@ -335,6 +351,50 @@ private void javaPropertiesToSystemOptions() {
     }
   }
+
+  // Polls for graceful file to check if graceful shutdown is triggered from the script.
+  private static class PollShutdownThread extends Thread {
+
+    private final Drillbit drillbit;
+    private final StackTrace stackTrace;
+
+    public PollShutdownThread(final Drillbit drillbit, final StackTrace stackTrace) {
+      this.drillbit = drillbit;
+      this.stackTrace = stackTrace;
+    }
+
+    @Override
+    public void run () {
+      try {
+        pollShutdown(drillbit);
+      } catch (Exception e) {
+        throw new RuntimeException("Caught exception while polling for shutdown\n" + stackTrace, e);
+      }

Review comment:
I think you have to catch `InterruptedException` and consume it after logging, rather than throwing a `RuntimeException` for it. When `Drillbit.close` is called from another path it will interrupt this thread, and we should not throw a `RuntimeException` in that case.
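A minimal sketch of the exception handling suggested in the review comment above, under stated assumptions: `ShutdownPoller` is a hypothetical stand-in for `PollShutdownThread`, a `BlockingQueue.take()` stands in for `WatchService.take()` (both block and throw `InterruptedException` when the thread is interrupted), and `System.err` stands in for the project logger. The `exitedOnInterrupt` flag exists only so the behaviour can be observed.

```java
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the polling thread discussed in the review.
final class ShutdownPoller extends Thread {

  // Stand-in for the watch service: take() blocks until interrupted.
  private final BlockingQueue<String> signals = new LinkedBlockingQueue<>();

  // Illustration only: records that the thread ended because of an interrupt.
  private volatile boolean exitedOnInterrupt = false;

  private void pollShutdown() throws IOException, InterruptedException {
    signals.take();
  }

  @Override
  public void run() {
    try {
      pollShutdown();
    } catch (InterruptedException e) {
      // Expected when close() is reached through another path and interrupts
      // this thread: log it, consume it, and let the thread end normally.
      System.err.println("Shutdown polling thread interrupted, exiting: " + e);
      exitedOnInterrupt = true;
    } catch (IOException e) {
      // Any other failure is still reported as before.
      throw new RuntimeException("Caught exception while polling for shutdown", e);
    }
  }

  boolean exitedOnInterrupt() {
    return exitedOnInterrupt;
  }
}
```

Splitting the single `catch (Exception e)` into two handlers keeps the unexpected `IOException` case loud while treating an interrupt as a normal way for the thread to stop.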
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695120#comment-16695120 ]

ASF GitHub Bot commented on DRILL-6039:
---
sohami commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235490503

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ##
@@ -335,6 +351,50 @@ private void javaPropertiesToSystemOptions() {
     }
   }
+
+  // Polls for graceful file to check if graceful shutdown is triggered from the script.
+  private static class PollShutdownThread extends Thread {
+
+    private final Drillbit drillbit;
+    private final StackTrace stackTrace;
+
+    public PollShutdownThread(final Drillbit drillbit, final StackTrace stackTrace) {
+      this.drillbit = drillbit;
+      this.stackTrace = stackTrace;
+    }
+
+    @Override
+    public void run () {
+      try {
+        pollShutdown(drillbit);
+      } catch (Exception e) {
+        throw new RuntimeException("Caught exception while polling for shutdown\n" + stackTrace, e);
+      }
+    }
+
+    private void pollShutdown(Drillbit drillbit) throws IOException, InterruptedException {

Review comment:
Please rename the variables to something descriptive rather than using the type as the name, for example `path-->drillPidDirPath` and `file-->gracefulFileName`.
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695119#comment-16695119 ]

ASF GitHub Bot commented on DRILL-6039:
---
sohami commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235488412

## File path: distribution/src/resources/drillbit.sh ##
@@ -192,6 +201,10 @@ start_bit ( )
   echo $procId > $pidFile # Yeah, $pidFile is a file, $procId is the pid...
   echo $! > $pidFile
   sleep 1
+  #remove any previous uncleaned graceful file
+  if [ -f "$GRACEFUL_FILE" ]; then
+    rm $GRACEFUL_FILE

Review comment:
We should check for an error while removing the graceful file and, if removal fails, echo an error message and exit.
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695114#comment-16695114 ]

ASF GitHub Bot commented on DRILL-6039:
---
sohami commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235491237

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ##
@@ -335,6 +351,50 @@ private void javaPropertiesToSystemOptions() {
     }
   }
+
+  // Polls for graceful file to check if graceful shutdown is triggered from the script.
+  private static class PollShutdownThread extends Thread {
+
+    private final Drillbit drillbit;
+    private final StackTrace stackTrace;
+
+    public PollShutdownThread(final Drillbit drillbit, final StackTrace stackTrace) {
+      this.drillbit = drillbit;
+      this.stackTrace = stackTrace;
+    }
+
+    @Override
+    public void run () {
+      try {
+        pollShutdown(drillbit);
+      } catch (Exception e) {
+        throw new RuntimeException("Caught exception while polling for shutdown\n" + stackTrace, e);
+      }
+    }
+
+    private void pollShutdown(Drillbit drillbit) throws IOException, InterruptedException {
+      final Path path = FileSystems.getDefault().getPath(System.getenv("DRILL_PID_DIR"));
+      final String file = System.getenv("GRACEFUL_SIGFILE");
+      boolean triggered_shutdown = false;
+      try (final WatchService watchService = FileSystems.getDefault().newWatchService()) {
+        path.register(watchService, StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_CREATE);
+        while (!triggered_shutdown) {
+          final WatchKey wk = watchService.take();
+          for (WatchEvent event : wk.pollEvents()) {
+            final Path changed = (Path) event.context();
+            if (changed.endsWith(file)) {
+              drillbit.interruptPollShutdown = false;
+              triggered_shutdown = true;
+              drillbit.close();
+              wk.cancel();

Review comment:
Put `wk.cancel()` in a `finally` block to handle the case where `drillbit.close()` throws an exception.
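A minimal sketch of the `finally` arrangement suggested in the review comment above. The class `CloseAndCancel` and its method are hypothetical, and an `AutoCloseable` parameter stands in for the `Drillbit` instance so the example stays self-contained; only `WatchKey.cancel()` is the real JDK call.

```java
import java.nio.file.WatchKey;

// Hypothetical helper, not part of Drill: closes a resource and always
// cancels the watch key afterwards.
final class CloseAndCancel {

  private CloseAndCancel() {
  }

  static void closeThenCancel(AutoCloseable drillbit, WatchKey watchKey) throws Exception {
    try {
      drillbit.close();
    } finally {
      // Runs whether close() returned normally or threw, so the key's
      // registration with the watch service is never left behind.
      watchKey.cancel();
    }
  }
}
```

If `close()` throws, the exception still propagates to the caller after the key has been cancelled, which is the behaviour the reviewer is asking for.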
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693943#comment-16693943 ]

ASF GitHub Bot commented on DRILL-6039:
---
dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235218184

## File path: distribution/src/resources/drill-config.sh ##
@@ -334,6 +334,7 @@ fi
 # provided in drill-env.sh.
 export DRILL_PID_DIR=${DRILL_PID_DIR:-$DRILL_HOME}
+export GRACEFUL_FILE_SUFFIX=GRACEFUL_SIGFILE

Review comment:
Changed it.
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693817#comment-16693817 ]

ASF GitHub Bot commented on DRILL-6039:
---
dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235181475

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ##
@@ -285,6 +286,10 @@ public synchronized void close() {
     if (storeProvider != profileStoreProvider) {
       AutoCloseables.close(profileStoreProvider);
     }
+    File f = new File(System.getenv("DRILL_PID_DIR") + "/.graceful");

Review comment:
Added the file name (GRACEFUL_SIGFILE) in drill-config.sh.
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693830#comment-16693830 ]

ASF GitHub Bot commented on DRILL-6039:
---
dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235184369

## File path: distribution/src/resources/drill-config.sh ##
@@ -334,6 +334,7 @@ fi
 # provided in drill-env.sh.
 export DRILL_PID_DIR=${DRILL_PID_DIR:-$DRILL_HOME}
+export GRACEFUL_FILE_SUFFIX=GRACEFUL_SIGFILE

Review comment:
I thought you asked me to change the name of the file graceful to GRACEFUL_SIGFILE. Let me change it.
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693827#comment-16693827 ]

ASF GitHub Bot commented on DRILL-6039:
---
kkhatua commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235183590

## File path: distribution/src/resources/drill-config.sh ##
@@ -334,6 +334,7 @@ fi
 # provided in drill-env.sh.
 export DRILL_PID_DIR=${DRILL_PID_DIR:-$DRILL_HOME}
+export GRACEFUL_FILE_SUFFIX=GRACEFUL_SIGFILE

Review comment:
Where is the value for `GRACEFUL_SIGFILE` defined? Provide a default if not set
```
export GRACEFUL_FILE_SUFFIX=${GRACEFUL_SIGFILE:-"graceful"}
```
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693825#comment-16693825 ]

ASF GitHub Bot commented on DRILL-6039:
---
kkhatua commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235183590

## File path: distribution/src/resources/drill-config.sh ##
@@ -334,6 +334,7 @@ fi
 # provided in drill-env.sh.
 export DRILL_PID_DIR=${DRILL_PID_DIR:-$DRILL_HOME}
+export GRACEFUL_FILE_SUFFIX=GRACEFUL_SIGFILE

Review comment:
Where is the value for `GRACEFUL_SIGFILE` defined? Provide a default if not set
```
export FOO=${FOO:-"value"}
```
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693821#comment-16693821 ]

ASF GitHub Bot commented on DRILL-6039:
---
dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235181928

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java ##
@@ -442,6 +467,14 @@ public void run() {
       // StatusThread is started
       final Controller controller = dContext.getController();
       final DrillbitEndpoint localBitEndPoint = dContext.getEndpoint();
+      try {
+        pollShutdown(drillbit);
+      } catch (IOException e) {
+        e.printStackTrace();

Review comment:
Changed it.
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693820#comment-16693820 ]

ASF GitHub Bot commented on DRILL-6039:
---
dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235181861

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java ##
@@ -433,6 +442,22 @@ public void run() {
     public StatusThread() {
       // assume this thread is created by a non-daemon thread
       setName("WorkManager.StatusThread");
+    }
+    private void pollShutdown(Drillbit drillbit) throws IOException, InterruptedException {
+      final Path path = FileSystems.getDefault().getPath(System.getenv("DRILL_PID_DIR"));
+      try (final WatchService watchService = FileSystems.getDefault().newWatchService()) {
+        path.register(watchService, StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_CREATE);
+        while (true) {
+          final WatchKey wk = watchService.take();

Review comment:
@sohami I have made the changes based on your comments. Please let me know if you have any comments.
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693819#comment-16693819 ]

ASF GitHub Bot commented on DRILL-6039:
---
dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235181676

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ##
@@ -285,6 +286,10 @@ public synchronized void close() {
     if (storeProvider != profileStoreProvider) {
       AutoCloseables.close(profileStoreProvider);
     }
+    File f = new File(System.getenv("DRILL_PID_DIR") + "/.graceful");
+    if (f.exists()) {
+      f.delete();

Review comment:
Deletion of the file is done in drillbit.sh start.
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693816#comment-16693816 ]

ASF GitHub Bot commented on DRILL-6039:
---
dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235181291

## File path: distribution/src/resources/drillbit.sh ##
@@ -94,11 +94,24 @@ waitForProcessEnd()
   commandName=$2
   kill_drillbit=$3
   processedAt=`date +%s`
+  triggered_shutdown=false
+  FILE=$DRILL_PID_DIR/.graceful
   origcnt=${DRILL_STOP_TIMEOUT:-120}
   while kill -0 $pidKilled > /dev/null 2>&1; do
     echo -n "."
     sleep 1;
+    #Incase of graceful shutdown, create .graceful file and wait till it is deleted to trigger kill command.
+    if [ "$kill_drillbit" = false ]; then
+      if [ "$triggered_shutdown" = false ]; then
+        touch $DRILL_PID_DIR/.graceful
+        triggered_shutdown=true
+      else
+        if [ ! -f "$FILE" ]; then
+          kill $pidKilled > /dev/null 2>&1;

Review comment:
Removed this code and drillbit.sh start will delete the sig file.
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693814#comment-16693814 ]

ASF GitHub Bot commented on DRILL-6039:
---
dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235180886

## File path: distribution/src/resources/drillbit.sh ##
@@ -94,11 +94,24 @@ waitForProcessEnd()
   commandName=$2
   kill_drillbit=$3
   processedAt=`date +%s`
+  triggered_shutdown=false
+  FILE=$DRILL_PID_DIR/.graceful

Review comment:
Changed the name. Should I make it hidden or not?
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693702#comment-16693702 ]

ASF GitHub Bot commented on DRILL-6039:
---
kkhatua commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown
URL: https://github.com/apache/drill/pull/1536#discussion_r235140653

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ##
@@ -285,6 +286,10 @@ public synchronized void close() {
     if (storeProvider != profileStoreProvider) {
       AutoCloseables.close(profileStoreProvider);
     }
+    File f = new File(System.getenv("DRILL_PID_DIR") + "/.graceful");

Review comment:
You could provide this as an env var (full path: `DRILL_PID_DIR+"/"+GRACEFUL_SIGFILE`) within the `drill-config.sh` (ref: https://github.com/apache/drill/blob/master/distribution/src/resources/drill-config.sh#L336 )
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693701#comment-16693701 ] ASF GitHub Bot commented on DRILL-6039: --- kkhatua commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown URL: https://github.com/apache/drill/pull/1536#discussion_r235138498 ## File path: distribution/src/resources/drillbit.sh ## @@ -94,11 +94,24 @@ waitForProcessEnd() commandName=$2 kill_drillbit=$3 processedAt=`date +%s` + triggered_shutdown=false + FILE=$DRILL_PID_DIR/.graceful Review comment: I think making it hidden is practical, since it only serves the purpose of signalling and is not expected to be visible. Also, @dvjyothsna, it might be worth changing the variable name to something like `GRACEFUL_SIGFILE`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > drillbit.sh graceful_stop does not wait for fragments to complete before > stopping the drillbit > -- > > Key: DRILL-6039 > URL: https://issues.apache.org/jira/browse/DRILL-6039 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.3.0 >Reporter: Krystal >Assignee: Venkata Jyothsna Donapati >Priority: Major > Fix For: 1.15.0 > > > git.commit.id.abbrev=eb0c403 > I have 3-nodes cluster with drillbits running on each node. I kicked off a > long running query. In the middle of the query, I did a "./drillbit.sh > graceful_stop" on one of the non-foreman node. The node was stopped within a > few seconds and the query failed with error: > Error: SYSTEM ERROR: IOException: Filesystem closed > Fragment 4:15 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693700#comment-16693700 ] ASF GitHub Bot commented on DRILL-6039: --- kkhatua commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown URL: https://github.com/apache/drill/pull/1536#discussion_r235139192 ## File path: distribution/src/resources/drillbit.sh ## @@ -94,11 +94,24 @@ waitForProcessEnd() commandName=$2 kill_drillbit=$3 processedAt=`date +%s` + triggered_shutdown=false + FILE=$DRILL_PID_DIR/.graceful origcnt=${DRILL_STOP_TIMEOUT:-120} while kill -0 $pidKilled > /dev/null 2>&1; do echo -n "." sleep 1; + #Incase of graceful shutdown, create .graceful file and wait till it is deleted to trigger kill command. + if [ "$kill_drillbit" = false ]; then + if [ "$triggered_shutdown" = false ]; then + touch $DRILL_PID_DIR/.graceful + triggered_shutdown=true + else + if [ ! -f "$FILE" ]; then + kill $pidKilled > /dev/null 2>&1; Review comment: @sohami I think she is just playing it safe by doing a clean up. A suggestion... for all the `kill` commands, provide the signal as part of the arguments. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > drillbit.sh graceful_stop does not wait for fragments to complete before > stopping the drillbit > -- > > Key: DRILL-6039 > URL: https://issues.apache.org/jira/browse/DRILL-6039 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.3.0 >Reporter: Krystal >Assignee: Venkata Jyothsna Donapati >Priority: Major > Fix For: 1.15.0 > > > git.commit.id.abbrev=eb0c403 > I have 3-nodes cluster with drillbits running on each node. I kicked off a > long running query. 
In the middle of the query, I did a "./drillbit.sh > graceful_stop" on one of the non-foreman node. The node was stopped within a > few seconds and the query failed with error: > Error: SYSTEM ERROR: IOException: Filesystem closed > Fragment 4:15 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691369#comment-16691369 ] ASF GitHub Bot commented on DRILL-6039: --- arina-ielchiieva commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown URL: https://github.com/apache/drill/pull/1536#discussion_r234524071 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java ## @@ -442,6 +467,14 @@ public void run() { // StatusThread is started final Controller controller = dContext.getController(); final DrillbitEndpoint localBitEndPoint = dContext.getEndpoint(); + try { +pollShutdown(drillbit); + } catch (IOException e) { +e.printStackTrace(); Review comment: I think we should not use `e.printStackTrace();` but rather do proper error handling or logging. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > drillbit.sh graceful_stop does not wait for fragments to complete before > stopping the drillbit > -- > > Key: DRILL-6039 > URL: https://issues.apache.org/jira/browse/DRILL-6039 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.3.0 >Reporter: Krystal >Assignee: Venkata Jyothsna Donapati >Priority: Major > Fix For: 1.15.0 > > > git.commit.id.abbrev=eb0c403 > I have 3-nodes cluster with drillbits running on each node. I kicked off a > long running query. In the middle of the query, I did a "./drillbit.sh > graceful_stop" on one of the non-foreman node. The node was stopped within a > few seconds and the query failed with error: > Error: SYSTEM ERROR: IOException: Filesystem closed > Fragment 4:15 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
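The point about replacing `e.printStackTrace()` with proper logging can be illustrated as follows. This is only a sketch under stated assumptions: it uses `java.util.logging` to stay self-contained (Drill's own logger setup is not shown in this thread), and `ShutdownPollLogging` and `ShutdownPoller` are hypothetical stand-ins for the code around the `pollShutdown` call in the diff.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical wrapper; not part of Drill. */
final class ShutdownPollLogging {
  private static final Logger logger =
      Logger.getLogger(ShutdownPollLogging.class.getName());

  /** Hypothetical stand-in for the pollShutdown(...) call under review. */
  interface ShutdownPoller {
    void poll() throws IOException, InterruptedException;
  }

  /** Runs the poller; returns true on normal completion, false on failure. */
  static boolean runPoller(ShutdownPoller poller) {
    try {
      poller.poll();
      return true;
    } catch (IOException e) {
      // Log with context and the exception instead of printing the stack trace.
      logger.log(Level.WARNING, "Polling for graceful shutdown failed", e);
      return false;
    } catch (InterruptedException e) {
      // Restore the interrupt flag so callers can still observe the interruption.
      Thread.currentThread().interrupt();
      logger.log(Level.INFO, "Interrupted while polling for graceful shutdown");
      return false;
    }
  }
}
```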
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689908#comment-16689908 ] ASF GitHub Bot commented on DRILL-6039: --- dvjyothsna commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown URL: https://github.com/apache/drill/pull/1536#discussion_r234318353 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ## @@ -285,6 +286,10 @@ public synchronized void close() { if (storeProvider != profileStoreProvider) { AutoCloseables.close(profileStoreProvider); } + File f = new File(System.getenv("DRILL_PID_DIR") + "/.graceful"); Review comment: Can we get that config param in the script? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > drillbit.sh graceful_stop does not wait for fragments to complete before > stopping the drillbit > -- > > Key: DRILL-6039 > URL: https://issues.apache.org/jira/browse/DRILL-6039 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.3.0 >Reporter: Krystal >Assignee: Venkata Jyothsna Donapati >Priority: Major > Fix For: 1.15.0 > > > git.commit.id.abbrev=eb0c403 > I have 3-nodes cluster with drillbits running on each node. I kicked off a > long running query. In the middle of the query, I did a "./drillbit.sh > graceful_stop" on one of the non-foreman node. The node was stopped within a > few seconds and the query failed with error: > Error: SYSTEM ERROR: IOException: Filesystem closed > Fragment 4:15 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689793#comment-16689793 ] ASF GitHub Bot commented on DRILL-6039: --- sohami commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown URL: https://github.com/apache/drill/pull/1536#discussion_r234070327 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ## @@ -285,6 +286,10 @@ public synchronized void close() { if (storeProvider != profileStoreProvider) { AutoCloseables.close(profileStoreProvider); } + File f = new File(System.getenv("DRILL_PID_DIR") + "/.graceful"); Review comment: Provide this as a configuration parameter with default value being `graceful` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > drillbit.sh graceful_stop does not wait for fragments to complete before > stopping the drillbit > -- > > Key: DRILL-6039 > URL: https://issues.apache.org/jira/browse/DRILL-6039 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.3.0 >Reporter: Krystal >Assignee: Venkata Jyothsna Donapati >Priority: Major > Fix For: 1.15.0 > > > git.commit.id.abbrev=eb0c403 > I have 3-nodes cluster with drillbits running on each node. I kicked off a > long running query. In the middle of the query, I did a "./drillbit.sh > graceful_stop" on one of the non-foreman node. The node was stopped within a > few seconds and the query failed with error: > Error: SYSTEM ERROR: IOException: Filesystem closed > Fragment 4:15 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689795#comment-16689795 ] ASF GitHub Bot commented on DRILL-6039: --- sohami commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown URL: https://github.com/apache/drill/pull/1536#discussion_r233586484 ## File path: distribution/src/resources/drillbit.sh ## @@ -94,11 +94,24 @@ waitForProcessEnd() commandName=$2 kill_drillbit=$3 processedAt=`date +%s` + triggered_shutdown=false + FILE=$DRILL_PID_DIR/.graceful Review comment: Let's not make it a hidden file. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > drillbit.sh graceful_stop does not wait for fragments to complete before > stopping the drillbit > -- > > Key: DRILL-6039 > URL: https://issues.apache.org/jira/browse/DRILL-6039 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.3.0 >Reporter: Krystal >Assignee: Venkata Jyothsna Donapati >Priority: Major > Fix For: 1.15.0 > > > git.commit.id.abbrev=eb0c403 > I have 3-nodes cluster with drillbits running on each node. I kicked off a > long running query. In the middle of the query, I did a "./drillbit.sh > graceful_stop" on one of the non-foreman node. The node was stopped within a > few seconds and the query failed with error: > Error: SYSTEM ERROR: IOException: Filesystem closed > Fragment 4:15 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689794#comment-16689794 ] ASF GitHub Bot commented on DRILL-6039: --- sohami commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown URL: https://github.com/apache/drill/pull/1536#discussion_r234070509 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ## @@ -285,6 +286,10 @@ public synchronized void close() { if (storeProvider != profileStoreProvider) { AutoCloseables.close(profileStoreProvider); } + File f = new File(System.getenv("DRILL_PID_DIR") + "/.graceful"); + if (f.exists()) { +f.delete(); Review comment: check and log warning based on the return value of `f.delete()`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > drillbit.sh graceful_stop does not wait for fragments to complete before > stopping the drillbit > -- > > Key: DRILL-6039 > URL: https://issues.apache.org/jira/browse/DRILL-6039 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.3.0 >Reporter: Krystal >Assignee: Venkata Jyothsna Donapati >Priority: Major > Fix For: 1.15.0 > > > git.commit.id.abbrev=eb0c403 > I have 3-nodes cluster with drillbits running on each node. I kicked off a > long running query. In the middle of the query, I did a "./drillbit.sh > graceful_stop" on one of the non-foreman node. The node was stopped within a > few seconds and the query failed with error: > Error: SYSTEM ERROR: IOException: Filesystem closed > Fragment 4:15 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
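Checking the return value of `f.delete()`, as requested above, might look like the sketch below. `java.io.File.delete()` returns `false` rather than throwing when the deletion fails, so the result has to be inspected explicitly. The class and method names are hypothetical, and `java.util.logging` is used only to keep the example self-contained.

```java
import java.io.File;
import java.util.logging.Logger;

/** Hypothetical helper; not part of Drill. */
final class GracefulFileCleanup {
  private static final Logger logger =
      Logger.getLogger(GracefulFileCleanup.class.getName());

  /** Deletes the signal file if present; returns true if it is absent afterwards. */
  static boolean removeSignalFile(File signalFile) {
    if (!signalFile.exists()) {
      return true; // nothing to clean up
    }
    boolean deleted = signalFile.delete();
    if (!deleted) {
      logger.warning("Could not delete graceful shutdown file: "
          + signalFile.getAbsolutePath());
    }
    return deleted;
  }
}
```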
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689792#comment-16689792 ] ASF GitHub Bot commented on DRILL-6039: --- sohami commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown URL: https://github.com/apache/drill/pull/1536#discussion_r234069987 ## File path: distribution/src/resources/drillbit.sh ## @@ -94,11 +94,24 @@ waitForProcessEnd() commandName=$2 kill_drillbit=$3 processedAt=`date +%s` + triggered_shutdown=false + FILE=$DRILL_PID_DIR/.graceful origcnt=${DRILL_STOP_TIMEOUT:-120} while kill -0 $pidKilled > /dev/null 2>&1; do echo -n "." sleep 1; + #Incase of graceful shutdown, create .graceful file and wait till it is deleted to trigger kill command. + if [ "$kill_drillbit" = false ]; then + if [ "$triggered_shutdown" = false ]; then + touch $DRILL_PID_DIR/.graceful + triggered_shutdown=true + else + if [ ! -f "$FILE" ]; then + kill $pidKilled > /dev/null 2>&1; Review comment: this shouldn't be required since after `close` Drillbit process should exit. If not then we should find out the reason behind it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > drillbit.sh graceful_stop does not wait for fragments to complete before > stopping the drillbit > -- > > Key: DRILL-6039 > URL: https://issues.apache.org/jira/browse/DRILL-6039 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.3.0 >Reporter: Krystal >Assignee: Venkata Jyothsna Donapati >Priority: Major > Fix For: 1.15.0 > > > git.commit.id.abbrev=eb0c403 > I have 3-nodes cluster with drillbits running on each node. I kicked off a > long running query. 
In the middle of the query, I did a "./drillbit.sh > graceful_stop" on one of the non-foreman node. The node was stopped within a > few seconds and the query failed with error: > Error: SYSTEM ERROR: IOException: Filesystem closed > Fragment 4:15 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689791#comment-16689791 ] ASF GitHub Bot commented on DRILL-6039: --- sohami commented on a change in pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown URL: https://github.com/apache/drill/pull/1536#discussion_r234301751 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java ## @@ -433,6 +442,22 @@ public void run() { public StatusThread() { // assume this thread is created by a non-daemon thread setName("WorkManager.StatusThread"); + } + private void pollShutdown(Drillbit drillbit) throws IOException, InterruptedException { + final Path path = FileSystems.getDefault().getPath(System.getenv("DRILL_PID_DIR")); + try (final WatchService watchService = FileSystems.getDefault().newWatchService()) { +path.register(watchService, StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_CREATE); +while (true) { + final WatchKey wk = watchService.take(); Review comment: This design is not correct and has issues. - You are using the `WorkManager` `StatusThread` to register a `WatchService` and then making a blocking call to `take()`, which leaves this thread stuck until the `WatchService` sees an event. Hence `StatusThread` will not be able to perform its actual job of sending the status of `runningFragments` to its `Foreman`. - The `pollShutdown` method in `StatusThread` calls `Drillbit.close()`. `Drillbit.close()` then calls close on `WorkManager`, which calls `interrupt` on `StatusThread`. There is a cycle here. What does it mean for a thread to call interrupt on itself? Usually interrupt is used by another thread to wake up a blocked thread. I think the interrupt call will set the interrupt flag of the status thread; then, when it comes out of `pollShutdown` and calls sleep, it will hit an interrupted exception. 
It might work, but I don't like the idea that `WorkManager`, which is part of Drillbit, has a thread that calls `Drillbit.close()`. Please create a separate thread within the Drillbit class itself to do this. In the `Drillbit::close()` method you still need to `interrupt` this new thread for cases when `close` is called from some other path, but you can maintain a flag that indicates whether `Thread.interrupt()` should be called. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > drillbit.sh graceful_stop does not wait for fragments to complete before > stopping the drillbit > -- > > Key: DRILL-6039 > URL: https://issues.apache.org/jira/browse/DRILL-6039 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.3.0 >Reporter: Krystal >Assignee: Venkata Jyothsna Donapati >Priority: Major > Fix For: 1.15.0 > > > git.commit.id.abbrev=eb0c403 > I have 3-nodes cluster with drillbits running on each node. I kicked off a > long running query. In the middle of the query, I did a "./drillbit.sh > graceful_stop" on one of the non-foreman node. The node was stopped within a > few seconds and the query failed with error: > Error: SYSTEM ERROR: IOException: Filesystem closed > Fragment 4:15 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
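The alternative design requested in the review comment above (a dedicated thread that blocks on the `WatchService`, plus a flag that tells the close path whether `interrupt()` is needed) can be sketched as follows. This is an illustration under assumptions, not the change that was eventually merged: `GracefulShutdownWatcher`, `stopWatching()` and the `onSignal` callback are hypothetical names, with the callback standing in for `Drillbit.close()`, and `java.util.logging` is used only to keep the example self-contained.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical watcher thread; not part of Drill. */
final class GracefulShutdownWatcher extends Thread {
  private static final Logger logger =
      Logger.getLogger(GracefulShutdownWatcher.class.getName());

  private final Path watchDir;
  private final String signalFileName;
  private final Runnable onSignal; // stand-in for Drillbit.close()
  // Set once this thread itself starts the shutdown, so the close path can skip interrupt().
  private final AtomicBoolean triggeredShutdown = new AtomicBoolean(false);

  GracefulShutdownWatcher(Path watchDir, String signalFileName, Runnable onSignal) {
    this.watchDir = watchDir;
    this.signalFileName = signalFileName;
    this.onSignal = onSignal;
    setName("GracefulShutdownWatcher");
    setDaemon(true);
  }

  @Override
  public void run() {
    try (WatchService watchService = FileSystems.getDefault().newWatchService()) {
      watchDir.register(watchService,
          StandardWatchEventKinds.ENTRY_CREATE, StandardWatchEventKinds.ENTRY_MODIFY);
      while (true) {
        WatchKey key = watchService.take(); // blocks only this dedicated thread
        for (WatchEvent<?> event : key.pollEvents()) {
          if (signalFileName.equals(String.valueOf(event.context()))) {
            triggeredShutdown.set(true);
            onSignal.run();
            return;
          }
        }
        if (!key.reset()) {
          return; // the watched directory is no longer accessible
        }
      }
    } catch (InterruptedException e) {
      // The close path interrupted the watcher; exit quietly.
    } catch (IOException e) {
      logger.log(Level.WARNING, "Graceful shutdown watcher failed", e);
    }
  }

  /** Called from the close path; interrupts only if the watcher did not trigger the close. */
  void stopWatching() {
    if (!triggeredShutdown.get()) {
      interrupt();
    }
  }
}
```

With this shape the status-reporting thread is never blocked on `take()`, and a close initiated by the watcher itself does not end up interrupting the thread that is running the close.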
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684388#comment-16684388 ] ASF GitHub Bot commented on DRILL-6039: --- dvjyothsna opened a new pull request #1536: DRILL-6039: Fixed drillbit.sh script to do graceful shutdown URL: https://github.com/apache/drill/pull/1536 @sohami please review This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > drillbit.sh graceful_stop does not wait for fragments to complete before > stopping the drillbit > -- > > Key: DRILL-6039 > URL: https://issues.apache.org/jira/browse/DRILL-6039 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.3.0 >Reporter: Krystal >Assignee: Venkata Jyothsna Donapati >Priority: Major > Fix For: 1.15.0 > > > git.commit.id.abbrev=eb0c403 > I have 3-nodes cluster with drillbits running on each node. I kicked off a > long running query. In the middle of the query, I did a "./drillbit.sh > graceful_stop" on one of the non-foreman node. The node was stopped within a > few seconds and the query failed with error: > Error: SYSTEM ERROR: IOException: Filesystem closed > Fragment 4:15 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514293#comment-16514293 ] Krystal commented on DRILL-6039: The "drillbit.sh graceful_stop" from command line against parquet files still fails - does not wait for fragments to finish. Interesting thing is the problem does not occur when shutting down the drillbit from the WebUI. The drillbit.log does not show any memory leaks. Here is the stack trace: {code:java} Error: SYSTEM ERROR: IOException: Filesystem closed Fragment 1:10 (org.apache.drill.common.exceptions.DrillRuntimeException) Error in parquet record reader. Message: Failure in setting up reader Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root { optional int64 l_orderkey; optional int64 l_partkey; optional int64 l_suppkey; optional int32 l_linenumber; optional double l_quantity; optional double l_extendedprice; optional double l_discount; optional double l_tax; optional binary l_returnflag (UTF8); optional binary l_linestatus (UTF8); optional int32 l_shipdate (DATE); optional int32 l_commitdate (DATE); optional int32 l_receiptdate (DATE); optional binary l_shipinstruct (UTF8); optional binary l_shipmode (UTF8); optional binary l_comment (UTF8); } , metadata: {drill.version=1.7.0-SNAPSHOT}}, blocks: [BlockMetaData{9785551, 1338587809 [ColumnMetaData{SNAPPY [l_orderkey] INT64 [PLAIN, BIT_PACKED, RLE], 4}, ColumnMetaData{SNAPPY [l_partkey] INT64 [PLAIN, BIT_PACKED, RLE], 15273019}, ColumnMetaData{SNAPPY [l_suppkey] INT64 [PLAIN, BIT_PACKED, RLE], 73277460}, ColumnMetaData{SNAPPY [l_linenumber] INT32 [PLAIN, BIT_PACKED, RLE], 124321400}, ColumnMetaData{SNAPPY [l_quantity] DOUBLE [PLAIN, BIT_PACKED, RLE], 132087986}, ColumnMetaData{SNAPPY [l_extendedprice] DOUBLE [PLAIN, BIT_PACKED, RLE], 151838465}, ColumnMetaData{SNAPPY [l_discount] DOUBLE [PLAIN, BIT_PACKED, RLE], 208270450}, ColumnMetaData{SNAPPY [l_tax] DOUBLE [PLAIN, BIT_PACKED, RLE], 227351535}, 
ColumnMetaData{SNAPPY [l_returnflag] BINARY [PLAIN, BIT_PACKED, RLE], 245574230}, ColumnMetaData{SNAPPY [l_linestatus] BINARY [PLAIN, BIT_PACKED, RLE], 254814472}, ColumnMetaData{SNAPPY [l_shipdate] INT32 [PLAIN, BIT_PACKED, RLE], 260500185}, ColumnMetaData{SNAPPY [l_commitdate] INT32 [PLAIN, BIT_PACKED, RLE], 290097700}, ColumnMetaData{SNAPPY [l_receiptdate] INT32 [PLAIN, BIT_PACKED, RLE], 319358270}, ColumnMetaData{SNAPPY [l_shipinstruct] BINARY [PLAIN, BIT_PACKED, RLE], 348982057}, ColumnMetaData{SNAPPY [l_shipmode] BINARY [PLAIN, BIT_PACKED, RLE], 370125048}, ColumnMetaData{SNAPPY [l_comment] BINARY [PLAIN, BIT_PACKED, RLE], 392116052}]}]} org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleException():316 org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.setup():300 org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas():335 org.apache.drill.exec.physical.impl.ScanBatch.internalNext():222 org.apache.drill.exec.physical.impl.ScanBatch.next():274 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.record.AbstractRecordBatch.next():109 org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 org.apache.drill.exec.record.AbstractRecordBatch.next():164 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.record.AbstractRecordBatch.next():109 org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():80 org.apache.drill.exec.record.AbstractRecordBatch.next():164 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.record.AbstractRecordBatch.next():109 org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():134 org.apache.drill.exec.record.AbstractRecordBatch.next():164 org.apache.drill.exec.record.AbstractRecordBatch.next():119 
org.apache.drill.exec.test.generated.StreamingAggregatorGen42.doWork():187 org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():194 org.apache.drill.exec.record.AbstractRecordBatch.next():164 org.apache.drill.exec.physical.impl.BaseRootExec.next():105 org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93 org.apache.drill.exec.physical.impl.BaseRootExec.next():95 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():233 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226 java.security.AccessController.doPrivileged():-2 javax.security.auth.Subject.doAs():422 org.apache.hadoop.security.UserGroupInformation.doAs():1633 org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510437#comment-16510437 ] Pritesh Maker commented on DRILL-6039: -- [~knguyen] can you verify this since DRILL-6252 is now resolved? > drillbit.sh graceful_stop does not wait for fragments to complete before > stopping the drillbit > -- > > Key: DRILL-6039 > URL: https://issues.apache.org/jira/browse/DRILL-6039 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.3.0 >Reporter: Krystal >Assignee: Venkata Jyothsna Donapati >Priority: Major > Fix For: 1.14.0 > > > git.commit.id.abbrev=eb0c403 > I have 3-nodes cluster with drillbits running on each node. I kicked off a > long running query. In the middle of the query, I did a "./drillbit.sh > graceful_stop" on one of the non-foreman node. The node was stopped within a > few seconds and the query failed with error: > Error: SYSTEM ERROR: IOException: Filesystem closed > Fragment 4:15 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402447#comment-16402447 ] Pritesh Maker commented on DRILL-6039: -- This should be tested after DRILL-6252 is addressed. > drillbit.sh graceful_stop does not wait for fragments to complete before > stopping the drillbit > -- > > Key: DRILL-6039 > URL: https://issues.apache.org/jira/browse/DRILL-6039 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.3.0 >Reporter: Krystal >Assignee: Venkata Jyothsna Donapati >Priority: Major > Fix For: 1.14.0 > > > git.commit.id.abbrev=eb0c403 > I have 3-nodes cluster with drillbits running on each node. I kicked off a > long running query. In the middle of the query, I did a "./drillbit.sh > graceful_stop" on one of the non-foreman node. The node was stopped within a > few seconds and the query failed with error: > Error: SYSTEM ERROR: IOException: Filesystem closed > Fragment 4:15 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit
[ https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401054#comment-16401054 ] Venkata Jyothsna Donapati commented on DRILL-6039: -- This happened only when querying parquet files. It looks like the underlying issue is https://issues.apache.org/jira/browse/DRILL-6252, and graceful shutdown has nothing to do with it. > drillbit.sh graceful_stop does not wait for fragments to complete before > stopping the drillbit > -- > > Key: DRILL-6039 > URL: https://issues.apache.org/jira/browse/DRILL-6039 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.3.0 >Reporter: Krystal >Assignee: Venkata Jyothsna Donapati >Priority: Major > Fix For: 1.14.0 > > > git.commit.id.abbrev=eb0c403 > I have 3-nodes cluster with drillbits running on each node. I kicked off a > long running query. In the middle of the query, I did a "./drillbit.sh > graceful_stop" on one of the non-foreman node. The node was stopped within a > few seconds and the query failed with error: > Error: SYSTEM ERROR: IOException: Filesystem closed > Fragment 4:15 -- This message was sent by Atlassian JIRA (v7.6.3#76005)