[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/921


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-21 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r152361270
  
--- Diff: 
protocol/src/main/java/org/apache/drill/exec/proto/beans/RpcType.java ---
@@ -25,28 +25,8 @@
 HANDSHAKE(0),
 ACK(1),
 GOODBYE(2),
-RUN_QUERY(3),
-CANCEL_QUERY(4),
-REQUEST_RESULTS(5),
-RESUME_PAUSED_QUERY(11),
-GET_QUERY_PLAN_FRAGMENTS(12),
-GET_CATALOGS(14),
-GET_SCHEMAS(15),
-GET_TABLES(16),
-GET_COLUMNS(17),
-CREATE_PREPARED_STATEMENT(22),
-GET_SERVER_META(8),
-QUERY_DATA(6),
-QUERY_HANDLE(7),
-QUERY_PLAN_FRAGMENTS(13),
-CATALOGS(18),
-SCHEMAS(19),
-TABLES(20),
-COLUMNS(21),
-PREPARED_STATEMENT(23),
-SERVER_META(9),
-QUERY_RESULT(10),
-SASL_MESSAGE(24);
+REQ_RECORD_BATCH(3),
+SASL_MESSAGE(4);
--- End diff --

The change seems to be that messages are dropped. That can't be good. The 
only diff that should show up here is the addition of your new state codes.

The other explanation is that master is wrong, which would be a bad state 
of affairs.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-15 Thread dvjyothsna
Github user dvjyothsna commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r151246632
  
--- Diff: 
protocol/src/main/java/org/apache/drill/exec/proto/beans/RpcType.java ---
@@ -25,28 +25,8 @@
 HANDSHAKE(0),
 ACK(1),
 GOODBYE(2),
-RUN_QUERY(3),
-CANCEL_QUERY(4),
-REQUEST_RESULTS(5),
-RESUME_PAUSED_QUERY(11),
-GET_QUERY_PLAN_FRAGMENTS(12),
-GET_CATALOGS(14),
-GET_SCHEMAS(15),
-GET_TABLES(16),
-GET_COLUMNS(17),
-CREATE_PREPARED_STATEMENT(22),
-GET_SERVER_META(8),
-QUERY_DATA(6),
-QUERY_HANDLE(7),
-QUERY_PLAN_FRAGMENTS(13),
-CATALOGS(18),
-SCHEMAS(19),
-TABLES(20),
-COLUMNS(21),
-PREPARED_STATEMENT(23),
-SERVER_META(9),
-QUERY_RESULT(10),
-SASL_MESSAGE(24);
+REQ_RECORD_BATCH(3),
+SASL_MESSAGE(4);
--- End diff --

This is not my change. This file is generated by protobuf. When I looked 
into the issue I found out that user.proto and BitData.proto, Bitcontrol.proto 
have same enum (RpcType) . That is causing the issue.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-15 Thread dvjyothsna
Github user dvjyothsna commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r151201233
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java ---
@@ -165,32 +169,59 @@ public DrillbitContext getContext() {
*
* This is intended to be used by {@link 
org.apache.drill.exec.server.Drillbit#close()}.
*/
-  public void waitToExit() {
+  public void waitToExit(Drillbit bit, boolean forcefulShutdown) {
 synchronized(this) {
-  if (queries.isEmpty() && runningFragments.isEmpty()) {
+  numOfRunningQueries = queries.size();
+  numOfRunningFragments = runningFragments.size();
+  if ( queries.isEmpty() && runningFragments.isEmpty()) {
--- End diff --

We compare if the remaining queries are going down or if new queries are 
added during shutdown.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-15 Thread dvjyothsna
Github user dvjyothsna commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r151201163
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java ---
@@ -165,32 +169,59 @@ public DrillbitContext getContext() {
*
* This is intended to be used by {@link 
org.apache.drill.exec.server.Drillbit#close()}.
*/
-  public void waitToExit() {
+  public void waitToExit(Drillbit bit, boolean forcefulShutdown) {
 synchronized(this) {
-  if (queries.isEmpty() && runningFragments.isEmpty()) {
+  numOfRunningQueries = queries.size();
+  numOfRunningFragments = runningFragments.size();
--- End diff --

We compare if the remaining queries are going down or if new queries are 
added during shutdown.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-15 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r15078
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/DrillRoot.java 
---
@@ -315,7 +481,12 @@ public int compareTo(DrillbitInfo drillbitToCompare) {
 
   if (this.isVersionMatch() == drillbitToCompare.isVersionMatch()) {
 if (this.version.equals(drillbitToCompare.getVersion())) {
-  return this.address.compareTo(drillbitToCompare.getAddress());
+  {
+if(this.address.equals(drillbitToCompare.getAddress())) {
--- End diff --

`if (`.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-15 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r151110780
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/DrillRoot.java 
---
@@ -58,13 +63,170 @@
   @Inject UserAuthEnabled authEnabled;
   @Inject WorkManager work;
   @Inject SecurityContext sc;
+  @Inject Drillbit drillbit;
 
   @GET
   @Produces(MediaType.TEXT_HTML)
   public Viewable getClusterInfo() {
 return ViewableWithPermissions.create(authEnabled.get(), 
"/rest/index.ftl", sc, getClusterInfoJSON());
   }
 
+
+  @SuppressWarnings("resource")
+  @GET
+  @Path("/state")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response getDrillbitStatus(){
+Collection drillbits = 
getClusterInfoJSON().getDrillbits();
+Map drillStatusMap = new HashMap();
+for (DrillbitInfo drillbit : drillbits) {
+  
drillStatusMap.put(drillbit.getAddress()+"-"+drillbit.getUserPort(),drillbit.getState());
+}
+return Response.ok()
+.entity(drillStatusMap)
+.header("Access-Control-Allow-Origin", "*")
+.header("Access-Control-Allow-Methods", "GET, POST, DELETE, 
PUT")
+  .header("Access-Control-Allow-Credentials","true")
+.allow("OPTIONS").build();
+  }
+
+  @SuppressWarnings("resource")
+  @GET
+  @Path("/graceperiod")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Map getGracePeriod(){
+
+final DrillConfig config = work.getContext().getConfig();
+final int gracePeriod = config.getInt(ExecConstants.GRACE_PERIOD);
+Map gracePeriodMap = new HashMap();
+gracePeriodMap.put("graceperiod",gracePeriod);
+return gracePeriodMap;
+  }
+
+  @SuppressWarnings("resource")
+  @GET
+  @Path("/queriesCount")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response getRemainingQueries() {
+Map queriesInfo = new HashMap();
+queriesInfo = work.getRemainingQueries();
+return Response.ok()
+.entity(queriesInfo)
+.header("Access-Control-Allow-Origin", "*")
+.header("Access-Control-Allow-Methods", "GET, POST, DELETE, 
PUT")
+  .header("Access-Control-Allow-Credentials","true")
+.allow("OPTIONS").build();
+  }
+
+  @SuppressWarnings("resource")
+  @POST
+  @Path("/graceful_shutdown")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response shutdownDrillbit() throws Exception {
+Map shutdownInfo = new HashMap();
+try {
+  new Thread(new Runnable() {
+public void run() {
+  try {
+drillbit.close();
+  } catch (Exception e) {
+e.printStackTrace();
--- End diff --

There are at least 7 usages in code all should be removed.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-15 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r151110657
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/DrillRoot.java 
---
@@ -58,13 +63,170 @@
   @Inject UserAuthEnabled authEnabled;
   @Inject WorkManager work;
   @Inject SecurityContext sc;
+  @Inject Drillbit drillbit;
 
   @GET
   @Produces(MediaType.TEXT_HTML)
   public Viewable getClusterInfo() {
 return ViewableWithPermissions.create(authEnabled.get(), 
"/rest/index.ftl", sc, getClusterInfoJSON());
   }
 
+
+  @SuppressWarnings("resource")
+  @GET
+  @Path("/state")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response getDrillbitStatus(){
+Collection drillbits = 
getClusterInfoJSON().getDrillbits();
+Map drillStatusMap = new HashMap();
+for (DrillbitInfo drillbit : drillbits) {
+  
drillStatusMap.put(drillbit.getAddress()+"-"+drillbit.getUserPort(),drillbit.getState());
+}
+return Response.ok()
+.entity(drillStatusMap)
+.header("Access-Control-Allow-Origin", "*")
+.header("Access-Control-Allow-Methods", "GET, POST, DELETE, 
PUT")
+  .header("Access-Control-Allow-Credentials","true")
+.allow("OPTIONS").build();
+  }
+
+  @SuppressWarnings("resource")
+  @GET
+  @Path("/graceperiod")
--- End diff --

/gracePeriod


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-15 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r15410
  
--- Diff: exec/java-exec/src/main/resources/rest/index.ftl ---
@@ -179,6 +184,98 @@
 
   
   
+   

[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r151003788
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/DrillbitContext.java 
---
@@ -157,10 +157,29 @@ public DrillConfig getConfig() {
 return context.getConfig();
   }
 
-  public Collection getBits() {
+  public Collection getAvailableBits() {
 return coord.getAvailableEndpoints();
   }
 
+  public Collection getBits() {
+return coord.getOnlineEndPoints();
+  }
+  
+  public boolean isOnline(DrillbitEndpoint endpoint) { return 
endpoint.getState().equals(DrillbitEndpoint.State.ONLINE); }
+
+  public boolean isForemanOnline() {
+DrillbitEndpoint foreman = getEndpoint();
+Collection dbs = getAvailableBits();
+for( DrillbitEndpoint db : dbs) {
+  if( db.getAddress().equals(foreman.getAddress()) && db.getUserPort() 
== foreman.getUserPort()) {
--- End diff --

Can you please explain the logic in isForemanOnline(). Why do you have to 
get the list of endpoints from ZK and then check for foreman in that list 
before making the isOnline test ? Why can't it be done on the foreman object? 
Is this to make sure that the state is updated in ZK before refusing to take 
queries ?
Why do you assume that the foreman is online if the foreman is not found in 
the list of endPoints? 
i.e. if it is not in the dbs list
why do you return true in that case ?


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread dvjyothsna
Github user dvjyothsna commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150994906
  
--- Diff: exec/java-exec/src/main/resources/rest/index.ftl ---
@@ -57,19 +57,19 @@
   Control Port
   Data Port
   Version
+  Status
 
   
   
 <#assign i = 1>
 <#list model.getDrillbits() as drillbit>
-  
+  
 ${i}
-${drillbit.getAddress()}
-  <#if drillbit.isCurrent()>
+${drillbit.getAddress()}<#if 
drillbit.isCurrent()>
--- End diff --

I have tried that before. But it was giving some alignment issues.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread dvjyothsna
Github user dvjyothsna commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150993718
  
--- Diff: exec/java-exec/src/main/resources/rest/index.ftl ---
@@ -57,19 +57,19 @@
   Control Port
   Data Port
   Version
+  Status
 
   
   
 <#assign i = 1>
 <#list model.getDrillbits() as drillbit>
-  
+  
 ${i}
-${drillbit.getAddress()}
-  <#if drillbit.isCurrent()>
+${drillbit.getAddress()}<#if 
drillbit.isCurrent()>
 Current
   
 
-${drillbit.getUserPort()}
+${drillbit.getUserPort()}
--- End diff --

Yes, true. But for every row I have a different ID ("row-1") and I'm 
accessing "port"(child) of that row.  This was the entire row is unique and 
need not assign  a new unique id for every element in that row. Am I missing 
something here?


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread dvjyothsna
Github user dvjyothsna commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150992676
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java ---
@@ -165,32 +169,59 @@ public DrillbitContext getContext() {
*
* This is intended to be used by {@link 
org.apache.drill.exec.server.Drillbit#close()}.
*/
-  public void waitToExit() {
+  public void waitToExit(Drillbit bit, boolean forcefulShutdown) {
 synchronized(this) {
-  if (queries.isEmpty() && runningFragments.isEmpty()) {
+  numOfRunningQueries = queries.size();
+  numOfRunningFragments = runningFragments.size();
+  if ( queries.isEmpty() && runningFragments.isEmpty()) {
 return;
   }
-
+  logger.info("Draining " + queries +" queries and "+ 
runningFragments+" fragments.");
   exitLatch = new ExtendedLatch();
 }
-
-// Wait for at most 5 seconds or until the latch is released.
-exitLatch.awaitUninterruptibly(5000);
+// Wait uninterruptibly until all the queries and running fragments on 
that drillbit goes down
+// to zero
+if( forcefulShutdown ) {
+  exitLatch.awaitUninterruptibly(5000);
--- End diff --

This was the default one that existed before. That is the reason I didn't 
changed it. 


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150985739
  
--- Diff: 
protocol/src/main/java/org/apache/drill/exec/proto/beans/RpcType.java ---
@@ -25,28 +25,8 @@
 HANDSHAKE(0),
 ACK(1),
 GOODBYE(2),
-RUN_QUERY(3),
-CANCEL_QUERY(4),
-REQUEST_RESULTS(5),
-RESUME_PAUSED_QUERY(11),
-GET_QUERY_PLAN_FRAGMENTS(12),
-GET_CATALOGS(14),
-GET_SCHEMAS(15),
-GET_TABLES(16),
-GET_COLUMNS(17),
-CREATE_PREPARED_STATEMENT(22),
-GET_SERVER_META(8),
-QUERY_DATA(6),
-QUERY_HANDLE(7),
-QUERY_PLAN_FRAGMENTS(13),
-CATALOGS(18),
-SCHEMAS(19),
-TABLES(20),
-COLUMNS(21),
-PREPARED_STATEMENT(23),
-SERVER_META(9),
-QUERY_RESULT(10),
-SASL_MESSAGE(24);
+REQ_RECORD_BATCH(3),
+SASL_MESSAGE(4);
--- End diff --

Is this your change? Why has the list of RPC types changed? Is this a merge 
issue?


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150985401
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/TestGracefulShutdown.java ---
@@ -0,0 +1,248 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+import com.google.common.io.Files;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.proto.CoordinationProtos.DrillbitEndpoint;
+import org.apache.drill.exec.server.Drillbit;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import java.io.File;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.net.HttpURLConnection;
+import java.net.URL;
+import java.util.Collection;
+import java.util.Properties;
+import static org.junit.Assert.assertNotEquals;
+import static org.junit.Assert.fail;
+
+public class TestGracefulShutdown {
+
+  static String testDirPath;
+  @BeforeClass
+  public static void setUpTestData() throws Exception {
+
+final File testDir = getTempDir("graceful_shutdown");
+testDirPath = testDir.getAbsolutePath();
+for( int i = 0; i < 500; i++) {
+  setupFile(testDir, i);
+}
+  }
+
+
+  public static final Properties WEBSERVER_CONFIGURATION = new 
Properties() {
+{
+  put(ExecConstants.HTTP_ENABLE, true);
+  put(ExecConstants.HTTP_PORT_HUNT, true);
+}
+  };
+
+  public ClusterFixtureBuilder enableWebServer(ClusterFixtureBuilder 
builder) {
+Properties props = new Properties();
+props.putAll(WEBSERVER_CONFIGURATION);
+builder.configBuilder.configProps(props);
+return builder;
+  }
+
+
+  /*
+  Start multiple drillbits and then shutdown a drillbit. Query the online
+  endpoints and check if the drillbit still exists.
+   */
+  @Test
+  public void testOnlineEndPoints() throws  Exception {
+
+String[] drillbits = {"db1" ,"db2","db3", "db4", "db5", "db6"};
+ClusterFixtureBuilder builder = 
ClusterFixture.builder().withBits(drillbits).withLocalZk();
+
+
+try ( ClusterFixture cluster = builder.build();
+  ClientFixture client = cluster.clientFixture()) {
+
+  Drillbit drillbit = cluster.drillbit("db2");
+  DrillbitEndpoint drillbitEndpoint =  
drillbit.getRegistrationHandle().getEndPoint();
+  int grace_period = 
drillbit.getContext().getConfig().getInt("drill.exec.grace_period");
+  new Thread(new Runnable() {
+public void run() {
+  try {
+cluster.closeDrillbit("db2");
+  } catch (Exception e) {
+e.printStackTrace();
+  }
+}
+  }).start();
+  //wait for graceperiod
+  Thread.sleep(grace_period);
+  Collection drillbitEndpoints = 
cluster.drillbit().getContext()
+  .getClusterCoordinator()
+  .getOnlineEndPoints();
+  Assert.assertFalse(drillbitEndpoints.contains(drillbitEndpoint));
+}
+  }
+  /*
+Test if the drillbit transitions from ONLINE state when a shutdown
+request is initiated
+   */
+  @Test
+  public void testStateChange() throws  Exception {
+
+String[] drillbits = {"db1" ,"db2", "db3", "db4", "db5", "db6"};
+ClusterFixtureBuilder builder = 
ClusterFixture.builder().withBits(drillbits).withLocalZk();
+
+try ( ClusterFixture cluster = builder.build();
+  ClientFixture client = cluster.clientFixture()) {
+  Drillbit drillbit = cluster.drillbit("db2");
+  int grace_period = 
drillbit.getContext().getConfig().getInt("drill.exec.grace_period");
+  DrillbitEndpoint drillbitEndpoint =  
drillbit.getRegistrationHandle().getEndPoint();
+  new Thread(new Runnable() {
+  

[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150981136
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java ---
@@ -165,32 +169,59 @@ public DrillbitContext getContext() {
*
* This is intended to be used by {@link 
org.apache.drill.exec.server.Drillbit#close()}.
*/
-  public void waitToExit() {
+  public void waitToExit(Drillbit bit, boolean forcefulShutdown) {
 synchronized(this) {
-  if (queries.isEmpty() && runningFragments.isEmpty()) {
+  numOfRunningQueries = queries.size();
+  numOfRunningFragments = runningFragments.size();
+  if ( queries.isEmpty() && runningFragments.isEmpty()) {
 return;
   }
-
+  logger.info("Draining " + queries +" queries and "+ 
runningFragments+" fragments.");
   exitLatch = new ExtendedLatch();
 }
-
-// Wait for at most 5 seconds or until the latch is released.
-exitLatch.awaitUninterruptibly(5000);
+// Wait uninterruptibly until all the queries and running fragments on 
that drillbit goes down
+// to zero
+if( forcefulShutdown ) {
+  exitLatch.awaitUninterruptibly(5000);
+} else {
+  exitLatch.awaitUninterruptibly();
+}
   }
 
   /**
* If it is safe to exit, and the exitLatch is in use, signals it so 
that waitToExit() will
-   * unblock.
+   * unblock. Logs the number of pending fragments and queries that are 
running on that
+   * drillbit to track the progress of shutdown process.
*/
   private void indicateIfSafeToExit() {
 synchronized(this) {
   if (exitLatch != null) {
+logger.info("Waiting for "+ queries.size() +" queries to complete 
before shutting down");
+logger.info("Waiting for "+ runningFragments.size() +" running 
fragments to complete before shutting down");
+if(runningFragments.size() > numOfRunningFragments|| 
queries.size() > numOfRunningQueries) {
+  logger.info("New Fragments or queries are added while drillbit 
is Shutting down");
+}
 if (queries.isEmpty() && runningFragments.isEmpty()) {
+  // Both Queries and Running fragments are empty.
+  // So its safe for the drillbit to exit.
--- End diff --

As it turns out @ilooner is making significant changes to this area. Can 
you two coordinate?

We just need the @dvjyothsna is making to work long enough for the @ilooner 
changes replace them. But, Tim's changes need to support this new graceful 
shutdown model.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150984396
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
@@ -471,6 +471,21 @@ public void close() throws Exception {
   }
 
   /**
+   * Shutdown the drillbit given the name of the drillbit.
+   */
+  public void closeDrillbit(final String drillbitName) throws Exception {
+Exception ex = null;
+for (Drillbit bit : drillbits()) {
+  if(bit.equals(bits.get(drillbitName))) {
+bit.close();
+  }
+}
+if(ex != null) {
--- End diff --

Looks like `ex` is declared, but never set. Was the goal to capture 
exceptions from `bit.close()`?


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150984962
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/TestGracefulShutdown.java ---
@@ -0,0 +1,248 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+import com.google.common.io.Files;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.proto.CoordinationProtos.DrillbitEndpoint;
+import org.apache.drill.exec.server.Drillbit;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import java.io.File;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.net.HttpURLConnection;
+import java.net.URL;
+import java.util.Collection;
+import java.util.Properties;
+import static org.junit.Assert.assertNotEquals;
+import static org.junit.Assert.fail;
+
+public class TestGracefulShutdown {
+
+  static String testDirPath;
+  @BeforeClass
+  public static void setUpTestData() throws Exception {
+
+final File testDir = getTempDir("graceful_shutdown");
+testDirPath = testDir.getAbsolutePath();
+for( int i = 0; i < 500; i++) {
+  setupFile(testDir, i);
+}
+  }
+
+
+  public static final Properties WEBSERVER_CONFIGURATION = new 
Properties() {
+{
+  put(ExecConstants.HTTP_ENABLE, true);
+  put(ExecConstants.HTTP_PORT_HUNT, true);
+}
+  };
+
+  public ClusterFixtureBuilder enableWebServer(ClusterFixtureBuilder 
builder) {
+Properties props = new Properties();
+props.putAll(WEBSERVER_CONFIGURATION);
+builder.configBuilder.configProps(props);
+return builder;
+  }
+
+
+  /*
+  Start multiple drillbits and then shutdown a drillbit. Query the online
+  endpoints and check if the drillbit still exists.
+   */
+  @Test
+  public void testOnlineEndPoints() throws  Exception {
+
+String[] drillbits = {"db1" ,"db2","db3", "db4", "db5", "db6"};
+ClusterFixtureBuilder builder = 
ClusterFixture.builder().withBits(drillbits).withLocalZk();
+
+
+try ( ClusterFixture cluster = builder.build();
+  ClientFixture client = cluster.clientFixture()) {
+
+  Drillbit drillbit = cluster.drillbit("db2");
+  DrillbitEndpoint drillbitEndpoint =  
drillbit.getRegistrationHandle().getEndPoint();
+  int grace_period = 
drillbit.getContext().getConfig().getInt("drill.exec.grace_period");
+  new Thread(new Runnable() {
+public void run() {
+  try {
+cluster.closeDrillbit("db2");
+  } catch (Exception e) {
+e.printStackTrace();
+  }
+}
+  }).start();
+  //wait for graceperiod
+  Thread.sleep(grace_period);
+  Collection drillbitEndpoints = 
cluster.drillbit().getContext()
+  .getClusterCoordinator()
+  .getOnlineEndPoints();
+  Assert.assertFalse(drillbitEndpoints.contains(drillbitEndpoint));
+}
+  }
+  /*
+Test if the drillbit transitions from ONLINE state when a shutdown
+request is initiated
+   */
+  @Test
+  public void testStateChange() throws  Exception {
+
+String[] drillbits = {"db1" ,"db2", "db3", "db4", "db5", "db6"};
+ClusterFixtureBuilder builder = 
ClusterFixture.builder().withBits(drillbits).withLocalZk();
+
+try ( ClusterFixture cluster = builder.build();
+  ClientFixture client = cluster.clientFixture()) {
+  Drillbit drillbit = cluster.drillbit("db2");
+  int grace_period = 
drillbit.getContext().getConfig().getInt("drill.exec.grace_period");
+  DrillbitEndpoint drillbitEndpoint =  
drillbit.getRegistrationHandle().getEndPoint();
+  new Thread(new Runnable() {
+  

[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150985212
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/TestGracefulShutdown.java ---
@@ -0,0 +1,248 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+import com.google.common.io.Files;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.proto.CoordinationProtos.DrillbitEndpoint;
+import org.apache.drill.exec.server.Drillbit;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import java.io.File;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.net.HttpURLConnection;
+import java.net.URL;
+import java.util.Collection;
+import java.util.Properties;
+import static org.junit.Assert.assertNotEquals;
+import static org.junit.Assert.fail;
+
+public class TestGracefulShutdown {
+
+  static String testDirPath;
+  @BeforeClass
+  public static void setUpTestData() throws Exception {
+
+final File testDir = getTempDir("graceful_shutdown");
+testDirPath = testDir.getAbsolutePath();
+for( int i = 0; i < 500; i++) {
+  setupFile(testDir, i);
+}
+  }
+
+
+  public static final Properties WEBSERVER_CONFIGURATION = new 
Properties() {
+{
+  put(ExecConstants.HTTP_ENABLE, true);
+  put(ExecConstants.HTTP_PORT_HUNT, true);
+}
+  };
+
+  public ClusterFixtureBuilder enableWebServer(ClusterFixtureBuilder 
builder) {
+Properties props = new Properties();
+props.putAll(WEBSERVER_CONFIGURATION);
+builder.configBuilder.configProps(props);
+return builder;
+  }
+
+
+  /*
+  Start multiple drillbits and then shutdown a drillbit. Query the online
+  endpoints and check if the drillbit still exists.
+   */
+  @Test
+  public void testOnlineEndPoints() throws  Exception {
+
+String[] drillbits = {"db1" ,"db2","db3", "db4", "db5", "db6"};
+ClusterFixtureBuilder builder = 
ClusterFixture.builder().withBits(drillbits).withLocalZk();
+
+
+try ( ClusterFixture cluster = builder.build();
+  ClientFixture client = cluster.clientFixture()) {
+
+  Drillbit drillbit = cluster.drillbit("db2");
+  DrillbitEndpoint drillbitEndpoint =  
drillbit.getRegistrationHandle().getEndPoint();
+  int grace_period = 
drillbit.getContext().getConfig().getInt("drill.exec.grace_period");
+  new Thread(new Runnable() {
+public void run() {
+  try {
+cluster.closeDrillbit("db2");
+  } catch (Exception e) {
+e.printStackTrace();
+  }
+}
+  }).start();
+  //wait for graceperiod
+  Thread.sleep(grace_period);
+  Collection drillbitEndpoints = 
cluster.drillbit().getContext()
+  .getClusterCoordinator()
+  .getOnlineEndPoints();
+  Assert.assertFalse(drillbitEndpoints.contains(drillbitEndpoint));
+}
+  }
+  /*
+Test if the drillbit transitions from ONLINE state when a shutdown
+request is initiated
+   */
+  @Test
+  public void testStateChange() throws  Exception {
+
+String[] drillbits = {"db1" ,"db2", "db3", "db4", "db5", "db6"};
+ClusterFixtureBuilder builder = 
ClusterFixture.builder().withBits(drillbits).withLocalZk();
+
+try ( ClusterFixture cluster = builder.build();
+  ClientFixture client = cluster.clientFixture()) {
+  Drillbit drillbit = cluster.drillbit("db2");
+  int grace_period = 
drillbit.getContext().getConfig().getInt("drill.exec.grace_period");
+  DrillbitEndpoint drillbitEndpoint =  
drillbit.getRegistrationHandle().getEndPoint();
+  new Thread(new Runnable() {
+  

[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150983985
  
--- Diff: exec/java-exec/src/main/resources/rest/index.ftl ---
@@ -78,6 +78,11 @@
 ${drillbit.getVersion()}
   
 
+${drillbit.getState()}
+
+ SHUTDOWN 
+
+  
--- End diff --

Again.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150982788
  
--- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
@@ -368,6 +368,13 @@ drill.exec: {
 // planning and managing queries. Primarily for testing.
 cpus_per_node: 0,
   }
+  # Grace period is the amount of time where the drillbit accepts work 
after
+  # the shutdown request is triggered. The primary use of grace period is 
to
+  # avoid the race conditions caused by zookeeper delay in updating the 
state
+  # information of the drillbit that is shutting down. So, it is advisable
+  # to have a grace period that is atleast twice the amount of zookeeper
+  # refresh time.
+  grace_period : 1
--- End diff --

Units? Just add a note to the end of your comment to say the units are ms. 
In the future, I've found it to be handy to put the units in the name: 
`grace_period_ms` for example.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150976659
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/DrillbitStateManager.java
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.server;
+/*
+  State manager to manage the state of drillbit.
+ */
+public class DrillbitStateManager {
+
+
+  public DrillbitStateManager(DrillbitState currentState) {
+this.currentState = currentState;
+  }
+
+  public enum DrillbitState {
+STARTUP, ONLINE, GRACE, DRAINING, OFFLINE, SHUTDOWN
+  }
+
+  public DrillbitState getState() {
+return currentState;
+  }
+
+  private DrillbitState currentState;
+  public void setState(DrillbitState state) {
+switch (state) {
+  case ONLINE:
+if (currentState == DrillbitState.STARTUP) {
+  currentState = state;
+} else {
+  throw new IllegalStateException("Cannot set drillbit to" + state 
+ "from" + currentState);
--- End diff --

To avoid redundant code and to catch new states:

```
setState(State newState) {
  currentState = transitionTo(newState);
}

State transitionTo(State newState) {
  switch (newState) {
case ONLINE:
  if (currentState == DrillbitState.STARTUP) {
return newState;
  }
  break;
...
default:
  break;
}
  throw new IllegalStateException("Cannot set drillbit to" + state + "from" 
+ currentState);
```

An alternative is to switch on current state to say, essentially "if we are 
in state X, we allow transition to states Y and Z." But, that may be overkill 
here.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150975172
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/DrillbitStateManager.java
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.server;
+/*
+  State manager to manage the state of drillbit.
+ */
+public class DrillbitStateManager {
+
+
+  public DrillbitStateManager(DrillbitState currentState) {
+this.currentState = currentState;
+  }
+
+  public enum DrillbitState {
+STARTUP, ONLINE, GRACE, DRAINING, OFFLINE, SHUTDOWN
+  }
--- End diff --

Please move enum before constructor. General order that I observe in Drill 
is:

* Nested interfaces, classes or enums.
* Static fields
* Instance fields
* Constructors
* Static "builder" or "factory" methods
* Other methods


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150979320
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebServer.java 
---
@@ -114,11 +117,12 @@
* @param context Bootstrap context.
* @param workManager WorkManager instance.
*/
-  public WebServer(final BootStrapContext context, final WorkManager 
workManager) {
+  public WebServer(final BootStrapContext context, final WorkManager 
workManager, final Drillbit drillbit) {
--- End diff --

Do we really want to pass the entire Drillbit to the web server? Seems like 
this creates a coupling that is too tight.

What services are needed? Only the shutdown? Something else?

Can we define an interface just for those services, and let `Drillbit` 
implement that interface to avoid the tight coupling?


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150978597
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/DrillRoot.java 
---
@@ -58,13 +63,170 @@
   @Inject UserAuthEnabled authEnabled;
   @Inject WorkManager work;
   @Inject SecurityContext sc;
+  @Inject Drillbit drillbit;
 
   @GET
   @Produces(MediaType.TEXT_HTML)
   public Viewable getClusterInfo() {
 return ViewableWithPermissions.create(authEnabled.get(), 
"/rest/index.ftl", sc, getClusterInfoJSON());
   }
 
+
+  @SuppressWarnings("resource")
+  @GET
+  @Path("/state")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response getDrillbitStatus(){
+Collection drillbits = 
getClusterInfoJSON().getDrillbits();
+Map drillStatusMap = new HashMap();
+for (DrillbitInfo drillbit : drillbits) {
+  
drillStatusMap.put(drillbit.getAddress()+"-"+drillbit.getUserPort(),drillbit.getState());
+}
+return Response.ok()
+.entity(drillStatusMap)
+.header("Access-Control-Allow-Origin", "*")
+.header("Access-Control-Allow-Methods", "GET, POST, DELETE, 
PUT")
+  .header("Access-Control-Allow-Credentials","true")
+.allow("OPTIONS").build();
+  }
+
+  @SuppressWarnings("resource")
+  @GET
+  @Path("/graceperiod")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Map getGracePeriod(){
+
+final DrillConfig config = work.getContext().getConfig();
+final int gracePeriod = config.getInt(ExecConstants.GRACE_PERIOD);
+Map gracePeriodMap = new HashMap();
+gracePeriodMap.put("graceperiod",gracePeriod);
+return gracePeriodMap;
+  }
+
+  @SuppressWarnings("resource")
+  @GET
+  @Path("/queriesCount")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response getRemainingQueries() {
+Map queriesInfo = new HashMap();
+queriesInfo = work.getRemainingQueries();
+return Response.ok()
+.entity(queriesInfo)
+.header("Access-Control-Allow-Origin", "*")
+.header("Access-Control-Allow-Methods", "GET, POST, DELETE, 
PUT")
+  .header("Access-Control-Allow-Credentials","true")
+.allow("OPTIONS").build();
+  }
+
+  @SuppressWarnings("resource")
+  @POST
+  @Path("/graceful_shutdown")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response shutdownDrillbit() throws Exception {
+Map shutdownInfo = new HashMap();
+try {
+  new Thread(new Runnable() {
+public void run() {
+  try {
+drillbit.close();
+  } catch (Exception e) {
+e.printStackTrace();
+  }
+}
+  }).start();
+  shutdownInfo.put("response", "Shutdown request is triggered");
+  return Response.ok()
+  .entity(shutdownInfo)
+  .header("Access-Control-Allow-Origin", "*")
+  .header("Access-Control-Allow-Methods", "GET, POST, DELETE, 
PUT")
+  .header("Access-Control-Allow-Credentials","true")
+  .allow("OPTIONS").build();
+}
+catch (Exception e) {
+  logger.debug("Exception in triggering shutdown request",e);
+  shutdownInfo.put("response", "Error in triggering shutdown request");
+  return Response.ok()
+  .entity(shutdownInfo)
+  .header("Access-Control-Allow-Origin", "*")
+  .header("Access-Control-Allow-Methods", "GET, POST, DELETE, 
PUT")
+  .header("Access-Control-Allow-Credentials","true")
+  .allow("OPTIONS").build();
+}
+  }
+
+  @SuppressWarnings("resource")
+  @POST
+  @Path("/shutdown")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response ShutdownForcefully() throws Exception {
+Map shutdownInfo = new HashMap();
+try {
+  new Thread(new Runnable() {
+public void run() {
+  try {
+drillbit.setForcefulShutdown(true);
+drillbit.close();
+  } catch (Exception e) {
+e.printStackTrace();
+  }
+}
+  }).start();
+  shutdownInfo.put("response", "Forceful shutdown request is 
triggered");
+  return Response.ok()
+  .entity(shutdownInfo)
+  .header("Access-Control-Allow-Origin", "*")
+  .header("Access-Control-Allow-Methods", "GET, POST, DELETE, 
PUT")
+ 

[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150977759
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/DrillRoot.java 
---
@@ -58,13 +63,170 @@
   @Inject UserAuthEnabled authEnabled;
   @Inject WorkManager work;
   @Inject SecurityContext sc;
+  @Inject Drillbit drillbit;
 
   @GET
   @Produces(MediaType.TEXT_HTML)
   public Viewable getClusterInfo() {
 return ViewableWithPermissions.create(authEnabled.get(), 
"/rest/index.ftl", sc, getClusterInfoJSON());
   }
 
+
+  @SuppressWarnings("resource")
+  @GET
+  @Path("/state")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response getDrillbitStatus(){
+Collection drillbits = 
getClusterInfoJSON().getDrillbits();
+Map drillStatusMap = new HashMap();
+for (DrillbitInfo drillbit : drillbits) {
+  
drillStatusMap.put(drillbit.getAddress()+"-"+drillbit.getUserPort(),drillbit.getState());
--- End diff --

Small point: spaces around `+`


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150973540
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ---
@@ -176,18 +200,37 @@ public void run() throws Exception {
 logger.info("Startup completed ({} ms).", 
w.elapsed(TimeUnit.MILLISECONDS));
   }
 
+  /*
+Wait uninterruptibly
+   */
+  public void waitForGracePeriod() {
+ExtendedLatch exitLatch = null;
+exitLatch = new ExtendedLatch();
+exitLatch.awaitUninterruptibly(gracePeriod);
--- End diff --

I wonder about synchronization. This latch is private, so never triggered, 
except in a timeout. How is this different than `Thread.sleep()`, other than 
blocking interrupts? If so, maybe a comment to this effect.

Note also that this is called from inside a synchronized block. Just want 
to verify that this is the intent.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150978981
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/DrillRoot.java 
---
@@ -315,7 +481,12 @@ public int compareTo(DrillbitInfo drillbitToCompare) {
 
   if (this.isVersionMatch() == drillbitToCompare.isVersionMatch()) {
 if (this.version.equals(drillbitToCompare.getVersion())) {
-  return this.address.compareTo(drillbitToCompare.getAddress());
+  {
+if(this.address.equals(drillbitToCompare.getAddress())) {
+  return 
(this.controlPort.compareTo(drillbitToCompare.getControlPort()));
--- End diff --

Repeat of the `isSameDrillbit()` pattern noted earlier.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150984759
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/TestGracefulShutdown.java ---
@@ -0,0 +1,248 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+import com.google.common.io.Files;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.proto.CoordinationProtos.DrillbitEndpoint;
+import org.apache.drill.exec.server.Drillbit;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import java.io.File;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.net.HttpURLConnection;
+import java.net.URL;
+import java.util.Collection;
+import java.util.Properties;
+import static org.junit.Assert.assertNotEquals;
+import static org.junit.Assert.fail;
+
+public class TestGracefulShutdown {
+
+  static String testDirPath;
+  @BeforeClass
+  public static void setUpTestData() throws Exception {
+
+final File testDir = getTempDir("graceful_shutdown");
--- End diff --

Will want to change this to integated with the temp dir changes that 
@ilooner is making for Drill 1.12.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150978095
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/DrillRoot.java 
---
@@ -58,13 +63,170 @@
   @Inject UserAuthEnabled authEnabled;
   @Inject WorkManager work;
   @Inject SecurityContext sc;
+  @Inject Drillbit drillbit;
 
   @GET
   @Produces(MediaType.TEXT_HTML)
   public Viewable getClusterInfo() {
 return ViewableWithPermissions.create(authEnabled.get(), 
"/rest/index.ftl", sc, getClusterInfoJSON());
   }
 
+
+  @SuppressWarnings("resource")
+  @GET
+  @Path("/state")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response getDrillbitStatus(){
+Collection drillbits = 
getClusterInfoJSON().getDrillbits();
+Map drillStatusMap = new HashMap();
+for (DrillbitInfo drillbit : drillbits) {
+  
drillStatusMap.put(drillbit.getAddress()+"-"+drillbit.getUserPort(),drillbit.getState());
+}
+return Response.ok()
+.entity(drillStatusMap)
+.header("Access-Control-Allow-Origin", "*")
+.header("Access-Control-Allow-Methods", "GET, POST, DELETE, 
PUT")
+  .header("Access-Control-Allow-Credentials","true")
+.allow("OPTIONS").build();
+  }
+
+  @SuppressWarnings("resource")
+  @GET
+  @Path("/graceperiod")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Map getGracePeriod(){
+
+final DrillConfig config = work.getContext().getConfig();
+final int gracePeriod = config.getInt(ExecConstants.GRACE_PERIOD);
+Map gracePeriodMap = new HashMap();
+gracePeriodMap.put("graceperiod",gracePeriod);
+return gracePeriodMap;
+  }
+
+  @SuppressWarnings("resource")
+  @GET
+  @Path("/queriesCount")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response getRemainingQueries() {
+Map queriesInfo = new HashMap();
+queriesInfo = work.getRemainingQueries();
+return Response.ok()
+.entity(queriesInfo)
+.header("Access-Control-Allow-Origin", "*")
+.header("Access-Control-Allow-Methods", "GET, POST, DELETE, 
PUT")
+  .header("Access-Control-Allow-Credentials","true")
+.allow("OPTIONS").build();
+  }
+
+  @SuppressWarnings("resource")
+  @POST
+  @Path("/graceful_shutdown")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response shutdownDrillbit() throws Exception {
+Map shutdownInfo = new HashMap();
+try {
+  new Thread(new Runnable() {
+public void run() {
+  try {
+drillbit.close();
+  } catch (Exception e) {
+e.printStackTrace();
--- End diff --

Printing the stack trace is OK for debugging. Fo production, we need to log:

```
logger.error("Web request to shutdown drillbit failed", e);
```


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150984249
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
@@ -471,6 +471,21 @@ public void close() throws Exception {
   }
 
   /**
+   * Shutdown the drillbit given the name of the drillbit.
+   */
+  public void closeDrillbit(final String drillbitName) throws Exception {
+Exception ex = null;
+for (Drillbit bit : drillbits()) {
+  if(bit.equals(bits.get(drillbitName))) {
--- End diff --

Space after `if`, here and below.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150972227
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ---
@@ -77,6 +80,24 @@
   private final WorkManager manager;
   private final BootStrapContext context;
   private final WebServer webServer;
+  private final int gracePeriod;
+  private DrillbitStateManager stateManager;
+
+  public void setQuiescentMode(boolean quiescentMode) {
+this.quiescentMode = quiescentMode;
+  }
+
+  private boolean quiescentMode;
+
+  public void setForcefulShutdown(boolean forcefulShutdown) {
+this.forcefulShutdown = forcefulShutdown;
+  }
+
+  private boolean forcefulShutdown = false;
--- End diff --

Same.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150980131
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java ---
@@ -165,32 +169,59 @@ public DrillbitContext getContext() {
*
* This is intended to be used by {@link 
org.apache.drill.exec.server.Drillbit#close()}.
*/
-  public void waitToExit() {
+  public void waitToExit(Drillbit bit, boolean forcefulShutdown) {
 synchronized(this) {
-  if (queries.isEmpty() && runningFragments.isEmpty()) {
+  numOfRunningQueries = queries.size();
+  numOfRunningFragments = runningFragments.size();
--- End diff --

Here we update the counts inside a synchronized block. Do we access them 
outside the block? How do we synchronize reads? Or, can these be local 
variables inside this method?

Or, should we have an `updateQueryCount()` method that handles the updates? 
That method would be synchronized, maybe?


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150981324
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java ---
@@ -165,32 +169,59 @@ public DrillbitContext getContext() {
*
* This is intended to be used by {@link 
org.apache.drill.exec.server.Drillbit#close()}.
*/
-  public void waitToExit() {
+  public void waitToExit(Drillbit bit, boolean forcefulShutdown) {
 synchronized(this) {
-  if (queries.isEmpty() && runningFragments.isEmpty()) {
+  numOfRunningQueries = queries.size();
+  numOfRunningFragments = runningFragments.size();
+  if ( queries.isEmpty() && runningFragments.isEmpty()) {
 return;
   }
-
+  logger.info("Draining " + queries +" queries and "+ 
runningFragments+" fragments.");
   exitLatch = new ExtendedLatch();
 }
-
-// Wait for at most 5 seconds or until the latch is released.
-exitLatch.awaitUninterruptibly(5000);
+// Wait uninterruptibly until all the queries and running fragments on 
that drillbit goes down
+// to zero
+if( forcefulShutdown ) {
+  exitLatch.awaitUninterruptibly(5000);
+} else {
+  exitLatch.awaitUninterruptibly();
+}
   }
 
   /**
* If it is safe to exit, and the exitLatch is in use, signals it so 
that waitToExit() will
-   * unblock.
+   * unblock. Logs the number of pending fragments and queries that are 
running on that
+   * drillbit to track the progress of shutdown process.
*/
   private void indicateIfSafeToExit() {
 synchronized(this) {
   if (exitLatch != null) {
+logger.info("Waiting for "+ queries.size() +" queries to complete 
before shutting down");
+logger.info("Waiting for "+ runningFragments.size() +" running 
fragments to complete before shutting down");
+if(runningFragments.size() > numOfRunningFragments|| 
queries.size() > numOfRunningQueries) {
+  logger.info("New Fragments or queries are added while drillbit 
is Shutting down");
+}
 if (queries.isEmpty() && runningFragments.isEmpty()) {
+  // Both Queries and Running fragments are empty.
+  // So its safe for the drillbit to exit.
   exitLatch.countDown();
 }
   }
 }
   }
+  /**
+   *  Get the number of queries that are running on a drillbit.
+   *  Primarily used to monitor the number of running queries after a
+   *  shutdown request is triggered.
+   */
+  public Map getRemainingQueries() {
+synchronized (this) {
--- End diff --

Since the entire body is synchronized, put the `synchronized` on the method 
itself.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150974512
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/DrillbitContext.java 
---
@@ -157,10 +157,29 @@ public DrillConfig getConfig() {
 return context.getConfig();
   }
 
-  public Collection getBits() {
+  public Collection getAvailableBits() {
 return coord.getAvailableEndpoints();
   }
 
+  public Collection getBits() {
+return coord.getOnlineEndPoints();
+  }
+  
+  public boolean isOnline(DrillbitEndpoint endpoint) { return 
endpoint.getState().equals(DrillbitEndpoint.State.ONLINE); }
+
+  public boolean isForemanOnline() {
+DrillbitEndpoint foreman = getEndpoint();
+Collection dbs = getAvailableBits();
+for( DrillbitEndpoint db : dbs) {
+  if( db.getAddress().equals(foreman.getAddress()) && db.getUserPort() 
== foreman.getUserPort()) {
+if( !isOnline(db)) {
--- End diff --

`if( !isOnline...` --> `if (! isOnline...`


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150980543
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java ---
@@ -165,32 +169,59 @@ public DrillbitContext getContext() {
*
* This is intended to be used by {@link 
org.apache.drill.exec.server.Drillbit#close()}.
*/
-  public void waitToExit() {
+  public void waitToExit(Drillbit bit, boolean forcefulShutdown) {
 synchronized(this) {
-  if (queries.isEmpty() && runningFragments.isEmpty()) {
+  numOfRunningQueries = queries.size();
+  numOfRunningFragments = runningFragments.size();
+  if ( queries.isEmpty() && runningFragments.isEmpty()) {
 return;
   }
-
+  logger.info("Draining " + queries +" queries and "+ 
runningFragments+" fragments.");
   exitLatch = new ExtendedLatch();
 }
-
-// Wait for at most 5 seconds or until the latch is released.
-exitLatch.awaitUninterruptibly(5000);
+// Wait uninterruptibly until all the queries and running fragments on 
that drillbit goes down
+// to zero
+if( forcefulShutdown ) {
+  exitLatch.awaitUninterruptibly(5000);
--- End diff --

Why 5000? I guess that is the old magic number. Perhaps move this to a 
constant:

```
private final int FORCEFUL_SHUTDOWN_GRACE_MS = 5000;
```


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150983656
  
--- Diff: exec/java-exec/src/main/resources/rest/index.ftl ---
@@ -57,19 +57,19 @@
   Control Port
   Data Port
   Version
+  Status
 
   
   
 <#assign i = 1>
 <#list model.getDrillbits() as drillbit>
-  
+  
 ${i}
-${drillbit.getAddress()}
-  <#if drillbit.isCurrent()>
+${drillbit.getAddress()}<#if 
drillbit.isCurrent()>
 Current
   
 
-${drillbit.getUserPort()}
+${drillbit.getUserPort()}
--- End diff --

Can't us an id here; this is a list of many Drillbits and ids in HTML must 
be unique (that is, after all, what "identifier" means...)

If this is form formatting, use a class. If for identification, append a 
suffix: "id-1" or "id-xxx-xxx-xxx-xxx:xx".


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150978855
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/DrillRoot.java 
---
@@ -58,13 +63,170 @@
   @Inject UserAuthEnabled authEnabled;
   @Inject WorkManager work;
   @Inject SecurityContext sc;
+  @Inject Drillbit drillbit;
 
   @GET
   @Produces(MediaType.TEXT_HTML)
   public Viewable getClusterInfo() {
 return ViewableWithPermissions.create(authEnabled.get(), 
"/rest/index.ftl", sc, getClusterInfoJSON());
   }
 
+
+  @SuppressWarnings("resource")
+  @GET
+  @Path("/state")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response getDrillbitStatus(){
+Collection drillbits = 
getClusterInfoJSON().getDrillbits();
+Map drillStatusMap = new HashMap();
+for (DrillbitInfo drillbit : drillbits) {
+  
drillStatusMap.put(drillbit.getAddress()+"-"+drillbit.getUserPort(),drillbit.getState());
+}
+return Response.ok()
+.entity(drillStatusMap)
+.header("Access-Control-Allow-Origin", "*")
+.header("Access-Control-Allow-Methods", "GET, POST, DELETE, 
PUT")
+  .header("Access-Control-Allow-Credentials","true")
+.allow("OPTIONS").build();
+  }
+
+  @SuppressWarnings("resource")
+  @GET
+  @Path("/graceperiod")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Map getGracePeriod(){
+
+final DrillConfig config = work.getContext().getConfig();
+final int gracePeriod = config.getInt(ExecConstants.GRACE_PERIOD);
+Map gracePeriodMap = new HashMap();
+gracePeriodMap.put("graceperiod",gracePeriod);
+return gracePeriodMap;
+  }
+
+  @SuppressWarnings("resource")
+  @GET
+  @Path("/queriesCount")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response getRemainingQueries() {
+Map queriesInfo = new HashMap();
+queriesInfo = work.getRemainingQueries();
+return Response.ok()
+.entity(queriesInfo)
+.header("Access-Control-Allow-Origin", "*")
+.header("Access-Control-Allow-Methods", "GET, POST, DELETE, 
PUT")
+  .header("Access-Control-Allow-Credentials","true")
+.allow("OPTIONS").build();
+  }
+
+  @SuppressWarnings("resource")
+  @POST
+  @Path("/graceful_shutdown")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response shutdownDrillbit() throws Exception {
+Map shutdownInfo = new HashMap();
+try {
+  new Thread(new Runnable() {
+public void run() {
+  try {
+drillbit.close();
+  } catch (Exception e) {
+e.printStackTrace();
+  }
+}
+  }).start();
+  shutdownInfo.put("response", "Shutdown request is triggered");
+  return Response.ok()
+  .entity(shutdownInfo)
+  .header("Access-Control-Allow-Origin", "*")
+  .header("Access-Control-Allow-Methods", "GET, POST, DELETE, 
PUT")
+  .header("Access-Control-Allow-Credentials","true")
+  .allow("OPTIONS").build();
+}
+catch (Exception e) {
+  logger.debug("Exception in triggering shutdown request",e);
+  shutdownInfo.put("response", "Error in triggering shutdown request");
+  return Response.ok()
+  .entity(shutdownInfo)
+  .header("Access-Control-Allow-Origin", "*")
+  .header("Access-Control-Allow-Methods", "GET, POST, DELETE, 
PUT")
+  .header("Access-Control-Allow-Credentials","true")
+  .allow("OPTIONS").build();
+}
+  }
+
+  @SuppressWarnings("resource")
+  @POST
+  @Path("/shutdown")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response ShutdownForcefully() throws Exception {
+Map shutdownInfo = new HashMap();
+try {
+  new Thread(new Runnable() {
+public void run() {
+  try {
+drillbit.setForcefulShutdown(true);
+drillbit.close();
+  } catch (Exception e) {
+e.printStackTrace();
+  }
+}
+  }).start();
+  shutdownInfo.put("response", "Forceful shutdown request is 
triggered");
+  return Response.ok()
+  .entity(shutdownInfo)
+  .header("Access-Control-Allow-Origin", "*")
+  .header("Access-Control-Allow-Methods", "GET, POST, DELETE, 
PUT")
+ 

[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150974066
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/DrillbitContext.java 
---
@@ -157,10 +157,29 @@ public DrillConfig getConfig() {
 return context.getConfig();
   }
 
-  public Collection getBits() {
+  public Collection getAvailableBits() {
 return coord.getAvailableEndpoints();
   }
 
+  public Collection getBits() {
+return coord.getOnlineEndPoints();
+  }
+  
+  public boolean isOnline(DrillbitEndpoint endpoint) { return 
endpoint.getState().equals(DrillbitEndpoint.State.ONLINE); }
+
+  public boolean isForemanOnline() {
+DrillbitEndpoint foreman = getEndpoint();
+Collection dbs = getAvailableBits();
+for( DrillbitEndpoint db : dbs) {
+  if( db.getAddress().equals(foreman.getAddress()) && db.getUserPort() 
== foreman.getUserPort()) {
--- End diff --

`if( db...` --> `if (db...`

Also, `for( Drillbit...` --> `for (Drillbit...`


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150972745
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ---
@@ -176,18 +200,37 @@ public void run() throws Exception {
 logger.info("Startup completed ({} ms).", 
w.elapsed(TimeUnit.MILLISECONDS));
   }
 
+  /*
+Wait uninterruptibly
+   */
+  public void waitForGracePeriod() {
+ExtendedLatch exitLatch = null;
+exitLatch = new ExtendedLatch();
--- End diff --

```
ExtendedLatch exitLatch = new ExtendedLatch();
```


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150978228
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/DrillRoot.java 
---
@@ -58,13 +63,170 @@
   @Inject UserAuthEnabled authEnabled;
   @Inject WorkManager work;
   @Inject SecurityContext sc;
+  @Inject Drillbit drillbit;
 
   @GET
   @Produces(MediaType.TEXT_HTML)
   public Viewable getClusterInfo() {
 return ViewableWithPermissions.create(authEnabled.get(), 
"/rest/index.ftl", sc, getClusterInfoJSON());
   }
 
+
+  @SuppressWarnings("resource")
+  @GET
+  @Path("/state")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response getDrillbitStatus(){
+Collection drillbits = 
getClusterInfoJSON().getDrillbits();
+Map drillStatusMap = new HashMap();
+for (DrillbitInfo drillbit : drillbits) {
+  
drillStatusMap.put(drillbit.getAddress()+"-"+drillbit.getUserPort(),drillbit.getState());
+}
+return Response.ok()
+.entity(drillStatusMap)
+.header("Access-Control-Allow-Origin", "*")
+.header("Access-Control-Allow-Methods", "GET, POST, DELETE, 
PUT")
+  .header("Access-Control-Allow-Credentials","true")
+.allow("OPTIONS").build();
+  }
+
+  @SuppressWarnings("resource")
+  @GET
+  @Path("/graceperiod")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Map getGracePeriod(){
+
+final DrillConfig config = work.getContext().getConfig();
+final int gracePeriod = config.getInt(ExecConstants.GRACE_PERIOD);
+Map gracePeriodMap = new HashMap();
+gracePeriodMap.put("graceperiod",gracePeriod);
+return gracePeriodMap;
+  }
+
+  @SuppressWarnings("resource")
+  @GET
+  @Path("/queriesCount")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response getRemainingQueries() {
+Map queriesInfo = new HashMap();
+queriesInfo = work.getRemainingQueries();
+return Response.ok()
+.entity(queriesInfo)
+.header("Access-Control-Allow-Origin", "*")
+.header("Access-Control-Allow-Methods", "GET, POST, DELETE, 
PUT")
+  .header("Access-Control-Allow-Credentials","true")
+.allow("OPTIONS").build();
+  }
+
+  @SuppressWarnings("resource")
+  @POST
+  @Path("/graceful_shutdown")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response shutdownDrillbit() throws Exception {
+Map shutdownInfo = new HashMap();
+try {
+  new Thread(new Runnable() {
+public void run() {
+  try {
+drillbit.close();
+  } catch (Exception e) {
+e.printStackTrace();
+  }
+}
+  }).start();
+  shutdownInfo.put("response", "Shutdown request is triggered");
+  return Response.ok()
+  .entity(shutdownInfo)
+  .header("Access-Control-Allow-Origin", "*")
+  .header("Access-Control-Allow-Methods", "GET, POST, DELETE, 
PUT")
+  .header("Access-Control-Allow-Credentials","true")
+  .allow("OPTIONS").build();
+}
+catch (Exception e) {
+  logger.debug("Exception in triggering shutdown request",e);
+  shutdownInfo.put("response", "Error in triggering shutdown request");
+  return Response.ok()
+  .entity(shutdownInfo)
+  .header("Access-Control-Allow-Origin", "*")
+  .header("Access-Control-Allow-Methods", "GET, POST, DELETE, 
PUT")
+  .header("Access-Control-Allow-Credentials","true")
+  .allow("OPTIONS").build();
+}
+  }
+
+  @SuppressWarnings("resource")
+  @POST
+  @Path("/shutdown")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response ShutdownForcefully() throws Exception {
+Map shutdownInfo = new HashMap();
+try {
+  new Thread(new Runnable() {
+public void run() {
+  try {
+drillbit.setForcefulShutdown(true);
+drillbit.close();
+  } catch (Exception e) {
+e.printStackTrace();
--- End diff --

See above.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150973864
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/DrillbitContext.java 
---
@@ -157,10 +157,29 @@ public DrillConfig getConfig() {
 return context.getConfig();
   }
 
-  public Collection getBits() {
+  public Collection getAvailableBits() {
 return coord.getAvailableEndpoints();
   }
 
+  public Collection getBits() {
+return coord.getOnlineEndPoints();
+  }
+  
+  public boolean isOnline(DrillbitEndpoint endpoint) { return 
endpoint.getState().equals(DrillbitEndpoint.State.ONLINE); }
--- End diff --

A bit too much code for a single line. Maybe put the body inside the method 
on a separate line.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150983939
  
--- Diff: exec/java-exec/src/main/resources/rest/index.ftl ---
@@ -78,6 +78,11 @@
 ${drillbit.getVersion()}
   
 
+${drillbit.getState()}
+
+ SHUTDOWN 
--- End diff --

Same id issue.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150983296
  
--- Diff: exec/java-exec/src/main/resources/rest/index.ftl ---
@@ -57,19 +57,19 @@
   Control Port
   Data Port
   Version
+  Status
 
   
   
 <#assign i = 1>
 <#list model.getDrillbits() as drillbit>
-  
+  
 ${i}
-${drillbit.getAddress()}
-  <#if drillbit.isCurrent()>
+${drillbit.getAddress()}<#if 
drillbit.isCurrent()>
--- End diff --

`"address" >` --> `"address">`

If-statement back to being on its own line?


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150984538
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/TestGracefulShutdown.java ---
@@ -0,0 +1,248 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+import com.google.common.io.Files;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.proto.CoordinationProtos.DrillbitEndpoint;
+import org.apache.drill.exec.server.Drillbit;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import java.io.File;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.net.HttpURLConnection;
+import java.net.URL;
+import java.util.Collection;
+import java.util.Properties;
+import static org.junit.Assert.assertNotEquals;
+import static org.junit.Assert.fail;
+
+public class TestGracefulShutdown {
--- End diff --

Add `extends DrillTest` so we get the narration of tests as they run.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150975442
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/DrillbitStateManager.java
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.server;
+/*
+  State manager to manage the state of drillbit.
+ */
+public class DrillbitStateManager {
+
+
+  public DrillbitStateManager(DrillbitState currentState) {
+this.currentState = currentState;
+  }
+
+  public enum DrillbitState {
+STARTUP, ONLINE, GRACE, DRAINING, OFFLINE, SHUTDOWN
+  }
+
+  public DrillbitState getState() {
+return currentState;
+  }
+
+  private DrillbitState currentState;
+  public void setState(DrillbitState state) {
--- End diff --

Maybe `state` --> `newState` to make clear that this is the state we would 
like to transition to.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150975269
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/DrillbitStateManager.java
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.server;
+/*
+  State manager to manage the state of drillbit.
+ */
+public class DrillbitStateManager {
+
+
+  public DrillbitStateManager(DrillbitState currentState) {
+this.currentState = currentState;
+  }
+
+  public enum DrillbitState {
+STARTUP, ONLINE, GRACE, DRAINING, OFFLINE, SHUTDOWN
+  }
+
+  public DrillbitState getState() {
+return currentState;
+  }
+
+  private DrillbitState currentState;
--- End diff --

Please move members after enum, before constructor.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150972162
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ---
@@ -77,6 +80,24 @@
   private final WorkManager manager;
   private final BootStrapContext context;
   private final WebServer webServer;
+  private final int gracePeriod;
+  private DrillbitStateManager stateManager;
+
+  public void setQuiescentMode(boolean quiescentMode) {
+this.quiescentMode = quiescentMode;
+  }
+
+  private boolean quiescentMode;
--- End diff --

Please move field up with other fields.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150971745
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/ZKClusterCoordinator.java
 ---
@@ -200,11 +206,50 @@ public void unregister(RegistrationHandle handle) {
 }
   }
 
+  /**
+   * Update drillbit endpoint state. Drillbit advertises its
+   * state in Zookeeper when a shutdown request of drillbit is
+   * triggered. State information is used during planning and
+   * initial client connection phases.
+   */
+  public RegistrationHandle update(RegistrationHandle handle, State state) 
{
+ZKRegistrationHandle h = (ZKRegistrationHandle) handle;
+  try {
+endpoint = h.endpoint.toBuilder().setState(state).build();
+ServiceInstance serviceInstance = 
ServiceInstance.builder()
+.name(serviceName)
+.id(h.id)
+.payload(endpoint).build();
+discovery.updateService(serviceInstance);
+  } catch (Exception e) {
+e.printStackTrace();
+  }
+  return handle;
+  }
+
   @Override
   public Collection getAvailableEndpoints() {
 return this.endpoints;
   }
 
+  /*
+   * Get a collection of ONLINE Drillbit endpoints by excluding the 
drillbits
+   * that are in QUIESCENT state (drillbits shutting down). Primarily used 
by the planner
+   * to plan queries only on ONLINE drillbits and used by the client 
during initial connection
+   * phase to connect to a drillbit (foreman)
+   * @return A collection of ONLINE endpoints
+   */
+  @Override
+  public Collection getOnlineEndPoints() {
+Collection runningEndPoints = new ArrayList<>();
+for (DrillbitEndpoint endpoint: endpoints){
+  if(!endpoint.hasState() || endpoint.getState().equals(State.ONLINE)) 
{
--- End diff --

This stanza appears multiple times. Can we define a static function that 
does the double check to avoid redundant code?

```
boolean isInState(State state) {
...
```
Choose an appropriate name, maybe `isDrillbitInState` or whatever.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150972546
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ---
@@ -135,8 +157,9 @@ public Drillbit(
   profileStoreProvider = storeProvider;
 }
 
-engine = new ServiceEngine(manager, context, allowPortHunting, 
isDistributedMode);
+engine = new ServiceEngine(manager, context, true, isDistributedMode);
--- End diff --

Do we really want to change to always allow port hunting? This means that 
an attempt to start a second Drillbit on a node will succeed by default, rather 
than fail. OK?


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150974323
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/DrillbitContext.java 
---
@@ -157,10 +157,29 @@ public DrillConfig getConfig() {
 return context.getConfig();
   }
 
-  public Collection getBits() {
+  public Collection getAvailableBits() {
 return coord.getAvailableEndpoints();
   }
 
+  public Collection getBits() {
+return coord.getOnlineEndPoints();
+  }
+  
+  public boolean isOnline(DrillbitEndpoint endpoint) { return 
endpoint.getState().equals(DrillbitEndpoint.State.ONLINE); }
+
+  public boolean isForemanOnline() {
+DrillbitEndpoint foreman = getEndpoint();
+Collection dbs = getAvailableBits();
+for( DrillbitEndpoint db : dbs) {
+  if( db.getAddress().equals(foreman.getAddress()) && db.getUserPort() 
== foreman.getUserPort()) {
--- End diff --

This is another place for a static function:

```
public boolean isSameDrillbit(DrillbitEndpoint target) ...
```


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150971276
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/ZKClusterCoordinator.java
 ---
@@ -200,11 +206,50 @@ public void unregister(RegistrationHandle handle) {
 }
   }
 
+  /**
+   * Update drillbit endpoint state. Drillbit advertises its
+   * state in Zookeeper when a shutdown request of drillbit is
+   * triggered. State information is used during planning and
+   * initial client connection phases.
+   */
+  public RegistrationHandle update(RegistrationHandle handle, State state) 
{
+ZKRegistrationHandle h = (ZKRegistrationHandle) handle;
+  try {
+endpoint = h.endpoint.toBuilder().setState(state).build();
+ServiceInstance serviceInstance = 
ServiceInstance.builder()
+.name(serviceName)
+.id(h.id)
+.payload(endpoint).build();
+discovery.updateService(serviceInstance);
+  } catch (Exception e) {
+e.printStackTrace();
+  }
+  return handle;
+  }
+
   @Override
   public Collection getAvailableEndpoints() {
 return this.endpoints;
   }
 
+  /*
+   * Get a collection of ONLINE Drillbit endpoints by excluding the 
drillbits
+   * that are in QUIESCENT state (drillbits shutting down). Primarily used 
by the planner
+   * to plan queries only on ONLINE drillbits and used by the client 
during initial connection
+   * phase to connect to a drillbit (foreman)
+   * @return A collection of ONLINE endpoints
+   */
+  @Override
+  public Collection getOnlineEndPoints() {
+Collection runningEndPoints = new ArrayList<>();
+for (DrillbitEndpoint endpoint: endpoints){
+  if(!endpoint.hasState() || endpoint.getState().equals(State.ONLINE)) 
{
+runningEndPoints.add(endpoint);
+  }
+}
+logger.debug("Online endpoints in ZK are"+runningEndPoints.toString());
--- End diff --

Minor: spaces around `+`


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-14 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150971910
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/ZKClusterCoordinator.java
 ---
@@ -229,27 +275,42 @@ public DrillbitEndpoint 
apply(ServiceInstance input) {
 });
 
   // set of newly dead bits : original bits - new set of active bits.
-  Set unregisteredBits = new HashSet<>(endpoints);
-  unregisteredBits.removeAll(newDrillbitSet);
-
+  Set unregisteredBits = new HashSet<>();
   // Set of newly live bits : new set of active bits - original bits.
-  Set registeredBits = new HashSet<>(newDrillbitSet);
-  registeredBits.removeAll(endpoints);
+  Set registeredBits = new HashSet<>();
 
-  endpoints = newDrillbitSet;
 
+  // Updates the endpoints map if there is a change in state of the 
endpoint or with the addition
+  // of new drillbit endpoints. Registered endpoints is set to newly 
live drillbit endpoints.
+  for ( DrillbitEndpoint endpoint : newDrillbitSet) {
+String endpointAddress = endpoint.getAddress();
+int endpointPort = endpoint.getUserPort();
+if (! endpointsMap.containsKey(new MultiKey(endpointAddress, 
endpointPort))) {
+  registeredBits.add(endpoint);
+}
+endpointsMap.put(new MultiKey(endpointAddress, 
endpointPort),endpoint);
+  }
+//   Remove all the endpoints that are newly dead
--- End diff --

Minor: comment character usually is aligned with code, as above.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-09 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150047464
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
@@ -348,6 +354,21 @@ public void run() {
  */
   }
 
+  /*
+Check if the foreman is ONLINE. If not dont accept any new queries.
+   */
+  public void checkForemanState() throws ForemanException{
+DrillbitEndpoint foreman = drillbitContext.getEndpoint();
+Collection dbs = drillbitContext.getAvailableBits();
--- End diff --

I was thinking of encapsulating code from lines 360 to 367 into a boolean 
isOnline(), since all the values in that code are derived from the current 
DrillbitContext.  Then your code would be simplified to 
`
public void checkForemanState() throws ForemanException{
  if (!drillbitContext.isOnline()) {
  throw new ForemanException("Query submission failed since Foreman is 
shutting down.");
  }
}
`


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-07 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r149542196
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
@@ -348,6 +354,21 @@ public void run() {
  */
   }
 
+  /*
+Check if the foreman is ONLINE. If not dont accept any new queries.
+   */
+  public void checkForemanState() throws ForemanException{
+DrillbitEndpoint foreman = drillbitContext.getEndpoint();
+Collection dbs = drillbitContext.getAvailableBits();
--- End diff --

Maybe add it to the DrillbitContext class ?


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-07 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r149541807
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/DrillbitStateManager.java
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.server;
+/*
+  State manager to manage the state of drillbit.
+ */
+public class DrillbitStateManager {
+
+
+  public DrillbitStateManager(DrillbitState currentState) {
+this.currentState = currentState;
+  }
+
+  public enum DrillbitState {
+STARTUP, ONLINE, GRACE, DRAINING, OFFLINE, SHUTDOWN
+  }
+
+  public DrillbitState getState() {
+return currentState;
+  }
+
+  private DrillbitState currentState;
--- End diff --

I think Drillbit.quiescentMode and Drillbit.forceful_shutdown also need NOT 
be volatile given the way they are used now. You don't have to enforce 
happens-before (by preventing re-ordering) here and even if these variables are 
volatile, the read of these variables in close() can anyway race with the 
setting of these variables in another thread doing a stop/gracefulShutdown. Let 
me know if I am missing anything.

That said, adding volatiles can only makes the code more correct (and 
slower). Since this code is not critical you can let it be as it is.  


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-07 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r149544267
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/TestGracefulShutdown.java ---
@@ -0,0 +1,323 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import ch.qos.logback.classic.Level;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.proto.CoordinationProtos.DrillbitEndpoint;
+import org.apache.drill.exec.proto.UserBitShared.QueryResult.QueryState;
+import org.apache.drill.exec.server.Drillbit;
+import org.junit.AfterClass;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.omg.PortableServer.THREAD_POLICY_ID;
+
+import java.io.File;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.net.HttpURLConnection;
+import java.net.URL;
+import java.util.Collection;
+import java.util.Properties;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotEquals;
+import static org.junit.Assert.fail;
+
+public class TestGracefulShutdown {
+
+  @BeforeClass
+  public static void setUpTestData() {
+for( int i = 0; i < 1000; i++) {
+  setupFile(i);
+}
+  }
+
+
+  public static final Properties WEBSERVER_CONFIGURATION = new 
Properties() {
+{
+  put(ExecConstants.HTTP_ENABLE, true);
+}
+  };
+
+  public FixtureBuilder enableWebServer(FixtureBuilder builder) {
+Properties props = new Properties();
+props.putAll(WEBSERVER_CONFIGURATION);
+builder.configBuilder.configProps(props);
+return builder;
+  }
+
+
+  /*
+  Start multiple drillbits and then shutdown a drillbit. Query the online
+  endpoints and check if the drillbit still exists.
+   */
+  @Test
+  public void testOnlineEndPoints() throws  Exception {
+
+String[] drillbits = {"db1" ,"db2","db3", "db4", "db5", "db6"};
+FixtureBuilder builder = 
ClusterFixture.builder().withBits(drillbits).withLocalZk();
+
+
+try ( ClusterFixture cluster = builder.build();
+  ClientFixture client = cluster.clientFixture()) {
+
+  Drillbit drillbit = cluster.drillbit("db2");
+  DrillbitEndpoint drillbitEndpoint =  
drillbit.getRegistrationHandle().getEndPoint();
+  int grace_period = 
drillbit.getContext().getConfig().getInt("drill.exec.grace_period");
+  new Thread(new Runnable() {
+public void run() {
+  try {
+cluster.close_drillbit("db2");
+  } catch (Exception e) {
+e.printStackTrace();
+  }
+}
+  }).start();
+  //wait for graceperiod
+  Thread.sleep(grace_period);
+  Collection drillbitEndpoints = 
cluster.drillbit().getContext()
+  .getClusterCoordinator()
+  .getOnlineEndPoints();
+  Assert.assertFalse(drillbitEndpoints.contains(drillbitEndpoint));
+}
+  }
+  /*
+Test if the drillbit transitions from ONLINE state when a shutdown
+request is initiated
+   */
+  @Test
+  public void testStateChange() throws  Exception {
+
+String[] drillbits = {"db1" ,"db2", "db3", "db4", "db5", "db6"};
+FixtureBuilder builder = 
ClusterFixture.builder().withBits(drillbits).withLocalZk();
+
+try ( ClusterFixture cluster = builder.build();
+  ClientFixture client = cluster.clientFixture()) {
+  Drillbit drillbit = cluster.drillbit("db2");
+  int grace_period = 
drillbit.getContext().getConfig().getInt("drill.exec.grace_period");
+  DrillbitEndpoint drillbitEndpoint =  
drillbit.getRegistrationHandle().getEndPoint();
+  new Thread(new Runnable() {
+public void run() 

[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-06 Thread dvjyothsna
Github user dvjyothsna commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r149186970
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/TestGracefulShutdown.java ---
@@ -0,0 +1,323 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import ch.qos.logback.classic.Level;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.proto.CoordinationProtos.DrillbitEndpoint;
+import org.apache.drill.exec.proto.UserBitShared.QueryResult.QueryState;
+import org.apache.drill.exec.server.Drillbit;
+import org.junit.AfterClass;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.omg.PortableServer.THREAD_POLICY_ID;
+
+import java.io.File;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.net.HttpURLConnection;
+import java.net.URL;
+import java.util.Collection;
+import java.util.Properties;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotEquals;
+import static org.junit.Assert.fail;
+
+public class TestGracefulShutdown {
+
+  @BeforeClass
+  public static void setUpTestData() {
+for( int i = 0; i < 1000; i++) {
+  setupFile(i);
+}
+  }
+
+
+  public static final Properties WEBSERVER_CONFIGURATION = new 
Properties() {
+{
+  put(ExecConstants.HTTP_ENABLE, true);
+}
+  };
+
+  public FixtureBuilder enableWebServer(FixtureBuilder builder) {
+Properties props = new Properties();
+props.putAll(WEBSERVER_CONFIGURATION);
+builder.configBuilder.configProps(props);
+return builder;
+  }
+
+
+  /*
+  Start multiple drillbits and then shutdown a drillbit. Query the online
+  endpoints and check if the drillbit still exists.
+   */
+  @Test
+  public void testOnlineEndPoints() throws  Exception {
+
+String[] drillbits = {"db1" ,"db2","db3", "db4", "db5", "db6"};
+FixtureBuilder builder = 
ClusterFixture.builder().withBits(drillbits).withLocalZk();
+
+
+try ( ClusterFixture cluster = builder.build();
+  ClientFixture client = cluster.clientFixture()) {
+
+  Drillbit drillbit = cluster.drillbit("db2");
+  DrillbitEndpoint drillbitEndpoint =  
drillbit.getRegistrationHandle().getEndPoint();
+  int grace_period = 
drillbit.getContext().getConfig().getInt("drill.exec.grace_period");
+  new Thread(new Runnable() {
+public void run() {
+  try {
+cluster.close_drillbit("db2");
+  } catch (Exception e) {
+e.printStackTrace();
+  }
+}
+  }).start();
+  //wait for graceperiod
+  Thread.sleep(grace_period);
+  Collection drillbitEndpoints = 
cluster.drillbit().getContext()
+  .getClusterCoordinator()
+  .getOnlineEndPoints();
+  Assert.assertFalse(drillbitEndpoints.contains(drillbitEndpoint));
+}
+  }
+  /*
+Test if the drillbit transitions from ONLINE state when a shutdown
+request is initiated
+   */
+  @Test
+  public void testStateChange() throws  Exception {
+
+String[] drillbits = {"db1" ,"db2", "db3", "db4", "db5", "db6"};
+FixtureBuilder builder = 
ClusterFixture.builder().withBits(drillbits).withLocalZk();
+
+try ( ClusterFixture cluster = builder.build();
+  ClientFixture client = cluster.clientFixture()) {
+  Drillbit drillbit = cluster.drillbit("db2");
+  int grace_period = 
drillbit.getContext().getConfig().getInt("drill.exec.grace_period");
+  DrillbitEndpoint drillbitEndpoint =  
drillbit.getRegistrationHandle().getEndPoint();
+  new Thread(new Runnable() {
+public void run() 

[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-06 Thread dvjyothsna
Github user dvjyothsna commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r149186401
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
@@ -348,6 +354,21 @@ public void run() {
  */
   }
 
+  /*
+Check if the foreman is ONLINE. If not dont accept any new queries.
+   */
+  public void checkForemanState() throws ForemanException{
+DrillbitEndpoint foreman = drillbitContext.getEndpoint();
+Collection dbs = drillbitContext.getAvailableBits();
--- End diff --

DrillbitEndPoint is protobuf generated code so I cannot modify that. Can I 
add isOnline() method in foreman or a new Class ?


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-06 Thread dvjyothsna
Github user dvjyothsna commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r149185238
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/DrillbitStateManager.java
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.server;
+/*
+  State manager to manage the state of drillbit.
+ */
+public class DrillbitStateManager {
+
+
+  public DrillbitStateManager(DrillbitState currentState) {
+this.currentState = currentState;
+  }
+
+  public enum DrillbitState {
+STARTUP, ONLINE, GRACE, DRAINING, OFFLINE, SHUTDOWN
+  }
+
+  public DrillbitState getState() {
+return currentState;
+  }
+
+  private DrillbitState currentState;
--- End diff --

currentState is updated only in the close() method and close() is 
synchronized.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-03 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r148868884
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java ---
@@ -165,32 +176,60 @@ public DrillbitContext getContext() {
*
* This is intended to be used by {@link 
org.apache.drill.exec.server.Drillbit#close()}.
*/
-  public void waitToExit() {
+  public void waitToExit(Drillbit bit, boolean forcceful_shutdown) {
 synchronized(this) {
-  if (queries.isEmpty() && runningFragments.isEmpty()) {
+  numOfRunningQueries = queries.size();
+  numOfRunningFragments = runningFragments.size();
+  if ( queries.isEmpty() && runningFragments.isEmpty()) {
 return;
   }
-
+  logger.info("Draining " + queries +" queries and "+ 
runningFragments+" fragments.");
   exitLatch = new ExtendedLatch();
 }
-
-// Wait for at most 5 seconds or until the latch is released.
-exitLatch.awaitUninterruptibly(5000);
+// Wait uninterruptibly until all the queries and running fragments on 
that drillbit goes down
+// to zero
+if( forcceful_shutdown) {
+  exitLatch.awaitUninterruptibly(5000);
+}
+else {
--- End diff --

Inconsistent if-else formatting


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-03 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r148871345
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
@@ -251,6 +252,11 @@ public void run() {
 final String originalName = currentThread.getName();
 currentThread.setName(queryIdString + ":foreman");
 
+try {
+  checkForemanState();
+} catch (ForemanException e) {
+  addToEventQueue(QueryState.FAILED, new ForemanException("Query 
submission failed since foreman is shutting down"));
--- End diff --

why are you not just adding 'e' to the eventQueue ? why do you have to 
create a new ForemanException object?


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-03 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r148872536
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
@@ -471,6 +471,22 @@ public void close() throws Exception {
   }
 
   /**
+   * Shutdown the drillbit given the name of the drillbit.
+   */
+  public void close_drillbit(final String drillbitname) throws Exception {
--- End diff --

camel case.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-03 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r148684246
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/ZKClusterCoordinator.java
 ---
@@ -70,7 +72,10 @@
   private final CountDownLatch initialConnection = new CountDownLatch(1);
   private final TransientStoreFactory factory;
   private ServiceCache serviceCache;
+  private DrillbitEndpoint endpoint;
 
+//private HashMap endpointsMap = new 
HashMap();
+  private ConcurrentHashMap endpointsMap = new 
ConcurrentHashMap();
--- End diff --

please add a comment about what this map contains. 


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-03 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r148173100
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/ZKClusterCoordinator.java
 ---
@@ -70,7 +72,10 @@
   private final CountDownLatch initialConnection = new CountDownLatch(1);
   private final TransientStoreFactory factory;
   private ServiceCache serviceCache;
+  private DrillbitEndpoint endpoint;
 
+//private HashMap endpointsMap = new 
HashMap();
--- End diff --

commented code


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-03 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r148682043
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/ZKClusterCoordinator.java
 ---
@@ -229,27 +272,52 @@ public DrillbitEndpoint 
apply(ServiceInstance input) {
 });
 
   // set of newly dead bits : original bits - new set of active bits.
-  Set unregisteredBits = new HashSet<>(endpoints);
-  unregisteredBits.removeAll(newDrillbitSet);
-
+  Set unregisteredBits = new HashSet<>();
   // Set of newly live bits : new set of active bits - original bits.
-  Set registeredBits = new HashSet<>(newDrillbitSet);
-  registeredBits.removeAll(endpoints);
-
-  endpoints = newDrillbitSet;
+  Set registeredBits = new HashSet<>();
 
+  // Update the endpoints map with change in state of the endpoint or 
with the addition
+  // of new drillbit endpoints. Registered endpoints is set to newly 
live drillbit endpoints.
+  for ( DrillbitEndpoint endpoint : newDrillbitSet) {
+if (endpointsMap.containsKey(new 
MultiKey(endpoint.getAddress(),endpoint.getUserPort( {
+  endpointsMap.put(new 
MultiKey(endpoint.getAddress(),endpoint.getUserPort()),endpoint);
+}
+else {
+  registeredBits.add(endpoint);
+  endpointsMap.put(new 
MultiKey(endpoint.getAddress(),endpoint.getUserPort()),endpoint);
+}
+  }
+//  Iterator iterator = endpointsMap.keySet().iterator() ;
--- End diff --

move the common code out of the if-else


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-03 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r148872381
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
@@ -348,6 +354,21 @@ public void run() {
  */
   }
 
+  /*
+Check if the foreman is ONLINE. If not dont accept any new queries.
+   */
+  public void checkForemanState() throws ForemanException{
+DrillbitEndpoint foreman = drillbitContext.getEndpoint();
+Collection dbs = drillbitContext.getAvailableBits();
--- End diff --

I would recommend refactoring this into a generic boolean 
isOnline(DrillbitEndpoint endpoint) in the DrillbitEndpoint class.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-03 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r148682167
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/ZKClusterCoordinator.java
 ---
@@ -229,27 +272,52 @@ public DrillbitEndpoint 
apply(ServiceInstance input) {
 });
 
   // set of newly dead bits : original bits - new set of active bits.
-  Set unregisteredBits = new HashSet<>(endpoints);
-  unregisteredBits.removeAll(newDrillbitSet);
-
+  Set unregisteredBits = new HashSet<>();
   // Set of newly live bits : new set of active bits - original bits.
-  Set registeredBits = new HashSet<>(newDrillbitSet);
-  registeredBits.removeAll(endpoints);
-
-  endpoints = newDrillbitSet;
+  Set registeredBits = new HashSet<>();
 
+  // Update the endpoints map with change in state of the endpoint or 
with the addition
--- End diff --

loop below does not handle change in state, so the comment should be 
changed. Is the endpointsMap necessary because of port hunting ? 


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-03 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r148872633
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
@@ -471,6 +471,22 @@ public void close() throws Exception {
   }
 
   /**
+   * Shutdown the drillbit given the name of the drillbit.
+   */
+  public void close_drillbit(final String drillbitname) throws Exception {
+Exception ex = null;
+for (Drillbit bit : drillbits())
+{
--- End diff --

placement of '{'


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-03 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r148861255
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/DrillbitStateManager.java
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.server;
+/*
+  State manager to manage the state of drillbit.
+ */
+public class DrillbitStateManager {
+
+
+  public DrillbitStateManager(DrillbitState currentState) {
+this.currentState = currentState;
+  }
+
+  public enum DrillbitState {
+STARTUP, ONLINE, GRACE, DRAINING, OFFLINE, SHUTDOWN
+  }
+
+  public DrillbitState getState() {
+return currentState;
+  }
+
+  private DrillbitState currentState;
--- End diff --

Drillbit.quiescentMode and Drillbit.forceful_shutdown are volatiles but 
currentState is not. Can you explain why this is so ?


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-03 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r148668138
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/ZKClusterCoordinator.java
 ---
@@ -229,27 +272,52 @@ public DrillbitEndpoint 
apply(ServiceInstance input) {
 });
 
   // set of newly dead bits : original bits - new set of active bits.
-  Set unregisteredBits = new HashSet<>(endpoints);
-  unregisteredBits.removeAll(newDrillbitSet);
-
+  Set unregisteredBits = new HashSet<>();
   // Set of newly live bits : new set of active bits - original bits.
-  Set registeredBits = new HashSet<>(newDrillbitSet);
-  registeredBits.removeAll(endpoints);
-
-  endpoints = newDrillbitSet;
+  Set registeredBits = new HashSet<>();
 
+  // Update the endpoints map with change in state of the endpoint or 
with the addition
+  // of new drillbit endpoints. Registered endpoints is set to newly 
live drillbit endpoints.
+  for ( DrillbitEndpoint endpoint : newDrillbitSet) {
+if (endpointsMap.containsKey(new 
MultiKey(endpoint.getAddress(),endpoint.getUserPort( {
+  endpointsMap.put(new 
MultiKey(endpoint.getAddress(),endpoint.getUserPort()),endpoint);
+}
+else {
+  registeredBits.add(endpoint);
+  endpointsMap.put(new 
MultiKey(endpoint.getAddress(),endpoint.getUserPort()),endpoint);
+}
+  }
+//  Iterator iterator = endpointsMap.keySet().iterator() ;
--- End diff --

commented code


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-03 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r148674083
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/coord/ClusterCoordinator.java
 ---
@@ -60,7 +61,26 @@
*/
   public abstract Collection getAvailableEndpoints();
 
+  /**
+   * Get a collection of ONLINE drillbit endpoints by excluding the 
drillbits
+   * that are in QUIESCENT state (drillbits that are shutting down). 
Primarily used by the planner
+   * to plan queries only on ONLINE drillbits and used by the client 
during initial connection
+   * phase to connect to a drillbit (foreman)
+   * @return A collection of ONLINE endpoints
+   */
+
+  public abstract Collection getOnlineEndPoints();
+
+  public abstract RegistrationHandle update(RegistrationHandle handle, 
State state);
+
   public interface RegistrationHandle {
+/**
+ * Get the drillbit endpoint associated with the registration handle
+ * @return drillbit endpoint
+ */
+public abstract DrillbitEndpoint getEndPoint();
+
+public abstract void setEndPoint( DrillbitEndpoint endpoint);
--- End diff --

spacing


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-03 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r148874640
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/TestGracefulShutdown.java ---
@@ -0,0 +1,323 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import ch.qos.logback.classic.Level;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.proto.CoordinationProtos.DrillbitEndpoint;
+import org.apache.drill.exec.proto.UserBitShared.QueryResult.QueryState;
+import org.apache.drill.exec.server.Drillbit;
+import org.junit.AfterClass;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.omg.PortableServer.THREAD_POLICY_ID;
+
+import java.io.File;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.net.HttpURLConnection;
+import java.net.URL;
+import java.util.Collection;
+import java.util.Properties;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotEquals;
+import static org.junit.Assert.fail;
+
+public class TestGracefulShutdown {
+
+  @BeforeClass
+  public static void setUpTestData() {
+for( int i = 0; i < 1000; i++) {
+  setupFile(i);
+}
+  }
+
+
+  public static final Properties WEBSERVER_CONFIGURATION = new 
Properties() {
+{
+  put(ExecConstants.HTTP_ENABLE, true);
+}
+  };
+
+  public FixtureBuilder enableWebServer(FixtureBuilder builder) {
+Properties props = new Properties();
+props.putAll(WEBSERVER_CONFIGURATION);
+builder.configBuilder.configProps(props);
+return builder;
+  }
+
+
+  /*
+  Start multiple drillbits and then shutdown a drillbit. Query the online
+  endpoints and check if the drillbit still exists.
+   */
+  @Test
+  public void testOnlineEndPoints() throws  Exception {
+
+String[] drillbits = {"db1" ,"db2","db3", "db4", "db5", "db6"};
+FixtureBuilder builder = 
ClusterFixture.builder().withBits(drillbits).withLocalZk();
+
+
+try ( ClusterFixture cluster = builder.build();
+  ClientFixture client = cluster.clientFixture()) {
+
+  Drillbit drillbit = cluster.drillbit("db2");
+  DrillbitEndpoint drillbitEndpoint =  
drillbit.getRegistrationHandle().getEndPoint();
+  int grace_period = 
drillbit.getContext().getConfig().getInt("drill.exec.grace_period");
+  new Thread(new Runnable() {
+public void run() {
+  try {
+cluster.close_drillbit("db2");
+  } catch (Exception e) {
+e.printStackTrace();
+  }
+}
+  }).start();
+  //wait for graceperiod
+  Thread.sleep(grace_period);
+  Collection drillbitEndpoints = 
cluster.drillbit().getContext()
+  .getClusterCoordinator()
+  .getOnlineEndPoints();
+  Assert.assertFalse(drillbitEndpoints.contains(drillbitEndpoint));
+}
+  }
+  /*
+Test if the drillbit transitions from ONLINE state when a shutdown
+request is initiated
+   */
+  @Test
+  public void testStateChange() throws  Exception {
+
+String[] drillbits = {"db1" ,"db2", "db3", "db4", "db5", "db6"};
+FixtureBuilder builder = 
ClusterFixture.builder().withBits(drillbits).withLocalZk();
+
+try ( ClusterFixture cluster = builder.build();
+  ClientFixture client = cluster.clientFixture()) {
+  Drillbit drillbit = cluster.drillbit("db2");
+  int grace_period = 
drillbit.getContext().getConfig().getInt("drill.exec.grace_period");
+  DrillbitEndpoint drillbitEndpoint =  
drillbit.getRegistrationHandle().getEndPoint();
+  new Thread(new Runnable() {
+public void run() 

[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-03 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r148171637
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/coord/local/LocalClusterCoordinator.java
 ---
@@ -85,13 +88,62 @@ public void unregister(final RegistrationHandle handle) 
{
 endpoints.remove(handle);
   }
 
+  /**
+   * Update drillbit endpoint state. Drillbit advertises its
+   * state. State information is used during planning and initial
+   * client connection phases.
+   */
+  @Override
+  public RegistrationHandle update(RegistrationHandle handle, State state) 
{
+DrillbitEndpoint endpoint = handle.getEndPoint();
+endpoint = endpoint.toBuilder().setState(state).build();
+handle.setEndPoint(endpoint);
+endpoints.put(handle,endpoint);
+return handle;
+  }
+
   @Override
   public Collection getAvailableEndpoints() {
 return endpoints.values();
   }
 
+  /**
+   * Get a collection of ONLINE Drillbit endpoints by excluding the 
drillbits
+   * that are in QUIESCENT state (drillbits shutting down). Primarily used 
by the planner
+   * to plan queries only on ONLINE drillbits and used by the client 
during initial connection
+   * phase to connect to a drillbit (foreman)
+   * @return A collection of ONLINE endpoints
+   */
+  @Override
+  public Collection getOnlineEndPoints() {
+Collection runningEndPoints = new ArrayList<>();
+for (DrillbitEndpoint endpoint: endpoints.values()){
+  if(!endpoint.hasState() || endpoint.getState().equals(State.ONLINE)) 
{
+runningEndPoints.add(endpoint);
+  }
+}
+return runningEndPoints;
+  }
+
   private class Handle implements RegistrationHandle {
 private final UUID id = UUID.randomUUID();
+private DrillbitEndpoint drillbitEndpoint;
+
+/**
+ * Get the drillbit endpoint associated with the registration handle
+ * @return drillbit endpoint
+ */
+public DrillbitEndpoint getEndPoint() {
+  return drillbitEndpoint;
+}
+
+public void setEndPoint( DrillbitEndpoint endpoint) {
--- End diff --

spacing


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-03 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r148676672
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/ZKClusterCoordinator.java
 ---
@@ -200,11 +206,47 @@ public void unregister(RegistrationHandle handle) {
 }
   }
 
+  /**
+   * Update drillbit endpoint state. Drillbit advertises its
+   * state in Zookeeper when a shutdown request of drillbit is
+   * triggered. State information is used during planning and
+   * initial client connection phases.
+   */
+  public RegistrationHandle update(RegistrationHandle handle, State state) 
{
+ZKRegistrationHandle h = (ZKRegistrationHandle) handle;
+  try {
+endpoint = h.endpoint.toBuilder().setState(state).build();
+ServiceInstance serviceInstance = 
ServiceInstance.builder().name(serviceName).id(h.id).payload(endpoint).build();
--- End diff --

suggestion: wrap this long line since you have already wrapped the other 
lines and comments


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-03 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r148686859
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java ---
@@ -69,14 +73,30 @@
 
   public final static String SYSTEM_OPTIONS_NAME = 
"org.apache.drill.exec.server.Drillbit.system_options";
 
-  private boolean isClosed = false;
-
   private final ClusterCoordinator coord;
   private final ServiceEngine engine;
   private final PersistentStoreProvider storeProvider;
   private final WorkManager manager;
   private final BootStrapContext context;
   private final WebServer webServer;
+  private final int gracePeriod;
+  private DrillbitStateManager stateManager;
+
+  public void setQuiescentMode(boolean quiescentMode) {
+this.quiescentMode = quiescentMode;
+  }
+
+  private volatile boolean quiescentMode;
+
+  public void setForceful_shutdown(boolean forceful_shutdown) {
+this.forceful_shutdown = forceful_shutdown;
+  }
+
+  private volatile boolean forceful_shutdown = false;
--- End diff --

any reason why you did not camel-case this variable? 


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-03 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r148685835
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/ZKClusterCoordinator.java
 ---
@@ -229,27 +272,52 @@ public DrillbitEndpoint 
apply(ServiceInstance input) {
 });
 
   // set of newly dead bits : original bits - new set of active bits.
-  Set unregisteredBits = new HashSet<>(endpoints);
-  unregisteredBits.removeAll(newDrillbitSet);
-
+  Set unregisteredBits = new HashSet<>();
   // Set of newly live bits : new set of active bits - original bits.
-  Set registeredBits = new HashSet<>(newDrillbitSet);
-  registeredBits.removeAll(endpoints);
-
-  endpoints = newDrillbitSet;
+  Set registeredBits = new HashSet<>();
 
+  // Update the endpoints map with change in state of the endpoint or 
with the addition
+  // of new drillbit endpoints. Registered endpoints is set to newly 
live drillbit endpoints.
+  for ( DrillbitEndpoint endpoint : newDrillbitSet) {
+if (endpointsMap.containsKey(new 
MultiKey(endpoint.getAddress(),endpoint.getUserPort( {
+  endpointsMap.put(new 
MultiKey(endpoint.getAddress(),endpoint.getUserPort()),endpoint);
+}
+else {
+  registeredBits.add(endpoint);
+  endpointsMap.put(new 
MultiKey(endpoint.getAddress(),endpoint.getUserPort()),endpoint);
+}
+  }
+//  Iterator iterator = endpointsMap.keySet().iterator() ;
+//  while(iterator.hasNext()) {
+//MultiKey key = iterator.next();
+//if(!newDrillbitSet.contains(endpointsMap.get(key))) {
+//  unregisteredBits.add(endpointsMap.get(key));
+//  iterator.remove();
+//}
+//  }
+
+//   Remove all the endpoints that are newly dead
+  for ( MultiKey key: endpointsMap.keySet())
+  {
--- End diff --

open-brace placement for the for-loop is inconsistent with other instances


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-03 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r148669660
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/ZKClusterCoordinator.java
 ---
@@ -229,27 +272,52 @@ public DrillbitEndpoint 
apply(ServiceInstance input) {
 });
 
   // set of newly dead bits : original bits - new set of active bits.
-  Set unregisteredBits = new HashSet<>(endpoints);
-  unregisteredBits.removeAll(newDrillbitSet);
-
+  Set unregisteredBits = new HashSet<>();
   // Set of newly live bits : new set of active bits - original bits.
-  Set registeredBits = new HashSet<>(newDrillbitSet);
-  registeredBits.removeAll(endpoints);
-
-  endpoints = newDrillbitSet;
+  Set registeredBits = new HashSet<>();
 
+  // Update the endpoints map with change in state of the endpoint or 
with the addition
+  // of new drillbit endpoints. Registered endpoints is set to newly 
live drillbit endpoints.
+  for ( DrillbitEndpoint endpoint : newDrillbitSet) {
+if (endpointsMap.containsKey(new 
MultiKey(endpoint.getAddress(),endpoint.getUserPort( {
--- End diff --

I would recommend extracting endpoint.getAddress() and 
endpoint.getUserPort() into locals


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-09-29 Thread sudheeshkatkam
Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r141981097
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/ZKClusterCoordinator.java
 ---
@@ -200,11 +206,47 @@ public void unregister(RegistrationHandle handle) {
 }
   }
 
+  /**
+   * Update drillbit endpoint state. Drillbit advertises its
+   * state in Zookeeper when a shutdown request of drillbit is
+   * triggered. State information is used during planning and
+   * initial client connection phases.
+   */
+  public RegistrationHandle update(RegistrationHandle handle, State state) 
{
+ZKRegistrationHandle h = (ZKRegistrationHandle) handle;
+  try {
+endpoint = h.endpoint.toBuilder().setState(state).build();
+ServiceInstance serviceInstance = 
ServiceInstance.builder().name(serviceName).id(h.id).payload(endpoint).build();
+discovery.updateService(serviceInstance);
+  } catch (Exception e) {
+e.printStackTrace();
+  }
+  return handle;
+  }
+
   @Override
   public Collection getAvailableEndpoints() {
 return this.endpoints;
   }
 
+  /*
+   * Get a collection of ONLINE Drillbit endpoints by excluding the 
drillbits
+   * that are in QUIESCENT state (drillbits shutting down). Primarily used 
by the planner
+   * to plan queries only on ONLINE drillbits and used by the client 
during initial connection
+   * phase to connect to a drillbit (foreman)
+   * @return A collection of ONLINE endpoints
+   */
+  @Override
+  public Collection getOnlineEndPoints() {
+Collection runningEndPoints = new ArrayList<>();
+for (DrillbitEndpoint endpoint: endpoints){
+  if(endpoint.getState().equals(State.ONLINE)) {
--- End diff --

(more relevant in this case)

This check should be:
if(!endpoint.hasState() || endpoint.getState().equals(State.ONLINE)) {
to not break backward compatibility (new client and old cluster of bits). 
So the assumption is that old server is ONLINE.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-09-29 Thread sudheeshkatkam
Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r141980790
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/coord/local/LocalClusterCoordinator.java
 ---
@@ -85,13 +88,62 @@ public void unregister(final RegistrationHandle handle) 
{
 endpoints.remove(handle);
   }
 
+  /**
+   * Update drillbit endpoint state. Drillbit advertises its
+   * state. State information is used during planning and initial
+   * client connection phases.
+   */
+  @Override
+  public RegistrationHandle update(RegistrationHandle handle, State state) 
{
+DrillbitEndpoint endpoint = handle.getEndPoint();
+endpoint = endpoint.toBuilder().setState(state).build();
+handle.setEndPoint(endpoint);
+endpoints.put(handle,endpoint);
+return handle;
+  }
+
   @Override
   public Collection getAvailableEndpoints() {
 return endpoints.values();
   }
 
+  /**
+   * Get a collection of ONLINE Drillbit endpoints by excluding the 
drillbits
+   * that are in QUIESCENT state (drillbits shutting down). Primarily used 
by the planner
+   * to plan queries only on ONLINE drillbits and used by the client 
during initial connection
+   * phase to connect to a drillbit (foreman)
+   * @return A collection of ONLINE endpoints
+   */
+  @Override
+  public Collection getOnlineEndPoints() {
+Collection runningEndPoints = new ArrayList<>();
+for (DrillbitEndpoint endpoint: endpoints.values()){
+  if(endpoint.getState().equals(State.ONLINE)) {
--- End diff --

This check should be:
`if(!endpoint.hasState() || endpoint.getState().equals(State.ONLINE)) {`
to not break backward compatibility (new client and old cluster of bits). 
So the assumption is that old server is ONLINE.


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-08-25 Thread dvjyothsna
GitHub user dvjyothsna opened a pull request:

https://github.com/apache/drill/pull/921

DRILL-4286 Graceful shutdown of drillbit

Following is the design document 
https://docs.google.com/document/d/1AauIwVCjsBKBSbbU6sQODErrEp7n3IwniMtgK4m09MQ/edit#heading=h.224lbmcaxu2d.

I will be doing minor modifications and adding more tests for a couple of 
days.

PCAP_SUB_SCAN in UserBitShared.proto is missing. I have added that since I 
made some changes to protobuf and generated code. Once it is added as part of 
DRILL-5432 I will remove it from my commit.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dvjyothsna/drill DRILL-4286

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/921.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #921


commit 9968d60da1c5c75dda3162f5913968f05a378366
Author: dvjyothsna 
Date:   2017-08-24T15:58:00Z

DRILL-4286 Graceful shutdown of drillbit




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---