[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263503#comment-16263503
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1032


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261373#comment-16261373
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user chunhui-shi closed the pull request at:

https://github.com/apache/drill/pull/795


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260553#comment-16260553
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1032
  
+1


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259614#comment-16259614
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r152073873
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java
 ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+/**
+ * This class is to allow us loading schemas from storage plugins later 
when {@link #getSubSchema(String, boolean)}
+ * is called.
+ */
+public class DynamicRootSchema extends DynamicSchema
+implements CalciteRootSchema {
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class);
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig 
schemaConfig) {
+super(null, new RootSchema(), "");
+this.schemaConfig = schemaConfig;
+this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean 
caseSensitive) {
+CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+if (retSchema != null) {
+  return retSchema;
+}
+
+loadSchemaFactory(schemaName, caseSensitive);
+retSchema = getSubSchemaMap().get(schemaName);
+return retSchema;
+  }
+
+  @Override
+  public NavigableSet getTableNames() {
+Set pluginNames = Sets.newHashSet();
+for (Map.Entry storageEntry : 
getSchemaFactories()) {
+  pluginNames.add(storageEntry.getKey());
+}
+return Compatible.INSTANCE.navigableSet(
+ImmutableSortedSet.copyOf(
+Sets.union(pluginNames, getSubSchemaMap().keySet(;
+  }
+
+  /**
+   * load schema factory(storage plugin) for schemaName
+   * @param schemaName
+   * @param caseSensitive
+   */
+  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
+try {
+  SchemaPlus thisPlus = this.plus();
+  StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
+  if (plugin != null) {
+plugin.registerSchemas(schemaConfig, thisPlus);
+return;
+  }
+
+  // we could not find the plugin, the schemaName could be `dfs.tmp`, 
a 2nd level schema under 'dfs'
+  String[] paths = schemaName.split("\\.");
+  if (paths.length == 2) {
+plugin = getSchemaFactories().getPlugin(paths[0]);
+if (plugin == null) {
+  return;
+}
+
+// we could find storage plugin for first part(e.g. 'dfs') of 
schemaName (e.g. 'dfs.tmp')
+// register schema for this storage 

[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257633#comment-16257633
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151798147
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java
 ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+/**
+ * This class is to allow us loading schemas from storage plugins later 
when {@link #getSubSchema(String, boolean)}
+ * is called.
+ */
+public class DynamicRootSchema extends DynamicSchema
+implements CalciteRootSchema {
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class);
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig 
schemaConfig) {
+super(null, new RootSchema(), "");
+this.schemaConfig = schemaConfig;
+this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean 
caseSensitive) {
+CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+if (retSchema != null) {
+  return retSchema;
+}
+
+loadSchemaFactory(schemaName, caseSensitive);
+retSchema = getSubSchemaMap().get(schemaName);
+return retSchema;
+  }
+
+  @Override
+  public NavigableSet getTableNames() {
+Set pluginNames = Sets.newHashSet();
+for (Map.Entry storageEntry : 
getSchemaFactories()) {
+  pluginNames.add(storageEntry.getKey());
+}
+return Compatible.INSTANCE.navigableSet(
+ImmutableSortedSet.copyOf(
+Sets.union(pluginNames, getSubSchemaMap().keySet(;
+  }
+
+  /**
+   * load schema factory(storage plugin) for schemaName
+   * @param schemaName
+   * @param caseSensitive
+   */
+  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
+try {
+  SchemaPlus thisPlus = this.plus();
+  StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
+  if (plugin != null) {
+plugin.registerSchemas(schemaConfig, thisPlus);
+return;
+  }
+
+  // we could not find the plugin, the schemaName could be `dfs.tmp`, 
a 2nd level schema under 'dfs'
+  String[] paths = schemaName.split("\\.");
+  if (paths.length == 2) {
+plugin = getSchemaFactories().getPlugin(paths[0]);
+if (plugin == null) {
+  return;
+}
+
+// we could find storage plugin for first part(e.g. 'dfs') of 
schemaName (e.g. 'dfs.tmp')
+// register schema for this storage 

[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257599#comment-16257599
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151793647
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -373,12 +402,12 @@ public String toString() {
   public class WorkspaceSchema extends AbstractSchema implements 
ExpandingConcurrentMap.MapValueFactory {
 private final ExpandingConcurrentMap tables 
= new ExpandingConcurrentMap<>(this);
 private final SchemaConfig schemaConfig;
-private final DrillFileSystem fs;
+private DrillFileSystem fs;
 
-public WorkspaceSchema(List parentSchemaPath, String wsName, 
SchemaConfig schemaConfig) throws IOException {
+public WorkspaceSchema(List parentSchemaPath, String wsName, 
SchemaConfig schemaConfig, DrillFileSystem fs) throws IOException {
   super(parentSchemaPath, wsName);
   this.schemaConfig = schemaConfig;
-  this.fs = 
ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), fsConf);
+  this.fs = fs;
--- End diff --

Now we pass in fs instead creating from inside of WorkspaceSchema. 


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257601#comment-16257601
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151793708
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -150,14 +152,30 @@ public WorkspaceSchemaFactory(
* @return True if the user has access. False otherwise.
*/
   public boolean accessible(final String userName) throws IOException {
-final FileSystem fs = ImpersonationUtil.createFileSystem(userName, 
fsConf);
+final DrillFileSystem fs = 
ImpersonationUtil.createFileSystem(userName, fsConf);
+return accessible(fs);
+  }
+
+  /**
+   * Checks whether a FileSystem object has the permission to list/read 
workspace directory
+   * @param fs a DrillFileSystem object that was created with certain user 
privilege
+   * @return True if the user has access. False otherwise.
+   * @throws IOException
+   */
+  public boolean accessible(DrillFileSystem fs) throws IOException {
 try {
-  // We have to rely on the listStatus as a FileSystem can have 
complicated controls such as regular unix style
-  // permissions, Access Control Lists (ACLs) or Access Control 
Expressions (ACE). Hadoop 2.7 version of FileSystem
-  // has a limited private API (FileSystem.access) to check the 
permissions directly
-  // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill 
currently relies on Hadoop 2.5.0 version of
-  // FileClient. TODO: Update this when DRILL-3749 is fixed.
-  fs.listStatus(wsPath);
+  /**
+   * For Windows local file system, fs.access ends up using 
DeprecatedRawLocalFileStatus which has
+   * TrustedInstaller as owner, and a member of Administrators group 
could not satisfy the permission.
+   * In this case, we will still use method listStatus.
+   * In other cases, we use access method since it is cheaper.
+   */
+  if (SystemUtils.IS_OS_WINDOWS && 
fs.getUri().getScheme().equalsIgnoreCase(FileSystemSchemaFactory.LOCAL_FS_SCHEME))
 {
--- End diff --

Yes. it was tested in windows unit tests.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257240#comment-16257240
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151732963
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -373,12 +402,12 @@ public String toString() {
   public class WorkspaceSchema extends AbstractSchema implements 
ExpandingConcurrentMap.MapValueFactory {
 private final ExpandingConcurrentMap tables 
= new ExpandingConcurrentMap<>(this);
 private final SchemaConfig schemaConfig;
-private final DrillFileSystem fs;
+private DrillFileSystem fs;
 
-public WorkspaceSchema(List parentSchemaPath, String wsName, 
SchemaConfig schemaConfig) throws IOException {
+public WorkspaceSchema(List parentSchemaPath, String wsName, 
SchemaConfig schemaConfig, DrillFileSystem fs) throws IOException {
   super(parentSchemaPath, wsName);
   this.schemaConfig = schemaConfig;
-  this.fs = 
ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), fsConf);
+  this.fs = fs;
--- End diff --

Why don't we anymore need to create fs using `ImpersonationUtil` but needed 
before?


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257239#comment-16257239
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151713558
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestSchema.java
 ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl;
+
+import org.apache.drill.test.ClientFixture;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterMockStorageFixture;
+import org.apache.drill.test.DrillTest;
+
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import static org.junit.Assert.assertTrue;
+
+public class TestSchema extends DrillTest {
+
+  private static ClusterMockStorageFixture cluster;
+  private static ClientFixture client;
+
+  @BeforeClass
+  public static void setup() throws Exception {
+cluster = ClusterFixture.builder().buildCustomMockStorage();
+boolean breakRegisterSchema = true;
+cluster.insertMockStorage("mock_broken", breakRegisterSchema);
+cluster.insertMockStorage("mock_good", !breakRegisterSchema);
+client = cluster.clientFixture();
+  }
+
+  @Test
+  public void testQueryBrokenStorage() {
+String sql = "SELECT id_i, name_s10 FROM `mock_broken`.`employees_5`";
+try {
+  client.queryBuilder().sql(sql).printCsv();
+} catch (Exception ex) {
+  assertTrue(ex.getMessage().contains("VALIDATION ERROR: Schema"));
--- End diff --

This test can give false positive result when exception won't be thrown at 
all. Please re-throw the exception after the check and add `@Test(expected = 
Exception.class)`.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257236#comment-16257236
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151713734
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MockBreakageStorage.java
 ---
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mock;
+
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.store.SchemaConfig;
+
+import java.io.IOException;
+
+public class MockBreakageStorage extends MockStorageEngine {
+
+  boolean breakRegister;
--- End diff --

private


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257234#comment-16257234
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151713266
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestSchema.java
 ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl;
+
+import org.apache.drill.test.ClientFixture;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterMockStorageFixture;
+import org.apache.drill.test.DrillTest;
+
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import static org.junit.Assert.assertTrue;
+
+public class TestSchema extends DrillTest {
+
+  private static ClusterMockStorageFixture cluster;
+  private static ClientFixture client;
+
+  @BeforeClass
+  public static void setup() throws Exception {
+cluster = ClusterFixture.builder().buildCustomMockStorage();
+boolean breakRegisterSchema = true;
+cluster.insertMockStorage("mock_broken", breakRegisterSchema);
+cluster.insertMockStorage("mock_good", !breakRegisterSchema);
+client = cluster.clientFixture();
+  }
+
+  @Test
+  public void testQueryBrokenStorage() {
+String sql = "SELECT id_i, name_s10 FROM `mock_broken`.`employees_5`";
+try {
+  client.queryBuilder().sql(sql).printCsv();
+} catch (Exception ex) {
+  assertTrue(ex.getMessage().contains("VALIDATION ERROR: Schema"));
+}
+  }
+
+  @Test
+  public void testQueryGoodStorage() {
+String sql = "SELECT id_i, name_s10 FROM `mock_good`.`employees_5`";
+client.queryBuilder().sql(sql).printCsv();
+  }
+
+  @Test
+  public void testQueryGoodStorageWithDefaultSchema() throws Exception {
+String use_dfs = "use dfs.tmp";
+client.queryBuilder().sql(use_dfs).run();
+String sql = "SELECT id_i, name_s10 FROM `mock_good`.`employees_5`";
+client.queryBuilder().sql(sql).printCsv();
--- End diff --

Do we actually want to print csv here? I suggest we produce no output here.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257242#comment-16257242
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151713624
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestSchema.java
 ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl;
+
+import org.apache.drill.test.ClientFixture;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterMockStorageFixture;
+import org.apache.drill.test.DrillTest;
+
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import static org.junit.Assert.assertTrue;
+
+public class TestSchema extends DrillTest {
+
+  private static ClusterMockStorageFixture cluster;
+  private static ClientFixture client;
+
+  @BeforeClass
+  public static void setup() throws Exception {
+cluster = ClusterFixture.builder().buildCustomMockStorage();
+boolean breakRegisterSchema = true;
+cluster.insertMockStorage("mock_broken", breakRegisterSchema);
+cluster.insertMockStorage("mock_good", !breakRegisterSchema);
+client = cluster.clientFixture();
+  }
+
+  @Test
+  public void testQueryBrokenStorage() {
+String sql = "SELECT id_i, name_s10 FROM `mock_broken`.`employees_5`";
+try {
+  client.queryBuilder().sql(sql).printCsv();
+} catch (Exception ex) {
+  assertTrue(ex.getMessage().contains("VALIDATION ERROR: Schema"));
+}
+  }
+
+  @Test
+  public void testQueryGoodStorage() {
+String sql = "SELECT id_i, name_s10 FROM `mock_good`.`employees_5`";
+client.queryBuilder().sql(sql).printCsv();
+  }
+
+  @Test
+  public void testQueryGoodStorageWithDefaultSchema() throws Exception {
+String use_dfs = "use dfs.tmp";
+client.queryBuilder().sql(use_dfs).run();
+String sql = "SELECT id_i, name_s10 FROM `mock_good`.`employees_5`";
+client.queryBuilder().sql(sql).printCsv();
+  }
+
+  @Test
+  public void testUseBrokenStorage() throws Exception {
+try {
+  String use_dfs = "use mock_broken";
+  client.queryBuilder().sql(use_dfs).run();
+} catch(Exception ex) {
+  assertTrue(ex.getMessage().contains("VALIDATION ERROR: Schema"));
+}
+  }
+
+  @AfterClass
--- End diff --

Can be removed.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257235#comment-16257235
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151714217
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -150,14 +152,30 @@ public WorkspaceSchemaFactory(
* @return True if the user has access. False otherwise.
*/
   public boolean accessible(final String userName) throws IOException {
-final FileSystem fs = ImpersonationUtil.createFileSystem(userName, 
fsConf);
+final DrillFileSystem fs = 
ImpersonationUtil.createFileSystem(userName, fsConf);
+return accessible(fs);
+  }
+
+  /**
+   * Checks whether a FileSystem object has the permission to list/read 
workspace directory
+   * @param fs a DrillFileSystem object that was created with certain user 
privilege
+   * @return True if the user has access. False otherwise.
+   * @throws IOException
+   */
+  public boolean accessible(DrillFileSystem fs) throws IOException {
 try {
-  // We have to rely on the listStatus as a FileSystem can have 
complicated controls such as regular unix style
-  // permissions, Access Control Lists (ACLs) or Access Control 
Expressions (ACE). Hadoop 2.7 version of FileSystem
-  // has a limited private API (FileSystem.access) to check the 
permissions directly
-  // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill 
currently relies on Hadoop 2.5.0 version of
-  // FileClient. TODO: Update this when DRILL-3749 is fixed.
-  fs.listStatus(wsPath);
+  /**
+   * For Windows local file system, fs.access ends up using 
DeprecatedRawLocalFileStatus which has
+   * TrustedInstaller as owner, and a member of Administrators group 
could not satisfy the permission.
+   * In this case, we will still use method listStatus.
+   * In other cases, we use access method since it is cheaper.
+   */
+  if (SystemUtils.IS_OS_WINDOWS && 
fs.getUri().getScheme().equalsIgnoreCase(FileSystemSchemaFactory.LOCAL_FS_SCHEME))
 {
--- End diff --

Just in case, did you check that everything works on Windows?


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257241#comment-16257241
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151717700
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java
 ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+/**
+ * This class is to allow us loading schemas from storage plugins later 
when {@link #getSubSchema(String, boolean)}
+ * is called.
+ */
+public class DynamicRootSchema extends DynamicSchema
+implements CalciteRootSchema {
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class);
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig 
schemaConfig) {
+super(null, new RootSchema(), "");
+this.schemaConfig = schemaConfig;
+this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean 
caseSensitive) {
+CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+if (retSchema != null) {
+  return retSchema;
+}
+
+loadSchemaFactory(schemaName, caseSensitive);
+retSchema = getSubSchemaMap().get(schemaName);
+return retSchema;
+  }
+
+  @Override
+  public NavigableSet getTableNames() {
+Set pluginNames = Sets.newHashSet();
+for (Map.Entry storageEntry : 
getSchemaFactories()) {
+  pluginNames.add(storageEntry.getKey());
+}
+return Compatible.INSTANCE.navigableSet(
+ImmutableSortedSet.copyOf(
+Sets.union(pluginNames, getSubSchemaMap().keySet(;
+  }
+
+  /**
+   * load schema factory(storage plugin) for schemaName
+   * @param schemaName
+   * @param caseSensitive
+   */
+  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
+try {
+  SchemaPlus thisPlus = this.plus();
+  StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
+  if (plugin != null) {
+plugin.registerSchemas(schemaConfig, thisPlus);
+return;
+  }
+
+  // we could not find the plugin, the schemaName could be `dfs.tmp`, 
a 2nd level schema under 'dfs'
+  String[] paths = schemaName.split("\\.");
+  if (paths.length == 2) {
+plugin = getSchemaFactories().getPlugin(paths[0]);
+if (plugin == null) {
+  return;
+}
+
+// we could find storage plugin for first part(e.g. 'dfs') of 
schemaName (e.g. 'dfs.tmp')
+// register schema for this 

[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257237#comment-16257237
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151714418
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java
 ---
@@ -50,11 +53,23 @@
 
   public static final String DEFAULT_WS_NAME = "default";
 
+  public static final String LOCAL_FS_SCHEME = "file";
+
   private List factories;
   private String schemaName;
+  protected FileSystemPlugin plugin;
 
   public FileSystemSchemaFactory(String schemaName, 
List factories) {
-super();
+// when the correspondent FileSystemPlugin is not passed in, we dig 
into ANY workspace factory to get it.
+if (factories.size() > 0 ) {
--- End diff --

Please remove space `if (factories.size() > 0) {`.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257243#comment-16257243
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151732407
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java
 ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+/**
+ * This class is to allow us loading schemas from storage plugins later 
when {@link #getSubSchema(String, boolean)}
+ * is called.
+ */
+public class DynamicRootSchema extends DynamicSchema
+implements CalciteRootSchema {
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class);
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig 
schemaConfig) {
+super(null, new RootSchema(), "");
+this.schemaConfig = schemaConfig;
+this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean 
caseSensitive) {
+CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+if (retSchema != null) {
+  return retSchema;
+}
+
+loadSchemaFactory(schemaName, caseSensitive);
+retSchema = getSubSchemaMap().get(schemaName);
+return retSchema;
+  }
+
+  @Override
+  public NavigableSet getTableNames() {
+Set pluginNames = Sets.newHashSet();
+for (Map.Entry storageEntry : 
getSchemaFactories()) {
+  pluginNames.add(storageEntry.getKey());
+}
+return Compatible.INSTANCE.navigableSet(
+ImmutableSortedSet.copyOf(
+Sets.union(pluginNames, getSubSchemaMap().keySet(;
+  }
+
+  /**
+   * load schema factory(storage plugin) for schemaName
+   * @param schemaName
+   * @param caseSensitive
+   */
+  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
+try {
+  SchemaPlus thisPlus = this.plus();
+  StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
+  if (plugin != null) {
+plugin.registerSchemas(schemaConfig, thisPlus);
+return;
+  }
+
+  // we could not find the plugin, the schemaName could be `dfs.tmp`, 
a 2nd level schema under 'dfs'
+  String[] paths = schemaName.split("\\.");
+  if (paths.length == 2) {
+plugin = getSchemaFactories().getPlugin(paths[0]);
+if (plugin == null) {
+  return;
+}
+
+// we could find storage plugin for first part(e.g. 'dfs') of 
schemaName (e.g. 'dfs.tmp')
+// register schema for this 

[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257238#comment-16257238
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151716880
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java
 ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+/**
+ * This class is to allow us loading schemas from storage plugins later 
when {@link #getSubSchema(String, boolean)}
+ * is called.
+ */
+public class DynamicRootSchema extends DynamicSchema
+implements CalciteRootSchema {
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class);
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig 
schemaConfig) {
+super(null, new RootSchema(), "");
+this.schemaConfig = schemaConfig;
+this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean 
caseSensitive) {
+CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+if (retSchema != null) {
+  return retSchema;
+}
+
+loadSchemaFactory(schemaName, caseSensitive);
+retSchema = getSubSchemaMap().get(schemaName);
+return retSchema;
+  }
+
+  @Override
+  public NavigableSet getTableNames() {
+Set pluginNames = Sets.newHashSet();
+for (Map.Entry storageEntry : 
getSchemaFactories()) {
+  pluginNames.add(storageEntry.getKey());
+}
+return Compatible.INSTANCE.navigableSet(
+ImmutableSortedSet.copyOf(
+Sets.union(pluginNames, getSubSchemaMap().keySet(;
--- End diff --

Could you please explain what this method actually returns? According by 
its name it should return table names but it seems it returns different 
things...


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the 

[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255710#comment-16255710
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151493351
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -175,6 +193,21 @@ public WorkspaceSchema createSchema(List 
parentSchemaPath, SchemaConfig
 return new WorkspaceSchema(parentSchemaPath, schemaName, schemaConfig);
   }
 
+  public WorkspaceSchema createSchema(List parentSchemaPath, 
SchemaConfig schemaConfig, DrillFileSystem fs) throws IOException {
+if (!accessible(fs)) {
--- End diff --

returning null then user could not even list this workspace, so they don't 
know the existence of this workspace at all. I think that is a good access 
control practice. 

If users expect to see a workspace but could not see it, then they need to 
figure out why by themselves.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254618#comment-16254618
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151302799
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java
 ---
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+public class DynamicRootSchema extends DynamicSchema
+implements CalciteRootSchema {
+
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig 
schemaConfig) {
+super(null, new RootSchema(), "");
+this.schemaConfig = schemaConfig;
+this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean 
caseSensitive) {
+CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+
+if (retSchema == null) {
+  loadSchemaFactory(schemaName, caseSensitive);
+}
+
+retSchema = getSubSchemaMap().get(schemaName);
+return retSchema;
+  }
+
+  @Override
+  public NavigableSet getTableNames() {
+Set pluginNames = Sets.newHashSet();
--- End diff --

plugin name in drill is case sensitive.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254602#comment-16254602
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151301607
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -150,14 +152,30 @@ public WorkspaceSchemaFactory(
* @return True if the user has access. False otherwise.
*/
   public boolean accessible(final String userName) throws IOException {
-final FileSystem fs = ImpersonationUtil.createFileSystem(userName, 
fsConf);
+final DrillFileSystem fs = 
ImpersonationUtil.createFileSystem(userName, fsConf);
+return accessible(fs);
+  }
+
+  /**
+   * Checks whether a FileSystem object has the permission to list/read 
workspace directory
+   * @param fs a DrillFileSystem object that was created with certain user 
privilege
+   * @return True if the user has access. False otherwise.
+   * @throws IOException
+   */
+  public boolean accessible(DrillFileSystem fs) throws IOException {
 try {
-  // We have to rely on the listStatus as a FileSystem can have 
complicated controls such as regular unix style
-  // permissions, Access Control Lists (ACLs) or Access Control 
Expressions (ACE). Hadoop 2.7 version of FileSystem
-  // has a limited private API (FileSystem.access) to check the 
permissions directly
-  // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill 
currently relies on Hadoop 2.5.0 version of
-  // FileClient. TODO: Update this when DRILL-3749 is fixed.
-  fs.listStatus(wsPath);
+  /**
+   * For Windows local file system, fs.access ends up using 
DeprecatedRawLocalFileStatus which has
+   * TrustedInstaller as owner, and a member of Administrators group 
could not satisfy the permission.
+   * In this case, we will still use method listStatus.
+   * In other cases, we use access method since it is cheaper.
+   */
+  if (SystemUtils.IS_OS_WINDOWS && 
fs.getUri().getScheme().equalsIgnoreCase("file")) {
--- End diff --

FileSystem in hdfs has a constant DEFAULT_FS "file:///", for now I will 
define our own.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254589#comment-16254589
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151299650
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java
 ---
@@ -105,12 +106,36 @@ public SchemaPlus createRootSchema(final String 
userName, final SchemaConfigInfo
* @return
*/
   public SchemaPlus createRootSchema(SchemaConfig schemaConfig) {
+  final SchemaPlus rootSchema = 
DynamicSchema.createRootSchema(dContext.getStorage(), schemaConfig);
+  schemaTreesToClose.add(rootSchema);
+  return rootSchema;
+  }
+
+  /**
+   * Return full root schema with schema owner as the given user.
+   *
+   * @param userName Name of the user who is accessing the storage sources.
+   * @param provider {@link SchemaConfigInfoProvider} instance
+   * @return Root of the schema tree.
+   */
+  public SchemaPlus createFullRootSchema(final String userName, final 
SchemaConfigInfoProvider provider) {
+final String schemaUser = isImpersonationEnabled ? userName : 
ImpersonationUtil.getProcessUserName();
--- End diff --

not that many places, and need to pass in isImpersonationEnabled and 
userName if this line became a standalone method. will keep it as is for now.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254580#comment-16254580
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151298428
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java
 ---
@@ -73,9 +87,10 @@ public void registerSchemas(SchemaConfig schemaConfig, 
SchemaPlus parent) throws
 
 public FileSystemSchema(String name, SchemaConfig schemaConfig) throws 
IOException {
   super(ImmutableList.of(), name);
+  final DrillFileSystem fs = 
ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), 
plugin.getFsConf());
   for(WorkspaceSchemaFactory f :  factories){
-if (f.accessible(schemaConfig.getUserName())) {
-  WorkspaceSchema s = f.createSchema(getSchemaPath(), 
schemaConfig);
+WorkspaceSchema s = f.createSchema(getSchemaPath(), schemaConfig, 
fs);
+if ( s != null) {
--- End diff --

'factories' is from storage plugin, so it will be identical when we don't 
update this storage plugin. This FileSystemSchema constructor will be called 
only once for a query if this FileSystem storage plugin is needed for this 
query.



> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250389#comment-16250389
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r150680095
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java
 ---
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+public class DynamicRootSchema extends DynamicSchema
+implements CalciteRootSchema {
+
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig 
schemaConfig) {
+super(null, new RootSchema(), "");
+this.schemaConfig = schemaConfig;
+this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean 
caseSensitive) {
+CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+
+if (retSchema == null) {
+  loadSchemaFactory(schemaName, caseSensitive);
+}
+
+retSchema = getSubSchemaMap().get(schemaName);
+return retSchema;
+  }
+
+  @Override
+  public NavigableSet getTableNames() {
+Set pluginNames = Sets.newHashSet();
+for (Map.Entry storageEntry : 
getSchemaFactories()) {
+  pluginNames.add(storageEntry.getKey());
+}
+return Compatible.INSTANCE.navigableSet(
+ImmutableSortedSet.copyOf(
+Sets.union(pluginNames, getSubSchemaMap().keySet(;
+  }
+
+  /**
+   * load schema factory(storage plugin) for schemaName
+   * @param schemaName
+   * @param caseSensitive
+   */
+  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
+try {
+  SchemaPlus thisPlus = this.plus();
+  StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
+  if (plugin != null) {
+plugin.registerSchemas(schemaConfig, thisPlus);
+  }
+  else {
+//this schema name could be `dfs.tmp`, a 2nd level schema under 
'dfs'
+String[] paths = schemaName.split("\\.");
+if (paths.length == 2) {
--- End diff --

Do we support only 1- and 2-part names? Should we assert that the length <= 
2?


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt 

[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250402#comment-16250402
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r150685113
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java
 ---
@@ -73,9 +87,10 @@ public void registerSchemas(SchemaConfig schemaConfig, 
SchemaPlus parent) throws
 
 public FileSystemSchema(String name, SchemaConfig schemaConfig) throws 
IOException {
   super(ImmutableList.of(), name);
+  final DrillFileSystem fs = 
ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), 
plugin.getFsConf());
   for(WorkspaceSchemaFactory f :  factories){
-if (f.accessible(schemaConfig.getUserName())) {
-  WorkspaceSchema s = f.createSchema(getSchemaPath(), 
schemaConfig);
+WorkspaceSchema s = f.createSchema(getSchemaPath(), schemaConfig, 
fs);
+if ( s != null) {
--- End diff --

Here we iterate over a list of workspace schema factories. For each, we 
resolve a schemaConfig against the file system.

Under what situations would we have multiple factories? Selecting from two 
distinct storage plugins?

Calcite tends to resolve the same things over and over. Will this method be 
called multiple times?


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250408#comment-16250408
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r150686157
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -532,7 +572,10 @@ public boolean isMutable() {
 }
 
 public DrillFileSystem getFS() {
-  return fs;
+  if (this.fs == null) {
+this.fs = 
ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), fsConf);
+  }
+  return this.fs;
--- End diff --

This class caches the file system, which is good. The other classes in this 
PR do not; they create the fs as needed.

Does Calcite allow some kind of session state in which we can cache the fs 
for the query (plan) rather than creating it on the fly each time we need it?


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250387#comment-16250387
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r150678803
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java
 ---
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+public class DynamicRootSchema extends DynamicSchema
+implements CalciteRootSchema {
+
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig 
schemaConfig) {
+super(null, new RootSchema(), "");
+this.schemaConfig = schemaConfig;
+this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean 
caseSensitive) {
+CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+
+if (retSchema == null) {
+  loadSchemaFactory(schemaName, caseSensitive);
+}
+
+retSchema = getSubSchemaMap().get(schemaName);
+return retSchema;
+  }
+
+  @Override
+  public NavigableSet getTableNames() {
+Set pluginNames = Sets.newHashSet();
--- End diff --

Should be case insensitive?


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250401#comment-16250401
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r150684557
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java
 ---
@@ -73,9 +87,10 @@ public void registerSchemas(SchemaConfig schemaConfig, 
SchemaPlus parent) throws
 
 public FileSystemSchema(String name, SchemaConfig schemaConfig) throws 
IOException {
   super(ImmutableList.of(), name);
+  final DrillFileSystem fs = 
ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), 
plugin.getFsConf());
--- End diff --

Comment to explain what's happening here?


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250391#comment-16250391
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r150682074
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java
 ---
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+public class DynamicRootSchema extends DynamicSchema
+implements CalciteRootSchema {
+
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig 
schemaConfig) {
+super(null, new RootSchema(), "");
+this.schemaConfig = schemaConfig;
+this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean 
caseSensitive) {
+CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+
+if (retSchema == null) {
+  loadSchemaFactory(schemaName, caseSensitive);
+}
+
+retSchema = getSubSchemaMap().get(schemaName);
+return retSchema;
+  }
+
+  @Override
+  public NavigableSet getTableNames() {
+Set pluginNames = Sets.newHashSet();
+for (Map.Entry storageEntry : 
getSchemaFactories()) {
+  pluginNames.add(storageEntry.getKey());
+}
+return Compatible.INSTANCE.navigableSet(
+ImmutableSortedSet.copyOf(
+Sets.union(pluginNames, getSubSchemaMap().keySet(;
+  }
+
+  /**
+   * load schema factory(storage plugin) for schemaName
+   * @param schemaName
+   * @param caseSensitive
+   */
+  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
+try {
+  SchemaPlus thisPlus = this.plus();
+  StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
+  if (plugin != null) {
+plugin.registerSchemas(schemaConfig, thisPlus);
+  }
+  else {
+//this schema name could be `dfs.tmp`, a 2nd level schema under 
'dfs'
+String[] paths = schemaName.split("\\.");
+if (paths.length == 2) {
+  plugin = getSchemaFactories().getPlugin(paths[0]);
+  if (plugin != null) {
+plugin.registerSchemas(schemaConfig, thisPlus);
+  }
--- End diff --

What if the plugin is null? Should we fail with an error, or just keep 
going?


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & 

[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250404#comment-16250404
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r150685672
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -175,6 +193,21 @@ public WorkspaceSchema createSchema(List 
parentSchemaPath, SchemaConfig
 return new WorkspaceSchema(parentSchemaPath, schemaName, schemaConfig);
   }
 
+  public WorkspaceSchema createSchema(List parentSchemaPath, 
SchemaConfig schemaConfig, DrillFileSystem fs) throws IOException {
+if (!accessible(fs)) {
--- End diff --

Is returning null sufficient to tell the user that they don't have 
permission to do this operation?


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250392#comment-16250392
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r150680454
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java
 ---
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+public class DynamicRootSchema extends DynamicSchema
+implements CalciteRootSchema {
+
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig 
schemaConfig) {
+super(null, new RootSchema(), "");
+this.schemaConfig = schemaConfig;
+this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean 
caseSensitive) {
+CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+
+if (retSchema == null) {
+  loadSchemaFactory(schemaName, caseSensitive);
+}
+
+retSchema = getSubSchemaMap().get(schemaName);
+return retSchema;
+  }
+
+  @Override
+  public NavigableSet getTableNames() {
+Set pluginNames = Sets.newHashSet();
+for (Map.Entry storageEntry : 
getSchemaFactories()) {
+  pluginNames.add(storageEntry.getKey());
+}
+return Compatible.INSTANCE.navigableSet(
+ImmutableSortedSet.copyOf(
+Sets.union(pluginNames, getSubSchemaMap().keySet(;
+  }
+
+  /**
+   * load schema factory(storage plugin) for schemaName
+   * @param schemaName
+   * @param caseSensitive
+   */
+  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
+try {
+  SchemaPlus thisPlus = this.plus();
+  StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
+  if (plugin != null) {
+plugin.registerSchemas(schemaConfig, thisPlus);
+  }
+  else {
+//this schema name could be `dfs.tmp`, a 2nd level schema under 
'dfs'
+String[] paths = schemaName.split("\\.");
+if (paths.length == 2) {
+  plugin = getSchemaFactories().getPlugin(paths[0]);
+  if (plugin != null) {
+plugin.registerSchemas(schemaConfig, thisPlus);
+  }
+
+  final CalciteSchema secondLevelSchema = 
getSubSchemaMap().get(paths[0]).getSubSchema(paths[1], caseSensitive);
+  if (secondLevelSchema != null) {
+SchemaPlus secondlevel = secondLevelSchema.plus();
+org.apache.drill.exec.store.AbstractSchema drillSchema =
+
secondlevel.unwrap(org.apache.drill.exec.store.AbstractSchema.class);
+SubSchemaWrapper wrapper = 

[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250409#comment-16250409
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r150682963
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java
 ---
@@ -20,12 +20,13 @@
 import java.io.IOException;
 import java.util.List;
 
-import org.apache.calcite.jdbc.SimpleCalciteSchema;
+//import org.apache.calcite.jdbc.SimpleCalciteSchema;
--- End diff --

Remove


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250403#comment-16250403
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r150685414
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -150,14 +152,30 @@ public WorkspaceSchemaFactory(
* @return True if the user has access. False otherwise.
*/
   public boolean accessible(final String userName) throws IOException {
-final FileSystem fs = ImpersonationUtil.createFileSystem(userName, 
fsConf);
+final DrillFileSystem fs = 
ImpersonationUtil.createFileSystem(userName, fsConf);
+return accessible(fs);
+  }
+
+  /**
+   * Checks whether a FileSystem object has the permission to list/read 
workspace directory
+   * @param fs a DrillFileSystem object that was created with certain user 
privilege
+   * @return True if the user has access. False otherwise.
+   * @throws IOException
+   */
+  public boolean accessible(DrillFileSystem fs) throws IOException {
 try {
-  // We have to rely on the listStatus as a FileSystem can have 
complicated controls such as regular unix style
-  // permissions, Access Control Lists (ACLs) or Access Control 
Expressions (ACE). Hadoop 2.7 version of FileSystem
-  // has a limited private API (FileSystem.access) to check the 
permissions directly
-  // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill 
currently relies on Hadoop 2.5.0 version of
-  // FileClient. TODO: Update this when DRILL-3749 is fixed.
-  fs.listStatus(wsPath);
+  /**
+   * For Windows local file system, fs.access ends up using 
DeprecatedRawLocalFileStatus which has
+   * TrustedInstaller as owner, and a member of Administrators group 
could not satisfy the permission.
+   * In this case, we will still use method listStatus.
+   * In other cases, we use access method since it is cheaper.
+   */
+  if (SystemUtils.IS_OS_WINDOWS && 
fs.getUri().getScheme().equalsIgnoreCase("file")) {
--- End diff --

HDFS probably defines a constant for "file". Should we reference that?


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250405#comment-16250405
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r150684042
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java
 ---
@@ -52,9 +55,20 @@
 
   private List factories;
   private String schemaName;
+  protected FileSystemPlugin plugin;
 
   public FileSystemSchemaFactory(String schemaName, 
List factories) {
 super();
+if (factories.size() > 0 ) {
+  this.plugin = factories.get(0).getPlugin();
--- End diff --

A comment would be helpful: what is special about plugin 0 that it should 
be the default? Is plugin 0 the default of some sort?


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250397#comment-16250397
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r150682252
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java
 ---
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+public class DynamicRootSchema extends DynamicSchema
+implements CalciteRootSchema {
+
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig 
schemaConfig) {
+super(null, new RootSchema(), "");
+this.schemaConfig = schemaConfig;
+this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean 
caseSensitive) {
+CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+
+if (retSchema == null) {
+  loadSchemaFactory(schemaName, caseSensitive);
+}
+
+retSchema = getSubSchemaMap().get(schemaName);
+return retSchema;
+  }
+
+  @Override
+  public NavigableSet getTableNames() {
+Set pluginNames = Sets.newHashSet();
+for (Map.Entry storageEntry : 
getSchemaFactories()) {
+  pluginNames.add(storageEntry.getKey());
+}
+return Compatible.INSTANCE.navigableSet(
+ImmutableSortedSet.copyOf(
+Sets.union(pluginNames, getSubSchemaMap().keySet(;
+  }
+
+  /**
+   * load schema factory(storage plugin) for schemaName
+   * @param schemaName
+   * @param caseSensitive
+   */
+  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
+try {
+  SchemaPlus thisPlus = this.plus();
+  StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
+  if (plugin != null) {
+plugin.registerSchemas(schemaConfig, thisPlus);
+  }
+  else {
+//this schema name could be `dfs.tmp`, a 2nd level schema under 
'dfs'
+String[] paths = schemaName.split("\\.");
+if (paths.length == 2) {
+  plugin = getSchemaFactories().getPlugin(paths[0]);
+  if (plugin != null) {
+plugin.registerSchemas(schemaConfig, thisPlus);
+  }
+
+  final CalciteSchema secondLevelSchema = 
getSubSchemaMap().get(paths[0]).getSubSchema(paths[1], caseSensitive);
+  if (secondLevelSchema != null) {
+SchemaPlus secondlevel = secondLevelSchema.plus();
+org.apache.drill.exec.store.AbstractSchema drillSchema =
--- End diff --

Fully qualified name needed? Or, will an import work?


> Skip initializing all enabled storage plugins for 

[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250396#comment-16250396
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r150683680
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java
 ---
@@ -105,12 +106,36 @@ public SchemaPlus createRootSchema(final String 
userName, final SchemaConfigInfo
* @return
*/
   public SchemaPlus createRootSchema(SchemaConfig schemaConfig) {
+  final SchemaPlus rootSchema = 
DynamicSchema.createRootSchema(dContext.getStorage(), schemaConfig);
+  schemaTreesToClose.add(rootSchema);
+  return rootSchema;
+  }
+
+  /**
+   * Return full root schema with schema owner as the given user.
+   *
+   * @param userName Name of the user who is accessing the storage sources.
+   * @param provider {@link SchemaConfigInfoProvider} instance
+   * @return Root of the schema tree.
+   */
+  public SchemaPlus createFullRootSchema(final String userName, final 
SchemaConfigInfoProvider provider) {
+final String schemaUser = isImpersonationEnabled ? userName : 
ImpersonationUtil.getProcessUserName();
--- End diff --

Should this be factored out somewhere? Seems this magic stanza will be 
needed in many places: better to have one copy than several. Maybe a method in 
`ImpersonationUtil`?


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250388#comment-16250388
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r150678680
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java
 ---
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+public class DynamicRootSchema extends DynamicSchema
+implements CalciteRootSchema {
+
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig 
schemaConfig) {
+super(null, new RootSchema(), "");
+this.schemaConfig = schemaConfig;
+this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean 
caseSensitive) {
+CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+
+if (retSchema == null) {
+  loadSchemaFactory(schemaName, caseSensitive);
+}
+
+retSchema = getSubSchemaMap().get(schemaName);
+return retSchema;
--- End diff --

if the original call returns non-null, we make the same call a second time. 
Better:
```
retSchema = ...
if (retSchema != null) { return retSchema; }
loadSchemaFactory(...)
return getSubSchemaMap()...
```


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250399#comment-16250399
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r150684114
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java
 ---
@@ -52,9 +55,20 @@
 
   private List factories;
   private String schemaName;
+  protected FileSystemPlugin plugin;
 
   public FileSystemSchemaFactory(String schemaName, 
List factories) {
 super();
+if (factories.size() > 0 ) {
+  this.plugin = factories.get(0).getPlugin();
+}
+this.schemaName = schemaName;
+this.factories = factories;
+  }
+
+  public FileSystemSchemaFactory(FileSystemPlugin plugin, String 
schemaName, List factories) {
+super();
--- End diff --

Omit; Java does this by default.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250407#comment-16250407
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r150685946
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -532,7 +572,10 @@ public boolean isMutable() {
 }
 
 public DrillFileSystem getFS() {
-  return fs;
+  if (this.fs == null) {
+this.fs = 
ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), fsConf);
+  }
+  return this.fs;
--- End diff --

No need for `this.fs`, just `fs` will do.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250393#comment-16250393
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r150679636
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java
 ---
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+public class DynamicRootSchema extends DynamicSchema
+implements CalciteRootSchema {
+
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig 
schemaConfig) {
+super(null, new RootSchema(), "");
+this.schemaConfig = schemaConfig;
+this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean 
caseSensitive) {
+CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+
+if (retSchema == null) {
+  loadSchemaFactory(schemaName, caseSensitive);
+}
+
+retSchema = getSubSchemaMap().get(schemaName);
+return retSchema;
+  }
+
+  @Override
+  public NavigableSet getTableNames() {
+Set pluginNames = Sets.newHashSet();
+for (Map.Entry storageEntry : 
getSchemaFactories()) {
+  pluginNames.add(storageEntry.getKey());
+}
+return Compatible.INSTANCE.navigableSet(
+ImmutableSortedSet.copyOf(
+Sets.union(pluginNames, getSubSchemaMap().keySet(;
+  }
+
+  /**
+   * load schema factory(storage plugin) for schemaName
+   * @param schemaName
+   * @param caseSensitive
+   */
+  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
+try {
+  SchemaPlus thisPlus = this.plus();
+  StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
+  if (plugin != null) {
+plugin.registerSchemas(schemaConfig, thisPlus);
+  }
--- End diff --

If the name is `dfs.test`, we first look up the compound name, then the 
parts? Why? Do we put the compound names in the map? Or can we have one schema 
named "dfs.test" and another called `dfs`.`test`? Or, can this code be 
restructured a bit?


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each 

[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250406#comment-16250406
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r150686404
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentContext.java ---
@@ -229,7 +229,7 @@ public DrillbitContext getDrillbitContext() {
 return context;
   }
 
-  public SchemaPlus getRootSchema() {
+  public SchemaPlus getFullRootSchema() {
--- End diff --

Comment to explain what a "full root schema" is? Apparently, this is both 
the plugin config name and workspace combined?


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250400#comment-16250400
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r150682787
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicSchema.java
 ---
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import org.apache.calcite.jdbc.CalciteSchema;
+import org.apache.calcite.jdbc.SimpleCalciteSchema;
+import org.apache.calcite.schema.Schema;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+
+
+/**
+ * Unlike SimpleCalciteSchema, DynamicSchema could have an empty or 
partial schemaMap, but it could maintain a map of
+ * name->SchemaFactory, and only register schema when the corresponsdent 
name is requested.
+ */
+public class DynamicSchema extends SimpleCalciteSchema {
+
+  protected SchemaConfig schemaConfig;
--- End diff --

Seems the schemaConfig is never set or used.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250390#comment-16250390
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r150679941
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java
 ---
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+public class DynamicRootSchema extends DynamicSchema
+implements CalciteRootSchema {
+
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig 
schemaConfig) {
+super(null, new RootSchema(), "");
+this.schemaConfig = schemaConfig;
+this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean 
caseSensitive) {
+CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+
+if (retSchema == null) {
+  loadSchemaFactory(schemaName, caseSensitive);
+}
+
+retSchema = getSubSchemaMap().get(schemaName);
+return retSchema;
+  }
+
+  @Override
+  public NavigableSet getTableNames() {
+Set pluginNames = Sets.newHashSet();
+for (Map.Entry storageEntry : 
getSchemaFactories()) {
+  pluginNames.add(storageEntry.getKey());
+}
+return Compatible.INSTANCE.navigableSet(
+ImmutableSortedSet.copyOf(
+Sets.union(pluginNames, getSubSchemaMap().keySet(;
+  }
+
+  /**
+   * load schema factory(storage plugin) for schemaName
+   * @param schemaName
+   * @param caseSensitive
+   */
+  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
+try {
+  SchemaPlus thisPlus = this.plus();
+  StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
+  if (plugin != null) {
+plugin.registerSchemas(schemaConfig, thisPlus);
+  }
+  else {
+//this schema name could be `dfs.tmp`, a 2nd level schema under 
'dfs'
+String[] paths = schemaName.split("\\.");
--- End diff --

Should this be done here in this simple way? How many other places do we do 
the same thing? Or, should we have a common function to split schema names so 
we can handle, way, escapes and other special cases that might come along?


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek 

[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060112#comment-16060112
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/795
  
Is this PR ready for final review? Or, on hold waiting for something?


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-04-13 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968517#comment-15968517
 ] 

Julian Hyde commented on DRILL-5089:


How are you able to call {{SimpleCalciteSchema.from}}? Isn't 
{{SimpleCalciteSchema}} package-private?

In CALCITE-1748 you ask to override a method that returns {{CalciteSchema}} but 
in CALCITE-911 we agreed that Drill wouldn't create your own CalciteSchema 
sub-classes. What has changed?

Can you take a look at CALCITE-1742, and tell me how it relates to your 
problem? I'd rather fix Calcite's schema cache for Drill's and Phoenix's needs 
rather than let people drill holes in our APIs.

> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942163#comment-15942163
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/795#discussion_r108051728
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java
 ---
@@ -48,7 +55,8 @@
   public SchemaTreeProvider(final DrillbitContext dContext) {
 this.dContext = dContext;
 schemaTreesToClose = Lists.newArrayList();
-isImpersonationEnabled = 
dContext.getConfig().getBoolean(ExecConstants.IMPERSONATION_ENABLED);
+final DrillConfig config = dContext.getConfig();
+isImpersonationEnabled = config == null? false : 
config.getBoolean(ExecConstants.IMPERSONATION_ENABLED);
--- End diff --

Under what conditions will the config be null? Actually, should never be; 
we depend on config everywhere. So, just get the boolean and let the code get 
an NPE if someone makes a mistake and omits the (required) config.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942155#comment-15942155
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/795#discussion_r108051767
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java
 ---
@@ -119,6 +127,74 @@ public SchemaPlus createRootSchema(SchemaConfig 
schemaConfig) {
 }
   }
 
+
+  public SchemaPlus createPartialRootSchema(final String userName, final 
SchemaConfigInfoProvider provider,
+final String storage) {
+final String schemaUser = isImpersonationEnabled ? userName : 
ImpersonationUtil.getProcessUserName();
--- End diff --

Please move the user resolution into a method rather than duplicating the 
code.

More broadly, wouldn't the answer here be a constant for any qiven query? 
So, there is no reason to keep recomputing it. Can we just do it once and save 
it somewhere?

Let's think. If this whole structure is for a single query, then the user 
is constant and can be stored in the object itself.

Is this the only place where we need to figure out impersonation and actual 
user? If not, then the setup should be done at a higher level, not in each 
place where it is needed.

Perhaps we need a UserIdentity class that has the loginName and 
effectiveName, where these are the same for non-impersonation, but differ for 
impersonation.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942170#comment-15942170
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/795#discussion_r108052385
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillCatalogReader.java
 ---
@@ -0,0 +1,180 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.base.Joiner;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.adapter.java.JavaTypeFactory;
+import org.apache.calcite.jdbc.CalciteSchema;
+import org.apache.calcite.jdbc.CalciteSchemaImpl;
+import org.apache.calcite.jdbc.SimpleCalciteSchema;
+import org.apache.calcite.prepare.CalciteCatalogReader;
+import org.apache.calcite.prepare.RelOptTableImpl;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.drill.common.config.DrillConfig;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.ops.QueryContext;
+import org.apache.drill.exec.store.SchemaConfig;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Implementation of {@link 
org.apache.calcite.prepare.Prepare.CatalogReader}
--- End diff --

Maybe explain the purpose and not just the implementation? What part of 
query processing does this address? Does this resolve references to schemas and 
workspaces? Then the schema resolves the reference to a table name?


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942157#comment-15942157
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/795#discussion_r108051773
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java
 ---
@@ -119,6 +127,74 @@ public SchemaPlus createRootSchema(SchemaConfig 
schemaConfig) {
 }
   }
 
+
+  public SchemaPlus createPartialRootSchema(final String userName, final 
SchemaConfigInfoProvider provider,
+final String storage) {
+final String schemaUser = isImpersonationEnabled ? userName : 
ImpersonationUtil.getProcessUserName();
+final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, 
provider).build();
+final SchemaPlus rootSchema = 
SimpleCalciteSchema.createRootSchema(false);
+Set storageSet = Sets.newHashSet();
+storageSet.add(storage);
+addNewStoragesToRootSchema(schemaConfig, rootSchema, storageSet);
+schemaTreesToClose.add(rootSchema);
+return rootSchema;
+  }
+
+  public SchemaPlus addPartialRootSchema(final String userName, final 
SchemaConfigInfoProvider provider,
--- End diff --

Would be very helpful to add a bit of Javadoc to explain the purpose of 
these methods.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942167#comment-15942167
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/795#discussion_r108050520
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillCatalogReader.java
 ---
@@ -0,0 +1,180 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.base.Joiner;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.adapter.java.JavaTypeFactory;
+import org.apache.calcite.jdbc.CalciteSchema;
+import org.apache.calcite.jdbc.CalciteSchemaImpl;
+import org.apache.calcite.jdbc.SimpleCalciteSchema;
+import org.apache.calcite.prepare.CalciteCatalogReader;
+import org.apache.calcite.prepare.RelOptTableImpl;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.drill.common.config.DrillConfig;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.ops.QueryContext;
+import org.apache.drill.exec.store.SchemaConfig;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Implementation of {@link 
org.apache.calcite.prepare.Prepare.CatalogReader}
+ * and also {@link org.apache.calcite.sql.SqlOperatorTable} based on 
tables and
+ * functions defined schemas.
+ *
+ */
+public class DrillCatalogReader extends CalciteCatalogReader {
+  private static org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(DrillCatalogReader.class);
+
+  final private QueryContext queryContext;
+  private boolean allowTemporaryTables;
+  private String temporarySchema;
+
+  public DrillCatalogReader(
+  QueryContext qcontext,
+  CalciteSchema rootSchema,
+  boolean caseSensitive,
+  List defaultSchema,
+  JavaTypeFactory typeFactory,
+  String temporarySchema) {
+super(rootSchema, caseSensitive, defaultSchema, typeFactory);
+assert rootSchema != defaultSchema;
+queryContext = qcontext;
+this.temporarySchema = temporarySchema;
+this.allowTemporaryTables = true;
+  }
+
+  /** Disallow temporary tables presence in sql statement (ex: in view 
definitions) */
+  public void disallowTemporaryTables() {
--- End diff --

Once the constructor gets as complex as this one, and we have secondary 
methods to set what should be immutable properties, it may be time to move to a 
builder pattern:

```
public class DrillCatalogReader ...
static CatalogBuilder builder(QueryContext context) ...
...
class CatalogBuilder {
CatalogBuider(QueryContext context);
CatalogBuilder withTemporarySchema(String temporarySchema);
CatalogBuilder allowsTempTables();
CatalogBuilder caseSensitive();
...
DrillCatalogReader build();
}
```

Code that uses this class will be easier to follow with named methods for 
each property. Reasonable defaults reduces the number of constructor arguments 
and avoids the need for lots of code fixups when, inevitably, we add arguments.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while 

[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942169#comment-15942169
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/795#discussion_r108051870
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java
 ---
@@ -119,6 +127,74 @@ public SchemaPlus createRootSchema(SchemaConfig 
schemaConfig) {
 }
   }
 
+
+  public SchemaPlus createPartialRootSchema(final String userName, final 
SchemaConfigInfoProvider provider,
+final String storage) {
+final String schemaUser = isImpersonationEnabled ? userName : 
ImpersonationUtil.getProcessUserName();
+final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, 
provider).build();
+final SchemaPlus rootSchema = 
SimpleCalciteSchema.createRootSchema(false);
+Set storageSet = Sets.newHashSet();
+storageSet.add(storage);
+addNewStoragesToRootSchema(schemaConfig, rootSchema, storageSet);
+schemaTreesToClose.add(rootSchema);
+return rootSchema;
+  }
+
+  public SchemaPlus addPartialRootSchema(final String userName, final 
SchemaConfigInfoProvider provider,
+Set storages, 
SchemaPlus rootSchema) {
+final String schemaUser = isImpersonationEnabled ? userName : 
ImpersonationUtil.getProcessUserName();
+final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, 
provider).build();
+addNewStoragesToRootSchema(schemaConfig, rootSchema, storages);
+schemaTreesToClose.add(rootSchema);
+return rootSchema;
+  }
+
+  private void expandSecondLevelSchema(SchemaPlus parent) {
+List secondLevelSchemas = Lists.newArrayList();
+for (String firstLevelSchemaName : parent.getSubSchemaNames()) {
+  SchemaPlus firstLevelSchema = 
parent.getSubSchema(firstLevelSchemaName);
+  for (String secondLevelSchemaName : 
firstLevelSchema.getSubSchemaNames()) {
--- End diff --

Is there something special about second-level schemas? Must they be 
workspaces? Something else? Please explain.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942161#comment-15942161
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/795#discussion_r108051710
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java ---
@@ -198,6 +200,10 @@ public Expression getExpression(SchemaPlus 
parentSchema, String name) {
 
   @Override
   public boolean contentsHaveChangedSince(long lastCheck, long now) {
+//use cached schema for 1 second, it will be helpful if it is highly 
concurrent scenario
+if( abs(now - lastCheck) < 1000 ) {
+  return false;
--- End diff --

Clearer would be:

```
return (now - lastCheck) < 1000;
```

And, the 1000 should be made into a constant:

```
public static int CACHED_SCHEMA_REFRESH_MS = 1000;
```


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942159#comment-15942159
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/795#discussion_r108052368
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java ---
@@ -144,6 +145,25 @@ public SchemaPlus getNewDefaultSchema() {
 return defaultSchema;
   }
 
+  public SchemaPlus getPartialDefaultSchema() {
+
+final String sessionSchemaPath = session.getDefaultSchemaPath();
+final List schemaPathAsList = 
Lists.newArrayList(sessionSchemaPath.split("\\."));
+
+final SchemaPlus rootSchema = 
schemaTreeProvider.createPartialRootSchema(getQueryUserName(),
+this, schemaPathAsList.get(0));
+
+final SchemaPlus defaultSchema = session.getDefaultSchema(rootSchema);
+
+if (defaultSchema == null) {
+  return rootSchema;
+}
+return defaultSchema;
+  }
+
+  public void addNewRelevantSchema(Set storages, SchemaPlus 
toExpandSchema) {
+schemaTreeProvider.addPartialRootSchema(getQueryUserName(), this, 
storages, toExpandSchema);
--- End diff --

Per a comment later on, find a good home for computing the effective user 
name (with impersonation enabled). That calculation should probably be done on 
the user session if the impersonation is for the life of the session, or on the 
query context if the impersonation is only for the single query. (Note that if 
it is only for a query, temp tables probably won't work the way we expect.)

But, doing the impersonation check in the schema tree provider is pushing 
that behavior too far down into the code.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942160#comment-15942160
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/795#discussion_r108052279
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java
 ---
@@ -119,6 +127,74 @@ public SchemaPlus createRootSchema(SchemaConfig 
schemaConfig) {
 }
   }
 
+
+  public SchemaPlus createPartialRootSchema(final String userName, final 
SchemaConfigInfoProvider provider,
+final String storage) {
+final String schemaUser = isImpersonationEnabled ? userName : 
ImpersonationUtil.getProcessUserName();
+final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, 
provider).build();
+final SchemaPlus rootSchema = 
SimpleCalciteSchema.createRootSchema(false);
+Set storageSet = Sets.newHashSet();
+storageSet.add(storage);
+addNewStoragesToRootSchema(schemaConfig, rootSchema, storageSet);
+schemaTreesToClose.add(rootSchema);
+return rootSchema;
+  }
+
+  public SchemaPlus addPartialRootSchema(final String userName, final 
SchemaConfigInfoProvider provider,
+Set storages, 
SchemaPlus rootSchema) {
+final String schemaUser = isImpersonationEnabled ? userName : 
ImpersonationUtil.getProcessUserName();
+final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, 
provider).build();
+addNewStoragesToRootSchema(schemaConfig, rootSchema, storages);
+schemaTreesToClose.add(rootSchema);
+return rootSchema;
+  }
+
+  private void expandSecondLevelSchema(SchemaPlus parent) {
+List secondLevelSchemas = Lists.newArrayList();
+for (String firstLevelSchemaName : parent.getSubSchemaNames()) {
+  SchemaPlus firstLevelSchema = 
parent.getSubSchema(firstLevelSchemaName);
+  for (String secondLevelSchemaName : 
firstLevelSchema.getSubSchemaNames()) {
+
secondLevelSchemas.add(firstLevelSchema.getSubSchema(secondLevelSchemaName));
+  }
+}
+
+for (SchemaPlus schema : secondLevelSchemas) {
+  AbstractSchema drillSchema;
+  try {
+drillSchema = schema.unwrap(AbstractSchema.class);
+  } catch (ClassCastException e) {
+throw new RuntimeException(String.format("Schema '%s' is not 
expected under root schema", schema.getName()));
+  }
+  SubSchemaWrapper wrapper = new SubSchemaWrapper(drillSchema);
+  parent.add(wrapper.getName(), wrapper);
+}
+  }
+
+  public void addNewStoragesToRootSchema(SchemaConfig schemaConfig, 
SchemaPlus rootSchema, Set storages) {
+try {
+  for(String name: storages) {
+try {
+  StoragePlugin plugin = dContext.getStorage().getPlugin(name);
+  if(plugin == null) {
+logger.error("Got null for storage plugin of name: " + name);
+continue;
+  }
+  plugin.registerSchemas(schemaConfig, rootSchema);
+  expandSecondLevelSchema(rootSchema);
+}
+catch (ExecutionSetupException ex) {
--- End diff --

Presumably this method adds storages based on need: we add a storage only 
if a query references it. If so, then we need to pass the exception along. If 
we don't, the user will simply be told that "foo.bar" does not exist when, in 
fact, perhaps it does exist but is configured wrong.

And, note that we do catch an IOException and translate it to a user 
exception below.

So, maybe

* Remove this try/catch block.
* Change the following catch from IOException to just Exception.

The result will be that we fail the query for all errors, not just IO 
errors. Again, this is the right choice if we are only loading the schemas 
needed by the query.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either 

[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942156#comment-15942156
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/795#discussion_r108052308
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java ---
@@ -144,6 +145,25 @@ public SchemaPlus getNewDefaultSchema() {
 return defaultSchema;
   }
 
+  public SchemaPlus getPartialDefaultSchema() {
--- End diff --

Javadoc to explain what a partial default schema might be?


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942168#comment-15942168
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/795#discussion_r108050436
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java ---
@@ -144,6 +145,25 @@ public SchemaPlus getNewDefaultSchema() {
 return defaultSchema;
   }
 
+  public SchemaPlus getPartialDefaultSchema() {
+
+final String sessionSchemaPath = session.getDefaultSchemaPath();
+final List schemaPathAsList = 
Lists.newArrayList(sessionSchemaPath.split("\\."));
+
+final SchemaPlus rootSchema = 
schemaTreeProvider.createPartialRootSchema(getQueryUserName(),
+this, schemaPathAsList.get(0));
+
+final SchemaPlus defaultSchema = session.getDefaultSchema(rootSchema);
+
+if (defaultSchema == null) {
+  return rootSchema;
+}
+return defaultSchema;
+  }
+
+  public void addNewRelevantSchema(Set storages, SchemaPlus 
toExpandSchema) {
--- End diff --

`addNewSchema`? Presumably the schema is relevant of we'd not be adding 
it...

If we are adding the schema, it is new to this context, so maybe just 
`addSchema`.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942162#comment-15942162
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/795#discussion_r108050603
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SqlConverter.java
 ---
@@ -107,26 +110,51 @@ public SqlConverter(QueryContext context) {
 this.sqlToRelConverterConfig = new SqlToRelConverterConfig();
 this.isInnerQuery = false;
 this.typeFactory = new JavaTypeFactoryImpl(DRILL_TYPE_SYSTEM);
-this.defaultSchema =  context.getNewDefaultSchema();
-this.rootSchema = rootSchema(defaultSchema);
+
+this.defaultSchema = context.getPartialDefaultSchema();
--- End diff --

All these "this." prefixes are not needed. "this." is used only when member 
names are the same as argument names:

```
public Foo(String name) {
this.name = name;
}
```

In this case, a reader has to double check: what is the difference between 
`this.defaultSchema` and `defaultSchema`? Turns out, they are the same, so we 
should use the same form of reference everywhere in the ctor.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942164#comment-15942164
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/795#discussion_r108051823
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java
 ---
@@ -119,6 +127,74 @@ public SchemaPlus createRootSchema(SchemaConfig 
schemaConfig) {
 }
   }
 
+
+  public SchemaPlus createPartialRootSchema(final String userName, final 
SchemaConfigInfoProvider provider,
+final String storage) {
+final String schemaUser = isImpersonationEnabled ? userName : 
ImpersonationUtil.getProcessUserName();
+final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, 
provider).build();
+final SchemaPlus rootSchema = 
SimpleCalciteSchema.createRootSchema(false);
+Set storageSet = Sets.newHashSet();
+storageSet.add(storage);
+addNewStoragesToRootSchema(schemaConfig, rootSchema, storageSet);
+schemaTreesToClose.add(rootSchema);
--- End diff --

There is quite a bit of plumbing here to get right -- for each new schema 
added. Please factor out the common code into a (private) implementation 
method. Else, we'll be chasing bugs forever as people modify one copy but miss 
the others.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942166#comment-15942166
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/795#discussion_r108050543
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SchemaUtilites.java
 ---
@@ -185,6 +186,35 @@ public static AbstractSchema 
resolveToMutableDrillSchema(final SchemaPlus defaul
   }
 
   /**
+   * Given a schema in schema tree, once this schema is found resolve it 
into a mutable AbstractDrillSchema
+   *  instance. A {@link UserException} is throws when:
+   *   1. Schema is a root schema
+   *   2. schema is not a mutable schema.
+   * @param schema
+   * @return
+   */
+  public static AbstractSchema toMutableDrillSchema(final SchemaPlus 
schema) {
+if (schema == null) {
+  throwSchemaNotFoundException(schema, schema.getName());
--- End diff --

Err... If schema is null, then `schema.getName()` will throw an NPE...

Why would the schema be null? Should this be:

```
assert schema != null;
```



> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942158#comment-15942158
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/795#discussion_r108051857
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java
 ---
@@ -119,6 +127,74 @@ public SchemaPlus createRootSchema(SchemaConfig 
schemaConfig) {
 }
   }
 
+
+  public SchemaPlus createPartialRootSchema(final String userName, final 
SchemaConfigInfoProvider provider,
+final String storage) {
+final String schemaUser = isImpersonationEnabled ? userName : 
ImpersonationUtil.getProcessUserName();
+final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, 
provider).build();
+final SchemaPlus rootSchema = 
SimpleCalciteSchema.createRootSchema(false);
+Set storageSet = Sets.newHashSet();
+storageSet.add(storage);
+addNewStoragesToRootSchema(schemaConfig, rootSchema, storageSet);
+schemaTreesToClose.add(rootSchema);
+return rootSchema;
+  }
+
+  public SchemaPlus addPartialRootSchema(final String userName, final 
SchemaConfigInfoProvider provider,
+Set storages, 
SchemaPlus rootSchema) {
+final String schemaUser = isImpersonationEnabled ? userName : 
ImpersonationUtil.getProcessUserName();
+final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, 
provider).build();
+addNewStoragesToRootSchema(schemaConfig, rootSchema, storages);
+schemaTreesToClose.add(rootSchema);
+return rootSchema;
+  }
+
+  private void expandSecondLevelSchema(SchemaPlus parent) {
--- End diff --

Maybe explain this a bit? Why are we expanding second-level schemas for 
*all* top-level schemas? Can't we do the expansion on the fly as we resolve? 
That is, if a query has a path "a.b.c.d", can't we just resolve a, then within 
a, resolve b, and so on until we get to d? Else, we are still open to a 
performance hit if, say, a is a directory of a million files, or a database 
with 10K tables.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942154#comment-15942154
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/795#discussion_r108051696
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java ---
@@ -198,6 +200,10 @@ public Expression getExpression(SchemaPlus 
parentSchema, String name) {
 
   @Override
   public boolean contentsHaveChangedSince(long lastCheck, long now) {
+//use cached schema for 1 second, it will be helpful if it is highly 
concurrent scenario
+if( abs(now - lastCheck) < 1000 ) {
--- End diff --

You want to check if either now is one second in the future -- or one 
second in the past? Under what conditions would lastCheck - now be negative and 
so would need an abs?


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942165#comment-15942165
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/795#discussion_r108052374
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java ---
@@ -253,6 +273,10 @@ public RemoteFunctionRegistry 
getRemoteFunctionRegistry() {
 return drillbitContext.getRemoteFunctionRegistry();
   }
 
+  public SchemaTreeProvider getSchemaTreeProvider() {
--- End diff --

Javadoc comment to explain this?


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941151#comment-15941151
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

GitHub user chunhui-shi opened a pull request:

https://github.com/apache/drill/pull/795

DRILL-5089: Get only partial schemas of relevant storage plugins inst…

…ead of all storages ahead.

1. For each query, rootSchema is empty, add schemas of a storage plugin 
only when needed -- when asked for a schemaPath.

2. Allow mock environments to provide dynamic schemas.

3. SchemaUtils.findSchema used in many Sql handlers now is handled by 
SqlConverter.catalog to allow expand schema dynamically

4. Temp table resolve also could not take a schema tree and assume it is 
complete.

NOTE: pom.xml in this pull request is pointing to temporary calcite versoin 
'r20-test', this should change to 'r20' once this pull request is ready to 
commit

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chunhui-shi/drill DRILL-5089

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/795.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #795


commit 4814e37eefd16ee7aa1ba6d21d66211e41922dbf
Author: chunhui-shi 
Date:   2017-01-06T09:13:02Z

DRILL-5089: Get only partial schemas of relevant storage plugins instead of 
all storages ahead.

1. For each query, rootSchema is empty, add schemas of a storage plugin 
only when needed -- when asked for a schemaPath.

2. Allow mock environments to provide dynamic schemas.

3. SchemaUtils.findSchema used in many Sql handlers now is handled by 
SqlConverter.catalog to allow expand schema dynamically

4. Temp table resolve also could not take a schema tree and assume it is 
complete.

NOTE: pom.xml in this pull request is pointing to temporary calcite versoin 
'r20-test', this should change to 'r20' once this pull request is ready to 
commit




> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, at attempt is made to initialize each enabled storage 
> plugin, while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time taken increases drastically. Most likely 
> due the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)