[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263503#comment-16263503 ] ASF GitHub Bot commented on DRILL-5089: --- Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/1032 > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > Labels: ready-to-commit > Fix For: 1.12.0 > > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261373#comment-16261373 ] ASF GitHub Bot commented on DRILL-5089: --- Github user chunhui-shi closed the pull request at: https://github.com/apache/drill/pull/795 > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > Labels: ready-to-commit > Fix For: 1.12.0 > > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260553#comment-16260553 ] ASF GitHub Bot commented on DRILL-5089: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1032 +1 > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > Labels: ready-to-commit > Fix For: 1.12.0 > > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259614#comment-16259614 ] ASF GitHub Bot commented on DRILL-5089: --- Github user chunhui-shi commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r152073873 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java --- @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner.sql; + +import com.google.common.collect.ImmutableSortedSet; +import com.google.common.collect.Lists; +import com.google.common.collect.Sets; +import org.apache.calcite.DataContext; +import org.apache.calcite.jdbc.CalciteRootSchema; +import org.apache.calcite.jdbc.CalciteSchema; + +import org.apache.calcite.linq4j.tree.Expression; +import org.apache.calcite.linq4j.tree.Expressions; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.schema.impl.AbstractSchema; +import org.apache.calcite.util.BuiltInMethod; +import org.apache.calcite.util.Compatible; +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.exec.store.SchemaConfig; +import org.apache.drill.exec.store.StoragePlugin; +import org.apache.drill.exec.store.StoragePluginRegistry; +import org.apache.drill.exec.store.SubSchemaWrapper; + +import java.io.IOException; +import java.util.List; +import java.util.Map; +import java.util.NavigableSet; +import java.util.Set; + +/** + * This class is to allow us loading schemas from storage plugins later when {@link #getSubSchema(String, boolean)} + * is called. + */ +public class DynamicRootSchema extends DynamicSchema +implements CalciteRootSchema { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class); + /** Creates a root schema. */ + DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) { +super(null, new RootSchema(), ""); +this.schemaConfig = schemaConfig; +this.storages = storages; + } + + @Override + public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) { +CalciteSchema retSchema = getSubSchemaMap().get(schemaName); +if (retSchema != null) { + return retSchema; +} + +loadSchemaFactory(schemaName, caseSensitive); +retSchema = getSubSchemaMap().get(schemaName); +return retSchema; + } + + @Override + public NavigableSet getTableNames() { +Set pluginNames = Sets.newHashSet(); +for (Map.EntrystorageEntry : getSchemaFactories()) { + pluginNames.add(storageEntry.getKey()); +} +return Compatible.INSTANCE.navigableSet( +ImmutableSortedSet.copyOf( +Sets.union(pluginNames, getSubSchemaMap().keySet(; + } + + /** + * load schema factory(storage plugin) for schemaName + * @param schemaName + * @param caseSensitive + */ + public void loadSchemaFactory(String schemaName, boolean caseSensitive) { +try { + SchemaPlus thisPlus = this.plus(); + StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName); + if (plugin != null) { +plugin.registerSchemas(schemaConfig, thisPlus); +return; + } + + // we could not find the plugin, the schemaName could be `dfs.tmp`, a 2nd level schema under 'dfs' + String[] paths = schemaName.split("\\."); + if (paths.length == 2) { +plugin = getSchemaFactories().getPlugin(paths[0]); +if (plugin == null) { + return; +} + +// we could find storage plugin for first part(e.g. 'dfs') of schemaName (e.g. 'dfs.tmp') +// register schema for this storage
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257633#comment-16257633 ] ASF GitHub Bot commented on DRILL-5089: --- Github user chunhui-shi commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151798147 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java --- @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner.sql; + +import com.google.common.collect.ImmutableSortedSet; +import com.google.common.collect.Lists; +import com.google.common.collect.Sets; +import org.apache.calcite.DataContext; +import org.apache.calcite.jdbc.CalciteRootSchema; +import org.apache.calcite.jdbc.CalciteSchema; + +import org.apache.calcite.linq4j.tree.Expression; +import org.apache.calcite.linq4j.tree.Expressions; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.schema.impl.AbstractSchema; +import org.apache.calcite.util.BuiltInMethod; +import org.apache.calcite.util.Compatible; +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.exec.store.SchemaConfig; +import org.apache.drill.exec.store.StoragePlugin; +import org.apache.drill.exec.store.StoragePluginRegistry; +import org.apache.drill.exec.store.SubSchemaWrapper; + +import java.io.IOException; +import java.util.List; +import java.util.Map; +import java.util.NavigableSet; +import java.util.Set; + +/** + * This class is to allow us loading schemas from storage plugins later when {@link #getSubSchema(String, boolean)} + * is called. + */ +public class DynamicRootSchema extends DynamicSchema +implements CalciteRootSchema { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class); + /** Creates a root schema. */ + DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) { +super(null, new RootSchema(), ""); +this.schemaConfig = schemaConfig; +this.storages = storages; + } + + @Override + public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) { +CalciteSchema retSchema = getSubSchemaMap().get(schemaName); +if (retSchema != null) { + return retSchema; +} + +loadSchemaFactory(schemaName, caseSensitive); +retSchema = getSubSchemaMap().get(schemaName); +return retSchema; + } + + @Override + public NavigableSet getTableNames() { +Set pluginNames = Sets.newHashSet(); +for (Map.EntrystorageEntry : getSchemaFactories()) { + pluginNames.add(storageEntry.getKey()); +} +return Compatible.INSTANCE.navigableSet( +ImmutableSortedSet.copyOf( +Sets.union(pluginNames, getSubSchemaMap().keySet(; + } + + /** + * load schema factory(storage plugin) for schemaName + * @param schemaName + * @param caseSensitive + */ + public void loadSchemaFactory(String schemaName, boolean caseSensitive) { +try { + SchemaPlus thisPlus = this.plus(); + StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName); + if (plugin != null) { +plugin.registerSchemas(schemaConfig, thisPlus); +return; + } + + // we could not find the plugin, the schemaName could be `dfs.tmp`, a 2nd level schema under 'dfs' + String[] paths = schemaName.split("\\."); + if (paths.length == 2) { +plugin = getSchemaFactories().getPlugin(paths[0]); +if (plugin == null) { + return; +} + +// we could find storage plugin for first part(e.g. 'dfs') of schemaName (e.g. 'dfs.tmp') +// register schema for this storage
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257599#comment-16257599 ] ASF GitHub Bot commented on DRILL-5089: --- Github user chunhui-shi commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151793647 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java --- @@ -373,12 +402,12 @@ public String toString() { public class WorkspaceSchema extends AbstractSchema implements ExpandingConcurrentMap.MapValueFactory{ private final ExpandingConcurrentMap tables = new ExpandingConcurrentMap<>(this); private final SchemaConfig schemaConfig; -private final DrillFileSystem fs; +private DrillFileSystem fs; -public WorkspaceSchema(List parentSchemaPath, String wsName, SchemaConfig schemaConfig) throws IOException { +public WorkspaceSchema(List parentSchemaPath, String wsName, SchemaConfig schemaConfig, DrillFileSystem fs) throws IOException { super(parentSchemaPath, wsName); this.schemaConfig = schemaConfig; - this.fs = ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), fsConf); + this.fs = fs; --- End diff -- Now we pass in fs instead creating from inside of WorkspaceSchema. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257601#comment-16257601 ] ASF GitHub Bot commented on DRILL-5089: --- Github user chunhui-shi commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151793708 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java --- @@ -150,14 +152,30 @@ public WorkspaceSchemaFactory( * @return True if the user has access. False otherwise. */ public boolean accessible(final String userName) throws IOException { -final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf); +final DrillFileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf); +return accessible(fs); + } + + /** + * Checks whether a FileSystem object has the permission to list/read workspace directory + * @param fs a DrillFileSystem object that was created with certain user privilege + * @return True if the user has access. False otherwise. + * @throws IOException + */ + public boolean accessible(DrillFileSystem fs) throws IOException { try { - // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style - // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem - // has a limited private API (FileSystem.access) to check the permissions directly - // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of - // FileClient. TODO: Update this when DRILL-3749 is fixed. - fs.listStatus(wsPath); + /** + * For Windows local file system, fs.access ends up using DeprecatedRawLocalFileStatus which has + * TrustedInstaller as owner, and a member of Administrators group could not satisfy the permission. + * In this case, we will still use method listStatus. + * In other cases, we use access method since it is cheaper. + */ + if (SystemUtils.IS_OS_WINDOWS && fs.getUri().getScheme().equalsIgnoreCase(FileSystemSchemaFactory.LOCAL_FS_SCHEME)) { --- End diff -- Yes. it was tested in windows unit tests. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257240#comment-16257240 ] ASF GitHub Bot commented on DRILL-5089: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151732963 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java --- @@ -373,12 +402,12 @@ public String toString() { public class WorkspaceSchema extends AbstractSchema implements ExpandingConcurrentMap.MapValueFactory{ private final ExpandingConcurrentMap tables = new ExpandingConcurrentMap<>(this); private final SchemaConfig schemaConfig; -private final DrillFileSystem fs; +private DrillFileSystem fs; -public WorkspaceSchema(List parentSchemaPath, String wsName, SchemaConfig schemaConfig) throws IOException { +public WorkspaceSchema(List parentSchemaPath, String wsName, SchemaConfig schemaConfig, DrillFileSystem fs) throws IOException { super(parentSchemaPath, wsName); this.schemaConfig = schemaConfig; - this.fs = ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), fsConf); + this.fs = fs; --- End diff -- Why don't we anymore need to create fs using `ImpersonationUtil` but needed before? > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257239#comment-16257239 ] ASF GitHub Bot commented on DRILL-5089: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151713558 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestSchema.java --- @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.physical.impl; + +import org.apache.drill.test.ClientFixture; +import org.apache.drill.test.ClusterFixture; +import org.apache.drill.test.ClusterMockStorageFixture; +import org.apache.drill.test.DrillTest; + +import org.junit.AfterClass; +import org.junit.BeforeClass; +import org.junit.Test; + +import static org.junit.Assert.assertTrue; + +public class TestSchema extends DrillTest { + + private static ClusterMockStorageFixture cluster; + private static ClientFixture client; + + @BeforeClass + public static void setup() throws Exception { +cluster = ClusterFixture.builder().buildCustomMockStorage(); +boolean breakRegisterSchema = true; +cluster.insertMockStorage("mock_broken", breakRegisterSchema); +cluster.insertMockStorage("mock_good", !breakRegisterSchema); +client = cluster.clientFixture(); + } + + @Test + public void testQueryBrokenStorage() { +String sql = "SELECT id_i, name_s10 FROM `mock_broken`.`employees_5`"; +try { + client.queryBuilder().sql(sql).printCsv(); +} catch (Exception ex) { + assertTrue(ex.getMessage().contains("VALIDATION ERROR: Schema")); --- End diff -- This test can give false positive result when exception won't be thrown at all. Please re-throw the exception after the check and add `@Test(expected = Exception.class)`. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257236#comment-16257236 ] ASF GitHub Bot commented on DRILL-5089: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151713734 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MockBreakageStorage.java --- @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.store.mock; + +import org.apache.calcite.schema.SchemaPlus; +import org.apache.drill.exec.server.DrillbitContext; +import org.apache.drill.exec.store.SchemaConfig; + +import java.io.IOException; + +public class MockBreakageStorage extends MockStorageEngine { + + boolean breakRegister; --- End diff -- private > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257234#comment-16257234 ] ASF GitHub Bot commented on DRILL-5089: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151713266 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestSchema.java --- @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.physical.impl; + +import org.apache.drill.test.ClientFixture; +import org.apache.drill.test.ClusterFixture; +import org.apache.drill.test.ClusterMockStorageFixture; +import org.apache.drill.test.DrillTest; + +import org.junit.AfterClass; +import org.junit.BeforeClass; +import org.junit.Test; + +import static org.junit.Assert.assertTrue; + +public class TestSchema extends DrillTest { + + private static ClusterMockStorageFixture cluster; + private static ClientFixture client; + + @BeforeClass + public static void setup() throws Exception { +cluster = ClusterFixture.builder().buildCustomMockStorage(); +boolean breakRegisterSchema = true; +cluster.insertMockStorage("mock_broken", breakRegisterSchema); +cluster.insertMockStorage("mock_good", !breakRegisterSchema); +client = cluster.clientFixture(); + } + + @Test + public void testQueryBrokenStorage() { +String sql = "SELECT id_i, name_s10 FROM `mock_broken`.`employees_5`"; +try { + client.queryBuilder().sql(sql).printCsv(); +} catch (Exception ex) { + assertTrue(ex.getMessage().contains("VALIDATION ERROR: Schema")); +} + } + + @Test + public void testQueryGoodStorage() { +String sql = "SELECT id_i, name_s10 FROM `mock_good`.`employees_5`"; +client.queryBuilder().sql(sql).printCsv(); + } + + @Test + public void testQueryGoodStorageWithDefaultSchema() throws Exception { +String use_dfs = "use dfs.tmp"; +client.queryBuilder().sql(use_dfs).run(); +String sql = "SELECT id_i, name_s10 FROM `mock_good`.`employees_5`"; +client.queryBuilder().sql(sql).printCsv(); --- End diff -- Do we actually want to print csv here? I suggest we produce no output here. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257242#comment-16257242 ] ASF GitHub Bot commented on DRILL-5089: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151713624 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestSchema.java --- @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.physical.impl; + +import org.apache.drill.test.ClientFixture; +import org.apache.drill.test.ClusterFixture; +import org.apache.drill.test.ClusterMockStorageFixture; +import org.apache.drill.test.DrillTest; + +import org.junit.AfterClass; +import org.junit.BeforeClass; +import org.junit.Test; + +import static org.junit.Assert.assertTrue; + +public class TestSchema extends DrillTest { + + private static ClusterMockStorageFixture cluster; + private static ClientFixture client; + + @BeforeClass + public static void setup() throws Exception { +cluster = ClusterFixture.builder().buildCustomMockStorage(); +boolean breakRegisterSchema = true; +cluster.insertMockStorage("mock_broken", breakRegisterSchema); +cluster.insertMockStorage("mock_good", !breakRegisterSchema); +client = cluster.clientFixture(); + } + + @Test + public void testQueryBrokenStorage() { +String sql = "SELECT id_i, name_s10 FROM `mock_broken`.`employees_5`"; +try { + client.queryBuilder().sql(sql).printCsv(); +} catch (Exception ex) { + assertTrue(ex.getMessage().contains("VALIDATION ERROR: Schema")); +} + } + + @Test + public void testQueryGoodStorage() { +String sql = "SELECT id_i, name_s10 FROM `mock_good`.`employees_5`"; +client.queryBuilder().sql(sql).printCsv(); + } + + @Test + public void testQueryGoodStorageWithDefaultSchema() throws Exception { +String use_dfs = "use dfs.tmp"; +client.queryBuilder().sql(use_dfs).run(); +String sql = "SELECT id_i, name_s10 FROM `mock_good`.`employees_5`"; +client.queryBuilder().sql(sql).printCsv(); + } + + @Test + public void testUseBrokenStorage() throws Exception { +try { + String use_dfs = "use mock_broken"; + client.queryBuilder().sql(use_dfs).run(); +} catch(Exception ex) { + assertTrue(ex.getMessage().contains("VALIDATION ERROR: Schema")); +} + } + + @AfterClass --- End diff -- Can be removed. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257235#comment-16257235 ] ASF GitHub Bot commented on DRILL-5089: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151714217 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java --- @@ -150,14 +152,30 @@ public WorkspaceSchemaFactory( * @return True if the user has access. False otherwise. */ public boolean accessible(final String userName) throws IOException { -final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf); +final DrillFileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf); +return accessible(fs); + } + + /** + * Checks whether a FileSystem object has the permission to list/read workspace directory + * @param fs a DrillFileSystem object that was created with certain user privilege + * @return True if the user has access. False otherwise. + * @throws IOException + */ + public boolean accessible(DrillFileSystem fs) throws IOException { try { - // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style - // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem - // has a limited private API (FileSystem.access) to check the permissions directly - // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of - // FileClient. TODO: Update this when DRILL-3749 is fixed. - fs.listStatus(wsPath); + /** + * For Windows local file system, fs.access ends up using DeprecatedRawLocalFileStatus which has + * TrustedInstaller as owner, and a member of Administrators group could not satisfy the permission. + * In this case, we will still use method listStatus. + * In other cases, we use access method since it is cheaper. + */ + if (SystemUtils.IS_OS_WINDOWS && fs.getUri().getScheme().equalsIgnoreCase(FileSystemSchemaFactory.LOCAL_FS_SCHEME)) { --- End diff -- Just in case, did you check that everything works on Windows? > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257241#comment-16257241 ] ASF GitHub Bot commented on DRILL-5089: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151717700 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java --- @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner.sql; + +import com.google.common.collect.ImmutableSortedSet; +import com.google.common.collect.Lists; +import com.google.common.collect.Sets; +import org.apache.calcite.DataContext; +import org.apache.calcite.jdbc.CalciteRootSchema; +import org.apache.calcite.jdbc.CalciteSchema; + +import org.apache.calcite.linq4j.tree.Expression; +import org.apache.calcite.linq4j.tree.Expressions; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.schema.impl.AbstractSchema; +import org.apache.calcite.util.BuiltInMethod; +import org.apache.calcite.util.Compatible; +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.exec.store.SchemaConfig; +import org.apache.drill.exec.store.StoragePlugin; +import org.apache.drill.exec.store.StoragePluginRegistry; +import org.apache.drill.exec.store.SubSchemaWrapper; + +import java.io.IOException; +import java.util.List; +import java.util.Map; +import java.util.NavigableSet; +import java.util.Set; + +/** + * This class is to allow us loading schemas from storage plugins later when {@link #getSubSchema(String, boolean)} + * is called. + */ +public class DynamicRootSchema extends DynamicSchema +implements CalciteRootSchema { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class); + /** Creates a root schema. */ + DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) { +super(null, new RootSchema(), ""); +this.schemaConfig = schemaConfig; +this.storages = storages; + } + + @Override + public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) { +CalciteSchema retSchema = getSubSchemaMap().get(schemaName); +if (retSchema != null) { + return retSchema; +} + +loadSchemaFactory(schemaName, caseSensitive); +retSchema = getSubSchemaMap().get(schemaName); +return retSchema; + } + + @Override + public NavigableSet getTableNames() { +Set pluginNames = Sets.newHashSet(); +for (Map.EntrystorageEntry : getSchemaFactories()) { + pluginNames.add(storageEntry.getKey()); +} +return Compatible.INSTANCE.navigableSet( +ImmutableSortedSet.copyOf( +Sets.union(pluginNames, getSubSchemaMap().keySet(; + } + + /** + * load schema factory(storage plugin) for schemaName + * @param schemaName + * @param caseSensitive + */ + public void loadSchemaFactory(String schemaName, boolean caseSensitive) { +try { + SchemaPlus thisPlus = this.plus(); + StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName); + if (plugin != null) { +plugin.registerSchemas(schemaConfig, thisPlus); +return; + } + + // we could not find the plugin, the schemaName could be `dfs.tmp`, a 2nd level schema under 'dfs' + String[] paths = schemaName.split("\\."); + if (paths.length == 2) { +plugin = getSchemaFactories().getPlugin(paths[0]); +if (plugin == null) { + return; +} + +// we could find storage plugin for first part(e.g. 'dfs') of schemaName (e.g. 'dfs.tmp') +// register schema for this
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257237#comment-16257237 ] ASF GitHub Bot commented on DRILL-5089: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151714418 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java --- @@ -50,11 +53,23 @@ public static final String DEFAULT_WS_NAME = "default"; + public static final String LOCAL_FS_SCHEME = "file"; + private List factories; private String schemaName; + protected FileSystemPlugin plugin; public FileSystemSchemaFactory(String schemaName, List factories) { -super(); +// when the correspondent FileSystemPlugin is not passed in, we dig into ANY workspace factory to get it. +if (factories.size() > 0 ) { --- End diff -- Please remove space `if (factories.size() > 0) {`. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257243#comment-16257243 ] ASF GitHub Bot commented on DRILL-5089: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151732407 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java --- @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner.sql; + +import com.google.common.collect.ImmutableSortedSet; +import com.google.common.collect.Lists; +import com.google.common.collect.Sets; +import org.apache.calcite.DataContext; +import org.apache.calcite.jdbc.CalciteRootSchema; +import org.apache.calcite.jdbc.CalciteSchema; + +import org.apache.calcite.linq4j.tree.Expression; +import org.apache.calcite.linq4j.tree.Expressions; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.schema.impl.AbstractSchema; +import org.apache.calcite.util.BuiltInMethod; +import org.apache.calcite.util.Compatible; +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.exec.store.SchemaConfig; +import org.apache.drill.exec.store.StoragePlugin; +import org.apache.drill.exec.store.StoragePluginRegistry; +import org.apache.drill.exec.store.SubSchemaWrapper; + +import java.io.IOException; +import java.util.List; +import java.util.Map; +import java.util.NavigableSet; +import java.util.Set; + +/** + * This class is to allow us loading schemas from storage plugins later when {@link #getSubSchema(String, boolean)} + * is called. + */ +public class DynamicRootSchema extends DynamicSchema +implements CalciteRootSchema { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class); + /** Creates a root schema. */ + DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) { +super(null, new RootSchema(), ""); +this.schemaConfig = schemaConfig; +this.storages = storages; + } + + @Override + public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) { +CalciteSchema retSchema = getSubSchemaMap().get(schemaName); +if (retSchema != null) { + return retSchema; +} + +loadSchemaFactory(schemaName, caseSensitive); +retSchema = getSubSchemaMap().get(schemaName); +return retSchema; + } + + @Override + public NavigableSet getTableNames() { +Set pluginNames = Sets.newHashSet(); +for (Map.EntrystorageEntry : getSchemaFactories()) { + pluginNames.add(storageEntry.getKey()); +} +return Compatible.INSTANCE.navigableSet( +ImmutableSortedSet.copyOf( +Sets.union(pluginNames, getSubSchemaMap().keySet(; + } + + /** + * load schema factory(storage plugin) for schemaName + * @param schemaName + * @param caseSensitive + */ + public void loadSchemaFactory(String schemaName, boolean caseSensitive) { +try { + SchemaPlus thisPlus = this.plus(); + StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName); + if (plugin != null) { +plugin.registerSchemas(schemaConfig, thisPlus); +return; + } + + // we could not find the plugin, the schemaName could be `dfs.tmp`, a 2nd level schema under 'dfs' + String[] paths = schemaName.split("\\."); + if (paths.length == 2) { +plugin = getSchemaFactories().getPlugin(paths[0]); +if (plugin == null) { + return; +} + +// we could find storage plugin for first part(e.g. 'dfs') of schemaName (e.g. 'dfs.tmp') +// register schema for this
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257238#comment-16257238 ] ASF GitHub Bot commented on DRILL-5089: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151716880 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java --- @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner.sql; + +import com.google.common.collect.ImmutableSortedSet; +import com.google.common.collect.Lists; +import com.google.common.collect.Sets; +import org.apache.calcite.DataContext; +import org.apache.calcite.jdbc.CalciteRootSchema; +import org.apache.calcite.jdbc.CalciteSchema; + +import org.apache.calcite.linq4j.tree.Expression; +import org.apache.calcite.linq4j.tree.Expressions; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.schema.impl.AbstractSchema; +import org.apache.calcite.util.BuiltInMethod; +import org.apache.calcite.util.Compatible; +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.exec.store.SchemaConfig; +import org.apache.drill.exec.store.StoragePlugin; +import org.apache.drill.exec.store.StoragePluginRegistry; +import org.apache.drill.exec.store.SubSchemaWrapper; + +import java.io.IOException; +import java.util.List; +import java.util.Map; +import java.util.NavigableSet; +import java.util.Set; + +/** + * This class is to allow us loading schemas from storage plugins later when {@link #getSubSchema(String, boolean)} + * is called. + */ +public class DynamicRootSchema extends DynamicSchema +implements CalciteRootSchema { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class); + /** Creates a root schema. */ + DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) { +super(null, new RootSchema(), ""); +this.schemaConfig = schemaConfig; +this.storages = storages; + } + + @Override + public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) { +CalciteSchema retSchema = getSubSchemaMap().get(schemaName); +if (retSchema != null) { + return retSchema; +} + +loadSchemaFactory(schemaName, caseSensitive); +retSchema = getSubSchemaMap().get(schemaName); +return retSchema; + } + + @Override + public NavigableSet getTableNames() { +Set pluginNames = Sets.newHashSet(); +for (Map.EntrystorageEntry : getSchemaFactories()) { + pluginNames.add(storageEntry.getKey()); +} +return Compatible.INSTANCE.navigableSet( +ImmutableSortedSet.copyOf( +Sets.union(pluginNames, getSubSchemaMap().keySet(; --- End diff -- Could you please explain what this method actually returns? According by its name it should return table names but it seems it returns different things... > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255710#comment-16255710 ] ASF GitHub Bot commented on DRILL-5089: --- Github user chunhui-shi commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151493351 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java --- @@ -175,6 +193,21 @@ public WorkspaceSchema createSchema(List parentSchemaPath, SchemaConfig return new WorkspaceSchema(parentSchemaPath, schemaName, schemaConfig); } + public WorkspaceSchema createSchema(List parentSchemaPath, SchemaConfig schemaConfig, DrillFileSystem fs) throws IOException { +if (!accessible(fs)) { --- End diff -- returning null then user could not even list this workspace, so they don't know the existence of this workspace at all. I think that is a good access control practice. If users expect to see a workspace but could not see it, then they need to figure out why by themselves. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254618#comment-16254618 ] ASF GitHub Bot commented on DRILL-5089: --- Github user chunhui-shi commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151302799 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner.sql; + +import com.google.common.collect.ImmutableSortedSet; +import com.google.common.collect.Lists; +import com.google.common.collect.Sets; +import org.apache.calcite.DataContext; +import org.apache.calcite.jdbc.CalciteRootSchema; +import org.apache.calcite.jdbc.CalciteSchema; + +import org.apache.calcite.linq4j.tree.Expression; +import org.apache.calcite.linq4j.tree.Expressions; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.schema.impl.AbstractSchema; +import org.apache.calcite.util.BuiltInMethod; +import org.apache.calcite.util.Compatible; +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.exec.store.SchemaConfig; +import org.apache.drill.exec.store.StoragePlugin; +import org.apache.drill.exec.store.StoragePluginRegistry; +import org.apache.drill.exec.store.SubSchemaWrapper; + +import java.io.IOException; +import java.util.Map; +import java.util.NavigableSet; +import java.util.Set; + +public class DynamicRootSchema extends DynamicSchema +implements CalciteRootSchema { + + /** Creates a root schema. */ + DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) { +super(null, new RootSchema(), ""); +this.schemaConfig = schemaConfig; +this.storages = storages; + } + + @Override + public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) { +CalciteSchema retSchema = getSubSchemaMap().get(schemaName); + +if (retSchema == null) { + loadSchemaFactory(schemaName, caseSensitive); +} + +retSchema = getSubSchemaMap().get(schemaName); +return retSchema; + } + + @Override + public NavigableSet getTableNames() { +Set pluginNames = Sets.newHashSet(); --- End diff -- plugin name in drill is case sensitive. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254602#comment-16254602 ] ASF GitHub Bot commented on DRILL-5089: --- Github user chunhui-shi commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151301607 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java --- @@ -150,14 +152,30 @@ public WorkspaceSchemaFactory( * @return True if the user has access. False otherwise. */ public boolean accessible(final String userName) throws IOException { -final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf); +final DrillFileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf); +return accessible(fs); + } + + /** + * Checks whether a FileSystem object has the permission to list/read workspace directory + * @param fs a DrillFileSystem object that was created with certain user privilege + * @return True if the user has access. False otherwise. + * @throws IOException + */ + public boolean accessible(DrillFileSystem fs) throws IOException { try { - // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style - // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem - // has a limited private API (FileSystem.access) to check the permissions directly - // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of - // FileClient. TODO: Update this when DRILL-3749 is fixed. - fs.listStatus(wsPath); + /** + * For Windows local file system, fs.access ends up using DeprecatedRawLocalFileStatus which has + * TrustedInstaller as owner, and a member of Administrators group could not satisfy the permission. + * In this case, we will still use method listStatus. + * In other cases, we use access method since it is cheaper. + */ + if (SystemUtils.IS_OS_WINDOWS && fs.getUri().getScheme().equalsIgnoreCase("file")) { --- End diff -- FileSystem in hdfs has a constant DEFAULT_FS "file:///", for now I will define our own. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254589#comment-16254589 ] ASF GitHub Bot commented on DRILL-5089: --- Github user chunhui-shi commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151299650 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java --- @@ -105,12 +106,36 @@ public SchemaPlus createRootSchema(final String userName, final SchemaConfigInfo * @return */ public SchemaPlus createRootSchema(SchemaConfig schemaConfig) { + final SchemaPlus rootSchema = DynamicSchema.createRootSchema(dContext.getStorage(), schemaConfig); + schemaTreesToClose.add(rootSchema); + return rootSchema; + } + + /** + * Return full root schema with schema owner as the given user. + * + * @param userName Name of the user who is accessing the storage sources. + * @param provider {@link SchemaConfigInfoProvider} instance + * @return Root of the schema tree. + */ + public SchemaPlus createFullRootSchema(final String userName, final SchemaConfigInfoProvider provider) { +final String schemaUser = isImpersonationEnabled ? userName : ImpersonationUtil.getProcessUserName(); --- End diff -- not that many places, and need to pass in isImpersonationEnabled and userName if this line became a standalone method. will keep it as is for now. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254580#comment-16254580 ] ASF GitHub Bot commented on DRILL-5089: --- Github user chunhui-shi commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151298428 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java --- @@ -73,9 +87,10 @@ public void registerSchemas(SchemaConfig schemaConfig, SchemaPlus parent) throws public FileSystemSchema(String name, SchemaConfig schemaConfig) throws IOException { super(ImmutableList.of(), name); + final DrillFileSystem fs = ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), plugin.getFsConf()); for(WorkspaceSchemaFactory f : factories){ -if (f.accessible(schemaConfig.getUserName())) { - WorkspaceSchema s = f.createSchema(getSchemaPath(), schemaConfig); +WorkspaceSchema s = f.createSchema(getSchemaPath(), schemaConfig, fs); +if ( s != null) { --- End diff -- 'factories' is from storage plugin, so it will be identical when we don't update this storage plugin. This FileSystemSchema constructor will be called only once for a query if this FileSystem storage plugin is needed for this query. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250389#comment-16250389 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r150680095 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner.sql; + +import com.google.common.collect.ImmutableSortedSet; +import com.google.common.collect.Lists; +import com.google.common.collect.Sets; +import org.apache.calcite.DataContext; +import org.apache.calcite.jdbc.CalciteRootSchema; +import org.apache.calcite.jdbc.CalciteSchema; + +import org.apache.calcite.linq4j.tree.Expression; +import org.apache.calcite.linq4j.tree.Expressions; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.schema.impl.AbstractSchema; +import org.apache.calcite.util.BuiltInMethod; +import org.apache.calcite.util.Compatible; +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.exec.store.SchemaConfig; +import org.apache.drill.exec.store.StoragePlugin; +import org.apache.drill.exec.store.StoragePluginRegistry; +import org.apache.drill.exec.store.SubSchemaWrapper; + +import java.io.IOException; +import java.util.Map; +import java.util.NavigableSet; +import java.util.Set; + +public class DynamicRootSchema extends DynamicSchema +implements CalciteRootSchema { + + /** Creates a root schema. */ + DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) { +super(null, new RootSchema(), ""); +this.schemaConfig = schemaConfig; +this.storages = storages; + } + + @Override + public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) { +CalciteSchema retSchema = getSubSchemaMap().get(schemaName); + +if (retSchema == null) { + loadSchemaFactory(schemaName, caseSensitive); +} + +retSchema = getSubSchemaMap().get(schemaName); +return retSchema; + } + + @Override + public NavigableSet getTableNames() { +Set pluginNames = Sets.newHashSet(); +for (Map.EntrystorageEntry : getSchemaFactories()) { + pluginNames.add(storageEntry.getKey()); +} +return Compatible.INSTANCE.navigableSet( +ImmutableSortedSet.copyOf( +Sets.union(pluginNames, getSubSchemaMap().keySet(; + } + + /** + * load schema factory(storage plugin) for schemaName + * @param schemaName + * @param caseSensitive + */ + public void loadSchemaFactory(String schemaName, boolean caseSensitive) { +try { + SchemaPlus thisPlus = this.plus(); + StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName); + if (plugin != null) { +plugin.registerSchemas(schemaConfig, thisPlus); + } + else { +//this schema name could be `dfs.tmp`, a 2nd level schema under 'dfs' +String[] paths = schemaName.split("\\."); +if (paths.length == 2) { --- End diff -- Do we support only 1- and 2-part names? Should we assert that the length <= 2? > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250402#comment-16250402 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r150685113 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java --- @@ -73,9 +87,10 @@ public void registerSchemas(SchemaConfig schemaConfig, SchemaPlus parent) throws public FileSystemSchema(String name, SchemaConfig schemaConfig) throws IOException { super(ImmutableList.of(), name); + final DrillFileSystem fs = ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), plugin.getFsConf()); for(WorkspaceSchemaFactory f : factories){ -if (f.accessible(schemaConfig.getUserName())) { - WorkspaceSchema s = f.createSchema(getSchemaPath(), schemaConfig); +WorkspaceSchema s = f.createSchema(getSchemaPath(), schemaConfig, fs); +if ( s != null) { --- End diff -- Here we iterate over a list of workspace schema factories. For each, we resolve a schemaConfig against the file system. Under what situations would we have multiple factories? Selecting from two distinct storage plugins? Calcite tends to resolve the same things over and over. Will this method be called multiple times? > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250408#comment-16250408 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r150686157 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java --- @@ -532,7 +572,10 @@ public boolean isMutable() { } public DrillFileSystem getFS() { - return fs; + if (this.fs == null) { +this.fs = ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), fsConf); + } + return this.fs; --- End diff -- This class caches the file system, which is good. The other classes in this PR do not; they create the fs as needed. Does Calcite allow some kind of session state in which we can cache the fs for the query (plan) rather than creating it on the fly each time we need it? > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250387#comment-16250387 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r150678803 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner.sql; + +import com.google.common.collect.ImmutableSortedSet; +import com.google.common.collect.Lists; +import com.google.common.collect.Sets; +import org.apache.calcite.DataContext; +import org.apache.calcite.jdbc.CalciteRootSchema; +import org.apache.calcite.jdbc.CalciteSchema; + +import org.apache.calcite.linq4j.tree.Expression; +import org.apache.calcite.linq4j.tree.Expressions; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.schema.impl.AbstractSchema; +import org.apache.calcite.util.BuiltInMethod; +import org.apache.calcite.util.Compatible; +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.exec.store.SchemaConfig; +import org.apache.drill.exec.store.StoragePlugin; +import org.apache.drill.exec.store.StoragePluginRegistry; +import org.apache.drill.exec.store.SubSchemaWrapper; + +import java.io.IOException; +import java.util.Map; +import java.util.NavigableSet; +import java.util.Set; + +public class DynamicRootSchema extends DynamicSchema +implements CalciteRootSchema { + + /** Creates a root schema. */ + DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) { +super(null, new RootSchema(), ""); +this.schemaConfig = schemaConfig; +this.storages = storages; + } + + @Override + public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) { +CalciteSchema retSchema = getSubSchemaMap().get(schemaName); + +if (retSchema == null) { + loadSchemaFactory(schemaName, caseSensitive); +} + +retSchema = getSubSchemaMap().get(schemaName); +return retSchema; + } + + @Override + public NavigableSet getTableNames() { +Set pluginNames = Sets.newHashSet(); --- End diff -- Should be case insensitive? > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250401#comment-16250401 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r150684557 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java --- @@ -73,9 +87,10 @@ public void registerSchemas(SchemaConfig schemaConfig, SchemaPlus parent) throws public FileSystemSchema(String name, SchemaConfig schemaConfig) throws IOException { super(ImmutableList.of(), name); + final DrillFileSystem fs = ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), plugin.getFsConf()); --- End diff -- Comment to explain what's happening here? > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250391#comment-16250391 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r150682074 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner.sql; + +import com.google.common.collect.ImmutableSortedSet; +import com.google.common.collect.Lists; +import com.google.common.collect.Sets; +import org.apache.calcite.DataContext; +import org.apache.calcite.jdbc.CalciteRootSchema; +import org.apache.calcite.jdbc.CalciteSchema; + +import org.apache.calcite.linq4j.tree.Expression; +import org.apache.calcite.linq4j.tree.Expressions; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.schema.impl.AbstractSchema; +import org.apache.calcite.util.BuiltInMethod; +import org.apache.calcite.util.Compatible; +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.exec.store.SchemaConfig; +import org.apache.drill.exec.store.StoragePlugin; +import org.apache.drill.exec.store.StoragePluginRegistry; +import org.apache.drill.exec.store.SubSchemaWrapper; + +import java.io.IOException; +import java.util.Map; +import java.util.NavigableSet; +import java.util.Set; + +public class DynamicRootSchema extends DynamicSchema +implements CalciteRootSchema { + + /** Creates a root schema. */ + DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) { +super(null, new RootSchema(), ""); +this.schemaConfig = schemaConfig; +this.storages = storages; + } + + @Override + public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) { +CalciteSchema retSchema = getSubSchemaMap().get(schemaName); + +if (retSchema == null) { + loadSchemaFactory(schemaName, caseSensitive); +} + +retSchema = getSubSchemaMap().get(schemaName); +return retSchema; + } + + @Override + public NavigableSet getTableNames() { +Set pluginNames = Sets.newHashSet(); +for (Map.EntrystorageEntry : getSchemaFactories()) { + pluginNames.add(storageEntry.getKey()); +} +return Compatible.INSTANCE.navigableSet( +ImmutableSortedSet.copyOf( +Sets.union(pluginNames, getSubSchemaMap().keySet(; + } + + /** + * load schema factory(storage plugin) for schemaName + * @param schemaName + * @param caseSensitive + */ + public void loadSchemaFactory(String schemaName, boolean caseSensitive) { +try { + SchemaPlus thisPlus = this.plus(); + StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName); + if (plugin != null) { +plugin.registerSchemas(schemaConfig, thisPlus); + } + else { +//this schema name could be `dfs.tmp`, a 2nd level schema under 'dfs' +String[] paths = schemaName.split("\\."); +if (paths.length == 2) { + plugin = getSchemaFactories().getPlugin(paths[0]); + if (plugin != null) { +plugin.registerSchemas(schemaConfig, thisPlus); + } --- End diff -- What if the plugin is null? Should we fail with an error, or just keep going? > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning &
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250404#comment-16250404 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r150685672 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java --- @@ -175,6 +193,21 @@ public WorkspaceSchema createSchema(List parentSchemaPath, SchemaConfig return new WorkspaceSchema(parentSchemaPath, schemaName, schemaConfig); } + public WorkspaceSchema createSchema(List parentSchemaPath, SchemaConfig schemaConfig, DrillFileSystem fs) throws IOException { +if (!accessible(fs)) { --- End diff -- Is returning null sufficient to tell the user that they don't have permission to do this operation? > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250392#comment-16250392 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r150680454 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner.sql; + +import com.google.common.collect.ImmutableSortedSet; +import com.google.common.collect.Lists; +import com.google.common.collect.Sets; +import org.apache.calcite.DataContext; +import org.apache.calcite.jdbc.CalciteRootSchema; +import org.apache.calcite.jdbc.CalciteSchema; + +import org.apache.calcite.linq4j.tree.Expression; +import org.apache.calcite.linq4j.tree.Expressions; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.schema.impl.AbstractSchema; +import org.apache.calcite.util.BuiltInMethod; +import org.apache.calcite.util.Compatible; +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.exec.store.SchemaConfig; +import org.apache.drill.exec.store.StoragePlugin; +import org.apache.drill.exec.store.StoragePluginRegistry; +import org.apache.drill.exec.store.SubSchemaWrapper; + +import java.io.IOException; +import java.util.Map; +import java.util.NavigableSet; +import java.util.Set; + +public class DynamicRootSchema extends DynamicSchema +implements CalciteRootSchema { + + /** Creates a root schema. */ + DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) { +super(null, new RootSchema(), ""); +this.schemaConfig = schemaConfig; +this.storages = storages; + } + + @Override + public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) { +CalciteSchema retSchema = getSubSchemaMap().get(schemaName); + +if (retSchema == null) { + loadSchemaFactory(schemaName, caseSensitive); +} + +retSchema = getSubSchemaMap().get(schemaName); +return retSchema; + } + + @Override + public NavigableSet getTableNames() { +Set pluginNames = Sets.newHashSet(); +for (Map.EntrystorageEntry : getSchemaFactories()) { + pluginNames.add(storageEntry.getKey()); +} +return Compatible.INSTANCE.navigableSet( +ImmutableSortedSet.copyOf( +Sets.union(pluginNames, getSubSchemaMap().keySet(; + } + + /** + * load schema factory(storage plugin) for schemaName + * @param schemaName + * @param caseSensitive + */ + public void loadSchemaFactory(String schemaName, boolean caseSensitive) { +try { + SchemaPlus thisPlus = this.plus(); + StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName); + if (plugin != null) { +plugin.registerSchemas(schemaConfig, thisPlus); + } + else { +//this schema name could be `dfs.tmp`, a 2nd level schema under 'dfs' +String[] paths = schemaName.split("\\."); +if (paths.length == 2) { + plugin = getSchemaFactories().getPlugin(paths[0]); + if (plugin != null) { +plugin.registerSchemas(schemaConfig, thisPlus); + } + + final CalciteSchema secondLevelSchema = getSubSchemaMap().get(paths[0]).getSubSchema(paths[1], caseSensitive); + if (secondLevelSchema != null) { +SchemaPlus secondlevel = secondLevelSchema.plus(); +org.apache.drill.exec.store.AbstractSchema drillSchema = + secondlevel.unwrap(org.apache.drill.exec.store.AbstractSchema.class); +SubSchemaWrapper wrapper =
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250409#comment-16250409 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r150682963 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java --- @@ -20,12 +20,13 @@ import java.io.IOException; import java.util.List; -import org.apache.calcite.jdbc.SimpleCalciteSchema; +//import org.apache.calcite.jdbc.SimpleCalciteSchema; --- End diff -- Remove > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250403#comment-16250403 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r150685414 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java --- @@ -150,14 +152,30 @@ public WorkspaceSchemaFactory( * @return True if the user has access. False otherwise. */ public boolean accessible(final String userName) throws IOException { -final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf); +final DrillFileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf); +return accessible(fs); + } + + /** + * Checks whether a FileSystem object has the permission to list/read workspace directory + * @param fs a DrillFileSystem object that was created with certain user privilege + * @return True if the user has access. False otherwise. + * @throws IOException + */ + public boolean accessible(DrillFileSystem fs) throws IOException { try { - // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style - // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem - // has a limited private API (FileSystem.access) to check the permissions directly - // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of - // FileClient. TODO: Update this when DRILL-3749 is fixed. - fs.listStatus(wsPath); + /** + * For Windows local file system, fs.access ends up using DeprecatedRawLocalFileStatus which has + * TrustedInstaller as owner, and a member of Administrators group could not satisfy the permission. + * In this case, we will still use method listStatus. + * In other cases, we use access method since it is cheaper. + */ + if (SystemUtils.IS_OS_WINDOWS && fs.getUri().getScheme().equalsIgnoreCase("file")) { --- End diff -- HDFS probably defines a constant for "file". Should we reference that? > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250405#comment-16250405 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r150684042 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java --- @@ -52,9 +55,20 @@ private List factories; private String schemaName; + protected FileSystemPlugin plugin; public FileSystemSchemaFactory(String schemaName, List factories) { super(); +if (factories.size() > 0 ) { + this.plugin = factories.get(0).getPlugin(); --- End diff -- A comment would be helpful: what is special about plugin 0 that it should be the default? Is plugin 0 the default of some sort? > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250397#comment-16250397 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r150682252 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner.sql; + +import com.google.common.collect.ImmutableSortedSet; +import com.google.common.collect.Lists; +import com.google.common.collect.Sets; +import org.apache.calcite.DataContext; +import org.apache.calcite.jdbc.CalciteRootSchema; +import org.apache.calcite.jdbc.CalciteSchema; + +import org.apache.calcite.linq4j.tree.Expression; +import org.apache.calcite.linq4j.tree.Expressions; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.schema.impl.AbstractSchema; +import org.apache.calcite.util.BuiltInMethod; +import org.apache.calcite.util.Compatible; +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.exec.store.SchemaConfig; +import org.apache.drill.exec.store.StoragePlugin; +import org.apache.drill.exec.store.StoragePluginRegistry; +import org.apache.drill.exec.store.SubSchemaWrapper; + +import java.io.IOException; +import java.util.Map; +import java.util.NavigableSet; +import java.util.Set; + +public class DynamicRootSchema extends DynamicSchema +implements CalciteRootSchema { + + /** Creates a root schema. */ + DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) { +super(null, new RootSchema(), ""); +this.schemaConfig = schemaConfig; +this.storages = storages; + } + + @Override + public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) { +CalciteSchema retSchema = getSubSchemaMap().get(schemaName); + +if (retSchema == null) { + loadSchemaFactory(schemaName, caseSensitive); +} + +retSchema = getSubSchemaMap().get(schemaName); +return retSchema; + } + + @Override + public NavigableSet getTableNames() { +Set pluginNames = Sets.newHashSet(); +for (Map.EntrystorageEntry : getSchemaFactories()) { + pluginNames.add(storageEntry.getKey()); +} +return Compatible.INSTANCE.navigableSet( +ImmutableSortedSet.copyOf( +Sets.union(pluginNames, getSubSchemaMap().keySet(; + } + + /** + * load schema factory(storage plugin) for schemaName + * @param schemaName + * @param caseSensitive + */ + public void loadSchemaFactory(String schemaName, boolean caseSensitive) { +try { + SchemaPlus thisPlus = this.plus(); + StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName); + if (plugin != null) { +plugin.registerSchemas(schemaConfig, thisPlus); + } + else { +//this schema name could be `dfs.tmp`, a 2nd level schema under 'dfs' +String[] paths = schemaName.split("\\."); +if (paths.length == 2) { + plugin = getSchemaFactories().getPlugin(paths[0]); + if (plugin != null) { +plugin.registerSchemas(schemaConfig, thisPlus); + } + + final CalciteSchema secondLevelSchema = getSubSchemaMap().get(paths[0]).getSubSchema(paths[1], caseSensitive); + if (secondLevelSchema != null) { +SchemaPlus secondlevel = secondLevelSchema.plus(); +org.apache.drill.exec.store.AbstractSchema drillSchema = --- End diff -- Fully qualified name needed? Or, will an import work? > Skip initializing all enabled storage plugins for
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250396#comment-16250396 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r150683680 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java --- @@ -105,12 +106,36 @@ public SchemaPlus createRootSchema(final String userName, final SchemaConfigInfo * @return */ public SchemaPlus createRootSchema(SchemaConfig schemaConfig) { + final SchemaPlus rootSchema = DynamicSchema.createRootSchema(dContext.getStorage(), schemaConfig); + schemaTreesToClose.add(rootSchema); + return rootSchema; + } + + /** + * Return full root schema with schema owner as the given user. + * + * @param userName Name of the user who is accessing the storage sources. + * @param provider {@link SchemaConfigInfoProvider} instance + * @return Root of the schema tree. + */ + public SchemaPlus createFullRootSchema(final String userName, final SchemaConfigInfoProvider provider) { +final String schemaUser = isImpersonationEnabled ? userName : ImpersonationUtil.getProcessUserName(); --- End diff -- Should this be factored out somewhere? Seems this magic stanza will be needed in many places: better to have one copy than several. Maybe a method in `ImpersonationUtil`? > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250388#comment-16250388 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r150678680 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner.sql; + +import com.google.common.collect.ImmutableSortedSet; +import com.google.common.collect.Lists; +import com.google.common.collect.Sets; +import org.apache.calcite.DataContext; +import org.apache.calcite.jdbc.CalciteRootSchema; +import org.apache.calcite.jdbc.CalciteSchema; + +import org.apache.calcite.linq4j.tree.Expression; +import org.apache.calcite.linq4j.tree.Expressions; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.schema.impl.AbstractSchema; +import org.apache.calcite.util.BuiltInMethod; +import org.apache.calcite.util.Compatible; +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.exec.store.SchemaConfig; +import org.apache.drill.exec.store.StoragePlugin; +import org.apache.drill.exec.store.StoragePluginRegistry; +import org.apache.drill.exec.store.SubSchemaWrapper; + +import java.io.IOException; +import java.util.Map; +import java.util.NavigableSet; +import java.util.Set; + +public class DynamicRootSchema extends DynamicSchema +implements CalciteRootSchema { + + /** Creates a root schema. */ + DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) { +super(null, new RootSchema(), ""); +this.schemaConfig = schemaConfig; +this.storages = storages; + } + + @Override + public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) { +CalciteSchema retSchema = getSubSchemaMap().get(schemaName); + +if (retSchema == null) { + loadSchemaFactory(schemaName, caseSensitive); +} + +retSchema = getSubSchemaMap().get(schemaName); +return retSchema; --- End diff -- if the original call returns non-null, we make the same call a second time. Better: ``` retSchema = ... if (retSchema != null) { return retSchema; } loadSchemaFactory(...) return getSubSchemaMap()... ``` > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250399#comment-16250399 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r150684114 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java --- @@ -52,9 +55,20 @@ private List factories; private String schemaName; + protected FileSystemPlugin plugin; public FileSystemSchemaFactory(String schemaName, List factories) { super(); +if (factories.size() > 0 ) { + this.plugin = factories.get(0).getPlugin(); +} +this.schemaName = schemaName; +this.factories = factories; + } + + public FileSystemSchemaFactory(FileSystemPlugin plugin, String schemaName, List factories) { +super(); --- End diff -- Omit; Java does this by default. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250407#comment-16250407 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r150685946 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java --- @@ -532,7 +572,10 @@ public boolean isMutable() { } public DrillFileSystem getFS() { - return fs; + if (this.fs == null) { +this.fs = ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), fsConf); + } + return this.fs; --- End diff -- No need for `this.fs`, just `fs` will do. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250393#comment-16250393 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r150679636 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner.sql; + +import com.google.common.collect.ImmutableSortedSet; +import com.google.common.collect.Lists; +import com.google.common.collect.Sets; +import org.apache.calcite.DataContext; +import org.apache.calcite.jdbc.CalciteRootSchema; +import org.apache.calcite.jdbc.CalciteSchema; + +import org.apache.calcite.linq4j.tree.Expression; +import org.apache.calcite.linq4j.tree.Expressions; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.schema.impl.AbstractSchema; +import org.apache.calcite.util.BuiltInMethod; +import org.apache.calcite.util.Compatible; +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.exec.store.SchemaConfig; +import org.apache.drill.exec.store.StoragePlugin; +import org.apache.drill.exec.store.StoragePluginRegistry; +import org.apache.drill.exec.store.SubSchemaWrapper; + +import java.io.IOException; +import java.util.Map; +import java.util.NavigableSet; +import java.util.Set; + +public class DynamicRootSchema extends DynamicSchema +implements CalciteRootSchema { + + /** Creates a root schema. */ + DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) { +super(null, new RootSchema(), ""); +this.schemaConfig = schemaConfig; +this.storages = storages; + } + + @Override + public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) { +CalciteSchema retSchema = getSubSchemaMap().get(schemaName); + +if (retSchema == null) { + loadSchemaFactory(schemaName, caseSensitive); +} + +retSchema = getSubSchemaMap().get(schemaName); +return retSchema; + } + + @Override + public NavigableSet getTableNames() { +Set pluginNames = Sets.newHashSet(); +for (Map.EntrystorageEntry : getSchemaFactories()) { + pluginNames.add(storageEntry.getKey()); +} +return Compatible.INSTANCE.navigableSet( +ImmutableSortedSet.copyOf( +Sets.union(pluginNames, getSubSchemaMap().keySet(; + } + + /** + * load schema factory(storage plugin) for schemaName + * @param schemaName + * @param caseSensitive + */ + public void loadSchemaFactory(String schemaName, boolean caseSensitive) { +try { + SchemaPlus thisPlus = this.plus(); + StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName); + if (plugin != null) { +plugin.registerSchemas(schemaConfig, thisPlus); + } --- End diff -- If the name is `dfs.test`, we first look up the compound name, then the parts? Why? Do we put the compound names in the map? Or can we have one schema named "dfs.test" and another called `dfs`.`test`? Or, can this code be restructured a bit? > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250406#comment-16250406 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r150686404 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentContext.java --- @@ -229,7 +229,7 @@ public DrillbitContext getDrillbitContext() { return context; } - public SchemaPlus getRootSchema() { + public SchemaPlus getFullRootSchema() { --- End diff -- Comment to explain what a "full root schema" is? Apparently, this is both the plugin config name and workspace combined? > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250400#comment-16250400 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r150682787 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicSchema.java --- @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner.sql; + +import org.apache.calcite.jdbc.CalciteSchema; +import org.apache.calcite.jdbc.SimpleCalciteSchema; +import org.apache.calcite.schema.Schema; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.drill.exec.store.SchemaConfig; +import org.apache.drill.exec.store.StoragePluginRegistry; + + +/** + * Unlike SimpleCalciteSchema, DynamicSchema could have an empty or partial schemaMap, but it could maintain a map of + * name->SchemaFactory, and only register schema when the corresponsdent name is requested. + */ +public class DynamicSchema extends SimpleCalciteSchema { + + protected SchemaConfig schemaConfig; --- End diff -- Seems the schemaConfig is never set or used. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250390#comment-16250390 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r150679941 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner.sql; + +import com.google.common.collect.ImmutableSortedSet; +import com.google.common.collect.Lists; +import com.google.common.collect.Sets; +import org.apache.calcite.DataContext; +import org.apache.calcite.jdbc.CalciteRootSchema; +import org.apache.calcite.jdbc.CalciteSchema; + +import org.apache.calcite.linq4j.tree.Expression; +import org.apache.calcite.linq4j.tree.Expressions; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.schema.impl.AbstractSchema; +import org.apache.calcite.util.BuiltInMethod; +import org.apache.calcite.util.Compatible; +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.exec.store.SchemaConfig; +import org.apache.drill.exec.store.StoragePlugin; +import org.apache.drill.exec.store.StoragePluginRegistry; +import org.apache.drill.exec.store.SubSchemaWrapper; + +import java.io.IOException; +import java.util.Map; +import java.util.NavigableSet; +import java.util.Set; + +public class DynamicRootSchema extends DynamicSchema +implements CalciteRootSchema { + + /** Creates a root schema. */ + DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) { +super(null, new RootSchema(), ""); +this.schemaConfig = schemaConfig; +this.storages = storages; + } + + @Override + public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) { +CalciteSchema retSchema = getSubSchemaMap().get(schemaName); + +if (retSchema == null) { + loadSchemaFactory(schemaName, caseSensitive); +} + +retSchema = getSubSchemaMap().get(schemaName); +return retSchema; + } + + @Override + public NavigableSet getTableNames() { +Set pluginNames = Sets.newHashSet(); +for (Map.EntrystorageEntry : getSchemaFactories()) { + pluginNames.add(storageEntry.getKey()); +} +return Compatible.INSTANCE.navigableSet( +ImmutableSortedSet.copyOf( +Sets.union(pluginNames, getSubSchemaMap().keySet(; + } + + /** + * load schema factory(storage plugin) for schemaName + * @param schemaName + * @param caseSensitive + */ + public void loadSchemaFactory(String schemaName, boolean caseSensitive) { +try { + SchemaPlus thisPlus = this.plus(); + StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName); + if (plugin != null) { +plugin.registerSchemas(schemaConfig, thisPlus); + } + else { +//this schema name could be `dfs.tmp`, a 2nd level schema under 'dfs' +String[] paths = schemaName.split("\\."); --- End diff -- Should this be done here in this simple way? How many other places do we do the same thing? Or, should we have a common function to split schema names so we can handle, way, escapes and other special cases that might come along? > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060112#comment-16060112 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/795 Is this PR ready for final review? Or, on hold waiting for something? > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968517#comment-15968517 ] Julian Hyde commented on DRILL-5089: How are you able to call {{SimpleCalciteSchema.from}}? Isn't {{SimpleCalciteSchema}} package-private? In CALCITE-1748 you ask to override a method that returns {{CalciteSchema}} but in CALCITE-911 we agreed that Drill wouldn't create your own CalciteSchema sub-classes. What has changed? Can you take a look at CALCITE-1742, and tell me how it relates to your problem? I'd rather fix Calcite's schema cache for Drill's and Phoenix's needs rather than let people drill holes in our APIs. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942163#comment-15942163 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/795#discussion_r108051728 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java --- @@ -48,7 +55,8 @@ public SchemaTreeProvider(final DrillbitContext dContext) { this.dContext = dContext; schemaTreesToClose = Lists.newArrayList(); -isImpersonationEnabled = dContext.getConfig().getBoolean(ExecConstants.IMPERSONATION_ENABLED); +final DrillConfig config = dContext.getConfig(); +isImpersonationEnabled = config == null? false : config.getBoolean(ExecConstants.IMPERSONATION_ENABLED); --- End diff -- Under what conditions will the config be null? Actually, should never be; we depend on config everywhere. So, just get the boolean and let the code get an NPE if someone makes a mistake and omits the (required) config. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942155#comment-15942155 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/795#discussion_r108051767 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java --- @@ -119,6 +127,74 @@ public SchemaPlus createRootSchema(SchemaConfig schemaConfig) { } } + + public SchemaPlus createPartialRootSchema(final String userName, final SchemaConfigInfoProvider provider, +final String storage) { +final String schemaUser = isImpersonationEnabled ? userName : ImpersonationUtil.getProcessUserName(); --- End diff -- Please move the user resolution into a method rather than duplicating the code. More broadly, wouldn't the answer here be a constant for any qiven query? So, there is no reason to keep recomputing it. Can we just do it once and save it somewhere? Let's think. If this whole structure is for a single query, then the user is constant and can be stored in the object itself. Is this the only place where we need to figure out impersonation and actual user? If not, then the setup should be done at a higher level, not in each place where it is needed. Perhaps we need a UserIdentity class that has the loginName and effectiveName, where these are the same for non-impersonation, but differ for impersonation. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942170#comment-15942170 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/795#discussion_r108052385 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillCatalogReader.java --- @@ -0,0 +1,180 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner.sql; + +import com.google.common.base.Joiner; +import com.google.common.collect.ImmutableList; +import com.google.common.collect.Lists; +import com.google.common.collect.Sets; +import org.apache.calcite.adapter.java.JavaTypeFactory; +import org.apache.calcite.jdbc.CalciteSchema; +import org.apache.calcite.jdbc.CalciteSchemaImpl; +import org.apache.calcite.jdbc.SimpleCalciteSchema; +import org.apache.calcite.prepare.CalciteCatalogReader; +import org.apache.calcite.prepare.RelOptTableImpl; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.drill.common.config.DrillConfig; +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.ops.QueryContext; +import org.apache.drill.exec.store.SchemaConfig; + +import java.util.Arrays; +import java.util.List; +import java.util.Set; + +/** + * Implementation of {@link org.apache.calcite.prepare.Prepare.CatalogReader} --- End diff -- Maybe explain the purpose and not just the implementation? What part of query processing does this address? Does this resolve references to schemas and workspaces? Then the schema resolves the reference to a table name? > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942157#comment-15942157 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/795#discussion_r108051773 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java --- @@ -119,6 +127,74 @@ public SchemaPlus createRootSchema(SchemaConfig schemaConfig) { } } + + public SchemaPlus createPartialRootSchema(final String userName, final SchemaConfigInfoProvider provider, +final String storage) { +final String schemaUser = isImpersonationEnabled ? userName : ImpersonationUtil.getProcessUserName(); +final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, provider).build(); +final SchemaPlus rootSchema = SimpleCalciteSchema.createRootSchema(false); +Set storageSet = Sets.newHashSet(); +storageSet.add(storage); +addNewStoragesToRootSchema(schemaConfig, rootSchema, storageSet); +schemaTreesToClose.add(rootSchema); +return rootSchema; + } + + public SchemaPlus addPartialRootSchema(final String userName, final SchemaConfigInfoProvider provider, --- End diff -- Would be very helpful to add a bit of Javadoc to explain the purpose of these methods. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942167#comment-15942167 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/795#discussion_r108050520 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillCatalogReader.java --- @@ -0,0 +1,180 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner.sql; + +import com.google.common.base.Joiner; +import com.google.common.collect.ImmutableList; +import com.google.common.collect.Lists; +import com.google.common.collect.Sets; +import org.apache.calcite.adapter.java.JavaTypeFactory; +import org.apache.calcite.jdbc.CalciteSchema; +import org.apache.calcite.jdbc.CalciteSchemaImpl; +import org.apache.calcite.jdbc.SimpleCalciteSchema; +import org.apache.calcite.prepare.CalciteCatalogReader; +import org.apache.calcite.prepare.RelOptTableImpl; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.drill.common.config.DrillConfig; +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.ops.QueryContext; +import org.apache.drill.exec.store.SchemaConfig; + +import java.util.Arrays; +import java.util.List; +import java.util.Set; + +/** + * Implementation of {@link org.apache.calcite.prepare.Prepare.CatalogReader} + * and also {@link org.apache.calcite.sql.SqlOperatorTable} based on tables and + * functions defined schemas. + * + */ +public class DrillCatalogReader extends CalciteCatalogReader { + private static org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DrillCatalogReader.class); + + final private QueryContext queryContext; + private boolean allowTemporaryTables; + private String temporarySchema; + + public DrillCatalogReader( + QueryContext qcontext, + CalciteSchema rootSchema, + boolean caseSensitive, + List defaultSchema, + JavaTypeFactory typeFactory, + String temporarySchema) { +super(rootSchema, caseSensitive, defaultSchema, typeFactory); +assert rootSchema != defaultSchema; +queryContext = qcontext; +this.temporarySchema = temporarySchema; +this.allowTemporaryTables = true; + } + + /** Disallow temporary tables presence in sql statement (ex: in view definitions) */ + public void disallowTemporaryTables() { --- End diff -- Once the constructor gets as complex as this one, and we have secondary methods to set what should be immutable properties, it may be time to move to a builder pattern: ``` public class DrillCatalogReader ... static CatalogBuilder builder(QueryContext context) ... ... class CatalogBuilder { CatalogBuider(QueryContext context); CatalogBuilder withTemporarySchema(String temporarySchema); CatalogBuilder allowsTempTables(); CatalogBuilder caseSensitive(); ... DrillCatalogReader build(); } ``` Code that uses this class will be easier to follow with named methods for each property. Reasonable defaults reduces the number of constructor arguments and avoids the need for lots of code fixups when, inevitably, we add arguments. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942169#comment-15942169 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/795#discussion_r108051870 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java --- @@ -119,6 +127,74 @@ public SchemaPlus createRootSchema(SchemaConfig schemaConfig) { } } + + public SchemaPlus createPartialRootSchema(final String userName, final SchemaConfigInfoProvider provider, +final String storage) { +final String schemaUser = isImpersonationEnabled ? userName : ImpersonationUtil.getProcessUserName(); +final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, provider).build(); +final SchemaPlus rootSchema = SimpleCalciteSchema.createRootSchema(false); +Set storageSet = Sets.newHashSet(); +storageSet.add(storage); +addNewStoragesToRootSchema(schemaConfig, rootSchema, storageSet); +schemaTreesToClose.add(rootSchema); +return rootSchema; + } + + public SchemaPlus addPartialRootSchema(final String userName, final SchemaConfigInfoProvider provider, +Set storages, SchemaPlus rootSchema) { +final String schemaUser = isImpersonationEnabled ? userName : ImpersonationUtil.getProcessUserName(); +final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, provider).build(); +addNewStoragesToRootSchema(schemaConfig, rootSchema, storages); +schemaTreesToClose.add(rootSchema); +return rootSchema; + } + + private void expandSecondLevelSchema(SchemaPlus parent) { +List secondLevelSchemas = Lists.newArrayList(); +for (String firstLevelSchemaName : parent.getSubSchemaNames()) { + SchemaPlus firstLevelSchema = parent.getSubSchema(firstLevelSchemaName); + for (String secondLevelSchemaName : firstLevelSchema.getSubSchemaNames()) { --- End diff -- Is there something special about second-level schemas? Must they be workspaces? Something else? Please explain. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942161#comment-15942161 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/795#discussion_r108051710 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java --- @@ -198,6 +200,10 @@ public Expression getExpression(SchemaPlus parentSchema, String name) { @Override public boolean contentsHaveChangedSince(long lastCheck, long now) { +//use cached schema for 1 second, it will be helpful if it is highly concurrent scenario +if( abs(now - lastCheck) < 1000 ) { + return false; --- End diff -- Clearer would be: ``` return (now - lastCheck) < 1000; ``` And, the 1000 should be made into a constant: ``` public static int CACHED_SCHEMA_REFRESH_MS = 1000; ``` > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942159#comment-15942159 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/795#discussion_r108052368 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java --- @@ -144,6 +145,25 @@ public SchemaPlus getNewDefaultSchema() { return defaultSchema; } + public SchemaPlus getPartialDefaultSchema() { + +final String sessionSchemaPath = session.getDefaultSchemaPath(); +final List schemaPathAsList = Lists.newArrayList(sessionSchemaPath.split("\\.")); + +final SchemaPlus rootSchema = schemaTreeProvider.createPartialRootSchema(getQueryUserName(), +this, schemaPathAsList.get(0)); + +final SchemaPlus defaultSchema = session.getDefaultSchema(rootSchema); + +if (defaultSchema == null) { + return rootSchema; +} +return defaultSchema; + } + + public void addNewRelevantSchema(Set storages, SchemaPlus toExpandSchema) { +schemaTreeProvider.addPartialRootSchema(getQueryUserName(), this, storages, toExpandSchema); --- End diff -- Per a comment later on, find a good home for computing the effective user name (with impersonation enabled). That calculation should probably be done on the user session if the impersonation is for the life of the session, or on the query context if the impersonation is only for the single query. (Note that if it is only for a query, temp tables probably won't work the way we expect.) But, doing the impersonation check in the schema tree provider is pushing that behavior too far down into the code. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942160#comment-15942160 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/795#discussion_r108052279 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java --- @@ -119,6 +127,74 @@ public SchemaPlus createRootSchema(SchemaConfig schemaConfig) { } } + + public SchemaPlus createPartialRootSchema(final String userName, final SchemaConfigInfoProvider provider, +final String storage) { +final String schemaUser = isImpersonationEnabled ? userName : ImpersonationUtil.getProcessUserName(); +final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, provider).build(); +final SchemaPlus rootSchema = SimpleCalciteSchema.createRootSchema(false); +Set storageSet = Sets.newHashSet(); +storageSet.add(storage); +addNewStoragesToRootSchema(schemaConfig, rootSchema, storageSet); +schemaTreesToClose.add(rootSchema); +return rootSchema; + } + + public SchemaPlus addPartialRootSchema(final String userName, final SchemaConfigInfoProvider provider, +Set storages, SchemaPlus rootSchema) { +final String schemaUser = isImpersonationEnabled ? userName : ImpersonationUtil.getProcessUserName(); +final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, provider).build(); +addNewStoragesToRootSchema(schemaConfig, rootSchema, storages); +schemaTreesToClose.add(rootSchema); +return rootSchema; + } + + private void expandSecondLevelSchema(SchemaPlus parent) { +List secondLevelSchemas = Lists.newArrayList(); +for (String firstLevelSchemaName : parent.getSubSchemaNames()) { + SchemaPlus firstLevelSchema = parent.getSubSchema(firstLevelSchemaName); + for (String secondLevelSchemaName : firstLevelSchema.getSubSchemaNames()) { + secondLevelSchemas.add(firstLevelSchema.getSubSchema(secondLevelSchemaName)); + } +} + +for (SchemaPlus schema : secondLevelSchemas) { + AbstractSchema drillSchema; + try { +drillSchema = schema.unwrap(AbstractSchema.class); + } catch (ClassCastException e) { +throw new RuntimeException(String.format("Schema '%s' is not expected under root schema", schema.getName())); + } + SubSchemaWrapper wrapper = new SubSchemaWrapper(drillSchema); + parent.add(wrapper.getName(), wrapper); +} + } + + public void addNewStoragesToRootSchema(SchemaConfig schemaConfig, SchemaPlus rootSchema, Set storages) { +try { + for(String name: storages) { +try { + StoragePlugin plugin = dContext.getStorage().getPlugin(name); + if(plugin == null) { +logger.error("Got null for storage plugin of name: " + name); +continue; + } + plugin.registerSchemas(schemaConfig, rootSchema); + expandSecondLevelSchema(rootSchema); +} +catch (ExecutionSetupException ex) { --- End diff -- Presumably this method adds storages based on need: we add a storage only if a query references it. If so, then we need to pass the exception along. If we don't, the user will simply be told that "foo.bar" does not exist when, in fact, perhaps it does exist but is configured wrong. And, note that we do catch an IOException and translate it to a user exception below. So, maybe * Remove this try/catch block. * Change the following catch from IOException to just Exception. The result will be that we fail the query for all errors, not just IO errors. Again, this is the right choice if we are only loading the schemas needed by the query. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942156#comment-15942156 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/795#discussion_r108052308 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java --- @@ -144,6 +145,25 @@ public SchemaPlus getNewDefaultSchema() { return defaultSchema; } + public SchemaPlus getPartialDefaultSchema() { --- End diff -- Javadoc to explain what a partial default schema might be? > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942168#comment-15942168 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/795#discussion_r108050436 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java --- @@ -144,6 +145,25 @@ public SchemaPlus getNewDefaultSchema() { return defaultSchema; } + public SchemaPlus getPartialDefaultSchema() { + +final String sessionSchemaPath = session.getDefaultSchemaPath(); +final List schemaPathAsList = Lists.newArrayList(sessionSchemaPath.split("\\.")); + +final SchemaPlus rootSchema = schemaTreeProvider.createPartialRootSchema(getQueryUserName(), +this, schemaPathAsList.get(0)); + +final SchemaPlus defaultSchema = session.getDefaultSchema(rootSchema); + +if (defaultSchema == null) { + return rootSchema; +} +return defaultSchema; + } + + public void addNewRelevantSchema(Set storages, SchemaPlus toExpandSchema) { --- End diff -- `addNewSchema`? Presumably the schema is relevant of we'd not be adding it... If we are adding the schema, it is new to this context, so maybe just `addSchema`. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942162#comment-15942162 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/795#discussion_r108050603 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SqlConverter.java --- @@ -107,26 +110,51 @@ public SqlConverter(QueryContext context) { this.sqlToRelConverterConfig = new SqlToRelConverterConfig(); this.isInnerQuery = false; this.typeFactory = new JavaTypeFactoryImpl(DRILL_TYPE_SYSTEM); -this.defaultSchema = context.getNewDefaultSchema(); -this.rootSchema = rootSchema(defaultSchema); + +this.defaultSchema = context.getPartialDefaultSchema(); --- End diff -- All these "this." prefixes are not needed. "this." is used only when member names are the same as argument names: ``` public Foo(String name) { this.name = name; } ``` In this case, a reader has to double check: what is the difference between `this.defaultSchema` and `defaultSchema`? Turns out, they are the same, so we should use the same form of reference everywhere in the ctor. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942164#comment-15942164 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/795#discussion_r108051823 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java --- @@ -119,6 +127,74 @@ public SchemaPlus createRootSchema(SchemaConfig schemaConfig) { } } + + public SchemaPlus createPartialRootSchema(final String userName, final SchemaConfigInfoProvider provider, +final String storage) { +final String schemaUser = isImpersonationEnabled ? userName : ImpersonationUtil.getProcessUserName(); +final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, provider).build(); +final SchemaPlus rootSchema = SimpleCalciteSchema.createRootSchema(false); +Set storageSet = Sets.newHashSet(); +storageSet.add(storage); +addNewStoragesToRootSchema(schemaConfig, rootSchema, storageSet); +schemaTreesToClose.add(rootSchema); --- End diff -- There is quite a bit of plumbing here to get right -- for each new schema added. Please factor out the common code into a (private) implementation method. Else, we'll be chasing bugs forever as people modify one copy but miss the others. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942166#comment-15942166 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/795#discussion_r108050543 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SchemaUtilites.java --- @@ -185,6 +186,35 @@ public static AbstractSchema resolveToMutableDrillSchema(final SchemaPlus defaul } /** + * Given a schema in schema tree, once this schema is found resolve it into a mutable AbstractDrillSchema + * instance. A {@link UserException} is throws when: + * 1. Schema is a root schema + * 2. schema is not a mutable schema. + * @param schema + * @return + */ + public static AbstractSchema toMutableDrillSchema(final SchemaPlus schema) { +if (schema == null) { + throwSchemaNotFoundException(schema, schema.getName()); --- End diff -- Err... If schema is null, then `schema.getName()` will throw an NPE... Why would the schema be null? Should this be: ``` assert schema != null; ``` > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942158#comment-15942158 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/795#discussion_r108051857 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java --- @@ -119,6 +127,74 @@ public SchemaPlus createRootSchema(SchemaConfig schemaConfig) { } } + + public SchemaPlus createPartialRootSchema(final String userName, final SchemaConfigInfoProvider provider, +final String storage) { +final String schemaUser = isImpersonationEnabled ? userName : ImpersonationUtil.getProcessUserName(); +final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, provider).build(); +final SchemaPlus rootSchema = SimpleCalciteSchema.createRootSchema(false); +Set storageSet = Sets.newHashSet(); +storageSet.add(storage); +addNewStoragesToRootSchema(schemaConfig, rootSchema, storageSet); +schemaTreesToClose.add(rootSchema); +return rootSchema; + } + + public SchemaPlus addPartialRootSchema(final String userName, final SchemaConfigInfoProvider provider, +Set storages, SchemaPlus rootSchema) { +final String schemaUser = isImpersonationEnabled ? userName : ImpersonationUtil.getProcessUserName(); +final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, provider).build(); +addNewStoragesToRootSchema(schemaConfig, rootSchema, storages); +schemaTreesToClose.add(rootSchema); +return rootSchema; + } + + private void expandSecondLevelSchema(SchemaPlus parent) { --- End diff -- Maybe explain this a bit? Why are we expanding second-level schemas for *all* top-level schemas? Can't we do the expansion on the fly as we resolve? That is, if a query has a path "a.b.c.d", can't we just resolve a, then within a, resolve b, and so on until we get to d? Else, we are still open to a performance hit if, say, a is a directory of a million files, or a database with 10K tables. > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942154#comment-15942154 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/795#discussion_r108051696 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java --- @@ -198,6 +200,10 @@ public Expression getExpression(SchemaPlus parentSchema, String name) { @Override public boolean contentsHaveChangedSince(long lastCheck, long now) { +//use cached schema for 1 second, it will be helpful if it is highly concurrent scenario +if( abs(now - lastCheck) < 1000 ) { --- End diff -- You want to check if either now is one second in the future -- or one second in the past? Under what conditions would lastCheck - now be negative and so would need an abs? > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942165#comment-15942165 ] ASF GitHub Bot commented on DRILL-5089: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/795#discussion_r108052374 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java --- @@ -253,6 +273,10 @@ public RemoteFunctionRegistry getRemoteFunctionRegistry() { return drillbitContext.getRemoteFunctionRegistry(); } + public SchemaTreeProvider getSchemaTreeProvider() { --- End diff -- Javadoc comment to explain this? > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941151#comment-15941151 ] ASF GitHub Bot commented on DRILL-5089: --- GitHub user chunhui-shi opened a pull request: https://github.com/apache/drill/pull/795 DRILL-5089: Get only partial schemas of relevant storage plugins inst… …ead of all storages ahead. 1. For each query, rootSchema is empty, add schemas of a storage plugin only when needed -- when asked for a schemaPath. 2. Allow mock environments to provide dynamic schemas. 3. SchemaUtils.findSchema used in many Sql handlers now is handled by SqlConverter.catalog to allow expand schema dynamically 4. Temp table resolve also could not take a schema tree and assume it is complete. NOTE: pom.xml in this pull request is pointing to temporary calcite versoin 'r20-test', this should change to 'r20' once this pull request is ready to commit You can merge this pull request into a Git repository by running: $ git pull https://github.com/chunhui-shi/drill DRILL-5089 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/795.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #795 commit 4814e37eefd16ee7aa1ba6d21d66211e41922dbf Author: chunhui-shiDate: 2017-01-06T09:13:02Z DRILL-5089: Get only partial schemas of relevant storage plugins instead of all storages ahead. 1. For each query, rootSchema is empty, add schemas of a storage plugin only when needed -- when asked for a schemaPath. 2. Allow mock environments to provide dynamic schemas. 3. SchemaUtils.findSchema used in many Sql handlers now is handled by SqlConverter.catalog to allow expand schema dynamically 4. Temp table resolve also could not take a schema tree and assume it is complete. NOTE: pom.xml in this pull request is pointing to temporary calcite versoin 'r20-test', this should change to 'r20' once this pull request is ready to commit > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > > In a query's lifecycle, at attempt is made to initialize each enabled storage > plugin, while building the schema tree. This is done regardless of the actual > plugins involved within a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > being down, the overall query time taken increases drastically. Most likely > due the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at one > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begin to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.3.15#6346)