[jira] [Commented] (DRILL-5771) Fix serDe errors for format plugins

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257790#comment-16257790
 ] 

ASF GitHub Bot commented on DRILL-5771:
---

Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1014
  
@arina-ielchiieva can you resolve the merge conflict? 


> Fix serDe errors for format plugins
> ---
>
> Key: DRILL-5771
> URL: https://issues.apache.org/jira/browse/DRILL-5771
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> Create unit tests to check that all storage format plugins can be 
> successfully serialized / deserialized.
> Usually this happens when a query has several major fragments. 
> One way to check serde is to generate the physical plan (generated as JSON) and 
> then submit it back to Drill.
> One example of the errors found is described in the first comment. Another 
> example is described in DRILL-5166.
> *Serde issues:*
> 1. Could not obtain format plugin during deserialization
> A format plugin is created based on its format plugin configuration or its name. 
> On Drill startup we load information about the available plugins (it is reloaded 
> each time a storage plugin is updated, which can be done only by an admin).
> When a query is parsed, we try to get the plugin from the available ones; if we 
> cannot find one, we try to [create 
> one|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L136-L144],
> but at other query execution stages we always assume that the [plugin exists 
> based on 
> configuration|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L156-L162].
> For example, during query parsing we had to create a format plugin on one node 
> based on the format configuration.
> We then sent a major fragment to a different node, where we could not get a 
> format plugin for this format configuration, and 
> deserialization failed.
> To fix this problem we need to create the format plugin during query 
> deserialization if it is absent.
>   
> 2. Absent {{hashCode()}} and {{equals()}}
> Format plugins are stored in a hash map whose key is the format plugin config.
> Since some format plugin configs did not override {{hashCode()}} and 
> {{equals()}}, we could not find the format plugin based on its configuration.
> 3. Named format plugin usage
> Named format plugin configs allow getting a format plugin by its name for 
> configuration shared among all drillbits.
> They are used as aliases for pre-configured format plugins. A user with admin 
> privileges can modify them at runtime.
> Named format plugin configs are used instead of sending all non-default 
> parameters of the format plugin config; in this case only the name is sent.
> Their usage in a distributed system may cause race conditions.
> For example, 
> 1. Query is submitted. 
> 2. Parquet format plugin is created with the following configuration 
> (autoCorrectCorruptDates=>true).
> 3. The named format plugin config is serialized with the name parquet.
> 4. Major fragment is sent to a different node.
> 5. Admin changes the parquet configuration for the alias 'parquet' on all 
> nodes to autoCorrectCorruptDates=>false.
> 6. Named format is deserialized on the different node into a parquet format 
> plugin with configuration (autoCorrectCorruptDates=>false).
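
Serde issue 2 above can be illustrated with a minimal sketch. The two config classes below are hypothetical stand-ins, not Drill's real format plugin configs; the point is only that a key without value-based `equals()` and `hashCode()` cannot be found in a hash map by an equal copy, which is what a deserialized config is.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Sketch: why a config used as a HashMap key needs equals() and hashCode(). */
final class FormatConfigLookupSketch {

  /** Hypothetical config that relies on Object identity (no overrides). */
  static final class ConfigWithoutEquals {
    final boolean autoCorrectCorruptDates;

    ConfigWithoutEquals(boolean autoCorrectCorruptDates) {
      this.autoCorrectCorruptDates = autoCorrectCorruptDates;
    }
  }

  /** Hypothetical config with value-based equality. */
  static final class ConfigWithEquals {
    final boolean autoCorrectCorruptDates;

    ConfigWithEquals(boolean autoCorrectCorruptDates) {
      this.autoCorrectCorruptDates = autoCorrectCorruptDates;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof ConfigWithEquals)) {
        return false;
      }
      return autoCorrectCorruptDates == ((ConfigWithEquals) o).autoCorrectCorruptDates;
    }

    @Override
    public int hashCode() {
      return Objects.hash(autoCorrectCorruptDates);
    }
  }

  /** Registers a plugin under one instance, then looks it up with an equal copy. */
  static boolean foundWithoutEquals() {
    Map<ConfigWithoutEquals, String> plugins = new HashMap<>();
    plugins.put(new ConfigWithoutEquals(true), "parquet-plugin");
    // The copy plays the role of a config rebuilt by deserialization.
    return plugins.containsKey(new ConfigWithoutEquals(true));
  }

  /** Same lookup, but the key class defines value-based equality. */
  static boolean foundWithEquals() {
    Map<ConfigWithEquals, String> plugins = new HashMap<>();
    plugins.put(new ConfigWithEquals(true), "parquet-plugin");
    return plugins.containsKey(new ConfigWithEquals(true));
  }

  public static void main(String[] args) {
    System.out.println("without overrides: " + foundWithoutEquals()); // false
    System.out.println("with overrides: " + foundWithEquals());       // true
  }
}
```

The lookup with the identity-based key misses even though both instances carry the same setting, which matches the symptom described for configs that lacked the overrides.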



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5657) Implement size-aware result set loader

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257639#comment-16257639
 ] 

ASF GitHub Bot commented on DRILL-5657:
---

Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/914
  
+1. Ship it!


> Implement size-aware result set loader
> --
>
> Key: DRILL-5657
> URL: https://issues.apache.org/jira/browse/DRILL-5657
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: Future
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: Future
>
>
> A recent extension to Drill's set of test tools created a "row set" 
> abstraction to allow us to create, and verify, record batches with very few 
> lines of code. Part of this work involved creating a set of "column 
> accessors" in the vector subsystem. Column readers provide a uniform API to 
> obtain data from columns (vectors), while column writers provide a uniform 
> writing interface.
> DRILL-5211 discusses a set of changes to limit value vectors to 16 MB in size 
> (to avoid memory fragmentation due to Drill's two memory allocators.) The 
> column accessors have proven to be so useful that they will be the basis for 
> the new, size-aware writers used by Drill's record readers.
> A step in that direction is to retrofit the column writers to use the 
> size-aware {{setScalar()}} and {{setArray()}} methods introduced in 
> DRILL-5517.
> Since the test framework row set classes are (at present) the only consumer 
> of the accessors, those classes must also be updated with the changes.
> This then allows us to add a new "row mutator" class that handles size-aware 
> vector writing, including the case in which a vector fills in the middle of a 
> row.
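
The case of a vector filling in the middle of a row can be sketched as follows. This is an illustrative model only, not Drill's row mutator: the class name, the string-based "vectors", the one-byte-per-character size estimate, and the tiny byte limit standing in for 16 MB are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of size-aware row writing; not Drill's actual implementation. */
final class SizeAwareWriterSketch {
  private final int columnCount;
  private final int maxVectorBytes;   // stands in for the 16 MB vector limit
  private final List<List<List<String>>> sealedBatches = new ArrayList<>();
  private List<List<String>> currentBatch = new ArrayList<>();
  private int[] vectorBytes;          // bytes used per column in the current batch
  private final List<String> pendingRow = new ArrayList<>();

  SizeAwareWriterSketch(int columnCount, int maxVectorBytes) {
    this.columnCount = columnCount;
    this.maxVectorBytes = maxVectorBytes;
    this.vectorBytes = new int[columnCount];
  }

  /** Writes the next column value of the row in progress. */
  void write(String value) {
    int size = value.length();        // assumption: one byte per character
    if (size > maxVectorBytes) {
      throw new IllegalArgumentException("value larger than a whole vector");
    }
    int column = pendingRow.size();
    if (vectorBytes[column] + size > maxVectorBytes) {
      rollOver();                     // the vector would fill, possibly mid-row
    }
    vectorBytes[column] += size;
    pendingRow.add(value);
    if (pendingRow.size() == columnCount) {
      currentBatch.add(new ArrayList<>(pendingRow));
      pendingRow.clear();
    }
  }

  /** Seals the batch with complete rows only and carries the partial row over. */
  private void rollOver() {
    sealedBatches.add(currentBatch);
    currentBatch = new ArrayList<>();
    vectorBytes = new int[columnCount];
    for (int i = 0; i < pendingRow.size(); i++) {
      vectorBytes[i] = pendingRow.get(i).length();
    }
  }

  /** Returns sealed batches followed by the batch in progress, if it has rows. */
  List<List<List<String>>> batches() {
    List<List<List<String>>> all = new ArrayList<>(sealedBatches);
    if (!currentBatch.isEmpty()) {
      all.add(currentBatch);
    }
    return all;
  }

  public static void main(String[] args) {
    SizeAwareWriterSketch writer = new SizeAwareWriterSketch(2, 10);
    String[] values = {"a", "xxxx", "b", "yyyy", "c", "zzzz"};
    for (String value : values) {
      writer.write(value);
    }
    // The second column overflows while the third row is half written.
    System.out.println(writer.batches()); // [[[a, xxxx], [b, yyyy]], [[c, zzzz]]]
  }
}
```

The design choice shown here is that a batch never contains a partial row: the values already written for the row in progress move to the next batch together with the value that did not fit.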





[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257633#comment-16257633
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151798147
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java
 ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+/**
+ * This class is to allow us loading schemas from storage plugins later when {@link #getSubSchema(String, boolean)}
+ * is called.
+ */
+public class DynamicRootSchema extends DynamicSchema
+    implements CalciteRootSchema {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class);
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
+    super(null, new RootSchema(), "");
+    this.schemaConfig = schemaConfig;
+    this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
+    CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+    if (retSchema != null) {
+      return retSchema;
+    }
+
+    loadSchemaFactory(schemaName, caseSensitive);
+    retSchema = getSubSchemaMap().get(schemaName);
+    return retSchema;
+  }
+
+  @Override
+  public NavigableSet<String> getTableNames() {
+    Set<String> pluginNames = Sets.newHashSet();
+    for (Map.Entry<String, StoragePlugin> storageEntry : getSchemaFactories()) {
+      pluginNames.add(storageEntry.getKey());
+    }
+    return Compatible.INSTANCE.navigableSet(
+        ImmutableSortedSet.copyOf(
+            Sets.union(pluginNames, getSubSchemaMap().keySet())));
+  }
+
+  /**
+   * load schema factory(storage plugin) for schemaName
+   * @param schemaName
+   * @param caseSensitive
+   */
+  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
+    try {
+      SchemaPlus thisPlus = this.plus();
+      StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
+      if (plugin != null) {
+        plugin.registerSchemas(schemaConfig, thisPlus);
+        return;
+      }
+
+      // we could not find the plugin, the schemaName could be `dfs.tmp`, a 2nd level schema under 'dfs'
+      String[] paths = schemaName.split("\\.");
+      if (paths.length == 2) {
+        plugin = getSchemaFactories().getPlugin(paths[0]);
+        if (plugin == null) {
+          return;
+        }
+
+        // we could find storage plugin for first part(e.g. 'dfs') of schemaName (e.g. 'dfs.tmp')
+        // register schema for this storage 

[jira] [Commented] (DRILL-5657) Implement size-aware result set loader

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257628#comment-16257628
 ] 

ASF GitHub Bot commented on DRILL-5657:
---

Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/914
  
I was hoping that we would just get better encapsulation without losing 
performance. This performance boost is rather serendipitous.
One possible explanation might be in the JVM optimizing away code paths in 
the benchmark itself. This means that we might not see the same performance 
gains in real life. @bitblender knows more about this than I do. 
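
The concern about the JVM optimizing away code paths in a benchmark can be illustrated with a small sketch. The class and method names are made up for this example, and a hand-rolled timing loop like this is not a reliable measurement; it only shows the difference between work whose result is discarded and work whose result is consumed.

```java
/** Sketch: keeping a benchmark's result live so the JIT cannot drop the work. */
final class BenchmarkSinkSketch {

  /** The result is discarded, so a JIT compiler is free to remove the loop body. */
  static void discardResult(int[] data) {
    for (int v : data) {
      int unused = v * 31 + 7;
    }
  }

  /** The result is accumulated and returned, so it has to be computed. */
  static long keepResult(int[] data) {
    long sum = 0;
    for (int v : data) {
      sum += v * 31 + 7;
    }
    return sum;
  }

  public static void main(String[] args) {
    int[] data = new int[1_000_000];
    for (int i = 0; i < data.length; i++) {
      data[i] = i % 1000;
    }
    long t0 = System.nanoTime();
    discardResult(data);
    long t1 = System.nanoTime();
    long sum = keepResult(data);
    long t2 = System.nanoTime();
    // Printing the sum keeps the second computation observable.
    System.out.println("discarded: " + (t1 - t0) + " ns, kept: " + (t2 - t1) + " ns, sum=" + sum);
  }
}
```

If the measured code resembles the first method, part of the apparent speedup may come from eliminated work rather than from the code under test. A harness such as JMH, with its `Blackhole.consume` sink, is the usual way to guard against this.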





[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257599#comment-16257599
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151793647
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -373,12 +402,12 @@ public String toString() {
   public class WorkspaceSchema extends AbstractSchema implements ExpandingConcurrentMap.MapValueFactory<TableInstance, DrillTable> {
     private final ExpandingConcurrentMap<TableInstance, DrillTable> tables = new ExpandingConcurrentMap<>(this);
     private final SchemaConfig schemaConfig;
-    private final DrillFileSystem fs;
+    private DrillFileSystem fs;
 
-    public WorkspaceSchema(List<String> parentSchemaPath, String wsName, SchemaConfig schemaConfig) throws IOException {
+    public WorkspaceSchema(List<String> parentSchemaPath, String wsName, SchemaConfig schemaConfig, DrillFileSystem fs) throws IOException {
       super(parentSchemaPath, wsName);
       this.schemaConfig = schemaConfig;
-      this.fs = ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), fsConf);
+      this.fs = fs;
--- End diff --

Now we pass in fs instead of creating it inside WorkspaceSchema. 


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, an attempt is made to initialize each enabled storage 
> plugin while building the schema tree. This is done regardless of the actual 
> plugins involved in the query. 
> Sometimes, when one or more of the enabled storage plugins have issues, 
> either due to misconfiguration or the underlying datasource being slow or 
> being down, the overall query time increases drastically, most likely 
> due to the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at some 
> point the underlying SQL Server db goes down, any Drill query that starts 
> executing from that point on slows down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 
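
The idea of registering only the schemas a query actually touches can be sketched as below. This is a simplified model under stated assumptions: the registry class, the `Supplier`-based initializers, and the string "schemas" are hypothetical and only mimic the shape of a lazy lookup, including the `dfs.tmp` to `dfs` fallback.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Sketch: initialize a storage plugin's schema only when a query asks for it. */
final class LazySchemaRegistrySketch {
  // Plugin name -> (possibly slow or failing) schema initializer.
  private final Map<String, Supplier<String>> factories = new HashMap<>();
  // Schemas already initialized, keyed by plugin name.
  private final Map<String, String> loaded = new HashMap<>();
  private int initCount;

  void addPlugin(String name, Supplier<String> initializer) {
    factories.put(name, initializer);
  }

  /**
   * Returns the schema for a name such as "dfs" or "dfs.tmp", initializing only
   * the owning plugin, or null if no plugin matches.
   */
  String getSubSchema(String schemaName) {
    String pluginName = schemaName.split("\\.")[0];
    String cached = loaded.get(pluginName);
    if (cached != null) {
      return cached;
    }
    Supplier<String> initializer = factories.get(pluginName);
    if (initializer == null) {
      return null;
    }
    initCount++;
    String schema = initializer.get();
    loaded.put(pluginName, schema);
    return schema;
  }

  int initCount() {
    return initCount;
  }

  public static void main(String[] args) {
    LazySchemaRegistrySketch registry = new LazySchemaRegistrySketch();
    registry.addPlugin("dfs", () -> "dfs-schema");
    registry.addPlugin("sqlserver", () -> {
      throw new IllegalStateException("datasource down");
    });
    // A query that only touches dfs never triggers the faulty plugin.
    System.out.println(registry.getSubSchema("dfs.tmp")); // dfs-schema
    System.out.println(registry.initCount());             // 1
  }
}
```

With eager registration, the failing `sqlserver` initializer would run for every query. Here it runs only if a query names that schema.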





[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257601#comment-16257601
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151793708
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -150,14 +152,30 @@ public WorkspaceSchemaFactory(
    * @return True if the user has access. False otherwise.
    */
   public boolean accessible(final String userName) throws IOException {
-    final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
+    final DrillFileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
+    return accessible(fs);
+  }
+
+  /**
+   * Checks whether a FileSystem object has the permission to list/read workspace directory
+   * @param fs a DrillFileSystem object that was created with certain user privilege
+   * @return True if the user has access. False otherwise.
+   * @throws IOException
+   */
+  public boolean accessible(DrillFileSystem fs) throws IOException {
     try {
-      // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style
-      // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem
-      // has a limited private API (FileSystem.access) to check the permissions directly
-      // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of
-      // FileClient. TODO: Update this when DRILL-3749 is fixed.
-      fs.listStatus(wsPath);
+      /**
+       * For Windows local file system, fs.access ends up using DeprecatedRawLocalFileStatus which has
+       * TrustedInstaller as owner, and a member of Administrators group could not satisfy the permission.
+       * In this case, we will still use method listStatus.
+       * In other cases, we use access method since it is cheaper.
+       */
+      if (SystemUtils.IS_OS_WINDOWS && fs.getUri().getScheme().equalsIgnoreCase(FileSystemSchemaFactory.LOCAL_FS_SCHEME)) {
--- End diff --

Yes, it was tested in the Windows unit tests.




[jira] [Updated] (DRILL-5974) Read JSON non-relational fields using text mode

2017-11-17 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5974:
---
Description: 
Proposed is a minor enhancement to the JSON reader to better handle 
non-relational JSON structures.

As background, Drill handles simple tuples:

{code}
{a: 10, b: “fred”}
{code}

Drill also handles arrays:

{code}
{name: “fred”, hobbies: [“bowling”, “golf”]}
{code}

Drill even handles arrays of tuples:

{code}
{name: “fred”, orders: [
  {id: 1001, amount: 12.34},
  {id: 1002, amount: 56.78}]}
{code}

The above are termed "relational" because there is a straightforward mapping 
to/from tables into the above JSON structures.

Things get interesting with non-relational types, such as 2-D arrays:

{code}
{id: 4, shape: “square”, points: [[0, 0], [0, 5], [5, 0], [5, 5]]}
{code}

Drill has two solutions:

* Turn on the experimental list and union support.
* Enable all-text mode to read all fields as JSON text.

Proposed is a middle ground:

* Read fields with relational types into vectors.
* Read non-relational fields using text mode.

Thus, the first three examples would all result in the JSON data parsed into 
Drill vectors. But, the fourth, non-relational example would produce a row that 
looks like this:

{noformat}
id, shape, points
4, “square”, “[[0, 0], [0, 5], [5, 0], [5, 5]]”
{noformat}

Although Drill can’t parse the 2-D array, Drill will pass the array along to 
the client, which can use its favorite JSON parser to parse the array and do 
something useful (like draw the square in this case.)

Specifically, the proposal is to:

* Apply this change only to the revised “batch size aware” JSON reader.
* Use the above parsing model by default.
* Use the experimental list-and-union support if the existing 
{{exec.enable_union_type}} system/session option is set.

Existing queries should “just work.” In fact, now JSON with non-relational 
types will work “out-of-the-box” without all-text mode or the experimental 
types.

  was:
Proposed is a minor enhancement to the JSON reader to better handle 
non-relational JSON structures.

As background, Drill handles simple tuples:

{code}
{a: 10, b: “fred”}
{code}

Drill also handles arrays:

{code}
{name: “fred”, hobbies: [“bowling”, “golf”]}
{code}

Drill even handles arrays of tuples:

{code}
{name: “fred”, orders: [
  {id: 1001, amount: 12.34},
  {id: 1002, amount: 56.78}]}
{code}

The above are termed "relational" because there is a straightforward mapping 
to/from tables into the above JSON structures.

Things get interesting with non-relational types, such as 2-D arrays:

{code}
{id: 4, shape: “square”, points: [[0, 0], [0, 5], [5, 0], [5, 5]]}
{code}

Drill has two solutions:

* Turn on the experimental list and union support.
* Enable all-text mode to read all fields as JSON text.

Proposed is a middle ground:

* Read fields with relational types into vectors.
* Read non-relational fields using text mode.

Thus, the first three examples would all result in the JSON data parsed into 
Drill vectors. But, the fourth, non-relational example would produce a row that 
looks like this:

{noformat}
id, shape, points
4, “square”, “[[0, 0], [0, 5], [5, 0], [5, 5]]”
{noformat}

Although Drill can’t parse the 2-D array, Drill will pass the array along to 
the client, which can use its favorite JSON parser to parse the array and do 
something useful (like draw the square in this case.)

Specifically, the proposal is to:

* Apply this change only to the revised “batch size aware” JSON reader.
* Use the above parsing model by default.
* Use the experimental list-and-union support if the existing 
{{exec.enable_union_type}} system/session option is set.



[jira] [Created] (DRILL-5974) Read JSON non-relational fields using text mode

2017-11-17 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5974:
--

 Summary: Read JSON non-relational fields using text mode
 Key: DRILL-5974
 URL: https://issues.apache.org/jira/browse/DRILL-5974
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.13.0
Reporter: Paul Rogers
Assignee: Paul Rogers
 Fix For: 1.13.0


Proposed is a minor enhancement to the JSON reader to better handle 
non-relational JSON structures.

As background, Drill handles simple tuples:

{code}
{a: 10, b: “fred”}
{code}

Drill also handles arrays:

{code}
{name: “fred”, hobbies: [“bowling”, “golf”]}
{code}

Drill even handles arrays of tuples:

{code}
{name: “fred”, orders: [
  {id: 1001, amount: 12.34},
  {id: 1002, amount: 56.78}]}
{code}

The above are termed "relational" because there is a straightforward mapping 
to/from tables into the above JSON structures.

Things get interesting with non-relational types, such as 2-D arrays:

{code}
{id: 4, shape: “square”, points: [[0, 0], [0, 5], [5, 0], [5, 5]]}
{code}

Drill has two solutions:

* Turn on the experimental list and union support.
* Enable all-text mode to read all fields as JSON text.

Proposed is a middle ground:

* Read fields with relational types into vectors.
* Read non-relational fields using text mode.

Thus, the first three examples would all result in the JSON data parsed into 
Drill vectors. But, the fourth, non-relational example would produce a row that 
looks like this:

{noformat}
id, shape, points
4, “square”, “[[0, 0], [0, 5], [5, 0], [5, 5]]”
{noformat}

Although Drill can’t parse the 2-D array, Drill will pass the array along to 
the client, which can use its favorite JSON parser to parse the array and do 
something useful (like draw the square in this case.)

Specifically, the proposal is to:

* Apply this change only to the revised “batch size aware” JSON reader.
* Use the above parsing model by default.
* Use the experimental list-and-union support if the existing 
{{exec.enable_union_type}} system/session option is set.
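
The proposed middle ground can be sketched on already-parsed records. This is a minimal illustration, not the JSON reader's code: it assumes records arrive as plain Java maps and lists, treats only nested arrays as non-relational, and looks at top-level fields only. All class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: keep relational fields as values, fall back to JSON text otherwise. */
final class JsonFieldModeSketch {

  /** Relational here means: scalar, map of relational values, or array without nested arrays. */
  static boolean isRelational(Object value) {
    if (value instanceof Map) {
      for (Object member : ((Map<?, ?>) value).values()) {
        if (!isRelational(member)) {
          return false;
        }
      }
      return true;
    }
    if (value instanceof List) {
      for (Object element : (List<?>) value) {
        if (element instanceof List || !isRelational(element)) {
          return false; // e.g. a 2-D array
        }
      }
      return true;
    }
    return true; // scalar
  }

  /** Renders a parsed value back to JSON text. */
  static String toJsonText(Object value) {
    if (value instanceof List) {
      StringBuilder sb = new StringBuilder("[");
      boolean first = true;
      for (Object element : (List<?>) value) {
        sb.append(first ? "" : ", ").append(toJsonText(element));
        first = false;
      }
      return sb.append("]").toString();
    }
    if (value instanceof Map) {
      StringBuilder sb = new StringBuilder("{");
      boolean first = true;
      for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
        sb.append(first ? "" : ", ").append(toJsonText(String.valueOf(e.getKey())))
            .append(": ").append(toJsonText(e.getValue()));
        first = false;
      }
      return sb.append("}").toString();
    }
    if (value instanceof String) {
      return "\"" + ((String) value).replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
    return String.valueOf(value); // numbers, booleans, null
  }

  /** Top-level fields only: relational values pass through, others become text. */
  static Map<String, Object> readRow(Map<String, Object> record) {
    Map<String, Object> row = new LinkedHashMap<>();
    for (Map.Entry<String, Object> field : record.entrySet()) {
      Object value = field.getValue();
      row.put(field.getKey(), isRelational(value) ? value : toJsonText(value));
    }
    return row;
  }

  public static void main(String[] args) {
    Map<String, Object> record = new LinkedHashMap<>();
    record.put("id", 4);
    record.put("shape", "square");
    record.put("points", Arrays.asList(
        Arrays.asList(0, 0), Arrays.asList(0, 5), Arrays.asList(5, 0), Arrays.asList(5, 5)));
    // {id=4, shape=square, points=[[0, 0], [0, 5], [5, 0], [5, 5]]}, with points held as a String
    System.out.println(readRow(record));
  }
}
```

A client receiving the `points` column as text can then hand it to its own JSON parser, as the description suggests.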





[jira] [Commented] (DRILL-4091) Support more functions in gis contrib module

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257524#comment-16257524
 ] 

ASF GitHub Bot commented on DRILL-4091:
---

Github user cgivre commented on the issue:

https://github.com/apache/drill/pull/258
  
I'm getting some unit test failures when I build this.  

[org.apache.drill.exec.expr.fn.impl.gis.TestGeometryFunctions.txt](https://github.com/apache/drill/files/1483754/org.apache.drill.exec.expr.fn.impl.gis.TestGeometryFunctions.txt)





> Support more functions in gis contrib module
> 
>
> Key: DRILL-4091
> URL: https://issues.apache.org/jira/browse/DRILL-4091
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Reporter: Karol Potocki
>Assignee: Karol Potocki
>
> Support for commonly used gis functions in gis contrib module: relate, 
> contains, crosses, intersects, touches, difference, disjoint, buffer, union 
> etc.





[jira] [Commented] (DRILL-5657) Implement size-aware result set loader

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257406#comment-16257406
 ] 

ASF GitHub Bot commented on DRILL-5657:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/914
  
There has been discussion recently on Drill's goal of integrating with 
Arrow. The work to use the `DrillBuf` highlights how we can integrate with 
Arrow. Simply replace the `DrillBuf` usage with `ArrowBuf`, make the required 
changes in vector names, add a few methods to the Arrow API, and this entire 
mechanism can be easily ported over to use Arrow.




[jira] [Commented] (DRILL-5657) Implement size-aware result set loader

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257402#comment-16257402
 ] 

ASF GitHub Bot commented on DRILL-5657:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/914#discussion_r140644571
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/TupleState.java
 ---
@@ -0,0 +1,353 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.rowSet.impl;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.physical.rowSet.impl.ColumnState.BaseMapColumnState;
+import org.apache.drill.exec.physical.rowSet.impl.ColumnState.MapArrayColumnState;
+import org.apache.drill.exec.physical.rowSet.impl.ColumnState.MapColumnState;
+import org.apache.drill.exec.record.ColumnMetadata;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.TupleMetadata;
+import org.apache.drill.exec.record.TupleSchema;
+import org.apache.drill.exec.record.TupleSchema.AbstractColumnMetadata;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.accessor.ObjectType;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter.TupleWriterListener;
+import org.apache.drill.exec.vector.accessor.impl.HierarchicalFormatter;
+import org.apache.drill.exec.vector.accessor.writer.AbstractObjectWriter;
+import org.apache.drill.exec.vector.accessor.writer.AbstractTupleWriter;
+import org.apache.drill.exec.vector.accessor.writer.ColumnWriterFactory;
+import org.apache.drill.exec.vector.complex.AbstractMapVector;
+
+public abstract class TupleState implements TupleWriterListener {
+
+  public static class RowState extends TupleState {
+
+/**
+ * The row-level writer for stepping through rows as they are written,
+ * and for accessing top-level columns.
+ */
+
+private final RowSetLoaderImpl writer;
+
+public RowState(ResultSetLoaderImpl rsLoader) {
+  super(rsLoader, rsLoader.projectionSet);
+  writer = new RowSetLoaderImpl(rsLoader, schema);
+  writer.bindListener(this);
+}
+
+public RowSetLoaderImpl rootWriter() { return writer; }
+
+@Override
+public AbstractTupleWriter writer() { return writer; }
+
+@Override
+public int innerCardinality() { return resultSetLoader.targetRowCount(); }
+  }
+
+  public static class MapState extends TupleState {
+
+protected final AbstractMapVector mapVector;
+protected final BaseMapColumnState mapColumnState;
+protected int outerCardinality;
+
+public MapState(ResultSetLoaderImpl rsLoader,
+BaseMapColumnState mapColumnState,
+AbstractMapVector mapVector,
+ProjectionSet projectionSet) {
+  super(rsLoader, projectionSet);
+  this.mapVector = mapVector;
+  this.mapColumnState = mapColumnState;
+  mapColumnState.writer().bindListener(this);
+}
+
+@Override
+protected void columnAdded(ColumnState colState) {
+  @SuppressWarnings("resource")
+  ValueVector vector = colState.vector();
+
+  // Can't materialize the child if the map itself is
+  // not materialized.
+
+  assert mapVector != null || vector == null;
+  if (vector != null) {
+mapVector.putChild(vector.getField().getName(), vector);
+  }
+}
+
+@Override
+public AbstractTupleWriter writer() {
+  AbstractObjectWriter objWriter = mapColumnState.writer();
+  TupleWriter tupleWriter;
+  if (objWriter.type() == 

[jira] [Commented] (DRILL-5657) Implement size-aware result set loader

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257403#comment-16257403
 ] 

ASF GitHub Bot commented on DRILL-5657:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/914#discussion_r151762298
  
--- Diff: exec/vector/src/main/codegen/templates/ColumnAccessors.java ---
@@ -275,17 +273,17 @@ public boolean isNull() {
   final int offset = writeIndex(len);
   <#else>
   final int writeIndex = writeIndex();
-  <#assign putAddr = "bufAddr + writeIndex * VALUE_WIDTH">
+  <#assign putAddr = "writeIndex * VALUE_WIDTH">
--- End diff --

`putAddr` is an offset rather than an address, right? It should be renamed.


> Implement size-aware result set loader
> --
>
> Key: DRILL-5657
> URL: https://issues.apache.org/jira/browse/DRILL-5657
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: Future
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: Future
>
>
> A recent extension to Drill's set of test tools created a "row set" 
> abstraction to allow us to create, and verify, record batches with very few 
> lines of code. Part of this work involved creating a set of "column 
> accessors" in the vector subsystem. Column readers provide a uniform API to 
> obtain data from columns (vectors), while column writers provide a uniform 
> writing interface.
> DRILL-5211 discusses a set of changes to limit value vectors to 16 MB in size 
> (to avoid memory fragmentation due to Drill's two memory allocators.) The 
> column accessors have proven to be so useful that they will be the basis for 
> the new, size-aware writers used by Drill's record readers.
> A step in that direction is to retrofit the column writers to use the 
> size-aware {{setScalar()}} and {{setArray()}} methods introduced in 
> DRILL-5517.
> Since the test framework row set classes are (at present) the only consumer 
> of the accessors, those classes must also be updated with the changes.
> This then allows us to add a new "row mutator" class that handles size-aware 
> vector writing, including the case in which a vector fills in the middle of a 
> row.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5657) Implement size-aware result set loader

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257401#comment-16257401
 ] 

ASF GitHub Bot commented on DRILL-5657:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/914#discussion_r151762286
  
--- Diff: exec/memory/base/src/main/java/io/netty/buffer/DrillBuf.java ---
@@ -882,4 +882,71 @@ public void print(StringBuilder sb, int indent, Verbosity verbosity) {
 }
   }
 
+  // The "unsafe" methods are for use ONLY by code that does its own
+  // bounds checking. They are called "unsafe" for a reason: they will crash
+  // the JVM if values are addressed out of bounds.
+
+  /**
+   * Write an integer to the buffer at the given byte index, without
+   * bounds checks.
+   *
+   * @param index byte (not int) index of the location to write
+   * @param value the value to write
+   */
+
+  public void unsafePutInt(int index, int value) {
--- End diff --

The first argument in these unsafePutXXX methods is an offset, right? Should 
'index' be renamed to 'offset'?
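
To illustrate the naming question, here is a minimal sketch using the standard `java.nio.ByteBuffer` rather than `DrillBuf` (the class `ByteOffsetSketch` and its helpers are hypothetical and are not part of the pull request). `ByteBuffer`'s absolute `putInt(int index, int value)` also calls its first argument an index, yet it is measured in bytes, so a value index has to be multiplied by the value width first:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not Drill code.
final class ByteOffsetSketch {

  // Width of one int value in bytes.
  static final int INT_WIDTH = Integer.BYTES;

  // Converts a value (element) index into the byte offset that
  // ByteBuffer's absolute putInt/getInt expect.
  static int byteOffset(int valueIndex) {
    return valueIndex * INT_WIDTH;
  }

  // Writes each value at its byte offset using absolute puts.
  static ByteBuffer writeInts(int... values) {
    ByteBuffer buf = ByteBuffer.allocate(values.length * INT_WIDTH)
        .order(ByteOrder.LITTLE_ENDIAN);
    for (int i = 0; i < values.length; i++) {
      buf.putInt(byteOffset(i), values[i]);
    }
    return buf;
  }

  public static void main(String[] args) {
    ByteBuffer buf = writeInts(10, 20, 30);
    // Reads the third value back from byte offset 8.
    System.out.println(buf.getInt(byteOffset(2)));
  }
}
```

Whichever name is chosen, stating the unit (bytes versus values) in the Javadoc, as the diff already does, is probably what keeps callers from passing a value index by mistake.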




[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257300#comment-16257300
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

Github user cgivre commented on the issue:

https://github.com/apache/drill/pull/335
  
Hi @k255 
Are you still interested in this?  I think if we're going to get the GIS 
functions into Drill, we really should get this in as well and I'm happy to 
help.  For whatever reason, I didn't see this until this morning.  Anyway, if 
you'd be willing to rebase this, I can help shepherd it through the review 
process.
-- C


> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Reporter: Karol Potocki
>
> Allow Drill (drill-gis) to read ESRI shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)





[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257240#comment-16257240
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151732963
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
@@ -373,12 +402,12 @@ public String toString() {
   public class WorkspaceSchema extends AbstractSchema implements ExpandingConcurrentMap.MapValueFactory {
 private final ExpandingConcurrentMap tables = new ExpandingConcurrentMap<>(this);
 private final SchemaConfig schemaConfig;
-private final DrillFileSystem fs;
+private DrillFileSystem fs;
 
-public WorkspaceSchema(List parentSchemaPath, String wsName, SchemaConfig schemaConfig) throws IOException {
+public WorkspaceSchema(List parentSchemaPath, String wsName, SchemaConfig schemaConfig, DrillFileSystem fs) throws IOException {
   super(parentSchemaPath, wsName);
   this.schemaConfig = schemaConfig;
-  this.fs = ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), fsConf);
+  this.fs = fs;
--- End diff --

Why do we no longer need to create fs using `ImpersonationUtil`, when it was 
needed before?


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, an attempt is made to initialize each enabled storage 
> plugin while building the schema tree. This is done regardless of the actual 
> plugins involved within a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> being down - the overall query time increases drastically, most likely 
> due to the attempt being made to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begins to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 
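
The idea in the description can be sketched as a root schema that registers a plugin's schema only when a query first asks for it by name. The sketch below is a simplified, hypothetical stand-in (the class `LazySchemaSketch`, its methods, and the use of `String` as the schema type are assumptions for illustration, not Drill or Calcite APIs):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration only; not Drill code.
final class LazySchemaSketch {

  // One factory per enabled plugin; nothing is invoked until it is needed.
  private final Map<String, Supplier<String>> factories = new HashMap<>();
  // Schemas that have already been registered.
  private final Map<String, String> loaded = new HashMap<>();
  // Names of plugins whose factories actually ran, in order.
  final List<String> initialized = new ArrayList<>();

  void addPlugin(String name, Supplier<String> factory) {
    factories.put(name, factory);
  }

  // Returns the schema for the given name, registering it on first use.
  // Returns null when no plugin with that name exists.
  String getSubSchema(String name) {
    String schema = loaded.get(name);
    if (schema != null) {
      return schema;
    }
    Supplier<String> factory = factories.get(name);
    if (factory == null) {
      return null;
    }
    schema = factory.get(); // may throw if this particular plugin is broken
    initialized.add(name);
    loaded.put(name, schema);
    return schema;
  }

  public static void main(String[] args) {
    LazySchemaSketch root = new LazySchemaSketch();
    root.addPlugin("mock_good", () -> "good schema");
    root.addPlugin("mock_broken", () -> {
      throw new IllegalStateException("data source is down");
    });
    // Only mock_good is touched; the broken plugin is never initialized.
    System.out.println(root.getSubSchema("mock_good"));
    System.out.println(root.initialized);
  }
}
```

With this shape, a faulty plugin can only slow down or fail the queries that actually reference it.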





[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257239#comment-16257239
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151713558
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestSchema.java ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl;
+
+import org.apache.drill.test.ClientFixture;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterMockStorageFixture;
+import org.apache.drill.test.DrillTest;
+
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import static org.junit.Assert.assertTrue;
+
+public class TestSchema extends DrillTest {
+
+  private static ClusterMockStorageFixture cluster;
+  private static ClientFixture client;
+
+  @BeforeClass
+  public static void setup() throws Exception {
+cluster = ClusterFixture.builder().buildCustomMockStorage();
+boolean breakRegisterSchema = true;
+cluster.insertMockStorage("mock_broken", breakRegisterSchema);
+cluster.insertMockStorage("mock_good", !breakRegisterSchema);
+client = cluster.clientFixture();
+  }
+
+  @Test
+  public void testQueryBrokenStorage() {
+String sql = "SELECT id_i, name_s10 FROM `mock_broken`.`employees_5`";
+try {
+  client.queryBuilder().sql(sql).printCsv();
+} catch (Exception ex) {
+  assertTrue(ex.getMessage().contains("VALIDATION ERROR: Schema"));
--- End diff --

This test can give a false positive when no exception is thrown at all. 
Please re-throw the exception after the check and add `@Test(expected = 
Exception.class)`.
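
The false positive can be shown without JUnit. In the sketch below (all names are hypothetical; `Action` stands in for the query call), `weakCheck` mirrors the try/catch shape of the test under review and reports success when nothing is thrown, while `strictCheck` treats a missing exception as a failure, which is the effect the suggested re-throw plus `expected` attribute would have:

```java
// Hypothetical illustration only; not Drill test code.
final class ExpectedFailureSketch {

  // Stand-in for the call that is expected to fail.
  interface Action {
    void run() throws Exception;
  }

  private static boolean messageContains(Exception ex, String expected) {
    return ex.getMessage() != null && ex.getMessage().contains(expected);
  }

  // Same shape as the reviewed test: the assertion lives only in the
  // catch block, so the check passes when no exception is thrown.
  static boolean weakCheck(Action action, String expected) {
    try {
      action.run();
    } catch (Exception ex) {
      return messageContains(ex, expected);
    }
    return true;
  }

  // Stricter variant: the absence of an exception counts as a failure.
  static boolean strictCheck(Action action, String expected) {
    try {
      action.run();
    } catch (Exception ex) {
      return messageContains(ex, expected);
    }
    return false;
  }

  public static void main(String[] args) {
    Action succeeds = () -> { };
    System.out.println(weakCheck(succeeds, "VALIDATION ERROR: Schema"));
    System.out.println(strictCheck(succeeds, "VALIDATION ERROR: Schema"));
  }
}
```

For an action that completes normally, the weak check reports `true` and the strict check reports `false`.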




[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257236#comment-16257236
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151713734
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MockBreakageStorage.java ---
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mock;
+
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.store.SchemaConfig;
+
+import java.io.IOException;
+
+public class MockBreakageStorage extends MockStorageEngine {
+
+  boolean breakRegister;
--- End diff --

This field should be `private`.




[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257234#comment-16257234
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151713266
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestSchema.java ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl;
+
+import org.apache.drill.test.ClientFixture;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterMockStorageFixture;
+import org.apache.drill.test.DrillTest;
+
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import static org.junit.Assert.assertTrue;
+
+public class TestSchema extends DrillTest {
+
+  private static ClusterMockStorageFixture cluster;
+  private static ClientFixture client;
+
+  @BeforeClass
+  public static void setup() throws Exception {
+cluster = ClusterFixture.builder().buildCustomMockStorage();
+boolean breakRegisterSchema = true;
+cluster.insertMockStorage("mock_broken", breakRegisterSchema);
+cluster.insertMockStorage("mock_good", !breakRegisterSchema);
+client = cluster.clientFixture();
+  }
+
+  @Test
+  public void testQueryBrokenStorage() {
+String sql = "SELECT id_i, name_s10 FROM `mock_broken`.`employees_5`";
+try {
+  client.queryBuilder().sql(sql).printCsv();
+} catch (Exception ex) {
+  assertTrue(ex.getMessage().contains("VALIDATION ERROR: Schema"));
+}
+  }
+
+  @Test
+  public void testQueryGoodStorage() {
+String sql = "SELECT id_i, name_s10 FROM `mock_good`.`employees_5`";
+client.queryBuilder().sql(sql).printCsv();
+  }
+
+  @Test
+  public void testQueryGoodStorageWithDefaultSchema() throws Exception {
+String use_dfs = "use dfs.tmp";
+client.queryBuilder().sql(use_dfs).run();
+String sql = "SELECT id_i, name_s10 FROM `mock_good`.`employees_5`";
+client.queryBuilder().sql(sql).printCsv();
--- End diff --

Do we actually want to print csv here? I suggest we produce no output here.




[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257242#comment-16257242
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151713624
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestSchema.java ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl;
+
+import org.apache.drill.test.ClientFixture;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterMockStorageFixture;
+import org.apache.drill.test.DrillTest;
+
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import static org.junit.Assert.assertTrue;
+
+public class TestSchema extends DrillTest {
+
+  private static ClusterMockStorageFixture cluster;
+  private static ClientFixture client;
+
+  @BeforeClass
+  public static void setup() throws Exception {
+cluster = ClusterFixture.builder().buildCustomMockStorage();
+boolean breakRegisterSchema = true;
+cluster.insertMockStorage("mock_broken", breakRegisterSchema);
+cluster.insertMockStorage("mock_good", !breakRegisterSchema);
+client = cluster.clientFixture();
+  }
+
+  @Test
+  public void testQueryBrokenStorage() {
+String sql = "SELECT id_i, name_s10 FROM `mock_broken`.`employees_5`";
+try {
+  client.queryBuilder().sql(sql).printCsv();
+} catch (Exception ex) {
+  assertTrue(ex.getMessage().contains("VALIDATION ERROR: Schema"));
+}
+  }
+
+  @Test
+  public void testQueryGoodStorage() {
+String sql = "SELECT id_i, name_s10 FROM `mock_good`.`employees_5`";
+client.queryBuilder().sql(sql).printCsv();
+  }
+
+  @Test
+  public void testQueryGoodStorageWithDefaultSchema() throws Exception {
+String use_dfs = "use dfs.tmp";
+client.queryBuilder().sql(use_dfs).run();
+String sql = "SELECT id_i, name_s10 FROM `mock_good`.`employees_5`";
+client.queryBuilder().sql(sql).printCsv();
+  }
+
+  @Test
+  public void testUseBrokenStorage() throws Exception {
+try {
+  String use_dfs = "use mock_broken";
+  client.queryBuilder().sql(use_dfs).run();
+} catch(Exception ex) {
+  assertTrue(ex.getMessage().contains("VALIDATION ERROR: Schema"));
+}
+  }
+
+  @AfterClass
--- End diff --

Can be removed.




[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257235#comment-16257235
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151714217
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
@@ -150,14 +152,30 @@ public WorkspaceSchemaFactory(
* @return True if the user has access. False otherwise.
*/
   public boolean accessible(final String userName) throws IOException {
-final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
+final DrillFileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
+return accessible(fs);
+  }
+
+  /**
+   * Checks whether a FileSystem object has the permission to list/read workspace directory
+   * @param fs a DrillFileSystem object that was created with certain user privilege
+   * @return True if the user has access. False otherwise.
+   * @throws IOException
+   */
+  public boolean accessible(DrillFileSystem fs) throws IOException {
 try {
-  // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style
-  // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem
-  // has a limited private API (FileSystem.access) to check the permissions directly
-  // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of
-  // FileClient. TODO: Update this when DRILL-3749 is fixed.
-  fs.listStatus(wsPath);
+  /**
+   * For Windows local file system, fs.access ends up using DeprecatedRawLocalFileStatus which has
+   * TrustedInstaller as owner, and a member of Administrators group could not satisfy the permission.
+   * In this case, we will still use method listStatus.
+   * In other cases, we use access method since it is cheaper.
+   */
+  if (SystemUtils.IS_OS_WINDOWS && fs.getUri().getScheme().equalsIgnoreCase(FileSystemSchemaFactory.LOCAL_FS_SCHEME)) {
--- End diff --

Just in case, did you check that everything works on Windows?




[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257241#comment-16257241
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151717700
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+/**
+ * This class is to allow us loading schemas from storage plugins later when {@link #getSubSchema(String, boolean)}
+ * is called.
+ */
+public class DynamicRootSchema extends DynamicSchema
+implements CalciteRootSchema {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class);
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
+super(null, new RootSchema(), "");
+this.schemaConfig = schemaConfig;
+this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
+CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+if (retSchema != null) {
+  return retSchema;
+}
+
+loadSchemaFactory(schemaName, caseSensitive);
+retSchema = getSubSchemaMap().get(schemaName);
+return retSchema;
+  }
+
+  @Override
+  public NavigableSet getTableNames() {
+Set pluginNames = Sets.newHashSet();
+for (Map.Entry storageEntry : getSchemaFactories()) {
+  pluginNames.add(storageEntry.getKey());
+}
+return Compatible.INSTANCE.navigableSet(
+ImmutableSortedSet.copyOf(
+Sets.union(pluginNames, getSubSchemaMap().keySet())));
+  }
+
+  /**
+   * load schema factory(storage plugin) for schemaName
+   * @param schemaName
+   * @param caseSensitive
+   */
+  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
+try {
+  SchemaPlus thisPlus = this.plus();
+  StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
+  if (plugin != null) {
+plugin.registerSchemas(schemaConfig, thisPlus);
+return;
+  }
+
+  // we could not find the plugin, the schemaName could be `dfs.tmp`, a 2nd level schema under 'dfs'
+  String[] paths = schemaName.split("\\.");
+  if (paths.length == 2) {
+plugin = getSchemaFactories().getPlugin(paths[0]);
+if (plugin == null) {
+  return;
+}
+
+// we could find storage plugin for first part(e.g. 'dfs') of schemaName (e.g. 'dfs.tmp')
+// register schema for this 

[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257237#comment-16257237
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151714418
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java
 ---
@@ -50,11 +53,23 @@
 
   public static final String DEFAULT_WS_NAME = "default";
 
+  public static final String LOCAL_FS_SCHEME = "file";
+
   private List factories;
   private String schemaName;
+  protected FileSystemPlugin plugin;
 
   public FileSystemSchemaFactory(String schemaName, List factories) {
-super();
+// when the correspondent FileSystemPlugin is not passed in, we dig into ANY workspace factory to get it.
+if (factories.size() > 0 ) {
--- End diff --

Please remove space `if (factories.size() > 0) {`.


> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, an attempt is made to initialize each enabled storage 
> plugin while building the schema tree. This is done regardless of the actual 
> plugins involved in the query. 
> Sometimes, when one or more of the enabled storage plugins have issues, 
> either due to misconfiguration or because the underlying datasource is slow or 
> down, the overall query time increases drastically, most likely 
> due to the attempt to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server and the 
> underlying SQL Server db goes down, any Drill query that starts 
> executing from that point on slows down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 
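
The idea in the description above can be sketched as follows. This is a minimal illustration, not Drill's implementation: `LazySchemaRegistry`, `register`, and `initCount` are hypothetical names, and a schema is modeled as a plain string. Each plugin registers a cheap factory, and the factory runs only when a query actually asks for that schema, so a faulty plugin that no query references is never initialized.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration; none of these names are Drill classes.
final class LazySchemaRegistry {
  // Registering a factory is cheap; invoking it may be slow (e.g. a remote datasource).
  private final Map<String, Supplier<String>> factories = new HashMap<>();
  // Schemas that have already been initialized.
  private final Map<String, String> loaded = new HashMap<>();
  // Number of factories that have actually been invoked.
  private int initCount = 0;

  void register(String name, Supplier<String> factory) {
    factories.put(name, factory);
  }

  // Returns the schema for the given name, initializing it on first use.
  // Returns null when no factory is registered under that name.
  String getSubSchema(String name) {
    String schema = loaded.get(name);
    if (schema != null) {
      return schema;
    }
    Supplier<String> factory = factories.get(name);
    if (factory == null) {
      return null;
    }
    schema = factory.get();
    initCount++;
    loaded.put(name, schema);
    return schema;
  }

  int initCount() {
    return initCount;
  }

  public static void main(String[] args) {
    LazySchemaRegistry registry = new LazySchemaRegistry();
    registry.register("dfs", () -> "dfs-schema");
    // Stands in for a plugin whose datasource is down; it is never invoked below.
    registry.register("jdbc", () -> {
      throw new IllegalStateException("datasource is down");
    });

    System.out.println(registry.getSubSchema("dfs")); // prints "dfs-schema"
    System.out.println(registry.initCount());         // prints "1"
  }
}
```

In this sketch, a query that touches only `dfs` never invokes the `jdbc` factory, so the failing datasource cannot slow it down.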



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257243#comment-16257243
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151732407
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java
 ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+/**
+ * This class is to allow us loading schemas from storage plugins later when {@link #getSubSchema(String, boolean)}
+ * is called.
+ */
+public class DynamicRootSchema extends DynamicSchema
+implements CalciteRootSchema {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class);
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
+super(null, new RootSchema(), "");
+this.schemaConfig = schemaConfig;
+this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
+CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+if (retSchema != null) {
+  return retSchema;
+}
+
+loadSchemaFactory(schemaName, caseSensitive);
+retSchema = getSubSchemaMap().get(schemaName);
+return retSchema;
+  }
+
+  @Override
+  public NavigableSet getTableNames() {
+Set pluginNames = Sets.newHashSet();
+for (Map.Entry storageEntry : getSchemaFactories()) {
+  pluginNames.add(storageEntry.getKey());
+}
+return Compatible.INSTANCE.navigableSet(
+ImmutableSortedSet.copyOf(
+Sets.union(pluginNames, getSubSchemaMap().keySet())));
+  }
+
+  /**
+   * load schema factory(storage plugin) for schemaName
+   * @param schemaName
+   * @param caseSensitive
+   */
+  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
+try {
+  SchemaPlus thisPlus = this.plus();
+  StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
+  if (plugin != null) {
+plugin.registerSchemas(schemaConfig, thisPlus);
+return;
+  }
+
+  // we could not find the plugin, the schemaName could be `dfs.tmp`, a 2nd level schema under 'dfs'
+  String[] paths = schemaName.split("\\.");
+  if (paths.length == 2) {
+plugin = getSchemaFactories().getPlugin(paths[0]);
+if (plugin == null) {
+  return;
+}
+
+// we could find storage plugin for first part(e.g. 'dfs') of schemaName (e.g. 'dfs.tmp')
+// register schema for this 

[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257238#comment-16257238
 ] 

ASF GitHub Bot commented on DRILL-5089:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151716880
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java
 ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+/**
+ * This class is to allow us loading schemas from storage plugins later when {@link #getSubSchema(String, boolean)}
+ * is called.
+ */
+public class DynamicRootSchema extends DynamicSchema
+implements CalciteRootSchema {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class);
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
+super(null, new RootSchema(), "");
+this.schemaConfig = schemaConfig;
+this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
+CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+if (retSchema != null) {
+  return retSchema;
+}
+
+loadSchemaFactory(schemaName, caseSensitive);
+retSchema = getSubSchemaMap().get(schemaName);
+return retSchema;
+  }
+
+  @Override
+  public NavigableSet getTableNames() {
+Set pluginNames = Sets.newHashSet();
+for (Map.Entry storageEntry : getSchemaFactories()) {
+  pluginNames.add(storageEntry.getKey());
+}
+return Compatible.INSTANCE.navigableSet(
+ImmutableSortedSet.copyOf(
+Sets.union(pluginNames, getSubSchemaMap().keySet())));
--- End diff --

Could you please explain what this method actually returns? Judging by 
its name, it should return table names, but it seems to return something 
different...
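
To make the question concrete: as far as the quoted diff shows, the method builds a sorted union of the registered plugin names and the keys of the sub-schema map, which look like schema names rather than table names. The sketch below reproduces that shape with plain JDK collections instead of Guava; `SchemaNameUnion` and `unionSorted` are hypothetical names, and the sample values are made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of the union performed in the quoted method.
final class SchemaNameUnion {
  // Returns a sorted set containing every plugin name and every
  // already-loaded sub-schema name, without duplicates.
  static NavigableSet<String> unionSorted(Set<String> pluginNames,
                                          Set<String> loadedSubSchemaNames) {
    NavigableSet<String> result = new TreeSet<>(pluginNames);
    result.addAll(loadedSubSchemaNames);
    return result;
  }

  public static void main(String[] args) {
    // Assumed sample data: two registered plugins, one of which has been loaded.
    Set<String> plugins = new HashSet<>(Arrays.asList("dfs", "cp"));
    Set<String> loaded = new HashSet<>(Arrays.asList("dfs", "dfs.tmp"));
    System.out.println(unionSorted(plugins, loaded)); // prints "[cp, dfs, dfs.tmp]"
  }
}
```

Every element of the result is a plugin or schema name, which is why a name like `getTableNames` reads as misleading here.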



[jira] [Commented] (DRILL-5917) Ban org.json:json library in Drill

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257175#comment-16257175
 ] 

ASF GitHub Bot commented on DRILL-5917:
---

Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1031
  
I have already tested the mapr and default profiles locally. Once the tests pass on your 
cluster, let me know and I'll squash the commits.


> Ban org.json:json library in Drill
> --
>
> Key: DRILL-5917
> URL: https://issues.apache.org/jira/browse/DRILL-5917
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.11.0
>Reporter: Arina Ielchiieva
>Assignee: Vlad Rozov
> Fix For: 1.12.0
>
>
> Apache Drill has dependencies on json.org lib indirectly from two libraries:
> com.mapr.hadoop:maprfs:jar:5.2.1-mapr
> com.mapr.fs:mapr-hbase:jar:5.2.1-mapr
> {noformat}
> [INFO] org.apache.drill.contrib:drill-format-mapr:jar:1.12.0-SNAPSHOT
> [INFO] +- com.mapr.hadoop:maprfs:jar:5.2.1-mapr:compile
> [INFO] |  \- org.json:json:jar:20080701:compile
> [INFO] \- com.mapr.fs:mapr-hbase:jar:5.2.1-mapr:compile
> [INFO]\- (org.json:json:jar:20080701:compile - omitted for duplicate)
> {noformat}
> Need to make sure we won't have any dependencies from these libs to 
> org.json:json lib and ban this lib in main pom.xml file.
> Issue is critical since Apache release won't happen until we make sure 
> org.json:json lib is not used (https://www.apache.org/legal/resolved.html).





[jira] [Commented] (DRILL-5917) Ban org.json:json library in Drill

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257128#comment-16257128
 ] 

ASF GitHub Bot commented on DRILL-5917:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1031
  
The test cluster is down for the weekend. You can test it locally; just run 
{{mvn clean install -Pmapr}}.




[jira] [Commented] (DRILL-5917) Ban org.json:json library in Drill

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257123#comment-16257123
 ] 

ASF GitHub Bot commented on DRILL-5917:
---

Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1031
  
@arina-ielchiieva Please test




[jira] [Commented] (DRILL-5960) Add function STAsGeoJSON to extend GIS support

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257119#comment-16257119
 ] 

ASF GitHub Bot commented on DRILL-5960:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1034#discussion_r151713040
  
--- Diff: 
contrib/gis/src/main/java/org/apache/drill/exec/expr/fn/impl/gis/STAsGeoJson.java
 ---
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Wrapper for ESRI ST_AsGeoJson function to convert geometry to valid geojson
+ */
+package org.apache.drill.exec.expr.fn.impl.gis;
+
+import javax.inject.Inject;
+
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.holders.VarBinaryHolder;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+
+import io.netty.buffer.DrillBuf;
+
+@FunctionTemplate(name = "st_as_geo_json", scope = FunctionTemplate.FunctionScope.SIMPLE,
--- End diff --

@ChrisSandison since you have changed the function names here but not 
in the unit tests below, the tests will fail. Please note that functions are cached when 
Drill is built, so to make sure your changes took effect you need to build Drill 
first and then run the unit tests.

I believe I saw a comment suggesting that we keep the function name 
consistent with previously created functions (like `st_astext`), which 
makes sense. I looked in Calcite, and its geographical functions 
follow the same naming convention, so I guess we should revert to the 
previous name. 


> Add function STAsGeoJSON to extend GIS support
> --
>
> Key: DRILL-5960
> URL: https://issues.apache.org/jira/browse/DRILL-5960
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Chris Sandison
>Assignee: Chris Sandison
>Priority: Minor
>  Labels: features, newbie
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> Add function as wrapper to ESRI's `asGeoJson` functionality. 
> Implementation is very similar to STAsText





[jira] [Assigned] (DRILL-5963) Canceling a query hung in planning state, leaves the query in ENQUEUED state for ever.

2017-11-17 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-5963:
---

Assignee: Arina Ielchiieva

> Canceling a query hung in planning state, leaves the query in ENQUEUED state 
> for ever.
> --
>
> Key: DRILL-5963
> URL: https://issues.apache.org/jira/browse/DRILL-5963
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.12.0
> Environment: Drill 1.12.0-SNAPSHOT, commit: 
> 4a718a0bd728ae02b502ac93620d132f0f6e1b6c
>Reporter: Khurram Faraaz
>Assignee: Arina Ielchiieva
>Priority: Critical
> Attachments: enqueued-2.png
>
>
> Canceling the query below, which is hung in the planning state, leaves the query in the 
> ENQUEUED state forever.
> Here is the query that is hung in the planning state:
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select 1 || ',' || 2 || ',' || 3 || ',' || 4 || 
> ',' || 5 || ',' || 6 || ',' || 7 || ',' || 8 || ',' || 9 || ',' || 0 || ',' 
> AS CSV_DATA from (values(1));
> +--+
> |  |
> +--+
> +--+
> No rows selected (304.291 seconds)
> {noformat}
> Explain plan for that query also just hangs.
> {noformat}
> explain plan for select 1 || ',' || 2 || ',' || 3 || ',' || 4 || ',' || 5 || 
> ',' || 6 || ',' || 7 || ',' || 8 || ',' || 9 || ',' || 0 || ',' AS CSV_DATA 
> from (values(1));
> ...
> {noformat}


