[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-03-12 Thread Boaz Ben-Zvi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395718#comment-16395718
 ] 

Boaz Ben-Zvi commented on DRILL-4120:
-

Merged with commit ID - 4652b0ba4f9a0708227e2b83a7097ff0517df33e

 

> dir0 does not work when the directory structure contains Avro files
> ---
>
> Key: DRILL-4120
> URL: https://issues.apache.org/jira/browse/DRILL-4120
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization, Storage - Avro
> Affects Versions: 1.3.0
> Reporter: Stefán Baxter
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.13.0
>
> Attachments: 0001-DRILL-4120-Support-reading-directories-having-avro-f.patch
>
>
> Any select statement containing dirN fails if the target directory structure 
> contains Avro files.
> Steps to test:
> 1. Create a simple directory structure.
> 2. Copy an Avro file into each directory.
> 3. Execute a query containing dir0.
> Outcome:
> Error: VALIDATION ERROR: From line 1, column 117 to line 1, column 120: 
> Column 'dir0' not found in any table



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390643#comment-16390643
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1138




[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389352#comment-16389352
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1138
  
+1. LGTM.




[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388631#comment-16388631
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1138
  
@vvysotskyi, thanks for addressing the schema issues! 




[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388384#comment-16388384
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1138#discussion_r172618262
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/ColumnExplorer.java ---
@@ -78,6 +79,23 @@ public ColumnExplorer(OptionManager optionManager, 
List columns) {
 return map;
   }
 
+  /**
+   * Returns list with implicit column names taken from specified {@link 
SchemaConfig}.
+   *
+   * @param schemaConfig the source of session options values.
+   * @return list with implicit column names.
+   */
+  public static List getImplicitColumns(SchemaConfig schemaConfig) 
{
--- End diff --

Thanks, renamed.




[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388383#comment-16388383
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1138#discussion_r172620291
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroDrillTable.java
 ---
@@ -58,16 +65,31 @@ public AvroDrillTable(String storageEngineName,
 
   @Override
   public RelDataType getRowType(RelDataTypeFactory typeFactory) {
-List typeList = Lists.newArrayList();
-List fieldNameList = Lists.newArrayList();
+// ExtendableRelDataTypeHolder is reused to preserve previously added 
implicit columns
+if (holder == null) {
+  List typeList = Lists.newArrayList();
+  List fieldNameList = Lists.newArrayList();
 
-Schema schema = reader.getSchema();
-for (Field field : schema.getFields()) {
-  fieldNameList.add(field.name());
-  typeList.add(getNullableRelDataTypeFromAvroType(typeFactory, 
field.schema()));
+  // adds partition columns to RowType
+  List partitions = 
ColumnExplorer.getPartitions(((FormatSelection) getSelection()).getSelection(), 
schemaConfig);
--- End diff --

1. Yes, it is safe, since we are using the same `FormatSelection` instance 
as we passed to the parent constructor.
2. Thanks, done.
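
The holder reuse discussed in this diff can be pictured with a minimal sketch. It assumes a simplified model in which columns are plain strings; `RowTypeHolderSketch` and `addImplicitColumn` are hypothetical names for illustration and are not the actual `AvroDrillTable` or `ExtendableRelDataTypeHolder` code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of a table that builds its row type once
// and then reuses it, so that columns added later are not lost.
final class RowTypeHolderSketch {
  private final List<String> partitionColumns;
  private final List<String> fileColumns;
  private List<String> holder; // built lazily on first use, then reused

  RowTypeHolderSketch(List<String> partitionColumns, List<String> fileColumns) {
    this.partitionColumns = partitionColumns;
    this.fileColumns = fileColumns;
  }

  // Builds the field list on the first call only: partition columns first,
  // then the columns read from the file schema.
  private List<String> fields() {
    if (holder == null) {
      holder = new ArrayList<>();
      holder.addAll(partitionColumns);
      holder.addAll(fileColumns);
    }
    return holder;
  }

  // Returns a read-only view of the current row type.
  List<String> getRowType() {
    return Collections.unmodifiableList(fields());
  }

  // Appends an implicit column; it survives later getRowType() calls
  // because the holder is not rebuilt.
  void addImplicitColumn(String name) {
    if (!fields().contains(name)) {
      fields().add(name);
    }
  }

  public static void main(String[] args) {
    List<String> partitions = new ArrayList<>();
    partitions.add("dir0");
    List<String> columns = new ArrayList<>();
    columns.add("id");
    columns.add("name");

    RowTypeHolderSketch table = new RowTypeHolderSketch(partitions, columns);
    table.addImplicitColumn("filename");
    System.out.println(table.getRowType());
  }
}
```

If the field list were rebuilt on every call, the `filename` column added between two calls would disappear, which is the situation the `holder == null` check in the diff appears intended to avoid.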




[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388385#comment-16388385
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1138#discussion_r172630422
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SqlConverter.java
 ---
@@ -277,6 +279,25 @@ public String deriveAlias(
   return SqlValidatorUtil.getAlias(node, ordinal);
 }
 
+/**
+ * Checks that specified expression is not implicit column and
+ * adds is to a select list, ensuring that its alias does not
+ * clash with any existing expressions on the list.
+ */
+@Override
+protected void addToSelectList(
--- End diff --

Added a comment to the Javadoc that describes when and why this method is 
used.

As for Avro and dynamic tables, the key point is the result of the 
`RelDataType.isDynamicStruct()` method. 
For `AvroDrillTable` we use `ExtendableRelDataType`, whose 
`isDynamicStruct()` method returns `false`, but for `DynamicDrillTable` we use 
`RelDataTypeDrillImpl`, whose `isDynamicStruct()` method returns `true`. 
This is how Calcite's `SqlValidatorImpl` determines whether columns should 
be added.
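
A rough sketch of the distinction described above, under stated assumptions: `RowType` and `expandStar` here are hypothetical stand-ins rather than Calcite's `RelDataType` or `SqlValidatorImpl`, and the use of `**` for the dynamic star is an assumption of this model. A dynamic struct keeps a single star, while a fixed struct is expanded field by field, skipping implicit columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class StarExpansionSketch {

  // Hypothetical, simplified stand-in for a row type; not Calcite's RelDataType.
  static final class RowType {
    final boolean dynamicStruct;
    final List<String> fieldNames;

    RowType(boolean dynamicStruct, List<String> fieldNames) {
      this.dynamicStruct = dynamicStruct;
      this.fieldNames = fieldNames;
    }
  }

  // Expands "SELECT *" for the given row type.
  // Dynamic struct: fields are unknown at plan time, so keep one dynamic star.
  // Fixed struct: list every known field except the implicit columns.
  static List<String> expandStar(RowType rowType, Set<String> implicitColumns) {
    List<String> selectList = new ArrayList<>();
    if (rowType.dynamicStruct) {
      selectList.add("**");
      return selectList;
    }
    for (String field : rowType.fieldNames) {
      if (!implicitColumns.contains(field)) {
        selectList.add(field);
      }
    }
    return selectList;
  }

  public static void main(String[] args) {
    Set<String> implicit = new HashSet<>(Arrays.asList("filename", "suffix"));
    RowType avroLike = new RowType(false, Arrays.asList("dir0", "id", "name", "filename"));
    RowType dynamic = new RowType(true, new ArrayList<String>());

    System.out.println(expandStar(avroLike, implicit));
    System.out.println(expandStar(dynamic, implicit));
  }
}
```

In this model the partition column `dir0` stays in the expanded list while `filename` is filtered out, matching the review remark that partition columns are always present in star queries.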




[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388382#comment-16388382
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1138#discussion_r172618093
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/types/ExtendableRelDataTypeHolder.java
 ---
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.types;
+
+import org.apache.calcite.rel.type.RelDataTypeFactory;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rel.type.RelDataTypeFieldImpl;
+import org.apache.calcite.sql.type.SqlTypeName;
+
+import java.util.List;
+
+/**
+ * Holder for list of RelDataTypeField which may be expanded by partition 
or implicit columns.
--- End diff --

You are right, partition columns are added before table columns in 
`AvroDrillTable.getRowType()`. Thanks for pointing this out; I modified the 
comment.




[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388255#comment-16388255
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1138#discussion_r172606895
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/types/ExtendableRelDataTypeHolder.java
 ---
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.types;
+
+import org.apache.calcite.rel.type.RelDataTypeFactory;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rel.type.RelDataTypeFieldImpl;
+import org.apache.calcite.sql.type.SqlTypeName;
+
+import java.util.List;
+
+/**
+ * Holder for list of RelDataTypeField which may be expanded by partition 
or implicit columns.
--- End diff --

When is this holder extended with partition columns? I can see only 
implicit ones.




[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388257#comment-16388257
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1138#discussion_r172599233
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/ColumnExplorer.java ---
@@ -78,6 +79,23 @@ public ColumnExplorer(OptionManager optionManager, 
List columns) {
 return map;
   }
 
+  /**
+   * Returns list with implicit column names taken from specified {@link 
SchemaConfig}.
+   *
+   * @param schemaConfig the source of session options values.
+   * @return list with implicit column names.
+   */
+  public static List getImplicitColumns(SchemaConfig schemaConfig) 
{
--- End diff --

Please rename
1. `getImplicitColumns` -> `getImplicitColumnsNames`
2. `getPartitions` -> `getPartitionColumnNames`




[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388258#comment-16388258
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1138#discussion_r172600066
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroDrillTable.java
 ---
@@ -58,16 +65,31 @@ public AvroDrillTable(String storageEngineName,
 
   @Override
   public RelDataType getRowType(RelDataTypeFactory typeFactory) {
-List typeList = Lists.newArrayList();
-List fieldNameList = Lists.newArrayList();
+// ExtendableRelDataTypeHolder is reused to preserve previously added 
implicit columns
+if (holder == null) {
+  List typeList = Lists.newArrayList();
+  List fieldNameList = Lists.newArrayList();
 
-Schema schema = reader.getSchema();
-for (Field field : schema.getFields()) {
-  fieldNameList.add(field.name());
-  typeList.add(getNullableRelDataTypeFromAvroType(typeFactory, 
field.schema()));
+  // adds partition columns to RowType
+  List partitions = 
ColumnExplorer.getPartitions(((FormatSelection) getSelection()).getSelection(), 
schemaConfig);
--- End diff --

1. Is it safe to cast to `FormatSelection` without checking?
2. Please update the comment - `adds partition columns to RowType` -> `adds 
partition columns to RowType since they always present in star queries`.




[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388256#comment-16388256
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1138#discussion_r172606583
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SqlConverter.java
 ---
@@ -277,6 +279,25 @@ public String deriveAlias(
   return SqlValidatorUtil.getAlias(node, ordinal);
 }
 
+/**
+ * Checks that specified expression is not implicit column and
+ * adds is to a select list, ensuring that its alias does not
+ * clash with any existing expressions on the list.
+ */
+@Override
+protected void addToSelectList(
--- End diff --

Could you please add a comment explaining when this method is used and how 
it knows when to add columns for an Avro table but not for a dynamic one?




[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387714#comment-16387714
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user vvysotskyi commented on the issue:

https://github.com/apache/drill/pull/1138
  
@arina-ielchiieva, @paul-rogers, I have reworked this pull request to use 
`AvroDrillTable`, as it was before my changes. Also, with `AvroDrillTable` 
there is no need to make changes in `AvroRecordReader`, so I reverted them.
Could you please take a look again?




[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382494#comment-16382494
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user vvysotskyi commented on the issue:

https://github.com/apache/drill/pull/1138
  
@paul-rogers, the schema is taken from the first file in the `FormatSelection`. 
Therefore, when a table has several files with different schemas, the Drill 
query will fail.

As for the plan-time type information: besides the validation at the stage 
when a query is converted into rel nodes, the field list may be used in project 
rel nodes instead of the dynamic star for `DynamicDrillTable`.
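
The first-file behaviour described above can be modelled with a small sketch. It assumes schemas are just lists of column names; `FirstFileSchemaSketch`, `planTimeSchema`, and `firstMismatch` are hypothetical helpers for illustration and are not Drill APIs.

```java
import java.util.Arrays;
import java.util.List;

final class FirstFileSchemaSketch {

  // Simplified model: the plan-time schema is whatever the first file declares.
  static List<String> planTimeSchema(List<List<String>> fileSchemas) {
    return fileSchemas.get(0);
  }

  // Returns the index of the first file whose schema differs from the
  // plan-time schema, or -1 when all files agree.
  static int firstMismatch(List<List<String>> fileSchemas) {
    List<String> expected = planTimeSchema(fileSchemas);
    for (int i = 1; i < fileSchemas.size(); i++) {
      if (!fileSchemas.get(i).equals(expected)) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    List<List<String>> files = Arrays.asList(
        Arrays.asList("id", "name"),
        Arrays.asList("id", "name"),
        Arrays.asList("id", "email"));

    System.out.println("plan-time schema: " + planTimeSchema(files));
    System.out.println("first mismatching file index: " + firstMismatch(files));
  }
}
```

A non-negative result corresponds to the case mentioned in the comment, where files in one table carry different schemas and the query is expected to fail.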




[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382316#comment-16382316
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1138
  
Another thought. The removed code runs at plan time. Did the original code 
have to open each file to retrieve the schema? If so, does removing the code 
remove that load? If so, then this change could be a huge performance 
improvement if it avoids the need to open every file in the Foreman.

Then, the next question is: do we actually do anything with the 
plan-time type information? Few files have that information. Given that, does 
the planner actually use the information? Is this something we get for free 
from Calcite? If we are not using the type information at plan time, then 
clearly there is no harm in removing the code that retrieves the type 
information.




[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382240#comment-16382240
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1138
  
As @arina-ielchiieva points out, this change backs out plan-time knowledge 
of schema. This may not affect run-time accuracy. However, it does mean that 
queries can be planned without knowing types and then fail at runtime when the 
types are learned. This seems more like a bug than a feature. In general, we 
should use all information available. It is not helpful to ignore information 
if doing so results in a poorer user experience.




[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382224#comment-16382224
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1138#discussion_r171606330
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroRecordReader.java
 ---
@@ -154,6 +156,12 @@ public int next() {
 
   writer.setValueCount(recordCount);
 
+  // adds fields which don't exist in the table but should be present 
in the schema
+  if (recordCount > 0) {
+JsonReaderUtils.ensureAtLeastOneField(writer, getColumns(), false,
--- End diff --

In general, this is a bad idea, though existing code does this. If we find 
an empty file in one scanner, but a real file in another, we create an 
unnecessary schema change by making up a column.

Jinfeng's changes last year are supposed to handle the "fast none" case of 
a reader with no rows. There should be no reason to add a dummy column. Old 
code that adds such a column should be fixed. IMHO, code that does not add 
dummy columns should not begin to do so.
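
To make the concern concrete, here is a minimal Java sketch. It is not Drill code: `SchemaChangeSketch`, its methods, and the string type names are hypothetical. It only models the idea that a downstream operator sees a schema change when a reader of an empty file invents a column whose type differs from what another reader finds in a real file.

```java
import java.util.Collections;
import java.util.Map;

final class SchemaChangeSketch {
  /** Hypothetical: schema reported by a reader that found no rows but made up a nullable INT column. */
  static Map<String, String> dummyColumnSchema(String column) {
    return Collections.singletonMap(column, "NULLABLE INT");
  }

  /** In this toy model, two batches whose schemas disagree count as a schema change. */
  static boolean isSchemaChange(Map<String, String> first, Map<String, String> second) {
    return !first.equals(second);
  }

  public static void main(String[] args) {
    Map<String, String> realFile = Collections.singletonMap("name", "VARCHAR");
    Map<String, String> emptyFile = dummyColumnSchema("name");
    System.out.println(isSchemaChange(realFile, emptyFile));
  }
}
```

If the empty-file reader reported no columns at all, there would be nothing to compare and no artificial change, which is the behaviour the comment argues for.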




[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382225#comment-16382225
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1138#discussion_r171607241
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroRecordReader.java
 ---
@@ -295,7 +301,8 @@ private void processPrimitive(final Object value, final Schema.Type type, final
 writer.binary(fieldName).writeVarBinary(0, length, buffer);
 break;
   case NULL:
-// Nothing to do for null type
+// The default Drill behaviour is to create int column
+writer.integer(fieldName);
--- End diff --

This maps a NULL type to integer. Probably OK if we do this consistently.




[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381924#comment-16381924
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1138#discussion_r171544478
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroFormatTest.java
 ---
@@ -170,25 +169,35 @@ public void testSimplePrimitiveSchema_SelectColumnSubset() throws Exception {
 
   @Test
   public void testSimplePrimitiveSchema_NoColumnsExistInTheSchema() throws Exception {
-    final String file = generateSimplePrimitiveSchema_NoNullValues().getFileName();
-    try {
-      test("select h_dummy1, e_dummy2 from dfs.`%s`", file);
-      Assert.fail("Test should fail as h_dummy1 and e_dummy2 does not exist.");
-    } catch(UserException ue) {
-      Assert.assertTrue("Test should fail as h_dummy1 and e_dummy2 does not exist.",
-          ue.getMessage().contains("Column 'h_dummy1' not found in any table"));
-    }
+    final String file = generateSimplePrimitiveSchema_NoNullValues(1).getFileName();
+    testBuilder()
+      .sqlQuery("select h_dummy1, e_dummy2 from dfs.`%s`", file)
+      .unOrdered()
+      .baselineColumns("h_dummy1", "e_dummy2")
+      .baselineValues(null, null)
+      .go();
   }
 
   @Test
   public void testSimplePrimitiveSchema_OneExistAndOneDoesNotExistInTheSchema() throws Exception {
-    final String file = generateSimplePrimitiveSchema_NoNullValues().getFileName();
-    try {
-      test("select h_boolean, e_dummy2 from dfs.`%s`", file);
-      Assert.fail("Test should fail as e_dummy2 does not exist.");
-    } catch(UserException ue) {
-      Assert.assertTrue("Test should fail as e_dummy2 does not exist.", true);
-    }
+    final String file = generateSimplePrimitiveSchema_NoNullValues(1).getFileName();
+    testBuilder()
+      .sqlQuery("select h_boolean, e_dummy2 from dfs.`%s`", file)
+      .unOrdered()
+      .baselineColumns("h_boolean", "e_dummy2")
+      .baselineValues(true, null)
+      .go();
+  }
+
+  @Test
+  public void testImplicitColumnFilename() throws Exception {
+    final String file = generateSimplePrimitiveSchema_NoNullValues(1).getFileName();
+    testBuilder()
+      .sqlQuery("select filename from dfs.`%s`", file)

Thanks for pointing this out. I modified the existing test to check the 
`suffix`, `fqn` and `filepath` implicit columns in addition to `filename`, and 
added a separate test for a partition column.




[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381809#comment-16381809
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1138#discussion_r171517376
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroFormatTest.java
 ---
@@ -170,25 +169,35 @@ public void testSimplePrimitiveSchema_SelectColumnSubset() throws Exception {
 
   @Test
   public void testSimplePrimitiveSchema_NoColumnsExistInTheSchema() throws Exception {
-    final String file = generateSimplePrimitiveSchema_NoNullValues().getFileName();
-    try {
-      test("select h_dummy1, e_dummy2 from dfs.`%s`", file);
-      Assert.fail("Test should fail as h_dummy1 and e_dummy2 does not exist.");
-    } catch(UserException ue) {
-      Assert.assertTrue("Test should fail as h_dummy1 and e_dummy2 does not exist.",
-          ue.getMessage().contains("Column 'h_dummy1' not found in any table"));
-    }
+    final String file = generateSimplePrimitiveSchema_NoNullValues(1).getFileName();
+    testBuilder()
+      .sqlQuery("select h_dummy1, e_dummy2 from dfs.`%s`", file)
+      .unOrdered()
+      .baselineColumns("h_dummy1", "e_dummy2")
+      .baselineValues(null, null)
+      .go();
   }
 
   @Test
   public void testSimplePrimitiveSchema_OneExistAndOneDoesNotExistInTheSchema() throws Exception {
-    final String file = generateSimplePrimitiveSchema_NoNullValues().getFileName();
-    try {
-      test("select h_boolean, e_dummy2 from dfs.`%s`", file);
-      Assert.fail("Test should fail as e_dummy2 does not exist.");
-    } catch(UserException ue) {
-      Assert.assertTrue("Test should fail as e_dummy2 does not exist.", true);
-    }
+    final String file = generateSimplePrimitiveSchema_NoNullValues(1).getFileName();
+    testBuilder()
+      .sqlQuery("select h_boolean, e_dummy2 from dfs.`%s`", file)
+      .unOrdered()
+      .baselineColumns("h_boolean", "e_dummy2")
+      .baselineValues(true, null)
+      .go();
+  }
+
+  @Test
+  public void testImplicitColumnFilename() throws Exception {
+    final String file = generateSimplePrimitiveSchema_NoNullValues(1).getFileName();
+    testBuilder()
+      .sqlQuery("select filename from dfs.`%s`", file)

Please test all implicit columns and at least one partition column.




[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380687#comment-16380687
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1138
  
General comment: it would be better if we could move to the new scan 
framework. It handles implicit columns for all file-based readers, as well as 
projection, missing columns, etc.




[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380160#comment-16380160
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1138
  
You are basically reverting the changes made in DRILL-3810 [1] to support 
schema validation in Avro. 
The Avro format is strict and has a schema. Should Drill treat it the same way, 
or parse it more loosely?

We should evaluate the option of keeping the schema for Avro but adding 
implicit columns. Maybe the change won't be as easy as changing 
`AvroDrillTable` to `DynamicDrillTable`, but it might be more correct.

You can also start a thread on the dev / user mailing list, asking about 
treating Avro as a dynamic format (listing pros and cons), and get feedback 
from the users. 

[1] https://issues.apache.org/jira/browse/DRILL-3810




[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380114#comment-16380114
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

GitHub user vvysotskyi opened a pull request:

https://github.com/apache/drill/pull/1138

DRILL-4120: Allow implicit columns for Avro storage format

The existing implementation of `AvroDrillTable` does not allow dynamic column 
discovery. The `AvroDrillTable.getRowType()` method returns a `RelDataTypeImpl` 
instance with the list of all table columns. This forces the validator to check 
the columns in the select list against the `RowType` list, which makes it 
impossible to use implicit columns.

This fix replaces `AvroDrillTable` with `DynamicDrillTable` for the Avro 
format and also allows non-existent columns in Avro tables, to be consistent 
with other storage formats.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vvysotskyi/drill DRILL-4120

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1138.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1138


commit 402accca668481bb6816aad438c867781157fac6
Author: Volodymyr Vysotskyi 
Date:   2018-02-27T16:39:22Z

DRILL-4120: Allow implicit columns for Avro storage format






[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-02-28 Thread Volodymyr Vysotskyi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379990#comment-16379990
 ] 

Volodymyr Vysotskyi commented on DRILL-4120:


The existing implementation of {{AvroDrillTable}} does not allow dynamic column 
discovery. The {{AvroDrillTable.getRowType()}} method returns a {{RelDataTypeImpl}} 
instance with the list of all table columns. This forces the validator to check 
the columns in the select list against the {{RowType}} list, which makes it 
impossible to use implicit columns.

I think {{AvroDrillTable}} should be replaced by {{DynamicDrillTable}} for the 
Avro format, and non-existent columns should be allowed in Avro tables, to be 
consistent with other storage formats.
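
The difference between the two table kinds can be pictured with a toy model. The Java sketch below is not Drill or Calcite code; `RowTypeValidationSketch`, `StaticTable`, `DynamicTable` and `validate` are hypothetical names chosen for illustration. It assumes only what the comment above states: a table with a fixed row type makes the validator reject any column outside that list, while a dynamic table defers column resolution.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RowTypeValidationSketch {
  interface Table {
    boolean acceptsColumn(String name);
  }

  /** Fixed row type: only the columns declared in the file schema validate. */
  static final class StaticTable implements Table {
    private final Set<String> declared;

    StaticTable(String... columns) {
      declared = new HashSet<>(Arrays.asList(columns));
    }

    @Override
    public boolean acceptsColumn(String name) {
      return declared.contains(name);
    }
  }

  /** Dynamic row type: any name validates; it is resolved later, at execution time. */
  static final class DynamicTable implements Table {
    @Override
    public boolean acceptsColumn(String name) {
      return true;
    }
  }

  /** Mimics the validator: every column in the select list must be accepted by the table. */
  static void validate(Table table, List<String> selectList) {
    for (String column : selectList) {
      if (!table.acceptsColumn(column)) {
        throw new IllegalArgumentException("Column '" + column + "' not found in any table");
      }
    }
  }

  public static void main(String[] args) {
    List<String> selectList = Arrays.asList("h_boolean", "dir0");
    validate(new DynamicTable(), selectList); // passes
    try {
      validate(new StaticTable("h_boolean"), selectList);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The trade-off raised elsewhere in this thread is visible here as well: the dynamic variant also accepts misspelled or non-existent column names at validation time.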



[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-02-14 Thread Engin Sozer (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364017#comment-16364017
 ] 

Engin Sozer commented on DRILL-4120:


Similarly, we cannot get the filename in the select statement, so the following 
does not work:
{code:java}
SELECT fqn, filepath, filename, suffix FROM dfs.`/tmpd/test.avro` LIMIT 1;
{code}
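
The four columns in that query are Drill's implicit file columns. As a rough illustration only (plain Java, not Drill's implementation; `ImplicitColumns.of` is a hypothetical helper, and an absolute, slash-separated path is assumed), they correspond to parts of the file path roughly as follows:

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ImplicitColumns {
  /** Splits an absolute, slash-separated path into the four implicit column values. */
  static Map<String, String> of(String fullPath) {
    int slash = fullPath.lastIndexOf('/');
    String fileName = fullPath.substring(slash + 1);
    int dot = fileName.lastIndexOf('.');

    Map<String, String> columns = new LinkedHashMap<>();
    columns.put("fqn", fullPath);                                            // full path including the file name
    columns.put("filepath", slash > 0 ? fullPath.substring(0, slash) : "/"); // directory that contains the file
    columns.put("filename", fileName);                                       // file name with its extension
    columns.put("suffix", dot >= 0 ? fileName.substring(dot + 1) : "");      // extension without the dot
    return columns;
  }

  public static void main(String[] args) {
    System.out.println(of("/tmpd/test.avro"));
  }
}
```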

> dir0 does not work when the directory structure contains Avro files
> ---
>
> Key: DRILL-4120
> URL: https://issues.apache.org/jira/browse/DRILL-4120
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.3.0
>Reporter: Stefán Baxter
>Assignee: Bhallamudi Venkata Siva Kamesh
>Priority: Major
> Fix For: Future
>
> Attachments: 
> 0001-DRILL-4120-Support-reading-directories-having-avro-f.patch
>
>
> Any select statement containing dirN fails if the target directory structure 
> contains Avro files.
> Steps to test:
> 1. create a simple directory structure
> 2. copy an avro file into each directory
> 3. execute a query containing dir0
> outcome:
> Error: VALIDATION ERROR: From line 1, column 117 to line 1, column 120: 
> Column 'dir0' not found in any table





[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-02-09 Thread Engin Sozer (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358211#comment-16358211
 ] 

Engin Sozer commented on DRILL-4120:


Hi [~kam_iitkgp],

Is there any plan to make this work with dir0? If not, how do you suggest 
partitioning the Avro data on dfs, and how should we query it with good 
performance?



[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2015-12-09 Thread Bhallamudi Venkata Siva Kamesh (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048248#comment-15048248
 ] 

Bhallamudi Venkata Siva Kamesh commented on DRILL-4120:
---

For directories, the schema will be constructed at run time. 

> dir0 does not work when the directory structure contains Avro files
> ---
>
> Key: DRILL-4120
> URL: https://issues.apache.org/jira/browse/DRILL-4120
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.3.0
>Reporter: Stefán Baxter
>Assignee: Bhallamudi Venkata Siva Kamesh
> Fix For: 1.5.0
>
> Attachments: 
> 0001-DRILL-4120-Support-reading-directories-having-avro-f.patch
>
>
> Any select statement containing dirN fails if the target directory structure 
> contains Avro files.
> Steps to test:
> 1. create a simple directory structure
> 2. copy an avro file into each directory
> 3. execute a query containing dir0
> outcome:
> Error: VALIDATION ERROR: From line 1, column 117 to line 1, column 120: 
> Column 'dir0' not found in any table



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2015-12-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047307#comment-15047307
 ] 

Stefán Baxter commented on DRILL-4120:
--

Hi,

Is this a complicated fix or could it be included in 1.4?

Regards,
 -Stefan

> dir0 does not work when the directory structure contains Avro files
> ---
>
> Key: DRILL-4120
> URL: https://issues.apache.org/jira/browse/DRILL-4120
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.3.0
>Reporter: Stefán Baxter
>Assignee: Bhallamudi Venkata Siva Kamesh
> Fix For: 1.5.0
>
>
> Any select statement containing dirN fails if the target directory structure 
> contains Avro files.
> Steps to test:
> 1. create a simple directory structure
> 2. copy an avro file into each directory
> 3. execute a query containing dir0
> outcome:
> Error: VALIDATION ERROR: From line 1, column 117 to line 1, column 120: 
> Column 'dir0' not found in any table





[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2015-12-08 Thread Bhallamudi Venkata Siva Kamesh (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048070#comment-15048070
 ] 

Bhallamudi Venkata Siva Kamesh commented on DRILL-4120:
---

The attached patch does not contain schema support for directories. As of now, 
it uses *DynamicDrillTable*. 
To add schema support for directories, I think we need to cache the schema at 
every directory level (that is, the union schema of all the files under a 
directory, recursively). Otherwise, constructing it for every query may be a 
very costly operation. If so, I will implement it as a separate JIRA.
Any comments?
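
The caching idea above can be sketched as follows. This is an assumption-laden toy in plain Java, not the attached patch and not Drill code: `DirectorySchemaCache` is a hypothetical class, and a "schema" is reduced to a set of field names, whereas a real Avro union schema would also have to reconcile field types.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class DirectorySchemaCache {
  // directory path -> union of the field names of all files below it, at any depth
  private final Map<String, Set<String>> unionByDirectory = new HashMap<>();

  /** Registers one file's field names under every ancestor directory (the root "/" excluded). */
  void addFile(String filePath, Set<String> fieldNames) {
    int slash = filePath.lastIndexOf('/');
    while (slash > 0) {
      String directory = filePath.substring(0, slash);
      unionByDirectory.computeIfAbsent(directory, d -> new TreeSet<>()).addAll(fieldNames);
      slash = filePath.lastIndexOf('/', slash - 1);
    }
  }

  /** Returns the cached union for a directory, so a query does not need to rescan the files. */
  Set<String> schemaOf(String directory) {
    return unionByDirectory.getOrDefault(directory, Collections.emptySet());
  }

  public static void main(String[] args) {
    DirectorySchemaCache cache = new DirectorySchemaCache();
    cache.addFile("/t/2015/a.avro", new TreeSet<>(java.util.Arrays.asList("id", "name")));
    cache.addFile("/t/2016/b.avro", new TreeSet<>(java.util.Arrays.asList("name", "price")));
    System.out.println(cache.schemaOf("/t"));
  }
}
```

A real cache would also need invalidation when files are added or removed, which this sketch leaves out.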



[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2015-12-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048174#comment-15048174
 ] 

Stefán Baxter commented on DRILL-4120:
--

Can you explain how it is used? (DynamicDrillTable) 



[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2015-11-21 Thread Bhallamudi Venkata Siva Kamesh (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020820#comment-15020820
 ] 

Bhallamudi Venkata Siva Kamesh commented on DRILL-4120:
---

Hi Jacques,
 I will look into this.

> dir0 does not work when the directory structure contains Avro files
> ---
>
> Key: DRILL-4120
> URL: https://issues.apache.org/jira/browse/DRILL-4120
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.3.0
>Reporter: Stefán Baxter
>Assignee: Bhallamudi Venkata Siva Kamesh
> Fix For: 1.4.0
>
>
> Any select statement containing dirN fails if the target directory structure 
> contains Avro files.
> Steps to test:
> 1. create a simple directory structure
> 2. copy an avro file into each directory
> 3. execute a query containing dir0
> outcome:
> Error: VALIDATION ERROR: From line 1, column 117 to line 1, column 120: 
> Column 'dir0' not found in any table





[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2015-11-21 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020712#comment-15020712
 ] 

Jacques Nadeau commented on DRILL-4120:
---

Looks like this is a regression due to 
https://issues.apache.org/jira/browse/DRILL-3810

We added schema validation for Avro. We should also validate against dir 
columns.

[~kam_iitkgp], can you take a look?
