[jira] [Commented] (DRILL-7278) Refactor result set loader projection mechanism

2019-06-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857295#comment-16857295
 ] 

ASF GitHub Bot commented on DRILL-7278:
---

paul-rogers commented on pull request #1797: DRILL-7278: Refactor result set 
loader projection mechanism
URL: https://github.com/apache/drill/pull/1797#discussion_r291012596
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/projSet/ProjectionSetFactory.java
 ##
 @@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project.projSet;
+
+import java.util.List;
+import java.util.Map;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.impl.scan.project.projSet.TypeConverter.CustomTypeTransform;
+import org.apache.drill.exec.physical.rowSet.ProjectionSet;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTuple;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTupleImpl;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.vector.accessor.convert.ColumnConversionFactory;
+import org.apache.drill.exec.vector.accessor.convert.StandardConversions.ConversionDefn;
+
+public class ProjectionSetFactory {
+
+  public static ProjectionSet projectAll() { return new WildcardProjectionSet(null); }
+
+  public static ProjectionSet projectNone() { return EmptyProjectionSet.PROJECT_NONE; }
+
+  public static ProjectionSet wrap(RequestedTuple mapProjection) {
+    switch (mapProjection.type()) {
+    case ALL:
+      return projectAll();
+    case NONE:
+      return projectNone();
+    case SOME:
+      return new ExplicitProjectionSet(mapProjection, null);
+    default:
+      throw new IllegalStateException(mapProjection.type().toString());
+    }
+  }
+
+  public static ProjectionSet build(List<SchemaPath> selection) {
+    if (selection == null) {
+      return projectAll();
+    }
+    return wrap(RequestedTupleImpl.parse(selection));
+  }
+
+  public static CustomTypeTransform simpleTransform(ColumnConversionFactory colFactory) {
+    return new CustomTypeTransform() {
 
 Review comment:
   I was relying on the closure over `colFactory`, but pulled this out into a
separate class as suggested.
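   The two styles discussed here can be compared in a small, self-contained
sketch: an anonymous class that captures `colFactory` through its closure, and a
named class that receives it through its constructor. `ConversionFactory` and
`TypeTransform` are hypothetical simplifications standing in for Drill's
`ColumnConversionFactory` and `CustomTypeTransform`; the real signatures differ.

```java
import java.util.function.Function;

public class TransformSketch {

  // Hypothetical stand-in for ColumnConversionFactory: converts one value.
  public interface ConversionFactory extends Function<String, String> { }

  // Hypothetical stand-in for CustomTypeTransform.
  public interface TypeTransform {
    String transform(String value);
  }

  // Style 1: anonymous class; colFactory is captured implicitly by the closure.
  public static TypeTransform anonymousTransform(ConversionFactory colFactory) {
    return new TypeTransform() {
      @Override
      public String transform(String value) {
        return colFactory.apply(value);
      }
    };
  }

  // Style 2: named class; the captured state becomes an explicit final field.
  public static class SimpleTransform implements TypeTransform {
    private final ConversionFactory colFactory;

    public SimpleTransform(ConversionFactory colFactory) {
      this.colFactory = colFactory;
    }

    @Override
    public String transform(String value) {
      return colFactory.apply(value);
    }
  }

  public static TypeTransform simpleTransform(ConversionFactory colFactory) {
    return new SimpleTransform(colFactory);
  }

  public static void main(String[] args) {
    ConversionFactory upper = String::toUpperCase;
    System.out.println(anonymousTransform(upper).transform("abc")); // ABC
    System.out.println(simpleTransform(upper).transform("abc"));    // ABC
  }
}
```

   Both behave identically; the named class simply turns the captured state into
a visible field, which is easier to read, test and reuse.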
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor result set loader projection mechanism
> ---
>
> Key: DRILL-7278
> URL: https://issues.apache.org/jira/browse/DRILL-7278
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.17.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7278) Refactor result set loader projection mechanism

2019-06-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857294#comment-16857294
 ] 

ASF GitHub Bot commented on DRILL-7278:
---

paul-rogers commented on pull request #1797: DRILL-7278: Refactor result set 
loader projection mechanism
URL: https://github.com/apache/drill/pull/1797#discussion_r291023826
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/metadata/AbstractColumnMetadata.java
 ##
 @@ -30,10 +30,11 @@
 import org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser;
 import org.joda.time.format.DateTimeFormatter;
 
-import java.io.IOException;
-import java.util.HashMap;
-import java.util.Map;
-import java.util.stream.Collectors;
+import com.fasterxml.jackson.annotation.JsonAutoDetect;
 
 Review comment:
   This is the order in which Eclipse automatically sorts imports on save.
If Drill has a preferred order, I can try to configure Eclipse to follow it,
or I can turn off auto-sort and organize the imports manually.
 



[jira] [Commented] (DRILL-7278) Refactor result set loader projection mechanism

2019-06-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857296#comment-16857296
 ] 

ASF GitHub Bot commented on DRILL-7278:
---

paul-rogers commented on pull request #1797: DRILL-7278: Refactor result set 
loader projection mechanism
URL: https://github.com/apache/drill/pull/1797#discussion_r291011884
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/projSet/ProjectedReadColumn.java
 ##
 @@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project.projSet;
+
+import org.apache.drill.exec.physical.rowSet.ProjectionSet;
+import org.apache.drill.exec.physical.rowSet.project.ProjectionType;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTuple.RequestedColumn;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.vector.accessor.convert.ColumnConversionFactory;
+
+/**
+ * Projected column. Includes at least the reader schema. May also
+ * include a projection specification, an output schema, and a type
+ * conversion.
+ */
+
+public class ProjectedReadColumn extends AbstractReadColProj {
+  private final RequestedColumn requestedCol;
+  private final ColumnMetadata outputSchema;
+  private final ColumnConversionFactory conversionFactory;
+
+  public ProjectedReadColumn(ColumnMetadata readSchema) {
+    this(readSchema, null, null, null);
+  }
+
+  public ProjectedReadColumn(ColumnMetadata readSchema,
+      RequestedColumn requestedCol) {
+    this(readSchema, requestedCol, null, null);
+  }
+
+  public ProjectedReadColumn(ColumnMetadata readSchema,
+      ColumnMetadata outputSchema, ColumnConversionFactory conversionFactory) {
+    this(readSchema, null, outputSchema, conversionFactory);
+  }
+
+  public ProjectedReadColumn(ColumnMetadata readSchema,
+      RequestedColumn requestedCol, ColumnMetadata outputSchema,
+      ColumnConversionFactory conversionFactory) {
+    super(readSchema);
+    this.requestedCol = requestedCol;
+    this.outputSchema = outputSchema;
+    this.conversionFactory = conversionFactory;
+  }
+
+  @Override
+  public ColumnMetadata providedSchema() {
+    return outputSchema == null ? readSchema : outputSchema;
+  }
+
+  @Override
+  public ProjectionSet mapProjection() {
+    // Should never occur: maps should use the map class.
+    return null;
 
 Review comment:
   In general, the caller is aware if this is a map column or not. Returning 
null seems a benign way to say, "hey, this isn't a map."
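   A minimal sketch of the caller-side pattern this relies on: the caller decides
from the column's own metadata whether it is a map, and only then asks for the
nested projection, so a `null` from a non-map column is never dereferenced. All
names below (`ColumnProj`, `isMap`, `nestedColumnCount`) are hypothetical
simplifications, not Drill's actual interfaces.

```java
import java.util.Arrays;
import java.util.List;

public class MapProjectionSketch {

  // Hypothetical, simplified view of a projected column.
  public interface ColumnProj {
    String name();
    boolean isMap();
    // Nested projection for map columns; null for any other column.
    List<String> mapProjection();
  }

  private static final class Impl implements ColumnProj {
    private final String name;
    private final List<String> members; // null means "not a map"

    Impl(String name, List<String> members) {
      this.name = name;
      this.members = members;
    }

    @Override public String name() { return name; }
    @Override public boolean isMap() { return members != null; }
    @Override public List<String> mapProjection() { return members; }
  }

  public static ColumnProj scalar(String name) { return new Impl(name, null); }

  public static ColumnProj map(String name, List<String> members) { return new Impl(name, members); }

  // The caller already knows whether the column is a map, so it only asks
  // for the nested projection in that case and never sees the null.
  public static int nestedColumnCount(ColumnProj col) {
    if (!col.isMap()) {
      return 0;
    }
    return col.mapProjection().size();
  }

  public static void main(String[] args) {
    System.out.println(nestedColumnCount(scalar("a")));                       // 0
    System.out.println(nestedColumnCount(map("m", Arrays.asList("x", "y")))); // 2
  }
}
```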
 



[jira] [Commented] (DRILL-7279) Support provided schema for CSV without headers

2019-06-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857247#comment-16857247
 ] 

ASF GitHub Bot commented on DRILL-7279:
---

paul-rogers commented on pull request #1798: DRILL-7279: Enable provided schema 
for text files without headers
URL: https://github.com/apache/drill/pull/1798#discussion_r291007270
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/TupleMetadata.java
 ##
 @@ -46,6 +46,10 @@
 public interface TupleMetadata extends Propertied, Iterable<ColumnMetadata> {
 
   public static final String IS_STRICT_SCHEMA_PROP = DRILL_PROP_PREFIX + "strict";
+  public static final String HAS_HEADERS_PROP = DRILL_PROP_PREFIX + "headers";
+  public static final String SKIP_FIRST_LINE_PROP = DRILL_PROP_PREFIX + "skipFirstLine";
+  public static final String DELIMITER_PROP = DRILL_PROP_PREFIX + "delimiter";
+  public static final String COMMENT_CHAR_PROP = DRILL_PROP_PREFIX + "commentChar";
 
 Review comment:
   Added unit tests. This revealed that quote delimiters **never worked** in 
the V2 text reader. That is, the following never worked:
   
   ```
   quote="@"
   escape="~"
   ```
   
   Data:
   
   ```
   @foo~@bar@
   ```
   
   Reworked the parsing code for quoted strings to fix this issue; the output
is now `foo@bar`. Also fixed:
   
   ```
   @foo~~bar@
   ```
   An escape character that is not followed by a quote should be treated as a
normal character. The output is now `foo~~bar`.
   
   Verified that V3 works with the quote and escape as the same character
(did not check V2):
   
   ```
   quote="@"
   escape="@"
   ```
   
   Data:
   
   ```
   @foo@@bar@
   ```
   
   Produces output `foo@bar`.
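   The quoting rules described above can be summarized in a small,
self-contained sketch. It is not the Drill text reader code: `parseQuoted` is a
hypothetical helper that handles a single quoted field and assumes the input
starts with the quote character. An escape counts as an escape only when the
next character is the quote; otherwise it is kept as an ordinary character.
Checking for an escaped quote before checking for the closing quote is what
lets quote and escape be the same character.

```java
public class QuotedFieldSketch {

  // Parses one quoted field. Assumes input starts with the quote character.
  public static String parseQuoted(String input, char quote, char escape) {
    if (input.isEmpty() || input.charAt(0) != quote) {
      throw new IllegalArgumentException("Field must start with the quote character");
    }
    StringBuilder out = new StringBuilder();
    int i = 1;
    while (i < input.length()) {
      char c = input.charAt(i);
      // Escape followed by a quote: emit a literal quote. When quote == escape,
      // a doubled quote is a literal quote and a single quote closes the field.
      if (c == escape && i + 1 < input.length() && input.charAt(i + 1) == quote) {
        out.append(quote);
        i += 2;
      } else if (c == quote) {
        return out.toString(); // closing quote
      } else {
        out.append(c);         // includes an escape not followed by a quote
        i++;
      }
    }
    throw new IllegalArgumentException("Unterminated quoted field");
  }

  public static void main(String[] args) {
    System.out.println(parseQuoted("@foo~@bar@", '@', '~')); // foo@bar
    System.out.println(parseQuoted("@foo~~bar@", '@', '~')); // foo~~bar
    System.out.println(parseQuoted("@foo@@bar@", '@', '@')); // foo@bar
  }
}
```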
 



> Support provided schema for CSV without headers
> ---
>
> Key: DRILL-7279
> URL: https://issues.apache.org/jira/browse/DRILL-7279
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.17.0
>
>
> Extend the Drill 1.16 provided schema support for the text reader to allow a 
> provided schema for files without headers. Behavior:
> * If the file is configured to not extract headers, and a schema is provided, 
> and the schema has at least one column, then use the provided schema to 
> create individual columns. Otherwise, continue to use {{columns}} as in 
> previous versions.
> * The columns in the schema are assumed to match left-to-right with those in 
> the file.
> * If the schema contains more columns than the file, the extra columns take 
> their default values. (This occurs in schema evolution when a column is added 
> to newer files.)
> * If the file contains more columns than the schema, then the extra columns, 
> at the end of the line, are ignored. This is the same behavior as occurs if 
> the file contains headers.
> h4. Table Properties
> Also adds four table properties for text files. These properties, if present, 
> override those defined in the format plugin configuration. The properties 
> allow the user to have a single "csv" config, yet many tables with
> the "csv" suffix, each with different properties. That is, the user need not
> define a new plugin config and a new extension just to change a file
> format property. With this system, the user can have a ".csv" file with 
> headers; the user need not define a different suffix (usually ".csvh" in 
> Drill) for this case.
> || Table Property || Equivalent Plugin Config Property ||
> | {{drill.headers}} | {{extractHeader}} |
> | {{drill.skipFirstLine}} |  {{skipFirstLine}} | 
> | {{drill.delimiter}} |  {{fieldDelimiter}} | 
> |  {{drill.commentChar}} |  {{comment}}| 
> For each, the rules are:
> * If the table property is not set, then the plugin property is used.
> * If the table property is set, then the property value replaces the plugin 
> property value for that one specific table.
> * For the delimiter, if the property value is an empty string, then this is 
> the same as an unset property.
> * For the comment, if the property value is an empty string, then the comment 
> is set to the ASCII NULL, which will never match. This effectively turns off 
> the comment feature for this one table.
> * If the delimiter or comment value is longer than a single character, only 
> the first character is used.
> It is possible to use the table properties without specifying a "provided" 
> schema. Just omit any columns from the schema:

[jira] [Commented] (DRILL-7279) Support provided schema for CSV without headers

2019-06-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857245#comment-16857245
 ] 

ASF GitHub Bot commented on DRILL-7279:
---

paul-rogers commented on pull request #1798: DRILL-7279: Enable provided schema 
for text files without headers
URL: https://github.com/apache/drill/pull/1798#discussion_r290993793
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/TupleMetadata.java
 ##
 @@ -46,6 +46,10 @@
 public interface TupleMetadata extends Propertied, Iterable<ColumnMetadata> {
 
   public static final String IS_STRICT_SCHEMA_PROP = DRILL_PROP_PREFIX + "strict";
+  public static final String HAS_HEADERS_PROP = DRILL_PROP_PREFIX + "headers";
+  public static final String SKIP_FIRST_LINE_PROP = DRILL_PROP_PREFIX + "skipFirstLine";
+  public static final String DELIMITER_PROP = DRILL_PROP_PREFIX + "delimiter";
 
 Review comment:
   Good question. To the code here, the property is just a string. Any 
encoding/decoding should be done in the code that populates the properties.
   
   Created a unit test to verify. Found that the following SQL:
   
   ```
   drill.text.quote=
   ```
   
   works to set the quote to a single-quote character.
   
   This SQL:
   
   ```
   drill.text.fieldDelimiter='\01'
   ```
   
   works to set the delimiter to ASCII 1. The output in the `.drill.schema`
file is:
   
   ```
 "drill.text.fieldDelimiter" : "\u0001",
   ```
   
   The following SQL also works:
   
   ```
   drill.text.fieldDelimiter='\u0001'
   ```
   
   It also seems that the following work in both SQL and the schema file:
   
   ```
   drill.text.newline='\n'
   drill.text.newline='\r'
   ```
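   Why the delimiter appears as `\u0001` in the `.drill.schema` file: JSON
requires control characters (U+0000 through U+001F) inside strings to be
escaped, and `\uXXXX` is the generic form, while characters such as newline
have short escapes like `\n`. The sketch below is a simplified, hand-written
escaper for illustration only; it assumes the schema file is produced by a
standard JSON serializer that behaves similarly and is not the serializer
Drill actually uses.

```java
public class JsonEscapeSketch {

  // Simplified JSON string escaper (illustrative only).
  public static String quote(String s) {
    StringBuilder sb = new StringBuilder("\"");
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '"':  sb.append("\\\""); break;
        case '\\': sb.append("\\\\"); break;
        case '\n': sb.append("\\n"); break;
        case '\r': sb.append("\\r"); break;
        case '\t': sb.append("\\t"); break;
        default:
          if (c < 0x20) {
            // Other control characters use the four-hex-digit escape.
            sb.append(String.format("\\u%04x", (int) c));
          } else {
            sb.append(c);
          }
      }
    }
    return sb.append('"').toString();
  }

  public static void main(String[] args) {
    String delimiter = String.valueOf((char) 1); // ASCII 1, as set by '\01' in the SQL
    System.out.println("\"drill.text.fieldDelimiter\" : " + quote(delimiter));
  }
}
```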
 


[jira] [Commented] (DRILL-7279) Support provided schema for CSV without headers

2019-06-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857246#comment-16857246
 ] 

ASF GitHub Bot commented on DRILL-7279:
---

paul-rogers commented on pull request #1798: DRILL-7279: Enable provided schema 
for text files without headers
URL: https://github.com/apache/drill/pull/1798#discussion_r290986177
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/TupleMetadata.java
 ##
 @@ -46,6 +46,10 @@
 public interface TupleMetadata extends Propertied, Iterable<ColumnMetadata> {
 
   public static final String IS_STRICT_SCHEMA_PROP = DRILL_PROP_PREFIX + "strict";
+  public static final String HAS_HEADERS_PROP = DRILL_PROP_PREFIX + "headers";
 
 Review comment:
   Moved to the text format plugin.
 



> Support provided schema for CSV without headers
> ---
>
> Key: DRILL-7279
> URL: https://issues.apache.org/jira/browse/DRILL-7279
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.17.0
>
>
> Extend the Drill 1.16 provided schema support for the text reader to allow a 
> provided schema for files without headers. Behavior:
> * If the file is configured to not extract headers, and a schema is provided, 
> and the schema has at least one column, then use the provided schema to 
> create individual columns. Otherwise, continue to use {{columns}} as in 
> previous versions.
> * The columns in the schema are assumed to match left-to-right with those in 
> the file.
> * If the schema contains more columns than the file, the extra columns take 
> their default values. (This occurs in schema evolution when a column is added 
> to newer files.)
> * If the file contains more columns than the schema, then the extra columns, 
> at the end of the line, are ignored. This is the same behavior as occurs if 
> the file contains headers.
> h4. Table Properties
> Also adds four table properties for text files. These properties, if present, 
> override those defined in the format plugin configuration. The properties 
> allow the user to have a single "csv" config, yet many tables with
> the "csv" suffix, each with different properties. That is, the user need not
> define a new plugin config and a new extension just to change a file
> format property. With this system, the user can have a ".csv" file with 
> headers; the user need not define a different suffix (usually ".csvh" in 
> Drill) for this case.
> || Table Property || Equivalent Plugin Config Property ||
> | {{drill.headers}} | {{extractHeader}} |
> | {{drill.skipFirstLine}} |  {{skipFirstLine}} | 
> | {{drill.delimiter}} |  {{fieldDelimiter}} | 
> |  {{drill.commentChar}} |  {{comment}}| 
> For each, the rules are:
> * If the table property is not set, then the plugin property is used.
> * If the table property is set, then the property value replaces the plugin 
> property value for that one specific table.
> * For the delimiter, if the property value is an empty string, then this is 
> the same as an unset property.
> * For the comment, if the property value is an empty string, then the comment 
> is set to the ASCII NULL, which will never match. This effectively turns off 
> the comment feature for this one table.
> * If the delimiter or comment value is longer than a single character, only 
> the first character is used.
> It is possible to use the table properties without specifying a "provided" 
> schema. Just omit any columns from the schema:
> {noformat}
> create schema () for table `dfs.data`.`example`
> PROPERTIES ('drill.headers'='false', 'drill.skipFirstLine'='false', 
> 'drill.delimiter'='|')
> {noformat}
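The positional matching rules in the issue description above can be sketched
as follows. `mapRow` is a hypothetical helper, not Drill code: it pairs schema
columns with the fields of one line left to right, gives trailing schema
columns that the line lacks a per-column default, and never reads fields past
the last schema column. Representing a row as a `Map<String, String>` with
string defaults is a simplifying assumption; the real reader writes typed
vectors.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PositionalMappingSketch {

  // Pairs schema columns with the fields of one line, left to right.
  public static Map<String, String> mapRow(List<String> schemaCols,
      Map<String, String> defaults, List<String> fields) {
    Map<String, String> row = new LinkedHashMap<>();
    for (int i = 0; i < schemaCols.size(); i++) {
      String col = schemaCols.get(i);
      if (i < fields.size()) {
        row.put(col, fields.get(i));      // matched by position
      } else {
        row.put(col, defaults.get(col));  // line is shorter than the schema
      }
    }
    // Fields past the last schema column are never read, so they are ignored.
    return row;
  }

  public static void main(String[] args) {
    List<String> cols = Arrays.asList("a", "b", "c");
    Map<String, String> defaults = new HashMap<>();
    defaults.put("c", "0");
    System.out.println(mapRow(cols, defaults, Arrays.asList("1", "2")));           // {a=1, b=2, c=0}
    System.out.println(mapRow(cols, defaults, Arrays.asList("1", "2", "3", "4"))); // {a=1, b=2, c=3}
  }
}
```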





[jira] [Commented] (DRILL-7279) Support provided schema for CSV without headers

2019-06-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857244#comment-16857244
 ] 

ASF GitHub Bot commented on DRILL-7279:
---

paul-rogers commented on pull request #1798: DRILL-7279: Enable provided schema 
for text files without headers
URL: https://github.com/apache/drill/pull/1798#discussion_r291003536
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/TupleMetadata.java
 ##
 @@ -46,6 +46,10 @@
 public interface TupleMetadata extends Propertied, Iterable<ColumnMetadata> {
 
   public static final String IS_STRICT_SCHEMA_PROP = DRILL_PROP_PREFIX + "strict";
+  public static final String HAS_HEADERS_PROP = DRILL_PROP_PREFIX + "headers";
+  public static final String SKIP_FIRST_LINE_PROP = DRILL_PROP_PREFIX + "skipFirstLine";
+  public static final String DELIMITER_PROP = DRILL_PROP_PREFIX + "delimiter";
+  public static final String COMMENT_CHAR_PROP = DRILL_PROP_PREFIX + "commentChar";
 
 Review comment:
   Added all properties. Renamed the properties to have a `text.` prefix. This 
then allowed the properties to use the same field names as in the format 
config. Example: `drill.text.fieldDelimiter`. 
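   Combining the `drill.text.` naming with the override rules listed in the
issue description gives roughly the following. This is a hypothetical sketch,
not Drill's implementation. It assumes table properties arrive as a plain
`Map<String, String>`, that the rules carry over unchanged to the renamed keys
(so the keys would presumably be `drill.text.fieldDelimiter`,
`drill.text.comment` and `drill.text.extractHeader`), and that booleans are
parsed with `Boolean.parseBoolean`, which the thread does not specify.

```java
import java.util.HashMap;
import java.util.Map;

public class TextPropsSketch {

  // "drill." matches the existing property names such as drill.strict.
  static final String DRILL_PROP_PREFIX = "drill.";
  static final String TEXT_PREFIX = DRILL_PROP_PREFIX + "text.";

  // Table property key for a format config field, e.g. "fieldDelimiter".
  public static String textProp(String configField) {
    return TEXT_PREFIX + configField;
  }

  // Unset or empty: use the plugin value; otherwise only the first char is used.
  public static char delimiter(Map<String, String> props, char pluginValue) {
    String value = props.get(textProp("fieldDelimiter"));
    if (value == null || value.isEmpty()) {
      return pluginValue;
    }
    return value.charAt(0);
  }

  // Unset: plugin value. Empty: ASCII NUL, which never matches (comments off).
  public static char comment(Map<String, String> props, char pluginValue) {
    String value = props.get(textProp("comment"));
    if (value == null) {
      return pluginValue;
    }
    return value.isEmpty() ? '\0' : value.charAt(0);
  }

  // Unset: plugin value. Boolean parsing rule is an assumption.
  public static boolean extractHeader(Map<String, String> props, boolean pluginValue) {
    String value = props.get(textProp("extractHeader"));
    return value == null ? pluginValue : Boolean.parseBoolean(value);
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put(textProp("fieldDelimiter"), "|;");
    props.put(textProp("comment"), "");
    System.out.println(delimiter(props, ','));      // prints |
    System.out.println((int) comment(props, '#'));  // prints 0
    System.out.println(extractHeader(props, true)); // prints true
  }
}
```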
 



[jira] [Closed] (DRILL-7181) [Text V3 Reader] Exception with inadequate message is thrown if select columns as array with extractHeader set to true

2019-06-05 Thread Anton Gozhiy (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Gozhiy closed DRILL-7181.
---

Verified with Drill version 1.17.0-SNAPSHOT (commit 
2615d68de4e44b1f03f5c047018548c06a7396b4)
The message is clear now.
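
The expected V2 behavior quoted in the issue below can be approximated by a
simple up-front check: when headers are extracted, any reference to the
`columns` array is rejected with a message naming the offending column. A
minimal sketch, assuming projected columns arrive as plain strings;
`checkProjection` is a hypothetical helper, and `UnsupportedOperationException`
stands in for Drill's `UserException`.

```java
import java.util.List;

public class ColumnsArrayCheckSketch {

  static final String COLUMNS_COL = "columns";

  // Rejects references to the special "columns" array when headers are extracted.
  public static void checkProjection(List<String> projected, boolean extractHeader) {
    if (!extractHeader) {
      return;
    }
    for (String name : projected) {
      // Strip any array index, e.g. "columns[0]" becomes "columns".
      int bracket = name.indexOf('[');
      String base = bracket < 0 ? name : name.substring(0, bracket);
      // Case-insensitive match, since Drill column names are case-insensitive.
      if (base.equalsIgnoreCase(COLUMNS_COL)) {
        throw new UnsupportedOperationException(
            "With extractHeader enabled, only header names are supported. Column name: " + name);
      }
    }
  }
}
```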

> [Text V3 Reader] Exception with inadequate message is thrown if select 
> columns as array with extractHeader set to true
> --
>
> Key: DRILL-7181
> URL: https://issues.apache.org/jira/browse/DRILL-7181
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Anton Gozhiy
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> *Prerequisites:*
>  # Create a simple .csv file with a header, like this:
> {noformat}
> col1,col2,col3
> 1,2,3
> 4,5,6
> 7,8,9
> {noformat}
>  # Set exec.storage.enable_v3_text_reader=true
>  # Set "extractHeader": true for the csv format in the dfs storage plugin.
> *Query:*
> {code:sql}
> select columns[0] from dfs.tmp.`/test.csv`
> {code}
> *Expected result:* An exception should be thrown; here is the message from
> the V2 reader:
> {noformat}
> UNSUPPORTED_OPERATION ERROR: Drill Remote Exception
>   (java.lang.Exception) UNSUPPORTED_OPERATION ERROR: With extractHeader 
> enabled, only header names are supported
> column name columns
> column index
> Fragment 0:0
> [Error Id: 5affa696-1dbd-43d7-ac14-72d235c00f43 on userf87d-pc:31010]
> org.apache.drill.common.exceptions.UserException$Builder.build():630
> org.apache.drill.exec.store.easy.text.compliant.FieldVarCharOutput.<init>():106
> org.apache.drill.exec.store.easy.text.compliant.CompliantTextRecordReader.setup():139
> org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas():321
> org.apache.drill.exec.physical.impl.ScanBatch.internalNext():216
> org.apache.drill.exec.physical.impl.ScanBatch.next():271
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():101
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():101
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
> ...():0
> org.apache.hadoop.security.UserGroupInformation.doAs():1746
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
> org.apache.drill.common.SelfCleaningRunnable.run():38
> ...():0
> {noformat}
> *Actual result:* The exception message is inadequate:
> {noformat}
> org.apache.drill.common.exceptions.UserRemoteException: EXECUTION_ERROR 
> ERROR: Table schema must have exactly one column.
> Exception thrown from org.apache.drill.exec.physical.impl.scan.ScanOperatorExec
> Fragment 0:0
> [Error Id: a76a1576-419a-413f-840f-088157167a6d on userf87d-pc:31010]
>   (java.lang.IllegalStateException) Table schema must have exactly one column.
> org.apache.drill.exec.physical.impl.scan.columns.ColumnsArrayManager.resolveColumn():108
> org.apache.drill.exec.physical.impl.scan.project.ReaderLevelProjection.resolveSpecial():91
> org.apache.drill.exec.physical.impl.scan.project.ExplicitSchemaProjection.resolveRootTuple():62
> org.apache.drill.exec.physical.impl.scan.project.ExplicitSchemaProjection.<init>():52
> org.apache.drill.exec.physical.impl.scan.project.ReaderSchemaOrchestrator.d

[jira] [Commented] (DRILL-7282) Apache Drill using Outdated Version of many Libraries

2019-06-05 Thread Arina Ielchiieva (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856613#comment-16856613
 ] 

Arina Ielchiieva commented on DRILL-7282:
-

[~er.ayushsha...@gmail.com] I think it's better to check against the latest Drill 
version; some of the dependencies have been updated.

> Apache Drill using Outdated Version of many Libraries
> -
>
> Key: DRILL-7282
> URL: https://issues.apache.org/jira/browse/DRILL-7282
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Ayush Sharma
>Priority: Major
> Fix For: 1.17.0
>
>
> Apache Drill is using outdated versions of libraries and should update to the 
> latest versions to avoid security issues in the future.
> Below is the list of libraries that need to be updated:
> commons-compiler-2.7.6 - Latest Version 3.0.12 - Jan 2019
> commons-compress-1.4.1 - Latest Version 1.18 - Aug 2018
> janino-2.7.6 - Latest Version 3.0.12 - Jan 2019
> jersey-common-2.8 - Latest Version 2.28 - Jan 2019
> jersey-container-servlet-core-2.8 - Latest Version 2.28 - Jan 2019
> jersey-guava-2.8 - Latest Version 2.28 - Jan 2019
> jersey-media-multipart-2.8 - Latest Version 2.28 - Jan 2019
> jersey-mvc-2.8 - Latest Version 2.28 - Jan 2019
> jersey-mvc-freemarker-2.8 - Latest Version 2.28 - Jan 2019
> jersey-server-2.8 - Latest Version 2.28 - Jan 2019
> jline-2.10 - Latest Version 3.0.0.M1 - May 2016
> log4j-over-slf4j-1.7.6 - Latest Version 1.8.0-beta4 - Feb 2019
> logback-classic-1.2.3 - Latest Version 1.3.0-alpha4 - Feb 2018
> logback-core-1.2.3 - Latest Version 1.3.0-alpha4 - Feb 2018
> mimepull-1.9.3 - Latest Version 1.9.11 - Jan 2019
> protostuff-json-1.0.8 - Latest Version 1.1.3 - Nov 2017
> reflections-0.9.10 - Latest Version 0.9.11 - Mar 2017
>  
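When triaging a list like the one above, version strings have to be compared numerically, component by component: jersey 2.8 is older than 2.28 even though "2.8" sorts after "2.28" as plain text. The Java sketch below is a hypothetical helper (the class and method names are illustrative, not part of Drill or any build tool). It ignores pre-release qualifiers such as -alpha4, -beta4, or .M1, which is a deliberate simplification.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: numeric, component-wise comparison of dotted version strings. */
class VersionCheck {

    // Extracts the leading numeric components, e.g. "1.8.0-beta4" -> [1, 8, 0].
    // Pre-release qualifiers are ignored (a simplification, not full semver ordering).
    static List<Integer> numericParts(String version) {
        List<Integer> parts = new ArrayList<>();
        String core = version.split("-", 2)[0];
        for (String piece : core.split("\\.")) {
            int end = 0;
            while (end < piece.length() && Character.isDigit(piece.charAt(end))) {
                end++;
            }
            if (end == 0) {
                break; // stop at the first non-numeric component, e.g. "M1"
            }
            parts.add(Integer.parseInt(piece.substring(0, end)));
        }
        return parts;
    }

    // Negative if a is older than b, zero if equal, positive if newer.
    // Missing trailing components count as 0, so "1.2" equals "1.2.0".
    static int compare(String a, String b) {
        List<Integer> pa = numericParts(a);
        List<Integer> pb = numericParts(b);
        int n = Math.max(pa.size(), pb.size());
        for (int i = 0; i < n; i++) {
            int x = i < pa.size() ? pa.get(i) : 0;
            int y = i < pb.size() ? pb.get(i) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    static boolean isOutdated(String current, String latest) {
        return compare(current, latest) < 0;
    }

    public static void main(String[] args) {
        // A plain String comparison gets this wrong: "2.8".compareTo("2.28") > 0.
        System.out.println(isOutdated("2.8", "2.28"));            // true
        System.out.println(isOutdated("1.2.3", "1.3.0-alpha4"));  // true
        System.out.println(isOutdated("0.9.10", "0.9.11"));       // true
    }
}
```

For a real audit of a Maven project such as Drill, the Versions Maven Plugin (`mvn versions:display-dependency-updates`) is likely a better fit than hand-written code; the sketch only illustrates why a naive string sort is misleading.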



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7248) Apache Drill API returns the numeric column value as string

2019-06-05 Thread Arina Ielchiieva (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856608#comment-16856608
 ] 

Arina Ielchiieva commented on DRILL-7248:
-

[~Gayathri01] I'm not sure there is a workaround.
There is a similar Jira: https://issues.apache.org/jira/browse/DRILL-4821

> Apache Drill API returns the numeric column value as string
> ---
>
> Key: DRILL-7248
> URL: https://issues.apache.org/jira/browse/DRILL-7248
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Gayathri
>Priority: Blocker
>
> *test.json:*
> [\{"a":"uxv","b":1,"c":2.0},\{"a":"uxv","b":2,"c":3.15}]
> +Example:+ *Request:* \{"queryType" : "SQL","query" : "select * from 
> dfs.`\files\test.json`"};
> *Response:* { "queryId": "233ff474-0902-828b-9efd-0a0bd57eee51",
>  "columns": [
>  "a",
>  "b",
>  "c"
>  ],
>  "rows": [
> { "b": "1", "c": " 2.0", "a": "uxv"}
> , \{ "b": "2", "c": "3.15", "a": "uxv"}
> ]
>  }
> Why are the column values in the response returned as strings even though 
> they are numeric?
> Can anyone please help me with this? Is there any configuration I can 
> modify to get the column values in the response in proper numeric format rather 
> than as strings?
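One possible client-side mitigation, pending a fix along the lines of DRILL-4821, is to convert the string cell values back to numbers after parsing the JSON response. The Java sketch below is an assumption, not a Drill feature: `RowCoercion` and its methods are hypothetical, and it starts from a row that has already been parsed into a `Map<String, String>`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical client-side workaround: turn numeric-looking strings back into numbers. */
class RowCoercion {

    // Tries Long first, then Double; anything else is returned unchanged as a String.
    static Object coerce(String raw) {
        if (raw == null) {
            return null;
        }
        String s = raw.trim(); // the sample response contains " 2.0" with a leading space
        try {
            return Long.parseLong(s);
        } catch (NumberFormatException notLong) {
            // not an integer; try floating point next
        }
        try {
            return Double.parseDouble(s);
        } catch (NumberFormatException notDouble) {
            return raw;
        }
    }

    // Applies coerce() to every cell, preserving the original column order.
    static Map<String, Object> coerceRow(Map<String, String> row) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            out.put(e.getKey(), coerce(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("b", "1");
        row.put("c", " 2.0");
        row.put("a", "uxv");
        System.out.println(coerceRow(row)); // {b=1, c=2.0, a=uxv}
    }
}
```

This is only a heuristic: it cannot tell a VARCHAR column that happens to contain digits from a genuinely numeric one, and `Double.parseDouble` also accepts strings such as "NaN" or "Infinity", so a conversion driven by actual column types is preferable wherever type information is available.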





[jira] [Commented] (DRILL-7156) Empty Parquet is not getting created if 0 records in result

2019-06-05 Thread Arina Ielchiieva (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856605#comment-16856605
 ] 

Arina Ielchiieva commented on DRILL-7156:
-

[~le.louch] there are similar issues in Drill:
https://issues.apache.org/jira/browse/DRILL-4517
https://issues.apache.org/jira/browse/DRILL-6885

It would be great if you could contribute a patch.



> Empty Parquet is not getting created if 0 records in result
> ---
>
> Key: DRILL-7156
> URL: https://issues.apache.org/jira/browse/DRILL-7156
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Reporter: Sayalee Bhanavase
>Priority: Blocker
>
> I am creating Parquet tables from joins. If the join returns no records, Drill 
> does not create an empty table, and when I reuse the table, my subsequent script 
> fails.
> Has anyone faced this issue? Any suggestions or workarounds?
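Until the empty-result behavior tracked in DRILL-4517 and DRILL-6885 changes, a downstream script could at least detect the missing output before it tries to reuse the table. The Java sketch below is hypothetical: `ParquetOutputGuard` is an illustrative name, and it assumes the table is written to a local or mounted directory and that a join with no rows leaves that directory absent or without any `.parquet` files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical guard: check that a CTAS output directory holds Parquet data before reuse. */
class ParquetOutputGuard {

    // True only if dir exists and contains at least one *.parquet file (searched recursively).
    static boolean hasParquetFiles(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.anyMatch(p -> Files.isRegularFile(p)
                    && p.getFileName().toString().endsWith(".parquet"));
        }
    }

    public static void main(String[] args) throws IOException {
        Path table = Files.createTempDirectory("join_result");
        System.out.println(hasParquetFiles(table)); // false: nothing written yet
        // Illustrative file name only; the real names depend on the writer.
        Files.createFile(table.resolve("0_0_0.parquet"));
        System.out.println(hasParquetFiles(table)); // true
    }
}
```

When the guard returns false, the script can skip the dependent steps or create a placeholder table, instead of failing later on a table that does not exist.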





[jira] [Commented] (DRILL-7214) Error While Starting Drill in distributed mode.

2019-06-05 Thread Arina Ielchiieva (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856600#comment-16856600
 ] 

Arina Ielchiieva commented on DRILL-7214:
-

[~aashu] Do you have a JDK or a JRE installed? A JDK is required to use Drill.

> Error While Starting Drill in distributed mode.
> ---
>
> Key: DRILL-7214
> URL: https://issues.apache.org/jira/browse/DRILL-7214
> Project: Apache Drill
>  Issue Type: Task
>  Components: Client - Java
>Affects Versions: 1.15.0
> Environment: centos7
>Reporter: Abhay Kumar Singh
>Priority: Blocker
>  Labels: beginner
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Exception in thread "main" 
> org.apache.drill.exec.exception.DrillbitStartupException: JDK Java compiler 
> not available. Ensure Drill is running with the java executable from a JDK 
> and not a JRE
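To confirm which kind of Java runtime is being picked up, one quick check is whether the JVM exposes a system Java compiler. The sketch below uses the standard `javax.tools.ToolProvider.getSystemJavaCompiler()`, which returns `null` when no compiler is available, as is typical of a JRE. Whether Drill's own startup check uses exactly this call is an assumption, and the class name `JdkCheck` is illustrative.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

/** Illustrative check: does the running JVM provide a Java compiler (JDK) or not (JRE)? */
class JdkCheck {

    // getSystemJavaCompiler() returns null when no system compiler is available,
    // which is the usual situation on a JRE-only installation.
    static boolean hasJavaCompiler() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        return compiler != null;
    }

    public static void main(String[] args) {
        System.out.println("java.home = " + System.getProperty("java.home"));
        System.out.println(hasJavaCompiler()
                ? "JDK compiler available"
                : "No Java compiler found: run with a JDK, not a JRE");
    }
}
```

Run it with the same `java` executable that Drill uses; if it reports no compiler, point Drill at a JDK (for example via `JAVA_HOME`) and restart the Drillbit.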


