date:20190603

[jira] [Commented] (DRILL-7278) Refactor result set loader projection mechanism

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854845#comment-16854845
 ] 

ASF GitHub Bot commented on DRILL-7278:
---

arina-ielchiieva commented on pull request #1797: DRILL-7278: Refactor result 
set loader projection mechanism
URL: https://github.com/apache/drill/pull/1797#discussion_r289937319
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/projSet/EmptyProjectionSet.java
 ##
 @@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project.projSet;
+
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.exec.physical.rowSet.ProjectionSet;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+
+/**
+ * Handles simple cases in which either all nor no columns are projected.
 
 Review comment:
   ```suggestion
* Handles simple cases in which either all or no columns are projected.
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor result set loader projection mechanism
> ---
>
> Key: DRILL-7278
> URL: https://issues.apache.org/jira/browse/DRILL-7278
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.17.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-7278) Refactor result set loader projection mechanism

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854856#comment-16854856
 ] 

ASF GitHub Bot commented on DRILL-7278:
---

arina-ielchiieva commented on pull request #1797: DRILL-7278: Refactor result 
set loader projection mechanism
URL: https://github.com/apache/drill/pull/1797#discussion_r289945691
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/project/ProjectionType.java
 ##
 @@ -53,13 +109,26 @@ public static ProjectionType typeFor(MajorType majorType) {
 if (majorType.getMode() == DataMode.REPEATED) {
   return ARRAY;
 }
+if (majorType.getMinorType() == MinorType.LIST) {
+  return ARRAY;
+}
 return SCALAR;
   }
 
-  public boolean isCompatible(ProjectionType other) {
-switch (other) {
+  /**
+   * Reports if this type (representing an item in a projection list)
+   * is compatible with the projection type representing an actual
+   * column produced by an operator. The check is not symmetric.
 
 Review comment:
   `The check is not symmetric.` Could you please explain this part?
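
One way to read "not symmetric" is sketched below. This is only an illustration under stated assumptions: `SketchProjectionType` and its rules are hypothetical and are not the PR's actual `ProjectionType` logic, which is not shown in this thread. The idea is that the receiver is the form requested in the projection list and the argument is the form of the column the reader actually produced, so swapping the two can change the answer.

```java
/** Hypothetical stand-in for a projection type; not Drill's actual enum. */
enum SketchProjectionType {
  GENERAL,  // x    : bare name, no hint about the column type
  SCALAR,   //        a plain, non-repeated value
  ARRAY,    // x[0] : the projection implies an array
  TUPLE;    // x.y  : the projection implies a map

  /**
   * Assumed rule for this sketch: "this" is the requested form from the
   * projection list, "actual" is the form of the column the reader produced.
   */
  boolean isCompatible(SketchProjectionType actual) {
    switch (this) {
      case GENERAL:
        // A bare name places no constraint on the actual column.
        return true;
      default:
        // A specific request matches only the same actual form.
        return this == actual;
    }
  }
}
```

With these assumed rules, `GENERAL.isCompatible(ARRAY)` is true while `ARRAY.isCompatible(GENERAL)` is false, which is the kind of asymmetry the Javadoc sentence may be referring to.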
 


[jira] [Commented] (DRILL-7278) Refactor result set loader projection mechanism

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854847#comment-16854847
 ] 

ASF GitHub Bot commented on DRILL-7278:
---

arina-ielchiieva commented on pull request #1797: DRILL-7278: Refactor result 
set loader projection mechanism
URL: https://github.com/apache/drill/pull/1797#discussion_r289938524
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/projSet/ProjectionSetBuilder.java
 ##
 @@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project.projSet;
+
+import java.util.Collection;
+
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.rowSet.ProjectionSet;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTuple;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTuple.TupleProjectionType;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTupleImpl;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+
+public class ProjectionSetBuilder {
+
+  private RequestedTuple parsedProjection;
+  private TypeConverter typeConverter;
+  private CustomErrorContext errorContext;
+
+  /**
+   * Record (batch) readers often read a subset of available table columns,
+   * but want to use a writer schema that includes all columns for ease of
+   * writing. (For example, a CSV reader must read all columns, even if the user
+   * wants a subset. The unwanted columns are simply discarded.)
+   * 
+   * This option provides a projection list, in the form of column names, for
+   * those columns which are to be projected. Only those columns will be
+   * backed by value vectors; non-projected columns will be backed by "null"
+   * writers that discard all values.
+   *
+   * @param projection the list of projected columns
+   * @return this builder
+   */
+
+  public ProjectionSetBuilder projectionList(Collection<SchemaPath> projection) {
+if (projection == null) {
+  parsedProjection = null;
+} else {
+  parsedProjection = RequestedTupleImpl.parse(projection);
+}
+return this;
+  }
+
+  public ProjectionSetBuilder parsedProjection(RequestedTuple projection) {
+parsedProjection = projection;
+return this;
+  }
+
+  public ProjectionSetBuilder outputSchema(TupleMetadata schema) {
+typeConverter = TypeConverter.builder().providedSchema(schema).build();
+return this;
+  }
+
+  public ProjectionSetBuilder typeConverter(TypeConverter converter) {
+this.typeConverter = converter;
+return this;
+  }
+
+  public ProjectionSetBuilder errorContext(CustomErrorContext errorContext) {
+this.errorContext = errorContext;
+return this;
+  }
+
+  public ProjectionSet build() {
+TupleProjectionType projType = parsedProjection == null ?
+TupleProjectionType.ALL : parsedProjection.type();
+
+ProjectionSet projSet;
+switch (projType) {
+case ALL:
+  projSet = new WildcardProjectionSet(typeConverter);
+  break;
+case NONE:
+  projSet = ProjectionSetFactory.projectNone();
+  break;
+case SOME:
+  projSet =  new ExplicitProjectionSet(parsedProjection, typeConverter);
 
 Review comment:
   ```suggestion
 projSet = new ExplicitProjectionSet(parsedProjection, typeConverter);
   ```
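
The Javadoc for `projectionList()` in the diff above says that projected columns are backed by value vectors while non-projected columns get "null" writers that discard all values. A minimal sketch of that idea follows. All names here (`SketchWriter`, `SketchVectorWriter`, `SketchDummyWriter`, `SketchWriterFactory`) are hypothetical and are not Drill classes, and a plain list stands in for a value vector. Treating a null projection as "project everything" mirrors the builder's `ALL` fallback shown in the diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical column writer; not a Drill interface. */
interface SketchWriter {
  void write(String value);
}

/** Keeps every value; the list stands in for a value vector. */
final class SketchVectorWriter implements SketchWriter {
  final List<String> values = new ArrayList<>();

  @Override
  public void write(String value) {
    values.add(value);
  }
}

/** "Null" writer: accepts values and silently discards them. */
final class SketchDummyWriter implements SketchWriter {
  @Override
  public void write(String value) {
    // Intentionally empty: the column is not projected.
  }
}

final class SketchWriterFactory {
  /** A null projection is treated as "project all columns" in this sketch. */
  static SketchWriter writerFor(String column, Set<String> projection) {
    boolean projected = projection == null || projection.contains(column);
    return projected ? new SketchVectorWriter() : new SketchDummyWriter();
  }
}
```

A reader can then write every column it parses (as a CSV reader must) without checking which ones the query asked for; only the projected columns retain data.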
 


[jira] [Commented] (DRILL-7278) Refactor result set loader projection mechanism

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854855#comment-16854855
 ] 

ASF GitHub Bot commented on DRILL-7278:
---

arina-ielchiieva commented on pull request #1797: DRILL-7278: Refactor result 
set loader projection mechanism
URL: https://github.com/apache/drill/pull/1797#discussion_r289938151
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/projSet/ProjectedReadColumn.java
 ##
 @@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project.projSet;
+
+import org.apache.drill.exec.physical.rowSet.ProjectionSet;
+import org.apache.drill.exec.physical.rowSet.project.ProjectionType;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTuple.RequestedColumn;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.vector.accessor.convert.ColumnConversionFactory;
+
+/**
+ * Projected column. Includes at least the reader schema. May also
+ * include projection specification, and output schema and a type
+ * conversion.
+ */
+
+public class ProjectedReadColumn extends AbstractReadColProj {
+  private final RequestedColumn requestedCol;
+  private final ColumnMetadata outputSchema;
+  private final ColumnConversionFactory conversionFactory;
+
+  public ProjectedReadColumn(ColumnMetadata readSchema) {
+this(readSchema, null, null, null);
+  }
+
+  public ProjectedReadColumn(ColumnMetadata readSchema,
+  RequestedColumn requestedCol) {
+this(readSchema, requestedCol, null, null);
+  }
+
+  public ProjectedReadColumn(ColumnMetadata readSchema,
+  ColumnMetadata outputSchema, ColumnConversionFactory conversionFactory) {
+this(readSchema, null, outputSchema, null);
+  }
+
+  public ProjectedReadColumn(ColumnMetadata readSchema,
+  RequestedColumn requestedCol, ColumnMetadata outputSchema,
+  ColumnConversionFactory conversionFactory) {
+super(readSchema);
+this.requestedCol = requestedCol;
+this.outputSchema = outputSchema;
+this.conversionFactory = conversionFactory;
+  }
+
+  @Override
+  public ColumnMetadata providedSchema() {
+return outputSchema == null ? readSchema : outputSchema;
+  }
+
+  @Override
+  public ProjectionSet mapProjection() {
+// Should never occur: maps should use the map class.
+return null;
 
 Review comment:
   Maybe throw an error instead? Or will the null be handled somewhere if it occurs?
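
For illustration, the fail-fast alternative raised in this comment could look like the sketch below. The types are hypothetical stand-ins, not the PR's `ProjectedReadColumn` or `ProjectionSet`, and the choice of `UnsupportedOperationException` is an assumption; the thread does not say which exception, if any, was adopted.

```java
/** Hypothetical stand-in for a projection set; not Drill's interface. */
interface SketchProjectionSet {
}

/** Hypothetical base class for a column produced by a reader. */
abstract class SketchReadColumn {
  abstract SketchProjectionSet mapProjection();
}

/** A non-map column: asking it for a map projection is a caller bug. */
final class SketchScalarColumn extends SketchReadColumn {
  @Override
  SketchProjectionSet mapProjection() {
    // Fail at the point of misuse instead of returning null and
    // risking a NullPointerException somewhere far from the cause.
    throw new UnsupportedOperationException(
        "mapProjection() is only valid for map columns");
  }
}
```

The trade-off is that a returned null fails later and less clearly, while the exception names the misuse immediately.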
 


[jira] [Commented] (DRILL-7278) Refactor result set loader projection mechanism

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854857#comment-16854857
 ] 

ASF GitHub Bot commented on DRILL-7278:
---

arina-ielchiieva commented on pull request #1797: DRILL-7278: Refactor result 
set loader projection mechanism
URL: https://github.com/apache/drill/pull/1797#discussion_r289942694
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/projSet/package-info.java
 ##
 @@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * The dynamic projection in Drill is complex. With the advent of
+ * provided schema, we now have many ways to manage projection. The
+ * classes here implement these many policies. They are implemnted
+ * as distinct classes (rather than chains of if-statements) to
+ * make the classes easier to test and reason about.
+ * 
+ * Projection is a combination of three distinct policies:
+ * 
+ * Projection policy (all, none, explicit, etc.)
+ * Column policy (unprojected, explicit projection,
+ * projection with schema, etc.)
+ * Type conversion: none, based on a provided schema,
+ * custom.
+ * 
+ * Experience has shown that these must be separated: each is designed
+ * and tested separately to keep the problem tractable.
+ *
+ * Projection Set Cases
+ *
+ * The project cases and their classes:
+ * 
+ * 
+ * {@link EmptyProjectionSet}
+ * SELECT COUNT(*): Project nothing. Only count records.
+ * 
+ * {@link WildcardProjectionSet}
+ * SELECT *: Project everything, with an optional provided
+ * schema. If a schema is provided, and is strict, then project only
+ * reader columns that appear in the provided schema.
+ * However, don't project columns which have been marked as
+ * special: {@link ColumnMetadata#EXCLUDE_FROM_WILDCARD}, whether marked
+ * in the reader or provided schemas.
+ * {@link ExplicitProjectionSet}
+ * SELECT a, b[10], c.d: Explicit projection with or without
+ * a schema. Project only the selected columns. Verify that the reader
+ * provides column types/modes consistent with the implied form in the
+ * projection list. That is, in this example, `b` must be an array.
+ * 
+ *
+ * Column Projection Cases
+ *
+ * Each projection set answers a query: "the reader wants to add such-and-so
+ * column: what should I do?" Since the reader is free to add any column,
+ * we don't cache the list of columns as is done with the parsed project
+ * list, or the output schema. Instead, we handle each column on a
+ * case-by-case basis; we create a {@link ColumnReadProjection} instance
+ * to answer the query. Instances of this class are meant to be transient:
+ * use them and discard them. We answer the query differently depending on
+ * many factors, including:
+ * 
+ * 
+ * {@link UnprojectedReadColumn}
+ * Column is not projected. Nothing to convert, no type checks
+ * needed. The result set loader should create a dummy writer for this
+ * case.
+ * {@link ProjectedReadColumn}
+ * Column is projected. It may have an associated projection list
+ * item, an output schema, or a type conversion. All these variations
+ * should be transparent to the consumer.
+ * 
+ *
+ * Type Conversion
+ *
+ * The {@link TypeConverter} class handles a provided schema, custom type
+ * conversion, and custom properties passed to the conversion shims. A null
+ * type converter passed to a projection set means no conversion is done.
 
 Review comment:
   Same here - should we use some dummy converter to avoid dealing with nulls?
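
The "dummy converter" idea is essentially the null-object pattern. A minimal sketch follows, assuming a much simpler interface than the PR's `TypeConverter`: `SketchConverter` and `SketchColumnWriter` are hypothetical names, and the conversion works on strings purely for illustration.

```java
/** Hypothetical converter; far simpler than the PR's TypeConverter. */
interface SketchConverter {
  /** Null object: returns the value unchanged. */
  SketchConverter NO_OP = value -> value;

  String convert(String value);
}

final class SketchColumnWriter {
  private final SketchConverter converter;

  SketchColumnWriter(SketchConverter converter) {
    // Normalize once at construction so later code needs no null checks.
    this.converter = converter == null ? SketchConverter.NO_OP : converter;
  }

  String write(String value) {
    return converter.convert(value);
  }
}
```

Callers may still pass null to mean "no conversion", but only the constructor has to know about it.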
 


[jira] [Commented] (DRILL-7278) Refactor result set loader projection mechanism

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854852#comment-16854852
 ] 

ASF GitHub Bot commented on DRILL-7278:
---

arina-ielchiieva commented on pull request #1797: DRILL-7278: Refactor result 
set loader projection mechanism
URL: https://github.com/apache/drill/pull/1797#discussion_r289941138
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/projSet/TypeConverter.java
 ##
 @@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project.projSet;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.convert.ColumnConversionFactory;
+import org.apache.drill.exec.vector.accessor.convert.StandardConversions;
+import org.apache.drill.exec.vector.accessor.convert.StandardConversions.ConversionDefn;
+import org.apache.drill.exec.vector.accessor.convert.StandardConversions.ConversionType;
+
+public class TypeConverter {
+  private static final org.slf4j.Logger logger =
+  org.slf4j.LoggerFactory.getLogger(TypeConverter.class);
+
+  public static class Builder {
+private TupleMetadata providedSchema;
+private TypeConverter.CustomTypeTransform transform;
+private Map<String, String> properties;
+private CustomErrorContext errorContext;
+
+public Builder providedSchema(TupleMetadata schema) {
+  providedSchema = schema;
+  return this;
+}
+
+public Builder transform(TypeConverter.CustomTypeTransform transform) {
+  this.transform = transform;
+  return this;
+}
+
+public Builder properties(Map<String, String> properties) {
+  this.properties = properties;
+  return this;
+}
+
+public Builder setConversionProperty(String key, String value) {
+  if (key == null || value == null) {
+return this;
+  }
+  if (properties == null) {
+properties = new HashMap<>();
+  }
+  properties.put(key, value);
+  return this;
+}
+
+public Builder errorContext(CustomErrorContext errorContext) {
+  this.errorContext = errorContext;
+  return this;
+}
+
+public TypeConverter build() {
+  return new TypeConverter(this);
+}
+  }
+
+  public static interface CustomTypeTransform {
+ColumnConversionFactory transform(ColumnMetadata inputDefn,
+Map<String, String> properties,
+ColumnMetadata outputDefn, ConversionDefn defn);
+  }
+
+  private final TupleMetadata providedSchema;
+  private final TypeConverter.CustomTypeTransform customTransform;
+  private final Map<String, String> properties;
+  private final CustomErrorContext errorContext;
+
+  public static Builder builder() { return new Builder(); }
+
+  public TypeConverter(Builder builder) {
+this.providedSchema = builder.providedSchema;
+this.customTransform = builder.transform;
+this.properties = builder.properties;
+this.errorContext = builder.errorContext;
+  }
+
+  public TypeConverter(TypeConverter parent,
+  TupleMetadata childSchema) {
+this.providedSchema = childSchema;
+this.customTransform = parent.customTransform;
+this.properties = parent.properties;
+this.errorContext = parent.errorContext;
+  }
+
+  public TupleMetadata providedSchema() { return providedSchema; }
+
+  public ColumnConversionFactory conversionFactory(ColumnMetadata inputSchema,
+  ColumnMetadata outputCol) {
+if (outputCol == null) {
+  return customConversion(inputSchema);
+} else {
+  return schemaBasedConversion(inputSchema, outputCol);
+}
+  }
+
+  private ColumnConversionFactory customConversion(ColumnMetadata inputSchema) {
+if (customTransform == null) {
+  return null;
+}
+return customTransform.transform(inputSchema, properties, null, null);
+  }
+
+  public ColumnConversionFactory schemaBasedConversion(ColumnMetadata 
in

[jira] [Commented] (DRILL-7278) Refactor result set loader projection mechanism

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854853#comment-16854853
 ] 

ASF GitHub Bot commented on DRILL-7278:
---

arina-ielchiieva commented on pull request #1797: DRILL-7278: Refactor result 
set loader projection mechanism
URL: https://github.com/apache/drill/pull/1797#discussion_r289938831
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/projSet/ProjectionSetFactory.java
 ##
 @@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project.projSet;
+
+import java.util.List;
+import java.util.Map;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.impl.scan.project.projSet.TypeConverter.CustomTypeTransform;
+import org.apache.drill.exec.physical.rowSet.ProjectionSet;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTuple;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTupleImpl;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.vector.accessor.convert.ColumnConversionFactory;
+import org.apache.drill.exec.vector.accessor.convert.StandardConversions.ConversionDefn;
+
+public class ProjectionSetFactory {
+
+  public static ProjectionSet projectAll() { return new WildcardProjectionSet(null); }
+
+  public static ProjectionSet projectNone() { return EmptyProjectionSet.PROJECT_NONE; }
+
+  public static ProjectionSet wrap(RequestedTuple mapProjection) {
+switch (mapProjection.type()) {
+case ALL:
+  return projectAll();
+case NONE:
+  return projectNone();
+case SOME:
+  return new ExplicitProjectionSet(mapProjection, null);
+default:
+  throw new IllegalStateException(mapProjection.type().toString());
 
 Review comment:
   Same here.
 


[jira] [Commented] (DRILL-7278) Refactor result set loader projection mechanism

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854854#comment-16854854
 ] 

ASF GitHub Bot commented on DRILL-7278:
---

arina-ielchiieva commented on pull request #1797: DRILL-7278: Refactor result 
set loader projection mechanism
URL: https://github.com/apache/drill/pull/1797#discussion_r289944701
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/project/ProjectionType.java
 ##
 @@ -15,19 +15,70 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-package org.apache.drill.exec.record.metadata;
+package org.apache.drill.exec.physical.rowSet.project;
 
 import org.apache.drill.common.types.TypeProtos.DataMode;
 import org.apache.drill.common.types.TypeProtos.MajorType;
 import org.apache.drill.common.types.TypeProtos.MinorType;
 
+/**
+ * Specifies the type of projection obtained by parsing the
+ * projection list. The type is returned from a query of the
+ * form "how is this column projected, if at all?"
+ * 
+ * The projection type allows the scan framework to catch
+ * inconsistencies, such as projecting an array as a map,
+ * and so on.
+ */
+
 public enum ProjectionType {
+
+  /**
+   * The column is not projected in the query.
+   */
+
   UNPROJECTED,
+
+  /**
+   * Projection is a wildcard.
+   */
   WILDCARD, // *
-  UNSPECIFIED,  // x
+
+  /**
+   * Projection is by simple name. "General" means that
+   * we have no hints about the type of the column from
+   * the projection.
+   */
+
+  GENERAL,  // x
 
 Review comment:
   Not sure if `GENERAL` is better than `UNSPECIFIED` but let it be.
 


[jira] [Commented] (DRILL-7278) Refactor result set loader projection mechanism

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854846#comment-16854846
 ] 

ASF GitHub Bot commented on DRILL-7278:
---

arina-ielchiieva commented on pull request #1797: DRILL-7278: Refactor result 
set loader projection mechanism
URL: https://github.com/apache/drill/pull/1797#discussion_r289936396
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/package-info.java
 ##
 @@ -71,9 +141,72 @@
  * distinct class. Classes combine via composition to create a "framework"
  * suitable for each kind of reader: whether it be early or late schema,
  * file-based or something else, etc.
+ *
+ * Nuances of Reader-Level Projection
+ *
+ * We've said that the scan-level projection identifies what the query
+ * wants. We've said that the reader identifies what the external
+ * data actually is. We've mentioned how we bridge between the
+ * two. Here we explore this in more detail.
+ * 
+ * Run-time schema resolution occurs at various stages:
  * 
+ * 
+ * The per-column resolution identified earlier: matching types,
+ * type conversion, and so on.
+ * The reader will provides some set of columns. We don't know which
 
 Review comment:
   ```suggestion
* The reader will provide some set of columns. We don't know which
   ```
 


[jira] [Commented] (DRILL-7278) Refactor result set loader projection mechanism

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854850#comment-16854850
 ] 

ASF GitHub Bot commented on DRILL-7278:
---

arina-ielchiieva commented on pull request #1797: DRILL-7278: Refactor result 
set loader projection mechanism
URL: https://github.com/apache/drill/pull/1797#discussion_r289946612
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/metadata/AbstractColumnMetadata.java
 ##
 @@ -30,10 +30,11 @@
 import org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser;
 import org.joda.time.format.DateTimeFormatter;
 
-import java.io.IOException;
-import java.util.HashMap;
-import java.util.Map;
-import java.util.stream.Collectors;
+import com.fasterxml.jackson.annotation.JsonAutoDetect;
 
 Review comment:
   Is this the correct import order?
 


[jira] [Commented] (DRILL-7278) Refactor result set loader projection mechanism

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854858#comment-16854858
 ] 

ASF GitHub Bot commented on DRILL-7278:
---

arina-ielchiieva commented on pull request #1797: DRILL-7278: Refactor result 
set loader projection mechanism
URL: https://github.com/apache/drill/pull/1797#discussion_r289941630
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/projSet/package-info.java
 ##
 @@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * The dynamic projection in Drill is complex. With the advent of
+ * provided schema, we now have many ways to manage projection. The
+ * classes here implement these many policies. They are implemnted
 
 Review comment:
   ```suggestion
* classes here implement these many policies. They are implemented
   ```
 


[jira] [Commented] (DRILL-7278) Refactor result set loader projection mechanism

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854848#comment-16854848
 ] 

ASF GitHub Bot commented on DRILL-7278:
---

arina-ielchiieva commented on pull request #1797: DRILL-7278: Refactor result 
set loader projection mechanism
URL: https://github.com/apache/drill/pull/1797#discussion_r289938759
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/projSet/ProjectionSetBuilder.java
 ##
 @@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project.projSet;
+
+import java.util.Collection;
+
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.rowSet.ProjectionSet;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTuple;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTuple.TupleProjectionType;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTupleImpl;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+
+public class ProjectionSetBuilder {
+
+  private RequestedTuple parsedProjection;
+  private TypeConverter typeConverter;
+  private CustomErrorContext errorContext;
+
+  /**
+   * Record (batch) readers often read a subset of available table columns,
+   * but want to use a writer schema that includes all columns for ease of
+   * writing. (For example, a CSV reader must read all columns, even if the user
+   * wants a subset. The unwanted columns are simply discarded.)
+   * 
+   * This option provides a projection list, in the form of column names, for
+   * those columns which are to be projected. Only those columns will be
+   * backed by value vectors; non-projected columns will be backed by "null"
+   * writers that discard all values.
+   *
+   * @param projection the list of projected columns
+   * @return this builder
+   */
+
+  public ProjectionSetBuilder projectionList(Collection projection) {
+if (projection == null) {
+  parsedProjection = null;
+} else {
+  parsedProjection = RequestedTupleImpl.parse(projection);
+}
+return this;
+  }
+
+  public ProjectionSetBuilder parsedProjection(RequestedTuple projection) {
+parsedProjection = projection;
+return this;
+  }
+
+  public ProjectionSetBuilder outputSchema(TupleMetadata schema) {
+typeConverter = TypeConverter.builder().providedSchema(schema).build();
+return this;
+  }
+
+  public ProjectionSetBuilder typeConverter(TypeConverter converter) {
+this.typeConverter = converter;
+return this;
+  }
+
+  public ProjectionSetBuilder errorContext(CustomErrorContext errorContext) {
+this.errorContext = errorContext;
+return this;
+  }
+
+  public ProjectionSet build() {
+TupleProjectionType projType = parsedProjection == null ?
+TupleProjectionType.ALL : parsedProjection.type();
+
+ProjectionSet projSet;
+switch (projType) {
+case ALL:
+  projSet = new WildcardProjectionSet(typeConverter);
+  break;
+case NONE:
+  projSet = ProjectionSetFactory.projectNone();
+  break;
+case SOME:
+  projSet =  new ExplicitProjectionSet(parsedProjection, typeConverter);
+  break;
+default:
+  throw new IllegalStateException(projType.toString());
 
 Review comment:
   Please add to the error message: "Unexpected projection type: "
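
A minimal sketch of what the requested message could look like. The `ProjType` enum and the `describe` method are hypothetical stand-ins for `TupleProjectionType` and the builder's `switch`; only the message prefix comes from the comment above.

```java
final class ProjectionSwitchSketch {

  // Hypothetical stand-in for Drill's TupleProjectionType.
  enum ProjType { ALL, NONE, SOME }

  // Builds the exception for the default branch. The prefix explains
  // what the bare enum name in the message refers to.
  static IllegalStateException unexpected(ProjType projType) {
    return new IllegalStateException("Unexpected projection type: " + projType);
  }

  // Maps each projection type to a label. In the real builder each branch
  // would create the matching ProjectionSet instead.
  static String describe(ProjType projType) {
    switch (projType) {
      case ALL:
        return "wildcard";
      case NONE:
        return "none";
      case SOME:
        return "explicit";
      default:
        throw unexpected(projType);
    }
  }

  public static void main(String[] args) {
    System.out.println(describe(ProjType.SOME));
    System.out.println(unexpected(ProjType.ALL).getMessage());
  }
}
```

Building the exception in one helper would also let `ProjectionSetFactory.wrap`, which has the same bare `toString()` message, reuse the wording.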
 


[jira] [Commented] (DRILL-7278) Refactor result set loader projection mechanism

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854860#comment-16854860
 ] 

ASF GitHub Bot commented on DRILL-7278:
---

arina-ielchiieva commented on pull request #1797: DRILL-7278: Refactor result 
set loader projection mechanism
URL: https://github.com/apache/drill/pull/1797#discussion_r289946314
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/project/RequestedTupleImpl.java
 ##
 @@ -74,6 +75,7 @@
 public class RequestedTupleImpl implements RequestedTuple {
 
   private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RequestedTupleImpl.class);
+  private static final Collection PROJECT_ALL = ImmutableList.of(SchemaPath.STAR_COLUMN);
 
 Review comment:
   Consider using built-in Java methods instead of Guava when possible: 
`Collections.singletonList`.
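
A minimal sketch of the JDK-only alternative. The `STAR_COLUMN` string here is a hypothetical stand-in for `SchemaPath.STAR_COLUMN`, which in Drill is a `SchemaPath` rather than a `String`.

```java
import java.util.Collection;
import java.util.Collections;

final class ProjectAllSketch {

  // Hypothetical stand-in for SchemaPath.STAR_COLUMN.
  static final String STAR_COLUMN = "*";

  // Immutable one-element list from the JDK; no Guava dependency needed.
  static final Collection<String> PROJECT_ALL =
      Collections.singletonList(STAR_COLUMN);

  public static void main(String[] args) {
    System.out.println(PROJECT_ALL);
  }
}
```

Like `ImmutableList.of`, the list returned by `Collections.singletonList` rejects modification, so it is safe to share as a constant.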
 


[jira] [Commented] (DRILL-7278) Refactor result set loader projection mechanism

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854849#comment-16854849
 ] 

ASF GitHub Bot commented on DRILL-7278:
---

arina-ielchiieva commented on pull request #1797: DRILL-7278: Refactor result 
set loader projection mechanism
URL: https://github.com/apache/drill/pull/1797#discussion_r289943217
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/ProjectionSet.java
 ##
 @@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.rowSet;
+
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.exec.physical.rowSet.project.ProjectionType;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.vector.accessor.convert.ColumnConversionFactory;
+import org.apache.drill.shaded.guava.com.google.common.annotations.VisibleForTesting;
+
+/**
+ * Provides a dynamic, run-time view of a projection set. Used by
+ * the result set loader to:
+ * 
+ * Determine if a column is projected according to some
+ * defined projection schema (see implementation for details.)
+ * Provide type conversions, either using built-in implicit
+ * conversions, or a custom conversion. Type conversions require
+ * the reader column and a "provided" column that gives the "to"
+ * type for the conversion. Without the "to" column, the reader
+ * column type is used as-is.
+ * Verify that the (possibly converted) type and mode are
+ * compatible with an explicit projection item. For example, if
+ * the query has `a.b`, but `a` is scalar, then there is an
+ * inconsistency.
+ * 
+ * 
+ * This interface filters columns added dynamically
+ * at scan time. The reader may offer a column (as to add a column
+ * writer for the column.) The projection mechanism says whether to
+ * materialize the column, or whether to ignore the column and
+ * return a dummy column writer.
+ * 
+ * The Project All must handle several additional nuances:
+ * 
+ * External schema: If an external schema is provided, then that
+ * schema may be "strict" which causes the wildcard to expand to the
+ * set of columns defined within the schema. When used with columns
+ * added dynamically, a column may be excluded from the projection
+ * set if it is not part of the defined external schema.
+ * Metadata filtering: A reader may offer a special column which
+ * is available only in explicit projection, and behaves like Drill's
+ * implicit file columns. Such columns are not included in a "project
+ * all" projection.
+ * 
+ * At present, only the top-level row supports these additional filtering
+ * options; they are not supported on mays (though could be with additional
 
 Review comment:
   ```suggestion
* options; they are not supported on maps (though could be with additional
   ```
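
The filtering behavior described in the Javadoc above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not Drill's implementation: the class, the `ProjType` enum, and the `excludeFromWildcard` flag are hypothetical, names are matched exactly, and strict external schemas and map members are left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class ProjectionFilterSketch {

  // Hypothetical stand-in for the ALL / NONE / SOME projection types.
  enum ProjType { ALL, NONE, SOME }

  private final ProjType type;
  private final Set<String> requested;

  ProjectionFilterSketch(ProjType type, String... requestedColumns) {
    this.type = type;
    this.requested = new HashSet<>(Arrays.asList(requestedColumns));
  }

  // Decides whether a column offered by the reader should be materialized.
  // A false result means the reader would get a dummy writer that discards values.
  boolean isProjected(String column, boolean excludeFromWildcard) {
    switch (type) {
      case ALL:
        // Wildcard: everything except columns marked as explicit-only.
        return !excludeFromWildcard;
      case NONE:
        return false;
      case SOME:
        // Explicit projection: special columns are available when named.
        return requested.contains(column);
      default:
        throw new IllegalStateException("Unexpected projection type: " + type);
    }
  }

  public static void main(String[] args) {
    ProjectionFilterSketch all = new ProjectionFilterSketch(ProjType.ALL);
    ProjectionFilterSketch some = new ProjectionFilterSketch(ProjType.SOME, "a", "meta");
    System.out.println(all.isProjected("meta", true));
    System.out.println(some.isProjected("meta", true));
  }
}
```

Here `meta` stands for a hypothetical special column: it is skipped under the wildcard but materialized when the query names it.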
 


[jira] [Commented] (DRILL-7278) Refactor result set loader projection mechanism

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854851#comment-16854851
 ] 

ASF GitHub Bot commented on DRILL-7278:
---

arina-ielchiieva commented on pull request #1797: DRILL-7278: Refactor result 
set loader projection mechanism
URL: https://github.com/apache/drill/pull/1797#discussion_r289939372
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/project/projSet/ProjectionSetFactory.java
 ##
 @@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.project.projSet;
+
+import java.util.List;
+import java.util.Map;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.impl.scan.project.projSet.TypeConverter.CustomTypeTransform;
+import org.apache.drill.exec.physical.rowSet.ProjectionSet;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTuple;
+import org.apache.drill.exec.physical.rowSet.project.RequestedTupleImpl;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.vector.accessor.convert.ColumnConversionFactory;
+import org.apache.drill.exec.vector.accessor.convert.StandardConversions.ConversionDefn;
+
+public class ProjectionSetFactory {
+
+  public static ProjectionSet projectAll() { return new WildcardProjectionSet(null); }
+
+  public static ProjectionSet projectNone() { return EmptyProjectionSet.PROJECT_NONE; }
+
+  public static ProjectionSet wrap(RequestedTuple mapProjection) {
+switch (mapProjection.type()) {
+case ALL:
+  return projectAll();
+case NONE:
+  return projectNone();
+case SOME:
+  return new ExplicitProjectionSet(mapProjection, null);
+default:
+  throw new IllegalStateException(mapProjection.type().toString());
+}
+  }
+
+  public static ProjectionSet build(List selection) {
+if (selection == null) {
+  return projectAll();
+}
+return wrap(RequestedTupleImpl.parse(selection));
+  }
+
+  public static CustomTypeTransform simpleTransform(ColumnConversionFactory colFactory) {
+return new CustomTypeTransform() {
 
 Review comment:
   Maybe extract it into `private static final CustomTypeTransform 
SIMPLE_TRANSFORM = ...`?
 


[jira] [Commented] (DRILL-7278) Refactor result set loader projection mechanism

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854859#comment-16854859
 ] 

ASF GitHub Bot commented on DRILL-7278:
---

arina-ielchiieva commented on pull request #1797: DRILL-7278: Refactor result 
set loader projection mechanism
URL: https://github.com/apache/drill/pull/1797#discussion_r289948150
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/ColumnMetadata.java
 ##
 @@ -51,19 +51,20 @@
*/
   String FORMAT_PROP = DRILL_PROP_PREFIX + "format";
 
-  /**
-   * Indicates if the column is projected. Used only for internal
-   * reader-provided schemas.
-   */
-  String PROJECTED_PROP = DRILL_PROP_PREFIX + "projected";
-
   /**
* Indicates how to handle blanks. Must be one of the valid values defined
* in AbstractConvertFromString. Normally set on the converter by the plugin
* rather than by the user in the schema.
*/
   String BLANK_AS_PROP = DRILL_PROP_PREFIX + "blank-as";
 
+  /**
+   * Indicates whether to project the column in a wildcard (*) query.
+   * Special columns may be excluded from projection. Certain "special"
+   * columns may be available only when explicitly requested.
+   */
+  String EXCLUDE_FROM_WILDCARD = DRILL_PROP_PREFIX + "special";
 
 Review comment:
   Could you please provide an example from the log reader? I did not quite 
understand from the description.
 


[jira] [Commented] (DRILL-7279) Support provided schema for CSV without headers

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854785#comment-16854785
 ] 

ASF GitHub Bot commented on DRILL-7279:
---

arina-ielchiieva commented on pull request #1798: DRILL-7279: Enable provided 
schema for text files without headers
URL: https://github.com/apache/drill/pull/1798#discussion_r289931276
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/TupleMetadata.java
 ##
 @@ -46,6 +46,10 @@
 public interface TupleMetadata extends Propertied, Iterable {
 
   public static final String IS_STRICT_SCHEMA_PROP = DRILL_PROP_PREFIX + "strict";
+  public static final String HAS_HEADERS_PROP = DRILL_PROP_PREFIX + "headers";
+  public static final String SKIP_FIRST_LINE_PROP = DRILL_PROP_PREFIX + "skipFirstLine";
+  public static final String DELIMITER_PROP = DRILL_PROP_PREFIX + "delimiter";
+  public static final String COMMENT_CHAR_PROP = DRILL_PROP_PREFIX + "commentChar";
 
 Review comment:
   Why do we have only four properties? Since we are adding this support, it 
would be fair to add all text properties; there are not that many of them. 
I believe it would be confusing for the user if only some of them were 
supported.
 



> Support provided schema for CSV without headers
> ---
>
> Key: DRILL-7279
> URL: https://issues.apache.org/jira/browse/DRILL-7279
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.17.0
>
>
> Extend the Drill 1.16 provided schema support for the text reader to allow a 
> provided schema for files without headers. Behavior:
> * If the file is configured to not extract headers, and a schema is provided, 
> and the schema has at least one column, then use the provided schema to 
> create individual columns. Otherwise, continue to use {{columns}} as in 
> previous versions.
> * The columns in the schema are assumed to match left-to-right with those in 
> the file.
> * If the schema contains more columns than the file, the extra columns take 
> their default values. (This occurs in schema evolution when a column is added 
> to newer files.)
> * If the file contains more columns than the schema, then the extra columns, 
> at the end of the line, are ignored. This is the same behavior as occurs if 
> the file contains headers.
> h4. Table Properties
> Also adds four table properties for text files. These properties, if present, 
> override those defined in the format plugin configuration. The properties 
> allow the user to have a single "csv" config, but to have many tables with 
> the "csv" suffix, each with different properties. That is, the user need not 
> define a new plugin config, and define a new extension, just to change a file 
> format property. With this system, the user can have a ".csv" file with 
> headers; the user need not define a different suffix (usually ".csvh" in 
> Drill) for this case.
> || Table Property || Equivalent Plugin Config Property ||
> | {{drill.headers}} | {{extractHeader}} |
> | {{drill.skipFirstLine}} |  {{skipFirstLine}} | 
> | {{drill.delimiter}} |  {{fieldDelimiter}} | 
> |  {{drill.commentChar}} |  {{comment}}| 
> For each, the rules are:
> * If the table property is not set, then the plugin property is used.
> * If the table property is set, then the property value replaces the plugin 
> property value for that one specific table.
> * For the delimiter, if the property value is an empty string, then this is 
> the same as an unset property.
> * For the comment, if the property value is an empty string, then the comment 
> is set to the ASCII NULL, which will never match. This effectively turns off 
> the comment feature for this one table.
> * If the delimiter or comment value is longer than a single character, only 
> the first character is used.
> It is possible to use the table properties without specifying a "provided" 
> schema. Just omit any columns from the schema:
> {noformat}
> create schema () for table `dfs.data`.`example`
> PROPERTIES ('drill.headers'='false', 'drill.skipFirstLine'='false', 
> 'drill.delimiter'='|')
> {noformat}
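
The override rules listed in the description above can be sketched as follows. This is a minimal illustration, not the PR's code: the class and method names are hypothetical, a `null` string stands for an unset table property, and parsing flags with `Boolean.parseBoolean` is an assumption.

```java
final class TextPropertySketch {

  // Delimiter: an unset or empty table property falls back to the plugin
  // value; otherwise only the first character is used.
  static char resolveDelimiter(String tableValue, char pluginValue) {
    if (tableValue == null || tableValue.isEmpty()) {
      return pluginValue;
    }
    return tableValue.charAt(0);
  }

  // Comment: unset falls back to the plugin value; an empty string maps to
  // ASCII NUL, which never matches and so disables comments for the table.
  static char resolveComment(String tableValue, char pluginValue) {
    if (tableValue == null) {
      return pluginValue;
    }
    if (tableValue.isEmpty()) {
      return '\0';
    }
    return tableValue.charAt(0);
  }

  // Boolean properties such as drill.headers and drill.skipFirstLine:
  // the table value, when set, replaces the plugin value.
  static boolean resolveFlag(String tableValue, boolean pluginValue) {
    return tableValue == null ? pluginValue : Boolean.parseBoolean(tableValue);
  }

  public static void main(String[] args) {
    System.out.println(resolveDelimiter("|", ','));
    System.out.println(resolveFlag("false", true));
  }
}
```

With the `create schema` example above, `resolveDelimiter("|", ',')` would yield `|` for that one table while other `.csv` tables keep the plugin's comma.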



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-7279) Support provided schema for CSV without headers

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854788#comment-16854788
 ] 

ASF GitHub Bot commented on DRILL-7279:
---

arina-ielchiieva commented on pull request #1798: DRILL-7279: Enable provided 
schema for text files without headers
URL: https://github.com/apache/drill/pull/1798#discussion_r289930608
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/TupleMetadata.java
 ##
 @@ -46,6 +46,10 @@
 public interface TupleMetadata extends Propertied, Iterable {
 
   public static final String IS_STRICT_SCHEMA_PROP = DRILL_PROP_PREFIX + "strict";
+  public static final String HAS_HEADERS_PROP = DRILL_PROP_PREFIX + "headers";
+  public static final String SKIP_FIRST_LINE_PROP = DRILL_PROP_PREFIX + "skipFirstLine";
+  public static final String DELIMITER_PROP = DRILL_PROP_PREFIX + "delimiter";
 
 Review comment:
   Can we pass special symbols in the delimiter property? For example, how can 
a new line be passed?
   Will `'drill.delimiter' = '\n'` work, or do we need to escape it?
 


[jira] [Commented] (DRILL-7279) Support provided schema for CSV without headers

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854786#comment-16854786
 ] 

ASF GitHub Bot commented on DRILL-7279:
---

arina-ielchiieva commented on pull request #1798: DRILL-7279: Enable provided 
schema for text files without headers
URL: https://github.com/apache/drill/pull/1798#discussion_r289925557
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/v3/CompliantTextBatchReader.java
 ##
 @@ -122,10 +122,12 @@ public boolean open(ColumnsSchemaNegotiator schemaNegotiator) {
 
   /**
* Extract header and use that to define the reader schema.
+   * @param strings
 
 Review comment:
   Same here.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support provided schema for CSV without headers
> ---
>
> Key: DRILL-7279
> URL: https://issues.apache.org/jira/browse/DRILL-7279
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.17.0
>
>
> Extend the Drill 1.16 provided schema support for the text reader to allow a 
> provided schema for files without headers. Behavior:
> * If the file is configured to not extract headers, and a schema is provided, 
> and the schema has at least one column, then use the provided schema to 
> create individual columns. Otherwise, continue to use {{columns}} as in 
> previous versions.
> * The columns in the schema are assumed to match left-to-right with those in 
> the file.
> * If the schema contains more columns than the file, the extra columns take 
> their default values. (This occurs in schema evolution when a column is added 
> to newer files.)
> * If the file contains more columns than the schema, then the extra columns, 
> at the end of the line, are ignored. This is the same behavior as occurs if 
> the file contains headers.
> h4. Table Properties
> Also adds four table properties for text files. These properties, if present, 
> override those defined in the format plugin configuration. The properties 
> allow the user to have a single "csv" config, but to have many tables with 
> the "csv" suffix, each with different properties. That is, the user need not 
> define a new plugin config, and define a new extension, just to change a file 
> format property. With this system, the user can have a ".csv" file with 
> headers; the user need not define a different suffix (usually ".csvh" in 
> Drill) for this case.
> || Table Property || Equivalent Plugin Config Property ||
> | {{drill.headers}} | {{extractHeader}} |
> | {{drill.skipFirstLine}} |  {{skipFirstLine}} | 
> | {{drill.delimiter}} |  {{fieldDelimiter}} | 
> |  {{drill.commentChar}} |  {{comment}}| 
> For each, the rules are:
> * If the table property is not set, then the plugin property is used.
> * If the table property is set, then the property value replaces the plugin 
> property value for that one specific table.
> * For the delimiter, if the property value is an empty string, then this is 
> the same as an unset property.
> * For the comment, if the property value is an empty string, then the comment 
> is set to the ASCII NULL, which will never match. This effectively turns off 
> the comment feature for this one table.
> * If the delimiter or comment value is longer than a single character, only 
> the first character is used.
> It is possible to use the table properties without specifying a "provided" 
> schema. Just omit any columns from the schema:
> {noformat}
> create schema () for table `dfs.data`.`example`
> PROPERTIES ('drill.headers'='false', 'drill.skipFirstLine'='false', 
> 'drill.delimiter'='|')
> {noformat}
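
The column-matching and table-property rules in the description above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Drill's implementation: the class name, the method names and the idea of passing per-column defaults as a string array are all hypothetical, chosen only to make the rules concrete.

```java
import java.util.Arrays;

// Illustrative sketch only; none of these names come from Drill.
class TablePropertySketch {

  // ASCII NUL: a comment character that will never match real input.
  static final char NEVER_MATCHES = '\0';

  // Match file fields to schema columns left to right. Schema columns with
  // no matching field take their default; extra trailing fields are ignored.
  static String[] alignRow(String[] fields, String[] defaults) {
    String[] row = new String[defaults.length];
    for (int i = 0; i < defaults.length; i++) {
      row[i] = i < fields.length ? fields[i] : defaults[i];
    }
    return row;
  }

  // Delimiter: an unset (null) or empty table property means "use the plugin
  // value"; otherwise only the first character of the property is used.
  static char resolveDelimiter(String tableProp, char pluginValue) {
    if (tableProp == null || tableProp.isEmpty()) {
      return pluginValue;
    }
    return tableProp.charAt(0);
  }

  // Comment: unset means "use the plugin value"; an empty string turns the
  // comment feature off; otherwise only the first character is used.
  static char resolveComment(String tableProp, char pluginValue) {
    if (tableProp == null) {
      return pluginValue;
    }
    if (tableProp.isEmpty()) {
      return NEVER_MATCHES;
    }
    return tableProp.charAt(0);
  }

  public static void main(String[] args) {
    String[] defaults = {"", "", "n/a"};
    // Fewer fields than schema columns: the third column takes its default.
    System.out.println(Arrays.toString(alignRow(new String[] {"a", "b"}, defaults)));
    // More fields than schema columns: the trailing field is dropped.
    System.out.println(Arrays.toString(alignRow(new String[] {"a", "b", "c", "d"}, defaults)));
    System.out.println(resolveDelimiter("||", ','));
    System.out.println(resolveComment("", '#') == NEVER_MATCHES);
  }
}
```

The delimiter and comment properties are kept as separate methods because the issue gives the empty string a different meaning for each: "unset" for the delimiter, "disabled" for the comment.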



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-7279) Support provided schema for CSV without headers

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854783#comment-16854783
 ] 

ASF GitHub Bot commented on DRILL-7279:
---

arina-ielchiieva commented on pull request #1798: DRILL-7279: Enable provided 
schema for text files without headers
URL: https://github.com/apache/drill/pull/1798#discussion_r289924885
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java
 ##
 @@ -278,6 +277,7 @@ public CloseableRecordBatch buildScan(
  * rules for projection. It handles "early" or "late" schema readers. A 
typical
  * framework builds on standardized frameworks for files in general or text
  * files in particular.
+ * @param options
 
 Review comment:
   Please add a description to avoid an IDE warning.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-7279) Support provided schema for CSV without headers

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854787#comment-16854787
 ] 

ASF GitHub Bot commented on DRILL-7279:
---

arina-ielchiieva commented on pull request #1798: DRILL-7279: Enable provided 
schema for text files without headers
URL: https://github.com/apache/drill/pull/1798#discussion_r289927887
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/TupleMetadata.java
 ##
 @@ -46,6 +46,10 @@
 public interface TupleMetadata extends Propertied, Iterable<ColumnMetadata> {
 
   public static final String IS_STRICT_SCHEMA_PROP = DRILL_PROP_PREFIX + 
"strict";
+  public static final String HAS_HEADERS_PROP = DRILL_PROP_PREFIX + "headers";
 
 Review comment:
   I am not sure this should be placed here: these properties relate mostly to 
text readers, unlike `IS_STRICT_SCHEMA_PROP` or the `TupleMetadata` class 
itself. Maybe it's better to move these properties to a separate class that 
clearly indicates they apply to text readers only. 
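
One possible shape for such a separate class is sketched below. It is only an assumption about what the suggestion could look like: the class name and every constant except `HAS_HEADERS_PROP` are hypothetical, and the `"drill."` prefix is inferred from the property names listed in the issue description rather than taken from the code under review.

```java
// Hypothetical holder for table properties that apply to text readers only.
final class TextReaderProperties {

  // Assumed value, inferred from property names such as drill.headers.
  static final String DRILL_PROP_PREFIX = "drill.";

  // Name taken from the diff under review.
  static final String HAS_HEADERS_PROP = DRILL_PROP_PREFIX + "headers";

  // Hypothetical constant names for the other properties in the issue.
  static final String SKIP_FIRST_LINE_PROP = DRILL_PROP_PREFIX + "skipFirstLine";
  static final String DELIMITER_PROP = DRILL_PROP_PREFIX + "delimiter";
  static final String COMMENT_CHAR_PROP = DRILL_PROP_PREFIX + "commentChar";

  // Constants only; no instances.
  private TextReaderProperties() { }
}
```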
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-7279) Support provided schema for CSV without headers

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854784#comment-16854784
 ] 

ASF GitHub Bot commented on DRILL-7279:
---

arina-ielchiieva commented on pull request #1798: DRILL-7279: Enable provided 
schema for text files without headers
URL: https://github.com/apache/drill/pull/1798#discussion_r289932538
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/v3/TextParsingSettingsV3.java
 ##
 @@ -17,45 +17,123 @@
  */
 package org.apache.drill.exec.store.easy.text.compliant.v3;
 
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
 import org.apache.drill.exec.store.easy.text.TextFormatPlugin.TextFormatConfig;
-
 import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
 
 // TODO: Remove the "V3" suffix once the V2 version is retired.
 public class TextParsingSettingsV3 {
 
-  public static final TextParsingSettingsV3 DEFAULT = new 
TextParsingSettingsV3();
-
-  private String emptyValue = null;
-  private boolean parseUnescapedQuotes = true;
-  private byte quote = b('"');
-  private byte quoteEscape = b('"');
-  private byte delimiter = b(',');
-  private byte comment = b('#');
-
-  private long maxCharsPerColumn = Character.MAX_VALUE;
-  private byte normalizedNewLine = b('\n');
-  private byte[] newLineDelimiter = {normalizedNewLine};
-  private boolean ignoreLeadingWhitespaces;
-  private boolean ignoreTrailingWhitespaces;
-  private String lineSeparatorString = "\n";
+  private final String emptyValue = null;
+  private final boolean parseUnescapedQuotes = true;
+  private final byte quote;
+  private final byte quoteEscape;
+  private final byte delimiter;
+  private final byte comment;
+
+  private final long maxCharsPerColumn = Character.MAX_VALUE;
+  private final byte normalizedNewLine = b('\n');
+  private final byte[] newLineDelimiter;
+  private final boolean ignoreLeadingWhitespaces = false;
+  private final boolean ignoreTrailingWhitespaces = false;
+  private final String lineSeparatorString = "\n";
   private boolean skipFirstLine;
 
-  private boolean headerExtractionEnabled;
-  private boolean useRepeatedVarChar = true;
-
-  public void set(TextFormatConfig config){
-this.quote = bSafe(config.getQuote(), "quote");
-this.quoteEscape = bSafe(config.getEscape(), "escape");
-this.newLineDelimiter = config.getLineDelimiter().getBytes(Charsets.UTF_8);
-this.delimiter = bSafe(config.getFieldDelimiter(), "fieldDelimiter");
-this.comment = bSafe(config.getComment(), "comment");
-this.skipFirstLine = config.isSkipFirstLine();
-this.headerExtractionEnabled = config.isHeaderExtractionEnabled();
-if (this.headerExtractionEnabled) {
-  // In case of header TextRecordReader will use set of VarChar vectors vs 
RepeatedVarChar
-  this.useRepeatedVarChar = false;
+  private final boolean headerExtractionEnabled;
+  private final boolean useRepeatedVarChar;
+  private final String providedHeaders[];
+
+  /**
+   * Configure the properties for this one scan based on:
+   * <p>
+   * <ul>
+   * <li>The defaults in the plugin config (if properties not defined
+   * in the config JSON).</li>
+   * <li>The config values from the config JSON as stored in the
+   * plugin config.</li>
+   * <li>Table function settings expressed in the query (and passed
+   * in as part of the plugin config).</li>
+   * <li>Table properties.</li>
+   * </ul>
+   * <p>
+   * The implementation does not use system/session properties, but
 
 Review comment:
   Not sure of the value of this comment.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (DRILL-7097) Rename MapVector to StructVector

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854766#comment-16854766
 ] 

ASF GitHub Bot commented on DRILL-7097:
---

ihuzenko commented on issue #1803: DRILL-7097: Rename MapVector to StructVector
URL: https://github.com/apache/drill/pull/1803#issuecomment-498325374
 
 
   Closed due to discussion 
https://lists.apache.org/thread.html/5773447b82c9d6e508a62f66354613b812493cbb8c0c1cc463ccdd9f@%3Cdev.drill.apache.org%3E
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Rename MapVector to StructVector
> 
>
> Key: DRILL-7097
> URL: https://issues.apache.org/jira/browse/DRILL-7097
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>
> Drill's MapVector has long been better suited to representing Struct data, 
> and Apache Arrow has already renamed its equivalent to StructVector. To 
> align our code with Arrow and make room for the planned implementation of a 
> canonical Map (DRILL-7096), we need to rename the existing MapVector and all 
> related classes. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-7097) Rename MapVector to StructVector

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854765#comment-16854765
 ] 

ASF GitHub Bot commented on DRILL-7097:
---

ihuzenko commented on pull request #1803: DRILL-7097: Rename MapVector to 
StructVector
URL: https://github.com/apache/drill/pull/1803
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (DRILL-7236) SqlLine 1.8 upgrade

2019-06-03 Thread Volodymyr Vysotskyi (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-7236:
---
Labels: ready-to-commit  (was: )

> SqlLine 1.8 upgrade
> ---
>
> Key: DRILL-7236
> URL: https://issues.apache.org/jira/browse/DRILL-7236
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> SqlLine 1.8 upgrade



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-7236) SqlLine 1.8 upgrade

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854723#comment-16854723
 ] 

ASF GitHub Bot commented on DRILL-7236:
---

arina-ielchiieva commented on pull request #1804: DRILL-7236: SqlLine 1.8 
upgrade
URL: https://github.com/apache/drill/pull/1804
 
 
   Jira - [DRILL-7236](https://issues.apache.org/jira/browse/DRILL-7236).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> SqlLine 1.8 upgrade
> ---
>
> Key: DRILL-7236
> URL: https://issues.apache.org/jira/browse/DRILL-7236
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>
> SqlLine 1.8 upgrade



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Closed] (DRILL-7097) Rename MapVector to StructVector

2019-06-03 Thread Igor Guzenko (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko closed DRILL-7097.
---
Resolution: Abandoned

Abandoned according to discussion 
[https://lists.apache.org/thread.html/5773447b82c9d6e508a62f66354613b812493cbb8c0c1cc463ccdd9f@%3Cdev.drill.apache.org%3E]
 .




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (DRILL-7251) Read Hive array w/o nulls

2019-06-03 Thread Arina Ielchiieva (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7251:

Fix Version/s: 1.17.0

> Read Hive array w/o nulls
> -
>
> Key: DRILL-7251
> URL: https://issues.apache.org/jira/browse/DRILL-7251
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Hive
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (DRILL-7236) SqlLine 1.8 upgrade

2019-06-03 Thread Arina Ielchiieva (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7236:

Reviewer: Volodymyr Vysotskyi

> SqlLine 1.8 upgrade
> ---
>
> Key: DRILL-7236
> URL: https://issues.apache.org/jira/browse/DRILL-7236
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>
> SqlLine 1.8 upgrade



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (DRILL-7251) Read Hive array w/o nulls

2019-06-03 Thread Arina Ielchiieva (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7251:

Reviewer: Volodymyr Vysotskyi

> Read Hive array w/o nulls
> -
>
> Key: DRILL-7251
> URL: https://issues.apache.org/jira/browse/DRILL-7251
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Hive
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: ready-to-commit
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (DRILL-7251) Read Hive array w/o nulls

2019-06-03 Thread Volodymyr Vysotskyi (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-7251:
---
Labels: ready-to-commit  (was: )

> Read Hive array w/o nulls
> -
>
> Key: DRILL-7251
> URL: https://issues.apache.org/jira/browse/DRILL-7251
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Hive
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: ready-to-commit
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-7169) Rename drill-root ArtifactID to apache-drill

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854401#comment-16854401
 ] 

ASF GitHub Bot commented on DRILL-7169:
---

vvysotskyi commented on issue #1746: DRILL-7169: Rename drill-root ArtifactID 
to apache-drill
URL: https://github.com/apache/drill/pull/1746#issuecomment-498175661
 
 
   Currently, there is no consensus on this change. Here is the mail thread with 
the discussion: 
https://lists.apache.org/thread.html/99a2098ae04d56dd5994288b5fc05780d7de09a2f4a5ffbc5ce39cde@%3Cdev.drill.apache.org%3E
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Rename drill-root ArtifactID to apache-drill
> 
>
> Key: DRILL-7169
> URL: https://issues.apache.org/jira/browse/DRILL-7169
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Affects Versions: 1.15.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: Future
>
>
> Rename the {{drill-root}} root POM ArtifactID to {{apache-drill}}, see:
> {{[https://github.com/apache/drill/blob/master/pom.xml#L32]}}
> Most Apache projects use the short project name as the artifactId.
> Renaming it to {{apache-drill}} allows it to be used as a variable in the 
> Drill build process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (DRILL-7261) Simplify Easy format config for new scan framework

2019-06-03 Thread Arina Ielchiieva (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7261:

Labels: ready-to-commit  (was: )

> Simplify Easy format config for new scan framework
> --
>
> Key: DRILL-7261
> URL: https://issues.apache.org/jira/browse/DRILL-7261
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> Rollup of related CSV V3 fixes along with supporting row set framework fixes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (DRILL-7258) [Text V3 Reader] Unsupported operation error is thrown when select a column with a long string

2019-06-03 Thread Arina Ielchiieva (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7258:

Labels: ready-to-commit  (was: arina)

> [Text V3 Reader] Unsupported operation error is thrown when select a column 
> with a long string
> --
>
> Key: DRILL-7258
> URL: https://issues.apache.org/jira/browse/DRILL-7258
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Anton Gozhiy
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
> Attachments: 10.tbl
>
>
> *Data:*
> 10.tbl is attached
> *Steps:*
> # Set exec.storage.enable_v3_text_reader=true
> # Run the following query:
> {code:sql}
> select * from dfs.`/tmp/drill/data/10.tbl`
> {code}
> *Expected result:*
> The query should return results normally.
> *Actual result:*
> Exception is thrown:
> {noformat}
> UNSUPPORTED_OPERATION ERROR: Drill Remote Exception
>   (java.lang.Exception) UNSUPPORTED_OPERATION ERROR: Text column is too large.
> Column 0
> Limit 65536
> Fragment 0:0
> [Error Id: 5f73232f-f0c0-48aa-ab0f-b5f86495d3c8 on userf87d-pc:31010]
> org.apache.drill.common.exceptions.UserException$Builder.build():630
> 
> org.apache.drill.exec.store.easy.text.compliant.v3.BaseFieldOutput.append():131
> 
> org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseValueAll():208
> 
> org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseValue():225
> 
> org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseField():341
> 
> org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseRecord():137
> 
> org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseNext():388
> 
> org.apache.drill.exec.store.easy.text.compliant.v3.CompliantTextBatchReader.next():220
> 
> org.apache.drill.exec.physical.impl.scan.framework.ShimBatchReader.next():132
> org.apache.drill.exec.physical.impl.scan.ReaderState.readBatch():397
> org.apache.drill.exec.physical.impl.scan.ReaderState.next():354
> org.apache.drill.exec.physical.impl.scan.ScanOperatorExec.nextAction():184
> org.apache.drill.exec.physical.impl.scan.ScanOperatorExec.next():159
> org.apache.drill.exec.physical.impl.protocol.OperatorDriver.doNext():176
> org.apache.drill.exec.physical.impl.protocol.OperatorDriver.next():114
> 
> org.apache.drill.exec.physical.impl.protocol.OperatorRecordBatch.next():147
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
> ...():0
> org.apache.hadoop.security.UserGroupInformation.doAs():1746
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
> org.apache.drill.common.SelfCleaningRunnable.run():38
> ...():0
> {noformat}
> *Note:* The query works fine with the v2 reader. 
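
For readers unfamiliar with this failure mode, the sketch below shows how a fixed per-field width limit can produce an error of this kind. It is a minimal illustration, not Drill's `BaseFieldOutput`: the class, its fields and the exception type are hypothetical, and only the limit value 65536 and the message wording are taken from the report above.

```java
// Illustrative only: a field buffer that rejects values past a fixed width.
class BoundedFieldBuffer {

  // Limit value as reported in the error message in the issue.
  static final int MAX_FIELD_LENGTH = 65536;

  private final byte[] buffer = new byte[MAX_FIELD_LENGTH];
  private int length;

  // Append one byte of the current field, failing once the limit is reached.
  void append(byte b) {
    if (length >= MAX_FIELD_LENGTH) {
      throw new UnsupportedOperationException(
          "Text column is too large. Limit " + MAX_FIELD_LENGTH);
    }
    buffer[length++] = b;
  }

  // Number of bytes accepted so far.
  int length() {
    return length;
  }
}
```

With a hard cap like this, any field longer than the limit fails the whole query, which matches the symptom in the report.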



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-7258) [Text V3 Reader] Unsupported operation error is thrown when select a column with a long string

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854326#comment-16854326
 ] 

ASF GitHub Bot commented on DRILL-7258:
---

arina-ielchiieva commented on issue #1802: DRILL-7258: Remove field width limit 
for text reader
URL: https://github.com/apache/drill/pull/1802#issuecomment-498153110
 
 
   +1
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Text V3 Reader] Unsupported operation error is thrown when select a column 
> with a long string
> --
>
> Key: DRILL-7258
> URL: https://issues.apache.org/jira/browse/DRILL-7258
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Anton Gozhiy
>Assignee: Paul Rogers
>Priority: Major
>  Labels: arina
> Fix For: 1.17.0
>
> Attachments: 10.tbl
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (DRILL-7258) [Text V3 Reader] Unsupported operation error is thrown when select a column with a long string

2019-06-03 Thread Arina Ielchiieva (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7258:

Reviewer: Arina Ielchiieva

> [Text V3 Reader] Unsupported operation error is thrown when select a column 
> with a long string
> --
>
> Key: DRILL-7258
> URL: https://issues.apache.org/jira/browse/DRILL-7258
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.16.0
> Reporter: Anton Gozhiy
> Assignee: Paul Rogers
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.17.0
>
> Attachments: 10.tbl
>
>
> *Data:*
> 10.tbl is attached
> *Steps:*
> # Set exec.storage.enable_v3_text_reader=true
> # Run the following query:
> {code:sql}
> select * from dfs.`/tmp/drill/data/10.tbl`
> {code}
> *Expected result:*
> The query should return results normally.
> *Actual result:*
> Exception is thrown:
> {noformat}
> UNSUPPORTED_OPERATION ERROR: Drill Remote Exception
>   (java.lang.Exception) UNSUPPORTED_OPERATION ERROR: Text column is too large.
> Column 0
> Limit 65536
> Fragment 0:0
> [Error Id: 5f73232f-f0c0-48aa-ab0f-b5f86495d3c8 on userf87d-pc:31010]
> org.apache.drill.common.exceptions.UserException$Builder.build():630
> org.apache.drill.exec.store.easy.text.compliant.v3.BaseFieldOutput.append():131
> org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseValueAll():208
> org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseValue():225
> org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseField():341
> org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseRecord():137
> org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseNext():388
> org.apache.drill.exec.store.easy.text.compliant.v3.CompliantTextBatchReader.next():220
> org.apache.drill.exec.physical.impl.scan.framework.ShimBatchReader.next():132
> org.apache.drill.exec.physical.impl.scan.ReaderState.readBatch():397
> org.apache.drill.exec.physical.impl.scan.ReaderState.next():354
> org.apache.drill.exec.physical.impl.scan.ScanOperatorExec.nextAction():184
> org.apache.drill.exec.physical.impl.scan.ScanOperatorExec.next():159
> org.apache.drill.exec.physical.impl.protocol.OperatorDriver.doNext():176
> org.apache.drill.exec.physical.impl.protocol.OperatorDriver.next():114
> org.apache.drill.exec.physical.impl.protocol.OperatorRecordBatch.next():147
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
> ...():0
> org.apache.hadoop.security.UserGroupInformation.doAs():1746
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
> org.apache.drill.common.SelfCleaningRunnable.run():38
> ...():0
> {noformat}
> *Note:* works fine with v2 reader. 
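
The failure reported above can be illustrated with a minimal sketch of a per-field size check of the kind the error message implies. This is not Drill's `BaseFieldOutput`: the class `BoundedFieldBuffer`, its `fits` helper, and the use of `UnsupportedOperationException` are hypothetical names chosen for illustration, and whether the real limit is inclusive is an assumption. Only the 65536 figure comes from the error text.

```java
/** Hypothetical stand-in for a per-field output buffer; not Drill's actual class. */
class BoundedFieldBuffer {
  // Limit taken from the error message in the report ("Limit 65536").
  static final int MAX_FIELD_LENGTH = 65536;

  private final byte[] buffer = new byte[MAX_FIELD_LENGTH];
  private int length;

  /** Appends one byte, failing once the field would exceed the limit. */
  void append(byte b) {
    if (length >= MAX_FIELD_LENGTH) {
      // Assumption: the real reader raises a user-facing UNSUPPORTED_OPERATION error here.
      throw new UnsupportedOperationException(
          "Text column is too large. Limit " + MAX_FIELD_LENGTH);
    }
    buffer[length++] = b;
  }

  /** Returns true if a field of the given size is accepted, false if append() rejects it. */
  static boolean fits(int fieldSize) {
    BoundedFieldBuffer out = new BoundedFieldBuffer();
    try {
      for (int i = 0; i < fieldSize; i++) {
        out.append((byte) 'x');
      }
      return true;
    } catch (UnsupportedOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(fits(65536)); // field exactly at the limit
    System.out.println(fits(65537)); // one byte over the limit
  }
}
```

Under these assumptions, a single long string in column 0 of `10.tbl` would trip the check as soon as its length passes the limit, which matches the "Column 0 / Limit 65536" lines in the report.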




[jira] [Commented] (DRILL-7261) Simplify Easy format config for new scan framework

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854294#comment-16854294
 ] 

ASF GitHub Bot commented on DRILL-7261:
---

arina-ielchiieva commented on pull request #1796: DRILL-7261: Simplify Easy 
framework config
URL: https://github.com/apache/drill/pull/1796#discussion_r289711757
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/TextFormatPlugin.java
 ##
 @@ -336,6 +268,53 @@ public RecordReader getRecordReader(FragmentContext context,
     }
   }
 
+  @Override
+  protected FileScanBuilder frameworkBuilder(
+      OptionManager options, EasySubScan scan) throws ExecutionSetupException {
+    ColumnsScanBuilder builder = new ColumnsScanBuilder();
+    builder.setReaderFactory(new ColumnsReaderFactory(this));
+
+    // If this format has no headers, or wants to skip them,
+    // then we must use the columns column to hold the data.
+
+    builder.requireColumnsArray(
+        ! getConfig().isHeaderExtractionEnabled());
+
+    // Text files handle nulls in an unusual way. Missing columns
+    // are set to required Varchar and filled with blanks. Yes, this
+    // means that the SQL statement or code cannot differentiate missing
+    // columns from empty columns, but that is how CSV and other text
+    // files have been defined within Drill.
+
+    builder.setNullType(Types.required(MinorType.VARCHAR));
+
+    // CSV maps blank columns to nulls (for nullable non-string columns),
+    // or to the default value (for non-nullable non-string columns.)
+
+    builder.setConversionProperty(AbstractConvertFromString.BLANK_ACTION_PROP,
+        AbstractConvertFromString.BLANK_AS_NULL);
+
+    // The text readers use required Varchar columns to represent null columns.
+
+    builder.allowRequiredNullColumns(true);
+
+    // Provide custom error context
+    builder.setContext(
+        new CustomErrorContext() {
+          @Override
+          public void addContext(UserException.Builder builder) {
+            builder.addContext("Format plugin:", PLUGIN_NAME);
+            builder.addContext("Plugin config name:", getName());
+            builder.addContext("Extract headers:",
+                Boolean.toString(getConfig().isHeaderExtractionEnabled()));
+            builder.addContext("Skip headers:",
 
 Review comment:
   Skip lines?
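
 The blank-handling rules described in the diff comments above can be sketched as follows. This is an illustration only, not the code under review: the class `BlankHandlingSketch` and its methods are hypothetical, "blank" is assumed to mean empty after trimming, and the default for a required INT is assumed to be 0.

```java
/** Hypothetical illustration of the blank-handling rules in the diff comments. */
class BlankHandlingSketch {

  /** Missing text column: required VARCHAR, so the value is an empty string, never null. */
  static String missingVarchar() {
    return "";
  }

  /** Nullable INT column: a blank field becomes null (the "blank as null" rule). */
  static Integer toNullableInt(String field) {
    String trimmed = field.trim(); // assumption: whitespace-only counts as blank
    return trimmed.isEmpty() ? null : Integer.valueOf(trimmed);
  }

  /** Required INT column: a blank field becomes the type default (assumed to be 0). */
  static int toRequiredInt(String field) {
    String trimmed = field.trim();
    return trimmed.isEmpty() ? 0 : Integer.parseInt(trimmed);
  }

  public static void main(String[] args) {
    System.out.println("[" + missingVarchar() + "]");
    System.out.println(toNullableInt(""));
    System.out.println(toNullableInt("42"));
    System.out.println(toRequiredInt("  "));
  }
}
```

 As the diff comments note, a consequence of the first rule is that a query cannot tell a missing text column from an empty one, since both surface as an empty string.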
 



> Simplify Easy format config for new scan framework
> --
>
> Key: DRILL-7261
> URL: https://issues.apache.org/jira/browse/DRILL-7261
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.16.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.17.0
>
>
> Rollup of related CSV V3 fixes along with supporting row set framework fixes.




[jira] [Commented] (DRILL-7261) Simplify Easy format config for new scan framework

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854293#comment-16854293
 ] 

ASF GitHub Bot commented on DRILL-7261:
---

arina-ielchiieva commented on pull request #1796: DRILL-7261: Simplify Easy 
framework config
URL: https://github.com/apache/drill/pull/1796#discussion_r289711815
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/file/FileScanFramework.java
 ##
 @@ -89,6 +89,7 @@
  * @return Hadoop file split object with the file path, block
  * offset, and length.
  */
+
 
 Review comment:
   Nit: remove new line.
 



> Simplify Easy format config for new scan framework
> --
>
> Key: DRILL-7261
> URL: https://issues.apache.org/jira/browse/DRILL-7261
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.16.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.17.0
>
>
> Rollup of related CSV V3 fixes along with supporting row set framework fixes.




[jira] [Commented] (DRILL-7169) Rename drill-root ArtifactID to apache-drill

2019-06-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854288#comment-16854288
 ] 

ASF GitHub Bot commented on DRILL-7169:
---

arina-ielchiieva commented on issue #1746: DRILL-7169: Rename drill-root 
ArtifactID to apache-drill
URL: https://github.com/apache/drill/pull/1746#issuecomment-498138795
 
 
   @vdiravka / @vvysotskyi what is the status of this PR?
 



> Rename drill-root ArtifactID to apache-drill
> 
>
> Key: DRILL-7169
> URL: https://issues.apache.org/jira/browse/DRILL-7169
> Project: Apache Drill
> Issue Type: Improvement
> Components: Tools, Build & Test
> Affects Versions: 1.15.0
> Reporter: Vitalii Diravka
> Assignee: Vitalii Diravka
> Priority: Minor
> Fix For: Future
>
>
> Rename the root POM artifactId from {{drill-root}} to {{apache-drill}}; see:
> {{[https://github.com/apache/drill/blob/master/pom.xml#L32]}}
> Most Apache projects use the short project name as the artifactId.
> Renaming it to {{apache-drill}} allows it to be used as a variable in the
> Drill build process.



