[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022210#comment-17022210
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

asfgit commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Convert the Avro format plugin to use EVF
> -
>
> Key: DRILL-7454
> URL: https://issues.apache.org/jira/browse/DRILL-7454
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.17.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Convert the Avro format plugin to use EVF.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015125#comment-17015125
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

arina-ielchiieva commented on issue #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#issuecomment-574200463
 
 
   @cgivre created Jira for the documentation update 
(https://issues.apache.org/jira/browse/DRILL-7528).
 



> Convert the Avro format plugin to use EVF
> -
>
> Key: DRILL-7454
> URL: https://issues.apache.org/jira/browse/DRILL-7454
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Convert the Avro format plugin to use EVF.





[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015126#comment-17015126
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

arina-ielchiieva commented on issue #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#issuecomment-574200463
 
 
   @cgivre created Jira for the documentation update 
(https://issues.apache.org/jira/browse/DRILL-7528).
 



[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015118#comment-17015118
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

cgivre commented on issue #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#issuecomment-574198128
 
 
   Thanks @arina-ielchiieva for doing this. Can we also update the docs [1], as 
they state that querying Avro is experimental and that there are known issues?
   
   [1]: https://drill.apache.org/docs/querying-avro-files/
 



[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014268#comment-17014268
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

cgivre commented on issue #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#issuecomment-573638073
 
 
   Nice work everyone!
 



[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014267#comment-17014267
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

arina-ielchiieva commented on issue #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#issuecomment-573637810
 
 
   @paul-rogers / @vvysotskyi thank you for the review!
 



[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013857#comment-17013857
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

paul-rogers commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r365603939
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/ColumnConverter.java
 ##
 @@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.generic.GenericFixed;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.util.Utf8;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.DictWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.joda.time.DateTimeConstants;
+import org.joda.time.Period;
+
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.IntBuffer;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.function.Consumer;
+import java.util.stream.IntStream;
+
+/**
+ * Converts and sets given value into the specific column writer.
+ */
+public interface ColumnConverter {
+
+  void convert(Object value);
+
+  /**
+   * Does nothing, is used when column is not projected to avoid unnecessary
+   * column values conversions and writes.
+   */
+  class DummyColumnConverter implements ColumnConverter {
+
+    public static final DummyColumnConverter INSTANCE = new DummyColumnConverter();
+
+    @Override
+    public void convert(Object value) {
+      // do nothing
+    }
+  }
+
+  /**
+   * Converts and writes scalar values using provided {@link #valueConverter}.
+   * {@link #valueConverter} has different implementation depending
+   * on the scalar value type.
+   */
+  class ScalarColumnConverter implements ColumnConverter {
+
+    private final Consumer valueConverter;
+
+    public ScalarColumnConverter(Consumer valueConverter) {
+      this.valueConverter = valueConverter;
+    }
+
+    public static ScalarColumnConverter init(ScalarWriter writer) {
 
 Review comment:
   Very nice! Clean and simple; much simpler than having a class per type. If 
we get a chance to update the Drill book, we'll point to this as an example of 
how to do conversion simply.
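
   For readers without the full file at hand, here is a minimal, self-contained
sketch of the general idea praised above: choose one conversion function per
column up front and store it as a Consumer, rather than writing a converter
class for every type. It is not the code from the pull request. RecordingWriter,
SourceType and converterFor are hypothetical names invented for this
illustration, and RecordingWriter is only a stand-in for a real column writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ScalarConverterSketch {

  /** Hypothetical stand-in for a column writer; it only records what was written. */
  static final class RecordingWriter {
    final List<Object> written = new ArrayList<>();

    void setString(String value) { written.add(value); }
    void setLong(long value) { written.add(value); }
    void setDouble(double value) { written.add(value); }
  }

  /** Hypothetical subset of source types, for illustration only. */
  enum SourceType { STRING, LONG, DOUBLE }

  /**
   * Picks the conversion once, when the column is set up. Each branch returns
   * a lambda, so no per-type converter class is needed.
   */
  static Consumer<Object> converterFor(SourceType type, RecordingWriter writer) {
    switch (type) {
      case STRING:
        // Any CharSequence-like value is written through its string form.
        return value -> writer.setString(value.toString());
      case LONG:
        return value -> writer.setLong(((Number) value).longValue());
      case DOUBLE:
        return value -> writer.setDouble(((Number) value).doubleValue());
      default:
        throw new IllegalArgumentException("Unsupported type: " + type);
    }
  }

  public static void main(String[] args) {
    RecordingWriter writer = new RecordingWriter();
    Consumer<Object> toText = converterFor(SourceType.STRING, writer);
    Consumer<Object> toLong = converterFor(SourceType.LONG, writer);

    // Per-row work is a single accept() call on the pre-selected converter.
    toText.accept(new StringBuilder("abc"));
    toLong.accept(42);
    System.out.println(writer.written);
  }
}
```

   The type dispatch happens once per column instead of once per value, which
is presumably part of why the approach reads as simpler than a class hierarchy.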
 



[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012948#comment-17012948
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

arina-ielchiieva commented on issue #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#issuecomment-573071260
 
 
   @paul-rogers / @vvysotskyi addressed final review comment regarding column 
converters in a separate commit. Please take a look.
 



> Convert the Avro format plugin to use EVF
> -
>
> Key: DRILL-7454
> URL: https://issues.apache.org/jira/browse/DRILL-7454
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.18.0
>
>
> Convert the Avro format plugin to use EVF.





[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009741#comment-17009741
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

arina-ielchiieva commented on pull request #1951: DRILL-7454: Convert Avro to 
EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363758891
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroBatchReader.java
 ##
 @@ -0,0 +1,368 @@
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileReader;
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.generic.GenericContainer;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericFixed;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.mapred.FsInput;
+import org.apache.avro.util.Utf8;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.util.ImpersonationUtil;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.DictWriter;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.joda.time.DateTimeConstants;
+import org.joda.time.Period;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.IntBuffer;
+import java.security.PrivilegedExceptionAction;
+import java.util.List;
+import java.util.Map;
+
+public class AvroBatchReader implements ManagedReader<FileScanFramework.FileSchemaNegotiator> {
+
+  private static final Logger logger = LoggerFactory.getLogger(AvroBatchReader.class);
+
+  // currently config is unused but maybe used later
+  private final AvroReaderConfig config;
+
+  private Path filePath;
+  private long endPosition;
+  private DataFileReader reader;
+  private ResultSetLoader loader;
+  // re-use container instance
+  private GenericContainer container = null;
+
+  public AvroBatchReader(AvroReaderConfig config) {
+    this.config = config;
+  }
+
+  @Override
+  public boolean open(FileScanFramework.FileSchemaNegotiator negotiator) {
+    FileSplit split = negotiator.split();
+    filePath = split.getPath();
+
+    // Avro files are splittable, define reading start / end positions
+    long startPosition = split.getStart();
+    endPosition = startPosition + split.getLength();
+
+    logger.debug("Processing Avro file: {}, start position: {}, end position: {}",
+      filePath, startPosition, endPosition);
+
+    reader = prepareReader(split, negotiator.fileSystem(),
+      negotiator.userName(), negotiator.context().getFragmentContext().getQueryUserName());
+
+    logger.debug("Avro file schema: {}", reader.getSchema());
+    TupleMetadata schema = AvroSchemaUtil.convert(reader.getSchema());
 
 Review comment:
   When reading Avro files, we obtain the schema from each file. 
   I believe we might have the list of files to read at planning time (that is what we 
have for Parquet, where we read the footers). We could open the first file, read the 
schema, convert it, and then provide it when reading all files, but of course we 
would have to check that each file is compatible with that schema. Though from the 
EVF code, I am not sure where such an approach 

[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009735#comment-17009735
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

arina-ielchiieva commented on pull request #1951: DRILL-7454: Convert Avro to 
EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363756143
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroBatchReader.java
 ##
 @@ -0,0 +1,368 @@
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileReader;
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.generic.GenericContainer;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericFixed;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.mapred.FsInput;
+import org.apache.avro.util.Utf8;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.util.ImpersonationUtil;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.DictWriter;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.joda.time.DateTimeConstants;
+import org.joda.time.Period;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.IntBuffer;
+import java.security.PrivilegedExceptionAction;
+import java.util.List;
+import java.util.Map;
+
+public class AvroBatchReader implements ManagedReader<FileScanFramework.FileSchemaNegotiator> {
+
+  private static final Logger logger = LoggerFactory.getLogger(AvroBatchReader.class);
+
+  // currently config is unused but maybe used later
+  private final AvroReaderConfig config;
+
+  private Path filePath;
+  private long endPosition;
+  private DataFileReader reader;
+  private ResultSetLoader loader;
+  // re-use container instance
+  private GenericContainer container = null;
+
+  public AvroBatchReader(AvroReaderConfig config) {
+    this.config = config;
+  }
+
+  @Override
+  public boolean open(FileScanFramework.FileSchemaNegotiator negotiator) {
+    FileSplit split = negotiator.split();
+    filePath = split.getPath();
+
+    // Avro files are splittable, define reading start / end positions
+    long startPosition = split.getStart();
+    endPosition = startPosition + split.getLength();
+
+    logger.debug("Processing Avro file: {}, start position: {}, end position: {}",
+      filePath, startPosition, endPosition);
+
+    reader = prepareReader(split, negotiator.fileSystem(),
+      negotiator.userName(), negotiator.context().getFragmentContext().getQueryUserName());
+
+    logger.debug("Avro file schema: {}", reader.getSchema());
+    TupleMetadata schema = AvroSchemaUtil.convert(reader.getSchema());
+    logger.debug("Avro file converted schema: {}", schema);
+    negotiator.setTableSchema(schema, true);
+    loader = negotiator.build();
+
+    return true;
+  }
+
+  @Override
+  public boolean next() {
+    RowSetLoader rowWriter = loader.writer();
+    while (!rowWriter.isFull()) {
+      if (!nextLine(rowWriter)) {
+        return false;
+      }
+    }
+    return true;
+  }
+
+  @Override
+  public void close() {
+    try {
+ 

[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009292#comment-17009292
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

paul-rogers commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363564292
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroBatchReader.java
 ##
 @@ -0,0 +1,368 @@

[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009290#comment-17009290
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

paul-rogers commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363563099
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroFormatPlugin.java
 ##
 @@ -17,117 +17,68 @@
  */
 package org.apache.drill.exec.store.avro;
 
-import java.io.IOException;
-import java.util.List;
-import java.util.regex.Pattern;
-
-import org.apache.drill.common.exceptions.ExecutionSetupException;
-import org.apache.drill.common.expression.SchemaPath;
 import org.apache.drill.common.logical.StoragePluginConfig;
-import org.apache.drill.exec.ops.FragmentContext;
-import org.apache.drill.exec.planner.common.DrillStatsTable.TableStatistics;
-import org.apache.drill.exec.planner.logical.DrillTable;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
 import org.apache.drill.exec.proto.UserBitShared.CoreOperatorType;
 import org.apache.drill.exec.server.DrillbitContext;
-import org.apache.drill.exec.store.RecordReader;
-import org.apache.drill.exec.store.RecordWriter;
-import org.apache.drill.exec.store.SchemaConfig;
-import org.apache.drill.exec.store.dfs.BasicFormatMatcher;
-import org.apache.drill.exec.store.dfs.DrillFileSystem;
-import org.apache.drill.exec.store.dfs.FileSelection;
-import org.apache.drill.exec.store.dfs.FileSystemPlugin;
-import org.apache.drill.exec.store.dfs.FormatMatcher;
-import org.apache.drill.exec.store.dfs.FormatSelection;
-import org.apache.drill.exec.store.dfs.MagicString;
+import org.apache.drill.exec.server.options.OptionManager;
 import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
-import org.apache.drill.exec.store.dfs.easy.EasyWriter;
-import org.apache.drill.exec.store.dfs.easy.FileWork;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
 import org.apache.hadoop.conf.Configuration;
 
-import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
-import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-
 /**
  * Format plugin for Avro data files.
  */
 public class AvroFormatPlugin extends EasyFormatPlugin {
 
-  private final AvroFormatMatcher matcher;
-
-  public AvroFormatPlugin(String name, DrillbitContext context, Configuration 
fsConf,
-  StoragePluginConfig storagePluginConfig) {
-this(name, context, fsConf, storagePluginConfig, new AvroFormatConfig());
-  }
-
-  public AvroFormatPlugin(String name, DrillbitContext context, Configuration 
fsConf, StoragePluginConfig config, AvroFormatConfig formatPluginConfig) {
-super(name, context, fsConf, config, formatPluginConfig, true, false, 
true, false, Lists.newArrayList("avro"), "avro");
-this.matcher = new AvroFormatMatcher(this);
-  }
-
-  @Override
-  public boolean supportsPushDown() {
-return true;
-  }
-
-  @Override
-  public RecordReader getRecordReader(FragmentContext context, DrillFileSystem 
dfs, FileWork fileWork, List columns, String userName) throws 
ExecutionSetupException {
-return new AvroRecordReader(context, fileWork.getPath(), 
fileWork.getStart(), fileWork.getLength(), dfs, columns,
-  userName);
-  }
-
-  @Override
-  public RecordWriter getRecordWriter(FragmentContext context, EasyWriter 
writer) throws IOException {
-throw new UnsupportedOperationException("unimplemented");
-  }
+  public static final String DEFAULT_NAME = "avro";
 
-  @Override
-  public int getReaderOperatorType() {
-return CoreOperatorType.AVRO_SUB_SCAN_VALUE;
+  public AvroFormatPlugin(String name,
+ DrillbitContext context,
+ Configuration fsConf,
+ StoragePluginConfig storageConfig,
+ AvroFormatConfig formatConfig) {
+super(name, easyConfig(fsConf, formatConfig), context, storageConfig, formatConfig);
   }
 
-  @Override
-  public int getWriterOperatorType() {
-throw new UnsupportedOperationException("unimplemented");
+  private static EasyFormatConfig easyConfig(Configuration fsConf, AvroFormatConfig formatConfig) {
+EasyFormatConfig config = new EasyFormatConfig();
+config.readable = true;
+config.writable = false;
+config.blockSplittable = true;
+config.compressible = false;
+config.supportsProjectPushdown = true;
+config.extensions = formatConfig.extensions;
+config.fsConf = fsConf;
+config.defaultName = DEFAULT_NAME;
+config.readerOperatorType = 

[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009289#comment-17009289
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

paul-rogers commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363563012
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroBatchReader.java
 ##
 @@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileReader;
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.generic.GenericContainer;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericFixed;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.mapred.FsInput;
+import org.apache.avro.util.Utf8;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.util.ImpersonationUtil;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.DictWriter;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.joda.time.DateTimeConstants;
+import org.joda.time.Period;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.IntBuffer;
+import java.security.PrivilegedExceptionAction;
+import java.util.List;
+import java.util.Map;
+
+public class AvroBatchReader implements ManagedReader<FileScanFramework.FileSchemaNegotiator> {
+
+  private static final Logger logger = LoggerFactory.getLogger(AvroBatchReader.class);
+
+  // currently config is unused but may be used later
+  private final AvroReaderConfig config;
+
+  private Path filePath;
+  private long endPosition;
+  private DataFileReader reader;
+  private ResultSetLoader loader;
+  // re-use container instance
+  private GenericContainer container = null;
+
+  public AvroBatchReader(AvroReaderConfig config) {
+this.config = config;
+  }
+
+  @Override
+  public boolean open(FileScanFramework.FileSchemaNegotiator negotiator) {
+FileSplit split = negotiator.split();
+filePath = split.getPath();
+
+// Avro files are splittable, define reading start / end positions
+long startPosition = split.getStart();
+endPosition = startPosition + split.getLength();
+
+logger.debug("Processing Avro file: {}, start position: {}, end position: {}",
+  filePath, startPosition, endPosition);
+
+reader = prepareReader(split, negotiator.fileSystem(),
+  negotiator.userName(), negotiator.context().getFragmentContext().getQueryUserName());
+
+logger.debug("Avro file schema: {}", reader.getSchema());
+TupleMetadata schema = AvroSchemaUtil.convert(reader.getSchema());
 
 Review comment:
   I actually don't have a good example in mind. Planner-side schema planning is not really an EVF-specific thing. We used an external schema for CSV files using your provisioned schema mechanism. There is the new metadata mechanism that was recently added. We do partition and row group pruning for Parquet using external metadata.
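
   A rough sketch of what an externally supplied schema could mean for a reader. Everything in it is hypothetical: `SchemaResolutionSketch`, `resolve` and the string type names are illustrative and are not Drill or EVF APIs, and "provided columns override the file's types" is only one possible policy, assumed here for the sake of the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch, not Drill's API: picks the schema a scan would use
 * from an optional externally provided schema and the schema read from the file.
 */
public class SchemaResolutionSketch {

  /**
   * Maps column name to type name; insertion order is the column order.
   * Assumed policy: provided columns win, columns the provided schema
   * does not mention keep the type found in the file.
   */
  static Map<String, String> resolve(Optional<Map<String, String>> provided,
                                     Map<String, String> fileSchema) {
    if (!provided.isPresent()) {
      // No external schema: fall back to what the file itself declares.
      return new LinkedHashMap<>(fileSchema);
    }
    Map<String, String> resolved = new LinkedHashMap<>(provided.get());
    // Add file columns that the provided schema left out.
    fileSchema.forEach(resolved::putIfAbsent);
    return resolved;
  }

  public static void main(String[] args) {
    Map<String, String> file = new LinkedHashMap<>();
    file.put("id", "INT");
    file.put("amount", "VARCHAR");

    Map<String, String> provided = new LinkedHashMap<>();
    provided.put("amount", "DECIMAL");

    System.out.println(resolve(Optional.empty(), file));
    System.out.println(resolve(Optional.of(provided), file));
  }
}
```

   With the sample data in `main`, the provided `DECIMAL` type for `amount` replaces the file's `VARCHAR`, while `id` keeps the type found in the file.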
   
   So, planner-side schema analysis is probably something new we'd want to add. I

[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008869#comment-17008869
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

arina-ielchiieva commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363299653
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroBatchReader.java
 ##
 @@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileReader;
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.generic.GenericContainer;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericFixed;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.mapred.FsInput;
+import org.apache.avro.util.Utf8;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.util.ImpersonationUtil;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.DictWriter;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.joda.time.DateTimeConstants;
+import org.joda.time.Period;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.IntBuffer;
+import java.security.PrivilegedExceptionAction;
+import java.util.List;
+import java.util.Map;
+
+public class AvroBatchReader implements ManagedReader<FileScanFramework.FileSchemaNegotiator> {
+
+  private static final Logger logger = LoggerFactory.getLogger(AvroBatchReader.class);
+
+  // currently config is unused but may be used later
+  private final AvroReaderConfig config;
+
+  private Path filePath;
+  private long endPosition;
+  private DataFileReader reader;
+  private ResultSetLoader loader;
+  // re-use container instance
+  private GenericContainer container = null;
+
+  public AvroBatchReader(AvroReaderConfig config) {
+this.config = config;
+  }
+
+  @Override
+  public boolean open(FileScanFramework.FileSchemaNegotiator negotiator) {
+FileSplit split = negotiator.split();
+filePath = split.getPath();
+
+// Avro files are splittable, define reading start / end positions
+long startPosition = split.getStart();
+endPosition = startPosition + split.getLength();
+
+logger.debug("Processing Avro file: {}, start position: {}, end position: {}",
+  filePath, startPosition, endPosition);
+
+reader = prepareReader(split, negotiator.fileSystem(),
+  negotiator.userName(), negotiator.context().getFragmentContext().getQueryUserName());
+
+logger.debug("Avro file schema: {}", reader.getSchema());
+TupleMetadata schema = AvroSchemaUtil.convert(reader.getSchema());
+logger.debug("Avro file converted schema: {}", schema);
+negotiator.setTableSchema(schema, true);
+loader = negotiator.build();
+
+return true;
+  }
+
+  @Override
+  public boolean next() {
+RowSetLoader rowWriter = loader.writer();
+while (!rowWriter.isFull()) {
+  if (!nextLine(rowWriter)) {
+return false;
+  }
+}
+return true;
+  }
+
+  @Override
+  public void close() {
+try {
+ 

[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008863#comment-17008863
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

arina-ielchiieva commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363299757
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroBatchReader.java
 ##
 @@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileReader;
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.generic.GenericContainer;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericFixed;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.mapred.FsInput;
+import org.apache.avro.util.Utf8;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.util.ImpersonationUtil;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.DictWriter;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.joda.time.DateTimeConstants;
+import org.joda.time.Period;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.IntBuffer;
+import java.security.PrivilegedExceptionAction;
+import java.util.List;
+import java.util.Map;
+
+public class AvroBatchReader implements ManagedReader<FileScanFramework.FileSchemaNegotiator> {
+
+  private static final Logger logger = LoggerFactory.getLogger(AvroBatchReader.class);
+
+  // currently config is unused but may be used later
+  private final AvroReaderConfig config;
+
+  private Path filePath;
+  private long endPosition;
+  private DataFileReader reader;
+  private ResultSetLoader loader;
+  // re-use container instance
+  private GenericContainer container = null;
+
+  public AvroBatchReader(AvroReaderConfig config) {
+this.config = config;
+  }
+
+  @Override
+  public boolean open(FileScanFramework.FileSchemaNegotiator negotiator) {
+FileSplit split = negotiator.split();
+filePath = split.getPath();
+
+// Avro files are splittable, define reading start / end positions
+long startPosition = split.getStart();
+endPosition = startPosition + split.getLength();
+
+logger.debug("Processing Avro file: {}, start position: {}, end position: {}",
+  filePath, startPosition, endPosition);
+
+reader = prepareReader(split, negotiator.fileSystem(),
+  negotiator.userName(), negotiator.context().getFragmentContext().getQueryUserName());
+
+logger.debug("Avro file schema: {}", reader.getSchema());
+TupleMetadata schema = AvroSchemaUtil.convert(reader.getSchema());
+logger.debug("Avro file converted schema: {}", schema);
+negotiator.setTableSchema(schema, true);
+loader = negotiator.build();
+
+return true;
+  }
+
+  @Override
+  public boolean next() {
+RowSetLoader rowWriter = loader.writer();
+while (!rowWriter.isFull()) {
+  if (!nextLine(rowWriter)) {
+return false;
+  }
+}
+return true;
+  }
+
+  @Override
+  public void close() {
+try {
+ 

[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008836#comment-17008836
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

arina-ielchiieva commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363287952
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroBatchReader.java
 ##
 @@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileReader;
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.generic.GenericContainer;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericFixed;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.mapred.FsInput;
+import org.apache.avro.util.Utf8;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.util.ImpersonationUtil;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.DictWriter;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.joda.time.DateTimeConstants;
+import org.joda.time.Period;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.IntBuffer;
+import java.security.PrivilegedExceptionAction;
+import java.util.List;
+import java.util.Map;
+
+public class AvroBatchReader implements ManagedReader<FileScanFramework.FileSchemaNegotiator> {
+
+  private static final Logger logger = LoggerFactory.getLogger(AvroBatchReader.class);
+
+  // currently config is unused but maybe used later
+  private final AvroReaderConfig config;
+
+  private Path filePath;
+  private long endPosition;
+  private DataFileReader reader;
+  private ResultSetLoader loader;
+  // re-use container instance
+  private GenericContainer container = null;
+
+  public AvroBatchReader(AvroReaderConfig config) {
+    this.config = config;
+  }
+
+  @Override
+  public boolean open(FileScanFramework.FileSchemaNegotiator negotiator) {
+    FileSplit split = negotiator.split();
+    filePath = split.getPath();
+
+    // Avro files are splittable, define reading start / end positions
+    long startPosition = split.getStart();
+    endPosition = startPosition + split.getLength();
+
+    logger.debug("Processing Avro file: {}, start position: {}, end position: {}",
+      filePath, startPosition, endPosition);
+
+    reader = prepareReader(split, negotiator.fileSystem(),
+      negotiator.userName(), negotiator.context().getFragmentContext().getQueryUserName());
+
+    logger.debug("Avro file schema: {}", reader.getSchema());
+    TupleMetadata schema = AvroSchemaUtil.convert(reader.getSchema());
+    logger.debug("Avro file converted schema: {}", schema);
+    negotiator.setTableSchema(schema, true);
+    loader = negotiator.build();
+
+    return true;
+  }
+
+  @Override
+  public boolean next() {
+    RowSetLoader rowWriter = loader.writer();
+    while (!rowWriter.isFull()) {
+      if (!nextLine(rowWriter)) {
+        return false;
+      }
+    }
+    return true;
+  }
+
+  @Override
+  public void close() {
+    try {
+ 
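
The `next()` method quoted above fills the current batch until the row writer reports it is full, and returns `false` once the input runs out. A minimal stand-alone sketch of that loop shape follows. It does not use Drill's classes: `BatchLoopSketch`, `batchLimit`, `currentBatch()` and the explicit `batch.clear()` are hypothetical stand-ins introduced only for illustration, and the real framework manages batch boundaries itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a batch reader; this is NOT Drill's ManagedReader API.
final class BatchLoopSketch {
  private final Iterator<String> source;
  private final int batchLimit;
  private final List<String> batch = new ArrayList<>();

  BatchLoopSketch(List<String> rows, int batchLimit) {
    this.source = rows.iterator();
    this.batchLimit = batchLimit;
  }

  // Plays the role of rowWriter.isFull() in the quoted diff.
  private boolean isFull() {
    return batch.size() >= batchLimit;
  }

  // Plays the role of nextLine(rowWriter): appends one row, or reports end of input.
  private boolean nextLine() {
    if (!source.hasNext()) {
      return false;
    }
    batch.add(source.next());
    return true;
  }

  // Returns true if another batch may follow, false once the input is exhausted.
  // Clearing the batch here is an assumption of this sketch.
  boolean next() {
    batch.clear();
    while (!isFull()) {
      if (!nextLine()) {
        return false;
      }
    }
    return true;
  }

  // Defensive copy of the rows collected by the most recent next() call.
  List<String> currentBatch() {
    return new ArrayList<>(batch);
  }

  public static void main(String[] args) {
    BatchLoopSketch reader = new BatchLoopSketch(Arrays.asList("a", "b", "c", "d", "e"), 2);
    boolean more;
    do {
      more = reader.next();
      System.out.println(reader.currentBatch() + " more=" + more);
    } while (more);
  }
}
```

Note that in this sketch the final call returns `false` while the batch may still hold a partial set of rows, so a caller has to consume the batch before checking the return value.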

[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008821#comment-17008821
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

arina-ielchiieva commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363281719
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroBatchReader.java
 ##
 @@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileReader;
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.generic.GenericContainer;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericFixed;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.mapred.FsInput;
+import org.apache.avro.util.Utf8;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.util.ImpersonationUtil;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.DictWriter;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.joda.time.DateTimeConstants;
+import org.joda.time.Period;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.IntBuffer;
+import java.security.PrivilegedExceptionAction;
+import java.util.List;
+import java.util.Map;
+
+public class AvroBatchReader implements ManagedReader<FileScanFramework.FileSchemaNegotiator> {
+
+  private static final Logger logger = LoggerFactory.getLogger(AvroBatchReader.class);
+
+  // currently config is unused but maybe used later
+  private final AvroReaderConfig config;
+
+  private Path filePath;
+  private long endPosition;
+  private DataFileReader reader;
+  private ResultSetLoader loader;
+  // re-use container instance
+  private GenericContainer container = null;
 
 Review comment:
   Good point, replaced with record.
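
The field under review carries the comment "re-use container instance": the reader keeps one object and lets each read refill it rather than allocating a new one per row. Avro's `DataFileStream` offers a `next(D reuse)` overload for this purpose. The sketch below only imitates the pattern with plain Java so it runs without Avro on the classpath; `ReuseSketch`, `Row` and `read` are hypothetical names, not part of Avro or Drill.

```java
// Hypothetical illustration of the "reuse one instance per read" pattern.
final class ReuseSketch {

  // Mutable record that a reader refills on every call.
  static final class Row {
    long id;
    String name;
  }

  // Fills the supplied instance when one is given, otherwise allocates a new one.
  static Row read(long index, Row reuse) {
    Row row = (reuse != null) ? reuse : new Row();
    row.id = index;
    row.name = "row-" + index;
    return row;
  }

  public static void main(String[] args) {
    Row row = null;
    for (long i = 0; i < 3; i++) {
      // After the first iteration the same object is handed back and overwritten.
      row = read(i, row);
      System.out.println(row.id + " " + row.name);
    }
  }
}
```

The trade-off is that values from a previous row must be copied out before the next read, because the instance is overwritten in place.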
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Convert the Avro format plugin to use EVF
> -
>
> Key: DRILL-7454
> URL: https://issues.apache.org/jira/browse/DRILL-7454
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.18.0
>
>
> Convert the Avro format plugin to use EVF.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008824#comment-17008824
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

arina-ielchiieva commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363282901
 
 

 ##
 File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroDataGenerator.java
 ##
 @@ -0,0 +1,819 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.LogicalType;
+import org.apache.avro.LogicalTypes;
+import org.apache.avro.Schema;
+import org.apache.avro.Schema.Type;
+import org.apache.avro.SchemaBuilder;
+import org.apache.avro.file.DataFileWriter;
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericDatumWriter;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.drill.exec.util.JsonStringArrayList;
+import org.apache.drill.exec.util.JsonStringHashMap;
+import org.apache.drill.exec.util.Text;
+import org.apache.drill.test.BaseDirTestWatcher;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.file.Paths;
+import java.time.LocalDateTime;
+import java.time.ZoneOffset;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.TreeMap;
+
+/**
+ * Utilities for generating Avro test data.
+ */
+public class AvroDataGenerator {
+
+  public static final int RECORD_COUNT = 50;
+  public static int ARRAY_SIZE = 4;
+
+  private final BaseDirTestWatcher dirTestWatcher;
+
+  public AvroDataGenerator(BaseDirTestWatcher dirTestWatcher) {
+    this.dirTestWatcher = dirTestWatcher;
+  }
+
+  /**
+   * Class to write records to an Avro file while simultaneously
+   * constructing a corresponding list of records in the format taken in
+   * by the Drill test builder to describe expected results.
+   */
+  public static class AvroTestRecordWriter implements Closeable {
+
+    private final List<Map<String, Object>> expectedRecords;
+    private final Schema schema;
+    private final DataFileWriter<GenericData.Record> writer;
+    private final String filePath;
+    private final String fileName;
+
+    private GenericData.Record currentAvroRecord;
+    private Map<String, Object> currentExpectedRecord;
+
+    public AvroTestRecordWriter(Schema schema, File file) {
+      writer = new DataFileWriter<>(new GenericDatumWriter<>(schema));
+      try {
+        writer.create(schema, file);
+      } catch (IOException e) {
+        throw new RuntimeException("Error creating file in Avro test setup.", e);
+      }
+      this.schema = schema;
+      currentExpectedRecord = new TreeMap<>();
+      expectedRecords = new ArrayList<>();
+      filePath = file.getAbsolutePath();
+      fileName = file.getName();
+    }
+
+    public void startRecord() {
+      currentAvroRecord = new GenericData.Record(schema);
+      currentExpectedRecord = new TreeMap<>();
+    }
+
+    public void put(String key, Object value) {
+      currentAvroRecord.put(key, value);
+      // convert binary values into byte[], the format they will be given
+      // in the Drill result set in the test framework
+      currentExpectedRecord.put("`" + key + "`", convertAvroValToDrill(value, true));
+    }
+
+    // TODO - fix this the test wrapper to prevent the need for this hack
+    // to make the root behave differently than nested fields for String vs. Text
+    private Object convertAvroValToDrill(Object value, boolean root) {
+      if (value instanceof ByteBuffer) {
+        ByteBuffer bb = ((ByteBuffer)value);
 
 Review comment:
   Fixed.
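
The quoted helper converts `ByteBuffer` values into `byte[]` so they match what the test framework returns. The rest of that method is cut off in this message, so the following is only a sketch of one common way to do such a conversion, not the code from the pull request. `ByteBufferCopySketch` and `toByteArray` are hypothetical names; the approach assumes the caller's buffer position should be left untouched, which is why it reads through `duplicate()`.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper; not taken from the pull request.
final class ByteBufferCopySketch {

  // Copies the remaining bytes of the buffer without moving the caller's position.
  static byte[] toByteArray(ByteBuffer value) {
    ByteBuffer view = value.duplicate(); // independent position and limit, shared content
    byte[] bytes = new byte[view.remaining()];
    view.get(bytes);
    return bytes;
  }

  public static void main(String[] args) {
    ByteBuffer bb = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
    bb.position(1);
    System.out.println(Arrays.toString(toByteArray(bb)));
    System.out.println(bb.position());
  }
}
```

Reading through a duplicate avoids a subtle test bug: calling `get` on the original buffer would advance its position, so a second read of the same value would see fewer bytes.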
 


[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008827#comment-17008827
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

arina-ielchiieva commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363284629
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroDrillTable.java
 ##
 @@ -1,197 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.drill.exec.store.avro;
-
-import java.io.IOException;
-import java.util.List;
-
-import org.apache.avro.LogicalType;
-import org.apache.avro.Schema;
-import org.apache.avro.Schema.Field;
-import org.apache.avro.file.DataFileReader;
-import org.apache.avro.generic.GenericContainer;
-import org.apache.avro.generic.GenericDatumReader;
-import org.apache.avro.mapred.FsInput;
-import org.apache.calcite.rel.type.RelDataType;
-import org.apache.calcite.rel.type.RelDataTypeFactory;
-import org.apache.calcite.sql.type.SqlTypeName;
-import org.apache.drill.common.exceptions.UserException;
-import org.apache.drill.exec.planner.logical.DrillTable;
-import org.apache.drill.exec.planner.logical.ExtendableRelDataType;
-import org.apache.drill.exec.planner.types.ExtendableRelDataTypeHolder;
-import org.apache.drill.exec.store.ColumnExplorer;
-import org.apache.drill.exec.store.SchemaConfig;
-import org.apache.drill.exec.store.dfs.FileSystemPlugin;
-import org.apache.drill.exec.store.dfs.FormatSelection;
-import org.apache.hadoop.fs.Path;
-
-import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
-
-public class AvroDrillTable extends DrillTable {
-  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(AvroDrillTable.class);
-
-  private final DataFileReader reader;
-  private final SchemaConfig schemaConfig;
-  private ExtendableRelDataTypeHolder holder;
 
 Review comment:
   Thanks for pointing this out. Removed.
 



[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008825#comment-17008825
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

arina-ielchiieva commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363283207
 
 

 ##
 File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroSchemaUtilTest.java
 ##
 @@ -0,0 +1,431 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.LogicalType;
+import org.apache.avro.LogicalTypes;
+import org.apache.avro.Schema;
+import org.apache.avro.SchemaBuilder;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.test.BaseTest;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ExpectedException;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+public class AvroSchemaUtilTest extends BaseTest {
 
 Review comment:
   Thanks!
 



[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008822#comment-17008822
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

arina-ielchiieva commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363282520
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroFormatPlugin.java
 ##
 @@ -17,117 +17,68 @@
  */
 package org.apache.drill.exec.store.avro;
 
-import java.io.IOException;
-import java.util.List;
-import java.util.regex.Pattern;
-
-import org.apache.drill.common.exceptions.ExecutionSetupException;
-import org.apache.drill.common.expression.SchemaPath;
 import org.apache.drill.common.logical.StoragePluginConfig;
-import org.apache.drill.exec.ops.FragmentContext;
-import org.apache.drill.exec.planner.common.DrillStatsTable.TableStatistics;
-import org.apache.drill.exec.planner.logical.DrillTable;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
 import org.apache.drill.exec.proto.UserBitShared.CoreOperatorType;
 import org.apache.drill.exec.server.DrillbitContext;
-import org.apache.drill.exec.store.RecordReader;
-import org.apache.drill.exec.store.RecordWriter;
-import org.apache.drill.exec.store.SchemaConfig;
-import org.apache.drill.exec.store.dfs.BasicFormatMatcher;
-import org.apache.drill.exec.store.dfs.DrillFileSystem;
-import org.apache.drill.exec.store.dfs.FileSelection;
-import org.apache.drill.exec.store.dfs.FileSystemPlugin;
-import org.apache.drill.exec.store.dfs.FormatMatcher;
-import org.apache.drill.exec.store.dfs.FormatSelection;
-import org.apache.drill.exec.store.dfs.MagicString;
+import org.apache.drill.exec.server.options.OptionManager;
 import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
-import org.apache.drill.exec.store.dfs.easy.EasyWriter;
-import org.apache.drill.exec.store.dfs.easy.FileWork;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
 import org.apache.hadoop.conf.Configuration;
 
-import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
-import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-
 /**
  * Format plugin for Avro data files.
  */
 public class AvroFormatPlugin extends EasyFormatPlugin {
 
-  private final AvroFormatMatcher matcher;
-
-  public AvroFormatPlugin(String name, DrillbitContext context, Configuration fsConf,
-  StoragePluginConfig storagePluginConfig) {
-this(name, context, fsConf, storagePluginConfig, new AvroFormatConfig());
-  }
-
-  public AvroFormatPlugin(String name, DrillbitContext context, Configuration fsConf, StoragePluginConfig config, AvroFormatConfig formatPluginConfig) {
-super(name, context, fsConf, config, formatPluginConfig, true, false, true, false, Lists.newArrayList("avro"), "avro");
-this.matcher = new AvroFormatMatcher(this);
-  }
-
-  @Override
-  public boolean supportsPushDown() {
-return true;
-  }
-
-  @Override
-  public RecordReader getRecordReader(FragmentContext context, DrillFileSystem dfs, FileWork fileWork, List columns, String userName) throws ExecutionSetupException {
-return new AvroRecordReader(context, fileWork.getPath(), fileWork.getStart(), fileWork.getLength(), dfs, columns,
-  userName);
-  }
-
-  @Override
-  public RecordWriter getRecordWriter(FragmentContext context, EasyWriter writer) throws IOException {
-throw new UnsupportedOperationException("unimplemented");
-  }
+  public static final String DEFAULT_NAME = "avro";
 
-  @Override
-  public int getReaderOperatorType() {
-return CoreOperatorType.AVRO_SUB_SCAN_VALUE;
+  public AvroFormatPlugin(String name,
+ DrillbitContext context,
+ Configuration fsConf,
+ StoragePluginConfig storageConfig,
+ AvroFormatConfig formatConfig) {
+super(name, easyConfig(fsConf, formatConfig), context, storageConfig, formatConfig);
   }
 
-  @Override
-  public int getWriterOperatorType() {
-throw new UnsupportedOperationException("unimplemented");
+  private static EasyFormatConfig easyConfig(Configuration fsConf, AvroFormatConfig formatConfig) {
+EasyFormatConfig config = new EasyFormatConfig();
+config.readable = true;
+config.writable = false;
+config.blockSplittable = true;
+config.compressible = false;
+config.supportsProjectPushdown = true;
+config.extensions = formatConfig.extensions;
+config.fsConf = fsConf;
+config.defaultName = DEFAULT_NAME;
+config.readerOperatorType = 

[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008829#comment-17008829
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

arina-ielchiieva commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363286237
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroBatchReader.java
 ##
 @@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileReader;
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.generic.GenericContainer;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericFixed;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.mapred.FsInput;
+import org.apache.avro.util.Utf8;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.util.ImpersonationUtil;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.DictWriter;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.joda.time.DateTimeConstants;
+import org.joda.time.Period;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.IntBuffer;
+import java.security.PrivilegedExceptionAction;
+import java.util.List;
+import java.util.Map;
+
+public class AvroBatchReader implements ManagedReader {
+
+  private static final Logger logger = LoggerFactory.getLogger(AvroBatchReader.class);
+
+  // currently config is unused but may be used later
+  private final AvroReaderConfig config;
+
+  private Path filePath;
+  private long endPosition;
+  private DataFileReader reader;
+  private ResultSetLoader loader;
+  // re-use container instance
+  private GenericContainer container = null;
+
+  public AvroBatchReader(AvroReaderConfig config) {
+this.config = config;
+  }
+
+  @Override
+  public boolean open(FileScanFramework.FileSchemaNegotiator negotiator) {
+FileSplit split = negotiator.split();
+filePath = split.getPath();
+
+// Avro files are splittable, define reading start / end positions
+long startPosition = split.getStart();
+endPosition = startPosition + split.getLength();
+
+logger.debug("Processing Avro file: {}, start position: {}, end position: {}",
+  filePath, startPosition, endPosition);
+
+reader = prepareReader(split, negotiator.fileSystem(),
+  negotiator.userName(), negotiator.context().getFragmentContext().getQueryUserName());
+
+logger.debug("Avro file schema: {}", reader.getSchema());
+TupleMetadata schema = AvroSchemaUtil.convert(reader.getSchema());
 
 Review comment:
   Sounds good. Are there examples of EVF readers where the schema is inferred
at plan time? When implementing this, I looked at the existing EVF
implementations, and most of them infer the schema in the reader.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific 

[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008826#comment-17008826
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

arina-ielchiieva commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363281988
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroBatchReader.java
 ##
 @@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileReader;
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.generic.GenericContainer;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericFixed;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.mapred.FsInput;
+import org.apache.avro.util.Utf8;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.util.ImpersonationUtil;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.DictWriter;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.joda.time.DateTimeConstants;
+import org.joda.time.Period;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.IntBuffer;
+import java.security.PrivilegedExceptionAction;
+import java.util.List;
+import java.util.Map;
+
+public class AvroBatchReader implements ManagedReader {
+
+  private static final Logger logger = LoggerFactory.getLogger(AvroBatchReader.class);
+
+  // currently config is unused but may be used later
+  private final AvroReaderConfig config;
+
+  private Path filePath;
+  private long endPosition;
+  private DataFileReader reader;
+  private ResultSetLoader loader;
+  // re-use container instance
+  private GenericContainer container = null;
+
+  public AvroBatchReader(AvroReaderConfig config) {
+this.config = config;
+  }
+
+  @Override
+  public boolean open(FileScanFramework.FileSchemaNegotiator negotiator) {
+FileSplit split = negotiator.split();
+filePath = split.getPath();
+
+// Avro files are splittable, define reading start / end positions
+long startPosition = split.getStart();
+endPosition = startPosition + split.getLength();
+
+logger.debug("Processing Avro file: {}, start position: {}, end position: {}",
+  filePath, startPosition, endPosition);
+
+reader = prepareReader(split, negotiator.fileSystem(),
+  negotiator.userName(), negotiator.context().getFragmentContext().getQueryUserName());
+
+logger.debug("Avro file schema: {}", reader.getSchema());
+TupleMetadata schema = AvroSchemaUtil.convert(reader.getSchema());
+logger.debug("Avro file converted schema: {}", schema);
+negotiator.setTableSchema(schema, true);
+loader = negotiator.build();
+
+return true;
+  }
+
+  @Override
+  public boolean next() {
+RowSetLoader rowWriter = loader.writer();
+while (!rowWriter.isFull()) {
+  if (!nextLine(rowWriter)) {
+return false;
+  }
+}
+return true;
+  }
+
+  @Override
+  public void close() {
+try {
+ 

[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008823#comment-17008823
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

arina-ielchiieva commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363282929
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroDataGenerator.java
 ##
 @@ -0,0 +1,819 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.LogicalType;
+import org.apache.avro.LogicalTypes;
+import org.apache.avro.Schema;
+import org.apache.avro.Schema.Type;
+import org.apache.avro.SchemaBuilder;
+import org.apache.avro.file.DataFileWriter;
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericDatumWriter;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.drill.exec.util.JsonStringArrayList;
+import org.apache.drill.exec.util.JsonStringHashMap;
+import org.apache.drill.exec.util.Text;
+import org.apache.drill.test.BaseDirTestWatcher;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.file.Paths;
+import java.time.LocalDateTime;
+import java.time.ZoneOffset;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.TreeMap;
+
+/**
+ * Utilities for generating Avro test data.
+ */
+public class AvroDataGenerator {
+
+  public static final int RECORD_COUNT = 50;
+  public static int ARRAY_SIZE = 4;
+
+  private final BaseDirTestWatcher dirTestWatcher;
+
+  public AvroDataGenerator(BaseDirTestWatcher dirTestWatcher) {
+this.dirTestWatcher = dirTestWatcher;
+  }
+
+  /**
+   * Class to write records to an Avro file while simultaneously
+   * constructing a corresponding list of records in the format taken in
+   * by the Drill test builder to describe expected results.
+   */
+  public static class AvroTestRecordWriter implements Closeable {
+
+private final List<Map<String, Object>> expectedRecords;
+private final Schema schema;
+private final DataFileWriter writer;
+private final String filePath;
+private final String fileName;
+
+private GenericData.Record currentAvroRecord;
+private Map<String, Object> currentExpectedRecord;
+
+public AvroTestRecordWriter(Schema schema, File file) {
+  writer = new DataFileWriter<>(new GenericDatumWriter<>(schema));
+  try {
+writer.create(schema, file);
+  } catch (IOException e) {
+throw new RuntimeException("Error creating file in Avro test setup.", e);
+  }
+  this.schema = schema;
+  currentExpectedRecord = new TreeMap<>();
+  expectedRecords = new ArrayList<>();
+  filePath = file.getAbsolutePath();
+  fileName = file.getName();
+}
+
+public void startRecord() {
+  currentAvroRecord = new GenericData.Record(schema);
+  currentExpectedRecord = new TreeMap<>();
+}
+
+public void put(String key, Object value) {
+  currentAvroRecord.put(key, value);
+  // convert binary values into byte[], the format they will be given
+  // in the Drill result set in the test framework
+  currentExpectedRecord.put("`" + key + "`", convertAvroValToDrill(value, true));
+}
+
+// TODO - fix the test wrapper to prevent the need for this hack
+// to make the root behave differently than nested fields for String vs. Text
+private Object convertAvroValToDrill(Object value, boolean root) {
+  if (value instanceof ByteBuffer) {
+ByteBuffer bb = ((ByteBuffer)value);
+byte[] drillVal = new byte[((ByteBuffer)value).remaining()];
 
 Review comment:
   Fixed.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the 

[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008478#comment-17008478
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

paul-rogers commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363134119
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroSchemaUtilTest.java
 ##
 @@ -0,0 +1,431 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.LogicalType;
+import org.apache.avro.LogicalTypes;
+import org.apache.avro.Schema;
+import org.apache.avro.SchemaBuilder;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.test.BaseTest;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ExpectedException;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+public class AvroSchemaUtilTest extends BaseTest {
 
 Review comment:
   Nicely done; I like how you were able to test the type conversion separately
from the reader itself.
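
   To illustrate the pattern praised here, below is a minimal sketch of a
schema conversion kept as a pure function so that it can be tested without
opening a file or building a reader. It is not Drill's `AvroSchemaUtil`: the
class `TypeMappingSketch`, its methods, and the string-based type names are
hypothetical stand-ins, and the Avro-to-Drill mapping shown is an illustrative
assumption rather than the mapping used in the pull request.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, simplified stand-in for a schema converter.
 * Not Drill's AvroSchemaUtil; types are plain strings for illustration only.
 */
final class TypeMappingSketch {

  private TypeMappingSketch() {
  }

  /** Maps one Avro primitive type name to an assumed Drill type name. */
  static String toDrillType(String avroType) {
    switch (avroType) {
      case "int":
        return "INT";
      case "long":
        return "BIGINT";
      case "float":
        return "FLOAT4";
      case "double":
        return "FLOAT8";
      case "boolean":
        return "BIT";
      case "string":
        return "VARCHAR";
      case "bytes":
        return "VARBINARY";
      default:
        // Fail fast on anything the sketch does not cover.
        throw new IllegalArgumentException("Unsupported Avro type: " + avroType);
    }
  }

  /** Converts a field-name to Avro-type map, preserving field order. */
  static Map<String, String> convert(Map<String, String> avroFields) {
    Map<String, String> converted = new LinkedHashMap<>();
    for (Map.Entry<String, String> field : avroFields.entrySet()) {
      converted.put(field.getKey(), toDrillType(field.getValue()));
    }
    return converted;
  }

  public static void main(String[] args) {
    Map<String, String> avroFields = new LinkedHashMap<>();
    avroFields.put("id", "long");
    avroFields.put("name", "string");
    // Prints the converted field map; no reader or file is involved.
    System.out.println(convert(avroFields));
  }
}
```

   Because the conversion takes a schema description in and returns one out,
a test only needs to build an input and compare the result, which is roughly
the separation this comment refers to.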
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Convert the Avro format plugin to use EVF
> -
>
> Key: DRILL-7454
> URL: https://issues.apache.org/jira/browse/DRILL-7454
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.18.0
>
>
> Convert the Avro format plugin to use EVF.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008473#comment-17008473
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

paul-rogers commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363132606
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroBatchReader.java
 ##
 @@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileReader;
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.generic.GenericContainer;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericFixed;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.mapred.FsInput;
+import org.apache.avro.util.Utf8;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.util.ImpersonationUtil;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.DictWriter;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.joda.time.DateTimeConstants;
+import org.joda.time.Period;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.IntBuffer;
+import java.security.PrivilegedExceptionAction;
+import java.util.List;
+import java.util.Map;
+
+public class AvroBatchReader implements ManagedReader {
+
+  private static final Logger logger = LoggerFactory.getLogger(AvroBatchReader.class);
+
+  // currently config is unused but may be used later
+  private final AvroReaderConfig config;
+
+  private Path filePath;
+  private long endPosition;
+  private DataFileReader reader;
+  private ResultSetLoader loader;
+  // re-use container instance
+  private GenericContainer container = null;
+
+  public AvroBatchReader(AvroReaderConfig config) {
+this.config = config;
+  }
+
+  @Override
+  public boolean open(FileScanFramework.FileSchemaNegotiator negotiator) {
+FileSplit split = negotiator.split();
+filePath = split.getPath();
+
+// Avro files are splittable, define reading start / end positions
+long startPosition = split.getStart();
+endPosition = startPosition + split.getLength();
+
+logger.debug("Processing Avro file: {}, start position: {}, end position: {}",
+  filePath, startPosition, endPosition);
+
+reader = prepareReader(split, negotiator.fileSystem(),
+  negotiator.userName(), negotiator.context().getFragmentContext().getQueryUserName());
+
+logger.debug("Avro file schema: {}", reader.getSchema());
+TupleMetadata schema = AvroSchemaUtil.convert(reader.getSchema());
+logger.debug("Avro file converted schema: {}", schema);
+negotiator.setTableSchema(schema, true);
+loader = negotiator.build();
+
+return true;
+  }
+
+  @Override
+  public boolean next() {
+RowSetLoader rowWriter = loader.writer();
+while (!rowWriter.isFull()) {
+  if (!nextLine(rowWriter)) {
+return false;
+  }
+}
+return true;
+  }
+
+  @Override
+  public void close() {
+try {
+  

[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008476#comment-17008476
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

paul-rogers commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363133393
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroFormatPlugin.java
 ##
 @@ -17,117 +17,68 @@
  */
 package org.apache.drill.exec.store.avro;
 
-import java.io.IOException;
-import java.util.List;
-import java.util.regex.Pattern;
-
-import org.apache.drill.common.exceptions.ExecutionSetupException;
-import org.apache.drill.common.expression.SchemaPath;
 import org.apache.drill.common.logical.StoragePluginConfig;
-import org.apache.drill.exec.ops.FragmentContext;
-import org.apache.drill.exec.planner.common.DrillStatsTable.TableStatistics;
-import org.apache.drill.exec.planner.logical.DrillTable;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
 import org.apache.drill.exec.proto.UserBitShared.CoreOperatorType;
 import org.apache.drill.exec.server.DrillbitContext;
-import org.apache.drill.exec.store.RecordReader;
-import org.apache.drill.exec.store.RecordWriter;
-import org.apache.drill.exec.store.SchemaConfig;
-import org.apache.drill.exec.store.dfs.BasicFormatMatcher;
-import org.apache.drill.exec.store.dfs.DrillFileSystem;
-import org.apache.drill.exec.store.dfs.FileSelection;
-import org.apache.drill.exec.store.dfs.FileSystemPlugin;
-import org.apache.drill.exec.store.dfs.FormatMatcher;
-import org.apache.drill.exec.store.dfs.FormatSelection;
-import org.apache.drill.exec.store.dfs.MagicString;
+import org.apache.drill.exec.server.options.OptionManager;
 import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
-import org.apache.drill.exec.store.dfs.easy.EasyWriter;
-import org.apache.drill.exec.store.dfs.easy.FileWork;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
 import org.apache.hadoop.conf.Configuration;
 
-import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
-import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-
 /**
  * Format plugin for Avro data files.
  */
 public class AvroFormatPlugin extends EasyFormatPlugin {
 
-  private final AvroFormatMatcher matcher;
-
-  public AvroFormatPlugin(String name, DrillbitContext context, Configuration fsConf,
-  StoragePluginConfig storagePluginConfig) {
-this(name, context, fsConf, storagePluginConfig, new AvroFormatConfig());
-  }
-
-  public AvroFormatPlugin(String name, DrillbitContext context, Configuration fsConf, StoragePluginConfig config, AvroFormatConfig formatPluginConfig) {
-super(name, context, fsConf, config, formatPluginConfig, true, false, true, false, Lists.newArrayList("avro"), "avro");
-this.matcher = new AvroFormatMatcher(this);
-  }
-
-  @Override
-  public boolean supportsPushDown() {
-return true;
-  }
-
-  @Override
-  public RecordReader getRecordReader(FragmentContext context, DrillFileSystem dfs, FileWork fileWork, List<SchemaPath> columns, String userName) throws ExecutionSetupException {
-return new AvroRecordReader(context, fileWork.getPath(), fileWork.getStart(), fileWork.getLength(), dfs, columns,
-  userName);
-  }
-
-  @Override
-  public RecordWriter getRecordWriter(FragmentContext context, EasyWriter writer) throws IOException {
-throw new UnsupportedOperationException("unimplemented");
-  }
+  public static final String DEFAULT_NAME = "avro";
 
-  @Override
-  public int getReaderOperatorType() {
-return CoreOperatorType.AVRO_SUB_SCAN_VALUE;
+  public AvroFormatPlugin(String name,
+ DrillbitContext context,
+ Configuration fsConf,
+ StoragePluginConfig storageConfig,
+ AvroFormatConfig formatConfig) {
+super(name, easyConfig(fsConf, formatConfig), context, storageConfig, formatConfig);
   }
 
-  @Override
-  public int getWriterOperatorType() {
-throw new UnsupportedOperationException("unimplemented");
+  private static EasyFormatConfig easyConfig(Configuration fsConf, AvroFormatConfig formatConfig) {
+EasyFormatConfig config = new EasyFormatConfig();
+config.readable = true;
+config.writable = false;
+config.blockSplittable = true;
+config.compressible = false;
+config.supportsProjectPushdown = true;
+config.extensions = formatConfig.extensions;
+config.fsConf = fsConf;
+config.defaultName = DEFAULT_NAME;
+config.readerOperatorType = 

[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008479#comment-17008479
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

paul-rogers commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363133083
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroBatchReader.java
 ##
 @@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileReader;
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.generic.GenericContainer;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericFixed;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.mapred.FsInput;
+import org.apache.avro.util.Utf8;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.util.ImpersonationUtil;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.DictWriter;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.joda.time.DateTimeConstants;
+import org.joda.time.Period;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.IntBuffer;
+import java.security.PrivilegedExceptionAction;
+import java.util.List;
+import java.util.Map;
+
+public class AvroBatchReader implements ManagedReader<FileScanFramework.FileSchemaNegotiator> {
+
+  private static final Logger logger = LoggerFactory.getLogger(AvroBatchReader.class);
+
+  // currently config is unused but may be used later
+  private final AvroReaderConfig config;
+
+  private Path filePath;
+  private long endPosition;
+  private DataFileReader reader;
+  private ResultSetLoader loader;
+  // re-use container instance
+  private GenericContainer container = null;
+
+  public AvroBatchReader(AvroReaderConfig config) {
+this.config = config;
+  }
+
+  @Override
+  public boolean open(FileScanFramework.FileSchemaNegotiator negotiator) {
+FileSplit split = negotiator.split();
+filePath = split.getPath();
+
+// Avro files are splittable, define reading start / end positions
+long startPosition = split.getStart();
+endPosition = startPosition + split.getLength();
+
+logger.debug("Processing Avro file: {}, start position: {}, end position: {}",
+  filePath, startPosition, endPosition);
+
+reader = prepareReader(split, negotiator.fileSystem(),
+  negotiator.userName(), negotiator.context().getFragmentContext().getQueryUserName());
+
+logger.debug("Avro file schema: {}", reader.getSchema());
+TupleMetadata schema = AvroSchemaUtil.convert(reader.getSchema());
+logger.debug("Avro file converted schema: {}", schema);
+negotiator.setTableSchema(schema, true);
+loader = negotiator.build();
+
+return true;
+  }
+
+  @Override
+  public boolean next() {
+RowSetLoader rowWriter = loader.writer();
+while (!rowWriter.isFull()) {
+  if (!nextLine(rowWriter)) {
+return false;
+  }
+}
+return true;
+  }
+
+  @Override
+  public void close() {
+try {
+  

[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008475#comment-17008475
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

paul-rogers commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363132099
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroBatchReader.java
 ##
 @@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileReader;
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.generic.GenericContainer;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericFixed;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.mapred.FsInput;
+import org.apache.avro.util.Utf8;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.util.ImpersonationUtil;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.DictWriter;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.joda.time.DateTimeConstants;
+import org.joda.time.Period;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.IntBuffer;
+import java.security.PrivilegedExceptionAction;
+import java.util.List;
+import java.util.Map;
+
+public class AvroBatchReader implements ManagedReader<FileScanFramework.FileSchemaNegotiator> {
+
+  private static final Logger logger = LoggerFactory.getLogger(AvroBatchReader.class);
+
+  // currently config is unused but may be used later
+  private final AvroReaderConfig config;
+
+  private Path filePath;
+  private long endPosition;
+  private DataFileReader reader;
+  private ResultSetLoader loader;
+  // re-use container instance
+  private GenericContainer container = null;
+
+  public AvroBatchReader(AvroReaderConfig config) {
+this.config = config;
+  }
+
+  @Override
+  public boolean open(FileScanFramework.FileSchemaNegotiator negotiator) {
+FileSplit split = negotiator.split();
+filePath = split.getPath();
+
+// Avro files are splittable, define reading start / end positions
+long startPosition = split.getStart();
+endPosition = startPosition + split.getLength();
+
+logger.debug("Processing Avro file: {}, start position: {}, end position: {}",
+  filePath, startPosition, endPosition);
+
+reader = prepareReader(split, negotiator.fileSystem(),
+  negotiator.userName(), negotiator.context().getFragmentContext().getQueryUserName());
+
+logger.debug("Avro file schema: {}", reader.getSchema());
+TupleMetadata schema = AvroSchemaUtil.convert(reader.getSchema());
+logger.debug("Avro file converted schema: {}", schema);
+negotiator.setTableSchema(schema, true);
+loader = negotiator.build();
+
+return true;
+  }
+
+  @Override
+  public boolean next() {
+RowSetLoader rowWriter = loader.writer();
+while (!rowWriter.isFull()) {
+  if (!nextLine(rowWriter)) {
+return false;
+  }
+}
+return true;
+  }
+
+  @Override
+  public void close() {
+try {
+  

[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008480#comment-17008480
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

paul-rogers commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363133713
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroSchemaUtil.java
 ##
 @@ -0,0 +1,274 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.LogicalType;
+import org.apache.avro.LogicalTypes;
+import org.apache.avro.Schema;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.DictBuilder;
+import org.apache.drill.exec.record.metadata.MapBuilder;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.RepeatedListBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.record.metadata.TupleSchema;
+import org.apache.drill.exec.vector.complex.DictVector;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Utility class that provides methods to interact with Avro schema.
+ */
+public class AvroSchemaUtil {
+
+  private static final Logger logger = LoggerFactory.getLogger(AvroSchemaUtil.class);
+
+  public static final String AVRO_LOGICAL_TYPE_PROPERTY = "avro_logical_type";
+
+  public static final String DECIMAL_LOGICAL_TYPE = "decimal";
+  public static final String TIMESTAMP_MICROS_LOGICAL_TYPE = "timestamp-micros";
+  public static final String TIMESTAMP_MILLIS_LOGICAL_TYPE = "timestamp-millis";
+  public static final String DATE_LOGICAL_TYPE = "date";
+  public static final String TIME_MICROS_LOGICAL_TYPE = "time-micros";
+  public static final String TIME_MILLIS_LOGICAL_TYPE = "time-millis";
+  public static final String DURATION_LOGICAL_TYPE = "duration";
+
+  /**
+   * Converts Avro schema into Drill metadata description of the schema.
+   *
+   * @param schema Avro schema
+   * @return metadata description of the schema
+   * @throws UserException if schema contains unsupported types
+   */
+  public static TupleMetadata convert(Schema schema) {
+return SchemaConverter.INSTANCE.convert(schema);
+  }
+
+  /**
+   * Avro represents a nullable type as a union of null and another schema: ["null", "some-type"].
+   * This method extracts the non-null schema from the given union schema.
+   *
+   * @param schema Avro schema
+   * @param columnName column name
+   * @return non-nullable Avro schema
+   * @throws UserException if the given schema is not a union or represents a complex union
+   */
+  public static Schema extractSchemaFromNullable(Schema schema, String columnName) {
+if (!schema.isUnion()) {
+  throw UserException.validationError()
+.message("Expected union type, but received: %s", schema.getType())
+.addContext("Column", columnName)
+.build(logger);
+}
+List<Schema> unionSchemas = schema.getTypes();
+
+// exclude all schemas with null type
+List<Schema> nonNullSchemas = unionSchemas.stream()
+  .filter(unionSchema -> !Schema.Type.NULL.equals(unionSchema.getType()))
+  .collect(Collectors.toList());
+
+// if original schema has two elements and only one non-nullable schema, this is simple nullable type
+if (unionSchemas.size() == 2 && nonNullSchemas.size() == 1) {
+  return nonNullSchemas.get(0);
+} else {
+  return throwUnsupportedErrorForType("complex union", columnName);
+}
+  }
+
+  private static <T> T throwUnsupportedErrorForType(String type, String columnName) {
+throw UserException.unsupportedError()
+  .message("'%s' type is not supported", type)
+  .addContext("Column", columnName)
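
The nullable-union rule described in the Javadoc of `extractSchemaFromNullable` can be illustrated without the Avro library. The sketch below is only a plain-Java model under stated assumptions: type names are represented as strings instead of Avro `Schema` objects, and the class and method names (`NullableUnionSketch`, `extractNonNull`) are hypothetical, not part of the pull request. It also throws a plain `UnsupportedOperationException` where the real code builds a `UserException`.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical plain-Java model of the rule; strings stand in for Avro schemas.
class NullableUnionSketch {

  // Returns the single non-null branch of a two-element ["null", T] union.
  static String extractNonNull(List<String> unionTypes) {
    // Drop every "null" branch, mirroring the filter in the diff above.
    List<String> nonNull = unionTypes.stream()
        .filter(type -> !"null".equals(type))
        .collect(Collectors.toList());
    // Exactly two branches with one non-null branch is a simple nullable type.
    if (unionTypes.size() == 2 && nonNull.size() == 1) {
      return nonNull.get(0);
    }
    // Anything else is treated as a complex union and rejected.
    throw new UnsupportedOperationException("complex union is not supported: " + unionTypes);
  }

  public static void main(String[] args) {
    System.out.println(extractNonNull(List.of("null", "string")));
    System.out.println(extractNonNull(List.of("long", "null")));
    try {
      extractNonNull(List.of("null", "string", "long"));
    } catch (UnsupportedOperationException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The order of the branches does not matter in this model: `["long", "null"]` is accepted in the same way as `["null", "long"]`, while a union with three branches, or with two non-null branches, is rejected.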

[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008477#comment-17008477
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

paul-rogers commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363132975
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroBatchReader.java
 ##
 @@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileReader;
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.generic.GenericContainer;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericFixed;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.mapred.FsInput;
+import org.apache.avro.util.Utf8;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.util.ImpersonationUtil;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.DictWriter;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.joda.time.DateTimeConstants;
+import org.joda.time.Period;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.IntBuffer;
+import java.security.PrivilegedExceptionAction;
+import java.util.List;
+import java.util.Map;
+
+public class AvroBatchReader implements ManagedReader<FileScanFramework.FileSchemaNegotiator> {
+
+  private static final Logger logger = LoggerFactory.getLogger(AvroBatchReader.class);
+
+  // currently config is unused but may be used later
+  private final AvroReaderConfig config;
+
+  private Path filePath;
+  private long endPosition;
+  private DataFileReader reader;
+  private ResultSetLoader loader;
+  // re-use container instance
+  private GenericContainer container = null;
+
+  public AvroBatchReader(AvroReaderConfig config) {
+this.config = config;
+  }
+
+  @Override
+  public boolean open(FileScanFramework.FileSchemaNegotiator negotiator) {
+FileSplit split = negotiator.split();
+filePath = split.getPath();
+
+// Avro files are splittable, define reading start / end positions
+long startPosition = split.getStart();
+endPosition = startPosition + split.getLength();
+
+logger.debug("Processing Avro file: {}, start position: {}, end position: {}",
+  filePath, startPosition, endPosition);
+
+reader = prepareReader(split, negotiator.fileSystem(),
+  negotiator.userName(), negotiator.context().getFragmentContext().getQueryUserName());
+
+logger.debug("Avro file schema: {}", reader.getSchema());
+TupleMetadata schema = AvroSchemaUtil.convert(reader.getSchema());
+logger.debug("Avro file converted schema: {}", schema);
+negotiator.setTableSchema(schema, true);
+loader = negotiator.build();
+
+return true;
+  }
+
+  @Override
+  public boolean next() {
+RowSetLoader rowWriter = loader.writer();
+while (!rowWriter.isFull()) {
+  if (!nextLine(rowWriter)) {
+return false;
+  }
+}
+return true;
+  }
+
+  @Override
+  public void close() {
+try {
+  

[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008481#comment-17008481
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

paul-rogers commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363133984
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroBatchReader.java
 ##
 @@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileReader;
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.generic.GenericContainer;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericFixed;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.mapred.FsInput;
+import org.apache.avro.util.Utf8;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.util.ImpersonationUtil;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.DictWriter;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.joda.time.DateTimeConstants;
+import org.joda.time.Period;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.IntBuffer;
+import java.security.PrivilegedExceptionAction;
+import java.util.List;
+import java.util.Map;
+
+public class AvroBatchReader implements ManagedReader<FileScanFramework.FileSchemaNegotiator> {
+
+  private static final Logger logger = LoggerFactory.getLogger(AvroBatchReader.class);
+
+  // currently config is unused but may be used later
+  private final AvroReaderConfig config;
+
+  private Path filePath;
+  private long endPosition;
+  private DataFileReader reader;
+  private ResultSetLoader loader;
+  // re-use container instance
+  private GenericContainer container = null;
+
+  public AvroBatchReader(AvroReaderConfig config) {
+this.config = config;
+  }
+
+  @Override
+  public boolean open(FileScanFramework.FileSchemaNegotiator negotiator) {
+FileSplit split = negotiator.split();
+filePath = split.getPath();
+
+// Avro files are splittable, define reading start / end positions
+long startPosition = split.getStart();
+endPosition = startPosition + split.getLength();
+
+logger.debug("Processing Avro file: {}, start position: {}, end position: {}",
+  filePath, startPosition, endPosition);
+
+reader = prepareReader(split, negotiator.fileSystem(),
+  negotiator.userName(), negotiator.context().getFragmentContext().getQueryUserName());
+
+logger.debug("Avro file schema: {}", reader.getSchema());
+TupleMetadata schema = AvroSchemaUtil.convert(reader.getSchema());
 
 Review comment:
   Here we are converting the Avro schema to a Drill schema, which makes sense. 
If we scan a directory with multiple files, they should all have the same 
schema (or an evolved schema: `(A, B) --> (A, B, C) --> (B, C)`).
   
   In the future, it would be handy to infer the schema at plan time, then pass 
the resulting schema to each reader so we don't have to repeat identical work 
in each of perhaps hundreds of readers.
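
   The idea in this comment can be sketched in plain Java. This is only an illustration under stated assumptions, not Drill's API: column names stand in for full schemas, the evolved schema is assumed to be the union of all columns in first-seen order (the comment does not say how evolution should be resolved), and `PlanTimeSchemaSketch`, `mergeSchemas` and `describeReader` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical plain-Java model; none of these names are Drill or Avro APIs.
class PlanTimeSchemaSketch {

  // Merges per-file column lists into one schema, keeping first-seen order.
  // Assumption: an evolved schema is the union of all columns seen so far.
  static List<String> mergeSchemas(List<List<String>> fileSchemas) {
    Set<String> merged = new LinkedHashSet<>();
    for (List<String> fileSchema : fileSchemas) {
      merged.addAll(fileSchema);
    }
    return new ArrayList<>(merged);
  }

  // Stand-in for a reader that receives the schema instead of inferring it.
  static String describeReader(String file, List<String> schema) {
    return file + " -> " + schema;
  }

  public static void main(String[] args) {
    List<List<String>> fileSchemas = List.of(
        List.of("A", "B"),
        List.of("A", "B", "C"),
        List.of("B", "C"));
    // Inferred once, at "plan time".
    List<String> schema = mergeSchemas(fileSchemas);
    // Every reader reuses the same result instead of repeating the work.
    for (String file : List.of("f1.avro", "f2.avro", "f3.avro")) {
      System.out.println(describeReader(file, schema));
    }
  }
}
```

   In this model the merge runs once no matter how many readers are created, which is the saving the comment is after.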
 

[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008474#comment-17008474
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

paul-rogers commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363132679
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroBatchReader.java
 ##
 @@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileReader;
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.generic.GenericContainer;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericFixed;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.mapred.FsInput;
+import org.apache.avro.util.Utf8;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.util.ImpersonationUtil;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.DictWriter;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.joda.time.DateTimeConstants;
+import org.joda.time.Period;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.IntBuffer;
+import java.security.PrivilegedExceptionAction;
+import java.util.List;
+import java.util.Map;
+
+public class AvroBatchReader implements ManagedReader<FileScanFramework.FileSchemaNegotiator> {
+
+  private static final Logger logger = LoggerFactory.getLogger(AvroBatchReader.class);
+
+  // currently config is unused but may be used later
+  private final AvroReaderConfig config;
+
+  private Path filePath;
+  private long endPosition;
+  private DataFileReader reader;
+  private ResultSetLoader loader;
+  // re-use container instance
+  private GenericContainer container = null;
+
+  public AvroBatchReader(AvroReaderConfig config) {
+this.config = config;
+  }
+
+  @Override
+  public boolean open(FileScanFramework.FileSchemaNegotiator negotiator) {
+FileSplit split = negotiator.split();
+filePath = split.getPath();
+
+// Avro files are splittable, define reading start / end positions
+long startPosition = split.getStart();
+endPosition = startPosition + split.getLength();
+
+logger.debug("Processing Avro file: {}, start position: {}, end position: {}",
+  filePath, startPosition, endPosition);
+
+reader = prepareReader(split, negotiator.fileSystem(),
+  negotiator.userName(), negotiator.context().getFragmentContext().getQueryUserName());
+
+logger.debug("Avro file schema: {}", reader.getSchema());
+TupleMetadata schema = AvroSchemaUtil.convert(reader.getSchema());
+logger.debug("Avro file converted schema: {}", schema);
+negotiator.setTableSchema(schema, true);
+loader = negotiator.build();
+
+return true;
+  }
+
+  @Override
+  public boolean next() {
+RowSetLoader rowWriter = loader.writer();
+while (!rowWriter.isFull()) {
+  if (!nextLine(rowWriter)) {
+return false;
+  }
+}
+return true;
+  }
+
+  @Override
+  public void close() {
+try {
+  

[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008318#comment-17008318
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

vvysotskyi commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363086923
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroBatchReader.java
 ##
 @@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.Schema;
+import org.apache.avro.file.DataFileReader;
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.generic.GenericContainer;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericFixed;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.mapred.FsInput;
+import org.apache.avro.util.Utf8;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.util.ImpersonationUtil;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.DictWriter;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.joda.time.DateTimeConstants;
+import org.joda.time.Period;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.IntBuffer;
+import java.security.PrivilegedExceptionAction;
+import java.util.List;
+import java.util.Map;
+
+public class AvroBatchReader implements ManagedReader<FileScanFramework.FileSchemaNegotiator> {
+
+  private static final Logger logger = LoggerFactory.getLogger(AvroBatchReader.class);
+
+  // currently config is unused but maybe used later
+  private final AvroReaderConfig config;
+
+  private Path filePath;
+  private long endPosition;
+  private DataFileReader reader;
+  private ResultSetLoader loader;
+  // re-use container instance
+  private GenericContainer container = null;
 
 Review comment:
   Is it possible to use the `GenericRecord` type here, so that we avoid the casts in the `nextLine()` method?
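
The point of this comment can be illustrated with a small, self-contained sketch. `Container`, `Rec` and `Reader` below are hypothetical stand-ins, not Avro classes; they only show how typing a generic reader with the narrower element type removes the cast at each use. Whether the same change applies cleanly to `AvroBatchReader` depends on code not quoted here.

```java
import java.util.Iterator;
import java.util.List;

final class TypedReaderSketch {

  // Hypothetical stand-ins for a broad container type and its record subtype.
  interface Container {}

  static final class Rec implements Container {
    private final String name;

    Rec(String name) {
      this.name = name;
    }

    String name() {
      return name;
    }
  }

  // Hypothetical reader that is generic in the element type it returns.
  static final class Reader<T extends Container> {
    private final Iterator<T> source;

    Reader(List<T> items) {
      this.source = items.iterator();
    }

    boolean hasNext() {
      return source.hasNext();
    }

    T next() {
      return source.next();
    }
  }

  // Reader typed with the broad type: every access needs a cast.
  static String namesWithCast(Reader<Container> reader) {
    StringBuilder out = new StringBuilder();
    while (reader.hasNext()) {
      Container container = reader.next();
      out.append(((Rec) container).name()).append(';');
    }
    return out.toString();
  }

  // Reader typed with the record type: no cast is needed.
  static String namesTyped(Reader<Rec> reader) {
    StringBuilder out = new StringBuilder();
    while (reader.hasNext()) {
      Rec rec = reader.next();
      out.append(rec.name()).append(';');
    }
    return out.toString();
  }
}
```

Both methods return the same string for the same input; the typed variant moves the type check to compile time.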
 



> Convert the Avro format plugin to use EVF
> -
>
> Key: DRILL-7454
> URL: https://issues.apache.org/jira/browse/DRILL-7454
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Priority: Major
> Fix For: 1.18.0
>
>
> Convert the Avro format plugin to use EVF.





[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008317#comment-17008317
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

vvysotskyi commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363088693
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroDataGenerator.java
 ##
 @@ -0,0 +1,819 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.LogicalType;
+import org.apache.avro.LogicalTypes;
+import org.apache.avro.Schema;
+import org.apache.avro.Schema.Type;
+import org.apache.avro.SchemaBuilder;
+import org.apache.avro.file.DataFileWriter;
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericDatumWriter;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.drill.exec.util.JsonStringArrayList;
+import org.apache.drill.exec.util.JsonStringHashMap;
+import org.apache.drill.exec.util.Text;
+import org.apache.drill.test.BaseDirTestWatcher;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.file.Paths;
+import java.time.LocalDateTime;
+import java.time.ZoneOffset;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.TreeMap;
+
+/**
+ * Utilities for generating Avro test data.
+ */
+public class AvroDataGenerator {
+
+  public static final int RECORD_COUNT = 50;
+  public static int ARRAY_SIZE = 4;
+
+  private final BaseDirTestWatcher dirTestWatcher;
+
+  public AvroDataGenerator(BaseDirTestWatcher dirTestWatcher) {
+    this.dirTestWatcher = dirTestWatcher;
+  }
+
+  /**
+   * Class to write records to an Avro file while simultaneously
+   * constructing a corresponding list of records in the format taken in
+   * by the Drill test builder to describe expected results.
+   */
+  public static class AvroTestRecordWriter implements Closeable {
+
+    private final List<Map<String, Object>> expectedRecords;
+    private final Schema schema;
+    private final DataFileWriter writer;
+    private final String filePath;
+    private final String fileName;
+
+    private GenericData.Record currentAvroRecord;
+    private Map<String, Object> currentExpectedRecord;
+
+    public AvroTestRecordWriter(Schema schema, File file) {
+      writer = new DataFileWriter<>(new GenericDatumWriter<>(schema));
+      try {
+        writer.create(schema, file);
+      } catch (IOException e) {
+        throw new RuntimeException("Error creating file in Avro test setup.", e);
+      }
+      this.schema = schema;
+      currentExpectedRecord = new TreeMap<>();
+      expectedRecords = new ArrayList<>();
+      filePath = file.getAbsolutePath();
+      fileName = file.getName();
+    }
+
+    public void startRecord() {
+      currentAvroRecord = new GenericData.Record(schema);
+      currentExpectedRecord = new TreeMap<>();
+    }
+
+    public void put(String key, Object value) {
+      currentAvroRecord.put(key, value);
+      // convert binary values into byte[], the format they will be given
+      // in the Drill result set in the test framework
+      currentExpectedRecord.put("`" + key + "`", convertAvroValToDrill(value, true));
+    }
+
+    // TODO - fix this the test wrapper to prevent the need for this hack
+    // to make the root behave differently than nested fields for String vs. Text
+    private Object convertAvroValToDrill(Object value, boolean root) {
+      if (value instanceof ByteBuffer) {
+        ByteBuffer bb = ((ByteBuffer)value);
 
 Review comment:
   ```suggestion
   ByteBuffer bb = ((ByteBuffer) value);
   ```
 


[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008316#comment-17008316
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

vvysotskyi commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363088696
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroDataGenerator.java
 ##
 @@ -0,0 +1,819 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.avro;
+
+import org.apache.avro.LogicalType;
+import org.apache.avro.LogicalTypes;
+import org.apache.avro.Schema;
+import org.apache.avro.Schema.Type;
+import org.apache.avro.SchemaBuilder;
+import org.apache.avro.file.DataFileWriter;
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericDatumWriter;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.drill.exec.util.JsonStringArrayList;
+import org.apache.drill.exec.util.JsonStringHashMap;
+import org.apache.drill.exec.util.Text;
+import org.apache.drill.test.BaseDirTestWatcher;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.file.Paths;
+import java.time.LocalDateTime;
+import java.time.ZoneOffset;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.TreeMap;
+
+/**
+ * Utilities for generating Avro test data.
+ */
+public class AvroDataGenerator {
+
+  public static final int RECORD_COUNT = 50;
+  public static int ARRAY_SIZE = 4;
+
+  private final BaseDirTestWatcher dirTestWatcher;
+
+  public AvroDataGenerator(BaseDirTestWatcher dirTestWatcher) {
+    this.dirTestWatcher = dirTestWatcher;
+  }
+
+  /**
+   * Class to write records to an Avro file while simultaneously
+   * constructing a corresponding list of records in the format taken in
+   * by the Drill test builder to describe expected results.
+   */
+  public static class AvroTestRecordWriter implements Closeable {
+
+    private final List<Map<String, Object>> expectedRecords;
+    private final Schema schema;
+    private final DataFileWriter writer;
+    private final String filePath;
+    private final String fileName;
+
+    private GenericData.Record currentAvroRecord;
+    private Map<String, Object> currentExpectedRecord;
+
+    public AvroTestRecordWriter(Schema schema, File file) {
+      writer = new DataFileWriter<>(new GenericDatumWriter<>(schema));
+      try {
+        writer.create(schema, file);
+      } catch (IOException e) {
+        throw new RuntimeException("Error creating file in Avro test setup.", e);
+      }
+      this.schema = schema;
+      currentExpectedRecord = new TreeMap<>();
+      expectedRecords = new ArrayList<>();
+      filePath = file.getAbsolutePath();
+      fileName = file.getName();
+    }
+
+    public void startRecord() {
+      currentAvroRecord = new GenericData.Record(schema);
+      currentExpectedRecord = new TreeMap<>();
+    }
+
+    public void put(String key, Object value) {
+      currentAvroRecord.put(key, value);
+      // convert binary values into byte[], the format they will be given
+      // in the Drill result set in the test framework
+      currentExpectedRecord.put("`" + key + "`", convertAvroValToDrill(value, true));
+    }
+
+    // TODO - fix this the test wrapper to prevent the need for this hack
+    // to make the root behave differently than nested fields for String vs. Text
+    private Object convertAvroValToDrill(Object value, boolean root) {
+      if (value instanceof ByteBuffer) {
+        ByteBuffer bb = ((ByteBuffer)value);
+        byte[] drillVal = new byte[((ByteBuffer)value).remaining()];
 
 Review comment:
   ```suggestion
   byte[] drillVal = new byte[((ByteBuffer) value).remaining()];
   ```
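
Beyond the spacing fix, the two quoted lines cast `value` twice. A minimal sketch of the same conversion with a single cast, assuming the goal is to copy the buffer's remaining bytes into a `byte[]`: the class and method names are hypothetical, and the use of `duplicate()` to leave the caller's buffer position untouched is a choice made for this sketch, not something taken from the PR.

```java
import java.nio.ByteBuffer;

final class ByteBufferCopySketch {

  // Copies the remaining bytes of a ByteBuffer into a new array.
  static byte[] toByteArray(Object value) {
    if (!(value instanceof ByteBuffer)) {
      throw new IllegalArgumentException("Expected a ByteBuffer");
    }
    // Cast once and reuse the typed reference.
    ByteBuffer bb = (ByteBuffer) value;
    byte[] copy = new byte[bb.remaining()];
    // Read through a duplicate so the original buffer's position is unchanged.
    bb.duplicate().get(copy);
    return copy;
  }
}
```

Calling `toByteArray` on a buffer wrapping three bytes whose first byte has already been read returns the last two bytes and leaves the buffer's position where it was.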
 


[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008315#comment-17008315
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

vvysotskyi commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951#discussion_r363089977
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroDrillTable.java
 ##
 @@ -1,197 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.drill.exec.store.avro;
-
-import java.io.IOException;
-import java.util.List;
-
-import org.apache.avro.LogicalType;
-import org.apache.avro.Schema;
-import org.apache.avro.Schema.Field;
-import org.apache.avro.file.DataFileReader;
-import org.apache.avro.generic.GenericContainer;
-import org.apache.avro.generic.GenericDatumReader;
-import org.apache.avro.mapred.FsInput;
-import org.apache.calcite.rel.type.RelDataType;
-import org.apache.calcite.rel.type.RelDataTypeFactory;
-import org.apache.calcite.sql.type.SqlTypeName;
-import org.apache.drill.common.exceptions.UserException;
-import org.apache.drill.exec.planner.logical.DrillTable;
-import org.apache.drill.exec.planner.logical.ExtendableRelDataType;
-import org.apache.drill.exec.planner.types.ExtendableRelDataTypeHolder;
-import org.apache.drill.exec.store.ColumnExplorer;
-import org.apache.drill.exec.store.SchemaConfig;
-import org.apache.drill.exec.store.dfs.FileSystemPlugin;
-import org.apache.drill.exec.store.dfs.FormatSelection;
-import org.apache.hadoop.fs.Path;
-
-import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
-
-public class AvroDrillTable extends DrillTable {
-  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(AvroDrillTable.class);
-
-  private final DataFileReader reader;
-  private final SchemaConfig schemaConfig;
-  private ExtendableRelDataTypeHolder holder;
 
 Review comment:
   Could you please also remove the `ExtendableRelDataTypeHolder` and `ExtendableRelDataType` classes and the `DrillValidator.addToSelectList()` method added in DRILL-4120, since they are not needed with these changes?
 



[jira] [Commented] (DRILL-7454) Convert the Avro format plugin to use EVF

2020-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007552#comment-17007552
 ] 

ASF GitHub Bot commented on DRILL-7454:
---

arina-ielchiieva commented on pull request #1951: DRILL-7454: Convert Avro to EVF
URL: https://github.com/apache/drill/pull/1951
 
 
   1. Replaced old format implementation with EVF.
   2. Updated and added Avro tests and improved their performance.
   3. Code refactoring.
   
   Jira - [DRILL-7454](https://issues.apache.org/jira/browse/DRILL-7454).
 
