[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread Vova Vysotskyi (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961240#comment-16961240
 ] 

Vova Vysotskyi commented on DRILL-4303:
---

Merged into Apache master with commit id 
[8f40dc9e|https://github.com/apache/drill/commit/8f40dc9ea50c036a36ddc25183f48ce578a5154e].

> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.17.0
>Reporter: Karol Potocki
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.17.0
>
>
> Allow Drill (drill-gis) to read esri shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)
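
As a rough illustration of the three-file layout described above: the .shp, .dbf and .prj files of one shapefile share a base name, so a reader can locate the companion files from the .shp path alone. The sketch below is plain Java with no Drill dependency. The class name `ShapefileSidecars` and the method `sibling` are hypothetical names chosen for this example and are not part of the plugin.

```java
import java.util.Locale;

// Hypothetical helper, for illustration only: derives the companion
// (.dbf, .prj) file names that accompany a .shp file.
public final class ShapefileSidecars {
  private ShapefileSidecars() {}

  /** Swaps a trailing ".shp" (any case) for the given extension, e.g. "dbf". */
  public static String sibling(String shpPath, String extension) {
    String lower = shpPath.toLowerCase(Locale.ROOT);
    if (!lower.endsWith(".shp")) {
      throw new IllegalArgumentException("Not a .shp path: " + shpPath);
    }
    // Cut only the final four characters so that ".shp" appearing
    // earlier in the path (for example in a directory name) is untouched.
    String base = shpPath.substring(0, shpPath.length() - ".shp".length());
    return base + "." + extension;
  }

  public static void main(String[] args) {
    String shp = "/data/roads.shp";
    System.out.println(sibling(shp, "dbf")); // /data/roads.dbf
    System.out.println(sibling(shp, "prj")); // /data/roads.prj
  }
}
```

Only the trailing extension is swapped, so a path such as `/maps.shp/roads.shp` maps to `/maps.shp/roads.dbf` rather than having every `.shp` occurrence rewritten.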



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961238#comment-16961238
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

vvysotskyi commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961124#comment-16961124
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on issue #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#issuecomment-546985478
 
 
   Thanks.
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961120#comment-16961120
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on issue #1858: DRILL-4303: ESRI Shapefile (shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#issuecomment-546984435
 
 
   I will create the ticket shortly. 
   
   > On Oct 28, 2019, at 10:54 AM, Arina Ielchiieva wrote:
   > 
   > @cgivre one more thing, I think you forgot to create Jira for the reader
   > enhancement Paul has mentioned.
   
   
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961117#comment-16961117
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on issue #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#issuecomment-546984086
 
 
   @cgivre one more thing, I think you forgot to create Jira for the reader 
enhancement Paul has mentioned.
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961100#comment-16961100
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on issue #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#issuecomment-546979600
 
 
   Once the full test run passes, the PR will be merged.
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961099#comment-16961099
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on issue #1858: DRILL-4303: ESRI Shapefile (shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#issuecomment-546977509
 
 
   Thank you @arina-ielchiieva and @paul-rogers for the review.  Also thank you 
@k255 for the original PR.  Commits squashed. 
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961098#comment-16961098
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on issue #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#issuecomment-546975836
 
 
   +1, please rebase if needed and squash the commits.
 



> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.17.0
>Reporter: Karol Potocki
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> Allow Drill (drill-gis) to read esri shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)





[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961087#comment-16961087
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339584367
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,318 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader {
+
+  private FileSplit split;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private InputStream fileReaderShp = null;
+  private InputStream fileReaderDbf = null;
+  private InputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+    protected final ShpFormatPlugin plugin;
+
+    public ShpReaderConfig(ShpFormatPlugin plugin) {
+      this.plugin = plugin;
+    }
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+    this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+    this.split = negotiator.split();
+    this.hadoopShp = split.getPath();
+
+    String filePath = split.getPath().toString();
+    this.hadoopDbf = new Path(filePath.replace(".shp", ".dbf"));
+    this.hadoopPrj = new Path(filePath.replace(".shp", ".prj"));
+
+    openFile(negotiator);
+    SchemaBuilder builder = new SchemaBuilder()
+      .addNullable("gid", TypeProtos.MinorType.INT)
+      .addNullable("srid", TypeProtos.MinorType.INT)
+      .addNullable("shapeType", TypeProtos.MinorType.VARCHAR)
+      .addNullable("geom", TypeProtos.MinorType.VARBINARY);
+
+    negotiator.setTableSchema(builder.buildSchema(), false);
+    loader = negotiator.build();
+
+    rowWriter = loader.writer();
+    gidWriter = rowWriter.scalar("gid");
+    sridWriter = rowWriter.scalar("srid");
+    shapeTypeWriter = rowWriter.scalar("shapeType");
+    geomWriter = rowWriter.scalar("geom");
+
+    return true;
+  }
+
+  @Override
+  public boolean next() {
+    Geometry geom = null;
+
+    while (!rowWriter.isFull()) {
+      Object[] 

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961067#comment-16961067
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format 
Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339576671
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,318 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader {
+
+  private FileSplit split;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private InputStream fileReaderShp = null;
+  private InputStream fileReaderDbf = null;
+  private InputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+    protected final ShpFormatPlugin plugin;
+
+    public ShpReaderConfig(ShpFormatPlugin plugin) {
+      this.plugin = plugin;
+    }
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+    this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+    this.split = negotiator.split();
+    this.hadoopShp = split.getPath();
+
+    String filePath = split.getPath().toString();
+    this.hadoopDbf = new Path(filePath.replace(".shp", ".dbf"));
+    this.hadoopPrj = new Path(filePath.replace(".shp", ".prj"));
+
+    openFile(negotiator);
+    SchemaBuilder builder = new SchemaBuilder()
+      .addNullable("gid", TypeProtos.MinorType.INT)
+      .addNullable("srid", TypeProtos.MinorType.INT)
+      .addNullable("shapeType", TypeProtos.MinorType.VARCHAR)
+      .addNullable("geom", TypeProtos.MinorType.VARBINARY);
+
+    negotiator.setTableSchema(builder.buildSchema(), false);
+    loader = negotiator.build();
+
+    rowWriter = loader.writer();
+    gidWriter = rowWriter.scalar("gid");
+    sridWriter = rowWriter.scalar("srid");
+    shapeTypeWriter = rowWriter.scalar("shapeType");
+    geomWriter = rowWriter.scalar("geom");
+
+    return true;
+  }
+
+  @Override
+  public boolean next() {
+    Geometry geom = null;
+
+    while (!rowWriter.isFull()) {
+      Object[] dbfRow = 

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961060#comment-16961060
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339573014
 
 

 ##
 File path: 
contrib/format-esri/src/test/java/org/apache/drill/exec/store/esri/TestShapefileFormatPlugin.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import org.apache.drill.categories.RowSetTests;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.server.Drillbit;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.dfs.FileSystemConfig;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.test.BaseDirTestWatcher;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+@Category(RowSetTests.class)
+public class TestShapefileFormatPlugin extends ClusterTest {
 
 Review comment:
   Please replace `openPossiblyCompressedStream` with `open`. With 
`config.compressible = false` set, compressed files are never processed, so 
the reader is doing extra work that is not needed.
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961061#comment-16961061
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339573443
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,323 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader<FileSchemaNegotiator> {
+
+  private static final Logger logger = LoggerFactory.getLogger(ShpBatchReader.class);
+  private static final String GID_FIELD_NAME = "gid";
+  private static final String SRID_FIELD_NAME = "srid";
+  private static final String SHAPE_TYPE_FIELD_NAME = "shapeType";
+  private static final String GEOM_FIELD_NAME = "geom";
+  private static final String SRID_PATTERN_TEXT = "AUTHORITY\\[\"\\w+\"\\s*,\\s*\"*(\\d+)\"*\\]\\]$";
+
+  private FileSplit split;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private InputStream fileReaderShp = null;
+  private InputStream fileReaderDbf = null;
+  private InputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+  private int srid;
+  private SpatialReference spatialReference;
+
+  public static class ShpReaderConfig {
+    protected final ShpFormatPlugin plugin;
+
+    public ShpReaderConfig(ShpFormatPlugin plugin) {
+      this.plugin = plugin;
+    }
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+    this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+    split = negotiator.split();
+    hadoopShp = split.getPath();
+
+    String filePath = split.getPath().toString();
+    hadoopDbf = new Path(filePath.replace(".shp", ".dbf"));
+    hadoopPrj = new Path(filePath.replace(".shp", ".prj"));
+
+    openFile(negotiator);
+    SchemaBuilder builder = new SchemaBuilder()
+      .addNullable(GID_FIELD_NAME, TypeProtos.MinorType.INT)
+      .addNullable(SRID_FIELD_NAME, TypeProtos.MinorType.INT)
+      .addNullable(SHAPE_TYPE_FIELD_NAME, TypeProtos.MinorType.VARCHAR)
+      .addNullable(GEOM_FIELD_NAME, TypeProtos.MinorType.VARBINARY);
+
+    negotiator.setTableSchema(builder.buildSchema(), false);
+    loader = negotiator.build();
+
+

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961056#comment-16961056
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format 
Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339571526
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatConfig.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Format plugin config for shapefile data files.
+ */
+@JsonTypeName(ShpFormatPlugin.PLUGIN_NAME)
+public class ShpFormatConfig implements FormatPluginConfig {
+
+  private static final List<String> DEFAULT_EXTS = ImmutableList.of("shp", "dbf");
+
+  public List<String> extensions;
 
 Review comment:
   Done
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961052#comment-16961052
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format 
Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339569884
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatConfig.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.esri;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Format plugin config for shapefile data files.
+ */
+@JsonTypeName(ShpFormatPlugin.PLUGIN_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ShpFormatConfig implements FormatPluginConfig {
+
+  private static final List<String> DEFAULT_EXTS = ImmutableList.of("shp", "dbf");
 
 Review comment:
   Fixed.
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961053#comment-16961053
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format 
Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339570276
 
 

 ##
 File path: 
contrib/format-esri/src/test/java/org/apache/drill/exec/store/esri/TestShapefileFormatPlugin.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import org.apache.drill.categories.RowSetTests;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.server.Drillbit;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.dfs.FileSystemConfig;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.test.BaseDirTestWatcher;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+@Category(RowSetTests.class)
+public class TestShapefileFormatPlugin extends ClusterTest {
 
 Review comment:
   I removed the unit test. However, I left the open method using the 
`openPossiblyCompressedStream()` function in case a file is compressed.
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961050#comment-16961050
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format 
Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339569795
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatConfig.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.esri;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Format plugin config for shapefile data files.
+ */
+@JsonTypeName(ShpFormatPlugin.PLUGIN_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ShpFormatConfig implements FormatPluginConfig {
+
+  private static final List<String> DEFAULT_EXTS = ImmutableList.of("shp", "dbf");
+
+  public List<String> extensions;
+
+  @JsonInclude(JsonInclude.Include.NON_DEFAULT)
+  public List<String> getExtensions() {
+    if (extensions == null) {
+      return DEFAULT_EXTS;
+    }
+    return extensions;
+  }
+
+  public ShpBatchReader.ShpReaderConfig getReaderConfig(ShpFormatPlugin plugin) {
+    ShpBatchReader.ShpReaderConfig readerConfig = new ShpBatchReader.ShpReaderConfig(plugin);
+
+    return readerConfig;
+  }
+
+  @Override
+  public int hashCode() {
+    return Arrays.hashCode(new Object[]{extensions});
+  }
+
+  @Override
+  public boolean equals(Object obj) {
+    if (this == obj) {
+      return true;
+    }
+    if (obj == null || getClass() != obj.getClass()) {
+      return false;
+    }
+    ShpFormatConfig other = (ShpFormatConfig)obj;
+    return Objects.equal(extensions, other.getExtensions() );
 
 Review comment:
   Fixed and replaced with `java.util.Objects`
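
   A minimal sketch of what an `equals`/`hashCode` pair based on 
`java.util.Objects` could look like. The class name `ExtensionsEqualitySketch` 
is hypothetical and this is not the code merged in the PR. Note one assumption: 
the sketch compares the raw field on both sides, whereas the reviewed version 
compares the raw field with the result of `getExtensions()`, which would make 
two configs with unset extensions unequal.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a format config; only the equality logic matters here.
final class ExtensionsEqualitySketch {
  private final List<String> extensions;

  ExtensionsEqualitySketch(List<String> extensions) {
    this.extensions = extensions;
  }

  @Override
  public int hashCode() {
    // Null-safe: returns 0 when extensions is null.
    return Objects.hashCode(extensions);
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (obj == null || getClass() != obj.getClass()) {
      return false;
    }
    ExtensionsEqualitySketch other = (ExtensionsEqualitySketch) obj;
    // Null-safe JDK replacement for Guava's Objects.equal; same field on both sides.
    return Objects.equals(extensions, other.extensions);
  }

  public static void main(String[] args) {
    ExtensionsEqualitySketch a = new ExtensionsEqualitySketch(Arrays.asList("shp"));
    ExtensionsEqualitySketch b = new ExtensionsEqualitySketch(Arrays.asList("shp"));
    System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
  }
}
```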
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961048#comment-16961048
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format 
Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339569525
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatConfig.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.esri;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Format plugin config for shapefile data files.
+ */
+@JsonTypeName(ShpFormatPlugin.PLUGIN_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ShpFormatConfig implements FormatPluginConfig {
+
+  private static final List<String> DEFAULT_EXTS = ImmutableList.of("shp", "dbf");
 
 Review comment:
   Fixed
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961047#comment-16961047
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format 
Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339569053
 
 

 ##
 File path: 
contrib/format-esri/src/test/java/org/apache/drill/exec/store/esri/TestShapefileFormatPlugin.java
 ##
 @@ -0,0 +1,161 @@
+/*
 
 Review comment:
   Removed file.  Not sure why that was there.
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960833#comment-16960833
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339430477
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatConfig.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.esri;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Format plugin config for shapefile data files.
+ */
+@JsonTypeName(ShpFormatPlugin.PLUGIN_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ShpFormatConfig implements FormatPluginConfig {
+
+  private static final List<String> DEFAULT_EXTS = ImmutableList.of("shp", "dbf");
+
+  public List<String> extensions;
+
+  @JsonInclude(JsonInclude.Include.NON_DEFAULT)
+  public List<String> getExtensions() {
+    if (extensions == null) {
+      return DEFAULT_EXTS;
+    }
+    return extensions;
+  }
+
+  public ShpBatchReader.ShpReaderConfig getReaderConfig(ShpFormatPlugin plugin) {
+    ShpBatchReader.ShpReaderConfig readerConfig = new ShpBatchReader.ShpReaderConfig(plugin);
+
+    return readerConfig;
+  }
+
+  @Override
+  public int hashCode() {
+    return Arrays.hashCode(new Object[]{extensions});
+  }
+
+  @Override
+  public boolean equals(Object obj) {
+    if (this == obj) {
+      return true;
+    }
+    if (obj == null || getClass() != obj.getClass()) {
+      return false;
+    }
+    ShpFormatConfig other = (ShpFormatConfig)obj;
+    return Objects.equal(extensions, other.getExtensions() );
 
 Review comment:
   Is there an analogous method that is not from Guava? We should try to use 
built-in Java methods when possible...
   
   ```suggestion
   return Objects.equal(extensions, other.getExtensions());
   ```
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960830#comment-16960830
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339429171
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatConfig.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.esri;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Format plugin config for shapefile data files.
+ */
+@JsonTypeName(ShpFormatPlugin.PLUGIN_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ShpFormatConfig implements FormatPluginConfig {
+
+  private static final List<String> DEFAULT_EXTS = ImmutableList.of("shp", "dbf");
 
 Review comment:
   No, I meant set `List<String> extensions = Arrays.asList("shp", "dbf");`
   and `public List<String> getExtensions() { return extensions; }`
   
   In the test class I see you only read from the shp extension. Is dbf also 
valid? I thought it is used for the second of the three files. As far as I 
understand, for Drill only shp will be valid...
   So I think the proper code should be 
`List<String> extensions = Collections.singletonList("shp");`
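
   The suggestion above can be sketched as follows: assign the default at the 
field declaration so the getter needs no null check. This is a simplified 
illustration, assuming a hypothetical class name `ShpExtensionsSketch` and 
leaving out the Jackson annotations and the `FormatPluginConfig` interface.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical, stripped-down config holding only the extensions list.
final class ShpExtensionsSketch {
  // Default assigned once at declaration; the list returned by
  // Collections.singletonList is immutable.
  private List<String> extensions = Collections.singletonList("shp");

  List<String> getExtensions() {
    return extensions;
  }

  public static void main(String[] args) {
    ShpExtensionsSketch config = new ShpExtensionsSketch();
    System.out.println(config.getExtensions());
    try {
      config.getExtensions().add("dbf");
    } catch (UnsupportedOperationException e) {
      System.out.println("default extension list cannot be modified");
    }
  }
}
```

   Only `shp` is registered here because, as noted above, the dbf and prj files 
are companions of the shp file and are not queried on their own.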
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960835#comment-16960835
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339429916
 
 

 ##
 File path: 
contrib/format-esri/src/test/java/org/apache/drill/exec/store/esri/TestShapefileFormatPlugin.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import org.apache.drill.categories.RowSetTests;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.server.Drillbit;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.dfs.FileSystemConfig;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.test.BaseDirTestWatcher;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+@Category(RowSetTests.class)
+public class TestShapefileFormatPlugin extends ClusterTest {
 
 Review comment:
   Drill does not support such functionality. Only one compressed file can be 
read at a time.
   In this case you need to use 
`negotiator.fileSystem().open(split.getPath());` and set 
`config.compressible = false;`. Also remove the unnecessary compression code 
from the test class.
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960831#comment-16960831
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339432499
 
 

 ##
 File path: 
contrib/format-esri/src/test/java/org/apache/drill/exec/store/esri/TestShapefileFormatPlugin.java
 ##
 @@ -0,0 +1,161 @@
+/*
 
 Review comment:
   Why do you need to add `contrib/format-esri/src/test/resources/shapefiles/CA-cities.parquet` to the resources?
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960834#comment-16960834
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339429242
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,318 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader<FileSchemaNegotiator> {
+
+  private FileSplit split;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private InputStream fileReaderShp = null;
+  private InputStream fileReaderDbf = null;
+  private InputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+protected final ShpFormatPlugin plugin;
+
+public ShpReaderConfig(ShpFormatPlugin plugin) {
+  this.plugin = plugin;
+}
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+this.split = negotiator.split();
+this.hadoopShp = split.getPath();
+
+String filePath = split.getPath().toString();
+this.hadoopDbf = new Path(filePath.replace(".shp", ".dbf"));
+this.hadoopPrj = new Path(filePath.replace(".shp", ".prj"));
+
+openFile(negotiator);
+SchemaBuilder builder = new SchemaBuilder()
+  .addNullable("gid", TypeProtos.MinorType.INT)
+  .addNullable("srid", TypeProtos.MinorType.INT)
+  .addNullable("shapeType", TypeProtos.MinorType.VARCHAR)
+  .addNullable("geom", TypeProtos.MinorType.VARBINARY);
+
+negotiator.setTableSchema(builder.buildSchema(), false);
+loader = negotiator.build();
+
+rowWriter = loader.writer();
+gidWriter = rowWriter.scalar("gid");
+sridWriter = rowWriter.scalar("srid");
+shapeTypeWriter = rowWriter.scalar("shapeType");
+geomWriter = rowWriter.scalar("geom");
+
+return true;
+  }
+
+  @Override
+  public boolean next() {
+Geometry geom = null;
+
+while (!rowWriter.isFull()) {
+  Object[] 

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960832#comment-16960832
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339430653
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatConfig.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.esri;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Format plugin config for shapefile data files.
+ */
+@JsonTypeName(ShpFormatPlugin.PLUGIN_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ShpFormatConfig implements FormatPluginConfig {
+
+  private static final List<String> DEFAULT_EXTS = ImmutableList.of("shp", "dbf");
 
 Review comment:
   Please avoid using Guava: `ImmutableList.of` -> `Arrays.asList`
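
A minimal sketch of the suggested replacement, using only `java.util`. The class name `ExtensionDefaults` is hypothetical and not part of the pull request. Note that `Arrays.asList` returns a fixed-size list that still allows `set()`, so the `Collections.unmodifiableList` wrapper is an extra step assumed here to keep the read-only behaviour that `ImmutableList.of` provided; the reviewer did not ask for it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical holder class used only for illustration.
final class ExtensionDefaults {

  // Arrays.asList alone is fixed-size but writable through set();
  // the unmodifiable wrapper makes the shared constant read-only.
  static final List<String> DEFAULT_EXTS =
      Collections.unmodifiableList(Arrays.asList("shp", "dbf"));

  private ExtensionDefaults() { }

  public static void main(String[] args) {
    System.out.println(DEFAULT_EXTS);
  }
}
```

If the constant is never exposed to callers that might modify it, plain `Arrays.asList("shp", "dbf")` as suggested in the review would be enough.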
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960743#comment-16960743
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format 
Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339395240
 
 

 ##
 File path: 
contrib/format-esri/src/test/java/org/apache/drill/exec/store/esri/TestShapefileFormatPlugin.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import org.apache.drill.categories.RowSetTests;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.server.Drillbit;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.dfs.FileSystemConfig;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.test.BaseDirTestWatcher;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+@Category(RowSetTests.class)
+public class TestShapefileFormatPlugin extends ClusterTest {
 
 Review comment:
   @arina-ielchiieva 
   In looking at this, I added the SerDe test; however, I'm having an issue with the compression situation. Shapefiles are actually a collection of three files, each of which contains aspects of the metadata. Typically, these files would be distributed as a collection, like a tarball.
   I'm curious as to how you would recommend adding this functionality?
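
To illustrate the three-file layout described above: the reader quoted elsewhere in this thread derives the `.dbf` and `.prj` paths with `filePath.replace(".shp", ".dbf")`, and `String.replace` substitutes every occurrence, including one inside a directory name. The sketch below shows one way to swap only the trailing extension. `ShapefileSiblings` and `withExtension` are hypothetical names, not code from the pull request, and the example path is made up.

```java
// Hypothetical helper used only for illustration.
final class ShapefileSiblings {

  private static final String SHP_SUFFIX = ".shp";

  private ShapefileSiblings() { }

  /** Replaces a trailing ".shp" (compared case-insensitively) with the given extension. */
  static String withExtension(String shpPath, String newExt) {
    int start = shpPath.length() - SHP_SUFFIX.length();
    // regionMatches returns false for a negative offset, so short paths are rejected too.
    if (!shpPath.regionMatches(true, start, SHP_SUFFIX, 0, SHP_SUFFIX.length())) {
      throw new IllegalArgumentException("Not a .shp path: " + shpPath);
    }
    return shpPath.substring(0, start) + "." + newExt;
  }

  public static void main(String[] args) {
    String shp = "/data/archive.shp.d/CA-cities.shp";
    System.out.println(withExtension(shp, "dbf"));
    System.out.println(withExtension(shp, "prj"));
  }
}
```

Only the last four characters are touched, so a directory such as `archive.shp.d` in the path is left alone.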
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960603#comment-16960603
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on issue #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#issuecomment-546696403
 
 
   Also, please create a Jira ticket for future reader improvements.
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960591#comment-16960591
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339343179
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatConfig.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Format plugin config for shapefile data files.
+ */
+@JsonTypeName(ShpFormatPlugin.PLUGIN_NAME)
+public class ShpFormatConfig implements FormatPluginConfig {
+
+  private static final List<String> DEFAULT_EXTS = ImmutableList.of("shp", "dbf");
+
+  public List<String> extensions;
+
+  public ShpFormatConfig() { }
+
+  @JsonInclude(JsonInclude.Include.NON_DEFAULT)
+  public List<String> getExtensions() {
+    return extensions == null ? DEFAULT_EXTS : extensions;
 
 Review comment:
   This check can be removed: return `extensions` directly, without the default-assignment logic.
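
One possible reading of this comment, sketched without Jackson: resolve the default once when the object is built, so the getter only returns the field. `ShpConfigSketch` is a hypothetical stand-in for the config class, and moving the null check into the constructor is an assumption about what the reviewer intended, not the change that was merged.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, annotation-free stand-in for the config class.
final class ShpConfigSketch {

  private static final List<String> DEFAULT_EXTS =
      Collections.unmodifiableList(Arrays.asList("shp", "dbf"));

  private final List<String> extensions;

  ShpConfigSketch() {
    this(null);
  }

  ShpConfigSketch(List<String> extensions) {
    // The default is applied here, once, instead of on every getter call.
    this.extensions = extensions == null ? DEFAULT_EXTS : extensions;
  }

  List<String> getExtensions() {
    return extensions;
  }
}
```

With the field always populated, `getExtensions()` has no branching and two configs built without arguments report the same extension list.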
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960598#comment-16960598
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339342956
 
 

 ##
 File path: contrib/format-esri/README.md
 ##
 @@ -0,0 +1,35 @@
+# Format Plugin for ESRI Shape Files
+This format plugin allows Drill to read ESRI Shape files. You can read about 
the shapefile format here: https://en.wikipedia.org/wiki/Shapefile. 
+
+## Configuration Options
+Other than the file extensions, there are no configuration options for this 
plugin. To use, simply add the following to your configuration:
+
+```
+"shp": {
+  "type": "shp",
+  "extensions": [
+"shp"
+  ]
+}
+```
+
+## Usage Notes:
+This plugin will return the following fields:
+
+* `gid`:  Integer
+* `srid`:  Integer
+* `shapeType`:  String
+* `name`:  Plain text 
+* `geom`:  A geometric point or path.  This field is returned as a `VARBINARY`.
 
 Review comment:
   Use one space...
   ```suggestion
   * `geom`:  A geometric point or path. This field is returned as a 
`VARBINARY`.
   ```
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960592#comment-16960592
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339343016
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,318 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader<FileSchemaNegotiator> {
+
+  private FileSplit split;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private InputStream fileReaderShp = null;
+  private InputStream fileReaderDbf = null;
+  private InputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+protected final ShpFormatPlugin plugin;
+
+public ShpReaderConfig(ShpFormatPlugin plugin) {
+  this.plugin = plugin;
+}
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+this.split = negotiator.split();
+this.hadoopShp = split.getPath();
+
+String filePath = split.getPath().toString();
+this.hadoopDbf = new Path(filePath.replace(".shp", ".dbf"));
+this.hadoopPrj = new Path(filePath.replace(".shp", ".prj"));
+
+openFile(negotiator);
+SchemaBuilder builder = new SchemaBuilder()
+  .addNullable("gid", TypeProtos.MinorType.INT)
+  .addNullable("srid", TypeProtos.MinorType.INT)
+  .addNullable("shapeType", TypeProtos.MinorType.VARCHAR)
+  .addNullable("geom", TypeProtos.MinorType.VARBINARY);
+
+negotiator.setTableSchema(builder.buildSchema(), false);
+loader = negotiator.build();
+
+rowWriter = loader.writer();
+gidWriter = rowWriter.scalar("gid");
+sridWriter = rowWriter.scalar("srid");
+shapeTypeWriter = rowWriter.scalar("shapeType");
+geomWriter = rowWriter.scalar("geom");
+
+return true;
+  }
+
+  @Override
+  public boolean next() {
+Geometry geom = null;
+
+while (!rowWriter.isFull()) {
+  Object[] 

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960596#comment-16960596
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339343249
 
 

 ##
 File path: 
contrib/format-esri/src/test/java/org/apache/drill/exec/store/esri/TestShapefileFormatPlugin.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import org.apache.drill.categories.RowSetTests;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.server.Drillbit;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.dfs.FileSystemConfig;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.test.BaseDirTestWatcher;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+@Category(RowSetTests.class)
+public class TestShapefileFormatPlugin extends ClusterTest {
 
 Review comment:
   Add SerDe and compression tests.
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960593#comment-16960593
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339342995
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,318 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader {
+
+  private FileSplit split;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private InputStream fileReaderShp = null;
+  private InputStream fileReaderDbf = null;
+  private InputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+    protected final ShpFormatPlugin plugin;
+
+    public ShpReaderConfig(ShpFormatPlugin plugin) {
+      this.plugin = plugin;
+    }
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+    this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+    this.split = negotiator.split();
+    this.hadoopShp = split.getPath();
+
+    String filePath = split.getPath().toString();
+    this.hadoopDbf = new Path(filePath.replace(".shp", ".dbf"));
+    this.hadoopPrj = new Path(filePath.replace(".shp", ".prj"));
+
+    openFile(negotiator);
+    SchemaBuilder builder = new SchemaBuilder()
+      .addNullable("gid", TypeProtos.MinorType.INT)
 
 Review comment:
   Use constants here (for example, for the `gid` column name) rather than inline literals.
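
One plausible reading of this comment is that the column names should be declared once and reused wherever the reader refers to them. A minimal sketch under that assumption follows. `ShpColumns` and its map-based `schema()` method are hypothetical stand-ins for illustration only; they are not part of the PR and do not use Drill's `SchemaBuilder` API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for the column names used by the reader (not from the PR).
final class ShpColumns {
  static final String GID = "gid";
  static final String SRID = "srid";
  static final String SHAPE_TYPE = "shapeType";
  static final String GEOM = "geom";

  private ShpColumns() { }

  // Stand-in for the schema builder calls: maps each column name to a type label.
  static Map<String, String> schema() {
    Map<String, String> columns = new LinkedHashMap<>();
    columns.put(GID, "INT");
    columns.put(SRID, "INT");
    columns.put(SHAPE_TYPE, "VARCHAR");
    columns.put(GEOM, "VARBINARY");
    return columns;
  }

  public static void main(String[] args) {
    // The same constant serves both the definition and the later lookup.
    System.out.println(ShpColumns.schema().get(ShpColumns.GID));
  }
}
```

With the names in one place, a typo in a column name becomes a compile error instead of a runtime lookup failure.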
 


[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960594#comment-16960594
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339343128
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatConfig.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Format plugin config for shapefile data files.
+ */
+@JsonTypeName(ShpFormatPlugin.PLUGIN_NAME)
+public class ShpFormatConfig implements FormatPluginConfig {
+
+  private static final List<String> DEFAULT_EXTS = ImmutableList.of("shp", "dbf");
+
+  public List<String> extensions;
+
+  public ShpFormatConfig() { }
 
 Review comment:
   No need for a default constructor.
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960599#comment-16960599
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339343058
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,318 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader {
+
+  private FileSplit split;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private InputStream fileReaderShp = null;
+  private InputStream fileReaderDbf = null;
+  private InputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = LoggerFactory.getLogger(ShpBatchReader.class);
 
 Review comment:
   Move the logger declaration to the top of the class.
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960597#comment-16960597
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339343152
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatConfig.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Format plugin config for shapefile data files.
+ */
+@JsonTypeName(ShpFormatPlugin.PLUGIN_NAME)
+public class ShpFormatConfig implements FormatPluginConfig {
+
+  private static final List<String> DEFAULT_EXTS = ImmutableList.of("shp", "dbf");
+
+  public List<String> extensions;
 
 Review comment:
   You can assign the default extensions right away; they will be overwritten 
during deserialization if needed.
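
The suggestion can be sketched without any JSON library. `ShpFormatConfigSketch` below is a hypothetical, dependency-free stand-in for the class under review. The direct field assignment in `main` only imitates what a deserializer would do when the stored configuration carries its own `extensions` value.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the config class under review (not the PR's code).
final class ShpFormatConfigSketch {
  private static final List<String> DEFAULT_EXTS =
      Collections.unmodifiableList(Arrays.asList("shp", "dbf"));

  // Initialized at declaration, so no explicit constructor is needed for the default.
  public List<String> extensions = DEFAULT_EXTS;

  public static void main(String[] args) {
    // No value supplied: the field keeps the defaults.
    ShpFormatConfigSketch defaults = new ShpFormatConfigSketch();
    System.out.println(defaults.extensions);

    // Imitates a deserializer overwriting the field with a configured value.
    ShpFormatConfigSketch custom = new ShpFormatConfigSketch();
    custom.extensions = Arrays.asList("shp");
    System.out.println(custom.extensions);
  }
}
```

Because the field initializer runs for every new instance, the default applies whenever nothing overwrites it, which is also why a hand-written no-argument constructor adds nothing here.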
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960600#comment-16960600
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339343087
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,318 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader {
+
+  private FileSplit split;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private InputStream fileReaderShp = null;
+  private InputStream fileReaderDbf = null;
+  private InputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+    protected final ShpFormatPlugin plugin;
+
+    public ShpReaderConfig(ShpFormatPlugin plugin) {
+      this.plugin = plugin;
+    }
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+    this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+    this.split = negotiator.split();
+    this.hadoopShp = split.getPath();
+
+    String filePath = split.getPath().toString();
+    this.hadoopDbf = new Path(filePath.replace(".shp", ".dbf"));
+    this.hadoopPrj = new Path(filePath.replace(".shp", ".prj"));
+
+    openFile(negotiator);
+    SchemaBuilder builder = new SchemaBuilder()
+      .addNullable("gid", TypeProtos.MinorType.INT)
+      .addNullable("srid", TypeProtos.MinorType.INT)
+      .addNullable("shapeType", TypeProtos.MinorType.VARCHAR)
+      .addNullable("geom", TypeProtos.MinorType.VARBINARY);
+
+    negotiator.setTableSchema(builder.buildSchema(), false);
+    loader = negotiator.build();
+
+    rowWriter = loader.writer();
+    gidWriter = rowWriter.scalar("gid");
+    sridWriter = rowWriter.scalar("srid");
+    shapeTypeWriter = rowWriter.scalar("shapeType");
+    geomWriter = rowWriter.scalar("geom");
+
+    return true;
+  }
+
+  @Override
+  public boolean next() {
+    Geometry geom = null;
+
+    while (!rowWriter.isFull()) {
+      Object[] 

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960595#comment-16960595
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339343209
 
 

 ##
 File path: 
contrib/format-esri/src/test/java/org/apache/drill/exec/store/esri/TestShapefileFormatPlugin.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import org.apache.drill.categories.RowSetTests;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.server.Drillbit;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.dfs.FileSystemConfig;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.test.BaseDirTestWatcher;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+@Category(RowSetTests.class)
+public class TestShapefileFormatPlugin extends ClusterTest {
+
+  @ClassRule
+  public static final BaseDirTestWatcher dirTestWatcher = new BaseDirTestWatcher();
+
+  @BeforeClass
+  public static void setup() throws Exception {
+    ClusterTest.startCluster(ClusterFixture.builder(dirTestWatcher));
+    definePlugin();
+  }
+
+  private static void definePlugin() throws ExecutionSetupException {
+    ShpFormatConfig sampleConfig = new ShpFormatConfig();
+
+    // Define a temporary plugin for the "cp" storage plugin.
+    Drillbit drillbit = cluster.drillbit();
 
 Review comment:
   Define the plugin using the built-in method on `cluster`.
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960584#comment-16960584
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339342415
 
 

 ##
 File path: contrib/format-esri/pom.xml
 ##
 @@ -0,0 +1,104 @@
+
+
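+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <artifactId>drill-contrib-parent</artifactId>
+    <groupId>org.apache.drill.contrib</groupId>
+    <version>1.17.0-SNAPSHOT</version>
+  </parent>
+
+  <artifactId>drill-format-esri</artifactId>
+  <name>contrib/format-esri</name>
+
+<dependencies>
+  <dependency>
+    <groupId>org.apache.drill.exec</groupId>
+    <artifactId>drill-java-exec</artifactId>
+    <version>${project.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>com.esri.geometry</groupId>
+    <artifactId>esri-geometry-api</artifactId>
+    <version>2.2.3</version>
+  </dependency>
+  <dependency>
+    <groupId>org.jamel.dbf</groupId>
+    <artifactId>dbf-reader</artifactId>
+    <version>0.1.0</version>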
 
 Review comment:
   Use the latest version, `0.3.0`.
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957954#comment-16957954
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format 
Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r338112147
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader {
+
+  private FileSplit split;
+  private BufferedReader reader;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private FSDataInputStream fileReaderShp = null;
+  private FSDataInputStream fileReaderDbf = null;
+  private FSDataInputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+
+
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+    protected final ShpFormatPlugin plugin;
+
+    public ShpReaderConfig(ShpFormatPlugin plugin) {
+      this.plugin = plugin;
+    }
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+    this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+    this.split = negotiator.split();
+    this.hadoopShp = split.getPath();
+    this.hadoopDbf = new Path(split.getPath().toString().replace("shp", "dbf"));
+    this.hadoopPrj = new Path(split.getPath().toString().replace("shp", "prj"));
 
 Review comment:
   I was trying to avoid a regex here. Instead, I changed the text being 
matched from `shp` to `.shp`, so only the extension is replaced.
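
The difference between the two match strings can be seen with a small, dependency-free sketch. `ShapefilePaths` and its method names are hypothetical and not taken from the PR. The suffix-only variant is an additional alternative that nobody proposed in this thread; it is shown only to mark the remaining edge case.

```java
// Hypothetical helper for illustration; names are not from the PR.
final class ShapefilePaths {

  private ShapefilePaths() { }

  // Matches bare "shp": also rewrites directory or file names containing "shp".
  static String bareReplace(String shpPath, String ext) {
    return shpPath.replace("shp", ext);
  }

  // Matches ".shp": safer, but still rewrites a directory name that contains ".shp".
  static String dottedReplace(String shpPath, String ext) {
    return shpPath.replace(".shp", "." + ext);
  }

  // Replaces only a trailing ".shp"; paths without that suffix are returned unchanged.
  static String suffixReplace(String shpPath, String ext) {
    String suffix = ".shp";
    if (!shpPath.endsWith(suffix)) {
      return shpPath;
    }
    return shpPath.substring(0, shpPath.length() - suffix.length()) + "." + ext;
  }

  public static void main(String[] args) {
    String path = "/data/shp/CA-cities.shp";
    System.out.println(bareReplace(path, "dbf"));   // directory name is rewritten too
    System.out.println(dottedReplace(path, "dbf")); // only the extension changes
    System.out.println(suffixReplace("/data/v1.shp/CA-cities.shp", "dbf"));
  }
}
```

None of the three needs a regex, since `String.replace` treats its arguments as literal text.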
 


[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957951#comment-16957951
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format 
Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r33813
 
 

 ##
 File path: 
contrib/format-esri/src/test/java/org/apache/drill/exec/store/esri/TestShapefileFormatPlugin.java
 ##
 @@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import org.apache.drill.categories.RowSetTests;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.server.Drillbit;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.dfs.FileSystemConfig;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.test.BaseDirTestWatcher;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+@Category(RowSetTests.class)
+public class TestShapefileFormatPlugin extends ClusterTest {
+
+  @ClassRule
+  public static final BaseDirTestWatcher dirTestWatcher = new BaseDirTestWatcher();
+
+  @BeforeClass
+  public static void setup() throws Exception {
+    ClusterTest.startCluster(ClusterFixture.builder(dirTestWatcher));
+    definePlugin();
+  }
+
+  private static void definePlugin() throws ExecutionSetupException {
+    ShpFormatConfig sampleConfig = new ShpFormatConfig();
+
+    // Define a temporary plugin for the "cp" storage plugin.
+    Drillbit drillbit = cluster.drillbit();
+    final StoragePluginRegistry pluginRegistry = drillbit.getContext().getStorage();
+    final FileSystemPlugin plugin = (FileSystemPlugin) pluginRegistry.getPlugin("cp");
+    final FileSystemConfig pluginConfig = (FileSystemConfig) plugin.getConfig();
+    pluginConfig.getFormats().put("sample", sampleConfig);
+    pluginRegistry.createOrUpdate("cp", pluginConfig, false);
+  }
+
+  @Test
+  public void testRowCount() throws Exception {
+    testBuilder()
+      .sqlQuery("select count(*) "
+        + "from cp.`CA-cities.shp`")
+      .ordered()
+      .baselineColumns("EXPR$0")
+      .baselineValues(5727L)
+      .go();
 
 Review comment:
   I updated the tests.  The purpose of this test was just to test that it was reading the complete file.
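
A row count like the one asserted in this test can also be cross-checked outside Drill. The sketch below is not the plugin's code. It assumes only the published shapefile layout: a 100-byte main header whose first four bytes hold the big-endian file code 9994, followed by records that each start with an 8-byte big-endian header (record number, then content length in 16-bit words). The class and method names are hypothetical, and the synthetic file exists only so the example runs without test data.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of the Drill plugin: counts the records in
// the bytes of a .shp main file by walking the record headers.
class ShpRecordCounter {

  static final int FILE_HEADER_BYTES = 100;
  static final int RECORD_HEADER_BYTES = 8;

  static int countRecords(byte[] shp) {
    ByteBuffer buf = ByteBuffer.wrap(shp).order(ByteOrder.BIG_ENDIAN);
    if (shp.length < FILE_HEADER_BYTES || buf.getInt(0) != 9994) {
      throw new IllegalArgumentException("Not a shapefile main file");
    }
    int count = 0;
    int pos = FILE_HEADER_BYTES;
    while (pos + RECORD_HEADER_BYTES <= shp.length) {
      // Content length is stored in 16-bit words and excludes the record header.
      int contentWords = buf.getInt(pos + 4);
      long next = (long) pos + RECORD_HEADER_BYTES + 2L * contentWords;
      if (contentWords < 0 || next > shp.length) {
        throw new IllegalArgumentException("Truncated or corrupt record at byte " + pos);
      }
      pos = (int) next;
      count++;
    }
    return count;
  }

  // Builds a minimal in-memory file holding n Null Shape records (shape type 0).
  static byte[] syntheticFile(int n) {
    int recordBytes = RECORD_HEADER_BYTES + 4;
    ByteBuffer buf = ByteBuffer.allocate(FILE_HEADER_BYTES + n * recordBytes);
    buf.putInt(0, 9994);                                  // file code, big-endian
    buf.putInt(24, buf.capacity() / 2);                   // file length in 16-bit words
    buf.order(ByteOrder.LITTLE_ENDIAN).putInt(28, 1000);  // version, little-endian
    buf.order(ByteOrder.BIG_ENDIAN);
    for (int i = 0; i < n; i++) {
      int pos = FILE_HEADER_BYTES + i * recordBytes;
      buf.putInt(pos, i + 1);      // record numbers start at 1
      buf.putInt(pos + 4, 2);      // content length: 2 words = 4 bytes
      // The 4 content bytes stay zero, which encodes shape type 0 (Null Shape).
    }
    return buf.array();
  }

  public static void main(String[] args) {
    System.out.println(countRecords(syntheticFile(3)));
  }
}
```

A truncated file makes `countRecords` throw instead of returning a smaller count, which is the failure a "complete file" check is meant to surface.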
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.17.0
>Reporter: Karol Potocki
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> Allow Drill (drill-gis) to read esri shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
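
The issue description quoted above notes that the .prj file carries the SRID information. As a rough illustration only, and not the plugin's actual logic, the sketch below pulls an EPSG code out of WKT text with a regular expression. It assumes the .prj content is WKT whose outermost object ends with an `AUTHORITY["EPSG","<code>"]` clause. Many ESRI-written .prj files omit that clause, in which case the hypothetical helper `sridFromWkt` returns an empty result.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Drill plugin.
class PrjSrid {

  // Matches an EPSG AUTHORITY clause only when it closes the outermost WKT
  // object, so authorities of nested objects (datum, spheroid) are ignored.
  private static final Pattern TOP_LEVEL_EPSG = Pattern.compile(
      "AUTHORITY\\[\\s*\"EPSG\"\\s*,\\s*\"(\\d{1,9})\"\\s*\\]\\s*\\]\\s*$");

  static OptionalInt sridFromWkt(String wkt) {
    Matcher m = TOP_LEVEL_EPSG.matcher(wkt);
    return m.find()
        ? OptionalInt.of(Integer.parseInt(m.group(1)))
        : OptionalInt.empty();
  }

  public static void main(String[] args) {
    String wkt = "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\","
        + "SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],"
        + "AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0],"
        + "UNIT[\"degree\",0.0174532925199433],AUTHORITY[\"EPSG\",\"4326\"]]";
    System.out.println(sridFromWkt(wkt));
  }
}
```

Returning `OptionalInt` keeps "no SRID found" distinct from any real code, so a caller can decide on its own fallback.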


[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957950#comment-16957950
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r338110832
 
 

 ##
 File path: contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader {
+
+  private FileSplit split;
+  private BufferedReader reader;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private FSDataInputStream fileReaderShp = null;
+  private FSDataInputStream fileReaderDbf = null;
+  private FSDataInputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+
+
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+    protected final ShpFormatPlugin plugin;
+
+    public ShpReaderConfig(ShpFormatPlugin plugin) {
+      this.plugin = plugin;
+    }
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+    this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+    this.split = negotiator.split();
+    this.hadoopShp = split.getPath();
+    this.hadoopDbf = new Path(split.getPath().toString().replace("shp", "dbf"));
+    this.hadoopPrj = new Path(split.getPath().toString().replace("shp", "prj"));
+
+    openFile(negotiator);
+    SchemaBuilder builder = new SchemaBuilder();
+    builder.addNullable("gid", TypeProtos.MinorType.INT);
+    builder.addNullable("srid", TypeProtos.MinorType.INT);
+    builder.addNullable("shapeType", TypeProtos.MinorType.VARCHAR);
+    builder.addNullable("geom", TypeProtos.MinorType.VARBINARY);
+
+    negotiator.setTableSchema(builder.buildSchema(), false);
+    loader = negotiator.build();
+
+    rowWriter = loader.writer();
+    gidWriter = rowWriter.scalar("gid");
+    sridWriter = rowWriter.scalar("srid");
+    shapeTypeWriter = rowWriter.scalar("shapeType");
+    geomWriter = rowWriter.scalar("geom");
+
+    return true;
+  }
+
+  @Override
+  public 

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957948#comment-16957948
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r338109042
 
 

 ##
 File path: contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,334 @@

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957947#comment-16957947
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r338108707
 
 

 ##
 File path: contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,334 @@

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957944#comment-16957944
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r338107649
 
 

 ##
 File path: contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader {
+
+  private FileSplit split;
+  private BufferedReader reader;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private FSDataInputStream fileReaderShp = null;
+  private FSDataInputStream fileReaderDbf = null;
+  private FSDataInputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+
+
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+    protected final ShpFormatPlugin plugin;
+
+    public ShpReaderConfig(ShpFormatPlugin plugin) {
+      this.plugin = plugin;
+    }
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+    this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+    this.split = negotiator.split();
+    this.hadoopShp = split.getPath();
+    this.hadoopDbf = new Path(split.getPath().toString().replace("shp", "dbf"));
+    this.hadoopPrj = new Path(split.getPath().toString().replace("shp", "prj"));
+
+    openFile(negotiator);
+    SchemaBuilder builder = new SchemaBuilder();
+    builder.addNullable("gid", TypeProtos.MinorType.INT);
+    builder.addNullable("srid", TypeProtos.MinorType.INT);
+    builder.addNullable("shapeType", TypeProtos.MinorType.VARCHAR);
+    builder.addNullable("geom", TypeProtos.MinorType.VARBINARY);
+
+    negotiator.setTableSchema(builder.buildSchema(), false);
 
 Review comment:
   Fixed
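
Separately from the schema line commented on above, the quoted `open()` method derives the .dbf and .prj paths with `String.replace("shp", ...)`, which rewrites every occurrence of "shp" in the path and not only the extension. The sketch below shows one possible way to swap just the extension. The helper `withExtension` is hypothetical and is not taken from the pull request; it works on plain strings so it runs without Hadoop on the classpath.

```java
// Hypothetical helper, not the PR's code: replaces only the file extension.
class ShapefileSiblings {

  static String withExtension(String path, String newExtension) {
    int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
    int dot = path.lastIndexOf('.');
    // A dot that is missing or sits inside a directory name is not an extension.
    String stem = dot > slash ? path.substring(0, dot) : path;
    return stem + "." + newExtension;
  }

  public static void main(String[] args) {
    String shp = "/data/shp-files/CA-cities.shp";
    // String.replace also rewrites the directory: /data/dbf-files/CA-cities.dbf
    System.out.println(shp.replace("shp", "dbf"));
    // Extension-only swap keeps the directory: /data/shp-files/CA-cities.dbf
    System.out.println(withExtension(shp, "dbf"));
  }
}
```

With a Hadoop `Path`, the same idea would apply to the string form of the path before constructing the sibling `Path`.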
 


[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957940#comment-16957940
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r338106359
 
 

 ##
 File path: contrib/format-esri/README.md
 ##
 @@ -0,0 +1,190 @@
+# Format Plugin for ESRI Shape Files
+This format plugin allows Drill to read ESRI Shape files. You can read about the shapefile format here: https://en.wikipedia.org/wiki/Shapefile. 
+
+## Configuration Options
+Other than the file extensions, there are no configuration options for this plugin. To use, simply add the following to your configuration:
+
+```
+"shp": {
+  "type": "shp",
+  "extensions": [
+    "shp"
+  ]
+}
+```
+
+## Usage Notes:
+This plugin will return the following fields:
+
+* `gid`:  Integer
+* `srid`:  Integer
+* `shapeType`:  String
+* `name`:  Plain text 
+* `geom`:  A geometric point or path.  This field is returned as a `VARBINARY`.
+
+This plugin is best used with the suite of GIS functions in Drill which include the following:
+
+Geospatial Functions
 
 Review comment:
   Yup.  Fixed
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957941#comment-16957941
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r338106428
 
 

 ##
 File path: contrib/format-esri/README.md
 ##
 @@ -0,0 +1,190 @@
+# Format Plugin for ESRI Shape Files
 
 Review comment:
   YW!
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955877#comment-16955877
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336899443
 
 

 ##
 File path: contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,334 @@

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955876#comment-16955876
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336899443
 
 

 ##
 File path: contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader {
+
+  private FileSplit split;
+  private BufferedReader reader;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private FSDataInputStream fileReaderShp = null;
+  private FSDataInputStream fileReaderDbf = null;
+  private FSDataInputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+
+
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = 
LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+protected final ShpFormatPlugin plugin;
+
+public ShpReaderConfig(ShpFormatPlugin plugin) {
+  this.plugin = plugin;
+}
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+this.split = negotiator.split();
+this.hadoopShp = split.getPath();
+this.hadoopDbf = new Path(split.getPath().toString().replace("shp", "dbf"));
+this.hadoopPrj = new Path(split.getPath().toString().replace("shp", "prj"));
+
+openFile(negotiator);
+SchemaBuilder builder = new SchemaBuilder();
+builder.addNullable("gid", TypeProtos.MinorType.INT);
+builder.addNullable("srid", TypeProtos.MinorType.INT);
+builder.addNullable("shapeType", TypeProtos.MinorType.VARCHAR);
+builder.addNullable("geom", TypeProtos.MinorType.VARBINARY);
+
+negotiator.setTableSchema(builder.buildSchema(), false);
+loader = negotiator.build();
+
+rowWriter = loader.writer();
+gidWriter = rowWriter.scalar("gid");
+sridWriter = rowWriter.scalar("srid");
+shapeTypeWriter = rowWriter.scalar("shapeType");
+geomWriter = rowWriter.scalar("geom");
+
+return true;
+  }
+
+  @Override

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955642#comment-16955642
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format 
Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336798709
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatPlugin.java
 ##
 @@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileReaderFactory;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.proto.UserBitShared.CoreOperatorType;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
+import org.apache.drill.exec.store.esri.ShpBatchReader.ShpReaderConfig;
+import org.apache.hadoop.conf.Configuration;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class ShpFormatPlugin extends EasyFormatPlugin {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(ShpFormatPlugin.class);
+
+  public static final String PLUGIN_NAME = "shp";
+
+  public static class ShpReaderFactory extends FileReaderFactory {
+private final ShpReaderConfig readerConfig;
+
+public ShpReaderFactory(ShpReaderConfig config) {
+  readerConfig = config;
+}
+
+@Override
+public ManagedReader newReader() {
+  return new ShpBatchReader(readerConfig);
+}
+  }
+
+  public ShpFormatPlugin(String name, DrillbitContext context, Configuration fsConf, StoragePluginConfig storageConfig, ShpFormatConfig formatConfig) {
+super(name, easyConfig(fsConf, formatConfig), context, storageConfig, formatConfig);
+  }
+
+  @Override
+  public ManagedReader newBatchReader(EasySubScan scan, OptionManager options) throws ExecutionSetupException {
+return new ShpBatchReader(formatConfig.getReaderConfig(this));
+  }
+
+  @Override
+  protected FileScanFramework.FileScanBuilder frameworkBuilder(OptionManager options, EasySubScan scan) {
+FileScanFramework.FileScanBuilder builder = new FileScanFramework.FileScanBuilder();
+builder.setReaderFactory(new ShpReaderFactory(new ShpReaderConfig(this)));
+initScanBuilder(builder, scan);
+builder.setNullType(Types.optional(TypeProtos.MinorType.VARCHAR));
 
 Review comment:
   I'd rather leave the functionality as is: one entry could contain an 
attribute, such as `elevation`, and another might not, so if the field were 
required, I think you'd end up with data that is difficult to query. 
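
To make the argument above concrete, here is a minimal sketch in plain Java. It is not the plugin's reader code and does not use Drill's API; the class name, the helper methods, and the sample records are all hypothetical. It only shows that when one record carries `elevation` and another does not, projecting both onto the same column set forces the missing value to be null, which is why nullable columns are the natural fit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NullableAttributes {

  // Collects the union of attribute names across all records, in first-seen order.
  public static Set<String> unionOfColumns(List<Map<String, Object>> records) {
    Set<String> columns = new LinkedHashSet<>();
    for (Map<String, Object> record : records) {
      columns.addAll(record.keySet());
    }
    return columns;
  }

  // Projects one record onto the full column set; absent attributes become null.
  public static List<Object> toRow(Map<String, Object> record, Set<String> columns) {
    List<Object> row = new ArrayList<>();
    for (String column : columns) {
      row.add(record.get(column));
    }
    return row;
  }

  public static void main(String[] args) {
    // Hypothetical sample data: only the first record has an elevation.
    Map<String, Object> withElevation = new LinkedHashMap<>();
    withElevation.put("name", "peak");
    withElevation.put("elevation", 1200);

    Map<String, Object> withoutElevation = new LinkedHashMap<>();
    withoutElevation.put("name", "lake");

    List<Map<String, Object>> records = Arrays.asList(withElevation, withoutElevation);
    Set<String> columns = unionOfColumns(records);
    for (Map<String, Object> record : records) {
      System.out.println(toRow(record, columns));
    }
  }
}
```

Running `main` prints `[peak, 1200]` followed by `[lake, null]`. With a required (non-nullable) column, the second row would have no valid value to store.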
 


[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955636#comment-16955636
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format 
Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336797742
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader {
+
+  private FileSplit split;
+  private BufferedReader reader;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private FSDataInputStream fileReaderShp = null;
+  private FSDataInputStream fileReaderDbf = null;
+  private FSDataInputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+
+
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = 
LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+protected final ShpFormatPlugin plugin;
+
+public ShpReaderConfig(ShpFormatPlugin plugin) {
+  this.plugin = plugin;
+}
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+this.split = negotiator.split();
+this.hadoopShp = split.getPath();
+this.hadoopDbf = new Path(split.getPath().toString().replace("shp", "dbf"));
+this.hadoopPrj = new Path(split.getPath().toString().replace("shp", "prj"));
+
+openFile(negotiator);
+SchemaBuilder builder = new SchemaBuilder();
+builder.addNullable("gid", TypeProtos.MinorType.INT);
+builder.addNullable("srid", TypeProtos.MinorType.INT);
+builder.addNullable("shapeType", TypeProtos.MinorType.VARCHAR);
+builder.addNullable("geom", TypeProtos.MinorType.VARBINARY);
+
+negotiator.setTableSchema(builder.buildSchema(), false);
+loader = negotiator.build();
+
+rowWriter = loader.writer();
+gidWriter = rowWriter.scalar("gid");
+sridWriter = rowWriter.scalar("srid");
+shapeTypeWriter = rowWriter.scalar("shapeType");
+geomWriter = rowWriter.scalar("geom");
+
+return true;
+  }
+
+  @Override
+  public 

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955634#comment-16955634
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format 
Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336797615
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatPlugin.java
 ##
 @@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileReaderFactory;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.proto.UserBitShared.CoreOperatorType;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
+import org.apache.drill.exec.store.esri.ShpBatchReader.ShpReaderConfig;
+import org.apache.hadoop.conf.Configuration;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class ShpFormatPlugin extends EasyFormatPlugin {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(ShpFormatPlugin.class);
+
+  public static final String PLUGIN_NAME = "shp";
+
+  public static class ShpReaderFactory extends FileReaderFactory {
+private final ShpReaderConfig readerConfig;
+
+public ShpReaderFactory(ShpReaderConfig config) {
+  readerConfig = config;
+}
+
+@Override
+public ManagedReader newReader() {
+  return new ShpBatchReader(readerConfig);
+}
+  }
+
+  public ShpFormatPlugin(String name, DrillbitContext context, Configuration fsConf, StoragePluginConfig storageConfig, ShpFormatConfig formatConfig) {
+super(name, easyConfig(fsConf, formatConfig), context, storageConfig, formatConfig);
+  }
+
+  @Override
+  public ManagedReader newBatchReader(EasySubScan scan, OptionManager options) throws ExecutionSetupException {
+return new ShpBatchReader(formatConfig.getReaderConfig(this));
+  }
+
+  @Override
+  protected FileScanFramework.FileScanBuilder frameworkBuilder(OptionManager options, EasySubScan scan) {
+FileScanFramework.FileScanBuilder builder = new FileScanFramework.FileScanBuilder();
+builder.setReaderFactory(new ShpReaderFactory(new ShpReaderConfig(this)));
+initScanBuilder(builder, scan);
+builder.setNullType(Types.optional(TypeProtos.MinorType.VARCHAR));
 
 Review comment:
   So this one is interesting. There are four or so fields that must (by 
definition) be in each row of the shapefile; beyond those, there may be 
other fields, which are TBD.  
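
As a rough illustration of the fixed-plus-TBD layout described above, here is a plain-Java sketch that does not touch Drill's API. The four fixed names are the ones declared in the diff's `open()` method; the class name, the `columnsFor` helper, the sample attribute names, and the rule of skipping attributes that collide with a fixed column are assumptions made only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShapefileColumns {

  // The four columns declared up front in the diff's open() method.
  public static final List<String> FIXED =
      Arrays.asList("gid", "srid", "shapeType", "geom");

  // Returns the fixed columns followed by the per-file attribute names.
  // Assumption for this sketch: an attribute that collides with a name
  // already in the list is skipped rather than renamed.
  public static List<String> columnsFor(List<String> attributeNames) {
    List<String> columns = new ArrayList<>(FIXED);
    for (String name : attributeNames) {
      if (!columns.contains(name)) {
        columns.add(name);
      }
    }
    return columns;
  }

  public static void main(String[] args) {
    // Hypothetical attribute names discovered for one file.
    System.out.println(columnsFor(Arrays.asList("name", "elevation")));
  }
}
```

Running `main` prints `[gid, srid, shapeType, geom, name, elevation]`: the fixed columns always come first, and the remaining columns depend on what each file actually contains.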
 


[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955633#comment-16955633
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format 
Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336797574
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/com/esri/core/geometry/ShapefileByteBufferCursor.java
 ##
 @@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.esri.core.geometry;
 
 Review comment:
   This package (https://github.com/Esri/geometry-api-java) is licensed under 
the Apache license. I didn't write the original PR, so I'm not sure why this 
was included separately. 
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955625#comment-16955625
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

paul-rogers commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336791821
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader {
+
+  private FileSplit split;
+  private BufferedReader reader;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private FSDataInputStream fileReaderShp = null;
+  private FSDataInputStream fileReaderDbf = null;
+  private FSDataInputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+
+
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = 
LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+protected final ShpFormatPlugin plugin;
+
+public ShpReaderConfig(ShpFormatPlugin plugin) {
+  this.plugin = plugin;
+}
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+this.split = negotiator.split();
+this.hadoopShp = split.getPath();
+this.hadoopDbf = new Path(split.getPath().toString().replace("shp", "dbf"));
+this.hadoopPrj = new Path(split.getPath().toString().replace("shp", "prj"));
+
+openFile(negotiator);
+SchemaBuilder builder = new SchemaBuilder();
+builder.addNullable("gid", TypeProtos.MinorType.INT);
+builder.addNullable("srid", TypeProtos.MinorType.INT);
+builder.addNullable("shapeType", TypeProtos.MinorType.VARCHAR);
+builder.addNullable("geom", TypeProtos.MinorType.VARBINARY);
+
+negotiator.setTableSchema(builder.buildSchema(), false);
 
 Review comment:
   Nit: because the schema is fixed, you can do something like:
   
   ```
   BatchSchema schema = new SchemaBuilder()
       .addNullable("gid", TypeProtos.MinorType.INT)
       ...
       .build();
   ```
 


[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955630#comment-16955630
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

paul-rogers commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336796777
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader {
+
+  private FileSplit split;
+  private BufferedReader reader;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private FSDataInputStream fileReaderShp = null;
+  private FSDataInputStream fileReaderDbf = null;
+  private FSDataInputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+
+
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = 
LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+protected final ShpFormatPlugin plugin;
+
+public ShpReaderConfig(ShpFormatPlugin plugin) {
+  this.plugin = plugin;
+}
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+this.split = negotiator.split();
+this.hadoopShp = split.getPath();
+this.hadoopDbf = new Path(split.getPath().toString().replace("shp", "dbf"));
+this.hadoopPrj = new Path(split.getPath().toString().replace("shp", "prj"));
+
+openFile(negotiator);
+SchemaBuilder builder = new SchemaBuilder();
+builder.addNullable("gid", TypeProtos.MinorType.INT);
+builder.addNullable("srid", TypeProtos.MinorType.INT);
+builder.addNullable("shapeType", TypeProtos.MinorType.VARCHAR);
+builder.addNullable("geom", TypeProtos.MinorType.VARBINARY);
+
+negotiator.setTableSchema(builder.buildSchema(), false);
+loader = negotiator.build();
+
+rowWriter = loader.writer();
+gidWriter = rowWriter.scalar("gid");
+sridWriter = rowWriter.scalar("srid");
+shapeTypeWriter = rowWriter.scalar("shapeType");
+geomWriter = rowWriter.scalar("geom");
+
+return true;
+  }
+
+  @Override
+  

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955617#comment-16955617
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

paul-rogers commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336791151
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/com/esri/core/geometry/ShapefileByteBufferCursor.java
 ##
 @@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.esri.core.geometry;
 
 Review comment:
   Is this file from, or adapted from, ESRI? Should we add an attribution 
comment? Is their license compatible with Apache?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.17.0
>Reporter: Karol Potocki
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> Allow Drill (drill-gis) to read esri shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955624#comment-16955624
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

paul-rogers commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336791770
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader {
+
+  private FileSplit split;
+  private BufferedReader reader;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private FSDataInputStream fileReaderShp = null;
+  private FSDataInputStream fileReaderDbf = null;
+  private FSDataInputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+
+
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+protected final ShpFormatPlugin plugin;
+
+public ShpReaderConfig(ShpFormatPlugin plugin) {
+  this.plugin = plugin;
+}
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+this.split = negotiator.split();
+this.hadoopShp = split.getPath();
+this.hadoopDbf = new Path(split.getPath().toString().replace("shp", "dbf"));
+this.hadoopPrj = new Path(split.getPath().toString().replace("shp", "prj"));
 
 Review comment:
   Does this do what you want? Note that replace() substitutes every
   occurrence of "shp", not just the extension. Suppose I have "myshpFile.foo".
   Do you want to change this to "mydbfFile.foo"?
   
   I think you want a regex that matches only the trailing extension,
   "\.\w+$" or some such.
   
   Also, is this replacing a "shp" extension with something else? If so, why? A
   comment would help to explain what's happening.
   
   Also, nit: no need to convert the path to String twice.
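   
   A minimal sketch of the anchored replacement suggested above, assuming the
   split always points at the .shp file. The class and method names
   (SidecarPaths, withExtension) are hypothetical and not part of the PR; in
   the reader, the resulting strings would be wrapped in a Hadoop Path.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the PR: derives sidecar file names from a .shp path.
final class SidecarPaths {
  // Anchored at the end of the string, so "shp" elsewhere in the path is left alone.
  private static final Pattern SHP_EXT =
      Pattern.compile("\\.shp$", Pattern.CASE_INSENSITIVE);

  private SidecarPaths() { }

  /** Returns shpPath with its trailing ".shp" replaced by "." + newExt. */
  static String withExtension(String shpPath, String newExt) {
    Matcher m = SHP_EXT.matcher(shpPath);
    if (!m.find()) {
      throw new IllegalArgumentException("Not a .shp path: " + shpPath);
    }
    // quoteReplacement guards against '$' or '\' in the new extension.
    return m.replaceFirst(Matcher.quoteReplacement("." + newExt));
  }

  public static void main(String[] args) {
    String shp = "/data/myshpFile.shp"; // convert the path to a String once
    System.out.println(withExtension(shp, "dbf"));
    System.out.println(withExtension(shp, "prj"));
  }
}
```

   By contrast, "myshpFile.shp".replace("shp", "dbf") yields "mydbfFile.dbf",
   which is the failure mode described above.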
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955629#comment-16955629
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

paul-rogers commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336796656
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader {
+
+  private FileSplit split;
+  private BufferedReader reader;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private FSDataInputStream fileReaderShp = null;
+  private FSDataInputStream fileReaderDbf = null;
+  private FSDataInputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+
+
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+protected final ShpFormatPlugin plugin;
+
+public ShpReaderConfig(ShpFormatPlugin plugin) {
+  this.plugin = plugin;
+}
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+this.split = negotiator.split();
+this.hadoopShp = split.getPath();
+this.hadoopDbf = new Path(split.getPath().toString().replace("shp", "dbf"));
+this.hadoopPrj = new Path(split.getPath().toString().replace("shp", "prj"));
+
+openFile(negotiator);
+SchemaBuilder builder = new SchemaBuilder();
+builder.addNullable("gid", TypeProtos.MinorType.INT);
+builder.addNullable("srid", TypeProtos.MinorType.INT);
+builder.addNullable("shapeType", TypeProtos.MinorType.VARCHAR);
+builder.addNullable("geom", TypeProtos.MinorType.VARBINARY);
+
+negotiator.setTableSchema(builder.buildSchema(), false);
+loader = negotiator.build();
+
+rowWriter = loader.writer();
+gidWriter = rowWriter.scalar("gid");
+sridWriter = rowWriter.scalar("srid");
+shapeTypeWriter = rowWriter.scalar("shapeType");
+geomWriter = rowWriter.scalar("geom");
+
+return true;
+  }
+
+  @Override
+  

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955627#comment-16955627
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

paul-rogers commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336791945
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader {
+
+  private FileSplit split;
+  private BufferedReader reader;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private FSDataInputStream fileReaderShp = null;
+  private FSDataInputStream fileReaderDbf = null;
+  private FSDataInputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+
+
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+protected final ShpFormatPlugin plugin;
+
+public ShpReaderConfig(ShpFormatPlugin plugin) {
+  this.plugin = plugin;
+}
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+this.split = negotiator.split();
+this.hadoopShp = split.getPath();
+this.hadoopDbf = new Path(split.getPath().toString().replace("shp", "dbf"));
+this.hadoopPrj = new Path(split.getPath().toString().replace("shp", "prj"));
+
+openFile(negotiator);
+SchemaBuilder builder = new SchemaBuilder();
+builder.addNullable("gid", TypeProtos.MinorType.INT);
+builder.addNullable("srid", TypeProtos.MinorType.INT);
+builder.addNullable("shapeType", TypeProtos.MinorType.VARCHAR);
+builder.addNullable("geom", TypeProtos.MinorType.VARBINARY);
+
+negotiator.setTableSchema(builder.buildSchema(), false);
+loader = negotiator.build();
+
+rowWriter = loader.writer();
+gidWriter = rowWriter.scalar("gid");
+sridWriter = rowWriter.scalar("srid");
+shapeTypeWriter = rowWriter.scalar("shapeType");
+geomWriter = rowWriter.scalar("geom");
+
+return true;
+  }
+
+  @Override
+  

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955621#comment-16955621
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

paul-rogers commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336791910
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader {
+
+  private FileSplit split;
+  private BufferedReader reader;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private FSDataInputStream fileReaderShp = null;
+  private FSDataInputStream fileReaderDbf = null;
+  private FSDataInputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+
+
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+protected final ShpFormatPlugin plugin;
+
+public ShpReaderConfig(ShpFormatPlugin plugin) {
+  this.plugin = plugin;
+}
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+this.split = negotiator.split();
+this.hadoopShp = split.getPath();
+this.hadoopDbf = new Path(split.getPath().toString().replace("shp", "dbf"));
+this.hadoopPrj = new Path(split.getPath().toString().replace("shp", "prj"));
+
+openFile(negotiator);
+SchemaBuilder builder = new SchemaBuilder();
+builder.addNullable("gid", TypeProtos.MinorType.INT);
+builder.addNullable("srid", TypeProtos.MinorType.INT);
+builder.addNullable("shapeType", TypeProtos.MinorType.VARCHAR);
+builder.addNullable("geom", TypeProtos.MinorType.VARBINARY);
+
+negotiator.setTableSchema(builder.buildSchema(), false);
+loader = negotiator.build();
+
+rowWriter = loader.writer();
+gidWriter = rowWriter.scalar("gid");
+sridWriter = rowWriter.scalar("srid");
+shapeTypeWriter = rowWriter.scalar("shapeType");
+geomWriter = rowWriter.scalar("geom");
+
+return true;
+  }
+
+  @Override
+  

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955628#comment-16955628
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

paul-rogers commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336791976
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader {
+
+  private FileSplit split;
+  private BufferedReader reader;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private FSDataInputStream fileReaderShp = null;
+  private FSDataInputStream fileReaderDbf = null;
+  private FSDataInputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+
+
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+protected final ShpFormatPlugin plugin;
+
+public ShpReaderConfig(ShpFormatPlugin plugin) {
+  this.plugin = plugin;
+}
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+this.split = negotiator.split();
+this.hadoopShp = split.getPath();
+this.hadoopDbf = new Path(split.getPath().toString().replace("shp", "dbf"));
+this.hadoopPrj = new Path(split.getPath().toString().replace("shp", "prj"));
+
+openFile(negotiator);
+SchemaBuilder builder = new SchemaBuilder();
+builder.addNullable("gid", TypeProtos.MinorType.INT);
+builder.addNullable("srid", TypeProtos.MinorType.INT);
+builder.addNullable("shapeType", TypeProtos.MinorType.VARCHAR);
+builder.addNullable("geom", TypeProtos.MinorType.VARBINARY);
+
+negotiator.setTableSchema(builder.buildSchema(), false);
+loader = negotiator.build();
+
+rowWriter = loader.writer();
+gidWriter = rowWriter.scalar("gid");
+sridWriter = rowWriter.scalar("srid");
+shapeTypeWriter = rowWriter.scalar("shapeType");
+geomWriter = rowWriter.scalar("geom");
+
+return true;
+  }
+
+  @Override
+  

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955620#comment-16955620
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

paul-rogers commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336791601
 
 

 ##
 File path: contrib/format-esri/README.md
 ##
 @@ -0,0 +1,190 @@
+# Format Plugin for ESRI Shape Files
 
 Review comment:
   Thanks as always for your README files, very helpful!
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955631#comment-16955631
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

paul-rogers commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336791191
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatConfig.java
 ##
 @@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Format plugin config for shapefile data files.
+ */
+@JsonTypeName(ShpFormatPlugin.PLUGIN_NAME)
+public class ShpFormatConfig implements FormatPluginConfig {
+  public List<String> extensions;
+  private static final List<String> DEFAULT_EXTS = ImmutableList.of("shp", "dbf");
 
 Review comment:
   Nit: statics at top of class, followed by members.
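   
   A short sketch of the ordering being asked for, with static constants
   declared before instance members. ExampleFormatConfig is a hypothetical
   stand-in for illustration, not the PR's ShpFormatConfig, and it uses plain
   JDK collections rather than Drill's shaded Guava.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a format config class; illustrates member ordering only.
final class ExampleFormatConfig {
  // Static constants come first...
  private static final List<String> DEFAULT_EXTS =
      Collections.unmodifiableList(Arrays.asList("shp", "dbf"));

  // ...followed by instance members.
  private final List<String> extensions;

  ExampleFormatConfig(List<String> extensions) {
    // Fall back to the defaults when no extensions are configured.
    this.extensions = extensions == null ? DEFAULT_EXTS : extensions;
  }

  List<String> getExtensions() {
    return extensions;
  }

  public static void main(String[] args) {
    System.out.println(new ExampleFormatConfig(null).getExtensions());
  }
}
```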
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955626#comment-16955626
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

paul-rogers commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336796751
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader<FileSchemaNegotiator> {
+
+  private FileSplit split;
+  private BufferedReader reader;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private FSDataInputStream fileReaderShp = null;
+  private FSDataInputStream fileReaderDbf = null;
+  private FSDataInputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+
+
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+    protected final ShpFormatPlugin plugin;
+
+    public ShpReaderConfig(ShpFormatPlugin plugin) {
+      this.plugin = plugin;
+    }
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+    this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+    this.split = negotiator.split();
+    this.hadoopShp = split.getPath();
+    this.hadoopDbf = new Path(split.getPath().toString().replace("shp", "dbf"));
+    this.hadoopPrj = new Path(split.getPath().toString().replace("shp", "prj"));
+
+    openFile(negotiator);
+    SchemaBuilder builder = new SchemaBuilder();
+    builder.addNullable("gid", TypeProtos.MinorType.INT);
+    builder.addNullable("srid", TypeProtos.MinorType.INT);
+    builder.addNullable("shapeType", TypeProtos.MinorType.VARCHAR);
+    builder.addNullable("geom", TypeProtos.MinorType.VARBINARY);
+
+    negotiator.setTableSchema(builder.buildSchema(), false);
+    loader = negotiator.build();
+
+    rowWriter = loader.writer();
+    gidWriter = rowWriter.scalar("gid");
+    sridWriter = rowWriter.scalar("srid");
+    shapeTypeWriter = rowWriter.scalar("shapeType");
+    geomWriter = rowWriter.scalar("geom");
+
+    return true;
+  }
+
+  @Override
+  
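
A note on the sibling paths built in `open()` above: `String.replace("shp", ...)` rewrites every occurrence of `shp` in the path, so a file under a directory such as `/data/shp/` would resolve to the wrong location. A minimal sketch of a suffix-only alternative follows. The class and method names (`ShapefileSiblings`, `sibling`) are hypothetical and not part of the pull request.

```java
final class ShapefileSiblings {

  private static final String SHP_SUFFIX = ".shp";

  /**
   * Replaces only a trailing ".shp" (matched case-insensitively) with the
   * given extension, leaving the rest of the path untouched.
   */
  static String sibling(String shpPath, String newExt) {
    int stemLength = shpPath.length() - SHP_SUFFIX.length();
    // regionMatches returns false for a negative offset, so short paths are rejected too.
    if (!shpPath.regionMatches(true, stemLength, SHP_SUFFIX, 0, SHP_SUFFIX.length())) {
      throw new IllegalArgumentException("Not a .shp path: " + shpPath);
    }
    return shpPath.substring(0, stemLength) + "." + newExt;
  }
}
```

With this helper, `sibling("/data/shp/CA-cities.shp", "dbf")` yields `/data/shp/CA-cities.dbf`, whereas a plain `replace("shp", "dbf")` would also rewrite the directory name.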

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955619#comment-16955619
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

paul-rogers commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336791541
 
 

 ##
 File path: contrib/format-esri/README.md
 ##
 @@ -0,0 +1,190 @@
+# Format Plugin for ESRI Shape Files
+This format plugin allows Drill to read ESRI Shape files. You can read about the shapefile format here: https://en.wikipedia.org/wiki/Shapefile. 
+
+## Configuration Options
+Other than the file extensions, there are no configuration options for this plugin. To use, simply add the following to your configuration:
+
+```
+"shp": {
+  "type": "shp",
+  "extensions": [
+"shp"
+  ]
+}
+```
+
+## Usage Notes:
+This plugin will return the following fields:
+
+* `gid`:  Integer
+* `srid`:  Integer
+* `shapeType`:  String
+* `name`:  Plain text 
+* `geom`:  A geometric point or path.  This field is returned as a `VARBINARY`.
+
+This plugin is best used with the suite of GIS functions in Drill which include the following:
+
+Geospatial Functions
 
 Review comment:
   This seems to be a copy/paste of a later file. Any reason to have two copies 
we maintain in parallel? Can we, instead, reference that other file?
   
   Or, are there differences between the two copies? Can't easily tell. If so, 
can we handle those differently so we don't have to maintain two copies?
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955618#comment-16955618
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

paul-rogers commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336791219
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatConfig.java
 ##
 @@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Format plugin config for shapefile data files.
+ */
+@JsonTypeName(ShpFormatPlugin.PLUGIN_NAME)
+public class ShpFormatConfig implements FormatPluginConfig {
+  public List<String> extensions;
+  private static final List<String> DEFAULT_EXTS = ImmutableList.of("shp", "dbf");
+
+  public ShpFormatConfig() { }
+
+  @JsonInclude(JsonInclude.Include.NON_DEFAULT)
+  public List<String> getExtensions() {
+    if (extensions == null) {
+      return DEFAULT_EXTS;
+    }
+    return extensions;
 
 Review comment:
   Nit: `return extensions == null ? DEFAULT_EXTS : extensions;`
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955622#comment-16955622
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

paul-rogers commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336791441
 
 

 ##
 File path: 
contrib/format-esri/src/test/java/org/apache/drill/exec/store/esri/TestShapefileFormatPlugin.java
 ##
 @@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import org.apache.drill.categories.RowSetTests;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.server.Drillbit;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.dfs.FileSystemConfig;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.test.BaseDirTestWatcher;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+@Category(RowSetTests.class)
+public class TestShapefileFormatPlugin extends ClusterTest {
+
+  @ClassRule
+  public static final BaseDirTestWatcher dirTestWatcher = new BaseDirTestWatcher();
+
+  @BeforeClass
+  public static void setup() throws Exception {
+    ClusterTest.startCluster(ClusterFixture.builder(dirTestWatcher));
+    definePlugin();
+  }
+
+  private static void definePlugin() throws ExecutionSetupException {
+    ShpFormatConfig sampleConfig = new ShpFormatConfig();
+
+    // Define a temporary plugin for the "cp" storage plugin.
+    Drillbit drillbit = cluster.drillbit();
+    final StoragePluginRegistry pluginRegistry = drillbit.getContext().getStorage();
+    final FileSystemPlugin plugin = (FileSystemPlugin) pluginRegistry.getPlugin("cp");
+    final FileSystemConfig pluginConfig = (FileSystemConfig) plugin.getConfig();
+    pluginConfig.getFormats().put("sample", sampleConfig);
+    pluginRegistry.createOrUpdate("cp", pluginConfig, false);
+  }
+
+  @Test
+  public void testRowCount() throws Exception {
+    testBuilder()
+      .sqlQuery("select count(*) "
+        + "from cp.`CA-cities.shp`")
+      .ordered()
+      .baselineColumns("EXPR$0")
+      .baselineValues(5727L)
+      .go();
 
 Review comment:
   Any reason to use this old-school way of testing queries? Does not test 
types effectively...
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955623#comment-16955623
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

paul-rogers commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336791381
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatPlugin.java
 ##
 @@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileReaderFactory;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.proto.UserBitShared.CoreOperatorType;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
+import org.apache.drill.exec.store.esri.ShpBatchReader.ShpReaderConfig;
+import org.apache.hadoop.conf.Configuration;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class ShpFormatPlugin extends EasyFormatPlugin<ShpFormatConfig> {
+
+  private static final Logger logger = LoggerFactory.getLogger(ShpFormatPlugin.class);
+
+  public static final String PLUGIN_NAME = "shp";
+
+  public static class ShpReaderFactory extends FileReaderFactory {
+    private final ShpReaderConfig readerConfig;
+
+    public ShpReaderFactory(ShpReaderConfig config) {
+      readerConfig = config;
+    }
+
+    @Override
+    public ManagedReader<? extends FileSchemaNegotiator> newReader() {
+      return new ShpBatchReader(readerConfig);
+    }
+  }
+
+  public ShpFormatPlugin(String name, DrillbitContext context, Configuration fsConf, StoragePluginConfig storageConfig, ShpFormatConfig formatConfig) {
+    super(name, easyConfig(fsConf, formatConfig), context, storageConfig, formatConfig);
+  }
+
+  @Override
+  public ManagedReader<? extends FileSchemaNegotiator> newBatchReader(EasySubScan scan, OptionManager options) throws ExecutionSetupException {
+    return new ShpBatchReader(formatConfig.getReaderConfig(this));
+  }
+
+  @Override
+  protected FileScanFramework.FileScanBuilder frameworkBuilder(OptionManager options, EasySubScan scan) {
+    FileScanFramework.FileScanBuilder builder = new FileScanFramework.FileScanBuilder();
+    builder.setReaderFactory(new ShpReaderFactory(new ShpReaderConfig(this)));
+    initScanBuilder(builder, scan);
+    builder.setNullType(Types.optional(TypeProtos.MinorType.VARCHAR));
 
 Review comment:
   This one is interesting. You've defined a fixed set of columns. Yet, I can 
request others, such as `foo`. The above says that `foo` will be defined as 
nullable VARCHAR, filled with nulls. I wonder, when the set of columns is 
fixed, should we return an error if the user requests an unknown column?
   
   I think I saw a mailing list post about this recently where, for some format 
or other, someone was enforcing the set of available columns.
   
   Not a big deal (it is fail-soft), but worth considering...
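
   To make the fail-hard alternative concrete, here is a minimal sketch that rejects a projection containing columns outside the fixed set. It is plain Java, not Drill's scan framework API. The class and method names (`FixedColumnCheck`, `unknownColumns`, `validate`) are hypothetical, the column list is taken from the schema built in `open()`, and name matching is case-sensitive purely to keep the sketch short. Any columns read from the accompanying dbf file would have to be added to the set before checking.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class FixedColumnCheck {

  // Columns defined by the reader's schema in the diff under review.
  static final Set<String> KNOWN_COLUMNS =
      new LinkedHashSet<>(List.of("gid", "srid", "shapeType", "geom"));

  /** Returns the requested columns that the fixed schema does not define. */
  static List<String> unknownColumns(List<String> requested) {
    List<String> unknown = new ArrayList<>();
    for (String column : requested) {
      // A wildcard projection is always allowed.
      if (!"*".equals(column) && !KNOWN_COLUMNS.contains(column)) {
        unknown.add(column);
      }
    }
    return unknown;
  }

  /** Fail-hard variant: reject the projection instead of returning all-null columns. */
  static void validate(List<String> requested) {
    List<String> unknown = unknownColumns(requested);
    if (!unknown.isEmpty()) {
      throw new IllegalArgumentException(
          "Unknown column(s) " + unknown + "; available columns: " + KNOWN_COLUMNS);
    }
  }
}
```

   The trade-off is the one described above: the current behavior is fail-soft and tolerates typos silently, while a check like this surfaces them at plan or open time.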
 


[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-09-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934576#comment-16934576
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #335: DRILL-4303: ESRI Shapefile (shp) format 
plugin
URL: https://github.com/apache/drill/pull/335
 
 
   
 



> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Reporter: Karol Potocki
>Priority: Major
>
> Allow Drill (drill-gis) to read esri shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257300#comment-16257300
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

Github user cgivre commented on the issue:

https://github.com/apache/drill/pull/335
  
Hi @k255,
Are you still interested in this?  I think if we're going to get the GIS 
functions into Drill, we really should get this in as well and I'm happy to 
help.  For whatever reason, I didn't see this until this morning.  Anyway, if 
you'd be willing to rebase this, I can help shepherd it through the review 
process.
-- C


> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Reporter: Karol Potocki
>
> Allow Drill (drill-gis) to read esri shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2016-01-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15112388#comment-15112388
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

GitHub user k255 opened a pull request:

https://github.com/apache/drill/pull/335

DRILL-4303: ESRI Shapefile (shp) format plugin

Shp format plugin. The main idea is to read shapefiles for joining with other 
sources, or to enable conversion to e.g. a Parquet file, which can store 
geometry data in binary format (WKB) on HDFS.
The implementation is based on the ESRI Java library, which can parse a single 
geometry definition. Custom code (ShapefileByteBufferCursor) reads the whole 
file. The plugin also reads the accompanying data file (dbf) and the SRID 
information (prj). 
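
The whole-file reading mentioned above has to get past the fixed 100-byte main-file header before it reaches the geometry records. As an illustration only (this is not the PR's `ShapefileByteBufferCursor`), here is a minimal sketch of parsing that header, with field offsets and byte orders taken from the ESRI white paper linked in the issue. The `ShpHeader` class is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ShpHeader {

  static final int HEADER_BYTES = 100;
  static final int FILE_CODE = 9994;

  final int fileLengthBytes;
  final int version;
  final int shapeType;

  private ShpHeader(int fileLengthBytes, int version, int shapeType) {
    this.fileLengthBytes = fileLengthBytes;
    this.version = version;
    this.shapeType = shapeType;
  }

  /** Parses the main-file header from the first 100 bytes of a .shp file. */
  static ShpHeader parse(byte[] header) {
    if (header.length < HEADER_BYTES) {
      throw new IllegalArgumentException("Header needs " + HEADER_BYTES + " bytes");
    }
    ByteBuffer buf = ByteBuffer.wrap(header);

    // File code and file length are stored big-endian.
    buf.order(ByteOrder.BIG_ENDIAN);
    int fileCode = buf.getInt(0);
    if (fileCode != FILE_CODE) {
      throw new IllegalArgumentException("Unexpected file code: " + fileCode);
    }
    int lengthInWords = buf.getInt(24); // length is counted in 16-bit words

    // Version and shape type are stored little-endian.
    buf.order(ByteOrder.LITTLE_ENDIAN);
    int version = buf.getInt(28);
    int shapeType = buf.getInt(32);

    return new ShpHeader(lengthInWords * 2, version, shapeType);
  }
}
```

The mixed byte order is the detail most likely to trip up hand-written readers: the first seven integers are big-endian, while the version, shape type and bounding box are little-endian.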
Sample usage:
- reading shp
```select *, ST_AsText(geom) from cp.`sample-data/CA-cities.shp`;```

- conversion to parquet
```alter session set `store.format`='parquet';```
```create table dfs.tmp.`/CA-cities-par` as select * from cp.`sample-data/CA-cities.shp`;```

There is also a sample Parquet file at cp.`sample-data/CA-cities.parquet`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/k255/drill drill-gis-shp

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/335.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #335


commit ecaa6ff5303cd179cc0c0f96518b1ee69ff40955
Author: potocki 
Date:   2016-01-22T11:21:04Z

ESRI Shapefile (shp) reader implemented as drill format plugin

commit 91ccd1ccf0d06802dcf0da2ee1ef83c903c248af
Author: potocki 
Date:   2016-01-22T12:19:00Z

added sample file in parquet format




> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Reporter: Karol Potocki
>
> Allow Drill (drill-gis) to read esri shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)