[jira] [Created] (DRILL-4303) ESRI Shapefile (shp) format plugin

2016-01-22 Thread Karol Potocki (JIRA)
Karol Potocki created DRILL-4303:


 Summary: ESRI Shapefile (shp) format plugin
 Key: DRILL-4303
 URL: https://issues.apache.org/jira/browse/DRILL-4303
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
Reporter: Karol Potocki


Allow Drill (drill-gis) to read ESRI shapefiles, one of the most popular 
geospatial formats.
The format is described here:
https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf

A shapefile dataset is spread over several files; the three relevant here are 
prj (SRID information), dbf (data fields) and shp (geometry data).
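
As a rough illustration of what reading the shp part involves, here is a minimal Java sketch that decodes the fixed 100-byte main file header described in the ESRI whitepaper linked above (file code 9994 and file length stored big-endian, version and shape type stored little-endian). The class name ShpHeader and its fields are hypothetical; nothing here is taken from Drill or drill-gis.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of Drill or drill-gis.
final class ShpHeader {
    static final int HEADER_BYTES = 100; // fixed size of the .shp main file header
    static final int FILE_CODE = 9994;   // magic number at byte 0

    final long fileLengthBytes; // total file length, converted from 16-bit words
    final int version;          // 1000 for the published format
    final int shapeType;        // e.g. 1 = Point, 3 = PolyLine, 5 = Polygon

    private ShpHeader(long fileLengthBytes, int version, int shapeType) {
        this.fileLengthBytes = fileLengthBytes;
        this.version = version;
        this.shapeType = shapeType;
    }

    static ShpHeader parse(byte[] header) {
        if (header.length < HEADER_BYTES) {
            throw new IllegalArgumentException("shp header needs 100 bytes, got " + header.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(header);

        // File code and file length are big-endian.
        buf.order(ByteOrder.BIG_ENDIAN);
        int fileCode = buf.getInt(0);
        if (fileCode != FILE_CODE) {
            throw new IllegalArgumentException("not a shapefile, file code = " + fileCode);
        }
        int lengthWords = buf.getInt(24);

        // Version and shape type are little-endian.
        buf.order(ByteOrder.LITTLE_ENDIAN);
        int version = buf.getInt(28);
        int shapeType = buf.getInt(32);

        return new ShpHeader(2L * lengthWords, version, shapeType);
    }

    public static void main(String[] args) {
        // Synthetic header: an empty Point shapefile (header only, 50 words).
        ByteBuffer b = ByteBuffer.allocate(HEADER_BYTES);
        b.putInt(0, FILE_CODE).putInt(24, 50);
        b.order(ByteOrder.LITTLE_ENDIAN).putInt(28, 1000).putInt(32, 1);
        ShpHeader h = parse(b.array());
        System.out.println("version=" + h.version + " shapeType=" + h.shapeType
                + " bytes=" + h.fileLengthBytes);
    }
}
```

A real format plugin would also have to read the bounding box, the variable-length geometry records that follow the header, and the matching dbf rows; this sketch covers only the header check.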



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4200) drill-jdbc-storage: applies timezone to java.sql.Date field and fails

2015-12-16 Thread Karol Potocki (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059838#comment-15059838
 ] 

Karol Potocki commented on DRILL-4200:
--

This [thread|http://stackoverflow.com/a/2306051] might be helpful.

The issue seems related to 
[DRILL-3882|https://issues.apache.org/jira/browse/DRILL-3882].

> drill-jdbc-storage: applies timezone to java.sql.Date field and fails
> -
>
> Key: DRILL-4200
> URL: https://issues.apache.org/jira/browse/DRILL-4200
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.3.0
> Environment: drill-jdbc-storage plugin configured (based on 
> https://drill.apache.org/docs/rdbms-storage-plugin) with 
> org.relique.jdbc.csv.CsvDriver to access dbf (dbase) files.
>Reporter: Karol Potocki
>
> When org.relique.jdbc.csv.CsvDriver is used to query files with date fields 
> (e.g. 2012-05-01), the query fails with:
> {code}
> UnsupportedOperationException: Method not supported: ResultSet.getDate(int, Calendar)
> {code}
> JdbcRecordReader.java:406 calls getDate(int, Calendar), which tries to apply a 
> timezone to a java.sql.Date. The value is probably not timezone-related, and 
> the driver does not implement that overload, which causes the error.
> A quick fix is to use ResultSet.getDate(int) instead.
> Details:
> {code}
> Caused by: java.lang.UnsupportedOperationException: Method not supported: ResultSet.getDate(int, Calendar)
>     at org.relique.jdbc.csv.CsvResultSet.getDate(Unknown Source) ~[csvjdbc-1.0-28.jar:na]
>     at org.apache.commons.dbcp.DelegatingResultSet.getDate(DelegatingResultSet.java:574) ~[commons-dbcp-1.4.jar:1.4]
>     at org.apache.commons.dbcp.DelegatingResultSet.getDate(DelegatingResultSet.java:574) ~[commons-dbcp-1.4.jar:1.4]
>     at org.apache.drill.exec.store.jdbc.JdbcRecordReader$DateCopier.copy(JdbcRecordReader.java:406) ~[drill-jdbc-storage-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
>     at org.apache.drill.exec.store.jdbc.JdbcRecordReader.next(JdbcRecordReader.java:242) ~[drill-jdbc-storage-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> {code}





[jira] [Created] (DRILL-4200) drill-jdbc-storage: applies timezone to java.sql.Date field and fails

2015-12-15 Thread Karol Potocki (JIRA)
Karol Potocki created DRILL-4200:


 Summary: drill-jdbc-storage: applies timezone to java.sql.Date 
field and fails
 Key: DRILL-4200
 URL: https://issues.apache.org/jira/browse/DRILL-4200
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Other
Affects Versions: 1.3.0
 Environment: drill-jdbc-storage plugin configured (based on 
https://drill.apache.org/docs/rdbms-storage-plugin) with 
org.relique.jdbc.csv.CsvDriver to access dbf (dbase) files.
Reporter: Karol Potocki


When org.relique.jdbc.csv.CsvDriver is used to query files with date fields 
(e.g. 2012-05-01), the query fails with:

{code}
UnsupportedOperationException: Method not supported: ResultSet.getDate(int, Calendar)
{code}

JdbcRecordReader.java:406 calls getDate(int, Calendar), which tries to apply a 
timezone to a java.sql.Date. The value is probably not timezone-related, and 
the driver does not implement that overload, which causes the error.

A quick fix is to use ResultSet.getDate(int) instead.

Details:
{code}
Caused by: java.lang.UnsupportedOperationException: Method not supported: ResultSet.getDate(int, Calendar)
    at org.relique.jdbc.csv.CsvResultSet.getDate(Unknown Source) ~[csvjdbc-1.0-28.jar:na]
    at org.apache.commons.dbcp.DelegatingResultSet.getDate(DelegatingResultSet.java:574) ~[commons-dbcp-1.4.jar:1.4]
    at org.apache.commons.dbcp.DelegatingResultSet.getDate(DelegatingResultSet.java:574) ~[commons-dbcp-1.4.jar:1.4]
    at org.apache.drill.exec.store.jdbc.JdbcRecordReader$DateCopier.copy(JdbcRecordReader.java:406) ~[drill-jdbc-storage-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
    at org.apache.drill.exec.store.jdbc.JdbcRecordReader.next(JdbcRecordReader.java:242) ~[drill-jdbc-storage-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
{code}
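
One possible shape for the quick fix, as a hedged Java sketch rather than the actual Drill patch: try the Calendar overload first and fall back to ResultSet.getDate(int) when the driver rejects it. The class DateReadSketch and the method readDate are hypothetical names; the fallback itself is an assumption, since the report only proposes calling getDate(int) directly.

```java
import java.sql.Date;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Calendar;

// Hypothetical helper for illustration; not the code in JdbcRecordReader.
final class DateReadSketch {
    static Date readDate(ResultSet rs, int columnIndex, Calendar calendar) throws SQLException {
        try {
            // Overload that applies the calendar's timezone; optional for drivers.
            return rs.getDate(columnIndex, calendar);
        } catch (UnsupportedOperationException | SQLFeatureNotSupportedException e) {
            // Drivers such as the CSV driver in this report reject the overload,
            // so read the date without a calendar instead.
            return rs.getDate(columnIndex);
        }
    }
}
```

Simply replacing the call with getDate(int), as suggested above, is the smaller change; the fallback only matters if other drivers rely on the calendar being applied.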





[jira] [Updated] (DRILL-4091) Support more functions in gis contrib module

2015-12-15 Thread Karol Potocki (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karol Potocki updated DRILL-4091:
-
Target Version/s: 1.5.0

> Support more functions in gis contrib module
> 
>
> Key: DRILL-4091
> URL: https://issues.apache.org/jira/browse/DRILL-4091
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Reporter: Karol Potocki
>
> Add support for commonly used GIS functions in the gis contrib module: relate, 
> contains, crosses, intersects, touches, difference, disjoint, buffer, union, 
> etc.





[jira] [Updated] (DRILL-4091) Support more functions in gis contrib module

2015-11-17 Thread Karol Potocki (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karol Potocki updated DRILL-4091:
-
Target Version/s:   (was: 1.3.0)






[jira] [Commented] (DRILL-4091) Support more functions in gis contrib module

2015-11-16 Thread Karol Potocki (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006648#comment-15006648
 ] 

Karol Potocki commented on DRILL-4091:
--

This extends the functionality of DRILL-3914.






[jira] [Created] (DRILL-4091) Support more functions in gis contrib module

2015-11-16 Thread Karol Potocki (JIRA)
Karol Potocki created DRILL-4091:


 Summary: Support more functions in gis contrib module
 Key: DRILL-4091
 URL: https://issues.apache.org/jira/browse/DRILL-4091
 Project: Apache Drill
  Issue Type: Improvement
  Components: Functions - Drill
Reporter: Karol Potocki


Add support for commonly used GIS functions in the gis contrib module: relate, 
contains, crosses, intersects, touches, difference, disjoint, buffer, union, etc.





[jira] [Commented] (DRILL-3747) UDF for "fuzzy" string and similarity matching

2015-10-30 Thread Karol Potocki (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982753#comment-14982753
 ] 

Karol Potocki commented on DRILL-3747:
--

Such functionality is often required when searching data produced by user 
collaboration (e.g. street names in internet data sources) or when building 
search conditions from user input (handling spelling mistakes).
I recently needed a solution like that; a basic implementation is on my GitHub:
https://github.com/k255/drill-fuzzy-search
It is built on the simmetrics library, which recently moved to the Apache license.
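
To make the kind of matching concrete, here is a small self-contained Java sketch of Levenshtein edit distance, one of the metrics named in the issue. It is not the code from the repository above and does not use simmetrics; the class LevenshteinSketch, the method fuzzyEquals and the maxEdits threshold are hypothetical.

```java
import java.util.Locale;

// Illustrative only; not the drill-fuzzy-search implementation.
final class LevenshteinSketch {
    // Classic dynamic-programming edit distance, keeping two rows of the table.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Case-insensitive match that tolerates up to maxEdits single-character edits.
    static boolean fuzzyEquals(String a, String b, int maxEdits) {
        return distance(a.toLowerCase(Locale.ROOT), b.toLowerCase(Locale.ROOT)) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting"));            // prints 3
        System.out.println(fuzzyEquals("Main Stret", "main street", 1)); // prints true
    }
}
```

A UDF wrapping something like fuzzyEquals would let a query tolerate a misspelled street name in user input; choosing the threshold relative to string length is left open here.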

> UDF for "fuzzy" string and similarity matching
> --
>
> Key: DRILL-3747
> URL: https://issues.apache.org/jira/browse/DRILL-3747
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Affects Versions: Future
>Reporter: Edmon Begoli
>Priority: Minor
>  Labels: features
> Fix For: Future
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> I propose implementing string distance and similarity matching functions 
> similar to what one finds in most other databases - soundex, metaphone, 
> levenshtein (and more advanced variants such as levenshtein-damerau, 
> jaro-winkler, etc.).
> See fuzzystrmatch 
> http://www.postgresql.org/docs/9.5/static/fuzzystrmatch.html, 
> and pg_similarity http://pgsimilarity.projects.pgfoundry.org/
> for inspiration.





[jira] [Comment Edited] (DRILL-3747) UDF for "fuzzy" string and similarity matching

2015-10-30 Thread Karol Potocki (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982753#comment-14982753
 ] 

Karol Potocki edited comment on DRILL-3747 at 10/30/15 6:24 PM:


Such functionality is often required when searching data produced by user 
collaboration (e.g. street names in internet data sources) or when building 
search conditions from user input (handling typos).
I recently needed a solution like that; a basic implementation is on my GitHub:
https://github.com/k255/drill-fuzzy-search
It is built on the simmetrics library, which recently moved to the Apache license.


was (Author: k255):
Such functionality is often required when we search through data produced by 
user collaboration (i.e. street names etc. in internet datasources) or we make 
search conditions based on user input (handling spelling mistakes).
Recently I needed solution like that, basic implementation is on my github:
https://github.com/k255/drill-fuzzy-search
It works on simmetrics library which recently went apache license.






[jira] [Created] (DRILL-3914) Support geospatial queries

2015-10-08 Thread Karol Potocki (JIRA)
Karol Potocki created DRILL-3914:


 Summary: Support geospatial queries
 Key: DRILL-3914
 URL: https://issues.apache.org/jira/browse/DRILL-3914
 Project: Apache Drill
  Issue Type: Improvement
  Components: Client - JDBC, Functions - Drill
Reporter: Karol Potocki


Implement spatial query functionality in Drill to provide location-based 
queries and filtering. It could be similar to PostGIS for Postgres and allow 
queries like:

select * from
  (select columns[2] as location, columns[4] as lon, columns[3] as lat,
          ST_DWithin(ST_Point(-121.895, 37.339),
                     ST_Point(columns[4], columns[3]), 0.1) as isWithin
   from dfs.`default`.`/home/k255/drill/sample-data/CA-cities.csv`
  )
where isWithin = true;

A working proposal is available at http://github.com/k255 (see drill-gis and 
the drill fork).
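
For readers unfamiliar with ST_DWithin, the predicate used in the query can be sketched in a few lines of Java. This assumes plain planar distance in the coordinates' own units (degrees for the lon/lat values above), which is how PostGIS treats geometry arguments; how drill-gis actually computes it is not stated here. The class DWithinSketch and its method are hypothetical.

```java
// Illustrative only; not the drill-gis implementation.
final class DWithinSketch {
    // True when point (x2, y2) lies within `distance` of point (x1, y1),
    // using Euclidean distance in the coordinates' own units.
    static boolean dWithin(double x1, double y1, double x2, double y2, double distance) {
        double dx = x2 - x1;
        double dy = y2 - y1;
        // Compare squared values to avoid an unnecessary square root.
        return dx * dx + dy * dy <= distance * distance;
    }

    public static void main(String[] args) {
        // Reference point from the query above, against a nearby and a distant point.
        System.out.println(dWithin(-121.895, 37.339, -121.89, 37.34, 0.1));
        System.out.println(dWithin(-121.895, 37.339, -118.24, 34.05, 0.1));
    }
}
```

With lon/lat input, a threshold of 0.1 means 0.1 degrees, not a fixed ground distance; a metric radius would need a geodesic calculation or projected coordinates.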


