[jira] [Commented] (DRILL-4369) Database driver fails to report any major or minor version information

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587318#comment-15587318
 ] 

ASF GitHub Bot commented on DRILL-4369:
---

Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/622#discussion_r83983884
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -176,6 +180,23 @@ public void setAutoRead(boolean enableAutoRead) {
   }
 
   /**
+   * Sets the client name.
+   *
+   * If not set, default is {@code DrillClient#DEFAULT_CLIENT_NAME}.
+   *
+   * @param name the client name
+   *
+   * @throws IllegalStateException if called after a connection has been 
established.
+   * @throws NullPointerException if client name is empty
--- End diff --

you are not checking if the name is null
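
For reference, a minimal sketch (not the actual patch) of a guard that would match the Javadoc under discussion, using Guava's Preconditions; the class and field names are made up for illustration:

```java
import com.google.common.base.Preconditions;

class ClientNameGuard {
  private boolean connected;
  private String clientName;

  void setClientName(String name) {
    // IllegalStateException once a connection has been established
    Preconditions.checkState(!connected, "client name can only be set before connecting");
    // NullPointerException for a null name, IllegalArgumentException for an empty one
    Preconditions.checkNotNull(name, "client name should not be null");
    Preconditions.checkArgument(!name.isEmpty(), "client name should not be empty");
    this.clientName = name;
  }
}
```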


> Database driver fails to report any major or minor version information
> --
>
> Key: DRILL-4369
> URL: https://issues.apache.org/jira/browse/DRILL-4369
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.4.0
>Reporter: N Campbell
>
> Using Apache Drill 1.4
> The DatabaseMetaData getters that obtain the major and minor versions of the
> server or JDBC driver return 0 instead of 1.4.
> This prevents an application from dynamically adjusting how it interacts 
> based on which version of Drill a connection is accessing.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4369) Database driver fails to report any major or minor version information

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587315#comment-15587315
 ] 

ASF GitHub Bot commented on DRILL-4369:
---

Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/622#discussion_r83983836
  
--- Diff: 
protocol/src/main/java/org/apache/drill/exec/proto/SchemaUserProtos.java ---
@@ -255,6 +255,152 @@ public static int getFieldNumber(java.lang.String 
name)
 }
 }
 
+public static final class RpcEndpointInfos
--- End diff --

my bad


> Database driver fails to report any major or minor version information
> --
>
> Key: DRILL-4369
> URL: https://issues.apache.org/jira/browse/DRILL-4369
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.4.0
>Reporter: N Campbell
>
> Using Apache Drill 1.4
> The DatabaseMetaData getters that obtain the major and minor versions of the
> server or JDBC driver return 0 instead of 1.4.
> This prevents an application from dynamically adjusting how it interacts 
> based on which version of Drill a connection is accessing.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4369) Database driver fails to report any major or minor version information

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587277#comment-15587277
 ] 

ASF GitHub Bot commented on DRILL-4369:
---

Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/622#discussion_r83982899
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -176,6 +180,23 @@ public void setAutoRead(boolean enableAutoRead) {
   }
 
   /**
+   * Sets the client name.
+   *
+   * If not set, default is {@code DrillClient#DEFAULT_CLIENT_NAME}.
+   *
+   * @param name the client name
+   *
+   * @throws IllegalStateException if called after a connection has been 
established.
+   * @throws NullPointerException if client name is empty
--- End diff --

it should be changed to if client name is null


> Database driver fails to report any major or minor version information
> --
>
> Key: DRILL-4369
> URL: https://issues.apache.org/jira/browse/DRILL-4369
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.4.0
>Reporter: N Campbell
>
> Using Apache Drill 1.4
> The DatabaseMetaData getters that obtain the major and minor versions of the
> server or JDBC driver return 0 instead of 1.4.
> This prevents an application from dynamically adjusting how it interacts 
> based on which version of Drill a connection is accessing.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4369) Database driver fails to report any major or minor version information

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587276#comment-15587276
 ] 

ASF GitHub Bot commented on DRILL-4369:
---

Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/622#discussion_r83982861
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillDatabaseMetaDataImpl.java
 ---
@@ -54,6 +55,10 @@ private void throwIfClosed() throws 
AlreadyClosedSqlException,
 }
   }
 
+  private RpcEndpointInfos getServerInfos() throws SQLException {
+DrillConnectionImpl connection = (DrillConnectionImpl) getConnection();
+return connection.getClient().getServerInfos();
--- End diff --

yes, because the client is initialized as part of the constructor of 
DrillConnectionImpl (the field is final too)
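
A tiny illustration of that guarantee, with simplified names rather than the real DrillConnectionImpl: a final field assigned in the constructor cannot be observed as null through the getter once construction has succeeded.

```java
import java.util.Objects;

// Simplified stand-in for the pattern described above; not Drill's actual code.
class ConnectionSketch {
  private final Object client;            // final: assigned exactly once, in the constructor

  ConnectionSketch(Object client) {
    // construction fails if the client is missing, so a constructed
    // instance always holds a non-null client
    this.client = Objects.requireNonNull(client, "client");
  }

  Object getClient() {
    return client;                         // never null for a successfully constructed instance
  }
}
```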


> Database driver fails to report any major or minor version information
> --
>
> Key: DRILL-4369
> URL: https://issues.apache.org/jira/browse/DRILL-4369
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.4.0
>Reporter: N Campbell
>
> Using Apache Drill 1.4
> The DatabaseMetaData getters that obtain the major and minor versions of the
> server or JDBC driver return 0 instead of 1.4.
> This prevents an application from dynamically adjusting how it interacts 
> based on which version of Drill a connection is accessing.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4369) Database driver fails to report any major or minor version information

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587274#comment-15587274
 ] 

ASF GitHub Bot commented on DRILL-4369:
---

Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/622#discussion_r83982790
  
--- Diff: 
protocol/src/main/java/org/apache/drill/exec/proto/SchemaUserProtos.java ---
@@ -255,6 +255,152 @@ public static int getFieldNumber(java.lang.String 
name)
 }
 }
 
+public static final class RpcEndpointInfos
--- End diff --

no, it's in the review


> Database driver fails to report any major or minor version information
> --
>
> Key: DRILL-4369
> URL: https://issues.apache.org/jira/browse/DRILL-4369
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.4.0
>Reporter: N Campbell
>
> Using Apache Drill 1.4
> The DatabaseMetaData getters that obtain the major and minor versions of the
> server or JDBC driver return 0 instead of 1.4.
> This prevents an application from dynamically adjusting how it interacts 
> based on which version of Drill a connection is accessing.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4369) Database driver fails to report any major or minor version information

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587262#comment-15587262
 ] 

ASF GitHub Bot commented on DRILL-4369:
---

Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/622#discussion_r83980514
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -176,6 +180,23 @@ public void setAutoRead(boolean enableAutoRead) {
   }
 
   /**
+   * Sets the client name.
+   *
+   * If not set, default is {@code DrillClient#DEFAULT_CLIENT_NAME}.
+   *
+   * @param name the client name
+   *
+   * @throws IllegalStateException if called after a connection has been 
established.
+   * @throws NullPointerException if client name is empty
--- End diff --

it doesn't :P


> Database driver fails to report any major or minor version information
> --
>
> Key: DRILL-4369
> URL: https://issues.apache.org/jira/browse/DRILL-4369
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.4.0
>Reporter: N Campbell
>
> Using Apache Drill 1.4
> The DatabaseMetaData getters that obtain the major and minor versions of the
> server or JDBC driver return 0 instead of 1.4.
> This prevents an application from dynamically adjusting how it interacts 
> based on which version of Drill a connection is accessing.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4369) Database driver fails to report any major or minor version information

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587263#comment-15587263
 ] 

ASF GitHub Bot commented on DRILL-4369:
---

Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/622#discussion_r83982303
  
--- Diff: 
protocol/src/main/java/org/apache/drill/exec/proto/SchemaUserProtos.java ---
@@ -255,6 +255,152 @@ public static int getFieldNumber(java.lang.String 
name)
 }
 }
 
+public static final class RpcEndpointInfos
--- End diff --

Did you forget to include the modified .proto file?


> Database driver fails to report any major or minor version information
> --
>
> Key: DRILL-4369
> URL: https://issues.apache.org/jira/browse/DRILL-4369
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.4.0
>Reporter: N Campbell
>
> Using Apache Drill 1.4
> The DatabaseMetaData getters that obtain the major and minor versions of the
> server or JDBC driver return 0 instead of 1.4.
> This prevents an application from dynamically adjusting how it interacts 
> based on which version of Drill a connection is accessing.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4369) Database driver fails to report any major or minor version information

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587264#comment-15587264
 ] 

ASF GitHub Bot commented on DRILL-4369:
---

Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/622#discussion_r83981530
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillDatabaseMetaDataImpl.java
 ---
@@ -54,6 +55,10 @@ private void throwIfClosed() throws 
AlreadyClosedSqlException,
 }
   }
 
+  private RpcEndpointInfos getServerInfos() throws SQLException {
+DrillConnectionImpl connection = (DrillConnectionImpl) getConnection();
+return connection.getClient().getServerInfos();
--- End diff --

are we guaranteed _connection.getClient()_ won't return null?


> Database driver fails to report any major or minor version information
> --
>
> Key: DRILL-4369
> URL: https://issues.apache.org/jira/browse/DRILL-4369
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.4.0
>Reporter: N Campbell
>
> Using Apache Drill 1.4
> The DatabaseMetaData getters that obtain the major and minor versions of the
> server or JDBC driver return 0 instead of 1.4.
> This prevents an application from dynamically adjusting how it interacts 
> based on which version of Drill a connection is accessing.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4864) Add ANSI format for date/time functions

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587149#comment-15587149
 ] 

ASF GitHub Bot commented on DRILL-4864:
---

Github user adeneche commented on the issue:

https://github.com/apache/drill/pull/581
  
There seems to be an issue with milliseconds. Consider the following:
```TO_DATE('2013-01-01 12:13:14.001', 'YYYY-MM-DD HH:MI:SS:MS')```
This works fine in Postgresql but fails in Drill with the following error:
```Invalid format: "2013-01-01 12:13:14.001" is malformed at ".001"```

The issue seems to be that MS is being converted to S, but it should be SSS instead.
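
A standalone Joda-Time check (not Drill code) that illustrates the point: the snippet below only shows that a pattern using SSS, Joda's three-digit fraction-of-second token, consumes the ".001" part of this value.

```java
import org.joda.time.DateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

public class MillisPatternCheck {
  public static void main(String[] args) {
    // 'SSS' is Joda-Time's three-digit fraction-of-second token, so the ".001"
    // part of the input is consumed and preserved.
    DateTimeFormatter withMillis = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss.SSS");
    DateTime parsed = withMillis.parseDateTime("2013-01-01 12:13:14.001");
    System.out.println(parsed.getMillisOfSecond());   // prints 1
  }
}
```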


> Add ANSI format for date/time functions
> ---
>
> Key: DRILL-4864
> URL: https://issues.apache.org/jira/browse/DRILL-4864
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Serhii Harnyk
>Assignee: Gautam Kumar Parai
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> TO_DATE() exposes the Joda string formatting conventions in the SQL
> layer. This does not follow the SQL conventions used by ANSI and many other
> database engines on the market.
> Add a new UDF, "ansi_to_joda(string)", that takes a string representing an ANSI
> datetime format and returns the equivalent Joda format string (a sketch of this
> substitution appears after the tables below).
> Add a new session option "drill.exec.fn.to_date_format" that can take one of two
> values: "JODA" (default) and "ANSI".
> If the option is set to "JODA", queries using the to_date() function work in the
> usual way.
> If the option is set to "ANSI", the second argument is wrapped with the
> ansi_to_joda() function, which allows the user to use ANSI datetime formats.
> Wrapping is applied in the to_date(), to_time() and to_timestamp() functions.
> Table of joda and ansi patterns which may be replaced
> ||Pattern name||  Ansi format ||  JodaTime format||
> | Full name of day|   day |   EEEE
> | Day of year |   ddd |   D
> | Day of month|   dd  |   d
> | Day of week |   d   |   e
> | Name of month   |   month   |   MMMM
> | Abr name of month   |   mon |   MMM
> | Full era name   |   ee  |   G
> | Name of day |   dy  |   E
> | Time zone   |   tz  |   TZ
> | Hour 12 |   hh  |   h
> | Hour 12 |   hh12|   h
> | Hour 24 |   hh24|   H
> | Minute of hour  |   mi  |   m
> | Second of minute|   ss  |   s
> | Millisecond of minute   |   ms  |   S
> | Week of year|   ww  |   w
> | Month   |   mm  |   MM
> | Halfday am  |   am  |   aa
> | Halfday pm  |   pm  |   aa
> | ref.|   
> https://www.postgresql.org/docs/8.2/static/functions-formatting.html| 
>   
> http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html
>  |
> Table of ansi pattern modifiers, which may be deleted from string
> ||Description ||  Pattern ||
> | fill mode (suppress padding blanks and zeroes)  |   fm  |
> | fixed format global option (see usage notes)|   fx  |
> | translation mode (print localized day and month names based on 
> lc_messages) |   tm  |
> | spell mode (not yet implemented)|   sp  |
> | ref.|   
> https://www.postgresql.org/docs/8.2/static/functions-formatting.html|
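
To make the substitution concrete, here is a minimal sketch of the kind of rewrite ansi_to_joda would perform, covering only a few tokens from the table above; this is an illustration, not the actual UDF.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AnsiToJodaSketch {
  // A few of the substitutions from the table above, longest tokens first so that
  // e.g. "hh24" is rewritten before "hh". A real implementation would tokenize the
  // pattern instead of chaining string replacements.
  private static final Map<String, String> MAPPINGS = new LinkedHashMap<>();
  static {
    MAPPINGS.put("hh24", "H");
    MAPPINGS.put("hh12", "h");
    MAPPINGS.put("mi", "m");
    MAPPINGS.put("ss", "s");
    MAPPINGS.put("ddd", "D");
    MAPPINGS.put("dd", "d");
    MAPPINGS.put("mon", "MMM");
    MAPPINGS.put("mm", "MM");
  }

  public static String ansiToJoda(String ansiPattern) {
    String result = ansiPattern;
    for (Map.Entry<String, String> e : MAPPINGS.entrySet()) {
      result = result.replace(e.getKey(), e.getValue());
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(ansiToJoda("dd-mon hh24:mi:ss"));  // prints: d-MMM H:m:s
  }
}
```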



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-4921) Scripts drill_config.sh, drillbit.sh, and drill-embedded fail when accessed via a symbolic link

2016-10-18 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-4921:
---

Assignee: Paul Rogers

Assigning to Paul for code review.

> Scripts drill_config.sh,  drillbit.sh, and drill-embedded fail when accessed 
> via a symbolic link
> 
>
> Key: DRILL-4921
> URL: https://issues.apache.org/jira/browse/DRILL-4921
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.8.0
> Environment: The drill-embedded on the Mac; the other files on Linux
>Reporter: Boaz Ben-Zvi
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.9.0
>
>
>   Several of the drill... scripts under $DRILL_HOME/bin use "pwd" to produce 
> the local path of that script. However "pwd" defaults to "logical" (i.e. the 
> same as "pwd -L"); so if accessed via a symbolic link, that link is used 
> verbatim in the path, which can produce wrong paths (e.g., when followed by 
> "cd ..").
> For example, creating a symbolic link and using it (on the Mac):
> $  cd ~/drill
> $  ln -s $DRILL_HOME/bin 
> $  bin/drill-embedded
> ERROR: Drill config file missing: 
> /Users/boazben-zvi/drill/conf/drill-override.conf -- Wrong config dir?
> Similarly on Linux the CLASS_PATH gets set wrong (when running "drillbit.sh 
> start" via a symlink).
> Solution: need to replace all the "pwd" in all the scripts with "pwd -P" 
> which produces the Physical path. (Or replace a preceding "cd" with "cd -P" 
> which does the same).
> Relevant scripts:
> =
> $ cd bin; grep pwd *
> drillbit.sh:bin=`cd "$bin">/dev/null; pwd`
> drillbit.sh:  echo "cwd:" `pwd`
> drill-conf:bin=`cd "$bin">/dev/null; pwd`
> drill-config.sh:home=`cd "$bin/..">/dev/null; pwd`
> drill-config.sh:  DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"
> drill-config.sh:JAVA_HOME="$( cd -P "$( dirname "$SOURCE" )" && cd .. && 
> pwd )"
> drill-embedded:bin=`cd "$bin">/dev/null; pwd`
> drill-localhost:bin=`cd "$bin">/dev/null; pwd`
> submit_plan:bin=`cd "$bin">/dev/null; pwd`
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587060#comment-15587060
 ] 

ASF GitHub Bot commented on DRILL-4203:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/595


> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Assignee: Vitalii Diravka
>Priority: Critical
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> Hello,
> I have some problems when I try to read parquet files produced by Drill with
> Spark: all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet-tools, I found:
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equal to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> 
> name:OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> 
> name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}
> Implementation:
> After the fix, Drill can automatically detect date corruption in parquet files
> and convert the values correctly.
> For cases where the user wants to work with dates beyond the year 5000,
> an option is included to turn off the auto-correction.
> Use of this option is assumed to be extremely unlikely, but it is included for
> completeness.
> To disable "auto correction" you should use the parquet config in the plugin 
> settings. Something like this:
> {code}
>   "formats": {
> "parquet": {
>   "type": "parquet",
>   "autoCorrectCorruptDates": false
> }
> {code}
> Or you can try to use the query like this:
> {code}
> select l_shipdate, l_commitdate from 
> table(dfs.`/drill/testdata/parquet_date/dates_nodrillversion/drillgen2_lineitem`
>  
> (type => 'parquet', autoCorrectCorruptDates => false)) limit 1;
> {code}
> After the fix, new files generated by Drill will have an extra
> "is.date.correct=true" property in the parquet
> metadata, which indicates that the file cannot contain corrupted date values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4653) Malformed JSON should not stop the entire query from progressing

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587057#comment-15587057
 ] 

ASF GitHub Bot commented on DRILL-4653:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/518


> Malformed JSON should not stop the entire query from progressing
> 
>
> Key: DRILL-4653
> URL: https://issues.apache.org/jira/browse/DRILL-4653
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.6.0
>Reporter: subbu srinivasan
> Fix For: Future
>
>
> Currently a Drill query terminates upon the first encounter of an invalid JSON line.
> Drill should continue progressing after ignoring the bad records. Something
> similar to a setting such as (ignore.malformed.json) would help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4870) drill-config.sh sets JAVA_HOME incorrectly for the Mac

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587058#comment-15587058
 ] 

ASF GitHub Bot commented on DRILL-4870:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/605


> drill-config.sh sets JAVA_HOME incorrectly for the Mac
> --
>
> Key: DRILL-4870
> URL: https://issues.apache.org/jira/browse/DRILL-4870
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
> Environment: MacOS with unset JAVA_HOME
>Reporter: Paul Rogers
>Assignee: Chunhui Shi
>Priority: Minor
> Fix For: 1.9.0
>
>
> It turns out that drill-config.sh is both improperly and unnecessarily 
> setting the JAVA_HOME environment variable. That setting should be removed.
> In the Drill 1.7 version, drill-config.sh checks if the JAVA_HOME environment 
> variable is set. If not, it sets JAVA_HOME based on its guess as to the 
> proper value.
> In the 1.7 version, the variable was set, but not exported, so the variable 
> was never actually used.
> The recent script fixes for 1.8 "fixed" the export problem. The fix works 
> fine on Linux. But, the Java install on the Mac has a different structure 
> than that on Linux. The value that drill-config.sh guesses is fine for Linux, 
> wrong for the Mac.
> When we export the (wrong) JAVA_HOME, Mac users who have not set JAVA_HOME 
> will get the following error when using a Drill script:
> ./drill-embedded 
> Unable to locate an executable at 
> "/System/Library/Frameworks/JavaVM.framework/Versions/A/bin/java"
> Mac users who do set JAVA_HOME will not encounter the problem (because 
> drill-config.sh does not change an existing value.)
> It seems likely that someone in the past encountered the same problem and 
> removed the export of JAVA_HOME as an attempt to fix the problem.
> As it turns out, Java does know how to set JAVA_HOME properly if not set. So, 
> setting JAVA_HOME is unnecessary.
> The proper fix is to remove JAVA_HOME setting from drill-config.sh.
> The workaround for any 1.8 user who encounters the problem is to edit their 
> $DRILL_HOME/bin/drill-config.sh file and delete this line near the end of the 
> file:
> export JAVA_HOME



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4726) Dynamic UDFs support

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587059#comment-15587059
 ] 

ASF GitHub Bot commented on DRILL-4726:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/574


> Dynamic UDFs support
> 
>
> Key: DRILL-4726
> URL: https://issues.apache.org/jira/browse/DRILL-4726
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Paul Rogers
> Fix For: Future
>
>
> Allow registering UDFs without a restart of Drillbits.
> Design is described in document below:
> https://docs.google.com/document/d/1FfyJtWae5TLuyheHCfldYUpCdeIezR2RlNsrOTYyAB4/edit?usp=sharing
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3178) csv reader should allow newlines inside quotes

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587056#comment-15587056
 ] 

ASF GitHub Bot commented on DRILL-3178:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/593


> csv reader should allow newlines inside quotes 
> ---
>
> Key: DRILL-3178
> URL: https://issues.apache.org/jira/browse/DRILL-3178
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.0.0
> Environment: Ubuntu Trusty 14.04.2 LTS
>Reporter: Neal McBurnett
>Assignee: F Méthot
> Fix For: Future
>
> Attachments: drill-3178.patch
>
>
> When reading a csv file which contains newlines within quoted strings, e.g. 
> via
> select * from dfs.`/tmp/q.csv`;
> Drill 1.0 says:
> Error: SYSTEM ERROR: com.univocity.parsers.common.TextParsingException:  
> Error processing input: Cannot use newline character within quoted string
> But many tools produce csv files with newlines in quoted strings.  Drill 
> should be able to handle them.
> Workaround: the csvquote program (https://github.com/dbro/csvquote) can 
> encode embedded commas and newlines, and even decode them later if desired.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4791) Provide a light-weight, versioned client API

2016-10-18 Thread Laurent Goujon (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586907#comment-15586907
 ] 

Laurent Goujon commented on DRILL-4791:
---

I wonder how much of the jdbc driver size is caused by embedding classes we 
actually don't need? The RPC classes are all in exec/java-exec along with all 
the parser and execution code, so splitting the RPC part into a submodule might 
achieve the size reduction you are looking for. The only exception to this is 
the embedded mode, but maybe an optional jar could be created for it. Thoughts?

Also, I don't mind having a new API, but transport should be based on something 
standard like gRPC (which is a protobuf-based protocol on top of HTTP/2): lots 
of things come for free, like authentication, QoS, security...

> Provide a light-weight, versioned client API
> 
>
> Key: DRILL-4791
> URL: https://issues.apache.org/jira/browse/DRILL-4791
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>
> Drill's existing client APIs are "industrial strength" - they provide full 
> access to the sophisticated distributed, columnar RPCs which Drill uses 
> internally. However, they are too complex for most client needs. Provide a 
> simpler API optimized for clients: row-based result sets, synchronous, etc.
> At the same time, Drill clients must currently link with the same version of 
> Drill code as is running on the Drill cluster. This forces clients to upgrade 
> in lock-step with the cluster. Allow Drill clients to be upgraded after (or 
> even before) the Drill cluster to simplify management of desktop apps that 
> use Drill.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4905) Push down the LIMIT to the parquet reader scan to limit the numbers of records read

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586890#comment-15586890
 ] 

ASF GitHub Bot commented on DRILL-4905:
---

Github user ppadma commented on the issue:

https://github.com/apache/drill/pull/597
  
updated diffs with review comments taken care of.


> Push down the LIMIT to the parquet reader scan to limit the numbers of 
> records read
> ---
>
> Key: DRILL-4905
> URL: https://issues.apache.org/jira/browse/DRILL-4905
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
> Fix For: 1.9.0
>
>
> Limit the number of records read from disk by pushing down the limit to 
> parquet reader.
> For queries like
> select * from <table> limit N;
> where N < the size of a Parquet row group, we are reading 32K/64K rows or the entire
> row group. This needs to be optimized to read only N rows.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4905) Push down the LIMIT to the parquet reader scan to limit the numbers of records read

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586883#comment-15586883
 ] 

ASF GitHub Bot commented on DRILL-4905:
---

Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/597#discussion_r83965044
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetGroupScan.java
 ---
@@ -117,4 +119,18 @@ public void testSelectEmptyNoCache() throws Exception {
 uex.getMessage()), uex.getMessage().contains(expectedMsg));
 }
   }
+
+  @Test
+  public void testLimit() throws Exception {
+List results = 
testSqlWithResults(String.format("select * from cp.`parquet/limitTest.parquet` 
limit 1"));
--- End diff --

done


> Push down the LIMIT to the parquet reader scan to limit the numbers of 
> records read
> ---
>
> Key: DRILL-4905
> URL: https://issues.apache.org/jira/browse/DRILL-4905
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
> Fix For: 1.9.0
>
>
> Limit the number of records read from disk by pushing down the limit to 
> parquet reader.
> For queries like
> select * from <table> limit N;
> where N < the size of a Parquet row group, we are reading 32K/64K rows or the entire
> row group. This needs to be optimized to read only N rows.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4905) Push down the LIMIT to the parquet reader scan to limit the numbers of records read

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586882#comment-15586882
 ] 

ASF GitHub Bot commented on DRILL-4905:
---

Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/597#discussion_r83965017
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetRecordReader.java
 ---
@@ -139,6 +155,11 @@ public ParquetRecordReader(
 this.batchSize = batchSize;
 this.footer = footer;
 this.fragmentContext = fragmentContext;
+if (numRecordsToRead == DEFAULT_RECORDS_TO_READ_NOT_SPECIFIED) {
+  this.numRecordsToRead =  
footer.getBlocks().get(rowGroupIndex).getRowCount();
+} else {
+  this.numRecordsToRead = numRecordsToRead;
--- End diff --

The current code handles the case where numRecordsToRead is out of range, so I am not 
adding additional checks here.
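
For illustration only (the point above stands that the surrounding code already validates the value), a defensive clamp might look like the following; the names are made up:

```java
class RecordLimitSketch {
  // Illustrative only, not part of the patch: defensively clamp the requested
  // record count to the row group's actual row count.
  static long clampRecordsToRead(long requested, long rowGroupRowCount) {
    if (requested < 0) {                            // e.g. a "not specified" sentinel
      return rowGroupRowCount;                      // read the whole row group
    }
    return Math.min(requested, rowGroupRowCount);   // never ask for more rows than exist
  }
}
```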


> Push down the LIMIT to the parquet reader scan to limit the numbers of 
> records read
> ---
>
> Key: DRILL-4905
> URL: https://issues.apache.org/jira/browse/DRILL-4905
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
> Fix For: 1.9.0
>
>
> Limit the number of records read from disk by pushing down the limit to 
> parquet reader.
> For queries like
> select * from <table> limit N;
> where N < the size of a Parquet row group, we are reading 32K/64K rows or the entire
> row group. This needs to be optimized to read only N rows.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4369) Database driver fails to report any major or minor version information

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586835#comment-15586835
 ] 

ASF GitHub Bot commented on DRILL-4369:
---

GitHub user laurentgo opened a pull request:

https://github.com/apache/drill/pull/622

DRILL-4369: Exchange name and version infos during handshake

Currently, no name or version information is exchanged between client and server
over the User RPC channel.

On the client side, having access to the server name and version is useful to
expose it to the user (through JDBC or ODBC APIs like
DatabaseMetaData#getDatabaseProductVersion()), or to implement a fallback
strategy when some recent APIs are not available (like the metadata API).

On the server side, having access to the client version might be useful for
audit purposes, and eventually to implement a fallback strategy if it doesn't
require an RPC version change.
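
As an illustration of what this enables on the client side, a plain JDBC check of the metadata; the connection URL below is only an example, so adjust the drillbit host/port for your environment:

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;

public class ServerVersionCheck {
  public static void main(String[] args) throws Exception {
    // The URL is only an example; adjust the drillbit host/port for your environment.
    try (Connection conn = DriverManager.getConnection("jdbc:drill:drillbit=localhost:31010")) {
      DatabaseMetaData md = conn.getMetaData();
      // With the handshake exchange in place, the driver can report the server's
      // real product name and version instead of 0 / empty values.
      System.out.println(md.getDatabaseProductName());
      System.out.println(md.getDatabaseProductVersion());
      System.out.println(md.getDatabaseMajorVersion() + "." + md.getDatabaseMinorVersion());
    }
  }
}
```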

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/laurentgo/drill 
laurent/DRILL-4369-rpc-endpoint-infos

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/622.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #622


commit 6767082b64684ce519f5101f386d4758cbd5f03c
Author: Laurent Goujon 
Date:   2016-10-18T22:01:38Z

DRILL-4369: Exchange name and version infos during handshake

Currently, no name or version information is exchanged between client and server
over the User RPC channel.

On the client side, having access to the server name and version is useful to
expose it to the user (through JDBC or ODBC APIs like
DatabaseMetaData#getDatabaseProductVersion()), or to implement a fallback
strategy when some recent APIs are not available (like the metadata API).

On the server side, having access to the client version might be useful for
audit purposes, and eventually to implement a fallback strategy if it doesn't
require an RPC version change.




> Database driver fails to report any major or minor version information
> --
>
> Key: DRILL-4369
> URL: https://issues.apache.org/jira/browse/DRILL-4369
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.4.0
>Reporter: N Campbell
>
> Using Apache Drill 1.4
> The DatabaseMetaData getters that obtain the major and minor versions of the
> server or JDBC driver return 0 instead of 1.4.
> This prevents an application from dynamically adjusting how it interacts 
> based on which version of Drill a connection is accessing.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4369) Database driver fails to report any major or minor version information

2016-10-18 Thread Laurent Goujon (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586832#comment-15586832
 ] 

Laurent Goujon commented on DRILL-4369:
---

I just hit the same situation, so I will come up with a proposal.

> Database driver fails to report any major or minor version information
> --
>
> Key: DRILL-4369
> URL: https://issues.apache.org/jira/browse/DRILL-4369
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.4.0
>Reporter: N Campbell
>
> Using Apache Drill 1.4
> The DatabaseMetaData getters that obtain the major and minor versions of the
> server or JDBC driver return 0 instead of 1.4.
> This prevents an application from dynamically adjusting how it interacts 
> based on which version of Drill a connection is accessing.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4950) Consume Spurious Empty Batches in JDBC

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586795#comment-15586795
 ] 

ASF GitHub Bot commented on DRILL-4950:
---

Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/621
  
+1. Thanks for fixing this!


> Consume Spurious Empty Batches in JDBC
> --
>
> Key: DRILL-4950
> URL: https://issues.apache.org/jira/browse/DRILL-4950
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Sudheesh Katkam
>Assignee: Parth Chandra
>Priority: Blocker
> Fix For: 1.9.0
>
>
> In 
> [DrillCursor|https://github.com/apache/drill/blob/master/exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillCursor.java#L199],
>  consume all empty batches, not just non-continuous empty batches. This 
> results in query cancellation (from sqlline) and incomplete results.
> Introduced (regression?) in DRILL-2548.
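
The intended behavior, illustrated with made-up types rather than DrillCursor's real record batches:

```java
import java.util.Iterator;

// "Batch" and "nextNonEmpty" are made-up stand-ins; DrillCursor's real loop works
// on Drill record batches.
interface Batch { int rowCount(); }

class EmptyBatchSkipper {
  Batch nextNonEmpty(Iterator<Batch> batches) {
    while (batches.hasNext()) {
      Batch b = batches.next();
      if (b.rowCount() > 0) {
        return b;      // first batch that actually carries rows
      }
      // empty batch: skip it and keep reading instead of giving up
    }
    return null;       // stream ended with no data
  }
}
```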



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4950) Consume Spurious Empty Batches in JDBC

2016-10-18 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam updated DRILL-4950:
---
Assignee: Parth Chandra  (was: Sudheesh Katkam)

> Consume Spurious Empty Batches in JDBC
> --
>
> Key: DRILL-4950
> URL: https://issues.apache.org/jira/browse/DRILL-4950
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Sudheesh Katkam
>Assignee: Parth Chandra
>Priority: Blocker
> Fix For: 1.9.0
>
>
> In 
> [DrillCursor|https://github.com/apache/drill/blob/master/exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillCursor.java#L199],
>  consume all empty batches, not just non-continuous empty batches. This 
> results in query cancellation (from sqlline) and incomplete results.
> Introduced (regression?) in DRILL-2548.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4950) Consume Spurious Empty Batches in JDBC

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586708#comment-15586708
 ] 

ASF GitHub Bot commented on DRILL-4950:
---

GitHub user sudheeshkatkam opened a pull request:

https://github.com/apache/drill/pull/621

DRILL-4950: Remove incorrect false condition; consume all empty batches

@parthchandra please review

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sudheeshkatkam/drill DRILL-4950

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/621.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #621


commit f349648b0782aba80def7f549b6b57ad32ad3881
Author: Sudheesh Katkam 
Date:   2016-10-18T21:22:53Z

DRILL-4950: Remove incorrect false condition; consume all empty batches




> Consume Spurious Empty Batches in JDBC
> --
>
> Key: DRILL-4950
> URL: https://issues.apache.org/jira/browse/DRILL-4950
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
>Priority: Blocker
> Fix For: 1.9.0
>
>
> In 
> [DrillCursor|https://github.com/apache/drill/blob/master/exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillCursor.java#L199],
>  consume all empty batches, not just non-continuous empty batches. This 
> results in query cancellation (from sqlline) and incomplete results.
> Introduced (regression?) in DRILL-2548.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4800) Improve parquet reader performance

2016-10-18 Thread Parth Chandra (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Parth Chandra updated DRILL-4800:
-
Labels: doc-impacting  (was: )

> Improve parquet reader performance
> --
>
> Key: DRILL-4800
> URL: https://issues.apache.org/jira/browse/DRILL-4800
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>  Labels: doc-impacting
>
> Reported by a user in the field - 
> We're generally getting read speeds of about 100-150 MB/s/node on the PARQUET 
> scan operator. This seems a little low given the number of drives on the node 
> (24). We're looking for options to improve the performance of this 
> operator, as most of our queries are I/O bound.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4947) Multiple inner join drill query fails to return result

2016-10-18 Thread Tushar Pathare (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586559#comment-15586559
 ] 

Tushar Pathare commented on DRILL-4947:
---

This is related to the following issue:
https://community.mapr.com/thread/10461

> Multiple inner join drill query fails to return result
> --
>
> Key: DRILL-4947
> URL: https://issues.apache.org/jira/browse/DRILL-4947
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC, Functions - Drill, Query Planning & 
> Optimization, Storage - JDBC
>Affects Versions: 1.8.0, 1.9.0
> Environment: RHEL 6.4 
> 32 Cores 256GB RAM
> 2.6.32-358.el6.x86_64
>Reporter: Tushar Pathare
>Priority: Blocker
> Attachments: Screen Shot 2016-10-15 at 4.00.33 PM.png, Screen Shot 
> 2016-10-15 at 4.12.16 PM.png, Screen Shot 2016-10-15 at 4.12.31 PM.png, 
> Screen Shot 2016-10-15 at 4.12.40 PM.png, Screen Shot 2016-10-15 at 4.12.48 
> PM.png, Screen Shot 2016-10-15 at 4.12.55 PM.png, Screen Shot 2016-10-15 at 
> 4.13.03 PM.png, Screen Shot 2016-10-15 at 4.13.11 PM.png, Screen Shot 
> 2016-10-15 at 4.13.21 PM.png, Screen Shot 2016-10-15 at 4.13.29 PM.png, 
> drilllog_jira
>
>
> Huge query fails to return results.
> The query involves a couple of databases in Oracle.
> We are using ojdbc7.jar as the 3rd-party JDBC driver.
> The query has about 11 inner joins or joins.
> The Query and Planning page just shows the 
> major fragment with all the columns as 'SENDING':
> Minor Fragment ID Host Name   Start   End Runtime Max Records 
> Max Batches Last Update Last Progress   Peak Memory State
> 0-0   SENDING SENDING SENDING SENDING SENDING SENDING SENDING SENDING SENDING 
> SENDING
> The last log lines are as below:
> 016-10-15 15:11:24,924 [27fde379-3cf4-c429-8d42-470f55c5fead:foreman] DEBUG 
> o.a.d.exec.rpc.control.WorkEventBus - Adding fragment status listener for 
> queryId 27fde379-3cf4-c429-8d42-470f55c5fead.
> 2016-10-15 15:11:24,924 [27fde379-3cf4-c429-8d42-470f55c5fead:foreman] DEBUG 
> o.a.drill.exec.work.foreman.Foreman - Submitting fragments to run.
> 2016-10-15 15:11:24,927 [27fde379-3cf4-c429-8d42-470f55c5fead:foreman] DEBUG 
> o.a.drill.exec.ops.FragmentContext - Getting initial memory allocation of 
> 11300
> 2016-10-15 15:11:24,928 [27fde379-3cf4-c429-8d42-470f55c5fead:foreman] DEBUG 
> o.a.drill.exec.ops.FragmentContext - Fragment max allocation: 930093368854
> 2016-10-15 15:11:24,932 [27fde379-3cf4-c429-8d42-470f55c5fead:foreman] DEBUG 
> o.a.d.e.work.batch.IncomingBuffers - Came up with a list of 0 required 
> fragments.  Fragments {}
> 2016-10-15 15:11:24,941 [27fde379-3cf4-c429-8d42-470f55c5fead:foreman] DEBUG 
> o.a.drill.exec.work.foreman.Foreman - 27fde379-3cf4-c429-8d42-470f55c5fead: 
> State change requested STARTING --> RUNNING
> 2016-10-15 15:11:24,942 [27fde379-3cf4-c429-8d42-470f55c5fead:foreman] DEBUG 
> o.a.drill.exec.work.foreman.Foreman - Fragments running.
> 2016-10-15 15:11:25,265 [27fde379-3cf4-c429-8d42-470f55c5fead:frag:0:0] DEBUG 
> o.a.d.e.p.i.s.RemovingRecordBatch - Created.
> 2016-10-15 15:11:25,390 [27fde379-3cf4-c429-8d42-470f55c5fead:frag:0:0] DEBUG 
> o.a.d.e.p.i.s.RemovingRecordBatch - Created.
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586088#comment-15586088
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/600#discussion_r83908798
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java
 ---
@@ -739,30 +739,54 @@ public void runTestAndValidate(String selection, 
String validationSelection, Str
   }
 
   /*
-  Test the reading of an int96 field. Impala encodes timestamps as int96 
fields
+Impala encodes timestamp values as int96 fields. Test the reading of 
an int96 field with two converters:
+the first one converts parquet INT96 into drill VARBINARY and the 
second one (works while
+store.parquet.reader.int96_as_timestamp option is enabled) converts 
parquet INT96 into drill TIMESTAMP.
*/
   @Test
   public void testImpalaParquetInt96() throws Exception {
 compareParquetReadersColumnar("field_impala_ts", 
"cp.`parquet/int96_impala_1.parquet`");
+try {
+  test("alter session set %s = true", 
ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP);
+  compareParquetReadersColumnar("field_impala_ts", 
"cp.`parquet/int96_impala_1.parquet`");
--- End diff --

Github seems to have swallowed the previous comments, so I'm including 
@vdiravka's questions here:

>  1) Is it better to compare the result with baseline columns and values from 
the file, or is it OK to compare with sqlBaselineQuery and the new 
PARQUET_READER_INT96_AS_TIMESTAMP option disabled?
> In the process of investigating this test I found that the primitive data 
type of the column in the file int96_dict_change.parquet is BINARY, not INT96.
> 2) I am a little bit confused by this. Do we need to convert this BINARY 
to TIMESTAMP as well? The CONVERT_FROM function with the IMPALA_TIMESTAMP argument 
works properly for this field. I will investigate a little more whether 
Impala and Hive can store timestamps in parquet BINARY.

For 1) I think it is better to compare values from the file as opposed to 
running with the PARQUET_READER_INT96_AS_TIMESTAMP option disabled.
For 2) Can you correct the int96 data in the file? AFAIK, the data should 
be int96 for the test.


> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Rahul Challapalli
>Assignee: Karthikeyan Manivannan
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a 
> hive table on top of the parquet file and use "timestamp" as the column type, 
> Drill fails to read the hive table through the hive storage plugin.
> Implementation: 
> Added an int96-to-timestamp converter for both parquet readers, controlled by the 
> system/session option "store.parquet.int96_as_timestamp".
> The option is false by default so that old 
> query scripts using the "convert_from TIMESTAMP_IMPALA" function keep working.
> When the option is true, using that function is unnecessary and can lead to 
> query failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4951) Running single HBase Unit Test results in error: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V

2016-10-18 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-4951:
--

 Summary: Running single HBase Unit Test results in error: 
java.lang.IllegalAccessError: tried to access method 
com.google.common.base.Stopwatch.<init>()V
 Key: DRILL-4951
 URL: https://issues.apache.org/jira/browse/DRILL-4951
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Chunhui Shi


Under contrib/storage-hbase, running this command:
mvn test -Dtest=org.apache.drill.hbase.TestHBaseQueries#testWithEmptyTable

Got an error complaining that Stopwatch does not have the expected constructor.
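
For context (the log below shows the resulting error): the no-arg Stopwatch constructor that HBase 1.1.3 calls is package-private in the newer Guava versions typically present on the test classpath, which is what produces this kind of IllegalAccessError; newer Guava expects the factory methods instead. A standalone illustration of the newer API, not a fix for the test:

```java
import com.google.common.base.Stopwatch;
import java.util.concurrent.TimeUnit;

public class StopwatchApiCheck {
  public static void main(String[] args) throws InterruptedException {
    // Older Guava allowed `new Stopwatch()`; in newer Guava the no-arg constructor
    // is package-private, so code compiled against the old API fails at runtime
    // with an IllegalAccessError like the one in the log below.
    Stopwatch watch = Stopwatch.createStarted();   // factory method in newer Guava
    Thread.sleep(10);
    System.out.println(watch.elapsed(TimeUnit.MILLISECONDS));
  }
}
```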

Running org.apache.drill.hbase.TestHBaseQueries
10:13:58.402 [main] WARN  o.a.hadoop.util.NativeCodeLoader - Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
10:14:01.458 [main] WARN  o.a.h.metrics2.impl.MetricsConfig - Cannot locate 
configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
10:14:02.020 [main] WARN  o.a.hadoop.hbase.http.HttpRequestLog - Jetty request 
log can only be enabled using Log4j
10:14:02.584 [localhost:37323.activeMasterManager] WARN  
org.apache.hadoop.hbase.ZNodeClearer - Environment variable HBASE_ZNODE_FILE 
not set; znodes will not be cleared on crash by start scripts (Longer MTTR!)
10:14:03.130 [JvmPauseMonitor] ERROR o.a.z.server.NIOServerCnxnFactory - Thread 
Thread[JvmPauseMonitor,5,main] died
java.lang.IllegalAccessError: tried to access method 
com.google.common.base.Stopwatch.<init>()V from class 
org.apache.hadoop.hbase.util.JvmPauseMonitor$Monitor
at 
org.apache.hadoop.hbase.util.JvmPauseMonitor$Monitor.run(JvmPauseMonitor.java:154)
 ~[hbase-server-1.1.3.jar:1.1.3]
at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_101]
10:14:03.157 [JvmPauseMonitor] ERROR o.a.z.server.NIOServerCnxnFactory - Thread 
Thread[JvmPauseMonitor,5,main] died
java.lang.IllegalAccessError: tried to access method 
com.google.common.base.Stopwatch.<init>()V from class 
org.apache.hadoop.hbase.util.JvmPauseMonitor$Monitor
at 
org.apache.hadoop.hbase.util.JvmPauseMonitor$Monitor.run(JvmPauseMonitor.java:154)
 ~[hbase-server-1.1.3.jar:1.1.3]
at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_101]
10:14:03.670 [localhost:37323.activeMasterManager] WARN  
o.a.h.h.p.s.wal.WALProcedureStore - Log directory not found: File 
file:/home/shi/dev/chunhui-shi/drill/contrib/storage-hbase/target/test-data/cea28708-595f-4585-ba37-9ba2a85ff0b1/MasterProcWALs
 does not exist
10:14:03.907 [RS:0;localhost:43220] WARN  o.a.h.h.regionserver.HRegionServer - 
reportForDuty failed; sleeping and then retrying.
10:14:04.931 [RS:0;localhost:43220] WARN  org.apache.hadoop.hbase.ZNodeClearer 
- Environment variable HBASE_ZNODE_FILE not set; znodes will not be cleared on 
crash by start scripts (Longer MTTR!)
10:14:04.981 [localhost:37323.activeMasterManager] ERROR 
o.apache.hadoop.hbase.master.HMaster - Failed to become active master
java.lang.IllegalAccessError: tried to access method 
com.google.common.base.Stopwatch.<init>()V from class 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator
at 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:596)
 ~[hbase-client-1.1.3.jar:1.1.3]
at 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator.waitMetaRegionLocation(MetaTableLocator.java:217)
 ~[hbase-client-1.1.3.jar:1.1.3]
at 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaServerConnection(MetaTableLocator.java:363)
 ~[hbase-client-1.1.3.jar:1.1.3]
at 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator.verifyMetaRegionLocation(MetaTableLocator.java:283)
 ~[hbase-client-1.1.3.jar:1.1.3]
at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:907) 
~[hbase-server-1.1.3.jar:1.1.3]
at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:743)
 ~[hbase-server-1.1.3.jar:1.1.3]
at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:183) 
~[hbase-server-1.1.3.jar:1.1.3]
at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1652) 
~[hbase-server-1.1.3.jar:1.1.3]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101]
10:14:04.982 [localhost:37323.activeMasterManager] ERROR 
o.apache.hadoop.hbase.master.HMaster - Master server abort: loaded coprocessors 
are: []
10:14:04.985 [localhost:37323.activeMasterManager] ERROR 
o.apache.hadoop.hbase.master.HMaster - Unhandled exception. Starting shutdown.
java.lang.IllegalAccessError: tried to access method 
com.google.common.base.Stopwatch.<init>()V from class 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator
at 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:596)
 ~[hbase-client-1.1.3.jar:1.1.3]
at 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator.waitMetaRegionLocation(MetaTableLocator.java:217)
 ~[hbase-cl

[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585690#comment-15585690
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/600#discussion_r83853146
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
@@ -132,6 +132,8 @@
   OptionValidator PARQUET_VECTOR_FILL_CHECK_THRESHOLD_VALIDATOR = new 
PositiveLongValidator(PARQUET_VECTOR_FILL_CHECK_THRESHOLD, 100l, 10l);
   String PARQUET_NEW_RECORD_READER = "store.parquet.use_new_reader";
   OptionValidator PARQUET_RECORD_READER_IMPLEMENTATION_VALIDATOR = new 
BooleanValidator(PARQUET_NEW_RECORD_READER, false);
+  String PARQUET_READER_INT96_AS_TIMESTAMP = 
"store.parquet.int96_as_timestamp";
--- End diff --

Agree. Done.


> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Rahul Challapalli
>Assignee: Karthikeyan Manivannan
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a 
> hive table on top of the parquet file and use "timestamp" as the column type, 
> Drill fails to read the hive table through the hive storage plugin.
> Implementation: 
> Added an int96-to-timestamp converter for both parquet readers, controlled by the 
> system/session option "store.parquet.int96_as_timestamp".
> The option is false by default so that old 
> query scripts using the "convert_from TIMESTAMP_IMPALA" function keep working.
> When the option is true, using that function is unnecessary and can lead to 
> query failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585692#comment-15585692
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/600#discussion_r83853471
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 ---
@@ -45,4 +53,34 @@ public static int getIntFromLEBytes(byte[] input, int 
start) {
 }
 return out;
   }
+
+  /**
+   * Utilities for converting from parquet INT96 binary (impala, hive 
timestamp)
+   * to date time value. This utilizes the Joda library.
+   */
+  public static class NanoTimeUtils {
+
+public static final long NANOS_PER_DAY = TimeUnit.DAYS.toNanos(1);
+public static final long NANOS_PER_HOUR = TimeUnit.HOURS.toNanos(1);
+public static final long NANOS_PER_MINUTE = 
TimeUnit.MINUTES.toNanos(1);
+public static final long NANOS_PER_SECOND = 
TimeUnit.SECONDS.toNanos(1);
+public static final long NANOS_PER_MILLISECOND =  
TimeUnit.MILLISECONDS.toNanos(1);
+
+  /**
+   * @param binaryTimeStampValue
+   *  hive, impala timestamp values with nanoseconds precision
+   *  are stored in parquet Binary as INT96
+   *
+   * @return  the number of milliseconds since January 1, 1970, 00:00:00 
GMT
+   *  represented by @param binaryTimeStampValue .
+   */
+public static long getDateTimeValueFromBinary(Binary 
binaryTimeStampValue) {
+  NanoTime nt = NanoTime.fromBinary(binaryTimeStampValue);
+  int julianDay = nt.getJulianDay();
+  long nanosOfDay = nt.getTimeOfDayNanos();
+  return DateTimeUtils.fromJulianDay(julianDay-0.5d) + 
nanosOfDay/NANOS_PER_MILLISECOND;
--- End diff --

The comment is removed, and the numbers are replaced with constants from 
ParquetReaderUtility and DateTimeConstants.
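
As an illustration of the conversion discussed above, here is a minimal standalone sketch of the Julian-day arithmetic, assuming only Joda-Time on the classpath; the class name and sample values are hypothetical and not part of the pull request.

{code}
// Hypothetical standalone sketch; only the arithmetic mirrors the diff above.
import org.joda.time.DateTimeUtils;

import java.util.concurrent.TimeUnit;

public class Int96TimestampSketch {
  private static final long NANOS_PER_MILLISECOND = TimeUnit.MILLISECONDS.toNanos(1);

  /**
   * Converts the two fields packed into a parquet INT96 timestamp (Julian day
   * plus nanos of day) into milliseconds since the Unix epoch.
   */
  static long toEpochMillis(int julianDay, long nanosOfDay) {
    // fromJulianDay expects the astronomical Julian day, which starts at noon;
    // subtracting 0.5 shifts it to midnight of the civil date.
    return DateTimeUtils.fromJulianDay(julianDay - 0.5d) + nanosOfDay / NANOS_PER_MILLISECOND;
  }

  public static void main(String[] args) {
    // Julian day 2440588 is 1970-01-01, so midnight plus one hour prints 3600000.
    System.out.println(toEpochMillis(2440588, TimeUnit.HOURS.toNanos(1)));
  }
}
{code}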


> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Rahul Challapalli
>Assignee: Karthikeyan Manivannan
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a 
> Hive table on top of the parquet file and use "timestamp" as the column type, 
> Drill fails to read the Hive table through the Hive storage plugin.
> Implementation: 
> Added an INT96-to-timestamp converter for both parquet readers, controlled by 
> the system/session option "store.parquet.int96_as_timestamp".
> The option is false by default so that existing query scripts that use the 
> "convert_from TIMESTAMP_IMPALA" function keep working.
> When the option is true, using that function is unnecessary and can cause the 
> query to fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585691#comment-15585691
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/600#discussion_r83852721
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java
 ---
@@ -899,18 +883,21 @@ public void testLastPageOneNull() throws Exception {
 "cp.`parquet/last_page_one_null.parquet`");
   }
 
-  private void compareParquetInt96Converters(String newInt96ConverterQuery,
-  String oldInt96ConverterAndConvertFromFunctionQuery) throws 
Exception {
-testBuilder()
-.ordered()
-.sqlQuery(newInt96ConverterQuery)
-.optionSettingQueriesForTestQuery(
-"alter session set `%s` = true", 
ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP)
-.sqlBaselineQuery(oldInt96ConverterAndConvertFromFunctionQuery)
-.optionSettingQueriesForBaseline(
-"alter session set `%s` = false", 
ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP)
-.build()
-.run();
+  private void compareParquetInt96Converters(String selection, String 
table) throws Exception {
+try {
--- End diff --

I refactored my helper method to make the code clearer.


> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Rahul Challapalli
>Assignee: Karthikeyan Manivannan
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a 
> Hive table on top of the parquet file and use "timestamp" as the column type, 
> Drill fails to read the Hive table through the Hive storage plugin.
> Implementation: 
> Added an INT96-to-timestamp converter for both parquet readers, controlled by 
> the system/session option "store.parquet.int96_as_timestamp".
> The option is false by default so that existing query scripts that use the 
> "convert_from TIMESTAMP_IMPALA" function keep working.
> When the option is true, using that function is unnecessary and can cause the 
> query to fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4864) Add ANSI format for date/time functions

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585655#comment-15585655
 ] 

ASF GitHub Bot commented on DRILL-4864:
---

Github user Serhii-Harnyk commented on a diff in the pull request:

https://github.com/apache/drill/pull/581#discussion_r83871818
  
--- Diff: 
logical/src/main/java/org/apache/drill/common/expression/fn/JodaDateValidator.java
 ---
@@ -0,0 +1,213 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to you under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+package org.apache.drill.common.expression.fn;
+
+import com.google.common.collect.Sets;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.drill.common.map.CaseInsensitiveMap;
+
+import java.util.Comparator;
+import java.util.Set;
+
+public class JodaDateValidator {
+
+  private static final Set<String> ansiValuesForDeleting = 
Sets.newTreeSet(new LengthDescComparator());
+  private static final CaseInsensitiveMap<String> ansiToJodaMap = 
CaseInsensitiveMap.newTreeMap(new LengthDescComparator());
+
+  //tokens for deleting
+  public static final String SUFFIX_SP = "sp";
+  public static final String PREFIX_FM = "fm";
+  public static final String PREFIX_FX = "fx";
+  public static final String PREFIX_TM = "tm";
+
+  //ansi patterns
+  public static final String ANSI_FULL_NAME_OF_DAY = "day";
+  public static final String ANSI_DAY_OF_YEAR = "ddd";
+  public static final String ANSI_DAY_OF_MONTH = "dd";
+  public static final String ANSI_DAY_OF_WEEK = "d";
+  public static final String ANSI_NAME_OF_MONTH = "month";
+  public static final String ANSI_ABR_NAME_OF_MONTH = "mon";
+  public static final String ANSI_FULL_ERA_NAME = "ee";
+  public static final String ANSI_NAME_OF_DAY = "dy";
+  public static final String ANSI_TIME_ZONE_NAME = "tz";
+  public static final String ANSI_HOUR_12_NAME = "hh";
+  public static final String ANSI_HOUR_12_OTHER_NAME = "hh12";
+  public static final String ANSI_HOUR_24_NAME = "hh24";
+  public static final String ANSI_MINUTE_OF_HOUR_NAME = "mi";
+  public static final String ANSI_SECOND_OF_MINUTE_NAME = "ss";
+  public static final String ANSI_MILLISECOND_OF_MINUTE_NAME = "ms";
+  public static final String ANSI_WEEK_OF_YEAR = "ww";
+  public static final String ANSI_MONTH = "mm";
+  public static final String ANSI_HALFDAY_AM = "am";
+  public static final String ANSI_HALFDAY_PM = "pm";
+
+  //jodaTime patterns
+  public static final String JODA_FULL_NAME_OF_DAY = "";
+  public static final String JODA_DAY_OF_YEAR = "D";
+  public static final String JODA_DAY_OF_MONTH = "d";
+  public static final String JODA_DAY_OF_WEEK = "e";
+  public static final String JODA_NAME_OF_MONTH = "";
+  public static final String JODA_ABR_NAME_OF_MONTH = "MMM";
+  public static final String JODA_FULL_ERA_NAME = "G";
+  public static final String JODA_NAME_OF_DAY = "E";
+  public static final String JODA_TIME_ZONE_NAME = "TZ";
+  public static final String JODA_HOUR_12_NAME = "h";
+  public static final String JODA_HOUR_12_OTHER_NAME = "h";
+  public static final String JODA_HOUR_24_NAME = "H";
+  public static final String JODA_MINUTE_OF_HOUR_NAME = "m";
+  public static final String JODA_SECOND_OF_MINUTE_NAME = "s";
+  public static final String JODA_MILLISECOND_OF_MINUTE_NAME = "S";
+  public static final String JODA_WEEK_OF_YEAR = "w";
+  public static final String JODA_MONTH = "MM";
+  public static final String JODA_HALFDAY = "aa";
+
+  static {
+ansiToJodaMap.put(ANSI_FULL_NAME_OF_DAY, JODA_FULL_NAME_OF_DAY);
+ansiToJodaMap.put(ANSI_DAY_OF_YEAR, JODA_DAY_OF_YEAR);
+ansiToJodaMap.put(ANSI_DAY_OF_MONTH, JODA_DAY_OF_MONTH);
+ansiToJodaMap.put(ANSI_DAY_OF_WEEK, JODA_DAY_OF_WEEK);
+ansiToJodaMap.put(ANSI_NAME_OF_MONTH, JODA_NAME_OF_MONTH);
+ansiToJodaMap.put(ANSI_ABR_NAME_OF_MONTH, JODA_ABR_NAME_OF_MONTH);
+ansiToJodaMap.put(ANSI_FU

[jira] [Commented] (DRILL-4864) Add ANSI format for date/time functions

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585656#comment-15585656
 ] 

ASF GitHub Bot commented on DRILL-4864:
---

Github user Serhii-Harnyk commented on a diff in the pull request:

https://github.com/apache/drill/pull/581#discussion_r83871870
  
--- Diff: 
logical/src/main/java/org/apache/drill/common/expression/fn/JodaDateValidator.java
 ---
@@ -0,0 +1,213 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to you under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+package org.apache.drill.common.expression.fn;
+
+import com.google.common.collect.Sets;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.drill.common.map.CaseInsensitiveMap;
+
+import java.util.Comparator;
+import java.util.Set;
+
+public class JodaDateValidator {
+
+  private static final Set<String> ansiValuesForDeleting = 
Sets.newTreeSet(new LengthDescComparator());
+  private static final CaseInsensitiveMap<String> ansiToJodaMap = 
CaseInsensitiveMap.newTreeMap(new LengthDescComparator());
+
+  //tokens for deleting
+  public static final String SUFFIX_SP = "sp";
+  public static final String PREFIX_FM = "fm";
+  public static final String PREFIX_FX = "fx";
+  public static final String PREFIX_TM = "tm";
+
+  //ansi patterns
+  public static final String ANSI_FULL_NAME_OF_DAY = "day";
+  public static final String ANSI_DAY_OF_YEAR = "ddd";
+  public static final String ANSI_DAY_OF_MONTH = "dd";
+  public static final String ANSI_DAY_OF_WEEK = "d";
+  public static final String ANSI_NAME_OF_MONTH = "month";
+  public static final String ANSI_ABR_NAME_OF_MONTH = "mon";
+  public static final String ANSI_FULL_ERA_NAME = "ee";
+  public static final String ANSI_NAME_OF_DAY = "dy";
+  public static final String ANSI_TIME_ZONE_NAME = "tz";
+  public static final String ANSI_HOUR_12_NAME = "hh";
+  public static final String ANSI_HOUR_12_OTHER_NAME = "hh12";
+  public static final String ANSI_HOUR_24_NAME = "hh24";
+  public static final String ANSI_MINUTE_OF_HOUR_NAME = "mi";
+  public static final String ANSI_SECOND_OF_MINUTE_NAME = "ss";
+  public static final String ANSI_MILLISECOND_OF_MINUTE_NAME = "ms";
+  public static final String ANSI_WEEK_OF_YEAR = "ww";
+  public static final String ANSI_MONTH = "mm";
+  public static final String ANSI_HALFDAY_AM = "am";
+  public static final String ANSI_HALFDAY_PM = "pm";
+
+  //jodaTime patterns
+  public static final String JODA_FULL_NAME_OF_DAY = "";
+  public static final String JODA_DAY_OF_YEAR = "D";
+  public static final String JODA_DAY_OF_MONTH = "d";
+  public static final String JODA_DAY_OF_WEEK = "e";
+  public static final String JODA_NAME_OF_MONTH = "";
+  public static final String JODA_ABR_NAME_OF_MONTH = "MMM";
+  public static final String JODA_FULL_ERA_NAME = "G";
+  public static final String JODA_NAME_OF_DAY = "E";
+  public static final String JODA_TIME_ZONE_NAME = "TZ";
+  public static final String JODA_HOUR_12_NAME = "h";
+  public static final String JODA_HOUR_12_OTHER_NAME = "h";
+  public static final String JODA_HOUR_24_NAME = "H";
+  public static final String JODA_MINUTE_OF_HOUR_NAME = "m";
+  public static final String JODA_SECOND_OF_MINUTE_NAME = "s";
+  public static final String JODA_MILLISECOND_OF_MINUTE_NAME = "S";
+  public static final String JODA_WEEK_OF_YEAR = "w";
+  public static final String JODA_MONTH = "MM";
+  public static final String JODA_HALFDAY = "aa";
+
+  static {
+ansiToJodaMap.put(ANSI_FULL_NAME_OF_DAY, JODA_FULL_NAME_OF_DAY);
+ansiToJodaMap.put(ANSI_DAY_OF_YEAR, JODA_DAY_OF_YEAR);
+ansiToJodaMap.put(ANSI_DAY_OF_MONTH, JODA_DAY_OF_MONTH);
+ansiToJodaMap.put(ANSI_DAY_OF_WEEK, JODA_DAY_OF_WEEK);
+ansiToJodaMap.put(ANSI_NAME_OF_MONTH, JODA_NAME_OF_MONTH);
+ansiToJodaMap.put(ANSI_ABR_NAME_OF_MONTH, JODA_ABR_NAME_OF_MONTH);
+ansiToJodaMap.put(ANSI_FU

[jira] [Commented] (DRILL-4864) Add ANSI format for date/time functions

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585652#comment-15585652
 ] 

ASF GitHub Bot commented on DRILL-4864:
---

Github user Serhii-Harnyk commented on a diff in the pull request:

https://github.com/apache/drill/pull/581#discussion_r83871296
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/AnsiToJoda.java 
---
@@ -0,0 +1,58 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to you under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+package org.apache.drill.exec.expr.fn.impl;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+
+import javax.inject.Inject;
+
+/**
+ * Replaces all ansi patterns to joda equivalents.
+ */
+@FunctionTemplate(name = "ansi_to_joda",
+  scope = FunctionTemplate.FunctionScope.SIMPLE,
+  nulls= FunctionTemplate.NullHandling.NULL_IF_NULL)
+public class AnsiToJoda implements DrillSimpleFunc {
+
+  @Param
+  VarCharHolder in;
+
+  @Output
+  VarCharHolder out;
+
+  @Inject
+  DrillBuf buffer;
+
+  @Override
+  public void setup() {
+  }
+
+  @Override
+  public void eval() {
+String pattern = 
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(in.start,
 in.end, in.buffer);
--- End diff --

Validating datetime patterns isn't a simple task. Joda-Time, for example, does 
no such validation.


> Add ANSI format for date/time functions
> ---
>
> Key: DRILL-4864
> URL: https://issues.apache.org/jira/browse/DRILL-4864
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Serhii Harnyk
>Assignee: Gautam Kumar Parai
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> TO_DATE() exposes the Joda string formatting conventions in the SQL layer. 
> This does not follow the SQL conventions used by ANSI and many other 
> database engines on the market.
> Add a new UDF "ansi_to_joda(string)" that takes a string representing an ANSI 
> datetime format and returns a string representing the equivalent Joda format.
> Add a new session option "drill.exec.fn.to_date_format" that can take one of two 
> values - "JODA" (default) and "ANSI".
> If the option is set to "JODA", queries with the to_date() function work in the 
> usual way.
> If the option is set to "ANSI", the second argument is wrapped with the 
> ansi_to_joda() function, which allows the user to use an ANSI datetime format.
> Wrapping is used in the to_date(), to_time() and to_timestamp() functions.
> Table of joda and ansi patterns which may be replaced
> ||Pattern name||  Ansi format ||  JodaTime format
> | Full name of day|   day |   
> | Day of year |   ddd |   D
> | Day of month|   dd  |   d
> | Day of week |   d   |   e
> | Name of month   |   month   |   
> | Abr name of month   |   mon |   MMM
> | Full era name   |   ee  |   G
> | Name of day |   dy  |   E
> | Time zone   |   tz  |   TZ
> | Hour 12 |   hh  |   h
> | Hour 12 |   hh12|   h
> | Hour 24 |   hh24|   H
> | Minute of hour  |   mi  |   m
> | Second of minute|   ss  |   s
> | Millisecond of minute   |   ms  |   S
> | Week of year|   ww  |   w
> | Month   |   mm  |   MM
> | Halfday am  |   am  |   aa
> | Halfday pm  |   pm  |   aa
> | ref.|   
> https://www.postgr

[jira] [Commented] (DRILL-4864) Add ANSI format for date/time functions

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585650#comment-15585650
 ] 

ASF GitHub Bot commented on DRILL-4864:
---

Github user Serhii-Harnyk commented on a diff in the pull request:

https://github.com/apache/drill/pull/581#discussion_r83871606
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java
 ---
@@ -408,6 +411,12 @@ private LogicalExpression 
getDrillFunctionFromOptiqCall(RexCall call) {
 
   return first;
 }
+  } else if (functionName.equals("to_date") || 
functionName.equals("to_time") || functionName.equals("to_timestamp")) {
+// convert ansi date format string to joda according to session 
option
+OptionManager om = this.context.getPlannerSettings().getOptions();
+
if(ToDateFormats.valueOf(om.getOption(ExecConstants.TO_DATE_FORMAT).string_val.toUpperCase()).equals(ToDateFormats.ANSI))
 {
--- End diff --

Fixed


> Add ANSI format for date/time functions
> ---
>
> Key: DRILL-4864
> URL: https://issues.apache.org/jira/browse/DRILL-4864
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Serhii Harnyk
>Assignee: Gautam Kumar Parai
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> TO_DATE() exposes the Joda string formatting conventions in the SQL layer. 
> This does not follow the SQL conventions used by ANSI and many other 
> database engines on the market.
> Add a new UDF "ansi_to_joda(string)" that takes a string representing an ANSI 
> datetime format and returns a string representing the equivalent Joda format.
> Add a new session option "drill.exec.fn.to_date_format" that can take one of two 
> values - "JODA" (default) and "ANSI".
> If the option is set to "JODA", queries with the to_date() function work in the 
> usual way.
> If the option is set to "ANSI", the second argument is wrapped with the 
> ansi_to_joda() function, which allows the user to use an ANSI datetime format.
> Wrapping is used in the to_date(), to_time() and to_timestamp() functions.
> Table of joda and ansi patterns which may be replaced
> ||Pattern name||  Ansi format ||  JodaTime format
> | Full name of day|   day |   
> | Day of year |   ddd |   D
> | Day of month|   dd  |   d
> | Day of week |   d   |   e
> | Name of month   |   month   |   
> | Abr name of month   |   mon |   MMM
> | Full era name   |   ee  |   G
> | Name of day |   dy  |   E
> | Time zone   |   tz  |   TZ
> | Hour 12 |   hh  |   h
> | Hour 12 |   hh12|   h
> | Hour 24 |   hh24|   H
> | Minute of hour  |   mi  |   m
> | Second of minute|   ss  |   s
> | Millisecond of minute   |   ms  |   S
> | Week of year|   ww  |   w
> | Month   |   mm  |   MM
> | Halfday am  |   am  |   aa
> | Halfday pm  |   pm  |   aa
> | ref.|   
> https://www.postgresql.org/docs/8.2/static/functions-formatting.html| 
>   
> http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html
>  |
> Table of ansi pattern modifiers, which may be deleted from string
> ||Description ||  Pattern ||
> | fill mode (suppress padding blanks and zeroes)  |   fm  |
> | fixed format global option (see usage notes)|   fx  |
> | translation mode (print localized day and month names based on 
> lc_messages) |   tm  |
> | spell mode (not yet implemented)|   sp  |
> | ref.|   
> https://www.postgresql.org/docs/8.2/static/functions-formatting.html|
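
As an illustration of the pattern table above, here is a minimal standalone sketch of a longest-token-first substitution over a few of the listed tokens; the class, the map ordering and the sample pattern are hypothetical, and the real UDF orders tokens with a length-descending comparator and covers the full table.

{code}
// Hypothetical sketch: longest-token-first substitution over part of the table.
import java.util.LinkedHashMap;
import java.util.Map;

public class AnsiToJodaSketch {
  // Longer ANSI tokens first, so "hh24" is matched before "hh".
  private static final Map<String, String> ANSI_TO_JODA = new LinkedHashMap<>();
  static {
    ANSI_TO_JODA.put("hh24", "H");
    ANSI_TO_JODA.put("hh12", "h");
    ANSI_TO_JODA.put("mon", "MMM");
    ANSI_TO_JODA.put("hh", "h");
    ANSI_TO_JODA.put("mm", "MM");
    ANSI_TO_JODA.put("dd", "d");
    ANSI_TO_JODA.put("mi", "m");
    ANSI_TO_JODA.put("ss", "s");
  }

  static String ansiToJoda(String ansiPattern) {
    String result = ansiPattern.toLowerCase();
    for (Map.Entry<String, String> e : ANSI_TO_JODA.entrySet()) {
      result = result.replace(e.getKey(), e.getValue());
    }
    return result;
  }

  public static void main(String[] args) {
    // Prints "MMM d, H:m:s" under this partial mapping.
    System.out.println(ansiToJoda("mon dd, hh24:mi:ss"));
  }
}
{code}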



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4864) Add ANSI format for date/time functions

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585651#comment-15585651
 ] 

ASF GitHub Bot commented on DRILL-4864:
---

Github user Serhii-Harnyk commented on a diff in the pull request:

https://github.com/apache/drill/pull/581#discussion_r83870798
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/AnsiToJoda.java 
---
@@ -0,0 +1,58 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to you under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+package org.apache.drill.exec.expr.fn.impl;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+
+import javax.inject.Inject;
+
+/**
+ * Replaces all ansi patterns to joda equivalents.
+ */
+@FunctionTemplate(name = "ansi_to_joda",
+  scope = FunctionTemplate.FunctionScope.SIMPLE,
+  nulls= FunctionTemplate.NullHandling.NULL_IF_NULL)
--- End diff --

Fixed


> Add ANSI format for date/time functions
> ---
>
> Key: DRILL-4864
> URL: https://issues.apache.org/jira/browse/DRILL-4864
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Serhii Harnyk
>Assignee: Gautam Kumar Parai
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> TO_DATE() exposes the Joda string formatting conventions in the SQL layer. 
> This does not follow the SQL conventions used by ANSI and many other 
> database engines on the market.
> Add a new UDF "ansi_to_joda(string)" that takes a string representing an ANSI 
> datetime format and returns a string representing the equivalent Joda format.
> Add a new session option "drill.exec.fn.to_date_format" that can take one of two 
> values - "JODA" (default) and "ANSI".
> If the option is set to "JODA", queries with the to_date() function work in the 
> usual way.
> If the option is set to "ANSI", the second argument is wrapped with the 
> ansi_to_joda() function, which allows the user to use an ANSI datetime format.
> Wrapping is used in the to_date(), to_time() and to_timestamp() functions.
> Table of joda and ansi patterns which may be replaced
> ||Pattern name||  Ansi format ||  JodaTime format
> | Full name of day|   day |   
> | Day of year |   ddd |   D
> | Day of month|   dd  |   d
> | Day of week |   d   |   e
> | Name of month   |   month   |   
> | Abr name of month   |   mon |   MMM
> | Full era name   |   ee  |   G
> | Name of day |   dy  |   E
> | Time zone   |   tz  |   TZ
> | Hour 12 |   hh  |   h
> | Hour 12 |   hh12|   h
> | Hour 24 |   hh24|   H
> | Minute of hour  |   mi  |   m
> | Second of minute|   ss  |   s
> | Millisecond of minute   |   ms  |   S
> | Week of year|   ww  |   w
> | Month   |   mm  |   MM
> | Halfday am  |   am  |   aa
> | Halfday pm  |   pm  |   aa
> | ref.|   
> https://www.postgresql.org/docs/8.2/static/functions-formatting.html| 
>   
> http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html
>  |
> Table of ansi pattern modifiers, which may be deleted from string
> ||Description ||  Pattern ||
> | fill mode (suppress padding blanks and zeroes)  |   fm  |
> | fixed format global option (see usage notes)|   fx  |
> | translation mode (print localized day and month names based on 
> lc_messages) |   tm  |
> |  

[jira] [Commented] (DRILL-4864) Add ANSI format for date/time functions

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585654#comment-15585654
 ] 

ASF GitHub Bot commented on DRILL-4864:
---

Github user Serhii-Harnyk commented on a diff in the pull request:

https://github.com/apache/drill/pull/581#discussion_r83871748
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java
 ---
@@ -408,6 +411,12 @@ private LogicalExpression 
getDrillFunctionFromOptiqCall(RexCall call) {
 
   return first;
 }
+  } else if (functionName.equals("to_date") || 
functionName.equals("to_time") || functionName.equals("to_timestamp")) {
+// convert ansi date format string to joda according to session 
option
+OptionManager om = this.context.getPlannerSettings().getOptions();
+
if(ToDateFormats.valueOf(om.getOption(ExecConstants.TO_DATE_FORMAT).string_val.toUpperCase()).equals(ToDateFormats.ANSI))
 {
+  args.set(1, FunctionCallFactory.createExpression("ansi_to_joda", 
Arrays.asList(args.get(1))));
--- End diff --

Yes, it would be two nested ansi_to_joda conversions.


> Add ANSI format for date/time functions
> ---
>
> Key: DRILL-4864
> URL: https://issues.apache.org/jira/browse/DRILL-4864
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Serhii Harnyk
>Assignee: Gautam Kumar Parai
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> TO_DATE() exposes the Joda string formatting conventions in the SQL layer. 
> This does not follow the SQL conventions used by ANSI and many other 
> database engines on the market.
> Add a new UDF "ansi_to_joda(string)" that takes a string representing an ANSI 
> datetime format and returns a string representing the equivalent Joda format.
> Add a new session option "drill.exec.fn.to_date_format" that can take one of two 
> values - "JODA" (default) and "ANSI".
> If the option is set to "JODA", queries with the to_date() function work in the 
> usual way.
> If the option is set to "ANSI", the second argument is wrapped with the 
> ansi_to_joda() function, which allows the user to use an ANSI datetime format.
> Wrapping is used in the to_date(), to_time() and to_timestamp() functions.
> Table of joda and ansi patterns which may be replaced
> ||Pattern name||  Ansi format ||  JodaTime format
> | Full name of day|   day |   
> | Day of year |   ddd |   D
> | Day of month|   dd  |   d
> | Day of week |   d   |   e
> | Name of month   |   month   |   
> | Abr name of month   |   mon |   MMM
> | Full era name   |   ee  |   G
> | Name of day |   dy  |   E
> | Time zone   |   tz  |   TZ
> | Hour 12 |   hh  |   h
> | Hour 12 |   hh12|   h
> | Hour 24 |   hh24|   H
> | Minute of hour  |   mi  |   m
> | Second of minute|   ss  |   s
> | Millisecond of minute   |   ms  |   S
> | Week of year|   ww  |   w
> | Month   |   mm  |   MM
> | Halfday am  |   am  |   aa
> | Halfday pm  |   pm  |   aa
> | ref.|   
> https://www.postgresql.org/docs/8.2/static/functions-formatting.html| 
>   
> http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html
>  |
> Table of ansi pattern modifiers, which may be deleted from string
> ||Description ||  Pattern ||
> | fill mode (suppress padding blanks and zeroes)  |   fm  |
> | fixed format global option (see usage notes)|   fx  |
> | translation mode (print localized day and month names based on 
> lc_messages) |   tm  |
> | spell mode (not yet implemented)|   sp  |
> | ref.|   
> https://www.postgresql.org/docs/8.2/static/functions-formatting.html|
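
For illustration only, here is a sketch of exercising the new session option from plain JDBC; the driver class, connection URL and sample ANSI pattern are assumptions based on the description above, not taken from the pull request (the 'yyyy' year token is not in the table and is assumed to pass through unchanged).

{code}
// Hypothetical usage sketch; requires the Drill JDBC driver on the classpath.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class AnsiToDateExample {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.drill.jdbc.Driver");
    try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=local");
         Statement stmt = conn.createStatement()) {
      // Switch TO_DATE() to ANSI patterns for this session.
      stmt.execute("ALTER SESSION SET `drill.exec.fn.to_date_format` = 'ANSI'");
      // With the option set, the ANSI pattern is rewritten through ansi_to_joda()
      // before the value is parsed.
      try (ResultSet rs = stmt.executeQuery(
          "SELECT to_date('01.01.1970', 'dd.mm.yyyy') AS d FROM (VALUES(1))")) {
        while (rs.next()) {
          System.out.println(rs.getDate("d"));
        }
      }
    }
  }
}
{code}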



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4864) Add ANSI format for date/time functions

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585653#comment-15585653
 ] 

ASF GitHub Bot commented on DRILL-4864:
---

Github user Serhii-Harnyk commented on a diff in the pull request:

https://github.com/apache/drill/pull/581#discussion_r83871584
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java
 ---
@@ -408,6 +411,12 @@ private LogicalExpression 
getDrillFunctionFromOptiqCall(RexCall call) {
 
   return first;
 }
+  } else if (functionName.equals("to_date") || 
functionName.equals("to_time") || functionName.equals("to_timestamp")) {
--- End diff --

No. The functionName variable has already been lowercased; that applies to all 
functions.


> Add ANSI format for date/time functions
> ---
>
> Key: DRILL-4864
> URL: https://issues.apache.org/jira/browse/DRILL-4864
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Serhii Harnyk
>Assignee: Gautam Kumar Parai
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> TO_DATE() exposes the Joda string formatting conventions in the SQL layer. 
> This does not follow the SQL conventions used by ANSI and many other 
> database engines on the market.
> Add a new UDF "ansi_to_joda(string)" that takes a string representing an ANSI 
> datetime format and returns a string representing the equivalent Joda format.
> Add a new session option "drill.exec.fn.to_date_format" that can take one of two 
> values - "JODA" (default) and "ANSI".
> If the option is set to "JODA", queries with the to_date() function work in the 
> usual way.
> If the option is set to "ANSI", the second argument is wrapped with the 
> ansi_to_joda() function, which allows the user to use an ANSI datetime format.
> Wrapping is used in the to_date(), to_time() and to_timestamp() functions.
> Table of joda and ansi patterns which may be replaced
> ||Pattern name||  Ansi format ||  JodaTime format
> | Full name of day|   day |   
> | Day of year |   ddd |   D
> | Day of month|   dd  |   d
> | Day of week |   d   |   e
> | Name of month   |   month   |   
> | Abr name of month   |   mon |   MMM
> | Full era name   |   ee  |   G
> | Name of day |   dy  |   E
> | Time zone   |   tz  |   TZ
> | Hour 12 |   hh  |   h
> | Hour 12 |   hh12|   h
> | Hour 24 |   hh24|   H
> | Minute of hour  |   mi  |   m
> | Second of minute|   ss  |   s
> | Millisecond of minute   |   ms  |   S
> | Week of year|   ww  |   w
> | Month   |   mm  |   MM
> | Halfday am  |   am  |   aa
> | Halfday pm  |   pm  |   aa
> | ref.|   
> https://www.postgresql.org/docs/8.2/static/functions-formatting.html| 
>   
> http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html
>  |
> Table of ansi pattern modifiers, which may be deleted from string
> ||Description ||  Pattern ||
> | fill mode (suppress padding blanks and zeroes)  |   fm  |
> | fixed format global option (see usage notes)|   fx  |
> | translation mode (print localized day and month names based on 
> lc_messages) |   tm  |
> | spell mode (not yet implemented)|   sp  |
> | ref.|   
> https://www.postgresql.org/docs/8.2/static/functions-formatting.html|



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585234#comment-15585234
 ] 

ASF GitHub Bot commented on DRILL-4203:
---

Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/595
  
@tushu1232 Until this fix is merged into master, you can clone my fork 
repository (https://github.com/vdiravka/drill), switch to the DRILL-4203 branch, 
and build the project.


> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Assignee: Vitalii Diravka
>Priority: Critical
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> Hello,
> I have some problems when I try to read parquet files produced by Drill with 
> Spark: all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet-tools, I found:
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equal to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> 
> name:OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> 
> name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}
> Implementation:
> After the fix, Drill can automatically detect date corruption in parquet files 
> and convert the values to correct ones.
> For the case when the user wants to work with dates beyond the year 5000, 
> an option is included to turn off the auto-correction.
> Use of this option is assumed to be extremely unlikely, but it is included for
> completeness.
> To disable "auto correction" you should use the parquet config in the plugin 
> settings. Something like this:
> {code}
>   "formats": {
> "parquet": {
>   "type": "parquet",
>   "autoCorrectCorruptDates": false
> }
> {code}
> Or you can try to use the query like this:
> {code}
> select l_shipdate, l_commitdate from 
> table(dfs.`/drill/testdata/parquet_date/dates_nodrillversion/drillgen2_lineitem`
>  
> (type => 'parquet', autoCorrectCorruptDates => false)) limit 1;
> {code}
> After the fix, new files generated by Drill will have an extra 
> "is.date.correct=true" property in the parquet 
> metadata, which indicates that the file cannot contain corrupted date values.
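
As an illustration of why the stored value 4881176 shown above corresponds to 1970-01-01, here is a standalone sketch under the assumption that the buggy writer added the Julian day number of the Unix epoch (2440588) twice; this is not Drill's implementation.

{code}
// Standalone sketch, not Drill's implementation.
import java.time.LocalDate;

public class CorruptDateSketch {
  // Julian day number of 1970-01-01; assumed to have been added twice by the buggy writer.
  private static final int JULIAN_DAY_OF_UNIX_EPOCH = 2440588;
  private static final int ASSUMED_CORRUPT_SHIFT = 2 * JULIAN_DAY_OF_UNIX_EPOCH; // 4881176

  /** Reinterprets a corrupted INT32 DATE value as days since the Unix epoch. */
  static LocalDate correct(int storedDays) {
    return LocalDate.ofEpochDay(storedDays - ASSUMED_CORRUPT_SHIFT);
  }

  public static void main(String[] args) {
    // The buggy_parquet example stored 4881176 for "Epoch"; corrected, it prints 1970-01-01.
    System.out.println(correct(4881176));
  }
}
{code}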



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4203) Parquet File : Date is stored wrongly

2016-10-18 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-4203:
---
Description: 
Hello,

I have some problems when I try to read parquet files produced by Drill with 
Spark: all dates are corrupted.

I think the problem comes from Drill :)

{code}
cat /tmp/date_parquet.csv 
Epoch,1970-01-01
{code}

{code}
0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) as 
epoch_date from dfs.tmp.`date_parquet.csv`;
++-+
|  name  | epoch_date  |
++-+
| Epoch  | 1970-01-01  |
++-+
{code}

{code}
0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
columns[0] as name, cast(columns[1] as date) as epoch_date from 
dfs.tmp.`date_parquet.csv`;
+---++
| Fragment  | Number of records written  |
+---++
| 0_0   | 1  |
+---++
{code}

When I read the file with parquet-tools, I found:
{code}
java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
name = Epoch
epoch_date = 4881176
{code}

According to 
[https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
epoch_date should be equal to 0.

Meta : 
{code}
java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
file:file:/tmp/buggy_parquet/0_0_0.parquet 
creator: parquet-mr version 1.8.1-drill-r0 (build 
6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
extra:   drill.version = 1.4.0 

file schema: root 

name:OPTIONAL BINARY O:UTF8 R:0 D:1
epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1

row group 1: RC:1 TS:93 OFFSET:4 

name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
ENC:RLE,BIT_PACKED,PLAIN
epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
ENC:RLE,BIT_PACKED,PLAIN
{code}




Implementation:

After the fix, Drill can automatically detect date corruption in parquet files 
and convert the values to correct ones.

For the case when the user wants to work with dates beyond the year 5000,
an option is included to turn off the auto-correction.
Use of this option is assumed to be extremely unlikely, but it is included for
completeness.
To disable "auto correction" you should use the parquet config in the plugin 
settings. Something like this:
{code}
  "formats": {
"parquet": {
  "type": "parquet",
  "autoCorrectCorruptDates": false
}
{code}
Or you can try to use the query like this:
{code}
select l_shipdate, l_commitdate from 
table(dfs.`/drill/testdata/parquet_date/dates_nodrillversion/drillgen2_lineitem`
 
(type => 'parquet', autoCorrectCorruptDates => false)) limit 1;
{code}

After the fix, new files generated by Drill will have an extra 
"is.date.correct=true" property in the parquet 
metadata, which indicates that the file cannot contain corrupted date values.



  was:
Hello,

I have some problems when I try to read parquet files produced by Drill with 
Spark: all dates are corrupted.

I think the problem comes from Drill :)

{code}
cat /tmp/date_parquet.csv 
Epoch,1970-01-01
{code}

{code}
0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) as 
epoch_date from dfs.tmp.`date_parquet.csv`;
++-+
|  name  | epoch_date  |
++-+
| Epoch  | 1970-01-01  |
++-+
{code}

{code}
0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
columns[0] as name, cast(columns[1] as date) as epoch_date from 
dfs.tmp.`date_parquet.csv`;
+---++
| Fragment  | Number of records written  |
+---++
| 0_0   | 1  |
+---++
{code}

When I read the file with parquet-tools, I found:
{code}
java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
name = Epoch
epoch_date = 4881176
{code}

According to 
[https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
epoch_date should be equal to 0.

Meta : 
{code}
java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
file:file:/tmp/buggy_parquet/0_0_0.parquet 
creator: parquet-mr version 1.8.1-drill-r0 (build 
6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
extra:   drill.version = 1.4.0 

file schema: root 

name:OPTIONAL BINARY O:UTF8 R:0 D:1
epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1

row group 1: RC:1 TS:93 OFFSET:4 

name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
ENC:RLE,BIT_PACKED,PLAIN
epoch_date:   INT32 SNAPPY DO:0 FPO:

[jira] [Updated] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-18 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-4373:
---
Description: 
git.commit.id.abbrev=83d460c

I created a parquet file with a timestamp type using Drill. Now if I define a 
Hive table on top of the parquet file and use "timestamp" as the column type, 
Drill fails to read the Hive table through the Hive storage plugin.

Implementation: 

Added an INT96-to-timestamp converter for both parquet readers, controlled by 
the system/session option "store.parquet.int96_as_timestamp".
The option is false by default so that existing query scripts that use the 
"convert_from TIMESTAMP_IMPALA" function keep working.

When the option is true, using that function is unnecessary and can cause the 
query to fail.


  was:
git.commit.id.abbrev=83d460c

I created a parquet file with a timestamp type using Drill. Now if I define a 
Hive table on top of the parquet file and use "timestamp" as the column type, 
Drill fails to read the Hive table through the Hive storage plugin.


> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Rahul Challapalli
>Assignee: Karthikeyan Manivannan
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a 
> Hive table on top of the parquet file and use "timestamp" as the column type, 
> Drill fails to read the Hive table through the Hive storage plugin.
> Implementation: 
> Added an INT96-to-timestamp converter for both parquet readers, controlled by 
> the system/session option "store.parquet.int96_as_timestamp".
> The option is false by default so that existing query scripts that use the 
> "convert_from TIMESTAMP_IMPALA" function keep working.
> When the option is true, using that function is unnecessary and can cause the 
> query to fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4203) Parquet File : Date is stored wrongly

2016-10-18 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4203:

Labels: doc-impacting  (was: )

> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Assignee: Vitalii Diravka
>Priority: Critical
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> Hello,
> I have some problems when I try to read parquet files produced by Drill with 
> Spark: all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet-tools, I found:
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equal to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> 
> name:OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> 
> name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)