[jira] [Commented] (DRILL-4933) Column aliasing isn’t working when we use partition by clause with row_number() [Ranking Window Functions]

2016-10-06 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551000#comment-15551000
 ] 

Khurram Faraaz commented on DRILL-4933:
---

Works just fine for me; I am on 1.9.0.

rownum without backquotes:
{noformat}
0: jdbc:drill:schema=dfs.tmp> select id, name, row_number() over(partition by 
state order by id) as rownum from `emp_tbl` limit 10;
+--+-+-+
|  id  |name | rownum  |
+--+-+-+
| 1| John Doe| 1   |
| 6| Shawn Jay   | 2   |
| 29   | Alex| 3   |
| 35   | Roger King  | 4   |
| 39   | Philip  | 5   |
| 800  | Susan   | 6   |
| 9| Wright Bob  | 1   |
| 10   | Sharma  | 2   |
| 30   | Khan| 3   |
| 100  | Kumar   | 4   |
+--+-+-+
10 rows selected (0.195 seconds)
{noformat}

rownum within backquotes:
{noformat}
0: jdbc:drill:schema=dfs.tmp> select id, name, row_number() over(partition by 
state order by id) as `rownum` from `emp_tbl` limit 10;
+--+-+-+
|  id  |name | rownum  |
+--+-+-+
| 1| John Doe| 1   |
| 6| Shawn Jay   | 2   |
| 29   | Alex| 3   |
| 35   | Roger King  | 4   |
| 39   | Philip  | 5   |
| 800  | Susan   | 6   |
| 9| Wright Bob  | 1   |
| 10   | Sharma  | 2   |
| 30   | Khan| 3   |
| 100  | Kumar   | 4   |
+--+-+-+
10 rows selected (0.169 seconds)
{noformat}
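For reference, the semantics the two queries above rely on — ROW_NUMBER() restarting at 1 for each `state` partition, ordered by `id` — can be sketched outside Drill. This is a minimal illustration of the window-function behavior, not Drill's implementation, and the sample rows are a hypothetical subset of `emp_tbl`:

```python
from itertools import groupby

def row_number_over(rows, partition_key, order_key):
    """Emulate ROW_NUMBER() OVER (PARTITION BY partition_key ORDER BY order_key)."""
    out = []
    # Sort by partition first so groupby sees each partition contiguously,
    # then by the ordering key within each partition.
    ordered = sorted(rows, key=lambda r: (r[partition_key], r[order_key]))
    for _, group in groupby(ordered, key=lambda r: r[partition_key]):
        # Numbering restarts at 1 for every partition.
        for n, row in enumerate(group, start=1):
            out.append({**row, "rownum": n})
    return out

emp = [
    {"id": 1, "name": "John Doe", "state": "CA"},
    {"id": 6, "name": "Shawn Jay", "state": "CA"},
    {"id": 9, "name": "Wright Bob", "state": "NY"},
    {"id": 10, "name": "Sharma", "state": "NY"},
]

for r in row_number_over(emp, "state", "id"):
    print(r["id"], r["name"], r["rownum"])
```

The alias (`rownum` with or without backquotes) only labels the output column; the numbering itself comes from the OVER clause.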

> Column aliasing isn’t working when we use partition by clause with 
> row_number() [Ranking Window Functions]
> --
>
> Key: DRILL-4933
> URL: https://issues.apache.org/jira/browse/DRILL-4933
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.8.0
>Reporter: Deepak Shivamurthy
>Priority: Minor
>
> I have run the query below, with an alias on the third column, but the alias 
> still does not work.
> select ID, sts_utc, row_number() over(partition by ID order by sts_utc) as 
> `rownum` from dfs.`/tmp/events` limit 10;
> Output:
> ID sts_utc, $2
> I would expect a column named rownum instead of $2 (even with other column 
> names, aliasing did not work).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4862) wrong results - use of convert_from(binary_string(key),'UTF8') in filter results in wrong results

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551219#comment-15551219
 ] 

ASF GitHub Bot commented on DRILL-4862:
---

GitHub user chunhui-shi opened a pull request:

https://github.com/apache/drill/pull/604

DRILL-4862: binary_string should use another buffer as out buffer to be used 
in more generic usage.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chunhui-shi/drill DRILL-4862-string

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/604.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #604


commit 1a6ae051eaa11d507e53ddd6912995e7aa6f7111
Author: chunhui-shi 
Date:   2016-08-25T17:23:53Z

DRILL-4862: binary_string should use another buffer as out buffer to be 
used in more generic usage.
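The corrupted values in the bug report below ("0123123", "000", and so on) are consistent with an output buffer being shared between evaluations, which matches the commit's stated fix of using a separate out buffer. The following is a hypothetical sketch of that class of hazard, not Drill's actual code: decoding each row into one reused buffer and handing back the whole buffer lets stale bytes from a previous row bleed into the next result:

```python
def decode_hex_escapes(s: str) -> bytes:
    """Decode a string of \\xNN escapes (like the binary_string input format)."""
    out = bytearray()
    i = 0
    while i < len(s):
        if s[i : i + 2] == "\\x":
            out.append(int(s[i + 2 : i + 4], 16))
            i += 4
        else:
            out.append(ord(s[i]))
            i += 1
    return bytes(out)

# Correct behavior: each call returns its own buffer.
assert decode_hex_escapes("\\x30\\x31\\x32\\x33") == b"0123"

# Hazard sketch: one shared buffer plus stale length bookkeeping yields
# garbage resembling the bug report's overlapped values.
shared = bytearray(16)

def decode_into_shared(s: str) -> bytearray:
    data = decode_hex_escapes(s)
    shared[: len(data)] = data  # overwrite only the front of the buffer...
    return shared               # ...but hand back the whole stale buffer

first = bytes(decode_into_shared("\\x30\\x31\\x32\\x33"))  # front is "0123"
second = bytes(decode_into_shared("\\x41\\x42"))           # front is "AB23": stale tail
```

Allocating (or clearly owning) a distinct output buffer per evaluation, as the pull request title describes, removes this failure mode.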




> wrong results - use of convert_from(binary_string(key),'UTF8') in filter 
> results in wrong results
> -
>
> Key: DRILL-4862
> URL: https://issues.apache.org/jira/browse/DRILL-4862
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Chunhui Shi
>
> These results do not look right, i.e. when the predicate has 
> convert_from(binary_string(key),'UTF8').
> Apache Drill 1.8.0-SNAPSHOT, git commit ID: 57dc9f43
> {noformat}
> [root@centos-0x drill4478]# cat f1.json
> {"key":"\\x30\\x31\\x32\\x33"}
> {"key":"\\x34\\x35\\x36\\x37"}
> {"key":"\\x38\\x39\\x30\\x31"}
> {"key":"\\x30\\x30\\x30\\x30"}
> {"key":"\\x31\\x31\\x31\\x31"}
> {"key":"\\x35\\x35\\x35\\x35"}
> {"key":"\\x38\\x38\\x38\\x38"}
> {"key":"\\x39\\x39\\x39\\x39"}
> {"key":"\\x41\\x42\\x43\\x44"}
> {"key":"\\x45\\x46\\x47\\x48"}
> {"key":"\\x49\\x41\\x44\\x46"}
> {"key":"\\x4a\\x4b\\x4c\\x4d"}
> {"key":"\\x57\\x58\\x59\\x5a"}
> {"key":"\\x4e\\x4f\\x50\\x51"}
> {"key":"\\x46\\x46\\x46\\x46"}
> {noformat}
> results without the predicate - these are correct results
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(binary_string(key),'UTF8') 
> from `f1.json`;
> +-+
> | EXPR$0  |
> +-+
> | 0123|
> | 4567|
> | 8901|
> | |
> | |
> | |
> | |
> | |
> | ABCD|
> | EFGH|
> | IADF|
> | JKLM|
> | WXYZ|
> | NOPQ|
> | |
> +-+
> 15 rows selected (0.256 seconds)
> {noformat}
> results with a predicate - these results don't look correct
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(binary_string(key),'UTF8') 
> from `f1.json` where convert_from(binary_string(key),'UTF8') is not null;
> +--+
> |  EXPR$0  |
> +--+
> | 0123123  |
> | 4567567  |
> | 8901901  |
> | 000  |
> | 111  |
> | 555  |
> | 888  |
> | 999  |
> | ABCDBCD  |
> | EFGHFGH  |
> | IADFADF  |
> | JKLMKLM  |
> | WXYZXYZ  |
> | NOPQOPQ  |
> | FFF  |
> +--+
> 15 rows selected (0.279 seconds)
> {noformat}





[jira] [Assigned] (DRILL-4862) wrong results - use of convert_from(binary_string(key),'UTF8') in filter results in wrong results

2016-10-06 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi reassigned DRILL-4862:
--

Assignee: Chunhui Shi






[jira] [Comment Edited] (DRILL-4933) Column aliasing isn’t working when we use partition by clause with row_number() [Ranking Window Functions]

2016-10-06 Thread Deepak Shivamurthy (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551303#comment-15551303
 ] 

Deepak Shivamurthy edited comment on DRILL-4933 at 10/6/16 8:20 AM:


Thanks Khurram Faraaz

Could you please send the URL to download a stable Drill 1.9.0? I could not 
find it at the URL below:
http://apache.mirrors.hoobly.com/drill/








[jira] [Commented] (DRILL-4933) Column aliasing isn’t working when we use partition by clause with row_number() [Ranking Window Functions]

2016-10-06 Thread Deepak Shivamurthy (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551303#comment-15551303
 ] 

Deepak Shivamurthy commented on DRILL-4933:
---

Could you please send the URL to download a stable Drill 1.9.0? I could not 
find it at the URL below:
http://apache.mirrors.hoobly.com/drill/






[jira] [Commented] (DRILL-4933) Column aliasing isn’t working when we use partition by clause with row_number() [Ranking Window Functions]

2016-10-06 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551389#comment-15551389
 ] 

Khurram Faraaz commented on DRILL-4933:
---

Apache Drill 1.9.0 has not been released yet. You can build from source and 
deploy the binaries.






[jira] [Commented] (DRILL-4910) Apache Drill 1.8 UI doesn't display Hive join query results

2016-10-06 Thread Gopal Nagar (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551483#comment-15551483
 ] 

Gopal Nagar commented on DRILL-4910:


Hi Faraaz,

Did you get a chance to look into this issue? I would appreciate your help.

Thanks & Regards,
Gopal Nagar

> Apache Drill 1.8 UI doesn't display Hive join query results
> ---
>
> Key: DRILL-4910
> URL: https://issues.apache.org/jira/browse/DRILL-4910
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Gopal Nagar
> Attachments: drillbit.log
>
>
> Hi All,
> I am using Apache Drill 1.8.0 on AWS EMR and joining two Hive tables; a 
> sample query is below. It works fine in the Drill CLI but gives the error 
> below after running for a few minutes. A simple select query (select t1.col 
> from hive.table t1) works fine in both the Drill CLI and the UI; only the 
> join query is a problem.
> If I cancel the join query from the background, it displays the results in 
> the UI, which is a very strange situation.
> Join Query 
> --
> select t1.col FROM hive.table1 as t1 join hive.table2 as t2 on t1.col = 
> t2.col limit 1000;
> Error
> ---
> Query Failed: An Error Occurred 
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> RpcException: Data not accepted downstream. Fragment 1:4 [Error Id: 
> 0b5ed2db-3653-4e3a-9c92-d0a6cd69b66e on 
> ip-172-31-16-222.us-west-2.compute.internal:31010]





[jira] [Assigned] (DRILL-4862) wrong results - use of convert_from(binary_string(key),'UTF8') in filter results in wrong results

2016-10-06 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-4862:
---

Assignee: Gautam Kumar Parai  (was: Chunhui Shi)

Assigning to [~gparai] for code review.






[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2016-10-06 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551878#comment-15551878
 ] 

Vitalii Diravka commented on DRILL-4203:


It was a special case: an exception while parsing the "created by" metadata, 
which was expected only from Drill-generated files, but this file has correct 
dates. So I changed the appropriate logic; the column metadata is now checked 
for this case. I also added a unit test with this parquet file. The changes 
are in the last commit.
Thanks [~rkins].

> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Assignee: Vitalii Diravka
>Priority: Critical
> Fix For: 1.9.0
>
>
> Hello,
> I have some problems when I try to read parquet files produced by Drill 
> with Spark: all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet tools, I found:
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equal to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> 
> name:OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> 
> name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}





[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553044#comment-15553044
 ] 

ASF GitHub Bot commented on DRILL-4203:
---

Github user jaltekruse commented on a diff in the pull request:

https://github.com/apache/drill/pull/595#discussion_r82272848
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java
 ---
@@ -123,6 +123,7 @@ public ScanBatch getBatch(FragmentContext context, 
HiveDrillNativeParquetSubScan
   // in the first row group
   ParquetReaderUtility.DateCorruptionStatus containsCorruptDates =
   ParquetReaderUtility.detectCorruptDates(parquetMetadata, 
config.getColumns(), true);
+  logger.info(containsCorruptDates.toString());
--- End diff --

Not a very useful logging statement to check in; it should at least include a 
message saying what this value is, since it will just print a bare value. If 
it was only for local debugging, it can be removed.







[jira] [Commented] (DRILL-4841) Use user server event loop group for web clients

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553351#comment-15553351
 ] 

ASF GitHub Bot commented on DRILL-4841:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82266904
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -688,4 +802,142 @@ public DrillBuf getBuffer() {
   return null;
 }
   }
+
+  /**
+   * Return a new {@link DrillClient.Builder Drill client builder}.
+   * @return a new builder
+   */
+  public static Builder newBuilder() {
+return new Builder();
+  }
+
+  /**
+   * Helper class to construct a {@link DrillClient Drill client}.
+   */
+  public static class Builder {
+
+private DrillConfig config;
+private BufferAllocator allocator;
+private ClusterCoordinator clusterCoordinator;
+private EventLoopGroup eventLoopGroup;
+private ExecutorService executor;
+
+// defaults
+private boolean supportComplexTypes = true;
+private boolean isDirectConnection = false;
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client.
+ *
+ * @param drillConfig drill configuration
+ * @return this builder
+ */
+public Builder setConfig(DrillConfig drillConfig) {
+  this.config = drillConfig;
+  return this;
+}
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client based on 
the given file.
+ *
+ * @param fileName configuration file name
+ * @return this builder
+ */
+public Builder setConfigFromFile(final String fileName) {
+  this.config = DrillConfig.create(fileName);
+  return this;
+}
+
+/**
+ * Sets the {@link BufferAllocator buffer allocator} to be used by 
this client.
+ * If this is not set, an allocator will be created based on the 
configuration.
--- End diff --

But what if I don't provide a config, or create my own? What do I have to set 
in that config to make it work?

More to the point, WHEN do I have to provide my own root? When I run multiple 
clients in a single app? If so, can we provide a short example showing how 
this works?

Even easier, can the builder itself create a static allocator that it shares 
across connections?


> Use user server event loop group for web clients
> 
>
> Key: DRILL-4841
> URL: https://issues.apache.org/jira/browse/DRILL-4841
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - HTTP
>Reporter: Sudheesh Katkam
>Assignee: Sorabh Hamirwasia
>Priority: Minor
>
> Currently we spawn an event loop group for handling requests from clients. 
> This group should also be used to handle responses (from the server) for web 
> clients.





[jira] [Commented] (DRILL-4841) Use user server event loop group for web clients

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553350#comment-15553350
 ] 

ASF GitHub Bot commented on DRILL-4841:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82269525
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java
 ---
@@ -41,7 +41,11 @@
   public void oneBitOneExchangeOneEntryRun() throws Exception{
 RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet();
 
-try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient 
client = new DrillClient(CONFIG, serviceSet.getCoordinator());){
+try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet);
--- End diff --

This pattern shows up over and over. Should it just be moved into a function 
rather than being duplicated as "code injection" in a zillion files?







[jira] [Commented] (DRILL-4841) Use user server event loop group for web clients

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553345#comment-15553345
 ] 

ASF GitHub Bot commented on DRILL-4841:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82269228
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/auth/DrillUserPrincipal.java
 ---
@@ -118,8 +118,11 @@ public AnonDrillUserPrincipal(final DrillbitContext 
drillbitContext) {
 public DrillClient getDrillClient() throws IOException {
   try {
 // Create a DrillClient
-drillClient = new DrillClient(drillbitContext.getConfig(),
-drillbitContext.getClusterCoordinator(), 
drillbitContext.getAllocator());
+drillClient = DrillClient.newBuilder()
--- End diff --

Is the context something we should expose as part of our changes:

DrillbitContext context = ...
DrillClient client = DrillClient.builder( )
.fromContext( context )
.build( );

Doing the above might simplify test code.







[jira] [Commented] (DRILL-4841) Use user server event loop group for web clients

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553348#comment-15553348
 ] 

ASF GitHub Bot commented on DRILL-4841:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82267401
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -688,4 +802,142 @@ public DrillBuf getBuffer() {
   return null;
 }
   }
+
+  /**
+   * Return a new {@link DrillClient.Builder Drill client builder}.
+   * @return a new builder
+   */
+  public static Builder newBuilder() {
+return new Builder();
+  }
+
+  /**
+   * Helper class to construct a {@link DrillClient Drill client}.
+   */
+  public static class Builder {
+
+private DrillConfig config;
+private BufferAllocator allocator;
+private ClusterCoordinator clusterCoordinator;
+private EventLoopGroup eventLoopGroup;
+private ExecutorService executor;
+
+// defaults
+private boolean supportComplexTypes = true;
+private boolean isDirectConnection = false;
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client.
+ *
+ * @param drillConfig drill configuration
+ * @return this builder
+ */
+public Builder setConfig(DrillConfig drillConfig) {
+  this.config = drillConfig;
+  return this;
+}
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client based on 
the given file.
+ *
+ * @param fileName configuration file name
+ * @return this builder
+ */
+public Builder setConfigFromFile(final String fileName) {
+  this.config = DrillConfig.create(fileName);
+  return this;
+}
+
+/**
+ * Sets the {@link BufferAllocator buffer allocator} to be used by 
this client.
+ * If this is not set, an allocator will be created based on the 
configuration.
+ *
+ * If this is set, the caller is responsible for closing the given 
allocator.
+ *
+ * @param allocator buffer allocator
+ * @return this builder
+ */
+public Builder setAllocator(final BufferAllocator allocator) {
+  this.allocator = allocator;
+  return this;
+}
+
+/**
+ * Sets the {@link ClusterCoordinator cluster coordinator} that this 
client
+ * registers with. If this is not set and this client does not use 
a
+ * {@link #setDirectConnection direct connection}, a cluster 
coordinator will
+ * be created based on the configuration.
+ *
+ * If this is set, the caller is responsible for closing the given 
coordinator.
+ *
+ * @param clusterCoordinator cluster coordinator
+ * @return this builder
+ */
+public Builder setClusterCoordinator(final ClusterCoordinator 
clusterCoordinator) {
+  this.clusterCoordinator = clusterCoordinator;
+  return this;
+}
+
+/**
+ * Sets the event loop group that to be used by the client. If this is 
not set,
+ * an event loop group will be created based on the configuration.
+ *
+ * If this is set, the caller is responsible for closing the given 
event loop group.
+ *
+ * @param eventLoopGroup event loop group
+ * @return this builder
+ */
+public Builder setEventLoopGroup(final EventLoopGroup eventLoopGroup) {
+  this.eventLoopGroup = eventLoopGroup;
+  return this;
+}
+
+/**
+ * Sets the executor service to be used by the client. If this is not 
set,
--- End diff --

What is an executor service and why would I want to create my own?







[jira] [Commented] (DRILL-4841) Use user server event loop group for web clients

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553342#comment-15553342
 ] 

ASF GitHub Bot commented on DRILL-4841:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82266574
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -688,4 +802,142 @@ public DrillBuf getBuffer() {
   return null;
 }
   }
+
+  /**
+   * Return a new {@link DrillClient.Builder Drill client builder}.
+   * @return a new builder
+   */
+  public static Builder newBuilder() {
+return new Builder();
+  }
+
+  /**
+   * Helper class to construct a {@link DrillClient Drill client}.
+   */
+  public static class Builder {
+
+private DrillConfig config;
+private BufferAllocator allocator;
+private ClusterCoordinator clusterCoordinator;
+private EventLoopGroup eventLoopGroup;
+private ExecutorService executor;
+
+// defaults
+private boolean supportComplexTypes = true;
+private boolean isDirectConnection = false;
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client.
+ *
+ * @param drillConfig drill configuration
+ * @return this builder
+ */
+public Builder setConfig(DrillConfig drillConfig) {
--- End diff --

DrillConfig objects are pretty complex. Should we accept the underlying 
Config (TypeSafe object) instead/in addition?


(v6.3.4#6332)


[jira] [Commented] (DRILL-4841) Use user server event loop group for web clients

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553347#comment-15553347
 ] 

ASF GitHub Bot commented on DRILL-4841:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82267107
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -688,4 +802,142 @@ public DrillBuf getBuffer() {
   return null;
 }
   }
+
+  /**
+   * Return a new {@link DrillClient.Builder Drill client builder}.
+   * @return a new builder
+   */
+  public static Builder newBuilder() {
+return new Builder();
+  }
+
+  /**
+   * Helper class to construct a {@link DrillClient Drill client}.
+   */
+  public static class Builder {
+
+private DrillConfig config;
+private BufferAllocator allocator;
+private ClusterCoordinator clusterCoordinator;
+private EventLoopGroup eventLoopGroup;
+private ExecutorService executor;
+
+// defaults
+private boolean supportComplexTypes = true;
+private boolean isDirectConnection = false;
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client.
+ *
+ * @param drillConfig drill configuration
+ * @return this builder
+ */
+public Builder setConfig(DrillConfig drillConfig) {
+  this.config = drillConfig;
+  return this;
+}
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client based on the given file.
+ *
+ * @param fileName configuration file name
+ * @return this builder
+ */
+public Builder setConfigFromFile(final String fileName) {
+  this.config = DrillConfig.create(fileName);
+  return this;
+}
+
+/**
+ * Sets the {@link BufferAllocator buffer allocator} to be used by this client.
+ * If this is not set, an allocator will be created based on the configuration.
+ *
+ * If this is set, the caller is responsible for closing the given allocator.
+ *
+ * @param allocator buffer allocator
+ * @return this builder
+ */
+public Builder setAllocator(final BufferAllocator allocator) {
+  this.allocator = allocator;
+  return this;
+}
+
+/**
+ * Sets the {@link ClusterCoordinator cluster coordinator} that this client
--- End diff --

When would I use this rather than letting Drill create it from the config? 
Would I do this if I have multiple Drill clients in a single app? Or, will the 
underlying code know to share the same coordinator?




[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553370#comment-15553370
 ] 

ASF GitHub Bot commented on DRILL-4203:
---

Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/595
  
Thanks @jaltekruse.
@parthchandra Could you please take a look?


> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Assignee: Vitalii Diravka
>Priority: Critical
> Fix For: 1.9.0
>
>
> Hello,
> I have some problems when I try to read parquet files produced by Drill with 
> Spark: all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet-tools, I found:
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equal to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> 
> name:OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> 
> name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}





[jira] [Commented] (DRILL-4841) Use user server event loop group for web clients

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553339#comment-15553339
 ] 

ASF GitHub Bot commented on DRILL-4841:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82264416
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -89,79 +88,175 @@
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Strings;
 import com.google.common.util.concurrent.AbstractCheckedFuture;
+import com.google.common.util.concurrent.MoreExecutors;
 import com.google.common.util.concurrent.SettableFuture;
 
 /**
 * Thin wrapper around a UserClient that handles connect/close and transforms
  * String into ByteBuf.
+ *
+ * To create non-default objects, use {@link DrillClient.Builder the builder class}.
+ * E.g.
+ * 
+ *   DrillClient client = DrillClient.newBuilder()
+ *   .setConfig(...)
+ *   .setIsDirectConnection(true)
--- End diff --

setDirectConnection

The "setIs" form is rather awkward. The typical convention is:

setSomething( boolean flag )
boolean isSomething( )




[jira] [Commented] (DRILL-4841) Use user server event loop group for web clients

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553341#comment-15553341
 ] 

ASF GitHub Bot commented on DRILL-4841:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82268707
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/auth/AbstractDrillLoginService.java
 ---
@@ -46,8 +46,11 @@ protected DrillClient createDrillClient(final String 
userName, final String pass
 
 try {
   // Create a DrillClient
-  drillClient = new DrillClient(drillbitContext.getConfig(),
-  drillbitContext.getClusterCoordinator(), 
drillbitContext.getAllocator());
+  drillClient = DrillClient.newBuilder()
+  .setConfig(drillbitContext.getConfig())
+  .setClusterCoordinator(drillbitContext.getClusterCoordinator())
+  .setAllocator(drillbitContext.getAllocator())
+  .build();
--- End diff --

Should the builder be extended to also (optionally) do connect? Or, return 
a connect builder?

Here I'm thinking that it would be helpful to have a single builder gather 
things like the user name property and the connection string (which would seem 
to be a DrillClient parameter but is actually a connection property).

That is, either:

DrillClient.builder( ) ...
   .withUser( userName )
   .connectTo( "myHost", 1234 )
  .build( );

Or

DrillClient.builder( ) ...
   .buildClient( )
   .withUser( userName )
   .connectTo( "myHost", 1234 )
   .connect( );
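A compilable sketch of the first option could look like the following; every name here (ClientBuilderSketch, withUser, connectTo) is hypothetical and only illustrates the shape, not existing Drill API:

```java
// Hypothetical sketch of a single builder that also gathers connection
// properties (user name, host, port) before building the client. In the real
// client, build() would construct a DrillClient and connect; here it just
// renders the gathered settings so the fluent flow can be exercised.
public class ClientBuilderSketch {
  private String user;
  private String host;
  private int port;

  public ClientBuilderSketch withUser(String user) {
    this.user = user;
    return this;
  }

  public ClientBuilderSketch connectTo(String host, int port) {
    this.host = host;
    this.port = port;
    return this;
  }

  // Stand-in for "build the client and connect".
  public String build() {
    return user + "@" + host + ":" + port;
  }

  public static void main(String[] args) {
    System.out.println(new ClientBuilderSketch()
        .withUser("alice")
        .connectTo("myHost", 1234)
        .build());
  }
}
```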





[jira] [Commented] (DRILL-4841) Use user server event loop group for web clients

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553338#comment-15553338
 ] 

ASF GitHub Bot commented on DRILL-4841:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82295775
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -89,79 +88,175 @@
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Strings;
 import com.google.common.util.concurrent.AbstractCheckedFuture;
+import com.google.common.util.concurrent.MoreExecutors;
 import com.google.common.util.concurrent.SettableFuture;
 
 /**
 * Thin wrapper around a UserClient that handles connect/close and transforms
  * String into ByteBuf.
+ *
+ * To create non-default objects, use {@link DrillClient.Builder the builder class}.
+ * E.g.
+ * 
+ *   DrillClient client = DrillClient.newBuilder()
+ *   .setConfig(...)
+ *   .setIsDirectConnection(true)
+ *   .build();
+ * 
+ *
+ * Except for {@link #runQuery} and {@link #cancelQuery}, this class is generally not thread safe.
  */
 public class DrillClient implements Closeable, ConnectionThrottle {
  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DrillClient.class);
 
   private static final ObjectMapper objectMapper = new ObjectMapper();
   private final DrillConfig config;
-  private UserClient client;
-  private UserProperties props = null;
-  private volatile ClusterCoordinator clusterCoordinator;
-  private volatile boolean connected = false;
   private final BufferAllocator allocator;
-  private int reconnectTimes;
-  private int reconnectDelay;
-  private boolean supportComplexTypes;
-  private final boolean ownsZkConnection;
+  private final EventLoopGroup eventLoopGroup;
+  private final ExecutorService executor;
+  private final boolean isDirectConnection;
+  private final int reconnectTimes;
+  private final int reconnectDelay;
+
+  // TODO: clusterCoordinator should be initialized in the constructor.
+  // Currently, initialization is tightly coupled with #connect.
+  private ClusterCoordinator clusterCoordinator;
+
+  // checks if this client owns these resources (used when closing)
   private final boolean ownsAllocator;
-  private final boolean isDirectConnection; // true if the connection bypasses zookeeper and connects directly to a drillbit
-  private EventLoopGroup eventLoopGroup;
-  private ExecutorService executor;
+  private final boolean ownsZkConnection;
+  private final boolean ownsEventLoopGroup;
+  private final boolean ownsExecutor;
+
+  // once #setSupportComplexTypes() is removed, make this final
+  private boolean supportComplexTypes;
+
+  private UserClient client;
+  private UserProperties props;
+  private boolean connected;
 
-  public DrillClient() throws OutOfMemoryException {
-this(DrillConfig.create(), false);
+  public DrillClient() {
+this(newBuilder());
   }
 
-  public DrillClient(boolean isDirect) throws OutOfMemoryException {
-this(DrillConfig.create(), isDirect);
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient(boolean isDirect) {
+this(newBuilder()
+.setDirectConnection(isDirect));
   }
 
-  public DrillClient(String fileName) throws OutOfMemoryException {
-this(DrillConfig.create(fileName), false);
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient(String fileName) {
+this(newBuilder()
+.setConfigFromFile(fileName));
   }
 
-  public DrillClient(DrillConfig config) throws OutOfMemoryException {
-this(config, null, false);
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient(DrillConfig config) {
+this(newBuilder()
+.setConfig(config));
   }
 
-  public DrillClient(DrillConfig config, boolean isDirect)
-  throws OutOfMemoryException {
-this(config, null, isDirect);
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient(DrillConfig config, boolean isDirect) {
+this(newBuilder()
+.setConfig(config)
+.setDirectConnection(isDirect));
   }
 
-  public DrillClient(DrillConfig config, ClusterCoordinator coordinator)
-

[jira] [Commented] (DRILL-4841) Use user server event loop group for web clients

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553353#comment-15553353
 ] 

ASF GitHub Bot commented on DRILL-4841:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82267967
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -688,4 +802,142 @@ public DrillBuf getBuffer() {
   return null;
 }
   }
+
+  /**
+   * Return a new {@link DrillClient.Builder Drill client builder}.
+   * @return a new builder
+   */
+  public static Builder newBuilder() {
+return new Builder();
+  }
+
+  /**
+   * Helper class to construct a {@link DrillClient Drill client}.
+   */
+  public static class Builder {
+
+private DrillConfig config;
+private BufferAllocator allocator;
+private ClusterCoordinator clusterCoordinator;
+private EventLoopGroup eventLoopGroup;
+private ExecutorService executor;
+
+// defaults
+private boolean supportComplexTypes = true;
+private boolean isDirectConnection = false;
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client.
+ *
+ * @param drillConfig drill configuration
+ * @return this builder
+ */
+public Builder setConfig(DrillConfig drillConfig) {
+  this.config = drillConfig;
+  return this;
+}
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client based on 
the given file.
+ *
+ * @param fileName configuration file name
+ * @return this builder
+ */
+public Builder setConfigFromFile(final String fileName) {
+  this.config = DrillConfig.create(fileName);
+  return this;
+}
+
+/**
+ * Sets the {@link BufferAllocator buffer allocator} to be used by 
this client.
+ * If this is not set, an allocator will be created based on the 
configuration.
+ *
+ * If this is set, the caller is responsible for closing the given 
allocator.
+ *
+ * @param allocator buffer allocator
+ * @return this builder
+ */
+public Builder setAllocator(final BufferAllocator allocator) {
+  this.allocator = allocator;
+  return this;
+}
+
+/**
+ * Sets the {@link ClusterCoordinator cluster coordinator} that this client
+ * registers with. If this is not set and this client does not use a
+ * {@link #setDirectConnection direct connection}, a cluster coordinator will
+ * be created based on the configuration.
+ *
+ * If this is set, the caller is responsible for closing the given coordinator.
+ *
+ * @param clusterCoordinator cluster coordinator
+ * @return this builder
+ */
+public Builder setClusterCoordinator(final ClusterCoordinator clusterCoordinator) {
+  this.clusterCoordinator = clusterCoordinator;
+  return this;
+}
+
+/**
+ * Sets the event loop group to be used by the client. If this is not set,
+ * an event loop group will be created based on the configuration.
+ *
+ * If this is set, the caller is responsible for closing the given event loop group.
+ *
+ * @param eventLoopGroup event loop group
+ * @return this builder
+ */
+public Builder setEventLoopGroup(final EventLoopGroup eventLoopGroup) {
+  this.eventLoopGroup = eventLoopGroup;
+  return this;
+}
+
+/**
+ * Sets the executor service to be used by the client. If this is not set,
+ * an executor will be created based on the configuration.
+ *
+ * If this is set, the caller is responsible for closing the given executor.
+ *
+ * @param executor executor service
+ * @return this builder
+ */
+public Builder setExecutorService(final ExecutorService executor) {
+  this.executor = executor;
+  return this;
+}
+
+/**
+ * Sets whether the application is willing to accept complex types (Map, Arrays)
+ * in the returned result set. Default is {@code true}. If set to {@code false},
+ * complex types are returned as a JSON encoded VARCHAR type.
+ *
+ * @param supportComplexTypes if client accepts complex types
+ * @return this builder
+ */
+public Builder setSupportsComplexTypes(final boolean supportComplexTypes) {
+  this.supportComplexTypes = supportComplexTypes;
+  return this;
+}
+
+/**
+ * Sets 

[jira] [Commented] (DRILL-4841) Use user server event loop group for web clients

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553344#comment-15553344
 ] 

ASF GitHub Bot commented on DRILL-4841:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82263890
  
--- Diff: common/src/main/java/org/apache/drill/common/AutoCloseables.java 
---
@@ -87,4 +87,29 @@ public static void close(Iterable ac) throws Exception
   throw topLevelException;
 }
   }
+
+  /**
+   * close() an {@see java.lang.AutoCloseable} without throwing a (checked)
+   * {@see java.lang.Exception}. This wraps the close() call with a
+   * try-catch that will rethrow an Exception wrapped with a
+   * {@see java.lang.RuntimeException}, providing a way to call close()
+   * without having to do the try-catch everywhere or propagate the Exception.
+   *
+   * @param autoCloseable the AutoCloseable to close; may be null
+   * @throws RuntimeException if an Exception occurs; the Exception is
+   *   wrapped by the RuntimeException
+   */
+  public static void closeNoChecked(final AutoCloseable autoCloseable) {
--- End diff --

closeUnchecked ?

But we are "checking"; the "checked" has to do with the Exception type.

Maybe cleanClose( ) for this function. Then, add a "closeSilently" to catch 
and ignore close exceptions. (The closeSilently is handy in the case when, say, 
a file is full, a write failed, and the close will also fail because it still 
can't flush pending buffers.) There are other functions that do the silent 
close, but it might be handy to have them in one place.
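The two variants under discussion could be sketched as follows; the names closeUnchecked and closeSilently are the review's suggestions, not existing Drill methods:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of the two close() helpers discussed above (hypothetical names).
public final class CloseHelpers {
  private static final Logger LOGGER = Logger.getLogger(CloseHelpers.class.getName());

  // Rethrows any checked Exception from close() wrapped in a RuntimeException,
  // so callers need no try-catch and no throws clause.
  public static void closeUnchecked(AutoCloseable closeable) {
    if (closeable == null) {
      return;
    }
    try {
      closeable.close();
    } catch (Exception e) {
      throw new RuntimeException("Failed to close " + closeable, e);
    }
  }

  // Logs and swallows close() failures, for the case where a write already
  // failed and close() cannot flush its pending buffers either.
  public static void closeSilently(AutoCloseable closeable) {
    if (closeable == null) {
      return;
    }
    try {
      closeable.close();
    } catch (Exception e) {
      LOGGER.log(Level.WARNING, "Ignoring failure on close", e);
    }
  }
}
```

Since AutoCloseable is a functional interface, both helpers can be exercised with a lambda that throws from close().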




[jira] [Commented] (DRILL-4841) Use user server event loop group for web clients

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553343#comment-15553343
 ] 

ASF GitHub Bot commented on DRILL-4841:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82266459
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -688,4 +802,142 @@ public DrillBuf getBuffer() {
   return null;
 }
   }
+
+  /**
+   * Return a new {@link DrillClient.Builder Drill client builder}.
+   * @return a new builder
+   */
+  public static Builder newBuilder() {
+return new Builder();
+  }
+
+  /**
+   * Helper class to construct a {@link DrillClient Drill client}.
+   */
+  public static class Builder {
+
+private DrillConfig config;
+private BufferAllocator allocator;
+private ClusterCoordinator clusterCoordinator;
+private EventLoopGroup eventLoopGroup;
+private ExecutorService executor;
+
+// defaults
+private boolean supportComplexTypes = true;
+private boolean isDirectConnection = false;
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client.
+ *
+ * @param drillConfig drill configuration
+ * @return this builder
+ */
+public Builder setConfig(DrillConfig drillConfig) {
+  this.config = drillConfig;
+  return this;
+}
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client based on the given file.
--- End diff --

Better to set this based on a File (or, for the trendy, Path) object, since 
that is the Java standard for local files.

Does this replace the default class-path config? Or, is this added to the 
defaults?

Can I use this with the above? Or, can I set either the config directory OR 
via a file? If one or the other, should we check that case and throw an 
IllegalStateException (unchecked) or the like?
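One way the builder could enforce "one or the other" is a fail-fast guard; this fragment is hypothetical (a stand-in Object replaces DrillConfig, and checkNotAlreadySet is an invented helper), sketched only to show the shape of the check:

```java
// Hypothetical builder fragment enforcing that setConfig and setConfigFromFile
// are mutually exclusive: the second call fails fast with an
// IllegalStateException instead of silently overwriting the first.
public class ConfigBuilderSketch {
  private Object config; // stand-in for DrillConfig in this sketch

  public ConfigBuilderSketch setConfig(Object drillConfig) {
    checkNotAlreadySet();
    this.config = drillConfig;
    return this;
  }

  public ConfigBuilderSketch setConfigFromFile(String fileName) {
    checkNotAlreadySet();
    this.config = "loaded from " + fileName; // stands in for DrillConfig.create(fileName)
    return this;
  }

  private void checkNotAlreadySet() {
    if (config != null) {
      throw new IllegalStateException("configuration was already set; "
          + "use either setConfig or setConfigFromFile, not both");
    }
  }
}
```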




[jira] [Commented] (DRILL-4841) Use user server event loop group for web clients

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553352#comment-15553352
 ] 

ASF GitHub Bot commented on DRILL-4841:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82267251
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -688,4 +802,142 @@ public DrillBuf getBuffer() {
   return null;
 }
   }
+
+  /**
+   * Return a new {@link DrillClient.Builder Drill client builder}.
+   * @return a new builder
+   */
+  public static Builder newBuilder() {
+return new Builder();
+  }
+
+  /**
+   * Helper class to construct a {@link DrillClient Drill client}.
+   */
+  public static class Builder {
+
+private DrillConfig config;
+private BufferAllocator allocator;
+private ClusterCoordinator clusterCoordinator;
+private EventLoopGroup eventLoopGroup;
+private ExecutorService executor;
+
+// defaults
+private boolean supportComplexTypes = true;
+private boolean isDirectConnection = false;
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client.
+ *
+ * @param drillConfig drill configuration
+ * @return this builder
+ */
+public Builder setConfig(DrillConfig drillConfig) {
+  this.config = drillConfig;
+  return this;
+}
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client based on the given file.
+ *
+ * @param fileName configuration file name
+ * @return this builder
+ */
+public Builder setConfigFromFile(final String fileName) {
+  this.config = DrillConfig.create(fileName);
+  return this;
+}
+
+/**
+ * Sets the {@link BufferAllocator buffer allocator} to be used by this client.
+ * If this is not set, an allocator will be created based on the configuration.
+ *
+ * If this is set, the caller is responsible for closing the given allocator.
+ *
+ * @param allocator buffer allocator
+ * @return this builder
+ */
+public Builder setAllocator(final BufferAllocator allocator) {
+  this.allocator = allocator;
+  return this;
+}
+
+/**
+ * Sets the {@link ClusterCoordinator cluster coordinator} that this client
+ * registers with. If this is not set and this client does not use a
+ * {@link #setDirectConnection direct connection}, a cluster coordinator will
+ * be created based on the configuration.
+ *
+ * If this is set, the caller is responsible for closing the given coordinator.
+ *
+ * @param clusterCoordinator cluster coordinator
+ * @return this builder
+ */
+public Builder setClusterCoordinator(final ClusterCoordinator clusterCoordinator) {
+  this.clusterCoordinator = clusterCoordinator;
+  return this;
+}
+
+/**
+ * Sets the event loop group to be used by the client. If this is not set,
--- End diff --

Event loop for what? Why would I want to provide my own? What are the 
constraints on the event loop?




[jira] [Commented] (DRILL-4841) Use user server event loop group for web clients

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553349#comment-15553349
 ] 

ASF GitHub Bot commented on DRILL-4841:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82267594
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -688,4 +802,142 @@ public DrillBuf getBuffer() {
   return null;
 }
   }
+
+  /**
+   * Return a new {@link DrillClient.Builder Drill client builder}.
+   * @return a new builder
+   */
+  public static Builder newBuilder() {
+return new Builder();
+  }
+
+  /**
+   * Helper class to construct a {@link DrillClient Drill client}.
+   */
+  public static class Builder {
+
+private DrillConfig config;
+private BufferAllocator allocator;
+private ClusterCoordinator clusterCoordinator;
+private EventLoopGroup eventLoopGroup;
+private ExecutorService executor;
+
+// defaults
+private boolean supportComplexTypes = true;
+private boolean isDirectConnection = false;
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client.
+ *
+ * @param drillConfig drill configuration
+ * @return this builder
+ */
+public Builder setConfig(DrillConfig drillConfig) {
+  this.config = drillConfig;
+  return this;
+}
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client based on the given file.
+ *
+ * @param fileName configuration file name
+ * @return this builder
+ */
+public Builder setConfigFromFile(final String fileName) {
+  this.config = DrillConfig.create(fileName);
+  return this;
+}
+
+/**
+ * Sets the {@link BufferAllocator buffer allocator} to be used by this client.
+ * If this is not set, an allocator will be created based on the configuration.
+ *
+ * If this is set, the caller is responsible for closing the given allocator.
+ *
+ * @param allocator buffer allocator
+ * @return this builder
+ */
+public Builder setAllocator(final BufferAllocator allocator) {
+  this.allocator = allocator;
+  return this;
+}
+
+/**
+ * Sets the {@link ClusterCoordinator cluster coordinator} that this client
+ * registers with. If this is not set and this client does not use a
+ * {@link #setDirectConnection direct connection}, a cluster coordinator will
+ * be created based on the configuration.
+ *
+ * If this is set, the caller is responsible for closing the given coordinator.
+ *
+ * @param clusterCoordinator cluster coordinator
+ * @return this builder
+ */
+public Builder setClusterCoordinator(final ClusterCoordinator clusterCoordinator) {
+  this.clusterCoordinator = clusterCoordinator;
+  return this;
+}
+
+/**
+ * Sets the event loop group to be used by the client. If this is not set,
+ * an event loop group will be created based on the configuration.
+ *
+ * If this is set, the caller is responsible for closing the given event loop group.
+ *
+ * @param eventLoopGroup event loop group
+ * @return this builder
+ */
+public Builder setEventLoopGroup(final EventLoopGroup eventLoopGroup) {
+  this.eventLoopGroup = eventLoopGroup;
+  return this;
+}
+
+/**
+ * Sets the executor service to be used by the client. If this is not set,
+ * an executor will be created based on the configuration.
+ *
+ * If this is set, the caller is responsible for closing the given executor.
+ *
+ * @param executor executor service
+ * @return this builder
+ */
+public Builder setExecutorService(final ExecutorService executor) {
+  this.executor = executor;
+  return this;
+}
+
+/**
+ * Sets whether the application is willing to accept complex types (Map, Arrays)
--- End diff --

Default value?



[jira] [Commented] (DRILL-4841) Use user server event loop group for web clients

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553346#comment-15553346
 ] 

ASF GitHub Bot commented on DRILL-4841:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82266114
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -688,4 +802,142 @@ public DrillBuf getBuffer() {
   return null;
 }
   }
+
+  /**
+   * Return a new {@link DrillClient.Builder Drill client builder}.
+   * @return a new builder
+   */
+  public static Builder newBuilder() {
+return new Builder();
+  }
+
+  /**
+   * Helper class to construct a {@link DrillClient Drill client}.
+   */
+  public static class Builder {
+
+private DrillConfig config;
+private BufferAllocator allocator;
+private ClusterCoordinator clusterCoordinator;
+private EventLoopGroup eventLoopGroup;
+private ExecutorService executor;
+
+// defaults
+private boolean supportComplexTypes = true;
+private boolean isDirectConnection = false;
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client.
--- End diff --

Scenario? What is used by default? An empty config? One found on the class path?

What if I want to use the class-path (default) Drill config, but add my own customizations (as for tests)? Should we explain how to do this, or provide a method to help:

withExtraConfig( Config props )
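The overlay helper suggested above can be sketched as a small merge over plain maps. This is a hypothetical illustration; `withExtraConfig`, the property name, and the use of `Map` in place of a real `Config` are assumptions, not Drill's actual `DrillClient.Builder` API:

```java
import java.util.HashMap;
import java.util.Map;

public class BuilderSketch {
  // Hypothetical withExtraConfig: overlay caller-supplied properties on the
  // class-path defaults; caller entries shadow the defaults.
  public static Map<String, Object> withExtraConfig(Map<String, Object> defaults,
                                                    Map<String, Object> extra) {
    Map<String, Object> merged = new HashMap<>(defaults);
    merged.putAll(extra); // overrides win over the fallback values
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Object> defaults = new HashMap<>();
    defaults.put("drill.exec.rpc.user.timeout", 30); // illustrative property name

    Map<String, Object> extra = new HashMap<>();
    extra.put("drill.exec.rpc.user.timeout", 5);     // a test-only override

    Map<String, Object> merged = withExtraConfig(defaults, extra);
    System.out.println(merged.get("drill.exec.rpc.user.timeout")); // 5
  }
}
```

A builder could then keep the default config lookup internal and expose only this overlay hook, which is what the review comment is asking for.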


> Use user server event loop group for web clients
> 
>
> Key: DRILL-4841
> URL: https://issues.apache.org/jira/browse/DRILL-4841
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - HTTP
>Reporter: Sudheesh Katkam
>Assignee: Sorabh Hamirwasia
>Priority: Minor
>
> Currently we spawn an event loop group for handling requests from clients. 
> This group should also be used to handles responses (from server) for web 
> clients.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553104#comment-15553104
 ] 

ASF GitHub Bot commented on DRILL-4203:
---

Github user jaltekruse commented on a diff in the pull request:

https://github.com/apache/drill/pull/595#discussion_r82274813
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 ---
@@ -184,17 +201,15 @@ public static DateCorruptionStatus 
detectCorruptDates(ParquetMetadata footer,
   if (parsedCreatedByVersion.hasSemanticVersion()) {
 SemanticVersion semVer = 
parsedCreatedByVersion.getSemanticVersion();
 String pre = semVer.pre + "";
-if (semVer != null && semVer.major == 1 && semVer.minor == 8 
&& semVer.patch == 1 && pre.contains("drill")) {
--- End diff --

I don't know if you are guarding against a null value for semVer at a 
higher level, but I'm pretty sure the reason I added it is that older versions 
of parquet files can lack this field. I assume that with all of the test cases 
that were added we should be okay, but it might be better to keep it in 
defensively. If this field isn't set in the metadata, that shouldn't make the 
file invalid, but if we get an NPE it will stop query processing.


> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Assignee: Vitalii Diravka
>Priority: Critical
> Fix For: 1.9.0
>
>
> Hello,
> I have some problems when I try to read parquet files produced by Drill with 
> Spark; all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet tools, I found 
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should equal 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> 
> name:OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> 
> name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}
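A note on the observed value: 4881176 is exactly twice 2440588, the Julian day number of the Unix epoch (1970-01-01), which is consistent with a Julian-day offset having been applied twice on write. A minimal sketch of that arithmetic (the `corrupt` helper is illustrative, not Drill's code):

```java
public class CorruptDateCheck {
  // Julian day number of the Unix epoch (1970-01-01), a well-known constant.
  static final int JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH = 2440588;

  // Hypothesized corruption: a date stored as d days since the Unix epoch
  // comes back as d + 2 * 2440588 when the offset is applied twice.
  public static int corrupt(int daysSinceEpoch) {
    return daysSinceEpoch + 2 * JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH;
  }

  public static void main(String[] args) {
    // 1970-01-01 is day 0; the doubled offset reproduces the 4881176 seen
    // in the parquet-tools dump above.
    System.out.println(corrupt(0)); // 4881176
  }
}
```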





[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553103#comment-15553103
 ] 

ASF GitHub Bot commented on DRILL-4203:
---

Github user jaltekruse commented on a diff in the pull request:

https://github.com/apache/drill/pull/595#discussion_r82277733
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java
 ---
@@ -107,6 +107,7 @@ public ScanBatch getBatch(FragmentContext context, 
ParquetRowGroupScan rowGroupS
 boolean autoCorrectCorruptDates = 
rowGroupScan.formatConfig.autoCorrectCorruptDates;
 ParquetReaderUtility.DateCorruptionStatus containsCorruptDates = 
ParquetReaderUtility.detectCorruptDates(footers.get(e.getPath()), 
rowGroupScan.getColumns(),
 autoCorrectCorruptDates);
+logger.info(containsCorruptDates.toString());
--- End diff --

same as above, not a very useful logging statement.




[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553230#comment-15553230
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user bitblender commented on the issue:

https://github.com/apache/drill/pull/600
  
I can't see NullableFixedByteAlignedReaders.java. Shows up as a binary file.


> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Rahul Challapalli
>Assignee: Karthikeyan Manivannan
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a 
> hive table on top of the parquet file and use "timestamp" as the column type, 
> drill fails to read the hive table through the hive storage plugin





[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553256#comment-15553256
 ] 

ASF GitHub Bot commented on DRILL-4203:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/595#discussion_r82287507
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 ---
@@ -184,17 +201,15 @@ public static DateCorruptionStatus 
detectCorruptDates(ParquetMetadata footer,
   if (parsedCreatedByVersion.hasSemanticVersion()) {
 SemanticVersion semVer = 
parsedCreatedByVersion.getSemanticVersion();
 String pre = semVer.pre + "";
-if (semVer != null && semVer.major == 1 && semVer.minor == 8 
&& semVer.patch == 1 && pre.contains("drill")) {
--- End diff --

If `semVer` is null we would get an NPE above, where `semVer.pre` is 
accessed. That happened with tests on our regression test framework, which is 
why I added the `hasSemanticVersion()` check above. If that method returns 
`true`, `semVer` cannot be `null` according to `org.apache.parquet.SemanticVersion`, 
so the subsequent `semVer != null` check is redundant.
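The guard pattern being discussed can be illustrated with a minimal sketch. The `ParsedVersion` and `SemanticVersion` classes below are simplified stand-ins for parquet's real ones, but they show why a `hasSemanticVersion()` guard makes the later null check redundant:

```java
public class SemVerGuard {
  static class SemanticVersion {
    final int major, minor, patch;
    final String pre;
    SemanticVersion(int major, int minor, int patch, String pre) {
      this.major = major; this.minor = minor; this.patch = patch; this.pre = pre;
    }
  }

  static class ParsedVersion {
    private final SemanticVersion semVer; // may be null for older files
    ParsedVersion(SemanticVersion semVer) { this.semVer = semVer; }
    boolean hasSemanticVersion() { return semVer != null; }
    SemanticVersion getSemanticVersion() { return semVer; }
  }

  public static boolean isDrill181(ParsedVersion version) {
    if (version.hasSemanticVersion()) {   // guard: semVer is non-null past here
      SemanticVersion semVer = version.getSemanticVersion();
      String pre = semVer.pre + "";       // safe: no NPE possible after the guard
      return semVer.major == 1 && semVer.minor == 8 && semVer.patch == 1
          && pre.contains("drill");
    }
    return false; // missing version info does not make the file invalid
  }

  public static void main(String[] args) {
    System.out.println(isDrill181(new ParsedVersion(null))); // false, no NPE
    System.out.println(isDrill181(
        new ParsedVersion(new SemanticVersion(1, 8, 1, "drill-r0")))); // true
  }
}
```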




[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553257#comment-15553257
 ] 

ASF GitHub Bot commented on DRILL-4203:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/595#discussion_r82290736
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java
 ---
@@ -123,6 +123,7 @@ public ScanBatch getBatch(FragmentContext context, 
HiveDrillNativeParquetSubScan
   // in the first row group
   ParquetReaderUtility.DateCorruptionStatus containsCorruptDates =
   ParquetReaderUtility.detectCorruptDates(parquetMetadata, 
config.getColumns(), true);
+  logger.info(containsCorruptDates.toString());
--- End diff --

But it is not a regular Boolean value; `DateCorruptionStatus` is an enum 
with an overridden `toString` method. For example, this is how the log file looks 
after querying a Hive parquet file with correct date values:

`2016-10-06 23:32:30,598 [280920f1-e362-136e-0fdd-24779fef2c4a:frag:0:0] 
INFO  o.a.d.e.s.p.ParquetScanBatchCreator - It is determined from metadata that 
the date values are definitely CORRECT`

Is that enough?




[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553290#comment-15553290
 ] 

ASF GitHub Bot commented on DRILL-4203:
---

Github user jaltekruse commented on a diff in the pull request:

https://github.com/apache/drill/pull/595#discussion_r82291973
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java
 ---
@@ -123,6 +123,7 @@ public ScanBatch getBatch(FragmentContext context, 
HiveDrillNativeParquetSubScan
   // in the first row group
   ParquetReaderUtility.DateCorruptionStatus containsCorruptDates =
   ParquetReaderUtility.detectCorruptDates(parquetMetadata, 
config.getColumns(), true);
+  logger.info(containsCorruptDates.toString());
--- End diff --

Sounds good




[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553289#comment-15553289
 ] 

ASF GitHub Bot commented on DRILL-4203:
---

Github user jaltekruse commented on a diff in the pull request:

https://github.com/apache/drill/pull/595#discussion_r82292165
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 ---
@@ -184,17 +201,15 @@ public static DateCorruptionStatus 
detectCorruptDates(ParquetMetadata footer,
   if (parsedCreatedByVersion.hasSemanticVersion()) {
 SemanticVersion semVer = 
parsedCreatedByVersion.getSemanticVersion();
 String pre = semVer.pre + "";
-if (semVer != null && semVer.major == 1 && semVer.minor == 8 
&& semVer.patch == 1 && pre.contains("drill")) {
--- End diff --

Sorry, I should have looked through the rest of the changes. Thank you for 
fixing this.




[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553288#comment-15553288
 ] 

ASF GitHub Bot commented on DRILL-4203:
---

Github user jaltekruse commented on a diff in the pull request:

https://github.com/apache/drill/pull/595#discussion_r82292241
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java
 ---
@@ -107,6 +107,7 @@ public ScanBatch getBatch(FragmentContext context, 
ParquetRowGroupScan rowGroupS
 boolean autoCorrectCorruptDates = 
rowGroupScan.formatConfig.autoCorrectCorruptDates;
 ParquetReaderUtility.DateCorruptionStatus containsCorruptDates = 
ParquetReaderUtility.detectCorruptDates(footers.get(e.getPath()), 
rowGroupScan.getColumns(),
 autoCorrectCorruptDates);
+logger.info(containsCorruptDates.toString());
--- End diff --

Addressed in the other similar comment; this is good.




[jira] [Commented] (DRILL-4862) wrong results - use of convert_from(binary_string(key),'UTF8') in filter results in wrong results

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553319#comment-15553319
 ] 

ASF GitHub Bot commented on DRILL-4862:
---

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/604#discussion_r82294640
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java
 ---
@@ -1540,15 +1540,16 @@ public void eval() {
   public static class BinaryString implements DrillSimpleFunc {
 @Param  VarCharHolder in;
 @Output VarBinaryHolder out;
+@Inject DrillBuf buffer;
 
 @Override
 public void setup() {}
 
 @Override
 public void eval() {
-  out.buffer = in.buffer;
-  out.start = in.start;
-  out.end = 
org.apache.drill.common.util.DrillStringUtils.parseBinaryString(in.buffer, 
in.start, in.end);
+  out.buffer = buffer.reallocIfNeeded(in.end - in.start);
+  out.start = out.end = 0;
+  out.end = 
org.apache.drill.common.util.DrillStringUtils.parseBinaryString(in.buffer, 
in.start, in.end, out.buffer);
   out.buffer.setIndex(out.start, out.end);
--- End diff --

Do we still need to set the index?


> wrong results - use of convert_from(binary_string(key),'UTF8') in filter 
> results in wrong results
> -
>
> Key: DRILL-4862
> URL: https://issues.apache.org/jira/browse/DRILL-4862
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Gautam Kumar Parai
>
> These results do not look right, i.e. when the predicate has 
> convert_from(binary_string(key),'UTF8').
> Apache Drill 1.8.0-SNAPSHOT git commit ID: 57dc9f43
> {noformat}
> [root@centos-0x drill4478]# cat f1.json
> {"key":"\\x30\\x31\\x32\\x33"}
> {"key":"\\x34\\x35\\x36\\x37"}
> {"key":"\\x38\\x39\\x30\\x31"}
> {"key":"\\x30\\x30\\x30\\x30"}
> {"key":"\\x31\\x31\\x31\\x31"}
> {"key":"\\x35\\x35\\x35\\x35"}
> {"key":"\\x38\\x38\\x38\\x38"}
> {"key":"\\x39\\x39\\x39\\x39"}
> {"key":"\\x41\\x42\\x43\\x44"}
> {"key":"\\x45\\x46\\x47\\x48"}
> {"key":"\\x49\\x41\\x44\\x46"}
> {"key":"\\x4a\\x4b\\x4c\\x4d"}
> {"key":"\\x57\\x58\\x59\\x5a"}
> {"key":"\\x4e\\x4f\\x50\\x51"}
> {"key":"\\x46\\x46\\x46\\x46"}
> {noformat}
> results without the predicate - these are correct results
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(binary_string(key),'UTF8') 
> from `f1.json`;
> +-+
> | EXPR$0  |
> +-+
> | 0123|
> | 4567|
> | 8901|
> | |
> | |
> | |
> | |
> | |
> | ABCD|
> | EFGH|
> | IADF|
> | JKLM|
> | WXYZ|
> | NOPQ|
> | |
> +-+
> 15 rows selected (0.256 seconds)
> {noformat}
> results with a predicate - these results don't look correct
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(binary_string(key),'UTF8') 
> from `f1.json` where convert_from(binary_string(key),'UTF8') is not null;
> +--+
> |  EXPR$0  |
> +--+
> | 0123123  |
> | 4567567  |
> | 8901901  |
> | 000  |
> | 111  |
> | 555  |
> | 888  |
> | 999  |
> | ABCDBCD  |
> | EFGHFGH  |
> | IADFADF  |
> | JKLMKLM  |
> | WXYZXYZ  |
> | NOPQOPQ  |
> | FFF  |
> +--+
> 15 rows selected (0.279 seconds)
> {noformat}
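One plausible reading of the "0123123"-style results: the old in-place `parseBinaryString` overwrote the input buffer, so when the expression is evaluated twice (once for the filter, once for the projection) the second pass re-parses an already partially decoded buffer. A minimal sketch reproducing that effect (plain `byte[]` stands in for Netty's `ByteBuf`, and the `\xNN` escape handling is an assumption about the format, not a copy of Drill's code):

```java
public class InPlaceParseBug {
  // In-place variant: decoded bytes are written back into the input array,
  // mangling it for any later evaluation over the same buffer.
  public static int parseInPlace(byte[] buf, int start, int end) {
    int dst = start;
    for (int i = start; i < end; i++) {
      byte b = buf[i];
      if (b == '\\' && end > i + 3 && buf[i + 1] == 'x') {
        int hi = Character.digit(buf[i + 2], 16);
        int lo = Character.digit(buf[i + 3], 16);
        b = (byte) ((hi << 4) + lo);
        i += 3; // skip the consumed escape sequence
      }
      buf[dst++] = b;
    }
    return dst;
  }

  public static void main(String[] args) {
    byte[] buf = "\\x30\\x31\\x32\\x33"
        .getBytes(java.nio.charset.StandardCharsets.US_ASCII);
    int end1 = parseInPlace(buf, 0, buf.length);  // first evaluation
    System.out.println(new String(buf, 0, end1)); // 0123
    int end2 = parseInPlace(buf, 0, buf.length);  // second evaluation re-reads
    System.out.println(new String(buf, 0, end2)); // 0123123
  }
}
```

The second pass copies the literal "0123" left by the first pass and then decodes the leftover tail of the original escapes, matching the duplicated suffixes in the output above.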





[jira] [Commented] (DRILL-4862) wrong results - use of convert_from(binary_string(key),'UTF8') in filter results in wrong results

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553317#comment-15553317
 ] 

ASF GitHub Bot commented on DRILL-4862:
---

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/604#discussion_r82293808
  
--- Diff: 
common/src/main/java/org/apache/drill/common/util/DrillStringUtils.java ---
@@ -160,10 +160,9 @@ private static void appendByte(StringBuilder result, 
byte b) {
*
* @return Index in the byte buffer just after the last written byte.
*/
-  public static int parseBinaryString(ByteBuf str, int strStart, int 
strEnd) {
-int length = (strEnd - strStart);
-int dstEnd = strStart;
-for (int i = strStart; i < strStart+length ; i++) {
+  public static int parseBinaryString(ByteBuf str, int strStart, int 
strEnd, ByteBuf out) {
+int dstEnd = 0;
+for (int i = strStart; i < strEnd ; i++) {
--- End diff --

Please change to `strEnd;`
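The out-of-place parse under review can be sketched with plain byte arrays standing in for Netty's `ByteBuf`. This is an illustration under assumptions, not `DrillStringUtils` itself; it decodes literal bytes and `\xNN` escapes into a separate output array and returns the index just after the last written byte:

```java
public class BinaryStringParse {
  // Out-of-place parse: the input is read-only, decoded bytes go to `out`.
  public static int parseBinaryString(byte[] str, int strStart, int strEnd, byte[] out) {
    int dstEnd = 0;
    for (int i = strStart; i < strEnd; i++) {
      byte b = str[i];
      if (b == '\\' && strEnd > i + 3 && str[i + 1] == 'x') {
        // decode the two hex digits following "\x"
        int hi = Character.digit(str[i + 2], 16);
        int lo = Character.digit(str[i + 3], 16);
        b = (byte) ((hi << 4) + lo);
        i += 3; // skip the consumed escape sequence
      }
      out[dstEnd++] = b;
    }
    return dstEnd; // index just after the last written byte
  }

  public static void main(String[] args) {
    byte[] in = "\\x30\\x31\\x32\\x33"
        .getBytes(java.nio.charset.StandardCharsets.US_ASCII);
    byte[] out = new byte[in.length]; // decoded output is never longer than input
    int end = parseBinaryString(in, 0, in.length, out);
    System.out.println(new String(out, 0, end,
        java.nio.charset.StandardCharsets.US_ASCII)); // 0123
  }
}
```

Because the input buffer is never modified, repeated evaluations of the same expression stay idempotent, which is the point of the fix.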




[jira] [Commented] (DRILL-4862) wrong results - use of convert_from(binary_string(key),'UTF8') in filter results in wrong results

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553318#comment-15553318
 ] 

ASF GitHub Bot commented on DRILL-4862:
---

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/604#discussion_r82293523
  
--- Diff: 
common/src/main/java/org/apache/drill/common/util/DrillStringUtils.java ---
@@ -160,10 +160,9 @@ private static void appendByte(StringBuilder result, 
byte b) {
*
--- End diff --

Please update the comment - it no longer does in-place parsing.


> wrong results - use of convert_from(binary_string(key),'UTF8') in filter 
> results in wrong results
> -
>
> Key: DRILL-4862
> URL: https://issues.apache.org/jira/browse/DRILL-4862
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Gautam Kumar Parai
>
> | 0123123  |
> | 4567567  |
> | 8901901  |
> | 000  |
> | 111  |
> | 555  |
> | 888  |
> | 999  |
> | ABCDBCD  |
> | EFGHFGH  |
> | IADFADF  |
> | JKLMKLM  |
> | WXYZXYZ  |
> | NOPQOPQ  |
> | FFF  |
> +--+
> 15 rows selected (0.279 seconds)
> {noformat}





[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553313#comment-15553313
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/600
  
@bitblender Sorry about this. Those were hidden `\u` symbols.
Fixed.


> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Rahul Challapalli
>Assignee: Karthikeyan Manivannan
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a 
> hive table on top of the parquet file and use "timestamp" as the column type, 
> drill fails to read the hive table through the hive storage plugin





[jira] [Assigned] (DRILL-4504) Create an event loop for each of [user, control, data] RPC components

2016-10-06 Thread Sorabh Hamirwasia (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia reassigned DRILL-4504:


Assignee: Sudheesh Katkam  (was: Sorabh Hamirwasia)

Assigning back to Sudheesh as discussed offline yesterday.

> Create an event loop for each of [user, control, data] RPC components
> -
>
> Key: DRILL-4504
> URL: https://issues.apache.org/jira/browse/DRILL-4504
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - RPC
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
>
> + Create an event loop group for each client-server pair (data, client and 
> user)
> Miscellaneous:
> + Move WorkEventBus from exec/rpc/control to exec/work





[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553046#comment-15553046
 ] 

ASF GitHub Bot commented on DRILL-4203:
---

Github user jaltekruse commented on a diff in the pull request:

https://github.com/apache/drill/pull/595#discussion_r82273282
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -918,18 +916,22 @@ public void setMax(Object max) {
 @JsonProperty public ConcurrentHashMap columnTypeInfo;
 @JsonProperty List files;
 @JsonProperty List directories;
-@JsonProperty String drillVersion;
--- End diff --

This sounds reasonable, having both is okay.


> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Assignee: Vitalii Diravka
>Priority: Critical
> Fix For: 1.9.0
>
>
> Hello,
> I have some problems when I try to read parquet files produced by Drill with
> Spark; all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet-tools, I found:
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equals to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> 
> name:OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> 
> name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}





[jira] [Commented] (DRILL-4726) Dynamic UDFs support

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552883#comment-15552883
 ] 

ASF GitHub Bot commented on DRILL-4726:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/574#discussion_r82262083
  
--- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
@@ -194,10 +194,12 @@ drill.exec: {
   udf: {
 retry-attempts: 5,
 directory: {
-  base: ${drill.home}"/"${drill.exec.zk.root}"/udf",
-  staging: ${drill.exec.udf.directory.base}"/staging",
-  registry: ${drill.exec.udf.directory.base}"/registry",
-  tmp: ${drill.exec.udf.directory.base}"/tmp"
+  base: ${drill.exec.zk.root}"/udf",
+  local: ${drill.exec.udf.directory.base}"/local",
--- End diff --

How is this used? The path given by local seems relative:

drill/udf

Maybe include a simple comment to describe each config setting.

Otherwise, I like how this has come together!


> Dynamic UDFs support
> 
>
> Key: DRILL-4726
> URL: https://issues.apache.org/jira/browse/DRILL-4726
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Paul Rogers
> Fix For: Future
>
>
> Allow registering UDFs without a restart of Drillbits.
> Design is described in document below:
> https://docs.google.com/document/d/1FfyJtWae5TLuyheHCfldYUpCdeIezR2RlNsrOTYyAB4/edit?usp=sharing
>  





[jira] [Commented] (DRILL-4726) Dynamic UDFs support

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552881#comment-15552881
 ] 

ASF GitHub Bot commented on DRILL-4726:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/574#discussion_r82251309
  
--- Diff: distribution/src/resources/drill-config.sh ---
@@ -324,6 +324,21 @@ if [ -n "$DRILL_CLASSPATH" ]; then
   CP="$CP:$DRILL_CLASSPATH"
 fi
 
+# If tmp dir is given, it must exist.
+if [ -n "$DRILL_TMP_DIR" ]; then
+  if [[ ! -d "$DRILL_TMP_DIR" ]]; then
+fatal_error "temporary dir does not exist:" $DRILL_TMP_DIR
--- End diff --

Capitalize "Temporary".


> Dynamic UDFs support
> 
>
> Key: DRILL-4726
> URL: https://issues.apache.org/jira/browse/DRILL-4726
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Paul Rogers
> Fix For: Future
>
>





[jira] [Commented] (DRILL-4726) Dynamic UDFs support

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552885#comment-15552885
 ] 

ASF GitHub Bot commented on DRILL-4726:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/574#discussion_r8225
  
--- Diff: distribution/src/resources/drill-config.sh ---
@@ -324,6 +324,21 @@ if [ -n "$DRILL_CLASSPATH" ]; then
   CP="$CP:$DRILL_CLASSPATH"
 fi
 
+# If tmp dir is given, it must exist.
--- End diff --

Please explain that DRILL_TMP_DIR is used for temporary storage of Dynamic 
UDF jars. (This comment helps folks understand that this is not the same as the 
tmp dir used for spill-to-disk...)


> Dynamic UDFs support
> 
>
> Key: DRILL-4726
> URL: https://issues.apache.org/jira/browse/DRILL-4726
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Paul Rogers
> Fix For: Future
>
>





[jira] [Commented] (DRILL-4726) Dynamic UDFs support

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552882#comment-15552882
 ] 

ASF GitHub Bot commented on DRILL-4726:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/574#discussion_r82252246
  
--- Diff: distribution/src/resources/sqlline.bat ---
@@ -114,6 +114,10 @@ if "test%DRILL_LOG_DIR%" == "test" (
   set DRILL_LOG_DIR=%DRILL_HOME%\log
 )
 
+if "test%DRILL_TMP_DIR%" == "test" (
+  set DRILL_TMP_DIR=\tmp
--- End diff --

On Windows, I think the standard is to use %TEMP%. There generally is no 
directory called "\tmp"; sometimes there is a "\temp". But %TEMP% is the 
accepted practice.


> Dynamic UDFs support
> 
>
> Key: DRILL-4726
> URL: https://issues.apache.org/jira/browse/DRILL-4726
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Paul Rogers
> Fix For: Future
>
>





[jira] [Commented] (DRILL-4726) Dynamic UDFs support

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552884#comment-15552884
 ] 

ASF GitHub Bot commented on DRILL-4726:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/574#discussion_r82259602
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionImplementationRegistry.java
 ---
@@ -378,38 +399,33 @@ private Path getLocalUdfDir() {
   }
 
   /**
-   * First tries to get drill conf directory value from system properties,
+   * First tries to get drill temporary directory value from system 
properties,
* if value is missing, checks environment properties.
* Throws exception is value is null.
-   * @return drill conf dir path
+   * @return drill temporary directory path
*/
-  private String getConfDir() {
-String drillConfDir = "DRILL_CONF_DIR";
-String value = System.getProperty(drillConfDir);
+  private String getTmpDir() {
--- End diff --

Can we be more forgiving here?

1. Use DRILL_TMP_DIR, if set.
2. Use a config file setting, if set.
3. Use Google's Files.createTempDir(), which "Atomically creates a new 
directory somewhere beneath the system's temporary directory (as defined by the 
java.io.tmpdir system property)".

For most users, the choice in 3 should work fine. It would only be folks 
who have special needs who would set one of the other two properties.

Then, we can use a TypeSafe trick to combine 1 and 2. Define the config 
property something like this:

drill.temp-dir: ${DRILL_TMP_DIR}

Now, you just have to check drill.temp-dir in your function. If it is not set, 
use the Files approach as a default.

The nice thing about the Files approach is that each Drillbit will have a 
different directory (if two happen to be running, with different ports, at the 
same time).

I wonder, however, does the Files temp directory get deleted on Drillbit 
exit?
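The three-step fallback suggested above can be sketched as follows. This is a hedged illustration only: `resolveTmpDir` and the `drill.tmp-dir` key are invented names for this sketch, not Drill's actual API or configuration.

```java
// Illustrative only: resolveTmpDir and the "drill.tmp-dir" key are invented.
// Fallback order: 1) DRILL_TMP_DIR environment variable, 2) config setting,
// 3) a fresh directory under java.io.tmpdir, unique per process so two
// Drillbits on one host (different ports) do not collide.
import java.io.IOException;
import java.nio.file.Files;
import java.util.Map;

public class TmpDirResolver {
  public static String resolveTmpDir(Map<String, String> env,
                                     Map<String, String> config) throws IOException {
    String fromEnv = env.get("DRILL_TMP_DIR");
    if (fromEnv != null) {
      return fromEnv;                       // 1. explicit environment override
    }
    String fromConf = config.get("drill.tmp-dir");
    if (fromConf != null) {
      return fromConf;                      // 2. config file setting
    }
    // 3. default: fresh directory beneath java.io.tmpdir
    return Files.createTempDirectory("drill-udf").toString();
  }
}
```

On the deletion question: a directory created under java.io.tmpdir is not removed automatically on JVM exit; something like a shutdown hook would have to delete it.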


> Dynamic UDFs support
> 
>
> Key: DRILL-4726
> URL: https://issues.apache.org/jira/browse/DRILL-4726
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Paul Rogers
> Fix For: Future
>
>





[jira] [Commented] (DRILL-4726) Dynamic UDFs support

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552886#comment-15552886
 ] 

ASF GitHub Bot commented on DRILL-4726:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/574#discussion_r82260717
  
--- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
@@ -45,7 +45,7 @@ drill.client: {
   supports-complex-types: true
 }
 
-drill.home: "/tmp"
+drill.dfs-home: "/tmp"
--- End diff --

On a standard HDFS setup, is there a /tmp folder? All the examples suggest 
that stuff often goes into "/user/something" such as "/user/drill".

Also, if "/tmp" behaves like a Linux /tmp, might something come along and 
clean up the UDFs during a long Drill run?


> Dynamic UDFs support
> 
>
> Key: DRILL-4726
> URL: https://issues.apache.org/jira/browse/DRILL-4726
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Paul Rogers
> Fix For: Future
>
>





[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553744#comment-15553744
 ] 

ASF GitHub Bot commented on DRILL-4373:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/600#discussion_r82314071
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 ---
@@ -45,4 +53,34 @@ public static int getIntFromLEBytes(byte[] input, int 
start) {
 }
 return out;
   }
+
+  /**
+   * Utilities for converting from parquet INT96 binary (impala, hive 
timestamp)
+   * to date time value. This utilizes the Joda library.
+   */
+  public static class NanoTimeUtils {
+
+public static final long NANOS_PER_DAY = TimeUnit.DAYS.toNanos(1);
+public static final long NANOS_PER_HOUR = TimeUnit.HOURS.toNanos(1);
+public static final long NANOS_PER_MINUTE = 
TimeUnit.MINUTES.toNanos(1);
+public static final long NANOS_PER_SECOND = 
TimeUnit.SECONDS.toNanos(1);
+public static final long NANOS_PER_MILLISECOND =  
TimeUnit.MILLISECONDS.toNanos(1);
+
+  /**
+   * @param binaryTimeStampValue
+   *  hive, impala timestamp values with nanoseconds precision
+   *  are stored in parquet Binary as INT96
+   *
+   * @return  the number of milliseconds since January 1, 1970, 00:00:00 
GMT
+   *  represented by @param binaryTimeStampValue .
+   */
+public static long getDateTimeValueFromBinary(Binary 
binaryTimeStampValue) {
+  NanoTime nt = NanoTime.fromBinary(binaryTimeStampValue);
+  int julianDay = nt.getJulianDay();
+  long nanosOfDay = nt.getTimeOfDayNanos();
+  return DateTimeUtils.fromJulianDay(julianDay-0.5d) + 
nanosOfDay/NANOS_PER_MILLISECOND;
--- End diff --

1. I would recommend not using Joda. Do the calculations directly, like in 
ConvertFromImpalaTimestamp. Joda uses non-standard, hence confusing, 
terminology: what Joda calls and uses as JulianDay is actually the Julian Date. 
It seems you have identified this discrepancy and adjusted for it by 
subtracting 0.5 from _julianDay_.

Note (I guess you have already figured this out): the actual code and the Joda 
code in the comment in ConvertFromImpalaTimestamp are inconsistent. It took me 
a day to figure out the reason behind this! A bug should be opened to delete 
the comment.

2. Can you please also leave a comment stating that 2440588 is the JDN for 
the Unix Epoch.

3. Please leave a comment stating that the order of the calls to get 
_julianDay_ and _nanosOfDay_ matters. You can do this by just stating how 
timestamps are stored in INT96, i.e. a 32-bit JDN followed by a 64-bit 
nanosOfDay.

4. Consistent (single or no) spacing for the binary operators (+, -, /) used 
here would be nice. Single spacing would be preferable.
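Done directly, without Joda, the conversion can be sketched as below. This assumes the parquet-mr NanoTime byte layout (8 bytes of little-endian time-of-day nanos followed by a 4-byte Julian Day Number) and uses 2440588, the JDN of the Unix epoch, as suggested above; it is a sketch, not the code under review.

```java
// Hedged sketch of a Joda-free INT96 timestamp conversion. Assumes the
// parquet-mr NanoTime layout: bytes 0-7 are time-of-day nanos (little-endian),
// bytes 8-11 are the Julian Day Number. 2440588 is the JDN of 1970-01-01.
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Int96Timestamp {
  private static final long JULIAN_DAY_OF_EPOCH = 2440588L;
  private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;
  private static final long NANOS_PER_MILLISECOND = 1_000_000L;

  public static long toEpochMillis(byte[] int96) {
    ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
    long nanosOfDay = buf.getLong();  // first 8 bytes of the INT96
    int julianDay = buf.getInt();     // last 4 bytes
    // Whole days since the epoch, plus the sub-day part in milliseconds.
    return (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY
        + nanosOfDay / NANOS_PER_MILLISECOND;
  }

  public static void main(String[] args) {
    byte[] epochMidnight = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
        .putLong(0L).putInt((int) JULIAN_DAY_OF_EPOCH).array();
    System.out.println(toEpochMillis(epochMidnight));  // 0
  }
}
```

Working in integer days and milliseconds sidesteps the fractional JulianDay/Julian Date confusion (the 0.5-day adjustment) entirely.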


> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Rahul Challapalli
>Assignee: Karthikeyan Manivannan
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>





[jira] [Assigned] (DRILL-4862) wrong results - use of convert_from(binary_string(key),'UTF8') in filter results in wrong results

2016-10-06 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai reassigned DRILL-4862:
-

Assignee: Chunhui Shi  (was: Gautam Kumar Parai)

+1. The changes look good.

> wrong results - use of convert_from(binary_string(key),'UTF8') in filter 
> results in wrong results
> -
>
> Key: DRILL-4862
> URL: https://issues.apache.org/jira/browse/DRILL-4862
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Chunhui Shi
>





[jira] [Commented] (DRILL-4862) wrong results - use of convert_from(binary_string(key),'UTF8') in filter results in wrong results

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553782#comment-15553782
 ] 

ASF GitHub Bot commented on DRILL-4862:
---

Github user gparai commented on the issue:

https://github.com/apache/drill/pull/604
  
+1


> wrong results - use of convert_from(binary_string(key),'UTF8') in filter 
> results in wrong results
> -
>
> Key: DRILL-4862
> URL: https://issues.apache.org/jira/browse/DRILL-4862
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Gautam Kumar Parai
>





[jira] [Commented] (DRILL-4862) wrong results - use of convert_from(binary_string(key),'UTF8') in filter results in wrong results

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553749#comment-15553749
 ] 

ASF GitHub Bot commented on DRILL-4862:
---

Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/604#discussion_r82315506
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java
 ---
@@ -1540,15 +1540,16 @@ public void eval() {
   public static class BinaryString implements DrillSimpleFunc {
 @Param  VarCharHolder in;
 @Output VarBinaryHolder out;
+@Inject DrillBuf buffer;
 
 @Override
 public void setup() {}
 
 @Override
 public void eval() {
-  out.buffer = in.buffer;
-  out.start = in.start;
-  out.end = 
org.apache.drill.common.util.DrillStringUtils.parseBinaryString(in.buffer, 
in.start, in.end);
+  out.buffer = buffer.reallocIfNeeded(in.end - in.start);
+  out.start = out.end = 0;
+  out.end = 
org.apache.drill.common.util.DrillStringUtils.parseBinaryString(in.buffer, 
in.start, in.end, out.buffer);
   out.buffer.setIndex(out.start, out.end);
--- End diff --

Yes, we need to set readerIndex and writerIndex. Otherwise, if another 
function consumes the output of this function, it will hit an error.


> wrong results - use of convert_from(binary_string(key),'UTF8') in filter 
> results in wrong results
> -
>
> Key: DRILL-4862
> URL: https://issues.apache.org/jira/browse/DRILL-4862
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Gautam Kumar Parai
>





[jira] [Commented] (DRILL-4862) wrong results - use of convert_from(binary_string(key),'UTF8') in filter results in wrong results

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553748#comment-15553748
 ] 

ASF GitHub Bot commented on DRILL-4862:
---

Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/604#discussion_r82315487
  
--- Diff: 
common/src/main/java/org/apache/drill/common/util/DrillStringUtils.java ---
@@ -160,10 +160,9 @@ private static void appendByte(StringBuilder result, 
byte b) {
*
--- End diff --

Done


> wrong results - use of convert_from(binary_string(key),'UTF8') in filter 
> results in wrong results
> -
>
> Key: DRILL-4862
> URL: https://issues.apache.org/jira/browse/DRILL-4862
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Gautam Kumar Parai
>





[jira] [Commented] (DRILL-3178) csv reader should allow newlines inside quotes

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553520#comment-15553520 ]

ASF GitHub Bot commented on DRILL-3178:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/593#discussion_r82304834
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextReader.java ---
@@ -231,33 +231,34 @@ private void parseQuotedValue(byte prev) throws IOException {
     final TextInput input = this.input;
     final byte quote = this.quote;
 
-    ch = input.nextChar();
+    try {
+      input.setMonitorForNewLine(false);
+      ch = input.nextChar();
 
-    while (!(prev == quote && (ch == delimiter || ch == newLine || isWhite(ch)))) {
-      if (ch != quote) {
-        if (prev == quote) { // unescaped quote detected
-          if (parseUnescapedQuotes) {
-            output.append(quote);
-            output.append(ch);
-            parseQuotedValue(ch);
-            break;
-          } else {
-            throw new TextParsingException(
-                context,
-                "Unescaped quote character '" + quote + "' inside quoted value of CSV field. To allow unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot parse CSV input.");
+      while (!(prev == quote && (ch == delimiter || ch == newLine || isWhite(ch)))) {
+        if (ch != quote) {
+          if (prev == quote) { // unescaped quote detected
+            if (parseUnescapedQuotes) {
+              output.append(quote);
+              output.append(ch);
+              parseQuotedValue(ch);
+              break;
+            } else {
+              throw new TextParsingException(context, "Unescaped quote character '" + quote + "' inside quoted value of CSV field. To allow unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot parse CSV input.");
+            }
           }
+          output.append(ch);
+          prev = ch;
+        } else if (prev == quoteEscape) {
+          output.append(quote);
+          prev = NULL_BYTE;
+        } else {
+          prev = ch;
         }
-        output.append(ch);
-        prev = ch;
-      } else if (prev == quoteEscape) {
-        output.append(quote);
-        prev = NULL_BYTE;
-      } else {
-        prev = ch;
+        ch = input.nextChar();
       }
-      ch = input.nextChar();
+    } finally {
--- End diff --

I see why it is done in the finally block. However, as noted above, I'm not sure that 
pushing this kind of flag into the getChar function is the optimal approach...


> csv reader should allow newlines inside quotes 
> ---
>
> Key: DRILL-3178
> URL: https://issues.apache.org/jira/browse/DRILL-3178
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.0.0
> Environment: Ubuntu Trusty 14.04.2 LTS
>Reporter: Neal McBurnett
>Assignee: F Méthot
> Fix For: Future
>
> Attachments: drill-3178.patch
>
>
> When reading a csv file which contains newlines within quoted strings, e.g. 
> via
> select * from dfs.`/tmp/q.csv`;
> Drill 1.0 says:
> Error: SYSTEM ERROR: com.univocity.parsers.common.TextParsingException:  
> Error processing input: Cannot use newline character within quoted string
> But many tools produce csv files with newlines in quoted strings.  Drill 
> should be able to handle them.
> Workaround: the csvquote program (https://github.com/dbro/csvquote) can 
> encode embedded commas and newlines, and even decode them later if desired.





[jira] [Commented] (DRILL-3178) csv reader should allow newlines inside quotes

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553522#comment-15553522 ]

ASF GitHub Bot commented on DRILL-3178:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/593#discussion_r82303401
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextReader.java ---
@@ -231,33 +231,34 @@ private void parseQuotedValue(byte prev) throws IOException {
     final TextInput input = this.input;
     final byte quote = this.quote;
 
-    ch = input.nextChar();
+    try {
+      input.setMonitorForNewLine(false);
--- End diff --

Seems an overly complex way to do the parsing. Is there any reason we want 
to capture the original newline character rather than the normalized one?

If we need to capture the original one, then a cleaner way to do that is to 
keep track of the start & end position of the current token (character), and 
provide a method to return that block as a string. Then, scan for a close 
quote, reading characters & special-casing any newlines.

If we want to include newlines in quoted strings sometimes, but not other 
times, then the check logic can be a bit more complex.

But, the proposed solution of making newlines not be newlines seems a bit 
odd...
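The alternative suggested above — track the start and end offsets of the current token and slice the raw block back out as a string — could be sketched roughly as follows. Every name here (QuotedSpanScanner, nextQuotedValue) is hypothetical, the sketch ignores escaped quotes, and it is not Drill's actual TextInput API:

```java
public class QuotedSpanScanner {
  private final byte[] buf;  // the raw input block
  private int pos;           // current scan position

  public QuotedSpanScanner(byte[] buf) {
    this.buf = buf;
    this.pos = 0;
  }

  /**
   * Scans a quoted value starting at 'pos' (which must point at the opening
   * quote) and returns the raw contents as a string, newlines included --
   * no per-character newline "monitoring" flag required.
   */
  public String nextQuotedValue(byte quote) {
    if (buf[pos] != quote) {
      throw new IllegalStateException("not at an opening quote");
    }
    int start = ++pos;                       // first byte after the opening quote
    while (pos < buf.length && buf[pos] != quote) {
      pos++;                                 // newlines pass through untouched
    }
    int end = pos;                           // offset of the closing quote
    if (pos < buf.length) {
      pos++;                                 // step past the closing quote
    }
    return new String(buf, start, end - start, java.nio.charset.StandardCharsets.UTF_8);
  }
}
```

Special-casing (original vs. normalized newlines, escaped quotes) would then live in this one scan loop rather than in the character-reading layer.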







[jira] [Commented] (DRILL-3178) csv reader should allow newlines inside quotes

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553521#comment-15553521 ]

ASF GitHub Bot commented on DRILL-3178:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/593#discussion_r82296690
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextInput.java ---
@@ -88,6 +88,11 @@
   private boolean endFound = false;
 
   /**
+   * Switch for enabling/disabling new line detection
--- End diff --

Explain a bit more? Presumably, we already "monitor" and "detect" new lines 
in some way. What, specifically, does this add? Presumably, it sets the mode to 
enable new line detection within quotes (the title of the Jira entry)?







[jira] [Commented] (DRILL-3178) csv reader should allow newlines inside quotes

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553523#comment-15553523 ]

ASF GitHub Bot commented on DRILL-3178:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/593#discussion_r82303872
  
--- Diff: exec/java-exec/src/test/resources/store/text/WithQuotedCrLf.tbl ---
@@ -0,0 +1,6 @@
+"a
+1"|a|a
+a|"a
+2"|a
+a|a|"a
+3"
--- End diff --

Is there an issue with git converting Windows-style newlines (\r\n) into 
Unix-style (\n) when this file is checked in & out? Will that mess up the test? 
Should the test generate this file to handle this particular special case?
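If CRLF preservation turns out to matter, one conventional guard (assuming the fixture stays checked in rather than being generated by the test) is a .gitattributes entry telling git never to rewrite the file's line endings:

```
# .gitattributes -- keep the bytes of the test fixture exactly as committed
exec/java-exec/src/test/resources/store/text/WithQuotedCrLf.tbl -text
```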







[jira] [Commented] (DRILL-4581) Various problems in the Drill startup scripts

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553546#comment-15553546 ]

ASF GitHub Bot commented on DRILL-4581:
---

Github user paul-rogers closed the pull request at:

https://github.com/apache/drill/pull/478


> Various problems in the Drill startup scripts
> -
>
> Key: DRILL-4581
> URL: https://issues.apache.org/jira/browse/DRILL-4581
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components:  Server
>Affects Versions: 1.6.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.8.0
>
>
> Noticed the following in drillbit.sh:
> 1) Comment: DRILL_LOG_DIR    Where log files are stored. PWD by default.
> Code: DRILL_LOG_DIR=/var/log/drill or, if it does not exist, $DRILL_HOME/log
> 2) Comment: DRILL_PID_DIR    The pid files are stored. /tmp by default.
> Code: DRILL_PID_DIR=$DRILL_HOME
> 3) Redundant checking of JAVA_HOME. drillbit.sh sources drill-config.sh which 
> checks JAVA_HOME. Later, drillbit.sh checks it again. The second check is 
> both unnecessary and prints a less informative message than the 
> drill-config.sh check. Suggestion: Remove the JAVA_HOME check in drillbit.sh.
> 4) Though drill-config.sh carefully checks JAVA_HOME, it does not export the 
> JAVA_HOME variable. Perhaps this is why drillbit.sh repeats the check? 
> Recommended: export JAVA_HOME from drill-config.sh.
> 5) Both drillbit.sh and the sourced drill-config.sh check DRILL_LOG_DIR and 
> set the default value. Drill-config.sh defaults to /var/log/drill, or if that 
> fails, to $DRILL_HOME/log. Drillbit.sh just sets /var/log/drill and does not 
> handle the case where that directory is not writable. Suggested: remove the 
> check in drillbit.sh.
> 6) Drill-config.sh checks the writability of the DRILL_LOG_DIR by touching 
> sqlline.log, but does not delete that file, leaving a bogus, empty client log 
> file on the drillbit server. Recommendation: use bash commands instead.
> 7) The implementation of the above check is a bit awkward. It has a fallback 
> case with somewhat awkward logic. Clean this up.
> 8) drillbit.sh, but not drill-config.sh, attempts to create /var/log/drill if 
> it does not exist. Recommended: decide on a single choice, implement it in 
> drill-config.sh.
> 9) drill-config.sh checks if $DRILL_CONF_DIR is a directory. If not, defaults 
> it to $DRILL_HOME/conf. This can lead to subtle errors. If I use
> drillbit.sh --config /misspelled/path
> where I mistype the path, I won't get an error, I get the default config, 
> which may not at all be what I want to run. Recommendation: if the value of 
> DRILL_CONF_DIR is passed into the script (as a variable or via --config), 
> then that directory must exist. Else, use the default.
> 10) drill-config.sh exports, but may not set, HADOOP_HOME. This may be left 
> over from the original Hadoop script that the Drill script was based upon. 
> Recommendation: export only in the case that HADOOP_HOME is set for cygwin.
> 11) Drill-config.sh checks JAVA_HOME and prints a big, bold error message to 
> stderr if JAVA_HOME is not set. Then, it checks the Java version and prints a 
> different message (to stdout) if the version is wrong. Recommendation: use 
> the same format (and stderr) for both.
> 12) Similarly, other Java checks later in the script produce messages to 
> stdout, not stderr.
> 13) Drill-config.sh searches $JAVA_HOME to find java/java.exe and verifies 
> that it is executable. The script then throws away what we just found. Then, 
> drill-bit.sh tries to recreate this information as:
> JAVA=$JAVA_HOME/bin/java
> This is wrong in two ways: 1) it ignores the actual java location and assumes 
> it, and 2) it does not handle the java.exe case that drill-config.sh 
> carefully worked out.
> Recommendation: export JAVA from drill-config.sh and remove the above line 
> from drillbit.sh.
> 14) drillbit.sh presumably takes extra arguments like this:
> drillbit.sh -Dvar0=value0 --config /my/conf/dir start -Dvar1=value1 
> -Dvar2=value2 -Dvar3=value3
> The -D bit allows the user to override config variables at the command line. 
> But, the scripts don't use the values.
> A) drill-config.sh consumes --config /my/conf/dir after consuming the leading 
> arguments:
> while [ $# -gt 1 ]; do
>   if [ "--config" = "$1" ]; then
> shift
> confdir=$1
> shift
> DRILL_CONF_DIR=$confdir
>   else
> # Presume we are at end of options and break
> break
>   fi
> done
> B) drill-bit.sh will discard the var1:
> startStopStatus=$1 <-- grabs "start"
> shift
> command=drillbit
> shift   <-- Consumes -Dvar1=value1
> C) Remaining values passed back into drillbit.sh:
> args=$@
> nohup $thiscmd internal_start $command $args
> D) Second invocation discards 

[jira] [Commented] (DRILL-4581) Various problems in the Drill startup scripts

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553545#comment-15553545 ]

ASF GitHub Bot commented on DRILL-4581:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/478
  
Replaced by later commit.



[jira] [Commented] (DRILL-4618) random numbers generator function broken

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-4618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553879#comment-15553879 ]

ASF GitHub Bot commented on DRILL-4618:
---

Github user Ben-Zvi commented on the issue:

https://github.com/apache/drill/pull/509
  
Tested and checked ... LGTM



> random numbers generator function broken
> 
>
> Key: DRILL-4618
> URL: https://issues.apache.org/jira/browse/DRILL-4618
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Boaz Ben-Zvi
>
> Filed this JIRA based on the bug description from Ted's email and the 
> discussion on the dev mailing list, for the record:
> I am trying to generate some random numbers. I have a large base file (foo);
> this is what I get:
> 0: jdbc:drill:>  select floor(1000*random()) as x, floor(1000*random()) as
> y, floor(1000*rand()) as z from (select * from maprfs.tdunning.foo) a limit
> 20;
> ++++
> |   x|   y|   z|
> ++++
> | 556.0  | 556.0  | 618.0  |
> | 564.0  | 564.0  | 618.0  |
> | 129.0  | 129.0  | 618.0  |
> | 48.0   | 48.0   | 618.0  |
> | 696.0  | 696.0  | 618.0  |
> | 642.0  | 642.0  | 618.0  |
> | 535.0  | 535.0  | 618.0  |
> | 440.0  | 440.0  | 618.0  |
> | 894.0  | 894.0  | 618.0  |
> | 24.0   | 24.0   | 618.0  |
> | 508.0  | 508.0  | 618.0  |
> | 28.0   | 28.0   | 618.0  |
> | 816.0  | 816.0  | 618.0  |
> | 717.0  | 717.0  | 618.0  |
> | 334.0  | 334.0  | 618.0  |
> | 978.0  | 978.0  | 618.0  |
> | 646.0  | 646.0  | 618.0  |
> | 787.0  | 787.0  | 618.0  |
> | 260.0  | 260.0  | 618.0  |
> | 711.0  | 711.0  | 618.0  |
> ++++
> On this page, https://drill.apache.org/docs/math-and-trig/, the rand
> function is described and random() is not. But it appears that rand()
> delivers a constant instead (although a different constant each time the
> query is run) and it appears that random() delivers the same value when
> used multiple times in each returned value.
> This seems very, very wrong.
> The fault does not seem to be related to my querying a table:
> 0: jdbc:drill:> select rand(), random(), random() from (values (1),(2),(3))
> x;
> +-+---+---+
> |   EXPR$0|EXPR$1 |EXPR$2 |
> +-+---+---+
> | 0.1347749257216052  | 0.36724556209765014   | 0.36724556209765014   |
> | 0.1347749257216052  | 0.006087161689924625  | 0.006087161689924625  |
> | 0.1347749257216052  | 0.09417099142512142   | 0.09417099142512142   |
> +-+---+---+
> For reference, postgres doesn't have rand() and does the right thing with
> random().
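The symptom described above is consistent with rand() being constant-folded (evaluated once per query) and a single random() evaluation being reused across expressions within a row. The difference between the two behaviours, illustrated in plain Java entirely outside Drill:

```java
import java.util.Random;

public class RandFolding {
  /**
   * Returns rows x 2 samples: column 0 simulates a constant-folded rand()
   * (evaluated once, then reused on every row); column 1 is re-evaluated
   * per row, which is what a volatile random function should do.
   */
  public static double[][] sample(long seed, int rows) {
    Random rng = new Random(seed);
    double folded = rng.nextDouble();   // evaluated once, like the buggy rand()
    double[][] out = new double[rows][2];
    for (int r = 0; r < rows; r++) {
      out[r][0] = folded;               // identical on every row
      out[r][1] = rng.nextDouble();     // fresh value per row
    }
    return out;
  }

  public static void main(String[] args) {
    for (double[] row : sample(42L, 3)) {
      System.out.println(row[0] + "  " + row[1]);
    }
  }
}
```

Column 0 reproduces the constant z column in the query output above; column 1 matches what the x and y columns should have looked like if they had been evaluated independently.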





[jira] [Commented] (DRILL-4653) Malformed JSON should not stop the entire query from progressing

2016-10-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15554155#comment-15554155 ]

ASF GitHub Bot commented on DRILL-4653:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/518
  
Ran some tests. The results look good. In particular, files with nested 
structures produced the correct results. Since it was the nested structure case 
that had me a bit worried, it looks like the code is good to go.

+1 (non-binding)


> Malformed JSON should not stop the entire query from progressing
> 
>
> Key: DRILL-4653
> URL: https://issues.apache.org/jira/browse/DRILL-4653
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.6.0
>Reporter: subbu srinivasan
> Fix For: Future
>
>
> Currently a Drill query terminates upon the first encounter of an invalid JSON 
> line. Drill should continue progressing after ignoring the bad records. 
> Something similar to a setting of (ignore.malformed.json) would help.
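What an ignore.malformed.json-style option would amount to, sketched in plain Java with a stand-in validity check (a real reader would invoke its JSON parser instead of the brace-balance heuristic below; nothing here is Drill's actual JSON reader):

```java
import java.util.ArrayList;
import java.util.List;

public class LenientJsonLines {
  /** Stand-in for a real JSON parse: accepts a line only if its braces and
   *  brackets balance. A real reader would call its JSON parser here. */
  static boolean looksParseable(String line) {
    int depth = 0;
    for (char c : line.toCharArray()) {
      if (c == '{' || c == '[') depth++;
      else if (c == '}' || c == ']') depth--;
      if (depth < 0) return false;      // closed more than was opened
    }
    return depth == 0 && !line.trim().isEmpty();
  }

  /**
   * Keeps the parseable records and counts the malformed ones instead of
   * failing the whole query -- the log-and-skip behaviour the option asks for.
   */
  public static List<String> readLenient(List<String> lines, int[] skipped) {
    List<String> good = new ArrayList<>();
    for (String line : lines) {
      if (looksParseable(line)) {
        good.add(line);
      } else {
        skipped[0]++;                   // record the skip rather than abort
      }
    }
    return good;
  }
}
```

The design question the option raises is the same as in other readers: skipped-record counts should be surfaced (in logs or query stats) so silent data loss is at least visible.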


