[jira] [Commented] (DRILL-4479) JsonReader should pick a less restrictive type when creating the default column

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186687#comment-15186687
 ] 

ASF GitHub Bot commented on DRILL-4479:
---

GitHub user amansinha100 opened a pull request:

https://github.com/apache/drill/pull/420

DRILL-4479: Use varchar for default column when all_text_mode is enabled.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amansinha100/incubator-drill DRILL-4479

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/420.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #420


commit c5b4aef5b35547561ea71ce880391429643a6ee0
Author: Aman Sinha 
Date:   2016-03-08T17:27:32Z

DRILL-4479: Use varchar for default column when all_text_mode is enabled.




> JsonReader should pick a less restrictive type when creating the default 
> column
> ---
>
> Key: DRILL-4479
> URL: https://issues.apache.org/jira/browse/DRILL-4479
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.5.0
>Reporter: Aman Sinha
> Attachments: mostlynulls.json
>
>
> This JIRA is related to DRILL-3806 but has a narrower scope, so I decided to 
> create a separate one. 
> The JsonReader has the method ensureAtLeastOneField() (see 
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java#L91)
>  which ensures that when no columns are found, an empty one is created; it 
> chooses to create a nullable int column.  One consequence is that queries of 
> the following type fail:
> {noformat}
> select c1 from dfs.`mostlynulls.json`;
> ...
> ...
> | null  |
> | null  |
> Error: DATA_READ ERROR: Error parsing JSON - You tried to write a VarChar 
> type when you are using a ValueWriter of type NullableIntWriterImpl.
> File  /Users/asinha/data/mostlynulls.json
> Record  4097
> {noformat}
> In this file the first 4096 rows have NULL values for c1 followed by rows 
> that have a valid string.  
> It would be useful for the Json reader to choose a less restrictive type such 
> as varchar in order to allow more types of queries to run.  
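
The failure mode described above can be modelled in a few lines. The following is only a rough sketch under stated assumptions: the class, enum, and method names are hypothetical and are not Drill's JsonReader API. It assumes a column whose first batch is all nulls receives a default type, and that a later value is rejected if it does not fit that type.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not Drill's JsonReader: a column that is all-null in
// the first batch gets a default type, and later values must fit that type.
class DefaultColumnTypeSketch {
    enum ColumnType { NULLABLE_INT, NULLABLE_VARCHAR }

    // Default type for an all-null column: varchar when all_text_mode is
    // enabled (the change proposed in the pull request), nullable int otherwise.
    static ColumnType defaultType(boolean allTextMode) {
        return allTextMode ? ColumnType.NULLABLE_VARCHAR : ColumnType.NULLABLE_INT;
    }

    // Returns the 1-based number of the first record that cannot be written
    // into a column of the given type, or -1 if every record fits.
    static int firstFailingRecord(List<Object> values, ColumnType type) {
        for (int i = 0; i < values.size(); i++) {
            Object value = values.get(i);
            if (value == null) {
                continue; // nulls fit any nullable column
            }
            // In this model a varchar column accepts any value as text,
            // while an int column accepts only integers.
            boolean fits = type == ColumnType.NULLABLE_VARCHAR || value instanceof Integer;
            if (!fits) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Object> c1 = Arrays.<Object>asList(null, null, "abc");
        System.out.println(firstFailingRecord(c1, defaultType(false)));
        System.out.println(firstFailingRecord(c1, defaultType(true)));
    }
}
```

With the nullable int default, the first string value is the record that fails, which mirrors the error at record 4097 in the report; with the varchar default, the same data is accepted.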



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4484) NPE when querying empty directory

2016-03-08 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim updated DRILL-4484:

Affects Version/s: (was: 1.6.0)
   1.5.0

> NPE when querying  empty directory 
> ---
>
> Key: DRILL-4484
> URL: https://issues.apache.org/jira/browse/DRILL-4484
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Deneche A. Hakim
> Fix For: 1.7.0
>
>
> {code}
> 0: jdbc:drill:drillbit=localhost> select count(*) from 
> dfs.`/drill/xyz/201604*`;
> Error: VALIDATION ERROR: null
> SQL Query null
> 0: jdbc:drill:drillbit=localhost> select count(*) from 
> dfs.`/drill/xyz/20160401`;
> Error: VALIDATION ERROR: null
> SQL Query null
> [Error Id: 87366a2d-fc90-42f3-a076-aed5efdd27cb on atsqa4-133.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:drillbit=localhost> select count(*) from 
> dfs.`/drill/xyz/20160401/`;
> Error: VALIDATION ERROR: null
> SQL Query null
> [Error Id: ac122243-488e-4fb8-b89f-dc01c7e5c63a on atsqa4-133.qa.lab:31010] 
> (state=,code=0)
> {code}
> {code}
> [Mon Mar 07 15:00:19 root@/drill/xyz ] # ls -lR
> .:
> total 5
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> drwxr-xr-x 2 root root   2 Feb 26 16:31 20160101
> drwxr-xr-x 2 root root   2 Feb 26 16:31 20160102
> drwxr-xr-x 2 root root   2 Feb 26 16:31 20160103
> drwxr-xr-x 2 root root   2 Feb 26 16:31 20160104
> drwxr-xr-x 2 root root   2 Feb 26 16:31 20160105
> drwxr-xr-x 2 root root   1 Feb 26 16:31 20160201
> drwxr-xr-x 2 root root   3 Feb 26 16:31 20160202
> drwxr-xr-x 2 root root   4 Feb 26 16:31 20160301
> drwxr-xr-x 2 root root   0 Feb 26 16:31 20160401
> ./20160101:
> total 1
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> ./20160102:
> total 1
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> ./20160103:
> total 1
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> ./20160104:
> total 1
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> ./20160105:
> total 1
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> ./20160201:
> total 0
> ./20160202:
> total 1
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> -rw-r--r-- 1 root root 395 Feb 26 16:31 1_0_0.parquet
> ./20160301:
> total 2
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> -rw-r--r-- 1 root root 395 Feb 26 16:31 1_0_0.parquet
> -rw-r--r-- 1 root root 395 Feb 26 16:31 2_0_0.parquet
> ./20160401:
> total 0
> {code}
> Hakim's analysis:
> {code}
> More details about the NPE, actually it's an IllegalArgumentException: what 
> happens is that during planning no file meets the wildcard selection and the 
> query should fail during planning with a "Table not found" message; instead, 
> execution starts and the scanners fail because no file was assigned to them
> {code}
> Drill version:
> {code}
> #Generated by Git-Commit-Id-Plugin
> #Mon Mar 07 19:38:24 UTC 2016
> git.commit.id.abbrev=a2fec78
> git.commit.user.email=adene...@gmail.com
> git.commit.message.full=DRILL-4457\: Difference in results returned by window 
> function over BIGINT data\n\nthis closes \#410\n
> git.commit.id=a2fec78695df979e240231cb9d32c7f18274a333
> git.commit.message.short=DRILL-4457\: Difference in results returned by 
> window function over BIGINT data
> git.commit.user.name=adeneche
> git.build.user.name=Unknown
> git.commit.id.describe=0.9.0-625-ga2fec78-dirty
> git.build.user.email=Unknown
> git.branch=master
> git.commit.time=07.03.2016 @ 17\:38\:42 UTC
> git.build.time=07.03.2016 @ 19\:38\:24 UTC
> git.remote.origin.url=https\://github.com/apache/drill
> {code}
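
The planning-time check that the analysis asks for can be sketched as follows. This is an illustration only, not Drill's planner code: the class and method names are hypothetical, and the directory layout is a small excerpt of the `ls -lR` listing above. The idea is to expand the wildcard to concrete files and fail immediately when the expansion is empty, rather than handing an empty file list to the scanners.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of validating a file selection before execution starts.
class EmptySelectionCheckSketch {
    // Collects the files of every directory whose name matches the glob.
    static List<String> expand(String glob, Map<String, List<String>> filesByDir) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> files = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : filesByDir.entrySet()) {
            if (matcher.matches(Paths.get(entry.getKey()))) {
                for (String file : entry.getValue()) {
                    files.add(entry.getKey() + "/" + file);
                }
            }
        }
        return files;
    }

    // Fails during "planning" when the selection resolves to no files at all.
    static List<String> selectOrFail(String glob, Map<String, List<String>> filesByDir) {
        List<String> files = expand(glob, filesByDir);
        if (files.isEmpty()) {
            throw new IllegalArgumentException("Table not found: " + glob);
        }
        return files;
    }

    // Two directories from the listing: one with three files, one empty.
    static Map<String, List<String>> sampleLayout() {
        Map<String, List<String>> dirs = new LinkedHashMap<>();
        dirs.put("20160301", Arrays.asList("0_0_0.parquet", "1_0_0.parquet", "2_0_0.parquet"));
        dirs.put("20160401", Collections.<String>emptyList());
        return dirs;
    }

    public static void main(String[] args) {
        System.out.println(selectOrFail("201603*", sampleLayout()));
        try {
            selectOrFail("201604*", sampleLayout());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that the wildcard `201604*` does match a directory here; the selection is empty only because that directory contains no files, which is the situation in the report.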





[jira] [Updated] (DRILL-4484) NPE when querying empty directory

2016-03-08 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim updated DRILL-4484:

Fix Version/s: 1.7.0

> NPE when querying  empty directory 
> ---
>
> Key: DRILL-4484
> URL: https://issues.apache.org/jira/browse/DRILL-4484
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.6.0
>Reporter: Victoria Markman
>Assignee: Deneche A. Hakim
> Fix For: 1.7.0
>
>


[jira] [Assigned] (DRILL-4484) NPE when querying empty directory

2016-03-08 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim reassigned DRILL-4484:
---

Assignee: Deneche A. Hakim

> NPE when querying  empty directory 
> ---
>
> Key: DRILL-4484
> URL: https://issues.apache.org/jira/browse/DRILL-4484
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.6.0
>Reporter: Victoria Markman
>Assignee: Deneche A. Hakim
> Fix For: 1.7.0
>
>


[jira] [Updated] (DRILL-4473) Removing trivial projects reveals bugs in handling of nonexistent columns in StreamingAggregate

2016-03-08 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-4473:
--
Assignee: Sean Hsuan-Yi Chu

> Removing trivial projects reveals bugs in handling of nonexistent columns in 
> StreamingAggregate
> ---
>
> Key: DRILL-4473
> URL: https://issues.apache.org/jira/browse/DRILL-4473
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Sean Hsuan-Yi Chu
>
> We see a couple of unit test failures when working with nonexistent columns once 
> DRILL-4467 is fixed. This is because trivial projects no longer protect 
> StreamingAggregate from nonexistent columns. This is likely due to an 
> incorrect check before throwing an Unsupported error. An unknown/ANY type 
> should probably be allowed when using sum/max/stddev.
> {code:title=Plan before DRILL-4467}
> VOLCANO:Physical Planning (71ms):
> ScreenPrel: rowcount = 1.0, cumulative cost = {464.1 rows, 2375.1 cpu, 0.0 
> io, 0.0 network, 0.0 memory}, id = 185
>   ProjectPrel(col1=[$0], col2=[$1], col3=[$2], col4=[$3], col5=[$4]): 
> rowcount = 1.0, cumulative cost = {464.0 rows, 2375.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 184
> StreamAggPrel(group=[{}], col1=[SUM($0)], col2=[SUM($1)], col3=[SUM($2)], 
> col4=[SUM($3)], col5=[SUM($4)]): rowcount = 1.0, cumulative cost = {464.0 
> rows, 2375.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 183
>   LimitPrel(offset=[0], fetch=[0]): rowcount = 1.0, cumulative cost = 
> {463.0 rows, 2315.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 182
> ProjectPrel(int_col=[$0], bigint_col=[$3], float4_col=[$4], 
> float8_col=[$1], interval_year_col=[$2]): rowcount = 463.0, cumulative cost = 
> {463.0 rows, 2315.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 181
>   ScanPrel(groupscan=[EasyGroupScan 
> [selectionRoot=classpath:/employee.json, numFiles=1, columns=[`int_col`, 
> `bigint_col`, `float4_col`, `float8_col`, `interval_year_col`], 
> files=[classpath:/employee.json]]]): rowcount = 463.0, cumulative cost = 
> {463.0 rows, 2315.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 160
> {code}
> {code:title=Plan after DRILL-4467}
> VOLCANO:Physical Planning (63ms):
> ScreenPrel: rowcount = 1.0, cumulative cost = {464.1 rows, 2375.1 cpu, 0.0 
> io, 0.0 network, 0.0 memory}, id = 151
>   ProjectPrel(col1=[$0], col2=[$1], col3=[$2], col4=[$3], col5=[$4]): 
> rowcount = 1.0, cumulative cost = {464.0 rows, 2375.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 150
> StreamAggPrel(group=[{}], col1=[SUM($0)], col2=[SUM($1)], col3=[SUM($2)], 
> col4=[SUM($3)], col5=[SUM($4)]): rowcount = 1.0, cumulative cost = {464.0 
> rows, 2375.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 149
>   LimitPrel(offset=[0], fetch=[0]): rowcount = 1.0, cumulative cost = 
> {463.0 rows, 2315.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 148
> ScanPrel(groupscan=[EasyGroupScan 
> [selectionRoot=classpath:/employee.json, numFiles=1, columns=[`int_col`, 
> `bigint_col`, `float4_col`, `float8_col`, `interval_year_col`], 
> files=[classpath:/employee.json]]]): rowcount = 463.0, cumulative cost = 
> {463.0 rows, 2315.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 141
> {code}
> Tests disabled in TestAggregateFunctions that refer to this bug show multiple 
> examples of this behavior.





[jira] [Closed] (DRILL-4323) Hive Native Reader : A simple count(*) throws Incoming batch has an empty schema error

2016-03-08 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli closed DRILL-4323.


Verified and automated

> Hive Native Reader : A simple count(*) throws Incoming batch has an empty 
> schema error
> --
>
> Key: DRILL-4323
> URL: https://issues.apache.org/jira/browse/DRILL-4323
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.5.0
>Reporter: Rahul Challapalli
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
> Fix For: 1.6.0
>
> Attachments: error.log
>
>
> git.commit.id.abbrev=3d0b4b0
> A simple count(*) query does not work when hive native reader is enabled
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from customer;
> +-+
> | EXPR$0  |
> +-+
> | 10  |
> +-+
> 1 row selected (3.074 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `store.hive.optimize_scan_with_native_readers` = true;
> +---++
> |  ok   |summary |
> +---++
> | true  | store.hive.optimize_scan_with_native_readers updated.  |
> +---++
> 1 row selected (0.2 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from customer;
> Error: SYSTEM ERROR: IllegalStateException: Incoming batch [#1341, 
> ProjectRecordBatch] has an empty schema. This is not allowed.
> Fragment 0:0
> [Error Id: 4c867440-0fd3-4eda-922f-0f5eadcb1463 on qa-node191.qa.lab:31010] 
> (state=,code=0)
> {code}
> Hive DDL for the table :
> {code}
> create table customer
> (
> c_customer_sk int,
> c_customer_id string,
> c_current_cdemo_sk int,
> c_current_hdemo_sk int,
> c_current_addr_sk int,
> c_first_shipto_date_sk int,
> c_first_sales_date_sk int,
> c_salutation string,
> c_first_name string,
> c_last_name string,
> c_preferred_cust_flag string,
> c_birth_day int,
> c_birth_month int,
> c_birth_year int,
> c_birth_country string,
> c_login string,
> c_email_address string,
> c_last_review_date string
> )
> STORED AS PARQUET
> LOCATION '/drill/testdata/customer'
> {code}
> Attached the log file with the stacktrace





[jira] [Commented] (DRILL-4491) FormatPluginOptionsDescriptor requires FormatPluginConfig fields to be public

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186430#comment-15186430
 ] 

ASF GitHub Bot commented on DRILL-4491:
---

Github user adityakishore commented on the pull request:

https://github.com/apache/drill/pull/418#issuecomment-194097775
  
Until I looked at the code, I was under the assumption that we were using 
Jackson to extract the serializable properties. We can, and should, definitely 
go that route.

The way the code currently works is that it iterates through all the table 
options and checks whether there is a Java field present in the corresponding 
FormatPluginConfig class. If it does find one, and this is why I say it is a 
bug in the current implementation, it makes it accessible 
(`setAccessible(true)`), implying that it is expected to work with non-public 
fields, and sets the value to the one passed as a parameter.


> FormatPluginOptionsDescriptor requires FormatPluginConfig fields to be public
> -
>
> Key: DRILL-4491
> URL: https://issues.apache.org/jira/browse/DRILL-4491
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Minor
> Fix For: 1.7.0
>
>
> The code uses {{getField()}}, which returns only the public fields, instead 
> of {{getDeclaredField()}}.
> {code:title=FormatPluginOptionsDescriptor.java:165|borderStyle=solid}
> Field field = pluginConfigClass.getField(paramDef.name);
> {code}
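
The difference between the two reflection calls can be shown with plain JDK classes. This is a minimal sketch: `SampleFormatConfig` and the helper methods are hypothetical stand-ins, not Drill classes. `Class.getField` only finds public fields, while `Class.getDeclaredField` finds fields of any visibility declared directly on the class (it does not search superclasses).

```java
import java.lang.reflect.Field;

// Hypothetical sketch contrasting getField() and getDeclaredField().
class FieldLookupSketch {
    // Stand-in for a format plugin config with one public and one private field.
    static class SampleFormatConfig {
        public String extension = "csv";
        private String delimiter = ",";
    }

    // True if getField() can see the named field, i.e. the field is public.
    static boolean foundByGetField(Class<?> type, String name) {
        try {
            type.getField(name);
            return true;
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    // Sets a field of any visibility declared directly on the object's class.
    static void setOption(Object config, String name, Object value) {
        try {
            Field field = config.getClass().getDeclaredField(name);
            field.setAccessible(true); // only needed for non-public fields
            field.set(config, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot set option: " + name, e);
        }
    }

    // Reads a field of any visibility declared directly on the object's class.
    static Object readOption(Object config, String name) {
        try {
            Field field = config.getClass().getDeclaredField(name);
            field.setAccessible(true);
            return field.get(config);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot read option: " + name, e);
        }
    }

    public static void main(String[] args) {
        SampleFormatConfig config = new SampleFormatConfig();
        System.out.println(foundByGetField(SampleFormatConfig.class, "extension"));
        System.out.println(foundByGetField(SampleFormatConfig.class, "delimiter"));
        setOption(config, "delimiter", "|");
        System.out.println(readOption(config, "delimiter"));
    }
}
```

Because `setAccessible(true)` is already called in the code under discussion, switching the lookup to `getDeclaredField()` would be consistent with the apparent intent of supporting non-public fields.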





[jira] [Comment Edited] (DRILL-4491) FormatPluginOptionsDescriptor requires FormatPluginConfig fields to be public

2016-03-08 Thread Aditya Kishore (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186425#comment-15186425
 ] 

Aditya Kishore edited comment on DRILL-4491 at 3/9/16 3:30 AM:
---

Until I looked at the code, I was under the assumption that we were using Jackson to 
extract the serializable properties. We can definitely go that route.

The way the code currently works is that it iterates through all the table options 
and checks whether there is a Java field present in the corresponding 
FormatPluginConfig class. If it does find one, and this is why I say it is a 
bug in the current implementation, it makes it accessible 
({{setAccessible(true)}}), implying that it is expected to work with non-public 
fields, and sets the value to the one passed as a parameter.


was (Author: adityakishore):
Until I looked at the code, I was under assumption that we are using Jackson to 
extract the serializable properties. We can definitely go that route.

The way code currently work is that it iterate 



[jira] [Commented] (DRILL-4491) FormatPluginOptionsDescriptor requires FormatPluginConfig fields to be public

2016-03-08 Thread Aditya Kishore (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186425#comment-15186425
 ] 

Aditya Kishore commented on DRILL-4491:
---

Until I looked at the code, I was under the assumption that we were using Jackson to 
extract the serializable properties. We can definitely go that route.

The way the code currently works is that it iterates 



[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186293#comment-15186293
 ] 

ASF GitHub Bot commented on DRILL-4482:
---

Github user StevenMPhillips commented on the pull request:

https://github.com/apache/drill/pull/419#issuecomment-194060258
  
+1


> Avro no longer selects data correctly from a sub-structure
> --
>
> Key: DRILL-4482
> URL: https://issues.apache.org/jira/browse/DRILL-4482
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Avro
>Affects Versions: 1.6.0
>Reporter: Stefán Baxter
>Assignee: Stefán Baxter
>Priority: Blocker
> Fix For: 1.6.0
>
>
> Parquet:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from 
> dfs.asa.`/processed/<>/transactions` as s limit 1;
> +----------------+
> | EXPR$0         |
> +----------------+
> | 87.55.171.210  |
> +----------------+
> 1 row selected (1.184 seconds)
> Avro:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from 
> dfs.asa.`/streaming/<>/transactions` as s limit 1;
> +---------+
> | EXPR$0  |
> +---------+
> | null    |
> +---------+
> 1 row selected (0.29 seconds)





[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186280#comment-15186280
 ] 

ASF GitHub Bot commented on DRILL-4482:
---

GitHub user jaltekruse opened a pull request:

https://github.com/apache/drill/pull/419

DRILL-4482: Avro subselection broken by 4382

This fix includes a number of test updates to ensure Avro files are being 
read correctly.

The branch includes 4441, which is on a different PR, but touched some of 
the same code, so I just based this fix on that branch.

The actual regression fix is in the AvroRecordReader: in the case of a 
Union, we should not be creating a child of the fieldSelection, which was 
properly done in the case with maps and records.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaltekruse/incubator-drill 4441-4482-avro-bugs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/419.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #419


commit 56048dae231a6c11c2650384da6893a8c1011fee
Author: Jason Altekruse 
Date:   2016-02-26T17:55:05Z

DRILL-4441: Fix varchar data read out of Avro filtering incorrectly due to 
metadata bug

The precision of the Varchar datatype was not being set, causing inconsistent
truncation of values to the default length of 1. Fixed the same issue with 
varbinary.

The test framework was previously taking a string as the baseline for a 
binary value, which cannot express all possible values. Fixed the test to 
instead use a byte array. This required updating the hive tests that were 
using the old method of specifying baselines with a String.

Fix cast to varbinary when reading from a data source with schema, needed 
for writing a test.

commit 15209ea07a41b0a7bdccb382950b5738bd229b18
Author: Jason Altekruse 
Date:   2016-03-08T22:16:03Z

DRILL-4482: Fix Avro nested field selection regression

Update some of the Avro tests to properly verify their results,
others still need to be fixed. These will be addressed in DRILL-4110.






[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186247#comment-15186247
 ] 

ASF GitHub Bot commented on DRILL-4474:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/406


> Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
> --
>
> Key: DRILL-4474
> URL: https://issues.apache.org/jira/browse/DRILL-4474
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.5.0
> Environment: m3.xlarge AWS instances ( 3 nodes)
> CentOS6.5 x64
>Reporter: Shankar
>Assignee: Jacques Nadeau
>Priority: Blocker
>
> {quote}
> * We are using Drill to retrieve business data from game analytics. 
> * We are running the queries below on a 50 GB table (Parquet).
> * We have found major inconsistencies in the results when we use the COUNT function.
> * Below are the queries, case by case, with their output. {color:blue}*Please 
> analyse them carefully for a clear understanding of the behaviour.*{color}
> * Please let me know how to resolve this (or whether an earlier JIRA has 
> already been created). 
> * We hope this can be fixed in a later version. If not, please do the needful.
> {quote}
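
For reference when reading the cases in this report: in standard SQL, COUNT(expr) counts only the rows where expr is non-NULL, and a CASE without an ELSE branch yields NULL for non-matching rows. A conditional count can therefore never exceed COUNT(sessionid) over the same rows. The sketch below restates those semantics in Java; the `Row` class and the sample data are hypothetical and only illustrate the expected relationship between the three aggregates.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of the SQL aggregate semantics used in the report.
class ConditionalCountSketch {
    static final class Row {
        final String id;
        final String event;
        final String sessionId;

        Row(String id, String event, String sessionId) {
            this.id = id;
            this.event = event;
            this.sessionId = sessionId;
        }
    }

    // CASE WHEN id = '/confirmDrop/btnYes/' AND event = 'Click' THEN sessionid END
    static String caseValue(Row row) {
        boolean matches = "/confirmDrop/btnYes/".equals(row.id) && "Click".equals(row.event);
        return matches ? row.sessionId : null;
    }

    // COUNT(sessionid): rows with a non-null sessionid.
    static long countSessionId(List<Row> rows) {
        return rows.stream().map(r -> r.sessionId).filter(Objects::nonNull).count();
    }

    // COUNT(CASE ... END): rows where the CASE expression is non-null.
    static long countConditional(List<Row> rows) {
        return rows.stream().map(ConditionalCountSketch::caseValue).filter(Objects::nonNull).count();
    }

    // COUNT(DISTINCT CASE ... END): distinct non-null CASE values.
    static long countDistinctConditional(List<Row> rows) {
        return rows.stream().map(ConditionalCountSketch::caseValue).filter(Objects::nonNull).distinct().count();
    }

    static List<Row> sampleRows() {
        return Arrays.asList(
            new Row("/confirmDrop/btnYes/", "Click", "s1"),
            new Row("/confirmDrop/btnYes/", "Click", "s1"),
            new Row("/confirmDrop/btnYes/", "View", "s2"),
            new Row("/home/", "Click", "s3"),
            new Row("/confirmDrop/btnYes/", "Click", null));
    }

    public static void main(String[] args) {
        List<Row> rows = sampleRows();
        System.out.println(countSessionId(rows));
        System.out.println(countConditional(rows));
        System.out.println(countDistinctConditional(rows));
    }
}
```
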
> --
> CASE-1 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +-----------+
> |   count   |
> +-----------+
> | 27645752  |
> +-----------+
> 1 row selected (0.281 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-2 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(sessionid), 
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +-----------+-------+
> |  EXPR$0   |  cnt  |
> +-----------+-------+
> | 37772844  | 2108  |
> +-----------+-------+
> 1 row selected (12.597 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-3 (Wrong result, only first count is correct)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(distinct sessionid), 
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +---------+-----------+
> | EXPR$0  |    cnt    |
> +---------+-----------+
> | 201941  | 37772844  |
> +---------+-----------+
> 1 row selected (8.259 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-4 (Correct result)
> --
> {color:green}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(distinct case when t.id = '/confirmDrop/btnYes/' and 
> t.event = 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +------+
> | cnt  |
> +------+
> | 525  |
> +------+
> 1 row selected (14.318 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-5 (Correct result)
> --
> {color:green}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(sessionid),
> . . . . . . . 

[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186246#comment-15186246
 ] 

ASF GitHub Bot commented on DRILL-4474:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/416


> Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
> --
>
>                 Key: DRILL-4474
>                 URL: https://issues.apache.org/jira/browse/DRILL-4474
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.5.0
>         Environment: m3.xlarge AWS instances (3 nodes)
> CentOS6.5 x64
>            Reporter: Shankar
>            Assignee: Jacques Nadeau
>            Priority: Blocker
>
> {quote}
> * We are using Drill to retrieve business data from game analytics. 
> * We are running the queries below on a 50 GB table (Parquet).
> * We have found major inconsistencies in the data when we use the COUNT function.
> * Below are the queries, case by case, with their output. {color:blue}*Please 
> analyse them carefully for a clear understanding of the behaviour.*{color}
> * Please let me know how to resolve this (or whether an earlier JIRA has 
> already been created). 
> * I hope this may be fixed in later versions. If not, please do the needful.
> {quote}
> --
> CASE-1 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +-----------+
> |   count   |
> +-----------+
> | 27645752  |
> +-----------+
> 1 row selected (0.281 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-2 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(sessionid), 
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +-----------+-------+
> |  EXPR$0   |  cnt  |
> +-----------+-------+
> | 37772844  | 2108  |
> +-----------+-------+
> 1 row selected (12.597 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-3 (Wrong result, only first count is correct)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(distinct sessionid), 
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +---------+-----------+
> | EXPR$0  |    cnt    |
> +---------+-----------+
> | 201941  | 37772844  |
> +---------+-----------+
> 1 row selected (8.259 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-4 (Correct result)
> --
> {color:green}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(distinct case when t.id = '/confirmDrop/btnYes/' and 
> t.event = 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +------+
> | cnt  |
> +------+
> | 525  |
> +------+
> 1 row selected (14.318 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-5 (Correct result)
> --
> {color:green}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(sessionid),
> . . . . . . . 

[jira] [Commented] (DRILL-4491) FormatPluginOptionsDescriptor requires FormatPluginConfig fields to be public

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186220#comment-15186220
 ] 

ASF GitHub Bot commented on DRILL-4491:
---

GitHub user adityakishore opened a pull request:

https://github.com/apache/drill/pull/418

DRILL-4491: FormatPluginOptionsDescriptor requires FormatPluginConfig…

… fields to be public

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adityakishore/drill DRILL-4491

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/418.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #418


commit cce8467c2476da871891bad7db6cab3236537f7c
Author: Aditya Kishore 
Date:   2016-03-09T00:49:55Z

DRILL-4491: FormatPluginOptionsDescriptor requires FormatPluginConfig 
fields to be public




> FormatPluginOptionsDescriptor requires FormatPluginConfig fields to be public
> -
>
> Key: DRILL-4491
> URL: https://issues.apache.org/jira/browse/DRILL-4491
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Minor
> Fix For: 1.7.0
>
>
> The code uses {{getField()}}, which returns only public fields, instead of 
> {{getDeclaredField()}}.
> {code:title=FormatPluginOptionsDescriptor.java:165|borderStyle=solid}
> Field field = pluginConfigClass.getField(paramDef.name);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186213#comment-15186213
 ] 

ASF GitHub Bot commented on DRILL-4474:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/416#discussion_r55455931
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java
 ---
@@ -117,6 +117,24 @@ public void onMatch(RelOptRuleCall call) {
   } else if (aggCall.getArgList().size() == 1) {
   // count(columnName) ==> Agg ( Scan )) ==> columnValueCount
 int index = aggCall.getArgList().get(0);
+
+if (proj != null) {
+  // project in the middle of Agg and Scan : Only when input of AggCall is a RexInputRef in Project, we find the index of Scan's field.
+  // For instance,
+  // Agg - count($0)
+  //  \
+  //  Proj - Exp={$1}
+  //\
+  //   Scan (col1, col2).
+  // return count of "col2" in Scan's metadata, if found.
+
+  if (proj.getProjects().get(index) instanceof RexInputRef) {
+index = ((RexInputRef) proj.getProjects().get(index)).getIndex();
--- End diff --

Makes sense. Let me add more unit tests to the patch. 



[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186216#comment-15186216
 ] 

ASF GitHub Bot commented on DRILL-4474:
---

Github user amansinha100 commented on the pull request:

https://github.com/apache/drill/pull/416#issuecomment-194042039
  
Overall, LGTM.  +1



[jira] [Created] (DRILL-4491) FormatPluginOptionsDescriptor requires FormatPluginConfig fields to be public

2016-03-08 Thread Aditya Kishore (JIRA)
Aditya Kishore created DRILL-4491:
-

 Summary: FormatPluginOptionsDescriptor requires FormatPluginConfig 
fields to be public
 Key: DRILL-4491
 URL: https://issues.apache.org/jira/browse/DRILL-4491
 Project: Apache Drill
  Issue Type: Bug
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Minor


The code uses {{getField()}}, which returns only public fields, instead of 
{{getDeclaredField()}}.

{code:title=FormatPluginOptionsDescriptor.java:165|borderStyle=solid}
Field field = pluginConfigClass.getField(paramDef.name);
{code}
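
The difference between the two reflection calls can be illustrated with a small, self-contained sketch. SampleConfig and its field names are hypothetical stand-ins for a FormatPluginConfig implementation and are not taken from the Drill code base; only Class.getField() and Class.getDeclaredField() are real JDK methods.

```java
import java.lang.reflect.Field;

public class FieldLookupDemo {
    // Hypothetical stand-in for a FormatPluginConfig implementation;
    // the field names are illustrative only.
    public static class SampleConfig {
        public String extension = "csv";  // public: found by both lookups
        private String delimiter = ",";   // private: found by getDeclaredField() only
    }

    // True if Class.getField() resolves the name (public members only).
    public static boolean foundByGetField(Class<?> type, String name) {
        try {
            type.getField(name);
            return true;
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    // True if Class.getDeclaredField() resolves the name
    // (any visibility, but only fields declared on this class itself).
    public static boolean foundByGetDeclaredField(Class<?> type, String name) {
        try {
            Field field = type.getDeclaredField(name);
            return field != null;
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Class<?> type = SampleConfig.class;
        System.out.println("getField(extension): " + foundByGetField(type, "extension"));
        System.out.println("getField(delimiter): " + foundByGetField(type, "delimiter"));
        System.out.println("getDeclaredField(delimiter): " + foundByGetDeclaredField(type, "delimiter"));
    }
}
```

Note that getDeclaredField() does not search superclasses, and reading a non-public field afterwards normally requires setAccessible(true); whether the actual fix needs either of these is not stated in this issue.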





[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186209#comment-15186209
 ] 

ASF GitHub Bot commented on DRILL-4474:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/416#discussion_r55455592
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java
 ---
@@ -117,6 +117,24 @@ public void onMatch(RelOptRuleCall call) {
   } else if (aggCall.getArgList().size() == 1) {
   // count(columnName) ==> Agg ( Scan )) ==> columnValueCount
 int index = aggCall.getArgList().get(0);
+
+if (proj != null) {
+  // project in the middle of Agg and Scan : Only when input of AggCall is a RexInputRef in Project, we find the index of Scan's field.
+  // For instance,
+  // Agg - count($0)
+  //  \
+  //  Proj - Exp={$1}
+  //\
+  //   Scan (col1, col2).
+  // return count of "col2" in Scan's metadata, if found.
+
+  if (proj.getProjects().get(index) instanceof RexInputRef) {
+index = ((RexInputRef) proj.getProjects().get(index)).getIndex();
--- End diff --

It might be good to add a test case for that, in case Calcite changes this 
behavior in the future.



[jira] [Commented] (DRILL-4485) MapR profile - use MapR 5.1.0

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186175#comment-15186175
 ] 

ASF GitHub Bot commented on DRILL-4485:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/417


> MapR profile - use MapR 5.1.0
> -
>
> Key: DRILL-4485
> URL: https://issues.apache.org/jira/browse/DRILL-4485
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Tools, Build & Test
>Reporter: Patrick Wong
>Assignee: Parth Chandra
> Fix For: 1.6.0
>
>






[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186171#comment-15186171
 ] 

ASF GitHub Bot commented on DRILL-4474:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/416#discussion_r55454105
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java
 ---
@@ -117,6 +117,24 @@ public void onMatch(RelOptRuleCall call) {
   } else if (aggCall.getArgList().size() == 1) {
   // count(columnName) ==> Agg ( Scan )) ==> columnValueCount
 int index = aggCall.getArgList().get(0);
+
+if (proj != null) {
+  // project in the middle of Agg and Scan : Only when input of AggCall is a RexInputRef in Project, we find the index of Scan's field.
+  // For instance,
+  // Agg - count($0)
+  //  \
+  //  Proj - Exp={$1}
+  //\
+  //   Scan (col1, col2).
+  // return count of "col2" in Scan's metadata, if found.
+
+  if (proj.getProjects().get(index) instanceof RexInputRef) {
+index = ((RexInputRef) proj.getProjects().get(index)).getIndex();
--- End diff --

Calcite rewrites count(100) or count(1) into count(), so 
aggCall.getArgList().isEmpty() is true and line 113 takes care of those 
cases. 
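
The decision discussed in this review thread can be sketched roughly as follows. This is a simplified model, not the actual ConvertCountToDirectScan code: Expr, InputRef and OtherExpr are hypothetical stand-ins for Calcite's RexNode and RexInputRef, the returned strings are illustrative labels, and the "noPushdown" branch is an assumption based on the pull request title rather than on code shown in the diff.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CountPushdownSketch {
    // Hypothetical, simplified stand-ins for Calcite's RexNode / RexInputRef.
    public interface Expr {}

    public static final class InputRef implements Expr {
        final int index;
        public InputRef(int index) { this.index = index; }
    }

    // Any non-trivial projection expression, e.g. a CASE expression.
    public static final class OtherExpr implements Expr {}

    // argList: argument indexes of the COUNT call.
    // projects: expressions of the Project between Agg and Scan, or null if absent.
    public static String decide(List<Integer> argList, List<Expr> projects) {
        if (argList.isEmpty()) {
            return "rowCount";                 // count(*), or count(1) after the rewrite
        }
        if (argList.size() != 1) {
            return "noPushdown";
        }
        int index = argList.get(0);
        if (projects != null) {
            Expr expr = projects.get(index);
            if (!(expr instanceof InputRef)) {
                return "noPushdown";           // assumption: computed input, metadata count would be wrong
            }
            index = ((InputRef) expr).index;   // map the Project output back to the Scan column
        }
        return "columnValueCount(" + index + ")";
    }

    public static void main(String[] args) {
        // Agg - count($0) over Proj - Exp={$1} over Scan (col1, col2)
        List<Expr> passThrough = Arrays.<Expr>asList(new InputRef(1));
        System.out.println(decide(Arrays.asList(0), passThrough));
        // count(case when ... then sessionid end): the projected input is not a plain column reference
        List<Expr> computed = Arrays.<Expr>asList(new OtherExpr());
        System.out.println(decide(Arrays.asList(0), computed));
        // count(1) rewritten to count(): empty argument list
        System.out.println(decide(Collections.<Integer>emptyList(), null));
    }
}
```

In this model the CASE-1 query from the issue would fall into the "noPushdown" branch, which matches the intent described in the pull request title.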




[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186153#comment-15186153
 ] 

ASF GitHub Bot commented on DRILL-4474:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/416#discussion_r55453543
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java
 ---
@@ -117,6 +117,24 @@ public void onMatch(RelOptRuleCall call) {
   } else if (aggCall.getArgList().size() == 1) {
   // count(columnName) ==> Agg ( Scan )) ==> columnValueCount
 int index = aggCall.getArgList().get(0);
+
+if (proj != null) {
+  // project in the middle of Agg and Scan : Only when input of AggCall is a RexInputRef in Project, we find the index of Scan's field.
+  // For instance,
+  // Agg - count($0)
+  //  \
+  //  Proj - Exp={$1}
+  //\
+  //   Scan (col1, col2).
+  // return count of "col2" in Scan's metadata, if found.
+
+  if (proj.getProjects().get(index) instanceof RexInputRef) {
+index = ((RexInputRef) proj.getProjects().get(index)).getIndex();
--- End diff --

Doesn't this mean count(100) and count(1) still fail to push down?



[jira] [Commented] (DRILL-4485) MapR profile - use MapR 5.1.0

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186142#comment-15186142
 ] 

ASF GitHub Bot commented on DRILL-4485:
---

GitHub user pwong-mapr opened a pull request:

https://github.com/apache/drill/pull/417

DRILL-4485 - MapR profile - switch to MapR 5.1.0, and improve compatibility 
with maprfs storage format and MapR DB storage plugin



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pwong-mapr/incubator-drill DRILL-4485-4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/417.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #417


commit fc076488cb88e1071ef403c300a8681d0b9c584c
Author: Patrick Wong 
Date:   2016-03-08T02:22:08Z

DRILL-4485 - MapR profile - switch to MapR 5.1.0, and improve compatibility 
with maprfs storage format and MapR DB storage plugin






[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-08 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186144#comment-15186144
 ] 

Jinfeng Ni commented on DRILL-4474:
---

I submitted a new PR after putting a patch on top of Jacques's DRILL-4474 patch. 

https://github.com/apache/drill/pull/416

I completed a run of the pre-commit functional and unit tests.

[~amansinha100] or [~jnadeau], could you please review the new PR for 
DRILL-4474? Thanks!





[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186137#comment-15186137
 ] 

ASF GitHub Bot commented on DRILL-4474:
---

GitHub user jinfengni opened a pull request:

https://github.com/apache/drill/pull/416

DRILL-4474: Ensure that ConvertCountToDirectScan does not push through 
project when nullable input of count is not RexInputRef



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jinfengni/incubator-drill review/DRILL-4474

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/416.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #416


commit 0a5f8fab786f931665d9d28ea67cf19ab37c07fb
Author: Jacques Nadeau 
Date:   2016-03-04T21:27:26Z

DRILL-4474: Ensure that ConvertCountToDirectScan only pushes through 
project when project is trivial.

commit ab00e6aa9563d79e62154ba1f3bbb71dba7d8036
Author: Jinfeng Ni 
Date:   2016-03-08T22:15:27Z

DRILL-4474: Ensure that ConvertCountToDirectScan does not push through 
project when nullable input of count is not RexInputRef





[jira] [Commented] (DRILL-4485) MapR profile - use MapR 5.1.0

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186138#comment-15186138
 ] 

ASF GitHub Bot commented on DRILL-4485:
---

Github user pwong-mapr closed the pull request at:

https://github.com/apache/drill/pull/413


> MapR profile - use MapR 5.1.0
> -
>
> Key: DRILL-4485
> URL: https://issues.apache.org/jira/browse/DRILL-4485
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Tools, Build & Test
>Reporter: Patrick Wong
>Assignee: Parth Chandra
> Fix For: 1.6.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4490) Count(*) function returns as optional instead of required

2016-03-08 Thread Krystal (JIRA)
Krystal created DRILL-4490:
--

 Summary: Count(*) function returns as optional instead of required
 Key: DRILL-4490
 URL: https://issues.apache.org/jira/browse/DRILL-4490
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 1.6.0
Reporter: Krystal
Assignee: Sean Hsuan-Yi Chu


git.commit.id.abbrev=c8a7840

I have the following CTAS query:
create table test as select count(*) as col1 from cp.`tpch/orders.parquet`;

The schema of the test table shows col1 as optional:
message root {
  optional int64 col1;
}





[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186029#comment-15186029
 ] 

ASF GitHub Bot commented on DRILL-4474:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/406#discussion_r5563
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java
 ---
@@ -103,6 +104,10 @@ public void onMatch(RelOptRuleCall call) {
   return;
 }
 
+if (proj != null && !ProjectRemoveRule.isTrivial(proj)) {
--- End diff --

I have a patch that works fine for Jacques's new unit test. It continues 
to use directScan for simple count queries. The patch is pending pre-commit and 
unit test runs. I will update with results shortly. 



[jira] [Closed] (DRILL-4489) Add ValueVector tests from Drill

2016-03-08 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips closed DRILL-4489.
--
Resolution: Invalid

This jira should be in the Arrow project, not Drill

> Add ValueVector tests from Drill
> 
>
> Key: DRILL-4489
> URL: https://issues.apache.org/jira/browse/DRILL-4489
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>
> There are some simple ValueVector tests that should be included in the Arrow 
> project.





[jira] [Created] (DRILL-4489) Add ValueVector tests from Drill

2016-03-08 Thread Steven Phillips (JIRA)
Steven Phillips created DRILL-4489:
--

 Summary: Add ValueVector tests from Drill
 Key: DRILL-4489
 URL: https://issues.apache.org/jira/browse/DRILL-4489
 Project: Apache Drill
  Issue Type: Bug
Reporter: Steven Phillips


There are some simple ValueVector tests that should be included in the Arrow 
project.





[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure

2016-03-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185887#comment-15185887
 ] 

Stefán Baxter commented on DRILL-4482:
--

good news, thank you

On Tue, Mar 8, 2016 at 9:39 PM, Jason Altekruse (JIRA) 



> Avro no longer selects data correctly from a sub-structure
> --
>
> Key: DRILL-4482
> URL: https://issues.apache.org/jira/browse/DRILL-4482
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Avro
>Affects Versions: 1.6.0
>Reporter: Stefán Baxter
>Assignee: Stefán Baxter
>Priority: Blocker
> Fix For: 1.6.0
>
>
> Parquet:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from 
> dfs.asa.`/processed/<>/transactions` as s limit 1;
> ++
> | EXPR$0 |
> ++
> | 87.55.171.210  |
> ++
> 1 row selected (1.184 seconds)
> Avro:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from 
> dfs.asa.`/streaming/<>/transactions` as s limit 1;
> +-+
> | EXPR$0  |
> +-+
> | null|
> +-+
> 1 row selected (0.29 seconds)





[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure

2016-03-08 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185880#comment-15185880
 ] 

Jason Altekruse commented on DRILL-4482:


I definitely found and fixed the issue. The regression was introduced by 
DRILL-4382, but the tests were not written properly to catch the change. I am adding 
more tests now; a patch should be posted soon.



[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185820#comment-15185820
 ] 

ASF GitHub Bot commented on DRILL-4474:
---

Github user amansinha100 commented on the pull request:

https://github.com/apache/drill/pull/406#issuecomment-193971973
  
Agree with @jinfengni that the current fix can cause a performance regression 
for simpler count queries.  I will change my review to -1 and let's see how to 
get the proper nullability check. 


> Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
> --
>
> Key: DRILL-4474
> URL: https://issues.apache.org/jira/browse/DRILL-4474
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.5.0
> Environment: m3.xlarge AWS instances ( 3 nodes)
> CentOS6.5 x64
>Reporter: Shankar
>Assignee: Jacques Nadeau
>Priority: Blocker
>
> {quote}
> * We are using Drill to retrieve business data from game analytics. 
> * We are running the queries below on a 50GB table (parquet).
> * We have found some major inconsistencies in the data when we use the COUNT function.
> * Below are the case-by-case queries and their output. {color:blue}*Please 
> analyse them carefully for a clear understanding of the behaviour.*{color}
> * Please let me know how to resolve this (or whether an earlier JIRA has 
> already been created). 
> * Hope this may be fixed in later versions. If not, please do the needful.
> {quote}
> --
> CASE-1 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +---+
> |   count   |
> +---+
> | 27645752  |
> +---+
> 1 row selected (0.281 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-2 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(sessionid), 
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +---+---+
> |  EXPR$0   |  cnt  |
> +---+---+
> | 37772844  | 2108  |
> +---+---+
> 1 row selected (12.597 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-3 (Wrong result, only first count is correct)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(distinct sessionid), 
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +-+---+
> | EXPR$0  |cnt|
> +-+---+
> | 201941  | 37772844  |
> +-+---+
> 1 row selected (8.259 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-4 (Correct result)
> --
> {color:green}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(distinct case when t.id = '/confirmDrop/btnYes/' and 
> t.event = 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +--+
> | cnt  |
> +--+
> | 525  |
> +--+
> 1 row selected (14.318 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-5 (Correct result)
> 

[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185808#comment-15185808
 ] 

ASF GitHub Bot commented on DRILL-4474:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/406#discussion_r55428219
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java
 ---
@@ -103,6 +104,10 @@ public void onMatch(RelOptRuleCall call) {
   return;
 }
 
+if (proj != null && !ProjectRemoveRule.isTrivial(proj)) {
--- End diff --

With the patch, the following query will not use directScan. 

{code}
select count(*) from cp.`tpch/nation.parquet`;
{code}

{code}
00-00    Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {75.1 rows, 425.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 169
00-01      Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {75.0 rows, 425.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 168
00-02        StreamAgg(group=[{}], EXPR$0=[COUNT()]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {75.0 rows, 425.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 167
00-03          Project($f0=[0]) : rowType = RecordType(INTEGER $f0): rowcount = 25.0, cumulative cost = {50.0 rows, 125.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 166
00-04            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, usedMetadataFile=false, columns=[]]]) : rowType = RecordType(): rowcount = 25.0, cumulative cost = {25.0 rows, 25.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 165
{code}

I debugged a bit. Line 115 seems fine, but something is wrong in the code at 
lines 117-123. 




[jira] [Commented] (DRILL-4487) add unit test for DRILL-4449

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185760#comment-15185760
 ] 

ASF GitHub Bot commented on DRILL-4487:
---

Github user amansinha100 commented on the pull request:

https://github.com/apache/drill/pull/414#issuecomment-193960370
  
+1


> add unit test for DRILL-4449
> 
>
> Key: DRILL-4487
> URL: https://issues.apache.org/jira/browse/DRILL-4487
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Parquet
>Reporter: Deneche A. Hakim
>Assignee: Aman Sinha
> Fix For: 1.7.0
>
>
> now that we have a simple reproduction, we should add a unit test to make 
> sure we don't regress





[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185729#comment-15185729
 ] 

ASF GitHub Bot commented on DRILL-4474:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/406#discussion_r55422075
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java
 ---
@@ -103,6 +104,10 @@ public void onMatch(RelOptRuleCall call) {
   return;
 }
 
+if (proj != null && !ProjectRemoveRule.isTrivial(proj)) {
--- End diff --

We have a check for whether the input to count() is nullable at line 115. In 
theory, if the input is non-nullable, then count(non-nullable expression) = 
rowcount.

My guess is that the incorrect result for the query with the case expression is 
caused by wrong type resolution for the case expression. 
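
To make the nullability point concrete, here is a minimal, self-contained Java sketch of COUNT semantics. It is not Drill code: the `Row` and `CountSemantics` classes and the three sample rows are hypothetical, invented only for illustration. It shows why a CASE expression without an ELSE branch must be treated as nullable, so that its count cannot simply be replaced by the row count.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class CountSemantics {
  // SQL COUNT(expr): only rows where expr evaluates to a non-null value are counted.
  static long count(List<Row> rows, Function<Row, Object> expr) {
    long n = 0;
    for (Row row : rows) {
      if (expr.apply(row) != null) {
        n++;
      }
    }
    return n;
  }

  // CASE WHEN id = '/confirmDrop/btnYes/' AND event = 'Click' THEN sessionid END
  // With no ELSE branch, the expression is NULL for every non-matching row.
  static Object caseExpr(Row r) {
    return "/confirmDrop/btnYes/".equals(r.id) && "Click".equals(r.event) ? r.sessionId : null;
  }

  // Hypothetical sample data: three sessions, only one matching click.
  static List<Row> sampleRows() {
    return Arrays.asList(
        new Row("/confirmDrop/btnYes/", "Click", "s1"),
        new Row("/confirmDrop/btnNo/", "Click", "s2"),
        new Row("/home/", "View", "s3"));
  }

  public static void main(String[] args) {
    List<Row> rows = sampleRows();
    System.out.println("row count        = " + rows.size());
    System.out.println("count(sessionid) = " + count(rows, r -> r.sessionId));
    System.out.println("count(case ...)  = " + count(rows, CountSemantics::caseExpr));
  }
}

// Hypothetical stand-in for one row of the table in the report.
final class Row {
  final String id;
  final String event;
  final String sessionId;

  Row(String id, String event, String sessionId) {
    this.id = id;
    this.event = event;
    this.sessionId = sessionId;
  }
}
```

In this sketch, count(sessionid) equals the row count because no sessionId is null, while the count of the case expression is smaller. If the planner wrongly resolved the case expression as non-nullable, substituting the row count would produce the kind of inflated result reported in the issue.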

 



[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185724#comment-15185724
 ] 

ASF GitHub Bot commented on DRILL-4474:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/406#discussion_r55421789
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java
 ---
@@ -103,6 +104,10 @@ public void onMatch(RelOptRuleCall call) {
   return;
 }
 
+if (proj != null && !ProjectRemoveRule.isTrivial(proj)) {
--- End diff --

I feel that this check might rule out some optimization opportunities.  For 
example,

select count(100) 
from `parquetTable`;

In this case, count(100) is equal to the row count of the parquet table. However, 
the project is not a trivial project, meaning the new code will disable the 
optimization.
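
A rough sketch of the concern, in plain Java rather than Calcite types. It assumes that a "trivial" project means an identity projection (each output column i is just a reference to input column i), and it assumes the guarded branch in the diff bails out of the rule; the diff excerpt does not show the body. The `TrivialProjectGuard` class and its int-array encoding of a project are hypothetical.

```java
final class TrivialProjectGuard {
  // Hypothetical encoding: exprs[i] >= 0 is a reference to input column exprs[i];
  // NON_REF marks any other expression, such as the literal 100.
  static final int NON_REF = -1;

  // Assumed meaning of "trivial": the project passes every input column through unchanged.
  static boolean isTrivial(int[] exprs, int inputFieldCount) {
    if (exprs.length != inputFieldCount) {
      return false;
    }
    for (int i = 0; i < exprs.length; i++) {
      if (exprs[i] != i) {
        return false;
      }
    }
    return true;
  }

  // Mirrors the shape of the new guard: proj != null && !isTrivial(proj) disables the rewrite.
  static boolean directScanAllowed(int[] projectOrNull, int inputFieldCount) {
    return projectOrNull == null || isTrivial(projectOrNull, inputFieldCount);
  }

  public static void main(String[] args) {
    int[] identity = {0, 1};
    int[] literalOnly = {NON_REF}; // e.g. a project that only produces the literal 100
    System.out.println("no project:      " + directScanAllowed(null, 2));
    System.out.println("identity:        " + directScanAllowed(identity, 2));
    System.out.println("literal project: " + directScanAllowed(literalOnly, 2));
  }
}
```

Under these assumptions the literal-only project is rejected even though counting a non-null constant always equals the row count, which is the lost optimization described above.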




[jira] [Created] (DRILL-4488) Prefix "-" cause failure (NPE) in constant folding

2016-03-08 Thread Sean Hsuan-Yi Chu (JIRA)
Sean Hsuan-Yi Chu created DRILL-4488:


 Summary: Prefix "-" cause failure (NPE) in constant folding
 Key: DRILL-4488
 URL: https://issues.apache.org/jira/browse/DRILL-4488
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Sean Hsuan-Yi Chu
Assignee: Sean Hsuan-Yi Chu


For example, a query like this one:
{code}
SELECT -sqrt(5) as col
from cp.`tpch/nation.parquet`
{code}
gives an NPE. 

The cause is the translation of the prefix "-" to -1.





[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure

2016-03-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185672#comment-15185672
 ] 

Stefán Baxter commented on DRILL-4482:
--

Hi,

This is via dfs.

The relevant part of the schema is here:

{"name": "client_ip", "type": ["null",{"name":"ClientIPEntry",
"type":"record", "fields": [
   {"name": "ip", "type": "string"},
   {"name": "isp", "type": ["null","string"]},
   {"name": "postal_code", "type": ["null","string"]},
   {"name": "country_code", "type": ["null","string"]},
   {"name": "latitude", "type": ["null","double"]},
   {"name": "longitude", "type": ["null","double"]}
]}]},


- Stefán


On Tue, Mar 8, 2016 at 7:56 PM, Jason Altekruse (JIRA) 





[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure

2016-03-08 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185658#comment-15185658
 ] 

Jason Altekruse commented on DRILL-4482:


I think I may have reproduced the issue. Is this field in your dataset a Map or 
a Record in the Avro schema? I am seeing nulls in a case with maps, and I am trying 
to figure out the cause right now.

I will be improving our test coverage for Avro as a part of this change to make 
sure we don't have regressions like this in the future.



[jira] [Commented] (DRILL-4184) Drill does not support Parquet DECIMAL values in variable length BINARY fields

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185648#comment-15185648
 ] 

ASF GitHub Bot commented on DRILL-4184:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/372#discussion_r55417098
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableVarLengthValuesColumn.java
 ---
@@ -69,11 +73,16 @@ protected boolean readAndStoreValueSizeInformation() throws IOException {
 if ( currDefLevel == -1 ) {
   currDefLevel = pageReader.definitionLevels.readInteger();
 }
-if ( columnDescriptor.getMaxDefinitionLevel() > currDefLevel) {
+
+if (columnDescriptor.getMaxDefinitionLevel() > currDefLevel) {
   nullsRead++;
-  // set length of zero, each index in the vector defaults to null so no need to set the nullability
-  variableWidthVector.getMutator().setValueLengthSafe(
-  valuesReadInCurrentPass + pageReader.valuesReadyToRead, 0);
+  // set length of zero, each index in the vector defaults to null so no
+  // need to set the nullability
+  if (variableWidthVector == null) {

Regarding the two variables variableWidthVector and fixedWidthVector that I 
added, here is my reasoning.  Either variableWidthVector is set if we have a 
VariableWidthVector, or fixedWidthVector is set if we have a FixedWidthVector 
(i.e., decimal).  Hence, variableWidthVector is non-null if and only if we are 
to invoke the pre-existing logic, that assumed a variable width vector.  When 
variableWidthVector is null (fixedWidthVector is non-null, but not currently 
used), we invoke the new logic to save the length information in 
decimalLengths.  If this is no good, please tell me why, and suggest an 
alternative.
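
The dispatch described above can be sketched roughly as follows. This is only an illustration of the null-check pattern, not the code in the pull request: the `NullLengthRecorder` class, the `LengthSink` interface, and the choice of a `List<Integer>` for `decimalLengths` are all assumptions, as is recording a length of zero for a null value on the fixed-width path.

```java
import java.util.ArrayList;
import java.util.List;

final class NullLengthRecorder {
  // Hypothetical stand-in for the part of a variable-width vector that stores value lengths.
  interface LengthSink {
    void setValueLength(int index, int length);
  }

  // Null when the target vector is fixed width (the decimal case).
  private final LengthSink variableWidthVector;
  // Assumed container for lengths recorded on the fixed-width (decimal) path.
  private final List<Integer> decimalLengths = new ArrayList<>();

  NullLengthRecorder(LengthSink variableWidthVector) {
    this.variableWidthVector = variableWidthVector;
  }

  // Called when a null value is read at the given index.
  void recordNull(int index) {
    if (variableWidthVector == null) {
      // New path: remember the length separately instead of touching a vector.
      decimalLengths.add(0);
    } else {
      // Pre-existing path: write a zero length into the variable-width vector.
      variableWidthVector.setValueLength(index, 0);
    }
  }

  List<Integer> decimalLengths() {
    return decimalLengths;
  }

  public static void main(String[] args) {
    NullLengthRecorder fixed = new NullLengthRecorder(null);
    fixed.recordNull(0);
    System.out.println("lengths recorded on decimal path: " + fixed.decimalLengths().size());
  }
}
```

One design note: using a null field as the type switch keeps the diff small, but an explicit flag or two small strategy classes would make the two paths easier to tell apart for a reviewer.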


> Drill does not support Parquet DECIMAL values in variable length BINARY fields
> --
>
> Key: DRILL-4184
> URL: https://issues.apache.org/jira/browse/DRILL-4184
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.4.0
> Environment: Windows 7 Professional, Java 1.8.0_66
>Reporter: Dave Oshinsky
>
> Encoding a DECIMAL logical type in Parquet using the variable length BINARY 
> primitive type is not supported by Drill as of versions 1.3.0 and 1.4.0.  The 
> problem first surfaces with the ClassCastException shown below, but fixing 
> the immediate cause of the exception is not sufficient to support this 
> combination (DECIMAL, BINARY) in a Parquet file.
> In Drill, DECIMAL is currently assumed to be INT32, INT64, INT96, or 
> FIXED_LEN_BINARY_ARRAY.  Are there any plans to support DECIMAL with variable 
> length BINARY?  Avro definitely supports encoding DECIMAL in variable length 
> bytes (see https://avro.apache.org/docs/current/spec.html#Decimal), but this 
> support in Parquet is less clear.
> Selecting on a BINARY DECIMAL field in a parquet file throws an exception as 
> shown below (java.lang.ClassCastException: 
> org.apache.drill.exec.vector.Decimal28SparseVector cannot be cast to 
> org.apache.drill.exec.vector.VariableWidthVector).  The successful query at 
> bottom selected on a string field in the same file.
> 0: jdbc:drill:zk=local> select count(*) from 
> dfs.`c:/dao/DBArchivePredictor/tenrows.parquet` where acct_no=7020;
> org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet record reader.
> Message: Failure in setting up reader
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message sbi.acct_mstr {
>   required binary ACCT_NO (DECIMAL(20,0));
>   optional binary SF_NO (UTF8);
>   optional binary LF_NO (UTF8);
>   optional binary BRANCH_NO (DECIMAL(20,0));
>   optional binary INTRO_CUST_NO (DECIMAL(20,0));
>   optional binary INTRO_ACCT_NO (DECIMAL(20,0));
>   optional binary INTRO_SIGN (UTF8);
>   optional binary TYPE (UTF8);
>   optional binary OPR_MODE (UTF8);
>   optional binary CUR_ACCT_TYPE (UTF8);
>   optional binary TITLE (UTF8);
>   optional binary CORP_CUST_NO (DECIMAL(20,0));
>   optional binary APLNDT (UTF8);
>   optional binary OPNDT (UTF8);
>   optional binary VERI_EMP_NO (DECIMAL(20,0));
>   optional binary VERI_SIGN (UTF8);
>   optional binary MANAGER_SIGN (UTF8);
>   optional binary CURBAL (DECIMAL(8,2));
>   optional binary STATUS (UTF8);
> }
> , metadata: 
> {parquet.avro.schema={"type":"record","name":"acct_mstr","namespace"
> :"sbi","fields":[{"name":"ACCT_NO","type":{"type":"bytes","logicalType":"decimal
> ","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_co
> lumn_class":"java.math.BigDecimal","cv_connection":"oracle.jdbc.driver.T4CConnec
> 

[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure

2016-03-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185439#comment-15185439
 ] 

Stefán Baxter commented on DRILL-4482:
--

Still behaves the same.

These two queries read the *same records* from the *same Avro* file:

0: jdbc:drill:zk=local> select s.client_ip from
dfs.asa.`/streaming/venuepoint/transactions` as s limit 2;
+---+
|
+---+
{"ip":"77.106.147.165","postal_code":"2601","country_code":"NO","latitude":61.1151,"longitude":10.4663}
  |
{"ip":"unknown","postal_code":"unknown","country_code":"unknown","latitude":0.0,"longitude":0.0,"isp":"unknown"}
 |
+---+
2 rows selected (0.39 seconds)

0: jdbc:drill:zk=local> select s.client_ip.ip from
dfs.asa.`/streaming/venuepoint/transactions` as s limit 2;
+-+
+-+
+-+
2 rows selected (0.16 seconds)

Notice what happens when a reference to the sub-field is added.

Regards,
 -Stefán




> Avro no longer selects data correctly from a sub-structure
> --
>
> Key: DRILL-4482
> URL: https://issues.apache.org/jira/browse/DRILL-4482
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Avro
>Affects Versions: 1.6.0
>Reporter: Stefán Baxter
>Assignee: Stefán Baxter
>Priority: Blocker
> Fix For: 1.6.0
>
>
> Parquet:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from 
> dfs.asa.`/processed/<>/transactions` as s limit 1;
> ++
> | EXPR$0 |
> ++
> | 87.55.171.210  |
> ++
> 1 row selected (1.184 seconds)
> Avro:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from 
> dfs.asa.`/streaming/<>/transactions` as s limit 1;
> +-+
> | EXPR$0  |
> +-+
> | null|
> +-+
> 1 row selected (0.29 seconds)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185429#comment-15185429
 ] 

ASF GitHub Bot commented on DRILL-4474:
---

Github user amansinha100 closed the pull request at:

https://github.com/apache/drill/pull/415


> Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
> --
>
> Key: DRILL-4474
> URL: https://issues.apache.org/jira/browse/DRILL-4474
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.5.0
> Environment: m3.xlarge AWS instances ( 3 nodes)
> CentOS6.5 x64
>Reporter: Shankar
>Assignee: Jacques Nadeau
>Priority: Blocker
>
> {quote}
> * We are using Drill to retrieve business data from game analytics. 
> * We are running the queries below on a 50 GB Parquet table.
> * We have found major inconsistencies in the data when we use the COUNT function.
> * Below are the case-by-case queries and their output. {color:blue}*Please 
> analyse them carefully for a clear understanding of the behaviour.*{color}
> * Please let me know how to resolve this (or whether an earlier JIRA has 
> already been created). 
> * We hope this is fixed in a later version. If not, please fix it.
> {quote}
> --
> CASE-1 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +---+
> |   count   |
> +---+
> | 27645752  |
> +---+
> 1 row selected (0.281 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-2 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(sessionid), 
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +---+---+
> |  EXPR$0   |  cnt  |
> +---+---+
> | 37772844  | 2108  |
> +---+---+
> 1 row selected (12.597 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-3 (Wrong result, only first count is correct)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(distinct sessionid), 
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +-+---+
> | EXPR$0  |cnt|
> +-+---+
> | 201941  | 37772844  |
> +-+---+
> 1 row selected (8.259 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-4 (Correct result)
> --
> {color:green}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(distinct case when t.id = '/confirmDrop/btnYes/' and 
> t.event = 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +--+
> | cnt  |
> +--+
> | 525  |
> +--+
> 1 row selected (14.318 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-5 (Correct result)
> --
> {color:green}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(sessionid),
> . . . . 
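
For reference, the counts in the cases above can be checked against standard SQL COUNT semantics: COUNT(expr) counts only non-NULL values, and a CASE with no ELSE branch yields NULL when its predicate is false. The conditional count should therefore be the same in every query that uses it and can never exceed count(sessionid), yet the cases above report 27645752, 2108, and 37772844 for the same expression. The sketch below models those semantics in plain Java. The class, the Row type, and the method names are hypothetical and are not part of Drill.

```java
import java.util.List;

// Illustration of standard SQL COUNT semantics; all names are hypothetical.
final class CountSemantics {
    static final class Row {
        final String id;
        final String event;
        final String sessionid;

        Row(String id, String event, String sessionid) {
            this.id = id;
            this.event = event;
            this.sessionid = sessionid;
        }
    }

    // count(sessionid): counts rows whose sessionid is not NULL.
    static long countSessionId(List<Row> rows) {
        long n = 0;
        for (Row r : rows) {
            if (r.sessionid != null) {
                n++;
            }
        }
        return n;
    }

    // count(case when id = ? and event = ? then sessionid end):
    // the CASE yields NULL when the predicate is false, so those rows are skipped.
    static long countMatching(List<Row> rows, String id, String event) {
        long n = 0;
        for (Row r : rows) {
            boolean match = id.equals(r.id) && event.equals(r.event);
            if (match && r.sessionid != null) {
                n++;
            }
        }
        return n;
    }
}
```

For any input, countMatching(rows, ...) is at most countSessionId(rows), and it does not depend on which other aggregates appear in the same select list.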

[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185418#comment-15185418
 ] 

ASF GitHub Bot commented on DRILL-4474:
---

Github user amansinha100 commented on the pull request:

https://github.com/apache/drill/pull/415#issuecomment-193901238
  
oops ... sorry, closing this and will reopen against the correct JIRA. 



[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure

2016-03-08 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185392#comment-15185392
 ] 

Jason Altekruse commented on DRILL-4482:


[~acmeguy] Thanks for the quick response, I will continue to try to reproduce 
the failure.



[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure

2016-03-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185380#comment-15185380
 ] 

Stefán Baxter commented on DRILL-4482:
--

I can pull the latest master/head to verify whether this is still a problem.

I will let you know once that is done.



[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure

2016-03-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185371#comment-15185371
 ] 

Stefán Baxter commented on DRILL-4482:
--

Yes I'm sure.

I only included the limit there for the test. It returns null for all
values and the underlying data includes no nulls.



[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure

2016-03-08 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185364#comment-15185364
 ] 

Jason Altekruse commented on DRILL-4482:


[~acmeguy] I'm trying to reproduce this issue and am not seeing it on a small Avro 
file. There is no guarantee about read order when reading a directory, so 
running a limit 1 query over the same table in two formats (or even over the same 
list of files at two different times) is not guaranteed to give the same 
result. Is transactions a directory, or a file without an extension?

Are you sure that there are not null values in this column? Could you try to 
run a query with a predictable result like a max/min on the column or a limit 
with a sort? 

It is still possible that this is a Drill bug, and I will try with a 
distributed query to see if I can reproduce it, but if you have time to try to 
confirm any of these things it could help with creating a reproduction.
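
The point about read order can be illustrated with a small sketch. It models a directory scan as a list of values whose order is not guaranteed, and compares a "first row" read with a min over the same values. The class and method names are hypothetical, and the sample values only assume, for illustration, that some rows could hold nulls.

```java
import java.util.List;

// Illustration only: models a directory scan as a list of column values
// whose order is not guaranteed. Class and method names are hypothetical.
final class ReadOrderDemo {
    // Equivalent of "limit 1" with no sort: returns whatever is read first.
    static String firstRow(List<String> rowsInReadOrder) {
        return rowsInReadOrder.get(0);
    }

    // Equivalent of min(col): independent of read order, nulls are skipped.
    static String minValue(List<String> rowsInReadOrder) {
        String min = null;
        for (String row : rowsInReadOrder) {
            if (row != null && (min == null || row.compareTo(min) < 0)) {
                min = row;
            }
        }
        return min;
    }
}
```

Two scans that return the same rows in a different order can disagree on firstRow but always agree on minValue, which is why an aggregate or a sorted limit makes a better reproduction query.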



[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185328#comment-15185328
 ] 

ASF GitHub Bot commented on DRILL-4474:
---

Github user jaltekruse commented on the pull request:

https://github.com/apache/drill/pull/415#issuecomment-193884035
  
Could you also close this PR and open a new one? The JIRA number was wrong 
in your commit, so this is posting to the JIRA about incorrect creation of 
direct scans. The correct number is 4479.



[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185318#comment-15185318
 ] 

ASF GitHub Bot commented on DRILL-4474:
---

Github user amansinha100 commented on the pull request:

https://github.com/apache/drill/pull/415#issuecomment-193882461
  
Yes, I can do that.



[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185313#comment-15185313
 ] 

ASF GitHub Bot commented on DRILL-4474:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/415#issuecomment-193880193
  
Can you generate the test file as part of the test rather than checking in 
a static file?



[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185306#comment-15185306
 ] 

ASF GitHub Bot commented on DRILL-4474:
---

GitHub user amansinha100 opened a pull request:

https://github.com/apache/drill/pull/415

DRILL-4474: Use varchar for default column when all_text_mode is enab…

…led.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amansinha100/incubator-drill DRILL-4479

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/415.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #415


commit edfbf9bf0acd94fd0e8737f9162ca13281d00906
Author: Aman Sinha 
Date:   2016-03-08T17:27:32Z

DRILL-4474: Use varchar for default column when all_text_mode is enabled.
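
The idea in the commit title can be sketched as below, assuming the behaviour described in the issue: when no columns are found, the reader creates a placeholder column, and that placeholder has so far been a nullable int. This is not the actual JsonReader code; the class DefaultColumnType, its MinorType enum, and the method forEmptyBatch are hypothetical names used only for illustration.

```java
// Hypothetical sketch; not the real JsonReader code or Drill's type enum.
final class DefaultColumnType {
    enum MinorType { INT, VARCHAR }

    // Chooses the type of the placeholder column created when no fields
    // were found in the batch.
    static MinorType forEmptyBatch(boolean allTextMode) {
        // With all_text_mode on, every later value is read as text, so a
        // VARCHAR placeholder cannot conflict with it.
        return allTextMode ? MinorType.VARCHAR : MinorType.INT;
    }
}
```

With all_text_mode off the sketch keeps the existing int default, so only the all-text case changes.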





[jira] [Assigned] (DRILL-4487) add unit test for DRILL-4449

2016-03-08 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim reassigned DRILL-4487:
---

Assignee: Deneche A. Hakim  (was: Aman Sinha)

> add unit test for DRILL-4449
> 
>
> Key: DRILL-4487
> URL: https://issues.apache.org/jira/browse/DRILL-4487
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Parquet
>Reporter: Deneche A. Hakim
>Assignee: Deneche A. Hakim
> Fix For: 1.7.0
>
>
> now that we have a simple reproduction, we should add a unit test to make 
> sure we don't regress





[jira] [Updated] (DRILL-4487) add unit test for DRILL-4449

2016-03-08 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim updated DRILL-4487:

Assignee: Aman Sinha  (was: Deneche A. Hakim)

> add unit test for DRILL-4449
> 
>
> Key: DRILL-4487
> URL: https://issues.apache.org/jira/browse/DRILL-4487
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Parquet
>Reporter: Deneche A. Hakim
>Assignee: Aman Sinha
> Fix For: 1.7.0
>
>
> now that we have a simple reproduction, we should add a unit test to make 
> sure we don't regress





[jira] [Commented] (DRILL-4487) add unit test for DRILL-4449

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185294#comment-15185294
 ] 

ASF GitHub Bot commented on DRILL-4487:
---

Github user adeneche commented on the pull request:

https://github.com/apache/drill/pull/414#issuecomment-193876468
  
one easy way to reproduce the issue, and fail the unit test, is to change 
the [following 
line](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java#L716)
 from:

ParquetTableMetadataBase metadata = parquetTableMetadata.clone();

to

ParquetTableMetadataBase metadata = parquetTableMetadata;
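
One plausible reading of why dropping clone() reproduces the problem is that later code mutates the metadata object, so without a copy the shared original is changed as well. That is an assumption; the comment does not state the cause. The sketch below shows the general aliasing effect with hypothetical classes TableMetadata and PruningDemo, which stand in for ParquetTableMetadataBase and the code that uses it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a metadata object holding a file list.
final class TableMetadata {
    final List<String> files;

    TableMetadata(List<String> files) {
        this.files = new ArrayList<>(files);
    }

    // Copy with its own file list, so edits to the copy leave the original alone.
    TableMetadata copy() {
        return new TableMetadata(files);
    }
}

final class PruningDemo {
    // Keeps only files under the given prefix, mutating the object it is given.
    static TableMetadata prune(TableMetadata metadata, String prefix) {
        metadata.files.removeIf(f -> !f.startsWith(prefix));
        return metadata;
    }
}
```

Calling PruningDemo.prune(cached.copy(), "/a/") leaves cached untouched, while PruningDemo.prune(cached, "/a/") permanently drops files from it, so a later query over the full table would see too few files.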


> add unit test for DRILL-4449
> 
>
> Key: DRILL-4487
> URL: https://issues.apache.org/jira/browse/DRILL-4487
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Parquet
>Reporter: Deneche A. Hakim
>Assignee: Deneche A. Hakim
> Fix For: 1.7.0
>
>
> now that we have a simple reproduction, we should add a unit test to make 
> sure we don't regress





[jira] [Commented] (DRILL-4487) add unit test for DRILL-4449

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185287#comment-15185287
 ] 

ASF GitHub Bot commented on DRILL-4487:
---

GitHub user adeneche opened a pull request:

https://github.com/apache/drill/pull/414

DRILL-4487: add unit test for DRILL-4449

@amansinha100 can you please review ? thanks

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adeneche/incubator-drill DRILL-4487

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/414.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #414


commit b1f052d800bae05bbf36b3594fe3c171ea4cede4
Author: adeneche 
Date:   2016-03-08T15:54:31Z

DRILL-4487: add unit test for DRILL-4449




> add unit test for DRILL-4449
> 
>
> Key: DRILL-4487
> URL: https://issues.apache.org/jira/browse/DRILL-4487
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Parquet
>Reporter: Deneche A. Hakim
>Assignee: Deneche A. Hakim
> Fix For: 1.7.0
>
>
> now that we have a simple reproduction, we should add a unit test to make 
> sure we don't regress





[jira] [Commented] (DRILL-4313) C++ client - Improve method of drillbit selection from cluster

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185220#comment-15185220
 ] 

ASF GitHub Bot commented on DRILL-4313:
---

Github user parthchandra closed the pull request at:

https://github.com/apache/drill/pull/396


> C++ client - Improve method of drillbit selection from cluster
> --
>
> Key: DRILL-4313
> URL: https://issues.apache.org/jira/browse/DRILL-4313
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Parth Chandra
>Assignee: Parth Chandra
> Fix For: 1.6.0
>
>
> The current C++ client handles multiple parallel queries over the same 
> connection, but that creates a bottleneck as the queries get sent to the same 
> drillbit.
> The client can manage this more effectively by choosing from a configurable 
> pool of connections and round robin queries to them.
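The round-robin selection described above can be sketched as follows. This is an illustration in Java rather than the C++ client's code, and RoundRobinPool and pick() are hypothetical names; the pool members stand in for open connections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical pool that hands out its members in rotation.
final class RoundRobinPool<T> {
    private final List<T> members;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinPool(List<T> members) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("pool must not be empty");
        }
        this.members = new ArrayList<>(members);
    }

    // Returns the next member; floorMod keeps the index valid after int overflow.
    T pick() {
        int index = Math.floorMod(next.getAndIncrement(), members.size());
        return members.get(index);
    }
}
```

Each query would call pick() to choose its connection, so consecutive queries land on different drillbits instead of queuing on one.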





[jira] [Resolved] (DRILL-4313) C++ client - Improve method of drillbit selection from cluster

2016-03-08 Thread Parth Chandra (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Parth Chandra resolved DRILL-4313.
--
Resolution: Fixed

Fixed in df0f0af3d963c1b65eb01c3141fe84532c53f5a5

> C++ client - Improve method of drillbit selection from cluster
> --
>
> Key: DRILL-4313
> URL: https://issues.apache.org/jira/browse/DRILL-4313
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Parth Chandra
>Assignee: Parth Chandra
> Fix For: 1.6.0
>
>
> The current C++ client handles multiple parallel queries over the same 
> connection, but that creates a bottleneck as the queries get sent to the same 
> drillbit.
> The client can manage this more effectively by choosing from a configurable 
> pool of connections and round robin queries to them.





[jira] [Resolved] (DRILL-4332) tests in TestFrameworkTest fail in Java 8

2016-03-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4332.

   Resolution: Fixed
Fix Version/s: (was: Future)
   1.6.0

Fixed in 447b093cd2b05bfeae001844a7e3573935e84389

> tests in TestFrameworkTest fail in Java 8
> -
>
> Key: DRILL-4332
> URL: https://issues.apache.org/jira/browse/DRILL-4332
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Affects Versions: 1.5.0
>Reporter: Deneche A. Hakim
>Assignee: Laurent Goujon
> Fix For: 1.6.0
>
>
> the following unit tests fail in Java 8:
> {noformat}
> TestFrameworkTest.testRepeatedColumnMatching
> TestFrameworkTest.testCSVVerificationOfOrder_checkFailure
> {noformat}
> The tests expect the query to fail with a specific error message. The message 
> generated by DrillTestWrapper.compareMergedVectors assumes a specific iteration 
> order for a map keySet, which it should not rely on. In Java 8 the order appears 
> to have changed, which produces a slightly different error message.
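One way to make such a message independent of the map's iteration order is to sort the keys before formatting them. The sketch below is a generic illustration; StableMessage and describeColumns are hypothetical names, not the change that was made in DrillTestWrapper.

```java
import java.util.Map;
import java.util.TreeSet;

final class StableMessage {
    // Lists the keys in sorted order, so the text is the same on every JVM.
    static String describeColumns(Map<String, ?> columns) {
        return "columns: " + String.join(", ", new TreeSet<>(columns.keySet()));
    }
}
```

A test can then assert on the exact message text without depending on HashMap internals.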





[jira] [Resolved] (DRILL-4486) Expression serializer incorrectly serializes escaped characters

2016-03-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4486.

   Resolution: Fixed
Fix Version/s: 1.6.0

Fixed in 80316f3f8bef866720f99e609fe758ec8e0c4612

> Expression serializer incorrectly serializes escaped characters
> ---
>
> Key: DRILL-4486
> URL: https://issues.apache.org/jira/browse/DRILL-4486
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.6.0
>
>
> The Drill expression parser requires backslashes to be escaped, but the 
> ExpressionStringBuilder does not escape them properly. This causes problems, 
> especially for regex expressions run with parallel execution.
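The requirement is that a serialized string literal must parse back to the original value. A minimal round-trip sketch follows; it assumes single-quoted literals with backslash as the escape character, and Escaping, quote and unquote are hypothetical names, not Drill's ExpressionStringBuilder or its grammar.

```java
final class Escaping {
    // Serializes a value as a quoted literal, escaping backslashes and quotes.
    static String quote(String raw) {
        StringBuilder sb = new StringBuilder("'");
        for (char c : raw.toCharArray()) {
            if (c == '\\' || c == '\'') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('\'').toString();
    }

    // Parses a literal produced by quote() back into the original value.
    static String unquote(String literal) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < literal.length() - 1; i++) {
            char c = literal.charAt(i);
            if (c == '\\' && i + 1 < literal.length() - 1) {
                c = literal.charAt(++i); // take the escaped character as-is
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

If quote() skipped the backslash case, a regex such as \d+ would lose its backslash when the literal is parsed again.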





[jira] [Resolved] (DRILL-4375) Fix the maven release profile, broken by jdbc jar size enforcer added in DRILL-4291

2016-03-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4375.

   Resolution: Fixed
Fix Version/s: 1.6.0

Fixed in 1f29914fc5c7d1e36651ac28167804c4012501fe

> Fix the maven release profile, broken by jdbc jar size enforcer added in 
> DRILL-4291
> ---
>
> Key: DRILL-4375
> URL: https://issues.apache.org/jira/browse/DRILL-4375
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
> Fix For: 1.6.0
>
>






[jira] [Assigned] (DRILL-2048) Malformed drill stoage config stored in zookeeper will prevent Drill from starting

2016-03-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse reassigned DRILL-2048:
--

Assignee: Jason Altekruse

> Malformed drill stoage config stored in zookeeper will prevent Drill from 
> starting
> --
>
> Key: DRILL-2048
> URL: https://issues.apache.org/jira/browse/DRILL-2048
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
> Fix For: 1.7.0
>
>
> We noticed this problem while trying to test dev builds on a common cluster. 
> When applying changes that added a field to the configuration of a storage 
> plugin, the new format of the configuration would be persisted in zookeeper. 
> When a different dev build that did not include the change set was deployed 
> on the same cluster, the config stored in zookeeper would fail to parse and 
> the drillbit would not be able to start. This is not system-critical 
> configuration, so the drillbit should still be able to start with the 
> plugin disabled.
> This fix could also include changing the jackson mapper to allow ignoring 
> unexpected fields in the configuration. This would give a little better 
> chance for interoperability between future versions of Drill as we add new 
> configuration options as necessary.
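The "start anyway with the plugin disabled" behavior can be sketched as a loader that isolates parse failures per plugin. TolerantPluginLoader, loadAll and the parser argument are hypothetical; the sketch assumes each stored config is a string and that parsing fails with an unchecked exception.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class TolerantPluginLoader {
    // Names of plugins whose stored config could not be parsed.
    final List<String> disabled = new ArrayList<>();

    // Parses every stored config; a failure disables that plugin only.
    <C> Map<String, C> loadAll(Map<String, String> stored, Function<String, C> parser) {
        Map<String, C> enabled = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : stored.entrySet()) {
            try {
                enabled.put(entry.getKey(), parser.apply(entry.getValue()));
            } catch (RuntimeException e) {
                // Remember the failure and keep starting up.
                disabled.add(entry.getKey());
            }
        }
        return enabled;
    }
}
```

For the second suggestion, Jackson offers DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, which can be disabled on an ObjectMapper so that unexpected fields are skipped; whether that setting suits Drill's mapper is a design decision for the fix.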





[jira] [Updated] (DRILL-2048) Malformed drill stoage config stored in zookeeper will prevent Drill from starting

2016-03-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-2048:
---
Fix Version/s: (was: Future)

> Malformed drill stoage config stored in zookeeper will prevent Drill from 
> starting
> --
>
> Key: DRILL-2048
> URL: https://issues.apache.org/jira/browse/DRILL-2048
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
> Fix For: 1.7.0
>
>
> We noticed this problem while trying to test dev builds on a common cluster. 
> When applying changes that added a field to the configuration of a storage 
> plugin, the new format of the configuration would be persisted in zookeeper. 
> When a different dev build that did not include the change set was deployed 
> on the same cluster, the config stored in zookeeper would fail to parse and 
> the drillbit would not be able to start. This is not system-critical 
> configuration, so the drillbit should still be able to start with the 
> plugin disabled.
> This fix could also include changing the jackson mapper to allow ignoring 
> unexpected fields in the configuration. This would give a little better 
> chance for interoperability between future versions of Drill as we add new 
> configuration options as necessary.





[jira] [Updated] (DRILL-2048) Malformed drill stoage config stored in zookeeper will prevent Drill from starting

2016-03-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-2048:
---
Fix Version/s: 1.7.0

> Malformed drill stoage config stored in zookeeper will prevent Drill from 
> starting
> --
>
> Key: DRILL-2048
> URL: https://issues.apache.org/jira/browse/DRILL-2048
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
> Fix For: 1.7.0
>
>
> We noticed this problem while trying to test dev builds on a common cluster. 
> When applying changes that added a field to the configuration of a storage 
> plugin, the new format of the configuration would be persisted in zookeeper. 
> When a different dev build that did not include the change set was deployed 
> on the same cluster, the config stored in zookeeper would fail to parse and 
> the drillbit would not be able to start. This is not system-critical 
> configuration, so the drillbit should still be able to start with the 
> plugin disabled.
> This fix could also include changing the jackson mapper to allow ignoring 
> unexpected fields in the configuration. This would give a little better 
> chance for interoperability between future versions of Drill as we add new 
> configuration options as necessary.





[jira] [Commented] (DRILL-4449) Wrong results when using metadata cache with specific set of queries

2016-03-08 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185131#comment-15185131
 ] 

Deneche A. Hakim commented on DRILL-4449:
-

A clarification: to reproduce the issue, the table must be partitioned and 
referenced in both inner queries with different filters. The filters need to 
trigger parquet partition pruning and leave more than one file after the 
pruning.

> Wrong results when using metadata cache with specific set of queries
> 
>
> Key: DRILL-4449
> URL: https://issues.apache.org/jira/browse/DRILL-4449
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.5.0
>Reporter: Deneche A. Hakim
>Assignee: Deneche A. Hakim
>Priority: Critical
> Fix For: 1.6.0
>
>
> We are still working on a reproduction but when we have a query similar to 
> this one:
> {noformat}
> with q1 as (
> select a.field
> from `table` a
> where 
> group by a.field
> having ...
> )
> , q2 as (
> select a.field
> from `table` a
> where 
> group by a.field
> )
> select * from (
> select count(*) as cnt from q1
> union all
> select count(*) as cnt from q2
> );
> {noformat}
> The table is partitioned and both subqueries will force parquet pruning on 
> the table. Because we share the parquet metadata object in ParquetGroupScan, 
> the second query ends up being "over pruned" and we get wrong results.
> The plan doesn't show the problem.





[jira] [Updated] (DRILL-4487) add unit test for DRILL-4449

2016-03-08 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim updated DRILL-4487:

Summary: add unit test for DRILL-4449  (was: add unit test fro DRILL-4449)

> add unit test for DRILL-4449
> 
>
> Key: DRILL-4487
> URL: https://issues.apache.org/jira/browse/DRILL-4487
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Parquet
>Reporter: Deneche A. Hakim
>Assignee: Deneche A. Hakim
> Fix For: 1.7.0
>
>
> now that we have a simple reproduction, we should add a unit test to make 
> sure we don't regress





[jira] [Created] (DRILL-4487) add unit test fro DRILL-4449

2016-03-08 Thread Deneche A. Hakim (JIRA)
Deneche A. Hakim created DRILL-4487:
---

 Summary: add unit test fro DRILL-4449
 Key: DRILL-4487
 URL: https://issues.apache.org/jira/browse/DRILL-4487
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Deneche A. Hakim
Assignee: Deneche A. Hakim
 Fix For: 1.7.0


now that we have a simple reproduction, we should add a unit test to make sure 
we don't regress





[jira] [Commented] (DRILL-4449) Wrong results when using metadata cache with specific set of queries

2016-03-08 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185043#comment-15185043
 ] 

Deneche A. Hakim commented on DRILL-4449:
-

I was able to create a reproduction of the issue, in case it's later needed for 
validation:

create a partitioned table:
{noformat}
CREATE TABLE dfs.tmp.t PARTITION BY(l_discount) AS SELECT * FROM 
cp.`tpch/lineitem.parquet`;
{noformat}

The following query will give wrong results if the table has a metadata cache 
file:
{noformat}
SELECT COUNT(*) FROM (
SELECT l_orderkey FROM dfs.tmp.t WHERE l_discount < 0.05 
UNION ALL
SELECT l_orderkey FROM dfs.tmp.t WHERE l_discount > 0.02
);
{noformat}

> Wrong results when using metadata cache with specific set of queries
> 
>
> Key: DRILL-4449
> URL: https://issues.apache.org/jira/browse/DRILL-4449
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.5.0
>Reporter: Deneche A. Hakim
>Assignee: Deneche A. Hakim
>Priority: Critical
> Fix For: 1.6.0
>
>
> We are still working on a reproduction but when we have a query similar to 
> this one:
> {noformat}
> with q1 as (
> select a.field
> from `table` a
> where 
> group by a.field
> having ...
> )
> , q2 as (
> select a.field
> from `table` a
> where 
> group by a.field
> )
> select * from (
> select count(*) as cnt from q1
> union all
> select count(*) as cnt from q2
> );
> {noformat}
> The table is partitioned and both subqueries will force parquet pruning on 
> the table. Because we share the parquet metadata object in ParquetGroupScan, 
> the second query ends up being "over pruned" and we get wrong results.
> The plan doesn't show the problem.





[jira] [Commented] (DRILL-4443) MIN/MAX on VARCHAR throw a NullPointerException

2016-03-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184695#comment-15184695
 ] 

ASF GitHub Bot commented on DRILL-4443:
---

Github user adeneche closed the pull request at:

https://github.com/apache/drill/pull/409


> MIN/MAX on VARCHAR throw a NullPointerException
> ---
>
> Key: DRILL-4443
> URL: https://issues.apache.org/jira/browse/DRILL-4443
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.6.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Deneche A. Hakim
>Priority: Critical
> Fix For: 1.6.0
>
> Attachments: DRILL_4443.parquet, test4443.csv
>
>
> Using a simple csv file that contains at least 2 groups of rows:
> {noformat}
> a,
> a,
> a,
> b,
> {noformat}
> Running a query with min/max throws a NullPointerException:
> {noformat}
> SELECT MIN(columns[1]) FROM `test4443.csv` GROUP BY columns[0];
> Error: SYSTEM ERROR: NullPointerException
> ...
> {noformat}
> {noformat}
> SELECT MAX(columns[1]) FROM `test4443.csv` GROUP BY columns[0];
> Error: SYSTEM ERROR: NullPointerException
> ...
> {noformat}
> The problem is caused by {{VarCharAggrFunctions.java}}, which is not resetting 
> its internal buffer properly.





[jira] [Assigned] (DRILL-4443) MIN/MAX on VARCHAR throw a NullPointerException

2016-03-08 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim reassigned DRILL-4443:
---

Assignee: Deneche A. Hakim  (was: Hanifi Gunes)

> MIN/MAX on VARCHAR throw a NullPointerException
> ---
>
> Key: DRILL-4443
> URL: https://issues.apache.org/jira/browse/DRILL-4443
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.6.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Deneche A. Hakim
>Priority: Critical
> Fix For: 1.6.0
>
> Attachments: DRILL_4443.parquet, test4443.csv
>
>
> Using a simple csv file that contains at least 2 groups of rows:
> {noformat}
> a,
> a,
> a,
> b,
> {noformat}
> Running a query with min/max throws a NullPointerException:
> {noformat}
> SELECT MIN(columns[1]) FROM `test4443.csv` GROUP BY columns[0];
> Error: SYSTEM ERROR: NullPointerException
> ...
> {noformat}
> {noformat}
> SELECT MAX(columns[1]) FROM `test4443.csv` GROUP BY columns[0];
> Error: SYSTEM ERROR: NullPointerException
> ...
> {noformat}
> The problem is caused by {{VarCharAggrFunctions.java}}, which is not resetting 
> its internal buffer properly.





[jira] [Commented] (DRILL-4453) Difference in results over char data, window function query

2016-03-08 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184652#comment-15184652
 ] 

Deneche A. Hakim commented on DRILL-4453:
-

[~khfaraaz] how about the results, are they still different ?

> Difference in results over char data, window function query
> ---
>
> Key: DRILL-4453
> URL: https://issues.apache.org/jira/browse/DRILL-4453
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.6.0
> Environment: 4 node cluster
>Reporter: Khurram Faraaz
>Assignee: Khurram Faraaz
>  Labels: window_function
> Attachments: t_alltype.csv, t_alltype.parquet
>
>
> Window function query with frame clause returns results that are different 
> from those returned by the same query on Postgres 9.3 over the same data.
> Note that the two tables have the same number of nulls in both Drill and Postgres.
> The length of the result returned by MIN function is different on Postgres 
> 9.3 vs Drill 1.6.0
> Drill 1.6.0 => returns 1 as length.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select length(min(c4)) from dfs.tmp.`t_alltype`;
> +---------+
> | EXPR$0  |
> +---------+
> | 1       |
> +---------+
> 1 row selected (0.282 seconds)
> {noformat}
> Postgres 9.3 returns 0 as length.
> {noformat}
> postgres=# select length(min(c4)) from t_alltype;
>  length
> --------
>       0
> (1 row)
> {noformat}
> {noformat}
> postgres=# \d t_alltype
>            Table "public.t_alltype"
>  Column |            Type             | Modifiers
> --------+-----------------------------+-----------
>  c1     | integer                     |
>  c2     | integer                     |
>  c3     | bigint                      |
>  c4     | character(256)              |
>  c5     | character varying(256)      |
>  c6     | timestamp without time zone |
>  c7     | date                        |
>  c8     | boolean                     |
>  c9     | double precision            |
> postgres=# select c4 from t_alltype where c4 is null;
>  c4
> 
> (3 rows)
> {noformat}
> {noformat}
> postgres=# SELECT MIN(c4) OVER(PARTITION BY c8 ORDER BY c1 ROWS BETWEEN 
> UNBOUNDED PRECEDING AND CURRENT ROW) FROM t_alltype;
>   
>  min
> --
>  gwfrW
>  ZAFOcferhjkcl
>  ZAFOcferhjkcl
>  ZAFOcferhjkcl
>  ZAFOcferhjkcl
>  ...
>  ...
>  
>  ApKK
>  ApKK
> (145 rows)
> {noformat}
> Parquet schema details
> {noformat}
> [root@centos-01 parquet-tools]# ./parquet-schema 
> ./Datasources/window_functions/t_alltype.parquet
> message root {
>   optional int32 c1;
>   optional int32 c2;
>   optional int64 c3;
>   optional binary c4 (UTF8);
>   optional binary c5 (UTF8);
>   optional int64 c6 (TIMESTAMP_MILLIS);
>   optional int32 c7 (DATE);
>   optional boolean c8;
>   optional double c9;
> }
> {noformat}
> On Drill 1.6.0 
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT MIN(c4) OVER(PARTITION BY c8 ORDER BY c1 
> ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM dfs.tmp.`t_alltype`;
> +----------------+
> | EXPR$0         |
> +----------------+
> | gwfrW          |
> | ZAFOcferhjkcl  |
> | ZAFOcferhjkcl  |
> | ZAFOcferhjkcl  |
> | ZAFOcferhjkcl  |
> ...
> ...
> | ApKK           |
> | ApKK           |
> |                |
> |                |
> |                |
> |                |
> |                |
> |                |
> |                |
> |                |
> |                |
> |                |
> | null           |
> | null           |
> |                |
> |                |
> |                |
> +----------------+
> 145 rows selected (0.409 seconds)
> {noformat}


