[jira] [Commented] (DRILL-5546) Schema change problems caused by empty batch

2017-08-04 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115154#comment-16115154
 ] 

Jinfeng Ni commented on DRILL-5546:
---

I have a work-in-progress branch.  I tried the various failed queries reported 
in the following JIRAs. Except for DRILL-4734, where I could not get detailed 
instructions on how to prepare the dataset and therefore could not reproduce the 
problem, all the queries in the other issues ran successfully when the query 
deals with an empty batch (empty schema and data); a minimal sketch of that 
scenario follows the list.

DRILL-5185
DRILL-5464
DRILL-5480
DRILL-5327
DRILL-4686
DRILL-4255 (same as DRILL-5464).
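
For reference, a minimal sketch of the shared empty-batch scenario (assuming, as 
in DRILL-5464, that dfs.tmp.yelp is a directory holding one populated JSON file 
plus an empty one; the paths are illustrative):
{code}
-- /tmp/yelp/ holds yelp_academic_dataset_review.json and an empty empty.json;
-- the reader returns a zero-row batch with a guessed (nullable INT) schema for
-- the empty file, which used to trigger SchemaChangeException downstream.
select stars, count(*) as cnt
from dfs.tmp.yelp
group by stars;
{code}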

 

> Schema change problems caused by empty batch
> 
>
> Key: DRILL-5546
> URL: https://issues.apache.org/jira/browse/DRILL-5546
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>
> There have been a few JIRAs opened related to schema change failures caused by 
> empty batches. This JIRA is opened as an umbrella for all those related JIRAs 
> (such as DRILL-4686, DRILL-4734, DRILL-4476, DRILL-4255, etc.).
>  





[jira] [Commented] (DRILL-5327) Hash aggregate can return empty batch which can cause schema change exception

2017-08-04 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115149#comment-16115149
 ] 

Jinfeng Ni commented on DRILL-5327:
---

I ran tpcds q66 on the tpcds-sf1 text dataset on 1.11.0 on a 2-node cluster, 
and the query failed with a different error that also seems to be related to 
schema change.
{code}
Error: SYSTEM ERROR: IllegalStateException: Failure while reading vector.  
Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was 
holding vector class org.apache.drill.exec.vector.NullableVarCharVector, field= 
w_warehouse_name(VARCHAR(200):OPTIONAL) [$bits$(UINT1:REQUIRED), 
w_warehouse_name(VARCHAR(200):OPTIONAL) [$offsets$(UINT4:REQUIRED)]]
{code}

With the patch for DRILL-5546, q66 ran successfully in multiple runs.

{code}
First row, transposed (the original sqlline table is too wide for the digest; 
the remaining rows are truncated):
w_warehouse_name             Bad cards must make.
w_warehouse_sq_ft            621234
w_city                       Fairview
w_county                     Williamson County
w_state                      TN
w_country                    United States
ship_carriers                ZOUROS,ZHOU
year1                        1998
jan_sales .. dec_sales       1.789528168003E7, 2.105534367E7, 1.494257482E7, 1.678696090996E7, 2.932699781997E7, 1.771871621E7, 2.128628804E7, 4.15647143801E7, 4.39726708E7, 3.636744521E7, 5.92148230404E7, 7.16019977899E7
jan_sales_per_sq_foot        null
feb_sales_per_sq_foot        NaN
mar_sales_per_sq_foot .. dec_sales_per_sq_foot  null
jan_net .. dec_net           1.872240875E7, 2.175182704007E7, 1.503194233002E7, 1.551366257E7, 2.867464643E7, 1.954106949002E7, 2.141437321E7, 4.61527039101E7, 4.655948892E7, 3.913212232E7, 6.46767638501E7, 7.45311048598E7
Second row begins: Conventional childr | 977787 | Fairview | Williamson County | ...
{code}

[jira] [Comment Edited] (DRILL-4686) Aggregation query over HBase table results in IllegalStateException: Failure while reading vector

2017-08-04 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115140#comment-16115140
 ] 

Jinfeng Ni edited comment on DRILL-4686 at 8/5/17 12:01 AM:


I was able to reproduce the problem on 1.11.0.  With the patch for DRILL-5546, 
the query seems to run successfully and consistently across multiple runs.

1. Prepare the HBase table:
{code}
## hbase shell
create 'browser_action2', 'v', {SPLITS => 
['0','1','2','3','4','5','6','7','8','9']}

put 'browser_action2', '1','v:e0', 'abc1';
put 'browser_action2', '2','v:e0', 'abc2';
put 'browser_action2', '3','v:e0', 'abc3';
put 'browser_action2', '4','v:e0', 'abc4';
put 'browser_action2', '5','v:e0', 'abc5';
put 'browser_action2', '6','v:e0', 'abc6';
put 'browser_action2', '7','v:e0', 'abc7';
put 'browser_action2', '8','v:e0', 'abc8';
put 'browser_action2', '9','v:e0', 'abc9';
put 'browser_action2', '10','v:e0', 'abc10';
{code}

2. Hit the issue on the 1.11.0 release:
{code}
## drill sqlline
select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from 
hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
Error: SYSTEM ERROR: IllegalStateException: Failure while reading vector.  
Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was 
holding vector class org.apache.drill.exec.vector.NullableVarBinaryVector, 
field= $f0(VARBINARY:OPTIONAL) [$bits$(UINT1:REQUIRED), $f0(VARBINARY:OPTIONAL) 
[$offsets$(UINT4:REQUIRED)]]

Fragment 2:1
{code}

3. Runs successfully with the patch for DRILL-5546:
{code}
select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from 
hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
+++
|   k| p  |
+++
| abc9   | 1  |
| abc7   | 1  |
| abc8   | 1  |
| abc2   | 1  |
| abc4   | 1  |
| abc3   | 1  |
| abc10  | 1  |
| abc1   | 1  |
| abc6   | 1  |
| abc5   | 1  |
+++
{code}


was (Author: jni):
I was able to reproduce the problem on 1.11.0.  With the patch for DRILL-5466, 
the query seems to run successfully and consistently across multiple runs.

1. Prepare the HBase table:
{code}
## hbase shell
create 'browser_action2', 'v', {SPLITS => 
['0','1','2','3','4','5','6','7','8','9']}

put 'browser_action2', '1','v:e0', 'abc1';
put 'browser_action2', '2','v:e0', 'abc2';
put 'browser_action2', '3','v:e0', 'abc3';
put 'browser_action2', '4','v:e0', 'abc4';
put 'browser_action2', '5','v:e0', 'abc5';
put 'browser_action2', '6','v:e0', 'abc6';
put 'browser_action2', '7','v:e0', 'abc7';
put 'browser_action2', '8','v:e0', 'abc8';
put 'browser_action2', '9','v:e0', 'abc9';
put 'browser_action2', '10','v:e0', 'abc10';
{code}

2. Hit the issue on the 1.11.0 release:
{code}
## drill sqlline
select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from 
hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
Error: SYSTEM ERROR: IllegalStateException: Failure while reading vector.  
Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was 
holding vector class org.apache.drill.exec.vector.NullableVarBinaryVector, 
field= $f0(VARBINARY:OPTIONAL) [$bits$(UINT1:REQUIRED), $f0(VARBINARY:OPTIONAL) 
[$offsets$(UINT4:REQUIRED)]]

Fragment 2:1
{code}

3. Runs successfully with the patch for DRILL-5466:
{code}
select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from 
hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
+++
|   k| p  |
+++
| abc9   | 1  |
| abc7   | 1  |
| abc8   | 1  |
| abc2   | 1  |
| abc4   | 1  |
| abc3   | 1  |
| abc10  | 1  |
| abc1   | 1  |
| abc6   | 1  |
| abc5   | 1  |
+++
{code}

> Aggregation query over HBase table results in IllegalStateException: Failure 
> while reading vector
> -
>
> Key: DRILL-4686
> URL: https://issues.apache.org/jira/browse/DRILL-4686
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.7.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>
> Aggregation query over HBase table from Drill 1.7.0 returns 
> IllegalStateException
> Drill version 1.7.0-SNAPSHOT,  commit ID : 09b26277
> {noformat}
> put 'browser_action2', '1','v:e0', 'abc1';
> put 'browser_action2', '2','v:e0', 'abc2';
> put 'browser_action2', '3','v:e0', 'abc3';
> put 'browser_action2', '4','v:e0', 'abc4';
> put 'browser_action2', '5','v:e0', 'abc5';
> put 'browser_action2', '6','v:e0', 'abc6';
> put 'browser_action2', '7','v:e0', 'abc7';
> put 'browser_action2', '8','v:e0', 'abc8';
> put 'browser_action2', '9','v:e0', 'abc9';
> put 'browser_action2', '10','v:e0', 'abc10';
> {noformat}
> {noformat}
> [root@centos-01 ~]# hbase shell
> HBase Shell; enter 'help' for list of supported commands.
> Type "exit" 

[jira] [Commented] (DRILL-4686) Aggregation query over HBase table results in IllegalStateException: Failure while reading vector

2017-08-04 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115140#comment-16115140
 ] 

Jinfeng Ni commented on DRILL-4686:
---

I was able to reproduce the problem on 1.11.0.  With the patch for DRILL-5466, 
the query seems to run successfully and consistently across multiple runs.

1. Prepare the HBase table:
{code}
## hbase shell
create 'browser_action2', 'v', {SPLITS => 
['0','1','2','3','4','5','6','7','8','9']}

put 'browser_action2', '1','v:e0', 'abc1';
put 'browser_action2', '2','v:e0', 'abc2';
put 'browser_action2', '3','v:e0', 'abc3';
put 'browser_action2', '4','v:e0', 'abc4';
put 'browser_action2', '5','v:e0', 'abc5';
put 'browser_action2', '6','v:e0', 'abc6';
put 'browser_action2', '7','v:e0', 'abc7';
put 'browser_action2', '8','v:e0', 'abc8';
put 'browser_action2', '9','v:e0', 'abc9';
put 'browser_action2', '10','v:e0', 'abc10';
{code}

2. Hit the issue on the 1.11.0 release:
{code}
## drill sqlline
select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from 
hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
Error: SYSTEM ERROR: IllegalStateException: Failure while reading vector.  
Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was 
holding vector class org.apache.drill.exec.vector.NullableVarBinaryVector, 
field= $f0(VARBINARY:OPTIONAL) [$bits$(UINT1:REQUIRED), $f0(VARBINARY:OPTIONAL) 
[$offsets$(UINT4:REQUIRED)]]

Fragment 2:1
{code}

3. Runs successfully with the patch for DRILL-5466:
{code}
select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from 
hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
+++
|   k| p  |
+++
| abc9   | 1  |
| abc7   | 1  |
| abc8   | 1  |
| abc2   | 1  |
| abc4   | 1  |
| abc3   | 1  |
| abc10  | 1  |
| abc1   | 1  |
| abc6   | 1  |
| abc5   | 1  |
+++
{code}

> Aggregation query over HBase table results in IllegalStateException: Failure 
> while reading vector
> -
>
> Key: DRILL-4686
> URL: https://issues.apache.org/jira/browse/DRILL-4686
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.7.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>
> Aggregation query over HBase table from Drill 1.7.0 returns 
> IllegalStateException
> Drill version 1.7.0-SNAPSHOT,  commit ID : 09b26277
> {noformat}
> put 'browser_action2', '1','v:e0', 'abc1';
> put 'browser_action2', '2','v:e0', 'abc2';
> put 'browser_action2', '3','v:e0', 'abc3';
> put 'browser_action2', '4','v:e0', 'abc4';
> put 'browser_action2', '5','v:e0', 'abc5';
> put 'browser_action2', '6','v:e0', 'abc6';
> put 'browser_action2', '7','v:e0', 'abc7';
> put 'browser_action2', '8','v:e0', 'abc8';
> put 'browser_action2', '9','v:e0', 'abc9';
> put 'browser_action2', '10','v:e0', 'abc10';
> {noformat}
> {noformat}
> [root@centos-01 ~]# hbase shell
> HBase Shell; enter 'help' for list of supported commands.
> Type "exit" to leave the HBase Shell
> Version 1.1.1-mapr-1602-SNAPSHOT, r05ceb750d7ac9decac18e92650fedc0e86d85c7a, 
> Mon Mar 28 18:32:45 UTC 2016
> Not all HBase shell commands are applicable to MapR tables.
> Consult MapR documentation for the list of supported commands.
> hbase(main):001:0> scan 'browser_action2'
> ROW      COLUMN+CELL
>  1       column=v:e0, timestamp=1463589516782, value=abc1
>  10      column=v:e0, timestamp=1463589516916, value=abc10
>  2       column=v:e0, timestamp=1463589516809, value=abc2
>  3       column=v:e0, timestamp=1463589516829, value=abc3
>  4       column=v:e0, timestamp=1463589516834, value=abc4
>  5       column=v:e0, timestamp=1463589516847, value=abc5
>  6       column=v:e0, timestamp=1463589516861, value=abc6
>  7       column=v:e0, timestamp=1463589516874, value=abc7
>  8       column=v:e0, timestamp=1463589516896, value=abc8
>  9       column=v:e0, timestamp=1463589516905, value=abc9
> 10 row(s) in 0.7970 seconds
> hbase(main):002:0>
> {noformat}
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> use hbase;
> +---++
> |  ok   |  summary   |
> +---++
> | true  | Default schema changed to [hbase]  |
> +---++
> 1 row selected (0.327 seconds)
> 0: jdbc:drill:schema=dfs.tmp> show tables;
> +---+-+
> | TABLE_SCHEMA  | TABLE_NAME  |
> 

[jira] [Created] (DRILL-5706) Select * on hbase table having multiple regions (one or more empty) returns wrong result intermittently

2017-08-04 Thread Prasad Nagaraj Subramanya (JIRA)
Prasad Nagaraj Subramanya created DRILL-5706:


 Summary: Select * on hbase table having multiple regions (one or more empty) returns wrong result intermittently
 Key: DRILL-5706
 URL: https://issues.apache.org/jira/browse/DRILL-5706
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - HBase
Affects Versions: 1.11.0
Reporter: Prasad Nagaraj Subramanya


1) Create an HBase table with 4 regions:
{code}
create 'myhbase', 'cf1', {SPLITS => ['a', 'b', 'c']}
put 'myhbase','a','cf1:col1','somedata'
put 'myhbase','b','cf1:col1','somedata'
put 'myhbase','c','cf1:col1','somedata'
{code}

2) Run select * on the HBase table:
{code}
select * from hbase.myhbase;
{code}
The query intermittently returns a wrong result.
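
One way to observe the intermittency (illustrative; the three rows inserted 
above are the expected result):
{code}
-- run repeatedly in the same session; with one or more empty regions the
-- result may differ between runs without the fix
select * from hbase.myhbase;
select count(*) from hbase.myhbase;  -- expected: 3
{code}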



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5705) Select * on hbase table having multiple regions and multiple schema returns wrong result

2017-08-04 Thread Prasad Nagaraj Subramanya (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Nagaraj Subramanya updated DRILL-5705:
-
Description: 
Repro steps:

1) Create an HBase table with 4 regions:
{code}
create 'myhbase', 'cf1','cf2', {SPLITS => ['a', 'b', 'c']}
put 'myhbase','a','cf1:col1','somedata'
put 'myhbase','b','cf1:col2','somedata'
put 'myhbase','c','cf2:col1','somedata'
put 'myhbase','d','cf1:col1','somedata'
put 'myhbase','d','cf2:col1','somedata'
{code}

2) Run select * on the HBase table:
{code}
select * from hbase.myhbase;
{code}

The query returns a wrong result, and the result is not consistent across 
multiple runs.

  was:
Repro steps:

1) Create an HBase table with 4 regions:
{code}
create 'myhbase', 'cf1','cf2', {SPLITS => ['a', 'b', 'c']}
put 'myhbase','a','cf1:col1','somedata'
put 'myhbase','b','cf1:col2','somedata'
put 'myhbase','c','cf2:col1','somedata'
put 'myhbase','d','cf1:col1','somedata'
put 'myhbase','d','cf2:col1','somedata'
{code}

2) Run select * on the HBase table:
{code}
Select * from hbase.myhbase;
{code}

The query returns a wrong result, and the result is not consistent across 
multiple runs.


> Select * on hbase table having multiple regions and multiple schema returns 
> wrong result
> 
>
> Key: DRILL-5705
> URL: https://issues.apache.org/jira/browse/DRILL-5705
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>
> Repro steps:
> 1) Create an HBase table with 4 regions:
> {code}
> create 'myhbase', 'cf1','cf2', {SPLITS => ['a', 'b', 'c']}
> put 'myhbase','a','cf1:col1','somedata'
> put 'myhbase','b','cf1:col2','somedata'
> put 'myhbase','c','cf2:col1','somedata'
> put 'myhbase','d','cf1:col1','somedata'
> put 'myhbase','d','cf2:col1','somedata'
> {code}
> 2) Run select * on the HBase table:
> {code}
> select * from hbase.myhbase;
> {code}
> The query returns a wrong result, and the result is not consistent across 
> multiple runs.





[jira] [Updated] (DRILL-5705) Select * on hbase table having multiple regions and multiple schema returns wrong result

2017-08-04 Thread Prasad Nagaraj Subramanya (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Nagaraj Subramanya updated DRILL-5705:
-
Description: 
Repro steps:

1) Create an HBase table with 4 regions:
{code}
create 'myhbase', 'cf1','cf2', {SPLITS => ['a', 'b', 'c']}
put 'myhbase','a','cf1:col1','somedata'
put 'myhbase','b','cf1:col2','somedata'
put 'myhbase','c','cf2:col1','somedata'
put 'myhbase','d','cf1:col1','somedata'
put 'myhbase','d','cf2:col1','somedata'
{code}

2) Run select * on the HBase table:
{code}
Select * from hbase.myhbase;
{code}

The query returns a wrong result, and the result is not consistent across 
multiple runs.

  was:
Repro steps:

1) Create an HBase table with 4 regions:
{code}
create 'myhbase', 'cf1','cf2', {SPLITS => ['a', 'b', 'c']}
put 'myhbase','a','cf1:col1','somedata'
put 'myhbase','b','cf1:col2','somedata'
put 'myhbase','c','cf2:col1','somedata'
put 'myhbase', 'd','cf1:col1','somedata'
put 'myhbase', 'd','cf2:col1','somedata'
{code}

2) Run select * on the HBase table:
{code}
Select * from hbase.myhbase;
{code}

The query returns a wrong result, and the result is not consistent across 
multiple runs.


> Select * on hbase table having multiple regions and multiple schema returns 
> wrong result
> 
>
> Key: DRILL-5705
> URL: https://issues.apache.org/jira/browse/DRILL-5705
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>
> Repro steps:
> 1) Create an HBase table with 4 regions:
> {code}
> create 'myhbase', 'cf1','cf2', {SPLITS => ['a', 'b', 'c']}
> put 'myhbase','a','cf1:col1','somedata'
> put 'myhbase','b','cf1:col2','somedata'
> put 'myhbase','c','cf2:col1','somedata'
> put 'myhbase','d','cf1:col1','somedata'
> put 'myhbase','d','cf2:col1','somedata'
> {code}
> 2) Run select * on the HBase table:
> {code}
> Select * from hbase.myhbase;
> {code}
> The query returns a wrong result, and the result is not consistent across 
> multiple runs.





[jira] [Updated] (DRILL-5705) Select * on hbase table having multiple regions and multiple schema returns wrong result

2017-08-04 Thread Prasad Nagaraj Subramanya (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Nagaraj Subramanya updated DRILL-5705:
-
Description: 
Repro steps:

1) Create an HBase table with 4 regions:
{code}
create 'myhbase', 'cf1','cf2', {SPLITS => ['a', 'b', 'c']}
put 'myhbase','a','cf1:col1','somedata'
put 'myhbase','b','cf1:col2','somedata'
put 'myhbase','c','cf2:col1','somedata'
put 'myhbase', 'd','cf1:col1','somedata'
put 'myhbase', 'd','cf2:col1','somedata'
{code}

2) Run select * on the HBase table:
{code}
Select * from hbase.myhbase;
{code}

The query returns a wrong result, and the result is not consistent across 
multiple runs.

  was:
Repro steps:

1) Create an HBase table with 4 regions:
{code}
create 'myhbase', 'cf1','cf2', {SPLITS => ['a', 'b', 'c']}
put 'myhbase','a','cf1:col1','somedata'
put 'myhbase','b','cf1:col2','somedata'
put 'myhbase','c','cf2:col1','somedata'
put 'myhbase', 'd', 'cf1:col1', 'somedata'
put 'myhbase', 'd', 'cf2:col1', 'somedata'
{code}

2) Run select * on the HBase table:
{code}
Select * from hbase.myhbase;
{code}

The query returns a wrong result, and the result is not consistent across 
multiple runs.


> Select * on hbase table having multiple regions and multiple schema returns 
> wrong result
> 
>
> Key: DRILL-5705
> URL: https://issues.apache.org/jira/browse/DRILL-5705
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>
> Repro steps:
> 1) Create an HBase table with 4 regions:
> {code}
> create 'myhbase', 'cf1','cf2', {SPLITS => ['a', 'b', 'c']}
> put 'myhbase','a','cf1:col1','somedata'
> put 'myhbase','b','cf1:col2','somedata'
> put 'myhbase','c','cf2:col1','somedata'
> put 'myhbase', 'd','cf1:col1','somedata'
> put 'myhbase', 'd','cf2:col1','somedata'
> {code}
> 2) Run select * on the HBase table:
> {code}
> Select * from hbase.myhbase;
> {code}
> The query returns a wrong result, and the result is not consistent across 
> multiple runs.





[jira] [Created] (DRILL-5705) Select * on hbase table having multiple regions and multiple schema returns wrong result

2017-08-04 Thread Prasad Nagaraj Subramanya (JIRA)
Prasad Nagaraj Subramanya created DRILL-5705:


 Summary: Select * on hbase table having multiple regions and 
multiple schema returns wrong result
 Key: DRILL-5705
 URL: https://issues.apache.org/jira/browse/DRILL-5705
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.11.0
Reporter: Prasad Nagaraj Subramanya


Repro steps:

1) Create an HBase table with 4 regions:
{code}
create 'myhbase', 'cf1','cf2', {SPLITS => ['a', 'b', 'c']}
put 'myhbase','a','cf1:col1','somedata'
put 'myhbase','b','cf1:col2','somedata'
put 'myhbase','c','cf2:col1','somedata'
put 'myhbase', 'd', 'cf1:col1', 'somedata'
put 'myhbase', 'd', 'cf2:col1', 'somedata'
{code}

2) Run select * on the HBase table:
{code}
Select * from hbase.myhbase;
{code}

The query returns a wrong result, and the result is not consistent across 
multiple runs.





[jira] [Commented] (DRILL-4255) SELECT DISTINCT query over JSON data returns UNSUPPORTED OPERATION

2017-08-04 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115103#comment-16115103
 ] 

Jinfeng Ni commented on DRILL-4255:
---

It seems to me the issue is the same as DRILL-5464; the DISTINCT clause is 
essentially converted into an Aggregate operator. 

As mentioned in DRILL-5464, the query that used to fail with a SchemaChange 
error now runs successfully with the patch for DRILL-5546 (the umbrella JIRA 
for schema change issues caused by null datasets).
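
A sketch of that equivalence, using the table and column from the report below 
(illustrative only):
{code}
-- the planner rewrites the DISTINCT form into the GROUP BY form,
-- so both go through the same (Hash) Aggregate operator:
select distinct t.operation from `auditlogs` t;
select t.operation from `auditlogs` t group by t.operation;
{code}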



> SELECT DISTINCT query over JSON data returns UNSUPPORTED OPERATION
> --
>
> Key: DRILL-4255
> URL: https://issues.apache.org/jira/browse/DRILL-4255
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.4.0
> Environment: CentOS
>Reporter: Khurram Faraaz
>
> SELECT DISTINCT over MapR-FS generated audit logs (JSON files) results in an 
> unsupported operation error. The exact same query over another set of JSON 
> data returns correct results.
> MapR Drill 1.4.0, commit ID : 9627a80f
> MapRBuildVersion : 5.1.0.36488.GA
> OS : CentOS x86_64 GNU/Linux
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select distinct t.operation from `auditlogs` t;
> Error: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema 
> changes
> Fragment 3:3
> [Error Id: 1233bf68-13da-4043-a162-cf6d98c07ec9 on example.com:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2016-01-08 11:35:35,093 [297060f9-1c7a-b32c-09e8-24b5ad863e73:frag:3:3] INFO  
> o.a.d.e.p.i.aggregate.HashAggBatch - User Error Occurred
> org.apache.drill.common.exceptions.UserException: UNSUPPORTED_OPERATION 
> ERROR: Hash aggregate does not support schema changes
> [Error Id: 1233bf68-13da-4043-a162-cf6d98c07ec9 ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:144)
>  [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>  [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132)
>  [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) 
> [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93)
>  [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) 
> [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:256)
>  [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:250)
>  [drill-java-exec-1.4.0.jar:1.4.0]
> at java.security.AccessController.doPrivileged(Native Method) 
> [na:1.7.0_65]
> at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_65]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
>  [hadoop-common-2.7.0-mapr-1506.jar:na]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250)
>  [drill-java-exec-1.4.0.jar:1.4.0]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.4.0.jar:1.4.0]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_65]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_65]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
> {noformat}
> Query plan for above query.
> {noformat}
> 00-00    Screen : rowType = RecordType(ANY operation): rowcount = 141437.16, cumulative cost = {3.4100499276E7 rows, 1.69455861396E8 cpu, 0.0 io, 1.2165858754560001E10 network, 2.738223417605E8 memory}, id = 7572
> 00-01      UnionExchange : rowType = RecordType(ANY 

[jira] [Commented] (DRILL-5185) Union all not passing type info when the output contains 0 rows

2017-08-04 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115068#comment-16115068
 ] 

Jinfeng Ni commented on DRILL-5185:
---

Ran against the patch for DRILL-5546. The query, which originally failed, now 
completes successfully.

{code}
0: jdbc:drill:zk=local> select t1.l_partkey, t2.o_orderdate from (
. . . . . . . . . . . >   select l_orderkey, l_partkey, l_comment from 
cp.`tpch/lineitem.parquet` where l_quantity is null
. . . . . . . . . . . >   union
. . . . . . . . . . . >   select l_orderkey, l_partkey, l_comment from 
cp.`tpch/lineitem.parquet` where l_quantity is null
. . . . . . . . . . . >   ) as t1,
. . . . . . . . . . . >   cp.`tpch/orders.parquet` as t2
. . . . . . . . . . . > where t1.l_comment = t2.o_comment;
++--+
| l_partkey  | o_orderdate  |
++--+
++--+
No rows selected (1.099 seconds)
{code}

> Union all not passing type info when the output contains 0 rows
> ---
>
> Key: DRILL-5185
> URL: https://issues.apache.org/jira/browse/DRILL-5185
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Jinfeng Ni
>
> Version : 1.10.0
> git.commit.id.abbrev=4d4e0c2
> The below query fails without an explicit cast
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> select t1.l_partkey, t2.o_orderdate from 
> (
> . . . . . . . . . . . . . . . . . .> select l_orderkey, l_partkey, l_comment 
> from cp.`tpch/lineitem.parquet` where l_quantity is null
> . . . . . . . . . . . . . . . . . .> union 
> . . . . . . . . . . . . . . . . . .> select l_orderkey, l_partkey, l_comment 
> from cp.`tpch/lineitem.parquet` where l_quantity is null
> . . . . . . . . . . . . . . . . . .> ) as t1,
> . . . . . . . . . . . . . . . . . .> cp.`tpch/orders.parquet` as t2
> . . . . . . . . . . . . . . . . . .> where t1.l_comment = t2.o_comment;
> Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts 
> between 1. Numeric data
>  2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARCHAR, Right 
> type: INT. Add explicit casts to avoid this error
> Fragment 0:0
> [Error Id: e09bb8ee-cb1c-48bc-9dce-42ace2d4b80b on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}





[jira] [Comment Edited] (DRILL-5464) Fix JSON reader when it deals with empty file

2017-08-04 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115064#comment-16115064
 ] 

Jinfeng Ni edited comment on DRILL-5464 at 8/4/17 10:52 PM:


Ran the above query with the patch for DRILL-5546, the umbrella JIRA for schema 
change issues related to NULL datasets.  The query finished successfully in 
multiple runs. 

{code}
 select stars, count(*) as cnt from dfs.tmp.yelp group by stars;
++-+
| stars  |   cnt   |
++-+
| 2  | 102737  |
| 1  | 110772  |
| 4  | 342143  |
| 5  | 406045  |
| 3  | 163761  |
++-+
{code} 

Physical plan for the query: 
{code}
00-00    Screen
00-01      Project(stars=[$0], cnt=[$1])
00-02        UnionExchange
01-01          HashAgg(group=[{0}], cnt=[$SUM0($1)])
01-02            Project(stars=[$0], cnt=[$1])
01-03              HashToRandomExchange(dist0=[[$0]])
02-01                UnorderedMuxExchange
03-01                  Project(stars=[$0], cnt=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0, 1301011)])
03-02                    HashAgg(group=[{0}], cnt=[COUNT()])
03-03                      Scan(groupscan=[EasyGroupScan [selectionRoot=file:/tmp/yelp, numFiles=2, columns=[`stars`], files=[file:/tmp/yelp/empty.json, file:/tmp/yelp/yelp_academic_dataset_review.json]]])
{code}


was (Author: jni):
Ran the above query with the patch for DRILL-5546, the umbrella JIRA for schema 
change issues related to NULL datasets.  The query finished successfully.

{code}
 select stars, count(*) as cnt from dfs.tmp.yelp group by stars;
++-+
| stars  |   cnt   |
++-+
| 2  | 102737  |
| 1  | 110772  |
| 4  | 342143  |
| 5  | 406045  |
| 3  | 163761  |
++-+
{code} 

Physical plan for the query: 
{code}
00-00    Screen
00-01      Project(stars=[$0], cnt=[$1])
00-02        UnionExchange
01-01          HashAgg(group=[{0}], cnt=[$SUM0($1)])
01-02            Project(stars=[$0], cnt=[$1])
01-03              HashToRandomExchange(dist0=[[$0]])
02-01                UnorderedMuxExchange
03-01                  Project(stars=[$0], cnt=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0, 1301011)])
03-02                    HashAgg(group=[{0}], cnt=[COUNT()])
03-03                      Scan(groupscan=[EasyGroupScan [selectionRoot=file:/tmp/yelp, numFiles=2, columns=[`stars`], files=[file:/tmp/yelp/empty.json, file:/tmp/yelp/yelp_academic_dataset_review.json]]])
{code}

> Fix JSON reader when it deals with empty file
> -
>
> Key: DRILL-5464
> URL: https://issues.apache.org/jira/browse/DRILL-5464
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>
> An empty JSON file is one without any JSON object.  If we query an empty 
> JSON file asking it to return column 'A', Drill's JSON record reader returns 
> a batch with 0 rows and puts column 'A' in as a nullable INT column. A 
> better name for such a column might be "phantom column", as the record reader 
> does not have any knowledge of the column's schema, and the nullable INT 
> column is just a guessed schema. 
> However, that processing can introduce many issues. Consider a directory 
> consisting of multiple JSON files where at least one of them is empty.  If 
> column 'A' is returned as a nullable-INT column from the reader over the 
> empty file, while the other JSON files contain a real typed column 'A', the 
> query can hit many issues, including 1) SchemaChangeException, 2) failures 
> in operators that do not detect the schema change, or 3) incorrect query 
> results, since the run-time code is generated over the phantom column type, 
> not the real type.
> For instance, the following query against the yelp JSON file runs successfully.
> {code}
> select count(*), stars  from 
> dfs.`/tmp/yelp/yelp_academic_dataset_review.json` group by stars;
> {code}
> If an empty JSON file is added to the directory, the query fails with the 
> following error (which falls into the 2nd category: PartitionSender did not 
> detect the schema change properly).  
> {code}
> select count(*), stars  from dfs.`/tmp/yelp` group by stars;
> Error: SYSTEM ERROR: IllegalStateException: Failure while reading vector.  
> Expected vector class of org.apache.drill.exec.vector.NullableIntVector but 
> was holding vector class org.apache.drill.exec.vector.NullableBigIntVector, 
> field= stars(BIGINT:OPTIONAL)[$bits$(UINT1:REQUIRED), stars(BIGINT:OPTIONAL)]
> {code}





[jira] [Commented] (DRILL-5464) Fix JSON reader when it deals with empty file

2017-08-04 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115064#comment-16115064
 ] 

Jinfeng Ni commented on DRILL-5464:
---

Ran the above query with the patch for DRILL-5546, the umbrella JIRA for schema 
change issues related to NULL datasets.  The query finished successfully.

{code}
 select stars, count(*) as cnt from dfs.tmp.yelp group by stars;
++-+
| stars  |   cnt   |
++-+
| 2  | 102737  |
| 1  | 110772  |
| 4  | 342143  |
| 5  | 406045  |
| 3  | 163761  |
++-+
{code} 

Physical plan for the query: 
{code}
00-00    Screen
00-01      Project(stars=[$0], cnt=[$1])
00-02        UnionExchange
01-01          HashAgg(group=[{0}], cnt=[$SUM0($1)])
01-02            Project(stars=[$0], cnt=[$1])
01-03              HashToRandomExchange(dist0=[[$0]])
02-01                UnorderedMuxExchange
03-01                  Project(stars=[$0], cnt=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0, 1301011)])
03-02                    HashAgg(group=[{0}], cnt=[COUNT()])
03-03                      Scan(groupscan=[EasyGroupScan [selectionRoot=file:/tmp/yelp, numFiles=2, columns=[`stars`], files=[file:/tmp/yelp/empty.json, file:/tmp/yelp/yelp_academic_dataset_review.json]]])
{code}

> Fix JSON reader when it deals with empty file
> -
>
> Key: DRILL-5464
> URL: https://issues.apache.org/jira/browse/DRILL-5464
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>
> An empty JSON file is one without any JSON object.  If we query an empty 
> JSON file asking it to return column 'A', Drill's JSON record reader returns 
> a batch with 0 rows and puts column 'A' in as a nullable INT column. A 
> better name for such a column might be "phantom column", as the record reader 
> does not have any knowledge of the column's schema, and the nullable INT 
> column is just a guessed schema. 
> However, that processing can introduce many issues. Consider a directory 
> consisting of multiple JSON files where at least one of them is empty.  If 
> column 'A' is returned as a nullable-INT column from the reader over the 
> empty file, while the other JSON files contain a real typed column 'A', the 
> query can hit many issues, including 1) SchemaChangeException, 2) failures 
> in operators that do not detect the schema change, or 3) incorrect query 
> results, since the run-time code is generated over the phantom column type, 
> not the real type.
> For instance, the following query against the yelp JSON file runs successfully.
> {code}
> select count(*), stars  from 
> dfs.`/tmp/yelp/yelp_academic_dataset_review.json` group by stars;
> {code}
> If an empty JSON file is added to the directory, the query fails with the 
> following error (which falls into the 2nd category: PartitionSender did not 
> detect the schema change properly).  
> {code}
> select count(*), stars  from dfs.`/tmp/yelp` group by stars;
> Error: SYSTEM ERROR: IllegalStateException: Failure while reading vector.  
> Expected vector class of org.apache.drill.exec.vector.NullableIntVector but 
> was holding vector class org.apache.drill.exec.vector.NullableBigIntVector, 
> field= stars(BIGINT:OPTIONAL)[$bits$(UINT1:REQUIRED), stars(BIGINT:OPTIONAL)]
> {code}





[jira] [Commented] (DRILL-5704) Improve error message on client side when queries fail with "Failed to create schema tree." when Impersonation is enabled and logins are anonymous

2017-08-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114929#comment-16114929
 ] 

ASF GitHub Bot commented on DRILL-5704:
---

Github user priteshm commented on the issue:

https://github.com/apache/drill/pull/895
  
@parthchandra can you please review this?


> Improve error message on client side when queries fail with "Failed to create 
> schema tree." when Impersonation is enabled and logins are anonymous
> --
>
> Key: DRILL-5704
> URL: https://issues.apache.org/jira/browse/DRILL-5704
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
> Fix For: 1.12.0
>
>
> Reported by [~agirish]
> When a username is not specified and impersonation is enabled, Drill sets the 
> session user to anonymous. During query execution Drill builds the schema 
> tree, and as part of that it validates whether the user has access to the 
> workspace by using the FileClient API listStatus, which verifies the user 
> against the OS users. Since impersonation is enabled here without 
> authentication, and no user is specified in the connection string, Drill uses 
> the default user "anonymous" for the workspace permission check, which fails 
> because the node doesn't have any valid user with that name.
> {code:java}
> Caused by: java.io.IOException: Error getting user info for current user, 
> anonymous
>..
>..
> at 
> org.apache.drill.exec.store.dfs.DrillFileSystem.listStatus(DrillFileSystem.java:523)
>  ~[drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory.accessible(WorkspaceSchemaFactory.java:157)
>  ~[drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.dfs.FileSystemSchemaFactory$FileSystemSchema.(FileSystemSchemaFactory.java:78)
>  ~[drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.dfs.FileSystemSchemaFactory.registerSchemas(FileSystemSchemaFactory.java:65)
>  ~[drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.dfs.FileSystemPlugin.registerSchemas(FileSystemPlugin.java:150)
>  ~[drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.StoragePluginRegistryImpl$DrillSchemaFactory.registerSchemas(StoragePluginRegistryImpl.java:365)
>  ~[drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.SchemaTreeProvider.createRootSchema(SchemaTreeProvider.java:72)
>  [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
> ... 10 common frames omitted
> {code}
> # $DRILL_HOME/bin/sqlline -u "jdbc:drill:zk=localhost:5181" 
> sqlline> select * from sys.drillbits;
> User Error Occurred
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: Failed to 
> create schema tree.





[jira] [Commented] (DRILL-5480) Empty batch returning from HBase may cause SchemaChangeException or incorrect query result

2017-08-04 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114919#comment-16114919
 ] 

Jinfeng Ni commented on DRILL-5480:
---

I have a branch to address the umbrella JIRA DRILL-5546. I tried the above two 
queries, and both seem to run successfully.

{code}
select * from hbase.customer c, cp.`tpch/orders.parquet` o where cast(c.orders.id as bigint) = o.o_orderkey and c.orders.id <= '200';
| row_key      | orders         | o_orderkey  | o_custkey  | o_orderstatus  | o_totalprice  | o_orderdate  | o_orderpriority  | o_clerk      | o_shippriority  | o_comment                                      |
| [B@33b51673  | {"id":"MTAw"}  | 100         | 1471       | O              | 198978.27     | 1998-02-28   | 4-NOT SPECIFIED  | Clerk#00577  | 0               | heodolites detect slyly alongside of the ent   |
{code}

{code}
select * from hbase.customer2 c, cp.`tpch/orders.parquet` o where cast(c.orders.id as bigint) = o.o_orderkey and c.orders.id <= '500';
| row_key      | orders         | o_orderkey  | o_custkey  | o_orderstatus  | o_totalprice  | o_orderdate  | o_orderpriority  | o_clerk      | o_shippriority  | o_comment                                      |
| [B@5a44adf8  | {"id":"MTAw"}  | 100         | 1471       | O              | 198978.27     | 1998-02-28   | 4-NOT SPECIFIED  | Clerk#00577  | 0               | heodolites detect slyly alongside of the ent   |
{code}

> Empty batch returning from HBase may cause SchemaChangeException or incorrect 
> query result
> -
>
> Key: DRILL-5480
> URL: https://issues.apache.org/jira/browse/DRILL-5480
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>
> The following repro was provided by [~haozhu].
> 1. Create an HBase table with 4 regions:
> {code}
> create 'myhbase', 'cf1','cf2', {SPLITS => ['a', 'b', 'c']}
> put 'myhbase','a','cf1:col1','somedata'
> put 'myhbase','b','cf1:col2','somedata'
> put 'myhbase','c','cf2:col1','somedata'
> {code}
> One region has cf1.col1.  One region has column family 'cf1' but does not 
> have 'col1' under 'cf1'. One region has only column family 'cf2'. And the 
> last region is completely empty.
> 2. Prepare a csv file.
> {code}
> select * from dfs.tmp.`joinhbase.csv`;
> +---+
> |  columns  |
> +---+
> | ["1","somedata"]  |
> | ["2","somedata"]  |
> | ["3","somedata"]  |
> {code}
> Now run the following query on drill 1.11.0-SNAPSHOT:
> {code}
> select cast(H.row_key as varchar(10)) as keyCol, CONVERT_FROM(H.cf1.col1, 
> 'UTF8') as col1
> from 
> hbase.myhbase H JOIN dfs.tmp.`joinhbase.csv` C
> ON CONVERT_FROM(H.cf1.col1, 'UTF8')= C.columns[1]
> ;
> {code}
> The correct query result should be:
> {code}
> +-+---+
> | keyCol  |   col1|
> +-+---+
> | a   | somedata  |
> | a   | somedata  |
> | a   | somedata  |
> +-+---+
> {code}
> Turn off broadcast join, and we will see a SchemaChangeException or an 
> incorrect result randomly. By 'randomly', it means that in the same session, 
> the same query would hit a SchemaChangeException in one run, while getting an 
> incorrect result in a second run. 
> {code}
> alter session set `planner.enable_broadcast_join`=false;
> {code}
> {code}
> select cast(H.row_key as varchar(10)) as keyCol, CONVERT_FROM(H.cf1.col1, 
> 'UTF8') as col1
> . . . . . . . . . . . . . . . . . .> from
> . . . . . . . . . . . . . . . . . .> hbase.myhbase H JOIN 
> 

[jira] [Commented] (DRILL-4735) Count(dir0) on parquet returns 0 result

2017-08-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114729#comment-16114729
 ] 

ASF GitHub Bot commented on DRILL-4735:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/882#discussion_r131445381
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java ---
@@ -85,109 +91,231 @@ protected ConvertCountToDirectScan(RelOptRuleOperand rule, String id) {
   @Override
   public void onMatch(RelOptRuleCall call) {
     final DrillAggregateRel agg = (DrillAggregateRel) call.rel(0);
-    final DrillScanRel scan = (DrillScanRel) call.rel(call.rels.length -1);
-    final DrillProjectRel proj = call.rels.length == 3 ? (DrillProjectRel) call.rel(1) : null;
+    final DrillScanRel scan = (DrillScanRel) call.rel(call.rels.length - 1);
+    final DrillProjectRel project = call.rels.length == 3 ? (DrillProjectRel) call.rel(1) : null;
 
     final GroupScan oldGrpScan = scan.getGroupScan();
     final PlannerSettings settings = PrelUtil.getPlannerSettings(call.getPlanner());
 
-    // Only apply the rule when :
+    // Only apply the rule when:
     //    1) scan knows the exact row count in getSize() call,
     //    2) No GroupBY key,
-    //    3) only one agg function (Check if it's count(*) below).
-    //    4) No distinct agg call.
+    //    3) No distinct agg call.
     if (!(oldGrpScan.getScanStats(settings).getGroupScanProperty().hasExactRowCount()
         && agg.getGroupCount() == 0
-        && agg.getAggCallList().size() == 1
         && !agg.containsDistinctCall())) {
       return;
     }
 
-    AggregateCall aggCall = agg.getAggCallList().get(0);
-
-    if (aggCall.getAggregation().getName().equals("COUNT") ) {
-
-      long cnt = 0;
-      //  count(*)  == >  empty arg  ==>  rowCount
-      //  count(Not-null-input) ==> rowCount
-      if (aggCall.getArgList().isEmpty() ||
-          (aggCall.getArgList().size() == 1 &&
-           ! agg.getInput().getRowType().getFieldList().get(aggCall.getArgList().get(0).intValue()).getType().isNullable())) {
-        cnt = (long) oldGrpScan.getScanStats(settings).getRecordCount();
-      } else if (aggCall.getArgList().size() == 1) {
-      // count(columnName) ==> Agg ( Scan )) ==> columnValueCount
-        int index = aggCall.getArgList().get(0);
-
-        if (proj != null) {
-          // project in the middle of Agg and Scan : Only when input of AggCall is a RexInputRef in Project, we find the index of Scan's field.
-          // For instance,
-          // Agg - count($0)
-          //  \
-          //  Proj - Exp={$1}
-          //    \
-          //   Scan (col1, col2).
-          // return count of "col2" in Scan's metadata, if found.
-
-          if (proj.getProjects().get(index) instanceof RexInputRef) {
-            index = ((RexInputRef) proj.getProjects().get(index)).getIndex();
-          } else {
-            return;  // do not apply for all other cases.
-          }
-        }
+    final CountsCollector countsCollector = new CountsCollector(settings);
+    // if counts were not collected, rule won't be applied
+    if (!countsCollector.collect(agg, scan, project)) {
+      return;
+    }
 
-        String columnName = scan.getRowType().getFieldNames().get(index).toLowerCase();
+    final RelDataType scanRowType = constructDataType(agg);
 
-        cnt = oldGrpScan.getColumnValueCount(SchemaPath.getSimplePath(columnName));
-        if (cnt == GroupScan.NO_COLUMN_STATS) {
-          // if column stats are not available don't apply this rule
-          return;
-        }
-      } else {
-        return; // do nothing.
-      }
+    final DynamicPojoRecordReader reader = new DynamicPojoRecordReader<>(
+        buildSchema(scanRowType.getFieldNames()),
+        Collections.singletonList(countsCollector.getCounts()));
 
-      RelDataType scanRowType = getCountDirectScanRowType(agg.getCluster().getTypeFactory());
+    final ScanStats scanStats = new ScanStats(ScanStats.GroupScanProperty.EXACT_ROW_COUNT, 1, 1, scanRowType.getFieldCount());
+    final GroupScan directScan = new MetadataDirectGroupScan(reader, oldGrpScan.getFiles(), scanStats);
 
-      final ScanPrel newScan = ScanPrel.create(scan,
-          scan.getTraitSet().plus(Prel.DRILL_PHYSICAL).plus(DrillDistributionTrait.SINGLETON), getCountDirectScan(cnt),
-          scanRowType);
+    final ScanPrel newScan = ScanPrel.create(scan,
+
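
In SQL terms, the rule rewrites count aggregations whose answers are already 
known from metadata into a direct scan over those counts, e.g. (paths 
illustrative):
{code}
-- served from metadata instead of reading the data files:
select count(*) from dfs.`/tmp/parquet_table`;
-- DRILL-4735: count over the implicit partition column previously returned 0
select count(dir0) from dfs.`/tmp/parquet_table`;
{code}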

[jira] [Commented] (DRILL-4735) Count(dir0) on parquet returns 0 result

2017-08-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114728#comment-16114728
 ] 

ASF GitHub Bot commented on DRILL-4735:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/882#discussion_r131447047
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java ---
(same hunk as quoted in the comment above)

[jira] [Commented] (DRILL-4735) Count(dir0) on parquet returns 0 result

2017-08-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114730#comment-16114730
 ] 

ASF GitHub Bot commented on DRILL-4735:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/882#discussion_r131449948
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java ---
(same hunk as quoted in the comment above)

[jira] [Commented] (DRILL-4735) Count(dir0) on parquet returns 0 result

2017-08-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114732#comment-16114732
 ] 

ASF GitHub Bot commented on DRILL-4735:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/882#discussion_r131447579
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java ---
(same hunk as quoted in the comment above)

[jira] [Commented] (DRILL-4735) Count(dir0) on parquet returns 0 result

2017-08-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114727#comment-16114727
 ] 

ASF GitHub Bot commented on DRILL-4735:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/882#discussion_r131446392
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java ---

[jira] [Commented] (DRILL-4735) Count(dir0) on parquet returns 0 result

2017-08-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114731#comment-16114731
 ] 

ASF GitHub Bot commented on DRILL-4735:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/882#discussion_r131450166
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/direct/MetadataDirectGroupScan.java ---
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.direct;
+
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.physical.base.ScanStats;
+import org.apache.drill.exec.store.RecordReader;
+
+import java.util.Collection;
+import java.util.List;
+
+/**
+ * Represents direct scan based on metadata information.
+ * For example, for parquet files it can be obtained from parquet footer (total row count)
+ * or from parquet metadata files (column counts).
+ * Contains reader, statistics and list of scanned files if present.
+ */
+@JsonTypeName("metadata-direct-scan")
+public class MetadataDirectGroupScan extends DirectGroupScan {
+
+  private final Collection<String> files;
+
+  public MetadataDirectGroupScan(RecordReader reader, Collection<String> files) {
+    super(reader);
+    this.files = files;
+  }
+
+  public MetadataDirectGroupScan(RecordReader reader, Collection<String> files, ScanStats stats) {
+    super(reader, stats);
+    this.files = files;
+  }
+
+  @Override
+  public PhysicalOperator getNewWithChildren(List<PhysicalOperator> children) throws ExecutionSetupException {
+    assert children == null || children.isEmpty();
+    return new MetadataDirectGroupScan(reader, files, stats);
+  }
+
+  @Override
+  public GroupScan clone(List<SchemaPath> columns) {
+    return this;
+  }
+
+  /**
+   * <p>
+   * Returns string representation of group scan data.
+   * Includes list of files if present.
+   * </p>
+   *
+   * <p>
+   * Example: [usedMetadata = true, files = [/tmp/0_0_0.parquet], numFiles = 1]
+   * </p>
+   *
+   * @return string representation of group scan data
+   */
+  @Override
+  public String getDigest() {
+    StringBuilder builder = new StringBuilder();
+    builder.append("usedMetadata = true, ");
--- End diff --

This "usedMetadata = true" prefix seems to be redundant, since the scan is already a MetadataDirectGroupScan.
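
For illustration, the cleanup suggested above could look like the following sketch. This is only a guess at the final shape; it assumes the `files` field shown in the diff and nothing else in the digest:

{code}
// Hedged sketch only: drops the redundant flag, since the class name already
// implies metadata usage, and keeps the file list in the digest.
@Override
public String getDigest() {
  StringBuilder builder = new StringBuilder();
  if (files != null) {
    builder.append("files = ").append(files)
           .append(", numFiles = ").append(files.size());
  }
  return builder.toString();
}
{code}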


> Count(dir0) on parquet returns 0 result
> ---
>
> Key: DRILL-4735
> URL: https://issues.apache.org/jira/browse/DRILL-4735
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.0.0, 1.4.0, 1.6.0, 1.7.0
>Reporter: Krystal
>Assignee: Arina Ielchiieva
>Priority: Critical
>
> Selecting a count of dir0, dir1, etc. against a parquet directory returns 0.
> select count(dir0) from `min_max_dir`;
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> select count(dir1) from `min_max_dir`;
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> If I put both dir0 and dir1 in the same select, it returns the expected result:
> select count(dir0), count(dir1) from `min_max_dir`;
> +---------+---------+
> | EXPR$0  | EXPR$1  |
> +---------+---------+
> | 600     | 600     |
> +---------+---------+
> Here is the physical plan for the count(dir0) query:
> {code}
> 00-00    Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1346
> 00-01      Project(EXPR$0=[$0])

[jira] [Resolved] (DRILL-3119) Query stays in "CANCELLATION_REQUESTED" status in UI after OOM of Direct buffer memory

2017-08-04 Thread Roman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman resolved DRILL-3119.
--
   Resolution: Duplicate
Fix Version/s: 1.11.0

> Query stays in "CANCELLATION_REQUESTED" status in UI after OOM of Direct 
> buffer memory
> --
>
> Key: DRILL-3119
> URL: https://issues.apache.org/jira/browse/DRILL-3119
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.0.0
>Reporter: Hao Zhu
>Assignee: Roman
> Fix For: 1.11.0
>
>
> Tested in 1.0.0 with the commit id below:
> {code}
> > select * from sys.version;
> +-------------------------------------------+--------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> | commit_id                                 | commit_message                                                     | commit_time                | build_email  | build_time                 |
> +-------------------------------------------+--------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> | d8b19759657698581cc0d01d7038797952888123  | DRILL-3100: TestImpersonationDisabledWithMiniDFS fails on Windows  | 15.05.2015 @ 01:18:03 EDT  | Unknown      | 15.05.2015 @ 03:07:10 EDT  |
> +-------------------------------------------+--------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> 1 row selected (0.26 seconds)
> {code}
> How to reproduce:
> 1. Single node cluster.
> 2.  Reduce DRILL_MAX_DIRECT_MEMORY="2G".
> 3. Run a hash join which is big enough to trigger OOM.
> eg:
> {code}
> select count(*) from
> (
> select a.* from dfs.root.`user/hive/warehouse/passwords_csv_big` a, 
> dfs.root.`user/hive/warehouse/passwords_csv_big` b
> where a.columns[1]=b.columns[1]
> );
> {code}
> After that, drillbit.log shows OOM:
> {code}
> 2015-05-16 19:24:34,391 [2aa866ba-8939-b184-0ba2-291734329f88:frag:4:4] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2aa866ba-8939-b184-0ba2-291734329f88:4:4: State change requested from RUNNING 
> --> FINISHED for
> 2015-05-16 19:24:34,391 [2aa866ba-8939-b184-0ba2-291734329f88:frag:4:4] INFO  
> o.a.d.e.w.f.AbstractStatusReporter - State changed for 
> 2aa866ba-8939-b184-0ba2-291734329f88:4:4. New state: FINISHED
> 2015-05-16 19:24:38,561 [BitServer-5] ERROR 
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
> Connection: /10.0.0.31:31012 <--> /10.0.0.31:41923 (data server).  Closing 
> connection.
> io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct 
> buffer memory
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233)
>  ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>   at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>   at 
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618)
>  [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>   at 
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>   at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
>   at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.8.0_45]
>   at 

[jira] [Commented] (DRILL-3119) Query stays in "CANCELLATION_REQUESTED" status in UI after OOM of Direct buffer memory

2017-08-04 Thread Roman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114634#comment-16114634
 ] 

Roman commented on DRILL-3119:
--

I tried to reproduce this issue on a 1-drillbit cluster using mapr-drill-1.2.0.201510071035 with the query:

{code:title=Query|borderStyle=solid}
select count(*) from
(
select a.* from dfs.tmp.`lineitembig.dat` a, dfs.tmp.`lineitembig.dat` b
where a.columns[0]=b.columns[0]
);
{code}

And I got a similar error:
{code:title=Error|borderStyle=solid}
Exception in thread "267b60a0-d710-d8b9-1ff0-62761fcf4c1e:frag:1:0" java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:694)
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
        at io.netty.util.internal.Cleaner0.<clinit>(Cleaner0.java:37)
        at io.netty.util.internal.PlatformDependent0.freeDirectBuffer(PlatformDependent0.java:147)
        at io.netty.util.internal.PlatformDependent.freeDirectBuffer(PlatformDependent.java:281)
        at io.netty.buffer.PoolArena$DirectArena.destroyChunk(PoolArena.java:448)
        at io.netty.buffer.PoolChunkList.free(PoolChunkList.java:70)
        at io.netty.buffer.PoolArena.free(PoolArena.java:203)
        at io.netty.buffer.PooledByteBuf.deallocate(PooledByteBuf.java:147)
        at io.netty.buffer.AbstractReferenceCountedByteBuf.release(AbstractReferenceCountedByteBuf.java:128)
        at io.netty.buffer.WrappedByteBuf.release(WrappedByteBuf.java:825)
        at io.netty.buffer.UnsafeDirectLittleEndian.release(UnsafeDirectLittleEndian.java:238)
        at io.netty.buffer.AbstractDerivedByteBuf.release(AbstractDerivedByteBuf.java:55)
        at io.netty.buffer.DrillBuf.release(DrillBuf.java:252)
        at io.netty.buffer.DrillBuf.release(DrillBuf.java:259)
        at io.netty.buffer.DrillBuf.release(DrillBuf.java:259)
        at io.netty.buffer.DrillBuf.release(DrillBuf.java:259)
        at io.netty.buffer.DrillBuf.release(DrillBuf.java:239)
        at org.apache.drill.exec.vector.BaseDataValueVector.clear(BaseDataValueVector.java:39)
        at org.apache.drill.exec.vector.VarCharVector.clear(VarCharVector.java:206)
        at org.apache.drill.exec.vector.NullableVarCharVector.clear(NullableVarCharVector.java:151)
        at org.apache.drill.exec.record.HyperVectorWrapper.clear(HyperVectorWrapper.java:82)
        at org.apache.drill.exec.record.VectorContainer.zeroVectors(VectorContainer.java:312)
        at org.apache.drill.exec.record.VectorContainer.clear(VectorContainer.java:296)
        at org.apache.drill.exec.physical.impl.join.HashJoinBatch.close(HashJoinBatch.java:529)
        at org.apache.drill.exec.physical.impl.BaseRootExec.close(BaseRootExec.java:122)
        at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:341)
        at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:173)
        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:292)
        at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
{code}

And after this error, Drill hangs in the CANCELLATION_REQUESTED state with the following jstack:
{code:title=Jstack|borderStyle=solid}
"267b60a0-d710-d8b9-1ff0-62761fcf4c1e:frag:2:0" #63 daemon prio=10 os_prio=0 tid=0x7f3dfc694000 nid=0x492d waiting on condition [0x7f3de2d7f000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0xf5fb1278> (a java.util.concurrent.Semaphore$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at java.util.concurrent.Semaphore.acquire(Semaphore.java:467)
        at org.apache.drill.exec.ops.SendingAccountor.waitForSendComplete(SendingAccountor.java:48)
        - locked <0xf5fb1240> (a org.apache.drill.exec.ops.SendingAccountor)
        at org.apache.drill.exec.ops.FragmentContext.waitForSendComplete(FragmentContext.java:436)
        at org.apache.drill.exec.physical.impl.BaseRootExec.close(BaseRootExec.java:112)
        at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.close(PartitionSenderRootExec.java:336)
{code}
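
The top frames show why the query never leaves CANCELLATION_REQUESTED: the close path waits in SendingAccountor.waitForSendComplete for acknowledgements of batches that were already sent, and after the OOM those acks never arrive. A generic sketch of that wait-for-acks pattern follows; it is illustrative only and not Drill's actual SendingAccountor source:

{code}
// Illustrative sketch of the wait-for-acks pattern visible in the jstack above;
// names and structure are simplified assumptions.
import java.util.concurrent.Semaphore;

class SendTracker {
  private final Semaphore acks = new Semaphore(0);
  private int batchesSent;

  synchronized void batchSent() { batchesSent++; }

  void batchAcked() { acks.release(); }

  // Called from the fragment's close path: blocks until every sent batch is
  // acknowledged, which never happens if the receiver died from the OOM.
  synchronized void waitForSendComplete() throws InterruptedException {
    acks.acquire(batchesSent);
    batchesSent = 0;
  }
}
{code}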

[jira] [Assigned] (DRILL-3119) Query stays in "CANCELLATION_REQUESTED" status in UI after OOM of Direct buffer memory

2017-08-04 Thread Roman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman reassigned DRILL-3119:


Assignee: Roman

> Query stays in "CANCELLATION_REQUESTED" status in UI after OOM of Direct 
> buffer memory
> --
>
> Key: DRILL-3119
> URL: https://issues.apache.org/jira/browse/DRILL-3119
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.0.0
>Reporter: Hao Zhu
>Assignee: Roman
>

[jira] [Commented] (DRILL-5699) Drill Web UI Page Source Has Links To External Sites

2017-08-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114504#comment-16114504
 ] 

ASF GitHub Bot commented on DRILL-5699:
---

Github user sindhurirayavaram commented on the issue:

https://github.com/apache/drill/pull/891
  
@arina-ielchiieva After the first commit, I had a detailed discussion with 
@parthchandra. There are two reasons to include all the js and css files 
locally:
1) The biggest library among them, jQuery, is already included in the 
resources folder. Even libraries like d3 are there locally.
2) One of the js and css libraries, ColVis by DataTables, is retired. We 
have to keep a local copy in case the URL becomes unavailable in the future. 
See [ColVis](https://datatables.net/extensions/colvis/).
So we decided to include all the js and css libraries locally. I updated the 
libraries to their min versions and some of them to their latest versions. I 
also changed the generic page to load the external Google CDN first and, if 
that fails, fall back to the local library. Please let me know if any changes 
are needed here. Thank you.


> Drill Web UI Page Source Has Links To External Sites
> 
>
> Key: DRILL-5699
> URL: https://issues.apache.org/jira/browse/DRILL-5699
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Reporter: Sindhuri Ramanarayan Rayavaram
>Assignee: Sindhuri Ramanarayan Rayavaram
>Priority: Minor
> Fix For: 1.12.0
>
>
> Drill uses external CDN for javascript and css files in the result page. When 
> there is no internet connection this page fails to load. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5699) Drill Web UI Page Source Has Links To External Sites

2017-08-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114474#comment-16114474
 ] 

ASF GitHub Bot commented on DRILL-5699:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/891
  
@sindhurirayavaram I am a little confused by your latest changes. I thought 
the main purpose of this PR was not to include js and css files in the Drill 
code, as was discussed in https://github.com/apache/drill/pull/663.
I thought your first round of changes was correct; we just needed to make 
sure they work in different browsers. So basically all that was left to do 
was to add `onload` so the changes also run in IE.
Though maybe I am missing something ...


> Drill Web UI Page Source Has Links To External Sites
> 
>
> Key: DRILL-5699
> URL: https://issues.apache.org/jira/browse/DRILL-5699
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Reporter: Sindhuri Ramanarayan Rayavaram
>Assignee: Sindhuri Ramanarayan Rayavaram
>Priority: Minor
> Fix For: 1.12.0
>
>
> Drill uses external CDN for javascript and css files in the result page. When 
> there is no internet connection this page fails to load. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5691) multiple count distinct query planning error at physical phase

2017-08-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114206#comment-16114206
 ] 

ASF GitHub Bot commented on DRILL-5691:
---

Github user weijietong commented on the issue:

https://github.com/apache/drill/pull/889
  
@arina-ielchiieva I have corrected the code as you suggested. But sorry about 
the unit tests: I tried for a long time to simulate a scan with 1 row and 
failed to do so. The row count of a scan is fetched from the 
AbstractGroupScan.getScanStats method; in my plugin I override this method to 
ensure it returns 1.
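
For context, the override weijietong describes could look like the sketch below. It assumes a custom plugin whose group scan extends AbstractGroupScan, and the constant arguments mirror the ScanStats usage seen in the ConvertCountToDirectScan diff above; the exact signature in his plugin may differ:

{code}
// Hedged sketch: force an exact 1-row estimate from a plugin/test GroupScan.
@Override
public ScanStats getScanStats() {
  // EXACT_ROW_COUNT tells the planner the single-row estimate is guaranteed.
  return new ScanStats(ScanStats.GroupScanProperty.EXACT_ROW_COUNT, 1, 1, 1);
}
{code}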


> multiple count distinct query planning error at physical phase 
> ---
>
> Key: DRILL-5691
> URL: https://issues.apache.org/jira/browse/DRILL-5691
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.9.0, 1.10.0
>Reporter: weijie.tong
>
> I materialized the count distinct query result in a cache, and added a plugin 
> rule to translate (Aggregate, Aggregate, Project, Scan) or 
> (Aggregate, Aggregate, Scan) to (Project, Scan) at the PARTITION_PRUNING 
> phase. Then, once a user issues count distinct queries, they are translated 
> to query the cache for the result.
> eg1: "select count(*), sum(a), count(distinct b) from t where dt=xxx"
> eg2: "select count(*), sum(a), count(distinct b), count(distinct c) from t 
> where dt=xxx"
> eg3: "select count(distinct b), count(distinct c) from t where dt=xxx"
> eg1 runs correctly and returns the result I expected, but eg2 fails at the 
> physical phase. The error info is here: 
> https://gist.github.com/weijietong/1b8ed12db9490bf006e8b3fe0ee52269. 
> eg3 fails with a similar error.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)