[jira] [Comment Edited] (DRILL-4707) Conflicting columns names under case-insensitive policy lead to either memory leak or incorrect result
[ https://issues.apache.org/jira/browse/DRILL-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321712#comment-15321712 ]

Robert Hou edited comment on DRILL-4707 at 6/9/16 12:37 AM:
------------------------------------------------------------

Here is another query:

{code}
SELECT s.student_id as student, s.name as StudenT, s.gpa as StudenT
from student s join hive.alltypesp1 h on (s.student_id = h.c1);
{code}

Error: SYSTEM ERROR: IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableVarBinaryVector but was holding vector class org.apache.drill.exec.vector.NullableFloat8Vector, field= StudenT0(FLOAT8:OPTIONAL)

Column types: student_id is an integer, name is a varchar, gpa is a double, and c1 is an integer.

{code}
0: jdbc:drill:zk=10.10.100.186:5181> select version, commit_id, commit_message from sys.version;
+-----------------+-------------------------------------------+--------------------------------------------+
| version         | commit_id                                 | commit_message                             |
+-----------------+-------------------------------------------+--------------------------------------------+
| 1.7.0-SNAPSHOT  | a07f4de7e8725f7971ace308e81a241b7b07b5b6  | DRILL-3522: Fix for sporadic Mongo errors  |
+-----------------+-------------------------------------------+--------------------------------------------+
{code}

> Conflicting columns names under case-insensitive policy lead to either memory
> leak or incorrect result
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-4707
>                 URL: https://issues.apache.org/jira/browse/DRILL-4707
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>            Priority: Critical
>
> On latest master branch:
> {code}
> select version, commit_id, commit_message from sys.version;
> +-----------------+-------------------------------------------+---------------------------------------------------------------------------------+
> | version         | commit_id                                 | commit_message                                                                  |
> +-----------------+-------------------------------------------+---------------------------------------------------------------------------------+
> | 1.7.0-SNAPSHOT  | 3186217e5abe3c6c2c7e504cdb695567ff577e4c  | DRILL-4607: Add a split function that allows to separate string by a delimiter  |
> +-----------------+-------------------------------------------+---------------------------------------------------------------------------------+
> {code}
> If a query has two conflicting column names under a case-insensitive policy,
> Drill will either hit a memory leak or return an incorrect result.
> Q1:
> {code}
> select r_regionkey as XYZ, r_name as xyz FROM cp.`tpch/region.parquet`;
> Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query.
> Memory leaked: (131072)
> Allocator(op:0:0:1:Project) 100/131072/2490368/100 (res/actual/peak/limit)
> Fragment 0:0
> {code}
> Q2: returns only one column in the result.
> {code}
> select n_nationkey as XYZ, n_regionkey as xyz FROM cp.`tpch/nation.parquet`;
> +------+
> | XYZ  |
> +------+
> | 0    |
> | 1    |
> | 1    |
> | 1    |
> | 4    |
> | 0    |
> | 3    |
> {code}
> The cause of the problem seems to be that the Project treats the two incoming
> columns as identical (since Drill uses case-insensitive column names during
> execution).
> The planner should make sure that the conflicting columns are resolved, since
> execution is name-based.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
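The renaming the planner needs to do can be modeled outside of Drill. The following is an illustrative Python sketch (not Drill's actual code) of resolving output aliases that collide case-insensitively by appending a numeric suffix, which matches the `StudenT0` name visible in the error message above:

```python
def resolve_conflicts(aliases):
    """Rename output aliases that collide case-insensitively,
    appending a numeric suffix so that execution-time name lookup
    (which is case-insensitive) stays unambiguous."""
    seen = {}        # lower-cased name -> number of collisions so far
    resolved = []
    for name in aliases:
        key = name.lower()
        if key not in seen:
            seen[key] = 0
            resolved.append(name)
        else:
            resolved.append(f"{name}{seen[key]}")  # e.g. StudenT -> StudenT0
            seen[key] += 1
    return resolved

# Q1 from the report: XYZ and xyz collide under a case-insensitive policy.
print(resolve_conflicts(["XYZ", "xyz"]))                    # ['XYZ', 'xyz0']
print(resolve_conflicts(["student", "StudenT", "StudenT"]))  # ['student', 'StudenT0', 'StudenT1']
```

Without such a rename, the Project sees two columns with the same execution-time name, which is consistent with both failure modes reported above (a dropped column, or a vector lookup returning the wrong column's vector class).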
[jira] [Commented] (DRILL-4707) Conflicting columns names under case-insensitive policy lead to either memory leak or incorrect result
[ https://issues.apache.org/jira/browse/DRILL-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321712#comment-15321712 ]

Robert Hou commented on DRILL-4707:
-----------------------------------

Here is another query:

{code}
SELECT s.student_id as student, s.name as StudenT, s.gpa as StudenT
from student s join hive.alltypesp1 h on (s.student_id = h.c1);
{code}

Error: SYSTEM ERROR: IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableVarBinaryVector but was holding vector class org.apache.drill.exec.vector.NullableFloat8Vector, field= StudenT0(FLOAT8:OPTIONAL)

Column types: student_id is an integer, name is a varchar, gpa is a double, and c1 is an integer.

{code}
0: jdbc:drill:zk=10.10.100.186:5181> select version, commit_id, commit_message from sys.version;
+-----------------+-------------------------------------------+--------------------------------------------+
| version         | commit_id                                 | commit_message                             |
+-----------------+-------------------------------------------+--------------------------------------------+
| 1.7.0-SNAPSHOT  | a07f4de7e8725f7971ace308e81a241b7b07b5b6  | DRILL-3522: Fix for sporadic Mongo errors  |
+-----------------+-------------------------------------------+--------------------------------------------+
{code}
[jira] [Updated] (DRILL-4707) Conflicting columns names under case-insensitive policy lead to either memory leak or incorrect result
[ https://issues.apache.org/jira/browse/DRILL-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-4707:
------------------------------
    Reviewer: Robert Hou  (was: Chun Chang)

> Conflicting columns names under case-insensitive policy lead to either memory
> leak or incorrect result
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-4707
>                 URL: https://issues.apache.org/jira/browse/DRILL-4707
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>            Priority: Critical
>             Fix For: 1.8.0
[jira] [Closed] (DRILL-4514) Add describe schema command
[ https://issues.apache.org/jira/browse/DRILL-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou closed DRILL-4514.
-----------------------------

Tests pass.

> Add describe schema command
> ---------------------------
>
>                 Key: DRILL-4514
>                 URL: https://issues.apache.org/jira/browse/DRILL-4514
>             Project: Apache Drill
>          Issue Type: New Feature
>    Affects Versions: Future
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>              Labels: doc-impacting
>             Fix For: 1.8.0
>
> Add a describe database command which will return the directory
> associated with a database on the fly.
> Syntax:
> describe database
> describe schema
> Output:
> {code:sql}
> DESCRIBE SCHEMA dfs.tmp;
> {code}
> {noformat}
> +----------+---------------------------------+
> | schema   | properties                      |
> +----------+---------------------------------+
> | dfs.tmp  | {
>   "type" : "file",
>   "enabled" : true,
>   "connection" : "file:///",
>   "config" : null,
>   "formats" : {
>     "psv" : {
>       "type" : "text",
>       "extensions" : [ "tbl" ],
>       "delimiter" : "|"
>     },
>     "csv" : {
>       "type" : "text",
>       "extensions" : [ "csv" ],
>       "delimiter" : ","
>     },
>     "tsv" : {
>       "type" : "text",
>       "extensions" : [ "tsv" ],
>       "delimiter" : "\t"
>     },
>     "parquet" : {
>       "type" : "parquet"
>     },
>     "json" : {
>       "type" : "json",
>       "extensions" : [ "json" ]
>     },
>     "avro" : {
>       "type" : "avro"
>     },
>     "sequencefile" : {
>       "type" : "sequencefile",
>       "extensions" : [ "seq" ]
>     },
>     "csvh" : {
>       "type" : "text",
>       "extensions" : [ "csvh" ],
>       "extractHeader" : true,
>       "delimiter" : ","
>     }
>   },
>   "location" : "/tmp",
>   "writable" : true,
>   "defaultInputFormat" : null
> } |
> +----------+---------------------------------+
> {noformat}
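The shape of the output above (schema name plus the workspace's storage configuration as JSON) can be mimicked with a short sketch. This is illustrative Python, not Drill's implementation, and assumes the plugin configuration is already available as a plain dict:

```python
import json

def describe_schema(name, config):
    """Mimic a DESCRIBE SCHEMA row: the schema name plus its
    storage configuration rendered as pretty-printed JSON."""
    return name, json.dumps(config, indent=2)

schema, props = describe_schema("dfs.tmp", {
    "type": "file",
    "enabled": True,
    "connection": "file:///",
    "location": "/tmp",
    "writable": True,
})
print(schema)  # dfs.tmp
print(props)   # the config, pretty-printed with booleans as true/false
```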
[jira] [Commented] (DRILL-4514) Add describe schema command
[ https://issues.apache.org/jira/browse/DRILL-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392490#comment-15392490 ]

Robert Hou commented on DRILL-4514:
-----------------------------------

Tests have been added, commit: cdcb7a0736646105ae01db8d49b88de22977a336. Tests pass.
[jira] [Closed] (DRILL-4707) Conflicting columns names under case-insensitive policy lead to either memory leak or incorrect result
[ https://issues.apache.org/jira/browse/DRILL-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou closed DRILL-4707.
-----------------------------

Tests have passed.
[jira] [Commented] (DRILL-4147) Union All operator runs in a single fragment
[ https://issues.apache.org/jira/browse/DRILL-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410511#comment-15410511 ]

Robert Hou commented on DRILL-4147:
-----------------------------------

Here is a simple test case using lineitem. Lineitem can be small, but it needs to be created with many parquet files.

{code}
alter session set `store.parquet.block-size` = 50;
create table lineitemfiles as select * from lineitem;
create table newlineitemfiles as
  with lineitem_cte as (
    select l.l_orderkey, l.l_partkey from lineitemfiles l limit 1)
  (select l.l_orderkey, l.l_partkey from lineitemfiles l
     inner join orders o on l.l_orderkey = o.o_orderkey)
  union all
  (select l.l_orderkey, l.l_partkey from lineitem_cte l);
{code}

> Union All operator runs in a single fragment
> --------------------------------------------
>
>                 Key: DRILL-4147
>                 URL: https://issues.apache.org/jira/browse/DRILL-4147
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: amit hadke
>            Assignee: Aman Sinha
>
> A user noticed that running a select from a single directory is much faster
> than a union all on two directories.
> (https://drill.apache.org/blog/2014/12/09/running-sql-queries-on-amazon-s3/#comment-2349732267)
> It seems like the UNION ALL operator doesn't parallelize sub scans (it is using
> SINGLETON for the distribution type). Everything is run in a single fragment.
> We may have to use SubsetTransformer in UnionAllPrule.
[jira] [Updated] (DRILL-4833) Union-All with a small cardinality input on one side does not get parallelized
[ https://issues.apache.org/jira/browse/DRILL-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-4833:
------------------------------
    Reviewer: Robert Hou

> Union-All with a small cardinality input on one side does not get parallelized
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-4833
>                 URL: https://issues.apache.org/jira/browse/DRILL-4833
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.7.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>
> When a Union-All has an input that is a LIMIT 1 (or some small value relative
> to the slice_target), and that input is accessing Parquet files, Drill does
> an optimization where a single Parquet file is read (based on the rowcount
> statistics in the Parquet file, we determine that reading 1 file is
> sufficient). This also means that the max width for that major fragment is
> set to 1 because only 1 minor fragment is needed to read 1 row-group.
> The net effect of this is that the width of 1 is applied to the major fragment
> which consists of the union-all and its inputs. This is sub-optimal because it
> prevents parallelization of the other input and the union-all operator
> itself.
> Here's an example query and plan that illustrates the issue:
> {noformat}
> alter session set `planner.slice_target` = 1;
> explain plan for
> (select c.c_nationkey, c.c_custkey, c.c_name
>  from dfs.`/Users/asinha/data/tpchmulti/customer` c
>  inner join dfs.`/Users/asinha/data/tpchmulti/nation` n
>  on c.c_nationkey = n.n_nationkey)
> union all
> (select c_nationkey, c_custkey, c_name
>  from dfs.`/Users/asinha/data/tpchmulti/customer` c limit 1)
> +------+------+
> | text | json |
> +------+------+
> | 00-00  Screen
> 00-01    Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2])
> 00-02      Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2])
> 00-03        UnionAll(all=[true])
> 00-05          Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2])
> 00-07            HashJoin(condition=[=($0, $3)], joinType=[inner])
> 00-10              Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2])
> 00-13                HashToRandomExchange(dist0=[[$0]])
> 01-01                  UnorderedMuxExchange
> 03-01                    Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> 03-02                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/customer]], selectionRoot=file:/Users/asinha/data/tpchmulti/customer, numFiles=1, usedMetadataFile=false, columns=[`c_nationkey`, `c_custkey`, `c_name`]]])
> 00-09              Project(n_nationkey=[$0])
> 00-12                HashToRandomExchange(dist0=[[$0]])
> 02-01                  UnorderedMuxExchange
> 04-01                    Project(n_nationkey=[$0], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> 04-02                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/nation]], selectionRoot=file:/Users/asinha/data/tpchmulti/nation, numFiles=1, usedMetadataFile=false, columns=[`n_nationkey`]]])
> 00-04          Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2])
> 00-06            SelectionVectorRemover
> 00-08              Limit(fetch=[1])
> 00-11                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/Users/asinha/data/tpchmulti/customer/01.parquet]], selectionRoot=file:/Users/asinha/data/tpchmulti/customer, numFiles=1, usedMetadataFile=false, columns=[`c_nationkey`, `c_custkey`, `c_name`]]])
> {noformat}
> Note that Union-all and HashJoin are part of fragment 0 (single minor
> fragment) even though they could have been parallelized. This clearly
> affects performance for larger data sets.
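The width problem described above can be modeled very simply. This is an illustrative Python sketch, not Drill's planner code: if a major fragment's parallelization width is capped by the most restrictive operator placed in it, then a LIMIT 1 scan with max width 1 drags the union-all and the hash join down with it, while isolating that input behind an exchange would leave the rest of the fragment free to parallelize:

```python
def fragment_width(operator_max_widths):
    """Width of a major fragment in this toy model: capped by the
    most restrictive operator placed in it (a LIMIT 1 parquet scan
    that reads a single row-group has max width 1)."""
    return min(operator_max_widths)

# union-all, hash-join, and the large scan could each run 10-wide,
# but placing the LIMIT 1 side in the same fragment caps everything at 1:
print(fragment_width([10, 10, 1]))  # 1
# isolating the LIMIT 1 input behind an exchange leaves the rest 10-wide:
print(fragment_width([10, 10]))     # 10
```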
[jira] [Created] (DRILL-4883) Drill Explorer returns "SYSTEM ERROR: UnsupportedOperationException: Unhandled field reference ; a field reference identifier must not have the form of a qualified name (
Robert Hou created DRILL-4883:
------------------------------

             Summary: Drill Explorer returns "SYSTEM ERROR: UnsupportedOperationException: Unhandled field reference ; a field reference identifier must not have the form of a qualified name (i.e., with ".").
                 Key: DRILL-4883
                 URL: https://issues.apache.org/jira/browse/DRILL-4883
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Codegen
    Affects Versions: 1.8.0
         Environment: Drill Explorer runs in Windows
            Reporter: Robert Hou

When Drill Explorer submits this query, it returns an error regarding favorites.color:

select age,`favorites.color` from `dfs`.`drillTestDir`.`./json_storage/employeeNestedArrayAndObject.json`

The error is:

ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query: select age,`favorites.color` from `dfs`.`drillTestDir`.`./json_storage/employeeNestedArrayAndObject.json`
[30027]Query execution error. Details:[
SYSTEM ERROR: UnsupportedOperationException: Unhandled field reference "favorites.color"; a field reference identifier must not have the form of a qualified name (i.e., with ".").

This query can be executed by sqlline (note that the format of the query is slightly different for sqlline and Drill Explorer):

select age,`favorites.color` from `json_storage/employeeNestedArrayAndObject.json`;

The physical plan for the query when using sqlline is different from the physical plan when using Drill Explorer.

Here is the plan when using sqlline:

00-00  Screen : rowType = RecordType(ANY age, ANY favorites.color): rowcount = 1.0, cumulative cost = {0.1 rows, 0.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 19699870
00-01    Project(age=[$0], favorites.color=[$1]) : rowType = RecordType(ANY age, ANY favorites.color): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 19699869
00-02      Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/json_storage/employeeNestedArrayAndObject.json, numFiles=1, columns=[`age`, `favorites.color`], files=[maprfs:///drill/testdata/json_storage/employeeNestedArrayAndObject.json]]]) : rowType = RecordType(ANY age, ANY favorites.color): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 19699868

The physical plan when using Drill Explorer is:

00-00  Screen : rowType = RecordType(ANY age, ANY favorites.color): rowcount = 1.0, cumulative cost = {1.1 rows, 1.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 19675621
00-01    ComplexToJson : rowType = RecordType(ANY age, ANY favorites.color): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 19675620
00-02      Project(age=[$0], favorites.color=[$1]) : rowType = RecordType(ANY age, ANY favorites.color): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 19675619
00-03        Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/json_storage/employeeNestedArrayAndObject.json, numFiles=1, columns=[`age`, `favorites.color`], files=[maprfs:///drill/testdata/json_storage/employeeNestedArrayAndObject.json]]]) : rowType = RecordType(ANY age, ANY favorites.color): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 19675618

Drill Explorer has an extra ComplexToJson operator that may have a problem.

Here is the data file used:

{
  "first": "John",
  "last": "Doe",
  "age": 39,
  "sex": "M",
  "salary": 7,
  "registered": true,
  "interests": [ "Reading", "Mountain Biking", "Hacking" ],
  "favorites": {
    "color": "Blue",
    "sport": "Soccer",
    "food": "Spaghetti"
  }
}
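The failing identifier illustrates the distinction at the heart of the error: a column literally named `favorites.color` (one back-quoted segment containing a dot) versus the nested path `favorites`.`color` (two segments). A hypothetical Python sketch of that distinction (not Drill's parser):

```python
def to_field_path(identifier, quoted):
    """A back-quoted identifier is one segment even if it contains a
    dot; an unquoted dotted name is a path of segments. The error in
    this issue arises when the one-segment form reaches an execution
    layer that rejects '.' inside a single field name."""
    if quoted:
        # `favorites.color` -> a single column whose name contains '.'
        return [identifier]
    # favorites.color -> nested access: field 'color' inside 'favorites'
    return identifier.split(".")

print(to_field_path("favorites.color", quoted=True))   # ['favorites.color']
print(to_field_path("favorites.color", quoted=False))  # ['favorites', 'color']
```

Under this reading, the extra ComplexToJson step in Drill Explorer's plan may be where the single-segment form is mishandled, since sqlline executes the same projection without error.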
[jira] [Commented] (DRILL-3944) Drill MAXDIR Unknown variable or type "FILE_SEPARATOR"
[ https://issues.apache.org/jira/browse/DRILL-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504960#comment-15504960 ]

Robert Hou commented on DRILL-3944:
-----------------------------------

This fix has been verified.

> Drill MAXDIR Unknown variable or type "FILE_SEPARATOR"
> ------------------------------------------------------
>
>                 Key: DRILL-3944
>                 URL: https://issues.apache.org/jira/browse/DRILL-3944
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>         Environment: 1.2.0
>            Reporter: Jitendra
>            Assignee: Arina Ielchiieva
>         Attachments: newStackTrace.txt
>
> We are facing an issue with the MAXDIR function. Below is the query we are using to
> reproduce it:
> 0: jdbc:drill:drillbit=localhost> select maxdir('vspace.wspace', 'freemat2')
> from vspace.wspace.`freemat2`;
> Error: SYSTEM ERROR: CompileException: Line 75, Column 70: Unknown variable
> or type "FILE_SEPARATOR"
> Fragment 0:0
> [Error Id: d17c6e48-554d-4934-bc4d-783ca3dc6f51 on 10.10.99.71:31010]
> (state=,code=0);
> Below are the drillbit logs.
> 2015-10-09 21:26:21,972 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO
> o.a.d.e.w.fragment.FragmentExecutor -
> 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State change requested
> AWAITING_ALLOCATION --> RUNNING
> 2015-10-09 21:26:21,972 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO
> o.a.d.e.w.f.FragmentStatusReporter -
> 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State to report: RUNNING
> 2015-10-09 21:26:22,038 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO
> o.a.d.e.w.fragment.FragmentExecutor -
> 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State change requested RUNNING -->
> FINISHED
> 2015-10-09 21:26:22,039 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO
> o.a.d.e.w.f.FragmentStatusReporter -
> 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State to report: FINISHED
> 2015-10-09 21:29:59,281 [29e7ce27-9cad-9d8a-a482-39f54cc7deda:foreman] INFO
> o.a.d.e.store.mock.MockStorageEngine - Failure while attempting to check for
> Parquet metadata file.
> java.io.IOException: Open failed for file: /vspace/wspace/freemat2/20151005, > error: Invalid argument (22) > at com.mapr.fs.MapRClientImpl.open(MapRClientImpl.java:212) > ~[maprfs-4.1.0-mapr.jar:4.1.0-mapr] > at com.mapr.fs.MapRFileSystem.open(MapRFileSystem.java:862) > ~[maprfs-4.1.0-mapr.jar:4.1.0-mapr] > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:800) > ~[hadoop-common-2.5.1-mapr-1503.jar:na] > at > org.apache.drill.exec.store.dfs.DrillFileSystem.open(DrillFileSystem.java:132) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.BasicFormatMatcher$MagicStringMatcher.matches(BasicFormatMatcher.java:142) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.BasicFormatMatcher.isFileReadable(BasicFormatMatcher.java:112) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isDirReadable(ParquetFormatPlugin.java:256) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isReadable(ParquetFormatPlugin.java:210) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create(WorkspaceSchemaFactory.java:326) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create(WorkspaceSchemaFactory.java:153) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.getNewEntry(ExpandingConcurrentMap.java:96) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.get(ExpandingConcurrentMap.java:90) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.getTable(WorkspaceSchemaFactory.java:276) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.calcite.jdbc.SimpleCalciteSchema.getTable(SimpleCalciteSchema.java:83) > 
[calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.prepare.CalciteCatalogReader.getTableFrom(CalciteCatalogReader.java:116) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.prepare.CalciteCatalogReader.getTable(CalciteCatalogReader.java:99) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.prepare.CalciteCatalogReader.getTable(CalciteCatalogReader.java:70) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.sql.validate.EmptyScope.getTableNamespace(EmptyScope.java:75) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.sql.validate.DelegatingScope.getTableNamespace(DelegatingScope.java:124) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at >
[jira] [Reopened] (DRILL-3944) Drill MAXDIR Unknown variable or type "FILE_SEPARATOR"
[ https://issues.apache.org/jira/browse/DRILL-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou reopened DRILL-3944:
-------------------------------
[jira] [Closed] (DRILL-3944) Drill MAXDIR Unknown variable or type "FILE_SEPARATOR"
[ https://issues.apache.org/jira/browse/DRILL-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-3944. - Resolution: Fixed > Drill MAXDIR Unknown variable or type "FILE_SEPARATOR" > -- > > Key: DRILL-3944 > URL: https://issues.apache.org/jira/browse/DRILL-3944 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.2.0 > Environment: 1.2.0 >Reporter: Jitendra >Assignee: Arina Ielchiieva > Attachments: newStackTrace.txt > > > We are facing issue with MAXDIR function, below is the query we are using to > reproduce this issue. > 0: jdbc:drill:drillbit=localhost> select maxdir('vspace.wspace', 'freemat2') > from vspace.wspace.`freemat2`; > Error: SYSTEM ERROR: CompileException: Line 75, Column 70: Unknown variable > or type "FILE_SEPARATOR" > Fragment 0:0 > [Error Id: d17c6e48-554d-4934-bc4d-783ca3dc6f51 on 10.10.99.71:31010] > (state=,code=0); > Below are the drillbit logs. > 2015-10-09 21:26:21,972 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State change requested > AWAITING_ALLOCATION --> RUNNING > 2015-10-09 21:26:21,972 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO > o.a.d.e.w.f.FragmentStatusReporter - > 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State to report: RUNNING > 2015-10-09 21:26:22,038 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State change requested RUNNING --> > FINISHED > 2015-10-09 21:26:22,039 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO > o.a.d.e.w.f.FragmentStatusReporter - > 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State to report: FINISHED > 2015-10-09 21:29:59,281 [29e7ce27-9cad-9d8a-a482-39f54cc7deda:foreman] INFO > o.a.d.e.store.mock.MockStorageEngine - Failure while attempting to check for > Parquet metadata file. 
> java.io.IOException: Open failed for file: /vspace/wspace/freemat2/20151005, > error: Invalid argument (22) > at com.mapr.fs.MapRClientImpl.open(MapRClientImpl.java:212) > ~[maprfs-4.1.0-mapr.jar:4.1.0-mapr] > at com.mapr.fs.MapRFileSystem.open(MapRFileSystem.java:862) > ~[maprfs-4.1.0-mapr.jar:4.1.0-mapr] > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:800) > ~[hadoop-common-2.5.1-mapr-1503.jar:na] > at > org.apache.drill.exec.store.dfs.DrillFileSystem.open(DrillFileSystem.java:132) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.BasicFormatMatcher$MagicStringMatcher.matches(BasicFormatMatcher.java:142) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.BasicFormatMatcher.isFileReadable(BasicFormatMatcher.java:112) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isDirReadable(ParquetFormatPlugin.java:256) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isReadable(ParquetFormatPlugin.java:210) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create(WorkspaceSchemaFactory.java:326) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create(WorkspaceSchemaFactory.java:153) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.getNewEntry(ExpandingConcurrentMap.java:96) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.get(ExpandingConcurrentMap.java:90) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.getTable(WorkspaceSchemaFactory.java:276) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.calcite.jdbc.SimpleCalciteSchema.getTable(SimpleCalciteSchema.java:83) > 
[calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.prepare.CalciteCatalogReader.getTableFrom(CalciteCatalogReader.java:116) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.prepare.CalciteCatalogReader.getTable(CalciteCatalogReader.java:99) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.prepare.CalciteCatalogReader.getTable(CalciteCatalogReader.java:70) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.sql.validate.EmptyScope.getTableNamespace(EmptyScope.java:75) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.sql.validate.DelegatingScope.getTableNamespace(DelegatingScope.java:124) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at >
[jira] [Updated] (DRILL-3944) Drill MAXDIR Unknown variable or type "FILE_SEPARATOR"
[ https://issues.apache.org/jira/browse/DRILL-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-3944: -- Reviewer: Robert Hou > Drill MAXDIR Unknown variable or type "FILE_SEPARATOR" > -- > > Key: DRILL-3944 > URL: https://issues.apache.org/jira/browse/DRILL-3944 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.2.0 > Environment: 1.2.0 >Reporter: Jitendra >Assignee: Arina Ielchiieva > Attachments: newStackTrace.txt > > > We are facing issue with MAXDIR function, below is the query we are using to > reproduce this issue. > 0: jdbc:drill:drillbit=localhost> select maxdir('vspace.wspace', 'freemat2') > from vspace.wspace.`freemat2`; > Error: SYSTEM ERROR: CompileException: Line 75, Column 70: Unknown variable > or type "FILE_SEPARATOR" > Fragment 0:0 > [Error Id: d17c6e48-554d-4934-bc4d-783ca3dc6f51 on 10.10.99.71:31010] > (state=,code=0); > Below are the drillbit logs. > 2015-10-09 21:26:21,972 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State change requested > AWAITING_ALLOCATION --> RUNNING > 2015-10-09 21:26:21,972 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO > o.a.d.e.w.f.FragmentStatusReporter - > 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State to report: RUNNING > 2015-10-09 21:26:22,038 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State change requested RUNNING --> > FINISHED > 2015-10-09 21:26:22,039 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO > o.a.d.e.w.f.FragmentStatusReporter - > 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State to report: FINISHED > 2015-10-09 21:29:59,281 [29e7ce27-9cad-9d8a-a482-39f54cc7deda:foreman] INFO > o.a.d.e.store.mock.MockStorageEngine - Failure while attempting to check for > Parquet metadata file. 
> java.io.IOException: Open failed for file: /vspace/wspace/freemat2/20151005, > error: Invalid argument (22) > at com.mapr.fs.MapRClientImpl.open(MapRClientImpl.java:212) > ~[maprfs-4.1.0-mapr.jar:4.1.0-mapr] > at com.mapr.fs.MapRFileSystem.open(MapRFileSystem.java:862) > ~[maprfs-4.1.0-mapr.jar:4.1.0-mapr] > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:800) > ~[hadoop-common-2.5.1-mapr-1503.jar:na] > at > org.apache.drill.exec.store.dfs.DrillFileSystem.open(DrillFileSystem.java:132) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.BasicFormatMatcher$MagicStringMatcher.matches(BasicFormatMatcher.java:142) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.BasicFormatMatcher.isFileReadable(BasicFormatMatcher.java:112) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isDirReadable(ParquetFormatPlugin.java:256) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isReadable(ParquetFormatPlugin.java:210) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create(WorkspaceSchemaFactory.java:326) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create(WorkspaceSchemaFactory.java:153) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.getNewEntry(ExpandingConcurrentMap.java:96) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.get(ExpandingConcurrentMap.java:90) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.getTable(WorkspaceSchemaFactory.java:276) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.calcite.jdbc.SimpleCalciteSchema.getTable(SimpleCalciteSchema.java:83) > 
[calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.prepare.CalciteCatalogReader.getTableFrom(CalciteCatalogReader.java:116) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.prepare.CalciteCatalogReader.getTable(CalciteCatalogReader.java:99) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.prepare.CalciteCatalogReader.getTable(CalciteCatalogReader.java:70) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.sql.validate.EmptyScope.getTableNamespace(EmptyScope.java:75) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.sql.validate.DelegatingScope.getTableNamespace(DelegatingScope.java:124) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at >
[jira] [Closed] (DRILL-4147) Union All operator runs in a single fragment
[ https://issues.apache.org/jira/browse/DRILL-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-4147. - This fix has been verified. > Union All operator runs in a single fragment > > > Key: DRILL-4147 > URL: https://issues.apache.org/jira/browse/DRILL-4147 > Project: Apache Drill > Issue Type: Bug >Reporter: amit hadke >Assignee: Aman Sinha > Fix For: 1.8.0 > > > A user noticed that running select from a single directory is much faster > than union all on two directories. > (https://drill.apache.org/blog/2014/12/09/running-sql-queries-on-amazon-s3/#comment-2349732267) > > It seems like the UNION ALL operator doesn't parallelize sub scans (it's using > SINGLETON for distribution type). Everything is run in a single fragment. > We may have to use SubsetTransformer in UnionAllPrule. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
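The serialization described in DRILL-4147 can be illustrated with a toy width calculation. This is an illustrative sketch, not Drill's planner code; only the SINGLETON distribution type is taken from the report, the function and parameter names are hypothetical:

```python
def fragment_width(distribution_trait, max_width):
    """Toy model: a SINGLETON distribution trait pins the whole major
    fragment to one minor fragment, serializing its sub-scans."""
    return 1 if distribution_trait == "SINGLETON" else max_width

# The reported behavior: Union All planned as SINGLETON runs serially.
assert fragment_width("SINGLETON", 8) == 1
# A distributed trait would let the same fragment use all available slots.
assert fragment_width("RANDOM_DISTRIBUTED", 8) == 8
```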
[jira] [Closed] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-4743. - Assignee: Robert Hou (was: Gautam Kumar Parai) This fix has been verified. > HashJoin's not fully parallelized in query plan > --- > > Key: DRILL-4743 > URL: https://issues.apache.org/jira/browse/DRILL-4743 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: Gautam Kumar Parai >Assignee: Robert Hou > Labels: doc-impacting > Fix For: 1.8.0 > > > The underlying problem is filter selectivity under-estimate for a query with > complicated predicates e.g. deeply nested and/or predicates. This leads to > under parallelization of the major fragment doing the join. > To really resolve this problem we need table/column statistics to correctly > estimate the selectivity. However, in the absence of statistics OR even when > existing statistics are insufficient to get a correct estimate of selectivity > this will serve as a workaround. > For now, the fix is to provide options for controlling the lower and upper > bounds for filter selectivity. The user can use the following options. The > selectivity can be varied between 0 and 1 with min selectivity always less > than or equal to max selectivity. > {code}planner.filter.min_selectivity_estimate_factor > planner.filter.max_selectivity_estimate_factor > {code} > When using 'explain plan including all attributes for ' it should cap the > estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators > downstream is not directly controlled by these options. However, they may > change as a result of dependency between different operators. The FILTER > operator only operates on the input of its immediate upstream operator (e.g. > SCAN, AGG). If two different filters are present in the same plan, they might > have different selectivities based on their immediate upstream operators > ROWCOUNT. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
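The workaround in DRILL-4743 bounds the filter selectivity estimate between a configurable floor and ceiling. A minimal sketch of that clamping, assuming only the option names quoted in the report (the function itself is illustrative, not Drill's implementation):

```python
def clamp_selectivity(estimate, min_factor=0.0, max_factor=1.0):
    """Clamp a filter selectivity estimate to configured bounds, mirroring
    planner.filter.min_selectivity_estimate_factor and
    planner.filter.max_selectivity_estimate_factor (illustrative logic)."""
    if not (0.0 <= min_factor <= max_factor <= 1.0):
        raise ValueError("need 0 <= min_factor <= max_factor <= 1")
    return min(max(estimate, min_factor), max_factor)

# An under-estimate is raised to the floor, so the join fragment's
# estimated ROWCOUNT (and hence its parallelism) is not starved.
assert clamp_selectivity(0.0001, min_factor=0.05, max_factor=0.5) == 0.05
assert clamp_selectivity(0.9, min_factor=0.05, max_factor=0.5) == 0.5
```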
[jira] [Closed] (DRILL-4833) Union-All with a small cardinality input on one side does not get parallelized
[ https://issues.apache.org/jira/browse/DRILL-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-4833. - This fix has been verified. > Union-All with a small cardinality input on one side does not get parallelized > -- > > Key: DRILL-4833 > URL: https://issues.apache.org/jira/browse/DRILL-4833 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.7.0 >Reporter: Aman Sinha >Assignee: Aman Sinha > Fix For: 1.8.0 > > > When a Union-All has an input that is a LIMIT 1 (or some small value relative > to the slice_target), and that input is accessing Parquet files, Drill does > an optimization where a single Parquet file is read (based on the rowcount > statistics in the Parquet file, we determine that reading 1 file is > sufficient). This also means that the max width for that major fragment is > set to 1 because only 1 minor fragment is needed to read 1 row-group. > The net effect of this is the width of 1 is applied to the major fragment > which consists of union-all and its inputs. This is sub-optimal because it > prevents parallelization of the other input and the union-all operator > itself. 
> Here's an example query and plan that illustrates the issue: > {noformat} > alter session set `planner.slice_target` = 1; > explain plan for > (select c.c_nationkey, c.c_custkey, c.c_name > from > dfs.`/Users/asinha/data/tpchmulti/customer` c > inner join > dfs.`/Users/asinha/data/tpchmulti/nation` n > on c.c_nationkey = n.n_nationkey) > union all > (select c_nationkey, c_custkey, c_name > from dfs.`/Users/asinha/data/tpchmulti/customer` c limit 1) > +--+--+ > | text | json | > +--+--+ > | 00-00Screen > 00-01 Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2]) > 00-02Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2]) > 00-03 UnionAll(all=[true]) > 00-05Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2]) > 00-07 HashJoin(condition=[=($0, $3)], joinType=[inner]) > 00-10Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2]) > 00-13 HashToRandomExchange(dist0=[[$0]]) > 01-01UnorderedMuxExchange > 03-01 Project(c_nationkey=[$0], c_custkey=[$1], > c_name=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)]) > 03-02Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath > [path=file:/Users/asinha/data/tpchmulti/customer]], > selectionRoot=file:/Users/asinha/data/tpchmulti/customer, numFiles=1, > usedMetadataFile=false, columns=[`c_nationkey`, `c_custkey`, `c_name`]]]) > 00-09Project(n_nationkey=[$0]) > 00-12 HashToRandomExchange(dist0=[[$0]]) > 02-01UnorderedMuxExchange > 04-01 Project(n_nationkey=[$0], > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)]) > 04-02Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/nation]], > selectionRoot=file:/Users/asinha/data/tpchmulti/nation, numFiles=1, > usedMetadataFile=false, columns=[`n_nationkey`]]]) > 00-04Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2]) > 00-06 SelectionVectorRemover > 00-08Limit(fetch=[1]) > 00-11 Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath > [path=/Users/asinha/data/tpchmulti/customer/01.parquet]], > 
selectionRoot=file:/Users/asinha/data/tpchmulti/customer, numFiles=1, > usedMetadataFile=false, columns=[`c_nationkey`, `c_custkey`, `c_name`]]]) > {noformat} > Note that Union-all and HashJoin are part of fragment 0 (single minor > fragment) even though they could have been parallelized. This clearly > affects performance for larger data sets. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
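The effect described in DRILL-4833 (a width-1 LIMIT input capping the whole major fragment) can be sketched as follows. This is a toy model of the symptom, not Drill's parallelizer; the function name is hypothetical:

```python
def major_fragment_width(input_width_caps):
    """Toy model: operators merged into one major fragment share a single
    width, capped by the most restrictive input."""
    return min(input_width_caps)

# The LIMIT 1 parquet scan is capped at width 1 (one row group suffices),
# and that cap drags the union-all and hash-join down with it.
assert major_fragment_width([8, 1]) == 1
```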
[jira] [Updated] (DRILL-4970) Wrong results when casting double to bigint or int
[ https://issues.apache.org/jira/browse/DRILL-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-4970: -- Description: This query returns the wrong result 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from test_table where (int_id > -3025 and bigint_id <= -256) or (cast(double_id as bigint) >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 2769| +-+ Without the cast, it returns the correct result: 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from test_table where (int_id > -3025 and bigint_id <= -256) or (double_id >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 3020| +-+ By itself, the result is also correct: 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from test_table where (cast(double_id as bigint) >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 251 | +-+ was: This query returns the wrong result 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_rowgr1 where (int_id > -3025 and bigint_id <= -256) or (cast(double_id as bigint) >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 2769| +-+ Without the cast, it returns the correct result: 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_rowgr1 where (int_id > -3025 and bigint_id <= -256) or (double_id >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 3020| +-+ By itself, the result is also correct: 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_rowgr1 where (cast(double_id as bigint) >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 251 | +-+ > Wrong results when casting double to bigint or int > -- > > Key: DRILL-4970 > URL: https://issues.apache.org/jira/browse/DRILL-4970 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.8.0 >Reporter: Robert Hou > Attachments: test_table > > > This query returns the wrong result > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> 
select count(*) from > test_table where (int_id > -3025 and bigint_id <= -256) or (cast(double_id as > bigint) >= -255 and double_id <= -5); > +-+ > | EXPR$0 | > +-+ > | 2769| > +-+ > Without the cast, it returns the correct result: > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from > test_table where (int_id > -3025 and bigint_id <= -256) or (double_id >= -255 > and double_id <= -5); > +-+ > | EXPR$0 | > +-+ > | 3020| > +-+ > By itself, the result is also correct: > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from > test_table where (cast(double_id as bigint) >= -255 and double_id <= -5); > +-+ > | EXPR$0 | > +-+ > | 251 | > +-+ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
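The counts in DRILL-4970 break a basic invariant: a disjunction can never match fewer rows than either branch alone, yet the OR query returns 2769 while the un-cast version returns 3020. A sketch over synthetic data (the `rows` sample and helper names are invented for illustration; only the cast predicate's shape comes from the report):

```python
# Synthetic stand-in for the double_id column: -300 .. 0.
rows = [{"double_id": float(d)} for d in range(-300, 1)]

def cast_clause(r):
    # cast(double_id as bigint) >= -255 and double_id <= -5
    return int(r["double_id"]) >= -255 and r["double_id"] <= -5

def other_clause(r):
    # Stand-in for the int_id/bigint_id conjunct; matches nothing here.
    return False

count_or = sum(1 for r in rows if other_clause(r) or cast_clause(r))
count_cast = sum(1 for r in rows if cast_clause(r))

# On this sample the cast branch matches 251 rows, and the OR can only
# match at least that many -- the invariant the buggy plan violates.
assert count_cast == 251
assert count_or >= count_cast
```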
[jira] [Updated] (DRILL-4970) Wrong results when casting double to bigint or int
[ https://issues.apache.org/jira/browse/DRILL-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-4970: -- Attachment: test_table > Wrong results when casting double to bigint or int > -- > > Key: DRILL-4970 > URL: https://issues.apache.org/jira/browse/DRILL-4970 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.8.0 >Reporter: Robert Hou > Attachments: test_table > > > This query returns the wrong result > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from > orders_parts_rowgr1 where (int_id > -3025 and bigint_id <= -256) or > (cast(double_id as bigint) >= -255 and double_id <= -5); > +-+ > | EXPR$0 | > +-+ > | 2769| > +-+ > Without the cast, it returns the correct result: > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from > orders_parts_rowgr1 where (int_id > -3025 and bigint_id <= -256) or > (double_id >= -255 and double_id <= -5); > +-+ > | EXPR$0 | > +-+ > | 3020| > +-+ > By itself, the result is also correct: > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from > orders_parts_rowgr1 where (cast(double_id as bigint) >= -255 and double_id <= > -5); > +-+ > | EXPR$0 | > +-+ > | 251 | > +-+ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4971) query encounters system error: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3"
[ https://issues.apache.org/jira/browse/DRILL-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610345#comment-15610345 ] Robert Hou commented on DRILL-4971: --- Put the two files into a directory called "test". > query encounters system error: Statement "break AndOP3" is not enclosed by a > breakable statement with label "AndOP3" > > > Key: DRILL-4971 > URL: https://issues.apache.org/jira/browse/DRILL-4971 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Reporter: Robert Hou > Attachments: low_table, medium_table > > > This query returns an error: > select count(\*) from test where ((int_id > 3060 and int_id < 6002) or > (int_id > 9025 and int_id < 11976)) and ((int_id > 9025 and int_id < 11976) > or (int_id > 3060 and int_id < 6002)) and (int_id > 3060 and int_id < 6002); > Error: SYSTEM ERROR: CompileException: Line 232, Column 30: Statement "break > AndOP3" is not enclosed by a breakable statement with label "AndOP3" > There are two partitions to the test table. One covers the range 3061 - 6001 > and the other covers the range 9026 - 11975. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4971) query encounters system error: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3"
[ https://issues.apache.org/jira/browse/DRILL-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-4971: -- Attachment: low_table medium_table > query encounters system error: Statement "break AndOP3" is not enclosed by a > breakable statement with label "AndOP3" > > > Key: DRILL-4971 > URL: https://issues.apache.org/jira/browse/DRILL-4971 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Reporter: Robert Hou > Attachments: low_table, medium_table > > > This query returns an error: > select count(\*) from test where ((int_id > 3060 and int_id < 6002) or > (int_id > 9025 and int_id < 11976)) and ((int_id > 9025 and int_id < 11976) > or (int_id > 3060 and int_id < 6002)) and (int_id > 3060 and int_id < 6002); > Error: SYSTEM ERROR: CompileException: Line 232, Column 30: Statement "break > AndOP3" is not enclosed by a breakable statement with label "AndOP3" > There are two partitions to the test table. One covers the range 3061 - 6001 > and the other covers the range 9026 - 11975. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4971) query encounters system error: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3"
Robert Hou created DRILL-4971: - Summary: query encounters system error: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3" Key: DRILL-4971 URL: https://issues.apache.org/jira/browse/DRILL-4971 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Reporter: Robert Hou Attachments: low_table, medium_table This query returns an error: select count(\*) from test where ((int_id > 3060 and int_id < 6002) or (int_id > 9025 and int_id < 11976)) and ((int_id > 9025 and int_id < 11976) or (int_id > 3060 and int_id < 6002)) and (int_id > 3060 and int_id < 6002); Error: SYSTEM ERROR: CompileException: Line 232, Column 30: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3" There are two partitions to the test table. One covers the range 3061 - 6001 and the other covers the range 9026 - 11975. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4970) Wrong results when casting double to bigint or int
[ https://issues.apache.org/jira/browse/DRILL-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-4970: -- Description: This query returns the wrong result 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from test_table where (int_id > -3025 and bigint_id <= -256) or (cast(double_id as bigint) >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 2769| +-+ Without the cast, it returns the correct result: 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from test_table where (int_id > -3025 and bigint_id <= -256) or (double_id >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 3020| +-+ By itself, the result is also correct: 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from test_table where (cast(double_id as bigint) >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 251 | +-+ was: This query returns the wrong result 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from test_table where (int_id > -3025 and bigint_id <= -256) or (cast(double_id as bigint) >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 2769| +-+ Without the cast, it returns the correct result: 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from test_table where (int_id > -3025 and bigint_id <= -256) or (double_id >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 3020| +-+ By itself, the result is also correct: 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from test_table where (cast(double_id as bigint) >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 251 | +-+ > Wrong results when casting double to bigint or int > -- > > Key: DRILL-4970 > URL: https://issues.apache.org/jira/browse/DRILL-4970 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.8.0 >Reporter: Robert Hou > Attachments: test_table > > > This query returns the wrong result > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > 
test_table where (int_id > -3025 and bigint_id <= -256) or (cast(double_id as > bigint) >= -255 and double_id <= -5); > +-+ > | EXPR$0 | > +-+ > | 2769| > +-+ > Without the cast, it returns the correct result: > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > test_table where (int_id > -3025 and bigint_id <= -256) or (double_id >= -255 > and double_id <= -5); > +-+ > | EXPR$0 | > +-+ > | 3020| > +-+ > By itself, the result is also correct: > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from > test_table where (cast(double_id as bigint) >= -255 and double_id <= -5); > +-+ > | EXPR$0 | > +-+ > | 251 | > +-+ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4970) Wrong results when casting double to bigint or int
Robert Hou created DRILL-4970: - Summary: Wrong results when casting double to bigint or int Key: DRILL-4970 URL: https://issues.apache.org/jira/browse/DRILL-4970 Project: Apache Drill Issue Type: Bug Components: Execution - Data Types Affects Versions: 1.8.0 Reporter: Robert Hou This query returns the wrong result 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_rowgr1 where (int_id > -3025 and bigint_id <= -256) or (cast(double_id as bigint) >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 2769| +-+ Without the cast, it returns the correct result: 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_rowgr1 where (int_id > -3025 and bigint_id <= -256) or (double_id >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 3020| +-+ By itself, the result is also correct: 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_rowgr1 where (cast(double_id as bigint) >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 251 | +-+ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5018) Metadata cache has duplicate columnTypeInfo values
[ https://issues.apache.org/jira/browse/DRILL-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649289#comment-15649289 ] Robert Hou commented on DRILL-5018: --- For the second lineitem table, use CTAS with the first lineitem table. create table lineitem2 as select * from lineitem; > Metadata cache has duplicate columnTypeInfo values > -- > > Key: DRILL-5018 > URL: https://issues.apache.org/jira/browse/DRILL-5018 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.8.0 >Reporter: Robert Hou >Assignee: Parth Chandra > Attachments: lineitem_1_0_1.parquet, lineitem_999.parquet > > > This lineitem table has duplicate entries in its metadata file, although the > entries have slightly different values. This lineitem table uses > directory-based partitioning on year and month. > "columnTypeInfo" : { > "L_RETURNFLAG" : { > "name" : [ "L_RETURNFLAG" ], > "primitiveType" : "BINARY", > "originalType" : null, > "precision" : 0, > "scale" : 0, > "repetitionLevel" : 0, > "definitionLevel" : 1 > }, > "l_returnflag" : { > "name" : [ "l_returnflag" ], > "primitiveType" : "BINARY", > "originalType" : "UTF8", > "precision" : 0, > "scale" : 0, > "repetitionLevel" : 0, > "definitionLevel" : 0 > }, > This lineitem table has two entries in its metadata file for each column, but > the two entries have different column names (adding a zero). It also has > slightly different values. This lineitem table was created using CTAS with > the first table above. > "l_shipinstruct" : { > "name" : [ "l_shipinstruct" ], > "primitiveType" : "BINARY", > "originalType" : "UTF8", > "precision" : 0, > "scale" : 0, > "repetitionLevel" : 0, > "definitionLevel" : 0 > }, > "L_SHIPINSTRUCT0" : { > "name" : [ "L_SHIPINSTRUCT0" ], > "primitiveType" : "BINARY", > "originalType" : null, > "precision" : 0, > "scale" : 0, > "repetitionLevel" : 0, > "definitionLevel" : 1 > }, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
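The duplicated `columnTypeInfo` entries in DRILL-5018 differ only by case (`l_returnflag` vs `L_RETURNFLAG`). A sketch of how such collisions can be detected under case-insensitive name handling (illustrative logic, not Drill's metadata-cache code):

```python
def case_collisions(column_names):
    """Find column names that differ only by case, as in the duplicated
    columnTypeInfo entries above (illustrative helper)."""
    seen = {}
    collisions = []
    for name in column_names:
        key = name.lower()
        if key in seen and seen[key] != name:
            collisions.append((seen[key], name))
        else:
            seen.setdefault(key, name)
    return collisions

assert case_collisions(["l_returnflag", "L_RETURNFLAG"]) == [
    ("l_returnflag", "L_RETURNFLAG")
]
```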
[jira] [Commented] (DRILL-5018) Metadata cache has duplicate columnTypeInfo values
[ https://issues.apache.org/jira/browse/DRILL-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649293#comment-15649293 ] Robert Hou commented on DRILL-5018: --- The metadata_caching/generated_caches/validate_cache3.q.fail test has been disabled due to this bug. When this bug is fixed, then this test needs to be validated and enabled. > Metadata cache has duplicate columnTypeInfo values > -- > > Key: DRILL-5018 > URL: https://issues.apache.org/jira/browse/DRILL-5018 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.8.0 >Reporter: Robert Hou >Assignee: Parth Chandra > Attachments: lineitem_1_0_1.parquet, lineitem_999.parquet > > > This lineitem table has duplicate entries in its metadata file, although the > entries have slightly different values. This lineitem table uses > directory-based partitioning on year and month. > "columnTypeInfo" : { > "L_RETURNFLAG" : { > "name" : [ "L_RETURNFLAG" ], > "primitiveType" : "BINARY", > "originalType" : null, > "precision" : 0, > "scale" : 0, > "repetitionLevel" : 0, > "definitionLevel" : 1 > }, > "l_returnflag" : { > "name" : [ "l_returnflag" ], > "primitiveType" : "BINARY", > "originalType" : "UTF8", > "precision" : 0, > "scale" : 0, > "repetitionLevel" : 0, > "definitionLevel" : 0 > }, > This lineitem table has two entries in its metadata file for each column, but > the two entries have different column names (adding a zero). It also has > slightly different values. This lineitem table was created using CTAS with > the first table above. 
> "l_shipinstruct" : { > "name" : [ "l_shipinstruct" ], > "primitiveType" : "BINARY", > "originalType" : "UTF8", > "precision" : 0, > "scale" : 0, > "repetitionLevel" : 0, > "definitionLevel" : 0 > }, > "L_SHIPINSTRUCT0" : { > "name" : [ "L_SHIPINSTRUCT0" ], > "primitiveType" : "BINARY", > "originalType" : null, > "precision" : 0, > "scale" : 0, > "repetitionLevel" : 0, > "definitionLevel" : 1 > }, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655397#comment-15655397 ] Robert Hou edited comment on DRILL-5035 at 11/10/16 10:36 PM:
---
The partition only has null values for timestamp_id.

was (Author: rhou):
The partition only has null values.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655464#comment-15655464 ] Robert Hou commented on DRILL-5035:
---
I set the new option to false and I do not see a problem. I will try with IMPALA_TIMESTAMP.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655306#comment-15655306 ] Robert Hou commented on DRILL-5035:
---
I am using RC1.

0: jdbc:drill:zk=10.10.100.186:5181> select * from sys.version;
version: 1.9.0
commit_id: 5cea9afa6278e21574c6a982ae5c3d82085ef904
commit_message: [maven-release-plugin] prepare release drill-1.9.0
commit_time: 09.11.2016 @ 10:28:44 PST
build_email: r...@mapr.com
build_time: 10.11.2016 @ 12:56:24 PST
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655378#comment-15655378 ] Robert Hou commented on DRILL-5035:
---
The Hive table is partitioned on o_orderpriority, which is a string.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655397#comment-15655397 ] Robert Hou commented on DRILL-5035:
---
The partition only has null values.
[jira] [Created] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
Robert Hou created DRILL-5035:
---
Summary: Selecting timestamp value from Hive table causes IndexOutOfBoundsException
Key: DRILL-5035
URL: https://issues.apache.org/jira/browse/DRILL-5035
Project: Apache Drill
Issue Type: Bug
Components: Execution - Data Types
Affects Versions: 1.9.0
Reporter: Robert Hou

I used the new option to read Hive timestamps:
alter session set `store.parquet.reader.int96_as_timestamp` = true;

This query fails:
select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768))
Fragment 0:0
[Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] (state=,code=0)

Selecting all the columns succeeds:
0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
(returned row, one column per line)
o_orderkey: 11335
o_custkey: 871
o_orderstatus: F
o_totalprice: 133549.0
o_orderdate: 1994-10-22
o_clerk: null
o_shippriority: 0
o_comment: ealms. theodolites maintain. regular, even instructions against t
int_id: -4
bigint_id: -4
float_id: -4.0
double_id: -4.0
varchar_id: -4
date_id: 2016-09-29
timestamp_id: 2016-10-03 06:11:52.429
dir0: o_orderpriority=2-HIGH
[jira] [Comment Edited] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655374#comment-15655374 ] Robert Hou edited comment on DRILL-5035 at 11/10/16 10:39 PM:
---
This table is partitioned on a string. The problem occurs with a partition that has null values. The value of the string is "NOT SPECIFIED". I can select every row up to the null partition using:
select timestamp_id from orders_parts_hive limit 9026;
But the next row is in the null partition and causes an exception:
select timestamp_id from orders_parts_hive limit 9027;

was (Author: rhou):
This table is partitioned on a string. The problem occurs with a partition that has null values. The value of the varchar is "NOT SPECIFIED". I can select every row up to the null partition using:
select timestamp_id from orders_parts_hive limit 9026;
But the next row is in the null partition and causes an exception:
select timestamp_id from orders_parts_hive limit 9027;
[jira] [Comment Edited] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655397#comment-15655397 ] Robert Hou edited comment on DRILL-5035 at 11/10/16 10:39 PM:
---
The partition only has null values for timestamp_id. Could this be an issue with empty batches? There are 3024 null values in the partition.

was (Author: rhou):
The partition only has null values for timestamp_id.
[jira] [Comment Edited] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655374#comment-15655374 ] Robert Hou edited comment on DRILL-5035 at 11/10/16 10:39 PM:
---
This table is partitioned on a string. The problem occurs with a partition that has null values. The value of the varchar is "NOT SPECIFIED". I can select every row up to the null partition using:
select timestamp_id from orders_parts_hive limit 9026;
But the next row is in the null partition and causes an exception:
select timestamp_id from orders_parts_hive limit 9027;

was (Author: rhou):
This table is partitioned on a varchar. The problem occurs with a partition that has null values. The value of the varchar is "NOT SPECIFIED". I can select every row up to the null partition using:
select timestamp_id from orders_parts_hive limit 9026;
But the next row is in the null partition and causes an exception:
select timestamp_id from orders_parts_hive limit 9027;
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655479#comment-15655479 ] Robert Hou commented on DRILL-5035:
---
I'm trying to figure out how to do that. Because it is a Hive partitioned table, it has five directories, each with one file, and they all have the same name. Maybe I'll use a tar file.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655374#comment-15655374 ] Robert Hou commented on DRILL-5035:
---
This table is partitioned on a varchar. The problem occurs with a partition that has null values. The value of the varchar is "NOT SPECIFIED". I can select every row up to the null partition using:
select timestamp_id from orders_parts_hive limit 9026;
But the next row is in the null partition and causes an exception:
select timestamp_id from orders_parts_hive limit 9027;
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655896#comment-15655896 ] Robert Hou commented on DRILL-5035:
---
I am not able to use timestamp_impala yet. But I tried the original query with Drill 1.8, and I get zero rows back. Which makes sense, since we are not interpreting the timestamp correctly.

select timestamp_id from orders_parts_hive where timestamp_id >= '2016-10-09 13:36:38.986' and timestamp_id <= '2016-10-09 13:45:38.986';
+---------------+
| timestamp_id  |
+---------------+
+---------------+

I also tried selecting the whole column. I get bad values (known problem), but I get all the values. I don't get an exception.
select timestamp_id from orders_parts_hive;
[jira] [Comment Edited] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655464#comment-15655464 ] Robert Hou edited comment on DRILL-5035 at 11/11/16 2:29 AM:
---
I set the new option to false and I do not get an exception. I will try with IMPALA_TIMESTAMP.

was (Author: rhou):
I set the new option to false and I do not see a problem. I will try with IMPALA_TIMESTAMP.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655476#comment-15655476 ] Robert Hou commented on DRILL-5035:
---
Yes, I created it. It is a Hive table partitioned on a string. I created it using data from a Drill table.
[jira] [Updated] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5035:
---
Attachment: orders_parts_hive.tar

This is a Hive partitioned table. It is partitioned on o_orderpriority.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655504#comment-15655504 ] Robert Hou commented on DRILL-5035:
---
Interesting. I exported Drill data to a tbl file and edited the file so that Hive could read it. I then created a Hive table and loaded it from the tbl file, created a Parquet Hive table from that first Hive table, and finally created a partitioned Hive table from the Parquet Hive table.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655535#comment-15655535 ]

Robert Hou commented on DRILL-5035:
-----------------------------------

~/bin/parquet-meta 00_0
file:     file:/root/drill-test-framework-pushdown/data/orders_parts_hive/o_orderpriority=1-URGENT/00_0
creator:  parquet-mr version 1.6.0
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655566#comment-15655566 ]

Robert Hou commented on DRILL-5035:
-----------------------------------

I tried with Hive. It succeeds.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655518#comment-15655518 ]

Robert Hou commented on DRILL-5035:
-----------------------------------

I am not sure this is a release stopper. It may be because one of my partitions contains only null values for the column.
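One way to check that hypothesis (a sketch; `dir0` is the implicit partition-directory column Drill exposes for this table, as seen in the `select *` output above):

```sql
-- Sketch: count non-null timestamp_id values per partition directory.
-- A partition whose rows_non_null is 0 contains only nulls for the column.
select dir0,
       count(*)            as rows_total,
       count(timestamp_id) as rows_non_null
from orders_parts_hive
group by dir0;
```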
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655507#comment-15655507 ]

Robert Hou commented on DRILL-5035:
-----------------------------------

The DDL for the partitioned Hive table:

create table orders_parts_hive (
    o_orderkey int,
    o_custkey int,
    o_orderstatus string,
    o_totalprice double,
    o_orderdate date,
    o_clerk string,
    o_shippriority int,
    o_comment string,
    int_id int,
    bigint_id bigint,
    float_id float,
    double_id double,
    varchar_id string,
    date_id date,
    timestamp_id timestamp)
partitioned by (o_orderpriority string)
stored as parquet;
[jira] [Commented] (DRILL-4971) query encounters system error: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3"
[ https://issues.apache.org/jira/browse/DRILL-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15658487#comment-15658487 ]

Robert Hou commented on DRILL-4971:
-----------------------------------

The same problem occurs on 1.8.0.

> query encounters system error: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3"
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4971
>                 URL: https://issues.apache.org/jira/browse/DRILL-4971
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>            Reporter: Robert Hou
>         Attachments: low_table, medium_table
>
> This query returns an error:
> select count(*) from test where ((int_id > 3060 and int_id < 6002) or (int_id > 9025 and int_id < 11976)) and ((int_id > 9025 and int_id < 11976) or (int_id > 3060 and int_id < 6002)) and (int_id > 3060 and int_id < 6002);
> Error: SYSTEM ERROR: CompileException: Line 232, Column 30: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3"
> There are two partitions to the test table. One covers the range 3061 - 6001 and the other covers the range 9026 - 11975.
[jira] [Updated] (DRILL-4971) query encounters system error: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3"
[ https://issues.apache.org/jira/browse/DRILL-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-4971:
------------------------------
    Description:
This query returns an error. The stack trace suggests it might be a schema change issue, but there is no schema change in this table. Many other queries are succeeding.

select count(*) from test where ((int_id > 3060 and int_id < 6002) or (int_id > 9025 and int_id < 11976)) and ((int_id > 9025 and int_id < 11976) or (int_id > 3060 and int_id < 6002)) and (int_id > 3060 and int_id < 6002);

Error: SYSTEM ERROR: CompileException: Line 232, Column 30: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3"

[Error Id: 254d093b-79a1-4425-802c-ade08db293e4 on qa-node211:31010]

(org.apache.drill.exec.exception.SchemaChangeException) Failure while attempting to load generated class
    org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():198
    org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():107
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78

There are two partitions to the test table. One covers the range 3061 - 6001 and the other covers the range 9026 - 11975.

This second query returns a different, but possibly related, error.

select count(*) from orders_parts where (((int_id > -3025 and int_id < -4) or (int_id > -5 and int_id < 3061) or (int_id > 3060 and int_id < 6002)) and (int_id > -5 and int_id < 3061)) and (((int_id > -5 and int_id < 3061) or (int_id > 9025 and int_id < 11976)) and (int_id > -5 and int_id < 3061))

Failed with exception
java.sql.SQLException: SYSTEM ERROR: CompileException: Line 447, Column 30: Statement "break AndOP6" is not enclosed by a breakable statement with label "AndOP6"

Fragment 0:0

[Error Id: ac09187e-d3a2-41a7-a659-b287aca6039c on qa-node209:31010]

(org.apache.drill.exec.exception.SchemaChangeException) Failure while attempting to load generated class
    org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():198
    org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():107
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
[jira] [Updated] (DRILL-5086) ClassCastException when filter pushdown is used with a bigint or float column.
[ https://issues.apache.org/jira/browse/DRILL-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-5086:
------------------------------
    Attachment: 0_0_5.parquet
                0_0_4.parquet
                0_0_3.parquet
                0_0_2.parquet
                0_0_1.parquet
                drill.parquet_metadata

> ClassCastException when filter pushdown is used with a bigint or float column.
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-5086
>                 URL: https://issues.apache.org/jira/browse/DRILL-5086
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.9.0
>            Reporter: Robert Hou
>            Assignee: Aman Sinha
>         Attachments: 0_0_1.parquet, 0_0_2.parquet, 0_0_3.parquet, 0_0_4.parquet, 0_0_5.parquet, drill.parquet_metadata
>
> This query results in a ClassCastException when filter pushdown is used. The bigint column is being compared with an integer value.
> 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_metadata where bigint_id < 1100;
> Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
> A similar problem occurs when a float column is being compared with a double value.
> 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_metadata where float_id < 1100.0;
> Error: SYSTEM ERROR: ClassCastException
> Also when a timestamp column is being compared with a string.
> 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_metadata where timestamp_id < '2016-10-13';
> Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
[jira] [Updated] (DRILL-5086) ClassCastException when filter pushdown is used with a bigint or float column.
[ https://issues.apache.org/jira/browse/DRILL-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-5086:
------------------------------
    Description:
This query results in a ClassCastException when filter pushdown is used. The bigint column is being compared with an integer value.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_metadata where bigint_id < 1100;
Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

To reproduce the problem, put the attached files into a directory. Then create the metadata:

refresh table metadata dfs.`path_to_directory`;

For example, if you put the files in /drill/testdata/filter/orders_parts_metadata, then run this sql command:

refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`;

A similar problem occurs when a float column is being compared with a double value.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_metadata where float_id < 1100.0;
Error: SYSTEM ERROR: ClassCastException

Also when a timestamp column is being compared with a string.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_metadata where timestamp_id < '2016-10-13';
Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
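An untested workaround sketch: cast the literal so its type matches the column's type, avoiding the Integer-vs-Long (and float-vs-double) mismatch in the pushed-down comparison. Whether this actually sidesteps the error has not been verified here.

```sql
-- Hypothetical workaround: make the literal's type match the column's type.
select count(*) from orders_parts_metadata
where bigint_id < cast(1100 as bigint);

select count(*) from orders_parts_metadata
where float_id < cast(1100.0 as float);
```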
[jira] [Updated] (DRILL-5093) Explain plan shows all partitions when query scans all partitions, and filter pushdown is used with metadata caching.
[ https://issues.apache.org/jira/browse/DRILL-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-5093:
------------------------------
    Attachment: drill.parquet_metadata

> Explain plan shows all partitions when query scans all partitions, and filter pushdown is used with metadata caching.
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5093
>                 URL: https://issues.apache.org/jira/browse/DRILL-5093
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.9.0
>            Reporter: Robert Hou
>            Assignee: Jinfeng Ni
>         Attachments: 0_0_1.parquet, 0_0_2.parquet, 0_0_3.parquet, 0_0_4.parquet, 0_0_5.parquet, drill.parquet_metadata
>
> This query scans all the partitions because the partitions cannot be pruned. When metadata caching is used, the explain plan shows all the partitions, when it should only show the parent.
> 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> explain plan for select * from orders_parts_metadata;
> 00-00    Screen
> 00-01      Project(*=[$0])
> 00-02        Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_1.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_3.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_4.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_5.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_2.parquet]], selectionRoot=/drill/testdata/filter/orders_parts_metadata, numFiles=5, usedMetadataFile=true, cacheFileRoot=/drill/testdata/filter/orders_parts_metadata, columns=[`*`]]])
> Here is the same query with a table that does not have metadata caching.
> 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> explain plan for select * from orders_parts;
> 00-00    Screen
> 00-01      Project(*=[$0])
> 00-02        Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/filter/orders_parts]], selectionRoot=maprfs:/drill/testdata/filter/orders_parts, numFiles=1, usedMetadataFile=false, columns=[`*`]]])
[jira] [Updated] (DRILL-5093) Explain plan shows all partitions when query scans all partitions, and filter pushdown is used with metadata caching.
[ https://issues.apache.org/jira/browse/DRILL-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-5093:
------------------------------
    Attachment: 0_0_5.parquet
                0_0_4.parquet
                0_0_3.parquet
                0_0_2.parquet
                0_0_1.parquet
                drill.parquet_metadata
[jira] [Created] (DRILL-5093) Explain plan shows all partitions when query scans all partitions, and filter pushdown is used with metadata caching.
Robert Hou created DRILL-5093:
---------------------------------

             Summary: Explain plan shows all partitions when query scans all partitions, and filter pushdown is used with metadata caching.
                 Key: DRILL-5093
                 URL: https://issues.apache.org/jira/browse/DRILL-5093
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 1.9.0
            Reporter: Robert Hou
            Assignee: Jinfeng Ni
[jira] [Updated] (DRILL-5093) Explain plan shows all partitions when query scans all partitions, and filter pushdown is used with metadata caching.
[ https://issues.apache.org/jira/browse/DRILL-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-5093:
------------------------------
    Description:
This query scans all the partitions because the partitions cannot be pruned. When metadata caching is used, the explain plan shows all the partitions, when it should only show the parent.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> explain plan for select * from orders_parts_metadata where int_id = -2000 or int_id = 0 or int_id = 4000 or int_id is null or int_id = 1;
00-00    Screen
00-01      Project(*=[$0])
00-02        Project(T62¦¦*=[$0])
00-03          SelectionVectorRemover
00-04            Filter(condition=[OR(=($1, -2000), =($1, 0), =($1, 4000), IS NULL($1), =($1, 1))])
00-05              Project(T62¦¦*=[$0], int_id=[$1])
00-06                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_1.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_3.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_4.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_5.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_2.parquet]], selectionRoot=/drill/testdata/filter/orders_parts_metadata, numFiles=5, usedMetadataFile=true, cacheFileRoot=/drill/testdata/filter/orders_parts_metadata, columns=[`*`]]])

To reproduce the problem, put the attached files into a directory. Then create the metadata:

refresh table metadata dfs.`path_to_directory`;

For example, if you put the files in /drill/testdata/filter/orders_parts_metadata, then run this sql command:

refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`;

Here is the same query with the same table without metadata caching.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> explain plan for select * from orders_parts where int_id = -2000 or int_id = 0 or int_id = 4000 or int_id is null or int_id = 1;
00-00    Screen
00-01      Project(*=[$0])
00-02        Project(T63¦¦*=[$0])
00-03          SelectionVectorRemover
00-04            Filter(condition=[OR(=($1, -2000), =($1, 0), =($1, 4000), IS NULL($1), =($1, 1))])
00-05              Project(T63¦¦*=[$0], int_id=[$1])
00-06                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/filter/orders_parts]], selectionRoot=maprfs:/drill/testdata/filter/orders_parts, numFiles=1, usedMetadataFile=false, columns=[`*`]]])
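The reproduction steps above, condensed into one session (the directory path is the reporter's example):

```sql
-- Build the metadata cache, then explain the non-prunable query.
refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`;

explain plan for
select * from orders_parts_metadata
where int_id = -2000 or int_id = 0 or int_id = 4000 or int_id is null or int_id = 1;
```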
[jira] [Updated] (DRILL-5086) ClassCastException when filter pushdown is used with a bigint or float column and metadata caching.
[ https://issues.apache.org/jira/browse/DRILL-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-5086:
--
Description:
This query results in a ClassCastException when filter pushdown is used with metadata caching. The bigint column is being compared with an integer value.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from orders_parts_metadata where bigint_id < 1100;
Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

To reproduce the problem, put the attached files into a directory. Then create the metadata:

refresh table metadata dfs.`path_to_directory`;

For example, if you put the files in /drill/testdata/filter/orders_parts_metadata, then run this SQL command:

refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`;

A similar problem occurs when a float column is being compared with a double value.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from orders_parts_metadata where float_id < 1100.0;
Error: SYSTEM ERROR: ClassCastException

Also when a timestamp column is being compared with a string.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from orders_parts_metadata where timestamp_id < '2016-10-13';
Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

was:
This query results in a ClassCastException when filter pushdown is used. The bigint column is being compared with an integer value.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from orders_parts_metadata where bigint_id < 1100;
Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

To reproduce the problem, put the attached files into a directory.
Then create the metadata:

refresh table metadata dfs.`path_to_directory`;

For example, if you put the files in /drill/testdata/filter/orders_parts_metadata, then run this SQL command:

refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`;

A similar problem occurs when a float column is being compared with a double value.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from orders_parts_metadata where float_id < 1100.0;
Error: SYSTEM ERROR: ClassCastException

Also when a timestamp column is being compared with a string.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from orders_parts_metadata where timestamp_id < '2016-10-13';
Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

> ClassCastException when filter pushdown is used with a bigint or float column
> and metadata caching.
> ---
>
> Key: DRILL-5086
> URL: https://issues.apache.org/jira/browse/DRILL-5086
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.9.0
> Reporter: Robert Hou
> Assignee: Parth Chandra
> Attachments: 0_0_1.parquet, 0_0_2.parquet, 0_0_3.parquet, 0_0_4.parquet, 0_0_5.parquet, drill.parquet_metadata
>
> This query results in a ClassCastException when filter pushdown is used with metadata caching. The bigint column is being compared with an integer value.
> 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from orders_parts_metadata where bigint_id < 1100;
> Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
> To reproduce the problem, put the attached files into a directory.
Then > create the metadata: >refresh table metadata dfs.`path_to_directory`; > For example, if you put the files in > /drill/testdata/filter/orders_parts_metadata, then run this sql command >refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`; > A similar problem occurs when a float column is being compared with a double > value. >0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where float_id < 1100.0; >Error: SYSTEM ERROR: ClassCastException > Also when a timestamp column is being compared with a string. >0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where timestamp_id < '2016-10-13'; >Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast > to java.lang.Long -- This message was sent by Atlassian JIRA (v6.3.4#6332)
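The ClassCastException above is a plain Java boxing pitfall: a boxed `Integer` can never be cast directly to `Long`, even though the value would fit in a long. The metadata cache can hand back an `Integer` for a BIGINT column's statistics, so a blind `(Long)` cast fails. A minimal sketch of the failure mode and the usual `Number`-based widening fix; `StatCast`, `unsafe`, and `safe` are invented names for illustration, not Drill's actual filter-evaluation code:

```java
// Hypothetical illustration (not Drill's source) of the DRILL-5086 failure:
// a boxed Integer cast directly to Long throws ClassCastException.
public class StatCast {
    // Unsafe: throws ClassCastException when the boxed stat is an Integer.
    public static long unsafe(Object stat) {
        return (Long) stat;
    }

    // Safe: widen through Number, which accepts Integer, Long, Short, etc.
    public static long safe(Object stat) {
        return ((Number) stat).longValue();
    }

    public static void main(String[] args) {
        Object stat = Integer.valueOf(1100);   // what the metadata might hold
        try {
            unsafe(stat);
        } catch (ClassCastException e) {
            System.out.println(e.getMessage());  // Integer cannot be cast to Long
        }
        System.out.println(safe(stat));          // widening succeeds
    }
}
```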
[jira] [Updated] (DRILL-5086) ClassCastException when filter pushdown is used with a bigint or float column and metadata caching.
[ https://issues.apache.org/jira/browse/DRILL-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5086: -- Summary: ClassCastException when filter pushdown is used with a bigint or float column and metadata caching. (was: ClassCastException when filter pushdown is used with a bigint or float column.) > ClassCastException when filter pushdown is used with a bigint or float column > and metadata caching. > --- > > Key: DRILL-5086 > URL: https://issues.apache.org/jira/browse/DRILL-5086 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Parth Chandra > Attachments: 0_0_1.parquet, 0_0_2.parquet, 0_0_3.parquet, > 0_0_4.parquet, 0_0_5.parquet, drill.parquet_metadata > > > This query results in a ClassCastException when filter pushdown is used. The > bigint column is being compared with an integer value. >0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where bigint_id < 1100; >Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast > to java.lang.Long > To reproduce the problem, put the attached files into a directory. Then > create the metadata: >refresh table metadata dfs.`path_to_directory`; > For example, if you put the files in > /drill/testdata/filter/orders_parts_metadata, then run this sql command >refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`; > A similar problem occurs when a float column is being compared with a double > value. >0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where float_id < 1100.0; >Error: SYSTEM ERROR: ClassCastException > Also when a timestamp column is being compared with a string. 
>0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where timestamp_id < '2016-10-13'; >Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast > to java.lang.Long -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5136) Some SQL statements fail when using Simba ODBC driver 1.3
[ https://issues.apache.org/jira/browse/DRILL-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762542#comment-15762542 ]

Robert Hou commented on DRILL-5136:
---
Comments from Robert Wu (Simba):

This issue is likely caused by the 1.9.0 server turning the SHOW SCHEMAS query into a limit 0 query "SELECT * FROM (show schemas) LIMIT 0" while handling the prepare API call that was introduced in Drill 1.9.0. We are able to reproduce the issue with the latest driver against a Drill 1.9.0 server. We also noticed that when the driver passes the "show schemas" query to the prepare API exposed by the Drill client, the call returns the following error:

[ PARSE ERROR: Encountered "( show" at line 1, column 15. Was expecting one of: ... … OMITTED … "TABLE" ... SQL Query SELECT * FROM (show schemas) LIMIT 0 ^ [Error Id: bd266398-8090-42c0-b7b3-1efcbf2bf986 on maprdemo:31010] ]

Tracing through the Drill 1.9 server-side code, we were able to track down the sequence of method calls that leads to the part of the code that turns an incoming query into a limit 0 query while handling the prepare API call. Below are the details:

· When the server receives the prepare request, the UserServer class calls on the worker to submit a prepared statement (view code).
· In the submit prepared statement function, we can see that it creates a new PreparedStatementWorker (view code).
· Finally, in the PreparedStatementWorker class, we can see that the server is manually wrapping the user query with limit 0 (view code).
· The server then fails to prepare the new self-modified query, resulting in the error message reported by Robert Hou.

We also tested the "show schemas" query against a server running pre-1.9 Drill and the issue is not reproducible. The reason is that the driver does not use the new prepare API when connecting to a server running a Drill version earlier than 1.9.
Best regards, Rob > Some SQL statements fail when using Simba ODBC driver 1.3 > - > > Key: DRILL-5136 > URL: https://issues.apache.org/jira/browse/DRILL-5136 > Project: Apache Drill > Issue Type: Bug > Components: Client - ODBC >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Laurent Goujon > > "show schemas" does not work with Simba ODBC driver > SQL>show schemas > 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show > schemas > [30029]Query execution error. Details:[ > PARSE ERROR: Encountered "( show" at line 1, column 15. > Was expecting one of: > ... > ... > ... > ... > ... > "LATERAL" ... > "(" "WITH" ... > "(" "+" ... > "(" "-" ... > "(" ... > "(" ... > "(" The query profile shows this SQL statement is being executed: >SELECT * FROM (show schemas) LIMIT 0 > > The yellow highlighted syntax has been added when displaying schemas > "use schema" also does not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
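The wrapping described above can be reproduced with a few lines of Java. This is a hypothetical sketch, not the PreparedStatementWorker source; `LimitZeroWrap`, `wrap`, and `wrapIfQuery` are invented names. It shows why "SELECT * FROM (show schemas) LIMIT 0" is produced, and one way a guard could avoid wrapping statements that are not plain SELECT queries:

```java
// Hypothetical sketch of the limit-0 wrapping behind DRILL-5136
// (invented names, not Drill's actual prepared-statement code).
public class LimitZeroWrap {
    // What the prepare path effectively does to every incoming statement.
    public static String wrap(String sql) {
        return "SELECT * FROM (" + sql + ") LIMIT 0";
    }

    // A guarded variant: only plain SELECT queries can be wrapped safely;
    // commands like SHOW SCHEMAS or USE are passed through unchanged.
    public static String wrapIfQuery(String sql) {
        String head = sql.trim().toUpperCase();
        return head.startsWith("SELECT") ? wrap(sql) : sql;
    }

    public static void main(String[] args) {
        System.out.println(wrap("show schemas"));        // the failing rewritten query
        System.out.println(wrapIfQuery("show schemas")); // left alone: parses fine
    }
}
```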
[jira] [Updated] (DRILL-5136) Some SQL statements fail due to PreparedStatement
[ https://issues.apache.org/jira/browse/DRILL-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5136: -- Summary: Some SQL statements fail due to PreparedStatement (was: Some SQL statements fail due to Prepared Statement API) > Some SQL statements fail due to PreparedStatement > - > > Key: DRILL-5136 > URL: https://issues.apache.org/jira/browse/DRILL-5136 > Project: Apache Drill > Issue Type: Bug > Components: Client - ODBC >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Laurent Goujon > > "show schemas" does not work. > SQL>show schemas > 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show > schemas > [30029]Query execution error. Details:[ > PARSE ERROR: Encountered "( show" at line 1, column 15. > Was expecting one of: > ... > ... > ... > ... > ... > "LATERAL" ... > "(" "WITH" ... > "(" "+" ... > "(" "-" ... > "(" ... > "(" ... > "(" The query profile shows this SQL statement is being executed: >SELECT * FROM (show schemas) LIMIT 0 > > The yellow highlighted syntax has been added when displaying schemas > "use schema" also does not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-5136) Some SQL statements fail when using Simba ODBC driver 1.3
[ https://issues.apache.org/jira/browse/DRILL-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5136: -- Assignee: Laurent Goujon > Some SQL statements fail when using Simba ODBC driver 1.3 > - > > Key: DRILL-5136 > URL: https://issues.apache.org/jira/browse/DRILL-5136 > Project: Apache Drill > Issue Type: Bug > Components: Client - ODBC >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Laurent Goujon > > "show schemas" does not work with Simba ODBC driver > SQL>show schemas > 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show > schemas > [30029]Query execution error. Details:[ > PARSE ERROR: Encountered "( show" at line 1, column 15. > Was expecting one of: > ... > ... > ... > ... > ... > "LATERAL" ... > "(" "WITH" ... > "(" "+" ... > "(" "-" ... > "(" ... > "(" ... > "(" The query profile shows this SQL statement is being executed: >SELECT * FROM (show schemas) LIMIT 0 > > The yellow highlighted syntax has been added when displaying schemas > "use schema" also does not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-5136) Some SQL statements fail due to Prepared Statement API
[ https://issues.apache.org/jira/browse/DRILL-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5136: -- Description: "show schemas" does not work. SQL>show schemas 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show schemas [30029]Query execution error. Details:[ PARSE ERROR: Encountered "( show" at line 1, column 15. Was expecting one of: ... ... ... ... ... "LATERAL" ... "(" "WITH" ... "(" "+" ... "(" "-" ... "(" ... "(" ... "(" show schemas 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show schemas [30029]Query execution error. Details:[ PARSE ERROR: Encountered "( show" at line 1, column 15. Was expecting one of: ... ... ... ... ... "LATERAL" ... "(" "WITH" ... "(" "+" ... "(" "-" ... "(" ... "(" ... "(" Some SQL statements fail due to Prepared Statement API > -- > > Key: DRILL-5136 > URL: https://issues.apache.org/jira/browse/DRILL-5136 > Project: Apache Drill > Issue Type: Bug > Components: Client - ODBC >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Laurent Goujon > > "show schemas" does not work. > SQL>show schemas > 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show > schemas > [30029]Query execution error. Details:[ > PARSE ERROR: Encountered "( show" at line 1, column 15. > Was expecting one of: > ... > ... > ... > ... > ... > "LATERAL" ... > "(" "WITH" ... > "(" "+" ... > "(" "-" ... > "(" ... > "(" ... > "(" The query profile shows this SQL statement is being executed: >SELECT * FROM (show schemas) LIMIT 0 > > The yellow highlighted syntax has been added when displaying schemas > "use schema" also does not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-5136) Some SQL statements fail due to Prepared Statement API
[ https://issues.apache.org/jira/browse/DRILL-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5136: -- Summary: Some SQL statements fail due to Prepared Statement API (was: Some SQL statements fail when using Simba ODBC driver 1.3) > Some SQL statements fail due to Prepared Statement API > -- > > Key: DRILL-5136 > URL: https://issues.apache.org/jira/browse/DRILL-5136 > Project: Apache Drill > Issue Type: Bug > Components: Client - ODBC >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Laurent Goujon > > "show schemas" does not work with Simba ODBC driver > SQL>show schemas > 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show > schemas > [30029]Query execution error. Details:[ > PARSE ERROR: Encountered "( show" at line 1, column 15. > Was expecting one of: > ... > ... > ... > ... > ... > "LATERAL" ... > "(" "WITH" ... > "(" "+" ... > "(" "-" ... > "(" ... > "(" ... > "(" The query profile shows this SQL statement is being executed: >SELECT * FROM (show schemas) LIMIT 0 > > The yellow highlighted syntax has been added when displaying schemas > "use schema" also does not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (DRILL-5136) Some SQL statements fail due to Prepared Statement API
[ https://issues.apache.org/jira/browse/DRILL-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762542#comment-15762542 ]

Robert Hou edited comment on DRILL-5136 at 12/19/16 11:10 PM:
--
Comments from Robert Wu:

This issue is likely caused by the 1.9.0 server turning the SHOW SCHEMAS query into a limit 0 query "SELECT * FROM (show schemas) LIMIT 0" while handling the prepare API call that was introduced in Drill 1.9.0. We are able to reproduce the issue with the latest driver against a Drill 1.9.0 server. We also noticed that when the driver passes the "show schemas" query to the prepare API exposed by the Drill client, the call returns the following error:

[ PARSE ERROR: Encountered "( show" at line 1, column 15. Was expecting one of: ... … OMITTED … "TABLE" ... SQL Query SELECT * FROM (show schemas) LIMIT 0 ^ [Error Id: bd266398-8090-42c0-b7b3-1efcbf2bf986 on maprdemo:31010] ]

Tracing through the Drill 1.9 server-side code, we were able to track down the sequence of method calls that leads to the part of the code that turns an incoming query into a limit 0 query while handling the prepare API call. Below are the details:

· When the server receives the prepare request, the UserServer class calls on the worker to submit a prepared statement (view code).
· In the submit prepared statement function, we can see that it creates a new PreparedStatementWorker (view code).
· Finally, in the PreparedStatementWorker class, we can see that the server is manually wrapping the user query with limit 0 (view code).
· The server then fails to prepare the new self-modified query, resulting in the error message reported by Robert Hou.

We also tested the "show schemas" query against a server running pre-1.9 Drill and the issue is not reproducible. The reason is that the driver does not use the new prepare API when connecting to a server running a Drill version earlier than 1.9.
Best regards, Rob was (Author: rhou): Comments from Robert Wu (Simba): This issue is likely caused by the 1.9.0 server turning the SHOW SCHEMAS query into a limit 0 query "SELECT * FROM (show schemas) LIMIT 0" while handling the prepare API call that was introduced in Drill 1.9.0. We are able to reproduce issue with the latest driver against Drill 1.9.0 server. We also noticed that when the driver passes the "show schemas" query to the prepare API exposed by Drill client the call return the following error: [ PARSE ERROR: Encountered "( show" at line 1, column 15. Was expecting one of: ... … OMITTED … "TABLE" ... SQL Query SELECT * FROM (show schemas) LIMIT 0 ^ [Error Id: bd266398-8090-42c0-b7b3-1efcbf2bf986 on maprdemo:31010] ] Tracing through the Drill 1.9 server side code we were able to track down the sequence of method calls that leads to the part of the code that turns an incoming query into a limit 0 query while handling the prepare API call. Below are the details: · When the server received the prepare request, the Userserver class calls on the worker to submit a prepare statement (view code). · In the submit prearestatment function, we can see that it creates a new PreparedStatementWorker (view code). · Finally, in the PreparedStatementWorker class, we can see that the server is manually wrapping the user query with limit 0 (view code). · The server is failing to prepare the new self-modified query, resulting in the error message reported by Robert Hou. We also tested the "show schemas" query against a server running pre-1.9 Drill and the issue is not reproducible. The reason for that is the driver does not use the new prepare API when connecting to server running Drill earlier than 1.9. 
Best regards, Rob > Some SQL statements fail due to Prepared Statement API > -- > > Key: DRILL-5136 > URL: https://issues.apache.org/jira/browse/DRILL-5136 > Project: Apache Drill > Issue Type: Bug > Components: Client - ODBC >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Laurent Goujon > > "show schemas" does not work. > SQL>show schemas > 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show > schemas > [30029]Query execution error. Details:[ > PARSE ERROR: Encountered "( show" at line 1, column 15. > Was expecting one of: > ... > ... > ... > ... > ... > "LATERAL" ... > "(" "WITH" ... > "(" "+" ... > "(" "-" ... > "(" ... > "(" ... > "(" The query profile shows this SQL statement is being executed: >SELECT * FROM (show schemas) LIMIT 0 > > The yellow highlighted syntax has been added when displaying schemas >
[jira] [Created] (DRILL-5136) Some SQL statements fail when using Simba ODBC driver 1.3
Robert Hou created DRILL-5136: - Summary: Some SQL statements fail when using Simba ODBC driver 1.3 Key: DRILL-5136 URL: https://issues.apache.org/jira/browse/DRILL-5136 Project: Apache Drill Issue Type: Bug Components: Client - ODBC Affects Versions: 1.9.0 Reporter: Robert Hou "show schemas" does not work with Simba ODBC driver SQL>show schemas 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show schemas [30029]Query execution error. Details:[ PARSE ERROR: Encountered "( show" at line 1, column 15. Was expecting one of: ... ... ... ... ... "LATERAL" ... "(" "WITH" ... "(" "+" ... "(" "-" ... "(" ... "(" ... "("
[jira] [Updated] (DRILL-5136) Some SQL statements fail due to PreparedStatement
[ https://issues.apache.org/jira/browse/DRILL-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5136: -- Description: "show schemas" does not work. SQL>show schemas 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show schemas [30029]Query execution error. Details:[ PARSE ERROR: Encountered "( show" at line 1, column 15. Was expecting one of: ... ... ... ... ... "LATERAL" ... "(" "WITH" ... "(" "+" ... "(" "-" ... "(" ... "(" ... "(" show schemas 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show schemas [30029]Query execution error. Details:[ PARSE ERROR: Encountered "( show" at line 1, column 15. Was expecting one of: ... ... ... ... ... "LATERAL" ... "(" "WITH" ... "(" "+" ... "(" "-" ... "(" ... "(" ... "(" Some SQL statements fail due to PreparedStatement > - > > Key: DRILL-5136 > URL: https://issues.apache.org/jira/browse/DRILL-5136 > Project: Apache Drill > Issue Type: Bug > Components: Client - ODBC >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Laurent Goujon > > "show schemas" does not work. > SQL>show schemas > 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show > schemas > [30029]Query execution error. Details:[ > PARSE ERROR: Encountered "( show" at line 1, column 15. > Was expecting one of: > ... > ... > ... > ... > ... > "LATERAL" ... > "(" "WITH" ... > "(" "+" ... > "(" "-" ... > "(" ... > "(" ... > "(" The query profile shows this SQL statement is being executed: >{color:blue} SELECT * FROM {color} (show schemas) {color:blue} LIMIT 0 > {color} > > The blue text has been added when displaying schemas > "use schema" also does not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5136) Some SQL statements fail due to PreparedStatement
[ https://issues.apache.org/jira/browse/DRILL-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762639#comment-15762639 ] Robert Hou commented on DRILL-5136: --- With this patch, the CTAS statement still does not work: SQL>create table drill_3769 as select to_date(c3 + interval '1' day ) from ctas_t1 order by c3 1: SQLExec = [MapR][Drill] (1040) Drill failed to execute the query: create table drill_3769 as select to_date(c3 + interval '1' day ) from ctas_t1 order by c3 [30029]Query execution error. Details:[ VALIDATION ERROR: A table or view with given name [drill_3769] already exists in schema [dfs.ctas_parquet] > Some SQL statements fail due to PreparedStatement > - > > Key: DRILL-5136 > URL: https://issues.apache.org/jira/browse/DRILL-5136 > Project: Apache Drill > Issue Type: Bug > Components: Client - ODBC >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Laurent Goujon > > "show schemas" does not work. > SQL>show schemas > 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show > schemas > [30029]Query execution error. Details:[ > PARSE ERROR: Encountered "( show" at line 1, column 15. > Was expecting one of: > ... > ... > ... > ... > ... > "LATERAL" ... > "(" "WITH" ... > "(" "+" ... > "(" "-" ... > "(" ... > "(" ... > "(" The query profile shows this SQL statement is being executed: >{color:blue} SELECT * FROM {color} (show schemas) {color:blue} LIMIT 0 > {color} > > The blue text has been added when displaying schemas > "use schema" also does not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5311) C++ connector connect doesn't check handshake result for timeout
[ https://issues.apache.org/jira/browse/DRILL-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934953#comment-15934953 ]

Robert Hou commented on DRILL-5311:
---
The framework will be updated to support ODBC. We plan to use a Python script to run SQL queries; it is a work in progress. Hopefully 1.11. I don't know if this will help with testing the C++ connector.

> C++ connector connect doesn't check handshake result for timeout
> 
>
> Key: DRILL-5311
> URL: https://issues.apache.org/jira/browse/DRILL-5311
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - C++
> Reporter: Laurent Goujon
> Assignee: Sudheesh Katkam
> Labels: ready-to-commit
> Fix For: 1.11.0
>
> The C++ connector connect method returns okay as soon as the TCP connection
> is successfully established between client and server, and the handshake
> message is sent. However, it doesn't wait for the handshake to have completed.
> The consequence is that if the handshake failed, the error is deferred to the
> first query, which might be unexpected by the application.
> I believe that the validateHandshake method in drillClientImpl should wait for the
> handshake to complete, as it seems a bit saner...

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK
[ https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945783#comment-15945783 ]

Robert Hou commented on DRILL-5316:
---
I tried a couple of cluster IDs. I used random characters and symbols. One ID was almost 100 characters. I have been unable to reproduce it so far. I tested with v1.3.4, which runs on Windows.

> C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children
> completed with ZOK
> 
>
> Key: DRILL-5316
> URL: https://issues.apache.org/jira/browse/DRILL-5316
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - C++
> Reporter: Rob Wu
> Assignee: Chun Chang
> Priority: Critical
> Labels: ready-to-commit
> Fix For: 1.11.0
>
> When connecting to a drillbit with Zookeeper, occasionally the C++ client would
> crash without any reason.
> A further look into the code revealed that during this call
> rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, );
> zoo_get_children returns ZOK (0) but drillbitsVector.count is 0.
> This causes drillbits to stay empty and thus
> causes err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint); to
> crash.
> A size check should be done to prevent this from happening.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK
[ https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945780#comment-15945780 ]

Robert Hou commented on DRILL-5316:
---
I asked Rob for some ideas on how to reproduce this problem. He wrote:

For me, the issue surfaced on its own when the VM becomes unstable. To make it unstable, I modified the drill-override.conf's cluster id to be something very long with different symbols. Restart the drillbit so the new setting gets loaded. Then switch it back to a cluster id containing dots and restart again. Finally, try connecting with cluster id "drill" or "drillbits" or "drillbits1" or "drilbit1" or "drillbit" (or another cluster id of your choice). Not sure how easy it is to reproduce on a new setup again. Perhaps simply stopping the zk service would do?

> C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children
> completed with ZOK
> 
>
> Key: DRILL-5316
> URL: https://issues.apache.org/jira/browse/DRILL-5316
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - C++
> Reporter: Rob Wu
> Assignee: Chun Chang
> Priority: Critical
> Labels: ready-to-commit
> Fix For: 1.11.0
>
> When connecting to a drillbit with Zookeeper, occasionally the C++ client would
> crash without any reason.
> A further look into the code revealed that during this call
> rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, );
> zoo_get_children returns ZOK (0) but drillbitsVector.count is 0.
> This causes drillbits to stay empty and thus
> causes err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint); to
> crash.
> A size check should be done to prevent this from happening.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
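The fix the issue suggests ("Size check should be done") is a one-line guard before indexing the last element of the drillbit list. The connector itself is C++; the pattern is language-agnostic, so it is sketched here in Java with invented names (`DrillbitPick`, `lastDrillbitOrNull`) purely for illustration:

```java
import java.util.List;

// Java analog of the missing guard in the C++ connector (hypothetical names):
// never index drillbits[drillbits.size() - 1] until the list is known to be
// non-empty, even when the ZooKeeper children call itself returned OK.
public class DrillbitPick {
    public static String lastDrillbitOrNull(List<String> drillbits) {
        if (drillbits == null || drillbits.isEmpty()) {
            return null;  // caller can report "no drillbits registered" instead of crashing
        }
        return drillbits.get(drillbits.size() - 1);
    }

    public static void main(String[] args) {
        System.out.println(lastDrillbitOrNull(List.of()));              // null, no crash
        System.out.println(lastDrillbitOrNull(List.of("node1:31010"))); // node1:31010
    }
}
```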
[jira] [Updated] (DRILL-5374) Parquet filter pushdown does not prune partition with nulls when predicate uses float column
[ https://issues.apache.org/jira/browse/DRILL-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5374: -- Attachment: drill.parquet_metadata 0_0_5.parquet 0_0_4.parquet 0_0_3.parquet 0_0_2.parquet 0_0_1.parquet > Parquet filter pushdown does not prune partition with nulls when predicate > uses float column > > > Key: DRILL-5374 > URL: https://issues.apache.org/jira/browse/DRILL-5374 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Jinfeng Ni > Attachments: 0_0_1.parquet, 0_0_2.parquet, 0_0_3.parquet, > 0_0_4.parquet, 0_0_5.parquet, drill.parquet_metadata > > > Drill does not prune enough partitions for this query when filter pushdown is > used with metadata caching. The float column is being compared with a double > value. > {code} > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from > orders_parts_metadata where float_id < 1100.0; > {code} > To reproduce the problem, put the attached files into a directory. Then > {code} > create the metadata: > refresh table metadata dfs.`path_to_directory`; > {code} > For example, if you put the files in > /drill/testdata/filter/orders_parts_metadata, then run this sql command > {code} > refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`; > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Closed] (DRILL-5086) ClassCastException when filter pushdown is used with a bigint or float column and metadata caching.
[ https://issues.apache.org/jira/browse/DRILL-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5086. - > ClassCastException when filter pushdown is used with a bigint or float column > and metadata caching. > --- > > Key: DRILL-5086 > URL: https://issues.apache.org/jira/browse/DRILL-5086 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Parth Chandra > Labels: ready-to-commit > Fix For: 1.10.0 > > Attachments: 0_0_1.parquet, 0_0_2.parquet, 0_0_3.parquet, > 0_0_4.parquet, 0_0_5.parquet, drill.parquet_metadata > > > This query results in a ClassCastException when filter pushdown is used with > metadata caching. The bigint column is being compared with an integer value. >0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where bigint_id < 1100; >Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast > to java.lang.Long > To reproduce the problem, put the attached files into a directory. Then > create the metadata: >refresh table metadata dfs.`path_to_directory`; > For example, if you put the files in > /drill/testdata/filter/orders_parts_metadata, then run this sql command >refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`; > A similar problem occurs when a float column is being compared with a double > value. >0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where float_id < 1100.0; >Error: SYSTEM ERROR: ClassCastException > Also when a timestamp column is being compared with a string. >0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where timestamp_id < '2016-10-13'; >Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast > to java.lang.Long -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (DRILL-5374) Parquet filter pushdown does not prune partition with nulls when predicate uses float column
Robert Hou created DRILL-5374: - Summary: Parquet filter pushdown does not prune partition with nulls when predicate uses float column Key: DRILL-5374 URL: https://issues.apache.org/jira/browse/DRILL-5374 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 1.9.0 Reporter: Robert Hou Assignee: Jinfeng Ni Drill does not prune enough partitions for this query when filter pushdown is used with metadata caching. The float column is being compared with a double value. {code} 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_metadata where float_id < 1100.0; {code} To reproduce the problem, put the attached files into a directory. Then create the metadata: {code} refresh table metadata dfs.`path_to_directory`; {code} For example, if you put the files in /drill/testdata/filter/orders_parts_metadata, then run this SQL command: {code} refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`; {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5374) Parquet filter pushdown does not prune partition with nulls when predicate uses float column
[ https://issues.apache.org/jira/browse/DRILL-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935591#comment-15935591 ] Robert Hou commented on DRILL-5374: --- This is the Scan step from the explain plan: {code} 00-06Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_1.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_4.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_2.parquet]], selectionRoot=/drill/testdata/filter/orders_parts_metadata, numFiles=3, usedMetadataFile=true, cacheFileRoot=/drill/testdata/filter/orders_parts_metadata, columns=[`float_id`]]]) {code} Partition /drill/testdata/filter/orders_parts_metadata/0_0_4.parquet should not be scanned because it contains all null values for the float_id column. > Parquet filter pushdown does not prune partition with nulls when predicate > uses float column > > > Key: DRILL-5374 > URL: https://issues.apache.org/jira/browse/DRILL-5374 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Jinfeng Ni > Attachments: 0_0_1.parquet, 0_0_2.parquet, 0_0_3.parquet, > 0_0_4.parquet, 0_0_5.parquet, drill.parquet_metadata > > > Drill does not prune enough partitions for this query when filter pushdown is > used with metadata caching. The float column is being compared with a double > value. > {code} > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from > orders_parts_metadata where float_id < 1100.0; > {code} > To reproduce the problem, put the attached files into a directory. 
Then create the metadata: > {code} > refresh table metadata dfs.`path_to_directory`; > {code} > For example, if you put the files in > /drill/testdata/filter/orders_parts_metadata, then run this SQL command: > {code} > refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`; > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
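The pruning logic DRILL-5374 asks for can be stated compactly: the predicate `float_id < 1100.0` is never true for a NULL row, so a file whose column is entirely NULL (like 0_0_4.parquet above) matches nothing and can be skipped, just like a file whose minimum already reaches the threshold. A hedged sketch of that decision; the `ColumnStats` shape is hypothetical, not Drill's metadata classes:

```java
import java.util.OptionalDouble;

public class NullPartitionPruning {
    // Hypothetical per-file column statistics, standing in for what the
    // Parquet metadata cache records per file.
    static final class ColumnStats {
        final OptionalDouble min;  // empty when the file holds no non-null value
        final long nullCount;
        final long rowCount;
        ColumnStats(OptionalDouble min, long nullCount, long rowCount) {
            this.min = min;
            this.nullCount = nullCount;
            this.rowCount = rowCount;
        }
    }

    // "col < threshold" is unknown (never true) for NULL rows, so an all-NULL
    // file can be pruned; so can a file whose minimum is already >= threshold.
    static boolean canPruneLessThan(ColumnStats s, double threshold) {
        if (s.nullCount == s.rowCount) {
            return true;  // all NULL: the case Drill fails to prune here
        }
        return s.min.isPresent() && s.min.getAsDouble() >= threshold;
    }

    public static void main(String[] args) {
        ColumnStats allNulls = new ColumnStats(OptionalDouble.empty(), 100, 100);
        ColumnStats lowValues = new ColumnStats(OptionalDouble.of(1.0), 0, 100);
        System.out.println(canPruneLessThan(allNulls, 1100.0));   // true
        System.out.println(canPruneLessThan(lowValues, 1100.0));  // false
    }
}
```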
[jira] [Commented] (DRILL-5086) ClassCastException when filter pushdown is used with a bigint or float column and metadata caching.
[ https://issues.apache.org/jira/browse/DRILL-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935592#comment-15935592 ] Robert Hou commented on DRILL-5086: --- I have verified that the three SQL statements execute without errors. The second SQL statement, however, does not prune enough partitions. I have created a new Jira, [DRILL-5374], to track this new problem. > ClassCastException when filter pushdown is used with a bigint or float column > and metadata caching. > --- > > Key: DRILL-5086 > URL: https://issues.apache.org/jira/browse/DRILL-5086 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Parth Chandra > Labels: ready-to-commit > Fix For: 1.10.0 > > Attachments: 0_0_1.parquet, 0_0_2.parquet, 0_0_3.parquet, > 0_0_4.parquet, 0_0_5.parquet, drill.parquet_metadata > > > This query results in a ClassCastException when filter pushdown is used with > metadata caching. The bigint column is being compared with an integer value. >0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where bigint_id < 1100; >Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast > to java.lang.Long > To reproduce the problem, put the attached files into a directory. Then > create the metadata: >refresh table metadata dfs.`path_to_directory`; > For example, if you put the files in > /drill/testdata/filter/orders_parts_metadata, then run this sql command >refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`; > A similar problem occurs when a float column is being compared with a double > value. >0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where float_id < 1100.0; >Error: SYSTEM ERROR: ClassCastException > Also when a timestamp column is being compared with a string. 
>0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where timestamp_id < '2016-10-13'; >Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast > to java.lang.Long -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK
[ https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952903#comment-15952903 ] Robert Hou commented on DRILL-5316: --- I repeated the steps above, with v1.3.6 on Windows. This time, there is a single error, and the error code is different. ERROR [08S01] [MapR][Drill] (10) Failure occurred while trying to connect to zk=10.10.100.186:5181/drill/drillbits1 > C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children > completed with ZOK > > > Key: DRILL-5316 > URL: https://issues.apache.org/jira/browse/DRILL-5316 > Project: Apache Drill > Issue Type: Bug > Components: Client - C++ >Reporter: Rob Wu >Assignee: Chun Chang >Priority: Critical > Labels: ready-to-commit > Fix For: 1.11.0 > > > When connecting to drillbit with Zookeeper, occasionally the C++ client would > crash without any reason. > A further look into the code revealed that during this call > rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, &drillbitsVector); > zoo_get_children returns ZOK (0) but drillbitsVector.count is 0. > This causes drillbits to stay empty and thus > causes err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint); to > crash > Size check should be done to prevent this from happening -- This message was sent by Atlassian JIRA (v6.3.15#6346)
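The fix the issue asks for is a size check: zoo_get_children can legitimately return ZOK with zero children (the znode exists but no drillbit has registered yet), and indexing `drillbits[drillbits.size() - 1]` without a guard then crashes. The guard is sketched below in Java terms, with a plain list standing in for the C++ client's String_vector; the method name is illustrative, not the client's API:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DrillbitDiscovery {
    // Guard before indexing the last element: surface a clear error when the
    // ZooKeeper path holds no registered drillbits instead of crashing.
    static String lastRegisteredDrillbit(List<String> drillbits) {
        if (drillbits == null || drillbits.isEmpty()) {
            throw new IllegalStateException(
                "No drillbits registered under the ZooKeeper path");
        }
        return drillbits.get(drillbits.size() - 1);
    }

    public static void main(String[] args) {
        System.out.println(lastRegisteredDrillbit(Arrays.asList("bit-1", "bit-2")));
        try {
            lastRegisteredDrillbit(Collections.emptyList());
        } catch (IllegalStateException e) {
            System.out.println("guarded: " + e.getMessage());
        }
    }
}
```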
[jira] [Commented] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK
[ https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952901#comment-15952901 ] Robert Hou commented on DRILL-5316: --- I tried to enable logging, but it did not seem to work. I set logging to LOG_TRACE and specified a directory. I am using v1.2.1, which runs on Windows. > C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children > completed with ZOK > > > Key: DRILL-5316 > URL: https://issues.apache.org/jira/browse/DRILL-5316 > Project: Apache Drill > Issue Type: Bug > Components: Client - C++ >Reporter: Rob Wu >Assignee: Chun Chang >Priority: Critical > Labels: ready-to-commit > Fix For: 1.11.0 > > > When connecting to drillbit with Zookeeper, occasionally the C++ client would > crash without any reason. > A further look into the code revealed that during this call > rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, &drillbitsVector); > zoo_get_children returns ZOK (0) but drillbitsVector.count is 0. > This causes drillbits to stay empty and thus > causes err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint); to > crash > Size check should be done to prevent this from happening -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK
[ https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952900#comment-15952900 ] Robert Hou edited comment on DRILL-5316 at 4/3/17 12:32 AM: How can I verify that the C++ Client crashed? This is what I have done so far. 1) Set cluster-id in drill-override.conf file to {code} "-alakjdslkfjlskjdflkasdoiweuroiweurHKJQAIUW-__ksldjfYIUEWUIkljdsfoIOUOIUjlkjklj-_" {code} 2) reboot Drill 3) Set cluster-id to "rhou1_com.drillbits.com.org" 4) reboot Drill 5) From Windows, connect using Drill Explorer. Set cluster-id to /drill/drillbits1 Drill Explorer pops a window that says: {code} An error occurred while communicating with the data source. ERROR [HY000] [MapR][Drill] (10) Failure occurred while trying to connect to zk=10.10.100.186:5181/drill/drillbits1 ERROR [HY000] [MapR][Drill] (10) Failure occurred while trying to connect to zk=10.10.100.186:5181/drill/drillbits1 {code} was (Author: rhou): How can I verify that the C++ Client crashed? This is what I have done so far. 1) Set cluster-id in drill-override.conf file to "-alakjdslkfjlskjdflkasdoiweuroiweurHKJQAIUW-__ksldjfYIUEWUIkljdsfoIOUOIUjlkjklj-_" 2) reboot Drill 3) Set cluster-id to "rhou1_com.drillbits.com.org" 4) reboot Drill 5) From Windows, connect using Drill Explorer. Set cluster-id to /drill/drillbits1 Drill Explorer pops a window that says: {code} An error occurred while communicating with the data source. 
ERROR [HY000] [MapR][Drill] (10) Failure occurred while trying to connect to zk=10.10.100.186:5181/drill/drillbits1 ERROR [HY000] [MapR][Drill] (10) Failure occurred while trying to connect to zk=10.10.100.186:5181/drill/drillbits1 {code} > C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children > completed with ZOK > > > Key: DRILL-5316 > URL: https://issues.apache.org/jira/browse/DRILL-5316 > Project: Apache Drill > Issue Type: Bug > Components: Client - C++ >Reporter: Rob Wu >Assignee: Chun Chang >Priority: Critical > Labels: ready-to-commit > Fix For: 1.11.0 > > > When connecting to drillbit with Zookeeper, occasionally the C++ client would > crash without any reason. > A further look into the code revealed that during this call > rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, &drillbitsVector); > zoo_get_children returns ZOK (0) but drillbitsVector.count is 0. > This causes drillbits to stay empty and thus > causes err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint); to > crash > Size check should be done to prevent this from happening -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK
[ https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952900#comment-15952900 ] Robert Hou commented on DRILL-5316: --- How can I verify that the C++ Client crashed? This is what I have done so far. 1) Set cluster-id in drill-override.conf file to "-alakjdslkfjlskjdflkasdoiweuroiweurHKJQAIUW-__ksldjfYIUEWUIkljdsfoIOUOIUjlkjklj-_" 2) reboot Drill 3) Set cluster-id to "rhou1_com.drillbits.com.org" 4) reboot Drill 5) From Windows, connect using Drill Explorer. Set cluster-id to /drill/drillbits1 Drill Explorer pops a window that says: {code} An error occurred while communicating with the data source. ERROR [HY000] [MapR][Drill] (10) Failure occurred while trying to connect to zk=10.10.100.186:5181/drill/drillbits1 ERROR [HY000] [MapR][Drill] (10) Failure occurred while trying to connect to zk=10.10.100.186:5181/drill/drillbits1 {code} > C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children > completed with ZOK > > > Key: DRILL-5316 > URL: https://issues.apache.org/jira/browse/DRILL-5316 > Project: Apache Drill > Issue Type: Bug > Components: Client - C++ >Reporter: Rob Wu >Assignee: Chun Chang >Priority: Critical > Labels: ready-to-commit > Fix For: 1.11.0 > > > When connecting to drillbit with Zookeeper, occasionally the C++ client would > crash without any reason. > A further look into the code revealed that during this call > rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, &drillbitsVector); > zoo_get_children returns ZOK (0) but drillbitsVector.count is 0. > This causes drillbits to stay empty and thus > causes err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint); to > crash > Size check should be done to prevent this from happening -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5659) C++ Client (master) behavior is unstable resulting incorrect result or exception in API calls
[ https://issues.apache.org/jira/browse/DRILL-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082688#comment-16082688 ] Robert Hou commented on DRILL-5659: --- Hi Rob, Is this a blocking issue for you? How likely is it that a customer will encounter this issue? > C++ Client (master) behavior is unstable resulting incorrect result or > exception in API calls > - > > Key: DRILL-5659 > URL: https://issues.apache.org/jira/browse/DRILL-5659 > Project: Apache Drill > Issue Type: Bug >Reporter: Rob Wu > Fix For: 1.11.0 > > Attachments: 1-10cppClient-1.10.0Drillbit-hive.log, > 1-10cppClient-1.10.0Drillbit-metadata and catalog test.log, > 1-10cppClient-1.9.0Drillbit-dfs.log, 1-10cppClient-1.9.0Drillbit-metadata and > catalog test.log, 1-11cppClient-1.10.0Drillbit-hive.log, > 1-11cppClient-1.10.0Drillbit-metadata and catalog test.log, > 1-11cppClient-1.9.0Drillbit-dfs.log, 1-11cppClient-1.9.0Drillbit-metadata and > catalog test.log > > > I recently compiled the C++ client (on windows) from master and tested > against a 1.9.0 drillbit. The client's behavior does not meet the stable > release requirement. > Some API functionalities are broken and should be investigated. > Most noticeable is the getColumns(...) call. It will throw an exception with > "Cannot decode getcolumns results" when the number of rows (column records) > exceeds a certain number. > I also noticed that: during query execution + data retrieval, if the table is > large enough, the result coming back will start to become NULL or empty. > I will see if I can generate some drillclient logs to put in the attachment. > I will also compile and test on other platforms. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100468#comment-16100468 ] Robert Hou commented on DRILL-4281: --- ODBC is verified. There is a new configuration parameter, DelegationUID. This can be set in the odbc.ini file. > Drill should support inbound impersonation > -- > > Key: DRILL-4281 > URL: https://issues.apache.org/jira/browse/DRILL-4281 > Project: Apache Drill > Issue Type: Improvement >Reporter: Keys Botzum >Assignee: Sudheesh Katkam > Labels: doc-impacting, security > Fix For: 1.6.0 > > > Today Drill supports impersonation *to* external sources. For example I can > authenticate to Drill as myself and then Drill will access HDFS using > impersonation > In many scenarios we also need impersonation to Drill. For example I might > use some front end tool (such as Tableau) and authenticate to it as myself. > That tool (server version) then needs to access Drill to perform queries and > I want those queries to run as myself, not as the Tableau user. While in > theory the intermediate tool could store the userid & password for every user > to the Drill this isn't a scalable or very secure solution. > Note that HS2 today does support inbound impersonation as described here: > https://issues.apache.org/jira/browse/HIVE-5155 > The above is not the best approach as it is tied to the connection object > which is very coarse grained and potentially expensive. It would be better if > there was a call on the ODBC/JDBC driver to switch the identity on a existing > connection. Most modern SQL databases (Oracle, DB2) support such function. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK
[ https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102504#comment-16102504 ] Robert Hou commented on DRILL-5316: --- I tried to reproduce and verify this problem, with help from @robwu15, but I was not able to. I will close this unless we find another way to reproduce it. > C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children > completed with ZOK > > > Key: DRILL-5316 > URL: https://issues.apache.org/jira/browse/DRILL-5316 > Project: Apache Drill > Issue Type: Bug > Components: Client - C++ >Reporter: Rob Wu >Assignee: Robert Hou >Priority: Critical > Labels: ready-to-commit > Fix For: 1.11.0 > > > When connecting to drillbit with Zookeeper, occasionally the C++ client would > crash without any reason. > A further look into the code revealed that during this call > rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, &drillbitsVector); > zoo_get_children returns ZOK (0) but drillbitsVector.count is 0. > This causes drillbits to stay empty and thus > causes err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint); to > crash > Size check should be done to prevent this from happening -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK
[ https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102504#comment-16102504 ] Robert Hou edited comment on DRILL-5316 at 7/27/17 12:14 AM: - I tried to reproduce and verify this problem, with help from [~robertw], but I was not able to. I will close this unless we find another way to reproduce it. was (Author: rhou): I tried to reproduce and verify this problem, with help from @robwu15, but I was not able to. I will close this unless we find another way to reproduce it. > C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children > completed with ZOK > > > Key: DRILL-5316 > URL: https://issues.apache.org/jira/browse/DRILL-5316 > Project: Apache Drill > Issue Type: Bug > Components: Client - C++ >Reporter: Rob Wu >Assignee: Robert Hou >Priority: Critical > Labels: ready-to-commit > Fix For: 1.11.0 > > > When connecting to drillbit with Zookeeper, occasionally the C++ client would > crash without any reason. > A further look into the code revealed that during this call > rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, &drillbitsVector); > zoo_get_children returns ZOK (0) but drillbitsVector.count is 0. > This causes drillbits to stay empty and thus > causes err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint); to > crash > Size check should be done to prevent this from happening -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DRILL-5732) Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill.
[ https://issues.apache.org/jira/browse/DRILL-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5732: -- Attachment: drillbit.log > Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill. > - > > Key: DRILL-5732 > URL: https://issues.apache.org/jira/browse/DRILL-5732 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Robert Hou >Assignee: Paul Rogers > Attachments: drillbit.log > > > git commit id: > {noformat} > | 1.12.0-SNAPSHOT | e9065b55ea560e7f737d6fcb4948f9e945b9b14f | DRILL-5660: > Parquet metadata caching improvements | 15.08.2017 @ 09:31:00 PDT | > r...@qa-node190.qa.lab | 15.08.2017 @ 13:29:26 PDT | > {noformat} > Query is: > {noformat} > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.disable_exchanges` = true; > alter session set `planner.memory.max_query_memory_per_node` = 104857600; > alter session set `planner.width.max_per_node` = 1; > alter session set `planner.width.max_per_query` = 1; > select max(col1), max(cs_sold_date_sk), max(cs_sold_time_sk), > max(cs_ship_date_sk), max(cs_bill_customer_sk), max(cs_bill_cdemo_sk), > max(cs_bill_hdemo_sk), max(cs_bill_addr_sk), max(cs_ship_customer_sk), > max(cs_ship_cdemo_sk), max(cs_ship_hdemo_sk), max(cs_ship_addr_sk), > max(cs_call_center_sk), max(cs_catalog_page_sk), max(cs_ship_mode_sk), > min(cs_warehouse_sk), max(cs_item_sk), max(cs_promo_sk), > max(cs_order_number), max(cs_quantity), max(cs_wholesale_cost), > max(cs_list_price), max(cs_sales_price), max(cs_ext_discount_amt), > min(cs_ext_sales_price), max(cs_ext_wholesale_cost), min(cs_ext_list_price), > min(cs_ext_tax), min(cs_coupon_amt), max(cs_ext_ship_cost), max(cs_net_paid), > max(cs_net_paid_inc_tax), min(cs_net_paid_inc_ship), > min(cs_net_paid_inc_ship_tax), min(cs_net_profit), min(c_customer_sk), > min(length(c_customer_id)), max(c_current_cdemo_sk), max(c_current_hdemo_sk), > min(c_current_addr_sk), 
min(c_first_shipto_date_sk), > min(c_first_sales_date_sk), min(length(c_salutation)), > min(length(c_first_name)), min(length(c_last_name)), > min(length(c_preferred_cust_flag)), max(c_birth_day), min(c_birth_month), > min(c_birth_year), max(c_last_review_date), c_email_address from (select > cs_sold_date_sk+cs_sold_time_sk col1, * from > dfs.`/drill/testdata/resource-manager/md1362` order by c_email_address nulls > first) d where d.col1 > 2536816 and c_email_address is not null group by > c_email_address; > ALTER SESSION SET `exec.sort.disable_managed` = true; > alter session set `planner.disable_exchanges` = false; > alter session set `planner.memory.max_query_memory_per_node` = 2147483648; > alter session set `planner.width.max_per_node` = 17; > alter session set `planner.width.max_per_query` = 1000; > {noformat} > Here is the stack trace: > {noformat} > 2017-08-18 13:15:27,052 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG > o.a.d.e.t.g.SingleBatchSorterGen27 - Took 6445 us to sort 9039 records > 2017-08-18 13:15:27,420 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG > o.a.d.e.p.i.xsort.ExternalSortBatch - Copier allocator current allocation 0 > 2017-08-18 13:15:27,420 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG > o.a.d.e.p.i.xsort.ExternalSortBatch - mergeAndSpill: starting total size in > memory = 71964288 > 2017-08-18 13:15:27,421 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] INFO > o.a.d.e.p.i.xsort.ExternalSortBatch - User Error Occurred: One or more nodes > ran out of memory while executing the query. > org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more > nodes ran out of memory while executing the query. > Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill. 
> batchGroups.size 1 > spilledBatchGroups.size 0 > allocated memory 71964288 > allocator limit 52428800 > [Error Id: 7b248f12-2b31-4013-86b6-92e6c842db48 ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550) > ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.newSV2(ExternalSortBatch.java:637) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:379) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:225) >
[jira] [Created] (DRILL-5732) Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill.
Robert Hou created DRILL-5732: - Summary: Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill. Key: DRILL-5732 URL: https://issues.apache.org/jira/browse/DRILL-5732 Project: Apache Drill Issue Type: Bug Affects Versions: 1.10.0 Reporter: Robert Hou Assignee: Paul Rogers git commit id: {noformat} | 1.12.0-SNAPSHOT | e9065b55ea560e7f737d6fcb4948f9e945b9b14f | DRILL-5660: Parquet metadata caching improvements | 15.08.2017 @ 09:31:00 PDT | r...@qa-node190.qa.lab | 15.08.2017 @ 13:29:26 PDT | {noformat} Query is: {noformat} ALTER SESSION SET `exec.sort.disable_managed` = false; alter session set `planner.disable_exchanges` = true; alter session set `planner.memory.max_query_memory_per_node` = 104857600; alter session set `planner.width.max_per_node` = 1; alter session set `planner.width.max_per_query` = 1; select max(col1), max(cs_sold_date_sk), max(cs_sold_time_sk), max(cs_ship_date_sk), max(cs_bill_customer_sk), max(cs_bill_cdemo_sk), max(cs_bill_hdemo_sk), max(cs_bill_addr_sk), max(cs_ship_customer_sk), max(cs_ship_cdemo_sk), max(cs_ship_hdemo_sk), max(cs_ship_addr_sk), max(cs_call_center_sk), max(cs_catalog_page_sk), max(cs_ship_mode_sk), min(cs_warehouse_sk), max(cs_item_sk), max(cs_promo_sk), max(cs_order_number), max(cs_quantity), max(cs_wholesale_cost), max(cs_list_price), max(cs_sales_price), max(cs_ext_discount_amt), min(cs_ext_sales_price), max(cs_ext_wholesale_cost), min(cs_ext_list_price), min(cs_ext_tax), min(cs_coupon_amt), max(cs_ext_ship_cost), max(cs_net_paid), max(cs_net_paid_inc_tax), min(cs_net_paid_inc_ship), min(cs_net_paid_inc_ship_tax), min(cs_net_profit), min(c_customer_sk), min(length(c_customer_id)), max(c_current_cdemo_sk), max(c_current_hdemo_sk), min(c_current_addr_sk), min(c_first_shipto_date_sk), min(c_first_sales_date_sk), min(length(c_salutation)), min(length(c_first_name)), min(length(c_last_name)), min(length(c_preferred_cust_flag)), max(c_birth_day), min(c_birth_month), min(c_birth_year), 
max(c_last_review_date), c_email_address from (select cs_sold_date_sk+cs_sold_time_sk col1, * from dfs.`/drill/testdata/resource-manager/md1362` order by c_email_address nulls first) d where d.col1 > 2536816 and c_email_address is not null group by c_email_address; ALTER SESSION SET `exec.sort.disable_managed` = true; alter session set `planner.disable_exchanges` = false; alter session set `planner.memory.max_query_memory_per_node` = 2147483648; alter session set `planner.width.max_per_node` = 17; alter session set `planner.width.max_per_query` = 1000; {noformat} Here is the stack trace: {noformat} 2017-08-18 13:15:27,052 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG o.a.d.e.t.g.SingleBatchSorterGen27 - Took 6445 us to sort 9039 records 2017-08-18 13:15:27,420 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG o.a.d.e.p.i.xsort.ExternalSortBatch - Copier allocator current allocation 0 2017-08-18 13:15:27,420 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG o.a.d.e.p.i.xsort.ExternalSortBatch - mergeAndSpill: starting total size in memory = 71964288 2017-08-18 13:15:27,421 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] INFO o.a.d.e.p.i.xsort.ExternalSortBatch - User Error Occurred: One or more nodes ran out of memory while executing the query. org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query. Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill. 
batchGroups.size 1 spilledBatchGroups.size 0 allocated memory 71964288 allocator limit 52428800 [Error Id: 7b248f12-2b31-4013-86b6-92e6c842db48 ] at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550) ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.newSV2(ExternalSortBatch.java:637) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:379) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:225) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] at
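The numbers in the error make the failure mode concrete: allocated memory (71,964,288 bytes) already exceeds the allocator limit (52,428,800 bytes), yet only one in-memory batch group remains, so spilling cannot free enough to admit the sv2 allocation for the next 9039-record batch. A hedged sketch of that decision, with illustrative names and an assumed two bytes per sv2 record (not Drill's actual ExternalSortBatch code):

```java
public class SortSpillGuard {
    // Three-way outcome for an incoming batch: allocate the sv2 if it fits,
    // spill if there is more than one in-memory batch group to free, else
    // fail with the RESOURCE ERROR reported in the log.
    static String admitBatch(long allocatedBytes, long limitBytes,
                             long sv2Bytes, int inMemoryBatchGroups) {
        if (allocatedBytes + sv2Bytes <= limitBytes) {
            return "ALLOCATE";
        }
        if (inMemoryBatchGroups > 1) {
            return "SPILL";  // merge-and-spill frees memory, then retry
        }
        return "OOM";        // batchGroups.size 1, nothing left to spill
    }

    public static void main(String[] args) {
        // Values from the drillbit.log excerpt above.
        System.out.println(admitBatch(71964288L, 52428800L, 2L * 9039, 1));
    }
}
```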
[jira] [Updated] (DRILL-5732) Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill.
[ https://issues.apache.org/jira/browse/DRILL-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5732: -- Attachment: 2668b522-5833-8fd2-0b6d-e685197f0ae3.sys.drill > Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill. > - > > Key: DRILL-5732 > URL: https://issues.apache.org/jira/browse/DRILL-5732 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Robert Hou >Assignee: Paul Rogers > Attachments: 2668b522-5833-8fd2-0b6d-e685197f0ae3.sys.drill, > drillbit.log > > > git commit id: > {noformat} > | 1.12.0-SNAPSHOT | e9065b55ea560e7f737d6fcb4948f9e945b9b14f | DRILL-5660: > Parquet metadata caching improvements | 15.08.2017 @ 09:31:00 PDT | > r...@qa-node190.qa.lab | 15.08.2017 @ 13:29:26 PDT | > {noformat} > Query is: > {noformat} > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.disable_exchanges` = true; > alter session set `planner.memory.max_query_memory_per_node` = 104857600; > alter session set `planner.width.max_per_node` = 1; > alter session set `planner.width.max_per_query` = 1; > select max(col1), max(cs_sold_date_sk), max(cs_sold_time_sk), > max(cs_ship_date_sk), max(cs_bill_customer_sk), max(cs_bill_cdemo_sk), > max(cs_bill_hdemo_sk), max(cs_bill_addr_sk), max(cs_ship_customer_sk), > max(cs_ship_cdemo_sk), max(cs_ship_hdemo_sk), max(cs_ship_addr_sk), > max(cs_call_center_sk), max(cs_catalog_page_sk), max(cs_ship_mode_sk), > min(cs_warehouse_sk), max(cs_item_sk), max(cs_promo_sk), > max(cs_order_number), max(cs_quantity), max(cs_wholesale_cost), > max(cs_list_price), max(cs_sales_price), max(cs_ext_discount_amt), > min(cs_ext_sales_price), max(cs_ext_wholesale_cost), min(cs_ext_list_price), > min(cs_ext_tax), min(cs_coupon_amt), max(cs_ext_ship_cost), max(cs_net_paid), > max(cs_net_paid_inc_tax), min(cs_net_paid_inc_ship), > min(cs_net_paid_inc_ship_tax), min(cs_net_profit), min(c_customer_sk), > min(length(c_customer_id)), 
max(c_current_cdemo_sk), max(c_current_hdemo_sk), > min(c_current_addr_sk), min(c_first_shipto_date_sk), > min(c_first_sales_date_sk), min(length(c_salutation)), > min(length(c_first_name)), min(length(c_last_name)), > min(length(c_preferred_cust_flag)), max(c_birth_day), min(c_birth_month), > min(c_birth_year), max(c_last_review_date), c_email_address from (select > cs_sold_date_sk+cs_sold_time_sk col1, * from > dfs.`/drill/testdata/resource-manager/md1362` order by c_email_address nulls > first) d where d.col1 > 2536816 and c_email_address is not null group by > c_email_address; > ALTER SESSION SET `exec.sort.disable_managed` = true; > alter session set `planner.disable_exchanges` = false; > alter session set `planner.memory.max_query_memory_per_node` = 2147483648; > alter session set `planner.width.max_per_node` = 17; > alter session set `planner.width.max_per_query` = 1000; > {noformat} > Here is the stack trace: > {noformat} > 2017-08-18 13:15:27,052 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG > o.a.d.e.t.g.SingleBatchSorterGen27 - Took 6445 us to sort 9039 records > 2017-08-18 13:15:27,420 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG > o.a.d.e.p.i.xsort.ExternalSortBatch - Copier allocator current allocation 0 > 2017-08-18 13:15:27,420 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG > o.a.d.e.p.i.xsort.ExternalSortBatch - mergeAndSpill: starting total size in > memory = 71964288 > 2017-08-18 13:15:27,421 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] INFO > o.a.d.e.p.i.xsort.ExternalSortBatch - User Error Occurred: One or more nodes > ran out of memory while executing the query. > org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more > nodes ran out of memory while executing the query. > Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill. 
> batchGroups.size 1 > spilledBatchGroups.size 0 > allocated memory 71964288 > allocator limit 52428800 > [Error Id: 7b248f12-2b31-4013-86b6-92e6c842db48 ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550) > ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.newSV2(ExternalSortBatch.java:637) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:379) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at >
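The failure above (allocated memory 71964288 against an allocator limit of 52428800, with batchGroups.size 1) can be modelled with a short sketch. This is a hypothetical illustration of the decision the external sort faces, not Drill's actual ExternalSortBatch code: an sv2 (selection vector) needs 2 bytes per record, the operator is already past its allocator limit, and with only one batch group in memory there is nothing useful left to spill.

```python
SV2_BYTES_PER_RECORD = 2  # one 2-byte index per record in a selection vector

def try_allocate_sv2(records, allocated, limit, batch_groups):
    """Return True if the sv2 fits, False if the caller should spill first.

    Raise MemoryError when over the limit with nothing left to spill --
    the fatal case reported in the log above. (Hypothetical model.)
    """
    needed = records * SV2_BYTES_PER_RECORD
    if allocated + needed <= limit:
        return True
    if batch_groups <= 1:
        # Spilling the last in-memory batch group cannot help.
        raise MemoryError(
            "Unable to allocate sv2 for %d records, "
            "and not enough batchGroups to spill." % records)
    return False  # caller should spill a batch group and retry
```

With the values from the log (9039 records, 71964288 bytes allocated, a 52428800-byte limit, and a single batch group) the function raises, matching the RESOURCE ERROR above.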
[jira] [Closed] (DRILL-5522) OOM during the merge and spill process of the managed external sort
[ https://issues.apache.org/jira/browse/DRILL-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5522. - This has been verified. > OOM during the merge and spill process of the managed external sort > --- > > Key: DRILL-5522 > URL: https://issues.apache.org/jira/browse/DRILL-5522 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Attachments: 26e334aa-1afa-753f-3afe-862f76b80c18.sys.drill, > drillbit.log, drillbit.out, drill-env.sh > > > git.commit.id.abbrev=1e0a14c > The below query fails with an OOM > {code} > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.memory.max_query_memory_per_node` = 1552428800; > create table dfs.drillTestDir.xsort_ctas3_multiple partition by (type, aCol) > as select type, rptds, rms, s3.rms.a aCol, uid from ( > select * from ( > select s1.type type, flatten(s1.rms.rptd) rptds, s1.rms, s1.uid > from ( > select d.type type, d.uid uid, flatten(d.map.rm) rms from > dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid > ) s1 > ) s2 > order by s2.rms.mapid, s2.rptds.a > ) s3; > {code} > Stack trace > {code} > 2017-05-17 15:15:35,027 [26e334aa-1afa-753f-3afe-862f76b80c18:frag:4:2] INFO > o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred: One or more nodes > ran out of memory while executing the query. (Unable to allocate buffer of > size 2097152 due to memory limit. Current allocation: 29229064) > org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more > nodes ran out of memory while executing the query. > Unable to allocate buffer of size 2097152 due to memory limit. 
Current > allocation: 29229064 > [Error Id: 619e2e34-704c-4964-a354-1348fb33ce8a ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) > ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:244) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_111] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_111] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111] > Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to > allocate buffer of size 2097152 due to memory limit. Current allocation: > 29229064 > at > org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:220) > ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:195) > ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.vector.BigIntVector.reAlloc(BigIntVector.java:212) > ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.vector.BigIntVector.copyFromSafe(BigIntVector.java:324) > ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.vector.NullableBigIntVector.copyFromSafe(NullableBigIntVector.java:367) > ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.vector.NullableBigIntVector$TransferImpl.copyValueSafe(NullableBigIntVector.java:328) > ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.copyValueSafe(RepeatedMapVector.java:360) > ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > 
org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.copyValueSafe(MapVector.java:220) > ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.vector.complex.MapVector.copyFromSafe(MapVector.java:82) > ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.test.generated.PriorityQueueCopierGen49.doCopy(PriorityQueueCopierTemplate.java:34) > ~[na:na] > at > org.apache.drill.exec.test.generated.PriorityQueueCopierGen49.next(PriorityQueueCopierTemplate.java:76) > ~[na:na] > at > org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder$BatchMerger.next(CopierHolder.java:234) > ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns(ExternalSortBatch.java:1214) >
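The "Unable to allocate buffer of size 2097152 due to memory limit. Current allocation: 29229064" error above comes from the allocator refusing a request that would push it past its limit. A minimal sketch of that check follows; it is an illustration, not the code in BaseAllocator, and the 30408704-byte (29 MB) limit used in the example is an assumed value, since the log does not state the copier allocator's limit.

```python
class OutOfMemoryException(Exception):
    pass

def allocate(current_allocation, requested, limit):
    """Grant a buffer only if it keeps the allocator under its limit.

    Sketch of limit enforcement; returns the new allocation total.
    """
    if current_allocation + requested > limit:
        raise OutOfMemoryException(
            "Unable to allocate buffer of size %d due to memory limit. "
            "Current allocation: %d" % (requested, current_allocation))
    return current_allocation + requested

# The log shows a 2097152-byte request refused at current allocation
# 29229064; with the assumed 30408704-byte limit the same refusal occurs.
```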
[jira] [Resolved] (DRILL-5522) OOM during the merge and spill process of the managed external sort
[ https://issues.apache.org/jira/browse/DRILL-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou resolved DRILL-5522. --- Resolution: Fixed This has been resolved. > OOM during the merge and spill process of the managed external sort > --- > > Key: DRILL-5522 > URL: https://issues.apache.org/jira/browse/DRILL-5522 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Attachments: 26e334aa-1afa-753f-3afe-862f76b80c18.sys.drill, > drillbit.log, drillbit.out, drill-env.sh

[jira] [Updated] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector
[ https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5670: -- Attachment: drillbit.log.sort > Varchar vector throws an assertion error when allocating a new vector > - > > Key: DRILL-5670 > URL: https://issues.apache.org/jira/browse/DRILL-5670 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.11.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 26498995-bbad-83bc-618f-914c37a84e1f.sys.drill, > 26555749-4d36-10d2-6faf-e403db40c370.sys.drill, > 266290f3-5fdc-5873-7372-e9ee053bf867.sys.drill, > 269969ca-8d4d-073a-d916-9031e3d3fbf0.sys.drill, drillbit.log, drillbit.log, > drillbit.log, drillbit.log, drillbit.log, drillbit.log.sort, drillbit.out, > drill-override.conf > > > I am running this test on a private branch of [paul's > repository|https://github.com/paul-rogers/drill]. Below is the commit info > {code} > git.commit.id.abbrev=d86e16c > git.commit.user.email=prog...@maprtech.com > git.commit.message.full=DRILL-5601\: Rollup of external sort fixes an > improvements\n\n- DRILL-5513\: Managed External Sort \: OOM error during the > merge phase\n- DRILL-5519\: Sort fails to spill and results in an OOM\n- > DRILL-5522\: OOM during the merge and spill process of the managed external > sort\n- DRILL-5594\: Excessive buffer reallocations during merge phase of > external sort\n- DRILL-5597\: Incorrect "bits" vector allocation in nullable > vectors allocateNew()\n- DRILL-5602\: Repeated List Vector fails to > initialize the offset vector\n\nAll of the bugs have to do with handling > low-memory conditions, and with\ncorrectly estimating the sizes of vectors, > even when those vectors come\nfrom the spill file or from an exchange. 
Hence, > the changes for all of\nthe above issues are interrelated.\n > git.commit.id=d86e16c551e7d3553f2cde748a739b1c5a7a7659 > git.commit.message.short=DRILL-5601\: Rollup of external sort fixes an > improvements > git.commit.user.name=Paul Rogers > git.build.user.name=Rahul Challapalli > git.commit.id.describe=0.9.0-1078-gd86e16c > git.build.user.email=challapallira...@gmail.com > git.branch=d86e16c551e7d3553f2cde748a739b1c5a7a7659 > git.commit.time=05.07.2017 @ 20\:34\:39 PDT > git.build.time=12.07.2017 @ 14\:27\:03 PDT > git.remote.origin.url=g...@github.com\:paul-rogers/drill.git > {code} > Below query fails with an Assertion Error > {code} > 0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET > `exec.sort.disable_managed` = false; > +---+-+ > | ok | summary | > +---+-+ > | true | exec.sort.disable_managed updated. | > +---+-+ > 1 row selected (1.044 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> alter session set > `planner.memory.max_query_memory_per_node` = 482344960; > +---++ > | ok | summary | > +---++ > | true | planner.memory.max_query_memory_per_node updated. | > +---++ > 1 row selected (0.372 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> alter session set > `planner.width.max_per_node` = 1; > +---+--+ > | ok | summary| > +---+--+ > | true | planner.width.max_per_node updated. | > +---+--+ > 1 row selected (0.292 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> alter session set > `planner.width.max_per_query` = 1; > +---+---+ > | ok |summary| > +---+---+ > | true | planner.width.max_per_query updated. 
| > +---+---+ > 1 row selected (0.25 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select * from > dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by > columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50], > > columns[454],columns[413],columns[940],columns[834],columns[73],columns[140],columns[104],columns[],columns[30],columns[2420],columns[1520], > columns[1410], > columns[1110],columns[1290],columns[2380],columns[705],columns[45],columns[1054],columns[2430],columns[420],columns[404],columns[3350], > >
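DRILL-5670 concerns a VarChar vector assertion during allocation, and the query above sorts on thousands of `columns[...]` entries, each backed by its own VarChar vector. A rough sizing sketch (the function name and 4-byte offset width are illustrative assumptions, not Drill's vector code) shows why such vectors get large: each needs a data buffer plus one offset entry per value.

```python
OFFSET_WIDTH = 4  # assumed: one 4-byte offset per value, plus a trailing one

def varchar_allocation_bytes(value_count, avg_width):
    """Rough size of a VarChar vector: data buffer plus offset buffer."""
    data_bytes = value_count * avg_width
    offset_bytes = (value_count + 1) * OFFSET_WIDTH
    return data_bytes + offset_bytes
```

For example, a batch of 4096 values averaging 250 bytes needs roughly 1 MB of data plus 16 KB of offsets per vector; multiplied across thousands of sort keys, allocations of this kind add up quickly.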
[jira] [Closed] (DRILL-5445) Assertion Error in Managed External Sort when dealing with repeated maps
[ https://issues.apache.org/jira/browse/DRILL-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5445. - This has been verified. > Assertion Error in Managed External Sort when dealing with repeated maps > > > Key: DRILL-5445 > URL: https://issues.apache.org/jira/browse/DRILL-5445 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 27004a3c-c53d-52d1-c7ed-4beb563447f9.sys.drill, > drillbit.log > > > git.commit.id.abbrev=3e8b01d > The below query fails with an Assertion Error (I am running with assertions > enabled) > {code} > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.width.max_per_node` = 1; > alter session set `planner.disable_exchanges` = true; > alter session set `planner.width.max_per_query` = 1; > alter session set `planner.memory.max_query_memory_per_node` = 152428800; > select count(*) from ( > select * from ( > select event_info.uid, transaction_info.trans_id, event_info.event.evnt_id > from ( > select userinfo.transaction.trans_id trans_id, > max(userinfo.event.event_time) max_event_time > from ( > select uid, flatten(events) event, flatten(transactions) transaction > from dfs.`/drill/testdata/resource-manager/nested-large.json` > ) userinfo > where userinfo.transaction.trans_time >= userinfo.event.event_time > group by userinfo.transaction.trans_id > ) transaction_info > inner join > ( > select uid, flatten(events) event > from dfs.`/drill/testdata/resource-manager/nested-large.json` > ) event_info > on transaction_info.max_event_time = event_info.event.event_time) d order by > features[0].type) d1 where d1.uid < -1; > {code} > Below is the error from the logs > {code} > [Error Id: 26983344-dee3-4a33-8508-ad125f01fee6 on qa-node190.qa.lab:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) > 
~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:295) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:264) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_111] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_111] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111] > Caused by: java.lang.RuntimeException: java.lang.AssertionError > at > org.apache.drill.common.DeferredException.addThrowable(DeferredException.java:101) > ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.fail(FragmentExecutor.java:409) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > ... 
4 common frames omitted > Caused by: java.lang.AssertionError: null > at > org.apache.drill.exec.vector.complex.RepeatedMapVector.load(RepeatedMapVector.java:444) > ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.cache.VectorAccessibleSerializable.readFromStream(VectorAccessibleSerializable.java:118) > ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.BatchGroup$SpilledRun.getBatch(BatchGroup.java:222) > ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.BatchGroup$SpilledRun.getNextIndex(BatchGroup.java:196) > ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.test.generated.PriorityQueueCopierGen23.setup(PriorityQueueCopierTemplate.java:60) > ~[na:na] > at > org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder.createCopier(CopierHolder.java:116) > ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder.access$200(CopierHolder.java:45) > ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at >
[jira] [Closed] (DRILL-5465) Managed external sort results in an OOM
[ https://issues.apache.org/jira/browse/DRILL-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5465. - This has been verified. > Managed external sort results in an OOM > --- > > Key: DRILL-5465 > URL: https://issues.apache.org/jira/browse/DRILL-5465 > Project: Apache Drill > Issue Type: Bug >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 26f7368e-21a1-6513-74ea-a178ae1e50f8.sys.drill, > createViewsParquet.sql, drillbit.log > > > git.commit.id.abbrev=1e0a14c > The below query fails with an OOM on top of Tpcds SF1 parquet data. Since the > sort already spilled once, I assume there is sufficient memory to handle the > spill/merge batches. The view definition file is attached and the data can be > downloaded from [1] > {code} > use dfs.tpcds_sf1_parquet_views; > alter session set `planner.enable_decimal_data_type` = true; > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.width.max_per_node` = 1; > alter session set `planner.disable_exchanges` = true; > alter session set `planner.width.max_per_query` = 1; > alter session set `planner.memory.max_query_memory_per_node` = 200435456; > alter session set `planner.enable_hashjoin` = false; > SELECT dt.d_year, >item.i_brand_id brand_id, >item.i_brand brand, >Sum(ss_ext_discount_amt) sum_agg > FROM date_dim dt, >store_sales, >item > WHERE dt.d_date_sk = store_sales.ss_sold_date_sk >AND store_sales.ss_item_sk = item.i_item_sk >AND item.i_manufact_id = 427 >AND dt.d_moy = 11 > GROUP BY dt.d_year, > item.i_brand, > item.i_brand_id > ORDER BY dt.d_year, > sum_agg DESC, > brand_id; > {code} > Exception from the logs > {code} > [Error Id: 676ff6ad-829d-4920-9d4f-5132601d27b4 ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) > ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > 
org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:617) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:425) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.RecordIterator.nextBatch(RecordIterator.java:99) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.RecordIterator.next(RecordIterator.java:185) > 
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.RecordIterator.prepare(RecordIterator.java:169) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.join.JoinStatus.prepare(JoinStatus.java:87) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.join.MergeJoinBatch.innerNext(MergeJoinBatch.java:160) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) >
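Several of these reports tune `planner.memory.max_query_memory_per_node` and `planner.width.max_per_node` to constrain the sort. As a rough approximation (the exact division Drill applies may differ; this sketch is illustrative only), the per-node query memory is split across the sort operators in the plan and the fragment width, which is why a 200 MB budget with several sorts leaves each sort comparatively little room:

```python
def per_sort_memory(max_query_memory_per_node, width_per_node, sorts_in_plan):
    """Approximate memory each sort instance receives (illustrative formula)."""
    return max_query_memory_per_node // max(1, width_per_node * sorts_in_plan)

# With the DRILL-5465 settings (200435456 bytes, width 1) and, say, three
# sorts in the plan, each sort would get about 64 MB.
```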
[jira] [Closed] (DRILL-5253) External sort fails with OOM error (Fails to allocate sv2)
[ https://issues.apache.org/jira/browse/DRILL-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5253. - This has been verified. > External sort fails with OOM error (Fails to allocate sv2) > -- > > Key: DRILL-5253 > URL: https://issues.apache.org/jira/browse/DRILL-5253 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Relational Operators >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 2762f36d-a2e7-5582-922d-3c4626be18c0.sys.drill > > > git.commit.id.abbrev=2af709f > The data set used in the below query has the same value for every column in > every row. The query fails with an OOM as it exceeds the allocated memory > {code} > alter session set `planner.width.max_per_node` = 1; > alter session set `planner.memory.max_query_memory_per_node` = 104857600; > select count(*) from (select * from identical order by col1, col2, col3, > col4, col5, col6, col7, col8, col9, col10); > Error: RESOURCE ERROR: One or more nodes ran out of memory while executing > the query. > org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate sv2 > buffer after repeated attempts > Fragment 2:0 > [Error Id: aed43fa1-fd8b-4440-9426-0f35d055aabb on qa-node190.qa.lab:31010] > (state=,code=0) > {code} > Exception from the logs > {code} > org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more > nodes ran out of memory while executing the query. 
> org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate sv2 > buffer after repeated attempts > [Error Id: aed43fa1-fd8b-4440-9426-0f35d055aabb ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) > ~[drill-common-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:242) > [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_111] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_111] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111] > Caused by: org.apache.drill.exec.exception.OutOfMemoryException: > org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate sv2 > buffer after repeated attempts > at > org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:371) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] 
> at > org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:232) >
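The "Unable to allocate sv2 buffer after repeated attempts" message above implies a retry loop: try the allocation, spill to free memory on failure, and give up after a bounded number of attempts. A minimal sketch of that pattern, with hypothetical `try_alloc`/`spill` callables standing in for the sort's internals:

```python
def allocate_sv2_with_retries(try_alloc, spill, max_attempts=5):
    """Attempt an allocation, spilling between attempts; give up eventually.

    try_alloc() returns a buffer or None; spill() frees memory. Both are
    injected here for illustration -- not Drill's actual interfaces.
    """
    for _ in range(max_attempts):
        buf = try_alloc()
        if buf is not None:
            return buf
        spill()  # release an in-memory batch group, then retry
    raise MemoryError("Unable to allocate sv2 buffer after repeated attempts")
```

When every spill still leaves the allocator over its limit (as with the identical-valued dataset above, which compresses poorly in memory accounting), the loop exhausts its attempts and surfaces the error seen in the log.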
[jira] [Closed] (DRILL-5519) Sort fails to spill and results in an OOM
[ https://issues.apache.org/jira/browse/DRILL-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5519. - This has been verified. > Sort fails to spill and results in an OOM > - > > Key: DRILL-5519 > URL: https://issues.apache.org/jira/browse/DRILL-5519 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 26e49afc-cf45-637b-acc1-a70fee7fe7e2.sys.drill, > drillbit.log, drillbit.out, drill-env.sh > > > Setup : > {code} > git.commit.id.abbrev=1e0a14c > DRILL_MAX_DIRECT_MEMORY="32G" > DRILL_MAX_HEAP="4G" > No of nodes in the drill cluster : 1 > {code} > The below query fails with an OOM in the "in-memory sort" code, which means > the logic which decides when to spill is flawed. > {code} > 0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET > `exec.sort.disable_managed` = false; > +---+-+ > | ok | summary | > +---+-+ > | true | exec.sort.disable_managed updated. | > +---+-+ > 1 row selected (1.022 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> alter session set > `planner.memory.max_query_memory_per_node` = 334288000; > +---++ > | ok | summary | > +---++ > | true | planner.memory.max_query_memory_per_node updated. | > +---++ > 1 row selected (0.369 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select * from > (select flatten(flatten(lst_lst)) num from > dfs.`/drill/testdata/resource-manager/nested-large.json`) d order by d.num) > d1 where d1.num < -1; > Error: RESOURCE ERROR: One or more nodes ran out of memory while executing > the query. > Unable to allocate buffer of size 4194304 (rounded from 320) due to > memory limit. 
Current allocation: 16015936 > Fragment 2:2 > [Error Id: 4d9cc59a-b5d1-4ca9-9b26-69d9438f0bee on qa-node190.qa.lab:31010] > (state=,code=0) > {code} > Below is the exception from the logs > {code} > 2017-05-16 13:46:33,233 [26e49afc-cf45-637b-acc1-a70fee7fe7e2:frag:2:2] INFO > o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred: One or more nodes > ran out of memory while executing the query. (Unable to allocate buffer of > size 4194304 (rounded from 320) due to memory limit. Current allocation: > 16015936) > org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more > nodes ran out of memory while executing the query. > Unable to allocate buffer of size 4194304 (rounded from 320) due to > memory limit. Current allocation: 16015936 > [Error Id: 4d9cc59a-b5d1-4ca9-9b26-69d9438f0bee ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) > ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:244) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_111] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_111] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111] > Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to > allocate buffer of size 4194304 (rounded from 320) due to memory limit. 
> Current allocation: 16015936 > at > org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:220) > ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:195) > ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.test.generated.MSorterGen44.setup(MSortTemplate.java:91) > ~[na:na] > at > org.apache.drill.exec.physical.impl.xsort.managed.MergeSort.merge(MergeSort.java:110) > ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.sortInMemory(ExternalSortBatch.java:1159) > ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:687) > ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] >
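The "(rounded from 320…)" in the error above, where the requested size is likely truncated in the log and is left as-is, is consistent with the allocator rounding requests up to the next power of two (4194304 = 2^22). A minimal sketch of such rounding, offered as an illustration of the observed behaviour rather than the allocator's actual code:

```python
def round_to_power_of_two(requested):
    """Round a buffer request up to the next power of two."""
    size = 1
    while size < requested:
        size <<= 1
    return size
```

For instance, any request between 2097153 and 4194304 bytes rounds to 4194304, so the allocator can charge noticeably more memory than the caller literally asked for, which matters when the operator is already near its limit.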
[jira] [Closed] (DRILL-5447) Managed External Sort : Unable to allocate sv2 vector
[ https://issues.apache.org/jira/browse/DRILL-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5447. - This has been verified. > Managed External Sort : Unable to allocate sv2 vector > - > > Key: DRILL-5447 > URL: https://issues.apache.org/jira/browse/DRILL-5447 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 26550427-6adf-a52e-2ea8-dc52d8d8433f.sys.drill, > 26617a7e-b953-7ac3-556d-43fd88e51b19.sys.drill, > 26fee988-ed18-a86a-7164-3e75118c0ffc.sys.drill, drillbit.log, drillbit.log, > drillbit.log > > > git.commit.id.abbrev=3e8b01d > Dataset : > {code} > Every record contains a repeated type with 2000 elements. > The repeated type contains varchars of length 250 for the first 2000 records > and single-character strings for the next 2000 records > The above pattern is repeated a few times > {code} > The below query fails > {code} > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.width.max_per_node` = 1; > alter session set `planner.disable_exchanges` = true; > alter session set `planner.width.max_per_query` = 1; > select count(*) from (select * from (select id, flatten(str_list) str from > dfs.`/drill/testdata/resource-manager/flatten-large-small.json`) d order by > d.str) d1 where d1.id=0; > Error: RESOURCE ERROR: Unable to allocate sv2 buffer > Fragment 0:0 > [Error Id: 9e45c293-ab26-489d-a90e-25da96004f15 on qa-node190.qa.lab:31010] > (state=,code=0) > {code} > Exception from the logs > {code} > [Error Id: 9e45c293-ab26-489d-a90e-25da96004f15 ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) > ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.newSV2(ExternalSortBatch.java:1463) >
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.makeSelectionVector(ExternalSortBatch.java:799) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.processBatch(ExternalSortBatch.java:856) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch(ExternalSortBatch.java:618) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:660) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:559) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > 
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) >
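For context on the failing sv2 allocation above: a "selection vector 2" holds a 2-byte row index per record in a batch, so the sort must allocate 2 × record-count bytes per incoming batch on top of the data vectors themselves. A small illustrative sketch of that sizing arithmetic (not Drill's actual code):

```python
SV2_WIDTH_BYTES = 2  # an sv2 entry is a 2-byte (uint16) row index

def sv2_bytes(record_count: int) -> int:
    """Bytes needed for a selection vector (sv2) over one batch."""
    return record_count * SV2_WIDTH_BYTES
```

This is why even a modest batch can tip the sort over its limit once the data vectors have consumed nearly all of the operator's budget.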
[jira] [Commented] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector
[ https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162289#comment-16162289 ] Robert Hou commented on DRILL-5670: --- I have attached drillbit.log.sort. Can you confirm that sort has completed? > Varchar vector throws an assertion error when allocating a new vector > - > > Key: DRILL-5670 > URL: https://issues.apache.org/jira/browse/DRILL-5670 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.11.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 26498995-bbad-83bc-618f-914c37a84e1f.sys.drill, > 26555749-4d36-10d2-6faf-e403db40c370.sys.drill, > 266290f3-5fdc-5873-7372-e9ee053bf867.sys.drill, > 269969ca-8d4d-073a-d916-9031e3d3fbf0.sys.drill, drillbit.log, drillbit.log, > drillbit.log, drillbit.log, drillbit.log, drillbit.log.sort, drillbit.out, > drill-override.conf > > > I am running this test on a private branch of [paul's > repository|https://github.com/paul-rogers/drill]. Below is the commit info > {code} > git.commit.id.abbrev=d86e16c > git.commit.user.email=prog...@maprtech.com > git.commit.message.full=DRILL-5601\: Rollup of external sort fixes an > improvements\n\n- DRILL-5513\: Managed External Sort \: OOM error during the > merge phase\n- DRILL-5519\: Sort fails to spill and results in an OOM\n- > DRILL-5522\: OOM during the merge and spill process of the managed external > sort\n- DRILL-5594\: Excessive buffer reallocations during merge phase of > external sort\n- DRILL-5597\: Incorrect "bits" vector allocation in nullable > vectors allocateNew()\n- DRILL-5602\: Repeated List Vector fails to > initialize the offset vector\n\nAll of the bugs have to do with handling > low-memory conditions, and with\ncorrectly estimating the sizes of vectors, > even when those vectors come\nfrom the spill file or from an exchange. 
Hence, > the changes for all of\nthe above issues are interrelated.\n > git.commit.id=d86e16c551e7d3553f2cde748a739b1c5a7a7659 > git.commit.message.short=DRILL-5601\: Rollup of external sort fixes an > improvements > git.commit.user.name=Paul Rogers > git.build.user.name=Rahul Challapalli > git.commit.id.describe=0.9.0-1078-gd86e16c > git.build.user.email=challapallira...@gmail.com > git.branch=d86e16c551e7d3553f2cde748a739b1c5a7a7659 > git.commit.time=05.07.2017 @ 20\:34\:39 PDT > git.build.time=12.07.2017 @ 14\:27\:03 PDT > git.remote.origin.url=g...@github.com\:paul-rogers/drill.git > {code} > Below query fails with an Assertion Error > {code} > 0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET > `exec.sort.disable_managed` = false; > +---+-+ > | ok | summary | > +---+-+ > | true | exec.sort.disable_managed updated. | > +---+-+ > 1 row selected (1.044 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> alter session set > `planner.memory.max_query_memory_per_node` = 482344960; > +---++ > | ok | summary | > +---++ > | true | planner.memory.max_query_memory_per_node updated. | > +---++ > 1 row selected (0.372 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> alter session set > `planner.width.max_per_node` = 1; > +---+--+ > | ok | summary| > +---+--+ > | true | planner.width.max_per_node updated. | > +---+--+ > 1 row selected (0.292 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> alter session set > `planner.width.max_per_query` = 1; > +---+---+ > | ok |summary| > +---+---+ > | true | planner.width.max_per_query updated. 
| > +---+---+ > 1 row selected (0.25 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select * from > dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by > columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50], > > columns[454],columns[413],columns[940],columns[834],columns[73],columns[140],columns[104],columns[],columns[30],columns[2420],columns[1520], > columns[1410], >
[jira] [Resolved] (DRILL-5443) Managed External Sort fails with OOM while spilling to disk
[ https://issues.apache.org/jira/browse/DRILL-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou resolved DRILL-5443. --- Resolution: Fixed This has been resolved. > Managed External Sort fails with OOM while spilling to disk > --- > > Key: DRILL-5443 > URL: https://issues.apache.org/jira/browse/DRILL-5443 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.10.0, 1.11.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 265a014b-8cae-30b5-adab-ff030b6c7086.sys.drill, > 27016969-ef53-40dc-b582-eea25371fa1c.sys.drill, drill5443.drillbit.log, > drillbit.log > > > git.commit.id.abbrev=3e8b01d > The below query fails with an OOM > {code} > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.width.max_per_node` = 1; > alter session set `planner.disable_exchanges` = true; > alter session set `planner.width.max_per_query` = 1; > alter session set `planner.memory.max_query_memory_per_node` = 52428800; > select s1.type type, flatten(s1.rms.rptd) rptds from (select d.type type, > d.uid uid, flatten(d.map.rm) rms from > dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid) s1 > order by s1.rms.mapid; > {code} > Exception from the logs > {code} > 2017-04-24 17:22:59,439 [27016969-ef53-40dc-b582-eea25371fa1c:frag:0:0] INFO > o.a.d.e.p.i.x.m.ExternalSortBatch - User Error Occurred: External Sort > encountered an error while spilling to disk (Unable to allocate buffer of > size 524288 (rounded from 307197) due to memory limit. 
Current allocation: > 25886728) > org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: External > Sort encountered an error while spilling to disk > [Error Id: a64e3790-3a34-42c8-b4ea-4cb1df780e63 ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) > ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.doMergeAndSpill(ExternalSortBatch.java:1445) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:1376) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeRuns(ExternalSortBatch.java:1372) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.consolidateBatches(ExternalSortBatch.java:1299) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns(ExternalSortBatch.java:1195) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:689) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:559) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) > 
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) >
[jira] [Closed] (DRILL-5442) Managed Sort: IndexOutOfBounds with a join over an inlist
[ https://issues.apache.org/jira/browse/DRILL-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5442. - I have verified this has been fixed. > Managed Sort: IndexOutOfBounds with a join over an inlist > - > > Key: DRILL-5442 > URL: https://issues.apache.org/jira/browse/DRILL-5442 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.10.0 >Reporter: Boaz Ben-Zvi >Assignee: Paul Rogers > Fix For: 1.12.0 > > > The following query fails with IOOB when a managed sort is used, but passes > with the old default sort: > = > 0: jdbc:drill:zk=local> alter session set `exec.sort.disable_managed` = false; > +---+-+ > | ok | summary | > +---+-+ > | true | exec.sort.disable_managed updated. | > +---+-+ > 1 row selected (0.16 seconds) > 0: jdbc:drill:zk=local> select * from dfs.`/data/json/s1/date_dim` where > d_year in(1990, 1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910, > 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919) limit 3; > Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 0, length: 1 > (expected: range(0, 0)) > Fragment 0:0 > [Error Id: 370fd706-c365-421f-b57d-d6ab7fde82df on 10.250.56.251:31010] > (state=,code=0) > > > (the above query was extracted from > /root/drillAutomation/framework-master/framework/resources/Functional/tpcds/variants/hive/q4_1.sql > ) > Note that the inlist must have at least 20 items, in which case the plan > becomes a join over a stream-aggregate over a sort over the (inlist's) > values. When the IOOB happens, the stack does not show the sort anymore, but > probably handling a NONE returned by the last next() on the sort ( > StreamingAggTemplate.doWork():182 ) > The "date_dim" can probably be made up with any data. 
The one above was taken > from: > [root@atsqa6c85 ~]# hadoop fs -ls /drill/testdata/tpcds/json/s1/date_dim > Found 1 items > -rwxr-xr-x 3 root root 50713534 2014-10-14 22:39 > /drill/testdata/tpcds/json/s1/date_dim/0_0_0.json -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (DRILL-5443) Managed External Sort fails with OOM while spilling to disk
[ https://issues.apache.org/jira/browse/DRILL-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5443. - This has been verified. > Managed External Sort fails with OOM while spilling to disk > --- > > Key: DRILL-5443 > URL: https://issues.apache.org/jira/browse/DRILL-5443 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.10.0, 1.11.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 265a014b-8cae-30b5-adab-ff030b6c7086.sys.drill, > 27016969-ef53-40dc-b582-eea25371fa1c.sys.drill, drill5443.drillbit.log, > drillbit.log > > > git.commit.id.abbrev=3e8b01d > The below query fails with an OOM > {code} > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.width.max_per_node` = 1; > alter session set `planner.disable_exchanges` = true; > alter session set `planner.width.max_per_query` = 1; > alter session set `planner.memory.max_query_memory_per_node` = 52428800; > select s1.type type, flatten(s1.rms.rptd) rptds from (select d.type type, > d.uid uid, flatten(d.map.rm) rms from > dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid) s1 > order by s1.rms.mapid; > {code} > Exception from the logs > {code} > 2017-04-24 17:22:59,439 [27016969-ef53-40dc-b582-eea25371fa1c:frag:0:0] INFO > o.a.d.e.p.i.x.m.ExternalSortBatch - User Error Occurred: External Sort > encountered an error while spilling to disk (Unable to allocate buffer of > size 524288 (rounded from 307197) due to memory limit. 
Current allocation: > 25886728) > org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: External > Sort encountered an error while spilling to disk > [Error Id: a64e3790-3a34-42c8-b4ea-4cb1df780e63 ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) > ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.doMergeAndSpill(ExternalSortBatch.java:1445) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:1376) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeRuns(ExternalSortBatch.java:1372) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.consolidateBatches(ExternalSortBatch.java:1299) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns(ExternalSortBatch.java:1195) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:689) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:559) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) > 
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at
[jira] [Commented] (DRILL-5478) Spill file size parameter is not honored by the managed external sort
[ https://issues.apache.org/jira/browse/DRILL-5478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168247#comment-16168247 ] Robert Hou commented on DRILL-5478: --- We should still test it on behalf of Support. We don't have to test it extensively, but ensure it still works in general. The file size in this example is 256 MB. The memory is 1 GB. Is this a reasonable set of values? > Spill file size parameter is not honored by the managed external sort > - > > Key: DRILL-5478 > URL: https://issues.apache.org/jira/browse/DRILL-5478 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > > git.commit.id.abbrev=1e0a14c > Query: > {code} > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.width.max_per_node` = 1; > alter session set `planner.disable_exchanges` = true; > alter session set `planner.width.max_per_query` = 1; > alter session set `planner.memory.max_query_memory_per_node` = 1052428800; > alter session set `planner.enable_decimal_data_type` = true; > select count(*) from ( > select * from dfs.`/drill/testdata/resource-manager/all_types_large` d1 > order by d1.map.missing > ) d; > {code} > Boot Options (spill file size is set to 256MB) > {code} > 0: jdbc:drill:zk=10.10.100.190:5181> select * from sys.boot where name like > '%spill%'; > +--+-+---+-+--++---++ > | name | kind | type | status > | num_val | string_val | bool_val > | float_val | > +--+-+---+-+--++---++ > | drill.exec.sort.external.spill.directories | STRING | BOOT | BOOT > | null | [ > # drill-override.conf: 26 > "/tmp/test" > ] | null | null | > | drill.exec.sort.external.spill.file_size | STRING | BOOT | BOOT > | null | "256M" | null > | null | > | drill.exec.sort.external.spill.fs| STRING | BOOT | BOOT > | null | "maprfs:///" | null > | null | > | drill.exec.sort.external.spill.group.size| LONG| BOOT | BOOT > 
| 4| null | null > | null | > | drill.exec.sort.external.spill.merge_batch_size | STRING | BOOT | BOOT > | null | "16M" | null > | null | > | drill.exec.sort.external.spill.spill_batch_size | STRING | BOOT | BOOT > | null | "8M" | null > | null | > | drill.exec.sort.external.spill.threshold | LONG| BOOT | BOOT > | 4| null | null > | null | > +--+-+---+-+--++---++ > {code} > Below are the spill files while the query is still executing. The size of the > spill files is ~34MB > {code} > -rwxr-xr-x 3 root root 34957815 2017-05-05 11:26 > /tmp/test/26f33c36-4235-3531-aeaa-2c73dc4ddeb5_major0_minor0_op5_sort/run1 > -rwxr-xr-x 3 root root 34957815 2017-05-05 11:27 > /tmp/test/26f33c36-4235-3531-aeaa-2c73dc4ddeb5_major0_minor0_op5_sort/run2 > -rwxr-xr-x 3 root root 0 2017-05-05 11:27 > /tmp/test/26f33c36-4235-3531-aeaa-2c73dc4ddeb5_major0_minor0_op5_sort/run3 > {code} > The data set is too large to attach here. Reach out to me if you need anything
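The check above (listing spill files and eyeballing sizes against the configured `drill.exec.sort.external.spill.file_size`) can be automated. A small illustrative Python sketch; the directory layout and the 50% tolerance are assumptions for the example, not Drill internals:

```python
import os

def undersized_spill_files(spill_dir: str, configured_bytes: int,
                           tolerance: float = 0.5):
    """List completed spill files whose size is well below the target.

    Files far under the limit (e.g. ~34 MB runs under a 256 MB
    file_size setting) suggest the option is not being honored.
    """
    flagged = []
    for name in sorted(os.listdir(spill_dir)):
        path = os.path.join(spill_dir, name)
        if os.path.isfile(path):
            size = os.path.getsize(path)
            if size < configured_bytes * tolerance:
                flagged.append((name, size))
    return flagged
```

Note that a zero-byte file like `run3` above may simply still be open for writing, so a check like this is only meaningful for completed runs.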
[jira] [Closed] (DRILL-5153) RESOURCE ERROR: Waited for 15000ms, but tasks for 'Get block maps' are not complete
[ https://issues.apache.org/jira/browse/DRILL-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5153. - Resolution: Cannot Reproduce > RESOURCE ERROR: Waited for 15000ms, but tasks for 'Get block maps' are not > complete > > > Key: DRILL-5153 > URL: https://issues.apache.org/jira/browse/DRILL-5153 > Project: Apache Drill > Issue Type: Bug > Components: Execution - RPC, Query Planning & Optimization >Reporter: Rahul Challapalli > Attachments: tera.log > > > git.commit.id.abbrev=cf2b7c7 > The below query consistently fails on my 2 node cluster. I used the data set > from the terasort benchmark > {code} > select * from dfs.`/drill/testdata/resource-manager/terasort-data` limit 1; > Error: RESOURCE ERROR: Waited for 15000ms, but tasks for 'Get block maps' are > not complete. Total runnable size 2, parallelism 2. > [Error Id: 580e6c04-7096-4c09-9c7a-63e70c71d574 on qa-node182.qa.lab:31010] > (state=,code=0) > {code}
[jira] [Closed] (DRILL-5146) Unnecessary spilling to disk by sort when we only have 5000 rows with one column
[ https://issues.apache.org/jira/browse/DRILL-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5146. - I have verified this is fixed. > Unnecessary spilling to disk by sort when we only have 5000 rows with one > column > > > Key: DRILL-5146 > URL: https://issues.apache.org/jira/browse/DRILL-5146 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 27a52efb-0ce6-f2ad-7216-aef007926649.sys.drill, > data.tgz, spill.log > > > git.commit.id.abbrev=cf2b7c7 > The below query spills to disk for the sort. The dataset contains 5000 files > and each file contains a single record. > {code} > select * from dfs.`/drill/testdata/resource-manager/5000files/text` order by > columns[1]; > {code} > Environment : > {code} > DRILL_MAX_DIRECT_MEMORY="16G" > DRILL_MAX_HEAP="4G" > {code} > I attached the dataset, logs and the profile
[jira] [Closed] (DRILL-5154) OOM error in external sort on top of 400GB data set generated using terasort benchmark
[ https://issues.apache.org/jira/browse/DRILL-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5154. - Resolution: Cannot Reproduce > OOM error in external sort on top of 400GB data set generated using terasort > benchmark > --- > > Key: DRILL-5154 > URL: https://issues.apache.org/jira/browse/DRILL-5154 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Attachments: 27a3de95-e30b-8890-6653-80fd6c49a3a1.sys.drill > > > git.commit.id.abbrev=cf2b7c7 > The below query fails with an OOM in external sort > {code} > No of drillbits : 1 > Nodes in Mapr cluster : 2 > DRILL_MAX_DIRECT_MEMORY="16G" > DRILL_MAX_HEAP="4G" > select * from (select * from > dfs.`/drill/testdata/resource-manager/terasort-data/part-m-0.tbl` order > by columns[0]) d where d.columns[0] = 'null'; > Error: RESOURCE ERROR: One or more nodes ran out of memory while executing > the query. > Unable to allocate buffer of size 8388608 due to memory limit. Current > allocation: 8441872 > Fragment 1:6 > [Error Id: 87ede736-b480-4286-b472-7694fdd2f7da on qa-node183.qa.lab:31010] > (state=,code=0) > {code} > I attached the logs and the query profile
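The failure above is the operator hitting its own memory cap rather than the node running dry: with 8441872 bytes already allocated, another 8388608-byte (8 MiB) buffer pushes past the fragment's limit. A toy limit-checking allocator in Python sketches the bookkeeping; the limit value used below is hypothetical (Drill derives the real per-operator limit from settings such as `planner.memory.max_query_memory_per_node` and the query's parallelism):

```python
class LimitedAllocator:
    """Toy accountant that mimics a per-operator memory limit."""

    def __init__(self, limit: int):
        self.limit = limit
        self.allocated = 0

    def buffer(self, size: int) -> None:
        # Reject the request instead of allocating past the limit.
        if self.allocated + size > self.limit:
            raise MemoryError(
                f"Unable to allocate buffer of size {size} due to memory "
                f"limit. Current allocation: {self.allocated}")
        self.allocated += size
```

With a hypothetical 10 MB limit, `buffer(8_441_872)` succeeds and the follow-up `buffer(8_388_608)` raises, mirroring the error message in the log.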
[jira] [Commented] (DRILL-5154) OOM error in external sort on top of 400GB data set generated using terasort benchmark
[ https://issues.apache.org/jira/browse/DRILL-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167417#comment-16167417 ] Robert Hou commented on DRILL-5154: --- We do not have this table any more, so we cannot reproduce the problem. Closing it for now. > OOM error in external sort on top of 400GB data set generated using terasort > benchmark > --- > > Key: DRILL-5154 > URL: https://issues.apache.org/jira/browse/DRILL-5154 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Attachments: 27a3de95-e30b-8890-6653-80fd6c49a3a1.sys.drill > > > git.commit.id.abbrev=cf2b7c7 > The below query fails with an OOM in external sort > {code} > No of drillbits : 1 > Nodes in Mapr cluster : 2 > DRILL_MAX_DIRECT_MEMORY="16G" > DRILL_MAX_HEAP="4G" > select * from (select * from > dfs.`/drill/testdata/resource-manager/terasort-data/part-m-0.tbl` order > by columns[0]) d where d.columns[0] = 'null'; > Error: RESOURCE ERROR: One or more nodes ran out of memory while executing > the query. > Unable to allocate buffer of size 8388608 due to memory limit. Current > allocation: 8441872 > Fragment 1:6 > [Error Id: 87ede736-b480-4286-b472-7694fdd2f7da on qa-node183.qa.lab:31010] > (state=,code=0) > {code} > I attached the logs and the query profile