[jira] [Comment Edited] (DRILL-4707) Conflicting columns names under case-insensitive policy lead to either memory leak or incorrect result
[ https://issues.apache.org/jira/browse/DRILL-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321712#comment-15321712 ]

Robert Hou edited comment on DRILL-4707 at 6/9/16 12:37 AM:
------------------------------------------------------------

Here is another query:

{code}
SELECT s.student_id as student, s.name as StudenT, s.gpa as StudenT
from student s join hive.alltypesp1 h on (s.student_id = h.c1);
{code}

Error: SYSTEM ERROR: IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableVarBinaryVector but was holding vector class org.apache.drill.exec.vector.NullableFloat8Vector, field= StudenT0(FLOAT8:OPTIONAL)

Column types: student_id is an integer, name is a varchar, gpa is a double, and c1 is an integer.

{code}
0: jdbc:drill:zk=10.10.100.186:5181> select version, commit_id, commit_message from sys.version;
+-----------------+-------------------------------------------+--------------------------------------------+
| version         | commit_id                                 | commit_message                             |
+-----------------+-------------------------------------------+--------------------------------------------+
| 1.7.0-SNAPSHOT  | a07f4de7e8725f7971ace308e81a241b7b07b5b6  | DRILL-3522: Fix for sporadic Mongo errors  |
+-----------------+-------------------------------------------+--------------------------------------------+
{code}

> Conflicting columns names under case-insensitive policy lead to either memory
> leak or incorrect result
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-4707
>                 URL: https://issues.apache.org/jira/browse/DRILL-4707
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>            Priority: Critical
>
> On latest master branch:
> {code}
> select version, commit_id, commit_message from sys.version;
> +-----------------+-------------------------------------------+---------------------------------------------------------------------------------+
> | version         | commit_id                                 | commit_message                                                                  |
> +-----------------+-------------------------------------------+---------------------------------------------------------------------------------+
> | 1.7.0-SNAPSHOT  | 3186217e5abe3c6c2c7e504cdb695567ff577e4c  | DRILL-4607: Add a split function that allows to separate string by a delimiter  |
> +-----------------+-------------------------------------------+---------------------------------------------------------------------------------+
> {code}
> If a query has two conflicting column names under a case-insensitive policy,
> Drill will either hit a memory leak or return an incorrect result.
> Q1:
> {code}
> select r_regionkey as XYZ, r_name as xyz FROM cp.`tpch/region.parquet`;
> Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query.
> Memory leaked: (131072)
> Allocator(op:0:0:1:Project) 100/131072/2490368/100 (res/actual/peak/limit)
> Fragment 0:0
> {code}
> Q2: returns only one column in the result.
> {code}
> select n_nationkey as XYZ, n_regionkey as xyz FROM cp.`tpch/nation.parquet`;
> +------+
> | XYZ  |
> +------+
> | 0    |
> | 1    |
> | 1    |
> | 1    |
> | 4    |
> | 0    |
> | 3    |
> {code}
> The cause of the problem seems to be that the Project treats the two incoming
> columns as identical (since Drill uses case-insensitive column names during
> execution).
> The planner should make sure that the conflicting columns are resolved, since
> execution is name-based.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
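The renaming the planner needs to do can be modeled outside of Drill. The following is an illustrative Python sketch (not Drill's actual code) of resolving output aliases that collide case-insensitively by appending a numeric suffix, which matches the `StudenT0` name visible in the error message above:

```python
def resolve_conflicts(aliases):
    """Rename output aliases that collide case-insensitively,
    appending a numeric suffix so that execution-time name lookup
    (which is case-insensitive) stays unambiguous."""
    seen = {}        # lower-cased name -> number of collisions so far
    resolved = []
    for name in aliases:
        key = name.lower()
        if key not in seen:
            seen[key] = 0
            resolved.append(name)
        else:
            resolved.append(f"{name}{seen[key]}")  # e.g. StudenT -> StudenT0
            seen[key] += 1
    return resolved

# Q1 from the report: XYZ and xyz collide under a case-insensitive policy.
print(resolve_conflicts(["XYZ", "xyz"]))                    # ['XYZ', 'xyz0']
print(resolve_conflicts(["student", "StudenT", "StudenT"]))  # ['student', 'StudenT0', 'StudenT1']
```

Without such a rename, the Project sees two columns with the same execution-time name, which is consistent with both failure modes reported above (a dropped column, or a vector lookup returning the wrong column's vector class).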
[jira] [Commented] (DRILL-4707) Conflicting columns names under case-insensitive policy lead to either memory leak or incorrect result
[ https://issues.apache.org/jira/browse/DRILL-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321712#comment-15321712 ]

Robert Hou commented on DRILL-4707:
-----------------------------------

Here is another query:

{code}
SELECT s.student_id as student, s.name as StudenT, s.gpa as StudenT
from student s join hive.alltypesp1 h on (s.student_id = h.c1);
{code}

Error: SYSTEM ERROR: IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableVarBinaryVector but was holding vector class org.apache.drill.exec.vector.NullableFloat8Vector, field= StudenT0(FLOAT8:OPTIONAL)

Column types: student_id is an integer, name is a varchar, gpa is a double, and c1 is an integer.

{code}
0: jdbc:drill:zk=10.10.100.186:5181> select version, commit_id, commit_message from sys.version;
+-----------------+-------------------------------------------+--------------------------------------------+
| version         | commit_id                                 | commit_message                             |
+-----------------+-------------------------------------------+--------------------------------------------+
| 1.7.0-SNAPSHOT  | a07f4de7e8725f7971ace308e81a241b7b07b5b6  | DRILL-3522: Fix for sporadic Mongo errors  |
+-----------------+-------------------------------------------+--------------------------------------------+
{code}
[jira] [Updated] (DRILL-4707) Conflicting columns names under case-insensitive policy lead to either memory leak or incorrect result
[ https://issues.apache.org/jira/browse/DRILL-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-4707:
------------------------------
    Reviewer: Robert Hou  (was: Chun Chang)

> Conflicting columns names under case-insensitive policy lead to either memory
> leak or incorrect result
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-4707
>                 URL: https://issues.apache.org/jira/browse/DRILL-4707
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>            Priority: Critical
>             Fix For: 1.8.0
[jira] [Closed] (DRILL-4514) Add describe schema command
[ https://issues.apache.org/jira/browse/DRILL-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou closed DRILL-4514.
-----------------------------

Tests pass.

> Add describe schema command
> ---------------------------
>
>                 Key: DRILL-4514
>                 URL: https://issues.apache.org/jira/browse/DRILL-4514
>             Project: Apache Drill
>          Issue Type: New Feature
>    Affects Versions: Future
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>              Labels: doc-impacting
>             Fix For: 1.8.0
>
> Add a describe database command which will return the directory
> associated with a database on the fly.
> Syntax:
> describe database
> describe schema
> Output:
> {code:sql}
> DESCRIBE SCHEMA dfs.tmp;
> {code}
> {noformat}
> +----------+---------------------------------+
> | schema   | properties                      |
> +----------+---------------------------------+
> | dfs.tmp  | {
>   "type" : "file",
>   "enabled" : true,
>   "connection" : "file:///",
>   "config" : null,
>   "formats" : {
>     "psv" : {
>       "type" : "text",
>       "extensions" : [ "tbl" ],
>       "delimiter" : "|"
>     },
>     "csv" : {
>       "type" : "text",
>       "extensions" : [ "csv" ],
>       "delimiter" : ","
>     },
>     "tsv" : {
>       "type" : "text",
>       "extensions" : [ "tsv" ],
>       "delimiter" : "\t"
>     },
>     "parquet" : {
>       "type" : "parquet"
>     },
>     "json" : {
>       "type" : "json",
>       "extensions" : [ "json" ]
>     },
>     "avro" : {
>       "type" : "avro"
>     },
>     "sequencefile" : {
>       "type" : "sequencefile",
>       "extensions" : [ "seq" ]
>     },
>     "csvh" : {
>       "type" : "text",
>       "extensions" : [ "csvh" ],
>       "extractHeader" : true,
>       "delimiter" : ","
>     }
>   },
>   "location" : "/tmp",
>   "writable" : true,
>   "defaultInputFormat" : null
> } |
> +----------+---------------------------------+
> {noformat}
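The shape of the output above (schema name plus the workspace's storage configuration as JSON) can be mimicked with a short sketch. This is illustrative Python, not Drill's implementation, and assumes the plugin configuration is already available as a plain dict:

```python
import json

def describe_schema(name, config):
    """Mimic a DESCRIBE SCHEMA row: the schema name plus its
    storage configuration rendered as pretty-printed JSON."""
    return name, json.dumps(config, indent=2)

schema, props = describe_schema("dfs.tmp", {
    "type": "file",
    "enabled": True,
    "connection": "file:///",
    "location": "/tmp",
    "writable": True,
})
print(schema)  # dfs.tmp
print(props)   # the config, pretty-printed with booleans as true/false
```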
[jira] [Commented] (DRILL-4514) Add describe schema command
[ https://issues.apache.org/jira/browse/DRILL-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392490#comment-15392490 ]

Robert Hou commented on DRILL-4514:
-----------------------------------

Tests have been added, commit: cdcb7a0736646105ae01db8d49b88de22977a336. Tests pass.
[jira] [Closed] (DRILL-4707) Conflicting columns names under case-insensitive policy lead to either memory leak or incorrect result
[ https://issues.apache.org/jira/browse/DRILL-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou closed DRILL-4707.
-----------------------------

Tests have passed.
[jira] [Commented] (DRILL-4147) Union All operator runs in a single fragment
[ https://issues.apache.org/jira/browse/DRILL-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410511#comment-15410511 ]

Robert Hou commented on DRILL-4147:
-----------------------------------

Here is a simple test case using lineitem. Lineitem can be small, but it needs to be created with many parquet files.

{code}
alter session set `store.parquet.block-size` = 50;
create table lineitemfiles as select * from lineitem;
create table newlineitemfiles as
  with lineitem_cte as (
    select l.l_orderkey, l.l_partkey from lineitemfiles l limit 1)
  (select l.l_orderkey, l.l_partkey from lineitemfiles l
     inner join orders o on l.l_orderkey = o.o_orderkey)
  union all
  (select l.l_orderkey, l.l_partkey from lineitem_cte l);
{code}

> Union All operator runs in a single fragment
> --------------------------------------------
>
>                 Key: DRILL-4147
>                 URL: https://issues.apache.org/jira/browse/DRILL-4147
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: amit hadke
>            Assignee: Aman Sinha
>
> A user noticed that running a select from a single directory is much faster
> than a union all on two directories.
> (https://drill.apache.org/blog/2014/12/09/running-sql-queries-on-amazon-s3/#comment-2349732267)
> It seems like the UNION ALL operator doesn't parallelize sub scans (it is using
> SINGLETON for the distribution type). Everything is run in a single fragment.
> We may have to use SubsetTransformer in UnionAllPrule.
[jira] [Updated] (DRILL-4833) Union-All with a small cardinality input on one side does not get parallelized
[ https://issues.apache.org/jira/browse/DRILL-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-4833:
------------------------------
    Reviewer: Robert Hou

> Union-All with a small cardinality input on one side does not get parallelized
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-4833
>                 URL: https://issues.apache.org/jira/browse/DRILL-4833
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.7.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>
> When a Union-All has an input that is a LIMIT 1 (or some small value relative
> to the slice_target), and that input is accessing Parquet files, Drill does
> an optimization where a single Parquet file is read (based on the rowcount
> statistics in the Parquet file, we determine that reading 1 file is
> sufficient). This also means that the max width for that major fragment is
> set to 1 because only 1 minor fragment is needed to read 1 row-group.
> The net effect of this is that the width of 1 is applied to the major fragment
> which consists of the union-all and its inputs. This is sub-optimal because it
> prevents parallelization of the other input and the union-all operator
> itself.
> Here's an example query and plan that illustrates the issue:
> {noformat}
> alter session set `planner.slice_target` = 1;
> explain plan for
> (select c.c_nationkey, c.c_custkey, c.c_name
>  from dfs.`/Users/asinha/data/tpchmulti/customer` c
>  inner join dfs.`/Users/asinha/data/tpchmulti/nation` n
>  on c.c_nationkey = n.n_nationkey)
> union all
> (select c_nationkey, c_custkey, c_name
>  from dfs.`/Users/asinha/data/tpchmulti/customer` c limit 1)
> +------+------+
> | text | json |
> +------+------+
> | 00-00  Screen
> 00-01    Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2])
> 00-02      Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2])
> 00-03        UnionAll(all=[true])
> 00-05          Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2])
> 00-07            HashJoin(condition=[=($0, $3)], joinType=[inner])
> 00-10              Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2])
> 00-13                HashToRandomExchange(dist0=[[$0]])
> 01-01                  UnorderedMuxExchange
> 03-01                    Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> 03-02                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/customer]], selectionRoot=file:/Users/asinha/data/tpchmulti/customer, numFiles=1, usedMetadataFile=false, columns=[`c_nationkey`, `c_custkey`, `c_name`]]])
> 00-09              Project(n_nationkey=[$0])
> 00-12                HashToRandomExchange(dist0=[[$0]])
> 02-01                  UnorderedMuxExchange
> 04-01                    Project(n_nationkey=[$0], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> 04-02                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/nation]], selectionRoot=file:/Users/asinha/data/tpchmulti/nation, numFiles=1, usedMetadataFile=false, columns=[`n_nationkey`]]])
> 00-04          Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2])
> 00-06            SelectionVectorRemover
> 00-08              Limit(fetch=[1])
> 00-11                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/Users/asinha/data/tpchmulti/customer/01.parquet]], selectionRoot=file:/Users/asinha/data/tpchmulti/customer, numFiles=1, usedMetadataFile=false, columns=[`c_nationkey`, `c_custkey`, `c_name`]]])
> {noformat}
> Note that Union-all and HashJoin are part of fragment 0 (single minor
> fragment) even though they could have been parallelized. This clearly
> affects performance for larger data sets.
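The width problem described above can be modeled very simply. This is an illustrative Python sketch, not Drill's planner code: if a major fragment's parallelization width is capped by the most restrictive operator placed in it, then a LIMIT 1 scan with max width 1 drags the union-all and the hash join down with it, while isolating that input behind an exchange would leave the rest of the fragment free to parallelize:

```python
def fragment_width(operator_max_widths):
    """Width of a major fragment in this toy model: capped by the
    most restrictive operator placed in it (a LIMIT 1 parquet scan
    that reads a single row-group has max width 1)."""
    return min(operator_max_widths)

# union-all, hash-join, and the large scan could each run 10-wide,
# but placing the LIMIT 1 side in the same fragment caps everything at 1:
print(fragment_width([10, 10, 1]))  # 1
# isolating the LIMIT 1 input behind an exchange leaves the rest 10-wide:
print(fragment_width([10, 10]))     # 10
```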
[jira] [Created] (DRILL-4883) Drill Explorer returns "SYSTEM ERROR: UnsupportedOperationException: Unhandled field reference ; a field reference identifier must not have the form of a qualified name (
Robert Hou created DRILL-4883:
------------------------------

             Summary: Drill Explorer returns "SYSTEM ERROR: UnsupportedOperationException: Unhandled field reference ; a field reference identifier must not have the form of a qualified name (i.e., with ".").
                 Key: DRILL-4883
                 URL: https://issues.apache.org/jira/browse/DRILL-4883
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Codegen
    Affects Versions: 1.8.0
         Environment: Drill Explorer runs in Windows
            Reporter: Robert Hou

When Drill Explorer submits this query, it returns an error regarding favorites.color:

select age,`favorites.color` from `dfs`.`drillTestDir`.`./json_storage/employeeNestedArrayAndObject.json`

The error is:

ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query: select age,`favorites.color` from `dfs`.`drillTestDir`.`./json_storage/employeeNestedArrayAndObject.json`
[30027]Query execution error. Details:[
SYSTEM ERROR: UnsupportedOperationException: Unhandled field reference "favorites.color"; a field reference identifier must not have the form of a qualified name (i.e., with ".").

This query can be executed by sqlline (note that the format of the query is slightly different for sqlline and Drill Explorer):

select age,`favorites.color` from `json_storage/employeeNestedArrayAndObject.json`;

The physical plan for the query when using sqlline is different from the physical plan when using Drill Explorer.

Here is the plan when using sqlline:

00-00  Screen : rowType = RecordType(ANY age, ANY favorites.color): rowcount = 1.0, cumulative cost = {0.1 rows, 0.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 19699870
00-01    Project(age=[$0], favorites.color=[$1]) : rowType = RecordType(ANY age, ANY favorites.color): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 19699869
00-02      Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/json_storage/employeeNestedArrayAndObject.json, numFiles=1, columns=[`age`, `favorites.color`], files=[maprfs:///drill/testdata/json_storage/employeeNestedArrayAndObject.json]]]) : rowType = RecordType(ANY age, ANY favorites.color): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 19699868

The physical plan when using Drill Explorer is:

00-00  Screen : rowType = RecordType(ANY age, ANY favorites.color): rowcount = 1.0, cumulative cost = {1.1 rows, 1.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 19675621
00-01    ComplexToJson : rowType = RecordType(ANY age, ANY favorites.color): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 19675620
00-02      Project(age=[$0], favorites.color=[$1]) : rowType = RecordType(ANY age, ANY favorites.color): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 19675619
00-03        Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/json_storage/employeeNestedArrayAndObject.json, numFiles=1, columns=[`age`, `favorites.color`], files=[maprfs:///drill/testdata/json_storage/employeeNestedArrayAndObject.json]]]) : rowType = RecordType(ANY age, ANY favorites.color): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 19675618

Drill Explorer has an extra ComplexToJson operator that may have a problem.

Here is the data file used:

{
  "first": "John",
  "last": "Doe",
  "age": 39,
  "sex": "M",
  "salary": 7,
  "registered": true,
  "interests": [ "Reading", "Mountain Biking", "Hacking" ],
  "favorites": {
    "color": "Blue",
    "sport": "Soccer",
    "food": "Spaghetti"
  }
}
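The failing identifier illustrates the distinction at the heart of the error: a column literally named `favorites.color` (one back-quoted segment containing a dot) versus the nested path `favorites`.`color` (two segments). A hypothetical Python sketch of that distinction (not Drill's parser):

```python
def to_field_path(identifier, quoted):
    """A back-quoted identifier is one segment even if it contains a
    dot; an unquoted dotted name is a path of segments. The error in
    this issue arises when the one-segment form reaches an execution
    layer that rejects '.' inside a single field name."""
    if quoted:
        # `favorites.color` -> a single column whose name contains '.'
        return [identifier]
    # favorites.color -> nested access: field 'color' inside 'favorites'
    return identifier.split(".")

print(to_field_path("favorites.color", quoted=True))   # ['favorites.color']
print(to_field_path("favorites.color", quoted=False))  # ['favorites', 'color']
```

Under this reading, the extra ComplexToJson step in Drill Explorer's plan may be where the single-segment form is mishandled, since sqlline executes the same projection without error.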
[jira] [Commented] (DRILL-3944) Drill MAXDIR Unknown variable or type "FILE_SEPARATOR"
[ https://issues.apache.org/jira/browse/DRILL-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504960#comment-15504960 ]

Robert Hou commented on DRILL-3944:
-----------------------------------

This fix has been verified.

> Drill MAXDIR Unknown variable or type "FILE_SEPARATOR"
> ------------------------------------------------------
>
>                 Key: DRILL-3944
>                 URL: https://issues.apache.org/jira/browse/DRILL-3944
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>         Environment: 1.2.0
>            Reporter: Jitendra
>            Assignee: Arina Ielchiieva
>         Attachments: newStackTrace.txt
>
> We are facing an issue with the MAXDIR function. Below is the query we are using to
> reproduce it:
> 0: jdbc:drill:drillbit=localhost> select maxdir('vspace.wspace', 'freemat2')
> from vspace.wspace.`freemat2`;
> Error: SYSTEM ERROR: CompileException: Line 75, Column 70: Unknown variable
> or type "FILE_SEPARATOR"
> Fragment 0:0
> [Error Id: d17c6e48-554d-4934-bc4d-783ca3dc6f51 on 10.10.99.71:31010]
> (state=,code=0);
> Below are the drillbit logs.
> 2015-10-09 21:26:21,972 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO
> o.a.d.e.w.fragment.FragmentExecutor -
> 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State change requested
> AWAITING_ALLOCATION --> RUNNING
> 2015-10-09 21:26:21,972 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO
> o.a.d.e.w.f.FragmentStatusReporter -
> 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State to report: RUNNING
> 2015-10-09 21:26:22,038 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO
> o.a.d.e.w.fragment.FragmentExecutor -
> 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State change requested RUNNING -->
> FINISHED
> 2015-10-09 21:26:22,039 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO
> o.a.d.e.w.f.FragmentStatusReporter -
> 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State to report: FINISHED
> 2015-10-09 21:29:59,281 [29e7ce27-9cad-9d8a-a482-39f54cc7deda:foreman] INFO
> o.a.d.e.store.mock.MockStorageEngine - Failure while attempting to check for
> Parquet metadata file.
> java.io.IOException: Open failed for file: /vspace/wspace/freemat2/20151005, > error: Invalid argument (22) > at com.mapr.fs.MapRClientImpl.open(MapRClientImpl.java:212) > ~[maprfs-4.1.0-mapr.jar:4.1.0-mapr] > at com.mapr.fs.MapRFileSystem.open(MapRFileSystem.java:862) > ~[maprfs-4.1.0-mapr.jar:4.1.0-mapr] > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:800) > ~[hadoop-common-2.5.1-mapr-1503.jar:na] > at > org.apache.drill.exec.store.dfs.DrillFileSystem.open(DrillFileSystem.java:132) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.BasicFormatMatcher$MagicStringMatcher.matches(BasicFormatMatcher.java:142) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.BasicFormatMatcher.isFileReadable(BasicFormatMatcher.java:112) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isDirReadable(ParquetFormatPlugin.java:256) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isReadable(ParquetFormatPlugin.java:210) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create(WorkspaceSchemaFactory.java:326) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create(WorkspaceSchemaFactory.java:153) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.getNewEntry(ExpandingConcurrentMap.java:96) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.get(ExpandingConcurrentMap.java:90) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.getTable(WorkspaceSchemaFactory.java:276) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.calcite.jdbc.SimpleCalciteSchema.getTable(SimpleCalciteSchema.java:83) > 
[calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.prepare.CalciteCatalogReader.getTableFrom(CalciteCatalogReader.java:116) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.prepare.CalciteCatalogReader.getTable(CalciteCatalogReader.java:99) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.prepare.CalciteCatalogReader.getTable(CalciteCatalogReader.java:70) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.sql.validate.EmptyScope.getTableNamespace(EmptyScope.java:75) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.sql.validate.DelegatingScope.getTableNamespace(DelegatingScope.java:124) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at >
[jira] [Reopened] (DRILL-3944) Drill MAXDIR Unknown variable or type "FILE_SEPARATOR"
[ https://issues.apache.org/jira/browse/DRILL-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou reopened DRILL-3944:
-------------------------------
[jira] [Closed] (DRILL-3944) Drill MAXDIR Unknown variable or type "FILE_SEPARATOR"
[ https://issues.apache.org/jira/browse/DRILL-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-3944. - Resolution: Fixed > Drill MAXDIR Unknown variable or type "FILE_SEPARATOR" > -- > > Key: DRILL-3944 > URL: https://issues.apache.org/jira/browse/DRILL-3944 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.2.0 > Environment: 1.2.0 >Reporter: Jitendra >Assignee: Arina Ielchiieva > Attachments: newStackTrace.txt > > > We are facing issue with MAXDIR function, below is the query we are using to > reproduce this issue. > 0: jdbc:drill:drillbit=localhost> select maxdir('vspace.wspace', 'freemat2') > from vspace.wspace.`freemat2`; > Error: SYSTEM ERROR: CompileException: Line 75, Column 70: Unknown variable > or type "FILE_SEPARATOR" > Fragment 0:0 > [Error Id: d17c6e48-554d-4934-bc4d-783ca3dc6f51 on 10.10.99.71:31010] > (state=,code=0); > Below are the drillbit logs. > 2015-10-09 21:26:21,972 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State change requested > AWAITING_ALLOCATION --> RUNNING > 2015-10-09 21:26:21,972 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO > o.a.d.e.w.f.FragmentStatusReporter - > 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State to report: RUNNING > 2015-10-09 21:26:22,038 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State change requested RUNNING --> > FINISHED > 2015-10-09 21:26:22,039 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO > o.a.d.e.w.f.FragmentStatusReporter - > 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State to report: FINISHED > 2015-10-09 21:29:59,281 [29e7ce27-9cad-9d8a-a482-39f54cc7deda:foreman] INFO > o.a.d.e.store.mock.MockStorageEngine - Failure while attempting to check for > Parquet metadata file. 
> java.io.IOException: Open failed for file: /vspace/wspace/freemat2/20151005, > error: Invalid argument (22) > at com.mapr.fs.MapRClientImpl.open(MapRClientImpl.java:212) > ~[maprfs-4.1.0-mapr.jar:4.1.0-mapr] > at com.mapr.fs.MapRFileSystem.open(MapRFileSystem.java:862) > ~[maprfs-4.1.0-mapr.jar:4.1.0-mapr] > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:800) > ~[hadoop-common-2.5.1-mapr-1503.jar:na] > at > org.apache.drill.exec.store.dfs.DrillFileSystem.open(DrillFileSystem.java:132) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.BasicFormatMatcher$MagicStringMatcher.matches(BasicFormatMatcher.java:142) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.BasicFormatMatcher.isFileReadable(BasicFormatMatcher.java:112) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isDirReadable(ParquetFormatPlugin.java:256) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isReadable(ParquetFormatPlugin.java:210) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create(WorkspaceSchemaFactory.java:326) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create(WorkspaceSchemaFactory.java:153) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.getNewEntry(ExpandingConcurrentMap.java:96) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.get(ExpandingConcurrentMap.java:90) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.getTable(WorkspaceSchemaFactory.java:276) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.calcite.jdbc.SimpleCalciteSchema.getTable(SimpleCalciteSchema.java:83) > 
[calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.prepare.CalciteCatalogReader.getTableFrom(CalciteCatalogReader.java:116) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.prepare.CalciteCatalogReader.getTable(CalciteCatalogReader.java:99) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.prepare.CalciteCatalogReader.getTable(CalciteCatalogReader.java:70) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.sql.validate.EmptyScope.getTableNamespace(EmptyScope.java:75) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.sql.validate.DelegatingScope.getTableNamespace(DelegatingScope.java:124) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at >
[jira] [Updated] (DRILL-3944) Drill MAXDIR Unknown variable or type "FILE_SEPARATOR"
[ https://issues.apache.org/jira/browse/DRILL-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-3944: -- Reviewer: Robert Hou > Drill MAXDIR Unknown variable or type "FILE_SEPARATOR" > -- > > Key: DRILL-3944 > URL: https://issues.apache.org/jira/browse/DRILL-3944 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.2.0 > Environment: 1.2.0 >Reporter: Jitendra >Assignee: Arina Ielchiieva > Attachments: newStackTrace.txt > > > We are facing issue with MAXDIR function, below is the query we are using to > reproduce this issue. > 0: jdbc:drill:drillbit=localhost> select maxdir('vspace.wspace', 'freemat2') > from vspace.wspace.`freemat2`; > Error: SYSTEM ERROR: CompileException: Line 75, Column 70: Unknown variable > or type "FILE_SEPARATOR" > Fragment 0:0 > [Error Id: d17c6e48-554d-4934-bc4d-783ca3dc6f51 on 10.10.99.71:31010] > (state=,code=0); > Below are the drillbit logs. > 2015-10-09 21:26:21,972 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State change requested > AWAITING_ALLOCATION --> RUNNING > 2015-10-09 21:26:21,972 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO > o.a.d.e.w.f.FragmentStatusReporter - > 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State to report: RUNNING > 2015-10-09 21:26:22,038 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State change requested RUNNING --> > FINISHED > 2015-10-09 21:26:22,039 [29e7cf02-02bf-b007-72f2-52c67c80ea1c:frag:0:0] INFO > o.a.d.e.w.f.FragmentStatusReporter - > 29e7cf02-02bf-b007-72f2-52c67c80ea1c:0:0: State to report: FINISHED > 2015-10-09 21:29:59,281 [29e7ce27-9cad-9d8a-a482-39f54cc7deda:foreman] INFO > o.a.d.e.store.mock.MockStorageEngine - Failure while attempting to check for > Parquet metadata file. 
> java.io.IOException: Open failed for file: /vspace/wspace/freemat2/20151005, > error: Invalid argument (22) > at com.mapr.fs.MapRClientImpl.open(MapRClientImpl.java:212) > ~[maprfs-4.1.0-mapr.jar:4.1.0-mapr] > at com.mapr.fs.MapRFileSystem.open(MapRFileSystem.java:862) > ~[maprfs-4.1.0-mapr.jar:4.1.0-mapr] > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:800) > ~[hadoop-common-2.5.1-mapr-1503.jar:na] > at > org.apache.drill.exec.store.dfs.DrillFileSystem.open(DrillFileSystem.java:132) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.BasicFormatMatcher$MagicStringMatcher.matches(BasicFormatMatcher.java:142) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.BasicFormatMatcher.isFileReadable(BasicFormatMatcher.java:112) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isDirReadable(ParquetFormatPlugin.java:256) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.parquet.ParquetFormatPlugin$ParquetFormatMatcher.isReadable(ParquetFormatPlugin.java:210) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create(WorkspaceSchemaFactory.java:326) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create(WorkspaceSchemaFactory.java:153) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.getNewEntry(ExpandingConcurrentMap.java:96) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.get(ExpandingConcurrentMap.java:90) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.getTable(WorkspaceSchemaFactory.java:276) > [drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.calcite.jdbc.SimpleCalciteSchema.getTable(SimpleCalciteSchema.java:83) > 
[calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.prepare.CalciteCatalogReader.getTableFrom(CalciteCatalogReader.java:116) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.prepare.CalciteCatalogReader.getTable(CalciteCatalogReader.java:99) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.prepare.CalciteCatalogReader.getTable(CalciteCatalogReader.java:70) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.sql.validate.EmptyScope.getTableNamespace(EmptyScope.java:75) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at > org.apache.calcite.sql.validate.DelegatingScope.getTableNamespace(DelegatingScope.java:124) > [calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] > at >
[jira] [Closed] (DRILL-4147) Union All operator runs in a single fragment
[ https://issues.apache.org/jira/browse/DRILL-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-4147. - This fix has been verified. > Union All operator runs in a single fragment > > > Key: DRILL-4147 > URL: https://issues.apache.org/jira/browse/DRILL-4147 > Project: Apache Drill > Issue Type: Bug >Reporter: amit hadke >Assignee: Aman Sinha > Fix For: 1.8.0 > > > A user noticed that running select from a single directory is much faster > than union all on two directories. > (https://drill.apache.org/blog/2014/12/09/running-sql-queries-on-amazon-s3/#comment-2349732267) > > It seems like the UNION ALL operator doesn't parallelize sub scans (it's using > SINGLETON for distribution type). Everything is run in a single fragment. > We may have to use SubsetTransformer in UnionAllPrule. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
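The serialization described in DRILL-4147 can be illustrated with a toy width calculation. This is an illustrative sketch, not Drill's planner code; only the SINGLETON distribution type is taken from the report, the function and parameter names are hypothetical:

```python
def fragment_width(distribution_trait, max_width):
    """Toy model: a SINGLETON distribution trait pins the whole major
    fragment to one minor fragment, serializing its sub-scans."""
    return 1 if distribution_trait == "SINGLETON" else max_width

# The reported behavior: Union All planned as SINGLETON runs serially.
assert fragment_width("SINGLETON", 8) == 1
# A distributed trait would let the same fragment use all available slots.
assert fragment_width("RANDOM_DISTRIBUTED", 8) == 8
```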
[jira] [Closed] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-4743. - Assignee: Robert Hou (was: Gautam Kumar Parai) This fix has been verified. > HashJoin's not fully parallelized in query plan > --- > > Key: DRILL-4743 > URL: https://issues.apache.org/jira/browse/DRILL-4743 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: Gautam Kumar Parai >Assignee: Robert Hou > Labels: doc-impacting > Fix For: 1.8.0 > > > The underlying problem is filter selectivity under-estimate for a query with > complicated predicates e.g. deeply nested and/or predicates. This leads to > under parallelization of the major fragment doing the join. > To really resolve this problem we need table/column statistics to correctly > estimate the selectivity. However, in the absence of statistics OR even when > existing statistics are insufficient to get a correct estimate of selectivity > this will serve as a workaround. > For now, the fix is to provide options for controlling the lower and upper > bounds for filter selectivity. The user can use the following options. The > selectivity can be varied between 0 and 1 with min selectivity always less > than or equal to max selectivity. > {code}planner.filter.min_selectivity_estimate_factor > planner.filter.max_selectivity_estimate_factor > {code} > When using 'explain plan including all attributes for ' it should cap the > estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators > downstream is not directly controlled by these options. However, they may > change as a result of dependency between different operators. The FILTER > operator only operates on the input of its immediate upstream operator (e.g. > SCAN, AGG). If two different filters are present in the same plan, they might > have different selectivities based on their immediate upstream operators > ROWCOUNT. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
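The workaround in DRILL-4743 bounds the filter selectivity estimate between a configurable floor and ceiling. A minimal sketch of that clamping, assuming only the option names quoted in the report (the function itself is illustrative, not Drill's implementation):

```python
def clamp_selectivity(estimate, min_factor=0.0, max_factor=1.0):
    """Clamp a filter selectivity estimate to configured bounds, mirroring
    planner.filter.min_selectivity_estimate_factor and
    planner.filter.max_selectivity_estimate_factor (illustrative logic)."""
    if not (0.0 <= min_factor <= max_factor <= 1.0):
        raise ValueError("need 0 <= min_factor <= max_factor <= 1")
    return min(max(estimate, min_factor), max_factor)

# An under-estimate is raised to the floor, so the join fragment's
# estimated ROWCOUNT (and hence its parallelism) is not starved.
assert clamp_selectivity(0.0001, min_factor=0.05, max_factor=0.5) == 0.05
assert clamp_selectivity(0.9, min_factor=0.05, max_factor=0.5) == 0.5
```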
[jira] [Closed] (DRILL-4833) Union-All with a small cardinality input on one side does not get parallelized
[ https://issues.apache.org/jira/browse/DRILL-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-4833. - This fix has been verified. > Union-All with a small cardinality input on one side does not get parallelized > -- > > Key: DRILL-4833 > URL: https://issues.apache.org/jira/browse/DRILL-4833 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.7.0 >Reporter: Aman Sinha >Assignee: Aman Sinha > Fix For: 1.8.0 > > > When a Union-All has an input that is a LIMIT 1 (or some small value relative > to the slice_target), and that input is accessing Parquet files, Drill does > an optimization where a single Parquet file is read (based on the rowcount > statistics in the Parquet file, we determine that reading 1 file is > sufficient). This also means that the max width for that major fragment is > set to 1 because only 1 minor fragment is needed to read 1 row-group. > The net effect of this is the width of 1 is applied to the major fragment > which consists of union-all and its inputs. This is sub-optimal because it > prevents parallelization of the other input and the union-all operator > itself. 
> Here's an example query and plan that illustrates the issue: > {noformat} > alter session set `planner.slice_target` = 1; > explain plan for > (select c.c_nationkey, c.c_custkey, c.c_name > from > dfs.`/Users/asinha/data/tpchmulti/customer` c > inner join > dfs.`/Users/asinha/data/tpchmulti/nation` n > on c.c_nationkey = n.n_nationkey) > union all > (select c_nationkey, c_custkey, c_name > from dfs.`/Users/asinha/data/tpchmulti/customer` c limit 1) > +--+--+ > | text | json | > +--+--+ > | 00-00Screen > 00-01 Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2]) > 00-02Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2]) > 00-03 UnionAll(all=[true]) > 00-05Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2]) > 00-07 HashJoin(condition=[=($0, $3)], joinType=[inner]) > 00-10Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2]) > 00-13 HashToRandomExchange(dist0=[[$0]]) > 01-01UnorderedMuxExchange > 03-01 Project(c_nationkey=[$0], c_custkey=[$1], > c_name=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)]) > 03-02Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath > [path=file:/Users/asinha/data/tpchmulti/customer]], > selectionRoot=file:/Users/asinha/data/tpchmulti/customer, numFiles=1, > usedMetadataFile=false, columns=[`c_nationkey`, `c_custkey`, `c_name`]]]) > 00-09Project(n_nationkey=[$0]) > 00-12 HashToRandomExchange(dist0=[[$0]]) > 02-01UnorderedMuxExchange > 04-01 Project(n_nationkey=[$0], > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)]) > 04-02Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/nation]], > selectionRoot=file:/Users/asinha/data/tpchmulti/nation, numFiles=1, > usedMetadataFile=false, columns=[`n_nationkey`]]]) > 00-04Project(c_nationkey=[$0], c_custkey=[$1], c_name=[$2]) > 00-06 SelectionVectorRemover > 00-08Limit(fetch=[1]) > 00-11 Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath > [path=/Users/asinha/data/tpchmulti/customer/01.parquet]], > 
selectionRoot=file:/Users/asinha/data/tpchmulti/customer, numFiles=1, > usedMetadataFile=false, columns=[`c_nationkey`, `c_custkey`, `c_name`]]]) > {noformat} > Note that Union-all and HashJoin are part of fragment 0 (single minor > fragment) even though they could have been parallelized. This clearly > affects performance for larger data sets. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
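The effect described in DRILL-4833 (a width-1 LIMIT input capping the whole major fragment) can be sketched as follows. This is a toy model of the symptom, not Drill's parallelizer; the function name is hypothetical:

```python
def major_fragment_width(input_width_caps):
    """Toy model: operators merged into one major fragment share a single
    width, capped by the most restrictive input."""
    return min(input_width_caps)

# The LIMIT 1 parquet scan is capped at width 1 (one row group suffices),
# and that cap drags the union-all and hash-join down with it.
assert major_fragment_width([8, 1]) == 1
```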
[jira] [Updated] (DRILL-4970) Wrong results when casting double to bigint or int
[ https://issues.apache.org/jira/browse/DRILL-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-4970: -- Description: This query returns the wrong result 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from test_table where (int_id > -3025 and bigint_id <= -256) or (cast(double_id as bigint) >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 2769| +-+ Without the cast, it returns the correct result: 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from test_table where (int_id > -3025 and bigint_id <= -256) or (double_id >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 3020| +-+ By itself, the result is also correct: 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from test_table where (cast(double_id as bigint) >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 251 | +-+ was: This query returns the wrong result 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_rowgr1 where (int_id > -3025 and bigint_id <= -256) or (cast(double_id as bigint) >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 2769| +-+ Without the cast, it returns the correct result: 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_rowgr1 where (int_id > -3025 and bigint_id <= -256) or (double_id >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 3020| +-+ By itself, the result is also correct: 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_rowgr1 where (cast(double_id as bigint) >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 251 | +-+ > Wrong results when casting double to bigint or int > -- > > Key: DRILL-4970 > URL: https://issues.apache.org/jira/browse/DRILL-4970 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.8.0 >Reporter: Robert Hou > Attachments: test_table > > > This query returns the wrong result > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> 
select count(*) from > test_table where (int_id > -3025 and bigint_id <= -256) or (cast(double_id as > bigint) >= -255 and double_id <= -5); > +-+ > | EXPR$0 | > +-+ > | 2769| > +-+ > Without the cast, it returns the correct result: > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from > test_table where (int_id > -3025 and bigint_id <= -256) or (double_id >= -255 > and double_id <= -5); > +-+ > | EXPR$0 | > +-+ > | 3020| > +-+ > By itself, the result is also correct: > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from > test_table where (cast(double_id as bigint) >= -255 and double_id <= -5); > +-+ > | EXPR$0 | > +-+ > | 251 | > +-+ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
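The counts in DRILL-4970 break a basic invariant: a disjunction can never match fewer rows than either branch alone, yet the OR query returns 2769 while the un-cast version returns 3020. A sketch over synthetic data (the `rows` sample and helper names are invented for illustration; only the cast predicate's shape comes from the report):

```python
# Synthetic stand-in for the double_id column: -300 .. 0.
rows = [{"double_id": float(d)} for d in range(-300, 1)]

def cast_clause(r):
    # cast(double_id as bigint) >= -255 and double_id <= -5
    return int(r["double_id"]) >= -255 and r["double_id"] <= -5

def other_clause(r):
    # Stand-in for the int_id/bigint_id conjunct; matches nothing here.
    return False

count_or = sum(1 for r in rows if other_clause(r) or cast_clause(r))
count_cast = sum(1 for r in rows if cast_clause(r))

# On this sample the cast branch matches 251 rows, and the OR can only
# match at least that many -- the invariant the buggy plan violates.
assert count_cast == 251
assert count_or >= count_cast
```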
[jira] [Updated] (DRILL-4970) Wrong results when casting double to bigint or int
[ https://issues.apache.org/jira/browse/DRILL-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-4970: -- Attachment: test_table > Wrong results when casting double to bigint or int > -- > > Key: DRILL-4970 > URL: https://issues.apache.org/jira/browse/DRILL-4970 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.8.0 >Reporter: Robert Hou > Attachments: test_table > > > This query returns the wrong result > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from > orders_parts_rowgr1 where (int_id > -3025 and bigint_id <= -256) or > (cast(double_id as bigint) >= -255 and double_id <= -5); > +-+ > | EXPR$0 | > +-+ > | 2769| > +-+ > Without the cast, it returns the correct result: > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from > orders_parts_rowgr1 where (int_id > -3025 and bigint_id <= -256) or > (double_id >= -255 and double_id <= -5); > +-+ > | EXPR$0 | > +-+ > | 3020| > +-+ > By itself, the result is also correct: > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from > orders_parts_rowgr1 where (cast(double_id as bigint) >= -255 and double_id <= > -5); > +-+ > | EXPR$0 | > +-+ > | 251 | > +-+ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4971) query encounters system error: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3"
[ https://issues.apache.org/jira/browse/DRILL-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610345#comment-15610345 ] Robert Hou commented on DRILL-4971: --- Put the two files into a directory called "test". > query encounters system error: Statement "break AndOP3" is not enclosed by a > breakable statement with label "AndOP3" > > > Key: DRILL-4971 > URL: https://issues.apache.org/jira/browse/DRILL-4971 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Reporter: Robert Hou > Attachments: low_table, medium_table > > > This query returns an error: > select count(\*) from test where ((int_id > 3060 and int_id < 6002) or > (int_id > 9025 and int_id < 11976)) and ((int_id > 9025 and int_id < 11976) > or (int_id > 3060 and int_id < 6002)) and (int_id > 3060 and int_id < 6002); > Error: SYSTEM ERROR: CompileException: Line 232, Column 30: Statement "break > AndOP3" is not enclosed by a breakable statement with label "AndOP3" > There are two partitions to the test table. One covers the range 3061 - 6001 > and the other covers the range 9026 - 11975. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4971) query encounters system error: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3"
[ https://issues.apache.org/jira/browse/DRILL-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-4971: -- Attachment: low_table medium_table > query encounters system error: Statement "break AndOP3" is not enclosed by a > breakable statement with label "AndOP3" > > > Key: DRILL-4971 > URL: https://issues.apache.org/jira/browse/DRILL-4971 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Reporter: Robert Hou > Attachments: low_table, medium_table > > > This query returns an error: > select count(\*) from test where ((int_id > 3060 and int_id < 6002) or > (int_id > 9025 and int_id < 11976)) and ((int_id > 9025 and int_id < 11976) > or (int_id > 3060 and int_id < 6002)) and (int_id > 3060 and int_id < 6002); > Error: SYSTEM ERROR: CompileException: Line 232, Column 30: Statement "break > AndOP3" is not enclosed by a breakable statement with label "AndOP3" > There are two partitions to the test table. One covers the range 3061 - 6001 > and the other covers the range 9026 - 11975. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4971) query encounters system error: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3"
Robert Hou created DRILL-4971: - Summary: query encounters system error: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3" Key: DRILL-4971 URL: https://issues.apache.org/jira/browse/DRILL-4971 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Reporter: Robert Hou Attachments: low_table, medium_table This query returns an error: select count(\*) from test where ((int_id > 3060 and int_id < 6002) or (int_id > 9025 and int_id < 11976)) and ((int_id > 9025 and int_id < 11976) or (int_id > 3060 and int_id < 6002)) and (int_id > 3060 and int_id < 6002); Error: SYSTEM ERROR: CompileException: Line 232, Column 30: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3" There are two partitions to the test table. One covers the range 3061 - 6001 and the other covers the range 9026 - 11975. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4970) Wrong results when casting double to bigint or int
[ https://issues.apache.org/jira/browse/DRILL-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-4970: -- Description: This query returns the wrong result 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from test_table where (int_id > -3025 and bigint_id <= -256) or (cast(double_id as bigint) >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 2769| +-+ Without the cast, it returns the correct result: 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from test_table where (int_id > -3025 and bigint_id <= -256) or (double_id >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 3020| +-+ By itself, the result is also correct: 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from test_table where (cast(double_id as bigint) >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 251 | +-+ was: This query returns the wrong result 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from test_table where (int_id > -3025 and bigint_id <= -256) or (cast(double_id as bigint) >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 2769| +-+ Without the cast, it returns the correct result: 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from test_table where (int_id > -3025 and bigint_id <= -256) or (double_id >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 3020| +-+ By itself, the result is also correct: 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from test_table where (cast(double_id as bigint) >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 251 | +-+ > Wrong results when casting double to bigint or int > -- > > Key: DRILL-4970 > URL: https://issues.apache.org/jira/browse/DRILL-4970 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.8.0 >Reporter: Robert Hou > Attachments: test_table > > > This query returns the wrong result > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > 
test_table where (int_id > -3025 and bigint_id <= -256) or (cast(double_id as > bigint) >= -255 and double_id <= -5); > +-+ > | EXPR$0 | > +-+ > | 2769| > +-+ > Without the cast, it returns the correct result: > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > test_table where (int_id > -3025 and bigint_id <= -256) or (double_id >= -255 > and double_id <= -5); > +-+ > | EXPR$0 | > +-+ > | 3020| > +-+ > By itself, the result is also correct: > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from > test_table where (cast(double_id as bigint) >= -255 and double_id <= -5); > +-+ > | EXPR$0 | > +-+ > | 251 | > +-+ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4970) Wrong results when casting double to bigint or int
Robert Hou created DRILL-4970: - Summary: Wrong results when casting double to bigint or int Key: DRILL-4970 URL: https://issues.apache.org/jira/browse/DRILL-4970 Project: Apache Drill Issue Type: Bug Components: Execution - Data Types Affects Versions: 1.8.0 Reporter: Robert Hou This query returns the wrong result 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_rowgr1 where (int_id > -3025 and bigint_id <= -256) or (cast(double_id as bigint) >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 2769| +-+ Without the cast, it returns the correct result: 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_rowgr1 where (int_id > -3025 and bigint_id <= -256) or (double_id >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 3020| +-+ By itself, the result is also correct: 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_rowgr1 where (cast(double_id as bigint) >= -255 and double_id <= -5); +-+ | EXPR$0 | +-+ | 251 | +-+ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5018) Metadata cache has duplicate columnTypeInfo values
[ https://issues.apache.org/jira/browse/DRILL-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649289#comment-15649289 ] Robert Hou commented on DRILL-5018: --- For the second lineitem table, use CTAS with the first lineitem table. create table lineitem2 as select * from lineitem; > Metadata cache has duplicate columnTypeInfo values > -- > > Key: DRILL-5018 > URL: https://issues.apache.org/jira/browse/DRILL-5018 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.8.0 >Reporter: Robert Hou >Assignee: Parth Chandra > Attachments: lineitem_1_0_1.parquet, lineitem_999.parquet > > > This lineitem table has duplicate entries in its metadata file, although the > entries have slightly different values. This lineitem table uses > directory-based partitioning on year and month. > "columnTypeInfo" : { > "L_RETURNFLAG" : { > "name" : [ "L_RETURNFLAG" ], > "primitiveType" : "BINARY", > "originalType" : null, > "precision" : 0, > "scale" : 0, > "repetitionLevel" : 0, > "definitionLevel" : 1 > }, > "l_returnflag" : { > "name" : [ "l_returnflag" ], > "primitiveType" : "BINARY", > "originalType" : "UTF8", > "precision" : 0, > "scale" : 0, > "repetitionLevel" : 0, > "definitionLevel" : 0 > }, > This lineitem table has two entries in its metadata file for each column, but > the two entries have different column names (adding a zero). It also has > slightly different values. This lineitem table was created using CTAS with > the first table above. > "l_shipinstruct" : { > "name" : [ "l_shipinstruct" ], > "primitiveType" : "BINARY", > "originalType" : "UTF8", > "precision" : 0, > "scale" : 0, > "repetitionLevel" : 0, > "definitionLevel" : 0 > }, > "L_SHIPINSTRUCT0" : { > "name" : [ "L_SHIPINSTRUCT0" ], > "primitiveType" : "BINARY", > "originalType" : null, > "precision" : 0, > "scale" : 0, > "repetitionLevel" : 0, > "definitionLevel" : 1 > }, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
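The duplicated `columnTypeInfo` entries in DRILL-5018 differ only by case (`l_returnflag` vs `L_RETURNFLAG`). A sketch of how such collisions can be detected under case-insensitive name handling (illustrative logic, not Drill's metadata-cache code):

```python
def case_collisions(column_names):
    """Find column names that differ only by case, as in the duplicated
    columnTypeInfo entries above (illustrative helper)."""
    seen = {}
    collisions = []
    for name in column_names:
        key = name.lower()
        if key in seen and seen[key] != name:
            collisions.append((seen[key], name))
        else:
            seen.setdefault(key, name)
    return collisions

assert case_collisions(["l_returnflag", "L_RETURNFLAG"]) == [
    ("l_returnflag", "L_RETURNFLAG")
]
```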
[jira] [Commented] (DRILL-5018) Metadata cache has duplicate columnTypeInfo values
[ https://issues.apache.org/jira/browse/DRILL-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649293#comment-15649293 ] Robert Hou commented on DRILL-5018: --- The metadata_caching/generated_caches/validate_cache3.q.fail test has been disabled due to this bug. When this bug is fixed, then this test needs to be validated and enabled. > Metadata cache has duplicate columnTypeInfo values > -- > > Key: DRILL-5018 > URL: https://issues.apache.org/jira/browse/DRILL-5018 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.8.0 >Reporter: Robert Hou >Assignee: Parth Chandra > Attachments: lineitem_1_0_1.parquet, lineitem_999.parquet > > > This lineitem table has duplicate entries in its metadata file, although the > entries have slightly different values. This lineitem table uses > directory-based partitioning on year and month. > "columnTypeInfo" : { > "L_RETURNFLAG" : { > "name" : [ "L_RETURNFLAG" ], > "primitiveType" : "BINARY", > "originalType" : null, > "precision" : 0, > "scale" : 0, > "repetitionLevel" : 0, > "definitionLevel" : 1 > }, > "l_returnflag" : { > "name" : [ "l_returnflag" ], > "primitiveType" : "BINARY", > "originalType" : "UTF8", > "precision" : 0, > "scale" : 0, > "repetitionLevel" : 0, > "definitionLevel" : 0 > }, > This lineitem table has two entries in its metadata file for each column, but > the two entries have different column names (adding a zero). It also has > slightly different values. This lineitem table was created using CTAS with > the first table above. 
> "l_shipinstruct" : { > "name" : [ "l_shipinstruct" ], > "primitiveType" : "BINARY", > "originalType" : "UTF8", > "precision" : 0, > "scale" : 0, > "repetitionLevel" : 0, > "definitionLevel" : 0 > }, > "L_SHIPINSTRUCT0" : { > "name" : [ "L_SHIPINSTRUCT0" ], > "primitiveType" : "BINARY", > "originalType" : null, > "precision" : 0, > "scale" : 0, > "repetitionLevel" : 0, > "definitionLevel" : 1 > }, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655397#comment-15655397 ] Robert Hou edited comment on DRILL-5035 at 11/10/16 10:36 PM:
---
The partition only has null values for timestamp_id.

was (Author: rhou):
The partition only has null values.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655464#comment-15655464 ] Robert Hou commented on DRILL-5035:
---
I set the new option to false and I do not see a problem. I will try with IMPALA_TIMESTAMP.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655306#comment-15655306 ] Robert Hou commented on DRILL-5035:
---
I am using RC1.

0: jdbc:drill:zk=10.10.100.186:5181> select * from sys.version;
version: 1.9.0
commit_id: 5cea9afa6278e21574c6a982ae5c3d82085ef904
commit_message: [maven-release-plugin] prepare release drill-1.9.0
commit_time: 09.11.2016 @ 10:28:44 PST
build_email: r...@mapr.com
build_time: 10.11.2016 @ 12:56:24 PST
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655378#comment-15655378 ] Robert Hou commented on DRILL-5035:
---
The Hive table is partitioned on o_orderpriority, which is a string.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655397#comment-15655397 ] Robert Hou commented on DRILL-5035:
---
The partition only has null values.
[jira] [Created] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
Robert Hou created DRILL-5035:
---
Summary: Selecting timestamp value from Hive table causes IndexOutOfBoundsException
Key: DRILL-5035
URL: https://issues.apache.org/jira/browse/DRILL-5035
Project: Apache Drill
Issue Type: Bug
Components: Execution - Data Types
Affects Versions: 1.9.0
Reporter: Robert Hou

I used the new option to read Hive timestamps:
alter session set `store.parquet.reader.int96_as_timestamp` = true;

This query fails:
select timestamp_id from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 36288 (expected: 0 <= readerIndex <= writerIndex <= capacity(32768))
Fragment 0:0
[Error Id: 50537b32-cdc9-4898-9581-531066288fbd on qa-node211:31010] (state=,code=0)

Selecting all the columns succeeds:
0: jdbc:drill:zk=10.10.100.186:5181> select * from orders_parts_hive where timestamp_id = '2016-10-03 06:11:52.429';
(returned row, one column per line)
o_orderkey: 11335
o_custkey: 871
o_orderstatus: F
o_totalprice: 133549.0
o_orderdate: 1994-10-22
o_clerk: null
o_shippriority: 0
o_comment: ealms. theodolites maintain. regular, even instructions against t
int_id: -4
bigint_id: -4
float_id: -4.0
double_id: -4.0
varchar_id: -4
date_id: 2016-09-29
timestamp_id: 2016-10-03 06:11:52.429
dir0: o_orderpriority=2-HIGH
[jira] [Comment Edited] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655374#comment-15655374 ] Robert Hou edited comment on DRILL-5035 at 11/10/16 10:39 PM:
---
This table is partitioned on a string. The problem occurs with a partition that has null values. The value of the string is "NOT SPECIFIED". I can select every row up to the null partition using:
select timestamp_id from orders_parts_hive limit 9026;
But the next row is in the null partition and causes an exception:
select timestamp_id from orders_parts_hive limit 9027;

was (Author: rhou):
This table is partitioned on a string. The problem occurs with a partition that has null values. The value of the varchar is "NOT SPECIFIED". I can select every row up to the null partition using:
select timestamp_id from orders_parts_hive limit 9026;
But the next row is in the null partition and causes an exception:
select timestamp_id from orders_parts_hive limit 9027;
[jira] [Comment Edited] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655397#comment-15655397 ] Robert Hou edited comment on DRILL-5035 at 11/10/16 10:39 PM:
---
The partition only has null values for timestamp_id. Could this be an issue with empty batches? There are 3024 null values in the partition.

was (Author: rhou):
The partition only has null values for timestamp_id.
[jira] [Comment Edited] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655374#comment-15655374 ] Robert Hou edited comment on DRILL-5035 at 11/10/16 10:39 PM:
---
This table is partitioned on a string. The problem occurs with a partition that has null values. The value of the varchar is "NOT SPECIFIED". I can select every row up to the null partition using:
select timestamp_id from orders_parts_hive limit 9026;
But the next row is in the null partition and causes an exception:
select timestamp_id from orders_parts_hive limit 9027;

was (Author: rhou):
This table is partitioned on a varchar. The problem occurs with a partition that has null values. The value of the varchar is "NOT SPECIFIED". I can select every row up to the null partition using:
select timestamp_id from orders_parts_hive limit 9026;
But the next row is in the null partition and causes an exception:
select timestamp_id from orders_parts_hive limit 9027;
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655479#comment-15655479 ] Robert Hou commented on DRILL-5035:
---
I'm trying to figure out how to do that. Because it is a Hive partitioned table, it has five directories, each with one file, and they all have the same name. Maybe I'll use a tar file.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655374#comment-15655374 ] Robert Hou commented on DRILL-5035:
---
This table is partitioned on a varchar. The problem occurs with a partition that has null values. The value of the varchar is "NOT SPECIFIED". I can select every row up to the null partition using:
select timestamp_id from orders_parts_hive limit 9026;
But the next row is in the null partition and causes an exception:
select timestamp_id from orders_parts_hive limit 9027;
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655896#comment-15655896 ] Robert Hou commented on DRILL-5035:
---
I am not able to use timestamp_impala yet. But I tried the original query with Drill 1.8, and I get zero rows back. Which makes sense, since we are not interpreting the timestamp correctly.

select timestamp_id from orders_parts_hive where timestamp_id >= '2016-10-09 13:36:38.986' and timestamp_id <= '2016-10-09 13:45:38.986';
+---------------+
| timestamp_id  |
+---------------+
+---------------+

I also tried selecting the whole column. I get bad values (known problem), but I get all the values. I don't get an exception.
select timestamp_id from orders_parts_hive;
[jira] [Comment Edited] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655464#comment-15655464 ] Robert Hou edited comment on DRILL-5035 at 11/11/16 2:29 AM:
---
I set the new option to false and I do not get an exception. I will try with IMPALA_TIMESTAMP.

was (Author: rhou):
I set the new option to false and I do not see a problem. I will try with IMPALA_TIMESTAMP.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655476#comment-15655476 ] Robert Hou commented on DRILL-5035:
---
Yes, I created it. It is a Hive table partitioned on a string. I created it using data from a Drill table.
[jira] [Updated] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5035:
---
Attachment: orders_parts_hive.tar

This is a Hive partitioned table. It is partitioned on o_orderpriority.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655504#comment-15655504 ] Robert Hou commented on DRILL-5035:
---
Interesting. I exported Drill data to a tbl file and edited the file so that Hive could read it. I then created a Hive table and loaded it from the tbl file, created a Parquet Hive table from that first Hive table, and finally created a partitioned Hive table from the Parquet Hive table.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655535#comment-15655535 ]

Robert Hou commented on DRILL-5035:
-----------------------------------

~/bin/parquet-meta 00_0
file:     file:/root/drill-test-framework-pushdown/data/orders_parts_hive/o_orderpriority=1-URGENT/00_0
creator:  parquet-mr version 1.6.0
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655566#comment-15655566 ]

Robert Hou commented on DRILL-5035:
-----------------------------------

I tried with Hive. It succeeds.
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655518#comment-15655518 ]

Robert Hou commented on DRILL-5035:
-----------------------------------

I am not sure this is a release stopper. It may be because one of my partitions contains only null values for the column.
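One way to check that hypothesis (a sketch; `dir0` is the implicit partition-directory column Drill exposes for this table, as seen in the `select *` output above):

```sql
-- Sketch: count non-null timestamp_id values per partition directory.
-- A partition whose rows_non_null is 0 contains only nulls for the column.
select dir0,
       count(*)            as rows_total,
       count(timestamp_id) as rows_non_null
from orders_parts_hive
group by dir0;
```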
[jira] [Commented] (DRILL-5035) Selecting timestamp value from Hive table causes IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/DRILL-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655507#comment-15655507 ]

Robert Hou commented on DRILL-5035:
-----------------------------------

The DDL for the partitioned Hive table:

create table orders_parts_hive (
    o_orderkey int,
    o_custkey int,
    o_orderstatus string,
    o_totalprice double,
    o_orderdate date,
    o_clerk string,
    o_shippriority int,
    o_comment string,
    int_id int,
    bigint_id bigint,
    float_id float,
    double_id double,
    varchar_id string,
    date_id date,
    timestamp_id timestamp)
partitioned by (o_orderpriority string)
stored as parquet;
[jira] [Commented] (DRILL-4971) query encounters system error: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3"
[ https://issues.apache.org/jira/browse/DRILL-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15658487#comment-15658487 ]

Robert Hou commented on DRILL-4971:
-----------------------------------

The same problem occurs on 1.8.0.

> query encounters system error: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3"
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4971
>                 URL: https://issues.apache.org/jira/browse/DRILL-4971
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>            Reporter: Robert Hou
>         Attachments: low_table, medium_table
>
> This query returns an error:
> select count(*) from test where ((int_id > 3060 and int_id < 6002) or (int_id > 9025 and int_id < 11976)) and ((int_id > 9025 and int_id < 11976) or (int_id > 3060 and int_id < 6002)) and (int_id > 3060 and int_id < 6002);
> Error: SYSTEM ERROR: CompileException: Line 232, Column 30: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3"
> There are two partitions to the test table. One covers the range 3061 - 6001 and the other covers the range 9026 - 11975.
[jira] [Updated] (DRILL-4971) query encounters system error: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3"
[ https://issues.apache.org/jira/browse/DRILL-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-4971:
------------------------------
    Description:
This query returns an error. The stack trace suggests it might be a schema change issue, but there is no schema change in this table. Many other queries are succeeding.

select count(*) from test where ((int_id > 3060 and int_id < 6002) or (int_id > 9025 and int_id < 11976)) and ((int_id > 9025 and int_id < 11976) or (int_id > 3060 and int_id < 6002)) and (int_id > 3060 and int_id < 6002);

Error: SYSTEM ERROR: CompileException: Line 232, Column 30: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3"

[Error Id: 254d093b-79a1-4425-802c-ade08db293e4 on qa-node211:31010]

(org.apache.drill.exec.exception.SchemaChangeException) Failure while attempting to load generated class
    org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():198
    org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():107
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78

There are two partitions to the test table. One covers the range 3061 - 6001 and the other covers the range 9026 - 11975.

This second query returns a different, but possibly related, error.

select count(*) from orders_parts where (((int_id > -3025 and int_id < -4) or (int_id > -5 and int_id < 3061) or (int_id > 3060 and int_id < 6002)) and (int_id > -5 and int_id < 3061)) and (((int_id > -5 and int_id < 3061) or (int_id > 9025 and int_id < 11976)) and (int_id > -5 and int_id < 3061))

Failed with exception
java.sql.SQLException: SYSTEM ERROR: CompileException: Line 447, Column 30: Statement "break AndOP6" is not enclosed by a breakable statement with label "AndOP6"

Fragment 0:0

[Error Id: ac09187e-d3a2-41a7-a659-b287aca6039c on qa-node209:31010]

(org.apache.drill.exec.exception.SchemaChangeException) Failure while attempting to load generated class
    org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():198
    org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():107
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
[jira] [Updated] (DRILL-5086) ClassCastException when filter pushdown is used with a bigint or float column.
[ https://issues.apache.org/jira/browse/DRILL-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-5086:
------------------------------
    Attachment: 0_0_5.parquet
                0_0_4.parquet
                0_0_3.parquet
                0_0_2.parquet
                0_0_1.parquet
                drill.parquet_metadata

> ClassCastException when filter pushdown is used with a bigint or float column.
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-5086
>                 URL: https://issues.apache.org/jira/browse/DRILL-5086
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.9.0
>            Reporter: Robert Hou
>            Assignee: Aman Sinha
>         Attachments: 0_0_1.parquet, 0_0_2.parquet, 0_0_3.parquet, 0_0_4.parquet, 0_0_5.parquet, drill.parquet_metadata
>
> This query results in a ClassCastException when filter pushdown is used. The bigint column is being compared with an integer value.
> 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_metadata where bigint_id < 1100;
> Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
> A similar problem occurs when a float column is being compared with a double value.
> 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_metadata where float_id < 1100.0;
> Error: SYSTEM ERROR: ClassCastException
> Also when a timestamp column is being compared with a string.
> 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_metadata where timestamp_id < '2016-10-13';
> Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
[jira] [Updated] (DRILL-5086) ClassCastException when filter pushdown is used with a bigint or float column.
[ https://issues.apache.org/jira/browse/DRILL-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-5086:
------------------------------
    Description:
This query results in a ClassCastException when filter pushdown is used. The bigint column is being compared with an integer value.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_metadata where bigint_id < 1100;
Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

To reproduce the problem, put the attached files into a directory. Then create the metadata:

refresh table metadata dfs.`path_to_directory`;

For example, if you put the files in /drill/testdata/filter/orders_parts_metadata, then run this sql command:

refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`;

A similar problem occurs when a float column is being compared with a double value.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_metadata where float_id < 1100.0;
Error: SYSTEM ERROR: ClassCastException

Also when a timestamp column is being compared with a string.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_metadata where timestamp_id < '2016-10-13';
Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
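An untested workaround sketch: cast the literal so its type matches the column's type, avoiding the Integer-vs-Long (and float-vs-double) mismatch in the pushed-down comparison. Whether this actually sidesteps the error has not been verified here.

```sql
-- Hypothetical workaround: make the literal's type match the column's type.
select count(*) from orders_parts_metadata
where bigint_id < cast(1100 as bigint);

select count(*) from orders_parts_metadata
where float_id < cast(1100.0 as float);
```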
[jira] [Updated] (DRILL-5093) Explain plan shows all partitions when query scans all partitions, and filter pushdown is used with metadata caching.
[ https://issues.apache.org/jira/browse/DRILL-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-5093:
------------------------------
    Attachment: drill.parquet_metadata

> Explain plan shows all partitions when query scans all partitions, and filter pushdown is used with metadata caching.
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5093
>                 URL: https://issues.apache.org/jira/browse/DRILL-5093
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.9.0
>            Reporter: Robert Hou
>            Assignee: Jinfeng Ni
>         Attachments: 0_0_1.parquet, 0_0_2.parquet, 0_0_3.parquet, 0_0_4.parquet, 0_0_5.parquet, drill.parquet_metadata
>
> This query scans all the partitions because the partitions cannot be pruned. When metadata caching is used, the explain plan shows all the partitions, when it should only show the parent.
> 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> explain plan for select * from orders_parts_metadata;
> 00-00    Screen
> 00-01      Project(*=[$0])
> 00-02        Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_1.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_3.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_4.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_5.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_2.parquet]], selectionRoot=/drill/testdata/filter/orders_parts_metadata, numFiles=5, usedMetadataFile=true, cacheFileRoot=/drill/testdata/filter/orders_parts_metadata, columns=[`*`]]])
> Here is the same query with a table that does not have metadata caching.
> 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> explain plan for select * from orders_parts;
> 00-00    Screen
> 00-01      Project(*=[$0])
> 00-02        Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/filter/orders_parts]], selectionRoot=maprfs:/drill/testdata/filter/orders_parts, numFiles=1, usedMetadataFile=false, columns=[`*`]]])
[jira] [Updated] (DRILL-5093) Explain plan shows all partitions when query scans all partitions, and filter pushdown is used with metadata caching.
[ https://issues.apache.org/jira/browse/DRILL-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-5093:
------------------------------
    Attachment: 0_0_5.parquet
                0_0_4.parquet
                0_0_3.parquet
                0_0_2.parquet
                0_0_1.parquet
                drill.parquet_metadata
[jira] [Created] (DRILL-5093) Explain plan shows all partitions when query scans all partitions, and filter pushdown is used with metadata caching.
Robert Hou created DRILL-5093:
---------------------------------

             Summary: Explain plan shows all partitions when query scans all partitions, and filter pushdown is used with metadata caching.
                 Key: DRILL-5093
                 URL: https://issues.apache.org/jira/browse/DRILL-5093
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 1.9.0
            Reporter: Robert Hou
            Assignee: Jinfeng Ni
[jira] [Updated] (DRILL-5093) Explain plan shows all partitions when query scans all partitions, and filter pushdown is used with metadata caching.
[ https://issues.apache.org/jira/browse/DRILL-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-5093:
------------------------------
    Description:
This query scans all the partitions because the partitions cannot be pruned. When metadata caching is used, the explain plan shows all the partitions, when it should only show the parent.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> explain plan for select * from orders_parts_metadata where int_id = -2000 or int_id = 0 or int_id = 4000 or int_id is null or int_id = 1;
00-00    Screen
00-01      Project(*=[$0])
00-02        Project(T62¦¦*=[$0])
00-03          SelectionVectorRemover
00-04            Filter(condition=[OR(=($1, -2000), =($1, 0), =($1, 4000), IS NULL($1), =($1, 1))])
00-05              Project(T62¦¦*=[$0], int_id=[$1])
00-06                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_1.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_3.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_4.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_5.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_2.parquet]], selectionRoot=/drill/testdata/filter/orders_parts_metadata, numFiles=5, usedMetadataFile=true, cacheFileRoot=/drill/testdata/filter/orders_parts_metadata, columns=[`*`]]])

To reproduce the problem, put the attached files into a directory. Then create the metadata:

refresh table metadata dfs.`path_to_directory`;

For example, if you put the files in /drill/testdata/filter/orders_parts_metadata, then run this sql command:

refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`;

Here is the same query with the same table without metadata caching.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> explain plan for select * from orders_parts where int_id = -2000 or int_id = 0 or int_id = 4000 or int_id is null or int_id = 1;
00-00    Screen
00-01      Project(*=[$0])
00-02        Project(T63¦¦*=[$0])
00-03          SelectionVectorRemover
00-04            Filter(condition=[OR(=($1, -2000), =($1, 0), =($1, 4000), IS NULL($1), =($1, 1))])
00-05              Project(T63¦¦*=[$0], int_id=[$1])
00-06                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/filter/orders_parts]], selectionRoot=maprfs:/drill/testdata/filter/orders_parts, numFiles=1, usedMetadataFile=false, columns=[`*`]]])
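The reproduction steps above, condensed into one session (the directory path is the reporter's example):

```sql
-- Build the metadata cache, then explain the non-prunable query.
refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`;

explain plan for
select * from orders_parts_metadata
where int_id = -2000 or int_id = 0 or int_id = 4000 or int_id is null or int_id = 1;
```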
[jira] [Updated] (DRILL-5086) ClassCastException when filter pushdown is used with a bigint or float column and metadata caching.
[ https://issues.apache.org/jira/browse/DRILL-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou updated DRILL-5086:
--
Description:
This query results in a ClassCastException when filter pushdown is used with metadata caching. The bigint column is being compared with an integer value.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from orders_parts_metadata where bigint_id < 1100;
Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

To reproduce the problem, put the attached files into a directory. Then create the metadata:

refresh table metadata dfs.`path_to_directory`;

For example, if you put the files in /drill/testdata/filter/orders_parts_metadata, then run this SQL command:

refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`;

A similar problem occurs when a float column is being compared with a double value.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from orders_parts_metadata where float_id < 1100.0;
Error: SYSTEM ERROR: ClassCastException

Also when a timestamp column is being compared with a string.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from orders_parts_metadata where timestamp_id < '2016-10-13';
Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

was:
This query results in a ClassCastException when filter pushdown is used. The bigint column is being compared with an integer value.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from orders_parts_metadata where bigint_id < 1100;
Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

To reproduce the problem, put the attached files into a directory.
Then create the metadata:

refresh table metadata dfs.`path_to_directory`;

For example, if you put the files in /drill/testdata/filter/orders_parts_metadata, then run this SQL command:

refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`;

A similar problem occurs when a float column is being compared with a double value.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from orders_parts_metadata where float_id < 1100.0;
Error: SYSTEM ERROR: ClassCastException

Also when a timestamp column is being compared with a string.

0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from orders_parts_metadata where timestamp_id < '2016-10-13';
Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

> ClassCastException when filter pushdown is used with a bigint or float column
> and metadata caching.
> ---
>
> Key: DRILL-5086
> URL: https://issues.apache.org/jira/browse/DRILL-5086
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.9.0
> Reporter: Robert Hou
> Assignee: Parth Chandra
> Attachments: 0_0_1.parquet, 0_0_2.parquet, 0_0_3.parquet, 0_0_4.parquet, 0_0_5.parquet, drill.parquet_metadata
>
> This query results in a ClassCastException when filter pushdown is used with metadata caching. The bigint column is being compared with an integer value.
> 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from orders_parts_metadata where bigint_id < 1100;
> Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
> To reproduce the problem, put the attached files into a directory.
Then > create the metadata: >refresh table metadata dfs.`path_to_directory`; > For example, if you put the files in > /drill/testdata/filter/orders_parts_metadata, then run this sql command >refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`; > A similar problem occurs when a float column is being compared with a double > value. >0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where float_id < 1100.0; >Error: SYSTEM ERROR: ClassCastException > Also when a timestamp column is being compared with a string. >0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where timestamp_id < '2016-10-13'; >Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast > to java.lang.Long -- This message was sent by Atlassian JIRA (v6.3.4#6332)
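The ClassCastException above is a plain Java boxing pitfall: a boxed `Integer` can never be cast directly to `Long`, even though the value would fit in a long. The metadata cache can hand back an `Integer` for a BIGINT column's statistics, so a blind `(Long)` cast fails. A minimal sketch of the failure mode and the usual `Number`-based widening fix; `StatCast`, `unsafe`, and `safe` are invented names for illustration, not Drill's actual filter-evaluation code:

```java
// Hypothetical illustration (not Drill's source) of the DRILL-5086 failure:
// a boxed Integer cast directly to Long throws ClassCastException.
public class StatCast {
    // Unsafe: throws ClassCastException when the boxed stat is an Integer.
    public static long unsafe(Object stat) {
        return (Long) stat;
    }

    // Safe: widen through Number, which accepts Integer, Long, Short, etc.
    public static long safe(Object stat) {
        return ((Number) stat).longValue();
    }

    public static void main(String[] args) {
        Object stat = Integer.valueOf(1100);   // what the metadata might hold
        try {
            unsafe(stat);
        } catch (ClassCastException e) {
            System.out.println(e.getMessage());  // Integer cannot be cast to Long
        }
        System.out.println(safe(stat));          // widening succeeds
    }
}
```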
[jira] [Updated] (DRILL-5086) ClassCastException when filter pushdown is used with a bigint or float column and metadata caching.
[ https://issues.apache.org/jira/browse/DRILL-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5086: -- Summary: ClassCastException when filter pushdown is used with a bigint or float column and metadata caching. (was: ClassCastException when filter pushdown is used with a bigint or float column.) > ClassCastException when filter pushdown is used with a bigint or float column > and metadata caching. > --- > > Key: DRILL-5086 > URL: https://issues.apache.org/jira/browse/DRILL-5086 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Parth Chandra > Attachments: 0_0_1.parquet, 0_0_2.parquet, 0_0_3.parquet, > 0_0_4.parquet, 0_0_5.parquet, drill.parquet_metadata > > > This query results in a ClassCastException when filter pushdown is used. The > bigint column is being compared with an integer value. >0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where bigint_id < 1100; >Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast > to java.lang.Long > To reproduce the problem, put the attached files into a directory. Then > create the metadata: >refresh table metadata dfs.`path_to_directory`; > For example, if you put the files in > /drill/testdata/filter/orders_parts_metadata, then run this sql command >refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`; > A similar problem occurs when a float column is being compared with a double > value. >0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where float_id < 1100.0; >Error: SYSTEM ERROR: ClassCastException > Also when a timestamp column is being compared with a string. 
>0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where timestamp_id < '2016-10-13'; >Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast > to java.lang.Long -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5136) Some SQL statements fail when using Simba ODBC driver 1.3
[ https://issues.apache.org/jira/browse/DRILL-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762542#comment-15762542 ]

Robert Hou commented on DRILL-5136:
---
Comments from Robert Wu (Simba):

This issue is likely caused by the 1.9.0 server turning the SHOW SCHEMAS query into a limit 0 query "SELECT * FROM (show schemas) LIMIT 0" while handling the prepare API call that was introduced in Drill 1.9.0. We are able to reproduce the issue with the latest driver against a Drill 1.9.0 server. We also noticed that when the driver passes the "show schemas" query to the prepare API exposed by the Drill client, the call returns the following error:

[ PARSE ERROR: Encountered "( show" at line 1, column 15. Was expecting one of: ... … OMITTED … "TABLE" ... SQL Query SELECT * FROM (show schemas) LIMIT 0 ^ [Error Id: bd266398-8090-42c0-b7b3-1efcbf2bf986 on maprdemo:31010] ]

Tracing through the Drill 1.9 server-side code, we were able to track down the sequence of method calls that leads to the part of the code that turns an incoming query into a limit 0 query while handling the prepare API call. Below are the details:

· When the server receives the prepare request, the UserServer class calls on the worker to submit a prepared statement (view code).
· In the submit prepared statement function, we can see that it creates a new PreparedStatementWorker (view code).
· Finally, in the PreparedStatementWorker class, we can see that the server is manually wrapping the user query with limit 0 (view code).
· The server then fails to prepare the new self-modified query, resulting in the error message reported by Robert Hou.

We also tested the "show schemas" query against a server running pre-1.9 Drill and the issue is not reproducible. The reason is that the driver does not use the new prepare API when connecting to a server running a Drill version earlier than 1.9.
Best regards, Rob > Some SQL statements fail when using Simba ODBC driver 1.3 > - > > Key: DRILL-5136 > URL: https://issues.apache.org/jira/browse/DRILL-5136 > Project: Apache Drill > Issue Type: Bug > Components: Client - ODBC >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Laurent Goujon > > "show schemas" does not work with Simba ODBC driver > SQL>show schemas > 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show > schemas > [30029]Query execution error. Details:[ > PARSE ERROR: Encountered "( show" at line 1, column 15. > Was expecting one of: > ... > ... > ... > ... > ... > "LATERAL" ... > "(" "WITH" ... > "(" "+" ... > "(" "-" ... > "(" ... > "(" ... > "(" The query profile shows this SQL statement is being executed: >SELECT * FROM (show schemas) LIMIT 0 > > The yellow highlighted syntax has been added when displaying schemas > "use schema" also does not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
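The wrapping described above can be reproduced with a few lines of Java. This is a hypothetical sketch, not the PreparedStatementWorker source; `LimitZeroWrap`, `wrap`, and `wrapIfQuery` are invented names. It shows why "SELECT * FROM (show schemas) LIMIT 0" is produced, and one way a guard could avoid wrapping statements that are not plain SELECT queries:

```java
// Hypothetical sketch of the limit-0 wrapping behind DRILL-5136
// (invented names, not Drill's actual prepared-statement code).
public class LimitZeroWrap {
    // What the prepare path effectively does to every incoming statement.
    public static String wrap(String sql) {
        return "SELECT * FROM (" + sql + ") LIMIT 0";
    }

    // A guarded variant: only plain SELECT queries can be wrapped safely;
    // commands like SHOW SCHEMAS or USE are passed through unchanged.
    public static String wrapIfQuery(String sql) {
        String head = sql.trim().toUpperCase();
        return head.startsWith("SELECT") ? wrap(sql) : sql;
    }

    public static void main(String[] args) {
        System.out.println(wrap("show schemas"));        // the failing rewritten query
        System.out.println(wrapIfQuery("show schemas")); // left alone: parses fine
    }
}
```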
[jira] [Updated] (DRILL-5136) Some SQL statements fail due to PreparedStatement
[ https://issues.apache.org/jira/browse/DRILL-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5136: -- Summary: Some SQL statements fail due to PreparedStatement (was: Some SQL statements fail due to Prepared Statement API) > Some SQL statements fail due to PreparedStatement > - > > Key: DRILL-5136 > URL: https://issues.apache.org/jira/browse/DRILL-5136 > Project: Apache Drill > Issue Type: Bug > Components: Client - ODBC >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Laurent Goujon > > "show schemas" does not work. > SQL>show schemas > 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show > schemas > [30029]Query execution error. Details:[ > PARSE ERROR: Encountered "( show" at line 1, column 15. > Was expecting one of: > ... > ... > ... > ... > ... > "LATERAL" ... > "(" "WITH" ... > "(" "+" ... > "(" "-" ... > "(" ... > "(" ... > "(" The query profile shows this SQL statement is being executed: >SELECT * FROM (show schemas) LIMIT 0 > > The yellow highlighted syntax has been added when displaying schemas > "use schema" also does not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-5136) Some SQL statements fail when using Simba ODBC driver 1.3
[ https://issues.apache.org/jira/browse/DRILL-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5136: -- Assignee: Laurent Goujon > Some SQL statements fail when using Simba ODBC driver 1.3 > - > > Key: DRILL-5136 > URL: https://issues.apache.org/jira/browse/DRILL-5136 > Project: Apache Drill > Issue Type: Bug > Components: Client - ODBC >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Laurent Goujon > > "show schemas" does not work with Simba ODBC driver > SQL>show schemas > 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show > schemas > [30029]Query execution error. Details:[ > PARSE ERROR: Encountered "( show" at line 1, column 15. > Was expecting one of: > ... > ... > ... > ... > ... > "LATERAL" ... > "(" "WITH" ... > "(" "+" ... > "(" "-" ... > "(" ... > "(" ... > "(" The query profile shows this SQL statement is being executed: >SELECT * FROM (show schemas) LIMIT 0 > > The yellow highlighted syntax has been added when displaying schemas > "use schema" also does not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-5136) Some SQL statements fail due to Prepared Statement API
[ https://issues.apache.org/jira/browse/DRILL-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5136: -- Description: "show schemas" does not work. SQL>show schemas 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show schemas [30029]Query execution error. Details:[ PARSE ERROR: Encountered "( show" at line 1, column 15. Was expecting one of: ... ... ... ... ... "LATERAL" ... "(" "WITH" ... "(" "+" ... "(" "-" ... "(" ... "(" ... "(" show schemas 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show schemas [30029]Query execution error. Details:[ PARSE ERROR: Encountered "( show" at line 1, column 15. Was expecting one of: ... ... ... ... ... "LATERAL" ... "(" "WITH" ... "(" "+" ... "(" "-" ... "(" ... "(" ... "(" Some SQL statements fail due to Prepared Statement API > -- > > Key: DRILL-5136 > URL: https://issues.apache.org/jira/browse/DRILL-5136 > Project: Apache Drill > Issue Type: Bug > Components: Client - ODBC >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Laurent Goujon > > "show schemas" does not work. > SQL>show schemas > 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show > schemas > [30029]Query execution error. Details:[ > PARSE ERROR: Encountered "( show" at line 1, column 15. > Was expecting one of: > ... > ... > ... > ... > ... > "LATERAL" ... > "(" "WITH" ... > "(" "+" ... > "(" "-" ... > "(" ... > "(" ... > "(" The query profile shows this SQL statement is being executed: >SELECT * FROM (show schemas) LIMIT 0 > > The yellow highlighted syntax has been added when displaying schemas > "use schema" also does not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-5136) Some SQL statements fail due to Prepared Statement API
[ https://issues.apache.org/jira/browse/DRILL-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5136: -- Summary: Some SQL statements fail due to Prepared Statement API (was: Some SQL statements fail when using Simba ODBC driver 1.3) > Some SQL statements fail due to Prepared Statement API > -- > > Key: DRILL-5136 > URL: https://issues.apache.org/jira/browse/DRILL-5136 > Project: Apache Drill > Issue Type: Bug > Components: Client - ODBC >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Laurent Goujon > > "show schemas" does not work with Simba ODBC driver > SQL>show schemas > 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show > schemas > [30029]Query execution error. Details:[ > PARSE ERROR: Encountered "( show" at line 1, column 15. > Was expecting one of: > ... > ... > ... > ... > ... > "LATERAL" ... > "(" "WITH" ... > "(" "+" ... > "(" "-" ... > "(" ... > "(" ... > "(" The query profile shows this SQL statement is being executed: >SELECT * FROM (show schemas) LIMIT 0 > > The yellow highlighted syntax has been added when displaying schemas > "use schema" also does not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (DRILL-5136) Some SQL statements fail due to Prepared Statement API
[ https://issues.apache.org/jira/browse/DRILL-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762542#comment-15762542 ]

Robert Hou edited comment on DRILL-5136 at 12/19/16 11:10 PM:
--
Comments from Robert Wu:

This issue is likely caused by the 1.9.0 server turning the SHOW SCHEMAS query into a limit 0 query "SELECT * FROM (show schemas) LIMIT 0" while handling the prepare API call that was introduced in Drill 1.9.0. We are able to reproduce the issue with the latest driver against a Drill 1.9.0 server. We also noticed that when the driver passes the "show schemas" query to the prepare API exposed by the Drill client, the call returns the following error:

[ PARSE ERROR: Encountered "( show" at line 1, column 15. Was expecting one of: ... … OMITTED … "TABLE" ... SQL Query SELECT * FROM (show schemas) LIMIT 0 ^ [Error Id: bd266398-8090-42c0-b7b3-1efcbf2bf986 on maprdemo:31010] ]

Tracing through the Drill 1.9 server-side code, we were able to track down the sequence of method calls that leads to the part of the code that turns an incoming query into a limit 0 query while handling the prepare API call. Below are the details:

· When the server receives the prepare request, the UserServer class calls on the worker to submit a prepared statement (view code).
· In the submit prepared statement function, we can see that it creates a new PreparedStatementWorker (view code).
· Finally, in the PreparedStatementWorker class, we can see that the server is manually wrapping the user query with limit 0 (view code).
· The server then fails to prepare the new self-modified query, resulting in the error message reported by Robert Hou.

We also tested the "show schemas" query against a server running pre-1.9 Drill and the issue is not reproducible. The reason is that the driver does not use the new prepare API when connecting to a server running a Drill version earlier than 1.9.
Best regards, Rob was (Author: rhou): Comments from Robert Wu (Simba): This issue is likely caused by the 1.9.0 server turning the SHOW SCHEMAS query into a limit 0 query "SELECT * FROM (show schemas) LIMIT 0" while handling the prepare API call that was introduced in Drill 1.9.0. We are able to reproduce issue with the latest driver against Drill 1.9.0 server. We also noticed that when the driver passes the "show schemas" query to the prepare API exposed by Drill client the call return the following error: [ PARSE ERROR: Encountered "( show" at line 1, column 15. Was expecting one of: ... … OMITTED … "TABLE" ... SQL Query SELECT * FROM (show schemas) LIMIT 0 ^ [Error Id: bd266398-8090-42c0-b7b3-1efcbf2bf986 on maprdemo:31010] ] Tracing through the Drill 1.9 server side code we were able to track down the sequence of method calls that leads to the part of the code that turns an incoming query into a limit 0 query while handling the prepare API call. Below are the details: · When the server received the prepare request, the Userserver class calls on the worker to submit a prepare statement (view code). · In the submit prearestatment function, we can see that it creates a new PreparedStatementWorker (view code). · Finally, in the PreparedStatementWorker class, we can see that the server is manually wrapping the user query with limit 0 (view code). · The server is failing to prepare the new self-modified query, resulting in the error message reported by Robert Hou. We also tested the "show schemas" query against a server running pre-1.9 Drill and the issue is not reproducible. The reason for that is the driver does not use the new prepare API when connecting to server running Drill earlier than 1.9. 
Best regards, Rob > Some SQL statements fail due to Prepared Statement API > -- > > Key: DRILL-5136 > URL: https://issues.apache.org/jira/browse/DRILL-5136 > Project: Apache Drill > Issue Type: Bug > Components: Client - ODBC >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Laurent Goujon > > "show schemas" does not work. > SQL>show schemas > 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show > schemas > [30029]Query execution error. Details:[ > PARSE ERROR: Encountered "( show" at line 1, column 15. > Was expecting one of: > ... > ... > ... > ... > ... > "LATERAL" ... > "(" "WITH" ... > "(" "+" ... > "(" "-" ... > "(" ... > "(" ... > "(" The query profile shows this SQL statement is being executed: >SELECT * FROM (show schemas) LIMIT 0 > > The yellow highlighted syntax has been added when displaying schemas >
[jira] [Created] (DRILL-5136) Some SQL statements fail when using Simba ODBC driver 1.3
Robert Hou created DRILL-5136: - Summary: Some SQL statements fail when using Simba ODBC driver 1.3 Key: DRILL-5136 URL: https://issues.apache.org/jira/browse/DRILL-5136 Project: Apache Drill Issue Type: Bug Components: Client - ODBC Affects Versions: 1.9.0 Reporter: Robert Hou "show schemas" does not work with Simba ODBC driver SQL>show schemas 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show schemas [30029]Query execution error. Details:[ PARSE ERROR: Encountered "( show" at line 1, column 15. Was expecting one of: ... ... ... ... ... "LATERAL" ... "(" "WITH" ... "(" "+" ... "(" "-" ... "(" ... "(" ... "("
[jira] [Updated] (DRILL-5136) Some SQL statements fail due to PreparedStatement
[ https://issues.apache.org/jira/browse/DRILL-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5136: -- Description: "show schemas" does not work. SQL>show schemas 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show schemas [30029]Query execution error. Details:[ PARSE ERROR: Encountered "( show" at line 1, column 15. Was expecting one of: ... ... ... ... ... "LATERAL" ... "(" "WITH" ... "(" "+" ... "(" "-" ... "(" ... "(" ... "(" show schemas 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show schemas [30029]Query execution error. Details:[ PARSE ERROR: Encountered "( show" at line 1, column 15. Was expecting one of: ... ... ... ... ... "LATERAL" ... "(" "WITH" ... "(" "+" ... "(" "-" ... "(" ... "(" ... "(" Some SQL statements fail due to PreparedStatement > - > > Key: DRILL-5136 > URL: https://issues.apache.org/jira/browse/DRILL-5136 > Project: Apache Drill > Issue Type: Bug > Components: Client - ODBC >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Laurent Goujon > > "show schemas" does not work. > SQL>show schemas > 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show > schemas > [30029]Query execution error. Details:[ > PARSE ERROR: Encountered "( show" at line 1, column 15. > Was expecting one of: > ... > ... > ... > ... > ... > "LATERAL" ... > "(" "WITH" ... > "(" "+" ... > "(" "-" ... > "(" ... > "(" ... > "(" The query profile shows this SQL statement is being executed: >{color:blue} SELECT * FROM {color} (show schemas) {color:blue} LIMIT 0 > {color} > > The blue text has been added when displaying schemas > "use schema" also does not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5136) Some SQL statements fail due to PreparedStatement
[ https://issues.apache.org/jira/browse/DRILL-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762639#comment-15762639 ] Robert Hou commented on DRILL-5136: --- With this patch, the CTAS statement still does not work: SQL>create table drill_3769 as select to_date(c3 + interval '1' day ) from ctas_t1 order by c3 1: SQLExec = [MapR][Drill] (1040) Drill failed to execute the query: create table drill_3769 as select to_date(c3 + interval '1' day ) from ctas_t1 order by c3 [30029]Query execution error. Details:[ VALIDATION ERROR: A table or view with given name [drill_3769] already exists in schema [dfs.ctas_parquet] > Some SQL statements fail due to PreparedStatement > - > > Key: DRILL-5136 > URL: https://issues.apache.org/jira/browse/DRILL-5136 > Project: Apache Drill > Issue Type: Bug > Components: Client - ODBC >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Laurent Goujon > > "show schemas" does not work. > SQL>show schemas > 1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show > schemas > [30029]Query execution error. Details:[ > PARSE ERROR: Encountered "( show" at line 1, column 15. > Was expecting one of: > ... > ... > ... > ... > ... > "LATERAL" ... > "(" "WITH" ... > "(" "+" ... > "(" "-" ... > "(" ... > "(" ... > "(" The query profile shows this SQL statement is being executed: >{color:blue} SELECT * FROM {color} (show schemas) {color:blue} LIMIT 0 > {color} > > The blue text has been added when displaying schemas > "use schema" also does not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5311) C++ connector connect doesn't check handshake result for timeout
[ https://issues.apache.org/jira/browse/DRILL-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934953#comment-15934953 ]

Robert Hou commented on DRILL-5311:
---
The framework will be updated to support ODBC. We plan to use a Python script to run SQL queries; it is a work in progress. Hopefully 1.11. I don't know if this will help with testing the C++ connector.

> C++ connector connect doesn't check handshake result for timeout
> 
>
> Key: DRILL-5311
> URL: https://issues.apache.org/jira/browse/DRILL-5311
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - C++
> Reporter: Laurent Goujon
> Assignee: Sudheesh Katkam
> Labels: ready-to-commit
> Fix For: 1.11.0
>
> The C++ connector connect method returns okay as soon as the TCP connection
> is successfully established between client and server, and the handshake
> message is sent. However, it doesn't wait for the handshake to have completed.
> The consequence is that if the handshake failed, the error is deferred to the
> first query, which might be unexpected by the application.
> I believe that the validateHandshake method in drillClientImpl should wait for the
> handshake to complete, as it seems a bit saner...

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK
[ https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945783#comment-15945783 ]

Robert Hou commented on DRILL-5316:
---
I tried a couple of cluster IDs. I used random characters and symbols. One ID was almost 100 characters. I have been unable to reproduce it so far. I tested with v1.3.4, which runs on Windows.

> C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children
> completed with ZOK
> 
>
> Key: DRILL-5316
> URL: https://issues.apache.org/jira/browse/DRILL-5316
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - C++
> Reporter: Rob Wu
> Assignee: Chun Chang
> Priority: Critical
> Labels: ready-to-commit
> Fix For: 1.11.0
>
> When connecting to a drillbit with Zookeeper, occasionally the C++ client would
> crash without any reason.
> A further look into the code revealed that during this call
> rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, );
> zoo_get_children returns ZOK (0) but drillbitsVector.count is 0.
> This causes drillbits to stay empty and thus
> causes err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint); to
> crash.
> A size check should be done to prevent this from happening.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK
[ https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945780#comment-15945780 ]

Robert Hou commented on DRILL-5316:
---
I asked Rob for some ideas on how to reproduce this problem. He wrote:

For me, the issue surfaced on its own when the VM becomes unstable. To make it unstable, I modified the drill-override.conf's cluster id to be something very long with different symbols. Restart the drillbit so the new setting gets loaded. Then switch it back to a cluster id containing dots and restart again. Finally, try connecting with cluster id "drill" or "drillbits" or "drillbits1" or "drilbit1" or "drillbit" (or another cluster id of your choice). Not sure how easy it is to reproduce on a new setup again. Perhaps simply stopping the zk service would do?

> C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children
> completed with ZOK
> 
>
> Key: DRILL-5316
> URL: https://issues.apache.org/jira/browse/DRILL-5316
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - C++
> Reporter: Rob Wu
> Assignee: Chun Chang
> Priority: Critical
> Labels: ready-to-commit
> Fix For: 1.11.0
>
> When connecting to a drillbit with Zookeeper, occasionally the C++ client would
> crash without any reason.
> A further look into the code revealed that during this call
> rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, );
> zoo_get_children returns ZOK (0) but drillbitsVector.count is 0.
> This causes drillbits to stay empty and thus
> causes err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint); to
> crash.
> A size check should be done to prevent this from happening.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
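The fix the issue suggests ("Size check should be done") is a one-line guard before indexing the last element of the drillbit list. The connector itself is C++; the pattern is language-agnostic, so it is sketched here in Java with invented names (`DrillbitPick`, `lastDrillbitOrNull`) purely for illustration:

```java
import java.util.List;

// Java analog of the missing guard in the C++ connector (hypothetical names):
// never index drillbits[drillbits.size() - 1] until the list is known to be
// non-empty, even when the ZooKeeper children call itself returned OK.
public class DrillbitPick {
    public static String lastDrillbitOrNull(List<String> drillbits) {
        if (drillbits == null || drillbits.isEmpty()) {
            return null;  // caller can report "no drillbits registered" instead of crashing
        }
        return drillbits.get(drillbits.size() - 1);
    }

    public static void main(String[] args) {
        System.out.println(lastDrillbitOrNull(List.of()));              // null, no crash
        System.out.println(lastDrillbitOrNull(List.of("node1:31010"))); // node1:31010
    }
}
```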
[jira] [Updated] (DRILL-5374) Parquet filter pushdown does not prune partition with nulls when predicate uses float column
[ https://issues.apache.org/jira/browse/DRILL-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5374: -- Attachment: drill.parquet_metadata 0_0_5.parquet 0_0_4.parquet 0_0_3.parquet 0_0_2.parquet 0_0_1.parquet > Parquet filter pushdown does not prune partition with nulls when predicate > uses float column > > > Key: DRILL-5374 > URL: https://issues.apache.org/jira/browse/DRILL-5374 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Jinfeng Ni > Attachments: 0_0_1.parquet, 0_0_2.parquet, 0_0_3.parquet, > 0_0_4.parquet, 0_0_5.parquet, drill.parquet_metadata > > > Drill does not prune enough partitions for this query when filter pushdown is > used with metadata caching. The float column is being compared with a double > value. > {code} > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from > orders_parts_metadata where float_id < 1100.0; > {code} > To reproduce the problem, put the attached files into a directory. Then > {code} > create the metadata: > refresh table metadata dfs.`path_to_directory`; > {code} > For example, if you put the files in > /drill/testdata/filter/orders_parts_metadata, then run this sql command > {code} > refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`; > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Closed] (DRILL-5086) ClassCastException when filter pushdown is used with a bigint or float column and metadata caching.
[ https://issues.apache.org/jira/browse/DRILL-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5086. - > ClassCastException when filter pushdown is used with a bigint or float column > and metadata caching. > --- > > Key: DRILL-5086 > URL: https://issues.apache.org/jira/browse/DRILL-5086 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Parth Chandra > Labels: ready-to-commit > Fix For: 1.10.0 > > Attachments: 0_0_1.parquet, 0_0_2.parquet, 0_0_3.parquet, > 0_0_4.parquet, 0_0_5.parquet, drill.parquet_metadata > > > This query results in a ClassCastException when filter pushdown is used with > metadata caching. The bigint column is being compared with an integer value. >0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where bigint_id < 1100; >Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast > to java.lang.Long > To reproduce the problem, put the attached files into a directory. Then > create the metadata: >refresh table metadata dfs.`path_to_directory`; > For example, if you put the files in > /drill/testdata/filter/orders_parts_metadata, then run this sql command >refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`; > A similar problem occurs when a float column is being compared with a double > value. >0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where float_id < 1100.0; >Error: SYSTEM ERROR: ClassCastException > Also when a timestamp column is being compared with a string. >0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where timestamp_id < '2016-10-13'; >Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast > to java.lang.Long -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (DRILL-5374) Parquet filter pushdown does not prune partition with nulls when predicate uses float column
Robert Hou created DRILL-5374: - Summary: Parquet filter pushdown does not prune partition with nulls when predicate uses float column Key: DRILL-5374 URL: https://issues.apache.org/jira/browse/DRILL-5374 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 1.9.0 Reporter: Robert Hou Assignee: Jinfeng Ni Drill does not prune enough partitions for this query when filter pushdown is used with metadata caching. The float column is being compared with a double value. {code} 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_metadata where float_id < 1100.0; {code} To reproduce the problem, put the attached files into a directory. Then create the metadata: {code} refresh table metadata dfs.`path_to_directory`; {code} For example, if you put the files in /drill/testdata/filter/orders_parts_metadata, then run this SQL command: {code} refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`; {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5374) Parquet filter pushdown does not prune partition with nulls when predicate uses float column
[ https://issues.apache.org/jira/browse/DRILL-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935591#comment-15935591 ] Robert Hou commented on DRILL-5374: --- This is the Scan step from the explain plan: {code} 00-06Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_1.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_4.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_2.parquet]], selectionRoot=/drill/testdata/filter/orders_parts_metadata, numFiles=3, usedMetadataFile=true, cacheFileRoot=/drill/testdata/filter/orders_parts_metadata, columns=[`float_id`]]]) {code} Partition /drill/testdata/filter/orders_parts_metadata/0_0_4.parquet should not be scanned because it contains all null values for the float_id column. > Parquet filter pushdown does not prune partition with nulls when predicate > uses float column > > > Key: DRILL-5374 > URL: https://issues.apache.org/jira/browse/DRILL-5374 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Jinfeng Ni > Attachments: 0_0_1.parquet, 0_0_2.parquet, 0_0_3.parquet, > 0_0_4.parquet, 0_0_5.parquet, drill.parquet_metadata > > > Drill does not prune enough partitions for this query when filter pushdown is > used with metadata caching. The float column is being compared with a double > value. > {code} > 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from > orders_parts_metadata where float_id < 1100.0; > {code} > To reproduce the problem, put the attached files into a directory. 
Then create the metadata: > {code} > refresh table metadata dfs.`path_to_directory`; > {code} > For example, if you put the files in > /drill/testdata/filter/orders_parts_metadata, then run this SQL command: > {code} > refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`; > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
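The pruning logic DRILL-5374 asks for can be stated compactly: the predicate `float_id < 1100.0` is never true for a NULL row, so a file whose column is entirely NULL (like 0_0_4.parquet above) matches nothing and can be skipped, just like a file whose minimum already reaches the threshold. A hedged sketch of that decision; the `ColumnStats` shape is hypothetical, not Drill's metadata classes:

```java
import java.util.OptionalDouble;

public class NullPartitionPruning {
    // Hypothetical per-file column statistics, standing in for what the
    // Parquet metadata cache records per file.
    static final class ColumnStats {
        final OptionalDouble min;  // empty when the file holds no non-null value
        final long nullCount;
        final long rowCount;
        ColumnStats(OptionalDouble min, long nullCount, long rowCount) {
            this.min = min;
            this.nullCount = nullCount;
            this.rowCount = rowCount;
        }
    }

    // "col < threshold" is unknown (never true) for NULL rows, so an all-NULL
    // file can be pruned; so can a file whose minimum is already >= threshold.
    static boolean canPruneLessThan(ColumnStats s, double threshold) {
        if (s.nullCount == s.rowCount) {
            return true;  // all NULL: the case Drill fails to prune here
        }
        return s.min.isPresent() && s.min.getAsDouble() >= threshold;
    }

    public static void main(String[] args) {
        ColumnStats allNulls = new ColumnStats(OptionalDouble.empty(), 100, 100);
        ColumnStats lowValues = new ColumnStats(OptionalDouble.of(1.0), 0, 100);
        System.out.println(canPruneLessThan(allNulls, 1100.0));   // true
        System.out.println(canPruneLessThan(lowValues, 1100.0));  // false
    }
}
```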
[jira] [Commented] (DRILL-5086) ClassCastException when filter pushdown is used with a bigint or float column and metadata caching.
[ https://issues.apache.org/jira/browse/DRILL-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935592#comment-15935592 ] Robert Hou commented on DRILL-5086: --- I have verified that the three SQL statements execute without errors. The second SQL statement, however, does not prune enough partitions. I have created a new Jira, [DRILL-5374], to track this new problem. > ClassCastException when filter pushdown is used with a bigint or float column > and metadata caching. > --- > > Key: DRILL-5086 > URL: https://issues.apache.org/jira/browse/DRILL-5086 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Robert Hou >Assignee: Parth Chandra > Labels: ready-to-commit > Fix For: 1.10.0 > > Attachments: 0_0_1.parquet, 0_0_2.parquet, 0_0_3.parquet, > 0_0_4.parquet, 0_0_5.parquet, drill.parquet_metadata > > > This query results in a ClassCastException when filter pushdown is used with > metadata caching. The bigint column is being compared with an integer value. >0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where bigint_id < 1100; >Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast > to java.lang.Long > To reproduce the problem, put the attached files into a directory. Then > create the metadata: >refresh table metadata dfs.`path_to_directory`; > For example, if you put the files in > /drill/testdata/filter/orders_parts_metadata, then run this sql command >refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`; > A similar problem occurs when a float column is being compared with a double > value. >0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where float_id < 1100.0; >Error: SYSTEM ERROR: ClassCastException > Also when a timestamp column is being compared with a string. 
>0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(\*) from > orders_parts_metadata where timestamp_id < '2016-10-13'; >Error: SYSTEM ERROR: ClassCastException: java.lang.Integer cannot be cast > to java.lang.Long -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK
[ https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952903#comment-15952903 ] Robert Hou commented on DRILL-5316: --- I repeated the steps above, with v1.3.6 on Windows. This time, there is a single error, and the error code is different. ERROR [08S01] [MapR][Drill] (10) Failure occurred while trying to connect to zk=10.10.100.186:5181/drill/drillbits1 > C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children > completed with ZOK > > > Key: DRILL-5316 > URL: https://issues.apache.org/jira/browse/DRILL-5316 > Project: Apache Drill > Issue Type: Bug > Components: Client - C++ >Reporter: Rob Wu >Assignee: Chun Chang >Priority: Critical > Labels: ready-to-commit > Fix For: 1.11.0 > > > When connecting to drillbit with Zookeeper, occasionally the C++ client would > crash without any reason. > A further look into the code revealed that during this call > rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, &drillbitsVector); > zoo_get_children returns ZOK (0) but drillbitsVector.count is 0. > This causes drillbits to stay empty and thus > causes err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint); to > crash > Size check should be done to prevent this from happening -- This message was sent by Atlassian JIRA (v6.3.15#6346)
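The fix the issue asks for is a size check: zoo_get_children can legitimately return ZOK with zero children (the znode exists but no drillbit has registered yet), and indexing `drillbits[drillbits.size() - 1]` without a guard then crashes. The guard is sketched below in Java terms, with a plain list standing in for the C++ client's String_vector; the method name is illustrative, not the client's API:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DrillbitDiscovery {
    // Guard before indexing the last element: surface a clear error when the
    // ZooKeeper path holds no registered drillbits instead of crashing.
    static String lastRegisteredDrillbit(List<String> drillbits) {
        if (drillbits == null || drillbits.isEmpty()) {
            throw new IllegalStateException(
                "No drillbits registered under the ZooKeeper path");
        }
        return drillbits.get(drillbits.size() - 1);
    }

    public static void main(String[] args) {
        System.out.println(lastRegisteredDrillbit(Arrays.asList("bit-1", "bit-2")));
        try {
            lastRegisteredDrillbit(Collections.emptyList());
        } catch (IllegalStateException e) {
            System.out.println("guarded: " + e.getMessage());
        }
    }
}
```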
[jira] [Commented] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK
[ https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952901#comment-15952901 ] Robert Hou commented on DRILL-5316: --- I tried to enable logging, but it did not seem to work. I set logging to LOG_TRACE and specified a directory. I am using v1.2.1, which runs on Windows. > C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children > completed with ZOK > > > Key: DRILL-5316 > URL: https://issues.apache.org/jira/browse/DRILL-5316 > Project: Apache Drill > Issue Type: Bug > Components: Client - C++ >Reporter: Rob Wu >Assignee: Chun Chang >Priority: Critical > Labels: ready-to-commit > Fix For: 1.11.0 > > > When connecting to drillbit with Zookeeper, occasionally the C++ client would > crash without any reason. > A further look into the code revealed that during this call > rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, &drillbitsVector); > zoo_get_children returns ZOK (0) but drillbitsVector.count is 0. > This causes drillbits to stay empty and thus > causes err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint); to > crash > Size check should be done to prevent this from happening -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK
[ https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952900#comment-15952900 ] Robert Hou edited comment on DRILL-5316 at 4/3/17 12:32 AM: How can I verify that the C++ Client crashed? This is what I have done so far. 1) Set cluster-id in drill-override.conf file to {code} "-alakjdslkfjlskjdflkasdoiweuroiweurHKJQAIUW-__ksldjfYIUEWUIkljdsfoIOUOIUjlkjklj-_" {code} 2) reboot Drill 3) Set cluster-id to "rhou1_com.drillbits.com.org" 4) reboot Drill 5) From Windows, connect using Drill Explorer. Set cluster-id to /drill/drillbits1 Drill Explorer pops a window that says: {code} An error occurred while communicating with the data source. ERROR [HY000] [MapR][Drill] (10) Failure occurred while trying to connect to zk=10.10.100.186:5181/drill/drillbits1 ERROR [HY000] [MapR][Drill] (10) Failure occurred while trying to connect to zk=10.10.100.186:5181/drill/drillbits1 {code} was (Author: rhou): How can I verify that the C++ Client crashed? This is what I have done so far. 1) Set cluster-id in drill-override.conf file to "-alakjdslkfjlskjdflkasdoiweuroiweurHKJQAIUW-__ksldjfYIUEWUIkljdsfoIOUOIUjlkjklj-_" 2) reboot Drill 3) Set cluster-id to "rhou1_com.drillbits.com.org" 4) reboot Drill 5) From Windows, connect using Drill Explorer. Set cluster-id to /drill/drillbits1 Drill Explorer pops a window that says: {code} An error occurred while communicating with the data source. 
ERROR [HY000] [MapR][Drill] (10) Failure occurred while trying to connect to zk=10.10.100.186:5181/drill/drillbits1 ERROR [HY000] [MapR][Drill] (10) Failure occurred while trying to connect to zk=10.10.100.186:5181/drill/drillbits1 {code} > C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children > completed with ZOK > > > Key: DRILL-5316 > URL: https://issues.apache.org/jira/browse/DRILL-5316 > Project: Apache Drill > Issue Type: Bug > Components: Client - C++ >Reporter: Rob Wu >Assignee: Chun Chang >Priority: Critical > Labels: ready-to-commit > Fix For: 1.11.0 > > > When connecting to drillbit with Zookeeper, occasionally the C++ client would > crash without any reason. > A further look into the code revealed that during this call > rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, &drillbitsVector); > zoo_get_children returns ZOK (0) but drillbitsVector.count is 0. > This causes drillbits to stay empty and thus > causes err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint); to > crash > Size check should be done to prevent this from happening -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK
[ https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952900#comment-15952900 ] Robert Hou commented on DRILL-5316: --- How can I verify that the C++ Client crashed? This is what I have done so far. 1) Set cluster-id in drill-override.conf file to "-alakjdslkfjlskjdflkasdoiweuroiweurHKJQAIUW-__ksldjfYIUEWUIkljdsfoIOUOIUjlkjklj-_" 2) reboot Drill 3) Set cluster-id to "rhou1_com.drillbits.com.org" 4) reboot Drill 5) From Windows, connect using Drill Explorer. Set cluster-id to /drill/drillbits1 Drill Explorer pops a window that says: {code} An error occurred while communicating with the data source. ERROR [HY000] [MapR][Drill] (10) Failure occurred while trying to connect to zk=10.10.100.186:5181/drill/drillbits1 ERROR [HY000] [MapR][Drill] (10) Failure occurred while trying to connect to zk=10.10.100.186:5181/drill/drillbits1 {code} > C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children > completed with ZOK > > > Key: DRILL-5316 > URL: https://issues.apache.org/jira/browse/DRILL-5316 > Project: Apache Drill > Issue Type: Bug > Components: Client - C++ >Reporter: Rob Wu >Assignee: Chun Chang >Priority: Critical > Labels: ready-to-commit > Fix For: 1.11.0 > > > When connecting to drillbit with Zookeeper, occasionally the C++ client would > crash without any reason. > A further look into the code revealed that during this call > rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, &drillbitsVector); > zoo_get_children returns ZOK (0) but drillbitsVector.count is 0. > This causes drillbits to stay empty and thus > causes err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint); to > crash > Size check should be done to prevent this from happening -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5659) C++ Client (master) behavior is unstable resulting incorrect result or exception in API calls
[ https://issues.apache.org/jira/browse/DRILL-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082688#comment-16082688 ] Robert Hou commented on DRILL-5659: --- Hi Rob, Is this a blocking issue for you? How likely is it that a customer will encounter this issue? > C++ Client (master) behavior is unstable resulting incorrect result or > exception in API calls > - > > Key: DRILL-5659 > URL: https://issues.apache.org/jira/browse/DRILL-5659 > Project: Apache Drill > Issue Type: Bug >Reporter: Rob Wu > Fix For: 1.11.0 > > Attachments: 1-10cppClient-1.10.0Drillbit-hive.log, > 1-10cppClient-1.10.0Drillbit-metadata and catalog test.log, > 1-10cppClient-1.9.0Drillbit-dfs.log, 1-10cppClient-1.9.0Drillbit-metadata and > catalog test.log, 1-11cppClient-1.10.0Drillbit-hive.log, > 1-11cppClient-1.10.0Drillbit-metadata and catalog test.log, > 1-11cppClient-1.9.0Drillbit-dfs.log, 1-11cppClient-1.9.0Drillbit-metadata and > catalog test.log > > > I recently compiled the C++ client (on windows) from master and tested > against a 1.9.0 drillbit. The client's behavior does not meet the stable > release requirement. > Some API functionalities are broken and should be investigated. > Most noticeable is the getColumns(...) call. It will throw an exception with > "Cannot decode getcolumns results" when the number of rows (column records) > exceeds a certain number. > I also noticed that: during query execution + data retrieval, if the table is > large enough, the result coming back will start to become NULL or empty. > I will see if I can generate some drillclient logs to put in the attachment. > I will also compile and test on other platforms. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100468#comment-16100468 ] Robert Hou commented on DRILL-4281: --- ODBC is verified. There is a new configuration parameter, DelegationUID. This can be set in the odbc.ini file. > Drill should support inbound impersonation > -- > > Key: DRILL-4281 > URL: https://issues.apache.org/jira/browse/DRILL-4281 > Project: Apache Drill > Issue Type: Improvement >Reporter: Keys Botzum >Assignee: Sudheesh Katkam > Labels: doc-impacting, security > Fix For: 1.6.0 > > > Today Drill supports impersonation *to* external sources. For example I can > authenticate to Drill as myself and then Drill will access HDFS using > impersonation > In many scenarios we also need impersonation to Drill. For example I might > use some front end tool (such as Tableau) and authenticate to it as myself. > That tool (server version) then needs to access Drill to perform queries and > I want those queries to run as myself, not as the Tableau user. While in > theory the intermediate tool could store the userid & password for every user > to the Drill this isn't a scalable or very secure solution. > Note that HS2 today does support inbound impersonation as described here: > https://issues.apache.org/jira/browse/HIVE-5155 > The above is not the best approach as it is tied to the connection object > which is very coarse grained and potentially expensive. It would be better if > there was a call on the ODBC/JDBC driver to switch the identity on a existing > connection. Most modern SQL databases (Oracle, DB2) support such function. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK
[ https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102504#comment-16102504 ] Robert Hou commented on DRILL-5316: --- I tried to reproduce and verify this problem, with help from @robwu15, but I was not able to. I will close this unless we find another way to reproduce it. > C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children > completed with ZOK > > > Key: DRILL-5316 > URL: https://issues.apache.org/jira/browse/DRILL-5316 > Project: Apache Drill > Issue Type: Bug > Components: Client - C++ >Reporter: Rob Wu >Assignee: Robert Hou >Priority: Critical > Labels: ready-to-commit > Fix For: 1.11.0 > > > When connecting to drillbit with Zookeeper, occasionally the C++ client would > crash without any reason. > A further look into the code revealed that during this call > rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, &drillbitsVector); > zoo_get_children returns ZOK (0) but drillbitsVector.count is 0. > This causes drillbits to stay empty and thus > causes err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint); to > crash > Size check should be done to prevent this from happening -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK
[ https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102504#comment-16102504 ] Robert Hou edited comment on DRILL-5316 at 7/27/17 12:14 AM: - I tried to reproduce and verify this problem, with help from [~robertw], but I was not able to. I will close this unless we find another way to reproduce it. was (Author: rhou): I tried to reproduce and verify this problem, with help from @robwu15, but I was not able to. I will close this unless we find another way to reproduce it. > C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children > completed with ZOK > > > Key: DRILL-5316 > URL: https://issues.apache.org/jira/browse/DRILL-5316 > Project: Apache Drill > Issue Type: Bug > Components: Client - C++ >Reporter: Rob Wu >Assignee: Robert Hou >Priority: Critical > Labels: ready-to-commit > Fix For: 1.11.0 > > > When connecting to drillbit with Zookeeper, occasionally the C++ client would > crash without any reason. > A further look into the code revealed that during this call > rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, &drillbitsVector); > zoo_get_children returns ZOK (0) but drillbitsVector.count is 0. > This causes drillbits to stay empty and thus > causes err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint); to > crash > Size check should be done to prevent this from happening -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DRILL-5732) Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill.
[ https://issues.apache.org/jira/browse/DRILL-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5732: -- Attachment: drillbit.log > Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill. > - > > Key: DRILL-5732 > URL: https://issues.apache.org/jira/browse/DRILL-5732 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Robert Hou >Assignee: Paul Rogers > Attachments: drillbit.log > > > git commit id: > {noformat} > | 1.12.0-SNAPSHOT | e9065b55ea560e7f737d6fcb4948f9e945b9b14f | DRILL-5660: > Parquet metadata caching improvements | 15.08.2017 @ 09:31:00 PDT | > r...@qa-node190.qa.lab | 15.08.2017 @ 13:29:26 PDT | > {noformat} > Query is: > {noformat} > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.disable_exchanges` = true; > alter session set `planner.memory.max_query_memory_per_node` = 104857600; > alter session set `planner.width.max_per_node` = 1; > alter session set `planner.width.max_per_query` = 1; > select max(col1), max(cs_sold_date_sk), max(cs_sold_time_sk), > max(cs_ship_date_sk), max(cs_bill_customer_sk), max(cs_bill_cdemo_sk), > max(cs_bill_hdemo_sk), max(cs_bill_addr_sk), max(cs_ship_customer_sk), > max(cs_ship_cdemo_sk), max(cs_ship_hdemo_sk), max(cs_ship_addr_sk), > max(cs_call_center_sk), max(cs_catalog_page_sk), max(cs_ship_mode_sk), > min(cs_warehouse_sk), max(cs_item_sk), max(cs_promo_sk), > max(cs_order_number), max(cs_quantity), max(cs_wholesale_cost), > max(cs_list_price), max(cs_sales_price), max(cs_ext_discount_amt), > min(cs_ext_sales_price), max(cs_ext_wholesale_cost), min(cs_ext_list_price), > min(cs_ext_tax), min(cs_coupon_amt), max(cs_ext_ship_cost), max(cs_net_paid), > max(cs_net_paid_inc_tax), min(cs_net_paid_inc_ship), > min(cs_net_paid_inc_ship_tax), min(cs_net_profit), min(c_customer_sk), > min(length(c_customer_id)), max(c_current_cdemo_sk), max(c_current_hdemo_sk), > min(c_current_addr_sk), 
min(c_first_shipto_date_sk), > min(c_first_sales_date_sk), min(length(c_salutation)), > min(length(c_first_name)), min(length(c_last_name)), > min(length(c_preferred_cust_flag)), max(c_birth_day), min(c_birth_month), > min(c_birth_year), max(c_last_review_date), c_email_address from (select > cs_sold_date_sk+cs_sold_time_sk col1, * from > dfs.`/drill/testdata/resource-manager/md1362` order by c_email_address nulls > first) d where d.col1 > 2536816 and c_email_address is not null group by > c_email_address; > ALTER SESSION SET `exec.sort.disable_managed` = true; > alter session set `planner.disable_exchanges` = false; > alter session set `planner.memory.max_query_memory_per_node` = 2147483648; > alter session set `planner.width.max_per_node` = 17; > alter session set `planner.width.max_per_query` = 1000; > {noformat} > Here is the stack trace: > {noformat} > 2017-08-18 13:15:27,052 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG > o.a.d.e.t.g.SingleBatchSorterGen27 - Took 6445 us to sort 9039 records > 2017-08-18 13:15:27,420 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG > o.a.d.e.p.i.xsort.ExternalSortBatch - Copier allocator current allocation 0 > 2017-08-18 13:15:27,420 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG > o.a.d.e.p.i.xsort.ExternalSortBatch - mergeAndSpill: starting total size in > memory = 71964288 > 2017-08-18 13:15:27,421 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] INFO > o.a.d.e.p.i.xsort.ExternalSortBatch - User Error Occurred: One or more nodes > ran out of memory while executing the query. > org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more > nodes ran out of memory while executing the query. > Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill. 
> batchGroups.size 1 > spilledBatchGroups.size 0 > allocated memory 71964288 > allocator limit 52428800 > [Error Id: 7b248f12-2b31-4013-86b6-92e6c842db48 ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550) > ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.newSV2(ExternalSortBatch.java:637) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:379) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:225) >
[jira] [Created] (DRILL-5732) Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill.
Robert Hou created DRILL-5732: - Summary: Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill. Key: DRILL-5732 URL: https://issues.apache.org/jira/browse/DRILL-5732 Project: Apache Drill Issue Type: Bug Affects Versions: 1.10.0 Reporter: Robert Hou Assignee: Paul Rogers git commit id: {noformat} | 1.12.0-SNAPSHOT | e9065b55ea560e7f737d6fcb4948f9e945b9b14f | DRILL-5660: Parquet metadata caching improvements | 15.08.2017 @ 09:31:00 PDT | r...@qa-node190.qa.lab | 15.08.2017 @ 13:29:26 PDT | {noformat} Query is: {noformat} ALTER SESSION SET `exec.sort.disable_managed` = false; alter session set `planner.disable_exchanges` = true; alter session set `planner.memory.max_query_memory_per_node` = 104857600; alter session set `planner.width.max_per_node` = 1; alter session set `planner.width.max_per_query` = 1; select max(col1), max(cs_sold_date_sk), max(cs_sold_time_sk), max(cs_ship_date_sk), max(cs_bill_customer_sk), max(cs_bill_cdemo_sk), max(cs_bill_hdemo_sk), max(cs_bill_addr_sk), max(cs_ship_customer_sk), max(cs_ship_cdemo_sk), max(cs_ship_hdemo_sk), max(cs_ship_addr_sk), max(cs_call_center_sk), max(cs_catalog_page_sk), max(cs_ship_mode_sk), min(cs_warehouse_sk), max(cs_item_sk), max(cs_promo_sk), max(cs_order_number), max(cs_quantity), max(cs_wholesale_cost), max(cs_list_price), max(cs_sales_price), max(cs_ext_discount_amt), min(cs_ext_sales_price), max(cs_ext_wholesale_cost), min(cs_ext_list_price), min(cs_ext_tax), min(cs_coupon_amt), max(cs_ext_ship_cost), max(cs_net_paid), max(cs_net_paid_inc_tax), min(cs_net_paid_inc_ship), min(cs_net_paid_inc_ship_tax), min(cs_net_profit), min(c_customer_sk), min(length(c_customer_id)), max(c_current_cdemo_sk), max(c_current_hdemo_sk), min(c_current_addr_sk), min(c_first_shipto_date_sk), min(c_first_sales_date_sk), min(length(c_salutation)), min(length(c_first_name)), min(length(c_last_name)), min(length(c_preferred_cust_flag)), max(c_birth_day), min(c_birth_month), min(c_birth_year), 
max(c_last_review_date), c_email_address from (select cs_sold_date_sk+cs_sold_time_sk col1, * from dfs.`/drill/testdata/resource-manager/md1362` order by c_email_address nulls first) d where d.col1 > 2536816 and c_email_address is not null group by c_email_address; ALTER SESSION SET `exec.sort.disable_managed` = true; alter session set `planner.disable_exchanges` = false; alter session set `planner.memory.max_query_memory_per_node` = 2147483648; alter session set `planner.width.max_per_node` = 17; alter session set `planner.width.max_per_query` = 1000; {noformat} Here is the stack trace: {noformat} 2017-08-18 13:15:27,052 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG o.a.d.e.t.g.SingleBatchSorterGen27 - Took 6445 us to sort 9039 records 2017-08-18 13:15:27,420 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG o.a.d.e.p.i.xsort.ExternalSortBatch - Copier allocator current allocation 0 2017-08-18 13:15:27,420 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG o.a.d.e.p.i.xsort.ExternalSortBatch - mergeAndSpill: starting total size in memory = 71964288 2017-08-18 13:15:27,421 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] INFO o.a.d.e.p.i.xsort.ExternalSortBatch - User Error Occurred: One or more nodes ran out of memory while executing the query. org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query. Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill. 
batchGroups.size 1 spilledBatchGroups.size 0 allocated memory 71964288 allocator limit 52428800 [Error Id: 7b248f12-2b31-4013-86b6-92e6c842db48 ] at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550) ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.newSV2(ExternalSortBatch.java:637) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:379) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:225) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] at
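The numbers in the error make the failure mode concrete: allocated memory (71,964,288 bytes) already exceeds the allocator limit (52,428,800 bytes), yet only one in-memory batch group remains, so spilling cannot free enough to admit the sv2 allocation for the next 9039-record batch. A hedged sketch of that decision, with illustrative names and an assumed two bytes per sv2 record (not Drill's actual ExternalSortBatch code):

```java
public class SortSpillGuard {
    // Three-way outcome for an incoming batch: allocate the sv2 if it fits,
    // spill if there is more than one in-memory batch group to free, else
    // fail with the RESOURCE ERROR reported in the log.
    static String admitBatch(long allocatedBytes, long limitBytes,
                             long sv2Bytes, int inMemoryBatchGroups) {
        if (allocatedBytes + sv2Bytes <= limitBytes) {
            return "ALLOCATE";
        }
        if (inMemoryBatchGroups > 1) {
            return "SPILL";  // merge-and-spill frees memory, then retry
        }
        return "OOM";        // batchGroups.size 1, nothing left to spill
    }

    public static void main(String[] args) {
        // Values from the drillbit.log excerpt above.
        System.out.println(admitBatch(71964288L, 52428800L, 2L * 9039, 1));
    }
}
```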
[jira] [Updated] (DRILL-5732) Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill.
[ https://issues.apache.org/jira/browse/DRILL-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5732: -- Attachment: 2668b522-5833-8fd2-0b6d-e685197f0ae3.sys.drill > Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill. > - > > Key: DRILL-5732 > URL: https://issues.apache.org/jira/browse/DRILL-5732 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Robert Hou >Assignee: Paul Rogers > Attachments: 2668b522-5833-8fd2-0b6d-e685197f0ae3.sys.drill, > drillbit.log > > > git commit id: > {noformat} > | 1.12.0-SNAPSHOT | e9065b55ea560e7f737d6fcb4948f9e945b9b14f | DRILL-5660: > Parquet metadata caching improvements | 15.08.2017 @ 09:31:00 PDT | > r...@qa-node190.qa.lab | 15.08.2017 @ 13:29:26 PDT | > {noformat} > Query is: > {noformat} > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.disable_exchanges` = true; > alter session set `planner.memory.max_query_memory_per_node` = 104857600; > alter session set `planner.width.max_per_node` = 1; > alter session set `planner.width.max_per_query` = 1; > select max(col1), max(cs_sold_date_sk), max(cs_sold_time_sk), > max(cs_ship_date_sk), max(cs_bill_customer_sk), max(cs_bill_cdemo_sk), > max(cs_bill_hdemo_sk), max(cs_bill_addr_sk), max(cs_ship_customer_sk), > max(cs_ship_cdemo_sk), max(cs_ship_hdemo_sk), max(cs_ship_addr_sk), > max(cs_call_center_sk), max(cs_catalog_page_sk), max(cs_ship_mode_sk), > min(cs_warehouse_sk), max(cs_item_sk), max(cs_promo_sk), > max(cs_order_number), max(cs_quantity), max(cs_wholesale_cost), > max(cs_list_price), max(cs_sales_price), max(cs_ext_discount_amt), > min(cs_ext_sales_price), max(cs_ext_wholesale_cost), min(cs_ext_list_price), > min(cs_ext_tax), min(cs_coupon_amt), max(cs_ext_ship_cost), max(cs_net_paid), > max(cs_net_paid_inc_tax), min(cs_net_paid_inc_ship), > min(cs_net_paid_inc_ship_tax), min(cs_net_profit), min(c_customer_sk), > min(length(c_customer_id)), 
max(c_current_cdemo_sk), max(c_current_hdemo_sk), > min(c_current_addr_sk), min(c_first_shipto_date_sk), > min(c_first_sales_date_sk), min(length(c_salutation)), > min(length(c_first_name)), min(length(c_last_name)), > min(length(c_preferred_cust_flag)), max(c_birth_day), min(c_birth_month), > min(c_birth_year), max(c_last_review_date), c_email_address from (select > cs_sold_date_sk+cs_sold_time_sk col1, * from > dfs.`/drill/testdata/resource-manager/md1362` order by c_email_address nulls > first) d where d.col1 > 2536816 and c_email_address is not null group by > c_email_address; > ALTER SESSION SET `exec.sort.disable_managed` = true; > alter session set `planner.disable_exchanges` = false; > alter session set `planner.memory.max_query_memory_per_node` = 2147483648; > alter session set `planner.width.max_per_node` = 17; > alter session set `planner.width.max_per_query` = 1000; > {noformat} > Here is the stack trace: > {noformat} > 2017-08-18 13:15:27,052 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG > o.a.d.e.t.g.SingleBatchSorterGen27 - Took 6445 us to sort 9039 records > 2017-08-18 13:15:27,420 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG > o.a.d.e.p.i.xsort.ExternalSortBatch - Copier allocator current allocation 0 > 2017-08-18 13:15:27,420 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG > o.a.d.e.p.i.xsort.ExternalSortBatch - mergeAndSpill: starting total size in > memory = 71964288 > 2017-08-18 13:15:27,421 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] INFO > o.a.d.e.p.i.xsort.ExternalSortBatch - User Error Occurred: One or more nodes > ran out of memory while executing the query. > org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more > nodes ran out of memory while executing the query. > Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill. 
> batchGroups.size 1 > spilledBatchGroups.size 0 > allocated memory 71964288 > allocator limit 52428800 > [Error Id: 7b248f12-2b31-4013-86b6-92e6c842db48 ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550) > ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.newSV2(ExternalSortBatch.java:637) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:379) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) > [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT] > at >
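The failure above (allocated memory 71964288 against an allocator limit of 52428800, with batchGroups.size 1) can be modelled with a short sketch. This is a hypothetical illustration of the decision the external sort faces, not Drill's actual ExternalSortBatch code: an sv2 (selection vector) needs 2 bytes per record, the operator is already past its allocator limit, and with only one batch group in memory there is nothing useful left to spill.

```python
SV2_BYTES_PER_RECORD = 2  # one 2-byte index per record in a selection vector

def try_allocate_sv2(records, allocated, limit, batch_groups):
    """Return True if the sv2 fits, False if the caller should spill first.

    Raise MemoryError when over the limit with nothing left to spill --
    the fatal case reported in the log above. (Hypothetical model.)
    """
    needed = records * SV2_BYTES_PER_RECORD
    if allocated + needed <= limit:
        return True
    if batch_groups <= 1:
        # Spilling the last in-memory batch group cannot help.
        raise MemoryError(
            "Unable to allocate sv2 for %d records, "
            "and not enough batchGroups to spill." % records)
    return False  # caller should spill a batch group and retry
```

With the values from the log (9039 records, 71964288 bytes allocated, a 52428800-byte limit, and a single batch group) the function raises, matching the RESOURCE ERROR above.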
[jira] [Closed] (DRILL-5522) OOM during the merge and spill process of the managed external sort
[ https://issues.apache.org/jira/browse/DRILL-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5522. - This has been verified. > OOM during the merge and spill process of the managed external sort > --- > > Key: DRILL-5522 > URL: https://issues.apache.org/jira/browse/DRILL-5522 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Attachments: 26e334aa-1afa-753f-3afe-862f76b80c18.sys.drill, > drillbit.log, drillbit.out, drill-env.sh > > > git.commit.id.abbrev=1e0a14c > The below query fails with an OOM > {code} > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.memory.max_query_memory_per_node` = 1552428800; > create table dfs.drillTestDir.xsort_ctas3_multiple partition by (type, aCol) > as select type, rptds, rms, s3.rms.a aCol, uid from ( > select * from ( > select s1.type type, flatten(s1.rms.rptd) rptds, s1.rms, s1.uid > from ( > select d.type type, d.uid uid, flatten(d.map.rm) rms from > dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid > ) s1 > ) s2 > order by s2.rms.mapid, s2.rptds.a > ) s3; > {code} > Stack trace > {code} > 2017-05-17 15:15:35,027 [26e334aa-1afa-753f-3afe-862f76b80c18:frag:4:2] INFO > o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred: One or more nodes > ran out of memory while executing the query. (Unable to allocate buffer of > size 2097152 due to memory limit. Current allocation: 29229064) > org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more > nodes ran out of memory while executing the query. > Unable to allocate buffer of size 2097152 due to memory limit. 
Current > allocation: 29229064 > [Error Id: 619e2e34-704c-4964-a354-1348fb33ce8a ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) > ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:244) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_111] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_111] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111] > Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to > allocate buffer of size 2097152 due to memory limit. Current allocation: > 29229064 > at > org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:220) > ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:195) > ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.vector.BigIntVector.reAlloc(BigIntVector.java:212) > ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.vector.BigIntVector.copyFromSafe(BigIntVector.java:324) > ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.vector.NullableBigIntVector.copyFromSafe(NullableBigIntVector.java:367) > ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.vector.NullableBigIntVector$TransferImpl.copyValueSafe(NullableBigIntVector.java:328) > ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.copyValueSafe(RepeatedMapVector.java:360) > ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > 
org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.copyValueSafe(MapVector.java:220) > ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.vector.complex.MapVector.copyFromSafe(MapVector.java:82) > ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.test.generated.PriorityQueueCopierGen49.doCopy(PriorityQueueCopierTemplate.java:34) > ~[na:na] > at > org.apache.drill.exec.test.generated.PriorityQueueCopierGen49.next(PriorityQueueCopierTemplate.java:76) > ~[na:na] > at > org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder$BatchMerger.next(CopierHolder.java:234) > ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns(ExternalSortBatch.java:1214) >
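The "Unable to allocate buffer of size 2097152 due to memory limit. Current allocation: 29229064" error above comes from the allocator refusing a request that would push it past its limit. A minimal sketch of that check follows; it is an illustration, not the code in BaseAllocator, and the 30408704-byte (29 MB) limit used in the example is an assumed value, since the log does not state the copier allocator's limit.

```python
class OutOfMemoryException(Exception):
    pass

def allocate(current_allocation, requested, limit):
    """Grant a buffer only if it keeps the allocator under its limit.

    Sketch of limit enforcement; returns the new allocation total.
    """
    if current_allocation + requested > limit:
        raise OutOfMemoryException(
            "Unable to allocate buffer of size %d due to memory limit. "
            "Current allocation: %d" % (requested, current_allocation))
    return current_allocation + requested

# The log shows a 2097152-byte request refused at current allocation
# 29229064; with the assumed 30408704-byte limit the same refusal occurs.
```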
[jira] [Resolved] (DRILL-5522) OOM during the merge and spill process of the managed external sort
[ https://issues.apache.org/jira/browse/DRILL-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou resolved DRILL-5522. --- Resolution: Fixed This has been resolved. > OOM during the merge and spill process of the managed external sort > --- > > Key: DRILL-5522 > URL: https://issues.apache.org/jira/browse/DRILL-5522 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Attachments: 26e334aa-1afa-753f-3afe-862f76b80c18.sys.drill, > drillbit.log, drillbit.out, drill-env.sh

[jira] [Updated] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector
[ https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5670: -- Attachment: drillbit.log.sort > Varchar vector throws an assertion error when allocating a new vector > - > > Key: DRILL-5670 > URL: https://issues.apache.org/jira/browse/DRILL-5670 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.11.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 26498995-bbad-83bc-618f-914c37a84e1f.sys.drill, > 26555749-4d36-10d2-6faf-e403db40c370.sys.drill, > 266290f3-5fdc-5873-7372-e9ee053bf867.sys.drill, > 269969ca-8d4d-073a-d916-9031e3d3fbf0.sys.drill, drillbit.log, drillbit.log, > drillbit.log, drillbit.log, drillbit.log, drillbit.log.sort, drillbit.out, > drill-override.conf > > > I am running this test on a private branch of [paul's > repository|https://github.com/paul-rogers/drill]. Below is the commit info > {code} > git.commit.id.abbrev=d86e16c > git.commit.user.email=prog...@maprtech.com > git.commit.message.full=DRILL-5601\: Rollup of external sort fixes an > improvements\n\n- DRILL-5513\: Managed External Sort \: OOM error during the > merge phase\n- DRILL-5519\: Sort fails to spill and results in an OOM\n- > DRILL-5522\: OOM during the merge and spill process of the managed external > sort\n- DRILL-5594\: Excessive buffer reallocations during merge phase of > external sort\n- DRILL-5597\: Incorrect "bits" vector allocation in nullable > vectors allocateNew()\n- DRILL-5602\: Repeated List Vector fails to > initialize the offset vector\n\nAll of the bugs have to do with handling > low-memory conditions, and with\ncorrectly estimating the sizes of vectors, > even when those vectors come\nfrom the spill file or from an exchange. 
Hence, > the changes for all of\nthe above issues are interrelated.\n > git.commit.id=d86e16c551e7d3553f2cde748a739b1c5a7a7659 > git.commit.message.short=DRILL-5601\: Rollup of external sort fixes an > improvements > git.commit.user.name=Paul Rogers > git.build.user.name=Rahul Challapalli > git.commit.id.describe=0.9.0-1078-gd86e16c > git.build.user.email=challapallira...@gmail.com > git.branch=d86e16c551e7d3553f2cde748a739b1c5a7a7659 > git.commit.time=05.07.2017 @ 20\:34\:39 PDT > git.build.time=12.07.2017 @ 14\:27\:03 PDT > git.remote.origin.url=g...@github.com\:paul-rogers/drill.git > {code} > Below query fails with an Assertion Error > {code} > 0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET > `exec.sort.disable_managed` = false; > +---+-+ > | ok | summary | > +---+-+ > | true | exec.sort.disable_managed updated. | > +---+-+ > 1 row selected (1.044 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> alter session set > `planner.memory.max_query_memory_per_node` = 482344960; > +---++ > | ok | summary | > +---++ > | true | planner.memory.max_query_memory_per_node updated. | > +---++ > 1 row selected (0.372 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> alter session set > `planner.width.max_per_node` = 1; > +---+--+ > | ok | summary| > +---+--+ > | true | planner.width.max_per_node updated. | > +---+--+ > 1 row selected (0.292 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> alter session set > `planner.width.max_per_query` = 1; > +---+---+ > | ok |summary| > +---+---+ > | true | planner.width.max_per_query updated. 
| > +---+---+ > 1 row selected (0.25 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select * from > dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by > columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50], > > columns[454],columns[413],columns[940],columns[834],columns[73],columns[140],columns[104],columns[],columns[30],columns[2420],columns[1520], > columns[1410], > columns[1110],columns[1290],columns[2380],columns[705],columns[45],columns[1054],columns[2430],columns[420],columns[404],columns[3350], > >
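DRILL-5670 concerns a VarChar vector assertion during allocation, and the query above sorts on thousands of `columns[...]` entries, each backed by its own VarChar vector. A rough sizing sketch (the function name and 4-byte offset width are illustrative assumptions, not Drill's vector code) shows why such vectors get large: each needs a data buffer plus one offset entry per value.

```python
OFFSET_WIDTH = 4  # assumed: one 4-byte offset per value, plus a trailing one

def varchar_allocation_bytes(value_count, avg_width):
    """Rough size of a VarChar vector: data buffer plus offset buffer."""
    data_bytes = value_count * avg_width
    offset_bytes = (value_count + 1) * OFFSET_WIDTH
    return data_bytes + offset_bytes
```

For example, a batch of 4096 values averaging 250 bytes needs roughly 1 MB of data plus 16 KB of offsets per vector; multiplied across thousands of sort keys, allocations of this kind add up quickly.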
[jira] [Closed] (DRILL-5445) Assertion Error in Managed External Sort when dealing with repeated maps
[ https://issues.apache.org/jira/browse/DRILL-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5445. - This has been verified. > Assertion Error in Managed External Sort when dealing with repeated maps > > > Key: DRILL-5445 > URL: https://issues.apache.org/jira/browse/DRILL-5445 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 27004a3c-c53d-52d1-c7ed-4beb563447f9.sys.drill, > drillbit.log > > > git.commit.id.abbrev=3e8b01d > The below query fails with an Assertion Error (I am running with assertions > enabled) > {code} > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.width.max_per_node` = 1; > alter session set `planner.disable_exchanges` = true; > alter session set `planner.width.max_per_query` = 1; > alter session set `planner.memory.max_query_memory_per_node` = 152428800; > select count(*) from ( > select * from ( > select event_info.uid, transaction_info.trans_id, event_info.event.evnt_id > from ( > select userinfo.transaction.trans_id trans_id, > max(userinfo.event.event_time) max_event_time > from ( > select uid, flatten(events) event, flatten(transactions) transaction > from dfs.`/drill/testdata/resource-manager/nested-large.json` > ) userinfo > where userinfo.transaction.trans_time >= userinfo.event.event_time > group by userinfo.transaction.trans_id > ) transaction_info > inner join > ( > select uid, flatten(events) event > from dfs.`/drill/testdata/resource-manager/nested-large.json` > ) event_info > on transaction_info.max_event_time = event_info.event.event_time) d order by > features[0].type) d1 where d1.uid < -1; > {code} > Below is the error from the logs > {code} > [Error Id: 26983344-dee3-4a33-8508-ad125f01fee6 on qa-node190.qa.lab:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) > 
~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:295) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:264) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_111] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_111] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111] > Caused by: java.lang.RuntimeException: java.lang.AssertionError > at > org.apache.drill.common.DeferredException.addThrowable(DeferredException.java:101) > ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.fail(FragmentExecutor.java:409) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > ... 
4 common frames omitted > Caused by: java.lang.AssertionError: null > at > org.apache.drill.exec.vector.complex.RepeatedMapVector.load(RepeatedMapVector.java:444) > ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.cache.VectorAccessibleSerializable.readFromStream(VectorAccessibleSerializable.java:118) > ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.BatchGroup$SpilledRun.getBatch(BatchGroup.java:222) > ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.BatchGroup$SpilledRun.getNextIndex(BatchGroup.java:196) > ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.test.generated.PriorityQueueCopierGen23.setup(PriorityQueueCopierTemplate.java:60) > ~[na:na] > at > org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder.createCopier(CopierHolder.java:116) > ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder.access$200(CopierHolder.java:45) > ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at >
[jira] [Closed] (DRILL-5465) Managed external sort results in an OOM
[ https://issues.apache.org/jira/browse/DRILL-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5465. - This has been verified. > Managed external sort results in an OOM > --- > > Key: DRILL-5465 > URL: https://issues.apache.org/jira/browse/DRILL-5465 > Project: Apache Drill > Issue Type: Bug >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 26f7368e-21a1-6513-74ea-a178ae1e50f8.sys.drill, > createViewsParquet.sql, drillbit.log > > > git.commit.id.abbrev=1e0a14c > The below query fails with an OOM on top of Tpcds SF1 parquet data. Since the > sort already spilled once, I assume there is sufficient memory to handle the > spill/merge batches. The view definition file is attached and the data can be > downloaded from [1] > {code} > use dfs.tpcds_sf1_parquet_views; > alter session set `planner.enable_decimal_data_type` = true; > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.width.max_per_node` = 1; > alter session set `planner.disable_exchanges` = true; > alter session set `planner.width.max_per_query` = 1; > alter session set `planner.memory.max_query_memory_per_node` = 200435456; > alter session set `planner.enable_hashjoin` = false; > SELECT dt.d_year, >item.i_brand_id brand_id, >item.i_brand brand, >Sum(ss_ext_discount_amt) sum_agg > FROM date_dim dt, >store_sales, >item > WHERE dt.d_date_sk = store_sales.ss_sold_date_sk >AND store_sales.ss_item_sk = item.i_item_sk >AND item.i_manufact_id = 427 >AND dt.d_moy = 11 > GROUP BY dt.d_year, > item.i_brand, > item.i_brand_id > ORDER BY dt.d_year, > sum_agg DESC, > brand_id; > {code} > Exception from the logs > {code} > [Error Id: 676ff6ad-829d-4920-9d4f-5132601d27b4 ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) > ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > 
org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:617) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:425) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.RecordIterator.nextBatch(RecordIterator.java:99) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.RecordIterator.next(RecordIterator.java:185) > 
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.RecordIterator.prepare(RecordIterator.java:169) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.join.JoinStatus.prepare(JoinStatus.java:87) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.join.MergeJoinBatch.innerNext(MergeJoinBatch.java:160) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) >
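Several of these reports tune `planner.memory.max_query_memory_per_node` and `planner.width.max_per_node` to constrain the sort. As a rough approximation (the exact division Drill applies may differ; this sketch is illustrative only), the per-node query memory is split across the sort operators in the plan and the fragment width, which is why a 200 MB budget with several sorts leaves each sort comparatively little room:

```python
def per_sort_memory(max_query_memory_per_node, width_per_node, sorts_in_plan):
    """Approximate memory each sort instance receives (illustrative formula)."""
    return max_query_memory_per_node // max(1, width_per_node * sorts_in_plan)

# With the DRILL-5465 settings (200435456 bytes, width 1) and, say, three
# sorts in the plan, each sort would get about 64 MB.
```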
[jira] [Closed] (DRILL-5253) External sort fails with OOM error (Fails to allocate sv2)
[ https://issues.apache.org/jira/browse/DRILL-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5253. - This has been verified. > External sort fails with OOM error (Fails to allocate sv2) > -- > > Key: DRILL-5253 > URL: https://issues.apache.org/jira/browse/DRILL-5253 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Relational Operators >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 2762f36d-a2e7-5582-922d-3c4626be18c0.sys.drill > > > git.commit.id.abbrev=2af709f > The data set used in the below query has the same value for every column in > every row. The query fails with an OOM as it exceeds the allocated memory > {code} > alter session set `planner.width.max_per_node` = 1; > alter session set `planner.memory.max_query_memory_per_node` = 104857600; > select count(*) from (select * from identical order by col1, col2, col3, > col4, col5, col6, col7, col8, col9, col10); > Error: RESOURCE ERROR: One or more nodes ran out of memory while executing > the query. > org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate sv2 > buffer after repeated attempts > Fragment 2:0 > [Error Id: aed43fa1-fd8b-4440-9426-0f35d055aabb on qa-node190.qa.lab:31010] > (state=,code=0) > {code} > Exception from the logs > {code} > org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more > nodes ran out of memory while executing the query. 
> org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate sv2 > buffer after repeated attempts > [Error Id: aed43fa1-fd8b-4440-9426-0f35d055aabb ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) > ~[drill-common-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:242) > [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_111] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_111] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111] > Caused by: org.apache.drill.exec.exception.OutOfMemoryException: > org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate sv2 > buffer after repeated attempts > at > org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:371) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] 
> at > org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) > ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:232) >
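The "Unable to allocate sv2 buffer after repeated attempts" message above implies a retry loop: try the allocation, spill to free memory on failure, and give up after a bounded number of attempts. A minimal sketch of that pattern, with hypothetical `try_alloc`/`spill` callables standing in for the sort's internals:

```python
def allocate_sv2_with_retries(try_alloc, spill, max_attempts=5):
    """Attempt an allocation, spilling between attempts; give up eventually.

    try_alloc() returns a buffer or None; spill() frees memory. Both are
    injected here for illustration -- not Drill's actual interfaces.
    """
    for _ in range(max_attempts):
        buf = try_alloc()
        if buf is not None:
            return buf
        spill()  # release an in-memory batch group, then retry
    raise MemoryError("Unable to allocate sv2 buffer after repeated attempts")
```

When every spill still leaves the allocator over its limit (as with the identical-valued dataset above, which compresses poorly in memory accounting), the loop exhausts its attempts and surfaces the error seen in the log.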
[jira] [Closed] (DRILL-5519) Sort fails to spill and results in an OOM
[ https://issues.apache.org/jira/browse/DRILL-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5519. - This has been verified. > Sort fails to spill and results in an OOM > - > > Key: DRILL-5519 > URL: https://issues.apache.org/jira/browse/DRILL-5519 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 26e49afc-cf45-637b-acc1-a70fee7fe7e2.sys.drill, > drillbit.log, drillbit.out, drill-env.sh > > > Setup : > {code} > git.commit.id.abbrev=1e0a14c > DRILL_MAX_DIRECT_MEMORY="32G" > DRILL_MAX_HEAP="4G" > No of nodes in the drill cluster : 1 > {code} > The below query fails with an OOM in the "in-memory sort" code, which means > the logic which decides when to spill is flawed. > {code} > 0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET > `exec.sort.disable_managed` = false; > +---+-+ > | ok | summary | > +---+-+ > | true | exec.sort.disable_managed updated. | > +---+-+ > 1 row selected (1.022 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> alter session set > `planner.memory.max_query_memory_per_node` = 334288000; > +---++ > | ok | summary | > +---++ > | true | planner.memory.max_query_memory_per_node updated. | > +---++ > 1 row selected (0.369 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select * from > (select flatten(flatten(lst_lst)) num from > dfs.`/drill/testdata/resource-manager/nested-large.json`) d order by d.num) > d1 where d1.num < -1; > Error: RESOURCE ERROR: One or more nodes ran out of memory while executing > the query. > Unable to allocate buffer of size 4194304 (rounded from 320) due to > memory limit. 
Current allocation: 16015936 > Fragment 2:2 > [Error Id: 4d9cc59a-b5d1-4ca9-9b26-69d9438f0bee on qa-node190.qa.lab:31010] > (state=,code=0) > {code} > Below is the exception from the logs > {code} > 2017-05-16 13:46:33,233 [26e49afc-cf45-637b-acc1-a70fee7fe7e2:frag:2:2] INFO > o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred: One or more nodes > ran out of memory while executing the query. (Unable to allocate buffer of > size 4194304 (rounded from 320) due to memory limit. Current allocation: > 16015936) > org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more > nodes ran out of memory while executing the query. > Unable to allocate buffer of size 4194304 (rounded from 320) due to > memory limit. Current allocation: 16015936 > [Error Id: 4d9cc59a-b5d1-4ca9-9b26-69d9438f0bee ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) > ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:244) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_111] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_111] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111] > Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to > allocate buffer of size 4194304 (rounded from 320) due to memory limit. 
> Current allocation: 16015936 > at > org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:220) > ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:195) > ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.test.generated.MSorterGen44.setup(MSortTemplate.java:91) > ~[na:na] > at > org.apache.drill.exec.physical.impl.xsort.managed.MergeSort.merge(MergeSort.java:110) > ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.sortInMemory(ExternalSortBatch.java:1159) > ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:687) > ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] >
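The "(rounded from 320…)" in the error above, where the requested size is likely truncated in the log and is left as-is, is consistent with the allocator rounding requests up to the next power of two (4194304 = 2^22). A minimal sketch of such rounding, offered as an illustration of the observed behaviour rather than the allocator's actual code:

```python
def round_to_power_of_two(requested):
    """Round a buffer request up to the next power of two."""
    size = 1
    while size < requested:
        size <<= 1
    return size
```

For instance, any request between 2097153 and 4194304 bytes rounds to 4194304, so the allocator can charge noticeably more memory than the caller literally asked for, which matters when the operator is already near its limit.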
[jira] [Closed] (DRILL-5447) Managed External Sort : Unable to allocate sv2 vector
[ https://issues.apache.org/jira/browse/DRILL-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5447. - This has been verified. > Managed External Sort : Unable to allocate sv2 vector > - > > Key: DRILL-5447 > URL: https://issues.apache.org/jira/browse/DRILL-5447 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 26550427-6adf-a52e-2ea8-dc52d8d8433f.sys.drill, > 26617a7e-b953-7ac3-556d-43fd88e51b19.sys.drill, > 26fee988-ed18-a86a-7164-3e75118c0ffc.sys.drill, drillbit.log, drillbit.log, > drillbit.log > > > git.commit.id.abbrev=3e8b01d > Dataset : > {code} > Every record contains a repeated type with 2000 elements. > The repeated type contains varchars of length 250 for the first 2000 records > and single-character strings for the next 2000 records > The above pattern is repeated a few times > {code} > The below query fails > {code} > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.width.max_per_node` = 1; > alter session set `planner.disable_exchanges` = true; > alter session set `planner.width.max_per_query` = 1; > select count(*) from (select * from (select id, flatten(str_list) str from > dfs.`/drill/testdata/resource-manager/flatten-large-small.json`) d order by > d.str) d1 where d1.id=0; > Error: RESOURCE ERROR: Unable to allocate sv2 buffer > Fragment 0:0 > [Error Id: 9e45c293-ab26-489d-a90e-25da96004f15 on qa-node190.qa.lab:31010] > (state=,code=0) > {code} > Exception from the logs > {code} > [Error Id: 9e45c293-ab26-489d-a90e-25da96004f15 ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) > ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.newSV2(ExternalSortBatch.java:1463) >
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.makeSelectionVector(ExternalSortBatch.java:799) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.processBatch(ExternalSortBatch.java:856) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch(ExternalSortBatch.java:618) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:660) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:559) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > 
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) >
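For context on the failing sv2 allocation above: a "selection vector 2" holds a 2-byte row index per record in a batch, so the sort must allocate 2 × record-count bytes per incoming batch on top of the data vectors themselves. A small illustrative sketch of that sizing arithmetic (not Drill's actual code):

```python
SV2_WIDTH_BYTES = 2  # an sv2 entry is a 2-byte (uint16) row index

def sv2_bytes(record_count: int) -> int:
    """Bytes needed for a selection vector (sv2) over one batch."""
    return record_count * SV2_WIDTH_BYTES
```

This is why even a modest batch can tip the sort over its limit once the data vectors have consumed nearly all of the operator's budget.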
[jira] [Commented] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector
[ https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162289#comment-16162289 ] Robert Hou commented on DRILL-5670: --- I have attached drillbit.log.sort. Can you confirm that sort has completed? > Varchar vector throws an assertion error when allocating a new vector > - > > Key: DRILL-5670 > URL: https://issues.apache.org/jira/browse/DRILL-5670 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.11.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 26498995-bbad-83bc-618f-914c37a84e1f.sys.drill, > 26555749-4d36-10d2-6faf-e403db40c370.sys.drill, > 266290f3-5fdc-5873-7372-e9ee053bf867.sys.drill, > 269969ca-8d4d-073a-d916-9031e3d3fbf0.sys.drill, drillbit.log, drillbit.log, > drillbit.log, drillbit.log, drillbit.log, drillbit.log.sort, drillbit.out, > drill-override.conf > > > I am running this test on a private branch of [paul's > repository|https://github.com/paul-rogers/drill]. Below is the commit info > {code} > git.commit.id.abbrev=d86e16c > git.commit.user.email=prog...@maprtech.com > git.commit.message.full=DRILL-5601\: Rollup of external sort fixes an > improvements\n\n- DRILL-5513\: Managed External Sort \: OOM error during the > merge phase\n- DRILL-5519\: Sort fails to spill and results in an OOM\n- > DRILL-5522\: OOM during the merge and spill process of the managed external > sort\n- DRILL-5594\: Excessive buffer reallocations during merge phase of > external sort\n- DRILL-5597\: Incorrect "bits" vector allocation in nullable > vectors allocateNew()\n- DRILL-5602\: Repeated List Vector fails to > initialize the offset vector\n\nAll of the bugs have to do with handling > low-memory conditions, and with\ncorrectly estimating the sizes of vectors, > even when those vectors come\nfrom the spill file or from an exchange. 
Hence, > the changes for all of\nthe above issues are interrelated.\n > git.commit.id=d86e16c551e7d3553f2cde748a739b1c5a7a7659 > git.commit.message.short=DRILL-5601\: Rollup of external sort fixes an > improvements > git.commit.user.name=Paul Rogers > git.build.user.name=Rahul Challapalli > git.commit.id.describe=0.9.0-1078-gd86e16c > git.build.user.email=challapallira...@gmail.com > git.branch=d86e16c551e7d3553f2cde748a739b1c5a7a7659 > git.commit.time=05.07.2017 @ 20\:34\:39 PDT > git.build.time=12.07.2017 @ 14\:27\:03 PDT > git.remote.origin.url=g...@github.com\:paul-rogers/drill.git > {code} > Below query fails with an Assertion Error > {code} > 0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET > `exec.sort.disable_managed` = false; > +---+-+ > | ok | summary | > +---+-+ > | true | exec.sort.disable_managed updated. | > +---+-+ > 1 row selected (1.044 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> alter session set > `planner.memory.max_query_memory_per_node` = 482344960; > +---++ > | ok | summary | > +---++ > | true | planner.memory.max_query_memory_per_node updated. | > +---++ > 1 row selected (0.372 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> alter session set > `planner.width.max_per_node` = 1; > +---+--+ > | ok | summary| > +---+--+ > | true | planner.width.max_per_node updated. | > +---+--+ > 1 row selected (0.292 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> alter session set > `planner.width.max_per_query` = 1; > +---+---+ > | ok |summary| > +---+---+ > | true | planner.width.max_per_query updated. 
| > +---+---+ > 1 row selected (0.25 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select * from > dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by > columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50], > > columns[454],columns[413],columns[940],columns[834],columns[73],columns[140],columns[104],columns[],columns[30],columns[2420],columns[1520], > columns[1410], >
[jira] [Resolved] (DRILL-5443) Managed External Sort fails with OOM while spilling to disk
[ https://issues.apache.org/jira/browse/DRILL-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou resolved DRILL-5443. --- Resolution: Fixed This has been resolved. > Managed External Sort fails with OOM while spilling to disk > --- > > Key: DRILL-5443 > URL: https://issues.apache.org/jira/browse/DRILL-5443 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.10.0, 1.11.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 265a014b-8cae-30b5-adab-ff030b6c7086.sys.drill, > 27016969-ef53-40dc-b582-eea25371fa1c.sys.drill, drill5443.drillbit.log, > drillbit.log > > > git.commit.id.abbrev=3e8b01d > The below query fails with an OOM > {code} > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.width.max_per_node` = 1; > alter session set `planner.disable_exchanges` = true; > alter session set `planner.width.max_per_query` = 1; > alter session set `planner.memory.max_query_memory_per_node` = 52428800; > select s1.type type, flatten(s1.rms.rptd) rptds from (select d.type type, > d.uid uid, flatten(d.map.rm) rms from > dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid) s1 > order by s1.rms.mapid; > {code} > Exception from the logs > {code} > 2017-04-24 17:22:59,439 [27016969-ef53-40dc-b582-eea25371fa1c:frag:0:0] INFO > o.a.d.e.p.i.x.m.ExternalSortBatch - User Error Occurred: External Sort > encountered an error while spilling to disk (Unable to allocate buffer of > size 524288 (rounded from 307197) due to memory limit. 
Current allocation: > 25886728) > org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: External > Sort encountered an error while spilling to disk > [Error Id: a64e3790-3a34-42c8-b4ea-4cb1df780e63 ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) > ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.doMergeAndSpill(ExternalSortBatch.java:1445) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:1376) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeRuns(ExternalSortBatch.java:1372) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.consolidateBatches(ExternalSortBatch.java:1299) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns(ExternalSortBatch.java:1195) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:689) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:559) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) > 
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) >
[jira] [Closed] (DRILL-5442) Managed Sort: IndexOutOfBounds with a join over an inlist
[ https://issues.apache.org/jira/browse/DRILL-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5442. - I have verified this has been fixed. > Managed Sort: IndexOutOfBounds with a join over an inlist > - > > Key: DRILL-5442 > URL: https://issues.apache.org/jira/browse/DRILL-5442 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.10.0 >Reporter: Boaz Ben-Zvi >Assignee: Paul Rogers > Fix For: 1.12.0 > > > The following query fails with IOOB when a managed sort is used, but passes > with the old default sort: > = > 0: jdbc:drill:zk=local> alter session set `exec.sort.disable_managed` = false; > +---+-+ > | ok | summary | > +---+-+ > | true | exec.sort.disable_managed updated. | > +---+-+ > 1 row selected (0.16 seconds) > 0: jdbc:drill:zk=local> select * from dfs.`/data/json/s1/date_dim` where > d_year in(1990, 1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910, > 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919) limit 3; > Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 0, length: 1 > (expected: range(0, 0)) > Fragment 0:0 > [Error Id: 370fd706-c365-421f-b57d-d6ab7fde82df on 10.250.56.251:31010] > (state=,code=0) > > > (the above query was extracted from > /root/drillAutomation/framework-master/framework/resources/Functional/tpcds/variants/hive/q4_1.sql > ) > Note that the inlist must have at least 20 items, in which case the plan > becomes a join over a stream-aggregate over a sort over the (inlist's) > values. When the IOOB happens, the stack does not show the sort anymore, but > probably handling a NONE returned by the last next() on the sort ( > StreamingAggTemplate.doWork():182 ) > The "date_dim" can probably be made up with any data. 
The one above was taken > from: > [root@atsqa6c85 ~]# hadoop fs -ls /drill/testdata/tpcds/json/s1/date_dim > Found 1 items > -rwxr-xr-x 3 root root 50713534 2014-10-14 22:39 > /drill/testdata/tpcds/json/s1/date_dim/0_0_0.json -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (DRILL-5443) Managed External Sort fails with OOM while spilling to disk
[ https://issues.apache.org/jira/browse/DRILL-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5443. - This has been verified. > Managed External Sort fails with OOM while spilling to disk > --- > > Key: DRILL-5443 > URL: https://issues.apache.org/jira/browse/DRILL-5443 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.10.0, 1.11.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 265a014b-8cae-30b5-adab-ff030b6c7086.sys.drill, > 27016969-ef53-40dc-b582-eea25371fa1c.sys.drill, drill5443.drillbit.log, > drillbit.log > > > git.commit.id.abbrev=3e8b01d > The below query fails with an OOM > {code} > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.width.max_per_node` = 1; > alter session set `planner.disable_exchanges` = true; > alter session set `planner.width.max_per_query` = 1; > alter session set `planner.memory.max_query_memory_per_node` = 52428800; > select s1.type type, flatten(s1.rms.rptd) rptds from (select d.type type, > d.uid uid, flatten(d.map.rm) rms from > dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid) s1 > order by s1.rms.mapid; > {code} > Exception from the logs > {code} > 2017-04-24 17:22:59,439 [27016969-ef53-40dc-b582-eea25371fa1c:frag:0:0] INFO > o.a.d.e.p.i.x.m.ExternalSortBatch - User Error Occurred: External Sort > encountered an error while spilling to disk (Unable to allocate buffer of > size 524288 (rounded from 307197) due to memory limit. 
Current allocation: > 25886728) > org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: External > Sort encountered an error while spilling to disk > [Error Id: a64e3790-3a34-42c8-b4ea-4cb1df780e63 ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) > ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.doMergeAndSpill(ExternalSortBatch.java:1445) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:1376) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeRuns(ExternalSortBatch.java:1372) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.consolidateBatches(ExternalSortBatch.java:1299) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns(ExternalSortBatch.java:1195) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:689) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:559) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) > 
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) > [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT] > at
[jira] [Commented] (DRILL-5478) Spill file size parameter is not honored by the managed external sort
[ https://issues.apache.org/jira/browse/DRILL-5478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168247#comment-16168247 ] Robert Hou commented on DRILL-5478: --- We should still test it on behalf of Support. We don't have to test it extensively, but ensure it still works in general. The file size in this example is 256 MB. The memory is 1 GB. Is this a reasonable set of values? > Spill file size parameter is not honored by the managed external sort > - > > Key: DRILL-5478 > URL: https://issues.apache.org/jira/browse/DRILL-5478 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > > git.commit.id.abbrev=1e0a14c > Query: > {code} > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.width.max_per_node` = 1; > alter session set `planner.disable_exchanges` = true; > alter session set `planner.width.max_per_query` = 1; > alter session set `planner.memory.max_query_memory_per_node` = 1052428800; > alter session set `planner.enable_decimal_data_type` = true; > select count(*) from ( > select * from dfs.`/drill/testdata/resource-manager/all_types_large` d1 > order by d1.map.missing > ) d; > {code} > Boot Options (spill file size is set to 256MB) > {code} > 0: jdbc:drill:zk=10.10.100.190:5181> select * from sys.boot where name like > '%spill%'; > +--+-+---+-+--++---++ > | name | kind | type | status > | num_val | string_val | bool_val > | float_val | > +--+-+---+-+--++---++ > | drill.exec.sort.external.spill.directories | STRING | BOOT | BOOT > | null | [ > # drill-override.conf: 26 > "/tmp/test" > ] | null | null | > | drill.exec.sort.external.spill.file_size | STRING | BOOT | BOOT > | null | "256M" | null > | null | > | drill.exec.sort.external.spill.fs| STRING | BOOT | BOOT > | null | "maprfs:///" | null > | null | > | drill.exec.sort.external.spill.group.size| LONG| BOOT | BOOT > 
| 4| null | null > | null | > | drill.exec.sort.external.spill.merge_batch_size | STRING | BOOT | BOOT > | null | "16M" | null > | null | > | drill.exec.sort.external.spill.spill_batch_size | STRING | BOOT | BOOT > | null | "8M" | null > | null | > | drill.exec.sort.external.spill.threshold | LONG| BOOT | BOOT > | 4| null | null > | null | > +--+-+---+-+--++---++ > {code} > Below are the spill files while the query is still executing. The size of the > spill files is ~34MB > {code} > -rwxr-xr-x 3 root root 34957815 2017-05-05 11:26 > /tmp/test/26f33c36-4235-3531-aeaa-2c73dc4ddeb5_major0_minor0_op5_sort/run1 > -rwxr-xr-x 3 root root 34957815 2017-05-05 11:27 > /tmp/test/26f33c36-4235-3531-aeaa-2c73dc4ddeb5_major0_minor0_op5_sort/run2 > -rwxr-xr-x 3 root root 0 2017-05-05 11:27 > /tmp/test/26f33c36-4235-3531-aeaa-2c73dc4ddeb5_major0_minor0_op5_sort/run3 > {code} > The data set is too large to attach here. Reach out to me if you need anything
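The check above (listing spill files and eyeballing sizes against the configured `drill.exec.sort.external.spill.file_size`) can be automated. A small illustrative Python sketch; the directory layout and the 50% tolerance are assumptions for the example, not Drill internals:

```python
import os

def undersized_spill_files(spill_dir: str, configured_bytes: int,
                           tolerance: float = 0.5):
    """List completed spill files whose size is well below the target.

    Files far under the limit (e.g. ~34 MB runs under a 256 MB
    file_size setting) suggest the option is not being honored.
    """
    flagged = []
    for name in sorted(os.listdir(spill_dir)):
        path = os.path.join(spill_dir, name)
        if os.path.isfile(path):
            size = os.path.getsize(path)
            if size < configured_bytes * tolerance:
                flagged.append((name, size))
    return flagged
```

Note that a zero-byte file like `run3` above may simply still be open for writing, so a check like this is only meaningful for completed runs.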
[jira] [Closed] (DRILL-5153) RESOURCE ERROR: Waited for 15000ms, but tasks for 'Get block maps' are not complete
[ https://issues.apache.org/jira/browse/DRILL-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5153. - Resolution: Cannot Reproduce > RESOURCE ERROR: Waited for 15000ms, but tasks for 'Get block maps' are not > complete > > > Key: DRILL-5153 > URL: https://issues.apache.org/jira/browse/DRILL-5153 > Project: Apache Drill > Issue Type: Bug > Components: Execution - RPC, Query Planning & Optimization >Reporter: Rahul Challapalli > Attachments: tera.log > > > git.commit.id.abbrev=cf2b7c7 > The below query consistently fails on my 2 node cluster. I used the data set > from the terasort benchmark > {code} > select * from dfs.`/drill/testdata/resource-manager/terasort-data` limit 1; > Error: RESOURCE ERROR: Waited for 15000ms, but tasks for 'Get block maps' are > not complete. Total runnable size 2, parallelism 2. > [Error Id: 580e6c04-7096-4c09-9c7a-63e70c71d574 on qa-node182.qa.lab:31010] > (state=,code=0) > {code}
[jira] [Closed] (DRILL-5146) Unnecessary spilling to disk by sort when we only have 5000 rows with one column
[ https://issues.apache.org/jira/browse/DRILL-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5146. - I have verified this is fixed. > Unnecessary spilling to disk by sort when we only have 5000 rows with one > column > > > Key: DRILL-5146 > URL: https://issues.apache.org/jira/browse/DRILL-5146 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 27a52efb-0ce6-f2ad-7216-aef007926649.sys.drill, > data.tgz, spill.log > > > git.commit.id.abbrev=cf2b7c7 > The below query spills to disk for the sort. The dataset contains 5000 files > and each file contains a single record. > {code} > select * from dfs.`/drill/testdata/resource-manager/5000files/text` order by > columns[1]; > {code} > Environment : > {code} > DRILL_MAX_DIRECT_MEMORY="16G" > DRILL_MAX_HEAP="4G" > {code} > I attached the dataset, logs and the profile
[jira] [Closed] (DRILL-5154) OOM error in external sort on top of 400GB data set generated using terasort benchmark
[ https://issues.apache.org/jira/browse/DRILL-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou closed DRILL-5154. - Resolution: Cannot Reproduce > OOM error in external sort on top of 400GB data set generated using terasort > benchmark > --- > > Key: DRILL-5154 > URL: https://issues.apache.org/jira/browse/DRILL-5154 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Attachments: 27a3de95-e30b-8890-6653-80fd6c49a3a1.sys.drill > > > git.commit.id.abbrev=cf2b7c7 > The below query fails with an OOM in external sort > {code} > No of drillbits : 1 > Nodes in Mapr cluster : 2 > DRILL_MAX_DIRECT_MEMORY="16G" > DRILL_MAX_HEAP="4G" > select * from (select * from > dfs.`/drill/testdata/resource-manager/terasort-data/part-m-0.tbl` order > by columns[0]) d where d.columns[0] = 'null'; > Error: RESOURCE ERROR: One or more nodes ran out of memory while executing > the query. > Unable to allocate buffer of size 8388608 due to memory limit. Current > allocation: 8441872 > Fragment 1:6 > [Error Id: 87ede736-b480-4286-b472-7694fdd2f7da on qa-node183.qa.lab:31010] > (state=,code=0) > {code} > I attached the logs and the query profile
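The failure above is the operator hitting its own memory cap rather than the node running dry: with 8441872 bytes already allocated, another 8388608-byte (8 MiB) buffer pushes past the fragment's limit. A toy limit-checking allocator in Python sketches the bookkeeping; the limit value used below is hypothetical (Drill derives the real per-operator limit from settings such as `planner.memory.max_query_memory_per_node` and the query's parallelism):

```python
class LimitedAllocator:
    """Toy accountant that mimics a per-operator memory limit."""

    def __init__(self, limit: int):
        self.limit = limit
        self.allocated = 0

    def buffer(self, size: int) -> None:
        # Reject the request instead of allocating past the limit.
        if self.allocated + size > self.limit:
            raise MemoryError(
                f"Unable to allocate buffer of size {size} due to memory "
                f"limit. Current allocation: {self.allocated}")
        self.allocated += size
```

With a hypothetical 10 MB limit, `buffer(8_441_872)` succeeds and the follow-up `buffer(8_388_608)` raises, mirroring the error message in the log.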
[jira] [Commented] (DRILL-5154) OOM error in external sort on top of 400GB data set generated using terasort benchmark
[ https://issues.apache.org/jira/browse/DRILL-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167417#comment-16167417 ] Robert Hou commented on DRILL-5154: --- We do not have this table any more, so we cannot reproduce the problem. Closing it for now. > OOM error in external sort on top of 400GB data set generated using terasort > benchmark > --- > > Key: DRILL-5154 > URL: https://issues.apache.org/jira/browse/DRILL-5154 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Attachments: 27a3de95-e30b-8890-6653-80fd6c49a3a1.sys.drill > > > git.commit.id.abbrev=cf2b7c7 > The below query fails with an OOM in external sort > {code} > No of drillbits : 1 > Nodes in Mapr cluster : 2 > DRILL_MAX_DIRECT_MEMORY="16G" > DRILL_MAX_HEAP="4G" > select * from (select * from > dfs.`/drill/testdata/resource-manager/terasort-data/part-m-0.tbl` order > by columns[0]) d where d.columns[0] = 'null'; > Error: RESOURCE ERROR: One or more nodes ran out of memory while executing > the query. > Unable to allocate buffer of size 8388608 due to memory limit. Current > allocation: 8441872 > Fragment 1:6 > [Error Id: 87ede736-b480-4286-b472-7694fdd2f7da on qa-node183.qa.lab:31010] > (state=,code=0) > {code} > I attached the logs and the query profile