[jira] [Updated] (DRILL-4734) Query against HBase table on a 5 node cluster fails with SchemaChangeException
[ https://issues.apache.org/jira/browse/DRILL-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

tiredqiang updated DRILL-4734:
------------------------------
    Attachment: drillbit.log

Attached the drillbit error log. For this test case I used the HBase 1.2.0 client, but the same error occurs with version 1.1.3, which is why I changed to 1.2.0.

> Query against HBase table on a 5 node cluster fails with SchemaChangeException
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-4734
>                 URL: https://issues.apache.org/jira/browse/DRILL-4734
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators, Storage - HBase
>    Affects Versions: 1.6.0
>            Reporter: Aman Sinha
>         Attachments: 2nodes.explain.txt, 5nodes.explain.txt, drillbit.log
>
>
> [Creating this JIRA on behalf of Qiang Li]
> Let's say I have two tables.
> {noformat}
> offers_ref0 : rowkey salt(1 byte) + long uid(8 bytes), family: v, qualifier: v(string)
> offers_nation_idx: rowkey salt(1 byte) + string, family: v, qualifier: v(long, 8 bytes)
> {noformat}
> Here is the SQL:
> {noformat}
> select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as did,
>   convert_from(`ref0`.`v`.`v`,'UTF8') as v
> from hbase.`offers_nation_idx` as `nation`
> join hbase.offers_ref0 as `ref0`
>   on CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') = CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE')
> where `nation`.row_key > '0br' and `nation`.row_key < '0bs'
> limit 10
> {noformat}
> When I execute the query on a single node, or on fewer than 5 nodes, it works fine.
> But when I execute it on a cluster of about 14 nodes, it throws an exception.
> The first time it throws this exception:
> *Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException: Hash join does not support schema changes*
> If I then run the query again, it always throws the exception below:
> {noformat}
> *Query Failed: An Error Occurred*
> *org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:IllegalStateException:
> Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableIntVector but
> was holding vector class org.apache.drill.exec.vector.complex.MapVector, field=v(MAP:REQUIRED
> [v(VARBINARY:OPTIONAL)[$bits$(UINT1:REQUIRED), v(VARBINARY:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]] Fragment 12:4
> [Error Id:06c6eae4-0822-4714-b0bf-a6e04ebfec79 on xxx:31010]*
> {noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
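For readers following the join condition in DRILL-4734: `BYTE_SUBSTR(row_key, -8, 8)` takes the last 8 bytes of the row key, and `CONVERT_FROM(..., 'BIGINT_BE')` reads them as a big-endian 64-bit integer. The following is a minimal Python sketch of that decoding, assuming the row key layout described in the report (1 salt byte followed by an 8-byte uid) and a signed interpretation. The helper names and the sample uid are hypothetical and are not part of Drill or HBase.

```python
import struct


def make_row_key(salt: int, uid: int) -> bytes:
    """Build a hypothetical offers_ref0-style row key: 1 salt byte + 8-byte big-endian uid."""
    return bytes([salt]) + struct.pack(">q", uid)


def decode_uid(row_key: bytes) -> int:
    """Read the last 8 bytes of the key as a signed big-endian 64-bit integer.

    This mirrors what CONVERT_FROM(BYTE_SUBSTR(row_key, -8, 8), 'BIGINT_BE')
    is expected to compute for the layout assumed above.
    """
    if len(row_key) < 8:
        raise ValueError("row key is shorter than 8 bytes")
    (uid,) = struct.unpack(">q", row_key[-8:])
    return uid


if __name__ == "__main__":
    key = make_row_key(ord("0"), 1234567890123)  # salt byte '0', made-up uid
    print(len(key), decode_uid(key))
```

Both sides of the join have to produce the same 64-bit value for rows to match, which is why the query decodes the `offers_nation_idx` cell value with the same `BIGINT_BE` encoding.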
[jira] [Assigned] (DRILL-1328) Support table statistics
[ https://issues.apache.org/jira/browse/DRILL-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gautam Kumar Parai reassigned DRILL-1328:
-----------------------------------------
    Assignee: Gautam Kumar Parai

> Support table statistics
> ------------------------
>
>                 Key: DRILL-1328
>                 URL: https://issues.apache.org/jira/browse/DRILL-1328
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Cliff Buchanan
>            Assignee: Gautam Kumar Parai
>             Fix For: Future
>
>         Attachments: 0001-PRE-Set-value-count-in-splitAndTransfer.patch
>
>
> This consists of several subtasks
> * implement operators to generate statistics
> * add "analyze table" support to parser/planner
> * create a metadata provider to allow statistics to be used by optiq in planning optimization
> * implement statistics functions
> Right now, the bulk of this functionality is implemented, but it hasn't been rigorously tested and needs to have some definite answers for some of the parts "around the edges" (how analyze table figures out where the table statistics are located, how a table "append" should work in a read only file system)
> Also, here are a few known caveats:
> * table statistics are collected by creating a sql query based on the string path of the table. This should probably be done with a Table reference.
> * Case sensitivity for column statistics is probably iffy
> * Math for combining two column NDVs into a joint NDV should be checked.
> * Schema changes aren't really being considered yet.
> * adding getDrillTable is probably unnecessary; it might be better to do getTable().unwrap(DrillTable.class)
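The caveat about combining two column NDVs (number of distinct values) into a joint NDV can be sanity-checked against simple bounds. The sketch below is not the math used in the attached patch, which this message does not show. It only encodes two facts that hold for any pair of columns: the joint NDV is at least the larger of the two individual NDVs, and at most the smaller of their product and the row count. The function name is hypothetical, and nulls are ignored.

```python
def joint_ndv_bounds(ndv_a: int, ndv_b: int, row_count: int) -> tuple:
    """Return (lower, upper) bounds on the number of distinct (a, b) pairs.

    lower: every distinct value of the higher-cardinality column appears in
           at least one pair.
    upper: there cannot be more pairs than ndv_a * ndv_b combinations, nor
           more pairs than rows.
    """
    if min(ndv_a, ndv_b, row_count) < 0:
        raise ValueError("counts must be non-negative")
    lower = max(ndv_a, ndv_b)
    upper = min(ndv_a * ndv_b, row_count)
    return lower, upper


if __name__ == "__main__":
    # Made-up statistics: 10 and 20 distinct values in a 1000-row table.
    print(joint_ndv_bounds(10, 20, 1000))
```

Any formula for the joint NDV that falls outside these bounds for consistent inputs is worth a second look.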
[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly
[ https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343111#comment-15343111 ]

Jacques Nadeau commented on DRILL-4203:
---------------------------------------

I think someone will need to pick this up. I don't think anyone is actively working on it.

> Parquet File : Date is stored wrongly
> -------------------------------------
>
>                 Key: DRILL-4203
>                 URL: https://issues.apache.org/jira/browse/DRILL-4203
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>            Reporter: Stéphane Trou
>            Priority: Critical
>
> Hello,
> I have some problems when I try to read Parquet files produced by Drill with Spark: all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) as epoch_date from dfs.tmp.`date_parquet.csv`;
> +--------+-------------+
> | name   | epoch_date  |
> +--------+-------------+
> | Epoch  | 1970-01-01  |
> +--------+-------------+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select columns[0] as name, cast(columns[1] as date) as epoch_date from dfs.tmp.`date_parquet.csv`;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 1                          |
> +-----------+----------------------------+
> {code}
> When I read the file with parquet-tools, I found
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], epoch_date should be equal to 0.
> Meta :
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:        file:/tmp/buggy_parquet/0_0_0.parquet
> creator:     parquet-mr version 1.8.1-drill-r0 (build 6b605a4ea05b66e1a6bf843353abcb4834a4ced8)
> extra:       drill.version = 1.4.0
> file schema: root
>
> name:        OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4
>
> name:        BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:  INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 ENC:RLE,BIT_PACKED,PLAIN
> {code}
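To make the expected value in DRILL-4203 concrete: the Parquet DATE logical type stores the number of days since the Unix epoch (1970-01-01) in an INT32, so 1970-01-01 should be written as 0. The sketch below computes that expected value. It also checks an arithmetic observation about the reported value: 4881176 is exactly twice 2440588, the Julian Day Number of 1970-01-01. That suggests, but does not by itself prove, that a Julian-day offset is involved in the bad write; the report above does not state the cause. Function and constant names are illustrative.

```python
from datetime import date

UNIX_EPOCH = date(1970, 1, 1)
# Julian Day Number of 1970-01-01 (a well-known calendar constant).
JULIAN_DAY_OF_UNIX_EPOCH = 2440588


def to_parquet_date(d: date) -> int:
    """Days since the Unix epoch, as the Parquet DATE logical type expects."""
    return (d - UNIX_EPOCH).days


if __name__ == "__main__":
    expected = to_parquet_date(date(1970, 1, 1))
    reported = 4881176  # value printed by parquet-tools in the report
    print("expected:", expected)
    print("reported:", reported)
    print("reported == 2 * JDN(1970-01-01):", reported == 2 * JULIAN_DAY_OF_UNIX_EPOCH)
```

A reader that follows the specification, such as Spark in the report, will interpret 4881176 as a date thousands of years in the future, which matches the "all dates are corrupted" symptom.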
[jira] [Updated] (DRILL-4203) Parquet File : Date is stored wrongly
[ https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau updated DRILL-4203:
----------------------------------
    Assignee:     (was: Jason Altekruse)

> Parquet File : Date is stored wrongly
> -------------------------------------
>
>                 Key: DRILL-4203
>                 URL: https://issues.apache.org/jira/browse/DRILL-4203
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>            Reporter: Stéphane Trou
>            Priority: Critical
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343061#comment-15343061 ]

Gautam Kumar Parai commented on DRILL-4743:
-------------------------------------------

[~amansinha100] I have updated the pull request. Please take a look.

> HashJoin's not fully parallelized in query plan
> -----------------------------------------------
>
>                 Key: DRILL-4743
>                 URL: https://issues.apache.org/jira/browse/DRILL-4743
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>            Reporter: Gautam Kumar Parai
>            Assignee: Gautam Kumar Parai
>              Labels: doc-impacting
>
> The underlying problem is filter selectivity under-estimate for a query with complicated predicates e.g. deeply nested and/or predicates. This leads to under parallelization of the major fragment doing the join.
> To really resolve this problem we need table/column statistics to correctly estimate the selectivity. However, in the absence of statistics OR even when existing statistics are insufficient to get a correct estimate of selectivity this will serve as a workaround.
[jira] [Commented] (DRILL-4571) Add link to local Drill logs from the web UI
[ https://issues.apache.org/jira/browse/DRILL-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343042#comment-15343042 ]

Krystal commented on DRILL-4571:
--------------------------------

git.commit.id.abbrev=fbdd20e
Verified feature.

> Add link to local Drill logs from the web UI
> --------------------------------------------
>
>                 Key: DRILL-4571
>                 URL: https://issues.apache.org/jira/browse/DRILL-4571
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.6.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>              Labels: doc-impacting
>             Fix For: 1.7.0
>
>         Attachments: display_log.JPG, drillbit_download.log.gz, drillbit_queries_json_screenshot.jpg, drillbit_ui.log, log_list.JPG
>
>
> Now we have link to the profile from the web UI.
> It will be handy for the users to have the link to local logs as well.
[jira] [Closed] (DRILL-4571) Add link to local Drill logs from the web UI
[ https://issues.apache.org/jira/browse/DRILL-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krystal closed DRILL-4571.
--------------------------

Verified.

> Add link to local Drill logs from the web UI
> --------------------------------------------
>
>                 Key: DRILL-4571
>                 URL: https://issues.apache.org/jira/browse/DRILL-4571
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.6.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>              Labels: doc-impacting
>             Fix For: 1.7.0
>
>         Attachments: display_log.JPG, drillbit_download.log.gz, drillbit_queries_json_screenshot.jpg, drillbit_ui.log, log_list.JPG
[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly
[ https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342999#comment-15342999 ]

Rahul Challapalli commented on DRILL-4203:
------------------------------------------

Maybe I lost track of some conversation around this. What is the latest update on this issue?

> Parquet File : Date is stored wrongly
> -------------------------------------
>
>                 Key: DRILL-4203
>                 URL: https://issues.apache.org/jira/browse/DRILL-4203
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>            Reporter: Stéphane Trou
>            Assignee: Jason Altekruse
>            Priority: Critical
[jira] [Commented] (DRILL-4733) max(dir0) reading more columns than necessary
[ https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342874#comment-15342874 ]

ASF GitHub Bot commented on DRILL-4733:
---------------------------------------

Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/531#discussion_r67962703

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/TestImplicitFileColumns.java ---
    @@ -110,4 +111,20 @@ public void testImplicitColumnsForParquet() throws Exception {
             .go();
       }

    +  @Test // DRILL-4733
    +  public void testMultilevelParquetWithSchemaChange() throws Exception {
    +    try {
    +      test("alter session set `planner.enable_decimal_data_type` = true");
    +      testBuilder()
    +          .sqlQuery(String.format("select max(dir0) as max_dir from dfs_test.`%s/src/test/resources/multilevel/parquetWithSchemaChange`",
    +              TestTools.getWorkingPath()))
    +          .unOrdered()
    +          .baselineColumns("max_dir")
    +          .baselineValues("voter50.parquet")
    --- End diff --

    @jinfengni I guess the confusion here is that `voter50.parquet` is a folder name. If it would be clearer, I can rename the folder and the files in it (currently the files are named 0_0_0.parquet).

> max(dir0) reading more columns than necessary
> ---------------------------------------------
>
>                 Key: DRILL-4733
>                 URL: https://issues.apache.org/jira/browse/DRILL-4733
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization, Storage - Parquet
>    Affects Versions: 1.7.0
>            Reporter: Rahul Challapalli
>            Assignee: Arina Ielchiieva
>            Priority: Critical
>             Fix For: 1.7.0
>
>         Attachments: bug.tgz
>
>
> The query below started to fail from this commit: 3209886a8548eea4a2f74c059542672f8665b8d2
> {code}
> select max(dir0) from dfs.`/drill/testdata/bug/2016`;
> Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support schema changes
> Fragment 0:0
> [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] (state=,code=0)
> {code}
> The sub-folders contain files that have a schema change for one column, "contributions" (int32 vs double). However, prior to this commit we did not fail in this scenario. Log files and test data are attached.
[jira] [Updated] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zelaine Fong updated DRILL-4743:
--------------------------------
    Labels: doc-impacting  (was: )

> HashJoin's not fully parallelized in query plan
> -----------------------------------------------
>
>                 Key: DRILL-4743
>                 URL: https://issues.apache.org/jira/browse/DRILL-4743
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>            Reporter: Gautam Kumar Parai
>            Assignee: Gautam Kumar Parai
>              Labels: doc-impacting
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342826#comment-15342826 ]

Gautam Kumar Parai commented on DRILL-4743:
-------------------------------------------

I have created a pull request https://github.com/apache/drill/pull/534
[~amansinha100] can you please take a look and provide the feedback

> HashJoin's not fully parallelized in query plan
> -----------------------------------------------
>
>                 Key: DRILL-4743
>                 URL: https://issues.apache.org/jira/browse/DRILL-4743
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>            Reporter: Gautam Kumar Parai
>            Assignee: Gautam Kumar Parai
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342825#comment-15342825 ]

ASF GitHub Bot commented on DRILL-4743:
---------------------------------------

GitHub user gparai opened a pull request:

    https://github.com/apache/drill/pull/534

    [DRILL-4743] HashJoin's not fully parallelized in query plan

    Provide a user parameter for defining a lower bound of selectivity to prevent under-estimates on filter selectivity.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gparai/drill MD-880-ADM

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/534.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #534

> HashJoin's not fully parallelized in query plan
> -----------------------------------------------
>
>                 Key: DRILL-4743
>                 URL: https://issues.apache.org/jira/browse/DRILL-4743
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>            Reporter: Gautam Kumar Parai
>            Assignee: Gautam Kumar Parai
>
> The underlying problem is filter selectivity under-estimate for a query with complicated predicates e.g. deeply nested and/or predicates. This leads to under parallelization of the major fragment doing the join.
> To really resolve this problem we need table/column statistics to correctly estimate the selectivity. However, in the absence of statistics OR even when existing statistics are insufficient to get a correct estimate of selectivity this will serve as a workaround.
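The effect described in DRILL-4743 can be illustrated with a toy model. The sketch below is not Drill's planner code: it assumes a planner that, lacking statistics, assigns each predicate a fixed guess and multiplies the guesses for AND-ed predicates (the usual independence assumption), then clamps the result to a user-supplied lower bound as the pull request proposes. The function name, the parameter name `lower_bound`, and the values 0.15 and 0.05 are all hypothetical and are not Drill option names or defaults.

```python
def conjunction_selectivity(selectivities, lower_bound=0.0):
    """Estimate the selectivity of AND-ed predicates, clamped from below.

    selectivities: per-predicate guesses in [0, 1]
    lower_bound:   hypothetical user parameter; the estimate never drops below it
    """
    estimate = 1.0
    for s in selectivities:
        if not 0.0 <= s <= 1.0:
            raise ValueError("selectivity must be in [0, 1]")
        estimate *= s
    return max(estimate, lower_bound)


if __name__ == "__main__":
    table_rows = 1_000_000
    guesses = [0.15] * 6  # six nested predicates, each with a made-up guess

    unbounded = conjunction_selectivity(guesses)
    bounded = conjunction_selectivity(guesses, lower_bound=0.05)

    # A tiny row estimate makes the join look cheap, so the planner may
    # assign too few minor fragments to the fragment doing the join.
    print("estimated rows without bound:", round(table_rows * unbounded))
    print("estimated rows with bound:   ", round(table_rows * bounded))
```

With many predicates the product shrinks quickly, so the unbounded estimate can be orders of magnitude below the true row count; the lower bound keeps the estimate large enough for the join to be parallelized.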
[jira] [Created] (DRILL-4743) HashJoin's not fully parallelized in query plan
Gautam Kumar Parai created DRILL-4743:
-----------------------------------------

             Summary: HashJoin's not fully parallelized in query plan
                 Key: DRILL-4743
                 URL: https://issues.apache.org/jira/browse/DRILL-4743
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.5.0
            Reporter: Gautam Kumar Parai
            Assignee: Gautam Kumar Parai

The underlying problem is filter selectivity under-estimate for a query with complicated predicates e.g. deeply nested and/or predicates. This leads to under parallelization of the major fragment doing the join.
To really resolve this problem we need table/column statistics to correctly estimate the selectivity. However, in the absence of statistics OR even when existing statistics are insufficient to get a correct estimate of selectivity this will serve as a workaround.
[jira] [Commented] (DRILL-4733) max(dir0) reading more columns than necessary
[ https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342753#comment-15342753 ]

ASF GitHub Bot commented on DRILL-4733:
---------------------------------------

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/531#discussion_r67955187

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java ---
    @@ -126,8 +127,12 @@ CloseableRecordBatch getReaderBatch(FragmentContext context, EasySubScan scan) t
         final ImplicitColumnExplorer columnExplorer = new ImplicitColumnExplorer(context, scan.getColumns());

         if (!columnExplorer.isSelectAllColumns()) {
    +      // We must make sure to pass a table column (not to be confused with implicit column) to the underlying record reader.
    +      List tableColumns =
    --- End diff --

    Ok, I see. Although this patch resolves this issue, I am thinking that without doing a performance test it is not feasible to see the performance impact of the overall implicit columns support. It is a nice feature to have but I think we can give it a little more time to go through functional and perf tests.

> max(dir0) reading more columns than necessary
> ---------------------------------------------
>
>                 Key: DRILL-4733
>                 URL: https://issues.apache.org/jira/browse/DRILL-4733
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization, Storage - Parquet
>    Affects Versions: 1.7.0
>            Reporter: Rahul Challapalli
>            Assignee: Arina Ielchiieva
>            Priority: Critical
>             Fix For: 1.7.0
>
>         Attachments: bug.tgz
[jira] [Updated] (DRILL-4742) Using convert_from timestamp_impala gives a random error
[ https://issues.apache.org/jira/browse/DRILL-4742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rahul Challapalli updated DRILL-4742:
-------------------------------------
    Attachment: temp.parquet
                error.txt

The above query ran successfully 5-6 times before I hit the random error. The attached log also contains information about the successful runs.

> Using convert_from timestamp_impala gives a random error
> --------------------------------------------------------
>
>                 Key: DRILL-4742
>                 URL: https://issues.apache.org/jira/browse/DRILL-4742
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.6.0, 1.7.0
>            Reporter: Rahul Challapalli
>            Priority: Critical
>         Attachments: error.txt, temp.parquet
>
>
> Drill Commit # fbdd20e54351879200184b478c2a32f238bf2176
> The following query randomly generates the below error.
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from dfs.`/drill/testdata/temp.parquet`;
> Error: SYSTEM ERROR: ArrayIndexOutOfBoundsException: 0
> Fragment 0:0
> [Error Id: 9fe53a95-c4ae-424d-8c6d-489abab2d2ca on qa-node190.qa.lab:31010] (state=,code=0)
> {code}
> The underlying parquet file is generated using hive. Below is the metadata information
> {code}
> /root/parquet-tools-1.5.1-SNAPSHOT/parquet-meta temp.parquet
> creator:          parquet-mr version 1.6.0
> file schema:      hive_schema
>
> voter_id:         OPTIONAL INT32 R:0 D:1
> name:             OPTIONAL BINARY O:UTF8 R:0 D:1
> age:              OPTIONAL INT32 R:0 D:1
> registration:     OPTIONAL BINARY O:UTF8 R:0 D:1
> contributions:    OPTIONAL FLOAT R:0 D:1
> voterzone:        OPTIONAL INT32 R:0 D:1
> create_timestamp: OPTIONAL INT96 R:0 D:1
> create_date:      OPTIONAL INT32 O:DATE R:0 D:1
> row group 1:      RC:200 TS:9902
>
> voter_id:         INT32 UNCOMPRESSED DO:0 FPO:4 SZ:843/843/1.00 VC:200 ENC:RLE,BIT_PACKED,PLAIN
> name:             BINARY UNCOMPRESSED DO:0 FPO:847 SZ:3214/3214/1.00 VC:200 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
> age:              INT32 UNCOMPRESSED DO:0 FPO:4061 SZ:438/438/1.00 VC:200 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
> registration:     BINARY UNCOMPRESSED DO:0 FPO:4499 SZ:241/241/1.00 VC:200 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
> contributions:    FLOAT UNCOMPRESSED DO:0 FPO:4740 SZ:843/843/1.00 VC:200 ENC:RLE,BIT_PACKED,PLAIN
> voterzone:        INT32 UNCOMPRESSED DO:0 FPO:5583 SZ:843/843/1.00 VC:200 ENC:RLE,BIT_PACKED,PLAIN
> create_timestamp: INT96 UNCOMPRESSED DO:0 FPO:6426 SZ:2642/2642/1.00 VC:200 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
> create_date:      INT32 UNCOMPRESSED DO:0 FPO:9068 SZ:838/838/1.00 VC:200 ENC:RLE,BIT_PACKED,PLAIN
> {code}
> I attached the log file and the data file
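As background for the `create_timestamp` INT96 column in DRILL-4742: the sketch below decodes one 12-byte value under the layout commonly documented for Hive- and Impala-written INT96 timestamps, which is an assumption here since the report does not spell it out. In that layout the first 8 bytes are nanoseconds within the day and the last 4 bytes are the Julian day number, both little-endian. The function name is hypothetical, this is not Drill's implementation, and time zone handling is deliberately left out.

```python
import struct
from datetime import datetime, timedelta

# Julian Day Number of 1970-01-01.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588


def decode_int96_timestamp(raw: bytes) -> datetime:
    """Decode a 12-byte INT96 timestamp into a naive datetime.

    Assumed layout (little-endian): int64 nanoseconds-of-day, then int32
    Julian day. Sub-microsecond precision is truncated because Python's
    datetime only holds microseconds.
    """
    if len(raw) != 12:
        raise ValueError("INT96 timestamp must be exactly 12 bytes")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    return datetime(1970, 1, 1) + timedelta(
        days=days_since_epoch, microseconds=nanos_of_day // 1000
    )


if __name__ == "__main__":
    # Made-up value: one hour into the day after the Unix epoch.
    sample = struct.pack("<qi", 3_600_000_000_000, JULIAN_DAY_OF_UNIX_EPOCH + 1)
    print(decode_int96_timestamp(sample))
```

The length check matters for the failure mode in the report: a decoder that indexes into a value without first checking that 12 bytes are present is one plausible way to hit an out-of-bounds error, though the actual cause is not established in this message.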
[jira] [Created] (DRILL-4742) Using convert_from timestamp_impala gives a random error
Rahul Challapalli created DRILL-4742: Summary: Using convert_from timestamp_impala gives a random error Key: DRILL-4742 URL: https://issues.apache.org/jira/browse/DRILL-4742 Project: Apache Drill Issue Type: Bug Affects Versions: 1.6.0, 1.7.0 Reporter: Rahul Challapalli Priority: Critical Drill Commit # fbdd20e54351879200184b478c2a32f238bf2176 The following query randomly generates the below error. {code} select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from dfs.`/drill/testdata/temp.parquet`; Error: SYSTEM ERROR: ArrayIndexOutOfBoundsException: 0 Fragment 0:0 [Error Id: 9fe53a95-c4ae-424d-8c6d-489abab2d2ca on qa-node190.qa.lab:31010] (state=,code=0) {code} The underlying parquet file is generated using hive. Below is the metadata information {code} /root/parquet-tools-1.5.1-SNAPSHOT/parquet-meta temp.parquet creator: parquet-mr version 1.6.0 file schema: hive_schema voter_id: OPTIONAL INT32 R:0 D:1 name: OPTIONAL BINARY O:UTF8 R:0 D:1 age: OPTIONAL INT32 R:0 D:1 registration: OPTIONAL BINARY O:UTF8 R:0 D:1 contributions:OPTIONAL FLOAT R:0 D:1 voterzone:OPTIONAL INT32 R:0 D:1 create_timestamp: OPTIONAL INT96 R:0 D:1 create_date: OPTIONAL INT32 O:DATE R:0 D:1 row group 1: RC:200 TS:9902 voter_id: INT32 UNCOMPRESSED DO:0 FPO:4 SZ:843/843/1.00 VC:200 ENC:RLE,BIT_PACKED,PLAIN name: BINARY UNCOMPRESSED DO:0 FPO:847 SZ:3214/3214/1.00 VC:200 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED age: INT32 UNCOMPRESSED DO:0 FPO:4061 SZ:438/438/1.00 VC:200 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED registration: BINARY UNCOMPRESSED DO:0 FPO:4499 SZ:241/241/1.00 VC:200 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED contributions: FLOAT UNCOMPRESSED DO:0 FPO:4740 SZ:843/843/1.00 VC:200 ENC:RLE,BIT_PACKED,PLAIN voterzone: INT32 UNCOMPRESSED DO:0 FPO:5583 SZ:843/843/1.00 VC:200 ENC:RLE,BIT_PACKED,PLAIN create_timestamp: INT96 UNCOMPRESSED DO:0 FPO:6426 SZ:2642/2642/1.00 VC:200 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED create_date: INT32 UNCOMPRESSED DO:0 FPO:9068 SZ:838/838/1.00 VC:200 ENC:RLE,BIT_PACKED,PLAIN 
{code} I attached the log file and the data file
[jira] [Comment Edited] (DRILL-4735) Count(dir0) on parquet returns 0 result
[ https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342270#comment-15342270 ]

Jinfeng Ni edited comment on DRILL-4735 at 6/21/16 8:47 PM:
------------------------------------------------------------

I ran the query on 1.4.0 and saw the same problem. I have not checked earlier versions, but it's likely that this problem has been there for a long time.

This bug also happens on the 1.0.0 release.

was (Author: jni):
I ran the query on 1.4.0 and saw the same problem. I have not checked earlier versions, but it's likely that this problem has been there for a long time.

> Count(dir0) on parquet returns 0 result
> ---------------------------------------
>
>                 Key: DRILL-4735
>                 URL: https://issues.apache.org/jira/browse/DRILL-4735
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization, Storage - Parquet
>    Affects Versions: 1.0.0, 1.4.0, 1.6.0, 1.7.0
>            Reporter: Krystal
>            Assignee: Jinfeng Ni
>            Priority: Critical
>
> Selecting a count of dir0, dir1, etc against a parquet directory returns 0 rows.
> select count(dir0) from `min_max_dir`;
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> select count(dir1) from `min_max_dir`;
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> If I put both dir0 and dir1 in the same select, it returns expected result:
> select count(dir0), count(dir1) from `min_max_dir`;
> +---------+---------+
> | EXPR$0  | EXPR$1  |
> +---------+---------+
> | 600     | 600     |
> +---------+---------+
> Here is the physical plan for count(dir0) query:
> {code}
> 00-00    Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1346
> 00-01      Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1345
> 00-02        Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1344
> 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3da85d3b[columns = null, isStarQuery = false, isSkipQuery = false]]) : rowType = RecordType(BIGINT count): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1343
> {code}
> Here is part of the explain plan for the count(dir0) and count(dir1) in the same select:
> {code}
> 00-00    Screen : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1206.0 rows, 15606.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1623
> 00-01      Project(EXPR$0=[$0], EXPR$1=[$1]) : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1622
> 00-02        StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)]) : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1621
> 00-03          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1999/MAR/voter15.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1985/jan/voter5.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1985/apr/voter60.parquet/0_0_0.parquet],..., ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/2014/jul/voter35.parquet/0_0_0.parquet]], selectionRoot=maprfs:/drill/testdata/min_max_dir, numFiles=16, usedMetadataFile=false, columns=[`dir0`, `dir1`]]]) : rowType = RecordType(ANY dir0, ANY dir1): rowcount = 600.0, cumulative cost = {600.0 rows, 1200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1620
> {code}
> Notice that in the first case, "org.apache.drill.exec.store.pojo.PojoRecordReader" is used.
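For context on what `dir0` and `dir1` mean in DRILL-4735: Drill exposes the directory levels between the queried root and each file as implicit columns. The sketch below derives them for the paths shown in the second plan, using the `selectionRoot` printed there. It is a simplified illustration rather than Drill's code, and the function name is hypothetical.

```python
def implicit_dir_columns(selection_root: str, file_path: str) -> dict:
    """Map each directory level below selection_root to dir0, dir1, ...

    The final path component (the file itself) is not a directory level.
    """
    root = selection_root.rstrip("/")
    if not file_path.startswith(root + "/"):
        raise ValueError("file is not under the selection root")
    relative = file_path[len(root) + 1:]
    *dirs, _file_name = relative.split("/")
    return {f"dir{i}": name for i, name in enumerate(dirs)}


if __name__ == "__main__":
    root = "maprfs:/drill/testdata/min_max_dir"
    path = "maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet"
    print(implicit_dir_columns(root, path))
```

Every row read from a file carries that file's directory values, so `count(dir0)` over this table should equal the number of rows whose file sits at least one level below the root. The two-column query returns 600, which is why a result of 0 from the single-column form points at the plan (the `PojoRecordReader` scan) rather than at the data.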
[jira] [Assigned] (DRILL-4735) Count(dir0) on parquet returns 0 result
[ https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni reassigned DRILL-4735: - Assignee: Jinfeng Ni > Count(dir0) on parquet returns 0 result > --- > > Key: DRILL-4735 > URL: https://issues.apache.org/jira/browse/DRILL-4735 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization, Storage - Parquet >Affects Versions: 1.0.0, 1.4.0, 1.6.0, 1.7.0 >Reporter: Krystal >Assignee: Jinfeng Ni >Priority: Critical > > Selecting a count of dir0, dir1, etc against a parquet directory returns 0 > rows. > select count(dir0) from `min_max_dir`; > +-+ > | EXPR$0 | > +-+ > | 0 | > +-+ > select count(dir1) from `min_max_dir`; > +-+ > | EXPR$0 | > +-+ > | 0 | > +-+ > If I put both dir0 and dir1 in the same select, it returns expected result: > select count(dir0), count(dir1) from `min_max_dir`; > +-+-+ > | EXPR$0 | EXPR$1 | > +-+-+ > | 600 | 600 | > +-+-+ > Here is the physical plan for count(dir0) query: > {code} > 00-00Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, > cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id > = 1346 > 00-01 Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): > rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, > 0.0 memory}, id = 1345 > 00-02Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): > rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, > 0.0 memory}, id = 1344 > 00-03 > Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3da85d3b[columns > = null, isStarQuery = false, isSkipQuery = false]]) : rowType = > RecordType(BIGINT count): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 > cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1343 > {code} > Here is part of the explain plan for the count(dir0) and count(dir1) in the > same select: > {code} > 00-00Screen : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): > rowcount = 
60.0, cumulative cost = {1206.0 rows, 15606.0 cpu, 0.0 io, 0.0 > network, 0.0 memory}, id = 1623 > 00-01 Project(EXPR$0=[$0], EXPR$1=[$1]) : rowType = RecordType(BIGINT > EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, > 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1622 > 00-02StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)]) : > rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, > cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 > memory}, id = 1621 > 00-03 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet], > ReadEntryWithPath > [path=maprfs:/drill/testdata/min_max_dir/1999/MAR/voter15.parquet/0_0_0.parquet], > ReadEntryWithPath > [path=maprfs:/drill/testdata/min_max_dir/1985/jan/voter5.parquet/0_0_0.parquet], > ReadEntryWithPath > [path=maprfs:/drill/testdata/min_max_dir/1985/apr/voter60.parquet/0_0_0.parquet],..., > ReadEntryWithPath > [path=maprfs:/drill/testdata/min_max_dir/2014/jul/voter35.parquet/0_0_0.parquet]], > selectionRoot=maprfs:/drill/testdata/min_max_dir, numFiles=16, > usedMetadataFile=false, columns=[`dir0`, `dir1`]]]) : rowType = > RecordType(ANY dir0, ANY dir1): rowcount = 600.0, cumulative cost = {600.0 > rows, 1200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1620 > {code} > Notice that in the first case, > "org.apache.drill.exec.store.pojo.PojoRecordReader" is used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4735) Count(dir0) on parquet returns 0 result
[ https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-4735: -- Affects Version/s: 1.0.0 > Count(dir0) on parquet returns 0 result > --- > > Key: DRILL-4735 > URL: https://issues.apache.org/jira/browse/DRILL-4735 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization, Storage - Parquet >Affects Versions: 1.0.0, 1.4.0, 1.6.0, 1.7.0 >Reporter: Krystal >Priority: Critical > > Selecting a count of dir0, dir1, etc against a parquet directory returns 0 > rows. > select count(dir0) from `min_max_dir`; > +-+ > | EXPR$0 | > +-+ > | 0 | > +-+ > select count(dir1) from `min_max_dir`; > +-+ > | EXPR$0 | > +-+ > | 0 | > +-+ > If I put both dir0 and dir1 in the same select, it returns expected result: > select count(dir0), count(dir1) from `min_max_dir`; > +-+-+ > | EXPR$0 | EXPR$1 | > +-+-+ > | 600 | 600 | > +-+-+ > Here is the physical plan for count(dir0) query: > {code} > 00-00Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, > cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id > = 1346 > 00-01 Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): > rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, > 0.0 memory}, id = 1345 > 00-02Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): > rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, > 0.0 memory}, id = 1344 > 00-03 > Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3da85d3b[columns > = null, isStarQuery = false, isSkipQuery = false]]) : rowType = > RecordType(BIGINT count): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 > cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1343 > {code} > Here is part of the explain plan for the count(dir0) and count(dir1) in the > same select: > {code} > 00-00Screen : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): > rowcount = 60.0, cumulative cost 
= {1206.0 rows, 15606.0 cpu, 0.0 io, 0.0 > network, 0.0 memory}, id = 1623 > 00-01 Project(EXPR$0=[$0], EXPR$1=[$1]) : rowType = RecordType(BIGINT > EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, > 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1622 > 00-02StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)]) : > rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, > cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 > memory}, id = 1621 > 00-03 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet], > ReadEntryWithPath > [path=maprfs:/drill/testdata/min_max_dir/1999/MAR/voter15.parquet/0_0_0.parquet], > ReadEntryWithPath > [path=maprfs:/drill/testdata/min_max_dir/1985/jan/voter5.parquet/0_0_0.parquet], > ReadEntryWithPath > [path=maprfs:/drill/testdata/min_max_dir/1985/apr/voter60.parquet/0_0_0.parquet],..., > ReadEntryWithPath > [path=maprfs:/drill/testdata/min_max_dir/2014/jul/voter35.parquet/0_0_0.parquet]], > selectionRoot=maprfs:/drill/testdata/min_max_dir, numFiles=16, > usedMetadataFile=false, columns=[`dir0`, `dir1`]]]) : rowType = > RecordType(ANY dir0, ANY dir1): rowcount = 600.0, cumulative cost = {600.0 > rows, 1200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1620 > {code} > Notice that in the first case, > "org.apache.drill.exec.store.pojo.PojoRecordReader" is used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4733) max(dir0) reading more columns than necessary
[ https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342588#comment-15342588 ]

ASF GitHub Bot commented on DRILL-4733:
---------------------------------------

Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/531#discussion_r67943855

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/TestImplicitFileColumns.java ---
    @@ -110,4 +111,20 @@ public void testImplicitColumnsForParquet() throws Exception {
             .go();
       }

    +  @Test // DRILL-4733
    +  public void testMultilevelParquetWithSchemaChange() throws Exception {
    +    try {
    +      test("alter session set `planner.enable_decimal_data_type` = true");
    +      testBuilder()
    +          .sqlQuery(String.format("select max(dir0) as max_dir from dfs_test.`%s/src/test/resources/multilevel/parquetWithSchemaChange`",
    +              TestTools.getWorkingPath()))
    +          .unOrdered()
    +          .baselineColumns("max_dir")
    +          .baselineValues("voter50.parquet")
    --- End diff --

    Why do you put the baseline value in a parquet, instead of putting it in the test case directly? The query seems to return a single value.

> max(dir0) reading more columns than necessary
> ---------------------------------------------
>
>                 Key: DRILL-4733
>                 URL: https://issues.apache.org/jira/browse/DRILL-4733
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization, Storage - Parquet
>    Affects Versions: 1.7.0
>            Reporter: Rahul Challapalli
>            Assignee: Arina Ielchiieva
>            Priority: Critical
>             Fix For: 1.7.0
>
>         Attachments: bug.tgz
>
>
> The query below started to fail as of this commit: 3209886a8548eea4a2f74c059542672f8665b8d2
> {code}
> select max(dir0) from dfs.`/drill/testdata/bug/2016`;
> Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support schema changes
> Fragment 0:0
> [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] (state=,code=0)
> {code}
> The sub-folders contain files that have a schema change for one column, "contributions" (int32 vs double). However, prior to this commit we did not fail in this scenario. Log files and test data are attached.
[jira] [Created] (DRILL-4741) sqlline scripts should differentiate embedded vs remote config
Paul Rogers created DRILL-4741:
----------------------------------

             Summary: sqlline scripts should differentiate embedded vs remote config
                 Key: DRILL-4741
                 URL: https://issues.apache.org/jira/browse/DRILL-4741
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.8.0
            Reporter: Paul Rogers
            Priority: Minor

$DRILL_HOME/bin contains four sqlline-related scripts:

sqlline -- Main script for running sqlline.
drill-conf -- Wrapper for sqlline; uses the Drill config to find Drill. This one seems to need fixing so that it can use a config other than the hard-coded $DRILL_HOME/conf location.
drill-embedded -- Starts a Drill "embedded" in SqlLine, using a local ZK.
drill-localhost -- Wrapper for sqlline; uses a local ZK.

The last three turn around and call sqlline. Behind the scenes, the scripts call drill-config.sh and drill-env.sh to do setup.

Note, however, that we run sqlline and Drill in three distinct configurations:

sqlline as client: should run with light memory.
drillbit as daemon: should run with full memory use.
sqlline with embedded drillbit: sqlline needs to run with the Drillbit memory options.

Today, sqlline always uses the Drillbit memory options (and VM options), which results in too much memory use and in port conflicts when running client-only.

Provide sqlline-specific VM and memory options. Then, the tricky bit: use them only when Drill is not embedded.
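One way the embedded-vs-client split requested above might look in a launcher script is sketched below. This is only an illustration, not the actual Drill scripts: the function names and the variables SQLLINE_JAVA_OPTS and DRILLBIT_JAVA_OPTS are hypothetical, and the sketch assumes that an embedded session can be recognized by a JDBC URL containing exactly zk=local (as in jdbc:drill:zk=local), while a URL such as jdbc:drill:zk=localhost:2181 refers to a separately running Drillbit.

```shell
#!/bin/sh
# Hypothetical sketch: choose JVM options for sqlline based on the JDBC URL.
# Nothing here is taken from the real Drill launch scripts.

# Return 0 when the URL asks for an embedded Drillbit.
# Assumption: "zk=local" (end of URL, or followed by ';' or '/') means embedded.
# "zk=localhost:2181" must NOT match, since that is a remote-style connection.
is_embedded_url() {
  case "$1" in
    *zk=local|*"zk=local;"*|*zk=local/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the JVM options sqlline should run with for the given JDBC URL.
# DRILLBIT_JAVA_OPTS: hypothetical variable holding the full Drillbit options.
# SQLLINE_JAVA_OPTS:  hypothetical variable holding light client-only options.
select_sqlline_opts() {
  if is_embedded_url "$1"; then
    printf '%s\n' "${DRILLBIT_JAVA_OPTS:-}"
  else
    printf '%s\n' "${SQLLINE_JAVA_OPTS:-}"
  fi
}
```

A wrapper could then call select_sqlline_opts with the URL it is about to pass to sqlline, so that only the embedded case pays for the Drillbit memory settings.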
[jira] [Commented] (DRILL-4387) Improve execution side when it handles skipAll query
[ https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342463#comment-15342463 ] Khurram Faraaz commented on DRILL-4387: --- The below queries return wrong results. (the problem seems to be there for quite some time) {noformat} Directory structure is [root@centos-01 DRILL_4589]# ls 1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 1991 1993 1995 1997 1999 2001 2003 2005 2007 2009 2011 2013 2015 [root@centos-01 DRILL_4589]# cd 1990 [root@centos-01 1990]# ls Q1 Q2 Q3 Q4 and so on... Below two queries return 0, I don't think the results are correct, please review 0: jdbc:drill:schema=dfs.tmp> select count(dir0) from `DRILL_4589`; +-+ | EXPR$0 | +-+ | 0 | +-+ 1 row selected (9.117 seconds) 0: jdbc:drill:schema=dfs.tmp> select count(dir1) from `DRILL_4589`; +-+ | EXPR$0 | +-+ | 0 | +-+ 1 row selected (8.97 seconds) 0: jdbc:drill:schema=dfs.tmp> explain plan for select count(dir0) from `DRILL_4589`; +--+--+ | text | json | +--+--+ | 00-00Screen 00-01 Project(EXPR$0=[$0]) 00-02Project(EXPR$0=[$0]) 00-03 Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@5275c59a[columns = null, isStarQuery = false, isSkipQuery = false]]) 0: jdbc:drill:schema=dfs.tmp> explain plan for select count(dir1) from `DRILL_4589`; +--+--+ | text | json | +--+--+ | 00-00Screen 00-01 Project(EXPR$0=[$0]) 00-02Project(EXPR$0=[$0]) 00-03 Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@337121ac[columns = null, isStarQuery = false, isSkipQuery = false]]) {noformat} > Improve execution side when it handles skipAll query > > > Key: DRILL-4387 > URL: https://issues.apache.org/jira/browse/DRILL-4387 > Project: Apache Drill > Issue Type: Bug >Reporter: Jinfeng Ni >Assignee: Jinfeng Ni > Fix For: 1.6.0 > > > DRILL-4279 changes the planner side and the RecordReader in the execution > side when they handles skipAll query. 
However, it seems there are other > places in the codebase that do not handle skipAll query efficiently. In > particular, in GroupScan or ScanBatchCreator, we will replace a NULL or empty > column list with star column. This essentially will force the execution side > (RecordReader) to fetch all the columns for data source. Such behavior will > lead to big performance overhead for the SCAN operator. > To improve Drill's performance, we should change those places as well, as a > follow-up work after DRILL-4279. > One simple example of this problem is: > {code} >SELECT DISTINCT substring(dir1, 5) from dfs.`/Path/To/ParquetTable`; > {code} > The query does not require any regular column from the parquet file. However, > ParquetRowGroupScan and ParquetScanBatchCreator will put star column as the > column list. In case table has dozens or hundreds of columns, this will make > SCAN operator much more expensive than necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4736) "noexec" set for /tmp
[ https://issues.apache.org/jira/browse/DRILL-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sudheesh Katkam updated DRILL-4736: --- Description: We should. can you file a doc bug. The issue is caused by "noexec" set for /tmp. Should we mention this in Drill Doc? This is not the first time we hit this issue. Thanks, Hao was: We should. can you file a doc bug. The issue is caused by "noexec" set for /tmp. Should we mention this in Drill Doc? This is not the first time we hit this issue. Thanks, Hao https://maprdrill.atlassian.net/browse/MD-946 > "noexec" set for /tmp > - > > Key: DRILL-4736 > URL: https://issues.apache.org/jira/browse/DRILL-4736 > Project: Apache Drill > Issue Type: Bug > Components: Documentation >Reporter: Bridget Bevens >Assignee: Bridget Bevens > > We should. can you file a doc bug. > The issue is caused by "noexec" set for /tmp. > Should we mention this in Drill Doc? > This is not the first time we hit this issue. > Thanks, > Hao -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4737) Adjust drill-env.sh instructions to reflect config/site directories
[ https://issues.apache.org/jira/browse/DRILL-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-4737:
-------------------------------
    Description:
See https://drill.apache.org/docs/starting-drill-in-distributed-mode/

Requires a number of changes to reflect Drill's support of a configuration directory as specified by:

drillbit.sh --config /path/to/config/dir cmd

"The default memory for a Drillbit is 8G, but Drill prefers 16G"
The default *direct* memory for Drill is 8G. The default total memory for Drill is 12G. (This includes 4G of heap.)

"Drillbit startup script located in /conf/drill-env.sh."
The location is $SITE_DIR/drill-env.sh. $SITE_DIR is either:

1. $DRILL_HOME/conf by default (as stated in the docs), or
2. Specified by the --config option to drillbit.sh

"edit the XX:MaxDirectMemorySize parameter".
Please show how to do this. The correct form (to work with YARN) is:

export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"8G"}

Where the new value replaces the "8G". (This is different from the pre-1.8 form.)

"If this parameter is not set, the limit depends on the amount of available system memory."
This has never turned out to be true, as the script always provides a default value.

Another point: Drill assumes all nodes have the same amount of memory. Relying on system memory will not, in general, work, as some Drillbits (with less system memory) will die with OOM errors. I suspect this is why a default setting is always provided.

"After you edit /conf/drill-env.sh"
Change to "After you edit drill-env.sh" to avoid repeating the path.

Further, note that in 1.8, drill-env.sh will become self-documenting: it will contain example settings and comments for each supported config option. (Thanks to John O. for that suggestion!) We might want to mention this information somewhere...
was: See https://drill.apache.org/docs/starting-drill-in-distributed-mode/ Requires a number of changes to reflect Drill's support of a configuration directory as specified by: drillbit.sh --config /path/to/config/dir cmd "The default memory for a Drillbit is 8G, but Drill prefers 16G" The default *direct* memory for Drill is 8G. The default total memory for Drill is 12G. (Included 4G heap.) "Drillbit startup script located in /conf/drill- env.sh." The location is $SITE_DIR/drill-env.sh. $SITE_DIR is either: 1. $DRILL_HOME/conf by default (as stated in the docs), or 2. Specified by the --config option to drillbit.sh "edit the XX:MaxDirectMemorySize parameter". Please show how to do this. The correct form (to work with YARN) is: export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"8G"} Where the new value replaces the "8G". (This is different than the pre-1.8 form.) "If this parameter is not set, the limit depends on the amount of available system memory." This has never turned out to be true as the script always provides a default value. Another point: Drill assumes all nodes have the same amount of memory. Relying on system memory will not, in general, work as some Drillbits (with less system memory) will die with OOM errors. I suspect this is why a default setting is always provided. "After you edit /conf/drill-env.sh" change to "After you edit drill-env.sh" to avoid repeating the path. 
> Adjust drill-env.sh instructions to reflect config/site directories > --- > > Key: DRILL-4737 > URL: https://issues.apache.org/jira/browse/DRILL-4737 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.8.0 >Reporter: Paul Rogers >Priority: Minor > > See https://drill.apache.org/docs/starting-drill-in-distributed-mode/ > Requires a number of changes to reflect Drill's support of a configuration > directory as specified by: > drillbit.sh --config /path/to/config/dir cmd > "The default memory for a Drillbit is 8G, but Drill prefers 16G" The default > *direct* memory for Drill is 8G. The default total memory for Drill is 12G. > (Included 4G heap.) > "Drillbit startup script located in /conf/drill- > env.sh." The location is $SITE_DIR/drill-env.sh. $SITE_DIR is either: > 1. $DRILL_HOME/conf by default (as stated in the docs), or > 2. Specified by the --config option to drillbit.sh > "edit the XX:MaxDirectMemorySize parameter". Please show how to do this. The > correct form (to work with YARN) is: > export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"8G"} > Where the new value replaces the "8G". (This is different than the pre-1.8 > form.) > "If this parameter is not set, the limit depends on the amount of available > system memory." This has never turned out to be true as the script always > provides a default value. > Another point: Drill assumes all nodes have the same amount of memory. > Relying on system memory
[jira] [Updated] (DRILL-4740) Improvements to "Analyzing the Yelp Academic Dataset"
[ https://issues.apache.org/jira/browse/DRILL-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated DRILL-4740: --- Description: Consider the topic paragraph for the Yelp sample data page: http://drill.apache.org/docs/analyzing-the-yelp-academic-dataset/ It could use a bit of TLC. For example: "Apache Drill is one of the fastest growing open source projects, with the community making rapid progress with monthly releases The key difference is Drill’s agility and flexibility." This is a non-sequiter. The speed and agility of the software does not drive the monthly releases. Can we reword it to say that Drill’s speed and agility makes it a popular project? And that many people work hard to make it better with monthly releases? Something like that... (Although, at present, releases have dropped to bi-monthly or quarterly...) And: "Along with meeting the table stakes for SQL-on-Hadoop, which is to achieve low latency performance at scale, …" Seems two problems. 1. What does it mean “meeting the table stakes”? Very unclear. 2. This is a run-on sentence that tries to say multiple thoughts in a single sentence and should be rewritten. Then, there is redundancy: "...Drill allows users to analyze the data without any ETL or up-front schema definitions. … Drill, has a “no schema” approach…" I’m sure this paragraph was written quickly early on, but it could certainly be improved a bit… More comments: 1. Minor nit: "This document aligns Drill output for example purposes. Drill output is not aligned in this case." I think that what this is saying is, “Drill output in this document is aligned for clarity. The actual Drill output you see may not be aligned.” It would be better to explain why it is not aligned here, since data is aligned in the earlier examples… 2. Somewhat off: "You can directly query self-describing files such as JSON, Parquet, and text. There is no need to create metadata definitions in the Hive metastore." 
I think what this is saying is that Drill infers schema information from self-describing files such as JSON, Parquet and CSV/TSV (with a header row). Contrast this with other systems, such as Hive, that require that you first define the schema in a data dictionary. Note that text is NOT a self-describing file format in the general case! 3. Yelp seems to be creating new revisions of their data set. I downloaded Round 7. The results differ from those in the Drill page text. Perhaps insert a statement that the examples used Round (whatever round) and that the reader’s results may differ when using later rounds. 4. The Yelp data is JSON. Somewhere near the top of the page (perhaps directly under "Querying Data with Drill”), we should say: The Yelp data is in JSON format. Where the “JSON format” would be link to the JSON docs: https://drill.apache.org/docs/json-data-model/ This is handy later when we tell the user to set the all_text_mode: First, change Drill to work in all text mode (so we can take a look at all of the data). Where we should add: (See the JSON Data Model documentation for more information.) 5. This query: select attributes from dfs.`//yelp/yelp_academic_dataset_business.json` limit 10; Appears all on one line and is truncated at the right of the page. Looks like we’ve broken our other long queries onto multiple lines. Perhaps this one needs the same treatment. 7. Here: "Top first categories in number of review counts" Perhaps copy the following text from the JSON format page to add explanation: “Query Complex Data” show how to use composite types to access nested arrays. 8. Another nit. Consider "Top businesses with cool rated reviews”. This (and similar items) are headers, but appear as regular text. The items have the HTML h4 tag, but have no special formatting. Can we make them bold or some such? 9. 
The following example SQL has two problems: 0: jdbc:drill:zk=local> create or replace view dfs.tmp.businessreviews as Select b.name,b.stars,b.state,b.city,r.votes.funny,r.votes.useful,r.votes.cool, r.`date` from dfs.`//yelp/yelp_academic_dataset_business.json` b, dfs.`//yelp/yelp_academic_dataset_review.json` r where r.business_id=b.business_id First, the third line scrolls off the page on my (moderate sized) page. Perhaps split it after “b, “. Second, the statement must end with a semi-colon: “b.business_id;”. 10. Another nit. This paragraph: "The goal of Apache Drill is to provide the freedom and flexibility in exploring data in ways we have never seen before with SQL technologies. The community is working on more exciting features around nested data and supporting data with changing schemas in upcoming releases." Would seem to be a better fit at the top of the page rather than toward the end. 11. Another nit. This paragraph: "In addition to these queries, you can get many deep insights using Drill’s SQL functionality. If
[jira] [Updated] (DRILL-4740) Improvements to "Analyzing the Yelp Academic Dataset"
[ https://issues.apache.org/jira/browse/DRILL-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated DRILL-4740: --- Summary: Improvements to "Analyzing the Yelp Academic Dataset" (was: Awkward wording in "Analyzing the Yelp Academic Dataset") > Improvements to "Analyzing the Yelp Academic Dataset" > - > > Key: DRILL-4740 > URL: https://issues.apache.org/jira/browse/DRILL-4740 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.6.0 >Reporter: Paul Rogers >Priority: Minor > > Consider the topic paragraph for the Yelp sample data page: > http://drill.apache.org/docs/analyzing-the-yelp-academic-dataset/ > It could use a bit of TLC. For example: > "Apache Drill is one of the fastest growing open source projects, with the > community making rapid progress with monthly releases The key difference is > Drill’s agility and flexibility." > This is a non-sequiter. The speed and agility of the software does not drive > the monthly releases. Can we reword it to say that Drill’s speed and agility > makes it a popular project? And that many people work hard to make it better > with monthly releases? Something like that... > (Although, at present, releases have dropped to bi-monthly or quarterly...) > And: > "Along with meeting the table stakes for SQL-on-Hadoop, which is to achieve > low latency performance at scale, …" > Seems two problems. > 1. What does it mean “meeting the table stakes”? Very unclear. > 2. This is a run-on sentence that tries to say multiple thoughts in a single > sentence and should be rewritten. > Then, there is redundancy: > "...Drill allows users to analyze the data without any ETL or up-front schema > definitions. … Drill, has a “no schema” approach…" > I’m sure this paragraph was written quickly early on, but it could certainly > be improved a bit… -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4740) Awkward wording in "Analyzing the Yelp Academic Dataset"
Paul Rogers created DRILL-4740:
----------------------------------

             Summary: Awkward wording in "Analyzing the Yelp Academic Dataset"
                 Key: DRILL-4740
                 URL: https://issues.apache.org/jira/browse/DRILL-4740
             Project: Apache Drill
          Issue Type: Improvement
          Components: Documentation
    Affects Versions: 1.6.0
            Reporter: Paul Rogers
            Priority: Minor

Consider the topic paragraph for the Yelp sample data page:

http://drill.apache.org/docs/analyzing-the-yelp-academic-dataset/

It could use a bit of TLC. For example:

"Apache Drill is one of the fastest growing open source projects, with the community making rapid progress with monthly releases The key difference is Drill’s agility and flexibility."

This is a non sequitur. The speed and agility of the software do not drive the monthly releases. Can we reword it to say that Drill’s speed and agility make it a popular project? And that many people work hard to make it better with monthly releases? Something like that...

(Although, at present, releases have dropped to bi-monthly or quarterly...)

And:

"Along with meeting the table stakes for SQL-on-Hadoop, which is to achieve low latency performance at scale, …"

There seem to be two problems.

1. What does “meeting the table stakes” mean? Very unclear.
2. This is a run-on sentence that tries to express multiple thoughts at once; it should be rewritten.

Then, there is redundancy:

"...Drill allows users to analyze the data without any ETL or up-front schema definitions. … Drill, has a “no schema” approach…"

I’m sure this paragraph was written quickly early on, but it could certainly be improved a bit…
[jira] [Created] (DRILL-4739) "SQL Extensions" doc. errata
Paul Rogers created DRILL-4739:
----------------------------------

             Summary: "SQL Extensions" doc. errata
                 Key: DRILL-4739
                 URL: https://issues.apache.org/jira/browse/DRILL-4739
             Project: Apache Drill
          Issue Type: Bug
          Components: Documentation
    Affects Versions: 1.6.0
            Reporter: Paul Rogers
            Priority: Minor

The “sys.drillbits” example (http://drill.apache.org/docs/sql-extensions/) throws an error when used with the standalone version:

SELECT host FROM sys.drillbits WHERE `current` = true;

Produces the following error:

Error: VALIDATION ERROR: From line 1, column 8 to line 1, column 11: Column 'host' not found in any table
[Error Id: e1d92308-9235-4699-ac53-03a59d06ce69 on 10.250.50.31:31010] (state=,code=0)

Performing “select * from sys.drillbits;” shows that the actual column name is “hostname”. Checking http://drill.apache.org/docs/querying-system-tables/ shows that hostname is the documented column name.

So, change the above example to:

SELECT hostname FROM sys.drillbits WHERE `current` = true;

Note the use of "hostname" rather than "host".
[jira] [Created] (DRILL-4738) "Compiling Drill from Source" doc changes
Paul Rogers created DRILL-4738:
----------------------------------

             Summary: "Compiling Drill from Source" doc changes
                 Key: DRILL-4738
                 URL: https://issues.apache.org/jira/browse/DRILL-4738
             Project: Apache Drill
          Issue Type: Improvement
          Components: Documentation
    Affects Versions: 1.6.0
            Reporter: Paul Rogers
            Priority: Minor

In “2. Compile the code”, before the “mvn clean …”, add:

export MAVEN_OPTS="-Xms256m -Xmx512m -XX:MaxPermSize=256m"

The code will not compile with the default JVM options; instead, you’ll get an Out of Memory message.

I personally encountered this, and a new Drill developer fought with the same issue. We might as well document it to save others the hassle.
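A slightly more defensive variant of the export above is sketched here, for build environments where MAVEN_OPTS may already be set. The helper name ensure_maven_opts is hypothetical, and the rule "leave MAVEN_OPTS alone if it already contains -Xmx" is an assumption made for illustration, not something the Drill docs prescribe. Note also that Java 8 and later ignore -XX:MaxPermSize, so that flag only matters on older JVMs.

```shell
#!/bin/sh
# Hypothetical helper: apply the heap settings suggested in the report,
# but only when the caller has not already chosen a maximum heap size.
ensure_maven_opts() {
  case "${MAVEN_OPTS:-}" in
    *-Xmx*)
      # Caller already set a heap limit; keep their value untouched.
      ;;
    *)
      # Append the suggested flags, preserving any unrelated options
      # (the ":+" expansion adds a separating space only if needed).
      MAVEN_OPTS="${MAVEN_OPTS:+$MAVEN_OPTS }-Xms256m -Xmx512m -XX:MaxPermSize=256m"
      ;;
  esac
  export MAVEN_OPTS
}
```

Calling ensure_maven_opts once before the Maven build step has the same effect as the plain export when MAVEN_OPTS is empty.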
[jira] [Updated] (DRILL-4737) Adjust drill-env.sh instructions to reflect config/site directories
[ https://issues.apache.org/jira/browse/DRILL-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated DRILL-4737: --- Issue Type: Improvement (was: Bug) > Adjust drill-env.sh instructions to reflect config/site directories > --- > > Key: DRILL-4737 > URL: https://issues.apache.org/jira/browse/DRILL-4737 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.8.0 >Reporter: Paul Rogers >Priority: Minor > > See https://drill.apache.org/docs/starting-drill-in-distributed-mode/ > Requires a number of changes to reflect Drill's support of a configuration > directory as specified by: > drillbit.sh --config /path/to/config/dir cmd > "The default memory for a Drillbit is 8G, but Drill prefers 16G" The default > *direct* memory for Drill is 8G. The default total memory for Drill is 12G. > (Included 4G heap.) > "Drillbit startup script located in /conf/drill- > env.sh." The location is $SITE_DIR/drill-env.sh. $SITE_DIR is either: > 1. $DRILL_HOME/conf by default (as stated in the docs), or > 2. Specified by the --config option to drillbit.sh > "edit the XX:MaxDirectMemorySize parameter". Please show how to do this. The > correct form (to work with YARN) is: > export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"8G"} > Where the new value replaces the "8G". (This is different than the pre-1.8 > form.) > "If this parameter is not set, the limit depends on the amount of available > system memory." This has never turned out to be true as the script always > provides a default value. > Another point: Drill assumes all nodes have the same amount of memory. > Relying on system memory will not, in general, work as some Drillbits (with > less system memory) will die with OOM errors. I suspect this is why a default > setting is always provided. > "After you edit /conf/drill-env.sh" change to > "After you edit drill-env.sh" to avoid repeating the path. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
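The two points DRILL-4737 makes about $SITE_DIR and the export line could be sketched in shell roughly as follows. This is only an illustration under stated assumptions: resolve_site_dir is a hypothetical helper (the actual option handling in drillbit.sh is not shown in the issue), and /opt/drill and the 16G value are made-up examples. Only the form of the export line comes from the issue text.

```shell
#!/bin/sh
# Sketch only: resolve_site_dir is a hypothetical helper, not part of drillbit.sh.
resolve_site_dir() {
  # $1 is the directory given with --config, or empty when the option is absent.
  if [ -n "$1" ]; then
    printf '%s\n' "$1"
  else
    printf '%s\n' "$DRILL_HOME/conf"
  fi
}

DRILL_HOME=/opt/drill             # assumed install path, for illustration only
SITE_DIR=$(resolve_site_dir "")   # no --config given: falls back to $DRILL_HOME/conf
echo "drill-env.sh is read from $SITE_DIR/drill-env.sh"

# The export form quoted in the issue, with the default raised from 8G to 16G.
# The unset only makes this demo deterministic; a real drill-env.sh would not do it,
# so that a value already exported by the caller still takes precedence.
unset DRILL_MAX_DIRECT_MEMORY
export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"16G"}
echo "direct memory: $DRILL_MAX_DIRECT_MEMORY"
```

Because the assignment uses the `:-` default expansion, editing the quoted value changes only the fallback and does not override a value supplied from outside the script.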
[jira] [Created] (DRILL-4737) Adjust drill-env.sh instructions to reflect config/site directories
Paul Rogers created DRILL-4737:
--
    Summary: Adjust drill-env.sh instructions to reflect config/site directories
    Key: DRILL-4737
    URL: https://issues.apache.org/jira/browse/DRILL-4737
    Project: Apache Drill
    Issue Type: Bug
    Components: Documentation
    Affects Versions: 1.8.0
    Reporter: Paul Rogers
    Priority: Minor

See https://drill.apache.org/docs/starting-drill-in-distributed-mode/
Requires a number of changes to reflect Drill's support of a configuration directory as specified by:
drillbit.sh --config /path/to/config/dir cmd
"The default memory for a Drillbit is 8G, but Drill prefers 16G" The default *direct* memory for Drill is 8G. The default total memory for Drill is 12G. (Included 4G heap.)
"Drillbit startup script located in /conf/drill-env.sh." The location is $SITE_DIR/drill-env.sh. $SITE_DIR is either:
1. $DRILL_HOME/conf by default (as stated in the docs), or
2. Specified by the --config option to drillbit.sh
"edit the XX:MaxDirectMemorySize parameter". Please show how to do this. The correct form (to work with YARN) is:
export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"8G"}
Where the new value replaces the "8G". (This is different than the pre-1.8 form.)
"If this parameter is not set, the limit depends on the amount of available system memory." This has never turned out to be true as the script always provides a default value.
Another point: Drill assumes all nodes have the same amount of memory. Relying on system memory will not, in general, work as some Drillbits (with less system memory) will die with OOM errors. I suspect this is why a default setting is always provided.
"After you edit /conf/drill-env.sh" change to "After you edit drill-env.sh" to avoid repeating the path.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4736) "noexec" set for /tmp
[ https://issues.apache.org/jira/browse/DRILL-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bridget Bevens updated DRILL-4736:
--
    Description:
We should. can you file a doc bug.
The issue is caused by "noexec" set for /tmp.
Should we mention this in Drill Doc?
This is not the first time we hit this issue.
Thanks,
Hao
https://maprdrill.atlassian.net/browse/MD-946

  was:
Neeraja Rentachintala
10:44 AM (1 minute ago)
to Hao, Zelaine, me, Kathleen, Dayanand
+Bridget
We should. can you file a doc bug.
The issue is caused by "noexec" set for /tmp.
Should we mention this in Drill Doc?
This is not the first time we hit this issue.
Thanks,
Hao
On Tue, Jun 21, 2016 at 10:06 AM, Kathleen Li wrote:
Customer update as follows:
1) We upgrade our 6 node cluster from MapR 3.1 to MapR 5.1 while also upgrading the OS of the servers to SuSe 12. One of the nodes is using Drill so we installed the latest version of Drill at 1.6.
2) According to the MCS, this particular node is getting an alert - Drillbit Down Alarm. When viewing the alarm via the node it states - Can not determine if service: drill-bits is running. Check logs at: /opt/mapr/drill/drill-1.6.0/logs/ . I'm including a small piece of the drillbit.log attached to this email. Here is part of the error that I am seeing that may be significant.
From: Neeraja Rentachintala

> "noexec" set for /tmp
> -
>
> Key: DRILL-4736
> URL: https://issues.apache.org/jira/browse/DRILL-4736
> Project: Apache Drill
> Issue Type: Bug
> Components: Documentation
> Reporter: Bridget Bevens
> Assignee: Bridget Bevens
>
> We should. can you file a doc bug.
> The issue is caused by "noexec" set for /tmp.
> Should we mention this in Drill Doc?
> This is not the first time we hit this issue.
> Thanks,
> Hao
> https://maprdrill.atlassian.net/browse/MD-946

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4736) "noexec" set for /tmp
Bridget Bevens created DRILL-4736:
-
    Summary: "noexec" set for /tmp
    Key: DRILL-4736
    URL: https://issues.apache.org/jira/browse/DRILL-4736
    Project: Apache Drill
    Issue Type: Bug
    Components: Documentation
    Reporter: Bridget Bevens
    Assignee: Bridget Bevens

Neeraja Rentachintala
10:44 AM (1 minute ago)
to Hao, Zelaine, me, Kathleen, Dayanand
+Bridget
We should. can you file a doc bug.
The issue is caused by "noexec" set for /tmp.
Should we mention this in Drill Doc?
This is not the first time we hit this issue.
Thanks,
Hao
On Tue, Jun 21, 2016 at 10:06 AM, Kathleen Li wrote:
Customer update as follows:
1) We upgrade our 6 node cluster from MapR 3.1 to MapR 5.1 while also upgrading the OS of the servers to SuSe 12. One of the nodes is using Drill so we installed the latest version of Drill at 1.6.
2) According to the MCS, this particular node is getting an alert - Drillbit Down Alarm. When viewing the alarm via the node it states - Can not determine if service: drill-bits is running. Check logs at: /opt/mapr/drill/drill-1.6.0/logs/ . I'm including a small piece of the drillbit.log attached to this email. Here is part of the error that I am seeing that may be significant.
From: Neeraja Rentachintala

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
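For the "noexec" report in DRILL-4736, one way to check how /tmp is mounted on a Linux node might look like the sketch below. has_noexec is a hypothetical helper written for this note, and relying on findmnt (from util-linux) is an assumption about the node. The issue itself does not say how the mount option was diagnosed.

```shell
#!/bin/sh
# has_noexec is a hypothetical helper for this note, not part of Drill.
# It succeeds when a comma-separated mount-option string contains "noexec".
has_noexec() {
  case ",$1," in
    *,noexec,*) return 0 ;;
    *)          return 1 ;;
  esac
}

# Assumption: findmnt (util-linux) is installed; otherwise the check is skipped.
if command -v findmnt >/dev/null 2>&1; then
  opts=$(findmnt -n -o OPTIONS --target /tmp)
  if [ -z "$opts" ]; then
    echo "could not determine the mount options for /tmp"
  elif has_noexec "$opts"; then
    echo "/tmp is mounted with noexec"
  else
    echo "/tmp is not mounted with noexec"
  fi
else
  echo "findmnt not available; inspect the mount options for /tmp manually"
fi
```

Wrapping the option string in commas before matching keeps the test from firing on an option that merely contains the text "noexec" as a substring.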
[jira] [Commented] (DRILL-4733) max(dir0) reading more columns than necessary
[ https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342297#comment-15342297 ]

ASF GitHub Bot commented on DRILL-4733:
---

Github user rchallapalli commented on a diff in the pull request:

    https://github.com/apache/drill/pull/531#discussion_r67917077

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/TestImplicitFileColumns.java ---
    @@ -110,4 +111,20 @@ public void testImplicitColumnsForParquet() throws Exception {
             .go();
       }

    +  @Test // DRILL-4733
    +  public void testMultilevelParquetWithSchemaChange() throws Exception {
    +    try {
    +      test("alter session set `planner.enable_decimal_data_type` = true");
    --- End diff --

    One of the parquet files in the data set contains a column which is double, but I do not understand why Drill requires us to enable the decimal type.

> max(dir0) reading more columns than necessary
> -
>
> Key: DRILL-4733
> URL: https://issues.apache.org/jira/browse/DRILL-4733
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization, Storage - Parquet
> Affects Versions: 1.7.0
> Reporter: Rahul Challapalli
> Assignee: Arina Ielchiieva
> Priority: Critical
> Fix For: 1.7.0
>
> Attachments: bug.tgz
>
> The below query started to fail from this commit : 3209886a8548eea4a2f74c059542672f8665b8d2
> {code}
> select max(dir0) from dfs.`/drill/testdata/bug/2016`;
> Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support schema changes
> Fragment 0:0
> [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] (state=,code=0)
> {code}
> The sub-folders contains files which do have schema change for one column "contributions" (int32 vs double). However prior to this commit we did not fail in the scenario. Log files and test data are attached

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4733) max(dir0) reading more columns than necessary
[ https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342290#comment-15342290 ]

ASF GitHub Bot commented on DRILL-4733:
---

Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/531#discussion_r67916478

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java ---
    @@ -126,8 +127,12 @@ CloseableRecordBatch getReaderBatch(FragmentContext context, EasySubScan scan) t
         final ImplicitColumnExplorer columnExplorer = new ImplicitColumnExplorer(context, scan.getColumns());

         if (!columnExplorer.isSelectAllColumns()) {
    +      // We must make sure to pass a table column (not to be confused with implicit column) to the underlying record reader.
    +      List tableColumns =
    --- End diff --

    In the original PR I created a helper class which contained common logic for the parquet and text format plugins. Somehow I missed that this part is unique to the text format plugin and should NOT be used in the parquet one. That's why I have removed it from ImplicitColumnExplorer and added it to EasyFormatPlugin.

> max(dir0) reading more columns than necessary
> -
>
> Key: DRILL-4733
> URL: https://issues.apache.org/jira/browse/DRILL-4733
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization, Storage - Parquet
> Affects Versions: 1.7.0
> Reporter: Rahul Challapalli
> Assignee: Arina Ielchiieva
> Priority: Critical
> Fix For: 1.7.0
>
> Attachments: bug.tgz
>
> The below query started to fail from this commit : 3209886a8548eea4a2f74c059542672f8665b8d2
> {code}
> select max(dir0) from dfs.`/drill/testdata/bug/2016`;
> Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support schema changes
> Fragment 0:0
> [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] (state=,code=0)
> {code}
> The sub-folders contains files which do have schema change for one column "contributions" (int32 vs double). However prior to this commit we did not fail in the scenario. Log files and test data are attached

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4733) max(dir0) reading more columns than necessary
[ https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342281#comment-15342281 ]

ASF GitHub Bot commented on DRILL-4733:
---

Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/531#discussion_r67915875

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/TestImplicitFileColumns.java ---
    @@ -110,4 +111,20 @@ public void testImplicitColumnsForParquet() throws Exception {
             .go();
       }

    +  @Test // DRILL-4733
    +  public void testMultilevelParquetWithSchemaChange() throws Exception {
    +    try {
    +      test("alter session set `planner.enable_decimal_data_type` = true");
    --- End diff --

    When I run this query without the decimal data type enabled, Drill tells me to turn it on. It is probably connected with the data inside the dataset.

> max(dir0) reading more columns than necessary
> -
>
> Key: DRILL-4733
> URL: https://issues.apache.org/jira/browse/DRILL-4733
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization, Storage - Parquet
> Affects Versions: 1.7.0
> Reporter: Rahul Challapalli
> Assignee: Arina Ielchiieva
> Priority: Critical
> Fix For: 1.7.0
>
> Attachments: bug.tgz
>
> The below query started to fail from this commit : 3209886a8548eea4a2f74c059542672f8665b8d2
> {code}
> select max(dir0) from dfs.`/drill/testdata/bug/2016`;
> Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support schema changes
> Fragment 0:0
> [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] (state=,code=0)
> {code}
> The sub-folders contains files which do have schema change for one column "contributions" (int32 vs double). However prior to this commit we did not fail in the scenario. Log files and test data are attached

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4735) Count(dir0) on parquet returns 0 result
[ https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342270#comment-15342270 ]

Jinfeng Ni commented on DRILL-4735:
---

I ran the query on 1.4.0 and saw the same problem. I have not checked earlier versions, but it is likely that this problem has been there for a long time.

> Count(dir0) on parquet returns 0 result
> ---
>
> Key: DRILL-4735
> URL: https://issues.apache.org/jira/browse/DRILL-4735
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization, Storage - Parquet
> Affects Versions: 1.4.0, 1.6.0, 1.7.0
> Reporter: Krystal
> Priority: Critical
>
> Selecting a count of dir0, dir1, etc against a parquet directory returns 0 rows.
> select count(dir0) from `min_max_dir`;
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> select count(dir1) from `min_max_dir`;
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> If I put both dir0 and dir1 in the same select, it returns expected result:
> select count(dir0), count(dir1) from `min_max_dir`;
> +---------+---------+
> | EXPR$0  | EXPR$1  |
> +---------+---------+
> | 600     | 600     |
> +---------+---------+
> Here is the physical plan for count(dir0) query:
> {code}
> 00-00    Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1346
> 00-01      Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1345
> 00-02        Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1344
> 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3da85d3b[columns = null, isStarQuery = false, isSkipQuery = false]]) : rowType = RecordType(BIGINT count): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1343
> {code}
> Here is part of the explain plan for the count(dir0) and count(dir1) in the same select:
> {code}
> 00-00    Screen : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1206.0 rows, 15606.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1623
> 00-01      Project(EXPR$0=[$0], EXPR$1=[$1]) : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1622
> 00-02        StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)]) : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1621
> 00-03          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1999/MAR/voter15.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1985/jan/voter5.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1985/apr/voter60.parquet/0_0_0.parquet],..., ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/2014/jul/voter35.parquet/0_0_0.parquet]], selectionRoot=maprfs:/drill/testdata/min_max_dir, numFiles=16, usedMetadataFile=false, columns=[`dir0`, `dir1`]]]) : rowType = RecordType(ANY dir0, ANY dir1): rowcount = 600.0, cumulative cost = {600.0 rows, 1200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1620
> {code}
> Notice that in the first case, "org.apache.drill.exec.store.pojo.PojoRecordReader" is used.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4733) max(dir0) reading more columns than necessary
[ https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342268#comment-15342268 ]

ASF GitHub Bot commented on DRILL-4733:
---

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/531#discussion_r67914754

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/TestImplicitFileColumns.java ---
    @@ -110,4 +111,20 @@ public void testImplicitColumnsForParquet() throws Exception {
             .go();
       }

    +  @Test // DRILL-4733
    +  public void testMultilevelParquetWithSchemaChange() throws Exception {
    +    try {
    +      test("alter session set `planner.enable_decimal_data_type` = true");
    --- End diff --

    Why is the decimal type relevant for this particular test?

> max(dir0) reading more columns than necessary
> -
>
> Key: DRILL-4733
> URL: https://issues.apache.org/jira/browse/DRILL-4733
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization, Storage - Parquet
> Affects Versions: 1.7.0
> Reporter: Rahul Challapalli
> Assignee: Arina Ielchiieva
> Priority: Critical
> Fix For: 1.7.0
>
> Attachments: bug.tgz
>
> The below query started to fail from this commit : 3209886a8548eea4a2f74c059542672f8665b8d2
> {code}
> select max(dir0) from dfs.`/drill/testdata/bug/2016`;
> Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support schema changes
> Fragment 0:0
> [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] (state=,code=0)
> {code}
> The sub-folders contains files which do have schema change for one column "contributions" (int32 vs double). However prior to this commit we did not fail in the scenario. Log files and test data are attached

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4735) Count(dir0) on parquet returns 0 result
[ https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342265#comment-15342265 ]

Rahul Challapalli commented on DRILL-4735:
-

[~knguyen] Can you confirm whether this is a regression from 1.6?

> Count(dir0) on parquet returns 0 result
> ---
>
> Key: DRILL-4735
> URL: https://issues.apache.org/jira/browse/DRILL-4735
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization, Storage - Parquet
> Affects Versions: 1.4.0, 1.6.0, 1.7.0
> Reporter: Krystal
> Priority: Critical
>
> Selecting a count of dir0, dir1, etc against a parquet directory returns 0 rows.
> select count(dir0) from `min_max_dir`;
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> select count(dir1) from `min_max_dir`;
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> If I put both dir0 and dir1 in the same select, it returns expected result:
> select count(dir0), count(dir1) from `min_max_dir`;
> +---------+---------+
> | EXPR$0  | EXPR$1  |
> +---------+---------+
> | 600     | 600     |
> +---------+---------+
> Here is the physical plan for count(dir0) query:
> {code}
> 00-00    Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1346
> 00-01      Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1345
> 00-02        Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1344
> 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3da85d3b[columns = null, isStarQuery = false, isSkipQuery = false]]) : rowType = RecordType(BIGINT count): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1343
> {code}
> Here is part of the explain plan for the count(dir0) and count(dir1) in the same select:
> {code}
> 00-00    Screen : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1206.0 rows, 15606.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1623
> 00-01      Project(EXPR$0=[$0], EXPR$1=[$1]) : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1622
> 00-02        StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)]) : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1621
> 00-03          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1999/MAR/voter15.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1985/jan/voter5.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1985/apr/voter60.parquet/0_0_0.parquet],..., ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/2014/jul/voter35.parquet/0_0_0.parquet]], selectionRoot=maprfs:/drill/testdata/min_max_dir, numFiles=16, usedMetadataFile=false, columns=[`dir0`, `dir1`]]]) : rowType = RecordType(ANY dir0, ANY dir1): rowcount = 600.0, cumulative cost = {600.0 rows, 1200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1620
> {code}
> Notice that in the first case, "org.apache.drill.exec.store.pojo.PojoRecordReader" is used.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4735) Count(dir0) on parquet returns 0 result
[ https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinfeng Ni updated DRILL-4735:
--
    Affects Version/s: 1.4.0

> Count(dir0) on parquet returns 0 result
> ---
>
> Key: DRILL-4735
> URL: https://issues.apache.org/jira/browse/DRILL-4735
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization, Storage - Parquet
> Affects Versions: 1.4.0, 1.6.0, 1.7.0
> Reporter: Krystal
> Priority: Critical
>
> Selecting a count of dir0, dir1, etc against a parquet directory returns 0 rows.
> select count(dir0) from `min_max_dir`;
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> select count(dir1) from `min_max_dir`;
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> If I put both dir0 and dir1 in the same select, it returns expected result:
> select count(dir0), count(dir1) from `min_max_dir`;
> +---------+---------+
> | EXPR$0  | EXPR$1  |
> +---------+---------+
> | 600     | 600     |
> +---------+---------+
> Here is the physical plan for count(dir0) query:
> {code}
> 00-00    Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1346
> 00-01      Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1345
> 00-02        Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1344
> 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3da85d3b[columns = null, isStarQuery = false, isSkipQuery = false]]) : rowType = RecordType(BIGINT count): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1343
> {code}
> Here is part of the explain plan for the count(dir0) and count(dir1) in the same select:
> {code}
> 00-00    Screen : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1206.0 rows, 15606.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1623
> 00-01      Project(EXPR$0=[$0], EXPR$1=[$1]) : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1622
> 00-02        StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)]) : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1621
> 00-03          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1999/MAR/voter15.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1985/jan/voter5.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1985/apr/voter60.parquet/0_0_0.parquet],..., ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/2014/jul/voter35.parquet/0_0_0.parquet]], selectionRoot=maprfs:/drill/testdata/min_max_dir, numFiles=16, usedMetadataFile=false, columns=[`dir0`, `dir1`]]]) : rowType = RecordType(ANY dir0, ANY dir1): rowcount = 600.0, cumulative cost = {600.0 rows, 1200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1620
> {code}
> Notice that in the first case, "org.apache.drill.exec.store.pojo.PojoRecordReader" is used.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4733) max(dir0) reading more columns than necessary
[ https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342263#comment-15342263 ]

ASF GitHub Bot commented on DRILL-4733:
---

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/531#discussion_r67914676

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java ---
    @@ -126,8 +127,12 @@ CloseableRecordBatch getReaderBatch(FragmentContext context, EasySubScan scan) t
         final ImplicitColumnExplorer columnExplorer = new ImplicitColumnExplorer(context, scan.getColumns());

         if (!columnExplorer.isSelectAllColumns()) {
    +      // We must make sure to pass a table column (not to be confused with implicit column) to the underlying record reader.
    +      List tableColumns =
    --- End diff --

    I haven't looked at the original patch for implicit columns, but I am not sure why this fix is in the EasyFormatPlugin when the test is against Parquet files.

> max(dir0) reading more columns than necessary
> -
>
> Key: DRILL-4733
> URL: https://issues.apache.org/jira/browse/DRILL-4733
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization, Storage - Parquet
> Affects Versions: 1.7.0
> Reporter: Rahul Challapalli
> Assignee: Arina Ielchiieva
> Priority: Critical
> Fix For: 1.7.0
>
> Attachments: bug.tgz
>
> The below query started to fail from this commit : 3209886a8548eea4a2f74c059542672f8665b8d2
> {code}
> select max(dir0) from dfs.`/drill/testdata/bug/2016`;
> Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support schema changes
> Fragment 0:0
> [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] (state=,code=0)
> {code}
> The sub-folders contains files which do have schema change for one column "contributions" (int32 vs double). However prior to this commit we did not fail in the scenario. Log files and test data are attached

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4735) Count(dir0) on parquet returns 0 result
[ https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rahul Challapalli updated DRILL-4735:
-
    Priority: Critical  (was: Major)

> Count(dir0) on parquet returns 0 result
> ---
>
> Key: DRILL-4735
> URL: https://issues.apache.org/jira/browse/DRILL-4735
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization, Storage - Parquet
> Affects Versions: 1.6.0, 1.7.0
> Reporter: Krystal
> Priority: Critical
>
> Selecting a count of dir0, dir1, etc against a parquet directory returns 0 rows.
> select count(dir0) from `min_max_dir`;
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> select count(dir1) from `min_max_dir`;
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> If I put both dir0 and dir1 in the same select, it returns expected result:
> select count(dir0), count(dir1) from `min_max_dir`;
> +---------+---------+
> | EXPR$0  | EXPR$1  |
> +---------+---------+
> | 600     | 600     |
> +---------+---------+
> Here is the physical plan for count(dir0) query:
> {code}
> 00-00    Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1346
> 00-01      Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1345
> 00-02        Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1344
> 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3da85d3b[columns = null, isStarQuery = false, isSkipQuery = false]]) : rowType = RecordType(BIGINT count): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1343
> {code}
> Here is part of the explain plan for the count(dir0) and count(dir1) in the same select:
> {code}
> 00-00    Screen : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1206.0 rows, 15606.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1623
> 00-01      Project(EXPR$0=[$0], EXPR$1=[$1]) : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1622
> 00-02        StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)]) : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1621
> 00-03          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1999/MAR/voter15.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1985/jan/voter5.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1985/apr/voter60.parquet/0_0_0.parquet],..., ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/2014/jul/voter35.parquet/0_0_0.parquet]], selectionRoot=maprfs:/drill/testdata/min_max_dir, numFiles=16, usedMetadataFile=false, columns=[`dir0`, `dir1`]]]) : rowType = RecordType(ANY dir0, ANY dir1): rowcount = 600.0, cumulative cost = {600.0 rows, 1200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1620
> {code}
> Notice that in the first case, "org.apache.drill.exec.store.pojo.PojoRecordReader" is used.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4732) Update JDBC driver to use the new prepared statement APIs on DrillClient
[ https://issues.apache.org/jira/browse/DRILL-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342110#comment-15342110 ]

ASF GitHub Bot commented on DRILL-4732:
---

GitHub user vkorukanti opened a pull request:

    https://github.com/apache/drill/pull/532

    DRILL-4732: Update JDBC driver to use the new prepared statement APIs in DrillClient

    Changes specific to DRILL-4732 are in last commit.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vkorukanti/drill DRILL-4732

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/532.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #532

commit 32ba03c7abd9a3784c9a5376dd2835325fe8d5f9
Author: vkorukanti
Date: 2016-06-09T23:03:06Z

    DRILL-4728: Add support for new metadata fetch APIs

    + Protobuf messages
      - GetCatalogsReq -> GetCatalogsResp
      - GetSchemasReq -> GetSchemasResp
      - GetTablesReq -> GetTablesResp
      - GetColumnsReq -> GetColumnsResp
    + Java Drill client changes
    + Server side changes to handle the metadata API calls
      - Provide a self contained `Runnable` implementation for each metadata API that process the requests and sends the response to client
      - In `UserWorker` override the `handle` method that takes the `ResponseSender` and send the response from the `handle` method instead of returning it.
      - Add a method for each new API to UserWorker to submit the metadata work.
      - Add a method `addNewWork(Runnable runnable)` to `WorkerBee` to submit a generic `Runnable` to `ExecutorService`.
      - Move out couple of methods from `QueryContext` into a separate interface `SchemaConfigInfoProvider` to enable instantiating Schema trees without the full `QueryContext`
    + New protobuf messages increased the `jdbc-all.jar` size. Up the limit to 21MB.

    Change-Id: I5a5e4b453caf912d832ff8547c5789c884195cc4

commit a2ca69b3a81a8ff66bd671da775318204d49dda0
Author: vkorukanti
Date: 2016-06-13T18:20:25Z

    DRILL-4729: Add support for prepared statement implementation on server side

    + Add following APIs for Drill Java client
      - DrillRpcFuture createPreparedStatement(final String query)
      - void executePreparedStatement(final PreparedStatement preparedStatement, UserResultsListener resultsListener)
      - List executePreparedStatement(final PreparedStatement preparedStatement) (for testing purpose)
    + Separated out the interface from UserClientConnection. It makes it easy to have wrappers which need to tap the messages and data going to the actual client.
    + Implement CREATE_PREPARED_STATEMENT and handle RunQuery with PreparedStatement
    + Test changes to support prepared statement as query type
    + Add tests in TestPreparedStatementProvider

    Change-Id: Id26cbb9ed809f0ab3c7530e6a5d8314d2e868b86

commit 2d91a605eac808561f2bf9ae60e6582936a4e9f0
Author: vkorukanti
Date: 2016-06-20T21:40:05Z

    DRILL-4732: Update JDBC driver to use the new prepared statement APIs on DrillClient

    Change-Id: Ib8131789e9ad257b3f60859bc4115eaef43aee48

> Update JDBC driver to use the new prepared statement APIs on DrillClient
> ---
>
> Key: DRILL-4732
> URL: https://issues.apache.org/jira/browse/DRILL-4732
> Project: Apache Drill
> Issue Type: Sub-task
> Components: Metadata
> Reporter: Venki Korukanti
> Assignee: Venki Korukanti
> Fix For: 1.8.0
>
> DRILL-4729 is adding new prepared statement implementation on server side and it provides APIs on DrillClient to create new prepared statement which returns metadata along with a opaque handle and submit prepared statement for execution.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4735) Count(dir0) on parquet returns 0 result
Krystal created DRILL-4735:
---------------------------

             Summary: Count(dir0) on parquet returns 0 result
                 Key: DRILL-4735
                 URL: https://issues.apache.org/jira/browse/DRILL-4735
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization, Storage - Parquet
    Affects Versions: 1.6.0, 1.7.0
            Reporter: Krystal


Selecting a count of dir0, dir1, etc against a parquet directory returns 0 rows.

select count(dir0) from `min_max_dir`;
+---------+
| EXPR$0  |
+---------+
| 0       |
+---------+

select count(dir1) from `min_max_dir`;
+---------+
| EXPR$0  |
+---------+
| 0       |
+---------+

If I put both dir0 and dir1 in the same select, it returns expected result:

select count(dir0), count(dir1) from `min_max_dir`;
+---------+---------+
| EXPR$0  | EXPR$1  |
+---------+---------+
| 600     | 600     |
+---------+---------+

Here is the physical plan for count(dir0) query:
{code}
00-00    Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1346
00-01      Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1345
00-02        Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1344
00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3da85d3b[columns = null, isStarQuery = false, isSkipQuery = false]]) : rowType = RecordType(BIGINT count): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1343
{code}

Here is part of the explain plan for the count(dir0) and count(dir1) in the same select:
{code}
00-00    Screen : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1206.0 rows, 15606.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1623
00-01      Project(EXPR$0=[$0], EXPR$1=[$1]) : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1622
00-02        StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)]) : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1621
00-03          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1999/MAR/voter15.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1985/jan/voter5.parquet/0_0_0.parquet], ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/1985/apr/voter60.parquet/0_0_0.parquet],..., ReadEntryWithPath [path=maprfs:/drill/testdata/min_max_dir/2014/jul/voter35.parquet/0_0_0.parquet]], selectionRoot=maprfs:/drill/testdata/min_max_dir, numFiles=16, usedMetadataFile=false, columns=[`dir0`, `dir1`]]]) : rowType = RecordType(ANY dir0, ANY dir1): rowcount = 600.0, cumulative cost = {600.0 rows, 1200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1620
{code}

Notice that in the first case, "org.apache.drill.exec.store.pojo.PojoRecordReader" is used.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2385) count on complex objects failed with missing function implementation
[ https://issues.apache.org/jira/browse/DRILL-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341890#comment-15341890 ] ASF GitHub Bot commented on DRILL-2385: --- Github user vdiravka commented on the issue: https://github.com/apache/drill/pull/501 Changes merged into master with commit id f86c4fa > count on complex objects failed with missing function implementation > > > Key: DRILL-2385 > URL: https://issues.apache.org/jira/browse/DRILL-2385 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 0.8.0 >Reporter: Chun Chang >Assignee: Vitalii Diravka >Priority: Minor > Fix For: 1.7.0 > > > #Wed Mar 04 01:23:42 EST 2015 > git.commit.id.abbrev=71b6bfe > Have a complex type looks like the following: > {code} > 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.sia from > `complex.json` t limit 1; > ++ > |sia | > ++ > | [1,11,101,1001] | > ++ > {code} > A count on the complex type will fail with missing function implementation: > {code} > 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.gbyi, count(t.sia) > countsia from `complex.json` t group by t.gbyi; > Query failed: RemoteRpcException: Failure while running fragment., Schema is > currently null. You must call buildSchema(SelectionVectorMode) before this > container can return a schema. [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on > qa-node119.qa.lab:31010 ] > [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on qa-node119.qa.lab:31010 ] > Error: exception while executing query: Failure while executing query. > (state=,code=0) > {code} > drillbit.log > {code} > 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] ERROR > o.a.drill.exec.ops.FragmentContext - Fragment Context received failure. > org.apache.drill.exec.exception.SchemaChangeException: Failure while > materializing expression. > Error in expression at index 0. Error: Missing function implementation: > [count(BIGINT-REPEATED)]. Full expression: null. 
> at > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregatorInternal(HashAggBatch.java:210) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregator(HashAggBatch.java:158) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema(HashAggBatch.java:101) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:130) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:114) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:121) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_45] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_45] > at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] > 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] WARN > 
o.a.d.e.w.fragment.FragmentExecutor - Error while initializing or executing > fragment > java.lang.NullPointerException: Schema is currently null. You must call > buildSchema(SelectionVectorMode) before this container can return a schema. > at > com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208) > ~[guava-14.0.1.jar:na] > at > org.apache.drill.exec.record.VectorContainer.getSchema(VectorContainer.java:261) > ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at >
[jira] [Commented] (DRILL-2385) count on complex objects failed with missing function implementation
[ https://issues.apache.org/jira/browse/DRILL-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341891#comment-15341891 ] ASF GitHub Bot commented on DRILL-2385: --- Github user vdiravka closed the pull request at: https://github.com/apache/drill/pull/501 > count on complex objects failed with missing function implementation > > > Key: DRILL-2385 > URL: https://issues.apache.org/jira/browse/DRILL-2385 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 0.8.0 >Reporter: Chun Chang >Assignee: Vitalii Diravka >Priority: Minor > Fix For: 1.7.0 > > > #Wed Mar 04 01:23:42 EST 2015 > git.commit.id.abbrev=71b6bfe > Have a complex type looks like the following: > {code} > 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.sia from > `complex.json` t limit 1; > ++ > |sia | > ++ > | [1,11,101,1001] | > ++ > {code} > A count on the complex type will fail with missing function implementation: > {code} > 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.gbyi, count(t.sia) > countsia from `complex.json` t group by t.gbyi; > Query failed: RemoteRpcException: Failure while running fragment., Schema is > currently null. You must call buildSchema(SelectionVectorMode) before this > container can return a schema. [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on > qa-node119.qa.lab:31010 ] > [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on qa-node119.qa.lab:31010 ] > Error: exception while executing query: Failure while executing query. > (state=,code=0) > {code} > drillbit.log > {code} > 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] ERROR > o.a.drill.exec.ops.FragmentContext - Fragment Context received failure. > org.apache.drill.exec.exception.SchemaChangeException: Failure while > materializing expression. > Error in expression at index 0. Error: Missing function implementation: > [count(BIGINT-REPEATED)]. Full expression: null. 
> at > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregatorInternal(HashAggBatch.java:210) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregator(HashAggBatch.java:158) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema(HashAggBatch.java:101) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:130) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:114) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:121) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_45] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_45] > at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] > 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] WARN > 
o.a.d.e.w.fragment.FragmentExecutor - Error while initializing or executing > fragment > java.lang.NullPointerException: Schema is currently null. You must call > buildSchema(SelectionVectorMode) before this container can return a schema. > at > com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208) > ~[guava-14.0.1.jar:na] > at > org.apache.drill.exec.record.VectorContainer.getSchema(VectorContainer.java:261) > ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.getSchema(AbstractRecordBatch.java:155) >
[jira] [Comment Edited] (DRILL-4734) Query against HBase table on a 5 node cluster fails with SchemaChangeException
[ https://issues.apache.org/jira/browse/DRILL-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341843#comment-15341843 ] Aman Sinha edited comment on DRILL-4734 at 6/21/16 2:15 PM: Attached Explain plan with 2 nodes was (Author: amansinha100): Explain plan with 2 nodes > Query against HBase table on a 5 node cluster fails with SchemaChangeException > -- > > Key: DRILL-4734 > URL: https://issues.apache.org/jira/browse/DRILL-4734 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators, Storage - HBase >Affects Versions: 1.6.0 >Reporter: Aman Sinha > Attachments: 2nodes.explain.txt, 5nodes.explain.txt > > > [Creating this JIRA on behalf of Qiang Li] > Let say I have two tables. > {noformat} > offers_ref0 : rowkey salt(1byte)+long uid(8 byte ) , family: v, qualifier: > v(string) > offers_nation_idx: rowkey salt(1byte) + string, family:v, qualifier: v(long > 8 byte) > {noformat} > there is the SQL: > {noformat} > select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as did, >convert_from(`ref0`.`v`.`v`,'UTF8') as v from hbase.`offers_nation_idx` > as`nation` > join hbase.offers_ref0 as `ref0` > on CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') = > CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') >where `nation`.row_key > '0br' and `nation`.row_key < '0bs' > limit 10 > {noformat} > When I execute the query with single node or less than 5 nodes, its working > good. But when I execute it in cluster which have about 14 nodes, its throw > a exception: > First time will throw this exception: > *Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException: > Hash join does not support schema changes* > Then if I query again, it will always throw below exception: > {noformat} > *Query Failed: An Error Occurred* > *org.apache.drill.common.exceptions.UserRemoteException: SYSTEM > ERROR:IllegalStateException: > Failure while reading vector. 
Expected vector class of > org.apache.drill.exec.vector.NullableIntVector but > was holding vector class org.apache.drill.exec.vector.complex.MapVector, > field=v(MAP:REQUIRED > [v(VARBINARY:OPTIONAL)[$bits$(UINT1:REQUIRED), > v(VARBINARY:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]] Fragment 12:4 > [Error Id:06c6eae4-0822-4714-b0bf-a6e04ebfec79 on xxx:31010]* > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4734) Query against HBase table on a 5 node cluster fails with SchemaChangeException
[ https://issues.apache.org/jira/browse/DRILL-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Sinha updated DRILL-4734: -- Attachment: 2nodes.explain.txt Explain plan with 2 nodes > Query against HBase table on a 5 node cluster fails with SchemaChangeException > -- > > Key: DRILL-4734 > URL: https://issues.apache.org/jira/browse/DRILL-4734 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators, Storage - HBase > Affects Versions: 1.6.0 > Reporter: Aman Sinha > Attachments: 2nodes.explain.txt, 5nodes.explain.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4734) Query against HBase table on a 5 node cluster fails with SchemaChangeException
[ https://issues.apache.org/jira/browse/DRILL-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Sinha updated DRILL-4734: -- Attachment: 5nodes.explain.txt Attached Explain plan with 5 nodes > Query against HBase table on a 5 node cluster fails with SchemaChangeException > -- > > Key: DRILL-4734 > URL: https://issues.apache.org/jira/browse/DRILL-4734 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators, Storage - HBase > Affects Versions: 1.6.0 > Reporter: Aman Sinha > Attachments: 2nodes.explain.txt, 5nodes.explain.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4734) Query against HBase table on a 5 node cluster fails with SchemaChangeException
[ https://issues.apache.org/jira/browse/DRILL-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Sinha updated DRILL-4734: -- Description: [Creating this JIRA on behalf of Qiang Li] Let say I have two tables. {noformat} offers_ref0 : rowkey salt(1byte)+long uid(8 byte ) , family: v, qualifier: v(string) offers_nation_idx: rowkey salt(1byte) + string, family:v, qualifier: v(long 8 byte) {noformat} there is the SQL: {noformat} select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as did, convert_from(`ref0`.`v`.`v`,'UTF8') as v from hbase.`offers_nation_idx` as`nation` join hbase.offers_ref0 as `ref0` on CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') = CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') where `nation`.row_key > '0br' and `nation`.row_key < '0bs' limit 10 {noformat} When I execute the query with single node or less than 5 nodes, its working good. But when I execute it in cluster which have about 14 nodes, its throw a exception: First time will throw this exception: *Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException: Hash join does not support schema changes* Then if I query again, it will always throw below exception: {noformat} *Query Failed: An Error Occurred* *org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.drill.exec.vector.complex.MapVector, field=v(MAP:REQUIRED [v(VARBINARY:OPTIONAL)[$bits$(UINT1:REQUIRED), v(VARBINARY:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]] Fragment 12:4 [Error Id:06c6eae4-0822-4714-b0bf-a6e04ebfec79 on xxx:31010]* {noformat} was: [Creating this JIRA on behalf of Qiang Li] Let say I have two tables. 
{noformat} offers_ref0 : rowkey salt(1byte)+long uid(8 byte ) , family: v, qualifier: v(string) offers_nation_idx: rowkey salt(1byte) + string, family:v, qualifier: v(long 8 byte) {noformat} there is the SQL: {noformat} select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as did, convert_from(`ref0`.`v`.`v`,'UTF8') as v from hbase.`offers_nation_idx` as`nation` join hbase.offers_ref0 as `ref0` on CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') = CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') where `nation`.row_key > '0br' and `nation`.row_key < '0bs' limit 10 {noformat} When I execute the query with single node or less than 5 nodes, its working good. But when I execute it in cluster which have about 14 nodes, its throw a exception: First time will throw this exception: *Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException: Hash join does not support schema changes* Then if I query again, it will always throw below exception: {noformat} *Query Failed: An Error Occurred* *org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.drill.exec.vector.complex.MapVector, field=v(MAP:REQUIRED [v(VARBINARY:OPTIONAL)[$bits$(UINT1:REQUIRED), v(VARBINARY:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]] Fragment 12:4 [Error Id:06c6eae4-0822-4714-b0bf-a6e04ebfec79 on xxx:31010]* {noformat} > Query against HBase table on a 5 node cluster fails with SchemaChangeException > -- > > Key: DRILL-4734 > URL: https://issues.apache.org/jira/browse/DRILL-4734 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators, Storage - HBase >Affects Versions: 1.6.0 >Reporter: Aman Sinha > > [Creating this JIRA on behalf of Qiang Li] > Let say I have two tables. 
> {noformat} > offers_ref0 : rowkey salt(1byte)+long uid(8 byte ) , family: v, qualifier: > v(string) > offers_nation_idx: rowkey salt(1byte) + string, family:v, qualifier: v(long > 8 byte) > {noformat} > there is the SQL: > {noformat} > select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as did, >convert_from(`ref0`.`v`.`v`,'UTF8') as v from hbase.`offers_nation_idx` > as`nation` > join hbase.offers_ref0 as `ref0` > on CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') = > CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') >where `nation`.row_key > '0br' and `nation`.row_key < '0bs' > limit 10 > {noformat} > When I execute the query with single node or less than 5 nodes, its working > good. But when I execute it in cluster which have about 14 nodes, its throw > a exception: > First time will throw this exception: > *Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException: > Hash join does not support schema changes* > Then if I query again, it will
[jira] [Updated] (DRILL-4734) Query against HBase table on a 5 node cluster fails with SchemaChangeException
[ https://issues.apache.org/jira/browse/DRILL-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Sinha updated DRILL-4734: -- Description: [Creating this JIRA on behalf of Qiang Li] Let say I have two tables. {noformat} offers_ref0 : rowkey salt(1byte)+long uid(8 byte ) , family: v, qualifier: v(string) offers_nation_idx: rowkey salt(1byte) + string, family:v, qualifier: v(long 8 byte) {noformat} there is the SQL: {noformat} select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as did, convert_from(`ref0`.`v`.`v`,'UTF8') as v from hbase.`offers_nation_idx` as`nation` join hbase.offers_ref0 as `ref0` on CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') = CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') where `nation`.row_key > '0br' and `nation`.row_key < '0bs' limit 10 {noformat} When I execute the query with single node or less than 5 nodes, its working good. But when I execute it in cluster which have about 14 nodes, its throw a exception: First time will throw this exception: *Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException: Hash join does not support schema changes* Then if I query again, it will always throw below exception: {noformat} *Query Failed: An Error Occurred* *org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.drill.exec.vector.complex.MapVector, field=v(MAP:REQUIRED [v(VARBINARY:OPTIONAL)[$bits$(UINT1:REQUIRED), v(VARBINARY:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]] Fragment 12:4 [Error Id:06c6eae4-0822-4714-b0bf-a6e04ebfec79 on xxx:31010]* {noformat} was: [Creating this JIRA on behalf of Qiang Li] Let say I have two tables. 
offers_ref0 : rowkey salt(1byte)+long uid(8 byte ) , family: v, qualifier: v(string) offers_nation_idx: rowkey salt(1byte) + string, family:v, qualifier: v(long 8 byte) there is the SQL: select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as uid, convert_from(`ref0`.`v`.`v`,'UTF8') as v from hbase.`offers_nation_idx` as `nation` join hbase.offers_ref0 as `ref0` on CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') = CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') where `nation`.row_key > '0br' and `nation`.row_key < '0bs' limit 10 When I execute the query with single node or less than 5 nodes, its working good. But when I execute it in cluster which have about 14 nodes, its throw a exception: First time will throw this exception: *Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException: Hash join does not support schema changes* Then if I query again, it will always throw below exception: *Query Failed: An Error Occurred* *org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.drill.exec.vector.complex.MapVector, field= v(MAP:REQUIRED)[v(VARBINARY:OPTIONAL)[$bits$(UINT1:REQUIRED), v(VARBINARY:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]] Fragment 12:4 [Error Id: 06c6eae4-0822-4714-b0bf-a6e04ebfec79 on xxx:31010]* Its very strange, and I do not know how to solve it. I tried add node to the cluster one by one, it will reproduce when I added 5 nodes. Can anyone help me solve this issue? > Query against HBase table on a 5 node cluster fails with SchemaChangeException > -- > > Key: DRILL-4734 > URL: https://issues.apache.org/jira/browse/DRILL-4734 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators, Storage - HBase >Affects Versions: 1.6.0 >Reporter: Aman Sinha > > [Creating this JIRA on behalf of Qiang Li] > Let say I have two tables. 
> {noformat} > offers_ref0 : rowkey salt(1byte)+long uid(8 byte ) , family: v, qualifier: > v(string) > offers_nation_idx: rowkey salt(1byte) + string, family:v, qualifier: v(long > 8 byte) > {noformat} > there is the SQL: > {noformat} > select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as did, > convert_from(`ref0`.`v`.`v`,'UTF8') as v from hbase.`offers_nation_idx` > as`nation` join hbase.offers_ref0 as `ref0` on > CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') = > CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') where `nation`.row_key > '0br' and > `nation`.row_key < '0bs' limit 10 > {noformat} > When I execute the query with single node or less than 5 nodes, its working > good. But when I execute it in cluster which have about 14 nodes, its throw > a exception: > First time will throw this exception: > *Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException: > Hash join does not support
[jira] [Created] (DRILL-4734) Query against HBase table on a 5 node cluster fails with SchemaChangeException
Aman Sinha created DRILL-4734:
------------------------------

             Summary: Query against HBase table on a 5 node cluster fails with SchemaChangeException
                 Key: DRILL-4734
                 URL: https://issues.apache.org/jira/browse/DRILL-4734
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Relational Operators, Storage - HBase
    Affects Versions: 1.6.0
            Reporter: Aman Sinha


[Creating this JIRA on behalf of Qiang Li]

Let's say I have two tables:

offers_ref0:       rowkey salt (1 byte) + long uid (8 bytes), family: v, qualifier: v (string)
offers_nation_idx: rowkey salt (1 byte) + string, family: v, qualifier: v (long, 8 bytes)

Here is the SQL:

select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as uid,
  convert_from(`ref0`.`v`.`v`,'UTF8') as v
from hbase.`offers_nation_idx` as `nation`
join hbase.offers_ref0 as `ref0`
  on CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') = CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE')
where `nation`.row_key > '0br' and `nation`.row_key < '0bs'
limit 10

When I execute the query on a single node or on fewer than 5 nodes, it works fine. But when I execute it on a cluster that has about 14 nodes, it throws an exception.

The first time, it throws this exception:

*Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException: Hash join does not support schema changes*

If I then run the query again, it always throws the exception below:

*Query Failed: An Error Occurred*
*org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.drill.exec.vector.complex.MapVector, field= v(MAP:REQUIRED)[v(VARBINARY:OPTIONAL)[$bits$(UINT1:REQUIRED), v(VARBINARY:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]] Fragment 12:4 [Error Id: 06c6eae4-0822-4714-b0bf-a6e04ebfec79 on xxx:31010]*

It's very strange, and I do not know how to solve it. I tried adding nodes to the cluster one by one; the problem reproduces once I have added 5 nodes.

Can anyone help me solve this issue?

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
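For readers unfamiliar with the row-key layout in this report, the sketch below shows what the join expression appears to compute: it takes the trailing 8 bytes of a salted row key and reads them as a big-endian 64-bit integer. It assumes that `BYTE_SUBSTR(row_key, -8, 8)` selects the last 8 bytes, which matches the stated layout of 1 salt byte followed by an 8-byte uid. The class `SaltedRowKey` and its methods are hypothetical helpers written for illustration, not Drill or HBase code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper illustrating the row-key layout from the report.
class SaltedRowKey {
    // Build a 9-byte key: 1 salt byte followed by the uid as an 8-byte big-endian long.
    static byte[] encode(byte salt, long uid) {
        return ByteBuffer.allocate(1 + Long.BYTES)
                .order(ByteOrder.BIG_ENDIAN)
                .put(salt)
                .putLong(uid)
                .array();
    }

    // Read the trailing 8 bytes as a big-endian long; this mirrors the intent of
    // CONVERT_FROM(BYTE_SUBSTR(row_key, -8, 8), 'BIGINT_BE') in the query.
    static long decodeUid(byte[] rowKey) {
        if (rowKey.length < Long.BYTES) {
            throw new IllegalArgumentException("row key shorter than 8 bytes");
        }
        return ByteBuffer.wrap(rowKey, rowKey.length - Long.BYTES, Long.BYTES)
                .order(ByteOrder.BIG_ENDIAN)
                .getLong();
    }

    public static void main(String[] args) {
        byte[] key = encode((byte) '0', 1234567890123L);
        System.out.println(key.length);
        System.out.println(decodeUid(key));
    }
}
```

Both sides of the join condition therefore decode to a BIGINT, so the sketch says nothing about why the hash join reports a schema change; it only makes the key encoding concrete.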
[jira] [Commented] (DRILL-4733) max(dir0) reading more columns than necessary
[ https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341831#comment-15341831 ]

ASF GitHub Bot commented on DRILL-4733:
---------------------------------------

GitHub user arina-ielchiieva opened a pull request:

    https://github.com/apache/drill/pull/531

    DRILL-4733: max(dir0) reading more columns than necessary

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/arina-ielchiieva/drill DRILL-4733

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/531.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #531

commit 91b55e88311061ad6729d35b32ca150734991971
Author: Arina Ielchiieva
Date:   2016-06-21T12:33:32Z

    DRILL-4733: max(dir0) reading more columns than necessary

> max(dir0) reading more columns than necessary
> ---------------------------------------------
>
>                 Key: DRILL-4733
>                 URL: https://issues.apache.org/jira/browse/DRILL-4733
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization, Storage - Parquet
>    Affects Versions: 1.7.0
>            Reporter: Rahul Challapalli
>            Assignee: Arina Ielchiieva
>            Priority: Critical
>             Fix For: 1.7.0
>
>         Attachments: bug.tgz
>
>
> The query below started to fail from this commit:
> 3209886a8548eea4a2f74c059542672f8665b8d2
> {code}
> select max(dir0) from dfs.`/drill/testdata/bug/2016`;
> Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support schema changes
> Fragment 0:0
> [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] (state=,code=0)
> {code}
> The sub-folders contain files which do have a schema change for one column,
> "contributions" (int32 vs double). However, prior to this commit we did not
> fail in this scenario. Log files and test data are attached.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3726) Drill is not properly interpreting CRLF (0d0a). CR gets read as content.
[ https://issues.apache.org/jira/browse/DRILL-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341825#comment-15341825 ] Arina Ielchiieva commented on DRILL-3726: - Users will have two options: 1. Specify the delimiter in the select clause: select * from table(dfs.`my_table`(type=>'text', 'lineDelimiter'=>'\r\n')) 2. Update the storage plugin's lineDelimiter value to '\r\n' in the web UI. > Drill is not properly interpreting CRLF (0d0a). CR gets read as content. > > > Key: DRILL-3726 > URL: https://issues.apache.org/jira/browse/DRILL-3726 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Text & CSV >Affects Versions: 1.1.0 > Environment: Linux RHEL 6.6, OSX 10.9 >Reporter: Edmon Begoli > Fix For: 1.7.0 > > Original Estimate: 120h > Remaining Estimate: 120h > > When we query the last attribute of a text file, we get missing characters. > Looking at the row through Drill, a \r is included at the end of the last > attribute. > Looking in a text editor, it's not embedded into that attribute. > I'm thinking that Drill is not interpreting CRLF (0d0a) as a new line, only > the LF, resulting in the CR becoming part of the last attribute. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
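The symptom reported in DRILL-3726 can be illustrated outside Drill. The following is a minimal Python sketch, not Drill's text reader: the sample data and the `split_records` helper are hypothetical, and it only assumes that records are separated by a configurable line delimiter, as the `lineDelimiter` option in the comment above suggests.

```python
# Illustrative only: plain Python, not Drill's TextReader.
# `split_records` is a hypothetical helper introduced for this sketch.

def split_records(data, line_delimiter, field_delimiter=","):
    """Split raw text into records, then split each record into fields."""
    records = [r for r in data.split(line_delimiter) if r]
    return [r.split(field_delimiter) for r in records]

# A small file body written with CRLF (0d0a) line endings.
crlf_data = "a,b,c\r\n1,2,3\r\n"

# Treating only LF as the delimiter leaves the CR attached to the last field.
lf_only = split_records(crlf_data, "\n")

# Treating the two-byte CRLF sequence as the delimiter yields clean fields.
crlf = split_records(crlf_data, "\r\n")

print(lf_only)
print(crlf)
```

With an LF-only delimiter the last field of each record keeps a trailing `\r`, which matches the behaviour described in the issue; a multibyte delimiter removes it.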
[jira] [Commented] (DRILL-3149) TextReader should support multibyte line delimiters
[ https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341814#comment-15341814 ] ASF GitHub Bot commented on DRILL-3149: --- Github user arina-ielchiieva closed the pull request at: https://github.com/apache/drill/pull/500 > TextReader should support multibyte line delimiters > --- > > Key: DRILL-3149 > URL: https://issues.apache.org/jira/browse/DRILL-3149 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Text & CSV >Affects Versions: 1.0.0, 1.1.0 >Reporter: Jim Scott >Assignee: Arina Ielchiieva >Priority: Minor > Labels: doc-impacting > Fix For: 1.7.0 > > > lineDelimiter in the TextFormatConfig doesn't support \r\n for record > delimiters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3149) TextReader should support multibyte line delimiters
[ https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341813#comment-15341813 ] ASF GitHub Bot commented on DRILL-3149: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/500 Changes merged into master with commit id 223507b76ff6c2227e667ae4a53f743c92edd295 > TextReader should support multibyte line delimiters > --- > > Key: DRILL-3149 > URL: https://issues.apache.org/jira/browse/DRILL-3149 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Text & CSV >Affects Versions: 1.0.0, 1.1.0 >Reporter: Jim Scott >Assignee: Arina Ielchiieva >Priority: Minor > Labels: doc-impacting > Fix For: 1.7.0 > > > lineDelimiter in the TextFormatConfig doesn't support \r\n for record > delimiters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4658) cannot specify tab as a fieldDelimiter in table function
[ https://issues.apache.org/jira/browse/DRILL-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva resolved DRILL-4658.
------------------------------------
Resolution: Fixed
Fix Version/s: 1.7.0

Fix merged into master with commit id - 223507b76ff6c2227e667ae4a53f743c92edd295

> cannot specify tab as a fieldDelimiter in table function
> --------------------------------------------------------
>
> Key: DRILL-4658
> URL: https://issues.apache.org/jira/browse/DRILL-4658
> Project: Apache Drill
> Issue Type: Bug
> Components: SQL Parser
> Affects Versions: 1.6.0
> Environment: Mac OS X, Java 8
> Reporter: Vince Gonzalez
> Assignee: Arina Ielchiieva
> Fix For: 1.7.0
>
>
> I can't specify a tab delimiter in the table function because it maybe counts the characters rather than trying to interpret as a character escape code?
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as a, cast(columns[1] as bigint) as b from table(dfs.tmp.`sample_cast.tsv`(type => 'text', fieldDelimiter => '\t', skipFirstLine => true));
> Error: PARSE ERROR: Expected single character but was String: \t
> table sample_cast.tsv
> parameter fieldDelimiter
> SQL Query null
> [Error Id: 3efa82e1-3810-4d4a-b23c-32d6658dffcf on 172.30.1.144:31010] (state=,code=0)
> {code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
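The reporter's guess in DRILL-4658 (the literal is measured as two characters instead of being read as an escape) can be sketched as follows. This is a hedged Python illustration, not Drill's parser and not the merged fix: `unescape_delimiter`, `require_single_char`, and the `ESCAPES` table are hypothetical names introduced here.

```python
# Illustrative only: not Drill's SQL parser or the actual fix.
# All names below are hypothetical.

# Two-character escape sequences mapped to the single character they denote.
ESCAPES = {"\\t": "\t", "\\n": "\n", "\\r": "\r", "\\\\": "\\"}

def unescape_delimiter(raw):
    """Translate a backslash escape typed by the user into one character."""
    return ESCAPES.get(raw, raw)

def require_single_char(value):
    """Mimic a validation step that accepts only one-character delimiters."""
    if len(value) != 1:
        raise ValueError("Expected single character but was String: " + value)
    return value

# What arrives from the SQL literal '\t': a backslash followed by 't'.
typed = r"\t"

# Validating the raw literal fails because it is two characters long.
try:
    require_single_char(typed)
    rejected = False
except ValueError:
    rejected = True

# Unescaping first yields a real tab, which passes the same validation.
delimiter = require_single_char(unescape_delimiter(typed))
fields = "x\t42".split(delimiter)
print(rejected, fields)
```

The design point is only the ordering: interpret the escape sequence before applying the single-character check.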
[jira] [Resolved] (DRILL-3726) Drill is not properly interpreting CRLF (0d0a). CR gets read as content.
[ https://issues.apache.org/jira/browse/DRILL-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva resolved DRILL-3726. - Resolution: Fixed Fix merged into master with commit id 223507b76ff6c2227e667ae4a53f743c92edd295 > Drill is not properly interpreting CRLF (0d0a). CR gets read as content. > > > Key: DRILL-3726 > URL: https://issues.apache.org/jira/browse/DRILL-3726 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Text & CSV >Affects Versions: 1.1.0 > Environment: Linux RHEL 6.6, OSX 10.9 >Reporter: Edmon Begoli > Fix For: 1.7.0 > > Original Estimate: 120h > Remaining Estimate: 120h > > When we query the last attribute of a text file, we get missing characters. > Looking at the row through Drill, a \r is included at the end of the last > attribute. > Looking in a text editor, it's not embedded into that attribute. > I'm thinking that Drill is not interpreting CRLF (0d0a) as a new line, only > the LF, resulting in the CR becoming part of the last attribute. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3149) TextReader should support multibyte line delimiters
[ https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva resolved DRILL-3149. - Resolution: Fixed Fix Version/s: (was: Future) 1.7.0 > TextReader should support multibyte line delimiters > --- > > Key: DRILL-3149 > URL: https://issues.apache.org/jira/browse/DRILL-3149 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Text & CSV >Affects Versions: 1.0.0, 1.1.0 >Reporter: Jim Scott >Assignee: Arina Ielchiieva >Priority: Minor > Labels: doc-impacting > Fix For: 1.7.0 > > > lineDelimiter in the TextFormatConfig doesn't support \r\n for record > delimiters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3149) TextReader should support multibyte line delimiters
[ https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-3149: Labels: doc-impacting (was: ) > TextReader should support multibyte line delimiters > --- > > Key: DRILL-3149 > URL: https://issues.apache.org/jira/browse/DRILL-3149 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Text & CSV >Affects Versions: 1.0.0, 1.1.0 >Reporter: Jim Scott >Assignee: Arina Ielchiieva >Priority: Minor > Labels: doc-impacting > Fix For: 1.7.0 > > > lineDelimiter in the TextFormatConfig doesn't support \r\n for record > delimiters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3149) TextReader should support multibyte line delimiters
[ https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341807#comment-15341807 ] Arina Ielchiieva commented on DRILL-3149: - Merged into master with commit id 223507b76ff6c2227e667ae4a53f743c92edd295 > TextReader should support multibyte line delimiters > --- > > Key: DRILL-3149 > URL: https://issues.apache.org/jira/browse/DRILL-3149 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Text & CSV >Affects Versions: 1.0.0, 1.1.0 >Reporter: Jim Scott >Assignee: Arina Ielchiieva >Priority: Minor > Labels: doc-impacting > Fix For: 1.7.0 > > > lineDelimiter in the TextFormatConfig doesn't support \r\n for record > delimiters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4701) Fix log name and missing lines in logs on Web UI
[ https://issues.apache.org/jira/browse/DRILL-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341791#comment-15341791 ] ASF GitHub Bot commented on DRILL-4701: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/511 Changes merged into master with commit id 4123ed2a539cd3f9812f22f96d56aa4709828acd > Fix log name and missing lines in logs on Web UI > > > Key: DRILL-4701 > URL: https://issues.apache.org/jira/browse/DRILL-4701 > Project: Apache Drill > Issue Type: Bug >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Fix For: 1.7.0 > > > 1. When the log files are downloaded from the ui, the name of the downloaded > file is "download". We should save the file with the same name as the log > file (ie. drillbit.log) > 2. The last N lines of the log file displayed in the web UI do not match the > log file itself. Some lines are missing compared with actual log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4701) Fix log name and missing lines in logs on Web UI
[ https://issues.apache.org/jira/browse/DRILL-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341792#comment-15341792 ] ASF GitHub Bot commented on DRILL-4701: --- Github user arina-ielchiieva closed the pull request at: https://github.com/apache/drill/pull/511 > Fix log name and missing lines in logs on Web UI > > > Key: DRILL-4701 > URL: https://issues.apache.org/jira/browse/DRILL-4701 > Project: Apache Drill > Issue Type: Bug >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Fix For: 1.7.0 > > > 1. When the log files are downloaded from the ui, the name of the downloaded > file is "download". We should save the file with the same name as the log > file (ie. drillbit.log) > 2. The last N lines of the log file displayed in the web UI do not match the > log file itself. Some lines are missing compared with actual log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2593) 500 error when crc for a query profile is out of sync
[ https://issues.apache.org/jira/browse/DRILL-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva resolved DRILL-2593. - Resolution: Fixed > 500 error when crc for a query profile is out of sync > - > > Key: DRILL-2593 > URL: https://issues.apache.org/jira/browse/DRILL-2593 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP >Affects Versions: 0.7.0 >Reporter: Jason Altekruse >Assignee: Arina Ielchiieva > Fix For: 1.7.0 > > Attachments: warning1.JPG, warning2.JPG > > > To reproduce, on a machine where an embedded drillbit has been run, edit one > of the profiles stored in /tmp/drill/profiles and try to navigate to the > profiles page on the Web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4571) Add link to local Drill logs from the web UI
[ https://issues.apache.org/jira/browse/DRILL-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva resolved DRILL-4571. - Resolution: Fixed Fix was merged into master with commit id 4123ed2a539cd3f9812f22f96d56aa4709828acd > Add link to local Drill logs from the web UI > > > Key: DRILL-4571 > URL: https://issues.apache.org/jira/browse/DRILL-4571 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.6.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Labels: doc-impacting > Fix For: 1.7.0 > > Attachments: display_log.JPG, drillbit_download.log.gz, > drillbit_queries_json_screenshot.jpg, drillbit_ui.log, log_list.JPG > > > Now we have link to the profile from the web UI. > It will be handy for the users to have the link to local logs as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4701) Fix log name and missing lines in logs on Web UI
[ https://issues.apache.org/jira/browse/DRILL-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341787#comment-15341787 ] Arina Ielchiieva commented on DRILL-4701: - Merged into master with commit id 4123ed2a539cd3f9812f22f96d56aa4709828acd > Fix log name and missing lines in logs on Web UI > > > Key: DRILL-4701 > URL: https://issues.apache.org/jira/browse/DRILL-4701 > Project: Apache Drill > Issue Type: Bug >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Fix For: 1.7.0 > > > 1. When the log files are downloaded from the ui, the name of the downloaded > file is "download". We should save the file with the same name as the log > file (ie. drillbit.log) > 2. The last N lines of the log file displayed in the web UI do not match the > log file itself. Some lines are missing compared with actual log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4701) Fix log name and missing lines in logs on Web UI
[ https://issues.apache.org/jira/browse/DRILL-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva resolved DRILL-4701. - Resolution: Fixed > Fix log name and missing lines in logs on Web UI > > > Key: DRILL-4701 > URL: https://issues.apache.org/jira/browse/DRILL-4701 > Project: Apache Drill > Issue Type: Bug >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Fix For: 1.7.0 > > > 1. When the log files are downloaded from the ui, the name of the downloaded > file is "download". We should save the file with the same name as the log > file (ie. drillbit.log) > 2. The last N lines of the log file displayed in the web UI do not match the > log file itself. Some lines are missing compared with actual log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4716) status.json doesn't work in drill ui
[ https://issues.apache.org/jira/browse/DRILL-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341785#comment-15341785 ] ASF GitHub Bot commented on DRILL-4716: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/522 Changes merged into master with commit 1c451a341e80c2372be47d999741240fb5495eea > status.json doesn't work in drill ui > > > Key: DRILL-4716 > URL: https://issues.apache.org/jira/browse/DRILL-4716 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP >Affects Versions: 1.6.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > Fix For: 1.7.0 > > > 1. http://localhost:8047/status returns "Running!" > But http://localhost:8047/status.json gives error. > {code} > { > "errorMessage" : "HTTP 404 Not Found" > } > {code} > 2. Remove link to System Options on page http://localhost:8047/status as > redundant. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4716) status.json doesn't work in drill ui
[ https://issues.apache.org/jira/browse/DRILL-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341786#comment-15341786 ] ASF GitHub Bot commented on DRILL-4716: --- Github user arina-ielchiieva closed the pull request at: https://github.com/apache/drill/pull/522 > status.json doesn't work in drill ui > > > Key: DRILL-4716 > URL: https://issues.apache.org/jira/browse/DRILL-4716 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP >Affects Versions: 1.6.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > Fix For: 1.7.0 > > > 1. http://localhost:8047/status returns "Running!" > But http://localhost:8047/status.json gives error. > {code} > { > "errorMessage" : "HTTP 404 Not Found" > } > {code} > 2. Remove link to System Options on page http://localhost:8047/status as > redundant. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4716) status.json doesn't work in drill ui
[ https://issues.apache.org/jira/browse/DRILL-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva resolved DRILL-4716. - Resolution: Fixed > status.json doesn't work in drill ui > > > Key: DRILL-4716 > URL: https://issues.apache.org/jira/browse/DRILL-4716 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP >Affects Versions: 1.6.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > Fix For: 1.7.0 > > > 1. http://localhost:8047/status returns "Running!" > But http://localhost:8047/status.json gives error. > {code} > { > "errorMessage" : "HTTP 404 Not Found" > } > {code} > 2. Remove link to System Options on page http://localhost:8047/status as > redundant. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2385) count on complex objects failed with missing function implementation
[ https://issues.apache.org/jira/browse/DRILL-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vitalii Diravka resolved DRILL-2385.
-----------------------------------
Resolution: Fixed

Fixed in f86c4fa8.

> count on complex objects failed with missing function implementation
> --------------------------------------------------------------------
>
> Key: DRILL-2385
> URL: https://issues.apache.org/jira/browse/DRILL-2385
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 0.8.0
> Reporter: Chun Chang
> Assignee: Vitalii Diravka
> Priority: Minor
> Fix For: 1.7.0
>
>
> #Wed Mar 04 01:23:42 EST 2015
> git.commit.id.abbrev=71b6bfe
> Have a complex type looks like the following:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.sia from `complex.json` t limit 1;
> +-----------------+
> | sia             |
> +-----------------+
> | [1,11,101,1001] |
> +-----------------+
> {code}
> A count on the complex type will fail with missing function implementation:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.gbyi, count(t.sia) countsia from `complex.json` t group by t.gbyi;
> Query failed: RemoteRpcException: Failure while running fragment., Schema is currently null. You must call buildSchema(SelectionVectorMode) before this container can return a schema. [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on qa-node119.qa.lab:31010 ]
> [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on qa-node119.qa.lab:31010 ]
> Error: exception while executing query: Failure while executing query. (state=,code=0)
> {code}
> drillbit.log
> {code}
> 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] ERROR o.a.drill.exec.ops.FragmentContext - Fragment Context received failure.
> org.apache.drill.exec.exception.SchemaChangeException: Failure while materializing expression.
> Error in expression at index 0. Error: Missing function implementation: [count(BIGINT-REPEATED)]. Full expression: null.
>     at org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregatorInternal(HashAggBatch.java:210) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregator(HashAggBatch.java:158) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema(HashAggBatch.java:101) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:130) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:114) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:121) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
>     at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] WARN o.a.d.e.w.fragment.FragmentExecutor - Error while initializing or executing fragment
> java.lang.NullPointerException: Schema is currently null. You must call buildSchema(SelectionVectorMode) before this container can return a schema.
>     at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208) ~[guava-14.0.1.jar:na]
>     at org.apache.drill.exec.record.VectorContainer.getSchema(VectorContainer.java:261) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.getSchema(AbstractRecordBatch.java:155) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at
[jira] [Commented] (DRILL-2593) 500 error when crc for a query profile is out of sync
[ https://issues.apache.org/jira/browse/DRILL-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341764#comment-15341764 ] ASF GitHub Bot commented on DRILL-2593: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/523 Merged into master with commit 2862beaf5c72ccaafc6c52b9956f2d0414948b67 > 500 error when crc for a query profile is out of sync > - > > Key: DRILL-2593 > URL: https://issues.apache.org/jira/browse/DRILL-2593 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP >Affects Versions: 0.7.0 >Reporter: Jason Altekruse >Assignee: Arina Ielchiieva > Fix For: 1.7.0 > > Attachments: warning1.JPG, warning2.JPG > > > To reproduce, on a machine where an embedded drillbit has been run, edit one > of the profiles stored in /tmp/drill/profiles and try to navigate to the > profiles page on the Web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2593) 500 error when crc for a query profile is out of sync
[ https://issues.apache.org/jira/browse/DRILL-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341765#comment-15341765 ] ASF GitHub Bot commented on DRILL-2593: --- Github user arina-ielchiieva closed the pull request at: https://github.com/apache/drill/pull/523 > 500 error when crc for a query profile is out of sync > - > > Key: DRILL-2593 > URL: https://issues.apache.org/jira/browse/DRILL-2593 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP >Affects Versions: 0.7.0 >Reporter: Jason Altekruse >Assignee: Arina Ielchiieva > Fix For: 1.7.0 > > Attachments: warning1.JPG, warning2.JPG > > > To reproduce, on a machine where an embedded drillbit has been run, edit one > of the profiles stored in /tmp/drill/profiles and try to navigate to the > profiles page on the Web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2593) 500 error when crc for a query profile is out of sync
[ https://issues.apache.org/jira/browse/DRILL-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341763#comment-15341763 ] Arina Ielchiieva commented on DRILL-2593: - Merged into master with commit 2862beaf5c72ccaafc6c52b9956f2d0414948b67 > 500 error when crc for a query profile is out of sync > - > > Key: DRILL-2593 > URL: https://issues.apache.org/jira/browse/DRILL-2593 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP >Affects Versions: 0.7.0 >Reporter: Jason Altekruse >Assignee: Arina Ielchiieva > Fix For: 1.7.0 > > Attachments: warning1.JPG, warning2.JPG > > > To reproduce, on a machine where an embedded drillbit has been run, edit one > of the profiles stored in /tmp/drill/profiles and try to navigate to the > profiles page on the Web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4733) max(dir0) reading more columns than necessary
[ https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-4733: Fix Version/s: 1.7.0 > max(dir0) reading more columns than necessary > - > > Key: DRILL-4733 > URL: https://issues.apache.org/jira/browse/DRILL-4733 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization, Storage - Parquet >Affects Versions: 1.7.0 >Reporter: Rahul Challapalli >Assignee: Arina Ielchiieva >Priority: Critical > Fix For: 1.7.0 > > Attachments: bug.tgz > > > The below query started to fail from this commit : > 3209886a8548eea4a2f74c059542672f8665b8d2 > {code} > select max(dir0) from dfs.`/drill/testdata/bug/2016`; > Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support > schema changes > Fragment 0:0 > [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] > (state=,code=0) > {code} > The sub-folders contains files which do have schema change for one column > "contributions" (int32 vs double). However prior to this commit we did not > fail in the scenario. Log files and test data are attached -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-4733) max(dir0) reading more columns than necessary
[ https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva reassigned DRILL-4733: --- Assignee: Arina Ielchiieva > max(dir0) reading more columns than necessary > - > > Key: DRILL-4733 > URL: https://issues.apache.org/jira/browse/DRILL-4733 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization, Storage - Parquet >Affects Versions: 1.7.0 >Reporter: Rahul Challapalli >Assignee: Arina Ielchiieva >Priority: Critical > Fix For: 1.7.0 > > Attachments: bug.tgz > > > The below query started to fail from this commit : > 3209886a8548eea4a2f74c059542672f8665b8d2 > {code} > select max(dir0) from dfs.`/drill/testdata/bug/2016`; > Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support > schema changes > Fragment 0:0 > [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] > (state=,code=0) > {code} > The sub-folders contains files which do have schema change for one column > "contributions" (int32 vs double). However prior to this commit we did not > fail in the scenario. Log files and test data are attached -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4650) Excel file (.xsl) and Microsoft Access file (.accdb) problem
[ https://issues.apache.org/jira/browse/DRILL-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjiv Kumar updated DRILL-4650: Description: I am trying to query from excel file(.xsl file) and ms access file (.accdb), but i am unable to query from these files in drill. Is there any way to query these files. Or any Storage Plugin for query these excel and ms access files. was:I am trying to query from excel file(.xsl file) and ms access file (.accdb), but i am unable to query from these files in drill. Is there any way to query these files. Or any Storage Plugin for query these excel and ms access files. > Excel file (.xsl) and Microsoft Access file (.accdb) problem > - > > Key: DRILL-4650 > URL: https://issues.apache.org/jira/browse/DRILL-4650 > Project: Apache Drill > Issue Type: New Feature >Affects Versions: 1.6.0 >Reporter: Sanjiv Kumar > > I am trying to query from excel file(.xsl file) and ms access file (.accdb), > but i am unable to query from these files in drill. Is there any way to query > these files. Or any Storage Plugin for query these excel and ms access files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4601) Partitioning based on the parquet statistics
[ https://issues.apache.org/jira/browse/DRILL-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341447#comment-15341447 ] Miroslav Holubec commented on DRILL-4601: - [~jacq...@dremio.com], [~jaltekruse], [~sphillips]: any inputs? > Partitioning based on the parquet statistics > > > Key: DRILL-4601 > URL: https://issues.apache.org/jira/browse/DRILL-4601 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Reporter: Miroslav Holubec > Labels: parquet, partitioning, planning, statistics > Attachments: DRILL-4601.1.patch > > > It can really help performance to extend current partitioning idea > implemented in DRILL- even further. > Currently partitioning is based on statistics, when min value equals to max > value for whole file. Based on this, files are removed from scan in planning > phase. Problem is, that it leads to many small parquet files, which is not > fine in HDFS world. Also only few columns are partitioned. > I would like to extend this idea to use all statistics for all columns. So if > value should equal to constant, remove all files from plan which have > statistics off. This will really help performance for scans over many parquet > files. > I have initial patch ready, currently just to give an idea. (it changes > metadata v2, which is not fine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
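The pruning idea proposed in DRILL-4601 (drop files whose column statistics rule out an equality predicate) can be sketched as below. This is a rough Python illustration under stated assumptions, not the attached patch and not Drill's planner: `FileStats`, `prune_for_equality`, the file names, and the `uid` column are all hypothetical, and only integer min/max statistics are modelled.

```python
# Illustrative only: not Drill's planner or the DRILL-4601 patch.
# All names and sample values below are hypothetical.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FileStats:
    path: str
    # Column name -> (min, max) as recorded in the file's statistics.
    ranges: Dict[str, Tuple[int, int]]


def prune_for_equality(files: List[FileStats], column: str, value: int) -> List[FileStats]:
    """Keep only files whose [min, max] range for `column` could contain `value`."""
    kept = []
    for f in files:
        rng = f.ranges.get(column)
        # A file without statistics for the column cannot be ruled out.
        if rng is None or rng[0] <= value <= rng[1]:
            kept.append(f)
    return kept


files = [
    FileStats("part-0.parquet", {"uid": (1, 100)}),
    FileStats("part-1.parquet", {"uid": (101, 200)}),
    FileStats("part-2.parquet", {}),
]

# Predicate: uid = 150. Only files that might hold that value stay in the scan.
remaining = [f.path for f in prune_for_equality(files, "uid", 150)]
print(remaining)
```

Unlike the existing approach described in the issue, which requires min to equal max for a whole file, this check works on any range, so files do not have to be split into many small single-value files to benefit.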
[jira] [Updated] (DRILL-4601) Partitioning based on the parquet statistics
[ https://issues.apache.org/jira/browse/DRILL-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miroslav Holubec updated DRILL-4601: Description: It can really help performance to extend current partitioning idea implemented in DRILL- even further. Currently partitioning is based on statistics, when min value equals to max value for whole file. Based on this, files are removed from scan in planning phase. Problem is, that it leads to many small parquet files, which is not fine in HDFS world. Also only few columns are partitioned. I would like to extend this idea to use all statistics for all columns. So if value should equal to constant, remove all files from plan which have statistics off. This will really help performance for scans over many parquet files. I have initial patch ready, currently just to give an idea. (it changes metadata v2, which is not fine). was: It can really help performance to extend current partitioning idea implemented in DRILL- even further. Currently partitioning is based on statistics, when min value equals to max value for whole file. Based on this, files are removed from scan in planning phase. Problem is, that it leads to many small parquet files, which is not fine in HDFS world. Also only few columns are partitioned. I would like to extend this idea to use all statistics for all columns. So if value should equal to constant, remove all files from plan which have statistics off. This will really help performance for scans over many parquet files. I have initial patch ready, currently just to give an idea. (it changes metadata v2, which is not fine and also it currently supports only equal operation). 
> Partitioning based on the parquet statistics > > > Key: DRILL-4601 > URL: https://issues.apache.org/jira/browse/DRILL-4601 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Reporter: Miroslav Holubec > Labels: parquet, partitioning, planning, statistics > Attachments: DRILL-4601.1.patch > > > It can really help performance to extend current partitioning idea > implemented in DRILL- even further. > Currently partitioning is based on statistics, when min value equals to max > value for whole file. Based on this, files are removed from scan in planning > phase. Problem is, that it leads to many small parquet files, which is not > fine in HDFS world. Also only few columns are partitioned. > I would like to extend this idea to use all statistics for all columns. So if > value should equal to constant, remove all files from plan which have > statistics off. This will really help performance for scans over many parquet > files. > I have initial patch ready, currently just to give an idea. (it changes > metadata v2, which is not fine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4601) Partitioning based on the parquet statistics
[ https://issues.apache.org/jira/browse/DRILL-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341441#comment-15341441 ]

Miroslav Holubec commented on DRILL-4601:
-----------------------------------------

The current patch is on GitHub: https://github.com/myroch/drill

> Partitioning based on the parquet statistics
> --------------------------------------------
>
> Key: DRILL-4601
> URL: https://issues.apache.org/jira/browse/DRILL-4601
> Project: Apache Drill
> Issue Type: Improvement
> Components: Query Planning & Optimization
> Reporter: Miroslav Holubec
> Labels: parquet, partitioning, planning, statistics
> Attachments: DRILL-4601.1.patch
>
>
> It can really help performance to extend the current partitioning idea
> implemented in DRILL- even further.
> Currently, partitioning is based on statistics, and applies when the min
> value equals the max value for the whole file. Based on this, files are
> removed from the scan in the planning phase. The problem is that this leads
> to many small Parquet files, which is not fine in the HDFS world. Also, only
> a few columns are partitioned.
> I would like to extend this idea to use all statistics for all columns. So
> if a value should equal a constant, remove from the plan all files whose
> statistics rule that value out. This will really help performance for scans
> over many Parquet files.
> I have an initial patch ready, currently just to give an idea (it changes
> metadata v2, which is not fine, and it currently supports only the equality
> operation).

[jira] [Commented] (DRILL-4650) Excel file (.xsl) and Microsoft Access file (.accdb) problem
[ https://issues.apache.org/jira/browse/DRILL-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341359#comment-15341359 ]

Sanjiv Kumar commented on DRILL-4650:
-------------------------------------

Can anyone please tell me how to query an Excel file (xsl) through a storage plugin?

> Excel file (.xsl) and Microsoft Access file (.accdb) problem
> ------------------------------------------------------------
>
> Key: DRILL-4650
> URL: https://issues.apache.org/jira/browse/DRILL-4650
> Project: Apache Drill
> Issue Type: New Feature
> Affects Versions: 1.6.0
> Reporter: Sanjiv Kumar
>
> I am trying to query an Excel file (.xsl file) and an MS Access file
> (.accdb), but I am unable to query these files in Drill. Is there any way to
> query these files, or any storage plugin for querying Excel and MS Access
> files?