[jira] [Updated] (DRILL-4734) Query against HBase table on a 5 node cluster fails with SchemaChangeException

2016-06-21 Thread tiredqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tiredqiang updated DRILL-4734:
--
Attachment: drillbit.log

Attached the drillbit error log.

For this test case I used the HBase 1.2.0 client, but the same error also occurs 
with version 1.1.3; that's why I changed to 1.2.0.

> Query against HBase table on a 5 node cluster fails with SchemaChangeException
> --
>
> Key: DRILL-4734
> URL: https://issues.apache.org/jira/browse/DRILL-4734
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Storage - HBase
>Affects Versions: 1.6.0
>Reporter: Aman Sinha
> Attachments: 2nodes.explain.txt, 5nodes.explain.txt, drillbit.log
>
>
> [Creating this JIRA on behalf of Qiang Li]
> Let's say I have two tables:
> {noformat}
> offers_ref0 : rowkey salt(1byte)+long uid(8 byte ) , family: v,  qualifier:
> v(string)
> offers_nation_idx: rowkey salt(1byte) + string, family:v, qualifier: v(long
> 8 byte)
> {noformat}
> there is the SQL:
> {noformat}
> select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as did, 
>convert_from(`ref0`.`v`.`v`,'UTF8') as v 
> from hbase.`offers_nation_idx` as `nation` 
> join hbase.offers_ref0 as `ref0` 
> on CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') = 
>  CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') 
> where `nation`.row_key > '0br' and `nation`.row_key < '0bs' 
> limit 10
> {noformat}
> When I execute the query on a single node, or on fewer than 5 nodes, it works
> fine. But when I execute it on a cluster of about 14 nodes, it throws
> an exception.
> The first time, it throws this exception:
> *Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException:
> Hash join does not support schema changes*
> Then if I query again, it always throws the exception below:
> {noformat}
> *Query Failed: An Error Occurred*
> *org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
> ERROR:IllegalStateException: 
> Failure while reading vector. Expected vector class of 
> org.apache.drill.exec.vector.NullableIntVector but 
> was holding vector class org.apache.drill.exec.vector.complex.MapVector, 
> field=v(MAP:REQUIRED
>  [v(VARBINARY:OPTIONAL)[$bits$(UINT1:REQUIRED),
> v(VARBINARY:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]] Fragment 12:4
>  [Error Id:06c6eae4-0822-4714-b0bf-a6e04ebfec79 on xxx:31010]*
> {noformat}
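> One way to narrow this down (a diagnostic sketch only, not a fix): disable hash
> join for the session so the planner picks a merge join instead, and check
> whether the schema-change error still appears. `planner.enable_hashjoin` is a
> standard Drill session option:
> {code}
> -- diagnostic only: steer the planner away from hash join for this session
> alter session set `planner.enable_hashjoin` = false;
> -- re-run the failing join, then restore the default
> alter session set `planner.enable_hashjoin` = true;
> {code}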



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-1328) Support table statistics

2016-06-21 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai reassigned DRILL-1328:
-

Assignee: Gautam Kumar Parai

> Support table statistics
> 
>
> Key: DRILL-1328
> URL: https://issues.apache.org/jira/browse/DRILL-1328
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Cliff Buchanan
>Assignee: Gautam Kumar Parai
> Fix For: Future
>
> Attachments: 0001-PRE-Set-value-count-in-splitAndTransfer.patch
>
>
> This consists of several subtasks
> * implement operators to generate statistics
> * add "analyze table" support to parser/planner
> * create a metadata provider to allow statistics to be used by optiq in 
> planning optimization
> * implement statistics functions
> Right now, the bulk of this functionality is implemented, but it hasn't been 
> rigorously tested, and some definite answers are still needed for the parts 
> "around the edges" (how analyze table figures out where the table 
> statistics are located, how a table "append" should work in a read-only file 
> system).
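> As a rough illustration of the intended flow (hypothetical syntax; pinning
> down the exact grammar is part of the parser/planner subtask above):
> {code}
> -- syntax sketch, not final: collect statistics for a table
> ANALYZE TABLE dfs.`/data/lineitem` COMPUTE STATISTICS;
> -- the planner would then consult the stored statistics (row counts, NDVs,
> -- etc.) when costing plans for queries against dfs.`/data/lineitem`
> {code}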
> Also, here are a few known caveats:
> * table statistics are collected by creating a sql query based on the string 
> path of the table. This should probably be done with a Table reference.
> * Case sensitivity for column statistics is probably iffy
> * Math for combining two column NDVs into a joint NDV should be checked.
> * Schema changes aren't really being considered yet.
> * adding getDrillTable is probably unnecessary; it might be better to do 
> getTable().unwrap(DrillTable.class)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2016-06-21 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343111#comment-15343111
 ] 

Jacques Nadeau commented on DRILL-4203:
---

I think someone will need to pick this up. I don't think anyone is actively 
working on it.

> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Priority: Critical
>
> Hello,
> I have some problems when I try to read Parquet files produced by Drill with 
> Spark: all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet-tools, I found:
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equal to 0.
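> For what it's worth, the magnitude of the corrupted value is suggestive:
> 2440588 is the Julian day number of 1970-01-01, and 4881176 is exactly twice
> that, which looks like an epoch shift being applied twice somewhere on the
> write path. A quick arithmetic check (illustration only):
> {code}
> -- 2 * 2440588 = 4881176, the corrupted epoch_date shown above
> select 2 * 2440588 as twice_julian_epoch from (values(1));
> {code}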
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> 
> name:OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> 
> name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4203) Parquet File : Date is stored wrongly

2016-06-21 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-4203:
--
Assignee: (was: Jason Altekruse)

> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Priority: Critical
>
> Hello,
> I have some problems when I try to read Parquet files produced by Drill with 
> Spark: all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet-tools, I found:
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equal to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> 
> name:OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> 
> name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-06-21 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343061#comment-15343061
 ] 

Gautam Kumar Parai commented on DRILL-4743:
---

[~amansinha100] I have updated the pull request. Please take a look.

> HashJoin's not fully parallelized in query plan
> ---
>
> Key: DRILL-4743
> URL: https://issues.apache.org/jira/browse/DRILL-4743
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>  Labels: doc-impacting
>
> The underlying problem is a filter selectivity under-estimate for queries with 
> complicated predicates, e.g. deeply nested AND/OR predicates. This leads to 
> under-parallelization of the major fragment doing the join. 
> To really resolve this problem we need table/column statistics to correctly 
> estimate the selectivity. However, in the absence of statistics, or even when 
> existing statistics are insufficient to get a correct estimate of selectivity, 
> this will serve as a workaround.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4571) Add link to local Drill logs from the web UI

2016-06-21 Thread Krystal (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343042#comment-15343042
 ] 

Krystal commented on DRILL-4571:


git.commit.id.abbrev=fbdd20e

Verified feature.

> Add link to local Drill logs from the web UI
> 
>
> Key: DRILL-4571
> URL: https://issues.apache.org/jira/browse/DRILL-4571
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
> Fix For: 1.7.0
>
> Attachments: display_log.JPG, drillbit_download.log.gz, 
> drillbit_queries_json_screenshot.jpg, drillbit_ui.log, log_list.JPG
>
>
> Now we have a link to the profile from the web UI.
> It would be handy for users to have a link to the local logs as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (DRILL-4571) Add link to local Drill logs from the web UI

2016-06-21 Thread Krystal (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krystal closed DRILL-4571.
--

Verified.

> Add link to local Drill logs from the web UI
> 
>
> Key: DRILL-4571
> URL: https://issues.apache.org/jira/browse/DRILL-4571
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
> Fix For: 1.7.0
>
> Attachments: display_log.JPG, drillbit_download.log.gz, 
> drillbit_queries_json_screenshot.jpg, drillbit_ui.log, log_list.JPG
>
>
> Now we have a link to the profile from the web UI.
> It would be handy for users to have a link to the local logs as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2016-06-21 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342999#comment-15342999
 ] 

Rahul Challapalli commented on DRILL-4203:
--

Maybe I lost track of some conversation around this. What is the latest update 
on this issue?

> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Assignee: Jason Altekruse
>Priority: Critical
>
> Hello,
> I have some problems when I try to read Parquet files produced by Drill with 
> Spark: all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet-tools, I found:
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equal to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> 
> name:OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> 
> name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4733) max(dir0) reading more columns than necessary

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342874#comment-15342874
 ] 

ASF GitHub Bot commented on DRILL-4733:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/531#discussion_r67962703
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/TestImplicitFileColumns.java
 ---
@@ -110,4 +111,20 @@ public void testImplicitColumnsForParquet() throws Exception {
         .go();
   }
 
+  @Test // DRILL-4733
+  public void testMultilevelParquetWithSchemaChange() throws Exception {
+    try {
+      test("alter session set `planner.enable_decimal_data_type` = true");
+      testBuilder()
+          .sqlQuery(String.format("select max(dir0) as max_dir from dfs_test.`%s/src/test/resources/multilevel/parquetWithSchemaChange`",
+              TestTools.getWorkingPath()))
+          .unOrdered()
+          .baselineColumns("max_dir")
+          .baselineValues("voter50.parquet")
--- End diff --

@jinfengni 
I guess the confusion here is that `voter50.parquet` is a folder name. If it 
would be clearer, I can rename the folder and the files in it (currently the 
files are named 0_0_0.parquet).


> max(dir0) reading more columns than necessary
> -
>
> Key: DRILL-4733
> URL: https://issues.apache.org/jira/browse/DRILL-4733
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.7.0
>
> Attachments: bug.tgz
>
>
> The query below started to fail with this commit:
> 3209886a8548eea4a2f74c059542672f8665b8d2
> {code}
> select max(dir0) from dfs.`/drill/testdata/bug/2016`;
> Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support 
> schema changes
> Fragment 0:0
> [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> The sub-folders contain files which do have a schema change for one column, 
> "contributions" (int32 vs double). However, prior to this commit we did not 
> fail in this scenario. Log files and test data are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-06-21 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-4743:

Labels: doc-impacting  (was: )

> HashJoin's not fully parallelized in query plan
> ---
>
> Key: DRILL-4743
> URL: https://issues.apache.org/jira/browse/DRILL-4743
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>  Labels: doc-impacting
>
> The underlying problem is a filter selectivity under-estimate for queries with 
> complicated predicates, e.g. deeply nested AND/OR predicates. This leads to 
> under-parallelization of the major fragment doing the join. 
> To really resolve this problem we need table/column statistics to correctly 
> estimate the selectivity. However, in the absence of statistics, or even when 
> existing statistics are insufficient to get a correct estimate of selectivity, 
> this will serve as a workaround.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-06-21 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342826#comment-15342826
 ] 

Gautam Kumar Parai commented on DRILL-4743:
---

I have created a pull request: https://github.com/apache/drill/pull/534. 
[~amansinha100], can you please take a look and provide feedback?

> HashJoin's not fully parallelized in query plan
> ---
>
> Key: DRILL-4743
> URL: https://issues.apache.org/jira/browse/DRILL-4743
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>
> The underlying problem is a filter selectivity under-estimate for queries with 
> complicated predicates, e.g. deeply nested AND/OR predicates. This leads to 
> under-parallelization of the major fragment doing the join. 
> To really resolve this problem we need table/column statistics to correctly 
> estimate the selectivity. However, in the absence of statistics, or even when 
> existing statistics are insufficient to get a correct estimate of selectivity, 
> this will serve as a workaround.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342825#comment-15342825
 ] 

ASF GitHub Bot commented on DRILL-4743:
---

GitHub user gparai opened a pull request:

https://github.com/apache/drill/pull/534

[DRILL-4743] HashJoin's not fully parallelized in query plan

Provide a user parameter for defining a lower bound on selectivity, to 
prevent under-estimates of filter selectivity.
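
If this lands as a session/system option, usage would presumably look like the
sketch below (the option name here is an assumption based on this description,
not confirmed in this thread):

{code}
-- hypothetical: floor the filter selectivity estimate so deeply nested
-- predicates cannot collapse the estimated row count (and the parallelism)
alter session set `planner.filter.min_selectivity_estimate_factor` = 0.5;
{code}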

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gparai/drill MD-880-ADM

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/534.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #534

> HashJoin's not fully parallelized in query plan
> ---
>
> Key: DRILL-4743
> URL: https://issues.apache.org/jira/browse/DRILL-4743
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>
> The underlying problem is a filter selectivity under-estimate for queries with 
> complicated predicates, e.g. deeply nested AND/OR predicates. This leads to 
> under-parallelization of the major fragment doing the join. 
> To really resolve this problem we need table/column statistics to correctly 
> estimate the selectivity. However, in the absence of statistics, or even when 
> existing statistics are insufficient to get a correct estimate of selectivity, 
> this will serve as a workaround.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-06-21 Thread Gautam Kumar Parai (JIRA)
Gautam Kumar Parai created DRILL-4743:
-

 Summary: HashJoin's not fully parallelized in query plan
 Key: DRILL-4743
 URL: https://issues.apache.org/jira/browse/DRILL-4743
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.5.0
Reporter: Gautam Kumar Parai
Assignee: Gautam Kumar Parai


The underlying problem is a filter selectivity under-estimate for queries with 
complicated predicates, e.g. deeply nested AND/OR predicates. This leads to 
under-parallelization of the major fragment doing the join. 

To really resolve this problem we need table/column statistics to correctly 
estimate the selectivity. However, in the absence of statistics, or even when 
existing statistics are insufficient to get a correct estimate of selectivity, 
this will serve as a workaround.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4733) max(dir0) reading more columns than necessary

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342753#comment-15342753
 ] 

ASF GitHub Bot commented on DRILL-4733:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/531#discussion_r67955187
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java
 ---
@@ -126,8 +127,12 @@ CloseableRecordBatch getReaderBatch(FragmentContext 
context, EasySubScan scan) t
 final ImplicitColumnExplorer columnExplorer = new 
ImplicitColumnExplorer(context, scan.getColumns());
 
 if (!columnExplorer.isSelectAllColumns()) {
+      // We must make sure to pass a table column (not to be confused with
+      // an implicit column) to the underlying record reader.
+      List<SchemaPath> tableColumns =
--- End diff --

Ok, I see. Although this patch resolves the issue, I think that without a 
performance test we cannot see the performance impact of the overall implicit 
columns support. It is a nice feature to have, but I think we can give it a 
little more time to go through functional and perf tests.


> max(dir0) reading more columns than necessary
> -
>
> Key: DRILL-4733
> URL: https://issues.apache.org/jira/browse/DRILL-4733
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.7.0
>
> Attachments: bug.tgz
>
>
> The query below started to fail with this commit:
> 3209886a8548eea4a2f74c059542672f8665b8d2
> {code}
> select max(dir0) from dfs.`/drill/testdata/bug/2016`;
> Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support 
> schema changes
> Fragment 0:0
> [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> The sub-folders contain files which do have a schema change for one column, 
> "contributions" (int32 vs double). However, prior to this commit we did not 
> fail in this scenario. Log files and test data are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4742) Using convert_from timestamp_impala gives a random error

2016-06-21 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli updated DRILL-4742:
-
Attachment: temp.parquet
error.txt

The above query ran successfully 5-6 times before I hit the random error. The 
attached log contains information related to the successful runs as well.

> Using convert_from timestamp_impala gives a random error
> 
>
> Key: DRILL-4742
> URL: https://issues.apache.org/jira/browse/DRILL-4742
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.6.0, 1.7.0
>Reporter: Rahul Challapalli
>Priority: Critical
> Attachments: error.txt, temp.parquet
>
>
> Drill Commit # fbdd20e54351879200184b478c2a32f238bf2176
> The following query randomly generates the error below:
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from 
> dfs.`/drill/testdata/temp.parquet`;
> Error: SYSTEM ERROR: ArrayIndexOutOfBoundsException: 0
> Fragment 0:0
> [Error Id: 9fe53a95-c4ae-424d-8c6d-489abab2d2ca on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> The underlying Parquet file was generated using Hive. Below is the metadata 
> information:
> {code}
> /root/parquet-tools-1.5.1-SNAPSHOT/parquet-meta temp.parquet 
> creator:  parquet-mr version 1.6.0 
> file schema:  hive_schema 
> 
> voter_id: OPTIONAL INT32 R:0 D:1
> name: OPTIONAL BINARY O:UTF8 R:0 D:1
> age:  OPTIONAL INT32 R:0 D:1
> registration: OPTIONAL BINARY O:UTF8 R:0 D:1
> contributions:OPTIONAL FLOAT R:0 D:1
> voterzone:OPTIONAL INT32 R:0 D:1
> create_timestamp: OPTIONAL INT96 R:0 D:1
> create_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1:  RC:200 TS:9902 
> 
> voter_id:  INT32 UNCOMPRESSED DO:0 FPO:4 SZ:843/843/1.00 VC:200 
> ENC:RLE,BIT_PACKED,PLAIN
> name:  BINARY UNCOMPRESSED DO:0 FPO:847 SZ:3214/3214/1.00 VC:200 
> ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
> age:   INT32 UNCOMPRESSED DO:0 FPO:4061 SZ:438/438/1.00 VC:200 
> ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
> registration:  BINARY UNCOMPRESSED DO:0 FPO:4499 SZ:241/241/1.00 VC:200 
> ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
> contributions: FLOAT UNCOMPRESSED DO:0 FPO:4740 SZ:843/843/1.00 VC:200 
> ENC:RLE,BIT_PACKED,PLAIN
> voterzone: INT32 UNCOMPRESSED DO:0 FPO:5583 SZ:843/843/1.00 VC:200 
> ENC:RLE,BIT_PACKED,PLAIN
> create_timestamp:  INT96 UNCOMPRESSED DO:0 FPO:6426 SZ:2642/2642/1.00 VC:200 
> ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
> create_date:   INT32 UNCOMPRESSED DO:0 FPO:9068 SZ:838/838/1.00 VC:200 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}
> I attached the log file and the data file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4742) Using convert_from timestamp_impala gives a random error

2016-06-21 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-4742:


 Summary: Using convert_from timestamp_impala gives a random error
 Key: DRILL-4742
 URL: https://issues.apache.org/jira/browse/DRILL-4742
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.6.0, 1.7.0
Reporter: Rahul Challapalli
Priority: Critical


Drill Commit # fbdd20e54351879200184b478c2a32f238bf2176

The following query randomly generates the error below:
{code}
select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from 
dfs.`/drill/testdata/temp.parquet`;
Error: SYSTEM ERROR: ArrayIndexOutOfBoundsException: 0

Fragment 0:0

[Error Id: 9fe53a95-c4ae-424d-8c6d-489abab2d2ca on qa-node190.qa.lab:31010] 
(state=,code=0)
{code}

The underlying Parquet file was generated using Hive. Below is the metadata 
information:
{code}
/root/parquet-tools-1.5.1-SNAPSHOT/parquet-meta temp.parquet 
creator:  parquet-mr version 1.6.0 

file schema:  hive_schema 

voter_id: OPTIONAL INT32 R:0 D:1
name: OPTIONAL BINARY O:UTF8 R:0 D:1
age:  OPTIONAL INT32 R:0 D:1
registration: OPTIONAL BINARY O:UTF8 R:0 D:1
contributions:OPTIONAL FLOAT R:0 D:1
voterzone:OPTIONAL INT32 R:0 D:1
create_timestamp: OPTIONAL INT96 R:0 D:1
create_date:  OPTIONAL INT32 O:DATE R:0 D:1

row group 1:  RC:200 TS:9902 

voter_id:  INT32 UNCOMPRESSED DO:0 FPO:4 SZ:843/843/1.00 VC:200 
ENC:RLE,BIT_PACKED,PLAIN
name:  BINARY UNCOMPRESSED DO:0 FPO:847 SZ:3214/3214/1.00 VC:200 
ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
age:   INT32 UNCOMPRESSED DO:0 FPO:4061 SZ:438/438/1.00 VC:200 
ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
registration:  BINARY UNCOMPRESSED DO:0 FPO:4499 SZ:241/241/1.00 VC:200 
ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
contributions: FLOAT UNCOMPRESSED DO:0 FPO:4740 SZ:843/843/1.00 VC:200 
ENC:RLE,BIT_PACKED,PLAIN
voterzone: INT32 UNCOMPRESSED DO:0 FPO:5583 SZ:843/843/1.00 VC:200 
ENC:RLE,BIT_PACKED,PLAIN
create_timestamp:  INT96 UNCOMPRESSED DO:0 FPO:6426 SZ:2642/2642/1.00 VC:200 
ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
create_date:   INT32 UNCOMPRESSED DO:0 FPO:9068 SZ:838/838/1.00 VC:200 
ENC:RLE,BIT_PACKED,PLAIN
{code}

I attached the log file and the data file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (DRILL-4735) Count(dir0) on parquet returns 0 result

2016-06-21 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342270#comment-15342270
 ] 

Jinfeng Ni edited comment on DRILL-4735 at 6/21/16 8:47 PM:


I ran the query on 1.4.0 and saw the same problem. I have not checked earlier 
versions, but it's likely that this problem has been there for a long time.

This bug also happens on the 1.0.0 release.


was (Author: jni):
I ran the query on 1.4.0 and saw the same problem. I have not checked earlier 
versions, but it's likely that this problem has been there for a long time.


> Count(dir0) on parquet returns 0 result
> ---
>
> Key: DRILL-4735
> URL: https://issues.apache.org/jira/browse/DRILL-4735
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.0.0, 1.4.0, 1.6.0, 1.7.0
>Reporter: Krystal
>Assignee: Jinfeng Ni
>Priority: Critical
>
> Selecting a count of dir0, dir1, etc. against a parquet directory returns a 
> count of 0:
> select count(dir0) from `min_max_dir`;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> select count(dir1) from `min_max_dir`;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> If I put both dir0 and dir1 in the same select, it returns the expected result:
> select count(dir0), count(dir1) from `min_max_dir`;
> +-+-+
> | EXPR$0  | EXPR$1  |
> +-+-+
> | 600 | 600 |
> +-+-+
> Here is the physical plan for count(dir0) query:
> {code}
> 00-00Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, 
> cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id 
> = 1346
> 00-01  Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): 
> rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 1345
> 00-02Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): 
> rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 1344
> 00-03  
> Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3da85d3b[columns
>  = null, isStarQuery = false, isSkipQuery = false]]) : rowType = 
> RecordType(BIGINT count): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1343
> {code}
> Here is part of the explain plan for the count(dir0) and count(dir1) in the 
> same select:
> {code}
> 00-00Screen : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): 
> rowcount = 60.0, cumulative cost = {1206.0 rows, 15606.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 1623
> 00-01  Project(EXPR$0=[$0], EXPR$1=[$1]) : rowType = RecordType(BIGINT 
> EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 
> 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1622
> 00-02StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)]) : 
> rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, 
> cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 
> memory}, id = 1621
> 00-03  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1999/MAR/voter15.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1985/jan/voter5.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1985/apr/voter60.parquet/0_0_0.parquet],...,
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/2014/jul/voter35.parquet/0_0_0.parquet]],
>  selectionRoot=maprfs:/drill/testdata/min_max_dir, numFiles=16, 
> usedMetadataFile=false, columns=[`dir0`, `dir1`]]]) : rowType = 
> RecordType(ANY dir0, ANY dir1): rowcount = 600.0, cumulative cost = {600.0 
> rows, 1200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1620
> {code}
> Notice that in the first case, 
> "org.apache.drill.exec.store.pojo.PojoRecordReader" is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-4735) Count(dir0) on parquet returns 0 result

2016-06-21 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni reassigned DRILL-4735:
-

Assignee: Jinfeng Ni

> Count(dir0) on parquet returns 0 result
> ---
>
> Key: DRILL-4735
> URL: https://issues.apache.org/jira/browse/DRILL-4735
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.0.0, 1.4.0, 1.6.0, 1.7.0
>Reporter: Krystal
>Assignee: Jinfeng Ni
>Priority: Critical
>
> Selecting a count of dir0, dir1, etc. against a parquet directory returns a 
> count of 0:
> select count(dir0) from `min_max_dir`;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> select count(dir1) from `min_max_dir`;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> If I put both dir0 and dir1 in the same select, it returns the expected result:
> select count(dir0), count(dir1) from `min_max_dir`;
> +-+-+
> | EXPR$0  | EXPR$1  |
> +-+-+
> | 600 | 600 |
> +-+-+
> Here is the physical plan for count(dir0) query:
> {code}
> 00-00Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, 
> cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id 
> = 1346
> 00-01  Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): 
> rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 1345
> 00-02Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): 
> rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 1344
> 00-03  
> Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3da85d3b[columns
>  = null, isStarQuery = false, isSkipQuery = false]]) : rowType = 
> RecordType(BIGINT count): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1343
> {code}
> Here is part of the explain plan for the count(dir0) and count(dir1) in the 
> same select:
> {code}
> 00-00Screen : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): 
> rowcount = 60.0, cumulative cost = {1206.0 rows, 15606.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 1623
> 00-01  Project(EXPR$0=[$0], EXPR$1=[$1]) : rowType = RecordType(BIGINT 
> EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 
> 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1622
> 00-02StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)]) : 
> rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, 
> cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 
> memory}, id = 1621
> 00-03  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1999/MAR/voter15.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1985/jan/voter5.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1985/apr/voter60.parquet/0_0_0.parquet],...,
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/2014/jul/voter35.parquet/0_0_0.parquet]],
>  selectionRoot=maprfs:/drill/testdata/min_max_dir, numFiles=16, 
> usedMetadataFile=false, columns=[`dir0`, `dir1`]]]) : rowType = 
> RecordType(ANY dir0, ANY dir1): rowcount = 600.0, cumulative cost = {600.0 
> rows, 1200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1620
> {code}
> Notice that in the first case, 
> "org.apache.drill.exec.store.pojo.PojoRecordReader" is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4735) Count(dir0) on parquet returns 0 result

2016-06-21 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni updated DRILL-4735:
--
Affects Version/s: 1.0.0

> Count(dir0) on parquet returns 0 result
> ---
>
> Key: DRILL-4735
> URL: https://issues.apache.org/jira/browse/DRILL-4735
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.0.0, 1.4.0, 1.6.0, 1.7.0
>Reporter: Krystal
>Priority: Critical
>
> Selecting a count of dir0, dir1, etc. against a parquet directory returns a 
> count of 0:
> select count(dir0) from `min_max_dir`;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> select count(dir1) from `min_max_dir`;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> If I put both dir0 and dir1 in the same select, it returns the expected result:
> select count(dir0), count(dir1) from `min_max_dir`;
> +-+-+
> | EXPR$0  | EXPR$1  |
> +-+-+
> | 600 | 600 |
> +-+-+
> Here is the physical plan for count(dir0) query:
> {code}
> 00-00Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, 
> cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id 
> = 1346
> 00-01  Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): 
> rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 1345
> 00-02Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): 
> rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 1344
> 00-03  
> Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3da85d3b[columns
>  = null, isStarQuery = false, isSkipQuery = false]]) : rowType = 
> RecordType(BIGINT count): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1343
> {code}
> Here is part of the explain plan for the count(dir0) and count(dir1) in the 
> same select:
> {code}
> 00-00Screen : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): 
> rowcount = 60.0, cumulative cost = {1206.0 rows, 15606.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 1623
> 00-01  Project(EXPR$0=[$0], EXPR$1=[$1]) : rowType = RecordType(BIGINT 
> EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 
> 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1622
> 00-02StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)]) : 
> rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, 
> cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 
> memory}, id = 1621
> 00-03  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1999/MAR/voter15.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1985/jan/voter5.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1985/apr/voter60.parquet/0_0_0.parquet],...,
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/2014/jul/voter35.parquet/0_0_0.parquet]],
>  selectionRoot=maprfs:/drill/testdata/min_max_dir, numFiles=16, 
> usedMetadataFile=false, columns=[`dir0`, `dir1`]]]) : rowType = 
> RecordType(ANY dir0, ANY dir1): rowcount = 600.0, cumulative cost = {600.0 
> rows, 1200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1620
> {code}
> Notice that in the first case, 
> "org.apache.drill.exec.store.pojo.PojoRecordReader" is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4733) max(dir0) reading more columns than necessary

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342588#comment-15342588
 ] 

ASF GitHub Bot commented on DRILL-4733:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/531#discussion_r67943855
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/TestImplicitFileColumns.java
 ---
@@ -110,4 +111,20 @@ public void testImplicitColumnsForParquet() throws Exception {
         .go();
   }
 
+  @Test // DRILL-4733
+  public void testMultilevelParquetWithSchemaChange() throws Exception {
+    try {
+      test("alter session set `planner.enable_decimal_data_type` = true");
+      testBuilder()
+          .sqlQuery(String.format("select max(dir0) as max_dir from dfs_test.`%s/src/test/resources/multilevel/parquetWithSchemaChange`",
+              TestTools.getWorkingPath()))
+          .unOrdered()
+          .baselineColumns("max_dir")
+          .baselineValues("voter50.parquet")
--- End diff --

Why do you put the baseline value in a parquet file, instead of putting it in 
the test case directly? The query seems to return one single value.

> max(dir0) reading more columns than necessary
> -
>
> Key: DRILL-4733
> URL: https://issues.apache.org/jira/browse/DRILL-4733
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.7.0
>
> Attachments: bug.tgz
>
>
> The query below started to fail with this commit:
> 3209886a8548eea4a2f74c059542672f8665b8d2
> {code}
> select max(dir0) from dfs.`/drill/testdata/bug/2016`;
> Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support 
> schema changes
> Fragment 0:0
> [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> The sub-folders contain files which do have a schema change for one column, 
> "contributions" (int32 vs double). However, prior to this commit we did not 
> fail in this scenario. Log files and test data are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4741) sqlline scripts should differentiate embedded vs remote config

2016-06-21 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-4741:
--

 Summary: sqlline scripts should differentiate embedded vs remote 
config
 Key: DRILL-4741
 URL: https://issues.apache.org/jira/browse/DRILL-4741
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.8.0
Reporter: Paul Rogers
Priority: Minor


$DRILL_HOME/bin contains four sqlline-related scripts:

sqlline -- main script for running sqlline
drill-conf — Wrapper for sqlline, uses drill config to find Drill. Seems this 
one needs fixing to use a config other than the hard-coded $DRILL_HOME/conf 
location.
drill-embedded — Starts a drill “embedded” in SqlLine, using a local ZK.
drill-localhost — Wrapper for sqlline, uses a local ZK.

The last three turn around and call sqlline.

Behind the scenes, the scripts call drill-config.sh and drill-env.sh to do setup.

Note, however, that we run SqlLine and Drill in three distinct configurations:

sqlline as client: should run with light memory use
drillbit as daemon: should run with full memory use
sqlline with embedded drillbit: sqlline needs to run with the Drillbit memory 
options.

Today, sqlline always uses the Drillbit memory options (and VM options), which 
results in too much memory use and port conflicts when running client-only.

Provide sqlline-specific VM and memory options. Then, the tricky bit: use them 
only when Drill is not embedded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4387) Improve execution side when it handles skipAll query

2016-06-21 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342463#comment-15342463
 ] 

Khurram Faraaz commented on DRILL-4387:
---

The queries below return wrong results. (The problem seems to have been there 
for quite some time.)

{noformat}
Directory structure is

[root@centos-01 DRILL_4589]# ls
1990  1992  1994  1996  1998  2000  2002  2004  2006  2008  2010  2012  2014
1991  1993  1995  1997  1999  2001  2003  2005  2007  2009  2011  2013  2015
[root@centos-01 DRILL_4589]# cd 1990
[root@centos-01 1990]# ls
Q1  Q2  Q3  Q4
and so on...

The two queries below return 0. I don't think the results are correct; please review.

0: jdbc:drill:schema=dfs.tmp> select count(dir0) from `DRILL_4589`;
+-+
| EXPR$0  |
+-+
| 0   |
+-+
1 row selected (9.117 seconds)
0: jdbc:drill:schema=dfs.tmp> select count(dir1) from `DRILL_4589`;
+-+
| EXPR$0  |
+-+
| 0   |
+-+
1 row selected (8.97 seconds)

0: jdbc:drill:schema=dfs.tmp> explain plan for select count(dir0) from 
`DRILL_4589`;
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(EXPR$0=[$0])
00-02Project(EXPR$0=[$0])
00-03  
Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@5275c59a[columns
 = null, isStarQuery = false, isSkipQuery = false]])


0: jdbc:drill:schema=dfs.tmp> explain plan for select count(dir1) from 
`DRILL_4589`;
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(EXPR$0=[$0])
00-02Project(EXPR$0=[$0])
00-03  
Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@337121ac[columns
 = null, isStarQuery = false, isSkipQuery = false]])
{noformat}

> Improve execution side when it handles skipAll query
> 
>
> Key: DRILL-4387
> URL: https://issues.apache.org/jira/browse/DRILL-4387
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
> Fix For: 1.6.0
>
>
> DRILL-4279 changed the planner side and the RecordReader on the execution 
> side for handling a skipAll query. However, it seems there are other 
> places in the codebase that do not handle skipAll queries efficiently. In 
> particular, in GroupScan or ScanBatchCreator, we will replace a NULL or empty 
> column list with the star column. This essentially forces the execution side 
> (RecordReader) to fetch all the columns of the data source. Such behavior 
> leads to a big performance overhead for the SCAN operator.
> To improve Drill's performance, we should change those places as well, as 
> follow-up work after DRILL-4279.
> One simple example of this problem is:
> {code}
>SELECT DISTINCT substring(dir1, 5) from  dfs.`/Path/To/ParquetTable`;  
> {code}
> The query does not require any regular column from the parquet file. However, 
> ParquetRowGroupScan and ParquetScanBatchCreator will put the star column in the 
> column list. If the table has dozens or hundreds of columns, this makes the 
> SCAN operator much more expensive than necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4736) "noexec" set for /tmp

2016-06-21 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam updated DRILL-4736:
---
Description: 
We should. can you file a doc bug.

The issue is caused by "noexec" set for /tmp.
Should we mention this in Drill Doc?
This is not the first time we hit this issue.

Thanks,
Hao


  was:

We should. can you file a doc bug.

The issue is caused by "noexec" set for /tmp.
Should we mention this in Drill Doc?
This is not the first time we hit this issue.

Thanks,
Hao



https://maprdrill.atlassian.net/browse/MD-946



> "noexec" set for /tmp
> -
>
> Key: DRILL-4736
> URL: https://issues.apache.org/jira/browse/DRILL-4736
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Bridget Bevens
>Assignee: Bridget Bevens
>
> We should. can you file a doc bug.
> The issue is caused by "noexec" set for /tmp.
> Should we mention this in Drill Doc?
> This is not the first time we hit this issue.
> Thanks,
> Hao



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4737) Adjust drill-env.sh instructions to reflect config/site directories

2016-06-21 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-4737:
---
Description: 
See https://drill.apache.org/docs/starting-drill-in-distributed-mode/

Requires a number of changes to reflect Drill's support of a configuration 
directory as specified by:

drillbit.sh --config /path/to/config/dir cmd

"The default memory for a Drillbit is 8G, but Drill prefers 16G" The default 
*direct* memory for Drill is 8G. The default total memory for Drill is 12G. 
(Including the 4G heap.)

"Drillbit startup script located in <drill_installation_directory>/conf/drill-
env.sh." The location is $SITE_DIR/drill-env.sh. $SITE_DIR is either:

1. $DRILL_HOME/conf by default (as stated in the docs), or
2. Specified by the --config <dir> option to drillbit.sh

"edit the XX:MaxDirectMemorySize parameter". Please show how to do this. The 
correct form (to work with YARN) is:

export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"8G"}

Where the new value replaces the "8G". (This is different than the pre-1.8 
form.)

"If this parameter is not set, the limit depends on the amount of available 
system memory." This has never turned out to be true as the script always 
provides a default value.

Another point: Drill assumes all nodes have the same amount of memory. Relying 
on system memory will not, in general, work as some Drillbits (with less system 
memory) will die with OOM errors. I suspect this is why a default setting is 
always provided.

"After you edit <drill_installation_directory>/conf/drill-env.sh" change to 
"After you edit drill-env.sh" to avoid repeating the path.

Further, note that in 1.8, drill-env.sh will become self-documenting: it will 
contain example settings and comments for each supported config option. (Thanks 
to John O. for that suggestion!) We might want to mention this information 
somewhere...

  was:
See https://drill.apache.org/docs/starting-drill-in-distributed-mode/

Requires a number of changes to reflect Drill's support of a configuration 
directory as specified by:

drillbit.sh --config /path/to/config/dir cmd

"The default memory for a Drillbit is 8G, but Drill prefers 16G" The default 
*direct* memory for Drill is 8G. The default total memory for Drill is 12G. 
(Including the 4G heap.)

"Drillbit startup script located in <drill_installation_directory>/conf/drill-
env.sh." The location is $SITE_DIR/drill-env.sh. $SITE_DIR is either:

1. $DRILL_HOME/conf by default (as stated in the docs), or
2. Specified by the --config <dir> option to drillbit.sh

"edit the XX:MaxDirectMemorySize parameter". Please show how to do this. The 
correct form (to work with YARN) is:

export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"8G"}

Where the new value replaces the "8G". (This is different than the pre-1.8 
form.)

"If this parameter is not set, the limit depends on the amount of available 
system memory." This has never turned out to be true as the script always 
provides a default value.

Another point: Drill assumes all nodes have the same amount of memory. Relying 
on system memory will not, in general, work as some Drillbits (with less system 
memory) will die with OOM errors. I suspect this is why a default setting is 
always provided.

"After you edit <drill_installation_directory>/conf/drill-env.sh" change to 
"After you edit drill-env.sh" to avoid repeating the path.


> Adjust drill-env.sh instructions to reflect config/site directories
> ---
>
> Key: DRILL-4737
> URL: https://issues.apache.org/jira/browse/DRILL-4737
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Priority: Minor
>
> See https://drill.apache.org/docs/starting-drill-in-distributed-mode/
> Requires a number of changes to reflect Drill's support of a configuration 
> directory as specified by:
> drillbit.sh --config /path/to/config/dir cmd
> "The default memory for a Drillbit is 8G, but Drill prefers 16G" The default 
> *direct* memory for Drill is 8G. The default total memory for Drill is 12G. 
> (Including the 4G heap.)
> "Drillbit startup script located in <drill_installation_directory>/conf/drill-
> env.sh." The location is $SITE_DIR/drill-env.sh. $SITE_DIR is either:
> 1. $DRILL_HOME/conf by default (as stated in the docs), or
> 2. Specified by the --config <dir> option to drillbit.sh
> "edit the XX:MaxDirectMemorySize parameter". Please show how to do this. The 
> correct form (to work with YARN) is:
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"8G"}
> Where the new value replaces the "8G". (This is different than the pre-1.8 
> form.)
> "If this parameter is not set, the limit depends on the amount of available 
> system memory." This has never turned out to be true as the script always 
> provides a default value.
> Another point: Drill assumes all nodes have the same amount of memory. 
> Relying on system memory 

[jira] [Updated] (DRILL-4740) Improvements to "Analyzing the Yelp Academic Dataset"

2016-06-21 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-4740:
---
Description: 
Consider the topic paragraph for the Yelp sample data page: 
http://drill.apache.org/docs/analyzing-the-yelp-academic-dataset/

It could use a bit of TLC. For example:

"Apache Drill is one of the fastest growing open source projects, with the 
community making rapid progress with monthly releases The key difference is 
Drill’s agility and flexibility."

This is a non sequitur: the speed and agility of the software does not drive 
the monthly releases. Can we reword it to say that Drill’s speed and agility 
make it a popular project? And that many people work hard to make it better 
with monthly releases? Something like that...

(Although, at present, releases have dropped to bi-monthly or quarterly...)

And:

"Along with meeting the table stakes for SQL-on-Hadoop, which is to achieve low 
latency performance at scale, …"

There seem to be two problems:

1. What does “meeting the table stakes” mean? Very unclear.
2. This is a run-on sentence that tries to cram multiple thoughts into a 
single sentence and should be rewritten.

Then, there is redundancy:

"...Drill allows users to analyze the data without any ETL or up-front schema 
definitions. … Drill, has a “no schema” approach…"

I’m sure this paragraph was written quickly early on, but it could certainly be 
improved a bit…

More comments:

1. Minor nit: "This document aligns Drill output for example purposes. Drill 
output is not aligned in this case."

I think that what this is saying is, “Drill output in this document is aligned 
for clarity. The actual Drill output you see may not be aligned.”

It would be better to explain why it is not aligned here, since data is aligned 
in the earlier examples…

2.  Somewhat off: "You can directly query self-describing files such as JSON, 
Parquet, and text. There is no need to create metadata definitions in the Hive 
metastore."

I think what this is saying is that Drill infers schema information from 
self-describing files such as JSON, Parquet and CSV/TSV (with a header row). 
Contrast this with other systems, such as Hive, that require that you first 
define the schema in a data dictionary.

Note that text is NOT a self-describing file format in the general case!

3.  Yelp seems to be creating new revisions of their data set. I downloaded 
Round 7. The results differ from those in the Drill page text. Perhaps insert a 
statement that the examples used Round (whatever round) and that the reader’s 
results may differ when using later rounds.

4.  The Yelp data is JSON. Somewhere near the top of the page (perhaps directly 
under "Querying Data with Drill”),  we should say:

The Yelp data is in JSON format.

Where “JSON format” would be a link to the JSON docs: 
https://drill.apache.org/docs/json-data-model/

This is handy later when we tell the user to set the all_text_mode:

First, change Drill to work in all text mode (so we can take a look at all of 
the data).

Where we should add: (See the JSON Data Model documentation for more 
information.)
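
For reference, the setting being described is a session option, so the page could 
show it explicitly (a minimal sketch; the option name is per the JSON Data Model 
docs):

{code}
-- Read every JSON field as VARCHAR while exploring the data.
alter session set `store.json.all_text_mode` = true;
{code}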

5. This query:

select attributes from 
dfs.`//yelp/yelp_academic_dataset_business.json` limit 10;

Appears all on one line and is truncated at the right of the page. Looks like 
we’ve broken our other long queries onto multiple lines. Perhaps this one needs 
the same treatment.
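
For instance, a possible multi-line layout (the line breaks here are editorial; 
the path is as shown on the page):

{code}
select attributes
from dfs.`//yelp/yelp_academic_dataset_business.json`
limit 10;
{code}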

7. Here: "Top first categories in number of review counts"

Perhaps copy the following text from the JSON format page to add explanation:

“Query Complex Data” shows how to use composite types to access nested arrays.
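
For instance, the kind of composite-type access that section describes might look 
like the following (a sketch against the Yelp business file; the array-index 
syntax is per the Drill JSON docs):

{code}
-- Read the first element of each business's categories array.
select b.categories[0] as first_category
from dfs.`//yelp/yelp_academic_dataset_business.json` b
limit 5;
{code}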

8. Another nit. Consider “Top businesses with cool rated reviews”. These (and 
similar items) are headers, but appear as regular text. The items have the HTML 
h4 tag, but have no special formatting. Can we make them bold or some such?

9. The following example SQL has two problems:

0: jdbc:drill:zk=local> create or replace view dfs.tmp.businessreviews as 
Select b.name,b.stars,b.state,b.city,r.votes.funny,r.votes.useful,r.votes.cool, 
r.`date` 
from dfs.`//yelp/yelp_academic_dataset_business.json` b, 
dfs.`//yelp/yelp_academic_dataset_review.json` r 
where r.business_id=b.business_id

First, the third line scrolls off the right edge of my (moderately sized) 
browser window. Perhaps split it after “b, ”.

Second, the statement must end with a semicolon: “b.business_id;”.
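
Putting both fixes together, the statement could read (a sketch: the long lines 
are re-wrapped and the terminating semicolon is added; nothing else changes):

{code}
create or replace view dfs.tmp.businessreviews as
select b.name, b.stars, b.state, b.city,
       r.votes.funny, r.votes.useful, r.votes.cool, r.`date`
from dfs.`//yelp/yelp_academic_dataset_business.json` b,
     dfs.`//yelp/yelp_academic_dataset_review.json` r
where r.business_id = b.business_id;
{code}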

10. Another nit. This paragraph:

"The goal of Apache Drill is to provide the freedom and flexibility in 
exploring data in ways we have never seen before with SQL technologies. The 
community is working on more exciting features around nested data and 
supporting data with changing schemas in upcoming releases."

Would seem to be a better fit at the top of the page rather than toward the end.

11. Another nit. This paragraph:

"In addition to these queries, you can get many deep insights using Drill’s SQL 
functionality. If 

[jira] [Updated] (DRILL-4740) Improvements to "Analyzing the Yelp Academic Dataset"

2016-06-21 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-4740:
---
Summary: Improvements to "Analyzing the Yelp Academic Dataset"  (was: 
Awkward wording in "Analyzing the Yelp Academic Dataset")

> Improvements to "Analyzing the Yelp Academic Dataset"
> -
>
> Key: DRILL-4740
> URL: https://issues.apache.org/jira/browse/DRILL-4740
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.6.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Consider the topic paragraph for the Yelp sample data page: 
> http://drill.apache.org/docs/analyzing-the-yelp-academic-dataset/
> It could use a bit of TLC. For example:
> "Apache Drill is one of the fastest growing open source projects, with the 
> community making rapid progress with monthly releases The key difference is 
> Drill’s agility and flexibility."
> This is a non sequitur. The speed and agility of the software do not drive 
> the monthly releases. Can we reword it to say that Drill’s speed and agility 
> make it a popular project? And that many people work hard to make it better 
> with monthly releases? Something like that...
> (Although, at present, releases have dropped to bi-monthly or quarterly...)
> And:
> "Along with meeting the table stakes for SQL-on-Hadoop, which is to achieve 
> low latency performance at scale, …"
> There seem to be two problems.
> 1. What does “meeting the table stakes” mean? Very unclear.
> 2. This is a run-on sentence that tries to say multiple thoughts in a single 
> sentence and should be rewritten.
> Then, there is redundancy:
> "...Drill allows users to analyze the data without any ETL or up-front schema 
> definitions. … Drill, has a “no schema” approach…"
> I’m sure this paragraph was written quickly early on, but it could certainly 
> be improved a bit…



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4740) Awkward wording in "Analyzing the Yelp Academic Dataset"

2016-06-21 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-4740:
--

 Summary: Awkward wording in "Analyzing the Yelp Academic Dataset"
 Key: DRILL-4740
 URL: https://issues.apache.org/jira/browse/DRILL-4740
 Project: Apache Drill
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.6.0
Reporter: Paul Rogers
Priority: Minor


Consider the topic paragraph for the Yelp sample data page: 
http://drill.apache.org/docs/analyzing-the-yelp-academic-dataset/

It could use a bit of TLC. For example:

"Apache Drill is one of the fastest growing open source projects, with the 
community making rapid progress with monthly releases The key difference is 
Drill’s agility and flexibility."

This is a non sequitur. The speed and agility of the software do not drive 
the monthly releases. Can we reword it to say that Drill’s speed and agility 
make it a popular project? And that many people work hard to make it better 
with monthly releases? Something like that...

(Although, at present, releases have dropped to bi-monthly or quarterly...)

And:

"Along with meeting the table stakes for SQL-on-Hadoop, which is to achieve low 
latency performance at scale, …"

There seem to be two problems.

1. What does “meeting the table stakes” mean? Very unclear.
2. This is a run-on sentence that tries to say multiple thoughts in a single 
sentence and should be rewritten.

Then, there is redundancy:

"...Drill allows users to analyze the data without any ETL or up-front schema 
definitions. … Drill, has a “no schema” approach…"

I’m sure this paragraph was written quickly early on, but it could certainly be 
improved a bit…



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4739) "SQL Extensions" doc. errata

2016-06-21 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-4739:
--

 Summary: "SQL Extensions" doc. errata
 Key: DRILL-4739
 URL: https://issues.apache.org/jira/browse/DRILL-4739
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.6.0
Reporter: Paul Rogers
Priority: Minor


The “sys.drillbits” (http://drill.apache.org/docs/sql-extensions/) example 
throws an error when used with the standalone version:

SELECT host FROM sys.drillbits WHERE `current` = true;

Produces the following error:

Error: VALIDATION ERROR: From line 1, column 8 to line 1, column 11: Column 
'host' not found in any table
[Error Id: e1d92308-9235-4699-ac53-03a59d06ce69 on 10.250.50.31:31010] 
(state=,code=0)

Performing “select * from sys.drillbits;” shows that the actual column name is 
“hostname”. Checking http://drill.apache.org/docs/querying-system-tables/ shows 
that hostname is the documented column name.

So, change the above example to:

SELECT hostname FROM sys.drillbits WHERE `current` = true;

Note the use of "hostname" rather than "host".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4738) "Compiling Drill from Source" doc changes

2016-06-21 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-4738:
--

 Summary: "Compiling Drill from Source" doc changes
 Key: DRILL-4738
 URL: https://issues.apache.org/jira/browse/DRILL-4738
 Project: Apache Drill
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.6.0
Reporter: Paul Rogers
Priority: Minor


In “2. Compile the code”, before the “mvn clean …” step, add:

export MAVEN_OPTS="-Xms256m -Xmx512m -XX:MaxPermSize=256m"

The code will not compile with the default JVM options; instead, you’ll get an 
Out of Memory message.

I personally encountered this, and a new Drill developer fought with the same 
issue. Might as well document it to save others from the same hassles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4737) Adjust drill-env.sh instructions to reflect config/site directories

2016-06-21 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-4737:
---
Issue Type: Improvement  (was: Bug)

> Adjust drill-env.sh instructions to reflect config/site directories
> ---
>
> Key: DRILL-4737
> URL: https://issues.apache.org/jira/browse/DRILL-4737
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Priority: Minor
>
> See https://drill.apache.org/docs/starting-drill-in-distributed-mode/
> Requires a number of changes to reflect Drill's support of a configuration 
> directory as specified by:
> drillbit.sh --config /path/to/config/dir cmd
> "The default memory for a Drillbit is 8G, but Drill prefers 16G" The default 
> *direct* memory for Drill is 8G. The default total memory for Drill is 12G. 
> (Including the 4G heap.)
> "Drillbit startup script located in /conf/drill-
> env.sh." The location is $SITE_DIR/drill-env.sh. $SITE_DIR is either:
> 1. $DRILL_HOME/conf by default (as stated in the docs), or
> 2. Specified by the --config  option to drillbit.sh
> "edit the XX:MaxDirectMemorySize parameter". Please show how to do this. The 
> correct form (to work with YARN) is:
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"8G"}
> Where the new value replaces the "8G". (This is different than the pre-1.8 
> form.)
> "If this parameter is not set, the limit depends on the amount of available 
> system memory." This has never turned out to be true as the script always 
> provides a default value.
> Another point: Drill assumes all nodes have the same amount of memory. 
> Relying on system memory will not, in general, work as some Drillbits (with 
> less system memory) will die with OOM errors. I suspect this is why a default 
> setting is always provided.
> "After you edit /conf/drill-env.sh" change to 
> "After you edit drill-env.sh" to avoid repeating the path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4737) Adjust drill-env.sh instructions to reflect config/site directories

2016-06-21 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-4737:
--

 Summary: Adjust drill-env.sh instructions to reflect config/site 
directories
 Key: DRILL-4737
 URL: https://issues.apache.org/jira/browse/DRILL-4737
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.8.0
Reporter: Paul Rogers
Priority: Minor


See https://drill.apache.org/docs/starting-drill-in-distributed-mode/

Requires a number of changes to reflect Drill's support of a configuration 
directory as specified by:

drillbit.sh --config /path/to/config/dir cmd

"The default memory for a Drillbit is 8G, but Drill prefers 16G" The default 
*direct* memory for Drill is 8G. The default total memory for Drill is 12G. 
(Including the 4G heap.)

"Drillbit startup script located in /conf/drill-
env.sh." The location is $SITE_DIR/drill-env.sh. $SITE_DIR is either:

1. $DRILL_HOME/conf by default (as stated in the docs), or
2. Specified by the --config  option to drillbit.sh

"edit the XX:MaxDirectMemorySize parameter". Please show how to do this. The 
correct form (to work with YARN) is:

export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"8G"}

Where the new value replaces the "8G". (This is different than the pre-1.8 
form.)

"If this parameter is not set, the limit depends on the amount of available 
system memory." This has never turned out to be true as the script always 
provides a default value.

Another point: Drill assumes all nodes have the same amount of memory. Relying 
on system memory will not, in general, work as some Drillbits (with less system 
memory) will die with OOM errors. I suspect this is why a default setting is 
always provided.

"After you edit /conf/drill-env.sh" change to 
"After you edit drill-env.sh" to avoid repeating the path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4736) "noexec" set for /tmp

2016-06-21 Thread Bridget Bevens (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens updated DRILL-4736:
--
Description: 

We should. Can you file a doc bug?

The issue is caused by "noexec" set for /tmp.
Should we mention this in Drill Doc?
This is not the first time we hit this issue.

Thanks,
Hao



https://maprdrill.atlassian.net/browse/MD-946


  was:
Neeraja Rentachintala
10:44 AM (1 minute ago)

to Hao, Zelaine, me, Kathleen, Dayanand 
+Bridget

We should. Can you file a doc bug?

The issue is caused by "noexec" set for /tmp.
Should we mention this in Drill Doc?
This is not the first time we hit this issue.

Thanks,
Hao




On Tue, Jun 21, 2016 at 10:06 AM, Kathleen Li  wrote:

Customer update as follows:

1)  We upgraded our 6-node cluster from MapR 3.1 to MapR 5.1 while also 
upgrading the OS of the servers to SuSE 12. One of the nodes is using Drill, so 
we installed the latest version of Drill, 1.6. 

2)  According to the MCS, this particular node is getting an alert - 
Drillbit Down Alarm. When viewing the alarm via the node it states - Can not 
determine if service: drill-bits is running. Check logs at: 
/opt/mapr/drill/drill-1.6.0/logs/ . I'm including a small piece of the 
drillbit.log attached to this email. Here is part of the error that I am seeing 
that may be significant.






> "noexec" set for /tmp
> -
>
> Key: DRILL-4736
> URL: https://issues.apache.org/jira/browse/DRILL-4736
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Bridget Bevens
>Assignee: Bridget Bevens
>
> We should. Can you file a doc bug?
> The issue is caused by "noexec" set for /tmp.
> Should we mention this in Drill Doc?
> This is not the first time we hit this issue.
> Thanks,
> Hao
> https://maprdrill.atlassian.net/browse/MD-946



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4736) "noexec" set for /tmp

2016-06-21 Thread Bridget Bevens (JIRA)
Bridget Bevens created DRILL-4736:
-

 Summary: "noexec" set for /tmp
 Key: DRILL-4736
 URL: https://issues.apache.org/jira/browse/DRILL-4736
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Reporter: Bridget Bevens
Assignee: Bridget Bevens


Neeraja Rentachintala
10:44 AM (1 minute ago)

to Hao, Zelaine, me, Kathleen, Dayanand 
+Bridget

We should. Can you file a doc bug?

The issue is caused by "noexec" set for /tmp.
Should we mention this in Drill Doc?
This is not the first time we hit this issue.

Thanks,
Hao




On Tue, Jun 21, 2016 at 10:06 AM, Kathleen Li  wrote:

Customer update as follows:

1)  We upgraded our 6-node cluster from MapR 3.1 to MapR 5.1 while also 
upgrading the OS of the servers to SuSE 12. One of the nodes is using Drill, so 
we installed the latest version of Drill, 1.6. 

2)  According to the MCS, this particular node is getting an alert - 
Drillbit Down Alarm. When viewing the alarm via the node it states - Can not 
determine if service: drill-bits is running. Check logs at: 
/opt/mapr/drill/drill-1.6.0/logs/ . I'm including a small piece of the 
drillbit.log attached to this email. Here is part of the error that I am seeing 
that may be significant.







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4733) max(dir0) reading more columns than necessary

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342297#comment-15342297
 ] 

ASF GitHub Bot commented on DRILL-4733:
---

Github user rchallapalli commented on a diff in the pull request:

https://github.com/apache/drill/pull/531#discussion_r67917077
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/TestImplicitFileColumns.java
 ---
@@ -110,4 +111,20 @@ public void testImplicitColumnsForParquet() throws 
Exception {
 .go();
   }
 
+  @Test // DRILL-4733
+  public void testMultilevelParquetWithSchemaChange() throws Exception {
+try {
+  test("alter session set `planner.enable_decimal_data_type` = true");
--- End diff --

One of the parquet files in the data set contains a column which is a double. 
But I do not understand why Drill requires us to enable the decimal type.




> max(dir0) reading more columns than necessary
> -
>
> Key: DRILL-4733
> URL: https://issues.apache.org/jira/browse/DRILL-4733
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.7.0
>
> Attachments: bug.tgz
>
>
> The below query started to fail from this commit : 
> 3209886a8548eea4a2f74c059542672f8665b8d2
> {code}
> select max(dir0) from dfs.`/drill/testdata/bug/2016`;
> Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support 
> schema changes
> Fragment 0:0
> [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> The sub-folders contains files which do have schema change for one column 
> "contributions" (int32 vs double). However prior to this commit we did not 
> fail in the scenario. Log files and test data are attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4733) max(dir0) reading more columns than necessary

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342290#comment-15342290
 ] 

ASF GitHub Bot commented on DRILL-4733:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/531#discussion_r67916478
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java
 ---
@@ -126,8 +127,12 @@ CloseableRecordBatch getReaderBatch(FragmentContext 
context, EasySubScan scan) t
 final ImplicitColumnExplorer columnExplorer = new 
ImplicitColumnExplorer(context, scan.getColumns());
 
 if (!columnExplorer.isSelectAllColumns()) {
+  // We must make sure to pass a table column (not to be confused with 
implicit column) to the underlying record reader.
+  List tableColumns =
--- End diff --

In the original PR I created a helper class which contained common logic 
for the parquet and text format plugins. Somehow I missed that this part is 
unique to the text format plugin, and should NOT be used in the parquet one. 
That's why I removed it from ImplicitColumnExplorer and added it to 
EasyFormatPlugin.


> max(dir0) reading more columns than necessary
> -
>
> Key: DRILL-4733
> URL: https://issues.apache.org/jira/browse/DRILL-4733
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.7.0
>
> Attachments: bug.tgz
>
>
> The below query started to fail from this commit : 
> 3209886a8548eea4a2f74c059542672f8665b8d2
> {code}
> select max(dir0) from dfs.`/drill/testdata/bug/2016`;
> Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support 
> schema changes
> Fragment 0:0
> [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> The sub-folders contains files which do have schema change for one column 
> "contributions" (int32 vs double). However prior to this commit we did not 
> fail in the scenario. Log files and test data are attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4733) max(dir0) reading more columns than necessary

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342281#comment-15342281
 ] 

ASF GitHub Bot commented on DRILL-4733:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/531#discussion_r67915875
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/TestImplicitFileColumns.java
 ---
@@ -110,4 +111,20 @@ public void testImplicitColumnsForParquet() throws 
Exception {
 .go();
   }
 
+  @Test // DRILL-4733
+  public void testMultilevelParquetWithSchemaChange() throws Exception {
+try {
+  test("alter session set `planner.enable_decimal_data_type` = true");
--- End diff --

When I run this query without the decimal data type enabled, Drill tells me to 
turn it on. It's probably connected with the data inside the dataset.


> max(dir0) reading more columns than necessary
> -
>
> Key: DRILL-4733
> URL: https://issues.apache.org/jira/browse/DRILL-4733
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.7.0
>
> Attachments: bug.tgz
>
>
> The below query started to fail from this commit : 
> 3209886a8548eea4a2f74c059542672f8665b8d2
> {code}
> select max(dir0) from dfs.`/drill/testdata/bug/2016`;
> Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support 
> schema changes
> Fragment 0:0
> [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> The sub-folders contains files which do have schema change for one column 
> "contributions" (int32 vs double). However prior to this commit we did not 
> fail in the scenario. Log files and test data are attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4735) Count(dir0) on parquet returns 0 result

2016-06-21 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342270#comment-15342270
 ] 

Jinfeng Ni commented on DRILL-4735:
---

I ran the query on 1.4.0 and saw the same problem. I have not checked earlier 
versions, but it's likely that this problem has been there for a long time.


> Count(dir0) on parquet returns 0 result
> ---
>
> Key: DRILL-4735
> URL: https://issues.apache.org/jira/browse/DRILL-4735
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.4.0, 1.6.0, 1.7.0
>Reporter: Krystal
>Priority: Critical
>
> Selecting a count of dir0, dir1, etc against a parquet directory returns 0 
> rows.
> select count(dir0) from `min_max_dir`;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> select count(dir1) from `min_max_dir`;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> If I put both dir0 and dir1 in the same select, it returns expected result:
> select count(dir0), count(dir1) from `min_max_dir`;
> +-+-+
> | EXPR$0  | EXPR$1  |
> +-+-+
> | 600 | 600 |
> +-+-+
> Here is the physical plan for count(dir0) query:
> {code}
> 00-00Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, 
> cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id 
> = 1346
> 00-01  Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): 
> rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 1345
> 00-02Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): 
> rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 1344
> 00-03  
> Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3da85d3b[columns
>  = null, isStarQuery = false, isSkipQuery = false]]) : rowType = 
> RecordType(BIGINT count): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1343
> {code}
> Here is part of the explain plan for the count(dir0) and count(dir1) in the 
> same select:
> {code}
> 00-00Screen : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): 
> rowcount = 60.0, cumulative cost = {1206.0 rows, 15606.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 1623
> 00-01  Project(EXPR$0=[$0], EXPR$1=[$1]) : rowType = RecordType(BIGINT 
> EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 
> 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1622
> 00-02StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)]) : 
> rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, 
> cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 
> memory}, id = 1621
> 00-03  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1999/MAR/voter15.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1985/jan/voter5.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1985/apr/voter60.parquet/0_0_0.parquet],...,
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/2014/jul/voter35.parquet/0_0_0.parquet]],
>  selectionRoot=maprfs:/drill/testdata/min_max_dir, numFiles=16, 
> usedMetadataFile=false, columns=[`dir0`, `dir1`]]]) : rowType = 
> RecordType(ANY dir0, ANY dir1): rowcount = 600.0, cumulative cost = {600.0 
> rows, 1200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1620
> {code}
> Notice that in the first case, 
> "org.apache.drill.exec.store.pojo.PojoRecordReader" is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4733) max(dir0) reading more columns than necessary

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342268#comment-15342268
 ] 

ASF GitHub Bot commented on DRILL-4733:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/531#discussion_r67914754
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/TestImplicitFileColumns.java
 ---
@@ -110,4 +111,20 @@ public void testImplicitColumnsForParquet() throws 
Exception {
 .go();
   }
 
+  @Test // DRILL-4733
+  public void testMultilevelParquetWithSchemaChange() throws Exception {
+try {
+  test("alter session set `planner.enable_decimal_data_type` = true");
--- End diff --

Why is decimal type relevant for this particular test ? 


> max(dir0) reading more columns than necessary
> -
>
> Key: DRILL-4733
> URL: https://issues.apache.org/jira/browse/DRILL-4733
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.7.0
>
> Attachments: bug.tgz
>
>
> The below query started to fail from this commit : 
> 3209886a8548eea4a2f74c059542672f8665b8d2
> {code}
> select max(dir0) from dfs.`/drill/testdata/bug/2016`;
> Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support 
> schema changes
> Fragment 0:0
> [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> The sub-folders contains files which do have schema change for one column 
> "contributions" (int32 vs double). However prior to this commit we did not 
> fail in the scenario. Log files and test data are attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4735) Count(dir0) on parquet returns 0 result

2016-06-21 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342265#comment-15342265
 ] 

Rahul Challapalli commented on DRILL-4735:
--

[~knguyen] Can you confirm whether this is a regression from 1.6?

> Count(dir0) on parquet returns 0 result
> ---
>
> Key: DRILL-4735
> URL: https://issues.apache.org/jira/browse/DRILL-4735
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.4.0, 1.6.0, 1.7.0
>Reporter: Krystal
>Priority: Critical
>
> Selecting a count of dir0, dir1, etc against a parquet directory returns 0 
> rows.
> select count(dir0) from `min_max_dir`;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> select count(dir1) from `min_max_dir`;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> If I put both dir0 and dir1 in the same select, it returns expected result:
> select count(dir0), count(dir1) from `min_max_dir`;
> +-+-+
> | EXPR$0  | EXPR$1  |
> +-+-+
> | 600 | 600 |
> +-+-+
> Here is the physical plan for count(dir0) query:
> {code}
> 00-00Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, 
> cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id 
> = 1346
> 00-01  Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): 
> rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 1345
> 00-02Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): 
> rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 1344
> 00-03  
> Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3da85d3b[columns
>  = null, isStarQuery = false, isSkipQuery = false]]) : rowType = 
> RecordType(BIGINT count): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1343
> {code}
> Here is part of the explain plan for the count(dir0) and count(dir1) in the 
> same select:
> {code}
> 00-00Screen : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): 
> rowcount = 60.0, cumulative cost = {1206.0 rows, 15606.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 1623
> 00-01  Project(EXPR$0=[$0], EXPR$1=[$1]) : rowType = RecordType(BIGINT 
> EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 
> 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1622
> 00-02StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)]) : 
> rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, 
> cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 
> memory}, id = 1621
> 00-03  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1999/MAR/voter15.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1985/jan/voter5.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1985/apr/voter60.parquet/0_0_0.parquet],...,
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/2014/jul/voter35.parquet/0_0_0.parquet]],
>  selectionRoot=maprfs:/drill/testdata/min_max_dir, numFiles=16, 
> usedMetadataFile=false, columns=[`dir0`, `dir1`]]]) : rowType = 
> RecordType(ANY dir0, ANY dir1): rowcount = 600.0, cumulative cost = {600.0 
> rows, 1200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1620
> {code}
> Notice that in the first case, 
> "org.apache.drill.exec.store.pojo.PojoRecordReader" is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4735) Count(dir0) on parquet returns 0 result

2016-06-21 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni updated DRILL-4735:
--
Affects Version/s: 1.4.0

> Count(dir0) on parquet returns 0 result
> ---
>
> Key: DRILL-4735
> URL: https://issues.apache.org/jira/browse/DRILL-4735
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.4.0, 1.6.0, 1.7.0
>Reporter: Krystal
>Priority: Critical
>
> Selecting a count of dir0, dir1, etc against a parquet directory returns 0 
> rows.
> select count(dir0) from `min_max_dir`;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> select count(dir1) from `min_max_dir`;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> If I put both dir0 and dir1 in the same select, it returns expected result:
> select count(dir0), count(dir1) from `min_max_dir`;
> +-+-+
> | EXPR$0  | EXPR$1  |
> +-+-+
> | 600 | 600 |
> +-+-+
> Here is the physical plan for count(dir0) query:
> {code}
> 00-00Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, 
> cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id 
> = 1346
> 00-01  Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): 
> rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 1345
> 00-02Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): 
> rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 1344
> 00-03  
> Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3da85d3b[columns
>  = null, isStarQuery = false, isSkipQuery = false]]) : rowType = 
> RecordType(BIGINT count): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1343
> {code}
> Here is part of the explain plan for the count(dir0) and count(dir1) in the 
> same select:
> {code}
> 00-00Screen : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): 
> rowcount = 60.0, cumulative cost = {1206.0 rows, 15606.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 1623
> 00-01  Project(EXPR$0=[$0], EXPR$1=[$1]) : rowType = RecordType(BIGINT 
> EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 
> 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1622
> 00-02StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)]) : 
> rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, 
> cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 
> memory}, id = 1621
> 00-03  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1999/MAR/voter15.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1985/jan/voter5.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1985/apr/voter60.parquet/0_0_0.parquet],...,
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/2014/jul/voter35.parquet/0_0_0.parquet]],
>  selectionRoot=maprfs:/drill/testdata/min_max_dir, numFiles=16, 
> usedMetadataFile=false, columns=[`dir0`, `dir1`]]]) : rowType = 
> RecordType(ANY dir0, ANY dir1): rowcount = 600.0, cumulative cost = {600.0 
> rows, 1200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1620
> {code}
> Notice that in the first case, 
> "org.apache.drill.exec.store.pojo.PojoRecordReader" is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4733) max(dir0) reading more columns than necessary

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342263#comment-15342263
 ] 

ASF GitHub Bot commented on DRILL-4733:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/531#discussion_r67914676
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java
 ---
@@ -126,8 +127,12 @@ CloseableRecordBatch getReaderBatch(FragmentContext 
context, EasySubScan scan) t
 final ImplicitColumnExplorer columnExplorer = new 
ImplicitColumnExplorer(context, scan.getColumns());
 
 if (!columnExplorer.isSelectAllColumns()) {
+  // We must make sure to pass a table column (not to be confused with 
implicit column) to the underlying record reader.
+  List tableColumns =
--- End diff --

I haven't looked at the original patch for implicit columns, but I am not 
sure why this fix is in the EasyFormatPlugin when the test is against Parquet 
files?


> max(dir0) reading more columns than necessary
> -
>
> Key: DRILL-4733
> URL: https://issues.apache.org/jira/browse/DRILL-4733
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.7.0
>
> Attachments: bug.tgz
>
>
> The below query started to fail from this commit : 
> 3209886a8548eea4a2f74c059542672f8665b8d2
> {code}
> select max(dir0) from dfs.`/drill/testdata/bug/2016`;
> Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support 
> schema changes
> Fragment 0:0
> [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> The sub-folders contains files which do have schema change for one column 
> "contributions" (int32 vs double). However prior to this commit we did not 
> fail in the scenario. Log files and test data are attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4735) Count(dir0) on parquet returns 0 result

2016-06-21 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli updated DRILL-4735:
-
Priority: Critical  (was: Major)

> Count(dir0) on parquet returns 0 result
> ---
>
> Key: DRILL-4735
> URL: https://issues.apache.org/jira/browse/DRILL-4735
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.6.0, 1.7.0
>Reporter: Krystal
>Priority: Critical
>
> Selecting a count of dir0, dir1, etc against a parquet directory returns 0 
> rows.
> select count(dir0) from `min_max_dir`;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> select count(dir1) from `min_max_dir`;
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> If I put both dir0 and dir1 in the same select, it returns expected result:
> select count(dir0), count(dir1) from `min_max_dir`;
> +-+-+
> | EXPR$0  | EXPR$1  |
> +-+-+
> | 600 | 600 |
> +-+-+
> Here is the physical plan for count(dir0) query:
> {code}
> 00-00Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, 
> cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id 
> = 1346
> 00-01  Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): 
> rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 1345
> 00-02Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): 
> rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 
> 0.0 memory}, id = 1344
> 00-03  
> Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3da85d3b[columns
>  = null, isStarQuery = false, isSkipQuery = false]]) : rowType = 
> RecordType(BIGINT count): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1343
> {code}
> Here is part of the explain plan for the count(dir0) and count(dir1) in the 
> same select:
> {code}
> 00-00Screen : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): 
> rowcount = 60.0, cumulative cost = {1206.0 rows, 15606.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 1623
> 00-01  Project(EXPR$0=[$0], EXPR$1=[$1]) : rowType = RecordType(BIGINT 
> EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 
> 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1622
> 00-02StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)]) : 
> rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, 
> cumulative cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 
> memory}, id = 1621
> 00-03  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1999/MAR/voter15.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1985/jan/voter5.parquet/0_0_0.parquet],
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/1985/apr/voter60.parquet/0_0_0.parquet],...,
>  ReadEntryWithPath 
> [path=maprfs:/drill/testdata/min_max_dir/2014/jul/voter35.parquet/0_0_0.parquet]],
>  selectionRoot=maprfs:/drill/testdata/min_max_dir, numFiles=16, 
> usedMetadataFile=false, columns=[`dir0`, `dir1`]]]) : rowType = 
> RecordType(ANY dir0, ANY dir1): rowcount = 600.0, cumulative cost = {600.0 
> rows, 1200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1620
> {code}
> Notice that in the first case, 
> "org.apache.drill.exec.store.pojo.PojoRecordReader" is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4732) Update JDBC driver to use the new prepared statement APIs on DrillClient

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342110#comment-15342110
 ] 

ASF GitHub Bot commented on DRILL-4732:
---

GitHub user vkorukanti opened a pull request:

https://github.com/apache/drill/pull/532

DRILL-4732: Update JDBC driver to use the new prepared statement APIs in 
DrillClient

Changes specific to DRILL-4732 are in last commit.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vkorukanti/drill DRILL-4732

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/532.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #532


commit 32ba03c7abd9a3784c9a5376dd2835325fe8d5f9
Author: vkorukanti 
Date:   2016-06-09T23:03:06Z

DRILL-4728: Add support for new metadata fetch APIs

+ Protobuf messages
   - GetCatalogsReq -> GetCatalogsResp
   - GetSchemasReq -> GetSchemasResp
   - GetTablesReq -> GetTablesResp
   - GetColumnsReq -> GetColumnsResp

+ Java Drill client changes

+ Server side changes to handle the metadata API calls
  - Provide a self-contained `Runnable` implementation for each metadata API
that process the requests and sends the response to client
  - In `UserWorker` override the `handle` method that takes the 
`ResponseSender` and
send the response from the `handle` method instead of returning it.
  - Add a method for each new API to UserWorker to submit the metadata work.
  - Add a method `addNewWork(Runnable runnable)` to `WorkerBee` to submit a 
generic
`Runnable` to `ExecutorService`.
  - Move out couple of methods from `QueryContext` into a separate interface
`SchemaConfigInfoProvider` to enable instantiating Schema trees without 
the
full `QueryContext`

+ New protobuf messages increased the `jdbc-all.jar` size. Up the limit to 
21MB.

Change-Id: I5a5e4b453caf912d832ff8547c5789c884195cc4

commit a2ca69b3a81a8ff66bd671da775318204d49dda0
Author: vkorukanti 
Date:   2016-06-13T18:20:25Z

DRILL-4729: Add support for prepared statement implementation on server side

+ Add following APIs for Drill Java client
  - DrillRpcFuture 
createPreparedStatement(final String query)
  - void executePreparedStatement(final PreparedStatement 
preparedStatement, UserResultsListener resultsListener)
  - List executePreparedStatement(final PreparedStatement 
preparedStatement) (for testing purpose)

+ Separated out the interface from UserClientConnection. It makes it easy 
to have wrappers which need to
  tap the messages and data going to the actual client.

+ Implement CREATE_PREPARED_STATEMENT and handle RunQuery with 
PreparedStatement

+ Test changes to support prepared statement as query type

+ Add tests in TestPreparedStatementProvider

Change-Id: Id26cbb9ed809f0ab3c7530e6a5d8314d2e868b86

commit 2d91a605eac808561f2bf9ae60e6582936a4e9f0
Author: vkorukanti 
Date:   2016-06-20T21:40:05Z

DRILL-4732: Update JDBC driver to use the new prepared statement APIs on 
DrillClient

Change-Id: Ib8131789e9ad257b3f60859bc4115eaef43aee48




> Update JDBC driver to use the new prepared statement APIs on DrillClient
> 
>
> Key: DRILL-4732
> URL: https://issues.apache.org/jira/browse/DRILL-4732
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venki Korukanti
>Assignee: Venki Korukanti
> Fix For: 1.8.0
>
>
> DRILL-4729 is adding new prepared statement implementation on server side and 
> it provides APIs on DrillClient to create new prepared statement which 
> returns metadata along with a opaque handle and submit prepared statement for 
> execution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4735) Count(dir0) on parquet returns 0 result

2016-06-21 Thread Krystal (JIRA)
Krystal created DRILL-4735:
--

 Summary: Count(dir0) on parquet returns 0 result
 Key: DRILL-4735
 URL: https://issues.apache.org/jira/browse/DRILL-4735
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization, Storage - Parquet
Affects Versions: 1.6.0, 1.7.0
Reporter: Krystal


Selecting a count of dir0, dir1, etc against a parquet directory returns 0 rows.

select count(dir0) from `min_max_dir`;
+-+
| EXPR$0  |
+-+
| 0   |
+-+

select count(dir1) from `min_max_dir`;
+-+
| EXPR$0  |
+-+
| 0   |
+-+

If I put both dir0 and dir1 in the same select, it returns expected result:
select count(dir0), count(dir1) from `min_max_dir`;
+-+-+
| EXPR$0  | EXPR$1  |
+-+-+
| 600 | 600 |
+-+-+

Here is the physical plan for count(dir0) query:
{code}
00-00Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 20.0, 
cumulative cost = {22.0 rows, 22.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 
1346
00-01  Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount 
= 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 
memory}, id = 1345
00-02Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): 
rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 
0.0 memory}, id = 1344
00-03  
Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3da85d3b[columns
 = null, isStarQuery = false, isSkipQuery = false]]) : rowType = 
RecordType(BIGINT count): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 
cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1343
{code}

Here is part of the explain plan for the count(dir0) and count(dir1) in the 
same select:
{code}
00-00Screen : rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount 
= 60.0, cumulative cost = {1206.0 rows, 15606.0 cpu, 0.0 io, 0.0 network, 0.0 
memory}, id = 1623
00-01  Project(EXPR$0=[$0], EXPR$1=[$1]) : rowType = RecordType(BIGINT 
EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative cost = {1200.0 rows, 
15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1622
00-02StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)]) : 
rowType = RecordType(BIGINT EXPR$0, BIGINT EXPR$1): rowcount = 60.0, cumulative 
cost = {1200.0 rows, 15600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1621
00-03  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=maprfs:/drill/testdata/min_max_dir/1999/Apr/voter20.parquet/0_0_0.parquet],
 ReadEntryWithPath 
[path=maprfs:/drill/testdata/min_max_dir/1999/MAR/voter15.parquet/0_0_0.parquet],
 ReadEntryWithPath 
[path=maprfs:/drill/testdata/min_max_dir/1985/jan/voter5.parquet/0_0_0.parquet],
 ReadEntryWithPath 
[path=maprfs:/drill/testdata/min_max_dir/1985/apr/voter60.parquet/0_0_0.parquet],...,
 ReadEntryWithPath 
[path=maprfs:/drill/testdata/min_max_dir/2014/jul/voter35.parquet/0_0_0.parquet]],
 selectionRoot=maprfs:/drill/testdata/min_max_dir, numFiles=16, 
usedMetadataFile=false, columns=[`dir0`, `dir1`]]]) : rowType = RecordType(ANY 
dir0, ANY dir1): rowcount = 600.0, cumulative cost = {600.0 rows, 1200.0 cpu, 
0.0 io, 0.0 network, 0.0 memory}, id = 1620
{code}

Notice that in the first case, 
"org.apache.drill.exec.store.pojo.PojoRecordReader" is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-2385) count on complex objects failed with missing function implementation

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341890#comment-15341890
 ] 

ASF GitHub Bot commented on DRILL-2385:
---

Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/501
  
Changes merged into master with commit id f86c4fa


> count on complex objects failed with missing function implementation
> 
>
> Key: DRILL-2385
> URL: https://issues.apache.org/jira/browse/DRILL-2385
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 0.8.0
>Reporter: Chun Chang
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: 1.7.0
>
>
> #Wed Mar 04 01:23:42 EST 2015
> git.commit.id.abbrev=71b6bfe
> Have a complex type that looks like the following:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.sia from 
> `complex.json` t limit 1;
> ++
> |sia |
> ++
> | [1,11,101,1001] |
> ++
> {code}
> A count on the complex type will fail with missing function implementation:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.gbyi, count(t.sia) 
> countsia from `complex.json` t group by t.gbyi;
> Query failed: RemoteRpcException: Failure while running fragment., Schema is 
> currently null.  You must call buildSchema(SelectionVectorMode) before this 
> container can return a schema. [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on 
> qa-node119.qa.lab:31010 ]
> [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on qa-node119.qa.lab:31010 ]
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> {code}
> drillbit.log
> {code}
> 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] ERROR 
> o.a.drill.exec.ops.FragmentContext - Fragment Context received failure.
> org.apache.drill.exec.exception.SchemaChangeException: Failure while 
> materializing expression.
> Error in expression at index 0.  Error: Missing function implementation: 
> [count(BIGINT-REPEATED)].  Full expression: null.
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregatorInternal(HashAggBatch.java:210)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregator(HashAggBatch.java:158)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema(HashAggBatch.java:101)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:130)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) 
> [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:114)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57) 
> [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:121)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Error while initializing or executing 
> fragment
> java.lang.NullPointerException: Schema is currently null.  You must call 
> buildSchema(SelectionVectorMode) before this container can return a schema.
> at 
> com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208) 
> ~[guava-14.0.1.jar:na]
> at 
> org.apache.drill.exec.record.VectorContainer.getSchema(VectorContainer.java:261)
>  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> 

[jira] [Commented] (DRILL-2385) count on complex objects failed with missing function implementation

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341891#comment-15341891
 ] 

ASF GitHub Bot commented on DRILL-2385:
---

Github user vdiravka closed the pull request at:

https://github.com/apache/drill/pull/501


> count on complex objects failed with missing function implementation
> 
>
> Key: DRILL-2385
> URL: https://issues.apache.org/jira/browse/DRILL-2385
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 0.8.0
>Reporter: Chun Chang
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: 1.7.0
>
>
> #Wed Mar 04 01:23:42 EST 2015
> git.commit.id.abbrev=71b6bfe
> Have a complex type that looks like the following:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.sia from 
> `complex.json` t limit 1;
> ++
> |sia |
> ++
> | [1,11,101,1001] |
> ++
> {code}
> A count on the complex type will fail with missing function implementation:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.gbyi, count(t.sia) 
> countsia from `complex.json` t group by t.gbyi;
> Query failed: RemoteRpcException: Failure while running fragment., Schema is 
> currently null.  You must call buildSchema(SelectionVectorMode) before this 
> container can return a schema. [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on 
> qa-node119.qa.lab:31010 ]
> [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on qa-node119.qa.lab:31010 ]
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> {code}
> drillbit.log
> {code}
> 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] ERROR 
> o.a.drill.exec.ops.FragmentContext - Fragment Context received failure.
> org.apache.drill.exec.exception.SchemaChangeException: Failure while 
> materializing expression.
> Error in expression at index 0.  Error: Missing function implementation: 
> [count(BIGINT-REPEATED)].  Full expression: null.
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregatorInternal(HashAggBatch.java:210)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregator(HashAggBatch.java:158)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema(HashAggBatch.java:101)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:130)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) 
> [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:114)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57) 
> [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:121)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Error while initializing or executing 
> fragment
> java.lang.NullPointerException: Schema is currently null.  You must call 
> buildSchema(SelectionVectorMode) before this container can return a schema.
> at 
> com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208) 
> ~[guava-14.0.1.jar:na]
> at 
> org.apache.drill.exec.record.VectorContainer.getSchema(VectorContainer.java:261)
>  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.getSchema(AbstractRecordBatch.java:155)
>  

[jira] [Comment Edited] (DRILL-4734) Query against HBase table on a 5 node cluster fails with SchemaChangeException

2016-06-21 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341843#comment-15341843
 ] 

Aman Sinha edited comment on DRILL-4734 at 6/21/16 2:15 PM:


Attached Explain plan with 2 nodes


was (Author: amansinha100):
Explain plan with 2 nodes

> Query against HBase table on a 5 node cluster fails with SchemaChangeException
> --
>
> Key: DRILL-4734
> URL: https://issues.apache.org/jira/browse/DRILL-4734
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Storage - HBase
>Affects Versions: 1.6.0
>Reporter: Aman Sinha
> Attachments: 2nodes.explain.txt, 5nodes.explain.txt
>
>
> [Creating this JIRA on behalf of Qiang Li]
> Let's say I have two tables.
> {noformat}
> offers_ref0 : rowkey salt(1byte)+long uid(8 byte ) , family: v,  qualifier:
> v(string)
> offers_nation_idx: rowkey salt(1byte) + string, family:v, qualifier: v(long
> 8 byte)
> {noformat}
> Here is the SQL:
> {noformat}
> select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as did, 
>convert_from(`ref0`.`v`.`v`,'UTF8') as v  from hbase.`offers_nation_idx` 
> as`nation` 
> join hbase.offers_ref0 as `ref0` 
> on CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') = 
>  CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') 
>where `nation`.row_key  > '0br' and `nation`.row_key  < '0bs' 
>  limit 10
> {noformat}
> When I execute the query on a single node or on fewer than 5 nodes, it works
> fine. But when I execute it on a cluster with about 14 nodes, it throws
> an exception:
> The first time, it throws this exception:
> *Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException:
> Hash join does not support schema changes*
> Then if I query again, it always throws the exception below:
> {noformat}
> *Query Failed: An Error Occurred*
> *org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
> ERROR:IllegalStateException: 
> Failure while reading vector. Expected vector class of 
> org.apache.drill.exec.vector.NullableIntVector but 
> was holding vector class org.apache.drill.exec.vector.complex.MapVector, 
> field=v(MAP:REQUIRED
>  [v(VARBINARY:OPTIONAL)[$bits$(UINT1:REQUIRED),
> v(VARBINARY:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]] Fragment 12:4
>  [Error Id:06c6eae4-0822-4714-b0bf-a6e04ebfec79 on xxx:31010]*
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4734) Query against HBase table on a 5 node cluster fails with SchemaChangeException

2016-06-21 Thread Aman Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-4734:
--
Attachment: 2nodes.explain.txt

Explain plan with 2 nodes

> Query against HBase table on a 5 node cluster fails with SchemaChangeException
> --
>
> Key: DRILL-4734
> URL: https://issues.apache.org/jira/browse/DRILL-4734
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Storage - HBase
>Affects Versions: 1.6.0
>Reporter: Aman Sinha
> Attachments: 2nodes.explain.txt, 5nodes.explain.txt
>
>
> [Creating this JIRA on behalf of Qiang Li]
> Let's say I have two tables.
> {noformat}
> offers_ref0 : rowkey salt(1byte)+long uid(8 byte ) , family: v,  qualifier:
> v(string)
> offers_nation_idx: rowkey salt(1byte) + string, family:v, qualifier: v(long
> 8 byte)
> {noformat}
> Here is the SQL:
> {noformat}
> select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as did, 
>convert_from(`ref0`.`v`.`v`,'UTF8') as v  from hbase.`offers_nation_idx` 
> as`nation` 
> join hbase.offers_ref0 as `ref0` 
> on CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') = 
>  CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') 
>where `nation`.row_key  > '0br' and `nation`.row_key  < '0bs' 
>  limit 10
> {noformat}
> When I execute the query on a single node or on fewer than 5 nodes, it works
> fine. But when I execute it on a cluster with about 14 nodes, it throws
> an exception:
> The first time, it throws this exception:
> *Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException:
> Hash join does not support schema changes*
> Then if I query again, it always throws the exception below:
> {noformat}
> *Query Failed: An Error Occurred*
> *org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
> ERROR:IllegalStateException: 
> Failure while reading vector. Expected vector class of 
> org.apache.drill.exec.vector.NullableIntVector but 
> was holding vector class org.apache.drill.exec.vector.complex.MapVector, 
> field=v(MAP:REQUIRED
>  [v(VARBINARY:OPTIONAL)[$bits$(UINT1:REQUIRED),
> v(VARBINARY:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]] Fragment 12:4
>  [Error Id:06c6eae4-0822-4714-b0bf-a6e04ebfec79 on xxx:31010]*
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4734) Query against HBase table on a 5 node cluster fails with SchemaChangeException

2016-06-21 Thread Aman Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-4734:
--
Attachment: 5nodes.explain.txt

Attached Explain plan with 5 nodes
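
(For reference, plans like the attached ones can be reproduced with Drill's EXPLAIN syntax; a sketch using the join query from the description below:)
{code}
explain plan for
select convert_from(byte_substr(`ref0`.row_key, -8, 8), 'BIGINT_BE') as did,
       convert_from(`ref0`.`v`.`v`, 'UTF8') as v
from hbase.`offers_nation_idx` as `nation`
join hbase.`offers_ref0` as `ref0`
  on convert_from(byte_substr(`ref0`.row_key, -8, 8), 'BIGINT_BE') =
     convert_from(`nation`.`v`.`v`, 'BIGINT_BE')
where `nation`.row_key > '0br' and `nation`.row_key < '0bs'
limit 10;
{code}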

> Query against HBase table on a 5 node cluster fails with SchemaChangeException
> --
>
> Key: DRILL-4734
> URL: https://issues.apache.org/jira/browse/DRILL-4734
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Storage - HBase
>Affects Versions: 1.6.0
>Reporter: Aman Sinha
> Attachments: 2nodes.explain.txt, 5nodes.explain.txt
>
>
> [Creating this JIRA on behalf of Qiang Li]
> Let's say I have two tables.
> {noformat}
> offers_ref0 : rowkey salt(1byte)+long uid(8 byte ) , family: v,  qualifier:
> v(string)
> offers_nation_idx: rowkey salt(1byte) + string, family:v, qualifier: v(long
> 8 byte)
> {noformat}
> Here is the SQL:
> {noformat}
> select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as did, 
>convert_from(`ref0`.`v`.`v`,'UTF8') as v  from hbase.`offers_nation_idx` 
> as`nation` 
> join hbase.offers_ref0 as `ref0` 
> on CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') = 
>  CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') 
>where `nation`.row_key  > '0br' and `nation`.row_key  < '0bs' 
>  limit 10
> {noformat}
> When I execute the query on a single node or on fewer than 5 nodes, it works
> fine. But when I execute it on a cluster with about 14 nodes, it throws
> an exception:
> The first time, it throws this exception:
> *Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException:
> Hash join does not support schema changes*
> Then if I query again, it always throws the exception below:
> {noformat}
> *Query Failed: An Error Occurred*
> *org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
> ERROR:IllegalStateException: 
> Failure while reading vector. Expected vector class of 
> org.apache.drill.exec.vector.NullableIntVector but 
> was holding vector class org.apache.drill.exec.vector.complex.MapVector, 
> field=v(MAP:REQUIRED
>  [v(VARBINARY:OPTIONAL)[$bits$(UINT1:REQUIRED),
> v(VARBINARY:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]] Fragment 12:4
>  [Error Id:06c6eae4-0822-4714-b0bf-a6e04ebfec79 on xxx:31010]*
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4734) Query against HBase table on a 5 node cluster fails with SchemaChangeException

2016-06-21 Thread Aman Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-4734:
--
Description: 
[Creating this JIRA on behalf of Qiang Li]

Let's say I have two tables.

{noformat}
offers_ref0 : rowkey salt(1byte)+long uid(8 byte ) , family: v,  qualifier:
v(string)
offers_nation_idx: rowkey salt(1byte) + string, family:v, qualifier: v(long
8 byte)
{noformat}

Here is the SQL:

{noformat}
select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as did, 
   convert_from(`ref0`.`v`.`v`,'UTF8') as v  from hbase.`offers_nation_idx` 
as`nation` 
join hbase.offers_ref0 as `ref0` 
on CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') = 
 CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') 
   where `nation`.row_key  > '0br' and `nation`.row_key  < '0bs' 
 limit 10
{noformat}
When I execute the query on a single node or on fewer than 5 nodes, it works
fine. But when I execute it on a cluster with about 14 nodes, it throws
an exception:

The first time, it throws this exception:
*Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException:
Hash join does not support schema changes*

Then if I query again, it always throws the exception below:

{noformat}
*Query Failed: An Error Occurred*
*org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
ERROR:IllegalStateException: 
Failure while reading vector. Expected vector class of 
org.apache.drill.exec.vector.NullableIntVector but 
was holding vector class org.apache.drill.exec.vector.complex.MapVector, 
field=v(MAP:REQUIRED
 [v(VARBINARY:OPTIONAL)[$bits$(UINT1:REQUIRED),
v(VARBINARY:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]] Fragment 12:4
 [Error Id:06c6eae4-0822-4714-b0bf-a6e04ebfec79 on xxx:31010]*
{noformat}


  was:
[Creating this JIRA on behalf of Qiang Li]

Let's say I have two tables.

{noformat}
offers_ref0 : rowkey salt(1byte)+long uid(8 byte ) , family: v,  qualifier:
v(string)
offers_nation_idx: rowkey salt(1byte) + string, family:v, qualifier: v(long
8 byte)
{noformat}

Here is the SQL:

{noformat}
select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as did, 
convert_from(`ref0`.`v`.`v`,'UTF8') as v  from hbase.`offers_nation_idx` 
as`nation`  join hbase.offers_ref0 as `ref0` on 
CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') = 
CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') where `nation`.row_key  > '0br' and 
`nation`.row_key  < '0bs' limit 10
{noformat}
When I execute the query on a single node or on fewer than 5 nodes, it works
fine. But when I execute it on a cluster with about 14 nodes, it throws
an exception:

The first time, it throws this exception:
*Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException:
Hash join does not support schema changes*

Then if I query again, it always throws the exception below:

{noformat}
*Query Failed: An Error Occurred*
*org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
ERROR:IllegalStateException: Failure while reading vector. Expected vector 
class of org.apache.drill.exec.vector.NullableIntVector but was holding vector 
class org.apache.drill.exec.vector.complex.MapVector, field=v(MAP:REQUIRED 
[v(VARBINARY:OPTIONAL)[$bits$(UINT1:REQUIRED),
v(VARBINARY:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]] Fragment 12:4 [Error 
Id:06c6eae4-0822-4714-b0bf-a6e04ebfec79 on xxx:31010]*
{noformat}



> Query against HBase table on a 5 node cluster fails with SchemaChangeException
> --
>
> Key: DRILL-4734
> URL: https://issues.apache.org/jira/browse/DRILL-4734
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Storage - HBase
>Affects Versions: 1.6.0
>Reporter: Aman Sinha
>
> [Creating this JIRA on behalf of Qiang Li]
> Let's say I have two tables.
> {noformat}
> offers_ref0 : rowkey salt(1byte)+long uid(8 byte ) , family: v,  qualifier:
> v(string)
> offers_nation_idx: rowkey salt(1byte) + string, family:v, qualifier: v(long
> 8 byte)
> {noformat}
> Here is the SQL:
> {noformat}
> select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as did, 
>convert_from(`ref0`.`v`.`v`,'UTF8') as v  from hbase.`offers_nation_idx` 
> as`nation` 
> join hbase.offers_ref0 as `ref0` 
> on CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') = 
>  CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') 
>where `nation`.row_key  > '0br' and `nation`.row_key  < '0bs' 
>  limit 10
> {noformat}
> When I execute the query on a single node or on fewer than 5 nodes, it works
> fine. But when I execute it on a cluster with about 14 nodes, it throws
> an exception:
> The first time, it throws this exception:
> *Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException:
> Hash join does not support schema changes*
> Then if I query again, it will 

[jira] [Updated] (DRILL-4734) Query against HBase table on a 5 node cluster fails with SchemaChangeException

2016-06-21 Thread Aman Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-4734:
--
Description: 
[Creating this JIRA on behalf of Qiang Li]

Let's say I have two tables.

{noformat}
offers_ref0 : rowkey salt(1byte)+long uid(8 byte ) , family: v,  qualifier:
v(string)
offers_nation_idx: rowkey salt(1byte) + string, family:v, qualifier: v(long
8 byte)
{noformat}

Here is the SQL:

{noformat}
select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as did, 
convert_from(`ref0`.`v`.`v`,'UTF8') as v  from hbase.`offers_nation_idx` 
as`nation`  join hbase.offers_ref0 as `ref0` on 
CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') = 
CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') where `nation`.row_key  > '0br' and 
`nation`.row_key  < '0bs' limit 10
{noformat}
When I execute the query on a single node or on fewer than 5 nodes, it works
fine. But when I execute it on a cluster with about 14 nodes, it throws
an exception:

The first time, it throws this exception:
*Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException:
Hash join does not support schema changes*

Then if I query again, it always throws the exception below:

{noformat}
*Query Failed: An Error Occurred*
*org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
ERROR:IllegalStateException: Failure while reading vector. Expected vector 
class of org.apache.drill.exec.vector.NullableIntVector but was holding vector 
class org.apache.drill.exec.vector.complex.MapVector, field=v(MAP:REQUIRED 
[v(VARBINARY:OPTIONAL)[$bits$(UINT1:REQUIRED),
v(VARBINARY:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]] Fragment 12:4 [Error 
Id:06c6eae4-0822-4714-b0bf-a6e04ebfec79 on xxx:31010]*
{noformat}


  was:

[Creating this JIRA on behalf of Qiang Li]

Let's say I have two tables.

offers_ref0 : rowkey salt(1byte)+long uid(8 byte ) , family: v,  qualifier:
v(string)
offers_nation_idx: rowkey salt(1byte) + string, family:v, qualifier: v(long
8 byte)

Here is the SQL:

select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as uid,
convert_from(`ref0`.`v`.`v`,'UTF8') as v  from hbase.`offers_nation_idx` as
`nation` join hbase.offers_ref0 as `ref0` on
CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') =
CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') where `nation`.row_key  > '0br'
and `nation`.row_key  < '0bs' limit 10

When I execute the query on a single node or on fewer than 5 nodes, it works
fine. But when I execute it on a cluster with about 14 nodes, it throws
an exception:

The first time, it throws this exception:
*Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException:
Hash join does not support schema changes*

Then if I query again, it always throws the exception below:
*Query Failed: An Error Occurred*
*org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
IllegalStateException: Failure while reading vector. Expected vector class
of org.apache.drill.exec.vector.NullableIntVector but was holding vector
class org.apache.drill.exec.vector.complex.MapVector, field=
v(MAP:REQUIRED)[v(VARBINARY:OPTIONAL)[$bits$(UINT1:REQUIRED),
v(VARBINARY:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]] Fragment 12:4 [Error Id:
06c6eae4-0822-4714-b0bf-a6e04ebfec79 on xxx:31010]*

It's very strange, and I do not know how to solve it.
I tried adding nodes to the cluster one by one; the issue reproduces once I
add the 5th node. Can anyone help me solve this issue?


> Query against HBase table on a 5 node cluster fails with SchemaChangeException
> --
>
> Key: DRILL-4734
> URL: https://issues.apache.org/jira/browse/DRILL-4734
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Storage - HBase
>Affects Versions: 1.6.0
>Reporter: Aman Sinha
>
> [Creating this JIRA on behalf of Qiang Li]
> Let's say I have two tables.
> {noformat}
> offers_ref0 : rowkey salt(1byte)+long uid(8 byte ) , family: v,  qualifier:
> v(string)
> offers_nation_idx: rowkey salt(1byte) + string, family:v, qualifier: v(long
> 8 byte)
> {noformat}
> Here is the SQL:
> {noformat}
> select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as did, 
> convert_from(`ref0`.`v`.`v`,'UTF8') as v  from hbase.`offers_nation_idx` 
> as`nation`  join hbase.offers_ref0 as `ref0` on 
> CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') = 
> CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') where `nation`.row_key  > '0br' and 
> `nation`.row_key  < '0bs' limit 10
> {noformat}
> When I execute the query on a single node or on fewer than 5 nodes, it works
> fine. But when I execute it on a cluster with about 14 nodes, it throws
> an exception:
> The first time, it throws this exception:
> *Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException:
> Hash join does not support 

[jira] [Created] (DRILL-4734) Query against HBase table on a 5 node cluster fails with SchemaChangeException

2016-06-21 Thread Aman Sinha (JIRA)
Aman Sinha created DRILL-4734:
-

 Summary: Query against HBase table on a 5 node cluster fails with 
SchemaChangeException
 Key: DRILL-4734
 URL: https://issues.apache.org/jira/browse/DRILL-4734
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators, Storage - HBase
Affects Versions: 1.6.0
Reporter: Aman Sinha



[Creating this JIRA on behalf of Qiang Li]

Let's say I have two tables.

offers_ref0 : rowkey salt(1byte)+long uid(8 byte ) , family: v,  qualifier:
v(string)
offers_nation_idx: rowkey salt(1byte) + string, family:v, qualifier: v(long
8 byte)

Here is the SQL:

select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as uid,
convert_from(`ref0`.`v`.`v`,'UTF8') as v  from hbase.`offers_nation_idx` as
`nation` join hbase.offers_ref0 as `ref0` on
CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') =
CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') where `nation`.row_key  > '0br'
and `nation`.row_key  < '0bs' limit 10

When I execute the query on a single node or on fewer than 5 nodes, it works
fine. But when I execute it on a cluster with about 14 nodes, it throws
an exception:

The first time, it throws this exception:
*Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException:
Hash join does not support schema changes*

Then if I query again, it always throws the exception below:
*Query Failed: An Error Occurred*
*org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
IllegalStateException: Failure while reading vector. Expected vector class
of org.apache.drill.exec.vector.NullableIntVector but was holding vector
class org.apache.drill.exec.vector.complex.MapVector, field=
v(MAP:REQUIRED)[v(VARBINARY:OPTIONAL)[$bits$(UINT1:REQUIRED),
v(VARBINARY:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]] Fragment 12:4 [Error Id:
06c6eae4-0822-4714-b0bf-a6e04ebfec79 on xxx:31010]*

It's very strange, and I do not know how to solve it.
I tried adding nodes to the cluster one by one; the issue reproduces once I
add the 5th node. Can anyone help me solve this issue?
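
(A mitigation sometimes tried for "Hash join does not support schema changes" is to disable the hash join so the planner falls back to a merge join; a hedged sketch, not a verified fix for this report:)
{code}
-- disable hash join for the current session only; re-enable with = true
alter session set `planner.enable_hashjoin` = false;
{code}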



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4733) max(dir0) reading more columns than necessary

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341831#comment-15341831
 ] 

ASF GitHub Bot commented on DRILL-4733:
---

GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/531

DRILL-4733: max(dir0) reading more columns than necessary



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-4733

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/531.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #531


commit 91b55e88311061ad6729d35b32ca150734991971
Author: Arina Ielchiieva 
Date:   2016-06-21T12:33:32Z

DRILL-4733: max(dir0) reading more columns than necessary




> max(dir0) reading more columns than necessary
> -
>
> Key: DRILL-4733
> URL: https://issues.apache.org/jira/browse/DRILL-4733
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.7.0
>
> Attachments: bug.tgz
>
>
> The query below started to fail as of this commit:
> 3209886a8548eea4a2f74c059542672f8665b8d2
> {code}
> select max(dir0) from dfs.`/drill/testdata/bug/2016`;
> Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support 
> schema changes
> Fragment 0:0
> [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> The sub-folders contain files which do have a schema change for one column,
> "contributions" (int32 vs double). However, prior to this commit we did not
> fail in this scenario. Log files and test data are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3726) Drill is not properly interpreting CRLF (0d0a). CR gets read as content.

2016-06-21 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341825#comment-15341825
 ] 

Arina Ielchiieva commented on DRILL-3726:
-

The user will have two options:
1. specify the delimiter in the select clause (a runnable sketch follows below):
select * from table(dfs.`my_table`(type=>'text', 'lineDelimiter'=>'\r\n'))
2. update the storage plugin lineDelimiter value to '\r\n' in the Web UI.
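
A minimal runnable form of option 1, assuming a text source exposed as dfs.`my_table` (the placeholder name from above); option 2 sets the same lineDelimiter value in the text format section of the storage plugin configuration instead:
{code}
-- per-query override of the line delimiter via the table function
select *
from table(dfs.`my_table`(type => 'text', 'lineDelimiter' => '\r\n'));
{code}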

> Drill is not properly interpreting CRLF (0d0a). CR gets read as content.
> 
>
> Key: DRILL-3726
> URL: https://issues.apache.org/jira/browse/DRILL-3726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Text & CSV
>Affects Versions: 1.1.0
> Environment: Linux RHEL 6.6, OSX 10.9
>Reporter: Edmon Begoli
> Fix For: 1.7.0
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
>   When we query the last attribute of a text file, we get missing characters. 
>  Looking at the row through Drill, a \r is included at the end of the last 
> attribute.  
> Looking in a text editor, it's not embedded into that attribute.
> I'm thinking that Drill is not interpreting CRLF (0d0a) as a new line, only 
> the LF, resulting in the CR becoming part of the last attribute.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3149) TextReader should support multibyte line delimiters

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341814#comment-15341814
 ] 

ASF GitHub Bot commented on DRILL-3149:
---

Github user arina-ielchiieva closed the pull request at:

https://github.com/apache/drill/pull/500


> TextReader should support multibyte line delimiters
> ---
>
> Key: DRILL-3149
> URL: https://issues.apache.org/jira/browse/DRILL-3149
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Jim Scott
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.7.0
>
>
> lineDelimiter in the TextFormatConfig doesn't support \r\n for record 
> delimiters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3149) TextReader should support multibyte line delimiters

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341813#comment-15341813
 ] 

ASF GitHub Bot commented on DRILL-3149:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/500
  
Changes merged into master with commit id 
223507b76ff6c2227e667ae4a53f743c92edd295


> TextReader should support multibyte line delimiters
> ---
>
> Key: DRILL-3149
> URL: https://issues.apache.org/jira/browse/DRILL-3149
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Jim Scott
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.7.0
>
>
> lineDelimiter in the TextFormatConfig doesn't support \r\n for record 
> delimiters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4658) cannot specify tab as a fieldDelimiter in table function

2016-06-21 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-4658.
-
   Resolution: Fixed
Fix Version/s: 1.7.0

Fix merged into master with commit id - 223507b76ff6c2227e667ae4a53f743c92edd295
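
(With the fix, the escaped tab from the original report should be accepted; a sketch based on the reporter's query, with sample_cast.tsv taken from the report:)
{code}
-- '\t' is now parsed as a tab escape rather than a two-character string
select columns[0] as a, cast(columns[1] as bigint) as b
from table(dfs.tmp.`sample_cast.tsv`(type => 'text',
                                     fieldDelimiter => '\t',
                                     skipFirstLine => true));
{code}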

> cannot specify tab as a fieldDelimiter in table function
> 
>
> Key: DRILL-4658
> URL: https://issues.apache.org/jira/browse/DRILL-4658
> Project: Apache Drill
>  Issue Type: Bug
>  Components: SQL Parser
>Affects Versions: 1.6.0
> Environment: Mac OS X, Java 8
>Reporter: Vince Gonzalez
>Assignee: Arina Ielchiieva
> Fix For: 1.7.0
>
>
> I can't specify a tab delimiter in the table function, perhaps because it
> counts the characters rather than interpreting the value as a character
> escape code.
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as a, cast(columns[1] as bigint) as 
> b from table(dfs.tmp.`sample_cast.tsv`(type => 'text', fieldDelimiter => 
> '\t', skipFirstLine => true));
> Error: PARSE ERROR: Expected single character but was String: \t
> table sample_cast.tsv
> parameter fieldDelimiter
> SQL Query null
> [Error Id: 3efa82e1-3810-4d4a-b23c-32d6658dffcf on 172.30.1.144:31010] 
> (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-3726) Drill is not properly interpreting CRLF (0d0a). CR gets read as content.

2016-06-21 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-3726.
-
Resolution: Fixed

Fix merged into master with commit id 223507b76ff6c2227e667ae4a53f743c92edd295

> Drill is not properly interpreting CRLF (0d0a). CR gets read as content.
> 
>
> Key: DRILL-3726
> URL: https://issues.apache.org/jira/browse/DRILL-3726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Text & CSV
>Affects Versions: 1.1.0
> Environment: Linux RHEL 6.6, OSX 10.9
>Reporter: Edmon Begoli
> Fix For: 1.7.0
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
>   When we query the last attribute of a text file, we get missing characters. 
>  Looking at the row through Drill, a \r is included at the end of the last 
> attribute.  
> Looking in a text editor, it's not embedded into that attribute.
> I'm thinking that Drill is not interpreting CRLF (0d0a) as a new line, only 
> the LF, resulting in the CR becoming part of the last attribute.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-3149) TextReader should support multibyte line delimiters

2016-06-21 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-3149.
-
   Resolution: Fixed
Fix Version/s: (was: Future)
   1.7.0

> TextReader should support multibyte line delimiters
> ---
>
> Key: DRILL-3149
> URL: https://issues.apache.org/jira/browse/DRILL-3149
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Jim Scott
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.7.0
>
>
> lineDelimiter in the TextFormatConfig doesn't support \r\n for record 
> delimiters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-3149) TextReader should support multibyte line delimiters

2016-06-21 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-3149:

Labels: doc-impacting  (was: )

> TextReader should support multibyte line delimiters
> ---
>
> Key: DRILL-3149
> URL: https://issues.apache.org/jira/browse/DRILL-3149
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Jim Scott
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.7.0
>
>
> lineDelimiter in the TextFormatConfig doesn't support \r\n for record 
> delimiters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3149) TextReader should support multibyte line delimiters

2016-06-21 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341807#comment-15341807
 ] 

Arina Ielchiieva commented on DRILL-3149:
-

Merged into master with commit id 223507b76ff6c2227e667ae4a53f743c92edd295

> TextReader should support multibyte line delimiters
> ---
>
> Key: DRILL-3149
> URL: https://issues.apache.org/jira/browse/DRILL-3149
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Jim Scott
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.7.0
>
>
> lineDelimiter in the TextFormatConfig doesn't support \r\n for record 
> delimiters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4701) Fix log name and missing lines in logs on Web UI

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341791#comment-15341791
 ] 

ASF GitHub Bot commented on DRILL-4701:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/511
  
Changes merged into master with commit id 
4123ed2a539cd3f9812f22f96d56aa4709828acd


> Fix log name and missing lines in logs on Web UI
> 
>
> Key: DRILL-4701
> URL: https://issues.apache.org/jira/browse/DRILL-4701
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
> Fix For: 1.7.0
>
>
> 1. When the log files are downloaded from the UI, the name of the downloaded 
> file is "download". We should save the file with the same name as the log 
> file (i.e. drillbit.log).
> 2. The last N lines of the log file displayed in the Web UI do not match the 
> log file itself. Some lines are missing compared with the actual log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4701) Fix log name and missing lines in logs on Web UI

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341792#comment-15341792
 ] 

ASF GitHub Bot commented on DRILL-4701:
---

Github user arina-ielchiieva closed the pull request at:

https://github.com/apache/drill/pull/511


> Fix log name and missing lines in logs on Web UI
> 
>
> Key: DRILL-4701
> URL: https://issues.apache.org/jira/browse/DRILL-4701
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
> Fix For: 1.7.0
>
>
> 1. When the log files are downloaded from the UI, the name of the downloaded 
> file is "download". We should save the file with the same name as the log 
> file (i.e. drillbit.log).
> 2. The last N lines of the log file displayed in the Web UI do not match the 
> log file itself. Some lines are missing compared with the actual log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-2593) 500 error when crc for a query profile is out of sync

2016-06-21 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-2593.
-
Resolution: Fixed

> 500 error when crc for a query profile is out of sync
> -
>
> Key: DRILL-2593
> URL: https://issues.apache.org/jira/browse/DRILL-2593
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 0.7.0
>Reporter: Jason Altekruse
>Assignee: Arina Ielchiieva
> Fix For: 1.7.0
>
> Attachments: warning1.JPG, warning2.JPG
>
>
> To reproduce, on a machine where an embedded drillbit has been run, edit one 
> of the profiles stored in /tmp/drill/profiles and try to navigate to the 
> profiles page on the Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4571) Add link to local Drill logs from the web UI

2016-06-21 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-4571.
-
Resolution: Fixed

Fix was merged into master with commit id 
4123ed2a539cd3f9812f22f96d56aa4709828acd

> Add link to local Drill logs from the web UI
> 
>
> Key: DRILL-4571
> URL: https://issues.apache.org/jira/browse/DRILL-4571
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
> Fix For: 1.7.0
>
> Attachments: display_log.JPG, drillbit_download.log.gz, 
> drillbit_queries_json_screenshot.jpg, drillbit_ui.log, log_list.JPG
>
>
> Now we have a link to the profile from the Web UI.
> It would be handy for users to have a link to the local logs as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4701) Fix log name and missing lines in logs on Web UI

2016-06-21 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341787#comment-15341787
 ] 

Arina Ielchiieva commented on DRILL-4701:
-

Merged into master with commit id 4123ed2a539cd3f9812f22f96d56aa4709828acd

> Fix log name and missing lines in logs on Web UI
> 
>
> Key: DRILL-4701
> URL: https://issues.apache.org/jira/browse/DRILL-4701
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
> Fix For: 1.7.0
>
>
> 1. When the log files are downloaded from the UI, the name of the downloaded 
> file is "download". We should save the file with the same name as the log 
> file (i.e. drillbit.log).
> 2. The last N lines of the log file displayed in the Web UI do not match the 
> log file itself. Some lines are missing compared with the actual log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4701) Fix log name and missing lines in logs on Web UI

2016-06-21 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-4701.
-
Resolution: Fixed

> Fix log name and missing lines in logs on Web UI
> 
>
> Key: DRILL-4701
> URL: https://issues.apache.org/jira/browse/DRILL-4701
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
> Fix For: 1.7.0
>
>
> 1. When the log files are downloaded from the UI, the name of the downloaded 
> file is "download". We should save the file with the same name as the log 
> file (i.e. drillbit.log).
> 2. The last N lines of the log file displayed in the Web UI do not match the 
> log file itself. Some lines are missing compared with the actual log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4716) status.json doesn't work in drill ui

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341785#comment-15341785
 ] 

ASF GitHub Bot commented on DRILL-4716:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/522
  
Changes merged into master with commit 
1c451a341e80c2372be47d999741240fb5495eea


> status.json doesn't work in drill ui
> 
>
> Key: DRILL-4716
> URL: https://issues.apache.org/jira/browse/DRILL-4716
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.7.0
>
>
> 1. http://localhost:8047/status returns "Running!"
> But http://localhost:8047/status.json gives error.
> {code}
> {
>   "errorMessage" : "HTTP 404 Not Found"
> }
> {code}
> 2. Remove link to System Options on page http://localhost:8047/status as 
> redundant.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4716) status.json doesn't work in drill ui

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341786#comment-15341786
 ] 

ASF GitHub Bot commented on DRILL-4716:
---

Github user arina-ielchiieva closed the pull request at:

https://github.com/apache/drill/pull/522


> status.json doesn't work in drill ui
> 
>
> Key: DRILL-4716
> URL: https://issues.apache.org/jira/browse/DRILL-4716
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.7.0
>
>
> 1. http://localhost:8047/status returns "Running!"
> But http://localhost:8047/status.json gives error.
> {code}
> {
>   "errorMessage" : "HTTP 404 Not Found"
> }
> {code}
> 2. Remove link to System Options on page http://localhost:8047/status as 
> redundant.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4716) status.json doesn't work in drill ui

2016-06-21 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-4716.
-
Resolution: Fixed

> status.json doesn't work in drill ui
> 
>
> Key: DRILL-4716
> URL: https://issues.apache.org/jira/browse/DRILL-4716
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.7.0
>
>
> 1. http://localhost:8047/status returns "Running!"
> But http://localhost:8047/status.json gives error.
> {code}
> {
>   "errorMessage" : "HTTP 404 Not Found"
> }
> {code}
> 2. Remove link to System Options on page http://localhost:8047/status as 
> redundant.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-2385) count on complex objects failed with missing function implementation

2016-06-21 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka resolved DRILL-2385.

Resolution: Fixed

Fixed in f86c4fa8.
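
(Before the fix, one hedged workaround sketch was to count array elements explicitly with REPEATED_COUNT instead of calling COUNT on the repeated column. Note that the semantics differ: this sums element counts rather than counting non-null rows.)
{code}
-- workaround sketch, not the committed fix: REPEATED_COUNT returns the
-- number of elements in the array for each row
select t.gbyi, sum(repeated_count(t.sia)) as total_elements
from `complex.json` t
group by t.gbyi;
{code}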

> count on complex objects failed with missing function implementation
> 
>
> Key: DRILL-2385
> URL: https://issues.apache.org/jira/browse/DRILL-2385
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 0.8.0
>Reporter: Chun Chang
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: 1.7.0
>
>
> #Wed Mar 04 01:23:42 EST 2015
> git.commit.id.abbrev=71b6bfe
> Have a complex type that looks like the following:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.sia from 
> `complex.json` t limit 1;
> +-----------------+
> |       sia       |
> +-----------------+
> | [1,11,101,1001] |
> +-----------------+
> {code}
> A count on the complex type will fail with missing function implementation:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.gbyi, count(t.sia) 
> countsia from `complex.json` t group by t.gbyi;
> Query failed: RemoteRpcException: Failure while running fragment., Schema is 
> currently null.  You must call buildSchema(SelectionVectorMode) before this 
> container can return a schema. [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on 
> qa-node119.qa.lab:31010 ]
> [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on qa-node119.qa.lab:31010 ]
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> {code}
> drillbit.log
> {code}
> 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] ERROR 
> o.a.drill.exec.ops.FragmentContext - Fragment Context received failure.
> org.apache.drill.exec.exception.SchemaChangeException: Failure while 
> materializing expression.
> Error in expression at index 0.  Error: Missing function implementation: 
> [count(BIGINT-REPEATED)].  Full expression: null.
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregatorInternal(HashAggBatch.java:210)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregator(HashAggBatch.java:158)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema(HashAggBatch.java:101)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:130)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) 
> [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:114)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57) 
> [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:121)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Error while initializing or executing 
> fragment
> java.lang.NullPointerException: Schema is currently null.  You must call 
> buildSchema(SelectionVectorMode) before this container can return a schema.
> at 
> com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208) 
> ~[guava-14.0.1.jar:na]
> at 
> org.apache.drill.exec.record.VectorContainer.getSchema(VectorContainer.java:261)
>  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.getSchema(AbstractRecordBatch.java:155)
>  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> 

[jira] [Commented] (DRILL-2593) 500 error when crc for a query profile is out of sync

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341764#comment-15341764
 ] 

ASF GitHub Bot commented on DRILL-2593:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/523
  
Merged into master with commit 2862beaf5c72ccaafc6c52b9956f2d0414948b67


> 500 error when crc for a query profile is out of sync
> -
>
> Key: DRILL-2593
> URL: https://issues.apache.org/jira/browse/DRILL-2593
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 0.7.0
>Reporter: Jason Altekruse
>Assignee: Arina Ielchiieva
> Fix For: 1.7.0
>
> Attachments: warning1.JPG, warning2.JPG
>
>
> To reproduce, on a machine where an embedded drillbit has been run, edit one 
> of the profiles stored in /tmp/drill/profiles and try to navigate to the 
> profiles page on the Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-2593) 500 error when crc for a query profile is out of sync

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341765#comment-15341765
 ] 

ASF GitHub Bot commented on DRILL-2593:
---

Github user arina-ielchiieva closed the pull request at:

https://github.com/apache/drill/pull/523


> 500 error when crc for a query profile is out of sync
> -
>
> Key: DRILL-2593
> URL: https://issues.apache.org/jira/browse/DRILL-2593
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 0.7.0
>Reporter: Jason Altekruse
>Assignee: Arina Ielchiieva
> Fix For: 1.7.0
>
> Attachments: warning1.JPG, warning2.JPG
>
>
> To reproduce, on a machine where an embedded drillbit has been run, edit one 
> of the profiles stored in /tmp/drill/profiles and try to navigate to the 
> profiles page on the Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-2593) 500 error when crc for a query profile is out of sync

2016-06-21 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341763#comment-15341763
 ] 

Arina Ielchiieva commented on DRILL-2593:
-

Merged into master with commit 2862beaf5c72ccaafc6c52b9956f2d0414948b67

> 500 error when crc for a query profile is out of sync
> -
>
> Key: DRILL-2593
> URL: https://issues.apache.org/jira/browse/DRILL-2593
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 0.7.0
>Reporter: Jason Altekruse
>Assignee: Arina Ielchiieva
> Fix For: 1.7.0
>
> Attachments: warning1.JPG, warning2.JPG
>
>
> To reproduce, on a machine where an embedded drillbit has been run, edit one 
> of the profiles stored in /tmp/drill/profiles and try to navigate to the 
> profiles page on the Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4733) max(dir0) reading more columns than necessary

2016-06-21 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4733:

Fix Version/s: 1.7.0

> max(dir0) reading more columns than necessary
> -
>
> Key: DRILL-4733
> URL: https://issues.apache.org/jira/browse/DRILL-4733
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.7.0
>
> Attachments: bug.tgz
>
>
> The query below started to fail as of this commit:
> 3209886a8548eea4a2f74c059542672f8665b8d2
> {code}
> select max(dir0) from dfs.`/drill/testdata/bug/2016`;
> Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support 
> schema changes
> Fragment 0:0
> [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> The sub-folders contain files which do have a schema change for one column,
> "contributions" (int32 vs double). However, prior to this commit we did not
> fail in this scenario. Log files and test data are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-4733) max(dir0) reading more columns than necessary

2016-06-21 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-4733:
---

Assignee: Arina Ielchiieva

> max(dir0) reading more columns than necessary
> -
>
> Key: DRILL-4733
> URL: https://issues.apache.org/jira/browse/DRILL-4733
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.7.0
>
> Attachments: bug.tgz
>
>
> The query below started to fail as of this commit:
> 3209886a8548eea4a2f74c059542672f8665b8d2
> {code}
> select max(dir0) from dfs.`/drill/testdata/bug/2016`;
> Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support 
> schema changes
> Fragment 0:0
> [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> The sub-folders contain files which do have a schema change for one column,
> "contributions" (int32 vs double). However, prior to this commit we did not
> fail in this scenario. Log files and test data are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4650) Excel file (.xsl) and Microsoft Access file (.accdb) problem

2016-06-21 Thread Sanjiv Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjiv Kumar updated DRILL-4650:

Description: 
I am trying to query an Excel file (.xsl file) and an MS Access file (.accdb),
but I am unable to query these files in Drill. Is there any way to query these
files, or any storage plugin for querying these Excel and MS Access files?


  was:I am trying to query an Excel file (.xsl file) and an MS Access file
(.accdb), but I am unable to query these files in Drill. Is there any way to
query these files, or any storage plugin for querying these Excel and MS
Access files?


>  Excel file (.xsl) and Microsoft Access file (.accdb) problem
> -
>
> Key: DRILL-4650
> URL: https://issues.apache.org/jira/browse/DRILL-4650
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.6.0
>Reporter: Sanjiv Kumar
>
> I am trying to query an Excel file (.xsl file) and an MS Access file (.accdb),
> but I am unable to query these files in Drill. Is there any way to query these
> files, or any storage plugin for querying these Excel and MS Access files?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4601) Partitioning based on the parquet statistics

2016-06-21 Thread Miroslav Holubec (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341447#comment-15341447
 ] 

Miroslav Holubec commented on DRILL-4601:
-

[~jacq...@dremio.com], [~jaltekruse], [~sphillips]: any inputs?

> Partitioning based on the parquet statistics
> 
>
> Key: DRILL-4601
> URL: https://issues.apache.org/jira/browse/DRILL-4601
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Miroslav Holubec
>  Labels: parquet, partitioning, planning, statistics
> Attachments: DRILL-4601.1.patch
>
>
> It can really help performance to extend the partitioning idea
> implemented in DRILL- even further.
> Currently, partitioning is based on statistics: when the min value equals the
> max value for a whole file, the file is removed from the scan in the planning
> phase. The problem is that this leads to many small Parquet files, which is
> not fine in the HDFS world. Also, only a few columns are partitioned.
> I would like to extend this idea to use all statistics for all columns: if a
> value must equal a constant, remove from the plan all files whose statistics
> rule that constant out. This will really help performance for scans over many
> Parquet files.
> I have an initial patch ready, currently just to give an idea (it changes
> metadata v2, which is not fine).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4601) Partitioning based on the parquet statistics

2016-06-21 Thread Miroslav Holubec (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miroslav Holubec updated DRILL-4601:

Description: 
It can really help performance to extend the partitioning idea implemented
in DRILL- even further.
Currently, partitioning is based on statistics: when the min value equals the
max value for a whole file, the file is removed from the scan in the planning
phase. The problem is that this leads to many small Parquet files, which is
not fine in the HDFS world. Also, only a few columns are partitioned.

I would like to extend this idea to use all statistics for all columns: if a
value must equal a constant, remove from the plan all files whose statistics
rule that constant out. This will really help performance for scans over many
Parquet files.

I have an initial patch ready, currently just to give an idea (it changes
metadata v2, which is not fine).

  was:
It can really help performance to extend the partitioning idea implemented
in DRILL- even further.
Currently, partitioning is based on statistics: when the min value equals the
max value for a whole file, the file is removed from the scan in the planning
phase. The problem is that this leads to many small Parquet files, which is
not fine in the HDFS world. Also, only a few columns are partitioned.

I would like to extend this idea to use all statistics for all columns: if a
value must equal a constant, remove from the plan all files whose statistics
rule that constant out. This will really help performance for scans over many
Parquet files.

I have an initial patch ready, currently just to give an idea (it changes
metadata v2, which is not fine, and it currently supports only the equals
operation).


> Partitioning based on the parquet statistics
> 
>
> Key: DRILL-4601
> URL: https://issues.apache.org/jira/browse/DRILL-4601
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Miroslav Holubec
>  Labels: parquet, partitioning, planning, statistics
> Attachments: DRILL-4601.1.patch
>
>
> It can really help performance to extend the partitioning idea
> implemented in DRILL- even further.
> Currently, partitioning is based on statistics: when the min value equals the
> max value for a whole file, the file is removed from the scan in the planning
> phase. The problem is that this leads to many small Parquet files, which is
> not fine in the HDFS world. Also, only a few columns are partitioned.
> I would like to extend this idea to use all statistics for all columns: if a
> value must equal a constant, remove from the plan all files whose statistics
> rule that constant out. This will really help performance for scans over many
> Parquet files.
> I have an initial patch ready, currently just to give an idea (it changes
> metadata v2, which is not fine).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4601) Partitioning based on the parquet statistics

2016-06-21 Thread Miroslav Holubec (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341441#comment-15341441
 ] 

Miroslav Holubec commented on DRILL-4601:
-

current patch in github: https://github.com/myroch/drill
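
(A hedged illustration of the intent, with hypothetical table and column names: given per-file min/max statistics on a column `ts`, planning could drop every file whose min/max range cannot contain the constant.)
{code}
-- with the proposed statistics-based pruning, files whose min/max range
-- for ts excludes 42 would be removed from the scan at planning time
select * from dfs.`/data/events` where ts = 42;
{code}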

> Partitioning based on the parquet statistics
> 
>
> Key: DRILL-4601
> URL: https://issues.apache.org/jira/browse/DRILL-4601
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Miroslav Holubec
>  Labels: parquet, partitioning, planning, statistics
> Attachments: DRILL-4601.1.patch
>
>
> It can really help performance to extend the partitioning idea
> implemented in DRILL- even further.
> Currently, partitioning is based on statistics: when the min value equals the
> max value for a whole file, the file is removed from the scan in the planning
> phase. The problem is that this leads to many small Parquet files, which is
> not fine in the HDFS world. Also, only a few columns are partitioned.
> I would like to extend this idea to use all statistics for all columns: if a
> value must equal a constant, remove from the plan all files whose statistics
> rule that constant out. This will really help performance for scans over many
> Parquet files.
> I have an initial patch ready, currently just to give an idea (it changes
> metadata v2, which is not fine, and it currently supports only the equals
> operation).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4650) Excel file (.xsl) and Microsoft Access file (.accdb) problem

2016-06-21 Thread Sanjiv Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341359#comment-15341359
 ] 

Sanjiv Kumar commented on DRILL-4650:
-

Can anyone tell me how to query an Excel file (.xsl) through a storage plugin,
please?

>  Excel file (.xsl) and Microsoft Access file (.accdb) problem
> -
>
> Key: DRILL-4650
> URL: https://issues.apache.org/jira/browse/DRILL-4650
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.6.0
>Reporter: Sanjiv Kumar
>
> I am trying to query an Excel file (.xsl file) and an MS Access file (.accdb),
> but I am unable to query these files in Drill. Is there any way to query these
> files, or any storage plugin for querying these Excel and MS Access files?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)