[jira] [Commented] (DRILL-4194) Improve the performance of metadata fetch operation in HiveScan

2015-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058390#comment-15058390
 ] 

ASF GitHub Bot commented on DRILL-4194:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/301


> Improve the performance of metadata fetch operation in HiveScan
> ---
>
> Key: DRILL-4194
> URL: https://issues.apache.org/jira/browse/DRILL-4194
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.4.0
>Reporter: Venki Korukanti
>Assignee: Venki Korukanti
> Fix For: 1.5.0
>
>
> Currently, {{HiveScan}} fetches the InputSplits for all partitions when the 
> {{HiveScan}} is created. This causes long delays when the table contains a 
> large number of partitions. If we end up pruning the majority of partitions, 
> this delay is unnecessary.
> We need this InputSplits info from the beginning of planning because:
>  * it is used in calculating the cost of the {{HiveScan}}. Currently, when 
> calculating the cost, we first look at the rowCount (from the Hive 
> MetaStore); if it is available we use it in the cost calculation. Otherwise 
> we estimate the rowCount from the InputSplits.
>  * we also need the InputSplits for determining whether the {{HiveScan}} is 
> a singleton or distributed, for adding appropriate traits in {{ScanPrule}}.
> The fix is to delay loading the InputSplits until we need them. There are 
> two cases where we need them. If we end up fetching the InputSplits, we 
> store them until the query completes.
>  * If the stats are not available, we need the InputSplits for costing.
>  * If the partition is not pruned, we need them for parallelization purposes.
> Regarding getting the parallelization info in {{ScanPrule}}: I had a 
> discussion with [~amansinha100]. All we need at this point is whether the 
> data is distributed or a singleton. I added a method {{isSingleton()}} to 
> GroupScan. Returning {{false}} seems to work fine for HiveScan, but I am not 
> sure of the implications here. We also have {{ExcessiveExchangeIdentifier}}, 
> which removes unnecessary exchanges by looking at the parallelization info. I 
> think it is ok to return the parallelization info here, as the pruning must 
> have already completed.
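The deferred-loading pattern the fix describes can be sketched as follows. This is illustrative only: the class, the {{loadSplits}}-style supplier, and the use of strings for splits are assumptions for the example, not Drill's actual {{HiveScan}} code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Illustrative sketch: defer the expensive InputSplits fetch until the
// first caller actually needs it, then cache the result for the rest of
// the query. Names here are hypothetical, not Drill's real API.
public class LazySplits {
    private final Supplier<List<String>> loader;  // e.g. a metastore call
    private List<String> splits;                  // cached after first load

    public LazySplits(Supplier<List<String>> loader) {
        this.loader = loader;
    }

    // Only the first call pays the fetch cost; partitions that are pruned
    // before anyone asks for their splits never trigger a fetch at all.
    public synchronized List<String> getSplits() {
        if (splits == null) {
            splits = loader.get();
        }
        return splits;
    }

    public static void main(String[] args) {
        LazySplits lazy = new LazySplits(() -> Arrays.asList("split-0", "split-1"));
        System.out.println(lazy.getSplits().size());  // fetch happens here
    }
}
```

The two cases above (missing stats, unpruned partition) would both go through `getSplits()`, so the result is fetched at most once per query.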



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (DRILL-3739) NPE on select from Hive for HBase table

2015-12-15 Thread Krystal (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krystal reopened DRILL-3739:


git.commit.id.abbrev=b906811

Queries against Hive HBase tables fail with the following error from sqlline:
select * from hbase_voter limit 2;
Error: SYSTEM ERROR: NullPointerException

Here is the stack trace:
{code}
2015-12-15 10:57:17,524 [298f9d72-0fa1-8d3b-8bc4-130141005e0f:foreman] INFO  
o.a.drill.exec.work.foreman.Foreman - Query text for query id 
298f9d72-0fa1-8d3b-8bc4-130141005e0f: select * from hbase_voter limit 2
2015-12-15 10:57:17,960 [298f9d72-0fa1-8d3b-8bc4-130141005e0f:foreman] ERROR 
o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR: NullPointerException


[Error Id: 4dbcc70a-0911-48ff-97fe-0478d160e63a on mfs41.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
NullPointerException


[Error Id: 4dbcc70a-0911-48ff-97fe-0478d160e63a on mfs41.qa.lab:31010]
at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) ~[drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:742) [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:841) [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:786) [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:73) [drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:788) [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:894) [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:255) [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: null
... 4 common frames omitted
Caused by: java.lang.NullPointerException: null
at java.lang.Class.forName0(Native Method) ~[na:1.7.0_45]
at java.lang.Class.forName(Class.java:190) ~[na:1.7.0_45]
at org.apache.drill.exec.planner.sql.logical.ConvertHiveParquetScanToDrillParquetScan.getInputFormatFromSD(ConvertHiveParquetScanToDrillParquetScan.java:136) ~[drill-storage-hive-core-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at org.apache.drill.exec.planner.sql.logical.ConvertHiveParquetScanToDrillParquetScan.matches(ConvertHiveParquetScanToDrillParquetScan.java:94) ~[drill-storage-hive-core-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at org.apache.calcite.plan.volcano.VolcanoRuleCall.matchRecurse(VolcanoRuleCall.java:282) ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at org.apache.calcite.plan.volcano.VolcanoRuleCall.match(VolcanoRuleCall.java:267) ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at org.apache.calcite.plan.volcano.VolcanoPlanner.fireRules(VolcanoPlanner.java:1522) ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1807) ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:1017) ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1037) ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:117) ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:305) ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1658) ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:1017) ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1037) ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:117)
{code}

[jira] [Comment Edited] (DRILL-4190) TPCDS queries are running out of memory when hash join is disabled

2015-12-15 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058622#comment-15058622
 ] 

Deneche A. Hakim edited comment on DRILL-4190 at 12/15/15 7:37 PM:
---

If I'm not mistaken, here is what's causing the query to fail:
- a sort below a merge join spilled to disk (this is important), then starts 
passing data downstream. This means that whenever it loads a batch from disk 
it uses its own allocator (bound by the query's sort memory limit) to 
allocate the batch
- MergeJoin's RecordIterator seems to hold all incoming batches in memory 
until the operator is closed. This causes the sort allocator to hit its 
allocation limit, and the query fails.

The same query runs fine in 1.2.0, which suggests that prior to 
RecordIterator, MergeJoin didn't hold all batches in memory (I asked a 
question on the dev list to confirm this point).

[~aah] can you please confirm whether I am correct? Thanks.


was (Author: adeneche):
If I'm not mistaken, here is what's causing the query to fail:
- a sort below a merge join spilled to disk (this is important), then starts 
passing data downstream. This means that whenever it loads a batch from disk 
it uses its own allocator (bound by the query's sort memory limit) to 
allocate the batch
- MergeJoin's RecordIterator seems to hold all incoming batches in memory 
until the operator is closed. This causes the sort allocator to hit its 
allocation limit, and the query fails.

The same query runs fine in 1.2.0, which suggests that prior to 
RecordIterator, MergeJoin didn't hold all batches in memory (I asked a 
question on the dev list to confirm this point).

> TPCDS queries are running out of memory when hash join is disabled
> --
>
> Key: DRILL-4190
> URL: https://issues.apache.org/jira/browse/DRILL-4190
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.3.0, 1.4.0, 1.5.0
>Reporter: Victoria Markman
>Assignee: Deneche A. Hakim
>Priority: Blocker
> Attachments: 2990f5f8-ec64-1223-c1d8-97dd7e601cee.sys.drill, 
> exception.log, query3.sql
>
>
> Running TPCDS queries with the latest 1.4.0 release when hash join is 
> disabled:
> 22 queries fail with out-of-memory errors.
> 2 return wrong results (I did not validate the nature of the wrong results 
> yet).
> Only query97.sql is a legitimate failure: we don't support full outer join 
> with the merge join.
> It is important to understand what has changed between 1.2.0 and 1.4.0 that 
> made these tests not runnable with the same configuration. 
> The same tests with the same Drill configuration pass in the 1.2.0 release.
> (I hope I did not make a mistake somewhere in my cluster setup :))
> {code}
> 0: jdbc:drill:schema=dfs> select * from sys.version;
> version         1.4.0-SNAPSHOT
> commit_id       b9068117177c3b47025f52c00f67938e0c3e4732
> commit_message  DRILL-4165 Add a precondition for size of merge join record batch.
> commit_time     08.12.2015 @ 01:25:34 UTC
> build_email     Unknown
> build_time      08.12.2015 @ 03:36:25 UTC
> 1 row selected (2.211 seconds)
> Execution Failures:
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query50.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query33.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query74.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query68.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query34.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query21.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query46.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query91.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query59.sql
> {code}

[jira] [Closed] (DRILL-3912) Common subexpression elimination in code generation

2015-12-15 Thread Dechang Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dechang Gu closed DRILL-3912.
-

Tested and Verified:  
For the following query: 

select * from comscore_512MB where 
l_orderkey in (1,2,3,4,5,6,7,8,9,10)  or
l_partkey in (20,30,40,55,60,70,80,90,100,101) or
l_suppkey in (11,21,32,44,55,66,77,88,99,111) or
l_linenumber in (12,22,32,42,52,62,72,82,92,102) or
l_quantity in (200,300,400,500,600,700,800,900,1000) or
l_extendedprice in (1.5,2.5,3.5,4.5,5.5,6.5,7.5,8.5,9.5,10.5) or
l_discount in (1.1,2.1,3.1,4.1,5.1,6.1,7.1,8.1,9.1,10.1) or
l_tax in (10,15,20,25,30,35,40,45,50,55) or
l_returnflag in ('r','b','y','d','a','c','p','x') 
limit 100
;

The query time is reduced by 25% (12s vs 16s), comparing 1.4.0 (git id 
32b871b) to the build before the commit (git id bb69f22, mapr-drill 1.3.0 
branch).

LGTM.

> Common subexpression elimination in code generation
> ---
>
> Key: DRILL-3912
> URL: https://issues.apache.org/jira/browse/DRILL-3912
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Steven Phillips
>Assignee: Jinfeng Ni
> Fix For: 1.3.0
>
>
> Drill currently will evaluate the full expression tree, even if there are 
> redundant subtrees. Many of these redundant evaluations can be eliminated by 
> reusing the results from previously evaluated expression trees.
> For example,
> {code}
> select a + 1, (a + 1)* (a - 1) from t
> {code}
> Will compute the entire (a + 1) expression twice. With CSE, it will only be 
> evaluated once.
> The benefit will be reducing the work done when evaluating expressions, as 
> well as reducing the amount of code that is generated, which could also lead 
> to better JIT optimization.
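The effect described above can be sketched with a toy evaluator that caches subtree results under a canonical key, so each distinct subexpression is computed once per row. This is illustrative only: Drill's code generator operates on compiled expression trees, and the string keys and class names here are assumptions for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntSupplier;

// Toy common-subexpression elimination: cache each subtree's value under
// a canonical key so "(a + 1)" is computed once even when it appears in
// several output expressions.
public class CseDemo {
    private final Map<String, Integer> cache = new HashMap<>();
    private int evaluations;

    public int eval(String canonicalKey, IntSupplier compute) {
        return cache.computeIfAbsent(canonicalKey, k -> {
            evaluations++;               // count real computations only
            return compute.getAsInt();
        });
    }

    public int evaluations() { return evaluations; }

    public static void main(String[] args) {
        int a = 4;
        CseDemo cse = new CseDemo();
        int plus = cse.eval("a+1", () -> a + 1);   // computed
        int again = cse.eval("a+1", () -> a + 1);  // cache hit, no recompute
        int minus = cse.eval("a-1", () -> a - 1);  // computed
        System.out.println((plus * minus) + " computed with "
            + cse.evaluations() + " evaluations");  // → 15 computed with 2 evaluations
    }
}
```

For the example query, `(a + 1)` would be evaluated once and reused both as an output column and inside `(a + 1) * (a - 1)`.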





[jira] [Updated] (DRILL-4176) Dynamic Schema Discovery is not done in case of Drill- Hive

2015-12-15 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti updated DRILL-4176:
---
Fix Version/s: (was: 1.4.0)
   Future

> Dynamic Schema Discovery is not done in case of Drill- Hive
> ---
>
> Key: DRILL-4176
> URL: https://issues.apache.org/jira/browse/DRILL-4176
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.3.0
>Reporter: Devender Yadav 
> Fix For: Future
>
>
> I am using hive with drill.
> Storage Plugin info:
> {
>   "type": "hive",
>   "enabled": true,
>   "configProps": {
> "hive.metastore.uris": "",
> "javax.jdo.option.ConnectionURL": 
> "jdbc:mysql://localhost:3306/metastore_hive",
> "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
> "javax.jdo.option.ConnectionUserName": "root",
> "javax.jdo.option.ConnectionPassword": "root",
> "hive.metastore.warehouse.dir": "/user/hive/warehouse",
> "fs.default.name": "file:///",
> "hive.metastore.sasl.enabled": "false"
>   }
> }
> It's working fine for querying and all.
> Then I wanted to check whether it automatically discovers newly created 
> tables in Hive or not.
> I started drill in embedded mode and used a particular database in hive using
> use hive.testDB;
> Here testDB is a database in Hive with tables t1 & t2. Then I queried:
> show tables;
> It gave me table names
> t1 
> t2
> I created a table t3 in Hive and again fired {{show tables;}} in Drill. It 
> still showed t1 and t2. After 5-10 minutes I fired {{show tables;}} again, 
> and it showed t1, t2, and t3.
> I think it should show t3 immediately after adding t3 in Hive.
> What can be the reason for this behavior, and how does Drill handle it 
> internally?
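The 5-10 minute lag is consistent with metadata caching: rather than re-listing tables from the metastore on every query, a cached listing is served until it expires. A toy TTL cache (illustrative only; the class, names, and the 5-minute TTL are assumptions for the example, not Drill's actual implementation) reproduces the observed behavior:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Toy TTL cache: a table created in the backing store only becomes
// visible after the cached listing expires and is re-fetched.
public class TtlCache {
    private final long ttlMillis;
    private final Supplier<List<String>> fetch;  // e.g. a metastore call
    private List<String> cached;
    private long loadedAt;

    public TtlCache(long ttlMillis, Supplier<List<String>> fetch) {
        this.ttlMillis = ttlMillis;
        this.fetch = fetch;
    }

    // 'now' is passed in explicitly to keep the sketch deterministic.
    public List<String> tables(long now) {
        if (cached == null || now - loadedAt >= ttlMillis) {
            cached = fetch.get();    // refresh from the source of truth
            loadedAt = now;
        }
        return cached;               // may be stale inside the TTL window
    }

    public static void main(String[] args) {
        TtlCache cache = new TtlCache(5 * 60 * 1000,  // hypothetical 5-minute TTL
            () -> Arrays.asList("t1", "t2"));
        System.out.println(cache.tables(System.currentTimeMillis()));
    }
}
```

With a cache like this, a table added in Hive right after a refresh stays invisible until the next expiry, which matches the reporter's 5-10 minute observation.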





[jira] [Updated] (DRILL-4196) some TPCDS queries return wrong result when hash join is disabled

2015-12-15 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim updated DRILL-4196:

Assignee: amit hadke  (was: Deneche A. Hakim)

> some TPCDS queries return wrong result when hash join is disabled
> -
>
> Key: DRILL-4196
> URL: https://issues.apache.org/jira/browse/DRILL-4196
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Victoria Markman
>Assignee: amit hadke
> Attachments: query40.tar, query52.tar
>
>
> With hash join disabled query52.sql and query40.sql returned incorrect result 
> with 1.4.0 :
> {noformat}
> version         1.4.0-SNAPSHOT
> commit_id       b9068117177c3b47025f52c00f67938e0c3e4732
> commit_message  DRILL-4165 Add a precondition for size of merge join record batch.
> commit_time     08.12.2015 @ 01:25:34 UTC
> build_email     Unknown
> build_time      08.12.2015 @ 03:36:25 UTC
> 1 row selected (2.13 seconds)
> {noformat}
> Setup and options are the same as in DRILL-4190
> See attached queries (.sql), expected result (.e_tsv) and actual output (.out)





[jira] [Closed] (DRILL-4146) Concurrent queries hang in planner in ReflectiveRelMetadataProvider

2015-12-15 Thread Dechang Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dechang Gu closed DRILL-4146.
-

Tested and verified with a TPCH concurrency test with 28, 32, 48, 64, 96, and 
128 threads, using mapr-drill 1.4.0 (git id 32b871b). No issues were seen.

> Concurrent queries hang in planner in ReflectiveRelMetadataProvider
> ---
>
> Key: DRILL-4146
> URL: https://issues.apache.org/jira/browse/DRILL-4146
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.3.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.4.0
>
>
> At concurrency levels of 30 or more for certain workloads we have seen 
> queries hang in the planning phase in Calcite.  The top of the jstack is 
> shown below: 
> {noformat}
> "29b47a17-6ef3-4b7f-98e7-a7c1a702c32f:foreman" daemon prio=10 
> tid=0x7f55484a1800 nid=0x289a runnable [0x7f54b4369000]
>java.lang.Thread.State: RUNNABLE
> at java.util.HashMap.getEntry(HashMap.java:465)
> at java.util.HashMap.get(HashMap.java:417)
> at 
> org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider.apply(ReflectiveRelMetadataProvider.java:251)
> at 
> org.apache.calcite.rel.metadata.ChainedRelMetadataProvider.apply(ChainedRelMetadataProvider.java:60)
> at 
> org.apache.calcite.rel.metadata.ChainedRelMetadataProvider.apply(ChainedRelMetadataProvider.java:60)
> {noformat}
>  
> After some investigations, we found that this issue was actually addressed by 
> CALCITE-874 (ReflectiveRelMetadataProvider is not thread-safe).   This JIRA 
> is a placeholder to merge that Calcite fix since Drill is currently not 
> up-to-date with Calcite and there is an immediate need for running queries in 
> a high concurrency environment. 
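The hang is the classic unsynchronized-HashMap failure: concurrent writers can corrupt a bucket's internal structure, after which a reader loops forever inside get(), matching the RUNNABLE thread stuck in HashMap.getEntry above. The class of fix CALCITE-874 applies is to make the shared cache thread-safe, along these lines (an illustrative sketch, not Calcite's actual code; the names are hypothetical):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A metadata cache read and written by many planner threads must use a
// concurrent map. A plain HashMap mutated concurrently can corrupt its
// buckets and make get() spin forever, which presents as a hang.
public class MetadataCache {
    private final Map<String, Object> handlers = new ConcurrentHashMap<>();

    // Atomic check-then-create; no external locking needed.
    public Object handlerFor(String relClassName) {
        return handlers.computeIfAbsent(relClassName, k -> new Object());
    }
}
```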





[jira] [Closed] (DRILL-4082) Better error message when multiple versions of the same function are found by the classpath scanner

2015-12-15 Thread Krystal (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krystal closed DRILL-4082.
--

git.commit.id.abbrev=b906811

Added some duplicate jars to the .../jars directory and verified in the log 
file that the error messages are displayed:

2015-12-15 10:17:08,589 [main] INFO  o.a.d.c.scanner.ClassPathScanner - User 
Error Occurred
org.apache.drill.common.exceptions.UserException: FUNCTION ERROR: function 
org.apache.drill.exec.expr.fn.impl.conv.OrderedBytesDoubleDescConvertTo scanned 
twice in the following locations:
[jar:file:/opt/mapr/drill/drill-1.4.0/jars/drill-mongo-storage-1.4.0-SNAPSHOT.jar!/,
 
jar:file:/opt/mapr/drill/drill-1.4.0/jars/3rdparty/drill-memory-impl-1.4.0-SNAPSHOT.jar!/,
 
jar:file:/opt/mapr/drill/drill-1.4.0/jars/drill-storage-hive-core-1.4.0-SNAPSHOT.jar!/,
 jar:file:/opt/mapr/drill/drill-1.4.0/jars/drill-gis-1.4.0-SNAPSHOT.jar!/, 
jar:file:/opt/mapr/drill/drill-1.4.0/jars/drill-jdbc-storage-1.4.0-SNAPSHOT_dup.jar!/,
 
jar:file:/opt/mapr/drill/drill-1.4.0/jars/drill-hive-exec-shaded-1.4.0-SNAPSHOT.jar!/,
 jar:file:/opt/mapr/drill/drill-1.4.0/jars/drill-storage-hbase-1.2.0.jar!/, 
jar:file:/opt/mapr/drill/drill-1.4.0/jars/drill-storage-maprdb-1.4.0-SNAPSHOT.jar!/,
 
jar:file:/opt/mapr/drill/drill-1.4.0/jars/drill-jdbc-storage-1.4.0-SNAPSHOT.jar!/,
 
jar:file:/opt/mapr/drill/drill-1.4.0/jars/drill-memory-impl-1.4.0-SNAPSHOT.jar!/,
 
jar:file:/opt/mapr/drill/drill-1.4.0/jars/drill-storage-hbase-1.4.0-SNAPSHOT.jar!/]
Do you have conflicting jars on the classpath?
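The underlying detection can be reproduced with plain `ClassLoader.getResources()`: the same .class file visible from more than one classpath entry yields one URL per copy. This is only a sketch of the idea; Drill's ClassPathScanner does considerably more, and the `DupScan` class here is hypothetical.

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// If a class is packaged in two jars on the classpath, getResources()
// returns one URL per copy; more than one URL means a conflict.
public class DupScan {
    public static List<URL> locations(String className) throws IOException {
        String path = className.replace('.', '/') + ".class";
        Enumeration<URL> urls = DupScan.class.getClassLoader().getResources(path);
        List<URL> found = new ArrayList<>();
        while (urls.hasMoreElements()) {
            found.add(urls.nextElement());
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        List<URL> hits = locations("org.apache.drill.exec.expr.fn.impl.conv.OrderedBytesDoubleDescConvertTo");
        if (hits.size() > 1) {
            System.out.println("scanned twice in the following locations: " + hits);
        }
    }
}
```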


> Better error message when multiple versions of the same function are found by 
> the classpath scanner
> ---
>
> Key: DRILL-4082
> URL: https://issues.apache.org/jira/browse/DRILL-4082
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 1.4.0
>
>
> PR:
> https://github.com/apache/drill/pull/252





[jira] [Commented] (DRILL-4190) TPCDS queries are running out of memory when hash join is disabled

2015-12-15 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058622#comment-15058622
 ] 

Deneche A. Hakim commented on DRILL-4190:
-

If I'm not mistaken, here is what's causing the query to fail:
- a sort below a merge join spilled to disk (this is important), then starts 
passing data downstream. This means that whenever it loads a batch from disk 
it uses its own allocator (bound by the query's sort memory limit) to 
allocate the batch
- MergeJoin's RecordIterator seems to hold all incoming batches in memory 
until the operator is closed. This causes the sort allocator to hit its 
allocation limit, and the query fails.

The same query runs fine in 1.2.0, which suggests that prior to 
RecordIterator, MergeJoin didn't hold all batches in memory (I asked a 
question on the dev list to confirm this point).
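The failure mode described above can be sketched with a toy bounded allocator (illustrative only, not Drill's BufferAllocator; all names are hypothetical): if the downstream operator retains every batch until close, the allocator's running total only grows and must eventually hit its limit.

```java
// Toy bounded allocator: allocations succeed until the running total
// would exceed the limit. If the consumer never releases batches until
// close, the total grows monotonically and allocation eventually fails,
// like the sort's allocator under the RecordIterator behavior.
public class ToyAllocator {
    private final long limit;
    private long allocated;

    public ToyAllocator(long limit) { this.limit = limit; }

    public boolean allocate(long bytes) {
        if (allocated + bytes > limit) {
            return false;            // would surface as an OOM in Drill
        }
        allocated += bytes;
        return true;
    }

    public void release(long bytes) { allocated -= bytes; }

    public static void main(String[] args) {
        ToyAllocator sortAllocator = new ToyAllocator(100);
        // Downstream retains every batch: no release() until close.
        int retained = 0;
        while (sortAllocator.allocate(30)) {
            retained++;
        }
        System.out.println("allocation failed after " + retained + " retained batches");
    }
}
```

Releasing (or transferring ownership of) each batch as it is consumed keeps the running total bounded, which is why the retention in RecordIterator matters.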

> TPCDS queries are running out of memory when hash join is disabled
> --
>
> Key: DRILL-4190
> URL: https://issues.apache.org/jira/browse/DRILL-4190
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.3.0, 1.4.0, 1.5.0
>Reporter: Victoria Markman
>Assignee: Deneche A. Hakim
>Priority: Blocker
> Attachments: 2990f5f8-ec64-1223-c1d8-97dd7e601cee.sys.drill, 
> exception.log, query3.sql
>
>
> Running TPCDS queries with the latest 1.4.0 release when hash join is 
> disabled:
> 22 queries fail with out-of-memory errors.
> 2 return wrong results (I did not validate the nature of the wrong results 
> yet).
> Only query97.sql is a legitimate failure: we don't support full outer join 
> with the merge join.
> It is important to understand what has changed between 1.2.0 and 1.4.0 that 
> made these tests not runnable with the same configuration. 
> The same tests with the same Drill configuration pass in the 1.2.0 release.
> (I hope I did not make a mistake somewhere in my cluster setup :))
> {code}
> 0: jdbc:drill:schema=dfs> select * from sys.version;
> version         1.4.0-SNAPSHOT
> commit_id       b9068117177c3b47025f52c00f67938e0c3e4732
> commit_message  DRILL-4165 Add a precondition for size of merge join record batch.
> commit_time     08.12.2015 @ 01:25:34 UTC
> build_email     Unknown
> build_time      08.12.2015 @ 03:36:25 UTC
> 1 row selected (2.211 seconds)
> Execution Failures:
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query50.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query33.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query74.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query68.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query34.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query21.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query46.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query91.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query59.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query3.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query66.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query84.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query97.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query19.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query96.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query43.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query15.sql
> /root/drill-tests-new/framework/resources/Advanced/tpcds/tpcds_sf100/original/query2.sql
> {code}

[jira] [Commented] (DRILL-4196) some TPCDS queries return wrong result when hash join is disabled

2015-12-15 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058631#comment-15058631
 ] 

Deneche A. Hakim commented on DRILL-4196:
-

[~aah] can you take a look, please? This may be related to the changes you 
made when adding RecordIterator. Thanks.

> some TPCDS queries return wrong result when hash join is disabled
> -
>
> Key: DRILL-4196
> URL: https://issues.apache.org/jira/browse/DRILL-4196
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Victoria Markman
>Assignee: amit hadke
> Attachments: query40.tar, query52.tar
>
>
> With hash join disabled query52.sql and query40.sql returned incorrect result 
> with 1.4.0 :
> {noformat}
> version         1.4.0-SNAPSHOT
> commit_id       b9068117177c3b47025f52c00f67938e0c3e4732
> commit_message  DRILL-4165 Add a precondition for size of merge join record batch.
> commit_time     08.12.2015 @ 01:25:34 UTC
> build_email     Unknown
> build_time      08.12.2015 @ 03:36:25 UTC
> 1 row selected (2.13 seconds)
> {noformat}
> Setup and options are the same as in DRILL-4190
> See attached queries (.sql), expected result (.e_tsv) and actual output (.out)





[jira] [Closed] (DRILL-4165) IllegalStateException in MergeJoin for a query against TPC-DS data

2015-12-15 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman closed DRILL-4165.
---

> IllegalStateException in MergeJoin for a query against TPC-DS data
> --
>
> Key: DRILL-4165
> URL: https://issues.apache.org/jira/browse/DRILL-4165
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Aman Sinha
>Assignee: amit hadke
> Fix For: 1.4.0
>
>
> I am seeing the following on the 1.4.0 branch. 
> {noformat}
> 0: jdbc:drill:zk=local> alter session set `planner.enable_hashjoin` = false;
> ..
> 0: jdbc:drill:zk=local> select count(*) from dfs.`tpcds/store_sales` ss1, 
> dfs.`tpcds/store_sales` ss2 where ss1.ss_customer_sk = ss2.ss_customer_sk and 
> ss1.ss_store_sk = 1 and ss2.ss_store_sk = 2;
> Error: SYSTEM ERROR: IllegalStateException: Incoming batch [#55, 
> MergeJoinBatch] has size 1984616, which is beyond the limit of 65536
> Fragment 0:0
> [Error Id: 18bf00fe-52d7-4d84-97ec-b04a035afb4e on 192.168.1.103:31010]
>   (java.lang.IllegalStateException) Incoming batch [#55, MergeJoinBatch] has 
> size 1984616, which is beyond the limit of 65536
> 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():305
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132
> {noformat}
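Judging by the commit message, the fix added a precondition that produces exactly the kind of message seen above. The guard below is a sketch based on the error text; the class name, method, and use of a bare record count are assumptions, not the actual patch.

```java
// Guard that raises the kind of error seen above when an incoming batch
// exceeds the allowed record count.
public class BatchSizeGuard {
    static final int MAX_BATCH_SIZE = 65536;

    public static void checkBatchSize(String operator, int size) {
        if (size > MAX_BATCH_SIZE) {
            throw new IllegalStateException(
                "Incoming batch [" + operator + "] has size " + size
                + ", which is beyond the limit of " + MAX_BATCH_SIZE);
        }
    }

    public static void main(String[] args) {
        checkBatchSize("MergeJoinBatch", 4096);      // within the limit
        checkBatchSize("MergeJoinBatch", 1984616);   // throws, as in the report
    }
}
```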





[jira] [Commented] (DRILL-4165) IllegalStateException in MergeJoin for a query against TPC-DS data

2015-12-15 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058621#comment-15058621
 ] 

Victoria Markman commented on DRILL-4165:
-

Verified fixed in:

{code}
#Generated by Git-Commit-Id-Plugin
#Tue Dec 08 03:32:09 UTC 2015
git.commit.id.abbrev=b906811
git.commit.user.email=amit.ha...@gmail.com
git.commit.message.full=DRILL-4165 Add a precondition for size of merge join 
record batch.\n
git.commit.id=b9068117177c3b47025f52c00f67938e0c3e4732
{code}

Test added under: Functional/tpcds/variants/parquet/drill-4165.sql

> IllegalStateException in MergeJoin for a query against TPC-DS data
> --
>
> Key: DRILL-4165
> URL: https://issues.apache.org/jira/browse/DRILL-4165
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Aman Sinha
>Assignee: amit hadke
> Fix For: 1.4.0
>
>
> I am seeing the following on the 1.4.0 branch. 
> {noformat}
> 0: jdbc:drill:zk=local> alter session set `planner.enable_hashjoin` = false;
> ..
> 0: jdbc:drill:zk=local> select count(*) from dfs.`tpcds/store_sales` ss1, 
> dfs.`tpcds/store_sales` ss2 where ss1.ss_customer_sk = ss2.ss_customer_sk and 
> ss1.ss_store_sk = 1 and ss2.ss_store_sk = 2;
> Error: SYSTEM ERROR: IllegalStateException: Incoming batch [#55, 
> MergeJoinBatch] has size 1984616, which is beyond the limit of 65536
> Fragment 0:0
> [Error Id: 18bf00fe-52d7-4d84-97ec-b04a035afb4e on 192.168.1.103:31010]
>   (java.lang.IllegalStateException) Incoming batch [#55, MergeJoinBatch] has 
> size 1984616, which is beyond the limit of 65536
> 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():305
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132
> {noformat}





[jira] [Created] (DRILL-4199) Add Support for HBase 1.X

2015-12-15 Thread Divjot singh (JIRA)
Divjot singh created DRILL-4199:
---

 Summary: Add Support for HBase 1.X
 Key: DRILL-4199
 URL: https://issues.apache.org/jira/browse/DRILL-4199
 Project: Apache Drill
  Issue Type: New Feature
  Components: Storage - HBase
Affects Versions: Future
Reporter: Divjot singh


Is there any roadmap to upgrade the HBase version to the 1.x series? 
Currently Drill supports HBase version 0.98.





[jira] [Commented] (DRILL-4187) Introduce a state to separate queries pending execution from those pending in the queue.

2015-12-15 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058864#comment-15058864
 ] 

Jacques Nadeau commented on DRILL-4187:
---

Sounds good to me.

> Introduce a state to separate queries pending execution from those pending in 
> the queue.
> 
>
> Key: DRILL-4187
> URL: https://issues.apache.org/jira/browse/DRILL-4187
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Hanifi Gunes
>Assignee: Hanifi Gunes
>
> Currently, queries pending in the queue are not listed in the web UI; 
> besides, we use the state PENDING to mean pending execution. This issue 
> proposes (i) to list enqueued queries in the web UI and (ii) to introduce a 
> new state for queries sitting in the queue, differentiating them from those 
> pending execution.





[jira] [Updated] (DRILL-4187) Introduce a state to separate queries pending execution from those pending in the queue.

2015-12-15 Thread Hanifi Gunes (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanifi Gunes updated DRILL-4187:

Fix Version/s: 1.5.0

> Introduce a state to separate queries pending execution from those pending in 
> the queue.
> 
>
> Key: DRILL-4187
> URL: https://issues.apache.org/jira/browse/DRILL-4187
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Hanifi Gunes
>Assignee: Hanifi Gunes
> Fix For: 1.5.0
>
>
> Currently, queries pending in the queue are not listed in the web UI; besides, 
> we use the state PENDING to mean pending execution. This issue proposes i) to 
> list enqueued queries in the web UI and ii) to introduce a new state for 
> queries sitting in the queue, differentiating them from those pending 
> execution.





[jira] [Commented] (DRILL-4187) Introduce a state to separate queries pending execution from those pending in the queue.

2015-12-15 Thread Hanifi Gunes (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058859#comment-15058859
 ] 

Hanifi Gunes commented on DRILL-4187:
-

I was thinking of the state name, ENQUEUED. We could rename PENDING to 
PENDING_EXECUTION rather than radically changing its previous meaning.
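A minimal sketch of the state split being discussed (the names ENQUEUED and PENDING_EXECUTION are taken from this thread; the enum and transition table are illustrative only, not Drill's actual QueryState):

```python
from enum import Enum

class QueryState(Enum):
    ENQUEUED = "enqueued"                # waiting in the admission queue
    PENDING_EXECUTION = "pending"        # admitted, not yet running
    RUNNING = "running"
    COMPLETED = "completed"

# Allowed forward transitions in this sketch.
TRANSITIONS = {
    QueryState.ENQUEUED: {QueryState.PENDING_EXECUTION},
    QueryState.PENDING_EXECUTION: {QueryState.RUNNING},
    QueryState.RUNNING: {QueryState.COMPLETED},
    QueryState.COMPLETED: set(),
}

def can_move(src, dst):
    """Return True if src -> dst is a legal transition in this sketch."""
    return dst in TRANSITIONS[src]
```

With two distinct states, the web UI can label queued queries separately instead of lumping them in with those already admitted for execution.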

> Introduce a state to separate queries pending execution from those pending in 
> the queue.
> 
>
> Key: DRILL-4187
> URL: https://issues.apache.org/jira/browse/DRILL-4187
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Hanifi Gunes
>Assignee: Hanifi Gunes
>
> Currently, queries pending in the queue are not listed in the web UI; besides, 
> we use the state PENDING to mean pending execution. This issue proposes i) to 
> list enqueued queries in the web UI and ii) to introduce a new state for 
> queries sitting in the queue, differentiating them from those pending 
> execution.





[jira] [Updated] (DRILL-4081) Handle schema changes in ExternalSort

2015-12-15 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman updated DRILL-4081:

Reviewer:   (was: Victoria Markman)

> Handle schema changes in ExternalSort
> -
>
> Key: DRILL-4081
> URL: https://issues.apache.org/jira/browse/DRILL-4081
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.4.0
>
>
> This improvement will make use of the Union vector to handle schema changes. 
> When a new schema appears, the schema will be "merged" with the previous 
> schema. The result will be a new schema that uses the Union type to store the 
> columns where there is a type conflict. All of the batches (including the 
> batches that have already arrived) will be coerced into this new schema.
> A new comparison function will be included to handle the comparison of Union 
> type. Comparison of union type will work as follows:
> 1. All numeric types can be mutually compared, and will be compared using 
> Drill implicit cast rules.
> 2. All other types will not be compared against other types, but only among 
> values of the same type.
> 3. There will be an overall precedence of types with regard to ordering. 
> This precedence is not yet defined, but will be defined as part of the work 
> on this issue.
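The comparison rules above can be sketched roughly as follows (illustrative Python only, not Drill's actual Union-vector comparator; the type-precedence ranks are a made-up placeholder, since the issue leaves the real ordering undefined):

```python
# Sketch of the Union comparison rules: numeric types inter-compare via an
# implicit cast; every other type compares only among values of its own
# type, ordered by a placeholder type precedence.
NUMERIC = (int, float)

# Hypothetical precedence ranks; the real ordering is not yet defined.
TYPE_RANK = {int: 0, float: 0, str: 1, bytes: 2}

def union_key(value):
    """Sort key: all numerics share one rank so they compare with each
    other; non-numeric types only compare within the same type."""
    if isinstance(value, NUMERIC):
        return (0, float(value))        # implicit numeric cast
    return (TYPE_RANK[type(value)], value)

print(sorted([3, "b", 2.5, "a", 10], key=union_key))
# -> [2.5, 3, 10, 'a', 'b']
```

Because the rank is the first element of the key, values of different non-numeric types never compare by value, only by precedence.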





[jira] [Commented] (DRILL-2523) Correlated subquery with group by and count() throws unclear error

2015-12-15 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059007#comment-15059007
 ] 

Victoria Markman commented on DRILL-2523:
-

Also affects TPCDS query 16.

> Correlated subquery with group by and count() throws unclear 
> error
> ---
>
> Key: DRILL-2523
> URL: https://issues.apache.org/jira/browse/DRILL-2523
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 0.8.0
>Reporter: Victoria Markman
> Fix For: Future
>
> Attachments: t1.parquet, t2.parquet
>
>
> Correlated subquery: works, returns correct result
> {code}
> 0: jdbc:drill:schema=dfs> select * from t2 where a2 not in ( select a1 from 
> t1 where t2.b2 = t1.b1 );
> +-----+------+-------------+
> | a2  | b2   | c2          |
> +-----+------+-------------+
> | 0   | zzz  | 2014-12-31  |
> | 4   | d    | 2015-01-04  |
> +-----+------+-------------+
> 2 rows selected (4.893 seconds)
> {code}
> count (\*) works
> {code}
> 0: jdbc:drill:schema=dfs> select t2.c2, count(*) from t2 where a2 not in ( 
> select a1 from t1 where t2.b2 = t1.b1 ) group by t2.c2 order by t2.c2;
> +-------------+---------+
> | c2          | EXPR$1  |
> +-------------+---------+
> | 2014-12-31  | 1       |
> | 2015-01-04  | 1       |
> +-------------+---------+
> 2 rows selected (1.201 seconds) 
> {code}
> count() does not work and throws an error.
> Postgres returns a result in this case. I'm not sure what is wrong with this 
> query, and the error does not tell me anything. 
> {code}
> 0: jdbc:drill:schema=dfs> select t2.c2, count(t2.b2) from t2 where a2 not in 
> ( select a1 from t1 where t2.b2 = t1.b1 ) group by t2.c2 order by t2.c2;
> Query failed: IllegalArgumentException: Target must be less than target 
> count, 2
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> {code}
> drillbit.log
> {code}
> 2015-03-23 18:07:30,799 [2aefa99c-d961-f2c2-29a2-f2265d0b72a6:foreman] ERROR 
> o.a.drill.exec.work.foreman.Foreman - Error 
> 7e9910d4-9302-4b2c-9463-689b67f15601: IllegalArgumentException: Target must 
> be less than target count, 2
> org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception 
> during fragment initialization: Target must be less than target count, 2
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:211) 
> [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_71]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_71]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: java.lang.IllegalArgumentException: Target must be less than 
> target count, 2
> at 
> org.eigenbase.util.mapping.Mappings$PartialFunctionImpl.set(Mappings.java:1374)
>  ~[optiq-core-0.9-drill-r20.jar:na]
> at org.eigenbase.util.mapping.Mappings.target(Mappings.java:266) 
> ~[optiq-core-0.9-drill-r20.jar:na]
> at 
> org.eigenbase.sql2rel.RelDecorrelator.decorrelateRel(RelDecorrelator.java:304)
>  ~[optiq-core-0.9-drill-r20.jar:na]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[na:1.7.0_71]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
> ~[na:1.7.0_71]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.7.0_71]
> at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_71]
> at 
> org.eigenbase.util.ReflectUtil.invokeVisitorInternal(ReflectUtil.java:252) 
> ~[optiq-core-0.9-drill-r20.jar:na]
> at org.eigenbase.util.ReflectUtil.invokeVisitor(ReflectUtil.java:209) 
> ~[optiq-core-0.9-drill-r20.jar:na]
> at 
> org.eigenbase.util.ReflectUtil$1.invokeVisitor(ReflectUtil.java:473) 
> ~[optiq-core-0.9-drill-r20.jar:na]
> at 
> org.eigenbase.sql2rel.RelDecorrelator$DecorrelateRelVisitor.visit(RelDecorrelator.java:1372)
>  ~[optiq-core-0.9-drill-r20.jar:na]
> at 
> org.eigenbase.sql2rel.RelDecorrelator.decorrelate(RelDecorrelator.java:135) 
> ~[optiq-core-0.9-drill-r20.jar:na]
> at 
> org.eigenbase.sql2rel.SqlToRelConverter.decorrelateQuery(SqlToRelConverter.java:2618)
>  ~[optiq-core-0.9-drill-r20.jar:na]
> at 
> org.eigenbase.sql2rel.SqlToRelConverter.decorrelate(SqlToRelConverter.java:363)
>  ~[optiq-core-0.9-drill-r20.jar:na]
> 

[jira] [Closed] (DRILL-3802) Throw unsupported error for ROLLUP/GROUPING

2015-12-15 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman closed DRILL-3802.
---

> Throw unsupported error for ROLLUP/GROUPING
> ---
>
> Key: DRILL-3802
> URL: https://issues.apache.org/jira/browse/DRILL-3802
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Reporter: Victoria Markman
>Assignee: Jinfeng Ni
> Fix For: 1.3.0
>
>
> I believe that this is the cause of the assertions in TPCDS queries #36 and #67
> {code}
> SELECT Sum(ss_net_profit) / Sum(ss_ext_sales_price) AS 
> gross_margin, 
> i_category, 
> i_class, 
> Grouping(i_category) + Grouping(i_class) AS 
> lochierarchy, 
> Rank() 
> OVER ( 
> partition BY Grouping(i_category)+Grouping(i_class), CASE 
> WHEN Grouping( 
> i_class) = 0 THEN i_category END 
> ORDER BY Sum(ss_net_profit)/Sum(ss_ext_sales_price) ASC) AS 
> rank_within_parent 
> FROM store_sales, 
> date_dim d1, 
> item, 
> store 
> WHERE d1.d_year = 2000 
> AND d1.d_date_sk = ss_sold_date_sk 
> AND i_item_sk = ss_item_sk 
> AND s_store_sk = ss_store_sk 
> AND s_state IN ( 'TN', 'TN', 'TN', 'TN', 
> 'TN', 'TN', 'TN', 'TN' ) 
> GROUP BY rollup( i_category, i_class ) 
> ORDER BY lochierarchy DESC, 
> CASE 
> WHEN lochierarchy = 0 THEN i_category 
> END, 
> rank_within_parent
> LIMIT 100;
> Error: SYSTEM ERROR: AssertionError: Internal error: invariant violated: 
> conversion result not null
> [Error Id: 6afae7ce-c426-44f3-a600-aa34ab7632a1 on ucs-node5.perf.lab:31010] 
> (state=,code=0)
> java.sql.SQLException: SYSTEM ERROR: AssertionError: Internal error: 
> invariant violated: conversion result not null
> [Error Id: 6afae7ce-c426-44f3-a600-aa34ab7632a1 on ucs-node5.perf.lab:31010]
> at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247)
> at 
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:290)
> at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1359)
> at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:74)
> at 
> net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:404)
> at 
> net.hydromatic.avatica.AvaticaStatement.executeQueryInternal(AvaticaStatement.java:351)
> at 
> net.hydromatic.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:338)
> at net.hydromatic.avatica.AvaticaStatement.execute(AvaticaStatement.java:69)
> at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.execute(DrillStatementImpl.java:86)
> at sqlline.Commands.execute(Commands.java:841)
> at sqlline.Commands.sql(Commands.java:751)
> at sqlline.SqlLine.dispatch(SqlLine.java:738)
> at sqlline.SqlLine.runCommands(SqlLine.java:1641)
> at sqlline.Commands.run(Commands.java:1304)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
> at sqlline.SqlLine.dispatch(SqlLine.java:734)
> at sqlline.SqlLine.initArgs(SqlLine.java:544)
> at sqlline.SqlLine.begin(SqlLine.java:587)
> at sqlline.SqlLine.start(SqlLine.java:366)
> at sqlline.SqlLine.main(SqlLine.java:259)
> Caused by: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
> ERROR: AssertionError: Internal error: invariant violated: conversion result 
> not null
> {code}





[jira] [Commented] (DRILL-3802) Throw unsupported error for ROLLUP/GROUPING

2015-12-15 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058900#comment-15058900
 ] 

Victoria Markman commented on DRILL-3802:
-

Verified fixed in 1.4.0

{code}
#Tue Dec 08 03:32:09 UTC 2015
git.commit.id.abbrev=b906811
git.commit.user.email=amit.ha...@gmail.com
git.commit.message.full=DRILL-4165 Add a precondition for size of merge join 
record batch.\n
git.commit.id=b9068117177c3b47025f52c00f67938e0c3e4732
git.commit.message.short=DRILL-4165 Add a precondition for size of merge join 
record batch.
git.commit.user.name=Amit Hadke
{code}

{code}
0: jdbc:drill:schema=dfs> SELECT Sum(ss_net_profit) / Sum(ss_ext_sales_price) 
AS 
. . . . . . . . . . . . > gross_margin, 
. . . . . . . . . . . . > i_category, 
. . . . . . . . . . . . > i_class, 
. . . . . . . . . . . . > Grouping(i_category) + Grouping(i_class) AS 
. . . . . . . . . . . . > lochierarchy, 
. . . . . . . . . . . . > Rank() 
. . . . . . . . . . . . > OVER ( 
. . . . . . . . . . . . > partition BY Grouping(i_category)+Grouping(i_class), 
CASE 
. . . . . . . . . . . . > WHEN Grouping( 
. . . . . . . . . . . . > i_class) = 0 THEN i_category END 
. . . . . . . . . . . . > ORDER BY Sum(ss_net_profit)/Sum(ss_ext_sales_price) 
ASC) AS 
. . . . . . . . . . . . > rank_within_parent 
. . . . . . . . . . . . > FROM store_sales, 
. . . . . . . . . . . . > date_dim d1, 
. . . . . . . . . . . . > item, 
. . . . . . . . . . . . > store 
. . . . . . . . . . . . > WHERE d1.d_year = 2000 
. . . . . . . . . . . . > AND d1.d_date_sk = ss_sold_date_sk 
. . . . . . . . . . . . > AND i_item_sk = ss_item_sk 
. . . . . . . . . . . . > AND s_store_sk = ss_store_sk 
. . . . . . . . . . . . > AND s_state IN ( 'TN', 'TN', 'TN', 'TN', 
. . . . . . . . . . . . > 'TN', 'TN', 'TN', 'TN' ) 
. . . . . . . . . . . . > GROUP BY rollup( i_category, i_class ) 
. . . . . . . . . . . . > ORDER BY lochierarchy DESC, 
. . . . . . . . . . . . > CASE 
. . . . . . . . . . . . > WHEN lochierarchy = 0 THEN i_category 
. . . . . . . . . . . . > END, 
. . . . . . . . . . . . > rank_within_parent
. . . . . . . . . . . . > LIMIT 100;
Error: UNSUPPORTED_OPERATION ERROR: Grouping, Grouping_ID, Group_ID are not 
supported.
See Apache Drill JIRA: DRILL-3962
[Error Id: 643e0e4d-e11e-4e40-90dd-8a633a2d2aaa on atsqa4-136.qa.lab:31010] 
(state=,code=0)
{code}

> Throw unsupported error for ROLLUP/GROUPING
> ---
>
> Key: DRILL-3802
> URL: https://issues.apache.org/jira/browse/DRILL-3802
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Reporter: Victoria Markman
>Assignee: Jinfeng Ni
> Fix For: 1.3.0
>
>
> I believe that this is the cause of the assertions in TPCDS queries #36 and #67
> {code}
> SELECT Sum(ss_net_profit) / Sum(ss_ext_sales_price) AS 
> gross_margin, 
> i_category, 
> i_class, 
> Grouping(i_category) + Grouping(i_class) AS 
> lochierarchy, 
> Rank() 
> OVER ( 
> partition BY Grouping(i_category)+Grouping(i_class), CASE 
> WHEN Grouping( 
> i_class) = 0 THEN i_category END 
> ORDER BY Sum(ss_net_profit)/Sum(ss_ext_sales_price) ASC) AS 
> rank_within_parent 
> FROM store_sales, 
> date_dim d1, 
> item, 
> store 
> WHERE d1.d_year = 2000 
> AND d1.d_date_sk = ss_sold_date_sk 
> AND i_item_sk = ss_item_sk 
> AND s_store_sk = ss_store_sk 
> AND s_state IN ( 'TN', 'TN', 'TN', 'TN', 
> 'TN', 'TN', 'TN', 'TN' ) 
> GROUP BY rollup( i_category, i_class ) 
> ORDER BY lochierarchy DESC, 
> CASE 
> WHEN lochierarchy = 0 THEN i_category 
> END, 
> rank_within_parent
> LIMIT 100;
> Error: SYSTEM ERROR: AssertionError: Internal error: invariant violated: 
> conversion result not null
> [Error Id: 6afae7ce-c426-44f3-a600-aa34ab7632a1 on ucs-node5.perf.lab:31010] 
> (state=,code=0)
> java.sql.SQLException: SYSTEM ERROR: AssertionError: Internal error: 
> invariant violated: conversion result not null
> [Error Id: 6afae7ce-c426-44f3-a600-aa34ab7632a1 on ucs-node5.perf.lab:31010]
> at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247)
> at 
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:290)
> at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1359)
> at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:74)
> at 
> net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:404)
> at 
> net.hydromatic.avatica.AvaticaStatement.executeQueryInternal(AvaticaStatement.java:351)
> at 
> net.hydromatic.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:338)
> at net.hydromatic.avatica.AvaticaStatement.execute(AvaticaStatement.java:69)
> at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.execute(DrillStatementImpl.java:86)
> at sqlline.Commands.execute(Commands.java:841)
> at 

[jira] [Comment Edited] (DRILL-4187) Introduce a state to separate queries pending execution from those pending in the queue.

2015-12-15 Thread Hanifi Gunes (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058911#comment-15058911
 ] 

Hanifi Gunes edited comment on DRILL-4187 at 12/15/15 9:59 PM:
---

Thinking about it once more, I like STARTING more than PENDING_EXECUTION. Sorry 
:D


was (Author: hgunes):
Thinking about it once more, I like STARTING more than PENDING_EXECUTION. 

> Introduce a state to separate queries pending execution from those pending in 
> the queue.
> 
>
> Key: DRILL-4187
> URL: https://issues.apache.org/jira/browse/DRILL-4187
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Hanifi Gunes
>Assignee: Hanifi Gunes
>
> Currently, queries pending in the queue are not listed in the web UI; besides, 
> we use the state PENDING to mean pending execution. This issue proposes i) to 
> list enqueued queries in the web UI and ii) to introduce a new state for 
> queries sitting in the queue, differentiating them from those pending 
> execution.





[jira] [Closed] (DRILL-4053) Reduce metadata cache file size

2015-12-15 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli closed DRILL-4053.


Verified and added a testcase in the extended tests

> Reduce metadata cache file size
> ---
>
> Key: DRILL-4053
> URL: https://issues.apache.org/jira/browse/DRILL-4053
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: 1.3.0
>Reporter: Parth Chandra
>Assignee: Parth Chandra
> Fix For: 1.4.0
>
>
> The parquet metadata cache file has a fair amount of redundant metadata that 
> causes the size of the cache file to bloat. Two things we can reduce are:
> 1) The schema is repeated for every row group. We can keep a merged schema 
> (similar to what was discussed for the insert-into functionality).
> 2) The max and min values in the stats are used for partition pruning when the 
> values are the same. We can keep the maxValue only, and only when it is the 
> same as the minValue.
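The two reductions above can be sketched roughly as follows (illustrative Python only; field names such as maxValue and the dict layout are assumptions, not Drill's actual cache-file format):

```python
# Condense per-row-group metadata: hoist the repeated schema into a single
# merged schema, and keep only maxValue -- and only when it equals minValue,
# i.e. when the stat is still usable for partition pruning.
def condense(row_groups):
    merged_schema = {}
    slim = []
    for rg in row_groups:
        merged_schema.update(rg["schema"])  # assumes compatible column types
        cols = []
        for c in rg["columns"]:
            keep = {"name": c["name"]}
            if c["min"] == c["max"]:        # single-valued: prunable
                keep["maxValue"] = c["max"]
            cols.append(keep)
        slim.append({"columns": cols})
    return {"schema": merged_schema, "rowGroups": slim}

row_groups = [
    {"schema": {"a": "INT64"}, "columns": [{"name": "a", "min": 5, "max": 5}]},
    {"schema": {"b": "INT64"}, "columns": [{"name": "b", "min": 1, "max": 9}]},
]
print(condense(row_groups))
```

The schema is stored once instead of once per row group, and min/max pairs collapse to a single value or disappear entirely.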





[jira] [Comment Edited] (DRILL-2419) UDF that returns string representation of expression type

2015-12-15 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059160#comment-15059160
 ] 

Victoria Markman edited comment on DRILL-2419 at 12/16/15 12:14 AM:


So this function will become useful when we enable union type, I believe.
I've played with it a little and it seems to return what was intended, though 
NULL is not technically a type ...

{code}
0: jdbc:drill:schema=dfs> select count(*), typeof(c_timestamp) from j1 group by 
typeof(c_timestamp); 
+---------+------------+
| EXPR$0  | EXPR$1     |
+---------+------------+
| 9800    | TIMESTAMP  |
| 200     | NULL       |
+---------+------------+
2 rows selected (1.361 seconds)


0: jdbc:drill:schema=dfs> select typeof(c_integer + c_bigint ) from j2;
+---------+
| EXPR$0  |
+---------+
| BIGINT  |
| BIGINT  |
| BIGINT  |
| BIGINT  |
| BIGINT  |
| BIGINT  |
| BIGINT  |
| BIGINT  |
| BIGINT  |
| NULL    |
+---------+
10 rows selected (0.565 seconds)


0: jdbc:drill:schema=dfs> select typeof(sum(c_integer)) from j2;
+---------+
| EXPR$0  |
+---------+
| BIGINT  |
+---------+
1 row selected (0.377 seconds)

0: jdbc:drill:schema=dfs> select typeof(c_integer) from j3 where c_integer is 
null;
+---------+
| EXPR$0  |
+---------+
+---------+
No rows selected (0.598 seconds)

0: jdbc:drill:schema=dfs> select typeof(typeof(a1)) from t1; 
+----------+
|  EXPR$0  |
+----------+
| VARCHAR  |
| VARCHAR  |
| VARCHAR  |
| VARCHAR  |
| VARCHAR  |
| VARCHAR  |
| VARCHAR  |
| VARCHAR  |
| VARCHAR  |
| VARCHAR  |
+----------+
10 rows selected (0.395 seconds)

:)
{code}


was (Author: vicky):
So this function will become useful when we enable union type, I believe.
I've played with it a little and it seems to return what was intended, though 
NULL is not technically a type ...

{code}
0: jdbc:drill:schema=dfs> select count(*), typeof(c_timestamp) from j1 group by 
typeof(c_timestamp); 
+---------+------------+
| EXPR$0  | EXPR$1     |
+---------+------------+
| 9800    | TIMESTAMP  |
| 200     | NULL       |
+---------+------------+
2 rows selected (1.361 seconds)


0: jdbc:drill:schema=dfs> select typeof(c_integer + c_bigint ) from j2;
+---------+
| EXPR$0  |
+---------+
| BIGINT  |
| BIGINT  |
| BIGINT  |
| BIGINT  |
| BIGINT  |
| BIGINT  |
| BIGINT  |
| BIGINT  |
| BIGINT  |
| NULL    |
+---------+
10 rows selected (0.565 seconds)


0: jdbc:drill:schema=dfs> select typeof(sum(c_integer)) from j2;
+---------+
| EXPR$0  |
+---------+
| BIGINT  |
+---------+
1 row selected (0.377 seconds)

0: jdbc:drill:schema=dfs> select typeof(c_integer) from j3 where c_integer is 
null;
+---------+
| EXPR$0  |
+---------+
+---------+
No rows selected (0.598 seconds)

{code}

> UDF that returns string representation of expression type
> -
>
> Key: DRILL-2419
> URL: https://issues.apache.org/jira/browse/DRILL-2419
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Reporter: Victoria Markman
>Assignee: Steven Phillips
> Fix For: 1.3.0
>
>
> Suggested name: typeof (credit goes to Aman)





[jira] [Closed] (DRILL-3965) Index out of bounds exception in partition pruning

2015-12-15 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli closed DRILL-3965.


Verified and added a testcase

> Index out of bounds exception in partition pruning
> --
>
> Key: DRILL-3965
> URL: https://issues.apache.org/jira/browse/DRILL-3965
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Mehant Baid
>Assignee: Mehant Baid
> Fix For: 1.3.0
>
> Attachments: DRILL-3965.patch
>
>
> Hit an IOOB exception while trying to perform partition pruning on a table 
> that was created using CTAS auto-partitioning, with the stack trace below.
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
> range: -8
>   at java.lang.String.substring(String.java:1875) ~[na:1.7.0_79]
>   at 
> org.apache.drill.exec.planner.DFSPartitionLocation.(DFSPartitionLocation.java:31)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.ParquetPartitionDescriptor.createPartitionSublists(ParquetPartitionDescriptor.java:126)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.AbstractPartitionDescriptor.iterator(AbstractPartitionDescriptor.java:53)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:190)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
>  ~[calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5]





[jira] [Closed] (DRILL-3376) Reading individual files created by CTAS with partition causes an exception

2015-12-15 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli closed DRILL-3376.


Verified and added a testcase

> Reading individual files created by CTAS with partition causes an exception
> ---
>
> Key: DRILL-3376
> URL: https://issues.apache.org/jira/browse/DRILL-3376
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Writer
>Affects Versions: 1.1.0
>Reporter: Parth Chandra
>Assignee: Steven Phillips
> Fix For: 1.1.0
>
>
> Create a table using CTAS with partitioning:
> {code}
> create table `lineitem_part` partition by (l_moddate) as select l.*, 
> l_shipdate - extract(day from l_shipdate) + 1 l_moddate from 
> cp.`tpch/lineitem.parquet` l
> {code}
> Then the following query causes an exception
> {code}
> select distinct l_moddate from `lineitem_part/0_0_1.parquet` where l_moddate 
> = date '1992-01-01';
> {code}
> Trace in the log file - 
> {panel}
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
> range: 0
> at java.lang.String.charAt(String.java:658) ~[na:1.7.0_65]
> at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule$PathPartition.(PruneScanRule.java:493)
>  ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:385)
>  ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule$4.onMatch(PruneScanRule.java:278)
>  ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
>  ~[calcite-core-1.1.0-drill-r9.jar:1.1.0-drill-r9]
> ... 13 common frames omitted
> {panel}





[jira] [Closed] (DRILL-3887) Parquet metadata cache not being used

2015-12-15 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli closed DRILL-3887.


Our functional tests in the metadata caching suite already verify this, so this 
issue can be closed.

> Parquet metadata cache not being used
> -
>
> Key: DRILL-3887
> URL: https://issues.apache.org/jira/browse/DRILL-3887
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Mehant Baid
>Priority: Critical
> Fix For: 1.3.0
>
>
> The fix for DRILL-3788 causes a directory to be expanded to its list of files 
> early in the query, and this change causes the ParquetGroupScan to no longer 
> use the parquet metadata file, even when it is there.





[jira] [Updated] (DRILL-4198) Enhance StoragePlugin interface to expose logical space rules for planning purpose

2015-12-15 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti updated DRILL-4198:
---
Fix Version/s: 1.5.0

> Enhance StoragePlugin interface to expose logical space rules for planning 
> purpose
> --
>
> Key: DRILL-4198
> URL: https://issues.apache.org/jira/browse/DRILL-4198
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Venki Korukanti
>Assignee: Venki Korukanti
> Fix For: 1.5.0
>
>
> Currently StoragePlugins can only expose rules that are executed in physical 
> space. Add an interface method to StoragePlugin to expose logical space rules 
> to planner.





[jira] [Closed] (DRILL-3871) Off by one error while reading binary fields with one terminal null in parquet

2015-12-15 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman closed DRILL-3871.
---

> Off by one error while reading binary fields with one terminal null in parquet
> --
>
> Key: DRILL-3871
> URL: https://issues.apache.org/jira/browse/DRILL-3871
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.2.0
>Reporter: Victoria Markman
>Assignee: Deneche A. Hakim
>Priority: Critical
>  Labels: int96
> Fix For: 1.3.0
>
> Attachments: tables.tar
>
>
> Both tables in the join were created by Impala, with column c_timestamp 
> being parquet int96. 
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . > max(t1.c_timestamp),
> . . . . . . . . . . . . > min(t1.c_timestamp),
> . . . . . . . . . . . . > count(t1.c_timestamp)
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > imp_t1 t1
> . . . . . . . . . . . . > inner join
> . . . . . . . . . . . . > imp_t2 t2
> . . . . . . . . . . . . > on  (t1.c_timestamp = t2.c_timestamp)
> . . . . . . . . . . . . > ;
> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: 
> TProtocolException: Required field 'uncompressed_page_size' was not found in 
> serialized data! Struct: PageHeader(type:null, uncompressed_page_size:0, 
> compressed_page_size:0)
> Fragment 0:0
> [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
> at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
> at 
> sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
> at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
> at sqlline.SqlLine.print(SqlLine.java:1583)
> at sqlline.Commands.execute(Commands.java:852)
> at sqlline.Commands.sql(Commands.java:751)
> at sqlline.SqlLine.dispatch(SqlLine.java:738)
> at sqlline.SqlLine.begin(SqlLine.java:612)
> at sqlline.SqlLine.start(SqlLine.java:366)
> at sqlline.SqlLine.main(SqlLine.java:259)
> {code}
> drillbit.log
> {code}
> 2015-09-30 21:15:45,710 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Took 0 ms to get file statuses
> 2015-09-30 21:15:45,712 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 
> 1 using 1 threads. Time: 1ms total, 1.645381ms avg, 1ms max.
> 2015-09-30 21:15:45,712 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 
> 1 using 1 threads. Earliest start: 1.332000 μs, Latest start: 1.332000 μs, 
> Average start: 1.332000 μs .
> 2015-09-30 21:15:45,830 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested 
> AWAITING_ALLOCATION --> RUNNING
> 2015-09-30 21:15:45,830 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  
> o.a.d.e.w.f.FragmentStatusReporter - 
> 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State to report: RUNNING
> 2015-09-30 21:15:45,925 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested RUNNING --> 
> FAILED
> 2015-09-30 21:15:45,930 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested FAILED --> 
> FINISHED
> 2015-09-30 21:15:45,931 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: TProtocolException: 
> Required field 'uncompressed_page_size' was not found in serialized data! 
> Struct: PageHeader(type:null, uncompressed_page_size:0, 
> compressed_page_size:0)
> Fragment 0:0
> [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> TProtocolException: Required field 'uncompressed_page_size' was not found in 
> serialized data! Struct: PageHeader(type:null, uncompressed_page_size:0, 
> compressed_page_size:0)
> Fragment 0:0
> [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:323)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> 

[jira] [Commented] (DRILL-3871) Off by one error while reading binary fields with one terminal null in parquet

2015-12-15 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059122#comment-15059122
 ] 

Victoria Markman commented on DRILL-3871:
-

Verified fixed in 1.4.0

{noformat}
#Tue Dec 08 03:32:09 UTC 2015
git.commit.id.abbrev=b906811
git.commit.user.email=amit.ha...@gmail.com
git.commit.message.full=DRILL-4165 Add a precondition for size of merge join 
record batch.\n
git.commit.id=b9068117177c3b47025f52c00f67938e0c3e4732
{noformat}

Tests are checked in under Functional/int96

> Off by one error while reading binary fields with one terminal null in parquet
> --
>
> Key: DRILL-3871
> URL: https://issues.apache.org/jira/browse/DRILL-3871
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.2.0
>Reporter: Victoria Markman
>Assignee: Deneche A. Hakim
>Priority: Critical
>  Labels: int96
> Fix For: 1.3.0
>
> Attachments: tables.tar
>
>
> Both tables in the join where created by impala, with column c_timestamp 
> being parquet int96. 
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . > max(t1.c_timestamp),
> . . . . . . . . . . . . > min(t1.c_timestamp),
> . . . . . . . . . . . . > count(t1.c_timestamp)
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > imp_t1 t1
> . . . . . . . . . . . . > inner join
> . . . . . . . . . . . . > imp_t2 t2
> . . . . . . . . . . . . > on  (t1.c_timestamp = t2.c_timestamp)
> . . . . . . . . . . . . > ;
> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: 
> TProtocolException: Required field 'uncompressed_page_size' was not found in 
> serialized data! Struct: PageHeader(type:null, uncompressed_page_size:0, 
> compressed_page_size:0)
> Fragment 0:0
> [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
> at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
> at 
> sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
> at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
> at sqlline.SqlLine.print(SqlLine.java:1583)
> at sqlline.Commands.execute(Commands.java:852)
> at sqlline.Commands.sql(Commands.java:751)
> at sqlline.SqlLine.dispatch(SqlLine.java:738)
> at sqlline.SqlLine.begin(SqlLine.java:612)
> at sqlline.SqlLine.start(SqlLine.java:366)
> at sqlline.SqlLine.main(SqlLine.java:259)
> {code}
> drillbit.log
> {code}
> 2015-09-30 21:15:45,710 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Took 0 ms to get file statuses
> 2015-09-30 21:15:45,712 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 
> 1 using 1 threads. Time: 1ms total, 1.645381ms avg, 1ms max.
> 2015-09-30 21:15:45,712 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 
> 1 using 1 threads. Earliest start: 1.332000 μs, Latest start: 1.332000 μs, 
> Average start: 1.332000 μs .
> 2015-09-30 21:15:45,830 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested 
> AWAITING_ALLOCATION --> RUNNING
> 2015-09-30 21:15:45,830 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  
> o.a.d.e.w.f.FragmentStatusReporter - 
> 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State to report: RUNNING
> 2015-09-30 21:15:45,925 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested RUNNING --> 
> FAILED
> 2015-09-30 21:15:45,930 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested FAILED --> 
> FINISHED
> 2015-09-30 21:15:45,931 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: TProtocolException: 
> Required field 'uncompressed_page_size' was not found in serialized data! 
> Struct: PageHeader(type:null, uncompressed_page_size:0, 
> compressed_page_size:0)
> Fragment 0:0
> [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> TProtocolException: Required field 'uncompressed_page_size' was not found in 
> serialized data! Struct: PageHeader(type:null, uncompressed_page_size:0, 
> compressed_page_size:0)
> Fragment 0:0
> [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 

[jira] [Created] (DRILL-4203) Parquet File : Date is stored wrongly

2015-12-15 Thread Stéphane Trou (JIRA)
Stéphane Trou created DRILL-4203:


 Summary: Parquet File : Date is stored wrongly
 Key: DRILL-4203
 URL: https://issues.apache.org/jira/browse/DRILL-4203
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.4.0
Reporter: Stéphane Trou


Hello,

I have some problems when I try to read Parquet files produced by Drill with 
Spark: all dates are corrupted.

I think the problem comes from Drill :)

{code}
cat /tmp/date_parquet.csv 
Epoch,1970-01-01
{code}

{code}
0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) as 
epoch_date from dfs.tmp.`date_parquet.csv`;
++-+
|  name  | epoch_date  |
++-+
| Epoch  | 1970-01-01  |
++-+
{code}

{code}
0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
columns[0] as name, cast(columns[1] as date) as epoch_date from 
dfs.tmp.`date_parquet.csv`;
+---++
| Fragment  | Number of records written  |
+---++
| 0_0   | 1  |
+---++
{code}

When I read the file with parquet-tools, I found:
{code}
java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
name = Epoch
epoch_date = 4881176
{code}

According to 
[https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
epoch_date should be equal to 0.
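For what it's worth, the observed value is consistent with the Unix-to-Julian day offset having been applied twice when writing the column (this is my assumption, not something stated in the report): the Julian Day Number of 1970-01-01 is 2440588, and 2 × 2440588 = 4881176, exactly the value parquet-tools printed.

```python
JULIAN_DAY_OF_UNIX_EPOCH = 2440588   # Julian Day Number of 1970-01-01
observed = 4881176                   # value printed by parquet-tools

# Hypothesis: the epoch offset was added twice, so subtracting it twice
# should recover the expected days-since-epoch value for 1970-01-01.
corrected = observed - 2 * JULIAN_DAY_OF_UNIX_EPOCH
print(corrected)  # 0, i.e. 1970-01-01 as a Parquet DATE
```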

Meta:
{code}
java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
file:file:/tmp/buggy_parquet/0_0_0.parquet 
creator: parquet-mr version 1.8.1-drill-r0 (build 
6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
extra:   drill.version = 1.4.0 

file schema: root 

name:OPTIONAL BINARY O:UTF8 R:0 D:1
epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1

row group 1: RC:1 TS:93 OFFSET:4 

name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
ENC:RLE,BIT_PACKED,PLAIN
epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
ENC:RLE,BIT_PACKED,PLAIN
{code}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3765) Partition prune rule is unnecessary fired multiple times.

2015-12-15 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059081#comment-15059081
 ] 

Rahul Challapalli commented on DRILL-3765:
--

[~jni] I am trying to verify this, and the only thing I can think of is a 
performance test. Let me know if a functional test can be added to cover this 
patch.

> Partition prune rule is unnecessary fired multiple times. 
> --
>
> Key: DRILL-3765
> URL: https://issues.apache.org/jira/browse/DRILL-3765
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
> Fix For: 1.4.0
>
>
> It seems that the partition prune rule may be fired multiple times, even 
> after the first rule execution has pushed the filter into the scan operator. 
> Since partition pruning has to build vectors to hold the partition, file, 
> and directory information, invoking the rule unnecessarily can lead to 
> significant memory overhead.
> The Drill planner should avoid these unnecessary partition prune rule 
> firings, in order to reduce the chance of hitting an OOM exception while 
> the rule is executed.





[jira] [Created] (DRILL-4204) typeof function throws system error when input parameter is a literal value

2015-12-15 Thread Victoria Markman (JIRA)
Victoria Markman created DRILL-4204:
---

 Summary: typeof function throws system error when input parameter 
is a literal value
 Key: DRILL-4204
 URL: https://issues.apache.org/jira/browse/DRILL-4204
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.4.0
Reporter: Victoria Markman
Priority: Minor


{code}
0: jdbc:drill:schema=dfs> select typeof(1) from sys.options limit 1;
Error: SYSTEM ERROR: IllegalArgumentException: Can not set 
org.apache.drill.exec.vector.complex.reader.FieldReader field 
org.apache.drill.exec.expr.fn.impl.UnionFunctions$GetType.input to 
org.apache.drill.exec.expr.holders.IntHolder
[Error Id: 2139649a-b6f4-48b8-9a25-c0cb78072524 on atsqa4-134.qa.lab:31010] 
(state=,code=0)

0: jdbc:drill:schema=dfs> select typeof('1') from sys.options limit 1;
Error: SYSTEM ERROR: IllegalArgumentException: Can not set 
org.apache.drill.exec.vector.complex.reader.FieldReader field 
org.apache.drill.exec.expr.fn.impl.UnionFunctions$GetType.input to 
org.apache.drill.exec.expr.holders.VarCharHolder
[Error Id: 4f3b9fbd-6ad4-4d0d-ad7b-e22778a6bcb9 on atsqa4-134.qa.lab:31010] 
(state=,code=0)
{code}

drillbit.log
{code}
2015-12-15 23:57:34,323 [298f5712-077d-21bc-49ec-ebc2aca5acce:foreman] ERROR 
o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR: IllegalArgumentException: 
Can not set org.apache.drill.exec.vector.complex.reader.FieldReader field 
org.apache.drill.exec.expr.fn.impl.UnionFunctions$GetType.input to 
org.apache.drill.exec.expr.holders.IntHolder


[Error Id: 2139649a-b6f4-48b8-9a25-c0cb78072524 on atsqa4-134.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
IllegalArgumentException: Can not set 
org.apache.drill.exec.vector.complex.reader.FieldReader field 
org.apache.drill.exec.expr.fn.impl.UnionFunctions$GetType.input to 
org.apache.drill.exec.expr.holders.IntHolder


[Error Id: 2139649a-b6f4-48b8-9a25-c0cb78072524 on atsqa4-134.qa.lab:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
 ~[drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:742)
 [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:841)
 [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:786)
 [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at 
org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:73) 
[drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:788)
 [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:894) 
[drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:255) 
[drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_71]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_71]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
exception during fragment initialization: Internal error: Error while applying 
rule ReduceExpressionsRule_Project, args 
[rel#4401:LogicalProject.NONE.ANY([]).[](input=rel#4400:Subset#0.ENUMERABLE.ANY([]).[],EXPR$0=TYPEOF(1))]
... 4 common frames omitted
Caused by: java.lang.AssertionError: Internal error: Error while applying rule 
ReduceExpressionsRule_Project, args 
[rel#4401:LogicalProject.NONE.ANY([]).[](input=rel#4400:Subset#0.ENUMERABLE.ANY([]).[],EXPR$0=TYPEOF(1))]
at org.apache.calcite.util.Util.newInternal(Util.java:792) 
~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at 
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:251)
 ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808)
 ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at 
org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) 
~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at 
org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:313) 
~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.doLogicalPlanning(DefaultSqlHandler.java:542)
 ~[drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at 

[jira] [Closed] (DRILL-3543) Add stats for external sort to a query profile

2015-12-15 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman closed DRILL-3543.
---

> Add stats for external sort to a query profile
> --
>
> Key: DRILL-3543
> URL: https://issues.apache.org/jira/browse/DRILL-3543
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.1.0
>Reporter: Victoria Markman
>Assignee: Deneche A. Hakim
>Priority: Critical
>  Labels: documentation, usability
> Fix For: 1.4.0
>
> Attachments: Screen Shot 2015-12-11 at 2.19.37 PM.png
>
>
> The only indication if sort spilled to disk today is info from the 
> drillbit.log.
> It would be great if this information was displayed in the query profile.
> {code}
> 015-07-22 23:47:29,907 [2a4fd46e-f8c3-6b96-b165-b665a41be311:frag:0:0] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/2a4fd46e-f8c3-6b96-b165-b665a41be311/major_fragment_0/minor_fragment_0/operator_7/92
> 2015-07-22 23:47:29,919 [2a4fd46e-f8c3-6b96-b165-b665a41be311:frag:0:0] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Merging and spilling to 
> /tmp/drill/spill/2a4fd46e-f8c3-6b96-b165-b665a41be311/major_fragment_0/minor_fragment_0/operator_7/93
> 2015-07-22 23:47:29,919 [2a4fd46e-f8c3-6b96-b165-b665a41be311:frag:0:0] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/2a4fd46e-f8c3-6b96-b165-b665a41be311/major_fragment_0/minor_fragment_0/operator_7/93
> 2015-07-22 23:47:29,919 [2a4fd46e-f8c3-6b96-b165-b665a41be311:frag:0:0] WARN  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Starting to merge. 7 batch groups. 
> Current allocated memory: 11566787
> {code}





[jira] [Created] (DRILL-4202) Handle coordinator/foreman failures to prevent data loss

2015-12-15 Thread Hanifi Gunes (JIRA)
Hanifi Gunes created DRILL-4202:
---

 Summary: Handle coordinator/foreman failures to prevent data loss
 Key: DRILL-4202
 URL: https://issues.apache.org/jira/browse/DRILL-4202
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Hanifi Gunes
Priority: Minor


Foreman relies on ephemeral nodes to get rid of zombie profiles. However, 
foreman failures can still cause loss of profile data: this happens in any 
non-terminal state where the profile is not yet persisted. The initial proposal 
is to rely on watchers to detect state changes and react to them. We can use 
random back-off or a similar scheme to avoid hammering ZooKeeper.
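A random back-off of the kind mentioned above is commonly done with "full jitter": each retry waits a uniformly random delay bounded by an exponentially growing cap. A minimal sketch (illustrative only; not Drill's actual implementation, and the function name is mine):

```python
import random

def jittered_backoff(attempt, base=0.1, cap=30.0):
    """Full-jitter back-off: a random delay in [0, min(cap, base * 2**attempt)].

    Randomizing the delay spreads reconnecting watchers out in time,
    so a burst of failures does not hammer ZooKeeper in lockstep.
    """
    return random.uniform(0.0, min(cap, base * 2 ** attempt))

# Delays grow (on average) with each attempt but never exceed the cap.
delays = [jittered_backoff(a) for a in range(8)]
assert all(0.0 <= d <= 30.0 for d in delays)
```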





[jira] [Resolved] (DRILL-3376) Reading individual files created by CTAS with partition causes an exception

2015-12-15 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli resolved DRILL-3376.
--
Resolution: Duplicate

> Reading individual files created by CTAS with partition causes an exception
> ---
>
> Key: DRILL-3376
> URL: https://issues.apache.org/jira/browse/DRILL-3376
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Writer
>Affects Versions: 1.1.0
>Reporter: Parth Chandra
>Assignee: Steven Phillips
> Fix For: 1.1.0
>
>
> Create a table using CTAS with partitioning:
> {code}
> create table `lineitem_part` partition by (l_moddate) as select l.*, 
> l_shipdate - extract(day from l_shipdate) + 1 l_moddate from 
> cp.`tpch/lineitem.parquet` l
> {code}
> Then the following query causes an exception
> {code}
> select distinct l_moddate from `lineitem_part/0_0_1.parquet` where l_moddate 
> = date '1992-01-01';
> {code}
> Trace in the log file - 
> {panel}
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
> range: 0
> at java.lang.String.charAt(String.java:658) ~[na:1.7.0_65]
> at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule$PathPartition.(PruneScanRule.java:493)
>  ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:385)
>  ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule$4.onMatch(PruneScanRule.java:278)
>  ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
>  ~[calcite-core-1.1.0-drill-r9.jar:1.1.0-drill-r9]
> ... 13 common frames omitted
> {panel}





[jira] [Resolved] (DRILL-4198) Enhance StoragePlugin interface to expose logical space rules for planning purpose

2015-12-15 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti resolved DRILL-4198.

Resolution: Fixed

> Enhance StoragePlugin interface to expose logical space rules for planning 
> purpose
> --
>
> Key: DRILL-4198
> URL: https://issues.apache.org/jira/browse/DRILL-4198
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Venki Korukanti
>Assignee: Venki Korukanti
>
> Currently StoragePlugins can only expose rules that are executed in physical 
> space. Add an interface method to StoragePlugin to expose logical space rules 
> to planner.





[jira] [Commented] (DRILL-2419) UDF that returns string representation of expression type

2015-12-15 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059160#comment-15059160
 ] 

Victoria Markman commented on DRILL-2419:
-

So this function will become useful when we enable the union type, I believe.
I've played with it a little, and it seems to return what was intended, though 
NULL is not technically a type ...

{code}
0: jdbc:drill:schema=dfs> select count(*), typeof(c_timestamp) from j1 group by 
typeof(c_timestamp); 
+-++
| EXPR$0  |   EXPR$1   |
+-++
| 9800| TIMESTAMP  |
| 200 | NULL   |
+-++
2 rows selected (1.361 seconds)


0: jdbc:drill:schema=dfs> select typeof(c_integer + c_bigint ) from j2;
+-+
| EXPR$0  |
+-+
| BIGINT  |
| BIGINT  |
| BIGINT  |
| BIGINT  |
| BIGINT  |
| BIGINT  |
| BIGINT  |
| BIGINT  |
| BIGINT  |
| NULL|
+-+
10 rows selected (0.565 seconds)


0: jdbc:drill:schema=dfs> select typeof(sum(c_integer)) from j2;
+-+
| EXPR$0  |
+-+
| BIGINT  |
+-+
1 row selected (0.377 seconds)

0: jdbc:drill:schema=dfs> select typeof(c_integer) from j3 where c_integer is 
null;
+-+
| EXPR$0  |
+-+
+-+
No rows selected (0.598 seconds)

{code}

> UDF that returns string representation of expression type
> -
>
> Key: DRILL-2419
> URL: https://issues.apache.org/jira/browse/DRILL-2419
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Reporter: Victoria Markman
>Assignee: Steven Phillips
> Fix For: 1.3.0
>
>
> Suggested name: typeof (credit goes to Aman)





[jira] [Commented] (DRILL-3765) Partition prune rule is unnecessary fired multiple times.

2015-12-15 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059185#comment-15059185
 ] 

Jinfeng Ni commented on DRILL-3765:
---

[~rkins], that's right. I posted some preliminary performance numbers for 
"explain plan" in an earlier comment. Please note that the improvement depends 
on the complexity of the partition filter; essentially, this patch tries to 
reduce the number of partition-filter evaluations. The more complex the 
partitioning filter is, the more likely we are to see a performance improvement 
in planning time.

> Partition prune rule is unnecessary fired multiple times. 
> --
>
> Key: DRILL-3765
> URL: https://issues.apache.org/jira/browse/DRILL-3765
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
> Fix For: 1.4.0
>
>
> It seems that the partition prune rule may be fired multiple times, even 
> after the first rule execution has pushed the filter into the scan operator. 
> Since partition pruning has to build vectors to hold the partition, file, 
> and directory information, invoking the rule unnecessarily can lead to 
> significant memory overhead.
> The Drill planner should avoid these unnecessary partition prune rule 
> firings, in order to reduce the chance of hitting an OOM exception while 
> the rule is executed.





[jira] [Updated] (DRILL-4025) Reduce getFileStatus() invocation for Parquet by 1

2015-12-15 Thread Dechang Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dechang Gu updated DRILL-4025:
--
Reviewer: Dechang Gu  (was: Chun Chang)

> Reduce getFileStatus() invocation for Parquet by 1
> --
>
> Key: DRILL-4025
> URL: https://issues.apache.org/jira/browse/DRILL-4025
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.3.0
>Reporter: Mehant Baid
>Assignee: Mehant Baid
> Fix For: 1.3.0
>
> Attachments: DRILL-4025.patch
>
>
> Currently we invoke getFileStatus() to list all the files under a directory 
> even when we have the metadata cache file. The information is already present 
> in the cache so we don't need to perform this operation.





[jira] [Commented] (DRILL-3543) Add stats for external sort to a query profile

2015-12-15 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059184#comment-15059184
 ] 

Victoria Markman commented on DRILL-3543:
-

Verified in 1.4.0

{code}
#Tue Dec 08 03:32:09 UTC 2015
git.commit.id.abbrev=b906811
git.commit.user.email=amit.ha...@gmail.com
git.commit.message.full=DRILL-4165 Add a precondition for size of merge join 
record batch.\n
git.commit.id=b9068117177c3b47025f52c00f67938e0c3e4732
{code}

> Add stats for external sort to a query profile
> --
>
> Key: DRILL-3543
> URL: https://issues.apache.org/jira/browse/DRILL-3543
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.1.0
>Reporter: Victoria Markman
>Assignee: Deneche A. Hakim
>Priority: Critical
>  Labels: documentation, usability
> Fix For: 1.4.0
>
> Attachments: Screen Shot 2015-12-11 at 2.19.37 PM.png
>
>
> The only indication if sort spilled to disk today is info from the 
> drillbit.log.
> It would be great if this information was displayed in the query profile.
> {code}
> 015-07-22 23:47:29,907 [2a4fd46e-f8c3-6b96-b165-b665a41be311:frag:0:0] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/2a4fd46e-f8c3-6b96-b165-b665a41be311/major_fragment_0/minor_fragment_0/operator_7/92
> 2015-07-22 23:47:29,919 [2a4fd46e-f8c3-6b96-b165-b665a41be311:frag:0:0] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Merging and spilling to 
> /tmp/drill/spill/2a4fd46e-f8c3-6b96-b165-b665a41be311/major_fragment_0/minor_fragment_0/operator_7/93
> 2015-07-22 23:47:29,919 [2a4fd46e-f8c3-6b96-b165-b665a41be311:frag:0:0] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/2a4fd46e-f8c3-6b96-b165-b665a41be311/major_fragment_0/minor_fragment_0/operator_7/93
> 2015-07-22 23:47:29,919 [2a4fd46e-f8c3-6b96-b165-b665a41be311:frag:0:0] WARN  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Starting to merge. 7 batch groups. 
> Current allocated memory: 11566787
> {code}





[jira] [Closed] (DRILL-2908) Support reading the Parquet int 96 type

2015-12-15 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman closed DRILL-2908.
---

> Support reading the Parquet int 96 type
> ---
>
> Key: DRILL-2908
> URL: https://issues.apache.org/jira/browse/DRILL-2908
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Jason Altekruse
>Assignee: Victoria Markman
>  Labels: document
> Fix For: 1.2.0
>
>
> While Drill does not currently have an int96 type, it is supported by the 
> parquet format and we should be able to read files that contain columns of 
> this type. For now we will read the data into a varbinary and users will have 
> to use existing convert_from functions or write their own to interpret the 
> type of data actually stored. One example is the Impala timestamp format 
> which is encoded in an int96 column.





[jira] [Updated] (DRILL-3765) Partition prune rule is unnecessary fired multiple times.

2015-12-15 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli updated DRILL-3765:
-
Reviewer: Dechang Gu  (was: Rahul Challapalli)

> Partition prune rule is unnecessary fired multiple times. 
> --
>
> Key: DRILL-3765
> URL: https://issues.apache.org/jira/browse/DRILL-3765
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
> Fix For: 1.4.0
>
>
> It seems that the partition prune rule may be fired multiple times, even 
> after the first rule execution has pushed the filter into the scan operator. 
> Since partition pruning has to build vectors to hold the partition, file, 
> and directory information, invoking the rule unnecessarily can lead to 
> significant memory overhead.
> The Drill planner should avoid these unnecessary partition prune rule 
> firings, in order to reduce the chance of hitting an OOM exception while 
> the rule is executed.





[jira] [Closed] (DRILL-2419) UDF that returns string representation of expression type

2015-12-15 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman closed DRILL-2419.
---

> UDF that returns string representation of expression type
> -
>
> Key: DRILL-2419
> URL: https://issues.apache.org/jira/browse/DRILL-2419
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Reporter: Victoria Markman
>Assignee: Steven Phillips
> Fix For: 1.3.0
>
>
> Suggested name: typeof (credit goes to Aman)





[jira] [Commented] (DRILL-4169) Upgrade Hive Storage Plugin to work with latest stable Hive (v1.2.1)

2015-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058242#comment-15058242
 ] 

ASF GitHub Bot commented on DRILL-4169:
---

GitHub user vkorukanti opened a pull request:

https://github.com/apache/drill/pull/302

DRILL-4169: Upgrade Hive storage plugin to work with Hive 1.2.1

+ HadoopShims.setTokenStr is moved to Utils.setTokenStr. There is no change
  in functionality.
+ Disable binary partitions columns in Hive test suites. Binary
  partition column feature is regressed in Hive 1.2.1. This should affect
  only the Hive execution which is used to generate the test data. If Drill
  is talking to Hive v1.0.0 (which has binary partition columns working),
  Drill should be able to get the data from Hive without any issues (tested)
+ Update StorageHandler based test as there is an issue with test data
  generation in Hive. Need a separate test with custom test StorageHandler.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vkorukanti/drill hive121

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/302.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #302


commit 1579d40641a8731b9478233d252349a7bf7166c5
Author: vkorukanti 
Date:   2015-12-11T19:36:11Z

DRILL-4194: Improve performance of the HiveScan metadata fetch operation

+ Use the stats (numRows) stored in Hive metastore whenever available to
  calculate the costs for planning purpose
+ Delay the costly operation of loading of InputSplits until needed. When
  InputSplits are loaded, cache them at query level to speedup subsequent
  access.

this closes #301
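The "delay loading until needed, then cache at query level" approach described in the commit message above can be sketched as follows (a minimal illustration in Python; class and method names are mine, not Drill's actual Java code):

```python
class LazySplits:
    """Defer an expensive metadata fetch until first use, then cache
    the result for the lifetime of the query (names are hypothetical)."""

    def __init__(self, fetch):
        self._fetch = fetch   # costly call, e.g. listing InputSplits per partition
        self._cache = None    # populated on first access only

    def get(self):
        if self._cache is None:        # pruned partitions never trigger the fetch
            self._cache = self._fetch()
        return self._cache             # subsequent accesses reuse the cached result

# The fetch runs exactly once, no matter how many times splits are requested.
calls = []
splits = LazySplits(lambda: calls.append(1) or ["split-0", "split-1"])
splits.get()
splits.get()
assert len(calls) == 1
```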

commit ff555e63218038c5dddc5a4eecea7faf8cff058c
Author: vkorukanti 
Date:   2015-08-26T00:51:19Z

DRILL-4169: Upgrade Hive storage plugin to work with Hive 1.2.1

+ HadoopShims.setTokenStr is moved to Utils.setTokenStr. There is no change
  in functionality.
+ Disable binary partitions columns in Hive test suites. Binary
  partition column feature is regressed in Hive 1.2.1. This should affect
  only the Hive execution which is used to generate the test data. If Drill
  is talking to Hive v1.0.0 (which has binary partition columns working),
  Drill should be able to get the data from Hive without any issues (tested)
+ Update StorageHandler based test as there is an issue with test data
  generation in Hive. Need a separate test with custom test StorageHandler.




> Upgrade Hive Storage Plugin to work with latest stable Hive (v1.2.1)
> 
>
> Key: DRILL-4169
> URL: https://issues.apache.org/jira/browse/DRILL-4169
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.4.0
>Reporter: Venki Korukanti
>Assignee: Venki Korukanti
> Fix For: 1.5.0
>
>
> There have been a few bug fixes in the Hive SerDes since Hive 1.0.0. It is 
> good to update the Hive storage plugin to work with the latest stable Hive 
> version (1.2.1), so that HiveRecordReader can use the latest SerDes.
> Compatibility when working with lower versions (v1.0.0 - currently supported 
> version) of Hive servers: There are no metastore API changes between Hive 
> 1.0.0 and Hive 1.2.1 that affect how Drill's Hive storage plugin is 
> interacting with Hive metastore. Tested to make sure it works fine. So users 
> can use Drill to query Hive 1.0.0 (currently supported) and Hive 1.2.1 (new 
> addition in this JIRA).





[jira] [Commented] (DRILL-4144) AssertionError : Internal error: Error while applying rule HivePushPartitionFilterIntoScan:Filter_On_Project_Hive

2015-12-15 Thread Sean Hsuan-Yi Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059271#comment-15059271
 ] 

Sean Hsuan-Yi Chu commented on DRILL-4144:
--

[~khfaraaz], this cannot be reproduced on master (commit 
bc74629a546afe109252dcf4e3ef00ffc22e7a7a):

My Java version is:
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)

Would you mind trying on master again? 

> AssertionError : Internal error: Error while applying rule 
> HivePushPartitionFilterIntoScan:Filter_On_Project_Hive
> -
>
> Key: DRILL-4144
> URL: https://issues.apache.org/jira/browse/DRILL-4144
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.3.0
> Environment: 4 node cluster
>Reporter: Khurram Faraaz
>Assignee: Sean Hsuan-Yi Chu
>
> AssertionError seen on Drill 1.3 version d61bb83a on a 4 node cluster, as 
> part of Functional test run on JDK 8. Note that assertions were enabled as 
> part of test execution.
> {code}
> [root@centos-01 bin]# java -version
> openjdk version "1.8.0_65"
> OpenJDK Runtime Environment (build 1.8.0_65-b17)
> OpenJDK 64-Bit Server VM (build 25.65-b01, mixed mode)
> {code}
> Failing test case : 
> Functional/interpreted_partition_pruning/hive/text/hier_intint/plan/4.q
> {code}
> query => explain plan for select l_orderkey, l_partkey, l_quantity, 
> l_shipdate, l_shipinstruct from 
> hive.lineitem_text_partitioned_hive_hier_intint where case when `month` > 11 
> then 2 else null end is not null and `year` = 1991;
> {code}
> {noformat}
> [Error Id: c0e23293-2592-4421-9953-bc7d6488398f on centos-03.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: AssertionError
> [Error Id: c0e23293-2592-4421-9953-bc7d6488398f on centos-03.qa.lab:31010]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.3.0.jar:1.3.0]
>   at 
> org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:742)
>  [drill-java-exec-1.3.0.jar:1.3.0]
>   at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:841)
>  [drill-java-exec-1.3.0.jar:1.3.0]
>   at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:786)
>  [drill-java-exec-1.3.0.jar:1.3.0]
>   at 
> org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:73) 
> [drill-common-1.3.0.jar:1.3.0]
>   at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:788)
>  [drill-java-exec-1.3.0.jar:1.3.0]
>   at 
> org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:894) 
> [drill-java-exec-1.3.0.jar:1.3.0]
>   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:255) 
> [drill-java-exec-1.3.0.jar:1.3.0]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_65]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_65]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
> Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
> exception during fragment initialization: Internal error: Error while 
> applying rule HivePushPartitionFilterIntoScan:Filter_On_Project_Hive, args 
> [rel#879659:DrillFilterRel.LOGICAL.ANY([]).[](input=rel#879658:Subset#8.LOGICAL.ANY([]).[],condition=AND(IS
>  NOT NULL(CASE(>($6, 11), 2, null)), =($5, 1991))), 
> rel#879657:DrillProjectRel.LOGICAL.ANY([]).[](input=rel#879656:Subset#7.LOGICAL.ANY([]).[],l_orderkey=$6,l_partkey=$3,l_quantity=$0,l_shipdate=$5,l_shipinstruct=$1,year=$2,month=$4),
>  rel#879639:DrillScanRel.LOGICAL.ANY([]).[](table=[hive, 
> lineitem_text_partitioned_hive_hier_intint],groupscan=HiveScan 
> [table=Table(dbName:default, 
> tableName:lineitem_text_partitioned_hive_hier_intint), 
> inputSplits=[maprfs:///drill/testdata/partition_pruning/hive/text/lineitem_hierarchical_intint/1991/1/lineitemaa.tbl:0+106992,
>  
> maprfs:///drill/testdata/partition_pruning/hive/text/lineitem_hierarchical_intint/1991/10/lineitemaj.tbl:0+106646,
>  
> maprfs:///drill/testdata/partition_pruning/hive/text/lineitem_hierarchical_intint/1991/11/lineitemak.tbl:0+106900,
>  
> maprfs:///drill/testdata/partition_pruning/hive/text/lineitem_hierarchical_intint/1991/12/lineitemal.tbl:0+11926,
>  
> maprfs:///drill/testdata/partition_pruning/hive/text/lineitem_hierarchical_intint/1991/2/lineitemab.tbl:0+106663,
>  
> maprfs:///drill/testdata/partition_pruning/hive/text/lineitem_hierarchical_intint/1991/3/lineitemac.tbl:0+106980,
>  
> 

[jira] [Commented] (DRILL-3578) UnsupportedOperationException: Unable to get value vector class for minor type [FIXEDBINARY] and mode [OPTIONAL]

2015-12-15 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059281#comment-15059281
 ] 

Victoria Markman commented on DRILL-3578:
-

This particular case is fixed in 1.4.0

{code}
#Tue Dec 08 03:32:09 UTC 2015
git.commit.id.abbrev=b906811
git.commit.user.email=amit.ha...@gmail.com
git.commit.message.full=DRILL-4165 Add a precondition for size of merge join 
record batch.\n
git.commit.id=b9068117177c3b47025f52c00f67938e0c3e4732
{code}

{code}
[Fri Oct 02 09:31:31 root@~ ] # sqlline
apache drill 1.2.0 
"just drill it"
0: jdbc:drill:schema=dfs> select * from dfs.`test/type_test`;
+--+---+--+-+
| num  | word  | dtg  | dollar  |
+--+---+--+-+
| 1| One   | [B@28b12ebd  | 1.0 |
| 2| Two   | [B@8738f2a   | 2.0 |
+--+---+--+-+
2 rows selected (2.815 seconds)
{code}

{code}
0: jdbc:drill:schema=dfs> select num, word, 
CONVERT_FROM(dtg,'TIMESTAMP_IMPALA') from dfs.`test/type_test`;
+--+---++
| num  | word  | EXPR$2 |
+--+---++
| 1| One   | 2015-01-01 00:01:00.0  |
| 2| Two   | 2015-01-02 00:02:00.0  |
+--+---++
2 rows selected (0.741 seconds)
{code}

It's unfortunate that we named the CONVERT_FROM parameter 'TIMESTAMP_IMPALA' 
... Here the customer is querying a Hive table. I wish I had thought about it 
before ... 'TIMESTAMP_EXTERNAL', or accepting both TIMESTAMP_HIVE and 
TIMESTAMP_IMPALA as synonyms, would probably have been a better choice.

> UnsupportedOperationException: Unable to get value vector class for minor 
> type [FIXEDBINARY] and mode [OPTIONAL]
> 
>
> Key: DRILL-3578
> URL: https://issues.apache.org/jira/browse/DRILL-3578
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.1.0
>Reporter: Hao Zhu
>Assignee: Parth Chandra
>Priority: Critical
> Fix For: 1.3.0
>
>
> The issue is Drill fails to read "timestamp" type in parquet file generated 
> by Hive.
> How to reproduce:
> 1. Create a external Hive CSV table in hive 1.0:
> {code}
> create external table type_test_csv
> (
>   id1 int,
>   id2 string,
>   id3 timestamp,
>   id4 double
> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> LOCATION '/xxx/testcsv';
> {code}
> 2. Put sample data for above external table:
> {code}
> 1,One,2015-01-01 00:01:00,1.0
> 2,Two,2015-01-02 00:02:00,2.0
> {code}
> 3. Create a parquet hive table:
> {code}
> create external table type_test
> (
>   id1 int,
>   id2 string,
>   id3 timestamp,
>   id4 double
> )
> STORED AS PARQUET
> LOCATION '/xxx/type_test';
> INSERT OVERWRITE TABLE type_test
>   SELECT * FROM type_test_csv;
> {code}
> 4. Then querying the parquet file directly through filesystem storage plugin:
> {code}
> > select * from dfs.`xxx/type_test`;
> Error: SYSTEM ERROR: UnsupportedOperationException: Unable to get value 
> vector class for minor type [FIXEDBINARY] and mode [OPTIONAL]
> Fragment 0:0
> [Error Id: fccfe8b2-6427-46e5-8bfd-cac639e526e8 on h3.poc.com:31010] 
> (state=,code=0)
> {code}
> 5. If the sample data is only 1 row:
> {code}
> 1,One,2015-01-01 00:01:00,1.0
> {code}
> Then the error message would become:
> {code}
> > select * from dfs.`xxx/type_test`;
> Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type:INT96
> [Error Id: b52b5d46-63a8-4be6-a11d-999a1b46c7c2 on h3.poc.com:31010] 
> (state=,code=0)
> {code}
> Using Hive storage plugin works fine. This issue only applies to filesystem 
> storage plugin.





[jira] [Created] (DRILL-4200) drill-jdbc-storage: applies timezone to java.sql.Date field and fails

2015-12-15 Thread Karol Potocki (JIRA)
Karol Potocki created DRILL-4200:


 Summary: drill-jdbc-storage: applies timezone to java.sql.Date 
field and fails
 Key: DRILL-4200
 URL: https://issues.apache.org/jira/browse/DRILL-4200
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Other
Affects Versions: 1.3.0
 Environment: drill-jdbc-storage plugin configured (based on 
https://drill.apache.org/docs/rdbms-storage-plugin) with 
org.relique.jdbc.csv.CsvDriver to access dbf (dbase) files.
Reporter: Karol Potocki


Using org.relique.jdbc.csv.CsvDriver to query files with date fields (e.g. 
2012-05-01) causes:

{code}
UnsupportedOperationException: Method not supported: ResultSet.getDate(int, 
Calendar)
{code}

In JdbcRecordReader.java:406 there is a getDate call that tries to apply a 
timezone to java.sql.Date, which is probably not timezone-related; the driver 
does not support that overload, and this causes the error.

A quick fix is to use ResultSet.getDate(int) instead.

Details:
{code}
Caused by: java.lang.UnsupportedOperationException: Method not supported: Result
Set.getDate(int, Calendar)
at org.relique.jdbc.csv.CsvResultSet.getDate(Unknown Source) ~[csvjdbc-1
.0-28.jar:na]
at org.apache.commons.dbcp.DelegatingResultSet.getDate(DelegatingResultS
et.java:574) ~[commons-dbcp-1.4.jar:1.4]
at org.apache.commons.dbcp.DelegatingResultSet.getDate(DelegatingResultS
et.java:574) ~[commons-dbcp-1.4.jar:1.4]
at org.apache.drill.exec.store.jdbc.JdbcRecordReader$DateCopier.copy(Jdb
cRecordReader.java:406) ~[drill-jdbc-storage-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at org.apache.drill.exec.store.jdbc.JdbcRecordReader.next(JdbcRecordRead
er.java:242) ~[drill-jdbc-storage-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
{code}
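The quick fix suggested above could be sketched as follows. This is a hypothetical helper, not Drill's actual DateCopier code: since java.sql.Date carries no timezone information, falling back to the plain getDate(int) overload loses nothing when a driver such as csvjdbc rejects the Calendar variant. The Proxy-based ResultSet below is only a minimal stand-in for such a driver.

```java
import java.lang.reflect.Proxy;
import java.sql.Date;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Calendar;

public class DateCopierSketch {

    // Hypothetical fix: prefer getDate(int, Calendar), but fall back to
    // getDate(int) for drivers that do not implement the Calendar overload.
    // java.sql.Date is a timezone-less value, so no conversion is lost.
    static Date copyDate(ResultSet rs, int column, Calendar cal) throws SQLException {
        try {
            return rs.getDate(column, cal);
        } catch (UnsupportedOperationException e) {
            return rs.getDate(column);
        }
    }

    // Stand-in for a driver like csvjdbc that only supports getDate(int).
    static ResultSet csvLikeResultSet(Date value) {
        return (ResultSet) Proxy.newProxyInstance(
                ResultSet.class.getClassLoader(),
                new Class<?>[]{ResultSet.class},
                (proxy, method, args) -> {
                    if (method.getName().equals("getDate")) {
                        if (args.length == 2) {
                            throw new UnsupportedOperationException(
                                    "Method not supported: ResultSet.getDate(int, Calendar)");
                        }
                        return value;
                    }
                    throw new UnsupportedOperationException(method.getName());
                });
    }

    static String demo() throws SQLException {
        ResultSet rs = csvLikeResultSet(Date.valueOf("2012-05-01"));
        // The Calendar overload throws, and the fallback preserves the date.
        return copyDate(rs, 1, Calendar.getInstance()).toString();
    }

    public static void main(String[] args) throws SQLException {
        System.out.println(demo());
    }
}
```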





[jira] [Updated] (DRILL-4091) Support more functions in gis contrib module

2015-12-15 Thread Karol Potocki (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karol Potocki updated DRILL-4091:
-
Target Version/s: 1.5.0

> Support more functions in gis contrib module
> 
>
> Key: DRILL-4091
> URL: https://issues.apache.org/jira/browse/DRILL-4091
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Reporter: Karol Potocki
>
> Support for commonly used gis functions in gis contrib module: relate, 
> contains, crosses, intersects, touches, difference, disjoint, buffer, union 
> etc.





[jira] [Created] (DRILL-4201) DrillPushFilterPastProject should allow partial filter pushdown.

2015-12-15 Thread Jinfeng Ni (JIRA)
Jinfeng Ni created DRILL-4201:
-

 Summary: DrillPushFilterPastProject should allow partial filter 
pushdown. 
 Key: DRILL-4201
 URL: https://issues.apache.org/jira/browse/DRILL-4201
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Jinfeng Ni
Assignee: Jinfeng Ni
 Fix For: 1.5.0


Currently, DrillPushFilterPastProjectRule stops pushing the filter down if 
the filter itself contains an ITEM or FLATTEN function, or if one of its input 
references refers to an ITEM or FLATTEN function. However, when the filter is a 
conjunction of multiple sub-filters, some of which refer to ITEM or FLATTEN and 
others do not, we should allow the partial filter to be pushed down. For 
instance:

WHERE  partition_col > 10 and flatten_output_col = 'ABC'. 

The "flatten_output_col" comes from the output of the FLATTEN operator, and 
therefore flatten_output_col = 'ABC' should not be pushed past the project. But 
partition_col > 10 should be pushed down, so that we can trigger the pruning 
rule to apply partition pruning.

This would improve Drill query performance when the partially pushed filter 
leads to partition pruning, or when it results in early filtering in an 
upstream operator. 
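The splitting described above can be sketched independently of Calcite's actual RexNode machinery. The classes and names below are illustrative only, not Drill's implementation: decompose the conjunction, classify each conjunct by whether it references an ITEM/FLATTEN output, push the clean conjuncts past the project, and keep the rest above it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartialPushdownSketch {

    // Illustrative stand-in for one filter conjunct: its condition text plus
    // whether it references the output of an ITEM or FLATTEN expression.
    static class Conjunct {
        final String condition;
        final boolean refersToItemOrFlatten;
        Conjunct(String condition, boolean refersToItemOrFlatten) {
            this.condition = condition;
            this.refersToItemOrFlatten = refersToItemOrFlatten;
        }
    }

    // Partition AND-ed conjuncts into those safe to push below the project
    // (index 0) and those that must remain in the filter above it (index 1).
    static List<List<String>> split(List<Conjunct> conjuncts) {
        List<String> pushable = new ArrayList<>();
        List<String> retained = new ArrayList<>();
        for (Conjunct c : conjuncts) {
            (c.refersToItemOrFlatten ? retained : pushable).add(c.condition);
        }
        return Arrays.asList(pushable, retained);
    }

    public static void main(String[] args) {
        // The JIRA's example: only the partition predicate is pushed down.
        List<List<String>> parts = split(Arrays.asList(
                new Conjunct("partition_col > 10", false),
                new Conjunct("flatten_output_col = 'ABC'", true)));
        System.out.println("push:   " + parts.get(0));
        System.out.println("retain: " + parts.get(1));
    }
}
```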








[jira] [Commented] (DRILL-4187) Introduce a state to separate queries pending execution from those pending in the queue.

2015-12-15 Thread Hanifi Gunes (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058911#comment-15058911
 ] 

Hanifi Gunes commented on DRILL-4187:
-

Thinking about it once more, I like STARTING more than PENDING_EXECUTION. 

> Introduce a state to separate queries pending execution from those pending in 
> the queue.
> 
>
> Key: DRILL-4187
> URL: https://issues.apache.org/jira/browse/DRILL-4187
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Hanifi Gunes
>Assignee: Hanifi Gunes
>
> Currently, queries pending in the queue are not listed in the web UI; 
> besides, we use the state PENDING to mean pending execution. This issue 
> proposes (i) listing enqueued queries in the web UI and (ii) introducing a 
> new state for queries sitting in the queue, differentiating them from those 
> pending execution.





[jira] [Updated] (DRILL-4180) IllegalArgumentException while reading JSON files

2015-12-15 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4180:
-
Reviewer: Chun Chang

> IllegalArgumentException while reading JSON files
> -
>
> Key: DRILL-4180
> URL: https://issues.apache.org/jira/browse/DRILL-4180
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.5.0
>
> Attachments: a.json, b.json
>
>
> First of all, this issue can be reproduced when Drill runs in distributed 
> mode.
> We have two JSON files in a distributed file system. The type of the column 
> is MAP and there is no schema change at the top level. However, one layer 
> deeper in this column, the first file has a NullableBit column which does 
> not appear in the second file. 
> The issue can be reproduced by the files in the attachment and this query :
> {code}
> select jsonFieldMapLevel1_aaa from directory
> {code}





[jira] [Commented] (DRILL-3066) AtomicRemainder - Tried to close remainder, but it has already been closed.

2015-12-15 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059645#comment-15059645
 ] 

Khurram Faraaz commented on DRILL-3066:
---

Verified the fix; we no longer see the warning message in the drillbit.log 
file when a corrupt parquet file is read. (Some bytes of data were removed 
from the original parquet file to corrupt it, hence the exception.)

{code}
0: jdbc:drill:schema=dfs.tmp> select * from `corpt_Prq_01.parquet`;
Error: SYSTEM ERROR: EOFException: Seeking beyond EOF, file: 
/tmp/corpt_Prq_01.parquet, file length: 1024, seeking to: 207814580

Fragment 0:0

[Error Id: 9e864140-c422-4032-8a56-5090639c6396 on centos-01.qa.lab:31010] 
(state=,code=0)
{code}

> AtomicRemainder - Tried to close remainder, but it has already been closed.
> ---
>
> Key: DRILL-3066
> URL: https://issues.apache.org/jira/browse/DRILL-3066
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.0.0
> Environment: 21cc578b6b8c8f3ca1ebffd3dbb92e35d68bc726 
>Reporter: Khurram Faraaz
>Assignee: Khurram Faraaz
>Priority: Minor
> Fix For: 1.0.0
>
>
> I see the below stack trace in drillbit.log when I try to query a corrupt 
> parquet file. The test was run on a 4-node cluster on CentOS.
> AtomicRemainder - Tried to close remainder, but it has already been closed.
> {code}
> 2015-05-13 20:42:58,893 [2aac48ac-82d3-0f5a-2bac-537e82b3ac02:frag:0:0] WARN  
> o.a.d.exec.memory.AtomicRemainder - Tried to close remainder, but it has 
> already been closed
> java.lang.Exception: null
> at 
> org.apache.drill.exec.memory.AtomicRemainder.close(AtomicRemainder.java:196) 
> [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at org.apache.drill.exec.memory.Accountor.close(Accountor.java:386) 
> [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.close(TopLevelAllocator.java:310)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.ops.FragmentContext.suppressingClose(FragmentContext.java:405)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.ops.FragmentContext.close(FragmentContext.java:399) 
> [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:312)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cancel(FragmentExecutor.java:135)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.QueryManager.cancelExecutingFragments(QueryManager.java:202)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:836)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:780)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:73) 
> [drill-common-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:782)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:891) 
> [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman.access$2700(Foreman.java:107) 
> [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateListener.moveToState(Foreman.java:1161)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.QueryManager$1.statusUpdate(QueryManager.java:481)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.QueryManager$RootStatusReporter.statusChange(QueryManager.java:461)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:90)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:86)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> 

[jira] [Closed] (DRILL-3066) AtomicRemainder - Tried to close remainder, but it has already been closed.

2015-12-15 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz closed DRILL-3066.
-

> AtomicRemainder - Tried to close remainder, but it has already been closed.
> ---
>
> Key: DRILL-3066
> URL: https://issues.apache.org/jira/browse/DRILL-3066
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.0.0
> Environment: 21cc578b6b8c8f3ca1ebffd3dbb92e35d68bc726 
>Reporter: Khurram Faraaz
>Assignee: Khurram Faraaz
>Priority: Minor
> Fix For: 1.0.0
>
>
> I see the below stack trace in drillbit.log when I try to query a corrupt 
> parquet file. The test was run on a 4-node cluster on CentOS.
> AtomicRemainder - Tried to close remainder, but it has already been closed.
> {code}
> 2015-05-13 20:42:58,893 [2aac48ac-82d3-0f5a-2bac-537e82b3ac02:frag:0:0] WARN  
> o.a.d.exec.memory.AtomicRemainder - Tried to close remainder, but it has 
> already been closed
> java.lang.Exception: null
> at 
> org.apache.drill.exec.memory.AtomicRemainder.close(AtomicRemainder.java:196) 
> [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at org.apache.drill.exec.memory.Accountor.close(Accountor.java:386) 
> [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.close(TopLevelAllocator.java:310)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.ops.FragmentContext.suppressingClose(FragmentContext.java:405)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.ops.FragmentContext.close(FragmentContext.java:399) 
> [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:312)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cancel(FragmentExecutor.java:135)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.QueryManager.cancelExecutingFragments(QueryManager.java:202)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:836)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:780)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:73) 
> [drill-common-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:782)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:891) 
> [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman.access$2700(Foreman.java:107) 
> [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateListener.moveToState(Foreman.java:1161)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.QueryManager$1.statusUpdate(QueryManager.java:481)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.QueryManager$RootStatusReporter.statusChange(QueryManager.java:461)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:90)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:86)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:291)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:255)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
> at 
> 

[jira] [Commented] (DRILL-3478) Bson Record Reader for Mongo storage plugin

2015-12-15 Thread B Anil Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059664#comment-15059664
 ] 

B Anil Kumar commented on DRILL-3478:
-

Uploaded a new patch with the review-comment fixes. Please review 
https://reviews.apache.org/r/40182/

The new patch makes BsonRecordReader the *default*; it was tested with the 
test cases below and the attached queries.

*To run test cases with Bson Record Reader:*
{noformat}

1) For sharded replicated (default)
mvn test -Ddrill.mongo.tests.shardMode=true
2) For embedded
mvn test -Ddrill.mongo.tests.shardMode=false

{noformat}

*To run with jsonRecordReader:*

{noformat}

1) For sharded replicated (default)
mvn test -Ddrill.mongo.tests.shardMode=true 
-Ddrill.mongo.tests.bson.reader=false
2) For embedded
mvn test -Ddrill.mongo.tests.shardMode=false 
-Ddrill.mongo.tests.bson.reader=false

{noformat}

> Bson Record Reader for Mongo storage plugin
> ---
>
> Key: DRILL-3478
> URL: https://issues.apache.org/jira/browse/DRILL-3478
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - MongoDB
>Reporter: B Anil Kumar
>Assignee: B Anil Kumar
> Fix For: Future
>
> Attachments: drill_bson_sqlline_test_2015_1
>
>
> Improve the mongo query performance.
> We are considering the suggestions provided by [~dragoncurve] and [~hgunes] 
> in drill mailing chain.





[jira] [Commented] (DRILL-2612) Union All involving empty JSON file on right input to union all returns zero results

2015-12-15 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059667#comment-15059667
 ] 

Khurram Faraaz commented on DRILL-2612:
---

Verified the fix; a test needs to be added.

> Union All involving empty JSON file on right input to union all returns zero 
> results
> 
>
> Key: DRILL-2612
> URL: https://issues.apache.org/jira/browse/DRILL-2612
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 0.9.0
> Environment: | 9d92b8e319f2d46e8659d903d355450e15946533 | DRILL-2580: 
> Exit early from HashJoinBatch if build side is empty | 26.03.2015 @ 16:13:53 
> EDT | Unknown | 26.03.2015 @ 16:53:21 EDT |
>Reporter: Khurram Faraaz
>Assignee: Sean Hsuan-Yi Chu
> Attachments: DRILL-2612.1.patch
>
>
> Union All returns zero results when the input JSON file on the right of union 
> all operator is empty. 
> The JSON file to the left of Union All has data in it. Performing a Union All 
> on such a setup results in zero results being returned by Union All.
> {code}
> 0: jdbc:drill:> select key from `intData.json` union all select key from 
> `empty01.json`;
> +--+
> |  |
> +--+
> +--+
> No rows selected (0.092 seconds)
> File on the left of union all has 200 JSON objects in it.
> 0: jdbc:drill:> select count(key) from `intData.json`;
> ++
> |   EXPR$0   |
> ++
> | 200|
> ++
> 1 row selected (0.073 seconds)
> 0: jdbc:drill:> select count(key) from `empty01.json`;
> +--+
> |  |
> +--+
> +--+
> No rows selected (0.074 seconds)
> {code}





[jira] [Updated] (DRILL-3478) Bson Record Reader for Mongo storage plugin

2015-12-15 Thread B Anil Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

B Anil Kumar updated DRILL-3478:

Attachment: 0001-DRILL-3478_1-Review-comments-fixes.patch

> Bson Record Reader for Mongo storage plugin
> ---
>
> Key: DRILL-3478
> URL: https://issues.apache.org/jira/browse/DRILL-3478
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - MongoDB
>Reporter: B Anil Kumar
>Assignee: B Anil Kumar
> Fix For: Future
>
> Attachments: 0001-DRILL-3478_1-Review-comments-fixes.patch, 
> Test_queries_with_review_comment_fixes, drill_bson_sqlline_test_2015_1
>
>
> Improve the mongo query performance.
> We are considering the suggestions provided by [~dragoncurve] and [~hgunes] 
> in drill mailing chain.





[jira] [Commented] (DRILL-2591) Aggregate in left input to Union All does not work

2015-12-15 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059668#comment-15059668
 ] 

Khurram Faraaz commented on DRILL-2591:
---

Verified the fix on Drill 1.4; a test needs to be added.

> Aggregate in left input to Union All does not work
> --
>
> Key: DRILL-2591
> URL: https://issues.apache.org/jira/browse/DRILL-2591
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 0.9.0
> Environment: {code}
> 0: jdbc:drill:> select * from sys.version;
> +++-+-++
> | commit_id  | commit_message | commit_time | build_email | build_time |
> +++-+-++
> | 9d92b8e319f2d46e8659d903d355450e15946533 | DRILL-2580: Exit early from 
> HashJoinBatch if build side is empty | 26.03.2015 @ 16:13:53 EDT | Unknown
>  | 26.03.2015 @ 16:53:21 EDT |
> +++-+-++
> 1 row selected (0.104 seconds)
> {code}
>Reporter: Khurram Faraaz
>Assignee: Sean Hsuan-Yi Chu
> Attachments: DRILL-2591.1.patch
>
>
> If the left input to Union All has an aggregate function, the result is 
> SQLException. This was seen on a 4 node cluster.
> {code}
> 0: jdbc:drill:> select max(key) from `dateData.json` union all select key 
> from `timeStmpData.json`;
> ++
> |   EXPR$0   |
> ++
> Query failed: Query stopped., Schema change detected in the left input of 
> Union-All. This is not currently supported [ 
> 441285d7-e4a5-46c8-ab11-a0332945e3fc on centos-04.qa.lab:31010 ]
> java.lang.RuntimeException: java.sql.SQLException: Failure while executing 
> query.
>   at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
>   at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
>   at sqlline.SqlLine.print(SqlLine.java:1809)
>   at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
>   at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
>   at sqlline.SqlLine.dispatch(SqlLine.java:889)
>   at sqlline.SqlLine.begin(SqlLine.java:763)
>   at sqlline.SqlLine.start(SqlLine.java:498)
>   at sqlline.SqlLine.main(SqlLine.java:460)
> {code}
> Stack trace from drillbit.log
> {code}
> 2015-03-27 00:29:09,795 [2aeb5baa-5af0-ac70-b49a-53e61c92be51:frag:0:0] ERROR 
> o.a.drill.exec.work.foreman.Foreman - Error 
> e3ad43f5-fda6-48e5-9e74-779c69bb3cb2: RemoteRpcException: Failure while 
> running fragment., Schema change detected in the left input of Union-All. 
> This is not currently supported [ c2c7add0-651b-44d8-9a7c-3218761098e4 on 
> centos-04.qa.lab:31010 ]
> [ c2c7add0-651b-44d8-9a7c-3218761098e4 on centos-04.qa.lab:31010 ]
> org.apache.drill.exec.rpc.RemoteRpcException: Failure while running 
> fragment., Schema change detected in the left input of Union-All. This is not 
> currently supported [ c2c7add0-651b-44d8-9a7c-3218761098e4 on 
> centos-04.qa.lab:31010 ]
> [ c2c7add0-651b-44d8-9a7c-3218761098e4 on centos-04.qa.lab:31010 ]
> at 
> org.apache.drill.exec.work.foreman.QueryManager.statusUpdate(QueryManager.java:163)
>  [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.QueryManager$RootStatusReporter.statusChange(QueryManager.java:281)
>  [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:114)
>  [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:110)
>  [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.internalFail(FragmentExecutor.java:230)
>  [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:165)
>  [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_75]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_75]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
> 2015-03-27 00:29:09,796 [2aeb5baa-5af0-ac70-b49a-53e61c92be51:frag:0:0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Error while initializing or executing 
> fragment
> java.lang.RuntimeException: Error closing fragment 

[jira] [Closed] (DRILL-2562) Order by over trimmed key results in incorrect ordering

2015-12-15 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz closed DRILL-2562.
-

> Order by over trimmed key results in incorrect ordering
> ---
>
> Key: DRILL-2562
> URL: https://issues.apache.org/jira/browse/DRILL-2562
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 0.8.0
> Environment: | f658a3c513ddf7f2d1b0ad7aa1f3f65049a594fe | DRILL-2209 
> Insert ProjectOperator with MuxExchange | 09.03.2015 @ 01:49:18 EDT
>Reporter: Khurram Faraaz
>Assignee: Steven Phillips
>Priority: Critical
> Attachments: longStringInJsnData.json
>
>
> Input data in the JSON data file has leading and trailing spaces around some 
> of the values. When we trim the whitespace and then do an order by over the 
> trimmed results, the query returns results in incorrect ordering. 
> Each value is a string; some of the strings are very long (1000-2049 
> characters).
> {code}
> 0: jdbc:drill:> select trim(key) from `longStringInJsnData.json` order by key;
> ++
> |   EXPR$0   |
> ++
> | p  |
> | m  |
> | a  |
> | aeiou  |
> | h  |
> | z  |
> | Hello World! |
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (DRILL-3689) incorrect results : aggregate AVG returns wrong results over results returned by LEAD function.

2015-12-15 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz closed DRILL-3689.
-

> incorrect results : aggregate AVG returns wrong results over results returned 
> by LEAD function.
> ---
>
> Key: DRILL-3689
> URL: https://issues.apache.org/jira/browse/DRILL-3689
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.2.0
> Environment:  private-branch 
> https://github.com/adeneche/incubator-drill/tree/new-window-funcs
>Reporter: Khurram Faraaz
>Assignee: Khurram Faraaz
>Priority: Critical
>  Labels: window_function
> Fix For: 1.2.0
>
>
> Aggregate AVG returns wrong results over results returned by LEAD function.
> Results returned by Drill:
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT  avg(lead_col1) FROM (SELECT LEAD(col1) 
> OVER(PARTITION BY col7 ORDER BY col1) lead_col1 , col7 FROM FEWRWSPQQ_101) 
> sub_query;
> +-+
> | EXPR$0  |
> +-+
> | 2.35195986941647008E17  |
> +-+
> 1 row selected (0.264 seconds)
> {code}
> Explain plan for above query from Drill
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for SELECT  avg(lead_col1) FROM 
> (SELECT LEAD(col1) OVER(PARTITION BY col7 ORDER BY col1) lead_col1 , col7 
> FROM FEWRWSPQQ_101) sub_query;
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(EXPR$0=[$0])
> 00-02Project(EXPR$0=[CAST(/(CastHigh(CASE(=($1, 0), null, $0)), 
> $1)):ANY NOT NULL])
> 00-03  StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT($0)])
> 00-04Project(w0$o0=[$3])
> 00-05  Window(window#0=[window(partition {2} order by [1] range 
> between UNBOUNDED PRECEDING and CURRENT ROW aggs [LEAD($1)])])
> 00-06SelectionVectorRemover
> 00-07  Sort(sort0=[$2], sort1=[$1], dir0=[ASC], dir1=[ASC])
> 00-08Project(T36¦¦*=[$0], col1=[$1], col7=[$2])
> 00-09  Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:///tmp/FEWRWSPQQ_101]], 
> selectionRoot=maprfs:/tmp/FEWRWSPQQ_101, numFiles=1, columns=[`*`]]])
> {code}
> Results returned by Postgres:
> {code}
> postgres=# SELECT  avg(lead_col1) FROM (SELECT LEAD(col1) OVER(PARTITION BY 
> col7 ORDER BY col1) lead_col1 , col7 FROM FEWRWSPQQ_101) sub_query;
>  avg 
> -
>  1157533190627124568
> (1 row)
> {code}





[jira] [Closed] (DRILL-3351) Invalid query must be caught earlier

2015-12-15 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz closed DRILL-3351.
-

> Invalid query must be caught earlier
> 
>
> Key: DRILL-3351
> URL: https://issues.apache.org/jira/browse/DRILL-3351
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.1.0
> Environment: 8815eb7d
>Reporter: Khurram Faraaz
>Assignee: Jinfeng Ni
> Fix For: 1.1.0
>
>
> The query below is not valid, so Drill must report an error instead of 
> returning results. Postgres does not support this kind of query.
> Drill currently returns some results; it should instead report an error to 
> the user.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT MIN(col_int) OVER() FROM vwOnParq group 
> by col_char_2;
> +-+
> | EXPR$0  |
> +-+
> | AZ  |
> | AZ  |
> | AZ  |
> | AZ  |
> | AZ  |
> | AZ  |
> | AZ  |
> | AZ  |
> | AZ  |
> | AZ  |
> | AZ  |
> | AZ  |
> | AZ  |
> | AZ  |
> | AZ  |
> | AZ  |
> | AZ  |
> | AZ  |
> +-+
> 18 rows selected (0.27 seconds)
> {code}
> Output from Postgres
> {code}
> postgres=# select min(col_int) over() from all_typs_tbl group by col_char_2;
> ERROR:  column "all_typs_tbl.col_int" must appear in the GROUP BY clause or 
> be used in an aggregate function
> LINE 1: select min(col_int) over() from all_typs_tbl group by col_ch...
> {code}
> Querying the original parquet file that was used to create the view returns 
> an assertion error:
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT MIN(col_int) OVER() FROM 
> `tblForView/0_0_0.parquet` group by col_char_2;
> Error: SYSTEM ERROR: java.lang.AssertionError: Internal error: while 
> converting MIN(`tblForView/0_0_0.parquet`.`col_int`)
> [Error Id: e8ed279d-aa8c-4db1-9906-5dd7fdecaac2 on centos-02.qa.lab:31010] 
> (state=,code=0)
> {code}





[jira] [Closed] (DRILL-2816) system error does not display the original Exception message

2015-12-15 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz closed DRILL-2816.
-

> system error does not display the original Exception message
> 
>
> Key: DRILL-2816
> URL: https://issues.apache.org/jira/browse/DRILL-2816
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 0.9.0
> Environment: 64e3ec52b93e9331aa5179e040eca19afece8317 | DRILL-2611: 
> value vectors should report valid value count | 16.04.2015 @ 13:53:34 EDT
>Reporter: Khurram Faraaz
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
> Attachments: DRILL-2816.1.patch.txt, DRILL-2816.2.patch.txt
>
>
> The SQL below reported an assertion error on an earlier source level; 
> however, it is no longer reported as an AssertionError at the sqlline prompt.
> {code}
> this is the output from an earlier level
> 0: jdbc:drill:> select max(columns[0]) from (select * from `countries.csv` 
> offset 1) tmp order by tmp.columns[1];
> Query failed: AssertionError: star should have been expanded
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> here is the output from the current level, as mentioned in the Environment 
> field of this JIRA
> 0: jdbc:drill:> select max(tmp.columns[0]) from (select * from 
> `countries.csv` offset 1) tmp order by tmp.columns[1];
> Query failed: SYSTEM ERROR: Unexpected exception during fragment 
> initialization: star should have been expanded
> [3bfba8e5-5449-4d15-a663-0677e9ae65da on centos-04.qa.lab:31010]
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> 2015-04-17 23:24:56,720 [2ace69b7-760a-4977-6eb9-1a39e4c1bb07:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - State change requested.  PENDING --> 
> FAILED
> org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception 
> during fragment initialization: star should have been expanded
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:211) 
> [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_75]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_75]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
> Caused by: java.lang.AssertionError: star should have been expanded
> at org.eigenbase.sql.validate.AggChecker.visit(AggChecker.java:81) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at org.eigenbase.sql.validate.AggChecker.visit(AggChecker.java:31) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at org.eigenbase.sql.SqlIdentifier.accept(SqlIdentifier.java:222) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at 
> org.eigenbase.sql.util.SqlBasicVisitor$ArgHandlerImpl.visitChild(SqlBasicVisitor.java:107)
>  ~[optiq-core-0.9-drill-r21.jar:na]
> at org.eigenbase.sql.SqlOperator.acceptCall(SqlOperator.java:688) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at org.eigenbase.sql.validate.AggChecker.visit(AggChecker.java:139) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at org.eigenbase.sql.validate.AggChecker.visit(AggChecker.java:31) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at org.eigenbase.sql.SqlCall.accept(SqlCall.java:125) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at 
> org.eigenbase.sql.util.SqlBasicVisitor$ArgHandlerImpl.visitChild(SqlBasicVisitor.java:107)
>  ~[optiq-core-0.9-drill-r21.jar:na]
> at org.eigenbase.sql.SqlOperator.acceptCall(SqlOperator.java:688) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at org.eigenbase.sql.validate.AggChecker.visit(AggChecker.java:139) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at org.eigenbase.sql.validate.AggChecker.visit(AggChecker.java:31) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at org.eigenbase.sql.SqlCall.accept(SqlCall.java:125) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at 
> org.eigenbase.sql.validate.AggregatingSelectScope.checkAggregateExpr(AggregatingSelectScope.java:155)
>  ~[optiq-core-0.9-drill-r21.jar:na]
> at 
> org.eigenbase.sql.validate.AggregatingSelectScope.validateExpr(AggregatingSelectScope.java:164)
>  ~[optiq-core-0.9-drill-r21.jar:na]
> at 
> org.eigenbase.sql.validate.OrderByScope.validateExpr(OrderByScope.java:100) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at 
> org.eigenbase.sql.validate.SqlValidatorImpl.validateExpr(SqlValidatorImpl.java:3150)
>  ~[optiq-core-0.9-drill-r21.jar:na]
> at 
> org.eigenbase.sql.validate.SqlValidatorImpl.validateOrderItem(SqlValidatorImpl.java:2965)
>  ~[optiq-core-0.9-drill-r21.jar:na]
> at 
> 

[jira] [Commented] (DRILL-2816) system error does not display the original Exception message

2015-12-15 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059660#comment-15059660
 ] 

Khurram Faraaz commented on DRILL-2816:
---

Verified the fix on Drill 1.4; we no longer see the assertion error.

{code}
0: jdbc:drill:schema=dfs.tmp> select max(columns[0]) from (select * from 
`countries.csv` offset 1) tmp order by tmp.columns[1];
Error: VALIDATION ERROR: From line 1, column 83 to line 1, column 85: 
Expression 'tmp.*' is not being grouped


[Error Id: 72d87d6a-ba58-41b4-b188-e9f0a83d6313 on centos-01.qa.lab:31010] 
(state=,code=0)
{code}
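The general failure mode in this issue, a generic wrapper swallowing the root-cause message, can be sketched outside Drill. Below is a minimal Python example (class and function names are hypothetical, not Drill code) of rethrowing while keeping the original message reachable:

```python
class QuerySystemError(Exception):
    """Hypothetical stand-in for a generic 'SYSTEM ERROR' wrapper."""

def run_fragment():
    # Stand-in for the failing validation step from the report.
    raise AssertionError("star should have been expanded")

def execute():
    try:
        run_fragment()
    except AssertionError as cause:
        # Chaining with 'from' keeps the original message attached to the
        # wrapper instead of discarding it.
        raise QuerySystemError(
            "Unexpected exception during fragment initialization") from cause

try:
    execute()
except QuerySystemError as wrapped:
    # The root-cause text survives and can be surfaced to the user.
    print(wrapped.__cause__)   # star should have been expanded
```

The fix verified above behaves analogously: the validation error text reaches the sqlline prompt instead of being hidden behind a generic system error.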

> system error does not display the original Exception message
> 
>
> Key: DRILL-2816
> URL: https://issues.apache.org/jira/browse/DRILL-2816
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 0.9.0
> Environment: 64e3ec52b93e9331aa5179e040eca19afece8317 | DRILL-2611: 
> value vectors should report valid value count | 16.04.2015 @ 13:53:34 EDT
>Reporter: Khurram Faraaz
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
> Attachments: DRILL-2816.1.patch.txt, DRILL-2816.2.patch.txt
>
>
> The SQL below reported an assertion error on an earlier source level; 
> however, it is no longer reported as an AssertionError at the sqlline prompt.
> {code}
> this is the output from an earlier level
> 0: jdbc:drill:> select max(columns[0]) from (select * from `countries.csv` 
> offset 1) tmp order by tmp.columns[1];
> Query failed: AssertionError: star should have been expanded
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> here is the output from the current level, as mentioned in the Environment 
> field of this JIRA
> 0: jdbc:drill:> select max(tmp.columns[0]) from (select * from 
> `countries.csv` offset 1) tmp order by tmp.columns[1];
> Query failed: SYSTEM ERROR: Unexpected exception during fragment 
> initialization: star should have been expanded
> [3bfba8e5-5449-4d15-a663-0677e9ae65da on centos-04.qa.lab:31010]
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> 2015-04-17 23:24:56,720 [2ace69b7-760a-4977-6eb9-1a39e4c1bb07:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - State change requested.  PENDING --> 
> FAILED
> org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception 
> during fragment initialization: star should have been expanded
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:211) 
> [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_75]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_75]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
> Caused by: java.lang.AssertionError: star should have been expanded
> at org.eigenbase.sql.validate.AggChecker.visit(AggChecker.java:81) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at org.eigenbase.sql.validate.AggChecker.visit(AggChecker.java:31) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at org.eigenbase.sql.SqlIdentifier.accept(SqlIdentifier.java:222) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at 
> org.eigenbase.sql.util.SqlBasicVisitor$ArgHandlerImpl.visitChild(SqlBasicVisitor.java:107)
>  ~[optiq-core-0.9-drill-r21.jar:na]
> at org.eigenbase.sql.SqlOperator.acceptCall(SqlOperator.java:688) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at org.eigenbase.sql.validate.AggChecker.visit(AggChecker.java:139) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at org.eigenbase.sql.validate.AggChecker.visit(AggChecker.java:31) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at org.eigenbase.sql.SqlCall.accept(SqlCall.java:125) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at 
> org.eigenbase.sql.util.SqlBasicVisitor$ArgHandlerImpl.visitChild(SqlBasicVisitor.java:107)
>  ~[optiq-core-0.9-drill-r21.jar:na]
> at org.eigenbase.sql.SqlOperator.acceptCall(SqlOperator.java:688) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at org.eigenbase.sql.validate.AggChecker.visit(AggChecker.java:139) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at org.eigenbase.sql.validate.AggChecker.visit(AggChecker.java:31) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at org.eigenbase.sql.SqlCall.accept(SqlCall.java:125) 
> ~[optiq-core-0.9-drill-r21.jar:na]
> at 
> org.eigenbase.sql.validate.AggregatingSelectScope.checkAggregateExpr(AggregatingSelectScope.java:155)
>  ~[optiq-core-0.9-drill-r21.jar:na]
> at 
> org.eigenbase.sql.validate.AggregatingSelectScope.validateExpr(AggregatingSelectScope.java:164)
>  ~[optiq-core-0.9-drill-r21.jar:na]
> at 
> 

[jira] [Closed] (DRILL-2694) Correlated subquery can not be planned

2015-12-15 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz closed DRILL-2694.
-

> Correlated subquery can not be planned
> --
>
> Key: DRILL-2694
> URL: https://issues.apache.org/jira/browse/DRILL-2694
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 0.9.0
> Environment: 9d92b8e319f2d46e8659d903d355450e15946533 | DRILL-2580: 
> Exit early from HashJoinBatch if build side is empty | 26.03.2015 @ 16:13:53 
> EDT 
>Reporter: Khurram Faraaz
>Assignee: Khurram Faraaz
> Fix For: 1.1.0
>
>
> A correlated subquery cannot be planned. The test was run on a 4-node 
> cluster on CentOS. Note that the Physical plan and Visualized plan tabs 
> were empty for the query below.
> {code}
> 0: jdbc:drill:> select * from `allTypData.csv` t1 where t1.columns[0] > 
> (select min(columns[0]) from `allTypData2.csv` t2);
> Query failed: UnsupportedRelOperatorException: This query cannot be planned 
> possibly due to either a cartesian join or an inequality join
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> Table in outer query has 295 rows.
> 0: jdbc:drill:> select count(*) from `allTypData.csv`;
> ++
> |   EXPR$0   |
> ++
> | 295|
> ++
> 1 row selected (0.082 seconds)
> Table in inner query has 999 rows.
> 0: jdbc:drill:> select count(*) from `allTypData2.csv`;
> ++
> |   EXPR$0   |
> ++
> | 999|
> ++
> 1 row selected (0.083 seconds)
> sub query is,
> 0: jdbc:drill:> select min(columns[0]) from `allTypData2.csv`;
> ++
> |   EXPR$0   |
> ++
> | -141497 |
> ++
> 1 row selected (0.097 seconds)
> Note that when we replace the sub-query with the value that the sub-query 
> returns, the original query returns results. It fails only when there is a 
> correlated subquery.
> select * from `allTypData.csv` t1 where t1.columns[0] > -141497;
> ++
> |  columns   |
> ++
> ...
> ++
> 214 rows selected (0.162 seconds)
> 0: jdbc:drill:> explain plan for select * from `allTypData.csv` t1 where 
> t1.columns[0] > (select min(columns[0]) from `allTypData2.csv` t2);
> Query failed: UnsupportedRelOperatorException: This query cannot be planned 
> possibly due to either a cartesian join or an inequality join
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> {code}
> Stack trace from drillbit.log 
> {code}
> 2015-04-04 21:12:40,687 [2adfac37-037b-c692-31b1-41d8004d9b13:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - State change requested.  PENDING --> 
> FAILED
> org.apache.drill.exec.work.foreman.UnsupportedRelOperatorException: This 
> query cannot be planned possibly due to either a cartesian join or an 
> inequality join
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:217)
>  ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:138)
>  ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:145)
>  ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:773) 
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:204) 
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_75]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_75]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
> 2015-04-04 21:12:40,695 [2adfac37-037b-c692-31b1-41d8004d9b13:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - foreman cleaning up - status: []
> 2015-04-04 21:12:40,696 [2adfac37-037b-c692-31b1-41d8004d9b13:foreman] ERROR 
> o.a.drill.exec.work.foreman.Foreman - Error 
> 59dcbb50-d418-400a-9c0c-7331fcb6b344: UnsupportedRelOperatorException: This 
> query cannot be planned possibly due to either a cartesian join or an 
> inequality join
> org.apache.drill.exec.work.foreman.UnsupportedRelOperatorException: This 
> query cannot be planned possibly due to either a cartesian join or an 
> inequality join
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:217)
>  

[jira] [Commented] (DRILL-2953) Group By + Order By query results are not ordered.

2015-12-15 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059657#comment-15059657
 ] 

Khurram Faraaz commented on DRILL-2953:
---

Verified the fix on Drill 1.4; a test needs to be added.

{code}
0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) from 
`testWindow.csv` t2 where t2.columns[0] is not null group by columns[0] order 
by cast(columns[0] as int);
+-+
| EXPR$0  |
+-+
| 2   |
| 10  |
| 50  |
| 55  |
| 57  |
| 61  |
| 67  |
| 89  |
| 100 |
| 113 |
| 119 |
+-+
11 rows selected (0.465 seconds)
{code}
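The wrong ordering in the original report below is exactly what a string (lexicographic) sort of these values produces. A minimal Python sketch (data reconstructed from the report) contrasts the two orderings:

```python
# Values from the report, kept as strings (as read from the CSV column).
vals = ["2", "10", "50", "55", "57", "61", "67", "89", "100", "113", "119"]

# Lexicographic sort: the incorrect ordering the bug produced, because the
# sort ran on the varchar column instead of the casted integer.
as_strings = sorted(vals)
print(as_strings)   # ['10', '100', '113', '119', '2', '50', ...]

# Numeric sort: the ordering ORDER BY cast(columns[0] as int) should give.
as_ints = sorted(vals, key=int)
print(as_ints)      # ['2', '10', '50', '55', '57', '61', ...]
```

Ordering by the casted expression, as in the verified query above, forces the numeric comparison.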


> Group By + Order By query results are not ordered.
> --
>
> Key: DRILL-2953
> URL: https://issues.apache.org/jira/browse/DRILL-2953
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 0.9.0
> Environment: 10833d2cae9f5312cf0e31f8c9f3f8a9dcdc0c45 | Commit 0.9.0 
> release version. | 03.05.2015 @ 14:56:56 EDT
>Reporter: Khurram Faraaz
>Assignee: Jinfeng Ni
>Priority: Critical
> Fix For: 1.0.0
>
> Attachments: 
> 0001-DRILL-2953-Ensure-sort-would-be-enforced-when-a-cast.patch
>
>
> A group by + order by query does not return results in the correct order. 
> The sort is performed before the aggregation is done, which should not be 
> the case. The test was performed on a 4-node cluster on CentOS.
> {code}
> 0: jdbc:drill:> select cast(columns[0] as int) c1 from `testWindow.csv` t2 
> where t2.columns[0] is not null group by columns[0] order by columns[0];
> ++
> | c1 |
> ++
> | 10 |
> | 100|
> | 113|
> | 119|
> | 2  |
> | 50 |
> | 55 |
> | 57 |
> | 61 |
> | 67 |
> | 89 |
> ++
> 11 rows selected (0.218 seconds)
> {code}
> Explain plan for that query that returns wrong results.
> {code}
> 0: jdbc:drill:> explain plan for select cast(columns[0] as int) c1 from 
> `testWindow.csv` t2 where t2.columns[0] is not null group by columns[0] order 
> by columns[0];
> +++
> |text|json|
> +++
> | 00-00Screen
> 00-01  Project(c1=[$0])
> 00-02Project(c1=[CAST($0):INTEGER], EXPR$1=[$0])
> 00-03  StreamAgg(group=[{0}])
> 00-04Sort(sort0=[$0], dir0=[ASC])
> 00-05  Filter(condition=[IS NOT NULL($0)])
> 00-06Project(ITEM=[ITEM($0, 0)])
> 00-07  Scan(groupscan=[EasyGroupScan 
> [selectionRoot=/tmp/testWindow.csv, numFiles=1, columns=[`columns`[0]], 
> files=[maprfs:/tmp/testWindow.csv]]])
> {code} 
> Incorrect results, not in order:
> {code}
> 0: jdbc:drill:> select cast(columns[0] as int) from `testWindow.csv` t2 where 
> t2.columns[0] is not null group by columns[0] order by columns[0];
> ++
> |   EXPR$0   |
> ++
> | 10 |
> | 100|
> | 113|
> | 119|
> | 2  |
> | 50 |
> | 55 |
> | 57 |
> | 61 |
> | 67 |
> | 89 |
> ++
> 11 rows selected (0.214 seconds)
> {code}





[jira] [Closed] (DRILL-2770) Aggregate query returns AssertionError: star should have been expanded

2015-12-15 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz closed DRILL-2770.
-

> Aggregate query returns AssertionError: star should have been expanded
> --
>
> Key: DRILL-2770
> URL: https://issues.apache.org/jira/browse/DRILL-2770
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 0.9.0
> Environment: | 393a8affdab9b93093a7afcc81d016e720d7781f | MD-192: 
> CONVERT_FROM in where clause | 25.03.2015 @ 17:57:28 EDT
>Reporter: Khurram Faraaz
>Assignee: Khurram Faraaz
> Fix For: 1.1.0
>
>
> An aggregate query that should return the maximum value reports an 
> AssertionError. The test was performed on a 4-node cluster on CentOS.
> {code}
> 0: jdbc:drill:> select max(columns[0]) from (select * from `countries.csv` 
> offset 1) tmp order by tmp.columns[1];
> Query failed: AssertionError: star should have been expanded
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> 0: jdbc:drill:> select max(tmp.columns[0]) from (select * from 
> `countries.csv` offset 1) tmp order by tmp.columns[1];
> Query failed: AssertionError: star should have been expanded
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> {code}
> Aggregate query to get the maximum value from columns[0]
> {code}
> 0: jdbc:drill:> select max(columns[0]) from `countries.csv`;
> ++
> |   EXPR$0   |
> ++
> | 302802,"VE","Venezuela","SA","http://en.wikipedia.org/wiki/Venezuela", |
> ++
> 1 row selected (0.372 seconds)
> {code}
> The query without the order by returns results. Although it should have 
> returned just the maximum value in that column, we also see some other data 
> returned by the query. We do not see the assertion when the order by is 
> removed.
> {code}
> 0: jdbc:drill:> select max(tmp.columns[0]) from (select * from 
> `countries.csv` offset 1) tmp;
> ++
> |   EXPR$0   |
> ++
> | 302802,"VE","Venezuela","SA","http://en.wikipedia.org/wiki/Venezuela", |
> ++
> 1 row selected (0.192 seconds)
> {code}
> Note that there is header information in the CSV file in the first row. 
> {code}
> 0: jdbc:drill:> select * from `countries.csv` limit 2;
> ++
> |  columns   |
> ++
> | ["\"id\",\"code\",\"name\",\"continent\",\"wikipedia_link\",\"keywords\""] |
> | 
> ["302672,\"AD\",\"Andorra\",\"EU\",\"http://en.wikipedia.org/wiki/Andorra\","]
>  |
> ++
> 2 rows selected (0.14 seconds)
> {code}
> Snippet from CSV data file
> {code}
> [root@centos-01 airport_CSV_data]# head -10 countries.csv 
> "id","code","name","continent","wikipedia_link","keywords"
> 302672,"AD","Andorra","EU","http://en.wikipedia.org/wiki/Andorra",
> 302618,"AE","United Arab 
> Emirates","AS","http://en.wikipedia.org/wiki/United_Arab_Emirates","UAE"
> 302619,"AF","Afghanistan","AS","http://en.wikipedia.org/wiki/Afghanistan",
> 302722,"AG","Antigua and 
> Barbuda","NA","http://en.wikipedia.org/wiki/Antigua_and_Barbuda",
> 302723,"AI","Anguilla","NA","http://en.wikipedia.org/wiki/Anguilla",
> 302673,"AL","Albania","EU","http://en.wikipedia.org/wiki/Albania",
> 302620,"AM","Armenia","AS","http://en.wikipedia.org/wiki/Armenia",
> 302556,"AO","Angola","AF","http://en.wikipedia.org/wiki/Angola",
> 302615,"AQ","Antarctica","AN","http://en.wikipedia.org/wiki/Antarctica",
> {code}
> Stack trace from drillbit.log
> {code}
> 2015-04-13 20:09:18,198 [2ad3dd91-0f3b-b882-8ae6-45f8ad208fb6:foreman] ERROR 
> o.a.drill.exec.work.foreman.Foreman - Error 
> 4eee026f-6235-45a8-84b1-dd8302edec3c: AssertionError: star should have been 
> expanded
> org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception 
> during fragment initialization: star should have been expanded
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:213) 
> [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_75]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_75]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
> Caused by: java.lang.AssertionError: star should have been expanded
> at org.eigenbase.sql.validate.AggChecker.visit(AggChecker.java:81) 
> ~[optiq-core-0.9-drill-r20.jar:na]
> at org.eigenbase.sql.validate.AggChecker.visit(AggChecker.java:31) 
> ~[optiq-core-0.9-drill-r20.jar:na]
> at org.eigenbase.sql.SqlIdentifier.accept(SqlIdentifier.java:222) 
> ~[optiq-core-0.9-drill-r20.jar:na]
> at 
> 

[jira] [Closed] (DRILL-2631) Project one column from output of Union All results in Column count mismatch in UNION ALL

2015-12-15 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz closed DRILL-2631.
-

> Project one column from output of Union All results in Column count mismatch 
> in UNION ALL
> -
>
> Key: DRILL-2631
> URL: https://issues.apache.org/jira/browse/DRILL-2631
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 0.9.0
>Reporter: Khurram Faraaz
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 0.9.0
>
>
> Projecting values from a single column from the output of Union All results 
> in SqlValidatorException: Column count mismatch in UNION ALL.
> Tests were run on a 4-node cluster.
> Case 1: selecting distinct values from the column fails.
> {code}
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> 0: jdbc:drill:> select distinct sum_salary from (select * from (select 
> sum(cast(columns[3] as int)) over(partition by cast(columns[4] as 
> varchar(25)) order by columns[4]) sum_salary, cast(columns[1] as varchar(50)) 
> name, cast(columns[4] as varchar(25)) department from `testWindow.csv`) where 
> sum_salary > 10.0  order by name ) union all (select * from (select 
> sum(cast(columns[3] as int)) over(partition by cast(columns[4] as 
> varchar(25)) order by columns[4]) sum_salary, cast(columns[1] as varchar(50)) 
> name, cast(columns[4] as varchar(25)) department from `testWindow.csv`) where 
> sum_salary > 10.0  order by name);
> Query failed: SqlValidatorException: Column count mismatch in UNION ALL
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> {code}
> Case 2: selecting non-distinct values from the column fails.
> {code}
> 0: jdbc:drill:> select sum_salary from (select * from (select 
> sum(cast(columns[3] as int)) over(partition by cast(columns[4] as 
> varchar(25)) order by columns[4]) sum_salary, cast(columns[1] as varchar(50)) 
> name, cast(columns[4] as varchar(25)) department from `testWindow.csv`) where 
> sum_salary > 10.0  order by name ) union all (select * from (select 
> sum(cast(columns[3] as int)) over(partition by cast(columns[4] as 
> varchar(25)) order by columns[4]) sum_salary, cast(columns[1] as varchar(50)) 
> name, cast(columns[4] as varchar(25)) department from `testWindow.csv`) where 
> sum_salary > 10.0  order by name);
> Query failed: SqlValidatorException: Column count mismatch in UNION ALL
> {code}
> Results returned by the sub-query (this sub-query is used in the Union All 
> queries above in case 1 and case 2):
> {code}
> 0: jdbc:drill:> select * from (select sum(cast(columns[3] as int)) 
> over(partition by cast(columns[4] as varchar(25)) order by columns[4]) 
> sum_salary, cast(columns[1] as varchar(50)) name, cast(columns[4] as 
> varchar(25)) department from `testWindow.csv`) where sum_salary > 10.0  
> order by name;
> ++++
> | sum_salary |name| department |
> ++++
> | 452000 | Bill Sawyer | Engineering |
> | 452000 | Bob Sr | Engineering |
> | 452000 | Jane Doe   | Engineering |
> | 452000 | Kumar  | Engineering |
> | 199000 | Patrick| Sales  |
> | 20 | Rock Breaker | Product Management |
> | 199000 | Sam| Sales  |
> | 452000 | Susan  | Engineering |
> ++++
> 8 rows selected (0.217 seconds)
> {code}
> Case where we run select * from (Query 1) Union All (Query 2): this query 
> returns correct results. This is the case where we project all columns from 
> the output of Union All.
> {code}
> 0: jdbc:drill:> select * from (select * from (select sum(cast(columns[3] as 
> int)) over(partition by cast(columns[4] as varchar(25)) order by columns[4]) 
> sum_salary, cast(columns[1] as varchar(50)) name, cast(columns[4] as 
> varchar(25)) department from `testWindow.csv`) where sum_salary > 10.0  
> order by name ) union all (select * from (select sum(cast(columns[3] as int)) 
> over(partition by cast(columns[4] as varchar(25)) order by columns[4]) 
> sum_salary, cast(columns[1] as varchar(50)) name, cast(columns[4] as 
> varchar(25)) department from `testWindow.csv`) where sum_salary > 10.0  
> order by name);
> ++++
> | sum_salary |name| department |
> ++++
> | 452000 | Bill Sawyer | Engineering |
> | 452000 | Bob Sr | Engineering |
> | 452000 | Jane Doe   | Engineering |
> | 452000 | Kumar  | Engineering |
> | 199000 | Patrick| Sales  |
> | 20 | Rock Breaker | Product Management |
> | 199000 | Sam| Sales  |
> | 452000 | Susan

[jira] [Updated] (DRILL-3478) Bson Record Reader for Mongo storage plugin

2015-12-15 Thread B Anil Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

B Anil Kumar updated DRILL-3478:

Attachment: Test_queries_with_review_comment_fixes

> Bson Record Reader for Mongo storage plugin
> ---
>
> Key: DRILL-3478
> URL: https://issues.apache.org/jira/browse/DRILL-3478
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - MongoDB
>Reporter: B Anil Kumar
>Assignee: B Anil Kumar
> Fix For: Future
>
> Attachments: Test_queries_with_review_comment_fixes, 
> drill_bson_sqlline_test_2015_1
>
>
> Improve the Mongo query performance.
> We are considering the suggestions provided by [~dragoncurve] and [~hgunes] 
> in the Drill mailing list thread.


