[jira] [Commented] (DRILL-5378) Put more information into SchemaChangeException when HashJoin hit SchemaChangeException

2017-03-23 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939559#comment-15939559
 ] 

Jinfeng Ni commented on DRILL-5378:
---

Include the HashAgg and Sort operators in the proposed change as well, since those 
two operators do not support schema change either.

Prior error message:
{code}
UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema 
changesSchema changed. 
{code}

New proposed error message in SchemaChangeException:

{code}
UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema 
changesSchema changed. 
Prior schema : 
BatchSchema [fields=[year(VARCHAR:OPTIONAL)], selectionVector=NONE]
New schema : 
BatchSchema [fields=[year(VARCHAR:REQUIRED)], selectionVector=NONE]
{code}
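As a rough illustration of the proposed message, a helper like the following could assemble the two schemas into the error text. The class and method names here are hypothetical, not Drill's actual API:

```java
// Hypothetical sketch of assembling the proposed error text. SchemaChangeReport
// and format() are illustrative names, not part of Drill's actual API.
public class SchemaChangeReport {
    public static String format(String operator, String priorSchema, String newSchema) {
        return operator + " does not support schema changes.\n"
                + "Prior schema : \n" + priorSchema + "\n"
                + "New schema : \n" + newSchema;
    }
}
```

The same helper could then serve HashJoin, HashAgg, and Sort alike, since all three reject schema changes.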

> Put more information into SchemaChangeException when HashJoin hit 
> SchemaChangeException
> ---
>
> Key: DRILL-5378
> URL: https://issues.apache.org/jira/browse/DRILL-5378
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>Priority: Minor
>
> HashJoin currently does not allow schema change on either the build side or the 
> probe side. When HashJoin hits a SchemaChangeException in the middle of 
> execution, Drill reports a brief error message about the SchemaChangeException 
> without providing any information about the schemas in the incoming batches. 
> That makes it hard to analyze the error and understand what's going on. 
> It probably makes sense to put the two differing schemas in the error 
> message, so that users can get a better idea about the schema change. 
> Before Drill can support schema change in HashJoin, the detailed 
> error message would help users debug the error. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5378) Put more information into SchemaChangeException when HashJoin hit SchemaChangeException

2017-03-23 Thread Jinfeng Ni (JIRA)
Jinfeng Ni created DRILL-5378:
-

 Summary: Put more information into SchemaChangeException when 
HashJoin hit SchemaChangeException
 Key: DRILL-5378
 URL: https://issues.apache.org/jira/browse/DRILL-5378
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Reporter: Jinfeng Ni
Assignee: Jinfeng Ni
Priority: Minor


HashJoin currently does not allow schema change on either the build side or the 
probe side. When HashJoin hits a SchemaChangeException in the middle of execution, 
Drill reports a brief error message about the SchemaChangeException without 
providing any information about the schemas in the incoming batches. That makes 
it hard to analyze the error and understand what's going on. 

It probably makes sense to put the two differing schemas in the error message, 
so that users can get a better idea about the schema change. 
Before Drill can support schema change in HashJoin, the detailed 
error message would help users debug the error. 






[jira] [Updated] (DRILL-4667) Improve memory footprint of broadcast joins

2017-03-23 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-4667:

Fix Version/s: Future

> Improve memory footprint of broadcast joins
> ---
>
> Key: DRILL-4667
> URL: https://issues.apache.org/jira/browse/DRILL-4667
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.6.0
>Reporter: Aman Sinha
> Fix For: Future
>
>
> For broadcast joins, Drill currently optimizes the data transfer across the 
> network for the broadcast table by sending a single copy to the receiving node 
> which then distributes it to all minor fragments running on that particular 
> node.  However, each minor fragment builds its own hash table (for a hash 
> join) using this broadcast table.  We can substantially improve the memory 
> footprint by having a shared copy of the hash table among multiple minor 
> fragments on a node.  
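A hedged sketch of the proposal: minor fragments on one node probe a single shared build-side table instead of each building its own copy. All names below are illustrative; Drill's actual hash join does not work this way today.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: one shared build-side table per node. The build phase runs
// once per node; every minor fragment on that node probes the same table,
// avoiding N duplicate hash tables for N fragments.
class SharedBuildTable {
    private static final Map<Integer, String> SHARED = new ConcurrentHashMap<>();

    static void build(int key, String row) { SHARED.put(key, row); }  // done once per node
    static String probe(int key) { return SHARED.get(key); }          // used by every fragment
}
```

The real design would also need coordination for who builds the table and when probing may begin, which this sketch omits.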





[jira] [Commented] (DRILL-4971) query encounters system error: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3"

2017-03-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939442#comment-15939442
 ] 

ASF GitHub Bot commented on DRILL-4971:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/792#discussion_r107808909
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/EvaluationVisitor.java ---
@@ -671,8 +674,9 @@ private HoldingContainer visitBooleanAnd(BooleanOperator op,
   HoldingContainer out = generator.declare(op.getMajorType());
 
   JLabel label = generator.getEvalBlockLabel("AndOP");
-  JBlock eval = generator.getEvalBlock().block();  // enter into nested block
-  generator.nestEvalBlock(eval);
+  JBlock eval = new JBlock();
--- End diff --

How about we add the following two methods in ClassGenerator.java: 

```java
  private JBlock createInnerBlock(BlockType type) {
final JBlock currBlock = getBlock(type);
final JBlock innerBlock = new JBlock();
currBlock.add(innerBlock);
return innerBlock;
  }

  protected JBlock createInnerEvalBlock() {
return createInnerBlock(BlockType.EVAL);
  }
```

Then, replace `generator.getEvalBlock().block()` with 
`generator.createInnerEvalBlock()`?
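The point of routing creation through `createInnerBlock()` is that the inner block is attached to its parent at creation time, so statements generated into it (including labeled break targets) stay enclosed by the block that owns the label. A standalone mock of that pattern follows; `Block` merely stands in for codemodel's `JBlock`, and this is not Drill or codemodel code:

```java
import java.util.ArrayList;
import java.util.List;

// Mock of the suggested pattern: the factory method both creates the inner
// block and adds it to its parent, unlike a bare `new Block()`, which would
// leave the new block dangling outside the parent's scope.
class Block {
    final List<Block> children = new ArrayList<>();

    Block createInnerBlock() {
        Block inner = new Block();
        children.add(inner);  // attach to parent immediately
        return inner;
    }
}
```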



> query encounters system error: Statement "break AndOP3" is not enclosed by a 
> breakable statement with label "AndOP3"
> 
>
> Key: DRILL-4971
> URL: https://issues.apache.org/jira/browse/DRILL-4971
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
> Attachments: low_table, medium_table
>
>
> This query returns an error.  The stack trace suggests it might be a schema 
> change issue, but there is no schema change in this table.  Many other 
> queries are succeeding.
> select count(\*) from test where ((int_id > 3060 and int_id < 6002) or 
> (int_id > 9025 and int_id < 11976)) and ((int_id > 9025 and int_id < 11976) 
> or (int_id > 3060 and int_id < 6002)) and (int_id > 3060 and int_id < 6002);
> Error: SYSTEM ERROR: CompileException: Line 232, Column 30: Statement "break 
> AndOP3" is not enclosed by a breakable statement with label "AndOP3"
> [Error Id: 254d093b-79a1-4425-802c-ade08db293e4 on qa-node211:31010]^M
> ^M
>   (org.apache.drill.exec.exception.SchemaChangeException) Failure while 
> attempting to load generated class^M
> 
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():198^M
> 
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():107^M
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78^M
> There are two partitions to the test table.  One covers the range 3061 - 6001 
> and the other covers the range 9026 - 11975.
> This second query returns a different, but possibly related, error.  
> select count(\*) from orders_parts where (((int_id > -3025 and int_id < -4) 
> or (int_id > -5 and int_id < 3061) or (int_id > 3060 and int_id < 6002)) and 
> (int_id > -5 and int_id < 3061)) and (((int_id > -5 and int_id < 3061) or 
> (int_id > 9025 and int_id < 11976)) and (int_id > -5 and int_id < 3061))^M
> Failed with exception^M
> java.sql.SQLException: SYSTEM ERROR: CompileException: Line 447, Column 30: 
> Statement "break AndOP6" is not enclosed by a breakable statement with label 
> "AndOP6"^M
> ^M
> Fragment 0:0^M
> ^M
> [Error Id: ac09187e-d3a2-41a7-a659-b287aca6039c on qa-node209:31010]^M
> ^M
>   (org.apache.drill.exec.exception.SchemaChangeException) Failure while 
> attempting to load generated class^M
> 
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():198^M
> 
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():107^M
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78^M





[jira] [Commented] (DRILL-5377) Drill returns weird characters when parquet date auto-correction is turned off

2017-03-23 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939358#comment-15939358
 ] 

Rahul Challapalli commented on DRILL-5377:
--

[~zelaine] There are no issues with auto-correction enabled. Everything works 
as expected.
We also added an option to disable the auto-correction just in case someone 
actually wants to use those very old dates in their data sets. In testing that 
option, I encountered null characters (^@) being returned by Drill.

> Drill returns weird characters when parquet date auto-correction is turned off
> --
>
> Key: DRILL-5377
> URL: https://issues.apache.org/jira/browse/DRILL-5377
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>
> git.commit.id.abbrev=38ef562
> Below is the output I get from the test framework when I disable 
> auto-correction for date fields
> {code}
> select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet', 
> autoCorrectCorruptDates => false)) order by l_shipdate limit 10;
> ^@356-03-19
> ^@356-03-21
> ^@356-03-21
> ^@356-03-23
> ^@356-03-24
> ^@356-03-24
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> {code}





[jira] [Commented] (DRILL-5377) Drill returns weird characters when parquet date auto-correction is turned off

2017-03-23 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939352#comment-15939352
 ] 

Zelaine Fong commented on DRILL-5377:
-

[~rkins] - are the dates correct if auto-correction is enabled? 

> Drill returns weird characters when parquet date auto-correction is turned off
> --
>
> Key: DRILL-5377
> URL: https://issues.apache.org/jira/browse/DRILL-5377
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>
> git.commit.id.abbrev=38ef562
> Below is the output I get from the test framework when I disable 
> auto-correction for date fields
> {code}
> select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet', 
> autoCorrectCorruptDates => false)) order by l_shipdate limit 10;
> ^@356-03-19
> ^@356-03-21
> ^@356-03-21
> ^@356-03-23
> ^@356-03-24
> ^@356-03-24
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> {code}





[jira] [Created] (DRILL-5377) Drill returns weird characters when parquet date auto-correction is turned off

2017-03-23 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5377:


 Summary: Drill returns weird characters when parquet date 
auto-correction is turned off
 Key: DRILL-5377
 URL: https://issues.apache.org/jira/browse/DRILL-5377
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=38ef562

Below is the output I get from the test framework when I disable auto-correction 
for date fields
{code}
select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet', 
autoCorrectCorruptDates => false)) order by l_shipdate limit 10;

^@356-03-19
^@356-03-21
^@356-03-21
^@356-03-23
^@356-03-24
^@356-03-24
^@356-03-26
^@356-03-26
^@356-03-26
^@356-03-26
{code}





[jira] [Commented] (DRILL-4301) OOM : Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.

2017-03-23 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939222#comment-15939222
 ] 

Zelaine Fong commented on DRILL-4301:
-

[~Paul.Rogers] - as I noted in my comment, I believe the partition pruning 
error is DRILL-4139.  There is a pull request for that Jira but it needs 
further testing.  The issue has been assigned.

> OOM : Unable to allocate sv2 for 1000 records, and not enough batchGroups to 
> spill.
> ---
>
> Key: DRILL-4301
> URL: https://issues.apache.org/jira/browse/DRILL-4301
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Affects Versions: 1.5.0
> Environment: 4 node cluster
>Reporter: Khurram Faraaz
>Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> The query below, from the Functional tests, fails due to OOM 
> {code}
> select * from dfs.`/drill/testdata/metadata_caching/fewtypes_boolpartition` 
> where bool_col = true;
> {code}
> Drill version : drill-1.5.0
> JAVA_VERSION=1.8.0
> {noformat}
> version   commit_id   commit_message  commit_time build_email 
> build_time
> 1.5.0-SNAPSHOT2f0e3f27e630d5ac15cdaef808564e01708c3c55
> DRILL-4190 Don't hold on to batches from left side of merge join.   
> 20.01.2016 @ 22:30:26 UTC   Unknown 20.01.2016 @ 23:48:33 UTC
> framework/framework/resources/Functional/metadata_caching/data/bool_partition1.q
>  (connection: 808078113)
> [#1378] Query failed: 
> oadd.org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: 
> One or more nodes ran out of memory while executing the query.
> Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.
> batchGroups.size 0
> spilledBatchGroups.size 0
> allocated memory 48326272
> allocator limit 46684427
> Fragment 0:0
> [Error Id: 97d58ea3-8aff-48cf-a25e-32363b8e0ecd on drill-demod2:31010]
>   at 
> oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119)
>   at 
> oadd.org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113)
>   at 
> oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
>   at 
> oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
>   at oadd.org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:67)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:374)
>   at 
> oadd.org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:252)
>   at 
> oadd.org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:285)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:257)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> 

[jira] [Commented] (DRILL-4301) OOM : Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.

2017-03-23 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939160#comment-15939160
 ] 

Paul Rogers commented on DRILL-4301:


I did not see an issue, but I may not have run the same test.

The issue is in Parquet. Recall that Drill does not support Parquet logical 
types, only the physical types.

It looks like Parquet is trying to map a column value to a bit vector. The stack 
trace shows a failure in the planner, not in execution.

Still, we should take a look to figure out what's happening.

> OOM : Unable to allocate sv2 for 1000 records, and not enough batchGroups to 
> spill.
> ---
>
> Key: DRILL-4301
> URL: https://issues.apache.org/jira/browse/DRILL-4301
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Affects Versions: 1.5.0
> Environment: 4 node cluster
>Reporter: Khurram Faraaz
>Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> The query below, from the Functional tests, fails due to OOM 
> {code}
> select * from dfs.`/drill/testdata/metadata_caching/fewtypes_boolpartition` 
> where bool_col = true;
> {code}
> Drill version : drill-1.5.0
> JAVA_VERSION=1.8.0
> {noformat}
> version   commit_id   commit_message  commit_time build_email 
> build_time
> 1.5.0-SNAPSHOT2f0e3f27e630d5ac15cdaef808564e01708c3c55
> DRILL-4190 Don't hold on to batches from left side of merge join.   
> 20.01.2016 @ 22:30:26 UTC   Unknown 20.01.2016 @ 23:48:33 UTC
> framework/framework/resources/Functional/metadata_caching/data/bool_partition1.q
>  (connection: 808078113)
> [#1378] Query failed: 
> oadd.org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: 
> One or more nodes ran out of memory while executing the query.
> Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.
> batchGroups.size 0
> spilledBatchGroups.size 0
> allocated memory 48326272
> allocator limit 46684427
> Fragment 0:0
> [Error Id: 97d58ea3-8aff-48cf-a25e-32363b8e0ecd on drill-demod2:31010]
>   at 
> oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119)
>   at 
> oadd.org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113)
>   at 
> oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
>   at 
> oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
>   at oadd.org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:67)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:374)
>   at 
> oadd.org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:252)
>   at 
> oadd.org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:285)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:257)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> 

[jira] [Commented] (DRILL-5376) Rationalize Drill's row structure for simpler code, better performance

2017-03-23 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939149#comment-15939149
 ] 

Paul Rogers commented on DRILL-5376:


Just to be clear, the issue here is not the structure of the *data*, but rather 
the structure of the *metadata*. The point is to define a uniform tuple-based 
metadata description that provides uniform access to the current, existing 
vector-based storage.

That is, if we have a "flattened" row structure:

{code}
a, b.c, b.d, e
{code}

We can easily get/set values by index. Given a row index into a vector set, a 
simple integer index can access column values, replacing the current, rather 
complex set of mechanisms used to get values from vector "bundles."

The underlying implementation is the same; the only difference is to 
rationalize our various schemas, map vectors, readers, writers and so on to 
make the code simpler and more performant.
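The flattening described above can be sketched in a few lines. This is illustrative only, not Drill code: nested maps in the row structure collapse into a linear list of dotted paths, so a single integer can address any column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: flatten the nested row (a, b.(c, d), e) into the
// linear index space 0:a, 1:b.c, 2:b.d, 3:e, so values can be addressed
// by one integer instead of a multi-part name lookup.
class Flattener {
    @SuppressWarnings("unchecked")
    static List<String> flatten(Map<String, Object> row, String prefix) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                // a Drill "map" is a nested tuple: recurse with the extended path
                out.addAll(flatten((Map<String, Object>) e.getValue(), path));
            } else {
                out.add(path);
            }
        }
        return out;
    }
}
```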


> Rationalize Drill's row structure for simpler code, better performance
> --
>
> Key: DRILL-5376
> URL: https://issues.apache.org/jira/browse/DRILL-5376
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>
> Drill is a columnar system, but data is ultimately represented as rows (AKA 
> records or tuples.) The way that Drill represents rows leads to excessive 
> code complexity and runtime cost.
> Data in Drill is stored in vectors: one (or more) per column. Vectors do not 
> stand alone, however, they are "bundled" into various forms of grouping: the 
> {{VectorContainer}}, {{RecordBatch}}, {{VectorAccessible}}, 
> {{VectorAccessibleSerializable}}, and more. Each has slightly different 
> semantics, requiring large amounts of code to bridge between the 
> representations.
> Consider only a simple row: one with only scalar columns. In classic 
> relational theory, such a row is a tuple:
> {code}
> R = (a, b, c, d, ...)
> {code}
> A tuple is defined as an ordered list of column values. Unlike a list or 
> array, the column values also have names and may have varying data types.
> In SQL, columns are referenced by either position or name. In most execution 
> engines, columns are referenced by position (since positions, in most 
> systems, cannot change.) A 1:1 mapping is provided between names and 
> positions. (See the JDBC {{RecordSet}} interface.)
> This allows code to be very fast: code references columns by index, not by 
> name, avoiding name lookups for each column reference.
> Drill provides a murky, hybrid approach. Some structures ({{BatchSchema}}, 
> for example) appear to provide a fixed column ordering, allowing indexed 
> column access. But, other abstractions provide only an iterator. Others (such 
> as {{VectorContainer}}) provides name-based access or, by clever programming, 
> indexed access.
> As a result, it is never clear exactly how to quickly access a column: by 
> name, by name to multi-part index to vector?
> Of course, Drill also supports maps, which add to the complexity. First, we 
> must understand that a "map" in Drill is not a "map" in the classic sense: it 
> is not a collection of (name, value) pairs in the JSON sense: a collection in 
> which each instance may have a different set of pairs.
> Instead, in Drill, a "map" is really a nested tuple: a map has the same 
> structure as a Drill record: a collection of names and values in which all 
> rows have the same structure. (This is so because maps are really a 
> collection of value vectors, and the vectors cut across all rows.)
> Drill, however, does not reflect this symmetry: that a row and a map are both 
> tuples. There are no common abstractions for the two. Instead, maps are 
> represented as a {{MapVector}} that contains a (name, vector) map for its 
> children.
> Because of this name-based mapping, high-speed indexed access to vectors is 
> not provided "out of the box." Certainly each consumer of a map can build its 
> own indexing mechanism. But, this leads to code complexity and redundancy.
> This ticket asks to rationalize Drill's row, map and schema abstractions 
> around the tuple concept. A schema is a description of a tuple and should (as 
> in JDBC) provide both name and index based access. That is, provide methods 
> of the form:
> {code}
> MaterializedField getField(int index);
> MaterializedField getField(String name);
> ...
> ValueVector getVector(int index);
> ValueVector getVector(String name);
> {code}
> Provide a common abstraction for rows and maps, recognizing their structural 
> similarity.
> There is an obvious issue with indexing columns in a row when the row 
> contains maps. Should indexing be multi-part (index into row, then into map) 
> as today? A better alternative is to provide a flattened interface:
> {code}
> 0: a, 1: b.x, 2: 
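A minimal standalone sketch of the dual name/index access the ticket asks for, in the spirit of the `getField`/`getVector` methods quoted above. The class here is illustrative, not a proposed Drill API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative tuple schema: columns are ordered (index access) and named
// (name access), with a 1:1 mapping between the two, as in JDBC result sets.
// Since a Drill map is itself a tuple, the same abstraction could describe
// both rows and maps.
class TupleSchema {
    private final List<String> names = new ArrayList<>();
    private final Map<String, Integer> index = new HashMap<>();

    int addColumn(String name) {
        index.put(name, names.size());
        names.add(name);
        return names.size() - 1;
    }
    String name(int i) { return names.get(i); }
    int indexOf(String name) { return index.get(name); }
    int size() { return names.size(); }
}
```

Hot code paths would resolve names to indices once, then use only the integer index per row, avoiding a name lookup per column reference.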

[jira] [Commented] (DRILL-5299) Query queues can be split by different users or different businesses

2017-03-23 Thread Saurabh Mahapatra (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938937#comment-15938937
 ] 

Saurabh Mahapatra commented on DRILL-5299:
--

[~zhzthecoder]: I am trying to get a better understanding of this. Is the goal 
to allow prioritizing queries at the queue level, i.e., an express lane vs. a 
normal lane?

> Query queues can be split by different users or different businesses
> 
>
> Key: DRILL-5299
> URL: https://issues.apache.org/jira/browse/DRILL-5299
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: Future
> Environment: RH Linux / OpenJDK 8
>Reporter: HZZ dep
>
> So far we can have two query queues that are split based on the cost model. To 
> serve multiple users and multiple businesses, Drill cluster maintainers would 
> like to control the query concurrency allocated to a particular user or 
> business. That means each user or business can have its own "query pool" 
> defined by a specific "pool size".
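The per-user "query pool" idea can be sketched with a semaphore per user, sized to that user's configured concurrency. This is illustrative only, not Drill's queueing code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Illustrative sketch: each user (or business) gets its own admission
// semaphore whose permit count is the configured "pool size".
class UserQueryPools {
    private final Map<String, Semaphore> pools = new ConcurrentHashMap<>();

    void configure(String user, int poolSize) {
        pools.put(user, new Semaphore(poolSize));
    }

    // Admit the query if the user has a pool with a free slot.
    boolean tryAdmit(String user) {
        Semaphore s = pools.get(user);
        return s != null && s.tryAcquire();
    }

    void release(String user) {
        pools.get(user).release();
    }
}
```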





[jira] [Commented] (DRILL-5101) Provide boot-time option to disable the Dynamic UDF feature

2017-03-23 Thread Gopi Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938880#comment-15938880
 ] 

Gopi Kumar commented on DRILL-5101:
---

Here is a (somewhat hacky) workaround for running Drill 1.9 or 1.10 embedded on 
Windows 2012 or 2016 under an admin account if you are facing this error. Run 
these commands before running sqlline.bat for the first time:

{code}
mkdir %userprofile%\drill
mkdir %userprofile%\drill\udf
mkdir %userprofile%\drill\udf\registry
mkdir %userprofile%\drill\udf\tmp
mkdir %userprofile%\drill\udf\staging

takeown /R /F %userprofile%\drill
{code}

Basically, pre-create the UDF directories and recursively reset the file and 
folder owner to yourself instead of builtin\Administrators. (Note: I did not 
test any UDF features in Drill itself after this; I was just trying to use the 
standard querying functionality.)




> Provide boot-time option to disable the Dynamic UDF feature
> ---
>
> Key: DRILL-5101
> URL: https://issues.apache.org/jira/browse/DRILL-5101
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Paul Rogers
>Assignee: Arina Ielchiieva
>Priority: Minor
>
> A Windows user on the mailing list could not start an embedded Drillbit 
> because the Dynamic UDF feature tried to create a directory on the user's 
> protected Users folder:
> {code}
> Error: Failure in starting embedded Drillbit: 
> org.apache.drill.common.exceptions
> .DrillRuntimeException: Error during udf area creation 
> [/C:/Users/ivy.chan/drill
> /udf/registry] on file system [file:///] (state=,code=0)
> java.sql.SQLException: Failure in starting embedded Drillbit: 
> org.apache.drill.c
> ommon.exceptions.DrillRuntimeException: Error during udf area creation 
> [/C:/User
> s/ivy.chan/drill/udf/registry] on file system [file:///]
>at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.(DrillConnection
> Impl.java:128)
>at 
> org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(Dril
> lJdbc41Factory.java:70)
>at 
> org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.ja
> va:69)
> {code}
> The fastest workaround (since this was an embedded Drillbit) would be to 
> disable the Dynamic UDF feature. Unfortunately, the only option to do so is a 
> runtime option that requires that the Drillbit be started. That creates a 
> vicious circle: we can't start the Drillbit unless we disable Dynamic UDFs, 
> but we can't disable them unless we start the Drillbit.
> The workaround might be to change the root directory, which is why this bug 
> is marked minor.
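The requested boot-time switch amounts to consulting a config flag before any UDF directory setup runs, so an embedded Drillbit can come up with the feature off. A sketch follows; the property name is hypothetical, not Drill's actual option:

```java
// Illustrative sketch of a boot-time kill switch for Dynamic UDFs.
// The property name "drill.exec.udf.dynamic.enabled" is hypothetical;
// the point is only that the check happens before UDF area creation.
class UdfBootstrap {
    static boolean dynamicUdfEnabled() {
        return Boolean.parseBoolean(
            System.getProperty("drill.exec.udf.dynamic.enabled", "true"));
    }
}
```

Startup code would skip the udf-area creation entirely when the flag is false, breaking the "must start the Drillbit to disable the feature" cycle described above.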





[jira] [Commented] (DRILL-5101) Provide boot-time option to disable the Dynamic UDF feature

2017-03-23 Thread Gopi Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938857#comment-15938857
 ] 

Gopi Kumar commented on DRILL-5101:
---

Thanks for the response. I agree we shouldn't hardcode the group name. 

So basically it looks like none of the three conditions succeeds on Windows when 
running embedded under the Windows administrator account. I believe that in 
Windows, file/folder security is based on ACLs rather than the u+g+o model of 
*nix systems, which appears to be what the three conditions check, hence all of 
them fail. I confirmed this problem does not exist when I run as a standard 
user. 

Are you proposing to just check that the directory exists and treat the rest as 
a warning? That would work for this case. Is this something that can be fixed in 
the next version? It would be really helpful for Windows users. This has been an 
issue only from Drill 1.9 onwards. 

> Provide boot-time option to disable the Dynamic UDF feature
> ---
>
> Key: DRILL-5101
> URL: https://issues.apache.org/jira/browse/DRILL-5101
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Paul Rogers
>Assignee: Arina Ielchiieva
>Priority: Minor
>
> A Windows user on the mailing list could not start an embedded Drillbit 
> because the Dynamic UDF feature tried to create a directory on the user's 
> protected Users folder:
> {code}
> Error: Failure in starting embedded Drillbit: 
> org.apache.drill.common.exceptions
> .DrillRuntimeException: Error during udf area creation 
> [/C:/Users/ivy.chan/drill
> /udf/registry] on file system [file:///] (state=,code=0)
> java.sql.SQLException: Failure in starting embedded Drillbit: 
> org.apache.drill.c
> ommon.exceptions.DrillRuntimeException: Error during udf area creation 
> [/C:/User
> s/ivy.chan/drill/udf/registry] on file system [file:///]
>at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.(DrillConnection
> Impl.java:128)
>at 
> org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(Dril
> lJdbc41Factory.java:70)
>at 
> org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.ja
> va:69)
> {code}
> The fastest workaround (since this was an embedded Drillbit) would be to 
> disable the Dynamic UDF feature. Unfortunately, the only option to do so is a 
> runtime option that requires that the Drillbit be started. That creates a 
> vicious circle: we can't start the Drillbit unless we disable Dynamic UDFs, 
> but we can't disable them unless we start the Drillbit.
> The workaround might be to change the root directory, which is why this bug 
> is marked minor.





[jira] [Commented] (DRILL-5376) Rationalize Drill's row structure for simpler code, better performance

2017-03-23 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938739#comment-15938739
 ] 

Jinfeng Ni commented on DRILL-5376:
---

I'm not fully convinced this is the right idea until I see a prototype 
showing the advantage of a row-based structure over a column-based one. 
Regarding name-based vs. index-based access: it's true that Drill's execution 
uses a name-based approach, instead of the index- or position-based approach 
commonly used in traditional RDBMSs. That's because the schema can differ, in 
terms of column order and additional columns, and the name-based approach is 
designed to handle that. For instance, given the two JSON files below, the 
query {{select A, B from dfs.`/path/to/jsonfiles`}} will work using the 
name-based approach. I'm not clear how it would work for position-based 
execution in your row-based structure. 

{code}
{"A" : "foo1",
 "B" : "foo2"
}
{"B" : "foo3",
 "A" : "foo4"
}
{code}

One point regarding the efficiency of the name-based approach: name-based 
resolution happens only at the batch level, not per row, and only when a new 
schema arrives. If the schema remains the same for subsequent batches, 
name-based resolution does not have to happen again. 
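The batch-level resolution described above can be sketched as follows (hypothetical class, not Drill's actual code): the name-to-index map is rebuilt only when an incoming batch carries a new schema; rows within a batch then use plain index access.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchColumnResolver {
    private List<String> currentSchema;                 // column names, in batch order
    private final Map<String, Integer> nameToIndex = new HashMap<>();

    /** Called once per incoming batch; rebuilds the map only on schema change. */
    public void onBatch(List<String> schema) {
        if (schema.equals(currentSchema)) {
            return;                                     // same schema: no name lookups needed
        }
        currentSchema = schema;
        nameToIndex.clear();
        for (int i = 0; i < schema.size(); i++) {
            nameToIndex.put(schema.get(i), i);
        }
    }

    public int indexOf(String column) {
        return nameToIndex.get(column);
    }
}
```

With the two JSON files above, the second file simply produces a batch whose schema has A and B reordered; the resolver remaps once and per-row access stays index-based.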





> Rationalize Drill's row structure for simpler code, better performance
> --
>
> Key: DRILL-5376
> URL: https://issues.apache.org/jira/browse/DRILL-5376
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>
> Drill is a columnar system, but data is ultimately represented as rows (AKA 
> records or tuples.) The way that Drill represents rows leads to excessive 
> code complexity and runtime cost.
> Data in Drill is stored in vectors: one (or more) per column. Vectors do not 
> stand alone, however, they are "bundled" into various forms of grouping: the 
> {{VectorContainer}}, {{RecordBatch}}, {{VectorAccessible}}, 
> {{VectorAccessibleSerializable}}, and more. Each has slightly different 
> semantics, requiring large amounts of code to bridge between the 
> representations.
> Consider only a simple row: one with only scalar columns. In classic 
> relational theory, such a row is a tuple:
> {code}
> R = (a, b, c, d, ...)
> {code}
> A tuple is defined as an ordered list of column values. Unlike a list or 
> array, the column values also have names and may have varying data types.
> In SQL, columns are referenced by either position or name. In most execution 
> engines, columns are referenced by position (since positions, in most 
> systems, cannot change.) A 1:1 mapping is provided between names and 
> positions. (See the JDBC {{ResultSet}} interface.)
> This allows code to be very fast: code references columns by index, not by 
> name, avoiding name lookups for each column reference.
> Drill provides a murky, hybrid approach. Some structures ({{BatchSchema}}, 
> for example) appear to provide a fixed column ordering, allowing indexed 
> column access. But, other abstractions provide only an iterator. Others (such 
> as {{VectorContainer}}) provide name-based access or, by clever programming, 
> indexed access.
> As a result, it is never clear exactly how to quickly access a column: by 
> name, by name to multi-part index to vector?
> Of course, Drill also supports maps, which add to the complexity. First, we 
> must understand that a "map" in Drill is not a "map" in the classic sense: it 
> is not a collection of (name, value) pairs in the JSON sense: a collection in 
> which each instance may have a different set of pairs.
> Instead, in Drill, a "map" is really a nested tuple: a map has the same 
> structure as a Drill record: a collection of names and values in which all 
> rows have the same structure. (This is so because maps are really a 
> collection of value vectors, and the vectors cut across all rows.)
> Drill, however, does not reflect this symmetry: that a row and a map are both 
> tuples. There are no common abstractions for the two. Instead, maps are 
> represented as a {{MapVector}} that contains a (name, vector) map for its 
> children.
> Because of this name-based mapping, high-speed indexed access to vectors is 
> not provided "out of the box." Certainly each consumer of a map can build its 
> own indexing mechanism. But, this leads to code complexity and redundancy.
> This ticket asks to rationalize Drill's row, map and schema abstractions 
> around the tuple concept. A schema is a description of a tuple and should (as 
> in JDBC) provide both name and index based access. That is, provide methods 
> of the form:
> {code}
> MaterializedField getField(int index);
> MaterializedField getField(String name);
> ...
> ValueVector getVector(int index);
> ValueVector getVector(String name);
> 
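The JDBC-style dual access proposed above could be sketched as below (hypothetical class; Drill's {{MaterializedField}} and {{ValueVector}} are replaced with plain strings to keep the sketch self-contained): columns live in positional order, with a name index kept alongside.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TupleSchema {
    // Columns are stored positionally; a name -> index map is maintained
    // alongside so both access styles are O(1).
    private final List<String> fields = new ArrayList<>();
    private final Map<String, Integer> byName = new HashMap<>();

    public void addField(String name) {
        byName.put(name, fields.size());
        fields.add(name);
    }

    public String getField(int index)   { return fields.get(index); }
    public String getField(String name) { return fields.get(byName.get(name)); }
    public int    indexOf(String name)  { return byName.get(name); }
}
```

Because a Drill map is itself a tuple, the same abstraction could describe both a row and a nested map, which is the symmetry the ticket asks for.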

[jira] [Comment Edited] (DRILL-5376) Rationalize Drill's row structure for simpler code, better performance

2017-03-23 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938739#comment-15938739
 ] 

Jinfeng Ni edited comment on DRILL-5376 at 3/23/17 4:44 PM:


I'm not fully convinced this is the right idea until I see a prototype 
showing the advantage of a row-based structure over a column-based one. 
Regarding name-based vs. index-based access: it's true that Drill's execution 
uses a name-based approach, instead of the index- or position-based approach 
commonly used in traditional RDBMSs. That's because the schema can differ, in 
terms of column order and additional columns, and the name-based approach is 
designed to handle that. For instance, given the two JSON files below, the 
query {{select A, B from dfs.`/path/to/jsonfiles`}} will work using the 
name-based approach. I'm not clear how it would work for position-based 
execution in your row-based structure. 

{code}
{"A" : "foo1",
 "B" : "foo2"
}
{"B" : "foo3",
 "A" : "foo4"
}
{code}

One point regarding the efficiency of the name-based approach: name-based 
resolution happens only at the batch level, not per row, and only when a new 
schema arrives. If the schema remains the same for subsequent batches, 
name-based resolution does not have to happen again. 






was (Author: jni):
I'm not fully convinced this is the right idea, until I see some prototype 
showing the advantage of row based structure over column based structure. For 
one thing regarding name based vs index based, it's true that Drill execution 
used name based approach, in stead of index or position based approach which is 
commonly used in traditional RDBMS. That's because schema could be different, 
in the sense of column order, additional columns, and the name based approach 
is designed to handle that. For instance, if I have two json files.   The query 
"select A,B from dfs.`/path/to/jsonfiles` will work using named based approach. 
I'm not clear how it would work for position-based execution in your row-based 
structure. 

{code}
{"A" : "foo1",
 "B" " "foo2"
}
{"B" : "foo3",
 "A" : "foo4"
}
{code}

One point regarding the efficiency of name based approach: the name-based 
resolution only happens at batch level, not at row level, and the name-based 
resolution only happens when there is a new schema. If the schema remains same 
for upcoming batches, name-based resolution does not have to happen. 





> Rationalize Drill's row structure for simpler code, better performance
> --
>
> Key: DRILL-5376
> URL: https://issues.apache.org/jira/browse/DRILL-5376
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>
> Drill is a columnar system, but data is ultimately represented as rows (AKA 
> records or tuples.) The way that Drill represents rows leads to excessive 
> code complexity and runtime cost.
> Data in Drill is stored in vectors: one (or more) per column. Vectors do not 
> stand alone, however, they are "bundled" into various forms of grouping: the 
> {{VectorContainer}}, {{RecordBatch}}, {{VectorAccessible}}, 
> {{VectorAccessibleSerializable}}, and more. Each has slightly different 
> semantics, requiring large amounts of code to bridge between the 
> representations.
> Consider only a simple row: one with only scalar columns. In classic 
> relational theory, such a row is a tuple:
> {code}
> R = (a, b, c, d, ...)
> {code}
> A tuple is defined as an ordered list of column values. Unlike a list or 
> array, the column values also have names and may have varying data types.
> In SQL, columns are referenced by either position or name. In most execution 
> engines, columns are referenced by position (since positions, in most 
> systems, cannot change.) A 1:1 mapping is provided between names and 
> positions. (See the JDBC {{ResultSet}} interface.)
> This allows code to be very fast: code references columns by index, not by 
> name, avoiding name lookups for each column reference.
> Drill provides a murky, hybrid approach. Some structures ({{BatchSchema}}, 
> for example) appear to provide a fixed column ordering, allowing indexed 
> column access. But, other abstractions provide only an iterator. Others (such 
> as {{VectorContainer}}) provide name-based access or, by clever programming, 
> indexed access.
> As a result, it is never clear exactly how to quickly access a column: by 
> name, by name to multi-part index to vector?
> Of course, Drill also supports maps, which add to the complexity. First, we 
> must understand that a "map" in Drill is not a "map" in the classic sense: it 
> is not a collection of (name, value) pairs in the JSON sense: a collection in 
> which each instance may have a different set of pairs.
> Instead, in Drill, a 

[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-23 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938592#comment-15938592
 ] 

Zelaine Fong commented on DRILL-5375:
-

[~arina] - thanks for your explanation on right/full joins.  Makes sense.

> Nested loop join: return correct result for left join
> -
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> ++-+-+---+
> | dt | fyq | who | event |
> ++-+-+---+
> | 2016-01-01 | NULL| aperson | went wild |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing  |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> ++-+-+---+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +-+--+--++
> | dt  |   fyq|   who|   event|
> +-+--+--++
> | 2016-12-26  | 2016-Q2  | aperson  | had chrsitmas  |
> | 2017-01-06  | 2016-Q3  | aperson  | did somthing   |
> | 2017-01-12  | 2016-Q3  | aperson  | did somthing else  |
> +-+--+--++
> 3 rows selected (2.523 seconds)
> {code}





[jira] [Comment Edited] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-23 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938401#comment-15938401
 ] 

Arina Ielchiieva edited comment on DRILL-5375 at 3/23/17 3:46 PM:
--

[~amansinha100]
Yes, this issue is specific to non-equi join conditions. We can't always split 
a non-equi join condition into two sides as we do with equi-join conditions.
Example: a non-equi join condition can contain a self-join predicate that 
cannot be transformed into a filter because an OR clause is present:
{noformat}select * from t1 inner join t2 on t1.c1 >= t2.c1 or t1.c3 <> 
t1.c4{noformat}
So for a non-equi join we transform the whole logical expression together (as 
Filter does, but Filter does it for an expression whose fields come from only 
one input); with a nested loop join, the logical expression contains fields 
from both inputs. Thus, during code generation we need some indication of 
which batch each field should be taken from.
I have created a PR; I hope it sheds more light. Please let me know if 
anything can be optimized or if I have missed something.
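To illustrate the code-generation point, here is a hand-written stand-in (hypothetical names, not the actual generated code) for the example condition: the whole expression is evaluated at once, so each field reference must state which batch, left or right, it reads from.

```java
public class NestedLoopJoinConditionSketch {
    // Stand-in for generated code evaluating the example condition
    //   t1.c1 >= t2.c1 OR t1.c3 <> t1.c4
    // t1 fields come from the LEFT batch, t2 fields from the RIGHT batch;
    // the generator must tag each reference with its source batch.
    static boolean matches(int[] leftRow, int[] rightRow) {
        int t1c1 = leftRow[0];   // field from the LEFT batch
        int t1c3 = leftRow[1];
        int t1c4 = leftRow[2];
        int t2c1 = rightRow[0];  // field from the RIGHT batch
        return t1c1 >= t2c1 || t1c3 != t1c4;
    }
}
```

A filter, by contrast, would see fields from a single input only, so no such batch tag is needed there.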


was (Author: arina):
[~amansinha100]
Yes, this issue is specific for non-equi join conditions. We can't always split 
into two sides as with equi-join conditions.
Example: it can contain self join which can not be transformed into filter 
since OR clause is present 
{noformat}select * from t1 inner join t2 on t1.c1 >= t2.c1 or t1.c3 <> 
t1.c4{noformat}
So for non-equi join we transform the whole logical expression together (as 
filter does but it does it for expression that contains fields only from one 
input), in case with nested loop join, logical expression contains fields from 
two inputs. Thus we need some indication from which batch field should be taken 
during code generation.
I have create PR, hope it will shred more light. But please let me know if 
anything can be optimized or I have missed something.

> Nested loop join: return correct result for left join
> -
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> ++-+-+---+
> | dt | fyq | who | event |
> ++-+-+---+
> | 2016-01-01 | NULL| aperson | went wild |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing  |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> ++-+-+---+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +-+--+--++
> | dt  |   fyq|   who|   event|
> +-+--+--++
> | 2016-12-26  | 2016-Q2  | aperson  | had chrsitmas  |
> | 2017-01-06  | 2016-Q3  | aperson  | did somthing   |
> | 2017-01-12  | 2016-Q3  | aperson  | did somthing else  |
> +-+--+--++
> 3 rows selected (2.523 seconds)
> {code}





[jira] [Updated] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-23 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5375:

Reviewer: Aman Sinha

Assigned Reviewer to [~amansinha100]

> Nested loop join: return correct result for left join
> -
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> ++-+-+---+
> | dt | fyq | who | event |
> ++-+-+---+
> | 2016-01-01 | NULL| aperson | went wild |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing  |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> ++-+-+---+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +-+--+--++
> | dt  |   fyq|   who|   event|
> +-+--+--++
> | 2016-12-26  | 2016-Q2  | aperson  | had chrsitmas  |
> | 2017-01-06  | 2016-Q3  | aperson  | did somthing   |
> | 2017-01-12  | 2016-Q3  | aperson  | did somthing else  |
> +-+--+--++
> 3 rows selected (2.523 seconds)
> {code}





[jira] [Reopened] (DRILL-5164) Equi-join query results in CompileException when inputs have large number of columns

2017-03-23 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reopened DRILL-5164:
-
  Assignee: Volodymyr Vysotskyi  (was: Serhii Harnyk)

Reopened, based on [~khfaraaz]'s findings that the repro query fails, albeit 
with a different error.

> Equi-join query results in CompileException when inputs have large number of 
> columns
> 
>
> Key: DRILL-5164
> URL: https://issues.apache.org/jira/browse/DRILL-5164
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.9.0
>Reporter: Khurram Faraaz
>Assignee: Volodymyr Vysotskyi
>Priority: Critical
> Fix For: 1.10.0
>
> Attachments: manyColsInJson.json
>
>
> Drill 1.9.0 
> git commit ID : 4c1b420b
> 4 node CentOS cluster
> JSON file has 4095 keys (columns)
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select * from `manyColsInJson.json` t1, 
> `manyColsInJson.json` t2 where t1.key2000 = t2.key2000;
> Error: SYSTEM ERROR: CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashJoinProbeGen294.java]',
>  Line 16397, Column 17: HashJoinProbeGen294.java:16397: error: code too large
> public void doSetup(FragmentContext context, VectorContainer buildBatch, 
> RecordBatch probeBatch, RecordBatch outgoing)
> ^ (compiler.err.limit.code)
> Fragment 0:0
> [Error Id: 7d0efa7e-e183-4c40-939a-4908699f94bf on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2016-12-26 09:52:11,321 [279f17fd-c8f0-5d18-1124-76099f0a5cc8:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashJoinProbeGen294.java]',
>  Line 16397, Column 17: HashJoinProbeGen294.java:16397: error: code too large
> public void doSetup(FragmentContext context, VectorContainer buildBatch, 
> RecordBatch probeBatch, RecordBatch outgoing)
> ^ (compiler.err.limit.code)
> Fragment 0:0
> [Error Id: 7d0efa7e-e183-4c40-939a-4908699f94bf on centos-01.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashJoinProbeGen294.java]',
>  Line 16397, Column 17: HashJoinProbeGen294.java:16397: error: code too large
> public void doSetup(FragmentContext context, VectorContainer buildBatch, 
> RecordBatch probeBatch, RecordBatch outgoing)
> ^ (compiler.err.limit.code)
> Fragment 0:0
> [Error Id: 7d0efa7e-e183-4c40-939a-4908699f94bf on centos-01.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:293)
>  [drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>  [drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:262)
>  [drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.9.0.jar:1.9.0]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_91]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_91]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: 
> org.apache.drill.exec.exception.SchemaChangeException: 
> org.apache.drill.exec.exception.ClassTransformationException: 
> java.util.concurrent.ExecutionException: 
> org.apache.drill.exec.exception.ClassTransformationException: Failure 
> generating transformation classes for value:
> package org.apache.drill.exec.test.generated;
> ...
> public class HashJoinProbeGen294 {
> NullableVarCharVector[] vv0;
> NullableVarCharVector vv3;
> NullableVarCharVector[] vv6;
> ...
> vv49137 .copyFromSafe((probeIndex), (outIndex), vv49134);
> vv49143 .copyFromSafe((probeIndex), (outIndex), vv49140);
> vv49149 .copyFromSafe((probeIndex), (outIndex), vv49146);
> }
> }
> 
> public void __DRILL_INIT__()
> throws SchemaChangeException
> {
> }
> }
> at 
> org.apache.drill.exec.compile.ClassTransformer.getImplementationClass(ClassTransformer.java:302)
>  ~[drill-java-exec-1.9.0.jar:1.9.0]
> at 
> 

[jira] [Commented] (DRILL-4938) Report UserException when constant expression reduction fails

2017-03-23 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938565#comment-15938565
 ] 

Zelaine Fong commented on DRILL-4938:
-

[~khfaraaz] - the gist of this fix is to report a better error, not to 
eliminate the error.  I see that the query now returns a "PLAN ERROR" instead 
of a "SYSTEM ERROR".  

> Report UserException when constant expression reduction fails
> -
>
> Key: DRILL-4938
> URL: https://issues.apache.org/jira/browse/DRILL-4938
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Khurram Faraaz
>Assignee: Serhii Harnyk
>Priority: Minor
> Fix For: 1.10.0
>
>
> We need a better error message instead of DrillRuntimeException
> Drill 1.9.0 git commit ID : 4edabe7a
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select (res1 = 2016/09/22) res2
> . . . . . . . . . . . . . . > from
> . . . . . . . . . . . . . . > (
> . . . . . . . . . . . . . . > select (case when (false) then null else 
> cast('2016/09/22' as date) end) res1
> . . . . . . . . . . . . . . > from (values(1)) foo
> . . . . . . . . . . . . . . > ) foobar;
> Error: SYSTEM ERROR: DrillRuntimeException: Failure while materializing 
> expression in constant expression evaluator [CASE(false, =(null, /(/(2016, 
> 9), 22)), =(CAST('2016/09/22'):DATE NOT NULL, /(/(2016, 9), 22)))].  Errors:
> Error in expression at index -1.  Error: Missing function implementation: 
> [castTIMESTAMP(INT-REQUIRED)].  Full expression: --UNKNOWN EXPRESSION--.
> Error in expression at index -1.  Error: Missing function implementation: 
> [castTIMESTAMP(INT-REQUIRED)].  Full expression: --UNKNOWN EXPRESSION--.
> {noformat}





[jira] [Assigned] (DRILL-4139) Exception while trying to prune partition. java.lang.UnsupportedOperationException: Unsupported type: BIT

2017-03-23 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-4139:
---

Assignee: Volodymyr Vysotskyi  (was: Aman Sinha)

Volodymyr -- this might be a good issue for you to start with.  There is 
already a pull request, but it looks like it's missing a unit test.  Can you 
add the unit test and then do the necessary testing?  Thanks.

> Exception while trying to prune partition. 
> java.lang.UnsupportedOperationException: Unsupported type: BIT
> -
>
> Key: DRILL-4139
> URL: https://issues.apache.org/jira/browse/DRILL-4139
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.3.0
> Environment: 4 node cluster on CentOS
>Reporter: Khurram Faraaz
>Assignee: Volodymyr Vysotskyi
>
> Exception while trying to prune partition.
> java.lang.UnsupportedOperationException: Unsupported type: BIT
> is seen in drillbit.log after Functional run on 4 node cluster.
> Drill 1.3.0 sys.version => d61bb83a8
> {code}
> 2015-11-27 03:12:19,809 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
> class: org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2
> 2015-11-27 03:12:19,809 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
> filter tree: 0 ms
> 2015-11-27 03:12:19,810 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] WARN  
> o.a.d.e.p.l.partition.PruneScanRule - Exception while trying to prune 
> partition.
> java.lang.UnsupportedOperationException: Unsupported type: BIT
> at 
> org.apache.drill.exec.store.parquet.ParquetGroupScan.populatePruningVector(ParquetGroupScan.java:479)
>  ~[drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.ParquetPartitionDescriptor.populatePartitionVectors(ParquetPartitionDescriptor.java:96)
>  ~[drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:235)
>  ~[drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
>  [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808)
>  [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at 
> org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) 
> [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at 
> org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:303) 
> [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.logicalPlanningVolcanoAndLopt(DefaultSqlHandler.java:545)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:213)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:248)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:164)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:184)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:905) 
> [drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:244) 
> [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> {code}





[jira] [Commented] (DRILL-4301) OOM : Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.

2017-03-23 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938552#comment-15938552
 ] 

Zelaine Fong commented on DRILL-4301:
-

[~Paul.Rogers] - did you not see the partition pruning error when you tested 
this?

[~khfaraaz] - the partition pruning error you're seeing looks like DRILL-4139, 
which you previously reported.  It looks like there is a pull request for that 
issue, but it didn't get completed.  I will reassign that issue.

> OOM : Unable to allocate sv2 for 1000 records, and not enough batchGroups to 
> spill.
> ---
>
> Key: DRILL-4301
> URL: https://issues.apache.org/jira/browse/DRILL-4301
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Affects Versions: 1.5.0
> Environment: 4 node cluster
>Reporter: Khurram Faraaz
>Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> Query below in Functional tests, fails due to OOM 
> {code}
> select * from dfs.`/drill/testdata/metadata_caching/fewtypes_boolpartition` 
> where bool_col = true;
> {code}
> Drill version : drill-1.5.0
> JAVA_VERSION=1.8.0
> {noformat}
> version   commit_id   commit_message  commit_time build_email 
> build_time
> 1.5.0-SNAPSHOT2f0e3f27e630d5ac15cdaef808564e01708c3c55
> DRILL-4190 Don't hold on to batches from left side of merge join.   
> 20.01.2016 @ 22:30:26 UTC   Unknown 20.01.2016 @ 23:48:33 UTC
> framework/framework/resources/Functional/metadata_caching/data/bool_partition1.q
>  (connection: 808078113)
> [#1378] Query failed: 
> oadd.org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: 
> One or more nodes ran out of memory while executing the query.
> Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.
> batchGroups.size 0
> spilledBatchGroups.size 0
> allocated memory 48326272
> allocator limit 46684427
> Fragment 0:0
> [Error Id: 97d58ea3-8aff-48cf-a25e-32363b8e0ecd on drill-demod2:31010]
>   at 
> oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119)
>   at 
> oadd.org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113)
>   at 
> oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
>   at 
> oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
>   at oadd.org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:67)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:374)
>   at 
> oadd.org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:252)
>   at 
> oadd.org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:285)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:257)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> 

[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-23 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938401#comment-15938401
 ] 

Arina Ielchiieva commented on DRILL-5375:
-

[~amansinha100]
Yes, this issue is specific to non-equi join conditions. We can't always split 
the condition into two sides as we do with equi-join conditions.
Example: the condition can contain a self join which cannot be transformed into 
a filter since an OR clause is present:
{noformat}select * from t1 inner join t2 on t1.c1 >= t2.c1 or t1.c3 <> 
t1.c4{noformat}
So for non-equi joins we transform the whole logical expression together (the 
filter operator does the same, but its expression contains fields from only one 
input; in the nested loop join case the logical expression contains fields from 
two inputs). Thus we need some indication of which batch a field should be 
taken from during code generation.
I have created a PR; hope it will shed more light. But please let me know if 
anything can be optimized or I have missed something.
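As an illustrative sketch (plain Python, not Drill's generated code) of why such a condition must be evaluated over each pair of rows from both inputs, rather than being split into per-side filters:

```python
# Hypothetical sketch (not Drill's actual codegen): a non-equi join
# condition referencing fields from both inputs has to be evaluated per
# (left, right) row pair; the OR prevents splitting it per side.

def nested_loop_join(left_rows, right_rows, condition):
    """Return every (l, r) pair for which the joint condition holds."""
    return [(l, r) for l in left_rows for r in right_rows if condition(l, r)]

left_rows = [{"c1": 5, "c3": 1, "c4": 2}]
right_rows = [{"c1": 3}, {"c1": 7}]

# t1.c1 >= t2.c1 OR t1.c3 <> t1.c4: the evaluator must know which
# input each field comes from.
def condition(l, r):
    return l["c1"] >= r["c1"] or l["c3"] != l["c4"]

result = nested_loop_join(left_rows, right_rows, condition)
```

Both right rows match here: the first via `c1 >= c1`, the second only because the `c3 <> c4` branch of the OR holds, which is exactly the part that cannot be pushed to one side.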

> Nested loop join: return correct result for left join
> -
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> ++-+-+---+
> | dt | fyq | who | event |
> ++-+-+---+
> | 2016-01-01 | NULL| aperson | went wild |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing  |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> ++-+-+---+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +-+--+--++
> | dt  |   fyq|   who|   event|
> +-+--+--++
> | 2016-12-26  | 2016-Q2  | aperson  | had chrsitmas  |
> | 2017-01-06  | 2016-Q3  | aperson  | did somthing   |
> | 2017-01-12  | 2016-Q3  | aperson  | did somthing else  |
> +-+--+--++
> 3 rows selected (2.523 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938399#comment-15938399
 ] 

ASF GitHub Bot commented on DRILL-5375:
---

GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/794

DRILL-5375: Nested loop join: return correct result for left join

With this fix nested loop join will correctly process INNER and LEFT joins 
with non-equality conditions.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-5375

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/794.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #794


commit 71628e70a525d9bd27b4f5f56259dce84c75154d
Author: Arina Ielchiieva 
Date:   2017-03-22T15:07:23Z

DRILL-5375: Nested loop join: return correct result for left join




> Nested loop join: return correct result for left join
> -
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> ++-+-+---+
> | dt | fyq | who | event |
> ++-+-+---+
> | 2016-01-01 | NULL| aperson | went wild |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing  |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> ++-+-+---+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +-+--+--++
> | dt  |   fyq|   who|   event|
> +-+--+--++
> | 2016-12-26  | 2016-Q2  | aperson  | had chrsitmas  |
> | 2017-01-06  | 2016-Q3  | aperson  | did somthing   |
> | 2017-01-12  | 2016-Q3  | aperson  | did somthing else  |
> +-+--+--++
> 3 rows selected (2.523 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5101) Provide boot-time option to disable the Dynamic UDF feature

2017-03-23 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938269#comment-15938269
 ] 

Arina Ielchiieva commented on DRILL-5101:
-

Actually, we don't expect the directory to be owned by the user, only to be 
writable by the user:
{noformat}
 233   // It is considered that process user has write rights on directory 
if:
 234   // 1. process user is owner of the directory and has write rights
 235   // 2. process user is in group that has write rights
 236   // 3. any user has write rights
{noformat}
Well, since Drill start-up fails, I guess all these checks return false.
Adding a check for admin vs "builtin\Administrators" looks like hard-coding: 
say on the next Windows Server version such a group is renamed to 
"builtin\Admins", and we'd need to update the code again, and so on. Actually, 
since these permission checks cause so much trouble, I was thinking of keeping 
the check that the directory exists and is actually a directory enforced, but 
downgrading the permission check to a warning.
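A minimal POSIX sketch of the three rules quoted above (assumed logic for illustration, not Drill's actual implementation):

```python
import os
import stat

def process_user_can_write(path, uid, gids):
    """Rough version of the three write-permission rules:
    owner+write, group+write, or other+write."""
    st = os.stat(path)
    mode = st.st_mode
    if st.st_uid == uid and mode & stat.S_IWUSR:   # rule 1: owner has write
        return True
    if st.st_gid in gids and mode & stat.S_IWGRP:  # rule 2: group has write
        return True
    return bool(mode & stat.S_IWOTH)               # rule 3: anyone has write
```

On Windows, POSIX mode bits are only emulated by the file system layer, which is presumably why all three rules can evaluate to false even for a directory the user can in fact write to.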

> Provide boot-time option to disable the Dynamic UDF feature
> ---
>
> Key: DRILL-5101
> URL: https://issues.apache.org/jira/browse/DRILL-5101
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Paul Rogers
>Assignee: Arina Ielchiieva
>Priority: Minor
>
> A Windows user on the mailing list could not start an embedded Drillbit 
> because the Dynamic UDF feature tried to create a directory on the user's 
> protected Users folder:
> {code}
> Error: Failure in starting embedded Drillbit: 
> org.apache.drill.common.exceptions
> .DrillRuntimeException: Error during udf area creation 
> [/C:/Users/ivy.chan/drill
> /udf/registry] on file system [file:///] (state=,code=0)
> java.sql.SQLException: Failure in starting embedded Drillbit: 
> org.apache.drill.c
> ommon.exceptions.DrillRuntimeException: Error during udf area creation 
> [/C:/User
> s/ivy.chan/drill/udf/registry] on file system [file:///]
>at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.(DrillConnection
> Impl.java:128)
>at 
> org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(Dril
> lJdbc41Factory.java:70)
>at 
> org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.ja
> va:69)
> {code}
> The fastest workaround (since this was an embedded Drillbit) would be to 
> disable the Dynamic UDF feature. Unfortunately, the only option to do so is a 
> runtime option that requires that the Drillbit be started. That creates a 
> vicious circle: we can't start the Drillbit unless we disable Dynamic UDFs, 
> but we can't disable them unless we start the Drillbit.
> The workaround might be to change the root directory, which is why this bug 
> is marked minor.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (DRILL-4872) NPE from CTAS partitioned by a projected casted null

2017-03-23 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz closed DRILL-4872.
-

> NPE from CTAS partitioned by a projected casted null
> 
>
> Key: DRILL-4872
> URL: https://issues.apache.org/jira/browse/DRILL-4872
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.7.0
>Reporter: Boaz Ben-Zvi
>Assignee: Arina Ielchiieva
>  Labels: NPE
> Fix For: 1.10.0
>
>
> Extracted from DRILL-3898 : Running the same test case on a smaller table ( 
> store_sales.dat from TPCDS SF 1) has no space issues, but there is a Null 
> Pointer Exception from the projection:
> Caused by: java.lang.NullPointerException: null
>   at 
> org.apache.drill.exec.expr.fn.impl.ByteFunctionHelpers.compare(ByteFunctionHelpers.java:100)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.test.generated.ProjectorGen1.doEval(ProjectorTemplate.java:49)
>  ~[na:na]
>   at 
> org.apache.drill.exec.test.generated.ProjectorGen1.projectRecords(ProjectorTemplate.java:62)
>  ~[na:na]
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:199)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> A simplified version of the test case:
> 0: jdbc:drill:zk=local> create table dfs.tmp.ttt partition by ( x ) as select 
> case when columns[8] = '' then cast(null as varchar(10)) else cast(columns[8] 
> as varchar(10)) end as x FROM 
> dfs.`/Users/boazben-zvi/data/store_sales/store_sales.dat`;
> Error: SYSTEM ERROR: NullPointerException
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4872) NPE from CTAS partitioned by a projected casted null

2017-03-23 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938259#comment-15938259
 ] 

Khurram Faraaz commented on DRILL-4872:
---

Verified and automated. The test Functional/tpcds/sanity/text/drill_4872.sql 
was added to a private branch.


> NPE from CTAS partitioned by a projected casted null
> 
>
> Key: DRILL-4872
> URL: https://issues.apache.org/jira/browse/DRILL-4872
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.7.0
>Reporter: Boaz Ben-Zvi
>Assignee: Arina Ielchiieva
>  Labels: NPE
> Fix For: 1.10.0
>
>
> Extracted from DRILL-3898 : Running the same test case on a smaller table ( 
> store_sales.dat from TPCDS SF 1) has no space issues, but there is a Null 
> Pointer Exception from the projection:
> Caused by: java.lang.NullPointerException: null
>   at 
> org.apache.drill.exec.expr.fn.impl.ByteFunctionHelpers.compare(ByteFunctionHelpers.java:100)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.test.generated.ProjectorGen1.doEval(ProjectorTemplate.java:49)
>  ~[na:na]
>   at 
> org.apache.drill.exec.test.generated.ProjectorGen1.projectRecords(ProjectorTemplate.java:62)
>  ~[na:na]
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:199)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> A simplified version of the test case:
> 0: jdbc:drill:zk=local> create table dfs.tmp.ttt partition by ( x ) as select 
> case when columns[8] = '' then cast(null as varchar(10)) else cast(columns[8] 
> as varchar(10)) end as x FROM 
> dfs.`/Users/boazben-zvi/data/store_sales/store_sales.dat`;
> Error: SYSTEM ERROR: NullPointerException
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-23 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938213#comment-15938213
 ] 

Arina Ielchiieva edited comment on DRILL-5375 at 3/23/17 12:57 PM:
---

[~zfong]
Actually, I have analyzed nested loop join and the types of joins it can 
support, and came to the conclusion that it should support INNER and LEFT joins 
only, as was done initially (before my changes). Basically, nested loop join is 
not a good candidate for RIGHT or FULL joins because of its implementation 
specifics; the planner won't pick nested loop join for such joins as it's not 
optimal. If we want non-equi conditions with RIGHT and FULL joins, then we need 
to add support for non-equi joins in hash and merge joins, which are much 
better candidates for those join types.

The main idea of nested loop join is that it buffers the data from the right 
table (which should be small enough) and, for each left table record, checks 
whether any right table record satisfies the join condition. Let's say we try 
to allow RIGHT and FULL joins for nested loop join in Drill.
Pre-conditions:
2 drillbits (we assume the join will be performed on two nodes)
2 tables:
T1
||c1||
|A|
|B|
  any other letters except C
T2
||c1||
|A|
|B|
|C|

Query: select * from t1 right join t2 on t1.c1 =  t2.c1
Expected result:
||t1.c1||t2.c1||
|A|A|
|B|B|
|null|C|

Drill buffers the T2 table on each node.
Drillbit_1 receives a batch from T1 (let's imagine our batches contain only one 
row): *A*. It iterates over the right input data and finds a match: *A|A*. It 
also marks that no match was found for B and C.
Drillbit_2 receives a batch from T1: *B*. It iterates over the right input data 
and finds a match: *B|B*. It also marks that no match was found for A and C.

Now, to return correct RIGHT join output, we need to take the statistics from 
the two nodes (T2 rows that didn't find a match) and intersect them: *{B, C} 
and {A, C} => {C}*, then additionally output *null|C*.
Presumably there would be one node that waits for the output from all nodes.

This doesn't really fit Drill's batch iteration approach and may be too tricky 
to implement. Thus, as I have mentioned before, if we need RIGHT or FULL joins 
with non-equi conditions, it would be more correct to add non-equi join support 
to hash and merge joins.

Regarding planner.enable_join_optimization: it is enabled by default. A user 
may want to disable it for the reasons described in the comment above.
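A small Python sketch of the per-node bookkeeping described above (illustrative only; the single-column equality condition and node layout are simplifications):

```python
def node_join(t1_slice, t2):
    """Nested loop join of one node's slice of T1 against the fully
    buffered T2, tracking which T2 rows found no match on this node."""
    matched_pairs = []
    unmatched_t2 = set(t2)
    for left in t1_slice:
        for right in t2:
            if left == right:          # join condition: t1.c1 = t2.c1
                matched_pairs.append((left, right))
                unmatched_t2.discard(right)
    return matched_pairs, unmatched_t2

t2 = ["A", "B", "C"]                         # buffered on every node
pairs1, unmatched1 = node_join(["A"], t2)    # drillbit_1 sees only "A"
pairs2, unmatched2 = node_join(["B"], t2)    # drillbit_2 sees only "B"

# A coordinating node must intersect the per-node unmatched sets before
# any RIGHT-join null rows can be emitted -- the synchronization step
# that does not fit Drill's batch iteration model.
globally_unmatched = unmatched1 & unmatched2
right_join = pairs1 + pairs2 + [(None, r) for r in sorted(globally_unmatched)]
```

Each node alone believes two T2 rows are unmatched; only the cross-node intersection reveals that just *C* truly has no match.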




was (Author: arina):
[~zfong]
Actually, I have analyzed nested loop join and the types of joins it can 
support, and came to the conclusion that it should support INNER and LEFT joins 
only, as was done initially (before my changes). Basically, nested loop join is 
not a good candidate for RIGHT or FULL joins because of its implementation 
specifics; the planner won't pick nested loop join for such joins as it's not 
optimal. If we want non-equi conditions with RIGHT and FULL joins, then we need 
to add support for non-equi joins in hash and merge joins, which are much 
better candidates for those join types.

The main idea of nested loop join is that it buffers the data from the right 
table (which should be small enough) and, for each left table record, checks 
whether any right table record satisfies the join condition. Let's say we try 
to allow RIGHT and FULL joins for nested loop join in Drill.
Pre-conditions:
2 drillbits (we assume the join will be performed on two nodes)
2 tables:
T1
||c1||
|A|
|B|
  any other letters except C
T2
||c1||
|A|
|B|
|C|

Query: select * from t1 right join t2 on t1.c1 =  t2.c1
Expected result:
||t1.c1||t2.c1||
|A|A|
|B|B|
|null|C|

Drill buffers the T2 table on each node.
Drillbit_1 receives a batch from T1 (let's imagine our batches contain only one 
row): *A*. It iterates over the right input data and finds a match: *A|A*. It 
also marks that no match was found for B and C.
Drillbit_2 receives a batch from T1: *B*. It iterates over the right input data 
and finds a match: *B|B*. It also marks that no match was found for A and C.

Now, to return correct RIGHT join output, we need to take the statistics from 
the two nodes (T2 rows that didn't find a match) and intersect them: *{B, C} 
and {A, C} => {C}*, then additionally output *null|C*.
Presumably there would be one node that waits for the output from all nodes.

This doesn't really fit Drill's batch iteration approach and may be too tricky 
to implement. Thus, as I have mentioned before, if we need RIGHT or FULL joins 
with non-equi conditions, it would be more correct to add non-equi join support 
to hash and merge joins.




> Nested loop join: return correct result for left join
> -
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: 

[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-23 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938213#comment-15938213
 ] 

Arina Ielchiieva commented on DRILL-5375:
-

[~zfong]
Actually, I have analyzed nested loop join and the types of joins it can 
support, and came to the conclusion that it should support INNER and LEFT joins 
only, as was done initially (before my changes). Basically, nested loop join is 
not a good candidate for RIGHT or FULL joins because of its implementation 
specifics; the planner won't pick nested loop join for such joins as it's not 
optimal. If we want non-equi conditions with RIGHT and FULL joins, then we need 
to add support for non-equi joins in hash and merge joins, which are much 
better candidates for those join types.

The main idea of nested loop join is that it buffers the data from the right 
table (which should be small enough) and, for each left table record, checks 
whether any right table record satisfies the join condition. Let's say we try 
to allow RIGHT and FULL joins for nested loop join in Drill.
Pre-conditions:
2 drillbits (we assume the join will be performed on two nodes)
2 tables:
T1
||c1||
|A|
|B|
  any other letters except C
T2
||c1||
|A|
|B|
|C|

Query: select * from t1 right join t2 on t1.c1 =  t2.c1
Expected result:
||t1.c1||t2.c1||
|A|A|
|B|B|
|null|C|

Drill buffers the T2 table on each node.
Drillbit_1 receives a batch from T1 (let's imagine our batches contain only one 
row): *A*. It iterates over the right input data and finds a match: *A|A*. It 
also marks that no match was found for B and C.
Drillbit_2 receives a batch from T1: *B*. It iterates over the right input data 
and finds a match: *B|B*. It also marks that no match was found for A and C.

Now, to return correct RIGHT join output, we need to take the statistics from 
the two nodes (T2 rows that didn't find a match) and intersect them: *{B, C} 
and {A, C} => {C}*, then additionally output *null|C*.
Presumably there would be one node that waits for the output from all nodes.

This doesn't really fit Drill's batch iteration approach and may be too tricky 
to implement. Thus, as I have mentioned before, if we need RIGHT or FULL joins 
with non-equi conditions, it would be more correct to add non-equi join support 
to hash and merge joins.




> Nested loop join: return correct result for left join
> -
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> ++-+-+---+
> | dt | fyq | who | event |
> ++-+-+---+
> | 2016-01-01 | NULL| aperson | went wild |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing  |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> ++-+-+---+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +-+--+--++
> | dt  |   fyq|   who|   event|
> +-+--+--++
> | 2016-12-26  | 2016-Q2  | aperson  | had chrsitmas  |
> | 2017-01-06  | 2016-Q3  | aperson  | did somthing   |
> | 2017-01-12  | 2016-Q3  | aperson  | did somthing else  |
> +-+--+--++
> 3 rows selected (2.523 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4678) Tune metadata by generating a dispatcher at runtime

2017-03-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938150#comment-15938150
 ] 

ASF GitHub Bot commented on DRILL-4678:
---

Github user Serhii-Harnyk commented on a diff in the pull request:

https://github.com/apache/drill/pull/793#discussion_r107647154
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdRowCount.java
 ---
@@ -14,35 +14,71 @@
  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  * See the License for the specific language governing permissions and
  * limitations under the License.
- 
**/
+ */
 package org.apache.drill.exec.planner.cost;
 
+import org.apache.calcite.rel.SingleRel;
 import org.apache.calcite.rel.core.Aggregate;
 import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.core.Join;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.core.Sort;
+import org.apache.calcite.rel.core.Union;
 import org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider;
 import org.apache.calcite.rel.metadata.RelMdRowCount;
 import org.apache.calcite.rel.metadata.RelMetadataProvider;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
 import org.apache.calcite.util.BuiltInMethod;
 import org.apache.calcite.util.ImmutableBitSet;
+import org.apache.drill.exec.planner.common.DrillLimitRelBase;
 
 public class DrillRelMdRowCount extends RelMdRowCount{
   private static final DrillRelMdRowCount INSTANCE = new 
DrillRelMdRowCount();
 
   public static final RelMetadataProvider SOURCE = 
ReflectiveRelMetadataProvider.reflectiveSource(BuiltInMethod.ROW_COUNT.method, 
INSTANCE);
 
   @Override
-  public Double getRowCount(Aggregate rel) {
+  public Double getRowCount(Aggregate rel, RelMetadataQuery mq) {
 ImmutableBitSet groupKey = ImmutableBitSet.range(rel.getGroupCount());
 
 if (groupKey.isEmpty()) {
   return 1.0;
 } else {
-  return super.getRowCount(rel);
+  return super.getRowCount(rel, mq);
 }
   }
 
   @Override
-  public Double getRowCount(Filter rel) {
-return rel.getRows();
+  public Double getRowCount(Filter rel, RelMetadataQuery mq) {
--- End diff --

In RelMdRowCount.java, overloaded methods were added which return a different 
result than getRowCount(RelNode rel, RelMetadataQuery mq) [1] did before. 
So, in order to preserve the previous Drill behavior, I override those methods.
[1] 
https://github.com/Serhii-Harnyk/incubator-calcite/blob/02848c99d75f2d0e00c219f3fa300fc8e7df26df/core/src/main/java/org/apache/calcite/rel/metadata/RelMdRowCount.java#L65
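An illustrative Python sketch of that pattern (the names and the selectivity factor are simplified stand-ins, not Calcite's actual API): when an upgraded base class changes what an inherited method returns, the subclass overrides it to preserve the old behavior.

```python
class RelMdRowCount:
    """Stand-in for the upgraded library class: the new default applies
    a selectivity guess instead of returning the node's own row count."""
    def get_row_count(self, rel):
        return rel.rows * 0.25

class DrillRelMdRowCount(RelMdRowCount):
    """Stand-in for the Drill subclass: override to preserve the
    pre-upgrade behavior of returning the exact row count."""
    def get_row_count(self, rel):
        return float(rel.rows)

class Rel:
    """Minimal relational node carrying only a row count."""
    def __init__(self, rows):
        self.rows = rows

base_estimate = RelMdRowCount().get_row_count(Rel(100))       # new default
drill_estimate = DrillRelMdRowCount().get_row_count(Rel(100)) # preserved
```

The override keeps Drill's planner costing stable across the library upgrade without forking the base class.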


> Tune metadata by generating a dispatcher at runtime
> ---
>
> Key: DRILL-4678
> URL: https://issues.apache.org/jira/browse/DRILL-4678
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.7.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Serhii Harnyk
>Priority: Critical
> Attachments: hung_Date_Query.log
>
>
> Below query hangs
> {noformat}
> 2016-05-16 10:33:57,506 [28c65de9-9f67-dadb-5e4e-e1a12f8dda49:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 28c65de9-9f67-dadb-5e4e-e1a12f8dda49: SELECT DISTINCT dt FROM (
> VALUES(CAST('1964-03-07' AS DATE)),
>   (CAST('2002-03-04' AS DATE)),
>   (CAST('1966-09-04' AS DATE)),
>   (CAST('1993-08-18' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1959-10-23' AS DATE)),
>   (CAST('1992-01-14' AS DATE)),
>   (CAST('1994-07-24' AS DATE)),
>   (CAST('1979-11-25' AS DATE)),
>   (CAST('1945-01-14' AS DATE)),
>   (CAST('1982-07-25' AS DATE)),
>   (CAST('1966-09-06' AS DATE)),
>   (CAST('1989-05-01' AS DATE)),
>   (CAST('1996-03-08' AS DATE)),
>   (CAST('1998-08-19' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
> (CAST('1999-07-20' AS DATE)),
> (CAST('1962-07-03' AS DATE)),
>   (CAST('2011-08-17' AS DATE)),
>   (CAST('2011-05-16' AS DATE)),
>   (CAST('1946-05-08' AS DATE)),
>   (CAST('1994-02-13' AS DATE)),
>   (CAST('1978-08-09' AS DATE)),
>   (CAST('1978-08-09' AS 

[jira] [Assigned] (DRILL-4824) JSON with complex nested data produces incorrect output with missing fields

2017-03-23 Thread Volodymyr Vysotskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi reassigned DRILL-4824:
--

Assignee: Volodymyr Vysotskyi  (was: Serhii Harnyk)

> JSON with complex nested data produces incorrect output with missing fields
> ---
>
> Key: DRILL-4824
> URL: https://issues.apache.org/jira/browse/DRILL-4824
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.0.0
>Reporter: Roman
>Assignee: Volodymyr Vysotskyi
>
> There is incorrect output in case of JSON file with complex nested data.
> _JSON:_
> {code:none|title=example.json|borderStyle=solid}
> {
> "Field1" : {
> }
> }
> {
> "Field1" : {
> "InnerField1": {"key1":"value1"},
> "InnerField2": {"key2":"value2"}
> }
> }
> {
> "Field1" : {
> "InnerField3" : ["value3", "value4"],
> "InnerField4" : ["value5", "value6"]
> }
> }
> {code}
> _Query:_
> {code:sql}
> select Field1 from dfs.`/tmp/example.json`
> {code}
> _Incorrect result:_
> {code:none}
> +---+
> |  Field1   |
> +---+
> {"InnerField1":{},"InnerField2":{},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{"key1":"value1"},"InnerField2":
> {"key2":"value2"},"InnerField3":[],"InnerField4":[]}
> {"InnerField1":{},"InnerField2":{},"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--+
> {code}
> There is no need to output missing fields. In the case of a deeply nested 
> structure we would get an unreadable result for the user.
> _Correct result:_
> {code:none}
> +--+
> | Field1   |
> +--+
> |{} 
> {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"}}
> {"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}
> +--+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5164) Equi-join query results in CompileException when inputs have large number of columns

2017-03-23 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15937790#comment-15937790
 ] 

Khurram Faraaz commented on DRILL-5164:
---

We still see the CompileException on Drill 1.10.0 (commit id: b657d44f):

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `manyColsInJson.json` t1, 
`manyColsInJson.json` t2 where t1.key2000 = t2.key2000;
Error: SYSTEM ERROR: CompileException: File 
'org.apache.drill.exec.compile.DrillJavaFileObject[HashJoinProbeGen3885.java]', 
Line 10, Column 8: HashJoinProbeGen3885.java:10: error: too many constants
public class HashJoinProbeGen3885 {
   ^ (compiler.err.limit.pool)

Fragment 0:0

[Error Id: f937f582-8059-4af3-b908-44f4b40ea28e on centos-01.qa.lab:31010] 
(state=,code=0)
{noformat}
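Both failures come from hard JVM class-file limits: each method may hold at most 64 KiB of bytecode (`compiler.err.limit.code`) and each class at most 65535 constant-pool entries (`compiler.err.limit.pool`). The usual codegen workaround is to split one huge generated method into bounded sub-methods called from a driver. This is a hypothetical Python sketch of that splitting strategy, not Drill's actual code generator; names like `doSetupN` and the chunk size are illustrative.

```python
# Chunk size chosen so each generated method stays well under 64 KiB.
MAX_STMTS_PER_METHOD = 500

def split_codegen(statements, chunk=MAX_STMTS_PER_METHOD):
    """Emit doSetup() as a driver that delegates to doSetupN() chunks,
    keeping every generated method under the JVM bytecode limit."""
    chunks = [statements[i:i + chunk] for i in range(0, len(statements), chunk)]
    methods = []
    for n, stmts in enumerate(chunks):
        body = "\n".join("    " + s for s in stmts)
        methods.append(f"private void doSetup{n}() {{\n{body}\n}}")
    calls = "\n".join(f"    doSetup{n}();" for n in range(len(chunks)))
    methods.insert(0, f"public void doSetup() {{\n{calls}\n}}")
    return "\n\n".join(methods)

# 4095 columns on each side of the join -> roughly 8190 vector assignments
# that would otherwise land in a single doSetup() method.
stmts = [f"vv{i} = lookup({i});" for i in range(8190)]
src = split_codegen(stmts)
print(src.count("private void doSetup"))  # 17 sub-methods of <= 500 stmts
```

Splitting bounds method size but not the constant pool, so very wide schemas may additionally need the generated code spread across several classes, as the `too many constants` error on 1.10.0 suggests.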

> Equi-join query results in CompileException when inputs have large number of 
> columns
> 
>
> Key: DRILL-5164
> URL: https://issues.apache.org/jira/browse/DRILL-5164
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.9.0
>Reporter: Khurram Faraaz
>Assignee: Serhii Harnyk
>Priority: Critical
> Fix For: 1.10.0
>
> Attachments: manyColsInJson.json
>
>
> Drill 1.9.0 
> git commit ID : 4c1b420b
> 4 node CentOS cluster
> JSON file has 4095 keys (columns)
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select * from `manyColsInJson.json` t1, 
> `manyColsInJson.json` t2 where t1.key2000 = t2.key2000;
> Error: SYSTEM ERROR: CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashJoinProbeGen294.java]',
>  Line 16397, Column 17: HashJoinProbeGen294.java:16397: error: code too large
> public void doSetup(FragmentContext context, VectorContainer buildBatch, 
> RecordBatch probeBatch, RecordBatch outgoing)
> ^ (compiler.err.limit.code)
> Fragment 0:0
> [Error Id: 7d0efa7e-e183-4c40-939a-4908699f94bf on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2016-12-26 09:52:11,321 [279f17fd-c8f0-5d18-1124-76099f0a5cc8:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashJoinProbeGen294.java]',
>  Line 16397, Column 17: HashJoinProbeGen294.java:16397: error: code too large
> public void doSetup(FragmentContext context, VectorContainer buildBatch, 
> RecordBatch probeBatch, RecordBatch outgoing)
> ^ (compiler.err.limit.code)
> Fragment 0:0
> [Error Id: 7d0efa7e-e183-4c40-939a-4908699f94bf on centos-01.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashJoinProbeGen294.java]',
>  Line 16397, Column 17: HashJoinProbeGen294.java:16397: error: code too large
> public void doSetup(FragmentContext context, VectorContainer buildBatch, 
> RecordBatch probeBatch, RecordBatch outgoing)
> ^ (compiler.err.limit.code)
> Fragment 0:0
> [Error Id: 7d0efa7e-e183-4c40-939a-4908699f94bf on centos-01.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:293)
>  [drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>  [drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:262)
>  [drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.9.0.jar:1.9.0]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_91]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_91]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: 
> org.apache.drill.exec.exception.SchemaChangeException: 
> org.apache.drill.exec.exception.ClassTransformationException: 
> java.util.concurrent.ExecutionException: 
> org.apache.drill.exec.exception.ClassTransformationException: Failure 
> generating transformation classes for value:
> package org.apache.drill.exec.test.generated;
> ...
> public class HashJoinProbeGen294 {
> NullableVarCharVector[] vv0;
> NullableVarCharVector vv3;
> NullableVarCharVector[] vv6;
> ...
> vv49137 .copyFromSafe((probeIndex), (outIndex), vv49134);
> vv49143