[GitHub] drill issue #1101: DRILL-6032: Made the batch sizing for HashAgg more accura...

2018-02-02 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1101
  
An additional note: after the defaults were reduced, the number of random 
failures we see in our functional tests dropped to just 3. Typically there 
are more than that.


---


[GitHub] drill issue #1101: DRILL-6032: Made the batch sizing for HashAgg more accura...

2018-02-02 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1101
  
@Ben-Zvi I have responded to the comments and implemented the requested 
changes; please see the latest commits. I made some additional changes after 
noticing that some of the batch sizing calculations were incorrect, and I also 
made various documentation and naming changes to improve the readability of 
the code and to document what needs to be improved in the future:

- The sizes of varchar columns were not properly accounted for in the 
outgoing batch in the case where varchars are aggregated. I have added logic to 
the updateEst... method to account for this.
- I have added docs to the updateEst method explaining what the various 
batch sizing estimates do.
- I have changed the variable names for the batch size estimates to more 
accurately reflect what they do.
- Previously, if a batch size estimate was smaller than the actual amount of 
memory allocated, the estimate was updated to the larger size. I think this 
was actually the wrong behavior, since it makes the HashAgg operator's memory 
estimates extremely aggressive and, in the worst case, can cause the operator 
to run out of memory prematurely. Ideally a good estimate will overestimate 
50% of the time and underestimate 50% of the time (see the sketch after this 
list).
- I have changed the HashAgg defaults. Although tests passed with the 
previous defaults, I felt they were too aggressive. I have changed the default 
number of partitions to 16 and the minimum number of batches to 1.
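
For illustration, here is a minimal sketch of the estimate-update policy I 
mean (the class and method names are illustrative, not HashAgg's actual fields):

final class BatchSizeEstimate {
  private long estimateBytes;

  BatchSizeEstimate(long initialEstimateBytes) {
    this.estimateBytes = initialEstimateBytes;
  }

  // Old behavior: ratchet the estimate up to the largest batch ever observed.
  // Over time the estimate only grows, so the operator believes it needs more
  // memory than it really does and may report an OOM prematurely.
  void updateByMax(long observedBatchBytes) {
    estimateBytes = Math.max(estimateBytes, observedBatchBytes);
  }

  // One balanced alternative: track a running average, so over many batches
  // the estimate overshoots roughly as often as it undershoots.
  void updateByAverage(long observedBatchBytes) {
    estimateBytes = (estimateBytes + observedBatchBytes) / 2;
  }

  long get() {
    return estimateBytes;
  }
}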

Let me know if you have any more comments. Thanks.


---


[jira] [Created] (DRILL-6133) RecordBatchSizer throws IndexOutOfBounds Exception for union vector

2018-02-02 Thread Padma Penumarthy (JIRA)
Padma Penumarthy created DRILL-6133:
---

 Summary: RecordBatchSizer throws IndexOutOfBounds Exception for 
union vector
 Key: DRILL-6133
 URL: https://issues.apache.org/jira/browse/DRILL-6133
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.12.0
Reporter: Padma Penumarthy
Assignee: Padma Penumarthy
 Fix For: 1.13.0


RecordBatchSizer throws IndexOutOfBoundsException when trying to get the 
payload byte count of a union vector. 

[Error Id: 430026a7-a963-40f1-bae2-1850649e8434 on 172.30.8.158:31013]
 at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
 ~[classes/:na]
 at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:300)
 [classes/:na]
 at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
 [classes/:na]
 at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:266)
 [classes/:na]
 at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[classes/:na]
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_45]
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_45]
 at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
Caused by: java.lang.IndexOutOfBoundsException: DrillBuf[2], udle: [1 0..0], 
index: 4, length: 4 (expected: range(0, 0))
DrillBuf[2], udle: [1 0..0]
 at 
org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80) 
~[classes/:na]
 at 
org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86) 
~[classes/:na]
 at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114) ~[classes/:4.0.48.Final]
 at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484) ~[classes/:4.0.48.Final]
 at org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432) 
~[classes/:na]
 at 
org.apache.drill.exec.vector.VarCharVector.getPayloadByteCount(VarCharVector.java:308)
 ~[classes/:na]
 at 
org.apache.drill.exec.vector.NullableVarCharVector.getPayloadByteCount(NullableVarCharVector.java:256)
 ~[classes/:na]
 at 
org.apache.drill.exec.vector.complex.AbstractMapVector.getPayloadByteCount(AbstractMapVector.java:303)
 ~[classes/:na]
 at 
org.apache.drill.exec.vector.complex.UnionVector.getPayloadByteCount(UnionVector.java:574)
 ~[classes/:na]
 at 
org.apache.drill.exec.physical.impl.spill.RecordBatchSizer$ColumnSize.<init>(RecordBatchSizer.java:147)
 ~[classes/:na]
 at 
org.apache.drill.exec.physical.impl.spill.RecordBatchSizer.measureColumn(RecordBatchSizer.java:403)
 ~[classes/:na]
 at 
org.apache.drill.exec.physical.impl.spill.RecordBatchSizer.<init>(RecordBatchSizer.java:350)
 ~[classes/:na]
 at 
org.apache.drill.exec.physical.impl.spill.RecordBatchSizer.<init>(RecordBatchSizer.java:320)
 ~[classes/:na]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] drill pull request #1096: DRILL-6099 : Push limit past flatten(project) with...

2018-02-02 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1096#discussion_r165788152
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java
 ---
@@ -55,18 +62,21 @@ public void onMatch(RelOptRuleCall call) {
 }
   };
 
-  public static DrillPushLimitToScanRule LIMIT_ON_PROJECT =
-  new DrillPushLimitToScanRule(
-  RelOptHelper.some(DrillLimitRel.class, RelOptHelper.some(
-  DrillProjectRel.class, 
RelOptHelper.any(DrillScanRel.class))),
-  "DrillPushLimitToScanRule_LimitOnProject") {
+  public static DrillPushLimitToScanRule LIMIT_ON_PROJECT = new 
DrillPushLimitToScanRule(
--- End diff --

I am not sure why this rule is being overloaded to do a limit push past 
project. This particular rule is about pushing the limit down into the scan 
for the LIMIT-SCAN and LIMIT-PROJECT-SCAN cases. I think we should keep this 
rule as-is and create a separate rule that pushes a limit past a project. 
Was there a strong reason to do it this way? Could there be a side effect of 
removing the check for the Scan? 


---


[GitHub] drill pull request #1096: DRILL-6099 : Push limit past flatten(project) with...

2018-02-02 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1096#discussion_r165791415
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java
 ---
@@ -55,18 +62,21 @@ public void onMatch(RelOptRuleCall call) {
 }
   };
 
-  public static DrillPushLimitToScanRule LIMIT_ON_PROJECT =
-  new DrillPushLimitToScanRule(
-  RelOptHelper.some(DrillLimitRel.class, RelOptHelper.some(
-  DrillProjectRel.class, 
RelOptHelper.any(DrillScanRel.class))),
-  "DrillPushLimitToScanRule_LimitOnProject") {
+  public static DrillPushLimitToScanRule LIMIT_ON_PROJECT = new 
DrillPushLimitToScanRule(
+  RelOptHelper.some(DrillLimitRel.class, 
RelOptHelper.any(DrillProjectRel.class)), 
"DrillPushLimitToScanRule_LimitOnProject") {
 @Override
 public boolean matches(RelOptRuleCall call) {
   DrillLimitRel limitRel = call.rel(0);
-  DrillScanRel scanRel = call.rel(2);
-  // For now only applies to Parquet. And pushdown only apply limit 
but not offset,
+  DrillProjectRel projectRel = call.rel(1);
+  // pushdown only apply limit but not offset,
   // so if getFetch() return null no need to run this rule.
-  if (scanRel.getGroupScan().supportsLimitPushdown() && 
(limitRel.getFetch() != null)) {
--- End diff --

I can understand that this check was removed because this matches() method 
no longer checks for DrillScanRel, but does that mean that no one is checking 
the GroupScan for supportsLimitPushdown()? 
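
For reference, the guard that the removed lines carried looked roughly like 
this (reconstructed from the diff above, a fragment of the rule's anonymous 
subclass, shown only to illustrate where the check used to live):

@Override
public boolean matches(RelOptRuleCall call) {
  DrillLimitRel limitRel = call.rel(0);
  DrillScanRel scanRel = call.rel(2);  // scan is the third rel in the LIMIT-PROJECT-SCAN pattern
  // Pushdown applies the limit but not the offset, so the rule should fire only
  // when a fetch is present and the underlying group scan supports limit pushdown.
  return scanRel.getGroupScan().supportsLimitPushdown() && limitRel.getFetch() != null;
}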


---


[GitHub] drill pull request #:

2018-02-02 Thread ilooner
Github user ilooner commented on the pull request:


https://github.com/apache/drill/commit/f911575abd9672866b39926a60c1731bbeee155f#commitcomment-27301156
  
In 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 on line 527:
Thanks for catching this. I am still running tests and debugging on our 
Jenkins, so you may see intermediate commits with errors. I will let you know 
once I have something in a state that passes all tests.


---


Re: Error in Drill 1.13

2018-02-02 Thread Timothy Farkas
Hi Charles,

FragmentContext used to be a concrete class. It has now been changed to an 
interface, and the concrete class that implements it is FragmentContextImpl. 
A rough sketch of the shape of the change is below.
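
This is only a sketch, not Drill's actual source; I'm assuming 
getManagedBuffer(int) is on the new interface since your reader calls it 
through a FragmentContext reference:

import io.netty.buffer.DrillBuf;

interface FragmentContext {
  DrillBuf getManagedBuffer(int size);
}

class FragmentContextImpl implements FragmentContext {
  @Override
  public DrillBuf getManagedBuffer(int size) {
    // The real implementation hands out a buffer from the fragment's allocator.
    throw new UnsupportedOperationException("sketch only");
  }
}

Because your plugin jar was compiled against the 1.12 class, the JVM refuses 
to link it against the 1.13 interface and throws IncompatibleClassChangeError. 
Recompiling the plugin against the 1.13 jars should resolve it, assuming the 
methods it uses (like getManagedBuffer) are still available.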

Thanks,
Tim


From: Charles Givre 
Sent: Friday, February 2, 2018 11:47:12 AM
To: dev@drill.apache.org
Subject: Error in Drill 1.13

Hello all,
I’m getting ready to submit a PR for a log format plugin for Drill and after I 
rebased Drill, I’m now getting the following error:


java.lang.IncompatibleClassChangeError: Found interface 
org.apache.drill.exec.ops.FragmentContext, but class was expected
at 
org.apache.drill.exec.store.log.LogRecordReader.<init>(LogRecordReader.java:90)
at 
org.apache.drill.exec.store.log.LogFormatPlugin.getRecordReader(LogFormatPlugin.java:63)
at 
org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin.getReaderBatch(EasyFormatPlugin.java:150)
at 
org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator.getBatch(EasyReaderBatchCreator.java:33)
at 
org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator.getBatch(EasyReaderBatchCreator.java:28)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:159)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:182)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:137)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:182)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:137)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:182)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:110)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:87)
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:206)
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


Here is the code which seems to be causing the problem:

public LogRecordReader(FragmentContext fragmentContext, String inputPath,
                       DrillFileSystem fileSystem, List<SchemaPath> columns,
                       LogFormatPlugin.LogFormatConfig config) throws OutOfMemoryException {
  try {
    Path hdfsPath = new Path(inputPath);
    Configuration conf = new Configuration();
    FSDataInputStream fsStream = fileSystem.open(hdfsPath);
    CompressionCodecFactory factory = new CompressionCodecFactory(conf);
    CompressionCodec codec = factory.getCodec(hdfsPath);
    if (codec == null) {
      reader = new BufferedReader(new InputStreamReader(fsStream.getWrappedStream(), "UTF-8"));
    } else {
      CompressionInputStream comInputStream = codec.createInputStream(fsStream.getWrappedStream());
      reader = new BufferedReader(new InputStreamReader(comInputStream));
    }
    this.inputPath = inputPath;
    this.lineCount = 0;
    this.config = config;
    this.buffer = fragmentContext.getManagedBuffer(4096);
    setColumns(columns);
  } catch (IOException e) {
    logger.debug("Log Reader Plugin: " + e.getMessage());
  }
}
and here is a link to the repo with the complete code: 
https://github.com/cgivre/drill-logfile-plugin
I’m a little stumped on this and would appreciate any suggestions. My plugin 
DOES work on 1.12.
—C




[GitHub] drill pull request #:

2018-02-02 Thread Ben-Zvi
Github user Ben-Zvi commented on the pull request:


https://github.com/apache/drill/commit/f911575abd9672866b39926a60c1731bbeee155f#commitcomment-27300995
  
In 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 on line 527:
This change prevents an exception, but the next code line makes the same 
call which would trigger this exception:
{code}
estBatchHolderValuesRowWidth += columnSize.getKnownSize();
{code}


---


Error in Drill 1.13

2018-02-02 Thread Charles Givre
Hello all, 
I’m getting ready to submit a PR for a log format plugin for Drill and after I 
rebased Drill, I’m now getting the following error:


java.lang.IncompatibleClassChangeError: Found interface 
org.apache.drill.exec.ops.FragmentContext, but class was expected
at 
org.apache.drill.exec.store.log.LogRecordReader.<init>(LogRecordReader.java:90)
at 
org.apache.drill.exec.store.log.LogFormatPlugin.getRecordReader(LogFormatPlugin.java:63)
at 
org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin.getReaderBatch(EasyFormatPlugin.java:150)
at 
org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator.getBatch(EasyReaderBatchCreator.java:33)
at 
org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator.getBatch(EasyReaderBatchCreator.java:28)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:159)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:182)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:137)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:182)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:137)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:182)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:110)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:87)
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:206)
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


Here is the code which seems to be causing the problem:

public LogRecordReader(FragmentContext fragmentContext, String inputPath,
                       DrillFileSystem fileSystem, List<SchemaPath> columns,
                       LogFormatPlugin.LogFormatConfig config) throws OutOfMemoryException {
  try {
    Path hdfsPath = new Path(inputPath);
    Configuration conf = new Configuration();
    FSDataInputStream fsStream = fileSystem.open(hdfsPath);
    CompressionCodecFactory factory = new CompressionCodecFactory(conf);
    CompressionCodec codec = factory.getCodec(hdfsPath);
    if (codec == null) {
      reader = new BufferedReader(new InputStreamReader(fsStream.getWrappedStream(), "UTF-8"));
    } else {
      CompressionInputStream comInputStream = codec.createInputStream(fsStream.getWrappedStream());
      reader = new BufferedReader(new InputStreamReader(comInputStream));
    }
    this.inputPath = inputPath;
    this.lineCount = 0;
    this.config = config;
    this.buffer = fragmentContext.getManagedBuffer(4096);
    setColumns(columns);
  } catch (IOException e) {
    logger.debug("Log Reader Plugin: " + e.getMessage());
  }
}
and here is a link to the repo with the complete code: 
https://github.com/cgivre/drill-logfile-plugin
I’m a little stumped on this and would appreciate any suggestions. My plugin 
DOES work on 1.12.
—C




[jira] [Created] (DRILL-6132) HashPartitionSender leaks memory

2018-02-02 Thread Chun Chang (JIRA)
Chun Chang created DRILL-6132:
-

 Summary: HashPartitionSender leaks memory
 Key: DRILL-6132
 URL: https://issues.apache.org/jira/browse/DRILL-6132
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.12.0
Reporter: Chun Chang
Assignee: Timothy Farkas


With assertions enabled (-ea), I noticed that HashPartitionSender leaks memory 
if an aggregation query fails due to OOM.
{noformat}
message: "SYSTEM ERROR: IllegalStateException: 
Allocator[op:2:1:0:HashPartitionSender] closed with outstanding buffers 
allocated (1).\nAllocator(op:2:1:0:HashPartitionSender) 
100/8192/3300352/100 (res/actual/peak/limit)\n child allocators: 
0\n ledgers: 1\n ledger[703835] allocator: op:2:1:0:HashPartitionSender), 
isOwning: true, size: 8192, references: 1, life: 6329151004063490..0, 
allocatorManager: [688058, life: 6329151004058252..0] holds 1 buffers. \n 
DrillBuf[777552], udle: [688059 0..8192]: , \n reservations: 0\n\n\nFragment 
2:1\n\n[Error Id: c7cc9d37-8881-4db1-8123-2651628c4081 on 
10.10.30.168:31010]\n\n (java.lang.IllegalStateException) 
Allocator[op:2:1:0:HashPartitionSender] closed with outstanding buffers 
allocated (1).\nAllocator(op:2:1:0:HashPartitionSender) 
100/8192/3300352/100 (res/actual/peak/limit)\n child allocators: 
0\n ledgers: 1\n ledger[703835] allocator: op:2:1:0:HashPartitionSender), 
isOwning: true, size: 8192, references: 1, life: 6329151004063490..0, 
allocatorManager: [688058, life: 6329151004058252..0] holds 1 buffers. \n 
DrillBuf[777552], udle: [688059 0..8192]: , \n reservations: 0\n\n 
org.apache.drill.exec.memory.BaseAllocator.close():504\n 
org.apache.drill.exec.ops.BaseOperatorContext.close():157\n 
org.apache.drill.exec.ops.OperatorContextImpl.close():79\n 
org.apache.drill.exec.ops.FragmentContext.suppressingClose():429\n 
org.apache.drill.exec.ops.FragmentContext.close():418\n 
org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():324\n 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():155\n 
org.apache.drill.exec.work.fragment.FragmentExecutor.run():267\n 
org.apache.drill.common.SelfCleaningRunnable.run():38\n 
java.util.concurrent.ThreadPoolExecutor.runWorker():1149\n 
java.util.concurrent.ThreadPoolExecutor$Worker.run():624\n 
java.lang.Thread.run():748\n"
exception {
exception_class: "java.lang.IllegalStateException"
message: "Allocator[op:2:1:0:HashPartitionSender] closed with outstanding 
buffers allocated (1).\nAllocator(op:2:1:0:HashPartitionSender) 
100/8192/3300352/100 (res/actual/peak/limit)\n child allocators: 
0\n ledgers: 1\n ledger[703835] allocator: op:2:1:0:HashPartitionSender), 
isOwning: true, size: 8192, references: 1, life: 6329151004063490..0, 
allocatorManager: [688058, life: 6329151004058252..0] holds 1 buffers. \n 
DrillBuf[777552], udle: [688059 0..8192]: , \n reservations: 0\n"
stack_trace {
class_name: "org.apache.drill.exec.memory.BaseAllocator"
file_name: "BaseAllocator.java"
line_number: 504
method_name: "close"
is_native_method: false
}
stack_trace {
class_name: "org.apache.drill.exec.ops.BaseOperatorContext"
file_name: "BaseOperatorContext.java"
line_number: 157
method_name: "close"
is_native_method: false
}
stack_trace {
class_name: "org.apache.drill.exec.ops.OperatorContextImpl"
file_name: "OperatorContextImpl.java"
line_number: 79
method_name: "close"
is_native_method: false
}
stack_trace {
class_name: "org.apache.drill.exec.ops.FragmentContext"
file_name: "FragmentContext.java"
line_number: 429
method_name: "suppressingClose"
is_native_method: false
}
stack_trace {
class_name: "org.apache.drill.exec.ops.FragmentContext"
file_name: "FragmentContext.java"
line_number: 418
method_name: "close"
is_native_method: false
}
stack_trace {
class_name: "org.apache.drill.exec.work.fragment.FragmentExecutor"
file_name: "FragmentExecutor.java"
line_number: 324
method_name: "closeOutResources"
is_native_method: false
}
stack_trace {
class_name: "org.apache.drill.exec.work.fragment.FragmentExecutor"
file_name: "FragmentExecutor.java"
line_number: 155
method_name: "cleanup"
is_native_method: false
}
stack_trace {
class_name: "org.apache.drill.exec.work.fragment.FragmentExecutor"
file_name: "FragmentExecutor.java"
line_number: 267
method_name: "run"
is_native_method: false
}
stack_trace {
class_name: "org.apache.drill.common.SelfCleaningRunnable"
file_name: "SelfCleaningRunnable.java"
line_number: 38
method_name: "run"
is_native_method: false
}
stack_trace {
class_name: "..."
line_number: 0
method_name: "..."
is_native_method: false
}
}
}{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: LATERAL and UNNEST support for Drill

2018-02-02 Thread Julian Hyde
It’s all consistent - really!

Consider ‘x OR NOT y’. Is ‘OR NOT’ an operator? Of course not. ‘NOT’ is a 
prefix operator, and ‘OR’ applies to its result.

Similarly, in ‘x JOIN LATERAL y ON (c)’, the ‘LATERAL’ applies to ‘y’ first, and 
only later does the ‘JOIN’ kick in. ‘LATERAL’ can apply to any table-like 
object, such as a sub-query.

So ‘x, LATERAL y’ fits with that model. (In fact, we think of ‘,’ as short-hand 
for ‘CROSS JOIN’.)

CROSS APPLY and OUTER APPLY are really useful, and Oracle/Microsoft designed 
them well, but they are not fundamental. Calcite implements them as syntactic 
sugar — i.e. a rewrite at an early stage to ‘CROSS JOIN LATERAL’ — and I think 
that’s how Drill should do it too.

LATERAL is fundamental but it’s hard for end-users to understand, so it’s 
better if end-users use UNNEST and CROSS/OUTER APPLY for most purposes.

Julian


> On Feb 2, 2018, at 9:08 AM, Aman Sinha  wrote:
> 
> Internally, within the query engine both LATERAL and CROSS/OUTER APPLY
> would be essentially doing the same work.
> 
> After thinking about this some more, from the syntax perspective I do see
> Julian's point about the CROSS/OUTER APPLY having some advantages over
> LATERAL.
> 
> In particular, with LATERAL there are a few inconsistencies:  (I am not
> sure why the SQL standards committee did not define this more crisply)
> 
> Inner Join versions (equivalent to CROSS APPLY) ...there are 2
> alternatives:
>  1.  SELECT .. FROM t1,  LATERAL (SELECT ... FROM UNNEST(t1.orders))  ;
>  2.  SELECT .. FROM t1 INNER JOIN LATERAL (SELECT .. FROM
> UNNEST(t1.orders)) on TRUE;
> 
> Outer Join version (equivalent to OUTER APPLY):
>  3.  SELECT .. FROM t1 LEFT JOIN LATERAL (SELECT ... FROM
> UNNEST(t1.orders))  on TRUE ;
> 
> I can see 3 inconsistencies :
> a) In the first query there is a 'comma' separating the table with LATERAL
> because LATERAL appears as a prefix operator acting on the subquery
>  whereas in the 2nd and 3rd queries, it appears as join qualifier and
> there is no 'comma'.   This gives the impression that it is a prefix in 1
> but a binary operator in 2 and 3.
> b) For the outer join case, there is only one way to express the query
> whereas with inner join there are 2 alternatives.
> c) The 'ON TRUE' clause is needed in the 'JOIN' versions 2 and 3, but not
> needed in 1.
> 
> 
> In comparison, the CROSS/OUTER APPLY is less ambiguous : there is only one
> way to express a CROSS or OUTER APPLY.  It does not involve the 'JOIN'
> clause,
> so there is no need for the 'ON' clause either.  It also makes it explicit
> that the APPLY keyword is about applying a table function.
> 
> Couple of disadvantages of the CROSS/OUTER APPLY:
> a) it is not official SQL standard, but as Julian said, it falls in the
> 'gray area'.
> b) If Drill supports these, the SqlConformance setting in Calcite would
> need to be customized..currently we use DEFAULT.
> 
> Overall, seems the advantages of CROSS/OUTER APPLY outweigh the
> disadvantages, so we could go with that.
> 
> 
> -Aman
> 
> 
> 
> 
> On Tue, Jan 30, 2018 at 4:42 PM, Julian Hyde  > wrote:
> 
>> There are a few things in the gray area between the official standard and
>> the de facto standard. CROSS/OUTER APPLY is well thought out, does not
>> conflict with standard SQL, and has a couple of big vendors behind it. I
>> think it’s safe to add it. (Calcite has a conformance setting so that
>> someone can disable it if they choose.)
>> 
>> SQL Server goes a bit further, in that allows you to invoke a UDF without
>> the TABLE keyword. We do not allow that in Calcite, at this point, because
>> it could be ambiguous[2].
>> 
>> I recommend that you do not start building key features on LATERAL.
>> LATERAL is a concept that is tricky for users to get their heads around -
>> it is really a hack that subtly adjusts the namespaces that the SQL
>> validator uses when resolving table aliases inside a join. It’s not
>> inherently about unnesting data or calling functions - in fact LATERAL is
>> implicit if you use UNNEST. The less your end users need to type LATERAL,
>> the better.
>> 
>> Julian
>> 
>> [2] https://issues.apache.org/jira/browse/CALCITE-1490
>> 
>>> On Jan 30, 2018, at 12:20 PM, Chunhui Shi  wrote:
>>> 
>>> Hi Julian, I think CROSS APPLY and OUTER APPLY are what we want and we
>> have discussed internally. the only problem is, they are not standard SQL
>> although they are supported in SQL server and Oracle 12. Since SQL syntax
>> does not have way to "invoke table function for each row", we have to
>> choose between using APPLY or overloading the meaning of LATERAL as in the
>> current document attached in the JIRA. Which way you think is the better
>> way?
>>> 
>>> 
>>> Thanks,
>>> 
>>> Chunhui
>>> 
>>> 
>>> From: Julian Hyde 

[GitHub] drill pull request #1101: DRILL-6032: Made the batch sizing for HashAgg more...

2018-02-02 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r165704469
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 ---
@@ -956,21 +925,8 @@ private void spillAPartition(int part) {
   this.htables[part].outputKeys(currOutBatchIndex, this.outContainer, 
outStartIdxHolder.value, outNumRecordsHolder.value, numPendingOutput);
 
   // set the value count for outgoing batch value vectors
-  /* int i = 0; */
   for (VectorWrapper v : outgoing) {
 v.getValueVector().getMutator().setValueCount(numOutputRecords);
-/*
--- End diff --

Thanks!


---


Re: LATERAL and UNNEST support for Drill

2018-02-02 Thread Aman Sinha
Internally, within the query engine both LATERAL and CROSS/OUTER APPLY
would be essentially doing the same work.

After thinking about this some more, from the syntax perspective I do see
Julian's point about the CROSS/OUTER APPLY having some advantages over
LATERAL.

In particular, with LATERAL there are a few inconsistencies:  (I am not
sure why the SQL standards committee did not define this more crisply)

Inner Join versions (equivalent to CROSS APPLY) ...there are 2
alternatives:
  1.  SELECT .. FROM t1,  LATERAL (SELECT ... FROM UNNEST(t1.orders))  ;
  2.  SELECT .. FROM t1 INNER JOIN LATERAL (SELECT .. FROM
UNNEST(t1.orders)) on TRUE;

Outer Join version (equivalent to OUTER APPLY):
  3.  SELECT .. FROM t1 LEFT JOIN LATERAL (SELECT ... FROM
UNNEST(t1.orders))  on TRUE ;

I can see 3 inconsistencies:
 a) In the first query there is a comma separating the table from LATERAL,
because LATERAL appears as a prefix operator acting on the subquery,
whereas in the 2nd and 3rd queries it appears as a join qualifier and
there is no comma. This gives the impression that it is a prefix operator in 1
but a binary operator in 2 and 3.
 b) For the outer join case, there is only one way to express the query
whereas with inner join there are 2 alternatives.
 c) The 'ON TRUE' clause is needed in the 'JOIN' versions 2 and 3, but not
needed in 1.


In comparison, CROSS/OUTER APPLY is less ambiguous: there is only one
way to express a CROSS or OUTER APPLY. It does not involve the 'JOIN' clause,
so there is no need for the 'ON' clause either. It also makes it explicit
that the APPLY keyword is about applying a table function.

A couple of disadvantages of CROSS/OUTER APPLY:
a) It is not official SQL standard, but as Julian said, it falls in the
'gray area'.
b) If Drill supports these, the SqlConformance setting in Calcite would
need to be customized; currently we use DEFAULT.

Overall, it seems the advantages of CROSS/OUTER APPLY outweigh the
disadvantages, so we could go with that.


-Aman




On Tue, Jan 30, 2018 at 4:42 PM, Julian Hyde  wrote:

> There are a few things in the gray area between the official standard and
> the de facto standard. CROSS/OUTER APPLY is well thought out, does not
> conflict with standard SQL, and has a couple of big vendors behind it. I
> think it’s safe to add it. (Calcite has a conformance setting so that
> someone can disable it if they choose.)
>
> SQL Server goes a bit further, in that allows you to invoke a UDF without
> the TABLE keyword. We do not allow that in Calcite, at this point, because
> it could be ambiguous[2].
>
> I recommend that you do not start building key features on LATERAL.
> LATERAL is a concept that is tricky for users to get their heads around -
> it is really a hack that subtly adjusts the namespaces that the SQL
> validator uses when resolving table aliases inside a join. It’s not
> inherently about unnesting data or calling functions - in fact LATERAL is
> implicit if you use UNNEST. The less your end users need to type LATERAL,
> the better.
>
> Julian
>
> [2] https://issues.apache.org/jira/browse/CALCITE-1490
>
> > On Jan 30, 2018, at 12:20 PM, Chunhui Shi  wrote:
> >
> > Hi Julian, I think CROSS APPLY and OUTER APPLY are what we want and we
> have discussed internally. the only problem is, they are not standard SQL
> although they are supported in SQL server and Oracle 12. Since SQL syntax
> does not have way to "invoke table function for each row", we have to
> choose between using APPLY or overloading the meaning of LATERAL as in the
> current document attached in the JIRA. Which way you think is the better
> way?
> >
> >
> > Thanks,
> >
> > Chunhui
> >
> > 
> > From: Julian Hyde 
> > Sent: Tuesday, January 30, 2018 12:01:47 PM
> > To: dev@drill.apache.org
> > Subject: Re: LATERAL and UNNEST support for Drill
> >
> > LATERAL is a prefix operator not a binary operator, so I believe you are
> missing a comma:
> >
> >> FROM t1 LATERAL UNNEST (t1.array1), UNNEST (t1.array2)
> >
> > should be
> >
> >> FROM t1, LATERAL UNNEST (t1.array1), LATERAL UNNEST (t1.array2)
> >
> > I agree with your remarks about the extra power of putting UNNEST in the
> FROM clause (per the standard) versus the SELECT clause (per PostgreSQL).
> >
> > Note that Calcite supports CROSS APPLY and OUTER APPLY[1]. This is
> useful when you want to apply a table function for each row of a table. It
> is just syntactic sugar for LATERAL TABLE so you may get it virtually for
> free.
> >
> > Julian
> >
> >
> > [1] https://issues.apache.org/jira/browse/CALCITE-1472

[GitHub] drill pull request #1060: DRILL-5846: Improve parquet performance for Flat D...

2018-02-02 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1060#discussion_r165698800
  
--- Diff: exec/memory/base/src/main/java/io/netty/buffer/DrillBuf.java ---
@@ -703,7 +703,18 @@ protected void _setLong(int index, long value) {
 
   @Override
   public ByteBuf getBytes(int index, ByteBuf dst, int dstIndex, int 
length) {
-udle.getBytes(index + offset, dst, dstIndex, length);
+final int BULK_COPY_THR = 1024;
--- End diff --

Vlad,

- I had a chat with @bitblender and he explained that Java invokes a stub (not 
a function call) to perform copyMemory; he agreed that copyMemory will be 
slower for small buffers and that the task was to determine the cutoff point.
- My tests (I will send you my test) indicate that 1024 bytes is the length 
where copyMemory starts performing as well as getByte().

NOTE - I am using JRE 1.8; static buffers initialized once; payload of 1 MB 
(1048576 bytes) and a loop count of 102400; macOS High Sierra; 1 thread; 4 GB 
heap (-Xmx/-Xms).
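
For illustration, the cutoff idea looks roughly like this (a toy sketch, not 
the actual DrillBuf change; the 1024 value is the empirical cutoff from my 
measurements above):

class BulkCopyThresholdSketch {
  static final int BULK_COPY_THRESHOLD = 1024;

  static void copyBytes(byte[] src, int srcOff, byte[] dst, int dstOff, int length) {
    if (length < BULK_COPY_THRESHOLD) {
      // Small copies: a per-byte loop avoids the fixed overhead of the bulk path.
      for (int i = 0; i < length; i++) {
        dst[dstOff + i] = src[srcOff + i];
      }
    } else {
      // Large copies: the bulk, copyMemory-style path wins.
      System.arraycopy(src, srcOff, dst, dstOff, length);
    }
  }
}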


---


[GitHub] drill pull request #1108: DRILL-6130: Fix NPE during physical plan submissio...

2018-02-02 Thread arina-ielchiieva
GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/1108

DRILL-6130: Fix NPE during physical plan submission for various storage 
plugins

1. Fixed ser / de issues for Hive, Kafka, Hbase plugins.
2. Added physical plan submission unit test for all storage plugins in 
contrib module.
3. Refactoring.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-6130

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1108.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1108


commit aabd11cc327100d248b22b72d9696d8911545d75
Author: Arina Ielchiieva 
Date:   2018-02-01T17:44:43Z

DRILL-6130: Fix NPE during physical plan submission for various storage 
plugins

1. Fixed ser / de issues for Hive, Kafka, Hbase plugins.
2. Added physical plan submission unit test for all storage plugins in 
contrib module.
3. Refactoring.




---


[GitHub] drill issue #1108: DRILL-6130: Fix NPE during physical plan submission for v...

2018-02-02 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1108
  
@vdiravka please review.


---