[jira] [Commented] (DRILL-6139) Travis CI hangs on TestVariableWidthWriter#testRestartRow

2018-02-06 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354194#comment-16354194
 ] 

Paul Rogers commented on DRILL-6139:


This is in my code; I'll take a look. The test did not fail the last time I ran the 
developer unit tests. This bit of code is tricky, so I'll need to poke around a bit.

> Travis CI hangs on TestVariableWidthWriter#testRestartRow
> -
>
> Key: DRILL-6139
> URL: https://issues.apache.org/jira/browse/DRILL-6139
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Boaz Ben-Zvi
>Assignee: Paul Rogers
>Priority: Major
>
> The Travis CI fails (probably hangs, then times out) in the following test:
> {code:java}
> Running org.apache.drill.test.rowSet.test.DummyWriterTest
> Running org.apache.drill.test.rowSet.test.DummyWriterTest#testDummyScalar
> Running org.apache.drill.test.rowSet.test.DummyWriterTest#testDummyMap
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.109 sec - in org.apache.drill.test.rowSet.test.DummyWriterTest
> Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter
> Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testSkipNulls
> Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testWrite
> Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testFillEmpties
> Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testRollover
> Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testSizeLimit
> Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testRolloverWithEmpties
> Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testRestartRow
> Killed
>
> Results :
> Tests run: 1554, Failures: 0, Errors: 0, Skipped: 66{code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6133) RecordBatchSizer throws IndexOutOfBounds Exception for union vector

2018-02-06 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354199#comment-16354199
 ] 

Paul Rogers commented on DRILL-6133:


This looks familiar; it may have been fixed in my private branch. I'll take a 
look to see if I can just port the fix to master.
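For context on the failure mode: the trace quoted below bottoms out in DrillBuf.getInt(4) 
against a buffer of capacity 0, i.e. getPayloadByteCount reads the offset at index 
valueCount from an offsets buffer that was never allocated for the union's member vector. 
A standalone sketch of that read, using plain java.nio rather than Drill's classes:

{code:java}
import java.nio.ByteBuffer;

public class OffsetReadSketch {
  // A variable-width vector keeps valueCount + 1 four-byte offsets;
  // the offset at index valueCount is the total payload byte count.
  static int payloadByteCount(ByteBuffer offsets, int valueCount) {
    return offsets.getInt(valueCount * 4);
  }

  public static void main(String[] args) {
    ByteBuffer empty = ByteBuffer.allocate(0); // a never-allocated offsets buffer
    payloadByteCount(empty, 1); // IndexOutOfBoundsException: index 4 vs. range(0, 0)
  }
}
{code}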

> RecordBatchSizer throws IndexOutOfBounds Exception for union vector
> ---
>
> Key: DRILL-6133
> URL: https://issues.apache.org/jira/browse/DRILL-6133
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.12.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Minor
> Fix For: 1.13.0
>
>
> RecordBatchSizer throws IndexOutOfBoundsException when trying to get payload byte count of union vector.
> [Error Id: 430026a7-a963-40f1-bae2-1850649e8434 on 172.30.8.158:31013]
>  at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[classes/:na]
>  at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:300) [classes/:na]
>  at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) [classes/:na]
>  at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:266) [classes/:na]
>  at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [classes/:na]
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
>  at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> Caused by: java.lang.IndexOutOfBoundsException: DrillBuf[2], udle: [1 0..0], index: 4, length: 4 (expected: range(0, 0))
> DrillBuf[2], udle: [1 0..0]
>  at org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80) ~[classes/:na]
>  at org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86) ~[classes/:na]
>  at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114) ~[classes/:4.0.48.Final]
>  at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484) ~[classes/:4.0.48.Final]
>  at org.apache.drill.exec.vector.UInt4Vector$Accessor.get(UInt4Vector.java:432) ~[classes/:na]
>  at org.apache.drill.exec.vector.VarCharVector.getPayloadByteCount(VarCharVector.java:308) ~[classes/:na]
>  at org.apache.drill.exec.vector.NullableVarCharVector.getPayloadByteCount(NullableVarCharVector.java:256) ~[classes/:na]
>  at org.apache.drill.exec.vector.complex.AbstractMapVector.getPayloadByteCount(AbstractMapVector.java:303) ~[classes/:na]
>  at org.apache.drill.exec.vector.complex.UnionVector.getPayloadByteCount(UnionVector.java:574) ~[classes/:na]
>  at org.apache.drill.exec.physical.impl.spill.RecordBatchSizer$ColumnSize.<init>(RecordBatchSizer.java:147) ~[classes/:na]
>  at org.apache.drill.exec.physical.impl.spill.RecordBatchSizer.measureColumn(RecordBatchSizer.java:403) ~[classes/:na]
>  at org.apache.drill.exec.physical.impl.spill.RecordBatchSizer.<init>(RecordBatchSizer.java:350) ~[classes/:na]
>  at org.apache.drill.exec.physical.impl.spill.RecordBatchSizer.<init>(RecordBatchSizer.java:320) ~[classes/:na]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354225#comment-16354225
 ] 

ASF GitHub Bot commented on DRILL-6032:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r166387829
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
@@ -140,6 +131,9 @@
   private OperatorContext oContext;
   private BufferAllocator allocator;
 
+  private Map keySizes;
+  // The size estimates for varchar value columns. The keys are the indexes of the varchar value columns.
+  private Map varcharValueSizes;
--- End diff --

As far as I know we don't support aggregations on repeated types. Varchar 
is the only non-fixed-width type we can aggregate.
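
To make the bookkeeping concrete, here is a hypothetical sketch of how a 
column-index-to-width map like varcharValueSizes could be populated (isVarWidth 
and estimateWidth are illustrative placeholders, not part of this PR):

{code:java}
// Hypothetical illustration only: fixed-width value columns have known widths,
// so only variable-width (varchar) columns need an estimate, keyed by column index.
Map<Integer, Integer> varcharValueSizes = new HashMap<>();
for (int i = 0; i < valueColumnCount; i++) {
  if (isVarWidth(i)) {
    varcharValueSizes.put(i, estimateWidth(i));
  }
}
{code}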


> Use RecordBatchSizer to estimate size of columns in HashAgg
> ---
>
> Key: DRILL-6032
> URL: https://issues.apache.org/jira/browse/DRILL-6032
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.13.0
>
>
> We need to use the RecordBatchSize to estimate the size of columns in the 
> Partition batches created by HashAgg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354378#comment-16354378
 ] 

ASF GitHub Bot commented on DRILL-6032:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r166409878
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
@@ -226,7 +221,7 @@ public BatchHolder() {
           ((FixedWidthVector) vector).allocateNew(HashTable.BATCH_SIZE);
         } else if (vector instanceof VariableWidthVector) {
           // This case is never used ... a varchar falls under ObjectVector which is allocated on the heap !
-          ((VariableWidthVector) vector).allocateNew(maxColumnWidth, HashTable.BATCH_SIZE);
+          ((VariableWidthVector) vector).allocateNew(columnSize, HashTable.BATCH_SIZE);
--- End diff --

Thanks for catching this. It should use stdSize here.
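
In other words, the corrected line would presumably keep the call shape from the 
hunk above and swap in the standard width estimate (a sketch, not the final patch; 
stdSize is the per-value width field discussed elsewhere in this PR):

{code:java}
// Sketch of the fix: allocate using the standard width estimate, not columnSize.
((VariableWidthVector) vector).allocateNew(stdSize, HashTable.BATCH_SIZE);
{code}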


> Use RecordBatchSizer to estimate size of columns in HashAgg
> ---
>
> Key: DRILL-6032
> URL: https://issues.apache.org/jira/browse/DRILL-6032
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.13.0
>
>
> We need to use the RecordBatchSize to estimate the size of columns in the 
> Partition batches created by HashAgg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6070) Hash join with empty tables should not do casting of data types to INT

2018-02-06 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka resolved DRILL-6070.

   Resolution: Resolved
 Assignee: Vitalii Diravka
Fix Version/s: (was: Future)
   1.13.0

This issue is resolved in the context of DRILL-5851 and DRILL-4185.
The query works for the HashJoin, MergeJoin and NestedLoopJoin operators.

> Hash join with empty tables should not do casting of data types to INT
> 
>
> Key: DRILL-6070
> URL: https://issues.apache.org/jira/browse/DRILL-6070
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
> Fix For: 1.13.0
>
> Attachments: nullable1.json
>
>
> A left join query that uses the HashJoin operator fails with an error, while the 
> same query using MergeJoin works fine.
> {code}
> 0: jdbc:drill:zk=local> alter session set `planner.enable_hashjoin` = true;
> +---+---+
> |  ok   |  summary  |
> +---+---+
> | true  | planner.enable_hashjoin updated.  |
> +---+---+
> 1 row selected (0.078 seconds)
> 0: jdbc:drill:zk=local> alter session set `planner.enable_mergejoin` = false;
> +---++
> |  ok   |  summary   |
> +---++
> | true  | planner.enable_mergejoin updated.  |
> +---++
> 1 row selected (0.079 seconds)
> 0: jdbc:drill:zk=local> select t1.a1, t1.b1, t2.a2, t2.b2 from dfs.`/tmp/nullable1.json` t1 left join dfs.`/tmp/empty0.json` t2 on t1.b1 = t2.b2;
> Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts between 1. Numeric data
> 2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARCHAR, Right type: INT. Add explicit casts to avoid this error
> Fragment 0:0
> [Error Id: 2cfc662f-48c2-4e62-a2ea-5a0f33d64c9b on vitalii-pc:31010] (state=,code=0)
> {code}
> {code}
> 00-00    Screen : rowType = RecordType(ANY a1, ANY b1, ANY a2, ANY b2): rowcount = 1.0, cumulative cost = {2.1 rows, 20.1 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 930
> 00-01      Project(a1=[$0], b1=[$1], a2=[$2], b2=[$3]) : rowType = RecordType(ANY a1, ANY b1, ANY a2, ANY b2): rowcount = 1.0, cumulative cost = {2.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 929
> 00-02        Project(a1=[$1], b1=[$0], a2=[$3], b2=[$2]) : rowType = RecordType(ANY a1, ANY b1, ANY a2, ANY b2): rowcount = 1.0, cumulative cost = {2.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 928
> 00-03          HashJoin(condition=[=($0, $2)], joinType=[left]) : rowType = RecordType(ANY b1, ANY a1, ANY b2, ANY a2): rowcount = 1.0, cumulative cost = {2.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 927
> 00-05            Scan(groupscan=[EasyGroupScan [selectionRoot=file:/home/vitalii/IdeaProjects/drill-fork/exec/java-exec/target/test-classes/jsoninput/nullable1.json, numFiles=1, columns=[`b1`, `a1`], files=[file:/home/vitalii/IdeaProjects/drill-fork/exec/java-exec/target/test-classes/jsoninput/nullable1.json]]]) : rowType = RecordType(ANY b1, ANY a1): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 925
> 00-04            Scan(groupscan=[EasyGroupScan [selectionRoot=file:/home/vitalii/IdeaProjects/drill-fork/exec/java-exec/src/test/resources/project/pushdown/empty0.json, numFiles=1, columns=[`b2`, `a2`], files=[file:/home/vitalii/IdeaProjects/drill-fork/exec/java-exec/src/test/resources/project/pushdown/empty0.json]]]) : rowType = RecordType(ANY b2, ANY a2): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 926
> {code}
> Left join with empty tables should not do casting of data types to INT.
> The result should be the same as for MergeJoin operator:
> {code}
> 0: jdbc:drill:zk=local> alter session set `planner.enable_hashjoin` = false;
> +---+---+
> |  ok   |  summary  |
> +---+---+
> | true  | planner.enable_hashjoin updated.  |
> +---+---+
> 1 row selected (0.087 seconds)
> 0: jdbc:drill:zk=local> alter session set `planner.enable_mergejoin` = true;
> +---++
> |  ok   |  summary   |
> +---++
> | true  | planner.enable_mergejoin updated.  |
> +---++
> 1 row selected (0.073 seconds)
> 0: jdbc:drill:zk=local> select 

[jira] [Commented] (DRILL-6114) Complete internal metadata layer for improved batch handling

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354218#comment-16354218
 ] 

ASF GitHub Bot commented on DRILL-6114:
---

Github user ppadma commented on the issue:

https://github.com/apache/drill/pull/1112
  
@paul-rogers I ran the pre-commit tests. No issues; everything passed. I will 
run them one more time once code reviews are done.


> Complete internal metadata layer for improved batch handling
> 
>
> Key: DRILL-6114
> URL: https://issues.apache.org/jira/browse/DRILL-6114
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.13.0
>
>
> Slice of the ["batch handling" 
> project.|https://github.com/paul-rogers/drill/wiki/Batch-Handling-Upgrades] 
> that includes enhancements to the internal metadata system.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354239#comment-16354239
 ] 

ASF GitHub Bot commented on DRILL-6032:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r166388922
  
--- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
@@ -298,6 +298,11 @@ public int getPayloadByteCount(int valueCount) {
 return valueCount * ${type.width};
   }
 
+  @Override
+  public int getValueWidth() {
--- End diff --

TypeHelper doesn't return the correct size for nullable columns. For 
example, NullableIntVector has a width of 5 instead of 4.
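
The discrepancy comes from the one-byte-per-value null-indicator ("bits") vector 
that a nullable vector carries alongside its data buffer. In plain numbers (an 
illustration, not Drill's template code):

{code:java}
int dataBytesPerValue = 4;  // the INT payload
int bitsBytesPerValue = 1;  // the null-indicator ("bits") entry
int vectorWidth = dataBytesPerValue + bitsBytesPerValue; // 5, what the vector reports
int typeHelperWidth = dataBytesPerValue;                 // 4, what TypeHelper reports
{code}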


> Use RecordBatchSizer to estimate size of columns in HashAgg
> ---
>
> Key: DRILL-6032
> URL: https://issues.apache.org/jira/browse/DRILL-6032
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.13.0
>
>
> We need to use the RecordBatchSize to estimate the size of columns in the 
> Partition batches created by HashAgg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-02-06 Thread Pritesh Maker (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354280#comment-16354280
 ] 

Pritesh Maker commented on DRILL-4834:
--

[~daveoshinsky] since the work will continue with DRILL-6094, should we resolve 
this Jira?

> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
>Priority: Major
> Fix For: 1.13.0
>
>
> While working on a fix for DRILL-4704, logic was added to the CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be cast to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being cast, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6135) New Feature: SHOW CREATE TABLE / VIEW command

2018-02-06 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354347#comment-16354347
 ] 

Julian Hyde commented on DRILL-6135:


After some research, I agree that {{SHOW CREATE}} seems to be the standard. As 
you say, MySQL and Presto (and its derivative, Athena) support it. Also Hive 
supports it; see HIVE-967. Oracle, PostgreSQL, DB2, SQL Server do not have an 
equivalent (other than using stored procedures).

There is a minor concern about how we would generate DDL for sub-objects such as 
columns and foreign keys, which do not have their own CREATE statement but 
could nevertheless have their own DDL fragment.

> New Feature: SHOW CREATE TABLE / VIEW command
> -
>
> Key: DRILL-6135
> URL: https://issues.apache.org/jira/browse/DRILL-6135
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Metadata, Storage - Information Schema
>Affects Versions: 1.10.0
> Environment: MapR 5.2 + Kerberos
>Reporter: Hari Sekhon
>Priority: Major
>
> Feature Request to implement
> {code:java}
> SHOW CREATE VIEW <view_name>;{code}
> A colleague and I just had to cat the view file, which is non-pretty JSON and 
> hard to read for a large view creation statement that could have been presented 
> in the Drill shell, formatted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-02-06 Thread Dave Oshinsky (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354367#comment-16354367
 ] 

Dave Oshinsky commented on DRILL-4834:
--

[~priteshm] the issues with this Jira (and DRILL-4184) are not yet resolved, so 
the Jira should remain.  On the other hand, one could view this Jira as now 
superseded by DRILL-6094.  Would resolving this Jira cause any problems with 
using the changes in the corresponding PR 570 in the resolution for DRILL-6094?
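
For context, the overflow hazard described in the quoted report below is easy to 
reproduce with plain BigDecimal (an illustrative sketch, not Drill code): a 
10-digit integer cannot be represented exactly at precision 9, the smallest of 
the fixed Decimal9/18/28/38 precisions.

{code:java}
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class DecimalOverflowSketch {
  public static void main(String[] args) {
    BigDecimal tenDigits = new BigDecimal(1234567890L); // 10 significant digits
    // Forcing it into precision 9 must either round (silently lose data) or fail:
    tenDigits.round(new MathContext(9, RoundingMode.UNNECESSARY)); // ArithmeticException: Rounding necessary
  }
}
{code}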

> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
>Priority: Major
> Fix For: 1.13.0
>
>
> While working on a fix for DRILL-4704, logic was added to the CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be cast to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being cast, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354394#comment-16354394
 ] 

ASF GitHub Bot commented on DRILL-6032:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r166411194
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
@@ -733,28 +780,32 @@ private void restoreReservedMemory() {
    * @param records
    */
   private void allocateOutgoing(int records) {
-    // Skip the keys and only allocate for outputting the workspace values
-    // (keys will be output through splitAndTransfer)
-    Iterator outgoingIter = outContainer.iterator();
-    for (int i = 0; i < numGroupByOutFields; i++) {
-      outgoingIter.next();
-    }
-
     // try to preempt an OOM by using the reserved memory
     useReservedOutgoingMemory();
     long allocatedBefore = allocator.getAllocatedMemory();
 
-    while (outgoingIter.hasNext()) {
+    for (int columnIndex = numGroupByOutFields; columnIndex < outContainer.getNumberOfColumns(); columnIndex++) {
+      final VectorWrapper wrapper = outContainer.getValueVector(columnIndex);
       @SuppressWarnings("resource")
-      ValueVector vv = outgoingIter.next().getValueVector();
+      final ValueVector vv = wrapper.getValueVector();
 
-      AllocationHelper.allocatePrecomputedChildCount(vv, records, maxColumnWidth, 0);
+      final RecordBatchSizer.ColumnSize columnSizer = new RecordBatchSizer.ColumnSize(wrapper.getValueVector());
+      int columnSize;
+
+      if (columnSizer.hasKnownSize()) {
+        // For fixed width vectors we know the size of each record
+        columnSize = columnSizer.getKnownSize();
+      } else {
+        // For var chars we need to use the input estimate
+        columnSize = varcharValueSizes.get(columnIndex);
+      }
+
+      AllocationHelper.allocatePrecomputedChildCount(vv, records, columnSize, 0);
--- End diff --

Hmm. The element count from the sizer would tell us the number of rows in 
the incoming batch. But that will not give us an accurate prediction for the 
number of keys we have aggregations for. Maybe we should use the number of keys 
in the hashtable instead.


> Use RecordBatchSizer to estimate size of columns in HashAgg
> ---
>
> Key: DRILL-6032
> URL: https://issues.apache.org/jira/browse/DRILL-6032
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.13.0
>
>
> We need to use the RecordBatchSize to estimate the size of columns in the 
> Partition batches created by HashAgg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6139) Travis CI hangs on TestVariableWidthWriter#testRestartRow

2018-02-06 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-6139:
--

Assignee: Paul Rogers

> Travis CI hangs on TestVariableWidthWriter#testRestartRow
> -
>
> Key: DRILL-6139
> URL: https://issues.apache.org/jira/browse/DRILL-6139
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Boaz Ben-Zvi
>Assignee: Paul Rogers
>Priority: Major
>
> The Travis CI fails (probably hangs, then times out) in the following test:
> {code:java}
> Running org.apache.drill.test.rowSet.test.DummyWriterTest
> Running org.apache.drill.test.rowSet.test.DummyWriterTest#testDummyScalar
> Running org.apache.drill.test.rowSet.test.DummyWriterTest#testDummyMap
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.109 sec - in org.apache.drill.test.rowSet.test.DummyWriterTest
> Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter
> Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testSkipNulls
> Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testWrite
> Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testFillEmpties
> Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testRollover
> Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testSizeLimit
> Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testRolloverWithEmpties
> Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testRestartRow
> Killed
>
> Results :
> Tests run: 1554, Failures: 0, Errors: 0, Skipped: 66{code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6123) Limit batch size for Merge Join based on memory

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354231#comment-16354231
 ] 

ASF GitHub Bot commented on DRILL-6123:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1107#discussion_r166387073
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java ---
@@ -102,20 +105,78 @@
   private final List comparators;
   private final JoinRelType joinType;
   private JoinWorker worker;
+  private final long outputBatchSize;
 
   private static final String LEFT_INPUT = "LEFT INPUT";
   private static final String RIGHT_INPUT = "RIGHT INPUT";
 
+  private class MergeJoinMemoryManager extends AbstractRecordBatchMemoryManager {
+    private int leftRowWidth;
+    private int rightRowWidth;
+
+    /**
+     * Merge join operates on one record at a time from the left and right batches,
+     * using the RecordIterator abstraction. We have a callback mechanism to get notified
+     * when a new batch is loaded in the record iterator.
+     * This can get called in the middle of the current output batch we are building.
+     * When it is called, adjust the number of output rows for the current batch and
+     * update the value to be used for subsequent batches.
+     */
+    @Override
+    public void update(int inputIndex) {
+      switch (inputIndex) {
+        case 0:
+          final RecordBatchSizer leftSizer = new RecordBatchSizer(left);
+          leftRowWidth = leftSizer.netRowWidth();
+          break;
+        case 1:
+          final RecordBatchSizer rightSizer = new RecordBatchSizer(right);
+          rightRowWidth = rightSizer.netRowWidth();
+        default:
+          break;
+      }
+
+      final int newOutgoingRowWidth = leftRowWidth + rightRowWidth;
+
+      // If the outgoing row width is 0, just return. This is possible for empty batches or
+      // when the first set of batches comes with OK_NEW_SCHEMA and no data.
+      if (newOutgoingRowWidth == 0) {
+        return;
+      }
+
+      // update the value to be used for the next batch(es)
+      setOutputRowCount(Math.min(ValueVector.MAX_ROW_COUNT,
+          Math.max(RecordBatchSizer.safeDivide(outputBatchSize / WORST_CASE_FRAGMENTATION_FACTOR, newOutgoingRowWidth), MIN_NUM_ROWS)));
--- End diff --

Maybe wrap this in a method since it is used multiple times.
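
For illustration, the extracted helper might look like this (a sketch reusing the 
names from the hunk above, not the final patch):

{code:java}
// Sketch only: factors out the repeated "rows that fit in the memory budget" computation.
private int computeOutputRowCount(int rowWidth) {
  return Math.min(ValueVector.MAX_ROW_COUNT,
      Math.max(RecordBatchSizer.safeDivide(outputBatchSize / WORST_CASE_FRAGMENTATION_FACTOR, rowWidth),
               MIN_NUM_ROWS));
}
{code}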


> Limit batch size for Merge Join based on memory
> ---
>
> Key: DRILL-6123
> URL: https://issues.apache.org/jira/browse/DRILL-6123
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.12.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.13.0
>
>
> Merge join limits output batch size to 32K rows irrespective of row size. 
> This can create very large or very small batches (in terms of memory), 
> depending upon average row width. Change this to compute the output row count 
> from the memory limit specified by the new outputBatchSize option and the 
> average row width of the incoming left and right batches. The output row count 
> will be a minimum of 1 and a maximum of 64K.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6123) Limit batch size for Merge Join based on memory

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354229#comment-16354229
 ] 

ASF GitHub Bot commented on DRILL-6123:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1107#discussion_r166384715
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinStatus.java ---
@@ -101,8 +101,12 @@ public final void resetOutputPos() {
   }
 
   public final boolean isOutgoingBatchFull() {
-    Preconditions.checkArgument(outputPosition <= OUTPUT_BATCH_SIZE);
-    return outputPosition == OUTPUT_BATCH_SIZE;
+    Preconditions.checkArgument(outputPosition <= outputRowCount);
+    return outputPosition == outputRowCount;
--- End diff --

Maybe be just a bit more paranoid? `outputPosition >= outputRowCount`?

And, while we're at it, maybe `outputRowCount` -> `targetOutputRowCount`? 
To make clear that the value is our target, not the actual, current row count.
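
Concretely, the hardened version being suggested might read (a sketch with the 
proposed rename applied):

{code:java}
public final boolean isOutgoingBatchFull() {
  Preconditions.checkArgument(outputPosition <= targetOutputRowCount);
  // >= rather than == is defensive in case the target ever drops below the current position.
  return outputPosition >= targetOutputRowCount;
}
{code}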


> Limit batch size for Merge Join based on memory
> ---
>
> Key: DRILL-6123
> URL: https://issues.apache.org/jira/browse/DRILL-6123
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.12.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.13.0
>
>
> Merge join limits output batch size to 32K rows irrespective of row size. 
> This can create very large or very small batches (in terms of memory), 
> depending upon average row width. Change this to compute the output row count 
> from the memory limit specified by the new outputBatchSize option and the 
> average row width of the incoming left and right batches. The output row count 
> will be a minimum of 1 and a maximum of 64K.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6123) Limit batch size for Merge Join based on memory

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354230#comment-16354230
 ] 

ASF GitHub Bot commented on DRILL-6123:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1107#discussion_r166384067
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
@@ -77,7 +77,7 @@ private ExecConstants() {
   public static final String SPILL_DIRS = "drill.exec.spill.directories";
 
   public static final String OUTPUT_BATCH_SIZE = "drill.exec.memory.operator.output_batch_size";
-  public static final LongValidator OUTPUT_BATCH_SIZE_VALIDATOR = new RangeLongValidator(OUTPUT_BATCH_SIZE, 1024, 512 * 1024 * 1024);
+  public static final LongValidator OUTPUT_BATCH_SIZE_VALIDATOR = new RangeLongValidator(OUTPUT_BATCH_SIZE, 1, 512 * 1024 * 1024);
--- End diff --

Maybe add a comment to explain the units here. Bytes? MB? A minimum batch 
size of 1 byte seems small, but a max size of 512 GB seems large, so not sure 
of the limits...
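
If the unit is bytes, which the old lower bound of 1024 hints at (an assumption, 
not confirmed here), the requested comment might look like:

{code:java}
// Output batch size limit for operators, in bytes: minimum 1 B, maximum 512 MiB (512 * 1024 * 1024).
public static final LongValidator OUTPUT_BATCH_SIZE_VALIDATOR = new RangeLongValidator(OUTPUT_BATCH_SIZE, 1, 512 * 1024 * 1024);
{code}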


> Limit batch size for Merge Join based on memory
> ---
>
> Key: DRILL-6123
> URL: https://issues.apache.org/jira/browse/DRILL-6123
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.12.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.13.0
>
>
> Merge join limits output batch size to 32K rows irrespective of row size. 
> This can create very large or very small batches (in terms of memory), 
> depending upon average row width. Change this to compute the output row count 
> from the memory limit specified by the new outputBatchSize option and the 
> average row width of the incoming left and right batches. The output row count 
> will be a minimum of 1 and a maximum of 64K.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6123) Limit batch size for Merge Join based on memory

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354228#comment-16354228
 ] 

ASF GitHub Bot commented on DRILL-6123:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1107#discussion_r166387630
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java ---
@@ -102,20 +105,78 @@
   private final List comparators;
   private final JoinRelType joinType;
   private JoinWorker worker;
+  private final long outputBatchSize;
 
   private static final String LEFT_INPUT = "LEFT INPUT";
   private static final String RIGHT_INPUT = "RIGHT INPUT";
 
+  private class MergeJoinMemoryManager extends AbstractRecordBatchMemoryManager {
+    private int leftRowWidth;
+    private int rightRowWidth;
+
+    /**
+     * Merge join operates on one record at a time from the left and right batches,
+     * using the RecordIterator abstraction. We have a callback mechanism to get notified
+     * when a new batch is loaded in the record iterator.
+     * This can get called in the middle of the current output batch we are building.
+     * When it is called, adjust the number of output rows for the current batch and
+     * update the value to be used for subsequent batches.
+     */
+    @Override
+    public void update(int inputIndex) {
+      switch (inputIndex) {
+        case 0:
+          final RecordBatchSizer leftSizer = new RecordBatchSizer(left);
+          leftRowWidth = leftSizer.netRowWidth();
+          break;
+        case 1:
+          final RecordBatchSizer rightSizer = new RecordBatchSizer(right);
+          rightRowWidth = rightSizer.netRowWidth();
+        default:
+          break;
+      }
+
+      final int newOutgoingRowWidth = leftRowWidth + rightRowWidth;
+
+      // If the outgoing row width is 0, just return. This is possible for empty batches or
+      // when the first set of batches comes with OK_NEW_SCHEMA and no data.
+      if (newOutgoingRowWidth == 0) {
+        return;
+      }
+
+      // update the value to be used for the next batch(es)
+      setOutputRowCount(Math.min(ValueVector.MAX_ROW_COUNT,
+          Math.max(RecordBatchSizer.safeDivide(outputBatchSize / WORST_CASE_FRAGMENTATION_FACTOR, newOutgoingRowWidth), MIN_NUM_ROWS)));
+
+      // Adjust for the current batch:
+      // calculate memory used so far based on the previous outgoing row width and how many rows we already processed.
+      final long memoryUsed = status.getOutPosition() * getOutgoingRowWidth();
+      // This is the remaining memory.
+      final long remainingMemory = Math.max(outputBatchSize / WORST_CASE_FRAGMENTATION_FACTOR - memoryUsed, 0);
+      // This is the number of rows we can fit in the remaining memory based on the new outgoing row width.
+      final int numOutputRowsRemaining = RecordBatchSizer.safeDivide(remainingMemory, newOutgoingRowWidth);
+
+      final int adjustedOutputRowCount = Math.min(MAX_NUM_ROWS, Math.max(status.getOutPosition() + numOutputRowsRemaining, MIN_NUM_ROWS));
+      status.setOutputRowCount(adjustedOutputRowCount);
+      setOutgoingRowWidth(newOutgoingRowWidth);
--- End diff --

This number is valid only for this one batch. The next batch doesn't have 
the "legacy" rows. Do we recompute the number at the start of the next output 
batch?


> Limit batch size for Merge Join based on memory
> ---
>
> Key: DRILL-6123
> URL: https://issues.apache.org/jira/browse/DRILL-6123
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.12.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.13.0
>
>
> Merge join limits output batch size to 32K rows irrespective of row size. 
> This can create very large or very small batches (in terms of memory), 
> depending upon average row width. Change this to compute the output row count 
> from the memory limit specified by the new outputBatchSize option and the 
> average row width of the incoming left and right batches. The output row count 
> will be a minimum of 1 and a maximum of 64K.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354241#comment-16354241
 ] 

ASF GitHub Bot commented on DRILL-6032:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r166389076
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/RecordBatchSizer.java ---
@@ -65,6 +70,14 @@
 
 public int stdSize;
 
+/**
+ * If we can determine the exact width of the row of a vector upfront,
+ * the row width is saved here. If we cannot determine the exact width
+ * (for example for VarChar or Repeated vectors), then
+ */
+
+private int knownSize = -1;
--- End diff --

Thanks for catching this. I'll use stdSize instead.


> Use RecordBatchSizer to estimate size of columns in HashAgg
> ---
>
> Key: DRILL-6032
> URL: https://issues.apache.org/jira/browse/DRILL-6032
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.13.0
>
>
> We need to use the RecordBatchSize to estimate the size of columns in the 
> Partition batches created by HashAgg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DRILL-6135) New Feature: SHOW CREATE VIEW command

2018-02-06 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353638#comment-16353638
 ] 

Hari Sekhon edited comment on DRILL-6135 at 2/6/18 9:35 AM:


[~julianhyde]

I think SHOW CREATE ... is better as it stays in line with other SQL systems. 
This should also be applied to tables etc. Examples include MySQL and Presto / 
AWS Athena:

 

[https://dev.mysql.com/doc/refman/5.7/en/show-create-table.html]

 

[https://docs.aws.amazon.com/athena/latest/ug/show-create-table.html]

 


was (Author: harisekhon):
[~julianhyde]

SHOW CREATE ... is better as it stays in line with other SQL systems. This 
should also be applied to tables etc. Examples include MySQL and Presto / AWS 
Athena:

 

[https://dev.mysql.com/doc/refman/5.7/en/show-create-table.html]

 

[https://docs.aws.amazon.com/athena/latest/ug/show-create-table.html]

 

> New Feature: SHOW CREATE VIEW command
> -
>
> Key: DRILL-6135
> URL: https://issues.apache.org/jira/browse/DRILL-6135
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Metadata, Storage - Information Schema
>Affects Versions: 1.10.0
> Environment: MapR 5.2 + Kerberos
>Reporter: Hari Sekhon
>Priority: Major
>
> Feature Request to implement
> {code:java}
> SHOW CREATE VIEW <view_name>;{code}
> A colleague and I just had to cat the view file, which is non-pretty JSON and 
> hard to read for a large view creation statement that could have been presented 
> in the Drill shell, formatted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6135) New Feature: SHOW CREATE TABLE / VIEW command

2018-02-06 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated DRILL-6135:
---
Summary: New Feature: SHOW CREATE TABLE / VIEW command  (was: New Feature: 
SHOW CREATE VIEW command)

> New Feature: SHOW CREATE TABLE / VIEW command
> -
>
> Key: DRILL-6135
> URL: https://issues.apache.org/jira/browse/DRILL-6135
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Metadata, Storage - Information Schema
>Affects Versions: 1.10.0
> Environment: MapR 5.2 + Kerberos
>Reporter: Hari Sekhon
>Priority: Major
>
> Feature Request to implement
> {code:java}
> SHOW CREATE VIEW <view_name>;{code}
> A colleague and I just had to cat the view file, which is non-pretty JSON and 
> hard to read for a large view creation statement that could have been presented 
> in the Drill shell, formatted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DRILL-6135) New Feature: SHOW CREATE VIEW command

2018-02-06 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353638#comment-16353638
 ] 

Hari Sekhon edited comment on DRILL-6135 at 2/6/18 9:35 AM:


[~julianhyde]

SHOW CREATE ... is better as it stays in line with other SQL systems. This 
should also be applied to tables etc. Examples include MySQL and Presto / AWS 
Athena:

 

[https://dev.mysql.com/doc/refman/5.7/en/show-create-table.html]

 

[https://docs.aws.amazon.com/athena/latest/ug/show-create-table.html]

 


was (Author: harisekhon):
[~julianhyde]

SHOW CREATE ... is better as it stays in line with other SQL systems. This 
should also be applied to tables etc.

> New Feature: SHOW CREATE VIEW command
> -
>
> Key: DRILL-6135
> URL: https://issues.apache.org/jira/browse/DRILL-6135
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Metadata, Storage - Information Schema
>Affects Versions: 1.10.0
> Environment: MapR 5.2 + Kerberos
>Reporter: Hari Sekhon
>Priority: Major
>
> Feature Request to implement
> {code:java}
> SHOW CREATE VIEW <view_name>;{code}
> A colleague and I just had to cat the view file, which is non-pretty JSON and 
> hard to read for a large view creation statement that could have been presented 
> in the Drill shell, formatted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6135) New Feature: SHOW CREATE VIEW command

2018-02-06 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353638#comment-16353638
 ] 

Hari Sekhon commented on DRILL-6135:


[~julianhyde]

SHOW CREATE ... is better as it stays in line with other SQL systems. This 
should also be applied to tables etc.

> New Feature: SHOW CREATE VIEW command
> -
>
> Key: DRILL-6135
> URL: https://issues.apache.org/jira/browse/DRILL-6135
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Metadata, Storage - Information Schema
>Affects Versions: 1.10.0
> Environment: MapR 5.2 + Kerberos
>Reporter: Hari Sekhon
>Priority: Major
>
> Feature Request to implement
> {code:java}
> SHOW CREATE VIEW <view_name>;{code}
> A colleague and I just had to cat the view file, which is non-pretty JSON and 
> hard to read for a large view creation statement that could have been presented 
> in the Drill shell, formatted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DRILL-6135) New Feature: SHOW CREATE TABLE / VIEW command

2018-02-06 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353638#comment-16353638
 ] 

Hari Sekhon edited comment on DRILL-6135 at 2/6/18 9:37 AM:


[~julianhyde]

I think SHOW CREATE  is better as it stays in line with other SQL 
systems. This should also be applied to both tables and views. Examples include 
MySQL and Presto / AWS Athena:

 

[https://dev.mysql.com/doc/refman/5.7/en/show-create-table.html]

 

[https://docs.aws.amazon.com/athena/latest/ug/show-create-table.html]

 


was (Author: harisekhon):
[~julianhyde]

I think SHOW CREATE  is better as it stays in line with other SQL 
systems. This should also be applied to tables etc. Examples include MySQL and 
Presto / AWS Athena:

 

[https://dev.mysql.com/doc/refman/5.7/en/show-create-table.html]

 

[https://docs.aws.amazon.com/athena/latest/ug/show-create-table.html]

 

> New Feature: SHOW CREATE TABLE / VIEW command
> -
>
> Key: DRILL-6135
> URL: https://issues.apache.org/jira/browse/DRILL-6135
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Metadata, Storage - Information Schema
>Affects Versions: 1.10.0
> Environment: MapR 5.2 + Kerberos
>Reporter: Hari Sekhon
>Priority: Major
>
> Feature Request to implement
> {code:java}
> SHOW CREATE VIEW <view_name>;{code}
> A colleague and I just had to cat the view file, which is non-pretty JSON and 
> hard to read for a large view creation statement that could have been presented 
> in the Drill shell, formatted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DRILL-6135) New Feature: SHOW CREATE TABLE / VIEW command

2018-02-06 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353638#comment-16353638
 ] 

Hari Sekhon edited comment on DRILL-6135 at 2/6/18 9:36 AM:


[~julianhyde]

I think SHOW CREATE  is better as it stays in line with other SQL 
systems. This should also be applied to tables etc. Examples include MySQL and 
Presto / AWS Athena:

 

[https://dev.mysql.com/doc/refman/5.7/en/show-create-table.html]

 

[https://docs.aws.amazon.com/athena/latest/ug/show-create-table.html]

 


was (Author: harisekhon):
[~julianhyde]

I think SHOW CREATE  is better as it stays in line with other SQL 
systems. This should also be applied to tables etc. Examples include MySQL and 
Presto / AWS Athena:

 

[https://dev.mysql.com/doc/refman/5.7/en/show-create-table.html]

 

[https://docs.aws.amazon.com/athena/latest/ug/show-create-table.html]

 

> New Feature: SHOW CREATE TABLE / VIEW command
> -
>
> Key: DRILL-6135
> URL: https://issues.apache.org/jira/browse/DRILL-6135
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Metadata, Storage - Information Schema
>Affects Versions: 1.10.0
> Environment: MapR 5.2 + Kerberos
>Reporter: Hari Sekhon
>Priority: Major
>
> Feature Request to implement
> {code:java}
> SHOW CREATE VIEW <view_name>;{code}
> A colleague and I just had to cat the view file, which is non-pretty JSON and 
> hard to read for a large view creation statement that could have been presented 
> in the Drill shell, formatted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DRILL-6135) New Feature: SHOW CREATE TABLE / VIEW command

2018-02-06 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353638#comment-16353638
 ] 

Hari Sekhon edited comment on DRILL-6135 at 2/6/18 9:36 AM:


[~julianhyde]

I think SHOW CREATE  is better as it stays in line with other SQL 
systems. This should also be applied to tables etc. Examples include MySQL and 
Presto / AWS Athena:

 

[https://dev.mysql.com/doc/refman/5.7/en/show-create-table.html]

 

[https://docs.aws.amazon.com/athena/latest/ug/show-create-table.html]

 


was (Author: harisekhon):
[~julianhyde]

I think SHOW CREATE ... is better as it stays in line with other SQL systems. 
This should also be applied to tables etc. Examples include MySQL and Presto / 
AWS Athena:

 

[https://dev.mysql.com/doc/refman/5.7/en/show-create-table.html]

 

[https://docs.aws.amazon.com/athena/latest/ug/show-create-table.html]

 

> New Feature: SHOW CREATE TABLE / VIEW command
> -
>
> Key: DRILL-6135
> URL: https://issues.apache.org/jira/browse/DRILL-6135
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Metadata, Storage - Information Schema
>Affects Versions: 1.10.0
> Environment: MapR 5.2 + Kerberos
>Reporter: Hari Sekhon
>Priority: Major
>
> Feature Request to implement
> {code:java}
> SHOW CREATE VIEW <view_name>;{code}
> A colleague and I just had to cat the view file, which is non-pretty JSON and 
> hard to read for a large view creation statement that could have been presented 
> in the Drill shell, formatted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6138) Move RecordBatchSizer to org.apache.drill.exec.record package

2018-02-06 Thread Padma Penumarthy (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354624#comment-16354624
 ] 

Padma Penumarthy commented on DRILL-6138:
-

I need this change merged soon for some work I am doing. @ilooner said it is OK 
to merge it sooner.

 

> Move RecordBatchSizer to org.apache.drill.exec.record package
> -
>
> Key: DRILL-6138
> URL: https://issues.apache.org/jira/browse/DRILL-6138
> Project: Apache Drill
>  Issue Type: Task
>  Components: Execution - Flow
>Affects Versions: 1.12.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Minor
> Fix For: 1.13.0
>
>
> Move RecordBatchSizer from org.apache.drill.exec.physical.impl.spill package 
> to org.apache.drill.exec.record package.
> Minor refactoring - change columnSizes from list to map. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6114) Complete internal metadata layer for improved batch handling

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354734#comment-16354734
 ] 

ASF GitHub Bot commented on DRILL-6114:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1112
  
Thanks @ppadma!

Next we'll need a reviewer. @parthchandra or @arina-ielchiieva, is this 
something you can review?

Once this one is good, I'll follow up with another PR that includes revisions 
to the `SchemaBuilder` to allow tests to build schemas using the additional 
data types added here. 


> Complete internal metadata layer for improved batch handling
> 
>
> Key: DRILL-6114
> URL: https://issues.apache.org/jira/browse/DRILL-6114
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.13.0
>
>
> Slice of the ["batch handling" 
> project.|https://github.com/paul-rogers/drill/wiki/Batch-Handling-Upgrades] 
> that includes enhancements to the internal metadata system.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-5993) Improve Performance of Copiers used by SV Remover, Top N etc.

2018-02-06 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-5993:
--
Labels: ready-to-commit  (was: )

> Improve Performance of Copiers used by SV Remover, Top N etc.
> -
>
> Key: DRILL-5993
> URL: https://issues.apache.org/jira/browse/DRILL-5993
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> Currently the copier can only copy records from an incoming batch to the 
> beginning of an outgoing batch. We need to be able to copy a record and 
> append it to the end of the outgoing batch. Also, Paul's generic copiers are 
> simpler and more performant, and should be added.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6138) Move RecordBatchSizer to org.apache.drill.exec.record package

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354634#comment-16354634
 ] 

ASF GitHub Bot commented on DRILL-6138:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1115#discussion_r166459971
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
@@ -260,7 +254,7 @@ public static ColumnSize getColumn(ValueVector v, String prefix) {
 
   public static final int MAX_VECTOR_SIZE = ValueVector.MAX_BUFFER_SIZE; // 16 MiB
 
-  private List columnSizes = new ArrayList<>();
+  private Map columnSizes = CaseInsensitiveMap.newHashMap();
--- End diff --

I'll rebase after this change goes in.


> Move RecordBatchSizer to org.apache.drill.exec.record package
> -
>
> Key: DRILL-6138
> URL: https://issues.apache.org/jira/browse/DRILL-6138
> Project: Apache Drill
>  Issue Type: Task
>  Components: Execution - Flow
>Affects Versions: 1.12.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Minor
> Fix For: 1.13.0
>
>
> Move RecordBatchSizer from org.apache.drill.exec.physical.impl.spill package 
> to org.apache.drill.exec.record package.
> Minor refactoring - change columnSizes from list to map. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DRILL-6003) Unit test TestDynamicUDFSupport.testLazyInitWhenDynamicUdfSupportIsDisabled fails with FUNCTION ERROR: Failure reading Function class.

2018-02-06 Thread Timothy Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354664#comment-16354664
 ] 

Timothy Farkas edited comment on DRILL-6003 at 2/6/18 10:47 PM:


[~ben-zvi] has observed that this issue has resurfaced on Jenkins. There is 
likely a race condition here that still needs to be fixed.

{code}
Tests run: 29, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 26.585 sec <<< FAILURE! - in org.apache.drill.TestDynamicUDFSupport
testOverloadedFunctionExecutionStage(org.apache.drill.TestDynamicUDFSupport)  Time elapsed: 0.804 sec  <<< ERROR!
org.apache.drill.exec.rpc.RpcException: org.apache.drill.common.exceptions.UserRemoteException: FUNCTION ERROR: Failure reading Function class.

Function Class com.drill.udf.overloading.Log
Fragment 0:0

[Error Id: 06187408-1750-4071-832a-e9b57b927985 on atsqa6c60.qa.lab:31040]
        at org.apache.drill.exec.rpc.RpcException.mapException(RpcException.java:60)
        at org.apache.drill.exec.client.DrillClient$ListHoldingResultsListener.getResults(DrillClient.java:865)
        at org.apache.drill.exec.client.DrillClient.runQuery(DrillClient.java:567)
        at org.apache.drill.test.BaseTestQuery.testRunAndReturn(BaseTestQuery.java:333)
        at org.apache.drill.test.BaseTestQuery$ClassicTestServices.testRunAndReturn(BaseTestQuery.java:271)
        at org.apache.drill.test.DrillTestWrapper.testRunAndReturn(DrillTestWrapper.java:859)
        at org.apache.drill.test.DrillTestWrapper.compareUnorderedResults(DrillTestWrapper.java:508)
        at org.apache.drill.test.DrillTestWrapper.run(DrillTestWrapper.java:149)
        at org.apache.drill.test.TestBuilder.go(TestBuilder.java:139)
        at org.apache.drill.TestDynamicUDFSupport.testOverloadedFunctionExecutionStage(TestDynamicUDFSupport.java:537)
Caused by: org.apache.drill.common.exceptions.UserRemoteException: FUNCTION ERROR: Failure reading Function class.
{code}


was (Author: timothyfarkas):
[~ben-zvi] has observed that this issue has resurfaced on Jenkins. There is 
likely a race condition here that still needs to be fixed.

> Unit test TestDynamicUDFSupport.testLazyInitWhenDynamicUdfSupportIsDisabled 
> fails with FUNCTION ERROR: Failure reading Function class.
> --
>
> Key: DRILL-6003
> URL: https://issues.apache.org/jira/browse/DRILL-6003
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.12.0, 1.13.0
>Reporter: Abhishek Girish
>Assignee: Timothy Farkas
>Priority: Major
>  Labels: ready-to-commit
>
> {code}
> 14:05:23.170 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: 0 
> B(1 B), h: 229.7 MiB(1.1 GiB), nh: 187.0 KiB(73.2 MiB)): 
> testLazyInitWhenDynamicUdfSupportIsDisabled(org.apache.drill.TestDynamicUDFSupport)
> org.apache.drill.exec.rpc.RpcException: 
> org.apache.drill.common.exceptions.UserRemoteException: FUNCTION ERROR: 
> Failure reading Function class.
> Function Class com.drill.udf.CustomLowerFunction
> Fragment 0:0
> [Error Id: 1d6ea0e5-fd65-4622-924d-d196defaedc8 on 10.10.104.57:31010]
>   at 
> org.apache.drill.exec.rpc.RpcException.mapException(RpcException.java:60) 
> ~[drill-rpc-1.12.0.jar:1.12.0]
>   at 
> org.apache.drill.exec.client.DrillClient$ListHoldingResultsListener.getResults(DrillClient.java:865)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.client.DrillClient.runQuery(DrillClient.java:567) 
> ~[classes/:na]
>   at 
> org.apache.drill.test.BaseTestQuery.testRunAndReturn(BaseTestQuery.java:338) 
> ~[test-classes/:na]
>   at 
> org.apache.drill.test.BaseTestQuery$ClassicTestServices.testRunAndReturn(BaseTestQuery.java:276)
>  ~[test-classes/:na]
>   at 
> org.apache.drill.test.DrillTestWrapper.testRunAndReturn(DrillTestWrapper.java:830)
>  ~[test-classes/:na]
>   at 
> org.apache.drill.test.DrillTestWrapper.compareUnorderedResults(DrillTestWrapper.java:484)
>  ~[test-classes/:na]
>   at 
> org.apache.drill.test.DrillTestWrapper.run(DrillTestWrapper.java:147) 
> ~[test-classes/:na]
>   at org.apache.drill.test.TestBuilder.go(TestBuilder.java:139) 
> ~[test-classes/:na]
>   at 
> org.apache.drill.TestDynamicUDFSupport.testLazyInitWhenDynamicUdfSupportIsDisabled(TestDynamicUDFSupport.java:506)
>  ~[test-classes/:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[na:1.7.0_131]
>   at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_131]
>   at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_131]
> org.apache.drill.common.exceptions.UserRemoteException: FUNCTION ERROR: 
> Failure reading Function class.
> Function 

[jira] [Created] (DRILL-6140) Operators listed in Profiles Page doesn't always correspond with operator specified in Physical Plan

2018-02-06 Thread Kunal Khatua (JIRA)
Kunal Khatua created DRILL-6140:
---

 Summary: Operators listed in Profiles Page doesn't always 
correspond with operator specified in Physical Plan
 Key: DRILL-6140
 URL: https://issues.apache.org/jira/browse/DRILL-6140
 Project: Apache Drill
  Issue Type: Bug
  Components: Web Server
Affects Versions: 1.12.0
Reporter: Kunal Khatua
Assignee: Kunal Khatua






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6140) Operators listed in Profiles Page doesn't always correspond with operator specified in Physical Plan

2018-02-06 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-6140:

Description: 
A query's physical plan correctly shows
{code}
 00-00 Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative 
cost = { ...
   00-01 Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 
1.0, cumulative cost = { ...
 00-02 StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) : rowType = 
RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = { ...
   00-03 UnionExchange : rowType = RecordType(BIGINT EXPR$0): rowcount = 
1.0, cumulative cost = { ...
 01-01 StreamAgg(group=[{}], EXPR$0=[COUNT()]) : rowType = 
RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = { ...
   01-02 Project($f0=[0]) : rowType = RecordType(INTEGER $f0): rowcount 
= 1.79279253E7, cumulative cost = ...
 01-03 Flatten(flattenField=[$1]) : rowType = RecordType(ANY 
rfsSpecCode, ...
   01-04 Project(rfsSpecCode=[$1], PUResultsArray=[$2]) : rowType = 
...
 01-05 SelectionVectorRemover : rowType = RecordType(ANY 
schemaName, ...
   01-06 Filter(condition=[=($0, 'OnyxBlue')]) : rowType = ...
 01-07 Project(schemaName=[$0], ITEM=[ITEM($1, 
'rfsSpecCode')], ...
   01-08 Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [
{code}

However, the profile page shows the operators as...
||Operator ID || Type || Metrics||
|00-xx-00 | SCREEN | ... |
|00-xx-01 | PROJECT | ... |
|00-xx-02 | STREAMING_AGGREGATE | ... |
|00-xx-03 | UNORDERED_RECEIVER | ... |
|01-xx-00 | SINGLE_SENDER | ... |
|01-xx-01 | STREAMING_AGGREGATE | ... |
|01-xx-02 | PROJECT | ... |
|01-xx-03 | SINGLE_SENDER | ... |
|01-xx-04 | PROJECT | ... |
|01-xx-05 | SELECTION_VECTOR_REMOVER | ... |
|01-xx-06 | FILTER | ... |
|01-xx-07 | PROJECT | ... |
|01-xx-08 | PARQUET_ROW_GROUP_SCAN | ... |


As you can see ... the {{FLATTEN}} operator appears as a {{SINGLE_SENDER}},
making the profile hard to interpret.

  was:
A query's physical plan correctly shows
{code}
 00-00 Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative 
cost = { ...
 00-01 Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 
1.0, cumulative cost = { ...
 00-02 StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) : rowType = RecordType(BIGINT 
EXPR$0): rowcount = 1.0, cumulative cost = { ...
 00-03 UnionExchange : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, 
cumulative cost = { ...
 01-01 StreamAgg(group=[{}], EXPR$0=[COUNT()]) : rowType = RecordType(BIGINT 
EXPR$0): rowcount = 1.0, cumulative cost = { ...
 01-02 Project($f0=[0]) : rowType = RecordType(INTEGER $f0): rowcount = 
1.79279253E7, cumulative cost = ...
 01-03 Flatten(flattenField=[$1]) : rowType = RecordType(ANY rfsSpecCode, ...
 01-04 Project(rfsSpecCode=[$1], PUResultsArray=[$2]) : rowType = ...
 01-05 SelectionVectorRemover : rowType = RecordType(ANY schemaName, ...
 01-06 Filter(condition=[=($0, 'OnyxBlue')]) : rowType = ...
 01-07 Project(schemaName=[$0], ITEM=[ITEM($1, 'rfsSpecCode')], ...
 01-08 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [
{code}

However, the profile page shows the operators as...
||Operator ID || Type || Metrics||
|00-xx-00 | SCREEN | ... |
|00-xx-01 | PROJECT | ... |
|00-xx-02 | STREAMING_AGGREGATE | ... |
|00-xx-03 | UNORDERED_RECEIVER | ... |
|01-xx-00 | SINGLE_SENDER | ... |
|01-xx-01 | STREAMING_AGGREGATE | ... |
|01-xx-02 | PROJECT | ... |
|01-xx-03 | SINGLE_SENDER | ... |
|01-xx-04 | PROJECT | ... |
|01-xx-05 | SELECTION_VECTOR_REMOVER | ... |
|01-xx-06 | FILTER | ... |
|01-xx-07 | PROJECT | ... |
|01-xx-08 | PARQUET_ROW_GROUP_SCAN | ... |


As you can see ... the {{FLATTEN}} operator appears as a {{SINGLE_SENDER}},
making the profile hard to interpret.


> Operators listed in Profiles Page doesn't always correspond with operator 
> specified in Physical Plan
> 
>
> Key: DRILL-6140
> URL: https://issues.apache.org/jira/browse/DRILL-6140
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.12.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
>
> A query's physical plan correctly shows
> {code}
>  00-00 Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, 
> cumulative cost = { ...
>00-01 Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount 
> = 1.0, cumulative cost = { ...
>  00-02 StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) : rowType = 
> RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = { ...
>00-03 UnionExchange : rowType = RecordType(BIGINT EXPR$0): rowcount = 
> 1.0, cumulative cost = { ...
>  

[jira] [Updated] (DRILL-6140) Operators listed in Profiles Page doesn't always correspond with operator specified in Physical Plan

2018-02-06 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-6140:

Attachment: 25978a1a-24cf-fb4a-17af-59e7115b4fa1.sys.drill

> Operators listed in Profiles Page doesn't always correspond with operator 
> specified in Physical Plan
> 
>
> Key: DRILL-6140
> URL: https://issues.apache.org/jira/browse/DRILL-6140
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.12.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
> Attachments: 25978a1a-24cf-fb4a-17af-59e7115b4fa1.sys.drill
>
>
> A query's physical plan correctly shows
> {code}
>  00-00 Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, 
> cumulative cost = { ...
>00-01 Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount 
> = 1.0, cumulative cost = { ...
>  00-02 StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) : rowType = 
> RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = { ...
>00-03 UnionExchange : rowType = RecordType(BIGINT EXPR$0): rowcount = 
> 1.0, cumulative cost = { ...
>  01-01 StreamAgg(group=[{}], EXPR$0=[COUNT()]) : rowType = 
> RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = { ...
>01-02 Project($f0=[0]) : rowType = RecordType(INTEGER $f0): 
> rowcount = 1.79279253E7, cumulative cost = ...
>  01-03 Flatten(flattenField=[$1]) : rowType = RecordType(ANY 
> rfsSpecCode, ...
>01-04 Project(rfsSpecCode=[$1], PUResultsArray=[$2]) : rowType 
> = ...
>  01-05 SelectionVectorRemover : rowType = RecordType(ANY 
> schemaName, ...
>01-06 Filter(condition=[=($0, 'OnyxBlue')]) : rowType = ...
>  01-07 Project(schemaName=[$0], ITEM=[ITEM($1, 
> 'rfsSpecCode')], ...
>01-08 Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [
> {code}
> However, the profile page shows the operators as...
> ||Operator ID || Type || Metrics||
> |00-xx-00 | SCREEN | ... |
> |00-xx-01 | PROJECT | ... |
> |00-xx-02 | STREAMING_AGGREGATE | ... |
> |00-xx-03 | UNORDERED_RECEIVER | ... |
> |01-xx-00 | SINGLE_SENDER | ... |
> |01-xx-01 | STREAMING_AGGREGATE | ... |
> |01-xx-02 | PROJECT | ... |
> |01-xx-03 | SINGLE_SENDER | ... |
> |01-xx-04 | PROJECT | ... |
> |01-xx-05 | SELECTION_VECTOR_REMOVER | ... |
> |01-xx-06 | FILTER | ... |
> |01-xx-07 | PROJECT | ... |
> |01-xx-08 | PARQUET_ROW_GROUP_SCAN | ... |
> As you can see ... the {{FLATTEN}} operator appears as a {{SINGLE_SENDER}},
> making the profile hard to interpret.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354637#comment-16354637
 ] 

ASF GitHub Bot commented on DRILL-6032:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1101
  
@ppadma Responded to your comments. Please look at the last three commits
for the changes.


> Use RecordBatchSizer to estimate size of columns in HashAgg
> ---
>
> Key: DRILL-6032
> URL: https://issues.apache.org/jira/browse/DRILL-6032
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.13.0
>
>
> We need to use the RecordBatchSizer to estimate the size of columns in the
> partition batches created by HashAgg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6003) Unit test TestDynamicUDFSupport.testLazyInitWhenDynamicUdfSupportIsDisabled fails with FUNCTION ERROR: Failure reading Function class.

2018-02-06 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6003:
--
Fix Version/s: (was: 1.13.0)

> Unit test TestDynamicUDFSupport.testLazyInitWhenDynamicUdfSupportIsDisabled 
> fails with FUNCTION ERROR: Failure reading Function class.
> --
>
> Key: DRILL-6003
> URL: https://issues.apache.org/jira/browse/DRILL-6003
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.12.0, 1.13.0
>Reporter: Abhishek Girish
>Assignee: Timothy Farkas
>Priority: Major
>  Labels: ready-to-commit
>
> {code}
> 14:05:23.170 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: 0 
> B(1 B), h: 229.7 MiB(1.1 GiB), nh: 187.0 KiB(73.2 MiB)): 
> testLazyInitWhenDynamicUdfSupportIsDisabled(org.apache.drill.TestDynamicUDFSupport)
> org.apache.drill.exec.rpc.RpcException: 
> org.apache.drill.common.exceptions.UserRemoteException: FUNCTION ERROR: 
> Failure reading Function class.
> Function Class com.drill.udf.CustomLowerFunction
> Fragment 0:0
> [Error Id: 1d6ea0e5-fd65-4622-924d-d196defaedc8 on 10.10.104.57:31010]
>   at 
> org.apache.drill.exec.rpc.RpcException.mapException(RpcException.java:60) 
> ~[drill-rpc-1.12.0.jar:1.12.0]
>   at 
> org.apache.drill.exec.client.DrillClient$ListHoldingResultsListener.getResults(DrillClient.java:865)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.client.DrillClient.runQuery(DrillClient.java:567) 
> ~[classes/:na]
>   at 
> org.apache.drill.test.BaseTestQuery.testRunAndReturn(BaseTestQuery.java:338) 
> ~[test-classes/:na]
>   at 
> org.apache.drill.test.BaseTestQuery$ClassicTestServices.testRunAndReturn(BaseTestQuery.java:276)
>  ~[test-classes/:na]
>   at 
> org.apache.drill.test.DrillTestWrapper.testRunAndReturn(DrillTestWrapper.java:830)
>  ~[test-classes/:na]
>   at 
> org.apache.drill.test.DrillTestWrapper.compareUnorderedResults(DrillTestWrapper.java:484)
>  ~[test-classes/:na]
>   at 
> org.apache.drill.test.DrillTestWrapper.run(DrillTestWrapper.java:147) 
> ~[test-classes/:na]
>   at org.apache.drill.test.TestBuilder.go(TestBuilder.java:139) 
> ~[test-classes/:na]
>   at 
> org.apache.drill.TestDynamicUDFSupport.testLazyInitWhenDynamicUdfSupportIsDisabled(TestDynamicUDFSupport.java:506)
>  ~[test-classes/:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[na:1.7.0_131]
>   at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_131]
>   at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_131]
> org.apache.drill.common.exceptions.UserRemoteException: FUNCTION ERROR: 
> Failure reading Function class.
> Function Class com.drill.udf.CustomLowerFunction
> Fragment 0:0
> [Error Id: 1d6ea0e5-fd65-4622-924d-d196defaedc8 on 10.10.104.57:31010]
>   at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:468) 
> ~[classes/:na]
>   at 
> org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:102) 
> ~[classes/:na]
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274) 
> ~[drill-rpc-1.12.0.jar:1.12.0]
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:244) 
> ~[drill-rpc-1.12.0.jar:1.12.0]
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
>  ~[netty-codec-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
>  ~[netty-handler-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  

[jira] [Updated] (DRILL-6140) Operators listed in Profiles Page doesn't always correspond with operator specified in Physical Plan

2018-02-06 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-6140:

Description: 
A query's physical plan correctly shows
{code}
 00-00 Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative 
cost = { ...
 00-01 Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 
1.0, cumulative cost = { ...
 00-02 StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) : rowType = RecordType(BIGINT 
EXPR$0): rowcount = 1.0, cumulative cost = { ...
 00-03 UnionExchange : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, 
cumulative cost = { ...
 01-01 StreamAgg(group=[{}], EXPR$0=[COUNT()]) : rowType = RecordType(BIGINT 
EXPR$0): rowcount = 1.0, cumulative cost = { ...
 01-02 Project($f0=[0]) : rowType = RecordType(INTEGER $f0): rowcount = 
1.79279253E7, cumulative cost = ...
 01-03 Flatten(flattenField=[$1]) : rowType = RecordType(ANY rfsSpecCode, ...
 01-04 Project(rfsSpecCode=[$1], PUResultsArray=[$2]) : rowType = ...
 01-05 SelectionVectorRemover : rowType = RecordType(ANY schemaName, ...
 01-06 Filter(condition=[=($0, 'OnyxBlue')]) : rowType = ...
 01-07 Project(schemaName=[$0], ITEM=[ITEM($1, 'rfsSpecCode')], ...
 01-08 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [
{code}

However, the profile page shows the operators as...
||Operator ID || Type || Metrics||
|00-xx-00 | SCREEN | ... |
|00-xx-01 | PROJECT | ... |
|00-xx-02 | STREAMING_AGGREGATE | ... |
|00-xx-03 | UNORDERED_RECEIVER | ... |
|01-xx-00 | SINGLE_SENDER | ... |
|01-xx-01 | STREAMING_AGGREGATE | ... |
|01-xx-02 | PROJECT | ... |
|01-xx-03 | SINGLE_SENDER | ... |
|01-xx-04 | PROJECT | ... |
|01-xx-05 | SELECTION_VECTOR_REMOVER | ... |
|01-xx-06 | FILTER | ... |
|01-xx-07 | PROJECT | ... |
|01-xx-08 | PARQUET_ROW_GROUP_SCAN | ... |


As you can see ... the {{FLATTEN}} operator appears as a {{SINGLE_SENDER}},
making the profile hard to interpret.

> Operators listed in Profiles Page doesn't always correspond with operator 
> specified in Physical Plan
> 
>
> Key: DRILL-6140
> URL: https://issues.apache.org/jira/browse/DRILL-6140
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.12.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
>
> A query's physical plan correctly shows
> {code}
>  00-00 Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, 
> cumulative cost = { ...
>  00-01 Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 
> 1.0, cumulative cost = { ...
>  00-02 StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) : rowType = 
> RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = { ...
>  00-03 UnionExchange : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, 
> cumulative cost = { ...
>  01-01 StreamAgg(group=[{}], EXPR$0=[COUNT()]) : rowType = RecordType(BIGINT 
> EXPR$0): rowcount = 1.0, cumulative cost = { ...
>  01-02 Project($f0=[0]) : rowType = RecordType(INTEGER $f0): rowcount = 
> 1.79279253E7, cumulative cost = ...
>  01-03 Flatten(flattenField=[$1]) : rowType = RecordType(ANY rfsSpecCode, ...
>  01-04 Project(rfsSpecCode=[$1], PUResultsArray=[$2]) : rowType = ...
>  01-05 SelectionVectorRemover : rowType = RecordType(ANY schemaName, ...
>  01-06 Filter(condition=[=($0, 'OnyxBlue')]) : rowType = ...
>  01-07 Project(schemaName=[$0], ITEM=[ITEM($1, 'rfsSpecCode')], ...
>  01-08 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [
> {code}
> However, the profile page shows the operators as...
> ||Operator ID || Type || Metrics||
> |00-xx-00 | SCREEN | ... |
> |00-xx-01 | PROJECT | ... |
> |00-xx-02 | STREAMING_AGGREGATE | ... |
> |00-xx-03 | UNORDERED_RECEIVER | ... |
> |01-xx-00 | SINGLE_SENDER | ... |
> |01-xx-01 | STREAMING_AGGREGATE | ... |
> |01-xx-02 | PROJECT | ... |
> |01-xx-03 | SINGLE_SENDER | ... |
> |01-xx-04 | PROJECT | ... |
> |01-xx-05 | SELECTION_VECTOR_REMOVER | ... |
> |01-xx-06 | FILTER | ... |
> |01-xx-07 | PROJECT | ... |
> |01-xx-08 | PARQUET_ROW_GROUP_SCAN | ... |
> As you can see ... the {{FLATTEN}} operator appears as a {{SINGLE_SENDER}},
> making the profile hard to interpret.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (DRILL-6003) Unit test TestDynamicUDFSupport.testLazyInitWhenDynamicUdfSupportIsDisabled fails with FUNCTION ERROR: Failure reading Function class.

2018-02-06 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas reopened DRILL-6003:
---

[~ben-zvi] has observed that this issue has resurfaced on Jenkins. There is
likely a race condition here that still needs to be fixed.

> Unit test TestDynamicUDFSupport.testLazyInitWhenDynamicUdfSupportIsDisabled 
> fails with FUNCTION ERROR: Failure reading Function class.
> --
>
> Key: DRILL-6003
> URL: https://issues.apache.org/jira/browse/DRILL-6003
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.12.0, 1.13.0
>Reporter: Abhishek Girish
>Assignee: Timothy Farkas
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> {code}
> 14:05:23.170 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: 0 
> B(1 B), h: 229.7 MiB(1.1 GiB), nh: 187.0 KiB(73.2 MiB)): 
> testLazyInitWhenDynamicUdfSupportIsDisabled(org.apache.drill.TestDynamicUDFSupport)
> org.apache.drill.exec.rpc.RpcException: 
> org.apache.drill.common.exceptions.UserRemoteException: FUNCTION ERROR: 
> Failure reading Function class.
> Function Class com.drill.udf.CustomLowerFunction
> Fragment 0:0
> [Error Id: 1d6ea0e5-fd65-4622-924d-d196defaedc8 on 10.10.104.57:31010]
>   at 
> org.apache.drill.exec.rpc.RpcException.mapException(RpcException.java:60) 
> ~[drill-rpc-1.12.0.jar:1.12.0]
>   at 
> org.apache.drill.exec.client.DrillClient$ListHoldingResultsListener.getResults(DrillClient.java:865)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.client.DrillClient.runQuery(DrillClient.java:567) 
> ~[classes/:na]
>   at 
> org.apache.drill.test.BaseTestQuery.testRunAndReturn(BaseTestQuery.java:338) 
> ~[test-classes/:na]
>   at 
> org.apache.drill.test.BaseTestQuery$ClassicTestServices.testRunAndReturn(BaseTestQuery.java:276)
>  ~[test-classes/:na]
>   at 
> org.apache.drill.test.DrillTestWrapper.testRunAndReturn(DrillTestWrapper.java:830)
>  ~[test-classes/:na]
>   at 
> org.apache.drill.test.DrillTestWrapper.compareUnorderedResults(DrillTestWrapper.java:484)
>  ~[test-classes/:na]
>   at 
> org.apache.drill.test.DrillTestWrapper.run(DrillTestWrapper.java:147) 
> ~[test-classes/:na]
>   at org.apache.drill.test.TestBuilder.go(TestBuilder.java:139) 
> ~[test-classes/:na]
>   at 
> org.apache.drill.TestDynamicUDFSupport.testLazyInitWhenDynamicUdfSupportIsDisabled(TestDynamicUDFSupport.java:506)
>  ~[test-classes/:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[na:1.7.0_131]
>   at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_131]
>   at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_131]
> org.apache.drill.common.exceptions.UserRemoteException: FUNCTION ERROR: 
> Failure reading Function class.
> Function Class com.drill.udf.CustomLowerFunction
> Fragment 0:0
> [Error Id: 1d6ea0e5-fd65-4622-924d-d196defaedc8 on 10.10.104.57:31010]
>   at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:468) 
> ~[classes/:na]
>   at 
> org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:102) 
> ~[classes/:na]
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274) 
> ~[drill-rpc-1.12.0.jar:1.12.0]
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:244) 
> ~[drill-rpc-1.12.0.jar:1.12.0]
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
>  ~[netty-codec-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
>  ~[netty-handler-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]

[jira] [Updated] (DRILL-6003) Unit test TestDynamicUDFSupport.testLazyInitWhenDynamicUdfSupportIsDisabled fails with FUNCTION ERROR: Failure reading Function class.

2018-02-06 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6003:
--
Affects Version/s: 1.13.0

> Unit test TestDynamicUDFSupport.testLazyInitWhenDynamicUdfSupportIsDisabled 
> fails with FUNCTION ERROR: Failure reading Function class.
> --
>
> Key: DRILL-6003
> URL: https://issues.apache.org/jira/browse/DRILL-6003
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.12.0, 1.13.0
>Reporter: Abhishek Girish
>Assignee: Timothy Farkas
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> {code}
> 14:05:23.170 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: 0 
> B(1 B), h: 229.7 MiB(1.1 GiB), nh: 187.0 KiB(73.2 MiB)): 
> testLazyInitWhenDynamicUdfSupportIsDisabled(org.apache.drill.TestDynamicUDFSupport)
> org.apache.drill.exec.rpc.RpcException: 
> org.apache.drill.common.exceptions.UserRemoteException: FUNCTION ERROR: 
> Failure reading Function class.
> Function Class com.drill.udf.CustomLowerFunction
> Fragment 0:0
> [Error Id: 1d6ea0e5-fd65-4622-924d-d196defaedc8 on 10.10.104.57:31010]
>   at 
> org.apache.drill.exec.rpc.RpcException.mapException(RpcException.java:60) 
> ~[drill-rpc-1.12.0.jar:1.12.0]
>   at 
> org.apache.drill.exec.client.DrillClient$ListHoldingResultsListener.getResults(DrillClient.java:865)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.client.DrillClient.runQuery(DrillClient.java:567) 
> ~[classes/:na]
>   at 
> org.apache.drill.test.BaseTestQuery.testRunAndReturn(BaseTestQuery.java:338) 
> ~[test-classes/:na]
>   at 
> org.apache.drill.test.BaseTestQuery$ClassicTestServices.testRunAndReturn(BaseTestQuery.java:276)
>  ~[test-classes/:na]
>   at 
> org.apache.drill.test.DrillTestWrapper.testRunAndReturn(DrillTestWrapper.java:830)
>  ~[test-classes/:na]
>   at 
> org.apache.drill.test.DrillTestWrapper.compareUnorderedResults(DrillTestWrapper.java:484)
>  ~[test-classes/:na]
>   at 
> org.apache.drill.test.DrillTestWrapper.run(DrillTestWrapper.java:147) 
> ~[test-classes/:na]
>   at org.apache.drill.test.TestBuilder.go(TestBuilder.java:139) 
> ~[test-classes/:na]
>   at 
> org.apache.drill.TestDynamicUDFSupport.testLazyInitWhenDynamicUdfSupportIsDisabled(TestDynamicUDFSupport.java:506)
>  ~[test-classes/:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[na:1.7.0_131]
>   at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_131]
>   at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_131]
> org.apache.drill.common.exceptions.UserRemoteException: FUNCTION ERROR: 
> Failure reading Function class.
> Function Class com.drill.udf.CustomLowerFunction
> Fragment 0:0
> [Error Id: 1d6ea0e5-fd65-4622-924d-d196defaedc8 on 10.10.104.57:31010]
>   at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:468) 
> ~[classes/:na]
>   at 
> org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:102) 
> ~[classes/:na]
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274) 
> ~[drill-rpc-1.12.0.jar:1.12.0]
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:244) 
> ~[drill-rpc-1.12.0.jar:1.12.0]
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
>  ~[netty-codec-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
>  ~[netty-handler-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  

[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354753#comment-16354753
 ] 

ASF GitHub Bot commented on DRILL-6032:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1101
  
The Travis failure is unrelated.


> Use RecordBatchSizer to estimate size of columns in HashAgg
> ---
>
> Key: DRILL-6032
> URL: https://issues.apache.org/jira/browse/DRILL-6032
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.13.0
>
>
> We need to use the RecordBatchSizer to estimate the size of columns in the
> partition batches created by HashAgg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6119) The OpenTSDB storage plugin is not included in the Drill distribution

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354777#comment-16354777
 ] 

ASF GitHub Bot commented on DRILL-6119:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1102


> The OpenTSDB storage plugin is not included in the Drill distribution
> -
>
> Key: DRILL-6119
> URL: https://issues.apache.org/jira/browse/DRILL-6119
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Vlad
>Priority: Blocker
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> Steps:
>  # Open the drillbit web UI ( [http://localhost:8047/] )
>  # Navigate to the storage tab
>  # Try to add new storage plugin with the following config:
> {noformat}
> {
>   "type": "openTSDB",
>   "connection": "http://localhost:4242",
>   "enabled": true
> }
> {noformat}
> Expected result:
> The plugin should be added and enabled successfully
> Actual result:
> Error displayed: "Please retry: error (invalid JSON mapping)". 
> In the drillbit.log: 
> {noformat}
> com.fasterxml.jackson.databind.JsonMappingException: Could not resolve type 
> id 'openTSDB' into a subtype of [simple type, class 
> org.apache.drill.common.logical.StoragePluginConfig]: known type ids = 
> [InfoSchemaConfig, StoragePluginConfig, SystemTablePluginConfig, file, hbase, 
> hive, jdbc, kafka, kudu, mock, mongo, named]
> {noformat}
> The jar file corresponding to the plugin is absent at the distribution jar 
> folder.
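
An editorial note on the error shape, since the stack points at Jackson's polymorphic type resolution: the "type" field in the JSON maps to a config subclass only if that class is on the classpath. A hedged sketch of the usual Drill pattern follows; the class name and annotation value are assumptions from the error text, not verified plugin source.

{code:java}
import com.fasterxml.jackson.annotation.JsonTypeName;
import org.apache.drill.common.logical.StoragePluginConfig;

// Hypothetical sketch: the config class the "openTSDB" type id would resolve to.
// If the jar carrying such a class is missing from the distribution, Jackson
// cannot map the type id and reports the "invalid JSON mapping" error above.
@JsonTypeName("openTSDB")
public class OpenTSDBStoragePluginConfig extends StoragePluginConfig {
  @Override
  public boolean equals(Object o) {
    return this == o || o instanceof OpenTSDBStoragePluginConfig;
  }

  @Override
  public int hashCode() {
    return getClass().hashCode();
  }
}
{code}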



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6138) Move RecordBatchSizer to org.apache.drill.exec.record package

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354564#comment-16354564
 ] 

ASF GitHub Bot commented on DRILL-6138:
---

GitHub user ppadma opened a pull request:

https://github.com/apache/drill/pull/1115

DRILL-6138: Move RecordBatchSizer to org.apache.drill.exec.record package

Also, changed columnSizes in RecordBatchSizer from a list to a map so we can
look up columns by field name.
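
A minimal sketch of the motivation, assuming Drill's org.apache.drill.common.map.CaseInsensitiveMap and its newHashMap() factory (illustrative only, not the PR's code):

{code:java}
import java.util.Map;
import org.apache.drill.common.map.CaseInsensitiveMap;

// ColumnSize here is a stand-in for RecordBatchSizer.ColumnSize.
class ColumnSizeLookup {
  static class ColumnSize {
    int netRowWidth;
  }

  private final Map<String, ColumnSize> columnSizes = CaseInsensitiveMap.newHashMap();

  void add(String fieldName, ColumnSize size) {
    columnSizes.put(fieldName, size);
  }

  // Direct lookup by field name; with a List<ColumnSize> callers had to scan.
  ColumnSize find(String fieldName) {
    return columnSizes.get(fieldName);  // "expr$0" also matches "EXPR$0"
  }
}
{code}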

@Ben-Zvi, can you please review?


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ppadma/drill DRILL-6138

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1115.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1115


commit b4104e73ad37dee96e29e345b958412791ab9079
Author: Padma Penumarthy 
Date:   2018-02-06T05:41:45Z

DRILL-6138: Move RecordBatchSizer to org.apache.drill.exec.record package




> Move RecordBatchSizer to org.apache.drill.exec.record package
> -
>
> Key: DRILL-6138
> URL: https://issues.apache.org/jira/browse/DRILL-6138
> Project: Apache Drill
>  Issue Type: Task
>  Components: Execution - Flow
>Affects Versions: 1.12.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Minor
> Fix For: 1.13.0
>
>
> Move RecordBatchSizer from org.apache.drill.exec.physical.impl.spill package 
> to org.apache.drill.exec.record package.
> Minor refactoring - change columnSizes from list to map. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6138) Move RecordBatchSizer to org.apache.drill.exec.record package

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354603#comment-16354603
 ] 

ASF GitHub Bot commented on DRILL-6138:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1115#discussion_r166454887
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java 
---
@@ -260,7 +254,7 @@ public static ColumnSize getColumn(ValueVector v, 
String prefix) {
 
   public static final int MAX_VECTOR_SIZE = ValueVector.MAX_BUFFER_SIZE; 
// 16 MiB
 
-  private List<ColumnSize> columnSizes = new ArrayList<>();
+  private Map<String, ColumnSize> columnSizes = CaseInsensitiveMap.newHashMap();
--- End diff --

Tim (@ilooner) made the same change for DRILL-6032 (#1101).



> Move RecordBatchSizer to org.apache.drill.exec.record package
> -
>
> Key: DRILL-6138
> URL: https://issues.apache.org/jira/browse/DRILL-6138
> Project: Apache Drill
>  Issue Type: Task
>  Components: Execution - Flow
>Affects Versions: 1.12.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Minor
> Fix For: 1.13.0
>
>
> Move RecordBatchSizer from org.apache.drill.exec.physical.impl.spill package 
> to org.apache.drill.exec.record package.
> Minor refactoring - change columnSizes from list to map. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6129) Query fails on nested data type schema change

2018-02-06 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-6129:

Labels:   (was: ready-to-commit)

> Query fails on nested data type schema change
> -
>
> Key: DRILL-6129
> URL: https://issues.apache.org/jira/browse/DRILL-6129
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - CLI
>Affects Versions: 1.10.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Minor
> Fix For: 1.13.0
>
>
> Use-Case -
>  * Assume two parquet files with similar schemas except for a nested column
>  * Schema file1
>  ** int64 field1
>  ** optional group field2
>  *** optional group field2.1 (LIST)
>  **** repeated group list
>  ***** optional group element
>  ****** optional int64 child_field
>  * Schema file2
>  ** int64 field1
>  ** optional group field2
>  *** optional group field2.1 (LIST)
>  **** repeated group list
>  ***** optional group element
>  ****** optional group child_field
>  ******* optional int64 child_field_f1
>  ******* optional int64 child_field_f2
>  * Essentially child_field changed from an int64 to a group of fields
>  
> Observed Query Failure
> select * from ;
> Error: Unexpected RuntimeException: java.lang.IllegalArgumentException: The 
> field $bits$(UINT1:REQUIRED) doesn't match the provided metadata major_type {
>   minor_type: MAP
>   mode: REQUIRED
> Note that selecting one file at a time succeeds, which seems to indicate the
> issue has to do with the schema-change logic.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6123) Limit batch size for Merge Join based on memory

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354481#comment-16354481
 ] 

ASF GitHub Bot commented on DRILL-6123:
---

Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/1107#discussion_r166427892
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java
 ---
@@ -102,20 +105,78 @@
   private final List comparators;
   private final JoinRelType joinType;
   private JoinWorker worker;
+  private final long outputBatchSize;
 
   private static final String LEFT_INPUT = "LEFT INPUT";
   private static final String RIGHT_INPUT = "RIGHT INPUT";
 
+  private class MergeJoinMemoryManager extends AbstractRecordBatchMemoryManager {
+private int leftRowWidth;
+private int rightRowWidth;
+
+/**
+ * MergeJoin operates on one record at a time from the left and right
+ * batches, using the RecordIterator abstraction. We have a callback
+ * mechanism to get notified when a new batch is loaded in the record
+ * iterator. This can get called in the middle of the current output batch
+ * we are building; when it is, adjust the number of output rows for the
+ * current batch and update the value to be used for subsequent batches.
+ */
+@Override
+public void update(int inputIndex) {
+  switch(inputIndex) {
+case 0:
+  final RecordBatchSizer leftSizer = new RecordBatchSizer(left);
+  leftRowWidth = leftSizer.netRowWidth();
+  break;
+case 1:
+  final RecordBatchSizer rightSizer = new RecordBatchSizer(right);
+  rightRowWidth = rightSizer.netRowWidth();
+default:
+  break;
+  }
+
+  final int newOutgoingRowWidth = leftRowWidth + rightRowWidth;
+
+  // If the outgoing row width is 0, just return. This is possible for empty
+  // batches, or when the first set of batches comes with OK_NEW_SCHEMA and no data.
+  if (newOutgoingRowWidth == 0) {
+return;
+  }
+
+  // update the value to be used for next batch(es)
+  setOutputRowCount(Math.min(ValueVector.MAX_ROW_COUNT,
+      Math.max(RecordBatchSizer.safeDivide(outputBatchSize / WORST_CASE_FRAGMENTATION_FACTOR,
+          newOutgoingRowWidth), MIN_NUM_ROWS)));
+
+  // Adjust for the current batch.
+  // Calculate memory used so far, based on the previous outgoing row width
+  // and how many rows we have already processed.
+  final long memoryUsed = status.getOutPosition() * getOutgoingRowWidth();
+  // This is the remaining memory.
+  final long remainingMemory =
+      Math.max(outputBatchSize / WORST_CASE_FRAGMENTATION_FACTOR - memoryUsed, 0);
+  // This is the number of rows we can fit in the remaining memory, based on
+  // the new outgoing row width.
+  final int numOutputRowsRemaining =
+      RecordBatchSizer.safeDivide(remainingMemory, newOutgoingRowWidth);
+
+  final int adjustedOutputRowCount = Math.min(MAX_NUM_ROWS,
+      Math.max(status.getOutPosition() + numOutputRowsRemaining, MIN_NUM_ROWS));
+  status.setOutputRowCount(adjustedOutputRowCount);
+  setOutgoingRowWidth(newOutgoingRowWidth);
--- End diff --

Yes, this is how it works. We read from the left and right sides through the
RecordIterator abstraction, which reads full record batches underneath and
returns one record at a time from its next() call. A callback fires whenever
the record iterator loads a new batch so that we can adjust the row widths.
When I get the callback, I adjust the row count for the current output batch
based on the memory remaining for it, and I also compute and save the row
count to use for the next full batch. In innerNext, when we start working on
the next output batch, I set the target output row count from this saved value.
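
As a hedged sketch, here is the adjustment arithmetic that paragraph describes; the constant names mirror the diff above, and the worked numbers are illustrative only, not measurements:

{code:java}
// Sketch of the mid-batch row-count adjustment (illustrative, not the PR's code).
class BatchSizeMath {
  static int adjustedRowCount(long outputBatchSize, long fragmentationFactor,
      int rowsWritten, int oldRowWidth, int newRowWidth, int minRows, int maxRows) {
    long budget = outputBatchSize / fragmentationFactor;  // usable output memory
    long used = (long) rowsWritten * oldRowWidth;         // consumed by rows so far
    long remaining = Math.max(budget - used, 0);
    int moreRows = (int) (remaining / newRowWidth);       // rows that still fit
    return Math.min(maxRows, Math.max(rowsWritten + moreRows, minRows));
  }
}
// Example: a 16 MB batch, factor 2, 1000 rows already written at 200 B each,
// new row width 400 B: budget 8 MB, used 0.2 MB, so about 19,500 more rows fit.
{code}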

Addressed all other code review comments. Please take a look when you get a 
chance.



> Limit batch size for Merge Join based on memory
> ---
>
> Key: DRILL-6123
> URL: https://issues.apache.org/jira/browse/DRILL-6123
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.12.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.13.0
>
>
> Merge join limits output batch size to 32K rows irrespective of row size. 
> This can create very large or very small batches (in terms of memory), 
> depending upon average row width. Change this to figure out output row count 
> based on memory specified with the new 

[jira] [Commented] (DRILL-6123) Limit batch size for Merge Join based on memory

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354590#comment-16354590
 ] 

ASF GitHub Bot commented on DRILL-6123:
---

Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1107#discussion_r166448594
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
@@ -77,7 +77,7 @@ private ExecConstants() {
   public static final String SPILL_DIRS = "drill.exec.spill.directories";
 
   public static final String OUTPUT_BATCH_SIZE = 
"drill.exec.memory.operator.output_batch_size";
-  public static final LongValidator OUTPUT_BATCH_SIZE_VALIDATOR = new RangeLongValidator(OUTPUT_BATCH_SIZE, 1024, 512 * 1024 * 1024);
+  public static final LongValidator OUTPUT_BATCH_SIZE_VALIDATOR = new RangeLongValidator(OUTPUT_BATCH_SIZE, 1, 512 * 1024 * 1024);
--- End diff --

Theoretically, 2 GiB - 1 should be the largest value, as our value vectors use
4-byte ints as offset values.
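
Concretely, as a back-of-the-envelope check (not project code):

{code:java}
// Value-vector offsets are 4-byte Java ints, so the largest addressable
// buffer is Integer.MAX_VALUE bytes = 2^31 - 1, i.e. 2 GiB - 1.
long theoreticalMaxBatchBytes = Integer.MAX_VALUE;  // 2147483647
{code}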


> Limit batch size for Merge Join based on memory
> ---
>
> Key: DRILL-6123
> URL: https://issues.apache.org/jira/browse/DRILL-6123
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.12.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.13.0
>
>
> Merge join limits output batch size to 32K rows irrespective of row size. 
> This can create very large or very small batches (in terms of memory), 
> depending upon average row width. Change this to figure out output row count 
> based on memory specified with the new outputBatchSize option and average row 
> width of incoming left and right batches. Output row count will be minimum of 
> 1 and max of 64k. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6123) Limit batch size for Merge Join based on memory

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354589#comment-16354589
 ] 

ASF GitHub Bot commented on DRILL-6123:
---

Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1107#discussion_r166450263
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java
 ---
@@ -102,20 +105,78 @@
   private final List comparators;
   private final JoinRelType joinType;
   private JoinWorker worker;
+  private final long outputBatchSize;
 
   private static final String LEFT_INPUT = "LEFT INPUT";
   private static final String RIGHT_INPUT = "RIGHT INPUT";
 
+  private class MergeJoinMemoryManager extends AbstractRecordBatchMemoryManager {
+private int leftRowWidth;
+private int rightRowWidth;
+
+/**
+ * MergeJoin operates on one record at a time from the left and right
+ * batches, using the RecordIterator abstraction. We have a callback
+ * mechanism to get notified when a new batch is loaded in the record
+ * iterator. This can get called in the middle of the current output batch
+ * we are building; when it is, adjust the number of output rows for the
+ * current batch and update the value to be used for subsequent batches.
+ */
+@Override
+public void update(int inputIndex) {
+  switch(inputIndex) {
+case 0:
+  final RecordBatchSizer leftSizer = new RecordBatchSizer(left);
+  leftRowWidth = leftSizer.netRowWidth();
+  break;
+case 1:
+  final RecordBatchSizer rightSizer = new RecordBatchSizer(right);
+  rightRowWidth = rightSizer.netRowWidth();
+default:
+  break;
+  }
+
+  final int newOutgoingRowWidth = leftRowWidth + rightRowWidth;
+
+  // If the outgoing row width is 0, just return. This is possible for empty
+  // batches, or when the first set of batches comes with OK_NEW_SCHEMA and no data.
+  if (newOutgoingRowWidth == 0) {
+return;
+  }
+
+  // update the value to be used for next batch(es)
+  setOutputRowCount(Math.min(ValueVector.MAX_ROW_COUNT,
+      Math.max(RecordBatchSizer.safeDivide(outputBatchSize / WORST_CASE_FRAGMENTATION_FACTOR,
+          newOutgoingRowWidth), MIN_NUM_ROWS)));
--- End diff --

Our goal is to eventually use Paul's framework for controlling batch sizes. 
I feel the BatchRecordMemoryManager terminology is a bit ambitious (kind of 
misleading).


> Limit batch size for Merge Join based on memory
> ---
>
> Key: DRILL-6123
> URL: https://issues.apache.org/jira/browse/DRILL-6123
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.12.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.13.0
>
>
> Merge join limits output batch size to 32K rows irrespective of row size. 
> This can create very large or very small batches (in terms of memory), 
> depending upon average row width. Change this to figure out output row count 
> based on memory specified with the new outputBatchSize option and average row 
> width of incoming left and right batches. Output row count will be minimum of 
> 1 and max of 64k. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5978) Upgrade Hive libraries to 2.1.1 version.

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354930#comment-16354930
 ] 

ASF GitHub Bot commented on DRILL-5978:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/#discussion_r166508479
  
--- Diff: contrib/storage-hive/core/pom.xml ---
@@ -58,6 +58,10 @@
      <groupId>commons-codec</groupId>
      <artifactId>commons-codec</artifactId>
    </exclusion>
+   <exclusion>
--- End diff --

Is the exclusion necessary due to a version conflict, or is the dependency
not required?


> Upgrade Hive libraries to 2.1.1 version.
> 
>
> Key: DRILL-5978
> URL: https://issues.apache.org/jira/browse/DRILL-5978
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Currently Drill uses [Hive version 1.2.1 
> libraries|https://github.com/apache/drill/blob/master/pom.xml#L53] to perform 
> queries on Hive. This version of library can be used for Hive1.x versions and 
> Hive2.x versions too, but some features of Hive2.x are broken (for example 
> using of ORC transactional tables). To fix that it will be good to update 
> drill-hive library version to 2.1 or newer. 
> Tasks which should be done:
> - resolving dependency conflicts;
> - investigating backward compatibility of newer drill-hive library with older 
> Hive versions (1.x);
> - updating drill-hive version for 
> [MapR|https://github.com/apache/drill/blob/master/pom.xml#L1777] profile too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5978) Upgrade Hive libraries to 2.1.1 version.

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354936#comment-16354936
 ] 

ASF GitHub Bot commented on DRILL-5978:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/#discussion_r166513403
  
--- Diff: contrib/storage-hive/hive-exec-shade/pom.xml ---
@@ -39,23 +39,28 @@
   log4j
 
 
-      <artifactId>commons-codec</artifactId>
-      <groupId>commons-codec</groupId>
-    </exclusion>
-    <exclusion>
-      <artifactId>calcite-avatica</artifactId>
-      <groupId>org.apache.calcite</groupId>
+      <groupId>org.json</groupId>
+      <artifactId>json</artifactId>
 
   
 
+    <dependency>
+      <groupId>org.apache.parquet</groupId>
+      <artifactId>parquet-column</artifactId>
+      <version>${parquet.version}</version>
--- End diff --

Any other parquet dependencies? If they are not needed, why is the explicit
dependency on org.apache.parquet:parquet-column necessary?


> Upgrade Hive libraries to 2.1.1 version.
> 
>
> Key: DRILL-5978
> URL: https://issues.apache.org/jira/browse/DRILL-5978
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Currently Drill uses [Hive version 1.2.1 
> libraries|https://github.com/apache/drill/blob/master/pom.xml#L53] to perform 
> queries on Hive. This version of library can be used for Hive1.x versions and 
> Hive2.x versions too, but some features of Hive2.x are broken (for example 
> using of ORC transactional tables). To fix that it will be good to update 
> drill-hive library version to 2.1 or newer. 
> Tasks which should be done:
> - resolving dependency conflicts;
> - investigating backward compatibility of newer drill-hive library with older 
> Hive versions (1.x);
> - updating drill-hive version for 
> [MapR|https://github.com/apache/drill/blob/master/pom.xml#L1777] profile too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5978) Upgrade Hive libraries to 2.1.1 version.

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354932#comment-16354932
 ] 

ASF GitHub Bot commented on DRILL-5978:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/#discussion_r166512235
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java
 ---
@@ -507,5 +510,52 @@ public static boolean 
hasHeaderOrFooter(HiveTableWithColumnCache table) {
 int skipFooter = retrieveIntProperty(tableProperties, 
serdeConstants.FOOTER_COUNT, -1);
 return skipHeader > 0 || skipFooter > 0;
   }
+
+  /**
+   * This method checks whether the table is transactional and sets the
+   * necessary properties in {@link JobConf}. If schema evolution properties
+   * aren't set in the job conf for the input format, the method sets the
+   * column names and types from the table/partition properties or the
+   * storage descriptor.
+   *
+   * @param job the job to update
+   * @param properties table or partition properties
+   * @param sd storage descriptor
+   */
+  public static void verifyAndAddTransactionalProperties(JobConf job, Properties properties, StorageDescriptor sd) {
+
+if (AcidUtils.isTablePropertyTransactional(properties)) {
+  AcidUtils.setTransactionalTableScan(job, true);
+
+  // No work is needed, if schema evolution is used
+  if (Utilities.isSchemaEvolutionEnabled(job, true) &&
+      job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS) != null &&
+      job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES) != null) {
+return;
+  }
+
+  String colNames;
+  String colTypes;
+
+  // Try to get the column names and types from the table or partition properties.
+  // If they are absent there, get the column data from the table's storage descriptor.
+  if (properties.containsKey(serdeConstants.LIST_COLUMNS) &&
+      properties.containsKey(serdeConstants.LIST_COLUMN_TYPES)) {
+colNames = job.get(serdeConstants.LIST_COLUMNS);
+colTypes = job.get(serdeConstants.LIST_COLUMN_TYPES);
+  } else {
+final StringBuilder colNamesBuilder = new StringBuilder();
--- End diff --

consider using `Joiner`.
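
A hedged sketch of that suggestion, using Guava's Joiner over the storage descriptor's columns; the FieldSchema accessors are from the Hive metastore API, and the exact wiring is an assumption, not the PR's code:

{code:java}
import com.google.common.base.Joiner;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.StorageDescriptor;

class ColumnListBuilder {
  // Builds the comma-separated column name and type strings from the storage
  // descriptor, replacing the manual StringBuilder loop in the diff.
  static String[] namesAndTypes(StorageDescriptor sd) {
    List<String> names = new ArrayList<>();
    List<String> types = new ArrayList<>();
    for (FieldSchema col : sd.getCols()) {
      names.add(col.getName());
      types.add(col.getType());
    }
    return new String[] {Joiner.on(",").join(names), Joiner.on(",").join(types)};
  }
}
{code}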


> Upgrade Hive libraries to 2.1.1 version.
> 
>
> Key: DRILL-5978
> URL: https://issues.apache.org/jira/browse/DRILL-5978
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Currently Drill uses [Hive version 1.2.1 
> libraries|https://github.com/apache/drill/blob/master/pom.xml#L53] to perform 
> queries on Hive. This version of library can be used for Hive1.x versions and 
> Hive2.x versions too, but some features of Hive2.x are broken (for example 
> using of ORC transactional tables). To fix that it will be good to update 
> drill-hive library version to 2.1 or newer. 
> Tasks which should be done:
> - resolving dependency conflicts;
> - investigating backward compatibility of newer drill-hive library with older 
> Hive versions (1.x);
> - updating drill-hive version for 
> [MapR|https://github.com/apache/drill/blob/master/pom.xml#L1777] profile too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5978) Upgrade Hive libraries to 2.1.1 version.

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354928#comment-16354928
 ] 

ASF GitHub Bot commented on DRILL-5978:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/#discussion_r166508171
  
--- Diff: contrib/storage-hive/core/pom.xml ---
@@ -101,6 +105,7 @@
 
       <groupId>org.apache.calcite</groupId>
       <artifactId>calcite-core</artifactId>
+      <version>${calcite.version}</version>
--- End diff --

The same question as for common/pom.xml.


> Upgrade Hive libraries to 2.1.1 version.
> 
>
> Key: DRILL-5978
> URL: https://issues.apache.org/jira/browse/DRILL-5978
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Currently Drill uses [Hive version 1.2.1 
> libraries|https://github.com/apache/drill/blob/master/pom.xml#L53] to perform 
> queries on Hive. This version of library can be used for Hive1.x versions and 
> Hive2.x versions too, but some features of Hive2.x are broken (for example 
> using of ORC transactional tables). To fix that it will be good to update 
> drill-hive library version to 2.1 or newer. 
> Tasks which should be done:
> - resolving dependency conflicts;
> - investigating backward compatibility of newer drill-hive library with older 
> Hive versions (1.x);
> - updating drill-hive version for 
> [MapR|https://github.com/apache/drill/blob/master/pom.xml#L1777] profile too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5978) Upgrade Hive libraries to 2.1.1 version.

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354934#comment-16354934
 ] 

ASF GitHub Bot commented on DRILL-5978:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/#discussion_r166510396
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java
 ---
@@ -507,5 +510,52 @@ public static boolean 
hasHeaderOrFooter(HiveTableWithColumnCache table) {
 int skipFooter = retrieveIntProperty(tableProperties, 
serdeConstants.FOOTER_COUNT, -1);
 return skipHeader > 0 || skipFooter > 0;
   }
+
+  /**
+   * This method checks whether the table is transactional and set 
necessary properties in {@link JobConf}.
+   * If schema evolution properties aren't set in job conf for the input 
format, method sets the column names
+   * and types from table/partition properties or storage descriptor.
+   *
+   * @param job the job to update
+   * @param properties table or partition properties
+   * @param sd storage descriptor
+   */
+  public static void verifyAndAddTransactionalProperties(JobConf job, 
Properties properties, StorageDescriptor sd) {
--- End diff --

Is it necessary to pass both `JobConf` and `Properties`? As far as I can 
see, `job` is always populated from the passed `properties`.
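
A minimal sketch of the simplification this implies, assuming the table/partition properties have already been merged into the `JobConf` (e.g. via `HiveUtilities.addConfToJob`), so the separate `Properties` argument can be dropped (hypothetical method name and fallback shape):

```java
import com.google.common.base.Joiner;

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
import org.apache.hadoop.hive.serde.serdeConstants;
import org.apache.hadoop.mapred.JobConf;

class ConfOnlySketch {
  // Reads the serde keys back from the conf itself and falls back to the
  // storage descriptor when they are absent, as the original method does.
  static void setColumnTypesFromConf(JobConf job, StorageDescriptor sd) {
    if (job.get(serdeConstants.LIST_COLUMNS) == null
        || job.get(serdeConstants.LIST_COLUMN_TYPES) == null) {
      final List<String> names = new ArrayList<>();
      final List<String> types = new ArrayList<>();
      for (FieldSchema col : sd.getCols()) {
        names.add(col.getName());
        types.add(col.getType());
      }
      job.set(serdeConstants.LIST_COLUMNS, Joiner.on(',').join(names));
      job.set(serdeConstants.LIST_COLUMN_TYPES, Joiner.on(',').join(types));
    }
  }
}
```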


> Upgrade Hive libraries to 2.1.1 version.
> 
>
> Key: DRILL-5978
> URL: https://issues.apache.org/jira/browse/DRILL-5978
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Currently Drill uses [Hive version 1.2.1 
> libraries|https://github.com/apache/drill/blob/master/pom.xml#L53] to perform 
> queries on Hive. This library version works with both Hive 1.x and Hive 2.x, 
> but some Hive 2.x features are broken (for example, ORC transactional tables). 
> To fix that, the drill-hive library version should be updated to 2.1 or newer. 
> Tasks to be done:
> - resolve dependency conflicts;
> - investigate backward compatibility of the newer drill-hive library with 
> older Hive versions (1.x);
> - update the drill-hive version for the 
> [MapR|https://github.com/apache/drill/blob/master/pom.xml#L1777] profile too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5978) Upgrade Hive libraries to 2.1.1 version.

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354931#comment-16354931
 ] 

ASF GitHub Bot commented on DRILL-5978:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/#discussion_r166512379
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java
 ---
@@ -507,5 +510,52 @@ public static boolean 
hasHeaderOrFooter(HiveTableWithColumnCache table) {
 int skipFooter = retrieveIntProperty(tableProperties, 
serdeConstants.FOOTER_COUNT, -1);
 return skipHeader > 0 || skipFooter > 0;
   }
+
+  /**
+   * This method checks whether the table is transactional and set 
necessary properties in {@link JobConf}.
+   * If schema evolution properties aren't set in job conf for the input 
format, method sets the column names
+   * and types from table/partition properties or storage descriptor.
+   *
+   * @param job the job to update
+   * @param properties table or partition properties
+   * @param sd storage descriptor
+   */
+  public static void verifyAndAddTransactionalProperties(JobConf job, 
Properties properties, StorageDescriptor sd) {
+
+if (AcidUtils.isTablePropertyTransactional(properties)) {
+  AcidUtils.setTransactionalTableScan(job, true);
+
+  // No work is needed, if schema evolution is used
+  if (Utilities.isSchemaEvolutionEnabled(job, true) && 
job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS) != null &&
+  job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES) != null) {
+return;
+  }
+
+  String colNames;
+  String colTypes;
+
+  // Try to get get column names and types from table or partition 
properties. If they are absent there, get columns
+  // data from storage descriptor of the table
+  if (properties.containsKey(serdeConstants.LIST_COLUMNS) && 
properties.containsKey(serdeConstants.LIST_COLUMN_TYPES)) {
--- End diff --

Avoid the double lookup (`containsKey()` followed by `get()`) if possible.
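
A minimal sketch of the single-lookup pattern being asked for: read each key once and branch on `null` instead of calling `containsKey()` first (hypothetical helper; note the quoted diff also checks `properties` but then reads from `job`, which a single read point avoids):

```java
import java.util.Properties;

import org.apache.hadoop.hive.serde.serdeConstants;

class SingleLookupSketch {
  // Returns {colNames, colTypes} when both keys are present, or null when
  // the caller should fall back to the storage descriptor instead.
  static String[] columnsFromProperties(Properties properties) {
    final String colNames = properties.getProperty(serdeConstants.LIST_COLUMNS);
    final String colTypes = properties.getProperty(serdeConstants.LIST_COLUMN_TYPES);
    if (colNames == null || colTypes == null) {
      return null;
    }
    return new String[] {colNames, colTypes};
  }
}
```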


> Upgrade Hive libraries to 2.1.1 version.
> 
>
> Key: DRILL-5978
> URL: https://issues.apache.org/jira/browse/DRILL-5978
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Currently Drill uses [Hive version 1.2.1 
> libraries|https://github.com/apache/drill/blob/master/pom.xml#L53] to perform 
> queries on Hive. This library version works with both Hive 1.x and Hive 2.x, 
> but some Hive 2.x features are broken (for example, ORC transactional tables). 
> To fix that, the drill-hive library version should be updated to 2.1 or newer. 
> Tasks to be done:
> - resolve dependency conflicts;
> - investigate backward compatibility of the newer drill-hive library with 
> older Hive versions (1.x);
> - update the drill-hive version for the 
> [MapR|https://github.com/apache/drill/blob/master/pom.xml#L1777] profile too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5978) Upgrade Hive libraries to 2.1.1 version.

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354933#comment-16354933
 ] 

ASF GitHub Bot commented on DRILL-5978:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/#discussion_r166513062
  
--- Diff: contrib/storage-hive/hive-exec-shade/pom.xml ---
@@ -39,23 +39,28 @@
   log4j
 
 
-  commons-codec
-  commons-codec
-
-
-  calcite-avatica
-  org.apache.calcite
+  org.json
--- End diff --

Has the new version of Hive introduced the dependency on org.json:json?


> Upgrade Hive libraries to 2.1.1 version.
> 
>
> Key: DRILL-5978
> URL: https://issues.apache.org/jira/browse/DRILL-5978
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Currently Drill uses [Hive version 1.2.1 
> libraries|https://github.com/apache/drill/blob/master/pom.xml#L53] to perform 
> queries on Hive. This library version works with both Hive 1.x and Hive 2.x, 
> but some Hive 2.x features are broken (for example, ORC transactional tables). 
> To fix that, the drill-hive library version should be updated to 2.1 or newer. 
> Tasks to be done:
> - resolve dependency conflicts;
> - investigate backward compatibility of the newer drill-hive library with 
> older Hive versions (1.x);
> - update the drill-hive version for the 
> [MapR|https://github.com/apache/drill/blob/master/pom.xml#L1777] profile too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5978) Upgrade Hive libraries to 2.1.1 version.

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354935#comment-16354935
 ] 

ASF GitHub Bot commented on DRILL-5978:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/#discussion_r166513525
  
--- Diff: contrib/storage-hive/hive-exec-shade/pom.xml ---
@@ -39,23 +39,28 @@
   log4j
 
 
-  commons-codec
-  commons-codec
-
-
-  calcite-avatica
-  org.apache.calcite
+  org.json
+  json
 
   
 
+
+  org.apache.parquet
+  parquet-column
+  ${parquet.version}
+
+
+  com.tdunning
+  json
+
   
 
   
 
   
 org.apache.maven.plugins
 maven-shade-plugin
-2.1
+3.1.0
--- End diff --

What is the reason for the version change, and should it be applied to other 
modules where shading is used?


> Upgrade Hive libraries to 2.1.1 version.
> 
>
> Key: DRILL-5978
> URL: https://issues.apache.org/jira/browse/DRILL-5978
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Currently Drill uses [Hive version 1.2.1 
> libraries|https://github.com/apache/drill/blob/master/pom.xml#L53] to perform 
> queries on Hive. This library version works with both Hive 1.x and Hive 2.x, 
> but some Hive 2.x features are broken (for example, ORC transactional tables). 
> To fix that, the drill-hive library version should be updated to 2.1 or newer. 
> Tasks to be done:
> - resolve dependency conflicts;
> - investigate backward compatibility of the newer drill-hive library with 
> older Hive versions (1.x);
> - update the drill-hive version for the 
> [MapR|https://github.com/apache/drill/blob/master/pom.xml#L1777] profile too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5978) Upgrade Hive libraries to 2.1.1 version.

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354929#comment-16354929
 ] 

ASF GitHub Bot commented on DRILL-5978:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/#discussion_r166507937
  
--- Diff: common/pom.xml ---
@@ -45,6 +45,7 @@
 
   org.apache.calcite
   calcite-core
+  ${calcite.version}
--- End diff --

Why is this change necessary? The version should come from 
the `<dependencyManagement>` section of the parent pom.


> Upgrade Hive libraries to 2.1.1 version.
> 
>
> Key: DRILL-5978
> URL: https://issues.apache.org/jira/browse/DRILL-5978
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Currently Drill uses [Hive version 1.2.1 
> libraries|https://github.com/apache/drill/blob/master/pom.xml#L53] to perform 
> queries on Hive. This library version works with both Hive 1.x and Hive 2.x, 
> but some Hive 2.x features are broken (for example, ORC transactional tables). 
> To fix that, the drill-hive library version should be updated to 2.1 or newer. 
> Tasks to be done:
> - resolve dependency conflicts;
> - investigate backward compatibility of the newer drill-hive library with 
> older Hive versions (1.x);
> - update the drill-hive version for the 
> [MapR|https://github.com/apache/drill/blob/master/pom.xml#L1777] profile too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6141) JOIN query that uses USING clause returns incorrect results

2018-02-06 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-6141:
-

 Summary: JOIN query that uses USING clause returns incorrect 
results
 Key: DRILL-6141
 URL: https://issues.apache.org/jira/browse/DRILL-6141
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.12.0
Reporter: Khurram Faraaz


A join query that uses the USING clause returns incorrect results.

Postgres 9.2.23 returns only one occurrence of the "id" column:

{noformat}
postgres=# create table t1(id int, name varchar(30));
CREATE TABLE
postgres=# create table t2(id int, name varchar(30));
CREATE TABLE

postgres=# select * from t1;
 id | name
----+-------
 10 | John
 13 | Kevin
 15 | Susan
(3 rows)

postgres=# select * from t2;
 id | name
----+-------
 19 | Kyle
 13 | Kevin
  1 | Bob
 17 | Kumar
(4 rows)

postgres=# select * from t1 JOIN t2 USING(id);
 id | name  | name
----+-------+-------
 13 | Kevin | Kevin
(1 row)
{noformat}

Results from Drill 1.12.0-mapr (commit 2de42491be795721bcb4059bd46e27fc33272309):

{noformat}
0: jdbc:drill:schema=dfs.tmp> create table t1 as select cast(columns[0] as int) c1, cast(columns[1] as varchar(30)) c2 from `t1.csv`;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 3                          |
+-----------+----------------------------+
1 row selected (0.213 seconds)
0: jdbc:drill:schema=dfs.tmp> create table t2 as select cast(columns[0] as int) c1, cast(columns[1] as varchar(30)) c2 from `t2.csv`;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 4                          |
+-----------+----------------------------+
1 row selected (0.168 seconds)

0: jdbc:drill:schema=dfs.tmp> select * from t1;
+-----+--------+
| c1  | c2     |
+-----+--------+
| 10  | John   |
| 13  | Kevin  |
| 15  | Susan  |
+-----+--------+
3 rows selected (0.15 seconds)
0: jdbc:drill:schema=dfs.tmp> select * from t2;
+-----+--------+
| c1  | c2     |
+-----+--------+
| 19  | Kyle   |
| 13  | Kevin  |
| 1   | Bob    |
| 17  | Kumar  |
+-----+--------+
4 rows selected (0.171 seconds)

## Note that Drill returns an extra column, unlike Postgres, for the same query over the same data

0: jdbc:drill:schema=dfs.tmp> select * from t1 JOIN t2 USING(c1);
+-----+--------+------+--------+
| c1  | c2     | c10  | c20    |
+-----+--------+------+--------+
| 13  | Kevin  | 13   | Kevin  |
+-----+--------+------+--------+
1 row selected (0.256 seconds)

## explain plan for the above query

0: jdbc:drill:schema=dfs.tmp> explain plan for select * from t1 JOIN t2 USING(c1);
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01   ProjectAllowDup(*=[$0], *0=[$1])
00-02     Project(T49¦¦*=[$0], T48¦¦*=[$2])
00-03       Project(T49¦¦*=[$2], c10=[$3], T48¦¦*=[$0], c1=[$1])
00-04         HashJoin(condition=[=($3, $1)], joinType=[inner])
00-06           Project(T48¦¦*=[$0], c1=[$1])
00-08             Scan(table=[[dfs, tmp, t2]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/t2]], selectionRoot=maprfs:/tmp/t2, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`*`]]])
00-05           Project(T49¦¦*=[$0], c10=[$1])
00-07             Project(T49¦¦*=[$0], c1=[$1])
00-09               Scan(table=[[dfs, tmp, t1]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/t1]], selectionRoot=maprfs:/tmp/t1, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`*`]]])

{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6138) Move RecordBatchSizer to org.apache.drill.exec.record package

2018-02-06 Thread Padma Penumarthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Padma Penumarthy updated DRILL-6138:

Labels: ready-to-commit  (was: )

> Move RecordBatchSizer to org.apache.drill.exec.record package
> -
>
> Key: DRILL-6138
> URL: https://issues.apache.org/jira/browse/DRILL-6138
> Project: Apache Drill
>  Issue Type: Task
>  Components: Execution - Flow
>Affects Versions: 1.12.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> Move RecordBatchSizer from org.apache.drill.exec.physical.impl.spill package 
> to org.apache.drill.exec.record package.
> Minor refactoring - change columnSizes from list to map. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6140) Operators listed in Profiles Page doesn't always correspond with operator specified in Physical Plan

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355008#comment-16355008
 ] 

ASF GitHub Bot commented on DRILL-6140:
---

GitHub user kkhatua opened a pull request:

https://github.com/apache/drill/pull/1116

DRILL-6140: Correctly list Operators in Profiles Page

Operators listed in the Profiles Page don't always correspond with the operator 
specified in the Physical Plan.
This commit fixes that by using the PhysicalPlan as a reference, but 
reverts to the inferred names in the event of an Exchange-based operator.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kkhatua/drill DRILL-6140

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1116.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1116


commit 4ff7e34690235cb7c2522aed0eda3d027c0ddbe8
Author: Kunal Khatua 
Date:   2018-02-07T06:14:58Z

DRILL-6140: Correctly list Operators in Profiles Page

Operators listed in the Profiles Page don't always correspond with the operator 
specified in the Physical Plan.
This commit fixes that by using the PhysicalPlan as a reference, but 
reverts to the inferred names in the event of an Exchange-based operator.




> Operators listed in Profiles Page doesn't always correspond with operator 
> specified in Physical Plan
> 
>
> Key: DRILL-6140
> URL: https://issues.apache.org/jira/browse/DRILL-6140
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.12.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.13.0
>
> Attachments: 25978a1a-24cf-fb4a-17af-59e7115b4fa1.sys.drill
>
>
> A query's physical plan correctly shows
> {code}
>  00-00 Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, 
> cumulative cost = { ...
>00-01 Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount 
> = 1.0, cumulative cost = { ...
>  00-02 StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) : rowType = 
> RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = { ...
>00-03 UnionExchange : rowType = RecordType(BIGINT EXPR$0): rowcount = 
> 1.0, cumulative cost = { ...
>  01-01 StreamAgg(group=[{}], EXPR$0=[COUNT()]) : rowType = 
> RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = { ...
>01-02 Project($f0=[0]) : rowType = RecordType(INTEGER $f0): 
> rowcount = 1.79279253E7, cumulative cost = ...
>  01-03 Flatten(flattenField=[$1]) : rowType = RecordType(ANY 
> rfsSpecCode, ...
>01-04 Project(rfsSpecCode=[$1], PUResultsArray=[$2]) : rowType 
> = ...
>  01-05 SelectionVectorRemover : rowType = RecordType(ANY 
> schemaName, ...
>01-06 Filter(condition=[=($0, 'OnyxBlue')]) : rowType = ...
>  01-07 Project(schemaName=[$0], ITEM=[ITEM($1, 
> 'rfsSpecCode')], ...
>01-08 Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [
> {code}
> However, the profile page shows the operators as...
> ||Operator ID || Type || Metrics||
> |00-xx-00 | SCREEN | ... |
> |00-xx-01 | PROJECT | ... |
> |00-xx-02 | STREAMING_AGGREGATE | ... |
> |00-xx-03 | UNORDERED_RECEIVER | ... |
> |01-xx-00 | SINGLE_SENDER | ... |
> |01-xx-01 | STREAMING_AGGREGATE | ... |
> |01-xx-02 | PROJECT | ... |
> |01-xx-03 | SINGLE_SENDER | ... |
> |01-xx-04 | PROJECT | ... |
> |01-xx-05 | SELECTION_VECTOR_REMOVER | ... |
> |01-xx-06 | FILTER | ... |
> |01-xx-07 | PROJECT | ... |
> |01-xx-08 | PARQUET_ROW_GROUP_SCAN | ... |
> As you can see, the {{FLATTEN}} operator appears as a {{SINGLE_SENDER}}, 
> making the profile hard to interpret.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6140) Operators listed in Profiles Page doesn't always correspond with operator specified in Physical Plan

2018-02-06 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-6140:

Fix Version/s: 1.13.0

> Operators listed in Profiles Page doesn't always correspond with operator 
> specified in Physical Plan
> 
>
> Key: DRILL-6140
> URL: https://issues.apache.org/jira/browse/DRILL-6140
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.12.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.13.0
>
> Attachments: 25978a1a-24cf-fb4a-17af-59e7115b4fa1.sys.drill
>
>
> A query's physical plan correctly shows
> {code}
>  00-00 Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, 
> cumulative cost = { ...
>00-01 Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount 
> = 1.0, cumulative cost = { ...
>  00-02 StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) : rowType = 
> RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = { ...
>00-03 UnionExchange : rowType = RecordType(BIGINT EXPR$0): rowcount = 
> 1.0, cumulative cost = { ...
>  01-01 StreamAgg(group=[{}], EXPR$0=[COUNT()]) : rowType = 
> RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = { ...
>01-02 Project($f0=[0]) : rowType = RecordType(INTEGER $f0): 
> rowcount = 1.79279253E7, cumulative cost = ...
>  01-03 Flatten(flattenField=[$1]) : rowType = RecordType(ANY 
> rfsSpecCode, ...
>01-04 Project(rfsSpecCode=[$1], PUResultsArray=[$2]) : rowType 
> = ...
>  01-05 SelectionVectorRemover : rowType = RecordType(ANY 
> schemaName, ...
>01-06 Filter(condition=[=($0, 'OnyxBlue')]) : rowType = ...
>  01-07 Project(schemaName=[$0], ITEM=[ITEM($1, 
> 'rfsSpecCode')], ...
>01-08 Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [
> {code}
> However, the profile page shows the operators as...
> ||Operator ID || Type || Metrics||
> |00-xx-00 | SCREEN | ... |
> |00-xx-01 | PROJECT | ... |
> |00-xx-02 | STREAMING_AGGREGATE | ... |
> |00-xx-03 | UNORDERED_RECEIVER | ... |
> |01-xx-00 | SINGLE_SENDER | ... |
> |01-xx-01 | STREAMING_AGGREGATE | ... |
> |01-xx-02 | PROJECT | ... |
> |01-xx-03 | SINGLE_SENDER | ... |
> |01-xx-04 | PROJECT | ... |
> |01-xx-05 | SELECTION_VECTOR_REMOVER | ... |
> |01-xx-06 | FILTER | ... |
> |01-xx-07 | PROJECT | ... |
> |01-xx-08 | PARQUET_ROW_GROUP_SCAN | ... |
> As you can see, the {{FLATTEN}} operator appears as a {{SINGLE_SENDER}}, 
> making the profile hard to interpret.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5978) Upgrade Hive libraries to 2.1.1 version.

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353546#comment-16353546
 ] 

ASF GitHub Bot commented on DRILL-5978:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/#discussion_r166202560
  
--- Diff: 
contrib/storage-hive/core/src/test/java/org/apache/drill/exec/impersonation/hive/TestSqlStdBasedAuthorization.java
 ---
@@ -100,7 +103,9 @@ private static void 
setSqlStdBasedAuthorizationInHiveConf() {
 hiveConfig.put(METASTORE_EXECUTE_SET_UGI.varname, 
hiveConf.get(METASTORE_EXECUTE_SET_UGI.varname));
 hiveConfig.put(HIVE_AUTHORIZATION_ENABLED.varname, 
hiveConf.get(HIVE_AUTHORIZATION_ENABLED.varname));
 hiveConfig.put(HIVE_AUTHENTICATOR_MANAGER.varname, 
SessionStateUserAuthenticator.class.getName());
-hiveConfig.put(HIVE_AUTHORIZATION_MANAGER.varname, 
SQLStdHiveAuthorizerFactory.class.getName());
--- End diff --

Why is this removed? The test seems to be for authorization.


> Upgrade Hive libraries to 2.1.1 version.
> 
>
> Key: DRILL-5978
> URL: https://issues.apache.org/jira/browse/DRILL-5978
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Currently Drill uses [Hive version 1.2.1 
> libraries|https://github.com/apache/drill/blob/master/pom.xml#L53] to perform 
> queries on Hive. This library version works with both Hive 1.x and Hive 2.x, 
> but some Hive 2.x features are broken (for example, ORC transactional tables). 
> To fix that, the drill-hive library version should be updated to 2.1 or newer. 
> Tasks to be done:
> - resolve dependency conflicts;
> - investigate backward compatibility of the newer drill-hive library with 
> older Hive versions (1.x);
> - update the drill-hive version for the 
> [MapR|https://github.com/apache/drill/blob/master/pom.xml#L1777] profile too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5978) Upgrade Hive libraries to 2.1.1 version.

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353548#comment-16353548
 ] 

ASF GitHub Bot commented on DRILL-5978:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/#discussion_r166213941
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java
 ---
@@ -507,5 +509,51 @@ public static boolean 
hasHeaderOrFooter(HiveTableWithColumnCache table) {
 int skipFooter = retrieveIntProperty(tableProperties, 
serdeConstants.FOOTER_COUNT, -1);
 return skipHeader > 0 || skipFooter > 0;
   }
+
+  /**
+   * This method checks whether the schema evolution properties are set in 
job conf for the input format. If they
+   * aren't set, method sets the column names and types from 
table/partition properties or storage descriptor.
+   * @param job the job to update
+   * @param properties table or partition properties
+   * @param isAcidTable true if the table is transactional, false otherwise
+   * @param sd storage descriptor
+   */
+  public static void setColumnTypes(JobConf job, Properties properties, 
boolean isAcidTable, StorageDescriptor sd) {
+
+// No work is needed, if schema evolution is used
+if (Utilities.isSchemaEvolutionEnabled(job, isAcidTable) && 
job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS) != null &&
+job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES) != null) {
+  return;
+}
+
+String colNames;
+String colTypes;
+
+// Try to get get column names and types from table or partition 
properties. If they are absent there, get columns
+// data from storage descriptor of the table
+if (properties.containsKey(serdeConstants.LIST_COLUMNS) && 
properties.containsKey(serdeConstants.LIST_COLUMN_TYPES)) {
+  colNames = job.get(serdeConstants.LIST_COLUMNS);
+  colTypes = job.get(serdeConstants.LIST_COLUMN_TYPES);
+} else {
+  StringBuilder colNamesBuilder = new StringBuilder();
+  StringBuilder colTypesBuilder = new StringBuilder();
+  boolean isFirst = true;
+  for(FieldSchema col: sd.getCols()) {
+if (isFirst) {
+  isFirst = false;
+} else {
+  colNamesBuilder.append(',');
+  colTypesBuilder.append(',');
+}
+colNamesBuilder.append(col.getName());
+colTypesBuilder.append(col.getType());
+  }
+  colNames = colNamesBuilder.toString();
+  colTypes = colTypesBuilder.toString();
--- End diff --

How about changing the loop as below:

```
final StringBuilder colNamesBuilder = new StringBuilder();
final StringBuilder colTypesBuilder = new StringBuilder();

for (FieldSchema col : sd.getCols()) {
  colNamesBuilder.append(col.getName());
  colTypesBuilder.append(col.getType());
  colNamesBuilder.append(',');
  colTypesBuilder.append(',');
}
colNames = colNamesBuilder.substring(0, colNamesBuilder.length() - 1);
colTypes = colTypesBuilder.substring(0, colTypesBuilder.length() - 1);
```



> Upgrade Hive libraries to 2.1.1 version.
> 
>
> Key: DRILL-5978
> URL: https://issues.apache.org/jira/browse/DRILL-5978
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Currently Drill uses [Hive version 1.2.1 
> libraries|https://github.com/apache/drill/blob/master/pom.xml#L53] to perform 
> queries on Hive. This library version works with both Hive 1.x and Hive 2.x, 
> but some Hive 2.x features are broken (for example, ORC transactional tables). 
> To fix that, the drill-hive library version should be updated to 2.1 or newer. 
> Tasks to be done:
> - resolve dependency conflicts;
> - investigate backward compatibility of the newer drill-hive library with 
> older Hive versions (1.x);
> - update the drill-hive version for the 
> [MapR|https://github.com/apache/drill/blob/master/pom.xml#L1777] profile too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5978) Upgrade Hive libraries to 2.1.1 version.

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353549#comment-16353549
 ] 

ASF GitHub Bot commented on DRILL-5978:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/#discussion_r166198314
  
--- Diff: pom.xml ---
@@ -884,13 +884,33 @@
 io.netty
 netty-all
   
+  
+org.mortbay.jetty
+servlet-api
--- End diff --

Duplicate exclusion. Already excluded at the top.


> Upgrade Hive libraries to 2.1.1 version.
> 
>
> Key: DRILL-5978
> URL: https://issues.apache.org/jira/browse/DRILL-5978
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Currently Drill uses [Hive version 1.2.1 
> libraries|https://github.com/apache/drill/blob/master/pom.xml#L53] to perform 
> queries on Hive. This library version works with both Hive 1.x and Hive 2.x, 
> but some Hive 2.x features are broken (for example, ORC transactional tables). 
> To fix that, the drill-hive library version should be updated to 2.1 or newer. 
> Tasks to be done:
> - resolve dependency conflicts;
> - investigate backward compatibility of the newer drill-hive library with 
> older Hive versions (1.x);
> - update the drill-hive version for the 
> [MapR|https://github.com/apache/drill/blob/master/pom.xml#L1777] profile too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5978) Upgrade Hive libraries to 2.1.1 version.

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353547#comment-16353547
 ] 

ASF GitHub Bot commented on DRILL-5978:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/#discussion_r166209232
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveMetadataProvider.java
 ---
@@ -264,6 +265,10 @@ private HiveStats 
getStatsEstimateFromInputSplits(final List
   final List splits = Lists.newArrayList();
   final JobConf job = new JobConf(hiveConf);
   HiveUtilities.addConfToJob(job, properties);
+  if (AcidUtils.isTablePropertyTransactional(properties)) {
+AcidUtils.setTransactionalTableScan(job, true);
+HiveUtilities.setColumnTypes(job, properties, true, sd);
+  }
--- End diff --

How about refactoring this block of code into a new method in HiveUtilities, 
like `verifyAndAddTransactionalProperty()`? Then just call that method from 
both here and HiveAbstractReader.
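
A minimal sketch of the helper being proposed, folding the quoted block into one shared method (name as suggested; the individual calls are taken verbatim from the diff above):

```java
import java.util.Properties;

import org.apache.drill.exec.store.hive.HiveUtilities;
import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
import org.apache.hadoop.hive.ql.io.AcidUtils;
import org.apache.hadoop.mapred.JobConf;

class TransactionalHelperSketch {
  // Marks the scan as transactional and sets column names/types when the
  // table properties say the table is ACID; a no-op otherwise, so both
  // HiveMetadataProvider and HiveAbstractReader can call it unconditionally.
  static void verifyAndAddTransactionalProperty(JobConf job, Properties properties, StorageDescriptor sd) {
    if (AcidUtils.isTablePropertyTransactional(properties)) {
      AcidUtils.setTransactionalTableScan(job, true);
      HiveUtilities.setColumnTypes(job, properties, true, sd);
    }
  }
}
```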


> Upgrade Hive libraries to 2.1.1 version.
> 
>
> Key: DRILL-5978
> URL: https://issues.apache.org/jira/browse/DRILL-5978
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Currently Drill uses [Hive version 1.2.1 
> libraries|https://github.com/apache/drill/blob/master/pom.xml#L53] to perform 
> queries on Hive. This library version works with both Hive 1.x and Hive 2.x, 
> but some Hive 2.x features are broken (for example, ORC transactional tables). 
> To fix that, the drill-hive library version should be updated to 2.1 or newer. 
> Tasks to be done:
> - resolve dependency conflicts;
> - investigate backward compatibility of the newer drill-hive library with 
> older Hive versions (1.x);
> - update the drill-hive version for the 
> [MapR|https://github.com/apache/drill/blob/master/pom.xml#L1777] profile too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5978) Upgrade Hive libraries to 2.1.1 version.

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353842#comment-16353842
 ] 

ASF GitHub Bot commented on DRILL-5978:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/#discussion_r166251399
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java
 ---
@@ -507,5 +509,51 @@ public static boolean 
hasHeaderOrFooter(HiveTableWithColumnCache table) {
 int skipFooter = retrieveIntProperty(tableProperties, 
serdeConstants.FOOTER_COUNT, -1);
 return skipHeader > 0 || skipFooter > 0;
   }
+
+  /**
+   * This method checks whether the schema evolution properties are set in 
job conf for the input format. If they
+   * aren't set, method sets the column names and types from 
table/partition properties or storage descriptor.
+   * @param job the job to update
+   * @param properties table or partition properties
+   * @param isAcidTable true if the table is transactional, false otherwise
+   * @param sd storage descriptor
+   */
+  public static void setColumnTypes(JobConf job, Properties properties, 
boolean isAcidTable, StorageDescriptor sd) {
+
+// No work is needed, if schema evolution is used
+if (Utilities.isSchemaEvolutionEnabled(job, isAcidTable) && 
job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS) != null &&
+job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES) != null) {
+  return;
+}
+
+String colNames;
+String colTypes;
+
+// Try to get get column names and types from table or partition 
properties. If they are absent there, get columns
+// data from storage descriptor of the table
+if (properties.containsKey(serdeConstants.LIST_COLUMNS) && 
properties.containsKey(serdeConstants.LIST_COLUMN_TYPES)) {
+  colNames = job.get(serdeConstants.LIST_COLUMNS);
+  colTypes = job.get(serdeConstants.LIST_COLUMN_TYPES);
+} else {
+  StringBuilder colNamesBuilder = new StringBuilder();
+  StringBuilder colTypesBuilder = new StringBuilder();
+  boolean isFirst = true;
+  for(FieldSchema col: sd.getCols()) {
+if (isFirst) {
+  isFirst = false;
+} else {
+  colNamesBuilder.append(',');
+  colTypesBuilder.append(',');
+}
+colNamesBuilder.append(col.getName());
+colTypesBuilder.append(col.getType());
+  }
+  colNames = colNamesBuilder.toString();
+  colTypes = colTypesBuilder.toString();
--- End diff --

I like it. Thank you.


> Upgrade Hive libraries to 2.1.1 version.
> 
>
> Key: DRILL-5978
> URL: https://issues.apache.org/jira/browse/DRILL-5978
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Currently Drill uses [Hive version 1.2.1 
> libraries|https://github.com/apache/drill/blob/master/pom.xml#L53] to perform 
> queries on Hive. This library version works with both Hive 1.x and Hive 2.x, 
> but some Hive 2.x features are broken (for example, ORC transactional tables). 
> To fix that, the drill-hive library version should be updated to 2.1 or newer. 
> Tasks to be done:
> - resolve dependency conflicts;
> - investigate backward compatibility of the newer drill-hive library with 
> older Hive versions (1.x);
> - update the drill-hive version for the 
> [MapR|https://github.com/apache/drill/blob/master/pom.xml#L1777] profile too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5978) Upgrade Hive libraries to 2.1.1 version.

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353841#comment-16353841
 ] 

ASF GitHub Bot commented on DRILL-5978:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/#discussion_r166246336
  
--- Diff: 
contrib/storage-hive/core/src/test/java/org/apache/drill/exec/impersonation/hive/TestSqlStdBasedAuthorization.java
 ---
@@ -100,7 +103,9 @@ private static void 
setSqlStdBasedAuthorizationInHiveConf() {
 hiveConfig.put(METASTORE_EXECUTE_SET_UGI.varname, 
hiveConf.get(METASTORE_EXECUTE_SET_UGI.varname));
 hiveConfig.put(HIVE_AUTHORIZATION_ENABLED.varname, 
hiveConf.get(HIVE_AUTHORIZATION_ENABLED.varname));
 hiveConfig.put(HIVE_AUTHENTICATOR_MANAGER.varname, 
SessionStateUserAuthenticator.class.getName());
-hiveConfig.put(HIVE_AUTHORIZATION_MANAGER.varname, 
SQLStdHiveAuthorizerFactory.class.getName());
--- End diff --

I did that accidentally. I've restored this line.


> Upgrade Hive libraries to 2.1.1 version.
> 
>
> Key: DRILL-5978
> URL: https://issues.apache.org/jira/browse/DRILL-5978
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Currently Drill uses [Hive version 1.2.1 
> libraries|https://github.com/apache/drill/blob/master/pom.xml#L53] to perform 
> queries on Hive. This library version works with both Hive 1.x and Hive 2.x, 
> but some Hive 2.x features are broken (for example, ORC transactional tables). 
> To fix that, the drill-hive library version should be updated to 2.1 or newer. 
> Tasks to be done:
> - resolve dependency conflicts;
> - investigate backward compatibility of the newer drill-hive library with 
> older Hive versions (1.x);
> - update the drill-hive version for the 
> [MapR|https://github.com/apache/drill/blob/master/pom.xml#L1777] profile too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5978) Upgrade Hive libraries to 2.1.1 version.

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353839#comment-16353839
 ] 

ASF GitHub Bot commented on DRILL-5978:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/#discussion_r166282137
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveMetadataProvider.java
 ---
@@ -264,6 +265,10 @@ private HiveStats 
getStatsEstimateFromInputSplits(final List
   final List splits = Lists.newArrayList();
   final JobConf job = new JobConf(hiveConf);
   HiveUtilities.addConfToJob(job, properties);
+  if (AcidUtils.isTablePropertyTransactional(properties)) {
+AcidUtils.setTransactionalTableScan(job, true);
+HiveUtilities.setColumnTypes(job, properties, true, sd);
+  }
--- End diff --

It makes sense. Moreover, since schema evolution is required for ACID tables 
([HIVE-12799](https://issues.apache.org/jira/browse/HIVE-12799)), I've combined it 
with the setColumnTypes() helper method.


> Upgrade Hive libraries to 2.1.1 version.
> 
>
> Key: DRILL-5978
> URL: https://issues.apache.org/jira/browse/DRILL-5978
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Currently Drill uses [Hive version 1.2.1 
> libraries|https://github.com/apache/drill/blob/master/pom.xml#L53] to perform 
> queries on Hive. This library version works with both Hive 1.x and Hive 2.x, 
> but some Hive 2.x features are broken (for example, ORC transactional tables). 
> To fix that, the drill-hive library version should be updated to 2.1 or newer. 
> Tasks to be done:
> - resolve dependency conflicts;
> - investigate backward compatibility of the newer drill-hive library with 
> older Hive versions (1.x);
> - update the drill-hive version for the 
> [MapR|https://github.com/apache/drill/blob/master/pom.xml#L1777] profile too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5978) Upgrade Hive libraries to 2.1.1 version.

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353840#comment-16353840
 ] 

ASF GitHub Bot commented on DRILL-5978:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/#discussion_r166247024
  
--- Diff: pom.xml ---
@@ -884,13 +884,33 @@
 io.netty
 netty-all
   
+  
+org.mortbay.jetty
+servlet-api
--- End diff --

You are right. The exclusion is removed. Thank you.


> Upgrade Hive libraries to 2.1.1 version.
> 
>
> Key: DRILL-5978
> URL: https://issues.apache.org/jira/browse/DRILL-5978
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Currently Drill uses [Hive version 1.2.1 
> libraries|https://github.com/apache/drill/blob/master/pom.xml#L53] to perform 
> queries on Hive. This library version works with both Hive 1.x and Hive 2.x, 
> but some Hive 2.x features are broken (for example, ORC transactional tables). 
> To fix that, the drill-hive library version should be updated to 2.1 or newer. 
> Tasks to be done:
> - resolve dependency conflicts;
> - investigate backward compatibility of the newer drill-hive library with 
> older Hive versions (1.x);
> - update the drill-hive version for the 
> [MapR|https://github.com/apache/drill/blob/master/pom.xml#L1777] profile too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6118) Handle item star columns during project / filter push down and directory pruning

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353918#comment-16353918
 ] 

ASF GitHub Bot commented on DRILL-6118:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1104#discussion_r166310161
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java
 ---
@@ -596,10 +596,10 @@ private void classifyExpr(final NamedExpression ex, 
final RecordBatch incoming,
 final NameSegment ref = ex.getRef().getRootSegment();
 final boolean exprHasPrefix = 
expr.getPath().contains(StarColumnHelper.PREFIX_DELIMITER);
 final boolean refHasPrefix = 
ref.getPath().contains(StarColumnHelper.PREFIX_DELIMITER);
-final boolean exprIsStar = expr.getPath().equals(SchemaPath.WILDCARD);
-final boolean refContainsStar = 
ref.getPath().contains(SchemaPath.WILDCARD);
-final boolean exprContainsStar = 
expr.getPath().contains(SchemaPath.WILDCARD);
-final boolean refEndsWithStar = 
ref.getPath().endsWith(SchemaPath.WILDCARD);
+final boolean exprIsStar = 
expr.getPath().equals(SchemaPath.DYNAMIC_STAR);
--- End diff --

This change became required after the Calcite update. With the changes in 
CALCITE-1150, `*` is replaced by `**` after a query is parsed, and `**` is added 
to the RowType. Therefore WILDCARD can't come from the plan, and its usage 
should be replaced by `**`.
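
A toy illustration of why the old check stops matching (constant values per the comment above; the real constants live in Drill's `SchemaPath`, so the literals here are stand-ins):

```java
class StarMarkerDemo {
  static final String WILDCARD = "*";      // marker before the Calcite update
  static final String DYNAMIC_STAR = "**"; // marker emitted after CALCITE-1150

  public static void main(String[] args) {
    String path = "**"; // what now arrives in the RowType after parsing
    System.out.println(path.equals(WILDCARD));     // false: the old check misses
    System.out.println(path.equals(DYNAMIC_STAR)); // true: the updated check matches
  }
}
```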


> Handle item star columns during project  /  filter push down and directory 
> pruning
> --
>
> Key: DRILL-6118
> URL: https://issues.apache.org/jira/browse/DRILL-6118
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Project push down, filter push down, and partition pruning do not work with a 
> dynamically expanded column which is represented as a star in the ITEM operator: 
> _ITEM($0, 'column_name')_ where $0 is a star.
>  This often occurs when a view, sub-select, or CTE with a star is issued.
>  To solve this issue we can create {{DrillFilterItemStarReWriterRule}}, which 
> will rewrite such ITEM operators before filter push down and directory 
> pruning. Project-into-scan push down logic will be handled separately in the 
> already existing rule {{DrillPushProjectIntoScanRule}}. Basically, we can 
> consider the following queries the same: 
>  {{select col1 from t}}
>  {{select col1 from (select * from t)}}
> *Use cases*
> Since item star columns were not considered during project / filter push 
> down and directory pruning, push down and pruning did not happen. This was 
> causing Drill to read all columns from a file (when only several are needed) or 
> read all files instead. Views with a star query are the most common example. 
> Such behavior significantly degrades performance for item star queries 
> compared to queries without item star.
> *EXAMPLES*
> *Data set* 
> The following will create a table with three files, each in a dedicated sub-folder:
> {noformat}
> use dfs.tmp;
> create table `order_ctas/t1` as select cast(o_orderdate as date) as 
> o_orderdate from cp.`tpch/orders.parquet` where o_orderdate between date 
> '1992-01-01' and date '1992-01-03';
> create table `order_ctas/t2` as select cast(o_orderdate as date) as 
> o_orderdate from cp.`tpch/orders.parquet` where o_orderdate between date 
> '1992-01-04' and date '1992-01-06';
> create table `order_ctas/t3` as select cast(o_orderdate as date) as 
> o_orderdate from cp.`tpch/orders.parquet` where o_orderdate between date 
> '1992-01-07' and date '1992-01-09';
> {noformat}
> *Filter push down*
> {{select * from order_ctas where o_orderdate = date '1992-01-01'}} will read 
> only one file
> {noformat}
> 00-00    Screen
> 00-01      Project(**=[$0])
> 00-02        Project(T1¦¦**=[$0])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[=($1, 1992-01-01)])
> 00-05              Project(T1¦¦**=[$0], o_orderdate=[$1])
> 00-06                Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=/tmp/order_ctas/t1/0_0_0.parquet]], 
> selectionRoot=/tmp/order_ctas, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`**`]]])
> {noformat}
> {{select * from (select * from order_ctas) where o_orderdate = date 
> '1992-01-01'}} will read all three files
> {noformat}
> 00-00    Screen
> 00-01      Project(**=[$0])
> 00-02        SelectionVectorRemover
> 00-03          Filter(condition=[=(ITEM($0, 'o_orderdate'), 1992-01-01)])
> 00-04            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
>