[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-07-19 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549839#comment-16549839
 ] 

Hive QA commented on HIVE-17896:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12932197/HIVE-17896.13.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 14678 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/12704/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12704/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12704/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12932197 - PreCommit-HIVE-Build

> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
>Priority: Major
> Attachments: HIVE-17896.1.patch, HIVE-17896.10.patch, 
> HIVE-17896.11.patch, HIVE-17896.12.patch, HIVE-17896.13.patch, 
> HIVE-17896.3.patch, HIVE-17896.4.patch, HIVE-17896.5.patch, 
> HIVE-17896.6.patch, HIVE-17896.7.patch, HIVE-17896.8.patch, HIVE-17896.9.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopNKey is on just "a".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-07-19 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549795#comment-16549795
 ] 

Hive QA commented on HIVE-17896:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
49s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
 9s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
44s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
15s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
32s{color} | {color:blue} common in master has 64 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
41s{color} | {color:blue} serde in master has 194 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
1s{color} | {color:blue} ql in master has 2273 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
28s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
41s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
45s{color} | {color:red} ql: The patch generated 32 new + 430 unchanged - 0 
fixed = 462 total (was 430) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
53s{color} | {color:red} serde generated 1 new + 194 unchanged - 0 fixed = 195 
total (was 194) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 
27s{color} | {color:red} ql generated 8 new + 2273 unchanged - 0 fixed = 2281 
total (was 2273) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 32m 58s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:serde |
|  |  
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(Object[],
 ObjectInspector[], Object[], ObjectInspector[], boolean[]) negates the return 
value of 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(Object,
 ObjectInspector, Object, ObjectInspector)  At 
ObjectInspectorUtils.java:negates the return value of 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(Object,
 ObjectInspector, Object, ObjectInspector)  At ObjectInspectorUtils.java:[line 
956] |
| FindBugs | module:ql |
|  |  new 
org.apache.hadoop.hive.ql.exec.TopNKeyOperator$KeyWrapperComparator(ObjectInspector[],
 ObjectInspector[], boolean[]) may expose internal representation by storing an 
externally mutable object into 
TopNKeyOperator$KeyWrapperComparator.columnSortOrderIsDesc  At 
TopNKeyOperator.java:expose internal representation by storing an externally 
mutable object into TopNKeyOperator$KeyWrapperComparator.columnSortOrderIsDesc  
At TopNKeyOperator.java:[line 83] |
|  |  new 
org.apache.hadoop.hive.ql.exec.TopNKeyOperator$KeyWrapperComparator(ObjectInspector[],
 ObjectInspector[], boolean[]) may expose internal representation by storing an 
externally mutable object into 
TopNKeyOperator$KeyWrapperComparator.objectInspectors1  At 
TopNKeyOperator.java:expose internal representation by storing an externally 
mutable object into 

[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-07-19 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549678#comment-16549678
 ] 

Jesus Camacho Rodriguez commented on HIVE-17896:


+1 to latest patch (pending tests)

[~teddy.choi], please push it to branch-3 too. Thanks

> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
>Priority: Major
> Attachments: HIVE-17896.1.patch, HIVE-17896.10.patch, 
> HIVE-17896.11.patch, HIVE-17896.12.patch, HIVE-17896.13.patch, 
> HIVE-17896.3.patch, HIVE-17896.4.patch, HIVE-17896.5.patch, 
> HIVE-17896.6.patch, HIVE-17896.7.patch, HIVE-17896.8.patch, HIVE-17896.9.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopNKey is on just "a".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-07-18 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548481#comment-16548481
 ] 

Jesus Camacho Rodriguez commented on HIVE-17896:


[~teddy.choi], thanks for applying the changes. Could you regenerate those q 
files and update RB to check them too? Latest patch LGTM, unless [~mmccline] or 
[~gopalv] have any comment about the vectorized operator.

> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
>Priority: Major
> Attachments: HIVE-17896.1.patch, HIVE-17896.10.patch, 
> HIVE-17896.11.patch, HIVE-17896.12.patch, HIVE-17896.3.patch, 
> HIVE-17896.4.patch, HIVE-17896.5.patch, HIVE-17896.6.patch, 
> HIVE-17896.7.patch, HIVE-17896.8.patch, HIVE-17896.9.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopNKey is on just "a".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-07-17 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16546510#comment-16546510
 ] 

Hive QA commented on HIVE-17896:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12931901/HIVE-17896.12.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 48 failed/errored test(s), 14668 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2] 
(batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_struct_type_vectorization]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[parquet_complex_types_vectorization]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[parquet_map_type_vectorization]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[parquet_struct_type_vectorization]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_groupby]
 (batchId=179)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[check_constraint]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]
 (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown3]
 (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown]
 (batchId=174)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_decimal64_reader]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit]
 (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer]
 (batchId=173)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_cast_constant]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_2]
 (batchId=174)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_grouping_sets_limit]
 (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_reduce]
 (batchId=171)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_mr_diff_schema_alias]
 (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_reduce_groupby_decimal]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_string_concat]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_limit]
 (batchId=165)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query10] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query15] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query17] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query25] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query26] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query27] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query29] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query35] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query37] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query40] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query43] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query45] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query49] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query50] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query5] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query60] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query66] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query69] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query76] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query77] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query7] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query80] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query82] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query8] 
(batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query99] 
(batchId=262)

[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-07-17 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16546489#comment-16546489
 ] 

Hive QA commented on HIVE-17896:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
49s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
17s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
36s{color} | {color:blue} common in master has 64 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
43s{color} | {color:blue} serde in master has 194 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
18s{color} | {color:blue} ql in master has 2273 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
27s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
47s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
46s{color} | {color:red} ql: The patch generated 32 new + 430 unchanged - 0 
fixed = 462 total (was 430) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
53s{color} | {color:red} serde generated 1 new + 194 unchanged - 0 fixed = 195 
total (was 194) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 
41s{color} | {color:red} ql generated 8 new + 2273 unchanged - 0 fixed = 2281 
total (was 2273) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 33m 31s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:serde |
|  |  
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(Object[],
 ObjectInspector[], Object[], ObjectInspector[], boolean[]) negates the return 
value of 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(Object,
 ObjectInspector, Object, ObjectInspector)  At 
ObjectInspectorUtils.java:negates the return value of 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(Object,
 ObjectInspector, Object, ObjectInspector)  At ObjectInspectorUtils.java:[line 
956] |
| FindBugs | module:ql |
|  |  new 
org.apache.hadoop.hive.ql.exec.TopNKeyOperator$KeyWrapperComparator(ObjectInspector[],
 ObjectInspector[], boolean[]) may expose internal representation by storing an 
externally mutable object into 
TopNKeyOperator$KeyWrapperComparator.columnSortOrderIsDesc  At 
TopNKeyOperator.java:expose internal representation by storing an externally 
mutable object into TopNKeyOperator$KeyWrapperComparator.columnSortOrderIsDesc  
At TopNKeyOperator.java:[line 83] |
|  |  new 
org.apache.hadoop.hive.ql.exec.TopNKeyOperator$KeyWrapperComparator(ObjectInspector[],
 ObjectInspector[], boolean[]) may expose internal representation by storing an 
externally mutable object into 
TopNKeyOperator$KeyWrapperComparator.objectInspectors1  At 
TopNKeyOperator.java:expose internal representation by storing an externally 
mutable object into 

[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-07-12 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16542352#comment-16542352
 ] 

Hive QA commented on HIVE-17896:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12931291/HIVE-17896.11.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 14656 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/12567/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12567/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12567/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12931291 - PreCommit-HIVE-Build

> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
>Priority: Major
> Attachments: HIVE-17896.1.patch, HIVE-17896.10.patch, 
> HIVE-17896.11.patch, HIVE-17896.3.patch, HIVE-17896.4.patch, 
> HIVE-17896.5.patch, HIVE-17896.6.patch, HIVE-17896.7.patch, 
> HIVE-17896.8.patch, HIVE-17896.9.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopNKey is on just "a".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-07-12 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16542341#comment-16542341
 ] 

Hive QA commented on HIVE-17896:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
30s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
 6s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  1m  
3s{color} | {color:blue} common in master has 64 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  1m  
8s{color} | {color:blue} serde in master has 194 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  5m  
9s{color} | {color:blue} ql in master has 2289 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
7s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
30s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
57s{color} | {color:red} ql: The patch generated 35 new + 426 unchanged - 0 
fixed = 461 total (was 426) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
58s{color} | {color:red} serde generated 1 new + 194 unchanged - 0 fixed = 195 
total (was 194) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 
55s{color} | {color:red} ql generated 8 new + 2289 unchanged - 0 fixed = 2297 
total (was 2289) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 42m 15s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:serde |
|  |  
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(Object[],
 ObjectInspector[], Object[], ObjectInspector[], boolean[]) negates the return 
value of 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(Object,
 ObjectInspector, Object, ObjectInspector)  At 
ObjectInspectorUtils.java:negates the return value of 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(Object,
 ObjectInspector, Object, ObjectInspector)  At ObjectInspectorUtils.java:[line 
956] |
| FindBugs | module:ql |
|  |  new 
org.apache.hadoop.hive.ql.exec.TopNKeyOperator$KeyWrapperComparator(ObjectInspector[],
 ObjectInspector[], boolean[]) may expose internal representation by storing an 
externally mutable object into 
TopNKeyOperator$KeyWrapperComparator.columnSortOrderIsDesc  At 
TopNKeyOperator.java:expose internal representation by storing an externally 
mutable object into TopNKeyOperator$KeyWrapperComparator.columnSortOrderIsDesc  
At TopNKeyOperator.java:[line 71] |
|  |  new 
org.apache.hadoop.hive.ql.exec.TopNKeyOperator$KeyWrapperComparator(ObjectInspector[],
 ObjectInspector[], boolean[]) may expose internal representation by storing an 
externally mutable object into 
TopNKeyOperator$KeyWrapperComparator.objectInspectors1  At 
TopNKeyOperator.java:expose internal representation by storing an externally 
mutable object into 

[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-07-12 Thread Teddy Choi (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541342#comment-16541342
 ] 

Teddy Choi commented on HIVE-17896:
---

This 11th patch removes pushdown implementation and its test code in .q files.

> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
>Priority: Major
> Attachments: HIVE-17896.1.patch, HIVE-17896.10.patch, 
> HIVE-17896.11.patch, HIVE-17896.3.patch, HIVE-17896.4.patch, 
> HIVE-17896.5.patch, HIVE-17896.6.patch, HIVE-17896.7.patch, 
> HIVE-17896.8.patch, HIVE-17896.9.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopNKey is on just "a".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-07-12 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541255#comment-16541255
 ] 

Hive QA commented on HIVE-17896:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12931161/HIVE-17896.10.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 14649 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/12546/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12546/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12546/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12931161 - PreCommit-HIVE-Build

> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
>Priority: Major
> Attachments: HIVE-17896.1.patch, HIVE-17896.10.patch, 
> HIVE-17896.3.patch, HIVE-17896.4.patch, HIVE-17896.5.patch, 
> HIVE-17896.6.patch, HIVE-17896.7.patch, HIVE-17896.8.patch, HIVE-17896.9.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopNKey is on just "a".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-07-12 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541232#comment-16541232
 ] 

Hive QA commented on HIVE-17896:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
34s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
48s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
48s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
19s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
34s{color} | {color:blue} common in master has 64 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
43s{color} | {color:blue} serde in master has 194 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
23s{color} | {color:blue} ql in master has 2287 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
54s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
47s{color} | {color:red} ql: The patch generated 35 new + 425 unchanged - 0 
fixed = 460 total (was 425) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  
5s{color} | {color:red} serde generated 1 new + 194 unchanged - 0 fixed = 195 
total (was 194) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  6m 
18s{color} | {color:red} ql generated 8 new + 2287 unchanged - 0 fixed = 2295 
total (was 2287) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 36m 37s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:serde |
|  |  
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(Object[],
 ObjectInspector[], Object[], ObjectInspector[], boolean[]) negates the return 
value of 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(Object,
 ObjectInspector, Object, ObjectInspector)  At 
ObjectInspectorUtils.java:negates the return value of 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(Object,
 ObjectInspector, Object, ObjectInspector)  At ObjectInspectorUtils.java:[line 
956] |
| FindBugs | module:ql |
|  |  new 
org.apache.hadoop.hive.ql.exec.TopNKeyOperator$KeyWrapperComparator(ObjectInspector[],
 ObjectInspector[], boolean[]) may expose internal representation by storing an 
externally mutable object into 
TopNKeyOperator$KeyWrapperComparator.columnSortOrderIsDesc  At 
TopNKeyOperator.java:expose internal representation by storing an externally 
mutable object into TopNKeyOperator$KeyWrapperComparator.columnSortOrderIsDesc  
At TopNKeyOperator.java:[line 71] |
|  |  new 
org.apache.hadoop.hive.ql.exec.TopNKeyOperator$KeyWrapperComparator(ObjectInspector[],
 ObjectInspector[], boolean[]) may expose internal representation by storing an 
externally mutable object into 
TopNKeyOperator$KeyWrapperComparator.objectInspectors1  At 
TopNKeyOperator.java:expose internal representation by storing an externally 
mutable object into 

[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-07-11 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540474#comment-16540474
 ] 

Jesus Camacho Rodriguez commented on HIVE-17896:


Thanks [~teddy.choi].
[~gopalv], [~mmccline], would you mind to take a look at the 
{{VectorTopNKeyOperator}} implementation? https://reviews.apache.org/r/65174/
I am reviewing the rest of the patch. Thanks

> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
>Priority: Major
> Attachments: HIVE-17896.1.patch, HIVE-17896.10.patch, 
> HIVE-17896.3.patch, HIVE-17896.4.patch, HIVE-17896.5.patch, 
> HIVE-17896.6.patch, HIVE-17896.7.patch, HIVE-17896.8.patch, HIVE-17896.9.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopNKey is on just "a".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-06-02 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499135#comment-16499135
 ] 

Hive QA commented on HIVE-17896:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12925842/HIVE-17896.9.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 14449 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input31] (batchId=63)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[topnkey] 
(batchId=106)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vector_topnkey] 
(batchId=106)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/11445/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/11445/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-11445/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12925842 - PreCommit-HIVE-Build

> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
>Priority: Major
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch, 
> HIVE-17896.4.patch, HIVE-17896.5.patch, HIVE-17896.6.patch, 
> HIVE-17896.7.patch, HIVE-17896.8.patch, HIVE-17896.9.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopNKey is on just "a".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-06-02 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499127#comment-16499127
 ] 

Hive QA commented on HIVE-17896:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
36s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 5s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
28s{color} | {color:blue} common in master has 62 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
36s{color} | {color:blue} serde in master has 190 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
27s{color} | {color:blue} ql in master has 2278 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
16s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
37s{color} | {color:red} ql: The patch generated 37 new + 397 unchanged - 0 
fixed = 434 total (was 397) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
44s{color} | {color:red} serde generated 1 new + 190 unchanged - 0 fixed = 191 
total (was 190) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  3m 
44s{color} | {color:red} ql generated 8 new + 2278 unchanged - 0 fixed = 2286 
total (was 2278) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
12s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 14s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:serde |
|  |  
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(Object[],
 ObjectInspector[], Object[], ObjectInspector[], boolean[]) negates the return 
value of 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(Object,
 ObjectInspector, Object, ObjectInspector)  At 
ObjectInspectorUtils.java:negates the return value of 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(Object,
 ObjectInspector, Object, ObjectInspector)  At ObjectInspectorUtils.java:[line 
958] |
| FindBugs | module:ql |
|  |  new 
org.apache.hadoop.hive.ql.exec.TopNKeyOperator$KeyWrapperComparator(ObjectInspector[],
 ObjectInspector[], boolean[]) may expose internal representation by storing an 
externally mutable object into 
TopNKeyOperator$KeyWrapperComparator.columnSortOrderIsDesc  At 
TopNKeyOperator.java:expose internal representation by storing an externally 
mutable object into TopNKeyOperator$KeyWrapperComparator.columnSortOrderIsDesc  
At TopNKeyOperator.java:[line 71] |
|  |  new 
org.apache.hadoop.hive.ql.exec.TopNKeyOperator$KeyWrapperComparator(ObjectInspector[],
 ObjectInspector[], boolean[]) may expose internal representation by storing an 
externally mutable object into 
TopNKeyOperator$KeyWrapperComparator.objectInspectors1  At 
TopNKeyOperator.java:expose internal representation by storing an externally 
mutable object into 

[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-05-30 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496033#comment-16496033
 ] 

Gopal V commented on HIVE-17896:


[~teddy.choi]: the pushdown is too aggressive at the moment.

The operator impl looks good, so it is the optimizer cases which I'm concerned 
about and would like you to split the multi-op push-down into a different patch.

The best way to get the operator level changes committed right now would be to 
restrict the push-down to a simple case & let it bake-through for a few test 
cycles (specifically, the pushdown through GBY case in the JIRA: GBY->RS(TopN) 
=> TNK->GBY->RS).

For example.

{code}
tez/topnkey.q.out

SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = 
src2.key) ORDER BY src1.key LIMIT 5
{code}

produces

{code}
+  <-Reducer 2 [SIMPLE_EDGE]
+SHUFFLE [RS_10]
+  Select Operator [SEL_9] (rows=809 width=178)
+Output:["_col0","_col1"]
+Top N Key Operator [TNK_18] (rows=809 width=178)
+  keys:_col0,sort order:+,top n:5
+  Merge Join Operator [MERGEJOIN_21] (rows=809 width=178)
+Conds:RS_6._col0=RS_7._col0(Inner),Output:["_col0","_col2"]
+  <-Map 1 [SIMPLE_EDGE]
+SHUFFLE [RS_6]
+  PartitionCols:_col0
+  Select Operator [SEL_2] (rows=500 width=87)
...
+Filter Operator [FIL_16] (rows=500 width=87)
+  predicate:key is not null
+  Top N Key Operator [TNK_19] (rows=500 width=87)
+keys:key,sort order:+,top n:5
+TableScan [TS_0] (rows=500 width=87)
+  
default@src,src1,Tbl:COMPLETE,Col:COMPLETE,Output:["key"]
{code}

The TNK_18 is good, but the TNK_19 is likely to give incorrect results due to 
the nature of the join filter case.

I'm happy to review these as two changes (i.e new operator & aggressive 
push-down) and get them in one by one.

> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
>Priority: Major
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch, 
> HIVE-17896.4.patch, HIVE-17896.5.patch, HIVE-17896.6.patch, 
> HIVE-17896.7.patch, HIVE-17896.8.patch, HIVE-17896.9.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopNKey is on just "a".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-05-30 Thread Teddy Choi (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496005#comment-16496005
 ] 

Teddy Choi commented on HIVE-17896:
---

[~gopalv], [~mmccline] could you review this? I rebased it on the latest master 
branch.

> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
>Priority: Major
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch, 
> HIVE-17896.4.patch, HIVE-17896.5.patch, HIVE-17896.6.patch, 
> HIVE-17896.7.patch, HIVE-17896.8.patch, HIVE-17896.9.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopNKey is on just "a".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-03-08 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391375#comment-16391375
 ] 

Teddy Choi commented on HIVE-17896:
---

This eighth patch fixes object inspector class cast error and applies recent 
changes in the master branch.

> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
>Priority: Major
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch, 
> HIVE-17896.4.patch, HIVE-17896.5.patch, HIVE-17896.6.patch, 
> HIVE-17896.7.patch, HIVE-17896.8.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopNKey is on just "a".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-03-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387273#comment-16387273
 ] 

Hive QA commented on HIVE-17896:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12913133/HIVE-17896.7.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 97 failed/errored test(s), 13483 tests 
executed
*Failed tests:*
{noformat}
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=94)


[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-03-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387239#comment-16387239
 ] 

Hive QA commented on HIVE-17896:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
34s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
25s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
32s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
40s{color} | {color:red} ql: The patch generated 37 new + 420 unchanged - 0 
fixed = 457 total (was 420) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
12s{color} | {color:red} The patch generated 49 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 16m 54s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-9501/dev-support/hive-personality.sh
 |
| git revision | master / 8d88cfa |
| Default Java | 1.8.0_111 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-9501/yetus/diff-checkstyle-ql.txt
 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-9501/yetus/patch-asflicense-problems.txt
 |
| modules | C: common serde itests ql U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-9501/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
>Priority: Major
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch, 
> HIVE-17896.4.patch, HIVE-17896.5.patch, HIVE-17896.6.patch, HIVE-17896.7.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> 

[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-03-05 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387122#comment-16387122
 ] 

Teddy Choi commented on HIVE-17896:
---

Updated the patch with the latest master branch.

> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
>Priority: Major
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch, 
> HIVE-17896.4.patch, HIVE-17896.5.patch, HIVE-17896.6.patch, HIVE-17896.7.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopNKey is on just "a".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-01-15 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326788#comment-16326788
 ] 

Gopal V commented on HIVE-17896:


Thanks Teddy, I will look into the new patch.

> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
>Priority: Major
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch, 
> HIVE-17896.4.patch, HIVE-17896.5.patch, HIVE-17896.6.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopNKey is on just "a".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-01-15 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326740#comment-16326740
 ] 

Teddy Choi commented on HIVE-17896:
---

[~gopalv], I fixed the DPP optimizer bug and uploaded it on RB.

> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
>Priority: Major
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch, 
> HIVE-17896.4.patch, HIVE-17896.5.patch, HIVE-17896.6.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopNKey is on just "a".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-01-09 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16318062#comment-16318062
 ] 

Teddy Choi commented on HIVE-17896:
---

[~gopalv], I didn't created schema objects carefully. I will apply your 
feedback. Thanks.

> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch, 
> HIVE-17896.4.patch, HIVE-17896.5.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopNKey is on just "a".



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-01-05 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314210#comment-16314210
 ] 

Gopal V commented on HIVE-17896:


Testing and review ongoing - currently this patch breaks DPP optimizer for some 
reason.

This is TPC-DS Query27 on 1Tb partitioned data, failing explains.

{code}
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc.(ExprNodeColumnDesc.java:84)
at 
org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc.(ExprNodeColumnDesc.java:80)
at 
org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan(DynamicPartitionPruningOptimization.java:528)
at 
org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.process(DynamicPartitionPruningOptimization.java:226)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at 
org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:74)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.runDynamicPartitionPruning(TezCompiler.java:503)
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:159)
at 
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:145)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11726)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:299)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:639)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1504)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1632)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1382)
{code}

> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch, 
> HIVE-17896.4.patch, HIVE-17896.5.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopNKey is on just "a".



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-01-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309581#comment-16309581
 ] 

Hive QA commented on HIVE-17896:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12904342/HIVE-17896.5.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 22 failed/errored test(s), 11548 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_2]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] 
(batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucketizedhiveinputformat]
 (batchId=177)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_part]
 (batchId=93)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[stats_aggregator_error_1]
 (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] 
(batchId=120)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query84] 
(batchId=247)
org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testTransactionalValidation
 (batchId=213)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testApplyPlanQpChanges 
(batchId=284)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=253)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints 
(batchId=225)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=231)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=231)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=231)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8416/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8416/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8416/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 22 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12904342 - PreCommit-HIVE-Build

> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch, 
> HIVE-17896.4.patch, HIVE-17896.5.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopNKey is on just "a".



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-01-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309533#comment-16309533
 ] 

Hive QA commented on HIVE-17896:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
22s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
17s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
17s{color} | {color:red} common: The patch generated 1 new + 932 unchanged - 0 
fixed = 933 total (was 932) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
38s{color} | {color:red} ql: The patch generated 42 new + 584 unchanged - 1 
fixed = 626 total (was 585) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
11s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 17m 27s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 98c95c6 |
| Default Java | 1.8.0_111 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8416/yetus/diff-checkstyle-common.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8416/yetus/diff-checkstyle-ql.txt
 |
| modules | C: common serde ql itests U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8416/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch, 
> HIVE-17896.4.patch, HIVE-17896.5.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they 

[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-01-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308397#comment-16308397
 ] 

Hive QA commented on HIVE-17896:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12904230/HIVE-17896.4.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 11546 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lateral_view_ppd] 
(batchId=87)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_2]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] 
(batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_reduce_groupby_duplicate_cols]
 (batchId=158)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_part]
 (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] 
(batchId=120)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut 
(batchId=208)
org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testTransactionalValidation
 (batchId=213)
org.apache.hadoop.hive.ql.TestAcidOnTez.testMapJoinOnTez (batchId=222)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=253)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints 
(batchId=225)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=231)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=231)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=231)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8396/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8396/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8396/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12904230 - PreCommit-HIVE-Build

> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch, 
> HIVE-17896.4.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopNKey is on just "a".



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2018-01-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308327#comment-16308327
 ] 

Hive QA commented on HIVE-17896:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
28s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
29s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
19s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
18s{color} | {color:red} common: The patch generated 1 new + 931 unchanged - 0 
fixed = 932 total (was 931) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
38s{color} | {color:red} ql: The patch generated 37 new + 581 unchanged - 1 
fixed = 618 total (was 582) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 17m 50s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 64229b9 |
| Default Java | 1.8.0_111 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8396/yetus/diff-checkstyle-common.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8396/yetus/diff-checkstyle-ql.txt
 |
| modules | C: common serde ql itests U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8396/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch, 
> HIVE-17896.4.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into 

[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2017-12-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294057#comment-16294057
 ] 

Hive QA commented on HIVE-17896:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12902538/HIVE-17896.3.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 150 failed/errored test(s), 11533 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2] 
(batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[global_limit] 
(batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[parquet_complex_types_vectorization]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_groupby]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketmapjoin7]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_2]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_lineage2]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ctas] 
(batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_opt_vectorization]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_optimization]
 (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lateral_view]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_join_transpose]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown3]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown]
 (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage2] 
(batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage3] 
(batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] 
(batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer]
 (batchId=166)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_predicate_pushdown]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[parquet_predicate_pushdown]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[quotedid_smb]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[smb_mapjoin_15]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in]
 (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin]
 (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_select]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[temp_table] 
(batchId=170)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_top_level]
 (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_cast_constant]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_2]
 (batchId=166)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_coalesce]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_data_types]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_expressions]
 (batchId=163)

[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2017-12-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294047#comment-16294047
 ] 

Hive QA commented on HIVE-17896:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
0s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
39s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
23s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
19s{color} | {color:red} common: The patch generated 1 new + 931 unchanged - 0 
fixed = 932 total (was 931) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
41s{color} | {color:red} ql: The patch generated 23 new + 580 unchanged - 1 
fixed = 603 total (was 581) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 19m  4s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 646ccce |
| Default Java | 1.8.0_111 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8294/yetus/diff-checkstyle-common.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8294/yetus/diff-checkstyle-ql.txt
 |
| modules | C: common serde ql itests U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8294/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume