[jira] [Updated] (HIVE-18159) Vectorization: Support Map type in MapWork

2017-12-17 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma updated HIVE-18159:

Status: Patch Available  (was: Open)

> Vectorization: Support Map type in MapWork
> --
>
> Key: HIVE-18159
> URL: https://issues.apache.org/jira/browse/HIVE-18159
> Project: Hive
>  Issue Type: Improvement
>Reporter: Colin Ma
>Assignee: Colin Ma
> Attachments: HIVE-18159.001.patch
>
>
> Support for complex types in vectorization was completed in HIVE-16589, but the 
> Map type is still not supported in MapWork. This ticket aims to support it in 
> MapWork when vectorization is enabled.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18159) Vectorization: Support Map type in MapWork

2017-12-17 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma updated HIVE-18159:

Attachment: HIVE-18159.001.patch

Hi [~Ferd], could you help review the patch? Thanks for your help.

> Vectorization: Support Map type in MapWork
> --
>
> Key: HIVE-18159
> URL: https://issues.apache.org/jira/browse/HIVE-18159
> Project: Hive
>  Issue Type: Improvement
>Reporter: Colin Ma
>Assignee: Colin Ma
> Attachments: HIVE-18159.001.patch
>
>
> Support for complex types in vectorization was completed in HIVE-16589, but the 
> Map type is still not supported in MapWork. This ticket aims to support it in 
> MapWork when vectorization is enabled.





[jira] [Commented] (HIVE-18268) Hive Prepared Statement when split with double quoted in query fails

2017-12-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294601#comment-16294601
 ] 

Hive QA commented on HIVE-18268:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 12s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 10s{color} | {color:green} jdbc: The patch generated 0 new + 14 unchanged - 4 fixed = 14 total (was 18) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch 10 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}  9m 50s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 8ced3bc |
| Default Java | 1.8.0_111 |
| whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-8296/yetus/whitespace-tabs.txt |
| modules | C: jdbc U: jdbc |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8296/yetus.txt |
| Powered by | Apache Yetus  http://yetus.apache.org |


This message was automatically generated.



> Hive Prepared Statement when split with double quoted in query fails
> 
>
> Key: HIVE-18268
> URL: https://issues.apache.org/jira/browse/HIVE-18268
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 2.3.2
>Reporter: Choi JaeHwan
>Assignee: Choi JaeHwan
> Fix For: 3.0.0, 2.4.0, 2.3.3
>
> Attachments: HIVE-18268.1.patch, HIVE-18268.2.patch, 
> HIVE-18268.3.patch, HIVE-18268.patch
>
>
> HIVE-13625 changed how the SQL statement is split when it contains an odd 
> number of escape characters, and added parameter count validation, as shown 
> below:
> {code:java}
> // Previous code
> StringBuilder newSql = new StringBuilder(parts.get(0));
> for (int i = 1; i < parts.size(); i++) {
>   if (!parameters.containsKey(i)) {
>     throw new SQLException("Parameter #" + i + " is unset");
>   }
>   newSql.append(parameters.get(i));
>   newSql.append(parts.get(i));
> }
> // Change from HIVE-13625
> int paramLoc = 1;
> while (getCharIndexFromSqlByParamLocation(sql, '?', paramLoc) > 0) {
>   // Check that the user has set the needed parameters
>   if (parameters.containsKey(paramLoc)) {
>     int tt = getCharIndexFromSqlByParamLocation(newSql.toString(), '?', 1);
>     newSql.deleteCharAt(tt);
>     newSql.insert(tt, parameters.get(paramLoc));
>   }
>   paramLoc++;
> }
> {code}
> If the number of split SQL parts and the number of parameters do not match, an 
> SQLException is thrown.
> Currently, double-quoted text is not handled when the SQL is split, so the SQL 
> is split even when the token ('?') is between double quotes.
> i think when the token 

[jira] [Updated] (HIVE-18268) Hive Prepared Statement when split with double quoted in query fails

2017-12-17 Thread Choi JaeHwan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Choi JaeHwan updated HIVE-18268:

Fix Version/s: 2.4.0
   3.0.0
   Status: Patch Available  (was: Open)

> Hive Prepared Statement when split with double quoted in query fails
> 
>
> Key: HIVE-18268
> URL: https://issues.apache.org/jira/browse/HIVE-18268
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 2.3.2
>Reporter: Choi JaeHwan
>Assignee: Choi JaeHwan
> Fix For: 3.0.0, 2.4.0, 2.3.3
>
> Attachments: HIVE-18268.1.patch, HIVE-18268.2.patch, 
> HIVE-18268.3.patch, HIVE-18268.patch
>
>
> HIVE-13625 changed how the SQL statement is split when it contains an odd 
> number of escape characters, and added parameter count validation, as shown 
> below:
> {code:java}
> // Previous code
> StringBuilder newSql = new StringBuilder(parts.get(0));
> for (int i = 1; i < parts.size(); i++) {
>   if (!parameters.containsKey(i)) {
>     throw new SQLException("Parameter #" + i + " is unset");
>   }
>   newSql.append(parameters.get(i));
>   newSql.append(parts.get(i));
> }
> // Change from HIVE-13625
> int paramLoc = 1;
> while (getCharIndexFromSqlByParamLocation(sql, '?', paramLoc) > 0) {
>   // Check that the user has set the needed parameters
>   if (parameters.containsKey(paramLoc)) {
>     int tt = getCharIndexFromSqlByParamLocation(newSql.toString(), '?', 1);
>     newSql.deleteCharAt(tt);
>     newSql.insert(tt, parameters.get(paramLoc));
>   }
>   paramLoc++;
> }
> {code}
> If the number of split SQL parts and the number of parameters do not match, an 
> SQLException is thrown.
> Currently, double-quoted text is not handled when the SQL is split, so the SQL 
> is split even when the token ('?') is between double quotes.
> I think that when the token between double quotes is a literal, the correct 
> behavior is not to split.
> For example, consider the following queries:
> {code:java}
> // '?' appears only inside double quotes
> String query1 = " select 1 from x where qa=\"?\" ";
> String query2 = " SELECT 1 FROM `x` WHERE (trecord LIKE \"ALA[d_?]%\")";
> {code}
> Here ? is a literal, so the query should not be split.
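
The behavior requested in the description can be illustrated with a minimal sketch. It is not the code from the attached patches: the class name `QuoteAwareSplitter` and the method `splitOnPlaceholders` are hypothetical, and the sketch assumes that a backslash escapes the next character and that both single and double quotes delimit literals.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the Hive JDBC driver. */
final class QuoteAwareSplitter {

  /**
   * Splits sql at every '?' that lies outside quoted text.
   * Assumption: a backslash escapes the character that follows it.
   */
  static List<String> splitOnPlaceholders(String sql) {
    List<String> parts = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    char openQuote = 0;      // 0 means "not inside a quoted literal"
    boolean escaped = false; // true when the previous char was a backslash

    for (int i = 0; i < sql.length(); i++) {
      char c = sql.charAt(i);
      if (escaped) {
        escaped = false;               // escaped char is copied as-is
      } else if (c == '\\') {
        escaped = true;
      } else if (openQuote != 0) {
        if (c == openQuote) {
          openQuote = 0;               // closing quote
        }
      } else if (c == '\'' || c == '"') {
        openQuote = c;                 // opening quote
      } else if (c == '?') {
        parts.add(current.toString()); // real placeholder: end this part
        current.setLength(0);
        continue;                      // drop the '?' itself
      }
      current.append(c);
    }
    parts.add(current.toString());
    return parts;
  }

  public static void main(String[] args) {
    // The number of parameters expected is parts.size() - 1.
    System.out.println(splitOnPlaceholders("select 1 from x where qa=\"?\"").size());
    System.out.println(splitOnPlaceholders("select 1 from x where a=? and b=\"?\"").size());
  }
}
```

With this approach, both example queries from the description yield a single part, so no parameter is expected for them.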





[jira] [Updated] (HIVE-18268) Hive Prepared Statement when split with double quoted in query fails

2017-12-17 Thread Choi JaeHwan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Choi JaeHwan updated HIVE-18268:

Status: Open  (was: Patch Available)

> Hive Prepared Statement when split with double quoted in query fails
> 
>
> Key: HIVE-18268
> URL: https://issues.apache.org/jira/browse/HIVE-18268
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 2.3.2
>Reporter: Choi JaeHwan
>Assignee: Choi JaeHwan
> Fix For: 2.3.3
>
> Attachments: HIVE-18268.1.patch, HIVE-18268.2.patch, 
> HIVE-18268.3.patch, HIVE-18268.patch
>
>
> HIVE-13625 changed how the SQL statement is split when it contains an odd 
> number of escape characters, and added parameter count validation, as shown 
> below:
> {code:java}
> // Previous code
> StringBuilder newSql = new StringBuilder(parts.get(0));
> for (int i = 1; i < parts.size(); i++) {
>   if (!parameters.containsKey(i)) {
>     throw new SQLException("Parameter #" + i + " is unset");
>   }
>   newSql.append(parameters.get(i));
>   newSql.append(parts.get(i));
> }
> // Change from HIVE-13625
> int paramLoc = 1;
> while (getCharIndexFromSqlByParamLocation(sql, '?', paramLoc) > 0) {
>   // Check that the user has set the needed parameters
>   if (parameters.containsKey(paramLoc)) {
>     int tt = getCharIndexFromSqlByParamLocation(newSql.toString(), '?', 1);
>     newSql.deleteCharAt(tt);
>     newSql.insert(tt, parameters.get(paramLoc));
>   }
>   paramLoc++;
> }
> {code}
> If the number of split SQL parts and the number of parameters do not match, an 
> SQLException is thrown.
> Currently, double-quoted text is not handled when the SQL is split, so the SQL 
> is split even when the token ('?') is between double quotes.
> I think that when the token between double quotes is a literal, the correct 
> behavior is not to split.
> For example, consider the following queries:
> {code:java}
> // '?' appears only inside double quotes
> String query1 = " select 1 from x where qa=\"?\" ";
> String query2 = " SELECT 1 FROM `x` WHERE (trecord LIKE \"ALA[d_?]%\")";
> {code}
> Here ? is a literal, so the query should not be split.





[jira] [Updated] (HIVE-18268) Hive Prepared Statement when split with double quoted in query fails

2017-12-17 Thread Choi JaeHwan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Choi JaeHwan updated HIVE-18268:

Attachment: HIVE-18268.3.patch

> Hive Prepared Statement when split with double quoted in query fails
> 
>
> Key: HIVE-18268
> URL: https://issues.apache.org/jira/browse/HIVE-18268
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 2.3.2
>Reporter: Choi JaeHwan
>Assignee: Choi JaeHwan
> Fix For: 2.3.3
>
> Attachments: HIVE-18268.1.patch, HIVE-18268.2.patch, 
> HIVE-18268.3.patch, HIVE-18268.patch
>
>
> HIVE-13625 changed how the SQL statement is split when it contains an odd 
> number of escape characters, and added parameter count validation, as shown 
> below:
> {code:java}
> // Previous code
> StringBuilder newSql = new StringBuilder(parts.get(0));
> for (int i = 1; i < parts.size(); i++) {
>   if (!parameters.containsKey(i)) {
>     throw new SQLException("Parameter #" + i + " is unset");
>   }
>   newSql.append(parameters.get(i));
>   newSql.append(parts.get(i));
> }
> // Change from HIVE-13625
> int paramLoc = 1;
> while (getCharIndexFromSqlByParamLocation(sql, '?', paramLoc) > 0) {
>   // Check that the user has set the needed parameters
>   if (parameters.containsKey(paramLoc)) {
>     int tt = getCharIndexFromSqlByParamLocation(newSql.toString(), '?', 1);
>     newSql.deleteCharAt(tt);
>     newSql.insert(tt, parameters.get(paramLoc));
>   }
>   paramLoc++;
> }
> {code}
> If the number of split SQL parts and the number of parameters do not match, an 
> SQLException is thrown.
> Currently, double-quoted text is not handled when the SQL is split, so the SQL 
> is split even when the token ('?') is between double quotes.
> I think that when the token between double quotes is a literal, the correct 
> behavior is not to split.
> For example, consider the following queries:
> {code:java}
> // '?' appears only inside double quotes
> String query1 = " select 1 from x where qa=\"?\" ";
> String query2 = " SELECT 1 FROM `x` WHERE (trecord LIKE \"ALA[d_?]%\")";
> {code}
> Here ? is a literal, so the query should not be split.





[jira] [Assigned] (HIVE-18289) Fix jar dependency when enable rdd cache in Hive on Spark

2017-12-17 Thread liyunzhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang reassigned HIVE-18289:
-


> Fix jar dependency when enable rdd cache in Hive on Spark
> -
>
> Key: HIVE-18289
> URL: https://issues.apache.org/jira/browse/HIVE-18289
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang
>Assignee: liyunzhang
>
> Running DS/query28 with the 4th patch of HIVE-17486 enabled on 
> tpcds_bin_partitioned_orc_10 fails in both Spark local mode and yarn mode.
> Command:
> {code}
> set spark.local=yarn-client;
> echo 'use tpcds_bin_partitioned_orc_10;source query28.sql;'|hive --hiveconf spark.app.name=query28.sql  --hiveconf hive.spark.optimize.shared.work=true -i testbench.settings -i query28.sql.setting
> {code}
> The exception:
> {code}
> java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.io.orc.OrcStruct.<init>()
>   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134) ~[hadoop-common-2.7.3.jar:?]
>   at org.apache.hadoop.io.WritableUtils.clone(WritableUtils.java:217) ~[hadoop-common-2.7.3.jar:?]
>   at org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:85) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:72) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1031) ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1031) ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) ~[scala-library-2.11.8.jar:?]
>   at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:214) ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:919) ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:910) ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866) ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:910) ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:668) ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330) ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:281) ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) ~[spark-core_2.11-2
> {code}





[jira] [Commented] (HIVE-8307) null character in columns.comments schema property breaks jobconf.xml

2017-12-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294492#comment-16294492
 ] 

Hive QA commented on HIVE-8307:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12902582/HIVE-8307.04.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8295/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8295/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8295/

Messages:
{noformat}
 This message was trimmed, see log for full details 
error: a/ql/src/test/results/clientpositive/pointlookup4.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/ppd_join_filter.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/ppd_vc.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/ppr_allchildsarenull.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/push_or.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/quotedid_tblproperty.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/rand_partitionpruner1.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/rand_partitionpruner2.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/rand_partitionpruner3.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/regexp_extract.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/router_join_ppr.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/sample1.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/sample2.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/sample4.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/sample5.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/sample6.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/sample7.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/sample8.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/sample9.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/serde_user_properties.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/smb_mapjoin_11.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/smb_mapjoin_12.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/smb_mapjoin_13.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/sort_merge_join_desc_5.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/sort_merge_join_desc_6.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/sort_merge_join_desc_7.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/spark/auto_join_reordering_values.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/spark/auto_sortmerge_join_1.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/spark/auto_sortmerge_join_12.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/spark/auto_sortmerge_join_2.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/spark/auto_sortmerge_join_3.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/spark/auto_sortmerge_join_4.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/spark/auto_sortmerge_join_5.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/spark/auto_sortmerge_join_7.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/spark/auto_sortmerge_join_8.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/spark/avro_joins_native.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/spark/bucket2.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/spark/bucket3.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/spark/bucket4.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/spark/bucket5.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/spark/bucket_map_join_1.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/spark/bucket_map_join_2.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/spark/bucket_map_join_spark1.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/spark/bucket_map_join_spark2.q.out: does not exist in index
error: a/ql/src/test/results/clientpositive/spark/bucket_map_join_spark3.q.out: does not 

[jira] [Updated] (HIVE-18111) Fix temp path for Spark DPP sink

2017-12-17 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-18111:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks Sahil for reviewing.

> Fix temp path for Spark DPP sink
> 
>
> Key: HIVE-18111
> URL: https://issues.apache.org/jira/browse/HIVE-18111
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Fix For: 3.0.0
>
> Attachments: HIVE-18111.1.patch, HIVE-18111.2.patch, 
> HIVE-18111.3.patch, HIVE-18111.4.patch, HIVE-18111.5.patch, HIVE-18111.5.patch
>
>
> Before HIVE-17877, each DPP sink has only one target work. The output path of 
> a DPP work is {{TMP_PATH/targetWorkId/dppWorkId}}. When we do the pruning, 
> each map work reads DPP outputs under {{TMP_PATH/targetWorkId}}.
> After HIVE-17877, each DPP sink can have multiple target works. It's possible 
> that a map work needs to read DPP outputs from multiple 
> {{TMP_PATH/targetWorkId}}. To solve this, I think we can have a DPP output 
> path specific to each query, e.g. {{QUERY_TMP_PATH/dpp_output}}. Each DPP 
> work outputs to {{QUERY_TMP_PATH/dpp_output/dppWorkId}}. And each map work 
> reads from {{QUERY_TMP_PATH/dpp_output}}.
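
The proposed directory layout can be sketched as follows. This is only an illustration of the path scheme described above, not the code that was committed: the class `DppPaths` and its methods are hypothetical, and `java.nio.file.Path` stands in for whatever path type Hive actually uses so that the example stays self-contained.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical illustration of the per-query DPP output layout. */
final class DppPaths {

  /** Directory that every map work of the query reads DPP outputs from. */
  static Path dppOutputRoot(Path queryTmpPath) {
    return queryTmpPath.resolve("dpp_output");
  }

  /** Directory that a single DPP work writes its output to. */
  static Path dppWorkOutput(Path queryTmpPath, String dppWorkId) {
    return dppOutputRoot(queryTmpPath).resolve(dppWorkId);
  }

  public static void main(String[] args) {
    Path queryTmp = Paths.get("query_tmp"); // placeholder for QUERY_TMP_PATH
    System.out.println(dppOutputRoot(queryTmp));
    System.out.println(dppWorkOutput(queryTmp, "7"));
  }
}
```

Because every DPP work writes under the same per-query root, a map work with several DPP sources only needs to scan one directory, regardless of how many target works each sink has.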





[jira] [Updated] (HIVE-8307) null character in columns.comments schema property breaks jobconf.xml

2017-12-17 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-8307:
--
Attachment: HIVE-8307.04.patch

[~ashutoshc], I have rebased the patch and regenerated the q files. Could you 
take a look?

> null character in columns.comments schema property breaks jobconf.xml
> -
>
> Key: HIVE-8307
> URL: https://issues.apache.org/jira/browse/HIVE-8307
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.13.0, 0.14.0, 0.13.1
>Reporter: Carl Laird
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-8307.03.patch, HIVE-8307.04.patch, 
> HIVE-8307.1.patch, HIVE-8307.2.patch, HIVE-8307.patch
>
>
> It would appear that the fix for 
> https://issues.apache.org/jira/browse/HIVE-6681 is causing the null character 
> to show up in job config xml files:
> I get the following when trying to insert into an elasticsearch backed table:
> [Fatal Error] :336:51: Character reference "

[jira] [Updated] (HIVE-18209) Fix API call in VectorizedListColumnReader to get value from BytesColumnVector

2017-12-17 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-18209:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to the master

> Fix API call in VectorizedListColumnReader to get value from BytesColumnVector
> --
>
> Key: HIVE-18209
> URL: https://issues.apache.org/jira/browse/HIVE-18209
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Colin Ma
>Assignee: Colin Ma
> Fix For: 3.0.0
>
> Attachments: HIVE-18209.001.patch, HIVE-18209.002.patch, 
> HIVE-18209.003.patch
>
>
> With the API BytesColumnVector.setVal(), the isRepeating attribute can't be 
> set correctly if ListColumnVector.child is BytesColumnVector. 
> BytesColumnVector.setRef() should be used to avoid this problem.





[jira] [Updated] (HIVE-18211) Support to read multiple level definition for Map type in Parquet file

2017-12-17 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-18211:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to the master.

> Support to read multiple level definition for Map type in Parquet file
> --
>
> Key: HIVE-18211
> URL: https://issues.apache.org/jira/browse/HIVE-18211
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Colin Ma
>Assignee: Colin Ma
> Fix For: 3.0.0
>
> Attachments: HIVE-18211.001.patch, HIVE-18211.002.patch
>
>
> In the current implementation of VectorizedParquetRecordReader, only the 
> following definition of the map type is supported:
> {code}
> repeated group map (MAP_KEY_VALUE) {
>   required binary key (UTF8);
>   optional binary value (UTF8);
> }
> {code}
> The implementation should also support a multi-level definition such as:
> {code}
> optional group m1 (MAP) {
>   repeated group map (MAP_KEY_VALUE) {
>     required binary key (UTF8);
>     optional binary value (UTF8);
>   }
> }
> {code}





[jira] [Commented] (HIVE-18211) Support to read multiple level definition for Map type in Parquet file

2017-12-17 Thread Colin Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294430#comment-16294430
 ] 

Colin Ma commented on HIVE-18211:
-

[~Ferd], I think the failed tests are not related to the patch.

> Support to read multiple level definition for Map type in Parquet file
> --
>
> Key: HIVE-18211
> URL: https://issues.apache.org/jira/browse/HIVE-18211
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Colin Ma
>Assignee: Colin Ma
> Attachments: HIVE-18211.001.patch, HIVE-18211.002.patch
>
>
> In the current implementation of VectorizedParquetRecordReader, only the 
> following definition of the map type is supported:
> {code}
> repeated group map (MAP_KEY_VALUE) {
>   required binary key (UTF8);
>   optional binary value (UTF8);
> }
> {code}
> The implementation should also support a multi-level definition such as:
> {code}
> optional group m1 (MAP) {
>   repeated group map (MAP_KEY_VALUE) {
>     required binary key (UTF8);
>     optional binary value (UTF8);
>   }
> }
> {code}





[jira] [Commented] (HIVE-18209) Fix API call in VectorizedListColumnReader to get value from BytesColumnVector

2017-12-17 Thread Colin Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294431#comment-16294431
 ] 

Colin Ma commented on HIVE-18209:
-

[~Ferd], I think the failed tests are not related to the patch.

> Fix API call in VectorizedListColumnReader to get value from BytesColumnVector
> --
>
> Key: HIVE-18209
> URL: https://issues.apache.org/jira/browse/HIVE-18209
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Colin Ma
>Assignee: Colin Ma
> Attachments: HIVE-18209.001.patch, HIVE-18209.002.patch, 
> HIVE-18209.003.patch
>
>
> With the API BytesColumnVector.setVal(), the isRepeating attribute can't be 
> set correctly if ListColumnVector.child is BytesColumnVector. 
> BytesColumnVector.setRef() should be used to avoid this problem.





[jira] [Commented] (HIVE-14498) Freshness period for query rewriting using materialized views

2017-12-17 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294428#comment-16294428
 ] 

Jesus Camacho Rodriguez commented on HIVE-14498:


[~ashutoshc], could you take a look?
https://reviews.apache.org/r/64490/

Thanks

> Freshness period for query rewriting using materialized views
> -
>
> Key: HIVE-14498
> URL: https://issues.apache.org/jira/browse/HIVE-14498
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14498.01.patch, HIVE-14498.02.patch, 
> HIVE-14498.03.patch, HIVE-14498.patch
>
>
> Once we have query rewriting in place (HIVE-14496), one of the main issues is 
> data freshness in the materialized views.
> Since we will not support view maintenance at first, we could include a 
> HiveConf property to configure a max freshness period (_n timeunits_). If a 
> query arrives and more than _n_ has elapsed since the materialized view was 
> last populated (by create, refresh, etc.), then we should not use it for 
> rewriting the query.
> Optionally, we could print a warning for the user indicating that the 
> materialized view was not used because it was not fresh.
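
The freshness rule described above amounts to a simple age comparison. The following is a minimal sketch under stated assumptions, not the implementation under review: the class `FreshnessCheck` and the method `isFreshEnough` are hypothetical names, and the sketch assumes the time of the last population and the configured maximum period are already available.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical illustration of the max-freshness check. */
final class FreshnessCheck {

  /**
   * Returns true when the materialized view may be used for rewriting,
   * i.e. when the time since it was last populated does not exceed
   * the configured maximum freshness period.
   */
  static boolean isFreshEnough(Instant lastPopulated, Instant now, Duration maxFreshness) {
    Duration age = Duration.between(lastPopulated, now);
    return age.compareTo(maxFreshness) <= 0;
  }

  public static void main(String[] args) {
    Instant now = Instant.now();
    Duration max = Duration.ofMinutes(10);
    System.out.println(isFreshEnough(now.minus(Duration.ofMinutes(5)), now, max));
    System.out.println(isFreshEnough(now.minus(Duration.ofMinutes(30)), now, max));
  }
}
```

When the check returns false, the planner would skip the view for rewriting and could optionally emit the warning mentioned above.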





[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2017-12-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294057#comment-16294057
 ] 

Hive QA commented on HIVE-17896:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12902538/HIVE-17896.3.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 150 failed/errored test(s), 11533 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[global_limit] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[parquet_complex_types_vectorization] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_groupby] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketmapjoin7] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_2] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_lineage2] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ctas] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_opt_vectorization] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_optimization] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lateral_view] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_join_transpose] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown3] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown] (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage2] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage3] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer] (batchId=166)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_predicate_pushdown] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[parquet_predicate_pushdown] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[quotedid_smb] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[smb_mapjoin_15] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin] (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_select] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[temp_table] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_top_level] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_cast_constant] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_2] (batchId=166)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_coalesce] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_data_types] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_expressions] (batchId=163)

[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2017-12-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294047#comment-16294047
 ] 

Hive QA commented on HIVE-17896:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  0s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 39s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 23s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m  5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 33s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 19s{color} | {color:red} common: The patch generated 1 new + 931 unchanged - 0 fixed = 932 total (was 931) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 41s{color} | {color:red} ql: The patch generated 23 new + 580 unchanged - 1 fixed = 603 total (was 581) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 19m  4s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 646ccce |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8294/yetus/diff-checkstyle-common.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8294/yetus/diff-checkstyle-ql.txt |
| modules | C: common serde ql itests U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8294/yetus.txt |
| Powered by | Apache Yetus  http://yetus.apache.org |


This message was automatically generated.



> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume 

[jira] [Updated] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2017-12-17 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-17896:
--
Description: 
For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
group-by operator buffers up all the rows before discarding the 99% of the rows 
in the TopN Hash within the ReduceSink Operator.

The RS TopN operator is very restrictive as it only supports doing the 
filtering on the shuffle keys, but it is better to do this before breaking the 
vectors into rows and losing the isRepeating properties.

Adding a TopN Key operator in the physical operator tree allows the following 
to happen.

GBY->RS(Top=1)

can become 

TNK(1)->GBY->RS(Top=1)

So that, the TopNKey can remove rows before they are buffered into the GBY and 
consume memory.

Here's the equivalent implementation in Presto

https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35

Adding this as a sub-feature of GroupBy prevents further optimizations if the 
GBY is on keys "a,b,c" and the TopNKey is on just "a".

  was:
For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
group-by operator buffers up all the rows before discarding the 99% of the rows 
in the TopN Hash within the ReduceSink Operator.

The RS TopN operator is very restrictive as it only supports doing the 
filtering on the shuffle keys, but it is better to do this before breaking the 
vectors into rows and losing the isRepeating properties.

Adding a TopN operator in the physical operator tree allows the following to 
happen.

GBY->RS(Top=1)

can become 

TopN(1)->GBY->RS(Top=1)

So that, the TopN can remove rows before they are buffered into the GBY and 
consume memory.

Here's the equivalent implementation in Presto

https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35

Adding this as a sub-feature of GroupBy prevents further optimizations if the 
GBY is on keys "a,b,c" and the TopN is on just "a".


> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopNKey is on just "a".





[jira] [Updated] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

2017-12-17 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-17896:
--
Summary: TopNKey: Create a standalone vectorizable TopNKey operator  (was: 
TopN: Create a standalone vectorizable TopN operator)

> TopNKey: Create a standalone vectorizable TopNKey operator
> --
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN operator in the physical operator tree allows the following to 
> happen.
> GBY->RS(Top=1)
> can become 
> TopN(1)->GBY->RS(Top=1)
> So that, the TopN can remove rows before they are buffered into the GBY and 
> consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopN is on just "a".





[jira] [Commented] (HIVE-17896) TopN: Create a standalone vectorizable TopN operator

2017-12-17 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294042#comment-16294042
 ] 

Teddy Choi commented on HIVE-17896:
---

Several existing classes already use the TopN prefix, including TopNHash, 
TopNPropagator, TopNReducer, and PTFTopNHash, so I named the new classes with 
the prefix TopNKey. The operator works in two phases: it first collects the 
top n keys from the rows, then filters the rows by those keys.
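
The two phases described above can be sketched roughly as follows. This is a 
minimal illustration, not the code in HIVE-17896.3.patch: the class 
TopNKeySketch, its method names, the String[] row layout with the key in 
column 0, and the ascending key order are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of a two-phase top-n key filter (not Hive's actual classes).
class TopNKeySketch {

    // Phase 1: collect the n smallest distinct keys seen in the input.
    // Assumes column 0 of each row holds the key and ascending order is wanted.
    static TreeSet<String> collectTopKeys(List<String[]> rows, int n) {
        TreeSet<String> top = new TreeSet<>();
        for (String[] row : rows) {
            top.add(row[0]);
            if (top.size() > n) {
                top.pollLast(); // evict the largest key once more than n are held
            }
        }
        return top;
    }

    // Phase 2: keep only the rows whose key is among the collected top keys.
    // Input order is preserved for the surviving rows.
    static List<String[]> filterRows(List<String[]> rows, TreeSet<String> topKeys) {
        List<String[]> kept = new ArrayList<>();
        for (String[] row : rows) {
            if (topKeys.contains(row[0])) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

In this sketch, rows that cannot contribute to the final top n are dropped 
before any downstream buffering, which is the effect the issue description 
asks for ahead of the GBY.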

> TopN: Create a standalone vectorizable TopN operator
> 
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN operator in the physical operator tree allows the following to 
> happen.
> GBY->RS(Top=1)
> can become 
> TopN(1)->GBY->RS(Top=1)
> So that, the TopN can remove rows before they are buffered into the GBY and 
> consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopN is on just "a".





[jira] [Commented] (HIVE-17896) TopN: Create a standalone vectorizable TopN operator

2017-12-17 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294041#comment-16294041
 ] 

Teddy Choi commented on HIVE-17896:
---

Sorry, HIVE-17896.2.patch was the wrong attachment. HIVE-17896.3.patch is the 
correct patch.

> TopN: Create a standalone vectorizable TopN operator
> 
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN operator in the physical operator tree allows the following to 
> happen.
> GBY->RS(Top=1)
> can become 
> TopN(1)->GBY->RS(Top=1)
> So that, the TopN can remove rows before they are buffered into the GBY and 
> consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopN is on just "a".





[jira] [Updated] (HIVE-17896) TopN: Create a standalone vectorizable TopN operator

2017-12-17 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-17896:
--
Attachment: HIVE-17896.3.patch

> TopN: Create a standalone vectorizable TopN operator
> 
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN operator in the physical operator tree allows the following to 
> happen.
> GBY->RS(Top=1)
> can become 
> TopN(1)->GBY->RS(Top=1)
> So that, the TopN can remove rows before they are buffered into the GBY and 
> consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopN is on just "a".





[jira] [Updated] (HIVE-17896) TopN: Create a standalone vectorizable TopN operator

2017-12-17 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-17896:
--
Attachment: (was: HIVE-17896.3.patch)

> TopN: Create a standalone vectorizable TopN operator
> 
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
> Attachments: HIVE-17896.1.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN operator in the physical operator tree allows the following to 
> happen.
> GBY->RS(Top=1)
> can become 
> TopN(1)->GBY->RS(Top=1)
> So that, the TopN can remove rows before they are buffered into the GBY and 
> consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopN is on just "a".





[jira] [Updated] (HIVE-17896) TopN: Create a standalone vectorizable TopN operator

2017-12-17 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-17896:
--
Attachment: HIVE-17896.3.patch

> TopN: Create a standalone vectorizable TopN operator
> 
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
> Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN operator in the physical operator tree allows the following to 
> happen.
> GBY->RS(Top=1)
> can become 
> TopN(1)->GBY->RS(Top=1)
> So that, the TopN can remove rows before they are buffered into the GBY and 
> consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopN is on just "a".





[jira] [Updated] (HIVE-17896) TopN: Create a standalone vectorizable TopN operator

2017-12-17 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-17896:
--
Attachment: (was: HIVE-17896.2.patch)

> TopN: Create a standalone vectorizable TopN operator
> 
>
> Key: HIVE-17896
> URL: https://issues.apache.org/jira/browse/HIVE-17896
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Teddy Choi
> Attachments: HIVE-17896.1.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN operator in the physical operator tree allows the following to 
> happen.
> GBY->RS(Top=1)
> can become 
> TopN(1)->GBY->RS(Top=1)
> So that, the TopN can remove rows before they are buffered into the GBY and 
> consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopN is on just "a".


