[jira] [Commented] (HIVE-17037) Use 1-to-1 Tez edge to avoid unnecessary input data shuffle

2017-07-20 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095819#comment-16095819
 ] 

Lefty Leverenz commented on HIVE-17037:
---

Doc note:  This adds *hive.optimize.joinreducededuplication* to HiveConf.java, 
so it will need to be documented in the wiki.

* [Configuration Properties -- Query and DDL Execution | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution]

Added a TODOC3.0 label.

> Use 1-to-1 Tez edge to avoid unnecessary input data shuffle
> ---
>
> Key: HIVE-17037
> URL: https://issues.apache.org/jira/browse/HIVE-17037
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-17037.01.patch, HIVE-17037.02.patch, 
> HIVE-17037.03.patch, HIVE-17037.patch
>
>
> As an example, consider the following query:
> {code:sql}
> SELECT *
> FROM (
>   SELECT a.value
>   FROM src1 a
>   JOIN src1 b
>   ON (a.value = b.value)
>   GROUP BY a.value
> ) a
> JOIN src
> ON (a.value = src.value);
> {code}
> Currently, the plan generated for Tez will contain an unnecessary shuffle 
> operation between the subquery and the join, since the records produced by 
> the subquery are already sorted by the value.
> This issue is to extend join algorithm selection to be able to shuffle only 
> some of the inputs for a given join and avoid unnecessary shuffle operations.
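For illustration, a minimal sketch of how such an edge is declared with the Tez 
DAG API (the vertex names and the unordered KV input/output pairing are 
assumptions, not the exact descriptors Hive would emit):
{code}
import org.apache.tez.dag.api.DAG;
import org.apache.tez.dag.api.Edge;
import org.apache.tez.dag.api.EdgeProperty;
import org.apache.tez.dag.api.InputDescriptor;
import org.apache.tez.dag.api.OutputDescriptor;
import org.apache.tez.dag.api.Vertex;
import org.apache.tez.runtime.library.input.UnorderedKVInput;
import org.apache.tez.runtime.library.output.UnorderedKVOutput;

public class OneToOneEdgeSketch {
  // A ONE_TO_ONE edge feeds task i of the producer straight to task i of
  // the consumer, so data that is already partitioned/sorted on the join
  // key is not re-shuffled; a SCATTER_GATHER edge would repartition it.
  static void connect(DAG dag, Vertex subqueryVertex, Vertex joinVertex) {
    EdgeProperty oneToOne = EdgeProperty.create(
        EdgeProperty.DataMovementType.ONE_TO_ONE,
        EdgeProperty.DataSourceType.PERSISTED,
        EdgeProperty.SchedulingType.SEQUENTIAL,
        OutputDescriptor.create(UnorderedKVOutput.class.getName()),
        InputDescriptor.create(UnorderedKVInput.class.getName()));
    dag.addEdge(Edge.create(subqueryVertex, joinVertex, oneToOne));
  }
}
{code}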



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Reopened] (HIVE-17142) HIVE command to get the column count ?

2017-07-20 Thread Jayanthi R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jayanthi R reopened HIVE-17142:
---

> HIVE command to get the column count ?
> --
>
> Key: HIVE-17142
> URL: https://issues.apache.org/jira/browse/HIVE-17142
> Project: Hive
>  Issue Type: Wish
>Reporter: Jayanthi R
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17037) Use 1-to-1 Tez edge to avoid unnecessary input data shuffle

2017-07-20 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-17037:
--
Labels: TODOC3.0  (was: )

> Use 1-to-1 Tez edge to avoid unnecessary input data shuffle
> ---
>
> Key: HIVE-17037
> URL: https://issues.apache.org/jira/browse/HIVE-17037
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-17037.01.patch, HIVE-17037.02.patch, 
> HIVE-17037.03.patch, HIVE-17037.patch
>
>
> As an example, consider the following query:
> {code:sql}
> SELECT *
> FROM (
>   SELECT a.value
>   FROM src1 a
>   JOIN src1 b
>   ON (a.value = b.value)
>   GROUP BY a.value
> ) a
> JOIN src
> ON (a.value = src.value);
> {code}
> Currently, the plan generated for Tez will contain an unnecessary shuffle 
> operation between the subquery and the join, since the records produced by 
> the subquery are already sorted by the value.
> This issue is to extend join algorithm selection to be able to shuffle only 
> some of the inputs for a given join and avoid unnecessary shuffle operations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17141) HIVE command to get the column count ?

2017-07-20 Thread Jayanthi R (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095816#comment-16095816
 ] 

Jayanthi R commented on HIVE-17141:
---

I want to count how many columns there are in my table.

> HIVE command to get the column count ?
> --
>
> Key: HIVE-17141
> URL: https://issues.apache.org/jira/browse/HIVE-17141
> Project: Hive
>  Issue Type: Wish
>Reporter: Jayanthi R
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-17142) HIVE command to get the column count ?

2017-07-20 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg resolved HIVE-17142.

Resolution: Duplicate

> HIVE command to get the column count ?
> --
>
> Key: HIVE-17142
> URL: https://issues.apache.org/jira/browse/HIVE-17142
> Project: Hive
>  Issue Type: Wish
>Reporter: Jayanthi R
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17141) HIVE command to get the column count ?

2017-07-20 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095811#comment-16095811
 ] 

Vineet Garg commented on HIVE-17141:


[~jayanthir] You can use {code:sql} select count() from  {code} 
to get the column count.
I am not sure what this JIRA is for. Could you please elaborate if you are 
facing any issue? 

If you have any questions, you can use the d...@hive.apache.org mailing list.


> HIVE command to get the column count ?
> --
>
> Key: HIVE-17141
> URL: https://issues.apache.org/jira/browse/HIVE-17141
> Project: Hive
>  Issue Type: Wish
>Reporter: Jayanthi R
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16369) Vectorization: Support PTF (Part 1: No Custom Window Framing -- Default Only)

2017-07-20 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095798#comment-16095798
 ] 

Lefty Leverenz commented on HIVE-16369:
---

Doc note:  This adds *hive.vectorized.execution.ptf.enabled* to HiveConf.java, 
so it will need to be documented in the wiki.

* [Configuration Properties -- Vectorization | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Vectorization]

Added a TODOC3.0 label.

Acronym clarification:  PTF means partitioned table function ... right?

> Vectorization: Support PTF (Part 1: No Custom Window Framing -- Default Only)
> -
>
> Key: HIVE-16369
> URL: https://issues.apache.org/jira/browse/HIVE-16369
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-16369.01.patch, HIVE-16369.02.patch, 
> HIVE-16369.04.patch, HIVE-16369.05.patch.tar.gz, HIVE-16369.06.patch, 
> HIVE-16369.07.patch, HIVE-16369.091.patch, HIVE-16369.092.patch, 
> HIVE-16369.093.patch, HIVE-16369.094.patch, HIVE-16369.095.patch, 
> HIVE-16369.097.patch, HIVE-16369.098.patch, HIVE-16369.0991.patch, 
> HIVE-16369.0992.patch, HIVE-16369.0993.patch, HIVE-16369.0994.patch, 
> HIVE-16369.099.patch, HIVE-16369.09.patch
>
>
> Vectorize a subset of the current PTFOperator window function support.  The first 
> phase doesn't include custom PRECEDING / FOLLOWING window frame clauses.
> Since we don't have unbounded support yet (i.e. spilling to disk), the enable 
> variable hive.vectorized.execution.ptf.enabled is off by default.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

2017-07-20 Thread Ke Jia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095796#comment-16095796
 ] 

Ke Jia commented on HIVE-17139:
---

With this patch, I tested "select case when a=1 then trim(b) end from 
test_orc_5000" on my development machine. The data scale is almost 50 million 
records in table test_orc_5000(a int, b string) stored as ORC. The execution 
engine is Spark. I ran three experiments and the averaged values are in the 
table below. The results show the execution time on Spark going from 35.76s to 
32.57s, the time cost of VectorSelectOperator from 3.12s to 0.89s, and the 
count of "then" expression evaluations from 4735 to 5000712.

||  ||Non-optimization||Optimization||Improvement||
|HoS|35.76s|32.57s|8.9%|
|VectorSelectOperator|3.12s|0.89s|7.15%|
|count|4735|5000712|8.99%|

> Conditional expressions optimization: skip the expression evaluation if the 
> condition is not satisfied for vectorization engine.
> 
>
> Key: HIVE-17139
> URL: https://issues.apache.org/jira/browse/HIVE-17139
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ke Jia
>Assignee: Ke Jia
> Attachments: HIVE-17139.1.patch
>
>
> The CASE WHEN and IF statement execution for Hive vectorization is not 
> optimal: in the current implementation, all the conditional and else 
> expressions are evaluated. The optimized approach is to update the selected 
> array of the batch parameter after the conditional expression is executed, so 
> the else expression only processes the selected rows instead of all of them.
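For illustration, a minimal sketch of the selected-array technique against 
Hive's VectorizedRowBatch (null handling and saving/restoring the original 
selection for the ELSE branch are omitted; the helper name is hypothetical):
{code}
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression;

public class SelectedArraySketch {
  // Narrow batch.selected to the rows where the WHEN condition held, so the
  // THEN expression (e.g. trim(b)) is evaluated only on those rows instead
  // of the whole batch.
  static void evaluateThenOnMatchingRows(VectorizedRowBatch batch,
      LongColumnVector condition, VectorExpression thenExpr) {
    int newSize = 0;
    for (int i = 0; i < batch.size; i++) {
      int row = batch.selectedInUse ? batch.selected[i] : i;
      if (condition.vector[row] == 1) {   // condition evaluated to true
        batch.selected[newSize++] = row;
      }
    }
    batch.selectedInUse = true;
    batch.size = newSize;
    thenExpr.evaluate(batch);             // touches only the selected rows
  }
}
{code}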



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-13125) Support masking and filtering of rows/columns

2017-07-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13125:
---
Attachment: ColumnMaskingInsertDesign.docx

> Support masking and filtering of rows/columns
> -
>
> Key: HIVE-13125
> URL: https://issues.apache.org/jira/browse/HIVE-13125
> Project: Hive
>  Issue Type: New Feature
>  Components: Security
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: ColumnMaskingInsertDesign.docx, HIVE-13125.01.patch, 
> HIVE-13125.02.patch, HIVE-13125.03.patch, HIVE-13125.04.patch, 
> HIVE-13125.final.patch
>
>
> Traditionally, access control at the row and column level is implemented 
> through views. Using views as an access control method works well only when 
> access rules, restrictions, and conditions are monolithic and simple. It 
> however becomes ineffective when view definitions become too complex because 
> of the complexity and granularity of privacy and security policies. It also 
> becomes costly when a large number of views must be manually updated and 
> maintained. In addition, the ability to update views proves to be 
> challenging. As privacy and security policies evolve, required updates to 
> views may negatively affect the security logic particularly when database 
> applications reference the views directly by name. HIVE row and column access 
> control helps resolve all these problems.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16369) Vectorization: Support PTF (Part 1: No Custom Window Framing -- Default Only)

2017-07-20 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-16369:
--
Labels: TODOC3.0  (was: )

> Vectorization: Support PTF (Part 1: No Custom Window Framing -- Default Only)
> -
>
> Key: HIVE-16369
> URL: https://issues.apache.org/jira/browse/HIVE-16369
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-16369.01.patch, HIVE-16369.02.patch, 
> HIVE-16369.04.patch, HIVE-16369.05.patch.tar.gz, HIVE-16369.06.patch, 
> HIVE-16369.07.patch, HIVE-16369.091.patch, HIVE-16369.092.patch, 
> HIVE-16369.093.patch, HIVE-16369.094.patch, HIVE-16369.095.patch, 
> HIVE-16369.097.patch, HIVE-16369.098.patch, HIVE-16369.0991.patch, 
> HIVE-16369.0992.patch, HIVE-16369.0993.patch, HIVE-16369.0994.patch, 
> HIVE-16369.099.patch, HIVE-16369.09.patch
>
>
> Vectorize a subset of the current PTFOperator window function support.  The first 
> phase doesn't include custom PRECEDING / FOLLOWING window frame clauses.
> Since we don't have unbounded support yet (i.e. spilling to disk), the enable 
> variable hive.vectorized.execution.ptf.enabled is off by default.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

2017-07-20 Thread Ke Jia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated HIVE-17139:
--
Attachment: HIVE-17139.1.patch

> Conditional expressions optimization: skip the expression evaluation if the 
> condition is not satisfied for vectorization engine.
> 
>
> Key: HIVE-17139
> URL: https://issues.apache.org/jira/browse/HIVE-17139
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ke Jia
>Assignee: Ke Jia
> Attachments: HIVE-17139.1.patch
>
>
> The CASE WHEN and IF statement execution for Hive vectorization is not 
> optimal: in the current implementation, all the conditional and else 
> expressions are evaluated. The optimized approach is to update the selected 
> array of the batch parameter after the conditional expression is executed, so 
> the else expression only processes the selected rows instead of all of them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

2017-07-20 Thread Ke Jia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia reassigned HIVE-17139:
-


> Conditional expressions optimization: skip the expression evaluation if the 
> condition is not satisfied for vectorization engine.
> 
>
> Key: HIVE-17139
> URL: https://issues.apache.org/jira/browse/HIVE-17139
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ke Jia
>Assignee: Ke Jia
>
> The CASE WHEN and IF statement execution for Hive vectorization is not 
> optimal: in the current implementation, all the conditional and else 
> expressions are evaluated. The optimized approach is to update the selected 
> array of the batch parameter after the conditional expression is executed, so 
> the else expression only processes the selected rows instead of all of them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleaned up

2017-07-20 Thread PRASHANT GOLASH (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095763#comment-16095763
 ] 

PRASHANT GOLASH commented on HIVE-17117:


[~mohitsabharwal], I have attached the latest patch. Please have a look and let 
me know the next steps.

> Metalisteners are not notified when threadlocal metaconf is cleaned up 
> 
>
> Key: HIVE-17117
> URL: https://issues.apache.org/jira/browse/HIVE-17117
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
> Environment: Tested on master branch (Applicable for downlevel 
> versions as well)
>Reporter: PRASHANT GOLASH
>Assignee: PRASHANT GOLASH
>Priority: Minor
> Attachments: HIVE-17117.1.patch, HIVE-17117.patch
>
>
> Meta listeners are not notified of meta-conf cleanup. This could potentially 
> leave stale values on listener objects. For example:
> Request 1
> a. HS2 -> HMS : HMSHandler#setMetaConf
>  Meta listeners are notified of the ConfigChangeEvent.
> b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if 
> shutdown is not invoked)
> MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but meta 
> listeners are not notified.
> Request 2
> a. HS2 -> HMS : AlterPartition
>  Meta listeners are notified of the AlterPartitionEvent. If any listener has 
> taken a dependency on the meta-conf value, it will still have the stale value 
> from Request 1 and could potentially run into issues.
> The correct behavior should be to notify meta listeners on cleanup as well.
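For illustration, a hedged sketch of what notifying listeners on cleanup could 
look like (the cleanup hook and how the overridden/default values are tracked 
are assumptions; only onConfigChange and ConfigChangeEvent come from the 
metastore API):
{code}
import java.util.List;
import org.apache.hadoop.hive.metastore.HiveMetaStore.HMSHandler;
import org.apache.hadoop.hive.metastore.MetaStoreEventListener;
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.apache.hadoop.hive.metastore.events.ConfigChangeEvent;

public class CleanupNotifySketch {
  // Fire one ConfigChangeEvent per overridden key when the thread-local
  // meta-conf is torn down: oldValue is the discarded override, newValue
  // is the default the key reverts to.
  static void notifyCleanup(HMSHandler handler,
      List<MetaStoreEventListener> listeners,
      String key, String overriddenValue, String defaultValue)
      throws MetaException {
    for (MetaStoreEventListener listener : listeners) {
      listener.onConfigChange(
          new ConfigChangeEvent(handler, key, overriddenValue, defaultValue));
    }
  }
}
{code}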



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17114) HoS: Possible skew in shuffling when data is not really skewed

2017-07-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095761#comment-16095761
 ] 

Rui Li commented on HIVE-17114:
---

Latest failures not related.

> HoS: Possible skew in shuffling when data is not really skewed
> --
>
> Key: HIVE-17114
> URL: https://issues.apache.org/jira/browse/HIVE-17114
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Minor
> Attachments: HIVE-17114.1.patch, HIVE-17114.2.patch, 
> HIVE-17114.3.patch
>
>
> Observed in HoS and may apply to other engines as well.
> When we join 2 tables on a single int key, we use the key itself as hash code 
> in {{ObjectInspectorUtils.hashCode}}:
> {code}
>   case INT:
> return ((IntObjectInspector) poi).get(o);
> {code}
> Suppose the keys are all distinct but each is a multiple of 10. If we 
> choose 10 as the number of reducers, the shuffle will be skewed.
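For illustration, a minimal Java sketch of the effect (the modulo formula is 
Hadoop's HashPartitioner convention; that the engine partitions exactly this 
way is an assumption):
{code}
public class IntKeySkewSketch {
  public static void main(String[] args) {
    // With hashCode(key) == key for INT keys, every key that is a multiple
    // of 10 lands on the same reducer when there are 10 reducers.
    int numReducers = 10;
    for (int key = 10; key <= 100; key += 10) {
      int reducer = (key & Integer.MAX_VALUE) % numReducers; // HashPartitioner-style
      System.out.println("key " + key + " -> reducer " + reducer); // always 0
    }
  }
}
{code}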



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17034) The spark tar for itests is downloaded every time if md5sum is not installed

2017-07-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095759#comment-16095759
 ] 

Rui Li commented on HIVE-17034:
---

Hi [~kgyrtkirk], the change here only affects the "re-download" logic:
{code}
if [[ ! -f $DOWNLOAD_DIR/$tarName ]]
then
  curl -Sso $DOWNLOAD_DIR/$tarName $url
else
  local md5File="$tarName".md5sum
  curl -Sso $DOWNLOAD_DIR/$md5File "$url".md5sum
  cd $DOWNLOAD_DIR
  if type md5sum >/dev/null && ! md5sum -c $md5File; then
    curl -Sso $DOWNLOAD_DIR/$tarName $url || return 1
  fi
fi
{code}
If the tar doesn't exist in the first place, it'll be downloaded anyway.
For "re-download", if the developer really cares about updating the spark tar, 
I assume he/she will be aware that md5sum is needed. Does that make sense?

> The spark tar for itests is downloaded every time if md5sum is not installed
> 
>
> Key: HIVE-17034
> URL: https://issues.apache.org/jira/browse/HIVE-17034
> Project: Hive
>  Issue Type: Test
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HIVE-17034.1.patch
>
>
> I think we should either skip verifying md5, or fail the build to let the 
> developer know md5sum is required.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17087) Remove unnecessary HoS DPP trees during map-join conversion

2017-07-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095710#comment-16095710
 ] 

Hive QA commented on HIVE-17087:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12878290/HIVE-17087.1.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11095 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_row__id] 
(batchId=46)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=228)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6103/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6103/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6103/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12878290 - PreCommit-HIVE-Build

> Remove unnecessary HoS DPP trees during map-join conversion
> ---
>
> Key: HIVE-17087
> URL: https://issues.apache.org/jira/browse/HIVE-17087
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17087.1.patch
>
>
> Ran the following query in the {{TestSparkCliDriver}}:
> {code:sql}
> set hive.spark.dynamic.partition.pruning=true;
> set hive.auto.convert.join=true;
> create table partitioned_table1 (col int) partitioned by (part_col int);
> create table partitioned_table2 (col int) partitioned by (part_col int);
> create table regular_table (col int);
> insert into table regular_table values (1);
> alter table partitioned_table1 add partition (part_col = 1);
> insert into table partitioned_table1 partition (part_col = 1) values (1), 
> (2), (3), (4), (5), (6), (7), (8), (9), (10);
> alter table partitioned_table2 add partition (part_col = 1);
> insert into table partitioned_table2 partition (part_col = 1) values (1), 
> (2), (3), (4), (5), (6), (7), (8), (9), (10);
> explain select * from partitioned_table1 where partitioned_table1.part_col in 
> (select regular_table.col from regular_table join partitioned_table2 on 
> regular_table.col = partitioned_table2.part_col);
> {code}
> and got the following explain plan:
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-4 depends on stages: Stage-2
>   Stage-5 depends on stages: Stage-4
>   Stage-3 depends on stages: Stage-5
>   Stage-1 depends on stages: Stage-3
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>  A masked pattern was here 
>   Vertices:
> Map 4 
> Map Operator Tree:
> TableScan
>   alias: partitioned_table1
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: col (type: int), part_col (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col1 (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0
>  

[jira] [Assigned] (HIVE-10567) partial scan for rcfile table doesn't work for dynamic partition

2017-07-20 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-10567:
--

Assignee: Bing Li  (was: Thomas Friedrich)

> partial scan for rcfile table doesn't work for dynamic partition
> 
>
> Key: HIVE-10567
> URL: https://issues.apache.org/jira/browse/HIVE-10567
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.14.0, 1.0.0
>Reporter: Thomas Friedrich
>Assignee: Bing Li
>Priority: Minor
>  Labels: rcfile
> Attachments: HIVE-10567.1.patch
>
>
> HIVE-3958 added support for partial scan for RCFile. This works fine for 
> static partitions (for example: analyze table analyze_srcpart_partial_scan 
> PARTITION(ds='2008-04-08',hr=11) compute statistics partialscan).
> For dynamic partitions, the analyze fails with an IOException 
> "java.io.IOException: No input paths specified in job":
> hive> ANALYZE TABLE testtable PARTITION(col_varchar) COMPUTE STATISTICS 
> PARTIALSCAN;
> java.io.IOException: No input paths specified in job
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputPaths(HiveInputFormat.java:318)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:459)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17131) Add InterfaceAudience and InterfaceStability annotations for SerDe APIs

2017-07-20 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095681#comment-16095681
 ] 

Sahil Takiar edited comment on HIVE-17131 at 7/21/17 2:23 AM:
--

Thanks for pointing that out. I can just move the changes for the Serializer 
and Deserializer interfaces into a separate patch that will only go into 
branch-2, does that sound reasonable?


was (Author: stakiar):
Thanks for pointing that out. I changed the target version for this to branch-2 
only.

> Add InterfaceAudience and InterfaceStability annotations for SerDe APIs
> ---
>
> Key: HIVE-17131
> URL: https://issues.apache.org/jira/browse/HIVE-17131
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17131.1.patch
>
>
> Adding InterfaceAudience and InterfaceStability annotations for the core 
> SerDe APIs.
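For illustration, a hedged sketch of what such annotations look like on a 
SerDe-style interface (ExampleSerDe is hypothetical; the classification 
annotations exist in Hive's common module):
{code}
import org.apache.hadoop.hive.common.classification.InterfaceAudience;
import org.apache.hadoop.hive.common.classification.InterfaceStability;

// ExampleSerDe only shows where the annotations sit, not the real interfaces.
@InterfaceAudience.Public      // part of the API third-party SerDes implement
@InterfaceStability.Stable     // incompatible changes need a deprecation cycle
public interface ExampleSerDe {
  void initialize(java.util.Properties tableProperties);
}
{code}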



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17131) Add InterfaceAudience and InterfaceStability annotations for SerDe APIs

2017-07-20 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095681#comment-16095681
 ] 

Sahil Takiar edited comment on HIVE-17131 at 7/21/17 2:18 AM:
--

Thanks for pointing that out. I changed the target version for this to branch-2 
only.


was (Author: stakiar):
T

> Add InterfaceAudience and InterfaceStability annotations for SerDe APIs
> ---
>
> Key: HIVE-17131
> URL: https://issues.apache.org/jira/browse/HIVE-17131
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17131.1.patch
>
>
> Adding InterfaceAudience and InterfaceStability annotations for the core 
> SerDe APIs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17131) Add InterfaceAudience and InterfaceStability annotations for SerDe APIs

2017-07-20 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095681#comment-16095681
 ] 

Sahil Takiar commented on HIVE-17131:
-

T

> Add InterfaceAudience and InterfaceStability annotations for SerDe APIs
> ---
>
> Key: HIVE-17131
> URL: https://issues.apache.org/jira/browse/HIVE-17131
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17131.1.patch
>
>
> Adding InterfaceAudience and InterfaceStability annotations for the core 
> SerDe APIs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17131) Add InterfaceAudience and InterfaceStability annotations for SerDe APIs

2017-07-20 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17131:

Target Version/s: 2.4.0  (was: 3.0.0)

> Add InterfaceAudience and InterfaceStability annotations for SerDe APIs
> ---
>
> Key: HIVE-17131
> URL: https://issues.apache.org/jira/browse/HIVE-17131
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17131.1.patch
>
>
> Adding InterfaceAudience and InterfaceStability annotations for the core 
> SerDe APIs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16997) Extend object store to store bit vectors

2017-07-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095680#comment-16095680
 ] 

Hive QA commented on HIVE-16997:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12878270/HIVE-16997.02.patch

{color:green}SUCCESS:{color} +1 due to 7 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10998 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapWithComplexData[1]
 (batchId=182)
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testSyntheticComplexSchema[0]
 (batchId=182)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6102/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6102/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6102/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12878270 - PreCommit-HIVE-Build

> Extend object store to store bit vectors
> 
>
> Key: HIVE-16997
> URL: https://issues.apache.org/jira/browse/HIVE-16997
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch
>
>
> This patch includes: (1) a new serde for FMSketch (2) change of schema for 
> derby and mysql (3) support for date type (4) refactoring the extrapolation 
> and merge code



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17087) Remove unnecessary HoS DPP trees during map-join conversion

2017-07-20 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17087:

Summary: Remove unnecessary HoS DPP trees during map-join conversion  (was: 
Remove HoS DPP tree during map-join conversion)

> Remove unnecessary HoS DPP trees during map-join conversion
> ---
>
> Key: HIVE-17087
> URL: https://issues.apache.org/jira/browse/HIVE-17087
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17087.1.patch
>
>
> Ran the following query in the {{TestSparkCliDriver}}:
> {code:sql}
> set hive.spark.dynamic.partition.pruning=true;
> set hive.auto.convert.join=true;
> create table partitioned_table1 (col int) partitioned by (part_col int);
> create table partitioned_table2 (col int) partitioned by (part_col int);
> create table regular_table (col int);
> insert into table regular_table values (1);
> alter table partitioned_table1 add partition (part_col = 1);
> insert into table partitioned_table1 partition (part_col = 1) values (1), 
> (2), (3), (4), (5), (6), (7), (8), (9), (10);
> alter table partitioned_table2 add partition (part_col = 1);
> insert into table partitioned_table2 partition (part_col = 1) values (1), 
> (2), (3), (4), (5), (6), (7), (8), (9), (10);
> explain select * from partitioned_table1 where partitioned_table1.part_col in 
> (select regular_table.col from regular_table join partitioned_table2 on 
> regular_table.col = partitioned_table2.part_col);
> {code}
> and got the following explain plan:
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-4 depends on stages: Stage-2
>   Stage-5 depends on stages: Stage-4
>   Stage-3 depends on stages: Stage-5
>   Stage-1 depends on stages: Stage-3
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>  A masked pattern was here 
>   Vertices:
> Map 4 
> Map Operator Tree:
> TableScan
>   alias: partitioned_table1
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: col (type: int), part_col (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col1 (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Spark Partition Pruning Sink Operator
>   partition key expr: part_col
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   target column name: part_col
>   target work: Map 3
>   Stage: Stage-4
> Spark
>  A masked pattern was here 
>   Vertices:
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: regular_table
>   Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE 
> Column stats: NONE
>   Filter Operator
> predicate: col is not null (type: boolean)
> Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: col (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE
>   Spark HashTable Sink Operator
> keys:
>   0 _col0 (type: int)
>   1 _col0 (type: int)
>   Select Operator
> expressions: _col0 (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   keys: _col0 (type: int)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE

[jira] [Commented] (HIVE-17087) Remove HoS DPP tree during map-join conversion

2017-07-20 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095675#comment-16095675
 ] 

Sahil Takiar commented on HIVE-17087:
-

Patch uploaded, will post some more details of the fix soon.

> Remove HoS DPP tree during map-join conversion
> --
>
> Key: HIVE-17087
> URL: https://issues.apache.org/jira/browse/HIVE-17087
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17087.1.patch
>
>
> Ran the following query in the {{TestSparkCliDriver}}:
> {code:sql}
> set hive.spark.dynamic.partition.pruning=true;
> set hive.auto.convert.join=true;
> create table partitioned_table1 (col int) partitioned by (part_col int);
> create table partitioned_table2 (col int) partitioned by (part_col int);
> create table regular_table (col int);
> insert into table regular_table values (1);
> alter table partitioned_table1 add partition (part_col = 1);
> insert into table partitioned_table1 partition (part_col = 1) values (1), 
> (2), (3), (4), (5), (6), (7), (8), (9), (10);
> alter table partitioned_table2 add partition (part_col = 1);
> insert into table partitioned_table2 partition (part_col = 1) values (1), 
> (2), (3), (4), (5), (6), (7), (8), (9), (10);
> explain select * from partitioned_table1 where partitioned_table1.part_col in 
> (select regular_table.col from regular_table join partitioned_table2 on 
> regular_table.col = partitioned_table2.part_col);
> {code}
> and got the following explain plan:
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-4 depends on stages: Stage-2
>   Stage-5 depends on stages: Stage-4
>   Stage-3 depends on stages: Stage-5
>   Stage-1 depends on stages: Stage-3
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>  A masked pattern was here 
>   Vertices:
> Map 4 
> Map Operator Tree:
> TableScan
>   alias: partitioned_table1
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: col (type: int), part_col (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col1 (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Spark Partition Pruning Sink Operator
>   partition key expr: part_col
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   target column name: part_col
>   target work: Map 3
>   Stage: Stage-4
> Spark
>  A masked pattern was here 
>   Vertices:
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: regular_table
>   Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE 
> Column stats: NONE
>   Filter Operator
> predicate: col is not null (type: boolean)
> Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: col (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE
>   Spark HashTable Sink Operator
> keys:
>   0 _col0 (type: int)
>   1 _col0 (type: int)
>   Select Operator
> expressions: _col0 (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   keys: _col0 (type: int)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE
>   Spark Partition Pruning 

[jira] [Updated] (HIVE-17087) Remove HoS DPP tree during map-join conversion

2017-07-20 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17087:

Attachment: HIVE-17087.1.patch

> Remove HoS DPP tree during map-join conversion
> --
>
> Key: HIVE-17087
> URL: https://issues.apache.org/jira/browse/HIVE-17087
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17087.1.patch
>
>
> Ran the following query in the {{TestSparkCliDriver}}:
> {code:sql}
> set hive.spark.dynamic.partition.pruning=true;
> set hive.auto.convert.join=true;
> create table partitioned_table1 (col int) partitioned by (part_col int);
> create table partitioned_table2 (col int) partitioned by (part_col int);
> create table regular_table (col int);
> insert into table regular_table values (1);
> alter table partitioned_table1 add partition (part_col = 1);
> insert into table partitioned_table1 partition (part_col = 1) values (1), 
> (2), (3), (4), (5), (6), (7), (8), (9), (10);
> alter table partitioned_table2 add partition (part_col = 1);
> insert into table partitioned_table2 partition (part_col = 1) values (1), 
> (2), (3), (4), (5), (6), (7), (8), (9), (10);
> explain select * from partitioned_table1 where partitioned_table1.part_col in 
> (select regular_table.col from regular_table join partitioned_table2 on 
> regular_table.col = partitioned_table2.part_col);
> {code}
> and got the following explain plan:
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-4 depends on stages: Stage-2
>   Stage-5 depends on stages: Stage-4
>   Stage-3 depends on stages: Stage-5
>   Stage-1 depends on stages: Stage-3
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>  A masked pattern was here 
>   Vertices:
> Map 4 
> Map Operator Tree:
> TableScan
>   alias: partitioned_table1
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: col (type: int), part_col (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col1 (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Spark Partition Pruning Sink Operator
>   partition key expr: part_col
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   target column name: part_col
>   target work: Map 3
>   Stage: Stage-4
> Spark
>  A masked pattern was here 
>   Vertices:
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: regular_table
>   Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE 
> Column stats: NONE
>   Filter Operator
> predicate: col is not null (type: boolean)
> Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: col (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE
>   Spark HashTable Sink Operator
> keys:
>   0 _col0 (type: int)
>   1 _col0 (type: int)
>   Select Operator
> expressions: _col0 (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   keys: _col0 (type: int)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE
>   Spark Partition Pruning Sink Operator
> partition key expr: 

[jira] [Updated] (HIVE-17087) Remove HoS DPP tree during map-join conversion

2017-07-20 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17087:

Status: Patch Available  (was: Open)

> Remove HoS DPP tree during map-join conversion
> --
>
> Key: HIVE-17087
> URL: https://issues.apache.org/jira/browse/HIVE-17087
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17087.1.patch
>
>
> Ran the following query in the {{TestSparkCliDriver}}:
> {code:sql}
> set hive.spark.dynamic.partition.pruning=true;
> set hive.auto.convert.join=true;
> create table partitioned_table1 (col int) partitioned by (part_col int);
> create table partitioned_table2 (col int) partitioned by (part_col int);
> create table regular_table (col int);
> insert into table regular_table values (1);
> alter table partitioned_table1 add partition (part_col = 1);
> insert into table partitioned_table1 partition (part_col = 1) values (1), 
> (2), (3), (4), (5), (6), (7), (8), (9), (10);
> alter table partitioned_table2 add partition (part_col = 1);
> insert into table partitioned_table2 partition (part_col = 1) values (1), 
> (2), (3), (4), (5), (6), (7), (8), (9), (10);
> explain select * from partitioned_table1 where partitioned_table1.part_col in 
> (select regular_table.col from regular_table join partitioned_table2 on 
> regular_table.col = partitioned_table2.part_col);
> {code}
> and got the following explain plan:
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-4 depends on stages: Stage-2
>   Stage-5 depends on stages: Stage-4
>   Stage-3 depends on stages: Stage-5
>   Stage-1 depends on stages: Stage-3
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>  A masked pattern was here 
>   Vertices:
> Map 4 
> Map Operator Tree:
> TableScan
>   alias: partitioned_table1
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: col (type: int), part_col (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col1 (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Spark Partition Pruning Sink Operator
>   partition key expr: part_col
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   target column name: part_col
>   target work: Map 3
>   Stage: Stage-4
> Spark
>  A masked pattern was here 
>   Vertices:
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: regular_table
>   Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE 
> Column stats: NONE
>   Filter Operator
> predicate: col is not null (type: boolean)
> Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: col (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE
>   Spark HashTable Sink Operator
> keys:
>   0 _col0 (type: int)
>   1 _col0 (type: int)
>   Select Operator
> expressions: _col0 (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   keys: _col0 (type: int)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE
>   Spark Partition Pruning Sink Operator
> partition key 

[jira] [Updated] (HIVE-17087) Remove HoS DPP tree during map-join conversion

2017-07-20 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17087:

Summary: Remove HoS DPP tree during map-join conversion  (was: HoS Query 
with multiple Partition Pruning Sinks + subquery has incorrect explain)

> Remove HoS DPP tree during map-join conversion
> --
>
> Key: HIVE-17087
> URL: https://issues.apache.org/jira/browse/HIVE-17087
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> Ran the following query in the {{TestSparkCliDriver}}:
> {code:sql}
> set hive.spark.dynamic.partition.pruning=true;
> set hive.auto.convert.join=true;
> create table partitioned_table1 (col int) partitioned by (part_col int);
> create table partitioned_table2 (col int) partitioned by (part_col int);
> create table regular_table (col int);
> insert into table regular_table values (1);
> alter table partitioned_table1 add partition (part_col = 1);
> insert into table partitioned_table1 partition (part_col = 1) values (1), 
> (2), (3), (4), (5), (6), (7), (8), (9), (10);
> alter table partitioned_table2 add partition (part_col = 1);
> insert into table partitioned_table2 partition (part_col = 1) values (1), 
> (2), (3), (4), (5), (6), (7), (8), (9), (10);
> explain select * from partitioned_table1 where partitioned_table1.part_col in 
> (select regular_table.col from regular_table join partitioned_table2 on 
> regular_table.col = partitioned_table2.part_col);
> {code}
> and got the following explain plan:
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-4 depends on stages: Stage-2
>   Stage-5 depends on stages: Stage-4
>   Stage-3 depends on stages: Stage-5
>   Stage-1 depends on stages: Stage-3
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>  A masked pattern was here 
>   Vertices:
> Map 4 
> Map Operator Tree:
> TableScan
>   alias: partitioned_table1
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: col (type: int), part_col (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col1 (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Spark Partition Pruning Sink Operator
>   partition key expr: part_col
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   target column name: part_col
>   target work: Map 3
>   Stage: Stage-4
> Spark
>  A masked pattern was here 
>   Vertices:
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: regular_table
>   Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE 
> Column stats: NONE
>   Filter Operator
> predicate: col is not null (type: boolean)
> Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: col (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE
>   Spark HashTable Sink Operator
> keys:
>   0 _col0 (type: int)
>   1 _col0 (type: int)
>   Select Operator
> expressions: _col0 (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   keys: _col0 (type: int)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE
>   Spark Partition 

[jira] [Updated] (HIVE-17116) Vectorization: Add infrastructure for vectorization of ROW__ID struct

2017-07-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-17116:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Vectorization: Add infrastructure for vectorization of ROW__ID struct
> -
>
> Key: HIVE-17116
> URL: https://issues.apache.org/jira/browse/HIVE-17116
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-17116.01.patch, HIVE-17116.02.patch
>
>
> Supports new ACID work.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17116) Vectorization: Add infrastructure for vectorization of ROW__ID struct

2017-07-20 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095668#comment-16095668
 ] 

Matt McCline commented on HIVE-17116:
-

In a subsequent JIRA, Teddy will actually make vectorized ROW__ID work by 
filling in that column with values -- probably from within the ACID ORC 
reader(s).  See line 815 in VectorMapOperator for the relevant UNDONE.

> Vectorization: Add infrastructure for vectorization of ROW__ID struct
> -
>
> Key: HIVE-17116
> URL: https://issues.apache.org/jira/browse/HIVE-17116
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-17116.01.patch, HIVE-17116.02.patch
>
>
> Supports new ACID work.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17116) Vectorization: Add infrastructure for vectorization of ROW__ID struct

2017-07-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-17116:

Fix Version/s: 3.0.0

> Vectorization: Add infrastructure for vectorization of ROW__ID struct
> -
>
> Key: HIVE-17116
> URL: https://issues.apache.org/jira/browse/HIVE-17116
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-17116.01.patch, HIVE-17116.02.patch
>
>
> Supports new ACID work.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17116) Vectorization: Add infrastructure for vectorization of ROW__ID struct

2017-07-20 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095667#comment-16095667
 ] 

Matt McCline commented on HIVE-17116:
-

Thank you Teddy for your code review.
Committed to master.

> Vectorization: Add infrastructure for vectorization of ROW__ID struct
> -
>
> Key: HIVE-17116
> URL: https://issues.apache.org/jira/browse/HIVE-17116
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-17116.01.patch, HIVE-17116.02.patch
>
>
> Supports new ACID work.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17116) Vectorization: Add infrastructure for vectorization of ROW__ID struct

2017-07-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-17116:

Summary: Vectorization: Add infrastructure for vectorization of ROW__ID 
struct  (was: Vectorization: Enable vectorization of ROW__ID struct)

> Vectorization: Add infrastructure for vectorization of ROW__ID struct
> -
>
> Key: HIVE-17116
> URL: https://issues.apache.org/jira/browse/HIVE-17116
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-17116.01.patch, HIVE-17116.02.patch
>
>
> Supports new ACID work.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17128) Operation Logging leaks file descriptors as the log4j Appender is never closed

2017-07-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095620#comment-16095620
 ] 

Hive QA commented on HIVE-17128:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12878269/HIVE-17128.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11092 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=228)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6101/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6101/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6101/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12878269 - PreCommit-HIVE-Build

> Operation Logging leaks file descriptors as the log4j Appender is never closed
> --
>
> Key: HIVE-17128
> URL: https://issues.apache.org/jira/browse/HIVE-17128
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-17128.1.patch, HIVE-17128.2.patch
>
>
> [HIVE-16061] and [HIVE-16400] changed operation logging to use the Log4j2 
> RoutingAppender to automatically output the log for each query into each 
> individual operation log file. As log4j does not know when a query is 
> finished it keeps the OutputStream in the Appender open even when the query 
> completes. The stream holds a file descriptor and so we leak file 
> descriptors. Note that we are already careful to close any streams reading 
> from the operation log file.
> h2. Fix
> To fix this we use a technique described in the comments of [LOG4J2-510] 
> which uses reflection to close the appender. The test in 
> TestOperationLoggingLayout will be extended to check that the Appender is 
> closed.
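>
> For reference, a minimal sketch of the LOG4J2-510 reflection technique 
> mentioned above (this is not the attached patch; the private field name 
> "appenders" and its type are assumptions based on the Log4j2 2.6.x source):
> {code:java}
> import java.lang.reflect.Field;
> import java.util.concurrent.ConcurrentMap;
>
> import org.apache.logging.log4j.core.appender.routing.RoutingAppender;
> import org.apache.logging.log4j.core.config.AppenderControl;
>
> // Sketch only: reach into the RoutingAppender's private per-key map and
> // stop the appender created for a finished query, releasing its file
> // descriptor.
> public final class OperationLogAppenderCloser {
>
>   @SuppressWarnings("unchecked")
>   public static void closeRoute(RoutingAppender routing, String queryId)
>       throws ReflectiveOperationException {
>     // Assumption: RoutingAppender keeps one AppenderControl per routing
>     // key in a private field named "appenders" (Log4j2 2.6.x layout).
>     Field appendersField = RoutingAppender.class.getDeclaredField("appenders");
>     appendersField.setAccessible(true);
>     ConcurrentMap<String, AppenderControl> appenders =
>         (ConcurrentMap<String, AppenderControl>) appendersField.get(routing);
>
>     AppenderControl control = appenders.remove(queryId);
>     if (control != null) {
>       // Stopping the underlying appender closes its OutputStream and
>       // releases the leaked file descriptor.
>       control.getAppender().stop();
>     }
>   }
> }
> {code}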



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers

2017-07-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095582#comment-16095582
 ] 

Hive QA commented on HIVE-16077:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12878267/HIVE-16077.02.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11094 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[skewjoinopt3] 
(batchId=21)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_optimization_acid]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6100/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6100/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6100/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12878267 - PreCommit-HIVE-Build

> UPDATE/DELETE fails with numBuckets > numReducers
> -
>
> Key: HIVE-16077
> URL: https://issues.apache.org/jira/browse/HIVE-16077
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch
>
>
> I don't think we have such tests for the Acid path.
> Check if they exist for the non-Acid path.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17128) Operation Logging leaks file descriptors as the log4j Appender is never closed

2017-07-20 Thread Andrew Sherman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095528#comment-16095528
 ] 

Andrew Sherman commented on HIVE-17128:
---

Hi [~aihuaxu], can you review this change please? [There is a Review Board 
diff here|https://reviews.apache.org/r/61010/]. Thanks!

> Operation Logging leaks file descriptors as the log4j Appender is never closed
> --
>
> Key: HIVE-17128
> URL: https://issues.apache.org/jira/browse/HIVE-17128
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-17128.1.patch, HIVE-17128.2.patch
>
>
> [HIVE-16061] and [HIVE-16400] changed operation logging to use the Log4j2 
> RoutingAppender to automatically output the log for each query into each 
> individual operation log file. As log4j does not know when a query is 
> finished it keeps the OutputStream in the Appender open even when the query 
> completes. The stream holds a file descriptor and so we leak file 
> descriptors. Note that we are already careful to close any streams reading 
> from the operation log file.
> h2. Fix
> To fix this we use a technique described in the comments of [LOG4J2-510] 
> which uses reflection to close the appender. The test in 
> TestOperationLoggingLayout will be extended to check that the Appender is 
> closed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16997) Extend object store to store bit vectors

2017-07-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16997:
---
Status: Patch Available  (was: Open)

> Extend object store to store bit vectors
> 
>
> Key: HIVE-16997
> URL: https://issues.apache.org/jira/browse/HIVE-16997
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch
>
>
> This patch includes: (1) a new serde for FMSketch, (2) a schema change for 
> Derby and MySQL, (3) support for the date type, and (4) a refactoring of the 
> extrapolation and merge code.
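>
> For intuition, a textbook single-register Flajolet-Martin sketch is sketched 
> below; it illustrates what kind of structure gets stored as a bit vector. 
> This is purely a toy, not the serde added by the patch, and the hash is an 
> arbitrary stand-in:
> {code:java}
> // bit r is set iff some hashed value had r trailing zeros
> public final class FMSketchDemo {
>   private long bitmap;
>
>   void add(long value) {
>     long h = value * 0x9E3779B97F4A7C15L;          // stand-in hash
>     int r = Long.numberOfTrailingZeros(h == 0 ? 1 : h);
>     bitmap |= 1L << Math.min(r, 63);
>   }
>
>   // Merging two sketches is a bitwise OR of their bitmaps, which is what
>   // makes per-partition NDV stats combinable at the metastore level.
>   FMSketchDemo merge(FMSketchDemo other) {
>     FMSketchDemo merged = new FMSketchDemo();
>     merged.bitmap = this.bitmap | other.bitmap;
>     return merged;
>   }
>
>   double estimateNdv() {
>     int r = Long.numberOfTrailingZeros(~bitmap);   // lowest unset bit
>     return Math.pow(2, r) / 0.77351;               // classic FM correction
>   }
> }
> {code}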



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16997) Extend object store to store bit vectors

2017-07-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16997:
---
Status: Open  (was: Patch Available)

> Extend object store to store bit vectors
> 
>
> Key: HIVE-16997
> URL: https://issues.apache.org/jira/browse/HIVE-16997
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch
>
>
> This patch includes: (1) a new serde for FMSketch, (2) a schema change for 
> Derby and MySQL, (3) support for the date type, and (4) a refactoring of the 
> extrapolation and merge code.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16997) Extend object store to store bit vectors

2017-07-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16997:
---
Attachment: HIVE-16997.02.patch

> Extend object store to store bit vectors
> 
>
> Key: HIVE-16997
> URL: https://issues.apache.org/jira/browse/HIVE-16997
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch
>
>
> This patch includes: (1) a new serde for FMSketch, (2) a schema change for 
> Derby and MySQL, (3) support for the date type, and (4) a refactoring of the 
> extrapolation and merge code.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17128) Operation Logging leaks file descriptors as the log4j Appender is never closed

2017-07-20 Thread Andrew Sherman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated HIVE-17128:
--
Attachment: HIVE-17128.2.patch

> Operation Logging leaks file descriptors as the log4j Appender is never closed
> --
>
> Key: HIVE-17128
> URL: https://issues.apache.org/jira/browse/HIVE-17128
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-17128.1.patch, HIVE-17128.2.patch
>
>
> [HIVE-16061] and [HIVE-16400] changed operation logging to use the Log4j2 
> RoutingAppender to automatically output the log for each query into each 
> individual operation log file. As log4j does not know when a query is 
> finished it keeps the OutputStream in the Appender open even when the query 
> completes. The stream holds a file descriptor and so we leak file 
> descriptors. Note that we are already careful to close any streams reading 
> from the operation log file.
> h2. Fix
> To fix this we use a technique described in the comments of [LOG4J2-510] 
> which uses reflection to close the appender. The test in 
> TestOperationLoggingLayout will be extended to check that the Appender is 
> closed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers

2017-07-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16077:
--
Attachment: HIVE-16077.02.patch

> UPDATE/DELETE fails with numBuckets > numReducers
> -
>
> Key: HIVE-16077
> URL: https://issues.apache.org/jira/browse/HIVE-16077
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch
>
>
> I don't think we have such tests for the Acid path.
> Check if they exist for the non-Acid path.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17085) ORC file merge/concatenation should do full schema check

2017-07-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17085:
-
   Resolution: Fixed
Fix Version/s: 2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Test failures are unrelated to this patch. 
Committed to branch-2 and master. Thanks Zoltan for the review!

> ORC file merge/concatenation should do full schema check
> 
>
> Key: HIVE-17085
> URL: https://issues.apache.org/jira/browse/HIVE-17085
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0, 2.3.0, 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17085.1.patch, HIVE-17085.2.patch
>
>
> The ORC merging/concatenation compatibility check only looks for a column 
> count match at the outer level. ORC schema evolution now supports inner 
> structs as well, so the outer-level column counts can match while the inner 
> columns do not. The compatibility check should do a full schema match before 
> merging/concatenation. This issue will not cause data loss, but it will 
> cause task failures with an exception like the one below
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to close 
> OrcFileMergeOperator
>   at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:247)
>   at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.processKeyValuePairs(OrcFileMergeOperator.java:172)
>   at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.process(OrcFileMergeOperator.java:72)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:212)
>   ... 16 more
> Caused by: java.lang.IllegalArgumentException: Column has wrong number of 
> index entries found: 0 expected: 1
>   at 
> org.apache.orc.impl.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:695)
>   at 
> org.apache.orc.impl.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:2147)
>   at org.apache.orc.impl.WriterImpl.flushStripe(WriterImpl.java:2661)
>   at org.apache.orc.impl.WriterImpl.close(WriterImpl.java:2834)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:321)
>   at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:243)
>   ... 19 more
> {code}
> Concatenation should also make sure the writer version matches (it currently 
> checks only for a file version match).
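>
> To illustrate the kind of check the summary calls for, here is a minimal 
> sketch (not the committed patch) of a recursive schema comparison over 
> org.apache.orc.TypeDescription; it assumes getChildren() returns null for 
> primitive types, per the ORC API:
> {code:java}
> import java.util.List;
> import org.apache.orc.TypeDescription;
>
> public final class OrcSchemaCheck {
>
>   public static boolean isFullyCompatible(TypeDescription a, TypeDescription b) {
>     if (a.getCategory() != b.getCategory()) {
>       return false;
>     }
>     List<TypeDescription> ac = a.getChildren();
>     List<TypeDescription> bc = b.getChildren();
>     if (ac == null || bc == null) {
>       return ac == bc;           // both must be leaf (primitive) types
>     }
>     if (ac.size() != bc.size()) {
>       return false;              // inner column count differs
>     }
>     for (int i = 0; i < ac.size(); i++) {
>       // Recurse so evolved inner structs are caught, not just the
>       // outer-level column count.
>       if (!isFullyCompatible(ac.get(i), bc.get(i))) {
>         return false;
>       }
>     }
>     return true;
>   }
> }
> {code}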



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17131) Add InterfaceAudience and InterfaceStability annotations for SerDe APIs

2017-07-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095414#comment-16095414
 ] 

Ashutosh Chauhan commented on HIVE-17131:
-

I think we shall do HIVE-16374 instead for master.

> Add InterfaceAudience and InterfaceStability annotations for SerDe APIs
> ---
>
> Key: HIVE-17131
> URL: https://issues.apache.org/jira/browse/HIVE-17131
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17131.1.patch
>
>
> Adding InterfaceAudience and InterfaceStability annotations for the core 
> SerDe APIs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-14181) DROP TABLE in hive doesn't Throw Error

2017-07-20 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095376#comment-16095376
 ] 

Aihua Xu commented on HIVE-14181:
-

[~szita] It is a challenge to keep HDFS in sync with the metadata during 
insertion and deletion. I agree that we can at least print some kind of 
warning to the user and let the user clean up the data manually. 

Another thought: when HDFS trash is turned off, throwing the warning seems to 
be all we can do; when HDFS trash is turned on, since we can recover HDFS 
files, we can keep HDFS and the metadata in sync by recovering the HDFS files 
and reverting the metadata change if anything fails. 

What do you think?
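
A sketch of the control flow being discussed, purely illustrative: every 
helper name below (deleteMetadata, revertMetadata, deleteData, restoreData, 
trashEnabled) is a hypothetical placeholder, not a metastore API.
{code:java}
public final class DropTableSketch {

  interface Ops {
    void deleteMetadata();                 // the HMS transaction
    void revertMetadata();                 // undo the HMS change
    void deleteData() throws Exception;    // move the table dir to trash
    void restoreData();                    // only possible with trash on
    boolean trashEnabled();
  }

  static void dropTable(Ops ops) throws Exception {
    ops.deleteMetadata();
    try {
      ops.deleteData();
    } catch (Exception e) {
      if (ops.trashEnabled()) {
        // Trash still holds the files: recover them and revert the metadata
        // change so HDFS and the metastore stay consistent.
        ops.restoreData();
        ops.revertMetadata();
      }
      // Either way, surface the failure instead of silently succeeding.
      throw e;
    }
  }
}
{code}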


> DROP TABLE in hive doesn't Throw Error
> --
>
> Key: HIVE-14181
> URL: https://issues.apache.org/jira/browse/HIVE-14181
> Project: Hive
>  Issue Type: Bug
> Environment: Hive 1.1.0
> CDH 5.5.1-1
>Reporter: Pranjal Singh
>Assignee: Adam Szita
>  Labels: easyfix
> Attachments: HIVE-14181.1.patch, HIVE-14181.2.patch
>
>
> drop table table_name doesn't throw an error if the drop table fails.
> I was dropping a table and my trash didn't have enough space to hold the 
> table, but the drop table command showed success and the table wasn't 
> deleted. However, hadoop fs -rm -r /hive/xyz.db/table_name/ gave the error 
> "Failed to move to trash" because I did not have enough space quota in my 
> trash.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17138) FileSinkOperator doesn't create empty files for acid path

2017-07-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17138:
--
Description: 
For bucketed tables, FileSinkOperator is expected (in some cases)  to produce a 
specific number of files even if they are empty.
FileSinkOperator.closeOp(boolean abort) has logic to create files even if empty.

This doesn't work properly for the Acid path.  For Insert, the 
OrcRecordUpdater(s) is set up in createBucketForFileIdx(), which creates the 
actual bucketN file (as of HIVE-14007, it does so regardless of whether the 
RecordUpdater sees any rows).  This causes empty (i.e. ORC-metadata-only) 
bucket files to be created for multiFileSpray=true if a particular 
FileSinkOperator.process() sees at least 1 row.  For example,
{noformat}
create table fourbuckets (a int, b int) clustered by (a) into 4 buckets stored 
as orc TBLPROPERTIES ('transactional'='true');
insert into fourbuckets values(0,1),(1,1);
with mapreduce.job.reduces = 1 or 2 
{noformat}

For the Update/Delete path, the OrcRecordWriter is created lazily when the 1st 
row that needs to land there is seen.  Thus it never creates empty buckets, no 
matter what the value of _skipFiles_ in closeOp(boolean) is.

Once Split Update does the split early (in the operator pipeline), only the 
Insert path will matter, since base and delta are the only files that split 
computation, etc. looks at.  delete_delta is only for Acid internals, so there 
is never any reason to create empty files there.


  was:
For bucketed tables, FileSinkOperator is expected (in some cases)  to produce a 
specific number of files even if they are empty.
FileSinkOperator.closeOp(boolean abort) has logic to create files even if empty.

This doesn't work properly for the Acid path.  For Insert, the 
OrcRecordUpdater(s) is set up in createBucketForFileIdx(), which creates the 
actual bucketN file (as of HIVE-14007, it does so regardless of whether the 
RecordUpdater sees any rows).  This causes empty (i.e. ORC-metadata-only) 
bucket files to be created.  For example,
{noformat}
create table fourbuckets (a int, b int) clustered by (a) into 4 buckets stored 
as orc TBLPROPERTIES ('transactional'='true');
insert into fourbuckets values(0,1),(1,1);
{noformat}

For the Update/Delete path, the OrcRecordWriter is created lazily when the 1st 
row that needs to land there is seen.  Thus it never creates empty buckets, no 
matter what the value of _skipFiles_ in closeOp(boolean) is.

Once Split Update does the split early (in the operator pipeline), only the 
Insert path will matter, since base and delta are the only files that split 
computation, etc. looks at.  delete_delta is only for Acid internals, so there 
is never any reason to create empty files there.



> FileSinkOperator doesn't create empty files for acid path
> -
>
> Key: HIVE-17138
> URL: https://issues.apache.org/jira/browse/HIVE-17138
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> For bucketed tables, FileSinkOperator is expected (in some cases)  to produce 
> a specific number of files even if they are empty.
> FileSinkOperator.closeOp(boolean abort) has logic to create files even if 
> empty.
> This doesn't work properly for the Acid path.  For Insert, the 
> OrcRecordUpdater(s) is set up in createBucketForFileIdx(), which creates the 
> actual bucketN file (as of HIVE-14007, it does so regardless of whether the 
> RecordUpdater sees any rows).  This causes empty (i.e. ORC-metadata-only) 
> bucket files to be created for multiFileSpray=true if a particular 
> FileSinkOperator.process() sees at least 1 row.  For example,
> {noformat}
> create table fourbuckets (a int, b int) clustered by (a) into 4 buckets 
> stored as orc TBLPROPERTIES ('transactional'='true');
> insert into fourbuckets values(0,1),(1,1);
> with mapreduce.job.reduces = 1 or 2 
> {noformat}
> For the Update/Delete path, the OrcRecordWriter is created lazily when the 
> 1st row that needs to land there is seen.  Thus it never creates empty 
> buckets, no matter what the value of _skipFiles_ in closeOp(boolean) is.
> Once Split Update does the split early (in the operator pipeline), only the 
> Insert path will matter, since base and delta are the only files that split 
> computation, etc. looks at.  delete_delta is only for Acid internals, so 
> there is never any reason to create empty files there.
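>
> The arithmetic behind the repro is easy to see in isolation. A small 
> illustration follows, under the assumption that Hive's bucketing hash for an 
> int column is the value itself: values 0 and 1 land in buckets 0 and 1, so 
> buckets 2 and 3 never see a row and must be created by closeOp().
> {code:java}
> public final class BucketDemo {
>   // (hash & Integer.MAX_VALUE) % numBuckets, with hash(int) == value
>   // assumed for this illustration.
>   static int bucketFor(int clusteredByValue, int numBuckets) {
>     return (clusteredByValue & Integer.MAX_VALUE) % numBuckets;
>   }
>
>   public static void main(String[] args) {
>     for (int a : new int[] {0, 1}) {
>       System.out.println("a=" + a + " -> bucket " + bucketFor(a, 4));
>     }
>     // prints: a=0 -> bucket 0, a=1 -> bucket 1; buckets 2 and 3 stay empty
>   }
> }
> {code}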



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17138) FileSinkOperator doesn't create empty files for acid path

2017-07-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17138:
-


> FileSinkOperator doesn't create empty files for acid path
> -
>
> Key: HIVE-17138
> URL: https://issues.apache.org/jira/browse/HIVE-17138
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> For bucketed tables, FileSinkOperator is expected (in some cases)  to produce 
> a specific number of files even if they are empty.
> FileSinkOperator.closeOp(boolean abort) has logic to create files even if 
> empty.
> This doesn't work properly for the Acid path.  For Insert, the 
> OrcRecordUpdater(s) is set up in createBucketForFileIdx(), which creates the 
> actual bucketN file (as of HIVE-14007, it does so regardless of whether the 
> RecordUpdater sees any rows).  This causes empty (i.e. ORC-metadata-only) 
> bucket files to be created.  For example,
> {noformat}
> create table fourbuckets (a int, b int) clustered by (a) into 4 buckets 
> stored as orc TBLPROPERTIES ('transactional'='true');
> insert into fourbuckets values(0,1),(1,1);
> {noformat}
> For the Update/Delete path, the OrcRecordWriter is created lazily when the 
> 1st row that needs to land there is seen.  Thus it never creates empty 
> buckets, no matter what the value of _skipFiles_ in closeOp(boolean) is.
> Once Split Update does the split early (in the operator pipeline), only the 
> Insert path will matter, since base and delta are the only files that split 
> computation, etc. looks at.  delete_delta is only for Acid internals, so 
> there is never any reason to create empty files there.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17128) Operation Logging leaks file descriptors as the log4j Appender is never closed

2017-07-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095297#comment-16095297
 ] 

Hive QA commented on HIVE-17128:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12878232/HIVE-17128.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11088 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6099/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6099/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6099/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12878232 - PreCommit-HIVE-Build

> Operation Logging leaks file descriptors as the log4j Appender is never closed
> --
>
> Key: HIVE-17128
> URL: https://issues.apache.org/jira/browse/HIVE-17128
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-17128.1.patch
>
>
> [HIVE-16061] and [HIVE-16400] changed operation logging to use the Log4j2 
> RoutingAppender to automatically output the log for each query into each 
> individual operation log file. As log4j does not know when a query is 
> finished it keeps the OutputStream in the Appender open even when the query 
> completes. The stream holds a file descriptor and so we leak file 
> descriptors. Note that we are already careful to close any streams reading 
> from the operation log file.
> h2. Fix
> To fix this we use a technique described in the comments of [LOG4J2-510] 
> which uses reflection to close the appender. The test in 
> TestOperationLoggingLayout will be extended to check that the Appender is 
> closed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16787) Fix itests in branch-2.2

2017-07-20 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-16787:
-
Release Note:   (was: I just committed this. Thanks for the review, Alan.
)

I just committed this. Thanks for the review, Alan.


> Fix itests in branch-2.2
> 
>
> Key: HIVE-16787
> URL: https://issues.apache.org/jira/browse/HIVE-16787
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
>
> The itests are broken in branch 2.2 and need to be fixed before release.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-16787) Fix itests in branch-2.2

2017-07-20 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HIVE-16787.
--
Resolution: Fixed

> Fix itests in branch-2.2
> 
>
> Key: HIVE-16787
> URL: https://issues.apache.org/jira/browse/HIVE-16787
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
>
> The itests are broken in branch 2.2 and need to be fixed before release.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Reopened] (HIVE-16787) Fix itests in branch-2.2

2017-07-20 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reopened HIVE-16787:
--

> Fix itests in branch-2.2
> 
>
> Key: HIVE-16787
> URL: https://issues.apache.org/jira/browse/HIVE-16787
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
>
> The itests are broken in branch 2.2 and need to be fixed before release.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-16787) Fix itests in branch-2.2

2017-07-20 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HIVE-16787.
--
  Resolution: Fixed
Release Note: 
I just committed this. Thanks for the review, Alan.


> Fix itests in branch-2.2
> 
>
> Key: HIVE-16787
> URL: https://issues.apache.org/jira/browse/HIVE-16787
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
>
> The itests are broken in branch 2.2 and need to be fixed before release.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16787) Fix itests in branch-2.2

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095286#comment-16095286
 ] 

ASF GitHub Bot commented on HIVE-16787:
---

Github user asfgit closed the pull request at:

https://github.com/apache/hive/pull/207


> Fix itests in branch-2.2
> 
>
> Key: HIVE-16787
> URL: https://issues.apache.org/jira/browse/HIVE-16787
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
>
> The itests are broken in branch 2.2 and need to be fixed before release.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16366) Hive 2.3 release planning

2017-07-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16366:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Hive 2.3 release planning
> -
>
> Key: HIVE-16366
> URL: https://issues.apache.org/jira/browse/HIVE-16366
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>Priority: Blocker
>  Labels: 2.3.0
> Fix For: 2.3.0
>
> Attachments: HIVE-16366-branch-2.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-14181) DROP TABLE in hive doesn't Throw Error

2017-07-20 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095257#comment-16095257
 ] 

Adam Szita commented on HIVE-14181:
---

Hi [~daijy], [~aihuaxu], [~vihangk1] and I took a look at this a while back 
but couldn't reach consensus. Please take a look at the RB link too.

I agree that rolling back the HMS transaction (as the latest patch does) is a 
bad idea, because the data deletion may fail partially, leaving intact 
metadata and corrupted data - that's a bad combination.
However, I still think that we should at least throw the exception back to 
the user to indicate that something went wrong while actually deleting the 
data, and not just leave a warning in the HMS log.

> DROP TABLE in hive doesn't Throw Error
> --
>
> Key: HIVE-14181
> URL: https://issues.apache.org/jira/browse/HIVE-14181
> Project: Hive
>  Issue Type: Bug
> Environment: Hive 1.1.0
> CDH 5.5.1-1
>Reporter: Pranjal Singh
>Assignee: Adam Szita
>  Labels: easyfix
> Attachments: HIVE-14181.1.patch, HIVE-14181.2.patch
>
>
> drop table table_name doesn't throw an error if the drop table fails.
> I was dropping a table and my trash didn't have enough space to hold the 
> table, but the drop table command showed success and the table wasn't 
> deleted. However, hadoop fs -rm -r /hive/xyz.db/table_name/ gave the error 
> "Failed to move to trash" because I did not have enough space quota in my 
> trash.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17137) Fix javolution conflict

2017-07-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095207#comment-16095207
 ] 

Hive QA commented on HIVE-17137:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12878218/HIVE-17137.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 11088 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=240)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6097/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6097/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6097/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12878218 - PreCommit-HIVE-Build

> Fix javolution conflict
> ---
>
> Key: HIVE-17137
> URL: https://issues.apache.org/jira/browse/HIVE-17137
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-17137.01.patch
>
>
> as reported by [~jcamachorodriguez]
> {code}
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hive:hive-exec:jar:3.0.0-SNAPSHOT
> [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must 
> be unique: javolution:javolution:jar -> duplicate declaration of version 
> ${javolution.version} @ org.apache.hive:hive-exec:[unknown-version], 
> /grid/5/dev/jcamachorodriguez/dist/tez-autobuild/hive/ql/pom.xml, line 366, 
> column 17
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17128) Operation Logging leaks file descriptors as the log4j Appender is never closed

2017-07-20 Thread Andrew Sherman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated HIVE-17128:
--
Attachment: HIVE-17128.1.patch

> Operation Logging leaks file descriptors as the log4j Appender is never closed
> --
>
> Key: HIVE-17128
> URL: https://issues.apache.org/jira/browse/HIVE-17128
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-17128.1.patch
>
>
> [HIVE-16061] and [HIVE-16400] changed operation logging to use the Log4j2 
> RoutingAppender to automatically output the log for each query into each 
> individual operation log file. As log4j does not know when a query is 
> finished it keeps the OutputStream in the Appender open even when the query 
> completes. The stream holds a file descriptor and so we leak file 
> descriptors. Note that we are already careful to close any streams reading 
> from the operation log file.
> h2. Fix
> To fix this we use a technique described in the comments of [LOG4J2-510] 
> which uses reflection to close the appender. The test in 
> TestOperationLoggingLayout will be extended to check that the Appender is 
> closed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17128) Operation Logging leaks file descriptors as the log4j Appender is never closed

2017-07-20 Thread Andrew Sherman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated HIVE-17128:
--
Status: Patch Available  (was: Open)

> Operation Logging leaks file descriptors as the log4j Appender is never closed
> --
>
> Key: HIVE-17128
> URL: https://issues.apache.org/jira/browse/HIVE-17128
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-17128.1.patch
>
>
> [HIVE-16061] and [HIVE-16400] changed operation logging to use the Log4j2 
> RoutingAppender to automatically output the log for each query into each 
> individual operation log file. As log4j does not know when a query is 
> finished it keeps the OutputStream in the Appender open even when the query 
> completes. The stream holds a file descriptor and so we leak file 
> descriptors. Note that we are already careful to close any streams reading 
> from the operation log file.
> h2. Fix
> To fix this we use a technique described in the comments of [LOG4J2-510] 
> which uses reflection to close the appender. The test in 
> TestOperationLoggingLayout will be extended to check that the Appender is 
> closed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleanup

2017-07-20 Thread PRASHANT GOLASH (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17117 started by PRASHANT GOLASH.
--
> Metalisteners are not notified when threadlocal metaconf is cleanup 
> 
>
> Key: HIVE-17117
> URL: https://issues.apache.org/jira/browse/HIVE-17117
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
> Environment: Tested on master branch (Applicable for downlevel 
> versions as well)
>Reporter: PRASHANT GOLASH
>Assignee: PRASHANT GOLASH
>Priority: Minor
> Attachments: HIVE-17117.1.patch, HIVE-17117.patch
>
>
> Meta listeners are not notified of meta-conf cleanup. This could potentially 
> leave stale values on listener objects. For example:
> Request 1
> a. HS2 -> HMS : HMSHandler#setMetaConf
>  MetaListeners are notified of the ConfigChangeEvent.
> b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if 
> shutdown is not invoked)
>  MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but the meta 
> listeners are not notified.
> Request 2
> c. HS2 -> HMS : AlterPartition
>  MetaListeners are notified of the AlterPartitionEvent. If any listener has 
> taken a dependency on the meta-conf value, it will still hold the stale 
> value from Request 1 and could run into issues.
> The correct behavior should be to notify meta listeners on cleanup as well.
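>
> A minimal sketch of the proposed behavior, not the attached patch. The 
> simplified ConfigListener below is a stand-in for the metastore's 
> MetaStoreEventListener/ConfigChangeEvent pair so the idea stays 
> self-contained; the cleanup-time notification is the behavior being proposed.
> {code:java}
> import java.util.Map;
>
> public final class MetaConfCleanupSketch {
>
>   interface ConfigListener {
>     void onConfigChange(String key, String oldValue, String newValue);
>   }
>
>   // On cleanupRawStore, replay every overridden key back to its server
>   // default so listeners can drop any state keyed on the Request 1 value.
>   static void cleanupMetaConf(Map<String, String> sessionOverrides,
>                               Map<String, String> serverDefaults,
>                               Iterable<ConfigListener> listeners) {
>     for (Map.Entry<String, String> e : sessionOverrides.entrySet()) {
>       String defaultValue = serverDefaults.get(e.getKey());
>       for (ConfigListener listener : listeners) {
>         listener.onConfigChange(e.getKey(), e.getValue(), defaultValue);
>       }
>     }
>     sessionOverrides.clear();
>   }
> }
> {code}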



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleanup

2017-07-20 Thread PRASHANT GOLASH (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PRASHANT GOLASH updated HIVE-17117:
---
Status: In Progress  (was: Patch Available)

> Metalisteners are not notified when threadlocal metaconf is cleanup 
> 
>
> Key: HIVE-17117
> URL: https://issues.apache.org/jira/browse/HIVE-17117
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
> Environment: Tested on master branch (Applicable for downlevel 
> versions as well)
>Reporter: PRASHANT GOLASH
>Assignee: PRASHANT GOLASH
>Priority: Minor
> Attachments: HIVE-17117.1.patch, HIVE-17117.patch
>
>
> Meta listeners are not notified of meta-conf cleanup. This could potentially 
> leave stale values on listener objects. For example:
> Request 1
> a. HS2 -> HMS : HMSHandler#setMetaConf
>  MetaListeners are notified of the ConfigChangeEvent.
> b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if 
> shutdown is not invoked)
>  MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but the meta 
> listeners are not notified.
> Request 2
> c. HS2 -> HMS : AlterPartition
>  MetaListeners are notified of the AlterPartitionEvent. If any listener has 
> taken a dependency on the meta-conf value, it will still hold the stale 
> value from Request 1 and could run into issues.
> The correct behavior should be to notify meta listeners on cleanup as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work stopped] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleanup

2017-07-20 Thread PRASHANT GOLASH (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17117 stopped by PRASHANT GOLASH.
--
> Metalisteners are not notified when threadlocal metaconf is cleanup 
> 
>
> Key: HIVE-17117
> URL: https://issues.apache.org/jira/browse/HIVE-17117
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
> Environment: Tested on master branch (Applicable for downlevel 
> versions as well)
>Reporter: PRASHANT GOLASH
>Assignee: PRASHANT GOLASH
>Priority: Minor
> Attachments: HIVE-17117.1.patch, HIVE-17117.patch
>
>
> Meta listeners are not notified of meta-conf cleanup. This could potentially 
> leave stale values on listener objects. For example:
> Request 1
> a. HS2 -> HMS : HMSHandler#setMetaConf
>  MetaListeners are notified of the ConfigChangeEvent.
> b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if 
> shutdown is not invoked)
>  MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but the meta 
> listeners are not notified.
> Request 2
> c. HS2 -> HMS : AlterPartition
>  MetaListeners are notified of the AlterPartitionEvent. If any listener has 
> taken a dependency on the meta-conf value, it will still hold the stale 
> value from Request 1 and could run into issues.
> The correct behavior should be to notify meta listeners on cleanup as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17083) DagUtils overwrites any credentials already added

2017-07-20 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095176#comment-16095176
 ] 

Josh Elser commented on HIVE-17083:
---

Thanks here as well, Sushanth!

> DagUtils overwrites any credentials already added
> -
>
> Key: HIVE-17083
> URL: https://issues.apache.org/jira/browse/HIVE-17083
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 3.0.0
>
> Attachments: HIVE-17083.patch
>
>
> While working with a StorageHandler with hive.execution.engine=tez, I found 
> that the credentials the storage handler was adding were not propagating to 
> the dag.
> After a bit of debugging/git-log, I found that DagUtils was overwriting the 
> credentials which were already set. A quick local patch seems to make things 
> work again. Will put together a quick unit test.
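>
> A minimal sketch of the remedy implied above, assuming (not verified against 
> the committed patch) that the fix amounts to merging into, rather than 
> replacing, the credentials already present on the DAG:
> {code:java}
> import org.apache.hadoop.security.Credentials;
>
> public final class CredentialsMergeSketch {
>
>   static void addStorageHandlerCredentials(Credentials dagCredentials,
>                                            Credentials fromStorageHandler) {
>     // Credentials#mergeAll adds tokens/secrets without overwriting existing
>     // entries, so anything a StorageHandler registered earlier survives.
>     dagCredentials.mergeAll(fromStorageHandler);
>   }
> }
> {code}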



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleanup

2017-07-20 Thread PRASHANT GOLASH (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095172#comment-16095172
 ] 

PRASHANT GOLASH commented on HIVE-17117:


Attached the latest patch. Thanks [~mohitsabharwal] & [~csun]

> Metalisteners are not notified when threadlocal metaconf is cleanup 
> 
>
> Key: HIVE-17117
> URL: https://issues.apache.org/jira/browse/HIVE-17117
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
> Environment: Tested on master branch (Applicable for downlevel 
> versions as well)
>Reporter: PRASHANT GOLASH
>Assignee: PRASHANT GOLASH
>Priority: Minor
> Attachments: HIVE-17117.1.patch, HIVE-17117.patch
>
>
> Meta listeners are not notified of meta-conf cleanup. This could potentially 
> leave stale values on listener objects. For example:
> Request 1
> a. HS2 -> HMS : HMSHandler#setMetaConf
>  MetaListeners are notified of the ConfigChangeEvent.
> b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if 
> shutdown is not invoked)
>  MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but the meta 
> listeners are not notified.
> Request 2
> c. HS2 -> HMS : AlterPartition
>  MetaListeners are notified of the AlterPartitionEvent. If any listener has 
> taken a dependency on the meta-conf value, it will still hold the stale 
> value from Request 1 and could run into issues.
> The correct behavior should be to notify meta listeners on cleanup as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleanup

2017-07-20 Thread PRASHANT GOLASH (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PRASHANT GOLASH updated HIVE-17117:
---
Attachment: HIVE-17117.1.patch

> Metalisteners are not notified when threadlocal metaconf is cleanup 
> 
>
> Key: HIVE-17117
> URL: https://issues.apache.org/jira/browse/HIVE-17117
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
> Environment: Tested on master branch (Applicable for downlevel 
> versions as well)
>Reporter: PRASHANT GOLASH
>Assignee: PRASHANT GOLASH
>Priority: Minor
> Attachments: HIVE-17117.1.patch, HIVE-17117.patch
>
>
> Meta listeners are not notified of meta-conf cleanup. This could potentially 
> leave stale values on listener objects. For example:
> Request 1
> a. HS2 -> HMS : HMSHandler#setMetaConf
>  MetaListeners are notified of the ConfigChangeEvent.
> b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if 
> shutdown is not invoked)
>  MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but the meta 
> listeners are not notified.
> Request 2
> c. HS2 -> HMS : AlterPartition
>  MetaListeners are notified of the AlterPartitionEvent. If any listener has 
> taken a dependency on the meta-conf value, it will still hold the stale 
> value from Request 1 and could run into issues.
> The correct behavior should be to notify meta listeners on cleanup as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleanup

2017-07-20 Thread PRASHANT GOLASH (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PRASHANT GOLASH updated HIVE-17117:
---
Attachment: (was: HIVE-17117.patch)

> Metalisteners are not notified when threadlocal metaconf is cleanup 
> 
>
> Key: HIVE-17117
> URL: https://issues.apache.org/jira/browse/HIVE-17117
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
> Environment: Tested on master branch (Applicable for downlevel 
> versions as well)
>Reporter: PRASHANT GOLASH
>Assignee: PRASHANT GOLASH
>Priority: Minor
> Attachments: HIVE-17117.1.patch, HIVE-17117.patch
>
>
> Meta listeners are not notified of meta-conf cleanup. This could potentially 
> leave stale values on listener objects. For example:
> Request 1
> a. HS2 -> HMS : HMSHandler#setMetaConf
>  MetaListeners are notified of the ConfigChangeEvent.
> b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if 
> shutdown is not invoked)
>  MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but the meta 
> listeners are not notified.
> Request 2
> c. HS2 -> HMS : AlterPartition
>  MetaListeners are notified of the AlterPartitionEvent. If any listener has 
> taken a dependency on the meta-conf value, it will still hold the stale 
> value from Request 1 and could run into issues.
> The correct behavior should be to notify meta listeners on cleanup as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleanup

2017-07-20 Thread PRASHANT GOLASH (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PRASHANT GOLASH updated HIVE-17117:
---
Attachment: HIVE-17117.patch

> Metalisteners are not notified when threadlocal metaconf is cleanup 
> 
>
> Key: HIVE-17117
> URL: https://issues.apache.org/jira/browse/HIVE-17117
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
> Environment: Tested on master branch (Applicable for downlevel 
> versions as well)
>Reporter: PRASHANT GOLASH
>Assignee: PRASHANT GOLASH
>Priority: Minor
> Attachments: HIVE-17117.patch, HIVE-17117.patch
>
>
> Meta listeners are not notified of meta-conf cleanup. This could potentially 
> leave stale values on listener objects. For example:
> Request 1
> a. HS2 -> HMS : HMSHandler#setMetaConf
>  MetaListeners are notified of the ConfigChangeEvent.
> b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if 
> shutdown is not invoked)
>  MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but the meta 
> listeners are not notified.
> Request 2
> c. HS2 -> HMS : AlterPartition
>  MetaListeners are notified of the AlterPartitionEvent. If any listener has 
> taken a dependency on the meta-conf value, it will still hold the stale 
> value from Request 1 and could run into issues.
> The correct behavior should be to notify meta listeners on cleanup as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17130) Add automated tests to check backwards compatibility of core APIs

2017-07-20 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095130#comment-16095130
 ] 

Sahil Takiar edited comment on HIVE-17130 at 7/20/17 6:12 PM:
--

Here are the relevant JIRAs from other Apache projects that have done similar 
things:

HADOOP-13583
HBASE-12808 and HBASE-18020
KUDU-1265
SPARK-1094 (Spark uses a Scala specific tool called 
[MiMa|https://github.com/typesafehub/migration-manager])


was (Author: stakiar):
Here are the relevant JIRAs from other Apache projects that have done similar 
things:

HADOOP-13583
HBASE-12808 and HBASE-18020
KUDU-1265

> Add automated tests to check backwards compatibility of core APIs
> -
>
> Key: HIVE-17130
> URL: https://issues.apache.org/jira/browse/HIVE-17130
> Project: Hive
>  Issue Type: Test
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> We should add automated tests that check that we are not introducing 
> backwards-incompatible changes to core APIs (e.g. HMS APIs, SerDe APIs, UDF 
> APIs, etc.).
> Other Apache components, such as HBase and Hadoop, already have such checks. 
> They are largely based on the japi-compliance-checker: 
> https://lvc.github.io/japi-compliance-checker/
> The nice thing about the japi-compliance-checker is that it can identify an 
> interface as "any class with a specified Java annotation", so we can use the 
> compliance checker to check for backwards compatibility of any classes 
> annotated with InterfaceAudience.Public.
> Ideally, we can build this check into our pre-commit job, or get it into 
> YETUS, since we are already working on adding YETUS support to Hive.
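>
> To make the convention concrete, here is a hypothetical example of the 
> annotations the checker would key on; the class name is illustrative and not 
> part of any patch:
> {code:java}
> import org.apache.hadoop.hive.common.classification.InterfaceAudience;
> import org.apache.hadoop.hive.common.classification.InterfaceStability;
>
> // Any class carrying InterfaceAudience.Public would be treated as part of
> // the compatibility surface; japi-compliance-checker would then flag
> // incompatible changes to its public members when comparing release jars.
> @InterfaceAudience.Public
> @InterfaceStability.Stable
> public class ExamplePublicSerDe {
> }
> {code}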



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17131) Add InterfaceAudience and InterfaceStability annotations for SerDe APIs

2017-07-20 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095138#comment-16095138
 ] 

Sahil Takiar commented on HIVE-17131:
-

[~ashutoshc], [~sershe] saw you both did some work on SerDe APIs in HIVE-15167 
and HIVE-4007, so wanted to see if either of you had any thoughts or objections 
to this.

> Add InterfaceAudience and InterfaceStability annotations for SerDe APIs
> ---
>
> Key: HIVE-17131
> URL: https://issues.apache.org/jira/browse/HIVE-17131
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17131.1.patch
>
>
> Adding InterfaceAudience and InterfaceStability annotations for the core 
> SerDe APIs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17130) Add automated tests to check backwards compatibility of core APIs

2017-07-20 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095130#comment-16095130
 ] 

Sahil Takiar commented on HIVE-17130:
-

Here are the relevant JIRAs from other Apache projects that have done similar 
things:

HADOOP-13583
HBASE-12808 and HBASE-18020
KUDU-1265

> Add automated tests to check backwards compatibility of core APIs
> -
>
> Key: HIVE-17130
> URL: https://issues.apache.org/jira/browse/HIVE-17130
> Project: Hive
>  Issue Type: Test
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> We should add automated tests that check that we are not introducing 
> backwards-incompatible changes to core APIs (e.g. HMS APIs, SerDe APIs, UDF 
> APIs, etc.).
> Other Apache components, such as HBase and Hadoop, already have such checks. 
> They are largely based on the japi-compliance-checker: 
> https://lvc.github.io/japi-compliance-checker/
> The nice thing about the japi-compliance-checker is that it can identify an 
> interface as "any class with a specified Java annotation", so we can use the 
> compliance checker to check for backwards compatibility of any classes 
> annotated with InterfaceAudience.Public.
> Ideally, we can build this check into our pre-commit job, or get it into 
> YETUS, since we are already working on adding YETUS support to Hive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleanup

2017-07-20 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095091#comment-16095091
 ] 

Mohit Sabharwal commented on HIVE-17117:


Looks like the attached patch needs to be refreshed. It is not the same as the 
one on RB.

> Metalisteners are not notified when threadlocal metaconf is cleanup 
> 
>
> Key: HIVE-17117
> URL: https://issues.apache.org/jira/browse/HIVE-17117
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
> Environment: Tested on master branch (Applicable for downlevel 
> versions as well)
>Reporter: PRASHANT GOLASH
>Assignee: PRASHANT GOLASH
>Priority: Minor
> Attachments: HIVE-17117.patch
>
>
> Meta listeners are not notified of meta-conf cleanup. This could potentially 
> leave stale values on listeners objects. For e.g.
> Request1
> a. HS2 -> HMS : HMSHandler#setMetaConf
>  MetaListeners are notified of the ConfigChangeEvent.
> b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if 
> shutdown is not invoked)
> MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but meta 
> listeners are not notified
> Request 2
> 3. HS2->HMS : AlterPartition
>  MetaListeners are notified of AlterPartitionEvent. If any listener has 
> taken dependency on the meta conf value, it will still be having stale value 
> from Request1 and would potentially be having issues.
> The correct behavior should be to notify meta listeners on cleanup as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleanup

2017-07-20 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095073#comment-16095073
 ] 

Chao Sun commented on HIVE-17117:
-

+1

> Metalisteners are not notified when threadlocal metaconf is cleanup 
> 
>
> Key: HIVE-17117
> URL: https://issues.apache.org/jira/browse/HIVE-17117
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
> Environment: Tested on master branch (Applicable for downlevel 
> versions as well)
>Reporter: PRASHANT GOLASH
>Assignee: PRASHANT GOLASH
>Priority: Minor
> Attachments: HIVE-17117.patch
>
>
> Meta listeners are not notified of meta-conf cleanup. This could potentially 
> leave stale values on listener objects. For example:
> Request 1
> a. HS2 -> HMS : HMSHandler#setMetaConf
>  MetaListeners are notified of the ConfigChangeEvent.
> b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if 
> shutdown is not invoked)
>  MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but meta 
> listeners are not notified.
> Request 2
> a. HS2 -> HMS : AlterPartition
>  MetaListeners are notified of the AlterPartitionEvent. If any listener has 
> taken a dependency on the meta-conf value, it will still have the stale 
> value from Request 1 and could run into issues.
> The correct behavior should be to notify meta listeners on cleanup as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17137) Fix javolution conflict

2017-07-20 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095071#comment-16095071
 ] 

Jesus Camacho Rodriguez commented on HIVE-17137:


+1

> Fix javolution conflict
> ---
>
> Key: HIVE-17137
> URL: https://issues.apache.org/jira/browse/HIVE-17137
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-17137.01.patch
>
>
> as reported by [~jcamachorodriguez]
> {code}
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hive:hive-exec:jar:3.0.0-SNAPSHOT
> [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must 
> be unique: javolution:javolution:jar -> duplicate declaration of version 
> ${javolution.version} @ org.apache.hive:hive-exec:[unknown-version], 
> /grid/5/dev/jcamachorodriguez/dist/tez-autobuild/hive/ql/pom.xml, line 366, 
> column 17
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17137) Fix javolution conflict

2017-07-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-17137:
---
Status: Patch Available  (was: Open)

> Fix javolution conflict
> ---
>
> Key: HIVE-17137
> URL: https://issues.apache.org/jira/browse/HIVE-17137
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-17137.01.patch
>
>
> as reported by [~jcamachorodriguez]
> {code}
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hive:hive-exec:jar:3.0.0-SNAPSHOT
> [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must 
> be unique: javolution:javolution:jar -> duplicate declaration of version 
> ${javolution.version} @ org.apache.hive:hive-exec:[unknown-version], 
> /grid/5/dev/jcamachorodriguez/dist/tez-autobuild/hive/ql/pom.xml, line 366, 
> column 17
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17137) Fix javolution conflict

2017-07-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-17137:
---
Attachment: HIVE-17137.01.patch

> Fix javolution conflict
> ---
>
> Key: HIVE-17137
> URL: https://issues.apache.org/jira/browse/HIVE-17137
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-17137.01.patch
>
>
> as reported by [~jcamachorodriguez]
> {code}
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hive:hive-exec:jar:3.0.0-SNAPSHOT
> [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must 
> be unique: javolution:javolution:jar -> duplicate declaration of version 
> ${javolution.version} @ org.apache.hive:hive-exec:[unknown-version], 
> /grid/5/dev/jcamachorodriguez/dist/tez-autobuild/hive/ql/pom.xml, line 366, 
> column 17
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17137) Fix javolution conflict

2017-07-20 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095065#comment-16095065
 ] 

Pengcheng Xiong commented on HIVE-17137:


[~jcamachorodriguez], could you review the patch? Thanks!

> Fix javolution conflict
> ---
>
> Key: HIVE-17137
> URL: https://issues.apache.org/jira/browse/HIVE-17137
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-17137.01.patch
>
>
> as reported by [~jcamachorodriguez]
> {code}
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hive:hive-exec:jar:3.0.0-SNAPSHOT
> [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must 
> be unique: javolution:javolution:jar -> duplicate declaration of version 
> ${javolution.version} @ org.apache.hive:hive-exec:[unknown-version], 
> /grid/5/dev/jcamachorodriguez/dist/tez-autobuild/hive/ql/pom.xml, line 366, 
> column 17
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17137) Fix javolution conflict

2017-07-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-17137:
--


> Fix javolution conflict
> ---
>
> Key: HIVE-17137
> URL: https://issues.apache.org/jira/browse/HIVE-17137
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> as reported by [~jcamachorodriguez]
> {code}
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hive:hive-exec:jar:3.0.0-SNAPSHOT
> [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must 
> be unique: javolution:javolution:jar -> duplicate declaration of version 
> ${javolution.version} @ org.apache.hive:hive-exec:[unknown-version], 
> /grid/5/dev/jcamachorodriguez/dist/tez-autobuild/hive/ql/pom.xml, line 366, 
> column 17
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-20 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095044#comment-16095044
 ] 

Pengcheng Xiong commented on HIVE-16996:


Yes, I also saw that just now; I think it is caused by my change. I will take 
a look and put up a patch. Thanks for discovering this!

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch, HIVE-16966.04.patch, HIVE-16966.05.patch, 
> HIVE-16966.06.patch, HIVE-16966.07.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17085) ORC file merge/concatenation should do full schema check

2017-07-20 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095019#comment-16095019
 ] 

Zoltan Haindrich commented on HIVE-17085:
-

+1

> ORC file merge/concatenation should do full schema check
> 
>
> Key: HIVE-17085
> URL: https://issues.apache.org/jira/browse/HIVE-17085
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0, 2.3.0, 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17085.1.patch, HIVE-17085.2.patch
>
>
> The ORC merge/concatenation compatibility check only looks for a column 
> count match at the outer level. ORC schema evolution now supports inner 
> structs as well, so the outer-level column counts may match while the inner 
> columns do not. The compatibility check should do a full schema match before 
> merging/concatenation. This issue will not cause data loss, but it will 
> cause task failures with an exception like the one below:
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to close 
> OrcFileMergeOperator
>   at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:247)
>   at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.processKeyValuePairs(OrcFileMergeOperator.java:172)
>   at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.process(OrcFileMergeOperator.java:72)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:212)
>   ... 16 more
> Caused by: java.lang.IllegalArgumentException: Column has wrong number of 
> index entries found: 0 expected: 1
>   at 
> org.apache.orc.impl.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:695)
>   at 
> org.apache.orc.impl.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:2147)
>   at org.apache.orc.impl.WriterImpl.flushStripe(WriterImpl.java:2661)
>   at org.apache.orc.impl.WriterImpl.close(WriterImpl.java:2834)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:321)
>   at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:243)
>   ... 19 more
> {code}
> Concatenation should also make sure the writer versions match (it currently 
> checks only that the file versions match).
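
A minimal sketch of a full schema check, assuming ORC readers can be opened 
for the two files being merged (illustrative only, not the actual 
OrcFileMergeOperator code). The string form of TypeDescription includes all 
nested columns, e.g. struct<a:int,b:struct<c:string>>, so comparing it catches 
inner-struct mismatches that a plain outer column count misses:

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;

public class OrcMergeCheck {
  static boolean compatibleForMerge(Path a, Path b, Configuration conf)
      throws IOException {
    Reader ra = OrcFile.createReader(a, OrcFile.readerOptions(conf));
    Reader rb = OrcFile.createReader(b, OrcFile.readerOptions(conf));
    // Full schema match, including nested columns.
    boolean sameSchema =
        ra.getSchema().toString().equals(rb.getSchema().toString());
    // Writer versions must match too, not just the file versions.
    boolean sameWriterVersion = ra.getWriterVersion() == rb.getWriterVersion();
    return sameSchema && sameWriterVersion;
  }
}
{code}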



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17001) Insert overwrite table doesn't clean partition directory on HDFS if partition is missing from HMS

2017-07-20 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-17001:
---
Status: Open  (was: Patch Available)

Cancelling the patch: after some discussion it was decided that this should 
not be an issue. Data in the directory could have been copied there on purpose 
by the user, and it should not be deleted without a warning.

> Insert overwrite table doesn't clean partition directory on HDFS if partition 
> is missing from HMS
> -
>
> Key: HIVE-17001
> URL: https://issues.apache.org/jira/browse/HIVE-17001
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-17001.01.patch
>
>
> Insert overwrite table should clear existing data before creating the new 
> data files.
> For a partitioned table we will clean any folder of existing partitions on 
> HDFS, however if the partition folder exists only on HDFS and the partition 
> definition is missing in HMS, the folder is not cleared.
> Reproduction steps:
> 1. CREATE TABLE test( col1 string) PARTITIONED BY (ds string);
> 2. INSERT INTO test PARTITION(ds='p1') values ('a');
> 3. Copy the data to a different folder with different name.
> 4. ALTER TABLE test DROP PARTITION (ds='p1');
> 5. Recreate the partition directory, copy and rename the data file back
> 6. INSERT OVERWRITE TABLE test PARTITION(ds='p1') values ('b');
> 7. SELECT * from test;
> will result in 2 records being returned instead of 1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17085) ORC file merge/concatenation should do full schema check

2017-07-20 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094937#comment-16094937
 ] 

Prasanth Jayachandran commented on HIVE-17085:
--

[~gopalv] Can you please review this patch?

> ORC file merge/concatenation should do full schema check
> 
>
> Key: HIVE-17085
> URL: https://issues.apache.org/jira/browse/HIVE-17085
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0, 2.3.0, 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17085.1.patch, HIVE-17085.2.patch
>
>
> The ORC merge/concatenation compatibility check only looks for a column 
> count match at the outer level. ORC schema evolution now supports inner 
> structs as well, so the outer-level column counts may match while the inner 
> columns do not. The compatibility check should do a full schema match before 
> merging/concatenation. This issue will not cause data loss, but it will 
> cause task failures with an exception like the one below:
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to close 
> OrcFileMergeOperator
>   at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:247)
>   at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.processKeyValuePairs(OrcFileMergeOperator.java:172)
>   at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.process(OrcFileMergeOperator.java:72)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:212)
>   ... 16 more
> Caused by: java.lang.IllegalArgumentException: Column has wrong number of 
> index entries found: 0 expected: 1
>   at 
> org.apache.orc.impl.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:695)
>   at 
> org.apache.orc.impl.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:2147)
>   at org.apache.orc.impl.WriterImpl.flushStripe(WriterImpl.java:2661)
>   at org.apache.orc.impl.WriterImpl.close(WriterImpl.java:2834)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:321)
>   at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:243)
>   ... 19 more
> {code}
> Concatenation should also make sure the writer versions match (it currently 
> checks only that the file versions match).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17135) Bad error messages when beeline connects to unreachable hosts using binary and SSL

2017-07-20 Thread Carter Shanklin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094887#comment-16094887
 ] 

Carter Shanklin commented on HIVE-17135:


One more problem: Connection Refused is also not reported properly when using 
binary and SSL.

Compare:
Error: Could not open client transport with JDBC Uri: 
jdbc:hive2://hdp261.example.com:10003/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef:
 Could not connect to hdp261.example.com on port 10003 (state=08S01,code=0)

Versus:

Error: Could not open client transport with JDBC Uri: 
jdbc:hive2://hdp261.example.com:10003/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef;transportMode=http;httpPath=cliservice:
 Could not establish connection to 
jdbc:hive2://hdp261.example.com:10003/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef;transportMode=http;httpPath=cliservice:
 org.apache.http.conn.HttpHostConnectException: Connect to 
hdp261.example.com:10003 [hdp261.example.com/192.168.59.21] failed: Connection 
refused (Connection refused) (state=08S01,code=0)

> Bad error messages when beeline connects to unreachable hosts using binary 
> and SSL
> --
>
> Key: HIVE-17135
> URL: https://issues.apache.org/jira/browse/HIVE-17135
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Carter Shanklin
> Attachments: Screen Shot 2017-07-20 at 08.42.07.png
>
>
> When you attempt to connect beeline to an unreachable host using both binary 
> transport and SSL, you get a generic, unhelpful error message.
> If you use HTTP, or you don't use SSL (binary or HTTP), you get a descriptive 
> error message:
> "Network is unreachable" <- for unroutable destinations
> "Connection timed out" <- for hosts that fail to respond for whatever reason
> See the attached image for the matrix.
> It would be better if binary+SSL gave the same descriptive errors.
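
A minimal sketch of the kind of fix that would produce the same descriptive 
errors for binary+SSL (illustrative only, with a hypothetical class and 
method; not the actual beeline/JDBC driver code): keep the root-cause socket 
error visible when opening the SSL connection.

{code:java}
import java.io.IOException;
import java.net.Socket;
import javax.net.ssl.SSLSocketFactory;

public class SslConnectSketch {
  static Socket open(String host, int port) throws IOException {
    SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
    try {
      return factory.createSocket(host, port);
    } catch (IOException e) {
      // Surface the root cause ("Network is unreachable", "Connection timed
      // out", "Connection refused") instead of a generic failure.
      throw new IOException("Could not connect to " + host + " on port "
          + port + ": " + e.getMessage(), e);
    }
  }
}
{code}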



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17136) Unhelpful beeline error message when you attempt to connect to HTTP HS2 using binary with SSL enabled

2017-07-20 Thread Carter Shanklin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carter Shanklin updated HIVE-17136:
---
Description: 
In this case the error message is "Invalid status 72".

Full error:
Error: Could not open client transport with JDBC Uri: 
jdbc:hive2://hdp261.example.com:10001/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef;httpPath=cliservice:
 Invalid status 72 (state=08S01,code=0)

In my environment the connection works if I add transportMode=http.

Compare this error to the error you get if you try to connect to something that 
is not HiveServer2 like SSH:

Error: Could not open client transport with JDBC Uri: 
jdbc:hive2://hdp261.example.com:22/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef;httpPath=cliservice:
 javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection? 
(state=08S01,code=0)

If you got a similar error when you connect to HS2 it would be a lot easier to 
diagnose.

  was:
In this case the error message is "Invalid status 72".

Full error:
Error: Could not open client transport with JDBC Uri: 
jdbc:hive2://hdp261.example.com:10001/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef;httpPath=cliservice:
 Invalid status 72 (state=08S01,code=0)

In my environment the connection works if I add transportMode=http.

Compare this error to the error you get if you try to connect to something that 
is not HiveServer2 like SSH:

Error: Could not open client transport with JDBC Uri: 
jdbc:hive2://hdp261.example.com:22/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef;httpPath=cliservice:
 javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection? 
(state=08S01,code=0)

If you got this error when you connect to HS2 it would be a lot easier to 
diagnose.


> Unhelpful beeline error message when you attempt to connect to HTTP HS2 using 
> binary with SSL enabled
> -
>
> Key: HIVE-17136
> URL: https://issues.apache.org/jira/browse/HIVE-17136
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Carter Shanklin
>
> In this case the error message is "Invalid status 72".
> Full error:
> Error: Could not open client transport with JDBC Uri: 
> jdbc:hive2://hdp261.example.com:10001/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef;httpPath=cliservice:
>  Invalid status 72 (state=08S01,code=0)
> In my environment the connection works if I add transportMode=http.
> Compare this error to the error you get if you try to connect to something 
> that is not HiveServer2 like SSH:
> Error: Could not open client transport with JDBC Uri: 
> jdbc:hive2://hdp261.example.com:22/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef;httpPath=cliservice:
>  javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection? 
> (state=08S01,code=0)
> If you got a similar error when you connect to HS2 it would be a lot easier 
> to diagnose.
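
One possibly useful observation (our reading, not confirmed in this thread): 
72 is the ASCII code for 'H', which suggests the binary Thrift client is 
interpreting the first byte of the server's "HTTP/1.1 ..." response as a 
protocol status byte, hence "Invalid status 72". A tiny self-contained check:

{code:java}
import java.nio.charset.StandardCharsets;

public class InvalidStatus72 {
  public static void main(String[] args) {
    byte first = "HTTP/1.1 400 Bad Request"
        .getBytes(StandardCharsets.US_ASCII)[0];
    System.out.println((int) first);   // prints 72
    System.out.println((char) first);  // prints H
  }
}
{code}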



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17136) Unhelpful beeline error message when you attempt to connect to HTTP HS2 using binary with SSL enabled

2017-07-20 Thread Carter Shanklin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carter Shanklin updated HIVE-17136:
---
Component/s: Beeline

> Unhelpful beeline error message when you attempt to connect to HTTP HS2 using 
> binary with SSL enabled
> -
>
> Key: HIVE-17136
> URL: https://issues.apache.org/jira/browse/HIVE-17136
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Carter Shanklin
>
> In this case the error message is "Invalid status 72".
> Full error:
> Error: Could not open client transport with JDBC Uri: 
> jdbc:hive2://hdp261.example.com:10001/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef;httpPath=cliservice:
>  Invalid status 72 (state=08S01,code=0)
> In my environment the connection works if I add transportMode=http.
> Compare this error to the error you get if you try to connect to something 
> that is not HiveServer2 like SSH:
> Error: Could not open client transport with JDBC Uri: 
> jdbc:hive2://hdp261.example.com:22/default;ssl=true;sslTrustStore=/etc/truststore.jks;trustStorePassword=abcdef;httpPath=cliservice:
>  javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection? 
> (state=08S01,code=0)
> If you got this error when you connect to HS2 it would be a lot easier to 
> diagnose.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17135) Bad error messages when beeline connects to unreachable hosts using binary and SSL

2017-07-20 Thread Carter Shanklin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carter Shanklin updated HIVE-17135:
---
Attachment: Screen Shot 2017-07-20 at 08.42.07.png

> Bad error messages when beeline connects to unreachable hosts using binary 
> and SSL
> --
>
> Key: HIVE-17135
> URL: https://issues.apache.org/jira/browse/HIVE-17135
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Carter Shanklin
> Attachments: Screen Shot 2017-07-20 at 08.42.07.png
>
>
> When you attempt to connect beeline to an unreachable host using both binary 
> transport and SSL, you get a generic, unhelpful error message.
> If you use HTTP, or you don't use SSL (binary or HTTP), you get a descriptive 
> error message:
> "Network is unreachable" <- for unroutable destinations
> "Connection timed out" <- for hosts that fail to respond for whatever reason
> See the attached image for the matrix.
> It would be better if binary+SSL gave the same descriptive errors.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17090) spark.only.query.files are not being run by ptest

2017-07-20 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17090:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

> spark.only.query.files are not being run by ptest
> -
>
> Key: HIVE-17090
> URL: https://issues.apache.org/jira/browse/HIVE-17090
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 3.0.0
>
> Attachments: HIVE-17090.patch
>
>
> Checked a recent run of Hive QA and it doesn't look like qtests specified in 
> spark.only.query.files are being run.
> I think some modifications to ptest config files are required to get this 
> working - e.g. the deployed master-m2.properties file for ptest should 
> contain mainProperties.$\{spark.only.query.files} in the 
> qFileTest.miniSparkOnYarn.groups.normal key.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17090) spark.only.query.files are not being run by ptest

2017-07-20 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094850#comment-16094850
 ] 

Sahil Takiar commented on HIVE-17090:
-

Committed to master, thanks Sergio!

> spark.only.query.files are not being run by ptest
> -
>
> Key: HIVE-17090
> URL: https://issues.apache.org/jira/browse/HIVE-17090
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 3.0.0
>
> Attachments: HIVE-17090.patch
>
>
> Checked a recent run of Hive QA and it doesn't look like qtests specified in 
> spark.only.query.files are being run.
> I think some modifications to ptest config files are required to get this 
> working - e.g. the deployed master-m2.properties file for ptest should 
> contain mainProperties.$\{spark.only.query.files} in the 
> qFileTest.miniSparkOnYarn.groups.normal key.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17122) spark_vectorized_dynamic_partition_pruning.q is continuously failing

2017-07-20 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094828#comment-16094828
 ] 

Vihang Karajgaonkar commented on HIVE-17122:


I will spend some time today to look into this and update if I find anything.

> spark_vectorized_dynamic_partition_pruning.q is continuously failing
> 
>
> Key: HIVE-17122
> URL: https://issues.apache.org/jira/browse/HIVE-17122
> Project: Hive
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> {code}
> Driver stacktrace:
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
> at scala.Option.foreach(Option.scala:257)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> Caused by: java.lang.RuntimeException: Hive Runtime Error while closing 
> operators: 1
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.close(SparkReduceRecordHandler.java:616)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.closeRecordProcessor(HiveReduceFunctionResultList.java:67)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:96)
> at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
> at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
> at org.apache.spark.scheduler.Task.run(Task.scala:85)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:107)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeReduceMergePartial.doProcessBatch(VectorGroupByOperator.java:832)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processBatch(VectorGroupByOperator.java:179)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.process(VectorGroupByOperator.java:1037)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.forwardBatch(SparkReduceRecordHandler.java:542)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.close(SparkReduceRecordHandler.java:584)
> ... 11 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17122) spark_vectorized_dynamic_partition_pruning.q is continuously failing

2017-07-20 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094823#comment-16094823
 ] 

Sahil Takiar commented on HIVE-17122:
-

Thanks for pointing that out [~kellyzly]. Sounds like we have been hitting 
this issue for a while. I tried debugging the code a bit but didn't find 
anything obvious. I will update this JIRA if I find something.

CC: [~vihangk1] if you have any idea on this, let us know.

> spark_vectorized_dynamic_partition_pruning.q is continuously failing
> 
>
> Key: HIVE-17122
> URL: https://issues.apache.org/jira/browse/HIVE-17122
> Project: Hive
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> {code}
> Driver stacktrace:
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
> at scala.Option.foreach(Option.scala:257)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> Caused by: java.lang.RuntimeException: Hive Runtime Error while closing 
> operators: 1
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.close(SparkReduceRecordHandler.java:616)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.closeRecordProcessor(HiveReduceFunctionResultList.java:67)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:96)
> at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
> at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
> at org.apache.spark.scheduler.Task.run(Task.scala:85)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:107)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeReduceMergePartial.doProcessBatch(VectorGroupByOperator.java:832)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processBatch(VectorGroupByOperator.java:179)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.process(VectorGroupByOperator.java:1037)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.forwardBatch(SparkReduceRecordHandler.java:542)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.close(SparkReduceRecordHandler.java:584)
> ... 11 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16787) Fix itests in branch-2.2

2017-07-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094815#comment-16094815
 ] 

Alan Gates commented on HIVE-16787:
---

+1

> Fix itests in branch-2.2
> 
>
> Key: HIVE-16787
> URL: https://issues.apache.org/jira/browse/HIVE-16787
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
>
> The itests are broken in branch 2.2 and need to be fixed before release.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17034) The spark tar for itests is downloaded every time if md5sum is not installed

2017-07-20 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094726#comment-16094726
 ] 

Zoltan Haindrich commented on HIVE-17034:
-

[~lirui] I think the new solution will never download anything if md5sum is 
not installed. It would be better to fail the build in that case - I guess 
that currently, someone who doesn't have md5sum accessible in the path will 
end up with cryptic messages about Spark not being found.

> The spark tar for itests is downloaded every time if md5sum is not installed
> 
>
> Key: HIVE-17034
> URL: https://issues.apache.org/jira/browse/HIVE-17034
> Project: Hive
>  Issue Type: Test
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HIVE-17034.1.patch
>
>
> I think we should either skip verifying the md5, or fail the build to let 
> the developer know that md5sum is required.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16222) add a setting to disable row.serde for specific formats; enable for others

2017-07-20 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094705#comment-16094705
 ] 

Daniel Voros commented on HIVE-16222:
-

[~sershe], [~leftylev] is right, this hasn't been committed yet.

> add a setting to disable row.serde for specific formats; enable for others
> --
>
> Key: HIVE-16222
> URL: https://issues.apache.org/jira/browse/HIVE-16222
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-16222.01.patch, HIVE-16222.02.patch, 
> HIVE-16222.03.patch, HIVE-16222.04.patch, HIVE-16222.05.patch, 
> HIVE-16222.patch
>
>
> Per [~gopalv]
> {quote}
> row.serde = true ... breaks Parquet (they expect to get the same object back, 
> which means you can't buffer 1024 rows).
> {quote}
> We want to enable this and vector.serde for text vectorization. Need to turn 
> it off for specific formats.
>  
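
A small self-contained illustration of the object-reuse problem described in 
the quote (plain Java, not Hive or Parquet code): if a reader hands back the 
same mutable object for every row, buffering rows only buffers references to 
one object, and every buffered "row" ends up equal to the last one read.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
  static class Row { int value; }

  public static void main(String[] args) {
    Row reused = new Row();               // the reader reuses one object
    List<Row> buffer = new ArrayList<>();
    for (int i = 0; i < 3; i++) {
      reused.value = i;                   // "read" the next row in place
      buffer.add(reused);                 // only the reference is buffered
    }
    for (Row r : buffer) {
      System.out.println(r.value);        // prints 2, 2, 2 -- not 0, 1, 2
    }
  }
}
{code}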



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Reopened] (HIVE-16222) add a setting to disable row.serde for specific formats; enable for others

2017-07-20 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reopened HIVE-16222:
-

> add a setting to disable row.serde for specific formats; enable for others
> --
>
> Key: HIVE-16222
> URL: https://issues.apache.org/jira/browse/HIVE-16222
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-16222.01.patch, HIVE-16222.02.patch, 
> HIVE-16222.03.patch, HIVE-16222.04.patch, HIVE-16222.05.patch, 
> HIVE-16222.patch
>
>
> Per [~gopalv]
> {quote}
> row.serde = true ... breaks Parquet (they expect to get the same object back, 
> which means you can't buffer 1024 rows).
> {quote}
> We want to enable this and vector.serde for text vectorization. Need to turn 
> it off for specific formats.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17114) HoS: Possible skew in shuffling when data is not really skewed

2017-07-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094640#comment-16094640
 ] 

Hive QA commented on HIVE-17114:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12878159/HIVE-17114.3.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11087 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=240)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6096/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6096/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6096/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12878159 - PreCommit-HIVE-Build

> HoS: Possible skew in shuffling when data is not really skewed
> --
>
> Key: HIVE-17114
> URL: https://issues.apache.org/jira/browse/HIVE-17114
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Minor
> Attachments: HIVE-17114.1.patch, HIVE-17114.2.patch, 
> HIVE-17114.3.patch
>
>
> Observed in HoS and may apply to other engines as well.
> When we join 2 tables on a single int key, we use the key itself as hash code 
> in {{ObjectInspectorUtils.hashCode}}:
> {code}
>   case INT:
> return ((IntObjectInspector) poi).get(o);
> {code}
> Suppose the keys are all different but are all multiples of 10, and we 
> choose 10 as the number of reducers; then the shuffle will be skewed.
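
A self-contained demonstration of the skew (the modulo step below mirrors 
Hadoop-style hash partitioning; this is an illustration, not Hive code): with 
the int key used directly as the hash code, every multiple of 10 lands in 
reducer 0 of 10.

{code:java}
public class SkewDemo {
  public static void main(String[] args) {
    int numReducers = 10;
    int[] counts = new int[numReducers];
    for (int key = 10; key <= 1000; key += 10) {
      int hashCode = key;  // the key itself is the hash code for INT
      int reducer = (hashCode & Integer.MAX_VALUE) % numReducers;
      counts[reducer]++;
    }
    for (int r = 0; r < numReducers; r++) {
      System.out.println("reducer " + r + ": " + counts[r] + " keys");
    }
    // reducer 0 gets all 100 keys; reducers 1..9 get none.
  }
}
{code}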



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17037) Use 1-to-1 Tez edge to avoid unnecessary input data shuffle

2017-07-20 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17037:
---
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master, thanks [~ashutoshc]!

> Use 1-to-1 Tez edge to avoid unnecessary input data shuffle
> ---
>
> Key: HIVE-17037
> URL: https://issues.apache.org/jira/browse/HIVE-17037
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0
>
> Attachments: HIVE-17037.01.patch, HIVE-17037.02.patch, 
> HIVE-17037.03.patch, HIVE-17037.patch
>
>
> As an example, consider the following query:
> {code:sql}
> SELECT *
> FROM (
>   SELECT a.value
>   FROM src1 a
>   JOIN src1 b
>   ON (a.value = b.value)
>   GROUP BY a.value
> ) a
> JOIN src
> ON (a.value = src.value);
> {code}
> Currently, the plan generated for Tez will contain an unnecessary shuffle 
> operation between the subquery and the join, since the records produced by 
> the subquery are already sorted by the value.
> This issue is to extend join algorithm selection to be able to shuffle only 
> some of the inputs for a given join and avoid unnecessary shuffle operations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16945) Add method to compare Operators

2017-07-20 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094596#comment-16094596
 ] 

Jesus Camacho Rodriguez commented on HIVE-16945:


[~lirui], sure, that would be great! I can review it once it is ready.

> Add method to compare Operators 
> 
>
> Key: HIVE-16945
> URL: https://issues.apache.org/jira/browse/HIVE-16945
> Project: Hive
>  Issue Type: Improvement
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>
> HIVE-10844 introduced a comparator factory class for operators that 
> encapsulates all the logic to assess whether two operators are equal:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java
> The current design might create problems as any change in fields of operators 
> will break the comparators. It would be better to do this via inheritance 
> from Operator base class, by adding a {{logicalEquals(Operator other)}} 
> method.
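
A minimal sketch of the inheritance-based design, using simplified stand-in 
types rather than Hive's actual Operator class: the base class covers what is 
common to all operators, and each subclass extends logicalEquals with its own 
fields, so adding a field to an operator only requires touching that operator.

{code:java}
abstract class Op {
  public boolean logicalEquals(Op other) {
    // Checks common to all operators go here.
    return other != null && getClass() == other.getClass();
  }
}

class FilterOp extends Op {
  private final String predicate;

  FilterOp(String predicate) {
    this.predicate = predicate;
  }

  @Override
  public boolean logicalEquals(Op other) {
    // Extend the base check with this operator's own configuration.
    return super.logicalEquals(other)
        && predicate.equals(((FilterOp) other).predicate);
  }
}
{code}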



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-16945) Add method to compare Operators

2017-07-20 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094549#comment-16094549
 ] 

Jesus Camacho Rodriguez edited comment on HIVE-16945 at 7/20/17 12:13 PM:
--

[~lirui], thanks for the feedback. I had not started working on this issue yet, 
I created the placeholder to keep it in mind.

I guess overriding equals/hashCode would create a bunch of issues, since the 
codebase has relied on identity comparison for Operator objects, so I would 
indeed create a new method.
I agree with you that {{compareTo}} is not a good name for it, 
{{logicalEquals}} would be better. I have changed it in the description.


was (Author: jcamachorodriguez):
[~lirui], thanks for the feedback. I had not started working on this issue yet, 
I created the placeholder to keep it in mind.

I guess overriding equals/hashCode would create a bunch of issues since 
codebase has relied on identity comparison for this objects, thus I would 
create a new method indeed.
I agree with you that {{compareTo}} is not a good name for it, 
{{logicalEquals}} would be better. I have changed it in the description.

> Add method to compare Operators 
> 
>
> Key: HIVE-16945
> URL: https://issues.apache.org/jira/browse/HIVE-16945
> Project: Hive
>  Issue Type: Improvement
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>
> HIVE-10844 introduced a comparator factory class for operators that 
> encapsulates all the logic to assess whether two operators are equal:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java
> The current design might create problems as any change in fields of operators 
> will break the comparators. It would be better to do this via inheritance 
> from Operator base class, by adding a {{logicalEquals(Operator other)}} 
> method.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16945) Add method to compare Operators

2017-07-20 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li reassigned HIVE-16945:
-

Assignee: Rui Li

> Add method to compare Operators 
> 
>
> Key: HIVE-16945
> URL: https://issues.apache.org/jira/browse/HIVE-16945
> Project: Hive
>  Issue Type: Improvement
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Rui Li
>
> HIVE-10844 introduced a comparator factory class for operators that 
> encapsulates all the logic to assess whether two operators are equal:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java
> The current design might create problems as any change in fields of operators 
> will break the comparators. It would be better to do this via inheritance 
> from Operator base class, by adding a {{logicalEquals(Operator other)}} 
> method.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-20 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094597#comment-16094597
 ] 

Jesus Camacho Rodriguez commented on HIVE-16996:


[~pxiong], I am seeing the following warning when I compile the project:
{noformat}
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hive:hive-exec:jar:3.0.0-SNAPSHOT
[WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must 
be unique: javolution:javolution:jar -> duplicate declaration of version 
${javolution.version} @ org.apache.hive:hive-exec:[unknown-version], 
/grid/5/dev/jcamachorodriguez/dist/tez-autobuild/hive/ql/pom.xml, line 366, 
column 17
{noformat}
Might be related to this change? Thanks

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch, HIVE-16966.04.patch, HIVE-16966.05.patch, 
> HIVE-16966.06.patch, HIVE-16966.07.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16945) Add method to compare Operators

2017-07-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094594#comment-16094594
 ] 

Rui Li commented on HIVE-16945:
---

Thanks [~jcamachorodriguez] for the explanations. I think I can help. Would you 
mind if I work on it?

> Add method to compare Operators 
> 
>
> Key: HIVE-16945
> URL: https://issues.apache.org/jira/browse/HIVE-16945
> Project: Hive
>  Issue Type: Improvement
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>
> HIVE-10844 introduced a comparator factory class for operators that 
> encapsulates all the logic to assess whether two operators are equal:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java
> The current design might create problems as any change in fields of operators 
> will break the comparators. It would be better to do this via inheritance 
> from Operator base class, by adding a {{logicalEquals(Operator other)}} 
> method.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17114) HoS: Possible skew in shuffling when data is not really skewed

2017-07-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094588#comment-16094588
 ] 

Rui Li commented on HIVE-17114:
---

{{constprog_semijoin}} needs its query results sorted; the other failures are 
not related. Updated patch v3 to address it.
[~xuefuz], [~csun], would you mind taking a look? Thanks.
The idea is to set the UNIFORM trait on the RS in 
{{SetSparkReducerParallelism}} when the number of reducers is decided 
automatically. Most of the code change is refactoring to make it more concise.

> HoS: Possible skew in shuffling when data is not really skewed
> --
>
> Key: HIVE-17114
> URL: https://issues.apache.org/jira/browse/HIVE-17114
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Minor
> Attachments: HIVE-17114.1.patch, HIVE-17114.2.patch, 
> HIVE-17114.3.patch
>
>
> Observed in HoS and may apply to other engines as well.
> When we join 2 tables on a single int key, we use the key itself as hash code 
> in {{ObjectInspectorUtils.hashCode}}:
> {code}
>   case INT:
> return ((IntObjectInspector) poi).get(o);
> {code}
> Suppose the keys are all different but are all multiples of 10, and we 
> choose 10 as the number of reducers; then the shuffle will be skewed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16924) Support distinct in presence Gby

2017-07-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094576#comment-16094576
 ] 

Hive QA commented on HIVE-16924:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12878116/HIVE-16924.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 45 failed/errored test(s), 11088 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[decimal_3] (batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[decimal_5] (batchId=63)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[distinct_gby] 
(batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_duplicate_key] 
(batchId=6)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[having2] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_distinct] 
(batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_join] 
(batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_3] 
(batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_5] 
(batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_null_projection] 
(batchId=9)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_bulk] 
(batchId=98)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[global_limit] 
(batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_unionDistinct_2]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown3]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ptf] 
(batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[selectDistinctStar]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[smb_cache] 
(batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[unionDistinct_2]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_3]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_5]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_null_projection]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_ptf]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[unionDistinct_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[selectDistinctStarNeg_2]
 (batchId=89)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[udaf_invalid_place]
 (batchId=89)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[limit_pushdown] 
(batchId=130)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[mapjoin_distinct] 
(batchId=124)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_join] 
(batchId=109)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ptf] (batchId=108)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] 
(batchId=128)
org.apache.hadoop.hive.ql.parse.TestParseNegativeDriver.testCliDriver[wrong_distinct1]
 (batchId=239)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6095/testReport
Console output: 

[jira] [Updated] (HIVE-17114) HoS: Possible skew in shuffling when data is not really skewed

2017-07-20 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-17114:
--
Attachment: HIVE-17114.3.patch

> HoS: Possible skew in shuffling when data is not really skewed
> --
>
> Key: HIVE-17114
> URL: https://issues.apache.org/jira/browse/HIVE-17114
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Minor
> Attachments: HIVE-17114.1.patch, HIVE-17114.2.patch, 
> HIVE-17114.3.patch
>
>
> Observed in HoS and may apply to other engines as well.
> When we join 2 tables on a single int key, we use the key itself as hash code 
> in {{ObjectInspectorUtils.hashCode}}:
> {code}
>   case INT:
> return ((IntObjectInspector) poi).get(o);
> {code}
> Suppose the keys are all different but are all multiples of 10, and we 
> choose 10 as the number of reducers; then the shuffle will be skewed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

