[jira] [Created] (HIVE-23685) Removing user's extra resources when executing File Merge Task

2020-06-12 Thread Qiang.Kang (Jira)
Qiang.Kang created HIVE-23685:
-

 Summary: Removing user's extra resources when executing File Merge 
Task
 Key: HIVE-23685
 URL: https://issues.apache.org/jira/browse/HIVE-23685
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer, Query Planning
Reporter: Qiang.Kang
Assignee: Qiang.Kang


Hi, we found that the map containers of MapReduce file merge jobs download the 
user's extra resources (such as added jars, files, and archives) before 
launching the task. When these resources are large or the network is busy, file 
merge jobs time out, causing the query to fail. A file merge task runs 
correctly with just hive-exec.jar and the MapReduce framework, so there is no 
need to download the user's resources. The patch below prevents setting 
`tmpjars` for the FileMerge task.
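
To illustrate the idea only, here is a minimal Python model of the behaviour 
described above. It is not the Hive patch (which is Java); `build_job_conf` and 
`is_file_merge` are hypothetical names, and the job configuration is modelled 
as a plain dict. Only the `tmpjars` key comes from the description.

```python
def build_job_conf(base_conf: dict, is_file_merge: bool) -> dict:
    """Return a copy of the job configuration for one task.

    Hypothetical model: file merge tasks need nothing beyond hive-exec.jar
    and the MapReduce framework, so the list of user jars is not passed on.
    """
    conf = dict(base_conf)  # copy, so the session-level settings stay intact
    if is_file_merge:
        # Skip `tmpjars` so the containers have no user jars to download.
        conf.pop("tmpjars", None)
    return conf


session_conf = {"tmpjars": "file:///tmp/udfs.jar", "mapreduce.job.name": "q1"}
merge_conf = build_job_conf(session_conf, is_file_merge=True)
query_conf = build_job_conf(session_conf, is_file_merge=False)
```

In this sketch `merge_conf` has no `tmpjars` entry, while `query_conf` and the 
session-level dict keep it, so ordinary tasks still receive the user's jars.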



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Github PR Pre Commit Build Error

2020-06-12 Thread Zoltan Haindrich

Yeah, I also saw it a few days ago. I've already increased it, but it needs a 
Jenkins pod restart, so I'll do it over the weekend when nothing is running.
https://github.com/kgyrtkirk/hive-test-kube/blob/ae4bc1567051630f642d8c4c791f0fcb7ae38eef/htk-jenkins/entrypoint#L7


On 6/12/20 3:56 PM, David Mollitor wrote:

Hey Zoltan,

A build just failed with:

Timed out waiting for websocket connection. You should increase the value
of system property
org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator.websocketConnectionTimeout
currently set at 60 seconds

http://130.211.9.232/blue/organizations/jenkins/hive-precommit/detail/PR-1082/5/pipeline/94


Not sure if this needs to be increased.

Thanks.



Github PR Pre Commit Build Error

2020-06-12 Thread David Mollitor
Hey Zoltan,

A build just failed with:

Timed out waiting for websocket connection. You should increase the value
of system property
org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator.websocketConnectionTimeout
currently set at 60 seconds

http://130.211.9.232/blue/organizations/jenkins/hive-precommit/detail/PR-1082/5/pipeline/94


Not sure if this needs to be increased.

Thanks.


[jira] [Created] (HIVE-23684) Large underestimation in NDV stats when input and join cardinality ratio is big

2020-06-12 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-23684:
--

 Summary: Large underestimation in NDV stats when input and join 
cardinality ratio is big
 Key: HIVE-23684
 URL: https://issues.apache.org/jira/browse/HIVE-23684
 Project: Hive
  Issue Type: Bug
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


Large underestimations of NDV values may occur after a join operation since the 
current logic will decrease the original NDV values proportionally.

The 
[code|https://github.com/apache/hive/blob/1271d08a3c51c021fa710449f8748b8cdb12b70f/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2558]
 compares the number of rows of each relation before the join with the number 
of rows after the join and extracts a ratio for each side. Based on this ratio 
it adapts (reduces) the NDV accordingly.

Consider for instance the following query:
{code:sql}
select inv_warehouse_sk
 , inv_item_sk
 , stddev_samp(inv_quantity_on_hand) stdev
 , avg(inv_quantity_on_hand) mean
from inventory
   , date_dim
where inv_date_sk = d_date_sk
  and d_year = 1999
  and d_moy = 2
group by inv_warehouse_sk, inv_item_sk;
{code}
For the sake of the discussion, I outline below some relevant stats (from 
TPCDS30tb):
 T(inventory) = 1627857000
 T(date_dim) = 73049
 T(inventory JOIN date_dim[d_year=1999 AND d_moy=2]) = 24948000
 V(inventory, inv_date_sk) = 261
 V(inventory, inv_item_sk) = 42
 V(inventory, inv_warehouse_sk) = 27
 V(date_dim, d_date_sk) = 73049

For instance, in this query the join between inventory and date_dim has ~25M 
rows while inventory has ~1.6B rows, so the NDVs of the columns coming from 
inventory are reduced by a factor of ~100. We end up with 
V(JOIN, inv_item_sk) = ~6K while the real value is 231000.
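
The proportional scaling described above can be sketched in a few lines of 
Python. This is a simplified model, not the code in StatsRulesProcFactory: the 
function name `scale_ndv_proportionally`, the rounding up, and the floor of 1 
are assumptions made for illustration. The row counts are the T(...) values 
listed above; with them the exact ratio works out to about 65.

```python
import math


def scale_ndv_proportionally(ndv: int, input_rows: int, join_rows: int) -> int:
    """Shrink an input column's NDV by the join's row-count ratio.

    Simplified model: rounding up and the floor of 1 are assumptions.
    """
    if input_rows <= 0:
        return 0
    if join_rows >= input_rows:
        return ndv  # the join did not reduce the row count
    return max(1, math.ceil(ndv * join_rows / input_rows))


T_INVENTORY = 1_627_857_000  # T(inventory)
T_JOIN = 24_948_000          # T(inventory JOIN date_dim[...])

reduction_factor = T_INVENTORY / T_JOIN  # 65.25
scaled_date_ndv = scale_ndv_proportionally(261, T_INVENTORY, T_JOIN)
print(reduction_factor, scaled_date_ndv)  # prints 65.25 4
```

For the join column inv_date_sk the result (261 -> 4) is plausible, because 
the date filter really does remove most dates. The same factor is applied to 
every other inventory column as well, including inv_item_sk, which is where 
the underestimation reported above comes from.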





[jira] [Created] (HIVE-23683) Add queue time to compaction

2020-06-12 Thread Peter Vary (Jira)
Peter Vary created HIVE-23683:
-

 Summary: Add queue time to compaction
 Key: HIVE-23683
 URL: https://issues.apache.org/jira/browse/HIVE-23683
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Reporter: Peter Vary
Assignee: Peter Vary


It would be good to report to the user when the transaction is initiated. This 
info can be used when considering the health status of the compaction system.





[jira] [Created] (HIVE-23682) TestMetrics is flaky

2020-06-12 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-23682:
---

 Summary: TestMetrics is flaky
 Key: HIVE-23682
 URL: https://issues.apache.org/jira/browse/HIVE-23682
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich


http://34.66.156.144:8080/job/hive-precommit/job/master/31/testReport/





[jira] [Created] (HIVE-23681) TestTriggersMoveWorkloadManager is unstable

2020-06-12 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-23681:
---

 Summary: TestTriggersMoveWorkloadManager is unstable
 Key: HIVE-23681
 URL: https://issues.apache.org/jira/browse/HIVE-23681
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich


http://34.66.156.144:8080/job/hive-precommit/job/master/37/testReport/





[jira] [Created] (HIVE-23680) TestDbNotificationListener is unstable

2020-06-12 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-23680:
---

 Summary: TestDbNotificationListener is unstable
 Key: HIVE-23680
 URL: https://issues.apache.org/jira/browse/HIVE-23680
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich


http://34.66.156.144:8080/job/hive-precommit/job/master/35/testReport/
http://130.211.9.232/job/hive-flaky-check/24/





[jira] [Created] (HIVE-23679) TestSparkClient is flaky

2020-06-12 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-23679:
---

 Summary: TestSparkClient is flaky
 Key: HIVE-23679
 URL: https://issues.apache.org/jira/browse/HIVE-23679
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich


http://130.211.9.232/job/hive-precommit/job/master/34/testReport/





[jira] [Created] (HIVE-23678) Don't enforce ASF license headers on target files

2020-06-12 Thread Karen Coppage (Jira)
Karen Coppage created HIVE-23678:


 Summary: Don't enforce ASF license headers on target files
 Key: HIVE-23678
 URL: https://issues.apache.org/jira/browse/HIVE-23678
 Project: Hive
  Issue Type: Bug
Reporter: Karen Coppage
Assignee: Karen Coppage





