[jira] [Work logged] (HIVE-24244) NPE during Atlas metadata replication

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24244?focusedWorklogId=497768&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497768
 ]

ASF GitHub Bot logged work on HIVE-24244:
-

Author: ASF GitHub Bot
Created on: 09/Oct/20 05:50
Start Date: 09/Oct/20 05:50
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1563:
URL: https://github.com/apache/hive/pull/1563#discussion_r502202414



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/exec/repl/TestAtlasDumpTask.java
##
@@ -96,4 +105,13 @@ public void testAtlasDumpMetrics() throws Exception {
     Assert.assertTrue(eventDetailsCaptor
         .getAllValues().get(1).toString().contains("{\"dbName\":\"srcDB\",\"dumpEndTime\""));
   }
+
+  @Test
+  public void testAtlasRestClientBuilder() throws SemanticException, IOException {
+    mockStatic(UserGroupInformation.class);
+    when(UserGroupInformation.getLoginUser()).thenReturn(mock(UserGroupInformation.class));
+    AtlasRestClientBuilder atlasRestCleintBuilder = new AtlasRestClientBuilder("http://localhost:31000");
+    AtlasRestClient atlasClient = atlasRestCleintBuilder.getClient(conf);
+    Assert.assertTrue(atlasClient != null);

Review comment:
   HiveConf is mocked, so the hive-in-test flag is not present (hence false). 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497768)
Time Spent: 0.5h  (was: 20m)

> NPE during Atlas metadata replication
> -
>
> Key: HIVE-24244
> URL: https://issues.apache.org/jira/browse/HIVE-24244
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24244.01.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23860) Synchronize drop/modify functions across multiple HS2's

2020-10-08 Thread wenjun ma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wenjun ma reassigned HIVE-23860:


Assignee: wenjun ma  (was: Adesh Kumar Rao)

> Synchronize drop/modify functions across multiple HS2's
> ---
>
> Key: HIVE-23860
> URL: https://issues.apache.org/jira/browse/HIVE-23860
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Adesh Kumar Rao
>Assignee: wenjun ma
>Priority: Minor
>
> Unless reload function is run by connecting explicitly to every HS2, the 
> following do not happen automatically:
> 1) Dropping a function from one HS2 does not remove it from the other HS2's. 
> 2) Dropping a function and adding another one with the same name does not 
> modify the function in the other HS2's.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24244) NPE during Atlas metadata replication

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24244:
--
Labels: pull-request-available  (was: )

> NPE during Atlas metadata replication
> -
>
> Key: HIVE-24244
> URL: https://issues.apache.org/jira/browse/HIVE-24244
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24244.01.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24244) NPE during Atlas metadata replication

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24244?focusedWorklogId=497732&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497732
 ]

ASF GitHub Bot logged work on HIVE-24244:
-

Author: ASF GitHub Bot
Created on: 09/Oct/20 03:22
Start Date: 09/Oct/20 03:22
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1563:
URL: https://github.com/apache/hive/pull/1563#discussion_r502163365



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/exec/repl/TestAtlasDumpTask.java
##
@@ -96,4 +105,13 @@ public void testAtlasDumpMetrics() throws Exception {
     Assert.assertTrue(eventDetailsCaptor
         .getAllValues().get(1).toString().contains("{\"dbName\":\"srcDB\",\"dumpEndTime\""));
   }
+
+  @Test
+  public void testAtlasRestClientBuilder() throws SemanticException, IOException {
+    mockStatic(UserGroupInformation.class);
+    when(UserGroupInformation.getLoginUser()).thenReturn(mock(UserGroupInformation.class));
+    AtlasRestClientBuilder atlasRestCleintBuilder = new AtlasRestClientBuilder("http://localhost:31000");
+    AtlasRestClient atlasClient = atlasRestCleintBuilder.getClient(conf);
+    Assert.assertTrue(atlasClient != null);

Review comment:
   The hive-in-test-repl flag is set to true, so this will return a no-op 
client, which will never be null.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497732)
Time Spent: 20m  (was: 10m)

> NPE during Atlas metadata replication
> -
>
> Key: HIVE-24244
> URL: https://issues.apache.org/jira/browse/HIVE-24244
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
> Attachments: HIVE-24244.01.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23800) Add hooks when HiveServer2 stops due to OutOfMemoryError

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23800?focusedWorklogId=497721&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497721
 ]

ASF GitHub Bot logged work on HIVE-23800:
-

Author: ASF GitHub Bot
Created on: 09/Oct/20 03:01
Start Date: 09/Oct/20 03:01
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1205:
URL: https://github.com/apache/hive/pull/1205#discussion_r502158480



##
File path: ql/src/java/org/apache/hadoop/hive/ql/hooks/HookContext.java
##
@@ -45,7 +47,50 @@
 public class HookContext {
 
   static public enum HookType {
-    PRE_EXEC_HOOK, POST_EXEC_HOOK, ON_FAILURE_HOOK
+

Review comment:
   Checked on my test and production environments; hooks compiled against the 
old API can be reused with the new implementation without any changes.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/HookRunner.java
##
@@ -39,57 +36,27 @@
 import org.apache.hadoop.hive.ql.parse.HiveSemanticAnalyzerHook;
 import org.apache.hadoop.hive.ql.parse.HiveSemanticAnalyzerHookContext;
 import org.apache.hadoop.hive.ql.session.SessionState;
-import org.apache.hadoop.hive.ql.session.SessionState.LogHelper;
 import org.apache.hive.common.util.HiveStringUtils;
 
+import static org.apache.hadoop.hive.ql.hooks.HookContext.HookType.*;
+
 /**
  * Handles hook executions for {@link Driver}.
  */
 public class HookRunner {
 
   private static final String CLASS_NAME = Driver.class.getName();
   private final HiveConf conf;
-  private LogHelper console;
-  private List queryHooks = new ArrayList<>();
-  private List saHooks = new ArrayList<>();
-  private List driverRunHooks = new ArrayList<>();
-  private List preExecHooks = new ArrayList<>();
-  private List postExecHooks = new ArrayList<>();
-  private List onFailureHooks = new ArrayList<>();
-  private boolean initialized = false;
+  private final HooksLoader loader;

Review comment:
   Rename it to HiveHooks instead.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497721)
Time Spent: 5h 20m  (was: 5h 10m)

> Add hooks when HiveServer2 stops due to OutOfMemoryError
> 
>
> Key: HIVE-23800
> URL: https://issues.apache.org/jira/browse/HIVE-23800
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Make the OOM hook an interface of HiveServer2, so users can implement the 
> hook to do something before HS2 stops, such as dumping the heap or alerting 
> the devops team.
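The hook pattern proposed in the description can be sketched as a small registry of Runnables invoked once an OutOfMemoryError is observed. This is a hedged illustration only: the class and method names below (OomHookRegistry, register, runHooks) are hypothetical and are not Hive's actual API.

```java
// Illustrative sketch of the OOM-hook pattern: a registry of Runnable
// hooks that the server's error handler invokes before shutting down.
// Names are hypothetical, not taken from the Hive codebase.
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class OomHookRegistry {
    private final List<Runnable> hooks = new CopyOnWriteArrayList<>();

    public void register(Runnable hook) {
        hooks.add(hook);
    }

    // Called from the error handler once an OutOfMemoryError is caught.
    public void runHooks() {
        for (Runnable hook : hooks) {
            try {
                hook.run();   // e.g. dump the heap, alert the on-call team
            } catch (Throwable t) {
                // A failing hook must not prevent the remaining hooks
                // (or the shutdown itself) from running.
            }
        }
    }

    public static void main(String[] args) {
        OomHookRegistry registry = new OomHookRegistry();
        registry.register(() -> System.out.println("hook ran"));
        try {
            throw new OutOfMemoryError("simulated");
        } catch (OutOfMemoryError e) {
            registry.runHooks();
        }
    }
}
```

Swallowing Throwable inside the loop is deliberate here: once the JVM is already out of memory, one misbehaving hook should not block the rest.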



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23948) Improve Query Results Cache

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23948?focusedWorklogId=497683&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497683
 ]

ASF GitHub Bot logged work on HIVE-23948:
-

Author: ASF GitHub Bot
Created on: 09/Oct/20 00:53
Start Date: 09/Oct/20 00:53
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1335:
URL: https://github.com/apache/hive/pull/1335


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497683)
Time Spent: 50m  (was: 40m)

> Improve Query Results Cache
> ---
>
> Key: HIVE-23948
> URL: https://issues.apache.org/jira/browse/HIVE-23948
> Project: Hive
>  Issue Type: Improvement
>Reporter: Hunter Logan
>Assignee: Hunter Logan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Creating a Jira for this GitHub PR, which predates active GitHub usage:
> [https://github.com/apache/hive/pull/652]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21611) Date.getTime() can be changed to System.currentTimeMillis()

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21611?focusedWorklogId=497682&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497682
 ]

ASF GitHub Bot logged work on HIVE-21611:
-

Author: ASF GitHub Bot
Created on: 09/Oct/20 00:52
Start Date: 09/Oct/20 00:52
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1334:
URL: https://github.com/apache/hive/pull/1334


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497682)
Time Spent: 2h 20m  (was: 2h 10m)

> Date.getTime() can be changed to System.currentTimeMillis()
> ---
>
> Key: HIVE-21611
> URL: https://issues.apache.org/jira/browse/HIVE-21611
> Project: Hive
>  Issue Type: Bug
>Reporter: bd2019us
>Assignee: Hunter Logan
>Priority: Major
>  Labels: pull-request-available
> Attachments: 1.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Hello,
> I found that System.currentTimeMillis() can be used here instead of new 
> Date().getTime(), since new Date() is just a thin wrapper around the 
> lightweight method System.currentTimeMillis(). Performance suffers 
> noticeably if it is invoked too many times.
> According to my local testing in the same environment, 
> System.currentTimeMillis() achieves roughly a 5x speedup (435 ms vs 2073 
> ms) when the two methods are each invoked 5,000,000 times.
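The proposed change is mechanical. A minimal, self-contained sketch of the two variants (class and method names here are illustrative, not from the patch):

```java
// Illustrates the HIVE-21611 change: new Date().getTime() allocates a
// java.util.Date object only to read the clock, while
// System.currentTimeMillis() reads it directly with no allocation.
import java.util.Date;

public class TimestampExample {
    // Before: allocates a Date per call just to get the epoch millis.
    static long nowViaDate() {
        return new Date().getTime();
    }

    // After: same value, no allocation.
    static long nowDirect() {
        return System.currentTimeMillis();
    }

    public static void main(String[] args) {
        long a = nowViaDate();
        long b = nowDirect();
        // Both read the same clock; they can differ only by the time
        // elapsed between the two calls.
        System.out.println(Math.abs(b - a) < 1000);   // prints true
    }
}
```

Since the return values are identical, the swap is safe anywhere the Date object itself is not needed.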



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24242) Relax safety checks in SharedWorkOptimizer

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24242:
--
Labels: pull-request-available  (was: )

> Relax safety checks in SharedWorkOptimizer
> --
>
> Key: HIVE-24242
> URL: https://issues.apache.org/jira/browse/HIVE-24242
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are some checks to lock out problematic cases.
> For UnionOperator, see 
> [here|https://github.com/apache/hive/blob/1507d80fd47aad38b87bba4fd58c1427ba89dbbf/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java#L1571].
> This check could prevent the optimization even if the Union is visible from 
> only 1 of the TS ops.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24242) Relax safety checks in SharedWorkOptimizer

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24242?focusedWorklogId=497622&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497622
 ]

ASF GitHub Bot logged work on HIVE-24242:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 21:58
Start Date: 08/Oct/20 21:58
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk opened a new pull request #1564:
URL: https://github.com/apache/hive/pull/1564


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497622)
Remaining Estimate: 0h
Time Spent: 10m

> Relax safety checks in SharedWorkOptimizer
> --
>
> Key: HIVE-24242
> URL: https://issues.apache.org/jira/browse/HIVE-24242
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are some checks to lock out problematic cases.
> For UnionOperator, see 
> [here|https://github.com/apache/hive/blob/1507d80fd47aad38b87bba4fd58c1427ba89dbbf/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java#L1571].
> This check could prevent the optimization even if the Union is visible from 
> only 1 of the TS ops.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24245) Vectorized PTF with count and distinct over partition producing incorrect results.

2020-10-08 Thread Chiran Ravani (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chiran Ravani updated HIVE-24245:
-
Description: 
Vectorized PTF for count and distinct over partition is broken. It produces 
incorrect results.
Below is the test case.

{code}
CREATE TABLE bigd781b_new (
  id int,
  txt1 string,
  txt2 string,
  cda_date int,
  cda_job_name varchar(12));

INSERT INTO bigd781b_new VALUES 
  (1,'2010005759','7164335675012038',20200528,'load1'),
  (2,'2010005759','7164335675012038',20200528,'load2');
{code}

Running below query produces incorrect results

{code}
SELECT
txt1,
txt2,
count(distinct txt1) over(partition by txt1) as n,
count(distinct txt2) over(partition by txt2) as m
FROM bigd781b_new
{code}

as below.

{code}
+-------------+-------------------+----+----+
|    txt1     |       txt2        | n  | m  |
+-------------+-------------------+----+----+
| 2010005759  | 7164335675012038  | 2  | 2  |
| 2010005759  | 7164335675012038  | 2  | 2  |
+-------------+-------------------+----+----+
{code}

While the correct output would be

{code}
+-------------+-------------------+----+----+
|    txt1     |       txt2        | n  | m  |
+-------------+-------------------+----+----+
| 2010005759  | 7164335675012038  | 1  | 1  |
| 2010005759  | 7164335675012038  | 1  | 1  |
+-------------+-------------------+----+----+
{code}


The problem does not appear after setting the property below:
set hive.vectorized.execution.ptf.enabled=false;


  was:
Vectorized PTF for count and distinct over partition is broken. It produces 
incorrect results.
Below is the test case.

{code}
CREATE TABLE bigd781b_new (
  id int,
  txt1 string,
  txt2 string,
  cda_date int,
  cda_job_name varchar(12));

INSERT INTO bigd781b_new VALUES 
  (1,'2010005759','7164335675012038',20200528,'load1'),
  (2,'2010005759','7164335675012038',20200528,'load2');
{code}

Running below query produces incorrect results

{code}
SELECT
txt1,
txt2,
count(distinct txt1) over(partition by txt1) as n,
count(distinct txt2) over(partition by txt2) as m
FROM bigd781b_new
WHERE cda_date = 20200528 and ( txt2 = '7164335675012038');
{code}

as below.

{code}
+-------------+-------------------+----+----+
|    txt1     |       txt2        | n  | m  |
+-------------+-------------------+----+----+
| 2010005759  | 7164335675012038  | 2  | 2  |
| 2010005759  | 7164335675012038  | 2  | 2  |
+-------------+-------------------+----+----+
{code}

While the correct output would be

{code}
+-------------+-------------------+----+----+
|    txt1     |       txt2        | n  | m  |
+-------------+-------------------+----+----+
| 2010005759  | 7164335675012038  | 1  | 1  |
| 2010005759  | 7164335675012038  | 1  | 1  |
+-------------+-------------------+----+----+
{code}


The problem does not appear after setting the property below:
set hive.vectorized.execution.ptf.enabled=false;



> Vectorized PTF with count and distinct over partition producing incorrect 
> results.
> --
>
> Key: HIVE-24245
> URL: https://issues.apache.org/jira/browse/HIVE-24245
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, PTF-Windowing, Vectorization
>Affects Versions: 3.1.0, 3.1.2
>Reporter: Chiran Ravani
>Priority: Critical
>
> Vectorized PTF for count and distinct over partition is broken. It produces 
> incorrect results.
> Below is the test case.
> {code}
> CREATE TABLE bigd781b_new (
>   id int,
>   txt1 string,
>   txt2 string,
>   cda_date int,
>   cda_job_name varchar(12));
> INSERT INTO bigd781b_new VALUES 
>   (1,'2010005759','7164335675012038',20200528,'load1'),
>   (2,'2010005759','7164335675012038',20200528,'load2');
> {code}
> Running below query produces incorrect results
> {code}
> SELECT
> txt1,
> txt2,
> count(distinct txt1) over(partition by txt1) as n,
> count(distinct txt2) over(partition by txt2) as m
> FROM bigd781b_new
> {code}
> as below.
> {code}
> +-------------+-------------------+----+----+
> |    txt1     |       txt2        | n  | m  |
> +-------------+-------------------+----+----+
> | 2010005759  | 7164335675012038  | 2  | 2  |
> | 2010005759  | 7164335675012038  | 2  | 2  |
> +-------------+-------------------+----+----+
> {code}
> While the correct output would be
> {code}
> +-------------+-------------------+----+----+
> |    txt1     |       txt2        | n  | m  |
> +-------------+-------------------+----+----+
> | 2010005759  | 7164335675012038  | 1  | 1  |
> | 2010005759  | 7164335675012038  | 1  | 1  |
> +-------------+-------------------+----+----+
> {code}
> The problem does not appear after setting the property below:
> set hive.vectorized.execution.ptf.enabled=false;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24244) NPE during Atlas metadata replication

2020-10-08 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-24244:

Attachment: HIVE-24244.01.patch

> NPE during Atlas metadata replication
> -
>
> Key: HIVE-24244
> URL: https://issues.apache.org/jira/browse/HIVE-24244
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
> Attachments: HIVE-24244.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24244) NPE during Atlas metadata replication

2020-10-08 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-24244:

Issue Type: Bug  (was: Improvement)

> NPE during Atlas metadata replication
> -
>
> Key: HIVE-24244
> URL: https://issues.apache.org/jira/browse/HIVE-24244
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24244) NPE during Atlas metadata replication

2020-10-08 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-24244:

Labels:   (was: pull-request-available)
Status: Patch Available  (was: In Progress)

> NPE during Atlas metadata replication
> -
>
> Key: HIVE-24244
> URL: https://issues.apache.org/jira/browse/HIVE-24244
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24244) NPE during Atlas metadata replication

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24244:
--
Labels: pull-request-available  (was: )

> NPE during Atlas metadata replication
> -
>
> Key: HIVE-24244
> URL: https://issues.apache.org/jira/browse/HIVE-24244
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24244) NPE during Atlas metadata replication

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24244?focusedWorklogId=497561&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497561
 ]

ASF GitHub Bot logged work on HIVE-24244:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 20:15
Start Date: 08/Oct/20 20:15
Worklog Time Spent: 10m 
  Work Description: pkumarsinha opened a new pull request #1563:
URL: https://github.com/apache/hive/pull/1563


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497561)
Remaining Estimate: 0h
Time Spent: 10m

> NPE during Atlas metadata replication
> -
>
> Key: HIVE-24244
> URL: https://issues.apache.org/jira/browse/HIVE-24244
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24244) NPE during Atlas metadata replication

2020-10-08 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24244 started by Pravin Sinha.
---
> NPE during Atlas metadata replication
> -
>
> Key: HIVE-24244
> URL: https://issues.apache.org/jira/browse/HIVE-24244
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24244) NPE during Atlas metadata replication

2020-10-08 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha reassigned HIVE-24244:
---


> NPE during Atlas metadata replication
> -
>
> Key: HIVE-24244
> URL: https://issues.apache.org/jira/browse/HIVE-24244
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497551&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497551
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 19:54
Start Date: 08/Oct/20 19:54
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501883427



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##
@@ -107,11 +107,12 @@ public CompactionTxnHandler() {
     // Check for aborted txns: number of aborted txns past threshold and age of aborted txns
     // past time threshold
     boolean checkAbortedTimeThreshold = abortedTimeThreshold >= 0;
-    final String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\","
-        + "MIN(\"TXN_STARTED\"), COUNT(*)"
+    String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\", "
+        + "MIN(\"TXN_STARTED\"), COUNT(*), "
+        + "MAX(CASE WHEN \"TC_OPERATION_TYPE\" = " + OperationType.DYNPART + " THEN 1 ELSE 0 END) AS \"IS_DP\" "

Review comment:
   I still don't follow. Aborted txn check is done per db/table/partition, 
so if you have db1/tbl1/p1/type=NOT_DP and db1/tbl1/null/type=DP - that should 
generate 2 entries in potential compactions.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497551)
Time Spent: 10h 20m  (was: 10h 10m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add the 
> corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and aborted, it must generate jobs 
> for the worker for every possible partition available.
> cc [~ewohlstadter]
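The two bullets above can be sketched as a small in-memory model. All names below are illustrative; the real implementation works against the metastore's TXN_COMPONENTS table, not a HashMap.

```java
// Hedged sketch of the marker scheme proposed for HIVE-21052: openTxn
// writes a special marker entry, addPartitions swaps the marker for real
// partition entries, and the cleaner treats a surviving marker on an
// aborted txn as "consider every possible partition".
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TxnComponentsSketch {
    static final String MARKER = "__OPEN_TXN_MARKER__";

    // txnId -> component entries (stand-in for TXN_COMPONENTS rows)
    private final Map<Long, List<String>> txnComponents = new HashMap<>();

    void openTxn(long txnId) {
        // Record the marker immediately, before any data is written.
        txnComponents.put(txnId, new ArrayList<>(List.of(MARKER)));
    }

    void addPartitions(long txnId, List<String> partitions) {
        List<String> entries = txnComponents.get(txnId);
        entries.remove(MARKER);       // marker replaced by real entries
        entries.addAll(partitions);
    }

    // Cleaner side: a leftover marker means the txn aborted before
    // addPartitions ran, so cleaning cannot be skipped.
    boolean mustCleanAllPartitions(long abortedTxnId) {
        return txnComponents.getOrDefault(abortedTxnId, List.of())
                            .contains(MARKER);
    }
}
```

The point of the marker is that an abort between openTxn and addPartitions no longer looks like an empty transaction: the marker row survives and forces the cleaner to act.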



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497462&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497462
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 17:35
Start Date: 08/Oct/20 17:35
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501884660



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##
@@ -107,11 +107,12 @@ public CompactionTxnHandler() {
     // Check for aborted txns: number of aborted txns past threshold and age of aborted txns
     // past time threshold
     boolean checkAbortedTimeThreshold = abortedTimeThreshold >= 0;
-    final String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\","
-        + "MIN(\"TXN_STARTED\"), COUNT(*)"
+    String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\", "
+        + "MIN(\"TXN_STARTED\"), COUNT(*), "
+        + "MAX(CASE WHEN \"TC_OPERATION_TYPE\" = " + OperationType.DYNPART + " THEN 1 ELSE 0 END) AS \"IS_DP\" "

Review comment:
   oh, sorry, I only considered time based threshold for DYN_PART





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497462)
Time Spent: 10h 10m  (was: 10h)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add the 
> corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and aborted, it must generate jobs 
> for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497455&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497455
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 17:19
Start Date: 08/Oct/20 17:19
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501884660



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##
@@ -107,11 +107,12 @@ public CompactionTxnHandler() {
 // Check for aborted txns: number of aborted txns past threshold and age of aborted txns
 // past time threshold
 boolean checkAbortedTimeThreshold = abortedTimeThreshold >= 0;
-final String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\","
-+ "MIN(\"TXN_STARTED\"), COUNT(*)"
+String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\", "
++ "MIN(\"TXN_STARTED\"), COUNT(*), "
++ "MAX(CASE WHEN \"TC_OPERATION_TYPE\" = " + OperationType.DYNPART + " THEN 1 ELSE 0 END) AS \"IS_DP\" "

Review comment:
   oh, sorry, I only considered time based threshold





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497455)
Time Spent: 10h  (was: 9h 50m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and, 
> when addPartitions is called, removing this entry from TXN_COMPONENTS and 
> adding the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]





[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497453
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 17:17
Start Date: 08/Oct/20 17:17
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501883427



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##
@@ -107,11 +107,12 @@ public CompactionTxnHandler() {
 // Check for aborted txns: number of aborted txns past threshold and age of aborted txns
 // past time threshold
 boolean checkAbortedTimeThreshold = abortedTimeThreshold >= 0;
-final String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\","
-+ "MIN(\"TXN_STARTED\"), COUNT(*)"
+String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\", "
++ "MIN(\"TXN_STARTED\"), COUNT(*), "
++ "MAX(CASE WHEN \"TC_OPERATION_TYPE\" = " + OperationType.DYNPART + " THEN 1 ELSE 0 END) AS \"IS_DP\" "

Review comment:
   I still don't follow. Aborted txn check is done per db/table/partition, 
so if you have db1/tbl1/{p1-p100}/type=NOT_DP and db1/tbl1/null/type=DP - that 
should generate 2 entries in potential compactions.
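The MAX(CASE WHEN …) column being discussed only flags whether any component in a (db, table, partition) group came from a dynamic-partition write. A minimal sketch of that aggregation, assuming the operation-type letter 'p' stands for DYNPART (an assumption for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class IsDpFlagSketch {
    record Comp(String db, String table, String partition, char opType) {}

    // Emulates: GROUP BY TC_DATABASE, TC_TABLE, TC_PARTITION with
    // MAX(CASE WHEN TC_OPERATION_TYPE = DYNPART THEN 1 ELSE 0 END) AS IS_DP
    static Map<List<String>, Integer> isDpByGroup(List<Comp> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                c -> Arrays.asList(c.db(), c.table(), c.partition()),
                Collectors.reducing(0, c -> c.opType() == 'p' ? 1 : 0, Integer::max)));
    }

    public static void main(String[] args) {
        List<Comp> rows = List.of(
                new Comp("db1", "tbl1", "p1", 'i'),   // normal insert component
                new Comp("db1", "tbl1", "p1", 'p'),   // dynamic-partition component
                new Comp("db1", "tbl1", "p2", 'i'));
        System.out.println(isDpByGroup(rows));
    }
}
```

Each group whose IS_DP resolves to 1 would be treated as a dynamic-partition candidate by the initiator; groups with only normal components resolve to 0.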







Issue Time Tracking
---

Worklog Id: (was: 497453)
Time Spent: 9h 50m  (was: 9h 40m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and, 
> when addPartitions is called, removing this entry from TXN_COMPONENTS and 
> adding the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]





[jira] [Commented] (HIVE-24197) Check for write transactions for the db under replication at a frequent interval

2020-10-08 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210316#comment-17210316
 ] 

Pravin Sinha commented on HIVE-24197:
-

+1

> Check for write transactions for the db under replication at a frequent 
> interval
> 
>
> Key: HIVE-24197
> URL: https://issues.apache.org/jira/browse/HIVE-24197
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24197.01.patch, HIVE-24197.02.patch, 
> HIVE-24197.03.patch, HIVE-24197.04.patch, HIVE-24197.05.patch, 
> HIVE-24197.06.patch, HIVE-24197.07.patch, HIVE-24197.08.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24236) Connection leak in TxnHandler

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24236?focusedWorklogId=497434&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497434
 ]

ASF GitHub Bot logged work on HIVE-24236:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 16:26
Start Date: 08/Oct/20 16:26
Worklog Time Spent: 10m 
  Work Description: yongzhi merged pull request #1559:
URL: https://github.com/apache/hive/pull/1559


   





Issue Time Tracking
---

Worklog Id: (was: 497434)
Time Spent: 1h 20m  (was: 1h 10m)

> Connection leak in TxnHandler
> -
>
> Key: HIVE-24236
> URL: https://issues.apache.org/jira/browse/HIVE-24236
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We see failures in QE tests with "cannot allocate connection" errors. The 
> exception stack looks like the following:
> {noformat}
> 2020-09-29T18:44:26,563 INFO  [Heartbeater-0]: txn.TxnHandler 
> (TxnHandler.java:checkRetryable(3733)) - Non-retryable error in 
> heartbeat(HeartbeatRequest(lockid:0, txnid:11908)) : Cannot get a connection, 
> general error (SQLState=null, ErrorCode=0)
> 2020-09-29T18:44:26,564 ERROR [Heartbeater-0]: metastore.RetryingHMSHandler 
> (RetryingHMSHandler.java:invokeInternal(201)) - MetaException(message:Unable 
> to select from transaction database 
> org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, general 
> error
> at 
> org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:118)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3605)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3598)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:2739)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:8452)
> at sun.reflect.GeneratedMethodAccessor415.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
> at com.sun.proxy.$Proxy63.heartbeat(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:3247)
> at sun.reflect.GeneratedMethodAccessor414.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:213)
> at com.sun.proxy.$Proxy64.heartbeat(Unknown Source)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:671)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.lambda$run$0(DbTxnManager.java:1102)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.run(DbTxnManager.java:1101)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.InterruptedException
> at java.lang.Object.wait(Native Method)
> at 
> 
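The usual guard against this class of leak is to scope the connection in try-with-resources so it is returned to the pool on every exit path, including the interrupted one. A toy sketch with a fake connection that counts open handles (a generic pattern, not the actual HIVE-24236 patch):

```java
public class LeakGuardSketch {
    static int open = 0;

    // Stand-in for a pooled JDBC connection.
    static class Conn implements AutoCloseable {
        Conn() { open++; }
        void use(boolean fail) {
            // Simulates the heartbeat call failing mid-operation.
            if (fail) throw new IllegalStateException("interrupted");
        }
        @Override public void close() { open--; }
    }

    // try-with-resources guarantees close() even when use() throws,
    // which is exactly what a leaky path is missing.
    static void heartbeat(boolean fail) {
        try (Conn c = new Conn()) {
            c.use(fail);
        } catch (IllegalStateException e) {
            // swallowed for the demo; the pool still gets its connection back
        }
    }

    public static void main(String[] args) {
        heartbeat(false);
        heartbeat(true);
        System.out.println("open connections after both calls: " + open); // prints 0
    }
}
```

If the connection were acquired outside the try block and an exception or interrupt fired before the close, `open` would stay above zero, which is the pool-exhaustion symptom shown in the stack trace above.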

[jira] [Issue Comment Deleted] (HIVE-24197) Check for write transactions for the db under replication at a frequent interval

2020-10-08 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-24197:

Comment: was deleted

(was: +1)

> Check for write transactions for the db under replication at a frequent 
> interval
> 
>
> Key: HIVE-24197
> URL: https://issues.apache.org/jira/browse/HIVE-24197
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24197.01.patch, HIVE-24197.02.patch, 
> HIVE-24197.03.patch, HIVE-24197.04.patch, HIVE-24197.05.patch, 
> HIVE-24197.06.patch, HIVE-24197.07.patch, HIVE-24197.08.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24241) Enable SharedWorkOptimizer to merge downstream operators after an optimization step

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24241?focusedWorklogId=497409&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497409
 ]

ASF GitHub Bot logged work on HIVE-24241:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 15:27
Start Date: 08/Oct/20 15:27
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk opened a new pull request #1562:
URL: https://github.com/apache/hive/pull/1562


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





Issue Time Tracking
---

Worklog Id: (was: 497409)
Remaining Estimate: 0h
Time Spent: 10m

> Enable SharedWorkOptimizer to merge downstream operators after an 
> optimization step
> ---
>
> Key: HIVE-24241
> URL: https://issues.apache.org/jira/browse/HIVE-24241
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-24241) Enable SharedWorkOptimizer to merge downstream operators after an optimization step

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24241:
--
Labels: pull-request-available  (was: )

> Enable SharedWorkOptimizer to merge downstream operators after an 
> optimization step
> ---
>
> Key: HIVE-24241
> URL: https://issues.apache.org/jira/browse/HIVE-24241
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24224) Fix skipping header/footer for Hive on Tez on compressed files

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24224?focusedWorklogId=497387&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497387
 ]

ASF GitHub Bot logged work on HIVE-24224:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 14:55
Start Date: 08/Oct/20 14:55
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk closed pull request #1546:
URL: https://github.com/apache/hive/pull/1546


   





Issue Time Tracking
---

Worklog Id: (was: 497387)
Time Spent: 40m  (was: 0.5h)

> Fix skipping header/footer for Hive on Tez on compressed files
> --
>
> Key: HIVE-24224
> URL: https://issues.apache.org/jira/browse/HIVE-24224
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> A compressed file with Hive on Tez returns header and footer rows for both 
> select * and select count(*):
> {noformat}
> printf "offset,id,other\n9,\"20200315 X00 1356\",123\n17,\"20200315 X00 
> 1357\",123\nrst,rst,rst" > data.csv
> hdfs dfs -put -f data.csv /apps/hive/warehouse/bz2test/bz2tbl1/
> bzip2 -f data.csv 
> hdfs dfs -put -f data.csv.bz2 /apps/hive/warehouse/bz2test/bz2tbl2/
> beeline -e "CREATE EXTERNAL TABLE default.bz2tst2 (
>   sequence   int,
>   id string,
>   other  string) 
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
> LOCATION '/apps/hive/warehouse/bz2test/bz2tbl2' 
> TBLPROPERTIES (
>   'skip.header.line.count'='1',
>   'skip.footer.line.count'='1');"
> beeline -e "
>   SET hive.fetch.task.conversion = none;
>   SELECT * FROM default.bz2tst2;"
> +---+++
> | bz2tst2.sequence  | bz2tst2.id | bz2tst2.other  |
> +---+++
> | offset| id | other  |
> | 9 | 20200315 X00 1356  | 123|
> | 17| 20200315 X00 1357  | 123|
> | rst   | rst| rst|
> +---+++
> {noformat}
> PS: HIVE-22769 addressed the issue for Hive on LLAP.
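The underlying technique is line-level skipping over the whole decompressed stream, since a compressed file is a single unsplittable split. A generic sketch (not Hive's actual record-reader fix) that drops N header lines immediately and withholds the last M lines with a small buffer:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class SkipHeaderFooterSketch {
    // Skip `header` leading lines; buffer `footer` lines so the trailing
    // ones are never emitted. Works only when the reader sees the whole file,
    // which is the case for unsplittable compressed inputs.
    static List<String> readSkipping(BufferedReader r, int header, int footer) throws IOException {
        List<String> out = new ArrayList<>();
        Deque<String> tail = new ArrayDeque<>(); // candidate footer lines
        String line;
        int skipped = 0;
        while ((line = r.readLine()) != null) {
            if (skipped < header) { skipped++; continue; }
            tail.addLast(line);
            if (tail.size() > footer) out.add(tail.removeFirst());
        }
        return out; // the last `footer` lines remain in `tail` and are dropped
    }

    public static void main(String[] args) throws IOException {
        String data = "offset,id,other\n9,\"a\",123\n17,\"b\",123\nrst,rst,rst";
        List<String> rows = readSkipping(new BufferedReader(new StringReader(data)), 1, 1);
        rows.forEach(System.out::println); // the two data rows only
    }
}
```

The bug described above is effectively this logic not being applied on the Tez path for compressed inputs, so both the `offset,...` header and the `rst,...` footer leak into query results.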





[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497368&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497368
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 14:28
Start Date: 08/Oct/20 14:28
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501766620



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -232,6 +240,51 @@ public Object run() throws Exception {
   private static String idWatermark(CompactionInfo ci) {
 return " id=" + ci.id;
   }
+
+  private void cleanAborted(CompactionInfo ci) throws MetaException {
+if (ci.writeIds == null || ci.writeIds.size() == 0) {
+  LOG.warn("Attempted cleaning aborted transaction with empty writeId list");

Review comment:
   fixed







Issue Time Tracking
---

Worklog Id: (was: 497368)
Time Spent: 9h 40m  (was: 9.5h)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and, 
> when addPartitions is called, removing this entry from TXN_COMPONENTS and 
> adding the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]





[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497358
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 14:17
Start Date: 08/Oct/20 14:17
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501757123



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
##
@@ -853,6 +857,273 @@ public void majorCompactAfterAbort() throws Exception {
 Lists.newArrayList(5, 6), 1);
   }
 
+  @Test
+  public void testCleanAbortCompactAfterAbortTwoPartitions() throws Exception {
+String dbName = "default";
+String tblName = "cws";
+
+HiveStreamingConnection connection1 = 
prepareTableTwoPartitionsAndConnection(dbName, tblName, 1);
+HiveStreamingConnection connection2 = 
prepareTableTwoPartitionsAndConnection(dbName, tblName, 1);
+
+connection1.beginTransaction();
+connection1.write("1,1".getBytes());
+connection1.write("2,2".getBytes());
+connection1.abortTransaction();
+
+connection2.beginTransaction();
+connection2.write("1,3".getBytes());
+connection2.write("2,3".getBytes());
+connection2.write("3,3".getBytes());
+connection2.abortTransaction();
+
+assertAndCompactCleanAbort(dbName, tblName);
+
+connection1.close();
+connection2.close();
+  }
+
+  @Test
+  public void testCleanAbortCompactAfterAbort() throws Exception {
+String dbName = "default";
+String tblName = "cws";
+
+// Create three folders with two different transactions
+HiveStreamingConnection connection1 = prepareTableAndConnection(dbName, 
tblName, 1);
+HiveStreamingConnection connection2 = prepareTableAndConnection(dbName, 
tblName, 1);
+
+connection1.beginTransaction();
+connection1.write("1,1".getBytes());
+connection1.write("2,2".getBytes());
+connection1.abortTransaction();
+
+connection2.beginTransaction();
+connection2.write("1,3".getBytes());
+connection2.write("2,3".getBytes());
+connection2.write("3,3".getBytes());
+connection2.abortTransaction();
+
+assertAndCompactCleanAbort(dbName, tblName);
+
+connection1.close();
+connection2.close();
+  }
+
+  private void assertAndCompactCleanAbort(String dbName, String tblName) 
throws Exception {
+IMetaStoreClient msClient = new HiveMetaStoreClient(conf);
+TxnStore txnHandler = TxnUtils.getTxnStore(conf);
+Table table = msClient.getTable(dbName, tblName);
+FileSystem fs = FileSystem.get(conf);
+FileStatus[] stat =
+fs.listStatus(new Path(table.getSd().getLocation()));
+if (3 != stat.length) {
+  Assert.fail("Expecting three directories corresponding to three 
partitions, FileStatus[] stat " + Arrays.toString(stat));
+}
+
+int count = TxnDbUtil.countQueryAgent(conf, "select count(*) from 
TXN_COMPONENTS where TC_OPERATION_TYPE='p'");
+// We should have two rows corresponding to the two aborted transactions
+Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from 
TXN_COMPONENTS"), 2, count);
+
+runInitiator(conf);
+count = TxnDbUtil.countQueryAgent(conf, "select count(*) from 
COMPACTION_QUEUE where CQ_TYPE='p'");
+// Only one job is added to the queue per table. This job corresponds to 
all the entries for a particular table
+// with rows in TXN_COMPONENTS
+Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from 
COMPACTION_QUEUE"), 1, count);
+
+ShowCompactResponse rsp = txnHandler.showCompact(new ShowCompactRequest());
+Assert.assertEquals(1, rsp.getCompacts().size());
+Assert.assertEquals(TxnStore.CLEANING_RESPONSE, 
rsp.getCompacts().get(0).getState());
+Assert.assertEquals("cws", rsp.getCompacts().get(0).getTablename());
+Assert.assertEquals(CompactionType.CLEAN_ABORTED,
+rsp.getCompacts().get(0).getType());
+
+runCleaner(conf);
+
+// After the cleaner runs TXN_COMPONENTS and COMPACTION_QUEUE should have 
zero rows, also the folders should have been deleted.
+count = TxnDbUtil.countQueryAgent(conf, "select count(*) from 
TXN_COMPONENTS");
+Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from 
TXN_COMPONENTS"), 0, count);
+
+count = TxnDbUtil.countQueryAgent(conf, "select count(*) from 
COMPACTION_QUEUE");
+Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from 
COMPACTION_QUEUE"), 0, count);
+
+RemoteIterator<LocatedFileStatus> it =
+fs.listFiles(new Path(table.getSd().getLocation()), true);
+if (it.hasNext()) {
+  Assert.fail("Expecting compaction to have cleaned the directories, 
FileStatus[] stat " + Arrays.toString(stat));
+}
+
+rsp = txnHandler.showCompact(new 

[jira] [Updated] (HIVE-24243) Missing table alias in LEFT JOIN causing inconsistent results

2020-10-08 Thread Henrique dos Santos Goulart (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henrique dos Santos Goulart updated HIVE-24243:
---
Description: 
Missing table alias in LEFT JOIN causing inconsistent results, see attachments.
{code:java}
with
cte_left as (
select null as col11, 'col2_1' as col2 union all
select 1as col11, 'col2_2' as col2
),
cte_right as (
select 1 as col1, 'col3' as col3
)
select *
from cte_left l
left join cte_right r on r.col1 = l.col11;
{code}
Returns 2 rows correctly.

vs
{code:java}
with
cte_left as (
select null as col11, 'col2_1' as col2 union all
select 1as col11, 'col2_2' as col2
),
cte_right as (
select 1 as col1, 'col3' as col3
)
select *
from cte_left
left join cte_right r on r.col1 = col11;
{code}
Returns 1 row.

  was:
Missing table alias in LEFT JOIN causing inconsistent results, see attachments.


{code:java}
with
cte_left as (
select null as col11, 'col2_1' as col2 union all
select 1as col11, 'col2_2' as col2
),
cte_right as (
select 1 as col1, 'col3' as col3
)
select *
from cte_left l
left join cte_right r on r.col1 = l.col11;
{code}
vs


{code:java}
with
cte_left as (
select null as col11, 'col2_1' as col2 union all
select 1as col11, 'col2_2' as col2
),
cte_right as (
select 1 as col1, 'col3' as col3
)
select *
from cte_left
left join cte_right r on r.col1 = col11;
{code}


> Missing table alias in LEFT JOIN causing inconsistent results
> -
>
> Key: HIVE-24243
> URL: https://issues.apache.org/jira/browse/HIVE-24243
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Henrique dos Santos Goulart
>Priority: Major
> Attachments: alias.png, no_alias.png
>
>
> Missing table alias in LEFT JOIN causing inconsistent results, see 
> attachments.
> {code:java}
> with
> cte_left as (
> select null as col11, 'col2_1' as col2 union all
> select 1as col11, 'col2_2' as col2
> ),
> cte_right as (
> select 1 as col1, 'col3' as col3
> )
> select *
> from cte_left l
> left join cte_right r on r.col1 = l.col11;
> {code}
> Returns 2 rows correctly.
> vs
> {code:java}
> with
> cte_left as (
> select null as col11, 'col2_1' as col2 union all
> select 1as col11, 'col2_2' as col2
> ),
> cte_right as (
> select 1 as col1, 'col3' as col3
> )
> select *
> from cte_left
> left join cte_right r on r.col1 = col11;
> {code}
> Returns 1 row.
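Both queries should return the same two rows: standard LEFT JOIN semantics keep the unmatched left row regardless of how col11 is qualified, and NULL never satisfies the equality predicate. A tiny nested-loop sketch of the expected result (illustrative, not Hive's join implementation):

```java
import java.util.ArrayList;
import java.util.List;

public class LeftJoinSketch {
    record Left(Integer col11, String col2) {}
    record Right(Integer col1, String col3) {}

    // Plain nested-loop LEFT JOIN on r.col1 = l.col11; a NULL key matches
    // nothing, but the unmatched left row is still emitted with nulls.
    static List<String> leftJoin(List<Left> ls, List<Right> rs) {
        List<String> out = new ArrayList<>();
        for (Left l : ls) {
            boolean matched = false;
            for (Right r : rs) {
                if (l.col11() != null && l.col11().equals(r.col1())) {
                    out.add(l.col11() + "," + l.col2() + "," + r.col1() + "," + r.col3());
                    matched = true;
                }
            }
            if (!matched) out.add(l.col11() + "," + l.col2() + ",null,null");
        }
        return out;
    }

    public static void main(String[] args) {
        List<Left> cteLeft = List.of(new Left(null, "col2_1"), new Left(1, "col2_2"));
        List<Right> cteRight = List.of(new Right(1, "col3"));
        leftJoin(cteLeft, cteRight).forEach(System.out::println);
        // null,col2_1,null,null
        // 1,col2_2,1,col3
    }
}
```

The one-row result from the unaliased query is the reported bug: dropping the alias should change name resolution, not which left rows survive the join.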





[jira] [Updated] (HIVE-24243) Missing table alias in LEFT JOIN causing inconsistent results

2020-10-08 Thread Henrique dos Santos Goulart (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henrique dos Santos Goulart updated HIVE-24243:
---
Description: 
Missing table alias in LEFT JOIN causing inconsistent results, see attachments.


{code:java}
with
cte_left as (
select null as col11, 'col2_1' as col2 union all
select 1as col11, 'col2_2' as col2
),
cte_right as (
select 1 as col1, 'col3' as col3
)
select *
from cte_left l
left join cte_right r on r.col1 = l.col11;
{code}
vs


{code:java}
with
cte_left as (
select null as col11, 'col2_1' as col2 union all
select 1as col11, 'col2_2' as col2
),
cte_right as (
select 1 as col1, 'col3' as col3
)
select *
from cte_left
left join cte_right r on r.col1 = col11;
{code}

  was:
Missing table alias in LEFT JOIN causing inconsistent results:
 !alias.png!

VS

!no_alias.png!


> Missing table alias in LEFT JOIN causing inconsistent results
> -
>
> Key: HIVE-24243
> URL: https://issues.apache.org/jira/browse/HIVE-24243
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Henrique dos Santos Goulart
>Priority: Major
> Attachments: alias.png, no_alias.png
>
>
> Missing table alias in LEFT JOIN causing inconsistent results, see 
> attachments.
> {code:java}
> with
> cte_left as (
> select null as col11, 'col2_1' as col2 union all
> select 1as col11, 'col2_2' as col2
> ),
> cte_right as (
> select 1 as col1, 'col3' as col3
> )
> select *
> from cte_left l
> left join cte_right r on r.col1 = l.col11;
> {code}
> vs
> {code:java}
> with
> cte_left as (
> select null as col11, 'col2_1' as col2 union all
> select 1as col11, 'col2_2' as col2
> ),
> cte_right as (
> select 1 as col1, 'col3' as col3
> )
> select *
> from cte_left
> left join cte_right r on r.col1 = col11;
> {code}





[jira] [Updated] (HIVE-24243) Missing table alias in LEFT JOIN causing inconsistent results

2020-10-08 Thread Henrique dos Santos Goulart (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henrique dos Santos Goulart updated HIVE-24243:
---
Attachment: no_alias.png
alias.png

> Missing table alias in LEFT JOIN causing inconsistent results
> -
>
> Key: HIVE-24243
> URL: https://issues.apache.org/jira/browse/HIVE-24243
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Henrique dos Santos Goulart
>Priority: Major
> Attachments: alias.png, no_alias.png
>
>
> Missing table alias in LEFT JOIN causing inconsistent results:
> !alias.png|id=cp-img!
> VS
> !no_alias.png|id=cp-img!





[jira] [Updated] (HIVE-24243) Missing table alias in LEFT JOIN causing inconsistent results

2020-10-08 Thread Henrique dos Santos Goulart (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henrique dos Santos Goulart updated HIVE-24243:
---
Description: 
Missing table alias in LEFT JOIN causing inconsistent results:
 !alias.png!

VS

!no_alias.png!

  was:
Missing table alias in LEFT JOIN causing inconsistent results:
!alias.png|id=cp-img!


VS

!no_alias.png|id=cp-img!


> Missing table alias in LEFT JOIN causing inconsistent results
> -
>
> Key: HIVE-24243
> URL: https://issues.apache.org/jira/browse/HIVE-24243
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Henrique dos Santos Goulart
>Priority: Major
> Attachments: alias.png, no_alias.png
>
>
> Missing table alias in LEFT JOIN causing inconsistent results:
>  !alias.png!
> VS
> !no_alias.png!





[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497353&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497353
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 13:52
Start Date: 08/Oct/20 13:52
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501737923



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##
@@ -414,77 +436,56 @@ public void markCleaned(CompactionInfo info) throws 
MetaException {
  * aborted TXN_COMPONENTS above tc_writeid (and consequently about 
aborted txns).
  * See {@link ql.txn.compactor.Cleaner.removeFiles()}
  */
-s = "SELECT DISTINCT \"TXN_ID\" FROM \"TXNS\", \"TXN_COMPONENTS\" 
WHERE \"TXN_ID\" = \"TC_TXNID\" "
-+ "AND \"TXN_STATE\" = " + TxnStatus.ABORTED + " AND 
\"TC_DATABASE\" = ? AND \"TC_TABLE\" = ?";
-if (info.highestWriteId != 0) s += " AND \"TC_WRITEID\" <= ?";
-if (info.partName != null) s += " AND \"TC_PARTITION\" = ?";
+List<String> queries = new ArrayList<>();
+Iterator<Long> writeIdsIter = null;
+List<Integer> counts = null;
 
-pStmt = dbConn.prepareStatement(s);
-paramCount = 1;
-pStmt.setString(paramCount++, info.dbname);
-pStmt.setString(paramCount++, info.tableName);
-if(info.highestWriteId != 0) {
-  pStmt.setLong(paramCount++, info.highestWriteId);
+s = "DELETE FROM \"TXN_COMPONENTS\" WHERE \"TC_TXNID\" IN (" +
+  "   SELECT \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_STATE\" = " + 
TxnStatus.ABORTED + ") " +

Review comment:
   cool
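The pattern being approved here collapses the older SELECT-and-iterate round trip into a single DELETE whose IN subquery selects the aborted transactions. A minimal sketch of just the statement shape (the state literal 'a' is a stand-in for TxnStatus.ABORTED, and the real CompactionTxnHandler statement also appends the db/table/writeId predicates shown in the diff):

```java
public class AbortedTxnDeleteSketch {

    // Builds a single DELETE whose IN subquery picks the aborted transactions,
    // replacing a SELECT-then-delete loop with one round trip. Table and
    // column names mirror the quoted SQL.
    static String buildDelete(String abortedState) {
        return "DELETE FROM \"TXN_COMPONENTS\" WHERE \"TC_TXNID\" IN ("
                + "SELECT \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_STATE\" = '"
                + abortedState + "')";
    }

    public static void main(String[] args) {
        System.out.println(buildDelete("a"));
    }
}
```

In the real handler the statement is executed through a PreparedStatement, so the db/table/writeId values are bound as parameters rather than concatenated.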





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497353)
Time Spent: 9h 20m  (was: 9h 10m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and then aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]
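The two-step marker scheme described in the issue can be sketched as state transitions on TXN_COMPONENTS. The 'p' marker value mirrors the operation type used in the tests later in this thread; the method names and in-memory map are illustrative only, not Hive's API:

```java
import java.util.HashMap;
import java.util.Map;

public class MarkerSchemeSketch {
    // txnId -> operation-type marker in a toy stand-in for TXN_COMPONENTS.
    private final Map<Long, Character> txnComponents = new HashMap<>();

    void openTxn(long txnId) {
        txnComponents.put(txnId, 'p'); // special marker written at openTxn
    }

    void addPartitions(long txnId) {
        txnComponents.put(txnId, 'i'); // marker replaced by the real partition entry
    }

    // The cleaner schedules abort cleanup for any txn that still carries the
    // marker: it was opened, never reached addPartitions, and then aborted.
    boolean needsAbortCleanup(long txnId) {
        return txnComponents.getOrDefault(txnId, ' ') == 'p';
    }

    public static void main(String[] args) {
        MarkerSchemeSketch s = new MarkerSchemeSketch();
        s.openTxn(1);       // aborted before addPartitions
        s.openTxn(2);
        s.addPartitions(2); // normal path
        System.out.println(s.needsAbortCleanup(1)); // true
        System.out.println(s.needsAbortCleanup(2)); // false
    }
}
```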



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497352&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497352
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 13:45
Start Date: 08/Oct/20 13:45
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501732794



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##
@@ -107,11 +107,12 @@ public CompactionTxnHandler() {
 // Check for aborted txns: number of aborted txns past threshold and 
age of aborted txns
 // past time threshold
 boolean checkAbortedTimeThreshold = abortedTimeThreshold >= 0;
-final String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", 
\"TC_PARTITION\","
-+ "MIN(\"TXN_STARTED\"), COUNT(*)"
+String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", 
\"TC_PARTITION\", "
++ "MIN(\"TXN_STARTED\"), COUNT(*), "
++ "MAX(CASE WHEN \"TC_OPERATION_TYPE\" = " + OperationType.DYNPART 
+ " THEN 1 ELSE 0 END) AS \"IS_DP\" "

Review comment:
   Previously, if you had aborted txns above the threshold, this would generate a 
"normal" compaction that cleaned up everything. However, now if you have one 
dynpart abort, the type will be CLEAN_ABORTED, which will only clean the 
writeids belonging to p-type records and leave everything else. This will delay 
the normal cleaning. I am not sure whether that is a problem or not. 
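The MAX(CASE WHEN …) aggregate in the quoted query marks a (db, table, partition) group as dynamic-partition as soon as any row in it carries the DYNPART operation type. The same flag, computed over an in-memory group in plain Java (illustrative only; 'p' stands in for OperationType.DYNPART):

```java
import java.util.List;

public class DynPartFlagSketch {
    // Returns 1 if any operation type in the group is 'p' (dynamic partition),
    // else 0 -- the Java equivalent of
    // MAX(CASE WHEN TC_OPERATION_TYPE = 'p' THEN 1 ELSE 0 END).
    static int isDp(List<Character> opTypes) {
        return opTypes.stream().mapToInt(t -> t == 'p' ? 1 : 0).max().orElse(0);
    }

    public static void main(String[] args) {
        System.out.println(isDp(List.of('i', 'p', 'u'))); // group with one dynpart abort -> 1
        System.out.println(isDp(List.of('i', 'u')));      // no dynpart abort -> 0
    }
}
```

A single dynpart row therefore flips the whole group's IS_DP flag to 1, which is exactly why the reviewer notes that one dynpart abort changes the compaction type for the entire group.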





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497352)
Time Spent: 9h 10m  (was: 9h)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and then aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497348&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497348
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 13:44
Start Date: 08/Oct/20 13:44
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501731802



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
##
@@ -853,6 +857,273 @@ public void majorCompactAfterAbort() throws Exception {
 Lists.newArrayList(5, 6), 1);
   }
 
+  @Test
+  public void testCleanAbortCompactAfterAbortTwoPartitions() throws Exception {
+String dbName = "default";
+String tblName = "cws";
+
+HiveStreamingConnection connection1 = 
prepareTableTwoPartitionsAndConnection(dbName, tblName, 1);
+HiveStreamingConnection connection2 = 
prepareTableTwoPartitionsAndConnection(dbName, tblName, 1);
+
+connection1.beginTransaction();
+connection1.write("1,1".getBytes());
+connection1.write("2,2".getBytes());
+connection1.abortTransaction();
+
+connection2.beginTransaction();
+connection2.write("1,3".getBytes());
+connection2.write("2,3".getBytes());
+connection2.write("3,3".getBytes());
+connection2.abortTransaction();
+
+assertAndCompactCleanAbort(dbName, tblName);
+
+connection1.close();
+connection2.close();
+  }
+
+  @Test
+  public void testCleanAbortCompactAfterAbort() throws Exception {
+String dbName = "default";
+String tblName = "cws";
+
+// Create three folders with two different transactions
+HiveStreamingConnection connection1 = prepareTableAndConnection(dbName, 
tblName, 1);
+HiveStreamingConnection connection2 = prepareTableAndConnection(dbName, 
tblName, 1);
+
+connection1.beginTransaction();
+connection1.write("1,1".getBytes());
+connection1.write("2,2".getBytes());
+connection1.abortTransaction();
+
+connection2.beginTransaction();
+connection2.write("1,3".getBytes());
+connection2.write("2,3".getBytes());
+connection2.write("3,3".getBytes());
+connection2.abortTransaction();
+
+assertAndCompactCleanAbort(dbName, tblName);
+
+connection1.close();
+connection2.close();
+  }
+
+  private void assertAndCompactCleanAbort(String dbName, String tblName) 
throws Exception {
+IMetaStoreClient msClient = new HiveMetaStoreClient(conf);
+TxnStore txnHandler = TxnUtils.getTxnStore(conf);
+Table table = msClient.getTable(dbName, tblName);
+FileSystem fs = FileSystem.get(conf);
+FileStatus[] stat =
+fs.listStatus(new Path(table.getSd().getLocation()));
+if (3 != stat.length) {
+  Assert.fail("Expecting three directories corresponding to three 
partitions, FileStatus[] stat " + Arrays.toString(stat));
+}
+
+int count = TxnDbUtil.countQueryAgent(conf, "select count(*) from 
TXN_COMPONENTS where TC_OPERATION_TYPE='p'");
+// We should have two rows corresponding to the two aborted transactions
+Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from 
TXN_COMPONENTS"), 2, count);
+
+runInitiator(conf);
+count = TxnDbUtil.countQueryAgent(conf, "select count(*) from 
COMPACTION_QUEUE where CQ_TYPE='p'");
+// Only one job is added to the queue per table. This job corresponds to 
all the entries for a particular table
+// with rows in TXN_COMPONENTS
+Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from 
COMPACTION_QUEUE"), 1, count);
+
+ShowCompactResponse rsp = txnHandler.showCompact(new ShowCompactRequest());
+Assert.assertEquals(1, rsp.getCompacts().size());
+Assert.assertEquals(TxnStore.CLEANING_RESPONSE, 
rsp.getCompacts().get(0).getState());
+Assert.assertEquals("cws", rsp.getCompacts().get(0).getTablename());
+Assert.assertEquals(CompactionType.CLEAN_ABORTED,
+rsp.getCompacts().get(0).getType());
+
+runCleaner(conf);
+
+// After the cleaner runs TXN_COMPONENTS and COMPACTION_QUEUE should have 
zero rows, also the folders should have been deleted.
+count = TxnDbUtil.countQueryAgent(conf, "select count(*) from 
TXN_COMPONENTS");
+Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from 
TXN_COMPONENTS"), 0, count);
+
+count = TxnDbUtil.countQueryAgent(conf, "select count(*) from 
COMPACTION_QUEUE");
+Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from 
COMPACTION_QUEUE"), 0, count);
+
+RemoteIterator<LocatedFileStatus> it =
+fs.listFiles(new Path(table.getSd().getLocation()), true);
+if (it.hasNext()) {
+  Assert.fail("Expecting compaction to have cleaned the directories, 
FileStatus[] stat " + Arrays.toString(stat));
+}
+
+rsp = txnHandler.showCompact(new 

[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497347&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497347
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 13:44
Start Date: 08/Oct/20 13:44
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501731697



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -232,6 +240,51 @@ public Object run() throws Exception {
   private static String idWatermark(CompactionInfo ci) {
 return " id=" + ci.id;
   }
+
+  private void cleanAborted(CompactionInfo ci) throws MetaException {
+if (ci.writeIds == null || ci.writeIds.size() == 0) {
+  LOG.warn("Attempted cleaning aborted transaction with empty writeId 
list");

Review comment:
   yep, good catch!
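As the exchange suggests, an empty writeId list should do more than log a warning: the queue entry should be terminated so it does not linger in COMPACTION_QUEUE. A toy sketch of that guard, with return values standing in for the txnHandler.markCleaned-style calls (the method names and strings here are illustrative, not Hive's API):

```java
import java.util.List;

public class CleanAbortedGuardSketch {
    // Guard discussed in the review: when there is nothing to clean, bail out
    // AND terminate the queue entry, instead of only logging a warning.
    static String cleanAborted(List<Long> writeIds) {
        if (writeIds == null || writeIds.isEmpty()) {
            return "marked-cleaned"; // placeholder for txnHandler.markCleaned(ci)
        }
        return "cleaned " + writeIds.size() + " writeIds";
    }

    public static void main(String[] args) {
        System.out.println(cleanAborted(List.of()));         // marked-cleaned
        System.out.println(cleanAborted(List.of(5L, 6L)));   // cleaned 2 writeIds
    }
}
```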





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497347)
Time Spent: 8h 50m  (was: 8h 40m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and then aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497333&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497333
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 13:33
Start Date: 08/Oct/20 13:33
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501723853



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -232,6 +240,51 @@ public Object run() throws Exception {
   private static String idWatermark(CompactionInfo ci) {
 return " id=" + ci.id;
   }
+
+  private void cleanAborted(CompactionInfo ci) throws MetaException {
+if (ci.writeIds == null || ci.writeIds.size() == 0) {
+  LOG.warn("Attempted cleaning aborted transaction with empty writeId 
list");

Review comment:
   Shouldn't you mark the compaction as failed or cleaned?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497333)
Time Spent: 8h 40m  (was: 8.5h)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and then aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497332&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497332
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 13:32
Start Date: 08/Oct/20 13:32
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501723043



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -97,9 +100,9 @@ public void run() {
   long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
   LOG.info("Cleaning based on min open txn id: " + minOpenTxnId);
   List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
-  for(CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
+  for (CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
 
cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(()
 ->
-clean(compactionInfo, minOpenTxnId)), cleanerExecutor));
+  clean(compactionInfo, minOpenTxnId)), cleanerExecutor));

Review comment:
   I agree, it can be addressed in a follow up Jira.
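The loop in the quoted diff fans each ready-to-clean compaction out to an executor via CompletableFuture.runAsync and joins the futures before the next cleaner cycle. The submission pattern, reduced to a self-contained sketch with a placeholder task body standing in for clean(compactionInfo, minOpenTxnId):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class CleanerFanOutSketch {
    // Submits one cleanup task per ready compaction to the executor and
    // waits for all of them, mirroring the loop in the quoted diff.
    static int fanOut(int readyToClean) {
        ExecutorService cleanerExecutor = Executors.newFixedThreadPool(2);
        AtomicInteger cleaned = new AtomicInteger();
        List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
        for (int i = 0; i < readyToClean; i++) {
            // incrementAndGet is the placeholder for clean(ci, minOpenTxnId)
            cleanerList.add(CompletableFuture.runAsync(
                    cleaned::incrementAndGet, cleanerExecutor));
        }
        // Block until every cleanup task has finished before returning.
        CompletableFuture.allOf(cleanerList.toArray(new CompletableFuture[0])).join();
        cleanerExecutor.shutdown();
        return cleaned.get();
    }

    public static void main(String[] args) {
        System.out.println("cleaned " + fanOut(3) + " entries");
    }
}
```

Because runAsync returns CompletableFuture<Void>, a failure in any task surfaces from join() as a CompletionException; how those are surfaced per-compaction is presumably what the follow-up Jira mentioned above would address.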





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497332)
Time Spent: 8.5h  (was: 8h 20m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written on the table the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and 
> when addPartitions is called remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds and entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497328&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497328
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 13:25
Start Date: 08/Oct/20 13:25
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501718248



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
##
@@ -853,6 +857,273 @@ public void majorCompactAfterAbort() throws Exception {
 Lists.newArrayList(5, 6), 1);
   }
 
+  @Test
+  public void testCleanAbortCompactAfterAbortTwoPartitions() throws Exception {
+String dbName = "default";
+String tblName = "cws";
+
+HiveStreamingConnection connection1 = 
prepareTableTwoPartitionsAndConnection(dbName, tblName, 1);
+HiveStreamingConnection connection2 = 
prepareTableTwoPartitionsAndConnection(dbName, tblName, 1);
+
+connection1.beginTransaction();
+connection1.write("1,1".getBytes());
+connection1.write("2,2".getBytes());
+connection1.abortTransaction();
+
+connection2.beginTransaction();
+connection2.write("1,3".getBytes());
+connection2.write("2,3".getBytes());
+connection2.write("3,3".getBytes());
+connection2.abortTransaction();
+
+assertAndCompactCleanAbort(dbName, tblName);
+
+connection1.close();
+connection2.close();
+  }
+
+  @Test
+  public void testCleanAbortCompactAfterAbort() throws Exception {
+String dbName = "default";
+String tblName = "cws";
+
+// Create three folders with two different transactions
+HiveStreamingConnection connection1 = prepareTableAndConnection(dbName, 
tblName, 1);
+HiveStreamingConnection connection2 = prepareTableAndConnection(dbName, 
tblName, 1);
+
+connection1.beginTransaction();
+connection1.write("1,1".getBytes());
+connection1.write("2,2".getBytes());
+connection1.abortTransaction();
+
+connection2.beginTransaction();
+connection2.write("1,3".getBytes());
+connection2.write("2,3".getBytes());
+connection2.write("3,3".getBytes());
+connection2.abortTransaction();
+
+assertAndCompactCleanAbort(dbName, tblName);
+
+connection1.close();
+connection2.close();
+  }
+
+  private void assertAndCompactCleanAbort(String dbName, String tblName) 
throws Exception {
+IMetaStoreClient msClient = new HiveMetaStoreClient(conf);
+TxnStore txnHandler = TxnUtils.getTxnStore(conf);
+Table table = msClient.getTable(dbName, tblName);
+FileSystem fs = FileSystem.get(conf);
+FileStatus[] stat =
+fs.listStatus(new Path(table.getSd().getLocation()));
+if (3 != stat.length) {
+  Assert.fail("Expecting three directories corresponding to three 
partitions, FileStatus[] stat " + Arrays.toString(stat));
+}
+
+int count = TxnDbUtil.countQueryAgent(conf, "select count(*) from 
TXN_COMPONENTS where TC_OPERATION_TYPE='p'");
+// We should have two rows corresponding to the two aborted transactions
+Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from 
TXN_COMPONENTS"), 2, count);
+
+runInitiator(conf);
+count = TxnDbUtil.countQueryAgent(conf, "select count(*) from 
COMPACTION_QUEUE where CQ_TYPE='p'");
+// Only one job is added to the queue per table. This job corresponds to 
all the entries for a particular table
+// with rows in TXN_COMPONENTS
+Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from 
COMPACTION_QUEUE"), 1, count);
+
+ShowCompactResponse rsp = txnHandler.showCompact(new ShowCompactRequest());
+Assert.assertEquals(1, rsp.getCompacts().size());
+Assert.assertEquals(TxnStore.CLEANING_RESPONSE, 
rsp.getCompacts().get(0).getState());
+Assert.assertEquals("cws", rsp.getCompacts().get(0).getTablename());
+Assert.assertEquals(CompactionType.CLEAN_ABORTED,
+rsp.getCompacts().get(0).getType());
+
+runCleaner(conf);
+
+// After the cleaner runs TXN_COMPONENTS and COMPACTION_QUEUE should have 
zero rows, also the folders should have been deleted.
+count = TxnDbUtil.countQueryAgent(conf, "select count(*) from 
TXN_COMPONENTS");
+Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from 
TXN_COMPONENTS"), 0, count);
+
+count = TxnDbUtil.countQueryAgent(conf, "select count(*) from 
COMPACTION_QUEUE");
+Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from 
COMPACTION_QUEUE"), 0, count);
+
+RemoteIterator<LocatedFileStatus> it =
+fs.listFiles(new Path(table.getSd().getLocation()), true);
+if (it.hasNext()) {
+  Assert.fail("Expecting compaction to have cleaned the directories, 
FileStatus[] stat " + Arrays.toString(stat));
+}
+
+rsp = txnHandler.showCompact(new 

[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497326&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497326
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 13:21
Start Date: 08/Oct/20 13:21
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501715311



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
##
@@ -853,6 +857,273 @@ public void majorCompactAfterAbort() throws Exception {
 Lists.newArrayList(5, 6), 1);
   }
 
+  @Test
+  public void testCleanAbortCompactAfterAbortTwoPartitions() throws Exception {
+String dbName = "default";
+String tblName = "cws";
+
+HiveStreamingConnection connection1 = 
prepareTableTwoPartitionsAndConnection(dbName, tblName, 1);
+HiveStreamingConnection connection2 = 
prepareTableTwoPartitionsAndConnection(dbName, tblName, 1);
+
+connection1.beginTransaction();
+connection1.write("1,1".getBytes());
+connection1.write("2,2".getBytes());
+connection1.abortTransaction();
+
+connection2.beginTransaction();
+connection2.write("1,3".getBytes());
+connection2.write("2,3".getBytes());
+connection2.write("3,3".getBytes());
+connection2.abortTransaction();
+
+assertAndCompactCleanAbort(dbName, tblName);
+
+connection1.close();
+connection2.close();
+  }
+
+  @Test
+  public void testCleanAbortCompactAfterAbort() throws Exception {
+String dbName = "default";
+String tblName = "cws";
+
+// Create three folders with two different transactions
+HiveStreamingConnection connection1 = prepareTableAndConnection(dbName, 
tblName, 1);
+HiveStreamingConnection connection2 = prepareTableAndConnection(dbName, 
tblName, 1);
+
+connection1.beginTransaction();
+connection1.write("1,1".getBytes());
+connection1.write("2,2".getBytes());
+connection1.abortTransaction();
+
+connection2.beginTransaction();
+connection2.write("1,3".getBytes());
+connection2.write("2,3".getBytes());
+connection2.write("3,3".getBytes());
+connection2.abortTransaction();
+
+assertAndCompactCleanAbort(dbName, tblName);
+
+connection1.close();
+connection2.close();
+  }
+
+  private void assertAndCompactCleanAbort(String dbName, String tblName) 
throws Exception {
+IMetaStoreClient msClient = new HiveMetaStoreClient(conf);
+TxnStore txnHandler = TxnUtils.getTxnStore(conf);
+Table table = msClient.getTable(dbName, tblName);
+FileSystem fs = FileSystem.get(conf);
+FileStatus[] stat =
+fs.listStatus(new Path(table.getSd().getLocation()));
+if (3 != stat.length) {
+  Assert.fail("Expecting three directories corresponding to three 
partitions, FileStatus[] stat " + Arrays.toString(stat));
+}
+
+int count = TxnDbUtil.countQueryAgent(conf, "select count(*) from 
TXN_COMPONENTS where TC_OPERATION_TYPE='p'");
+// We should have two rows corresponding to the two aborted transactions
+Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from 
TXN_COMPONENTS"), 2, count);
+
+runInitiator(conf);
+count = TxnDbUtil.countQueryAgent(conf, "select count(*) from 
COMPACTION_QUEUE where CQ_TYPE='p'");
+// Only one job is added to the queue per table. This job corresponds to 
all the entries for a particular table
+// with rows in TXN_COMPONENTS
+Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from 
COMPACTION_QUEUE"), 1, count);
+
+ShowCompactResponse rsp = txnHandler.showCompact(new ShowCompactRequest());
+Assert.assertEquals(1, rsp.getCompacts().size());
+Assert.assertEquals(TxnStore.CLEANING_RESPONSE, 
rsp.getCompacts().get(0).getState());
+Assert.assertEquals("cws", rsp.getCompacts().get(0).getTablename());
+Assert.assertEquals(CompactionType.CLEAN_ABORTED,
+rsp.getCompacts().get(0).getType());
+
+runCleaner(conf);
+
+// After the cleaner runs TXN_COMPONENTS and COMPACTION_QUEUE should have 
zero rows, also the folders should have been deleted.
+count = TxnDbUtil.countQueryAgent(conf, "select count(*) from 
TXN_COMPONENTS");
+Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from 
TXN_COMPONENTS"), 0, count);
+
+count = TxnDbUtil.countQueryAgent(conf, "select count(*) from 
COMPACTION_QUEUE");
+Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from 
COMPACTION_QUEUE"), 0, count);
+
+RemoteIterator<LocatedFileStatus> it =
+fs.listFiles(new Path(table.getSd().getLocation()), true);
+if (it.hasNext()) {
+  Assert.fail("Expecting compaction to have cleaned the directories, 
FileStatus[] stat " + Arrays.toString(stat));
+}
+
+rsp = txnHandler.showCompact(new 

[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497325&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497325
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 13:19
Start Date: 08/Oct/20 13:19
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501713463



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
##
@@ -853,6 +857,273 @@ public void majorCompactAfterAbort() throws Exception {
 Lists.newArrayList(5, 6), 1);
   }
 
+  @Test
+  public void testCleanAbortCompactAfterAbortTwoPartitions() throws Exception {
+String dbName = "default";
+String tblName = "cws";
+
+HiveStreamingConnection connection1 = 
prepareTableTwoPartitionsAndConnection(dbName, tblName, 1);
+HiveStreamingConnection connection2 = 
prepareTableTwoPartitionsAndConnection(dbName, tblName, 1);
+
+connection1.beginTransaction();
+connection1.write("1,1".getBytes());
+connection1.write("2,2".getBytes());
+connection1.abortTransaction();
+
+connection2.beginTransaction();
+connection2.write("1,3".getBytes());
+connection2.write("2,3".getBytes());
+connection2.write("3,3".getBytes());
+connection2.abortTransaction();
+
+assertAndCompactCleanAbort(dbName, tblName);
+
+connection1.close();
+connection2.close();
+  }
+
+  @Test
+  public void testCleanAbortCompactAfterAbort() throws Exception {
+String dbName = "default";
+String tblName = "cws";
+
+// Create three folders with two different transactions
+HiveStreamingConnection connection1 = prepareTableAndConnection(dbName, 
tblName, 1);
+HiveStreamingConnection connection2 = prepareTableAndConnection(dbName, 
tblName, 1);
+
+connection1.beginTransaction();
+connection1.write("1,1".getBytes());
+connection1.write("2,2".getBytes());
+connection1.abortTransaction();
+
+connection2.beginTransaction();
+connection2.write("1,3".getBytes());
+connection2.write("2,3".getBytes());
+connection2.write("3,3".getBytes());
+connection2.abortTransaction();
+
+assertAndCompactCleanAbort(dbName, tblName);
+
+connection1.close();
+connection2.close();
+  }
+
+  private void assertAndCompactCleanAbort(String dbName, String tblName) 
throws Exception {
+IMetaStoreClient msClient = new HiveMetaStoreClient(conf);
+TxnStore txnHandler = TxnUtils.getTxnStore(conf);
+Table table = msClient.getTable(dbName, tblName);
+FileSystem fs = FileSystem.get(conf);
+FileStatus[] stat =
+fs.listStatus(new Path(table.getSd().getLocation()));
+if (3 != stat.length) {
+  Assert.fail("Expecting three directories corresponding to three 
partitions, FileStatus[] stat " + Arrays.toString(stat));
+}
+
+int count = TxnDbUtil.countQueryAgent(conf, "select count(*) from 
TXN_COMPONENTS where TC_OPERATION_TYPE='p'");
+// We should have two rows corresponding to the two aborted transactions
+Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from 
TXN_COMPONENTS"), 2, count);
+
+runInitiator(conf);
+count = TxnDbUtil.countQueryAgent(conf, "select count(*) from 
COMPACTION_QUEUE where CQ_TYPE='p'");
+// Only one job is added to the queue per table. This job corresponds to 
all the entries for a particular table
+// with rows in TXN_COMPONENTS
+Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from 
COMPACTION_QUEUE"), 1, count);
+
+ShowCompactResponse rsp = txnHandler.showCompact(new ShowCompactRequest());
+Assert.assertEquals(1, rsp.getCompacts().size());
+Assert.assertEquals(TxnStore.CLEANING_RESPONSE, 
rsp.getCompacts().get(0).getState());
+Assert.assertEquals("cws", rsp.getCompacts().get(0).getTablename());
+Assert.assertEquals(CompactionType.CLEAN_ABORTED,
+rsp.getCompacts().get(0).getType());
+
+runCleaner(conf);
+
+// After the cleaner runs TXN_COMPONENTS and COMPACTION_QUEUE should have 
zero rows, also the folders should have been deleted.
+count = TxnDbUtil.countQueryAgent(conf, "select count(*) from 
TXN_COMPONENTS");
+Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from 
TXN_COMPONENTS"), 0, count);
+
+count = TxnDbUtil.countQueryAgent(conf, "select count(*) from 
COMPACTION_QUEUE");
+Assert.assertEquals(TxnDbUtil.queryToString(conf, "select * from 
COMPACTION_QUEUE"), 0, count);
+
+RemoteIterator<LocatedFileStatus> it =
+fs.listFiles(new Path(table.getSd().getLocation()), true);
+if (it.hasNext()) {
+  Assert.fail("Expecting compaction to have cleaned the directories, 
FileStatus[] stat " + Arrays.toString(stat));

Review comment:
   I think this assert is quite misleading. I 

[jira] [Work logged] (HIVE-24203) Implement stats annotation rule for the LateralViewJoinOperator

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24203?focusedWorklogId=497313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497313
 ]

ASF GitHub Bot logged work on HIVE-24203:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 12:56
Start Date: 08/Oct/20 12:56
Worklog Time Spent: 10m 
  Work Description: okumin commented on a change in pull request #1531:
URL: https://github.com/apache/hive/pull/1531#discussion_r501697579



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
##
@@ -2921,6 +2920,97 @@ public Object process(Node nd, Stack<Node> stack, 
NodeProcessorCtx procCtx,
 }
   }
 
+  /**
+   * LateralViewJoinOperator changes the data size and column level statistics.
+   *
+   * A diagram of LATERAL VIEW.
+   *
+   *   [Lateral View Forward]
+   *  / \
+   *[Select]  [Select]
+   *||
+   *| [UDTF]
+   *\   /
+   *   [Lateral View Join]
+   *
+   * For each row of the source, the left branch just picks columns and the right branch processes the UDTF.
+   * Then LVJ joins a row from the left branch with the rows from the right branch.
+   * The join has a one-to-many relationship since the UDTF can generate multiple rows.
+   *
+   * This rule multiplies the stats from the left branch by T(right) / T(left) and sums up both sides.
+   */
+  public static class LateralViewJoinStatsRule extends DefaultStatsRule 
implements SemanticNodeProcessor {
+@Override
+public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
+  Object... nodeOutputs) throws SemanticException {
+  final LateralViewJoinOperator lop = (LateralViewJoinOperator) nd;
+  final AnnotateStatsProcCtx aspCtx = (AnnotateStatsProcCtx) procCtx;
+  final HiveConf conf = aspCtx.getConf();
+
+  if (!isAllParentsContainStatistics(lop)) {
+return null;
+  }
+
+  final List<Operator<? extends OperatorDesc>> parents = lop.getParentOperators();
+  if (parents.size() != 2) {
+LOG.warn("LateralViewJoinOperator should have just two parents but 
actually has "
++ parents.size() + " parents.");
+return null;
+  }
+
+  final Statistics selectStats = 
parents.get(LateralViewJoinOperator.SELECT_TAG).getStatistics();
+  final Statistics udtfStats = 
parents.get(LateralViewJoinOperator.UDTF_TAG).getStatistics();
+
+  final double factor = (double) udtfStats.getNumRows() / (double) 
selectStats.getNumRows();
+  final long selectDataSize = 
StatsUtils.safeMult(selectStats.getDataSize(), factor);
+  final long dataSize = StatsUtils.safeAdd(selectDataSize, 
udtfStats.getDataSize());
+  Statistics joinedStats = new Statistics(udtfStats.getNumRows(), 
dataSize, 0, 0);
+
+  if (satisfyPrecondition(selectStats) && satisfyPrecondition(udtfStats)) {
+final Map<String, ExprNodeDesc> columnExprMap = lop.getColumnExprMap();
+final RowSchema schema = lop.getSchema();
+
+joinedStats.updateColumnStatsState(selectStats.getColumnStatsState());
+final List<ColStatistics> selectColStats = StatsUtils
+.getColStatisticsFromExprMap(conf, selectStats, columnExprMap, 
schema);
+joinedStats.addToColumnStats(multiplyColStats(selectColStats, factor));
+
+joinedStats.updateColumnStatsState(udtfStats.getColumnStatsState());
+final List<ColStatistics> udtfColStats = StatsUtils
+.getColStatisticsFromExprMap(conf, udtfStats, columnExprMap, 
schema);
+joinedStats.addToColumnStats(udtfColStats);
+
+joinedStats = applyRuntimeStats(aspCtx.getParseContext().getContext(), 
joinedStats, lop);
+lop.setStatistics(joinedStats);
+
+if (LOG.isDebugEnabled()) {
+  LOG.debug("[0] STATS-" + lop.toString() + ": " + 
joinedStats.extendedToString());
+}
+  } else {
+joinedStats = applyRuntimeStats(aspCtx.getParseContext().getContext(), 
joinedStats, lop);
+lop.setStatistics(joinedStats);
+
+if (LOG.isDebugEnabled()) {
+  LOG.debug("[1] STATS-" + lop.toString() + ": " + 
joinedStats.extendedToString());
+}
+  }
+  return null;
+}
+
+private List<ColStatistics> multiplyColStats(List<ColStatistics> colStatistics, double factor) {
+  for (ColStatistics colStats : colStatistics) {
+colStats.setNumFalses(StatsUtils.safeMult(colStats.getNumFalses(), 
factor));
+colStats.setNumTrues(StatsUtils.safeMult(colStats.getNumTrues(), 
factor));
+colStats.setNumNulls(StatsUtils.safeMult(colStats.getNumNulls(), 
factor));
+// When factor > 1, the same records are duplicated and countDistinct 
never changes.
+if (factor < 1.0) {
+  
colStats.setCountDistint(StatsUtils.safeMult(colStats.getCountDistint(), 
factor));

Review comment:
   Ceiled. I moved this method since I'd like to reuse it for HIVE-24240.
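
The "ceiled" multiplication discussed above can be illustrated in isolation. This is a hedged sketch, not Hive's actual `StatsUtils`: the class and method below are assumptions that mirror the idea of an overflow-guarded, ceiling multiply applied to column stats such as countDistinct.

```java
// Hedged sketch of the ceiled, overflow-guarded multiply discussed in the
// review comment above. Not Hive's StatsUtils; names are assumptions.
public class ColStatsScaling {
    // Multiply a stat by a scaling factor, rounding up (ceiled) and
    // clamping at Long.MAX_VALUE instead of overflowing.
    static long safeMult(long value, double factor) {
        double result = Math.ceil(value * factor);
        return result >= (double) Long.MAX_VALUE ? Long.MAX_VALUE : (long) result;
    }

    public static void main(String[] args) {
        // The UDTF produced 10 rows from 4 input rows -> factor = 2.5.
        double factor = 10.0 / 4.0;
        System.out.println(safeMult(3, factor)); // numNulls 3 scales to ceil(7.5) = 8
        // factor < 1: countDistinct shrinks, but the ceiling keeps it >= 1 per survivor.
        System.out.println(safeMult(7, 0.5));    // ceil(3.5) = 4
    }
}
```

Ceiling (rather than truncating) avoids scaling a small but nonzero stat down to zero, which matters when the factor is below 1.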
   

[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497306&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497306
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 12:49
Start Date: 08/Oct/20 12:49
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on pull request #1548:
URL: https://github.com/apache/hive/pull/1548#issuecomment-705545322


   looks like master is broken right now



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497306)
Time Spent: 7h 50m  (was: 7h 40m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written on the table the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and 
> when addPartitions is called remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
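
The marker-based design described in the issue above can be modeled in isolation. The sketch below is hypothetical (not Hive's TxnHandler; class, field, and constant names are assumptions): an aborted transaction that only left the special marker row in TXN_COMPONENTS still yields exactly one clean-abort job per table, matching the test's observation that "only one job is added to the queue per table".

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hedged model of the cleanup idea from HIVE-21052, not Hive's actual code.
public class AbortedTxnScan {
    // Assumed marker, mirroring the TC_OPERATION_TYPE='p' rows seen in the tests.
    static final char MARKER_BEFORE_ADD_PARTITIONS = 'p';

    // A simplified TXN_COMPONENTS row.
    static class TxnComponent {
        final long txnId; final String table; final char opType;
        TxnComponent(long txnId, String table, char opType) {
            this.txnId = txnId; this.table = table; this.opType = opType;
        }
    }

    // Collect one clean-abort job per table: every aborted txn that still has
    // the open-txn marker means addPartitions was never called for it.
    static Set<String> tablesNeedingCleanAbort(List<TxnComponent> abortedRows) {
        Set<String> tables = new LinkedHashSet<>(); // de-dupes per table
        for (TxnComponent tc : abortedRows) {
            if (tc.opType == MARKER_BEFORE_ADD_PARTITIONS) {
                tables.add(tc.table);
            }
        }
        return tables;
    }

    public static void main(String[] args) {
        // Two aborted streaming txns on the same table -> a single job.
        List<TxnComponent> rows = Arrays.asList(
            new TxnComponent(1, "default.cws", 'p'),
            new TxnComponent(2, "default.cws", 'p'));
        System.out.println(tablesNeedingCleanAbort(rows)); // [default.cws]
    }
}
```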


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497260&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497260
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 11:47
Start Date: 08/Oct/20 11:47
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501646861



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -97,9 +100,9 @@ public void run() {
   long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
   LOG.info("Cleaning based on min open txn id: " + minOpenTxnId);
  List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
-  for(CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
+  for (CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
 
cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(()
 ->
-clean(compactionInfo, minOpenTxnId)), cleanerExecutor));
+  clean(compactionInfo, minOpenTxnId)), cleanerExecutor));

Review comment:
   1. In the original patch a Map tableLock = new ConcurrentHashMap<>() was used to prevent a concurrent p-clean (where the whole table will be scanned). I think that is resolved by grouping p-cleans and recording the list of writeIds that need to be removed:
   https://github.com/apache/hive/pull/1548/files#diff-9cf3ae764b7a33b568a984d695aff837R328
   @vpnvishv is that correct? Also, we do not allow concurrent Cleaners; their execution is mutexed.
   
   2. This was related to the following issue, based on the Map tableLock = new ConcurrentHashMap<>() design:
   "Suppose you have a p-type clean on table T that is running (i.e. has the Write lock) and you have 30 different partition clean requests (in T). The 30 per-partition cleans will get blocked, but they will tie up every thread in the pool while they are blocked, right? If so, no other clean (on any other table) will actually make progress until the p-type clean on T is done."
   I think it's not valid now.
   
   
   







Issue Time Tracking
---

Worklog Id: (was: 497260)
Time Spent: 7h 20m  (was: 7h 10m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written on the table the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and 
> when addPartitions is called remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24203) Implement stats annotation rule for the LateralViewJoinOperator

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24203?focusedWorklogId=497304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497304
 ]

ASF GitHub Bot logged work on HIVE-24203:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 12:47
Start Date: 08/Oct/20 12:47
Worklog Time Spent: 10m 
  Work Description: okumin commented on a change in pull request #1531:
URL: https://github.com/apache/hive/pull/1531#discussion_r501691897



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
##
@@ -2921,6 +2920,97 @@ public Object process(Node nd, Stack<Node> stack, 
NodeProcessorCtx procCtx,
 }
   }
 
+  /**
+   * LateralViewJoinOperator changes the data size and column level statistics.
+   *
+   * A diagram of LATERAL VIEW.
+   *
+   *   [Lateral View Forward]
+   *  / \
+   *[Select]  [Select]
+   *||
+   *| [UDTF]
+   *\   /
+   *   [Lateral View Join]
+   *
+   * For each row of the source, the left branch just picks columns and the right branch processes the UDTF.
+   * Then LVJ joins a row from the left branch with the rows from the right branch.
+   * The join has a one-to-many relationship since the UDTF can generate multiple rows.
+   *
+   * This rule multiplies the stats from the left branch by T(right) / T(left) and sums up both sides.
+   */
+  public static class LateralViewJoinStatsRule extends DefaultStatsRule 
implements SemanticNodeProcessor {
+@Override
+public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
+  Object... nodeOutputs) throws SemanticException {
+  final LateralViewJoinOperator lop = (LateralViewJoinOperator) nd;
+  final AnnotateStatsProcCtx aspCtx = (AnnotateStatsProcCtx) procCtx;
+  final HiveConf conf = aspCtx.getConf();
+
+  if (!isAllParentsContainStatistics(lop)) {
+return null;
+  }
+
+  final List<Operator<? extends OperatorDesc>> parents = lop.getParentOperators();
+  if (parents.size() != 2) {
+LOG.warn("LateralViewJoinOperator should have just two parents but 
actually has "
++ parents.size() + " parents.");
+return null;
+  }
+
+  final Statistics selectStats = 
parents.get(LateralViewJoinOperator.SELECT_TAG).getStatistics();
+  final Statistics udtfStats = 
parents.get(LateralViewJoinOperator.UDTF_TAG).getStatistics();
+
+  final double factor = (double) udtfStats.getNumRows() / (double) 
selectStats.getNumRows();

Review comment:
   Added steps to check both numbers and ensure there is at least one record in the stats.
   
https://github.com/apache/hive/pull/1531/commits/50396346eaed5d6bab4ff87dd079918a769a7ebd







Issue Time Tracking
---

Worklog Id: (was: 497304)
Time Spent: 2h 20m  (was: 2h 10m)

> Implement stats annotation rule for the LateralViewJoinOperator
> ---
>
> Key: HIVE-24203
> URL: https://issues.apache.org/jira/browse/HIVE-24203
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 4.0.0, 3.1.2, 2.3.7
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> StatsRulesProcFactory doesn't have any rules to handle a JOIN by LATERAL VIEW.
> This can cause an underestimation in case that UDTF in LATERAL VIEW generates 
> multiple rows.
> HIVE-20262 has already added the rule for UDTF.
> This issue would add the rule for LateralViewJoinOperator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
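
The guard okumin describes above (checking both row counts before computing T(right) / T(left)) can be sketched as follows. This is a hedged illustration, not the actual Hive change; the class and method names are assumptions.

```java
// Hedged sketch of guarding the LVJ scaling factor against a zero row count,
// as discussed in the review above. Not the actual Hive implementation.
public class FactorGuard {
    // Clamp both row counts to at least 1 so the division is always finite
    // and never NaN (0/0) or Infinity (n/0).
    static double factor(long selectNumRows, long udtfNumRows) {
        long left = Math.max(1, selectNumRows);  // denominator: rows before the UDTF
        long right = Math.max(1, udtfNumRows);   // numerator: rows after the UDTF
        return (double) right / (double) left;
    }

    public static void main(String[] args) {
        System.out.println(factor(4, 10)); // 2.5: UDTF expanded 4 rows into 10
        System.out.println(factor(0, 0));  // 1.0 instead of NaN on empty stats
    }
}
```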


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497257&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497257
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 11:45
Start Date: 08/Oct/20 11:45
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501646861



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -97,9 +100,9 @@ public void run() {
   long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
   LOG.info("Cleaning based on min open txn id: " + minOpenTxnId);
  List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
-  for(CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
+  for (CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
 
cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(()
 ->
-clean(compactionInfo, minOpenTxnId)), cleanerExecutor));
+  clean(compactionInfo, minOpenTxnId)), cleanerExecutor));

Review comment:
   1. In the original patch a Map tableLock = new ConcurrentHashMap<>() was used to prevent a concurrent p-clean (where the whole table will be scanned). I think that is resolved by grouping p-cleans and recording the list of writeIds that need to be removed. @vpnvishv is that correct? Also, we do not allow concurrent Cleaners; their execution is mutexed.
   
   2. This was related to the following issue, based on the Map tableLock = new ConcurrentHashMap<>() design:
   "Suppose you have a p-type clean on table T that is running (i.e. has the Write lock) and you have 30 different partition clean requests (in T). The 30 per-partition cleans will get blocked, but they will tie up every thread in the pool while they are blocked, right? If so, no other clean (on any other table) will actually make progress until the p-type clean on T is done."
   I think it's not valid now.
   
   
   







Issue Time Tracking
---

Worklog Id: (was: 497257)
Time Spent: 7h 10m  (was: 7h)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written on the table the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and 
> when addPartitions is called remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497254&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497254
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 11:43
Start Date: 08/Oct/20 11:43
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501646861



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -97,9 +100,9 @@ public void run() {
   long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
   LOG.info("Cleaning based on min open txn id: " + minOpenTxnId);
  List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
-  for(CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
+  for (CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
 
cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(()
 ->
-clean(compactionInfo, minOpenTxnId)), cleanerExecutor));
+  clean(compactionInfo, minOpenTxnId)), cleanerExecutor));

Review comment:
   In the original patch a Map tableLock = new ConcurrentHashMap<>() was used to prevent a concurrent p-clean (where the whole table will be scanned). I think that is resolved by grouping p-cleans and recording the list of writeIds that need to be removed. @vpnvishv is that correct? Also, we do not allow concurrent Cleaners; their execution is mutexed.
   
   
   







Issue Time Tracking
---

Worklog Id: (was: 497254)
Time Spent: 7h  (was: 6h 50m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written on the table the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and 
> when addPartitions is called remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24203) Implement stats annotation rule for the LateralViewJoinOperator

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24203?focusedWorklogId=497300&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497300
 ]

ASF GitHub Bot logged work on HIVE-24203:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 12:43
Start Date: 08/Oct/20 12:43
Worklog Time Spent: 10m 
  Work Description: okumin commented on a change in pull request #1531:
URL: https://github.com/apache/hive/pull/1531#discussion_r501689451



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
##
@@ -2921,6 +2920,97 @@ public Object process(Node nd, Stack<Node> stack, 
NodeProcessorCtx procCtx,
 }
   }
 
+  /**
+   * LateralViewJoinOperator changes the data size and column level statistics.
+   *
+   * A diagram of LATERAL VIEW.
+   *
+   *   [Lateral View Forward]
+   *  / \
+   *[Select]  [Select]
+   *||
+   *| [UDTF]
+   *\   /
+   *   [Lateral View Join]
+   *
+   * For each row of the source, the left branch just picks columns and the right branch processes the UDTF.
+   * Then LVJ joins a row from the left branch with the rows from the right branch.
+   * The join has a one-to-many relationship since the UDTF can generate multiple rows.
+   *
+   * This rule multiplies the stats from the left branch by T(right) / T(left) and sums up both sides.
+   */
+  public static class LateralViewJoinStatsRule extends DefaultStatsRule 
implements SemanticNodeProcessor {
+@Override
+public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
+  Object... nodeOutputs) throws SemanticException {
+  final LateralViewJoinOperator lop = (LateralViewJoinOperator) nd;
+  final AnnotateStatsProcCtx aspCtx = (AnnotateStatsProcCtx) procCtx;
+  final HiveConf conf = aspCtx.getConf();
+
+  if (!isAllParentsContainStatistics(lop)) {
+return null;
+  }
+
+  final List<Operator<? extends OperatorDesc>> parents = lop.getParentOperators();
+  if (parents.size() != 2) {
+LOG.warn("LateralViewJoinOperator should have just two parents but 
actually has "
++ parents.size() + " parents.");
+return null;
+  }
+
+  final Statistics selectStats = 
parents.get(LateralViewJoinOperator.SELECT_TAG).getStatistics();
+  final Statistics udtfStats = 
parents.get(LateralViewJoinOperator.UDTF_TAG).getStatistics();
+
+  final double factor = (double) udtfStats.getNumRows() / (double) 
selectStats.getNumRows();
+  final long selectDataSize = 
StatsUtils.safeMult(selectStats.getDataSize(), factor);
+  final long dataSize = StatsUtils.safeAdd(selectDataSize, 
udtfStats.getDataSize());
+  Statistics joinedStats = new Statistics(udtfStats.getNumRows(), 
dataSize, 0, 0);
+
+  if (satisfyPrecondition(selectStats) && satisfyPrecondition(udtfStats)) {
+final Map<String, ExprNodeDesc> columnExprMap = lop.getColumnExprMap();
+final RowSchema schema = lop.getSchema();
+
+joinedStats.updateColumnStatsState(selectStats.getColumnStatsState());
+final List<ColStatistics> selectColStats = StatsUtils
+.getColStatisticsFromExprMap(conf, selectStats, columnExprMap, 
schema);
+joinedStats.addToColumnStats(multiplyColStats(selectColStats, factor));
+
+joinedStats.updateColumnStatsState(udtfStats.getColumnStatsState());
+final List<ColStatistics> udtfColStats = StatsUtils
+.getColStatisticsFromExprMap(conf, udtfStats, columnExprMap, 
schema);
+joinedStats.addToColumnStats(udtfColStats);
+
+joinedStats = applyRuntimeStats(aspCtx.getParseContext().getContext(), 
joinedStats, lop);
+lop.setStatistics(joinedStats);
+
+if (LOG.isDebugEnabled()) {
+  LOG.debug("[0] STATS-" + lop.toString() + ": " + 
joinedStats.extendedToString());
+}

Review comment:
   I also agree and I did that.
   
https://github.com/apache/hive/pull/1531/commits/d333d5d70184a1cf1f0c0f239e9229965e486202







Issue Time Tracking
---

Worklog Id: (was: 497300)
Time Spent: 2h 10m  (was: 2h)

> Implement stats annotation rule for the LateralViewJoinOperator
> ---
>
> Key: HIVE-24203
> URL: https://issues.apache.org/jira/browse/HIVE-24203
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 4.0.0, 3.1.2, 2.3.7
>Reporter: okumin
>Assignee: okumin
>Priority: 

[jira] [Resolved] (HIVE-20137) Truncate for Transactional tables should use base_x

2020-10-08 Thread Peter Varga (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Varga resolved HIVE-20137.

Fix Version/s: 4.0.0
   Resolution: Fixed

> Truncate for Transactional tables should use base_x
> ---
>
> Key: HIVE-20137
> URL: https://issues.apache.org/jira/browse/HIVE-20137
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This is a follow up to HIVE-19387.
> Once we have a lock that blocks writers but not readers (HIVE-19369), it 
> would make sense to make truncate create a new base_x, where is x is a 
> writeId in current txn - the same as Insert Overwrite does.
> This would mean it can work w/o interfering with existing writers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497251&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497251
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 11:29
Start Date: 08/Oct/20 11:29
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501646861



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -97,9 +100,9 @@ public void run() {
   long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
   LOG.info("Cleaning based on min open txn id: " + minOpenTxnId);
   List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
-  for(CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
+  for (CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
 
cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(()
 ->
-clean(compactionInfo, minOpenTxnId)), cleanerExecutor));
+  clean(compactionInfo, minOpenTxnId)), cleanerExecutor));

Review comment:
   In the original patch, Map tableLock = new ConcurrentHashMap<>() was used 
to prevent a concurrent p-clean (where the whole table would be scanned). I 
think that is resolved by grouping p-cleans and recording the list of writeIds 
that need to be removed. @vpnvishv is that correct?
   
   
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497251)
Time Spent: 6h 50m  (was: 6h 40m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions, and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman], this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS 
> indicating that a transaction was opened and aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]
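The marker scheme proposed above can be sketched as a minimal in-memory model. All class, method, and marker names here are hypothetical illustrations, not the actual metastore schema or API:

```java
import java.util.*;

// Hypothetical in-memory sketch of the proposed TXN_COMPONENTS marker
// protocol; names are illustrative only.
class TxnComponentsSketch {
  static final String MARKER = "_NO_PARTITION_YET_";
  final Map<Long, List<String>> txnComponents = new HashMap<>();

  void openTxn(long txnId) {
    // write an entry with a special marker at openTxn
    txnComponents.put(txnId, new ArrayList<>(List.of(MARKER)));
  }

  void addPartitions(long txnId, List<String> partitions) {
    // replace the marker entry with the real partition entries
    List<String> entries = txnComponents.get(txnId);
    entries.remove(MARKER);
    entries.addAll(partitions);
  }

  boolean abortedBeforeAddPartitions(long txnId) {
    // cleaner side: the marker still being present means the txn aborted
    // before addPartitions, so clean-up must cover every possible partition
    return txnComponents.getOrDefault(txnId, List.of()).contains(MARKER);
  }
}
```

If the marker survives until the cleaner runs, the transaction is known to have aborted between openTxn and addPartitions, which is exactly the case the description says currently goes uncleaned.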



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497264&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497264
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 11:53
Start Date: 08/Oct/20 11:53
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501646861



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -97,9 +100,9 @@ public void run() {
   long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
   LOG.info("Cleaning based on min open txn id: " + minOpenTxnId);
   List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
-  for(CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
+  for (CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
 
cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(()
 ->
-clean(compactionInfo, minOpenTxnId)), cleanerExecutor));
+  clean(compactionInfo, minOpenTxnId)), cleanerExecutor));

Review comment:
   1. In the original patch, Map tableLock = new ConcurrentHashMap<>() was used 
to prevent a concurrent p-clean (where the whole table would be scanned). I 
think that is resolved by grouping p-cleans and recording the list of writeIds 
that need to be removed:
   
https://github.com/apache/hive/pull/1548/files#diff-9cf3ae764b7a33b568a984d695aff837R328
   @vpnvishv is that correct? Also, we do not allow concurrent Cleaners; their 
execution is mutexed.
   
   2. This was related to the following issue with the Map tableLock = new 
ConcurrentHashMap<>() design:
   "Suppose you have p-type clean on table T that is running (i.e. has the 
Write lock) and you have 30 different partition clean requests (in T).  The 30 
per partition cleans will get blocked but they will tie up every thread in the 
pool while they are blocked, right?  If so, no other clean (on any other table) 
will actually make progress until the p-type on T is done."
   
   Yes, it's still the case that we'll have to wait for all tasks to complete, 
and if there is one long-running task, we won't be able to submit new ones. 
However, I'm not sure it's a critical issue; I think we can address it in a 
separate jira.
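The pool behavior being discussed can be shown with a toy model (not Hive's actual Cleaner; names like runBatch are illustrative): clean tasks are submitted with CompletableFuture.runAsync onto a shared fixed pool, and the coordinator joins the whole batch before anything new can be submitted.

```java
import java.util.*;
import java.util.concurrent.*;

// Toy model of the batching discussed above (not Hive's Cleaner): a single
// long-running task delays submission of the next batch because the
// coordinator joins the whole batch.
class CleanerSketch {
  static List<String> runBatch(List<String> readyToClean) {
    ExecutorService cleanerExecutor = Executors.newFixedThreadPool(2);
    List<String> cleaned = Collections.synchronizedList(new ArrayList<>());
    List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
    for (String target : readyToClean) {
      // each "clean" runs asynchronously on the shared pool...
      cleanerList.add(
          CompletableFuture.runAsync(() -> cleaned.add(target), cleanerExecutor));
    }
    // ...but the batch is joined here: no new work is submitted until all
    // tasks in this batch finish, which is the starvation concern raised.
    CompletableFuture.allOf(cleanerList.toArray(new CompletableFuture[0])).join();
    cleanerExecutor.shutdown();
    return cleaned;
  }
}
```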
   
   
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497264)
Time Spent: 7h 40m  (was: 7.5h)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions, and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman], this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS 
> indicating that a transaction was opened and aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497262&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497262
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 11:52
Start Date: 08/Oct/20 11:52
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r501646861



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -97,9 +100,9 @@ public void run() {
   long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
   LOG.info("Cleaning based on min open txn id: " + minOpenTxnId);
   List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
-  for(CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
+  for (CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
 
cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(()
 ->
-clean(compactionInfo, minOpenTxnId)), cleanerExecutor));
+  clean(compactionInfo, minOpenTxnId)), cleanerExecutor));

Review comment:
   1. In the original patch, Map tableLock = new ConcurrentHashMap<>() was used 
to prevent a concurrent p-clean (where the whole table would be scanned). I 
think that is resolved by grouping p-cleans and recording the list of writeIds 
that need to be removed:
   
https://github.com/apache/hive/pull/1548/files#diff-9cf3ae764b7a33b568a984d695aff837R328
   @vpnvishv is that correct? Also, we do not allow concurrent Cleaners; their 
execution is mutexed.
   
   2. This was related to the following issue with the Map tableLock = new 
ConcurrentHashMap<>() design:
   "Suppose you have p-type clean on table T that is running (i.e. has the 
Write lock) and you have 30 different partition clean requests (in T).  The 30 
per partition cleans will get blocked but they will tie up every thread in the 
pool while they are blocked, right?  If so, no other clean (on any other table) 
will actually make progress until the p-type on T is done."
   Yes, it's still the case that we'll have to wait for all tasks to complete, 
and if there is one slow task, we won't be able to submit new tasks. However, 
I'm not sure it's a critical issue; I think we can address it in a separate 
jira.
   
   
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497262)
Time Spent: 7.5h  (was: 7h 20m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions, and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman], this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS 
> indicating that a transaction was opened and aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=497224&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497224
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 10:39
Start Date: 08/Oct/20 10:39
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499754423



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
 tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }
 
+  /**
+   * Looks for delta directories matching the list of writeIds and deletes them.
+   * @param rootPartition root partition to look for the delta directories
+   * @param conf configuration
+   * @param writeIds list of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<Path> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+  throws IOException {
+FileSystem fs = rootPartition.getFileSystem(conf);
+
+PathFilter filter = (p) -> {
+  String name = p.getName();
+  for (Long wId : writeIds) {
+if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {

Review comment:
   changed, included delete_delta as well
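For illustration, the name check under review (now covering delete_delta as well) can be sketched as pure string logic. The zero-padded delta_<wid>_<wid> naming below is an assumption of this sketch, mirroring the format produced by deltaSubdir; the class and method names are hypothetical:

```java
import java.util.Set;

// Illustrative sketch of the delta-directory name filter discussed above;
// assumes Hive's zero-padded "delta_<wid>_<wid>" / "delete_delta_<wid>_<wid>"
// naming and skips partition directories (names containing '=').
class DeltaDirFilter {
  static String deltaSubdir(long wId) {
    // assumed 7-digit zero-padded format, e.g. delta_0000007_0000007
    return String.format("delta_%07d_%07d", wId, wId);
  }

  static boolean matches(String name, Set<Long> writeIds) {
    for (Long wId : writeIds) {
      String delta = deltaSubdir(wId);
      // match both delta_* and delete_delta_* for the aborted writeId
      if ((name.startsWith(delta) || name.startsWith("delete_" + delta))
          && !name.contains("=")) {
        return true;
      }
    }
    return false;
  }
}
```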





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497224)
Time Spent: 6h 40m  (was: 6.5h)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions, and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman], this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS 
> indicating that a transaction was opened and aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24242) Relax safety checks in SharedWorkOptimizer

2020-10-08 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24242:
---


> Relax safety checks in SharedWorkOptimizer
> --
>
> Key: HIVE-24242
> URL: https://issues.apache.org/jira/browse/HIVE-24242
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> there are some checks to lock out problematic cases
> For UnionOperator 
> [here|https://github.com/apache/hive/blob/1507d80fd47aad38b87bba4fd58c1427ba89dbbf/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java#L1571])
> This check could prevent the optimization even if the Union is only visible 
> from only 1 of the TS ops.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24242) Relax safety checks in SharedWorkOptimizer

2020-10-08 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24242:

Description: 
there are some checks to lock out problematic cases

For UnionOperator 
[here|https://github.com/apache/hive/blob/1507d80fd47aad38b87bba4fd58c1427ba89dbbf/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java#L1571]

This check could prevent the optimization even if the Union is only visible 
from only 1 of the TS ops.



  was:
there are some checks to lock out problematic cases

For UnionOperator 
[here|https://github.com/apache/hive/blob/1507d80fd47aad38b87bba4fd58c1427ba89dbbf/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java#L1571])

This check could prevent the optimization even if the Union is only visible 
from only 1 of the TS ops.




> Relax safety checks in SharedWorkOptimizer
> --
>
> Key: HIVE-24242
> URL: https://issues.apache.org/jira/browse/HIVE-24242
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> there are some checks to lock out problematic cases
> For UnionOperator 
> [here|https://github.com/apache/hive/blob/1507d80fd47aad38b87bba4fd58c1427ba89dbbf/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java#L1571]
> This check could prevent the optimization even if the Union is only visible 
> from only 1 of the TS ops.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24241) Enable SharedWorkOptimizer to merge downstream operators after an optimization step

2020-10-08 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24241:
---


> Enable SharedWorkOptimizer to merge downstream operators after an 
> optimization step
> ---
>
> Key: HIVE-24241
> URL: https://issues.apache.org/jira/browse/HIVE-24241
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24225) FIX S3A recordReader policy selection

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24225?focusedWorklogId=497199&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497199
 ]

ASF GitHub Bot logged work on HIVE-24225:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 09:28
Start Date: 08/Oct/20 09:28
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on pull request #1547:
URL: https://github.com/apache/hive/pull/1547#issuecomment-705448817


   (Note that openFileWithOptions() is how openFile() is implemented; it's not 
meant to be the API call which apps make. That's covered 
[here](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/fsdatainputstreambuilder.html).)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497199)
Time Spent: 1h 20m  (was: 1h 10m)

> FIX S3A recordReader policy selection
> -
>
> Key: HIVE-24225
> URL: https://issues.apache.org/jira/browse/HIVE-24225
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Dynamic S3A recordReader policy selection can cause issues on lazy 
> initialized FS objects



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24225) FIX S3A recordReader policy selection

2020-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24225?focusedWorklogId=497198&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-497198
 ]

ASF GitHub Bot logged work on HIVE-24225:
-

Author: ASF GitHub Bot
Created on: 08/Oct/20 09:24
Start Date: 08/Oct/20 09:24
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on pull request #1547:
URL: https://github.com/apache/hive/pull/1547#issuecomment-705446922


   you actually want to use the openFile() API call, which is already in 
hadoop-3.3 but is being tweaked 
(https://github.com/apache/hadoop/pull/2168) to have a standard option for seek 
policy for all object stores, and the ability to pass in the file length - which 
will let the S3A & ABFS connectors skip a HEAD probe
   
   ```java
   @Override
   public SeekableInputStream newStream() throws IOException {
     FutureDataInputStreamBuilder builder = fs.openFile(getPath())
         .opt("fs.s3a.experimental.input.fadvise", "random")  // supported in Hadoop 3.3.0
         .opt("fs.option.openfile.fadvise", "random");        // this will be the new standard one

     if (length > 0) {
       // also in the PR; pass in the end of the split and S3A will be happy too
       builder.opt("fs.option.openfile.length", length);
     }
     CompletableFuture<FSDataInputStream> streamF = builder.build();
     return HadoopStreams.wrap(FutureIOSupport.awaitFuture(streamF));
   }
   ```
   
   That API is there today, if you want to try playing with it.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 497198)
Time Spent: 1h 10m  (was: 1h)

> FIX S3A recordReader policy selection
> -
>
> Key: HIVE-24225
> URL: https://issues.apache.org/jira/browse/HIVE-24225
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Dynamic S3A recordReader policy selection can cause issues on lazy 
> initialized FS objects



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-08 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24231:

Summary: Enhance shared work optimizer to merge scans with filters on both 
sides  (was: Enhance shared work optimizer to merge scans with semijoin filters 
on both sides)

> Enhance shared work optimizer to merge scans with filters on both sides
> ---
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24197) Check for write transactions for the db under replication at a frequent interval

2020-10-08 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24197:
---
Status: In Progress  (was: Patch Available)

> Check for write transactions for the db under replication at a frequent 
> interval
> 
>
> Key: HIVE-24197
> URL: https://issues.apache.org/jira/browse/HIVE-24197
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24197.01.patch, HIVE-24197.02.patch, 
> HIVE-24197.03.patch, HIVE-24197.04.patch, HIVE-24197.05.patch, 
> HIVE-24197.06.patch, HIVE-24197.07.patch, HIVE-24197.08.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24197) Check for write transactions for the db under replication at a frequent interval

2020-10-08 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24197:
---
Attachment: HIVE-24197.08.patch
Status: Patch Available  (was: In Progress)

> Check for write transactions for the db under replication at a frequent 
> interval
> 
>
> Key: HIVE-24197
> URL: https://issues.apache.org/jira/browse/HIVE-24197
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24197.01.patch, HIVE-24197.02.patch, 
> HIVE-24197.03.patch, HIVE-24197.04.patch, HIVE-24197.05.patch, 
> HIVE-24197.06.patch, HIVE-24197.07.patch, HIVE-24197.08.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)