[jira] [Work logged] (HIVE-22068) Return the last event id dumped as repl status to avoid notification event missing error.

2020-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22068?focusedWorklogId=442512&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442512
 ]

ASF GitHub Bot logged work on HIVE-22068:
-

Author: ASF GitHub Bot
Created on: 08/Jun/20 00:24
Start Date: 08/Jun/20 00:24
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #742:
URL: https://github.com/apache/hive/pull/742#issuecomment-640302939


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 442512)
Time Spent: 2.5h  (was: 2h 20m)

> Return the last event id dumped as repl status to avoid notification event 
> missing error.
> -
>
> Key: HIVE-22068
> URL: https://issues.apache.org/jira/browse/HIVE-22068
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22068.01.patch, HIVE-22068.02.patch, 
> HIVE-22068.03.patch, HIVE-22068.04.patch, HIVE-22068.05.patch, 
> HIVE-22068.06.patch, HIVE-22068.07.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> In repl load, update the status of the target database to the last event dumped 
> so that repl status returns it and the next incremental dump can specify it as the 
> event from which to start. Without that, repl status might return an 
> old event, which might cause older events to be dumped again and/or a 
> notification event missing error if the older events have been cleaned by the 
> cleaner.
> While at it:
>  * Add more logging to the DB notification listener cleaner thread
>  ** The time when it considered cleaning, the interval and the time before which 
> events were cleared, and the min and max id at that time
>  ** How many events were cleared
>  ** The min and max id after the cleaning
>  * In REPL::START, document the starting event, the end event if specified, and the 
> maximum number of events, if specified.
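
The failure mode described in the issue can be illustrated with a small, self-contained sketch. This is not Hive code: the class ReplStatusSketch, its in-memory event map, and the methods clean and canDumpFrom are hypothetical stand-ins, and the sketch assumes event ids are contiguous and that the cleaner only removes the oldest events. It shows the values the description asks the cleaner to log, and why a stale last repl id can no longer be used as the start of the next incremental dump once the cleaner has run.

```java
import java.util.TreeMap;

// Hypothetical model of a notification log; not part of Hive.
final class ReplStatusSketch {
    // event id -> creation time (arbitrary time units)
    private final TreeMap<Long, Long> events = new TreeMap<>();

    void addEvent(long id, long createdAt) {
        events.put(id, createdAt);
    }

    // Removes events created before (now - ttl) and returns the details
    // the issue asks to log: cutoff, id range before, count cleared, id range after.
    String clean(long now, long ttl) {
        long cutoff = now - ttl;
        String before = range();
        int sizeBefore = events.size();
        events.values().removeIf(createdAt -> createdAt < cutoff);
        int cleared = sizeBefore - events.size();
        return "cutoff=" + cutoff + " before=" + before
            + " cleared=" + cleared + " after=" + range();
    }

    private String range() {
        return events.isEmpty() ? "[]" : "[" + events.firstKey() + "," + events.lastKey() + "]";
    }

    // A dump resuming after lastReplId needs event lastReplId + 1 to still be present,
    // unless there is nothing newer than lastReplId in the log.
    boolean canDumpFrom(long lastReplId) {
        if (events.isEmpty() || events.lastKey() <= lastReplId) {
            return true; // nothing new to dump
        }
        return events.firstKey() <= lastReplId + 1;
    }

    public static void main(String[] args) {
        ReplStatusSketch log = new ReplStatusSketch();
        for (long id = 1; id <= 10; id++) {
            log.addEvent(id, id); // the id doubles as the creation time here
        }
        System.out.println(log.clean(20, 12));
        System.out.println("resume from 5:  " + log.canDumpFrom(5));
        System.out.println("resume from 10: " + log.canDumpFrom(10));
    }
}
```

In this sketch, a target whose status still points at event 5 cannot resume after events 1 to 7 are cleaned, whereas a target whose status was advanced to the last dumped event (10) can.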



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22068) Return the last event id dumped as repl status to avoid notification event missing error.

2020-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22068?focusedWorklogId=442515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442515
 ]

ASF GitHub Bot logged work on HIVE-22068:
-

Author: ASF GitHub Bot
Created on: 08/Jun/20 00:24
Start Date: 08/Jun/20 00:24
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #742:
URL: https://github.com/apache/hive/pull/742


   





Issue Time Tracking
---

Worklog Id: (was: 442515)
Time Spent: 2h 40m  (was: 2.5h)

> Return the last event id dumped as repl status to avoid notification event 
> missing error.
> -
>
> Key: HIVE-22068
> URL: https://issues.apache.org/jira/browse/HIVE-22068
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22068.01.patch, HIVE-22068.02.patch, 
> HIVE-22068.03.patch, HIVE-22068.04.patch, HIVE-22068.05.patch, 
> HIVE-22068.06.patch, HIVE-22068.07.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h





[jira] [Work logged] (HIVE-22068) Return the last event id dumped as repl status to avoid notification event missing error.

2019-08-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22068?focusedWorklogId=297970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-297970
 ]

ASF GitHub Bot logged work on HIVE-22068:
-

Author: ASF GitHub Bot
Created on: 20/Aug/19 15:38
Start Date: 20/Aug/19 15:38
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #742: 
HIVE-22068 : Add more logging to notification cleaner and replication to track 
events
URL: https://github.com/apache/hive/pull/742#discussion_r315762673
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -522,6 +525,41 @@ private int executeIncrementalLoad(DriverContext driverContext) {
       // bootstrap of tables if exist.
       if (builder.hasMoreWork() || work.getPathsToCopyIterator().hasNext() || work.hasBootstrapLoadTasks()) {
         DAGTraversal.traverse(childTasks, new AddDependencyToLeaves(TaskFactory.get(work, conf)));
+      } else {
+        // Nothing to be done for repl load now. Add a task to update the last.repl.id of the
+        // target database to the event id of the last event considered by the dump. Next
+        // incremental cycle if starts from this id, the events considered for this dump, won't
+        // be considered again.
+
+        // The name of the database to be loaded into is either specified directly or is
+        // available from the dump metadata.
+        String dbName = work.dbNameToLoadIn;
+        if (dbName == null || StringUtils.isNotBlank(dbName)) {
+          if (work.currentReplScope != null) {
 
 Review comment:
   Done. Please check and suggest improvement if necessary.
 



Issue Time Tracking
---

Worklog Id: (was: 297970)
Time Spent: 2h 20m  (was: 2h 10m)

> Return the last event id dumped as repl status to avoid notification event 
> missing error.
> -
>
> Key: HIVE-22068
> URL: https://issues.apache.org/jira/browse/HIVE-22068
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22068.01.patch, HIVE-22068.02.patch, 
> HIVE-22068.03.patch, HIVE-22068.04.patch, HIVE-22068.05.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h





[jira] [Work logged] (HIVE-22068) Return the last event id dumped as repl status to avoid notification event missing error.

2019-08-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22068?focusedWorklogId=297846&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-297846
 ]

ASF GitHub Bot logged work on HIVE-22068:
-

Author: ASF GitHub Bot
Created on: 20/Aug/19 12:54
Start Date: 20/Aug/19 12:54
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #742: 
HIVE-22068 : Add more logging to notification cleaner and replication to track 
events
URL: https://github.com/apache/hive/pull/742#discussion_r315670865
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -522,6 +525,41 @@ private int executeIncrementalLoad(DriverContext driverContext) {
       // bootstrap of tables if exist.
       if (builder.hasMoreWork() || work.getPathsToCopyIterator().hasNext() || work.hasBootstrapLoadTasks()) {
         DAGTraversal.traverse(childTasks, new AddDependencyToLeaves(TaskFactory.get(work, conf)));
+      } else {
+        // Nothing to be done for repl load now. Add a task to update the last.repl.id of the
+        // target database to the event id of the last event considered by the dump. Next
+        // incremental cycle if starts from this id, the events considered for this dump, won't
+        // be considered again.
+
+        // The name of the database to be loaded into is either specified directly or is
+        // available from the dump metadata.
+        String dbName = work.dbNameToLoadIn;
+        if (dbName == null || StringUtils.isNotBlank(dbName)) {
 
 Review comment:
   Thanks for catching this. Fixed.
 



Issue Time Tracking
---

Worklog Id: (was: 297846)
Time Spent: 2h 10m  (was: 2h)

> Return the last event id dumped as repl status to avoid notification event 
> missing error.
> -
>
> Key: HIVE-22068
> URL: https://issues.apache.org/jira/browse/HIVE-22068
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22068.01.patch, HIVE-22068.02.patch, 
> HIVE-22068.03.patch, HIVE-22068.04.patch, HIVE-22068.05.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h





[jira] [Work logged] (HIVE-22068) Return the last event id dumped as repl status to avoid notification event missing error.

2019-08-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22068?focusedWorklogId=297843&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-297843
 ]

ASF GitHub Bot logged work on HIVE-22068:
-

Author: ASF GitHub Bot
Created on: 20/Aug/19 12:51
Start Date: 20/Aug/19 12:51
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #742: 
HIVE-22068 : Add more logging to notification cleaner and replication to track 
events
URL: https://github.com/apache/hive/pull/742#discussion_r315669736
 
 

 ##
 File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
 ##
 @@ -750,6 +766,38 @@ public Table apply(@Nullable Table table) {
 .verifyResults(Arrays.asList("1", "2"));
   }
 
+  @Test
+  public void testIncrementalDumpEmptyDumpDirectory() throws Throwable {
 
 Review comment:
   Added a test case with external table bootstrap. With it I could reproduce the 
problem you mentioned, and I have fixed it. During an incremental load, the last repl id 
is now updated for the database after applying all the events but before 
bootstrapping any tables.
 



Issue Time Tracking
---

Worklog Id: (was: 297843)
Time Spent: 2h  (was: 1h 50m)

> Return the last event id dumped as repl status to avoid notification event 
> missing error.
> -
>
> Key: HIVE-22068
> URL: https://issues.apache.org/jira/browse/HIVE-22068
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22068.01.patch, HIVE-22068.02.patch, 
> HIVE-22068.03.patch, HIVE-22068.04.patch, HIVE-22068.05.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h





[jira] [Work logged] (HIVE-22068) Return the last event id dumped as repl status to avoid notification event missing error.

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22068?focusedWorklogId=296729&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296729
 ]

ASF GitHub Bot logged work on HIVE-22068:
-

Author: ASF GitHub Bot
Created on: 17/Aug/19 04:57
Start Date: 17/Aug/19 04:57
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #742: HIVE-22068 : 
Add more logging to notification cleaner and replication to track events
URL: https://github.com/apache/hive/pull/742#discussion_r314934595
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -522,6 +525,41 @@ private int executeIncrementalLoad(DriverContext driverContext) {
       // bootstrap of tables if exist.
       if (builder.hasMoreWork() || work.getPathsToCopyIterator().hasNext() || work.hasBootstrapLoadTasks()) {
         DAGTraversal.traverse(childTasks, new AddDependencyToLeaves(TaskFactory.get(work, conf)));
+      } else {
+        // Nothing to be done for repl load now. Add a task to update the last.repl.id of the
+        // target database to the event id of the last event considered by the dump. Next
+        // incremental cycle if starts from this id, the events considered for this dump, won't
+        // be considered again.
+
+        // The name of the database to be loaded into is either specified directly or is
+        // available from the dump metadata.
+        String dbName = work.dbNameToLoadIn;
+        if (dbName == null || StringUtils.isNotBlank(dbName)) {
 
 Review comment:
   Should use StringUtils.isBlank(dbName). 
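
To make the point of this comment concrete: the comment in the diff says the name is either given directly or taken from the dump metadata, so the fallback presumably should run only when no usable name was given. The condition `dbName == null || StringUtils.isNotBlank(dbName)` is also true when a name was given. The sketch below is a rough illustration only; DbNameCheck, resolveDbName, and the local isBlank (a stand-in approximating the Commons Lang method so the example has no dependencies) are hypothetical names, not Hive code.

```java
// Hypothetical illustration; not part of Hive.
final class DbNameCheck {
    // Local stand-in for StringUtils.isBlank: true for null, empty, or whitespace-only input.
    static boolean isBlank(String s) {
        return s == null || s.chars().allMatch(Character::isWhitespace);
    }

    // Use the explicitly specified name when present, otherwise fall back to
    // the name recorded in the dump metadata. isBlank already covers null,
    // so no separate null check is needed.
    static String resolveDbName(String dbNameToLoadIn, String dbNameFromDump) {
        return isBlank(dbNameToLoadIn) ? dbNameFromDump : dbNameToLoadIn;
    }

    public static void main(String[] args) {
        System.out.println(resolveDbName(null, "src_db"));
        System.out.println(resolveDbName("   ", "src_db"));
        System.out.println(resolveDbName("tgt_db", "src_db"));
    }
}
```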
 



Issue Time Tracking
---

Worklog Id: (was: 296729)
Time Spent: 1h 40m  (was: 1.5h)

> Return the last event id dumped as repl status to avoid notification event 
> missing error.
> -
>
> Key: HIVE-22068
> URL: https://issues.apache.org/jira/browse/HIVE-22068
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22068.01.patch, HIVE-22068.02.patch, 
> HIVE-22068.03.patch, HIVE-22068.04.patch, HIVE-22068.05.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h





[jira] [Work logged] (HIVE-22068) Return the last event id dumped as repl status to avoid notification event missing error.

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22068?focusedWorklogId=296730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296730
 ]

ASF GitHub Bot logged work on HIVE-22068:
-

Author: ASF GitHub Bot
Created on: 17/Aug/19 04:57
Start Date: 17/Aug/19 04:57
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #742: HIVE-22068 : 
Add more logging to notification cleaner and replication to track events
URL: https://github.com/apache/hive/pull/742#discussion_r314934615
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -522,6 +525,41 @@ private int executeIncrementalLoad(DriverContext driverContext) {
       // bootstrap of tables if exist.
       if (builder.hasMoreWork() || work.getPathsToCopyIterator().hasNext() || work.hasBootstrapLoadTasks()) {
         DAGTraversal.traverse(childTasks, new AddDependencyToLeaves(TaskFactory.get(work, conf)));
+      } else {
+        // Nothing to be done for repl load now. Add a task to update the last.repl.id of the
+        // target database to the event id of the last event considered by the dump. Next
+        // incremental cycle if starts from this id, the events considered for this dump, won't
+        // be considered again.
+
+        // The name of the database to be loaded into is either specified directly or is
+        // available from the dump metadata.
+        String dbName = work.dbNameToLoadIn;
+        if (dbName == null || StringUtils.isNotBlank(dbName)) {
+          if (work.currentReplScope != null) {
 
 Review comment:
   Add a comment about in which scenario we hit this case.
 



Issue Time Tracking
---

Worklog Id: (was: 296730)
Time Spent: 1h 50m  (was: 1h 40m)

> Return the last event id dumped as repl status to avoid notification event 
> missing error.
> -
>
> Key: HIVE-22068
> URL: https://issues.apache.org/jira/browse/HIVE-22068
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22068.01.patch, HIVE-22068.02.patch, 
> HIVE-22068.03.patch, HIVE-22068.04.patch, HIVE-22068.05.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h





[jira] [Work logged] (HIVE-22068) Return the last event id dumped as repl status to avoid notification event missing error.

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22068?focusedWorklogId=296728&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296728
 ]

ASF GitHub Bot logged work on HIVE-22068:
-

Author: ASF GitHub Bot
Created on: 17/Aug/19 04:54
Start Date: 17/Aug/19 04:54
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #742: HIVE-22068 : 
Add more logging to notification cleaner and replication to track events
URL: https://github.com/apache/hive/pull/742#discussion_r314934579
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -522,6 +525,25 @@ private int executeIncrementalLoad(DriverContext driverContext) {
       // bootstrap of tables if exist.
       if (builder.hasMoreWork() || work.getPathsToCopyIterator().hasNext() || work.hasBootstrapLoadTasks()) {
         DAGTraversal.traverse(childTasks, new AddDependencyToLeaves(TaskFactory.get(work, conf)));
+      } else if (work.dbNameToLoadIn != null) {
+        // Nothing to be done for repl load now. Add a task to update the last.repl.id of the
+        // target database to the event id of the last event considered by the dump. Next
+        // incremental cycle if starts from this id, the events considered for this dump, won't
+        // be considered again. If we are replicating to multiple databases at a time, it's not
+        // possible to know which all databases we are replicating into and hence we can not
+        // update repl id in all those databases.
+        String lastEventid = builder.eventTo().toString();
 
 Review comment:
   OK
   
 



Issue Time Tracking
---

Worklog Id: (was: 296728)
Time Spent: 1.5h  (was: 1h 20m)

> Return the last event id dumped as repl status to avoid notification event 
> missing error.
> -
>
> Key: HIVE-22068
> URL: https://issues.apache.org/jira/browse/HIVE-22068
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22068.01.patch, HIVE-22068.02.patch, 
> HIVE-22068.03.patch, HIVE-22068.04.patch, HIVE-22068.05.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h





[jira] [Work logged] (HIVE-22068) Return the last event id dumped as repl status to avoid notification event missing error.

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22068?focusedWorklogId=296727&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296727
 ]

ASF GitHub Bot logged work on HIVE-22068:
-

Author: ASF GitHub Bot
Created on: 17/Aug/19 04:53
Start Date: 17/Aug/19 04:53
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #742: HIVE-22068 : 
Add more logging to notification cleaner and replication to track events
URL: https://github.com/apache/hive/pull/742#discussion_r314934575
 
 

 ##
 File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
 ##
 @@ -750,6 +766,38 @@ public Table apply(@Nullable Table table) {
 .verifyResults(Arrays.asList("1", "2"));
   }
 
+  @Test
+  public void testIncrementalDumpEmptyDumpDirectory() throws Throwable {
 
 Review comment:
   We cannot reproduce this scenario with ACID enabled. If ACID is enabled, every 
incremental dump contains at least an open/commit txn event, so it is 
impossible to get an empty incremental dump. Please try this scenario with ACID 
disabled.
 



Issue Time Tracking
---

Worklog Id: (was: 296727)
Time Spent: 1h 20m  (was: 1h 10m)

> Return the last event id dumped as repl status to avoid notification event 
> missing error.
> -
>
> Key: HIVE-22068
> URL: https://issues.apache.org/jira/browse/HIVE-22068
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22068.01.patch, HIVE-22068.02.patch, 
> HIVE-22068.03.patch, HIVE-22068.04.patch, HIVE-22068.05.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h





[jira] [Work logged] (HIVE-22068) Return the last event id dumped as repl status to avoid notification event missing error.

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22068?focusedWorklogId=296382&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296382
 ]

ASF GitHub Bot logged work on HIVE-22068:
-

Author: ASF GitHub Bot
Created on: 16/Aug/19 16:12
Start Date: 16/Aug/19 16:12
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #742: 
HIVE-22068 : Add more logging to notification cleaner and replication to track 
events
URL: https://github.com/apache/hive/pull/742#discussion_r31479
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -522,6 +525,25 @@ private int executeIncrementalLoad(DriverContext driverContext) {
       // bootstrap of tables if exist.
       if (builder.hasMoreWork() || work.getPathsToCopyIterator().hasNext() || work.hasBootstrapLoadTasks()) {
         DAGTraversal.traverse(childTasks, new AddDependencyToLeaves(TaskFactory.get(work, conf)));
+      } else if (work.dbNameToLoadIn != null) {
+        // Nothing to be done for repl load now. Add a task to update the last.repl.id of the
+        // target database to the event id of the last event considered by the dump. Next
+        // incremental cycle if starts from this id, the events considered for this dump, won't
+        // be considered again. If we are replicating to multiple databases at a time, it's not
+        // possible to know which all databases we are replicating into and hence we can not
+        // update repl id in all those databases.
+        String lastEventid = builder.eventTo().toString();
 
 Review comment:
   I think this code is duplicated in several places because the number of 
variables that change from one snippet to the next is almost the same as the 
number of lines of code. So there is probably only a 50/50 chance that 
de-duplication would bring any real benefit.
 



Issue Time Tracking
---

Worklog Id: (was: 296382)
Time Spent: 1h 10m  (was: 1h)

> Return the last event id dumped as repl status to avoid notification event 
> missing error.
> -
>
> Key: HIVE-22068
> URL: https://issues.apache.org/jira/browse/HIVE-22068
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22068.01.patch, HIVE-22068.02.patch, 
> HIVE-22068.03.patch, HIVE-22068.04.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h





[jira] [Work logged] (HIVE-22068) Return the last event id dumped as repl status to avoid notification event missing error.

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22068?focusedWorklogId=296375=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296375
 ]

ASF GitHub Bot logged work on HIVE-22068:
-

Author: ASF GitHub Bot
Created on: 16/Aug/19 16:06
Start Date: 16/Aug/19 16:06
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #742: 
HIVE-22068 : Add more logging to notification cleaner and replication to track 
events
URL: https://github.com/apache/hive/pull/742#discussion_r314787910
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
 ##
 @@ -750,6 +766,38 @@ public Table apply(@Nullable Table table) {
 .verifyResults(Arrays.asList("1", "2"));
   }
 
+  @Test
+  public void testIncrementalDumpEmptyDumpDirectory() throws Throwable {
 
 Review comment:
   I have added a testcase in the patch. It's passing for me. Can you please 
check the same?
 



Issue Time Tracking
---

Worklog Id: (was: 296375)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (HIVE-22068) Return the last event id dumped as repl status to avoid notification event missing error.

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22068?focusedWorklogId=296269=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296269
 ]

ASF GitHub Bot logged work on HIVE-22068:
-

Author: ASF GitHub Bot
Created on: 16/Aug/19 12:49
Start Date: 16/Aug/19 12:49
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #742: 
HIVE-22068 : Add more logging to notification cleaner and replication to track 
events
URL: https://github.com/apache/hive/pull/742#discussion_r314705989
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -522,6 +525,25 @@ private int executeIncrementalLoad(DriverContext driverContext) {
    // bootstrap of tables if exist.
    if (builder.hasMoreWork() || work.getPathsToCopyIterator().hasNext() || work.hasBootstrapLoadTasks()) {
      DAGTraversal.traverse(childTasks, new AddDependencyToLeaves(TaskFactory.get(work, conf)));
+   } else if (work.dbNameToLoadIn != null) {
 
 Review comment:
   Ok. Thanks for bringing that up. Done.
 



Issue Time Tracking
---

Worklog Id: (was: 296269)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (HIVE-22068) Return the last event id dumped as repl status to avoid notification event missing error.

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22068?focusedWorklogId=296244=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296244
 ]

ASF GitHub Bot logged work on HIVE-22068:
-

Author: ASF GitHub Bot
Created on: 16/Aug/19 11:58
Start Date: 16/Aug/19 11:58
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #742: 
HIVE-22068 : Add more logging to notification cleaner and replication to track 
events
URL: https://github.com/apache/hive/pull/742#discussion_r314690479
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
 ##
 @@ -750,6 +766,38 @@ public Table apply(@Nullable Table table) {
 .verifyResults(Arrays.asList("1", "2"));
   }
 
+  @Test
+  public void testIncrementalDumpEmptyDumpDirectory() throws Throwable {
+    WarehouseInstance.Tuple tuple = primary.run("use " + primaryDbName)
+            .run("create external table t1 (id int)")
+            .run("insert into table t1 values (1)")
+            .run("insert into table t1 values (2)")
+            .dump(primaryDbName, null);
+
+    replica.load(replicatedDbName, tuple.dumpLocation)
+            .status(replicatedDbName)
+            .verifyResult(tuple.lastReplicationId);
+
+    WarehouseInstance.Tuple incTuple = primary.dump(primaryDbName, tuple.lastReplicationId);
+
+    replica.load(replicatedDbName, incTuple.dumpLocation)
+            .status(replicatedDbName)
+            .verifyResult(incTuple.lastReplicationId);
+
+    // create events for some other database and then dump the primaryDbName to dump an empty directory.
+    primary.run("create database " + extraPrimaryDb + " WITH DBPROPERTIES ( '" +
+            SOURCE_OF_REPLICATION + "' = '1,2,3')");
+    WarehouseInstance.Tuple inc2Tuple = primary.run("use " + extraPrimaryDb)
+            .run("create table tbl (fld int)")
+            .run("use " + primaryDbName)
+            .dump(primaryDbName, incTuple.lastReplicationId);
+
 
 Review comment:
   Done.
 



Issue Time Tracking
---

Worklog Id: (was: 296244)
Time Spent: 40m  (was: 0.5h)



[jira] [Work logged] (HIVE-22068) Return the last event id dumped as repl status to avoid notification event missing error.

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22068?focusedWorklogId=296104=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296104
 ]

ASF GitHub Bot logged work on HIVE-22068:
-

Author: ASF GitHub Bot
Created on: 16/Aug/19 06:32
Start Date: 16/Aug/19 06:32
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #742: HIVE-22068 : 
Add more logging to notification cleaner and replication to track events
URL: https://github.com/apache/hive/pull/742#discussion_r314595417
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -522,6 +525,25 @@ private int executeIncrementalLoad(DriverContext driverContext) {
    // bootstrap of tables if exist.
    if (builder.hasMoreWork() || work.getPathsToCopyIterator().hasNext() || work.hasBootstrapLoadTasks()) {
      DAGTraversal.traverse(childTasks, new AddDependencyToLeaves(TaskFactory.get(work, conf)));
+   } else if (work.dbNameToLoadIn != null) {
 
 Review comment:
   I think work.dbNameToLoadIn will be null if you don't specify the name in 
the REPL LOAD command. In that case, we should get the name from DumpMetadata 
to set the last repl ID.
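
The fallback suggested in this comment could look roughly like the following. This is a minimal sketch, not the code from the patch: `TargetDbNameSketch` and `resolve` are hypothetical names, and the database name recorded in the dump metadata is passed in as a plain string because the thread does not show how it is read from DumpMetadata.

```java
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of Hive. */
final class TargetDbNameSketch {
    private TargetDbNameSketch() {
    }

    /**
     * Prefers the name given in the REPL LOAD command and falls back to the
     * name recorded with the dump. Returns empty if neither is available.
     */
    static Optional<String> resolve(String dbNameToLoadIn, String dbNameFromDumpMetadata) {
        if (dbNameToLoadIn != null && !dbNameToLoadIn.isEmpty()) {
            return Optional.of(dbNameToLoadIn);
        }
        if (dbNameFromDumpMetadata != null && !dbNameFromDumpMetadata.isEmpty()) {
            return Optional.of(dbNameFromDumpMetadata);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Name given explicitly in the load command wins.
        System.out.println(resolve("replica_db", "source_db"));
        // No name in the load command: use the one from the dump.
        System.out.println(resolve(null, "source_db"));
    }
}
```

Returning an `Optional` keeps the caller honest about the case where no name can be determined, in which case the last repl ID simply cannot be updated.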
 



Issue Time Tracking
---

Worklog Id: (was: 296104)
Time Spent: 20m  (was: 10m)



[jira] [Work logged] (HIVE-22068) Return the last event id dumped as repl status to avoid notification event missing error.

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22068?focusedWorklogId=296105=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296105
 ]

ASF GitHub Bot logged work on HIVE-22068:
-

Author: ASF GitHub Bot
Created on: 16/Aug/19 06:32
Start Date: 16/Aug/19 06:32
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #742: HIVE-22068 : 
Add more logging to notification cleaner and replication to track events
URL: https://github.com/apache/hive/pull/742#discussion_r314596061
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
 ##
 @@ -750,6 +766,38 @@ public Table apply(@Nullable Table table) {
 .verifyResults(Arrays.asList("1", "2"));
   }
 
+  @Test
+  public void testIncrementalDumpEmptyDumpDirectory() throws Throwable {
 
 Review comment:
   Add another test case where we dynamically bootstrap a table (table-level 
replication) with an incremental dump in which no events are dumped. That case 
takes a special route in the executeIncrementalLoad() method (line 503), so I 
guess that, with the current change, it won't update the database's last repl ID.
 



Issue Time Tracking
---

Worklog Id: (was: 296105)
Time Spent: 20m  (was: 10m)



[jira] [Work logged] (HIVE-22068) Return the last event id dumped as repl status to avoid notification event missing error.

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22068?focusedWorklogId=296106=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296106
 ]

ASF GitHub Bot logged work on HIVE-22068:
-

Author: ASF GitHub Bot
Created on: 16/Aug/19 06:32
Start Date: 16/Aug/19 06:32
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #742: HIVE-22068 : 
Add more logging to notification cleaner and replication to track events
URL: https://github.com/apache/hive/pull/742#discussion_r314592846
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
 ##
 @@ -750,6 +766,38 @@ public Table apply(@Nullable Table table) {
 .verifyResults(Arrays.asList("1", "2"));
   }
 
+  @Test
+  public void testIncrementalDumpEmptyDumpDirectory() throws Throwable {
+    WarehouseInstance.Tuple tuple = primary.run("use " + primaryDbName)
+            .run("create external table t1 (id int)")
+            .run("insert into table t1 values (1)")
+            .run("insert into table t1 values (2)")
+            .dump(primaryDbName, null);
+
+    replica.load(replicatedDbName, tuple.dumpLocation)
+            .status(replicatedDbName)
+            .verifyResult(tuple.lastReplicationId);
+
+    WarehouseInstance.Tuple incTuple = primary.dump(primaryDbName, tuple.lastReplicationId);
+
+    replica.load(replicatedDbName, incTuple.dumpLocation)
+            .status(replicatedDbName)
+            .verifyResult(incTuple.lastReplicationId);
+
+    // create events for some other database and then dump the primaryDbName to dump an empty directory.
+    primary.run("create database " + extraPrimaryDb + " WITH DBPROPERTIES ( '" +
+            SOURCE_OF_REPLICATION + "' = '1,2,3')");
+    WarehouseInstance.Tuple inc2Tuple = primary.run("use " + extraPrimaryDb)
+            .run("create table tbl (fld int)")
+            .run("use " + primaryDbName)
+            .dump(primaryDbName, incTuple.lastReplicationId);
+
 
 Review comment:
   We should add a validation that the last_repl_id returned by REPL DUMP is the 
same as the latest event ID in the notification event table, even though there 
are no events on the dumped db.
 



Issue Time Tracking
---

Worklog Id: (was: 296106)
Time Spent: 0.5h  (was: 20m)



[jira] [Work logged] (HIVE-22068) Return the last event id dumped as repl status to avoid notification event missing error.

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22068?focusedWorklogId=296103=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296103
 ]

ASF GitHub Bot logged work on HIVE-22068:
-

Author: ASF GitHub Bot
Created on: 16/Aug/19 06:32
Start Date: 16/Aug/19 06:32
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #742: HIVE-22068 : 
Add more logging to notification cleaner and replication to track events
URL: https://github.com/apache/hive/pull/742#discussion_r314596395
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -522,6 +525,25 @@ private int executeIncrementalLoad(DriverContext driverContext) {
    // bootstrap of tables if exist.
    if (builder.hasMoreWork() || work.getPathsToCopyIterator().hasNext() || work.hasBootstrapLoadTasks()) {
      DAGTraversal.traverse(childTasks, new AddDependencyToLeaves(TaskFactory.get(work, conf)));
+   } else if (work.dbNameToLoadIn != null) {
+     // Nothing to be done for repl load now. Add a task to update the last.repl.id of the
+     // target database to the event id of the last event considered by the dump. Next
+     // incremental cycle if starts from this id, the events considered for this dump, won't
+     // be considered again. If we are replicating to multiple databases at a time, it's not
+     // possible to know which all databases we are replicating into and hence we can not
+     // update repl id in all those databases.
+     String lastEventid = builder.eventTo().toString();
 
 Review comment:
   Can we try to reuse the ReplLoadTask.updateDatabaseLastReplID method instead 
of duplicating the code here?
 



Issue Time Tracking
---

Worklog Id: (was: 296103)
Time Spent: 20m  (was: 10m)
