[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=225969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-225969
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 11/Apr/19 05:41
Start Date: 11/Apr/19 05:41
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r274264405
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMM.java
 ##
 @@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Tests statistics replication for ACID tables.
+ */
+public class TestStatsReplicationScenariosMM extends TestStatsReplicationScenarios {
+  @Rule
+  public final TestName testName = new TestName();
+
+  @BeforeClass
+  public static void classLevelSetup() throws Exception {
+    Map overrides = new HashMap<>();
+    overrides.put(MetastoreConf.ConfVars.EVENT_MESSAGE_FACTORY.getHiveName(),
+        GzipJSONMessageEncoder.class.getCanonicalName());
+    overrides.put(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY.varname, "true");
+    overrides.put(HiveConf.ConfVars.HIVE_TXN_MANAGER.varname,
+        "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
+    overrides.put(MetastoreConf.ConfVars.CAPABILITY_CHECK.getHiveName(),"false");
+    overrides.put(HiveConf.ConfVars.REPL_BOOTSTRAP_DUMP_OPEN_TXN_TIMEOUT.varname,"1s");
+    overrides.put(HiveConf.ConfVars.DYNAMICPARTITIONINGMODE.varname, "nonstrict");
+
+
+    internalBeforeClassSetup(overrides, overrides, TestReplicationScenarios.class, true, "mm");
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 225969)
Time Spent: 13h 20m  (was: 13h 10m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch, 
> HIVE-21109.09.patch, HIVE-21109.09.patch, HIVE-21109.10.patch
>
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeId associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=225970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-225970
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 11/Apr/19 05:41
Start Date: 11/Apr/19 05:41
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r274264424
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMMNoAutogather.java
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Tests statistics replication for ACID tables.
+ */
+public class TestStatsReplicationScenariosMMNoAutogather extends TestStatsReplicationScenarios {
+  @Rule
+  public final TestName testName = new TestName();
+
+  @BeforeClass
+  public static void classLevelSetup() throws Exception {
+    Map overrides = new HashMap<>();
+    overrides.put(MetastoreConf.ConfVars.EVENT_MESSAGE_FACTORY.getHiveName(),
+        GzipJSONMessageEncoder.class.getCanonicalName());
+    overrides.put(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY.varname, "true");
+    overrides.put(HiveConf.ConfVars.HIVE_TXN_MANAGER.varname,
+        "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
+    overrides.put(MetastoreConf.ConfVars.CAPABILITY_CHECK.getHiveName(),"false");
+    overrides.put(HiveConf.ConfVars.REPL_BOOTSTRAP_DUMP_OPEN_TXN_TIMEOUT.varname,"1s");
+    overrides.put(HiveConf.ConfVars.DYNAMICPARTITIONINGMODE.varname, "nonstrict");
+    overrides.put("mapred.input.dir.recursive", "true");
+
+
+    internalBeforeClassSetup(overrides, overrides, TestReplicationScenarios.class, false, "mm");
 
 Review comment:
   Done.
 



Issue Time Tracking
---

Worklog Id: (was: 225970)
Time Spent: 13.5h  (was: 13h 20m)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=225966&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-225966
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 11/Apr/19 05:36
Start Date: 11/Apr/19 05:36
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r274263457
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -224,6 +236,17 @@ private void verifyNoPartitionStatsReplicationForMetadataOnly(String tableName)
     }
   }
 
+  private String getCreateTableProperties() {
+    if (acidTableKindToUse != null) {
+      if (acidTableKindToUse.equals("orc")) {
 
 Review comment:
   Done.
 



Issue Time Tracking
---

Worklog Id: (was: 225966)
Time Spent: 13h 10m  (was: 13h)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=225642&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-225642
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 10/Apr/19 15:13
Start Date: 10/Apr/19 15:13
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r273981487
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -224,6 +236,17 @@ private void verifyNoPartitionStatsReplicationForMetadataOnly(String tableName)
     }
   }
 
+  private String getCreateTableProperties() {
+    if (acidTableKindToUse != null) {
+      if (acidTableKindToUse.equals("orc")) {
 
 Review comment:
   To be clear, the names could be "full_acid" and "mm_acid". In fact, an MM 
table can also be created on ORC data.
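
   The naming suggestion above could look roughly like the following sketch. 
It is only an illustration: the class name AcidTableKinds and the helper 
createTableProperties are hypothetical and not taken from the patch. The 
table properties used are the standard Hive ones for full ACID tables and 
for insert-only (MM) tables.

```java
public class AcidTableKinds {
  // Hypothetical helper (not from the patch): maps a table-kind name to the
  // clause appended to a CREATE TABLE statement in a test.
  static String createTableProperties(String acidTableKind) {
    if (acidTableKind == null) {
      // Non-transactional table: nothing extra to append.
      return "";
    }
    switch (acidTableKind) {
      case "full_acid":
        // Full ACID tables must be stored as ORC.
        return " stored as orc tblproperties (\"transactional\"=\"true\")";
      case "mm_acid":
        // Insert-only (MM) tables; the file format is not restricted to ORC.
        return " tblproperties (\"transactional\"=\"true\", "
            + "\"transactional_properties\"=\"insert_only\")";
      default:
        throw new IllegalArgumentException("Unknown ACID table kind: " + acidTableKind);
    }
  }

  public static void main(String[] args) {
    System.out.println("create table t1 (id int)" + createTableProperties("full_acid"));
    System.out.println("create table t2 (id int)" + createTableProperties("mm_acid"));
  }
}
```

   Naming the kinds after the transactional flavour rather than the file 
format keeps the two concerns separate, since "orc" alone does not say 
whether the table is full ACID or MM.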
 



Issue Time Tracking
---

Worklog Id: (was: 225642)
Time Spent: 13h  (was: 12h 50m)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=225640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-225640
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 10/Apr/19 15:13
Start Date: 10/Apr/19 15:13
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r273984253
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMM.java
 ##
 @@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Tests statistics replication for ACID tables.
+ */
+public class TestStatsReplicationScenariosMM extends TestStatsReplicationScenarios {
+  @Rule
+  public final TestName testName = new TestName();
+
+  @BeforeClass
+  public static void classLevelSetup() throws Exception {
+    Map overrides = new HashMap<>();
+    overrides.put(MetastoreConf.ConfVars.EVENT_MESSAGE_FACTORY.getHiveName(),
+        GzipJSONMessageEncoder.class.getCanonicalName());
+    overrides.put(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY.varname, "true");
+    overrides.put(HiveConf.ConfVars.HIVE_TXN_MANAGER.varname,
+        "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
+    overrides.put(MetastoreConf.ConfVars.CAPABILITY_CHECK.getHiveName(),"false");
+    overrides.put(HiveConf.ConfVars.REPL_BOOTSTRAP_DUMP_OPEN_TXN_TIMEOUT.varname,"1s");
+    overrides.put(HiveConf.ConfVars.DYNAMICPARTITIONINGMODE.varname, "nonstrict");
+
+
+    internalBeforeClassSetup(overrides, overrides, TestReplicationScenarios.class, true, "mm");
 
 Review comment:
   This should use the current class name, TestStatsReplicationScenariosMM.
 



Issue Time Tracking
---

Worklog Id: (was: 225640)
Time Spent: 12h 40m  (was: 12.5h)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=225641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-225641
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 10/Apr/19 15:13
Start Date: 10/Apr/19 15:13
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r273984105
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMMNoAutogather.java
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Tests statistics replication for ACID tables.
+ */
+public class TestStatsReplicationScenariosMMNoAutogather extends TestStatsReplicationScenarios {
+  @Rule
+  public final TestName testName = new TestName();
+
+  @BeforeClass
+  public static void classLevelSetup() throws Exception {
+    Map overrides = new HashMap<>();
+    overrides.put(MetastoreConf.ConfVars.EVENT_MESSAGE_FACTORY.getHiveName(),
+        GzipJSONMessageEncoder.class.getCanonicalName());
+    overrides.put(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY.varname, "true");
+    overrides.put(HiveConf.ConfVars.HIVE_TXN_MANAGER.varname,
+        "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
+    overrides.put(MetastoreConf.ConfVars.CAPABILITY_CHECK.getHiveName(),"false");
+    overrides.put(HiveConf.ConfVars.REPL_BOOTSTRAP_DUMP_OPEN_TXN_TIMEOUT.varname,"1s");
+    overrides.put(HiveConf.ConfVars.DYNAMICPARTITIONINGMODE.varname, "nonstrict");
+    overrides.put("mapred.input.dir.recursive", "true");
+
+
+    internalBeforeClassSetup(overrides, overrides, TestReplicationScenarios.class, false, "mm");
 
 Review comment:
   This should use the current class name, TestStatsReplicationScenariosMMNoAutogather.
 



Issue Time Tracking
---

Worklog Id: (was: 225641)
Time Spent: 12h 50m  (was: 12h 40m)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=225073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-225073
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 09/Apr/19 15:55
Start Date: 09/Apr/19 15:55
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r273563740
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMigration.java
 ##
 @@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Tests statistics replication for ACID tables.
+ */
+public class TestStatsReplicationScenariosMigration extends TestStatsReplicationScenarios {
+  @Rule
+  public final TestName testName = new TestName();
+
+  protected static final Logger LOG = LoggerFactory.getLogger(TestReplicationScenarios.class);
+
+  @BeforeClass
+  public static void classLevelSetup() throws Exception {
+    Map overrides = new HashMap<>();
+    overrides.put(MetastoreConf.ConfVars.EVENT_MESSAGE_FACTORY.getHiveName(),
+        GzipJSONMessageEncoder.class.getCanonicalName());
+
+    HashMap replicaConfigs = new HashMap() {{
+      put("hive.support.concurrency", "true");
+      put("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
+      put("hive.metastore.client.capability.check", "false");
+      put("hive.repl.bootstrap.dump.open.txn.timeout", "1s");
+      put("hive.exec.dynamic.partition.mode", "nonstrict");
+      put("hive.strict.checks.bucketing", "false");
+      put("hive.mapred.mode", "nonstrict");
+      put("mapred.input.dir.recursive", "true");
+      put("hive.metastore.disallow.incompatible.col.type.changes", "false");
+      put("hive.strict.managed.tables", "true");
+    }};
+    replicaConfigs.putAll(overrides);
+
+    HashMap primaryConfigs = new HashMap() {{
+      put("hive.metastore.client.capability.check", "false");
+      put("hive.repl.bootstrap.dump.open.txn.timeout", "1s");
+      put("hive.exec.dynamic.partition.mode", "nonstrict");
+      put("hive.strict.checks.bucketing", "false");
+      put("hive.mapred.mode", "nonstrict");
+      put("mapred.input.dir.recursive", "true");
+      put("hive.metastore.disallow.incompatible.col.type.changes", "false");
+      put("hive.support.concurrency", "false");
+      put("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager");
+      put("hive.strict.managed.tables", "false");
+    }};
+    primaryConfigs.putAll(overrides);
+
+    internalBeforeClassSetup(primaryConfigs, replicaConfigs,
 
 Review comment:
   As long as the writeId associated with the stats is valid according to the 
given query's valid writeId list, the stats will be used if they are marked 
valid. Usually, when a writeId advances, the stats are marked invalid if the 
operation advancing the writeId renders them inaccurate. In the case of 
migration, even though the writeId advances, the operation does not 
necessarily render the stats inaccurate. In that case, even if the writeId 
associated with the stats is behind the latest allocated one, the stats remain 
usable as long as (1) the writeId appears valid according to the query's 
writeId list and (2) the stats themselves are marked valid.
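
   The two conditions above can be sketched as follows. This is a simplified 
model under stated assumptions, not Hive's actual implementation: 
WriteIdSnapshot is a hypothetical stand-in for the query's valid writeId 
list, modelled here as a high watermark plus a set of writeIds that are open 
or aborted, and canUseStats is a hypothetical helper name.

```java
import java.util.Collections;
import java.util.Set;

public class StatsUsability {
  /** Hypothetical, simplified stand-in for a query's valid writeId list. */
  static final class WriteIdSnapshot {
    private final long highWatermark;
    private final Set<Long> invalidWriteIds; // open or aborted writeIds

    WriteIdSnapshot(long highWatermark, Set<Long> invalidWriteIds) {
      this.highWatermark = highWatermark;
      this.invalidWriteIds = invalidWriteIds;
    }

    boolean isWriteIdValid(long writeId) {
      return writeId <= highWatermark && !invalidWriteIds.contains(writeId);
    }
  }

  /**
   * Stats are usable when (1) their writeId is valid for the query's snapshot
   * and (2) the stats themselves are marked valid. The stats writeId does not
   * have to be the latest allocated one.
   */
  static boolean canUseStats(long statsWriteId, boolean statsMarkedValid,
      WriteIdSnapshot snapshot) {
    return statsMarkedValid && snapshot.isWriteIdValid(statsWriteId);
  }

  public static void main(String[] args) {
    WriteIdSnapshot snapshot = new WriteIdSnapshot(10, Collections.singleton(7L));
    System.out.println(canUseStats(5, true, snapshot));  // behind the latest writeId, still usable
    System.out.println(canUseStats(7, true, snapshot));  // writeId not valid for this query
    System.out.println(canUseStats(5, false, snapshot)); // stats marked invalid
  }
}
```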
 


[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=225063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-225063
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 09/Apr/19 15:47
Start Date: 09/Apr/19 15:47
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r273560079
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -216,16 +233,23 @@ private void verifyNoPartitionStatsReplicationForMetadataOnly(String tableName)
     String ndTableName = "ndTable";
     // Partitioned table without data during bootstrap and hence no stats.
     String ndPartTableName = "ndPTable";
+    String tblCreateExtra = "";
+
+    if (useAcidTables) {
 
 Review comment:
   Done.
 



Issue Time Tracking
---

Worklog Id: (was: 225063)
Time Spent: 12h 20m  (was: 12h 10m)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221688&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221688
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 02/Apr/19 11:25
Start Date: 02/Apr/19 11:25
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r271253403
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMigration.java
 ##
 @@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Tests statistics replication for ACID tables.
+ */
+public class TestStatsReplicationScenariosMigration extends TestStatsReplicationScenarios {
+  @Rule
+  public final TestName testName = new TestName();
+
+  protected static final Logger LOG = LoggerFactory.getLogger(TestReplicationScenarios.class);
+
+  @BeforeClass
+  public static void classLevelSetup() throws Exception {
+    Map overrides = new HashMap<>();
+    overrides.put(MetastoreConf.ConfVars.EVENT_MESSAGE_FACTORY.getHiveName(),
+        GzipJSONMessageEncoder.class.getCanonicalName());
+
+    HashMap replicaConfigs = new HashMap() {{
+      put("hive.support.concurrency", "true");
+      put("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
+      put("hive.metastore.client.capability.check", "false");
+      put("hive.repl.bootstrap.dump.open.txn.timeout", "1s");
+      put("hive.exec.dynamic.partition.mode", "nonstrict");
+      put("hive.strict.checks.bucketing", "false");
+      put("hive.mapred.mode", "nonstrict");
+      put("mapred.input.dir.recursive", "true");
+      put("hive.metastore.disallow.incompatible.col.type.changes", "false");
+      put("hive.strict.managed.tables", "true");
+    }};
+    replicaConfigs.putAll(overrides);
+
+    HashMap primaryConfigs = new HashMap() {{
+      put("hive.metastore.client.capability.check", "false");
+      put("hive.repl.bootstrap.dump.open.txn.timeout", "1s");
+      put("hive.exec.dynamic.partition.mode", "nonstrict");
+      put("hive.strict.checks.bucketing", "false");
+      put("hive.mapred.mode", "nonstrict");
+      put("mapred.input.dir.recursive", "true");
+      put("hive.metastore.disallow.incompatible.col.type.changes", "false");
+      put("hive.support.concurrency", "false");
+      put("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager");
+      put("hive.strict.managed.tables", "false");
+    }};
+    primaryConfigs.putAll(overrides);
+
+    internalBeforeClassSetup(primaryConfigs, replicaConfigs,
 
 Review comment:
   Yeah. That's true.
   I have a question about how transactional stats work.
   If the writeId in the table/partition is less than the latest allocated 
writeId for the table, will future queries use these stats for query 
optimization or not? 
   If the writeId is not the latest, the stats may not be up to date, so 
I'm not sure how such stats are used. Please share if you have any ideas.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221688)
Time Spent: 12h  (was: 11h 50m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
>  

[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221689=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221689
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 02/Apr/19 11:25
Start Date: 02/Apr/19 11:25
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r271253403
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMigration.java
 ##
 @@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import 
org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Tests statistics replication for ACID tables.
+ */
+public class TestStatsReplicationScenariosMigration extends 
TestStatsReplicationScenarios {
+  @Rule
+  public final TestName testName = new TestName();
+
+  protected static final Logger LOG = 
LoggerFactory.getLogger(TestReplicationScenarios.class);
+
+  @BeforeClass
+  public static void classLevelSetup() throws Exception {
+Map overrides = new HashMap<>();
+overrides.put(MetastoreConf.ConfVars.EVENT_MESSAGE_FACTORY.getHiveName(),
+GzipJSONMessageEncoder.class.getCanonicalName());
+
+HashMap replicaConfigs = new HashMap() {{
+  put("hive.support.concurrency", "true");
+  put("hive.txn.manager", 
"org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
+  put("hive.metastore.client.capability.check", "false");
+  put("hive.repl.bootstrap.dump.open.txn.timeout", "1s");
+  put("hive.exec.dynamic.partition.mode", "nonstrict");
+  put("hive.strict.checks.bucketing", "false");
+  put("hive.mapred.mode", "nonstrict");
+  put("mapred.input.dir.recursive", "true");
+  put("hive.metastore.disallow.incompatible.col.type.changes", "false");
+  put("hive.strict.managed.tables", "true");
+}};
+replicaConfigs.putAll(overrides);
+
+HashMap primaryConfigs = new HashMap() {{
+  put("hive.metastore.client.capability.check", "false");
+  put("hive.repl.bootstrap.dump.open.txn.timeout", "1s");
+  put("hive.exec.dynamic.partition.mode", "nonstrict");
+  put("hive.strict.checks.bucketing", "false");
+  put("hive.mapred.mode", "nonstrict");
+  put("mapred.input.dir.recursive", "true");
+  put("hive.metastore.disallow.incompatible.col.type.changes", "false");
+  put("hive.support.concurrency", "false");
+  put("hive.txn.manager", 
"org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager");
+  put("hive.strict.managed.tables", "false");
+}};
+primaryConfigs.putAll(overrides);
+
+internalBeforeClassSetup(primaryConfigs, replicaConfigs,
 
 Review comment:
   Yeah. That's true.
   I have a question about how transactional stats work.
   If the writeId in the table/partition object is less than the latest 
allocated writeId for the table, will future queries use these stats for query 
optimization or not? 
   If the writeId is not the latest, the stats may not be up to date, so 
I'm not sure how such stats are used. Please share if you have any ideas.
 



Issue Time Tracking
---

Worklog Id: (was: 221689)
Time Spent: 12h 10m  (was: 12h)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
>   

[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221664=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221664
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 02/Apr/19 09:41
Start Date: 02/Apr/19 09:41
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r271215497
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMigration.java
 ##
 @@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import 
org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Tests statistics replication for ACID tables.
+ */
+public class TestStatsReplicationScenariosMigration extends 
TestStatsReplicationScenarios {
+  @Rule
+  public final TestName testName = new TestName();
+
+  protected static final Logger LOG = 
LoggerFactory.getLogger(TestReplicationScenarios.class);
+
+  @BeforeClass
+  public static void classLevelSetup() throws Exception {
+Map overrides = new HashMap<>();
+overrides.put(MetastoreConf.ConfVars.EVENT_MESSAGE_FACTORY.getHiveName(),
+GzipJSONMessageEncoder.class.getCanonicalName());
+
+HashMap replicaConfigs = new HashMap() {{
+  put("hive.support.concurrency", "true");
+  put("hive.txn.manager", 
"org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
+  put("hive.metastore.client.capability.check", "false");
+  put("hive.repl.bootstrap.dump.open.txn.timeout", "1s");
+  put("hive.exec.dynamic.partition.mode", "nonstrict");
+  put("hive.strict.checks.bucketing", "false");
+  put("hive.mapred.mode", "nonstrict");
+  put("mapred.input.dir.recursive", "true");
+  put("hive.metastore.disallow.incompatible.col.type.changes", "false");
+  put("hive.strict.managed.tables", "true");
+}};
+replicaConfigs.putAll(overrides);
+
+HashMap primaryConfigs = new HashMap() {{
+  put("hive.metastore.client.capability.check", "false");
+  put("hive.repl.bootstrap.dump.open.txn.timeout", "1s");
+  put("hive.exec.dynamic.partition.mode", "nonstrict");
+  put("hive.strict.checks.bucketing", "false");
+  put("hive.mapred.mode", "nonstrict");
+  put("mapred.input.dir.recursive", "true");
+  put("hive.metastore.disallow.incompatible.col.type.changes", "false");
+  put("hive.support.concurrency", "false");
+  put("hive.txn.manager", 
"org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager");
+  put("hive.strict.managed.tables", "false");
+}};
+primaryConfigs.putAll(overrides);
+
+internalBeforeClassSetup(primaryConfigs, replicaConfigs,
 
 Review comment:
   The writeId associated with the column stats is stored in the Table/Partition 
object. In the migration case, this writeId may not exactly match the writeId 
of the stats if an alter table event follows the column stats update. Such an 
alter table event opens a new transaction and updates the writeId in the 
table/partition object.
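
   The drift described above can be illustrated with a toy model. This is only 
a sketch: WriteIdDriftSketch and TableMeta are hypothetical names, not Hive 
classes, and statsWriteId is tracked separately here purely to make the 
mismatch visible.

```java
// Toy model only: WriteIdDriftSketch and TableMeta are hypothetical names,
// not Hive classes.
final class WriteIdDriftSketch {
    /** Stand-in for a table/partition object that stores a single writeId. */
    static final class TableMeta {
        long writeId;       // the only writeId the object stores
        long statsWriteId;  // kept just to make the drift visible (assumption)
    }

    private long lastAllocated = 0;

    /** Each new transaction on the table gets the next writeId. */
    private long allocate() {
        return ++lastAllocated;
    }

    /** A column stats update stamps the object with its own writeId. */
    void updateColumnStats(TableMeta t) {
        long id = allocate();
        t.writeId = id;
        t.statsWriteId = id;
    }

    /** An alter table opens a new transaction and overwrites the stored writeId. */
    void alterTable(TableMeta t) {
        t.writeId = allocate();
    }
}
```

   After a stats update followed by an alter table, the stored writeId is one 
ahead of the writeId under which the stats were actually written.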
 



Issue Time Tracking
---

Worklog Id: (was: 221664)
Time Spent: 11h 50m  (was: 11h 40m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: 

[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221246=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221246
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 01/Apr/19 11:18
Start Date: 01/Apr/19 11:18
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270820010
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, 
String lastReplicationId,
 return dumpTuple.lastReplicationId;
   }
 
+  /**
+   * Run a bootstrap that will fail.
+   * @param tuple the location of bootstrap dump
+   */
+  private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int 
failAfterNumTables) throws Throwable {
+// fail setting ckpt directory property for the second table so that we 
test the case when
+// bootstrap load fails after some but not all tables are loaded.
+BehaviourInjection callerVerifier
+= new BehaviourInjection() {
+  int cntTables = 0;
+  @Nullable
+  @Override
+  public Boolean apply(@Nullable CallerArguments args) {
+cntTables++;
+if (args.dbName.equalsIgnoreCase(replicatedDbName) && cntTables > 
failAfterNumTables) {
+  injectionPathCalled = true;
+  LOG.warn("Verifier - DB : " + args.dbName + " TABLE : " + 
args.tblName);
+  return false;
+}
+return true;
+  }
+};
+
+InjectableBehaviourObjectStore.setAlterTableModifier(callerVerifier);
+try {
+  replica.loadFailure(replicatedDbName, tuple.dumpLocation);
+  callerVerifier.assertInjectionsPerformed(true, false);
+} finally {
+  InjectableBehaviourObjectStore.resetAlterTableModifier();
+}
+  }
+
+  private void failIncrementalLoad(WarehouseInstance.Tuple dumpTuple, int 
failAfterNumEvents) throws Throwable {
+// fail add notification when updating table stats after given number of 
such events. Thus we
+// test successful application as well as failed application of this event.
+BehaviourInjection callerVerifier
+= new BehaviourInjection() {
+  int cntEvents = 0;
+  @Override
+  public Boolean apply(NotificationEvent entry) {
+cntEvents++;
 
 Review comment:
   OK
 



Issue Time Tracking
---

Worklog Id: (was: 221246)
Time Spent: 11.5h  (was: 11h 20m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221247=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221247
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 01/Apr/19 11:18
Start Date: 01/Apr/19 11:18
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270820010
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, 
String lastReplicationId,
 return dumpTuple.lastReplicationId;
   }
 
+  /**
+   * Run a bootstrap that will fail.
+   * @param tuple the location of bootstrap dump
+   */
+  private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int 
failAfterNumTables) throws Throwable {
+// fail setting ckpt directory property for the second table so that we 
test the case when
+// bootstrap load fails after some but not all tables are loaded.
+BehaviourInjection callerVerifier
+= new BehaviourInjection() {
+  int cntTables = 0;
+  @Nullable
+  @Override
+  public Boolean apply(@Nullable CallerArguments args) {
+cntTables++;
+if (args.dbName.equalsIgnoreCase(replicatedDbName) && cntTables > 
failAfterNumTables) {
+  injectionPathCalled = true;
+  LOG.warn("Verifier - DB : " + args.dbName + " TABLE : " + 
args.tblName);
+  return false;
+}
+return true;
+  }
+};
+
+InjectableBehaviourObjectStore.setAlterTableModifier(callerVerifier);
+try {
+  replica.loadFailure(replicatedDbName, tuple.dumpLocation);
+  callerVerifier.assertInjectionsPerformed(true, false);
+} finally {
+  InjectableBehaviourObjectStore.resetAlterTableModifier();
+}
+  }
+
+  private void failIncrementalLoad(WarehouseInstance.Tuple dumpTuple, int 
failAfterNumEvents) throws Throwable {
+// fail add notification when updating table stats after given number of 
such events. Thus we
+// test successful application as well as failed application of this event.
+BehaviourInjection callerVerifier
+= new BehaviourInjection() {
+  int cntEvents = 0;
+  @Override
+  public Boolean apply(NotificationEvent entry) {
+cntEvents++;
 
 Review comment:
   OK. Please update the test to fail on the second update-stats event.
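
   A counter-based injection of the kind requested here might look roughly 
like the sketch below. FailOnNthEvent and the "UPDATE_STATS" event type string 
are hypothetical and used only for illustration; the actual test would go 
through BehaviourInjection and NotificationEvent as in the diff above.

```java
import java.util.function.Predicate;

// Hypothetical helper, not part of Hive: lets matching events through until
// the Nth one, then rejects that one and every later matching event.
final class FailOnNthEvent implements Predicate<String> {
    private final String eventType;
    private final int failOn;
    private int seen = 0;

    FailOnNthEvent(String eventType, int failOn) {
        this.eventType = eventType;
        this.failOn = failOn;
    }

    /** Returns true to let the event through, false to inject a failure. */
    @Override
    public boolean test(String type) {
        if (!eventType.equals(type)) {
            return true; // unrelated events are never failed or counted
        }
        seen++;
        return seen < failOn;
    }
}
```

   With failOn = 2, the first update-stats event is applied successfully and 
the second is rejected, so both the success path and the failure path of the 
same event type are exercised.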
 



Issue Time Tracking
---

Worklog Id: (was: 221247)
Time Spent: 11h 40m  (was: 11.5h)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch
>
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221245=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221245
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 01/Apr/19 11:17
Start Date: 01/Apr/19 11:17
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270819769
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, 
String lastReplicationId,
 return dumpTuple.lastReplicationId;
   }
 
+  /**
+   * Run a bootstrap that will fail.
+   * @param tuple the location of bootstrap dump
+   */
+  private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int 
failAfterNumTables) throws Throwable {
+// fail setting ckpt directory property for the second table so that we 
test the case when
+// bootstrap load fails after some but not all tables are loaded.
+BehaviourInjection callerVerifier
+= new BehaviourInjection() {
+  int cntTables = 0;
+  @Nullable
+  @Override
+  public Boolean apply(@Nullable CallerArguments args) {
+cntTables++;
+if (args.dbName.equalsIgnoreCase(replicatedDbName) && cntTables > 
failAfterNumTables) {
+  injectionPathCalled = true;
+  LOG.warn("Verifier - DB : " + args.dbName + " TABLE : " + 
args.tblName);
+  return false;
+}
+return true;
+  }
+};
+
+InjectableBehaviourObjectStore.setAlterTableModifier(callerVerifier);
+try {
+  replica.loadFailure(replicatedDbName, tuple.dumpLocation);
+  callerVerifier.assertInjectionsPerformed(true, false);
+} finally {
+  InjectableBehaviourObjectStore.resetAlterTableModifier();
+}
 
 Review comment:
   Ok
 



Issue Time Tracking
---

Worklog Id: (was: 221245)
Time Spent: 11h 20m  (was: 11h 10m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch
>
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221244=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221244
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 01/Apr/19 11:17
Start Date: 01/Apr/19 11:17
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270819437
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -2950,21 +2957,32 @@ public Partition createPartition(Table tbl, 
Map partSpec) throws
 int size = addPartitionDesc.getPartitionCount();
 List in =
 new ArrayList(size);
-AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, 
tbl, true);
 long writeId;
 String validWriteIdList;
-if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-  writeId = tableSnapshot.getWriteId();
-  validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+// In case of replication, get the writeId from the source and use valid 
write Id list
+// for replication.
+if (addPartitionDesc.getReplicationSpec().isInReplicationScope() &&
+addPartitionDesc.getPartition(0).getWriteId() > 0) {
 
 Review comment:
   I think assuming the table is non-transactional (based on writeId = 0) and 
ignoring the failure case is not right.
   We can explicitly check whether the table is transactional and then do the 
necessary checks. If the writeId comes in as 0 for a transactional table, that 
is an error flow. 
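
   A minimal sketch of the explicit check suggested above, assuming the caller 
already knows whether the table is transactional. ReplWriteIdCheck and 
shouldApplyWriteId are hypothetical names, not existing Hive APIs.

```java
// Hypothetical helper, not part of Hive: decides from the table type, rather
// than from writeId alone, whether a replicated writeId should be applied.
final class ReplWriteIdCheck {
    private ReplWriteIdCheck() {
    }

    /**
     * Returns true if the replicated writeId should be applied, false if the
     * table is non-transactional and no writeId is needed.
     * Throws if a transactional table arrives without a valid writeId.
     */
    static boolean shouldApplyWriteId(boolean isTransactional, long writeId) {
        if (!isTransactional) {
            return false;
        }
        if (writeId <= 0) {
            // Treated as an error flow instead of being silently ignored.
            throw new IllegalStateException(
                "Transactional table replicated without a valid writeId: " + writeId);
        }
        return true;
    }
}
```

   The point of the sketch is that a writeId of 0 no longer doubles as a signal 
for "non-transactional"; the table type is checked first and a missing writeId 
on a transactional table fails loudly.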
 



Issue Time Tracking
---

Worklog Id: (was: 221244)
Time Spent: 11h 10m  (was: 11h)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch
>
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221241=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221241
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 01/Apr/19 11:13
Start Date: 01/Apr/19 11:13
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270818464
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
 ##
 @@ -297,21 +303,34 @@ private ColumnStatisticsDesc getColumnStatsDesc(String 
dbName,
 
   private int persistColumnStats(Hive db) throws HiveException, MetaException, 
IOException {
 ColumnStatistics colStats = constructColumnStatsFromInput();
-ColumnStatisticsDesc colStatsDesc = colStats.getStatsDesc();
-// We do not support stats replication for a transactional table yet. If 
we are converting
-// a non-transactional table to a transactional table during replication, 
we might get
-// column statistics but we shouldn't update those.
-if (work.getColStats() != null &&
-AcidUtils.isTransactionalTable(getHive().getTable(colStatsDesc.getDbName(),
-colStatsDesc.getTableName()))) {
-  LOG.debug("Skipped updating column stats for table " +
-TableName.getDbTable(colStatsDesc.getDbName(), 
colStatsDesc.getTableName()) +
-" because it is converted to a transactional table during 
replication.");
-  return 0;
-}
-
 SetPartitionsStatsRequest request =
 new SetPartitionsStatsRequest(Collections.singletonList(colStats));
+
+// Set writeId and validWriteId list for replicated statistics.
+if (work.getColStats() != null) {
+  String dbName = colStats.getStatsDesc().getDbName();
+  String tblName = colStats.getStatsDesc().getTableName();
+  Table tbl = db.getTable(dbName, tblName);
+  long writeId = work.getWriteId();
+  // If it's a transactional table on source and target, we will get a 
valid writeId
+  // associated with it. Otherwise it's a non-transactional table on 
source migrated to a
+  // transactional table on target, we need to craft a valid writeId here.
+  if (AcidUtils.isTransactionalTable(tbl)) {
+ValidWriteIdList writeIds;
+if (writeId <= 0) {
+  Long tmpWriteId = ReplUtils.getMigrationCurrentTblWriteId(conf);
+  if (tmpWriteId == null) {
+throw new HiveException("DDLTask : Write id is not set in the 
config by open txn task for migration");
+  }
+  writeId = tmpWriteId;
+}
+writeIds = new ValidReaderWriteIdList(TableName.getDbTable(dbName, 
tblName), new long[0],
 
 Review comment:
   I think this assumption can change in the future if someone uses this task to 
update stats in a non-repl flow as well. I suggest adding an explicit check for 
repl scope.
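
   One way to make the repl-scope condition explicit is sketched below. It 
mirrors the writeId fallback in the diff above but is not the patch itself: 
StatsWriteIdResolver, resolve, and the boolean parameters are hypothetical, and 
migrationWriteId stands in for whatever the open txn task left in the config.

```java
import java.util.OptionalLong;

// Hypothetical helper, not part of Hive: picks the writeId for replicated
// column stats only when the update really is in replication scope.
final class StatsWriteIdResolver {
    private StatsWriteIdResolver() {
    }

    static OptionalLong resolve(boolean inReplScope, boolean transactional,
                                long sourceWriteId, Long migrationWriteId) {
        // Outside replication, or for non-transactional targets, nothing is set.
        if (!inReplScope || !transactional) {
            return OptionalLong.empty();
        }
        // Transactional on both source and target: reuse the source writeId.
        if (sourceWriteId > 0) {
            return OptionalLong.of(sourceWriteId);
        }
        // Migration case: fall back to the writeId of the open transaction.
        if (migrationWriteId == null) {
            throw new IllegalStateException("Write id is not set for migration");
        }
        return OptionalLong.of(migrationWriteId);
    }
}
```

   With the scope passed in explicitly, a later non-repl caller of the same 
task gets an empty result rather than a crafted writeId.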
 



Issue Time Tracking
---

Worklog Id: (was: 221241)
Time Spent: 11h  (was: 10h 50m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch
>
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221240=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221240
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 01/Apr/19 11:10
Start Date: 01/Apr/19 11:10
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270817825
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMigration.java
 ##
 @@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import 
org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Tests statistics replication for ACID tables.
+ */
+public class TestStatsReplicationScenariosMigration extends 
TestStatsReplicationScenarios {
+  @Rule
+  public final TestName testName = new TestName();
+
+  protected static final Logger LOG = 
LoggerFactory.getLogger(TestReplicationScenarios.class);
+
+  @BeforeClass
+  public static void classLevelSetup() throws Exception {
+Map overrides = new HashMap<>();
+overrides.put(MetastoreConf.ConfVars.EVENT_MESSAGE_FACTORY.getHiveName(),
+GzipJSONMessageEncoder.class.getCanonicalName());
+
+HashMap replicaConfigs = new HashMap() {{
+  put("hive.support.concurrency", "true");
+  put("hive.txn.manager", 
"org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
+  put("hive.metastore.client.capability.check", "false");
+  put("hive.repl.bootstrap.dump.open.txn.timeout", "1s");
+  put("hive.exec.dynamic.partition.mode", "nonstrict");
+  put("hive.strict.checks.bucketing", "false");
+  put("hive.mapred.mode", "nonstrict");
+  put("mapred.input.dir.recursive", "true");
+  put("hive.metastore.disallow.incompatible.col.type.changes", "false");
+  put("hive.strict.managed.tables", "true");
+}};
+replicaConfigs.putAll(overrides);
+
+HashMap primaryConfigs = new HashMap() {{
+  put("hive.metastore.client.capability.check", "false");
+  put("hive.repl.bootstrap.dump.open.txn.timeout", "1s");
+  put("hive.exec.dynamic.partition.mode", "nonstrict");
+  put("hive.strict.checks.bucketing", "false");
+  put("hive.mapred.mode", "nonstrict");
+  put("mapred.input.dir.recursive", "true");
+  put("hive.metastore.disallow.incompatible.col.type.changes", "false");
+  put("hive.support.concurrency", "false");
+  put("hive.txn.manager", 
"org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager");
+  put("hive.strict.managed.tables", "false");
+}};
+primaryConfigs.putAll(overrides);
+
+internalBeforeClassSetup(primaryConfigs, replicaConfigs,
 
 Review comment:
   In the migration case, we should validate that the stats are associated with 
the correct writeId. I think in our tests it should point to the last allocated 
writeId.
 



Issue Time Tracking
---

Worklog Id: (was: 221240)
Time Spent: 10h 50m  (was: 10h 40m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: 

[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221210=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221210
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 01/Apr/19 10:05
Start Date: 01/Apr/19 10:05
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270797151
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, 
String lastReplicationId,
 return dumpTuple.lastReplicationId;
   }
 
+  /**
+   * Run a bootstrap that will fail.
+   * @param tuple the location of bootstrap dump
+   */
+  private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int 
failAfterNumTables) throws Throwable {
+// fail setting ckpt directory property for the second table so that we 
test the case when
+// bootstrap load fails after some but not all tables are loaded.
+BehaviourInjection callerVerifier
+= new BehaviourInjection() {
+  int cntTables = 0;
+  @Nullable
+  @Override
+  public Boolean apply(@Nullable CallerArguments args) {
+cntTables++;
 
 Review comment:
   Hmm. Thanks for catching this. Done.
 



Issue Time Tracking
---

Worklog Id: (was: 221210)
Time Spent: 10h 40m  (was: 10.5h)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch
>
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221209=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221209
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 01/Apr/19 10:05
Start Date: 01/Apr/19 10:05
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270797067
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMigrationNoAutogather.java
 ##
 @@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import 
org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Tests statistics replication for ACID tables.
+ */
+public class TestStatsReplicationScenariosMigrationNoAutogather extends 
TestStatsReplicationScenarios {
+  @Rule
+  public final TestName testName = new TestName();
+
+  protected static final Logger LOG = 
LoggerFactory.getLogger(TestReplicationScenarios.class);
 
 Review comment:
   Removed.
 



Issue Time Tracking
---

Worklog Id: (was: 221209)
Time Spent: 10.5h  (was: 10h 20m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch
>
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221207=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221207
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 01/Apr/19 10:02
Start Date: 01/Apr/19 10:02
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270795998
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, 
String lastReplicationId,
 return dumpTuple.lastReplicationId;
   }
 
+  /**
+   * Run a bootstrap that will fail.
+   * @param tuple the location of bootstrap dump
+   */
+  private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int 
failAfterNumTables) throws Throwable {
+// fail setting ckpt directory property for the second table so that we 
test the case when
+// bootstrap load fails after some but not all tables are loaded.
+BehaviourInjection callerVerifier
+= new BehaviourInjection() {
+  int cntTables = 0;
+  @Nullable
+  @Override
+  public Boolean apply(@Nullable CallerArguments args) {
+cntTables++;
+if (args.dbName.equalsIgnoreCase(replicatedDbName) && cntTables > 
failAfterNumTables) {
+  injectionPathCalled = true;
+  LOG.warn("Verifier - DB : " + args.dbName + " TABLE : " + 
args.tblName);
+  return false;
+}
+return true;
+  }
+};
+
+InjectableBehaviourObjectStore.setAlterTableModifier(callerVerifier);
+try {
+  replica.loadFailure(replicatedDbName, tuple.dumpLocation);
+  callerVerifier.assertInjectionsPerformed(true, false);
+} finally {
+  InjectableBehaviourObjectStore.resetAlterTableModifier();
+}
+  }
+
+  private void failIncrementalLoad(WarehouseInstance.Tuple dumpTuple, int 
failAfterNumEvents) throws Throwable {
+// fail add notification when updating table stats after given number of 
such events. Thus we
+// test successful application as well as failed application of this event.
+BehaviourInjection callerVerifier
+= new BehaviourInjection() {
+  int cntEvents = 0;
+  @Override
+  public Boolean apply(NotificationEvent entry) {
+cntEvents++;
+if 
(entry.getEventType().equalsIgnoreCase(EventMessage.EventType.UPDATE_TABLE_COLUMN_STAT.toString())
 &&
+cntEvents > failAfterNumEvents) {
+  injectionPathCalled = true;
+  LOG.warn("Verifier - DB: " + entry.getDbName()
+  + " Table: " + entry.getTableName()
+  + " Event: " + entry.getEventType());
+  return false;
+}
+return true;
+  }
+};
+
+InjectableBehaviourObjectStore.setAddNotificationModifier(callerVerifier);
+try {
+  replica.loadFailure(replicatedDbName, dumpTuple.dumpLocation);
+} finally {
+  InjectableBehaviourObjectStore.resetAddNotificationModifier();
+}
+callerVerifier.assertInjectionsPerformed(true, false);
+
+// fail add notification when updating partition stats for the second 
time. Thus we test
+// successful application as well as failed application of this event.
+callerVerifier = new BehaviourInjection() {
+  int cntEvents = 1;
+
+  @Override
+  public Boolean apply(NotificationEvent entry) {
+cntEvents++;
+if 
(entry.getEventType().equalsIgnoreCase(EventMessage.EventType.UPDATE_PARTITION_COLUMN_STAT.toString())
 &&
+cntEvents > failAfterNumEvents) {
+  injectionPathCalled = true;
+  LOG.warn("Verifier - DB: " + entry.getDbName()
+  + " Table: " + entry.getTableName()
+  + " Event: " + entry.getEventType());
+  return false;
+}
+return true;
+  }
+};
+
+InjectableBehaviourObjectStore.setAddNotificationModifier(callerVerifier);
+try {
+  replica.loadFailure(replicatedDbName, dumpTuple.dumpLocation);
+} finally {
+  InjectableBehaviourObjectStore.resetAddNotificationModifier();
+}
 
 Review comment:
   Done.
 



Issue Time Tracking
---

Worklog Id: (was: 221207)
Time Spent: 10h 20m  (was: 10h 10m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
>   

[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221206=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221206
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 01/Apr/19 10:00
Start Date: 01/Apr/19 10:00
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270795354
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, 
String lastReplicationId,
 return dumpTuple.lastReplicationId;
   }
 
+  /**
+   * Run a bootstrap that will fail.
+   * @param tuple the location of bootstrap dump
+   */
+  private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int 
failAfterNumTables) throws Throwable {
+// fail setting ckpt directory property for the second table so that we 
test the case when
+// bootstrap load fails after some but not all tables are loaded.
+BehaviourInjection callerVerifier
+= new BehaviourInjection() {
+  int cntTables = 0;
+  @Nullable
+  @Override
+  public Boolean apply(@Nullable CallerArguments args) {
+cntTables++;
+if (args.dbName.equalsIgnoreCase(replicatedDbName) && cntTables > 
failAfterNumTables) {
+  injectionPathCalled = true;
+  LOG.warn("Verifier - DB : " + args.dbName + " TABLE : " + 
args.tblName);
+  return false;
+}
+return true;
+  }
+};
+
+InjectableBehaviourObjectStore.setAlterTableModifier(callerVerifier);
+try {
+  replica.loadFailure(replicatedDbName, tuple.dumpLocation);
+  callerVerifier.assertInjectionsPerformed(true, false);
+} finally {
+  InjectableBehaviourObjectStore.resetAlterTableModifier();
+}
+  }
+
+  private void failIncrementalLoad(WarehouseInstance.Tuple dumpTuple, int 
failAfterNumEvents) throws Throwable {
+// fail add notification when updating table stats after given number of 
such events. Thus we
+// test successful application as well as failed application of this event.
+BehaviourInjection callerVerifier
+= new BehaviourInjection() {
+  int cntEvents = 0;
+  @Override
+  public Boolean apply(NotificationEvent entry) {
+cntEvents++;
 
 Review comment:
   This code changed while I was working on another related comment. Again, we 
don't need to count the exact number of events. We need at least one successful 
event and one unsuccessful event.
 



Issue Time Tracking
---

Worklog Id: (was: 221206)
Time Spent: 10h 10m  (was: 10h)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch
>
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221203=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221203
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 01/Apr/19 09:58
Start Date: 01/Apr/19 09:58
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270794735
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, 
String lastReplicationId,
 return dumpTuple.lastReplicationId;
   }
 
+  /**
+   * Run a bootstrap that will fail.
+   * @param tuple the location of bootstrap dump
+   */
+  private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int 
failAfterNumTables) throws Throwable {
+// fail setting ckpt directory property for the second table so that we 
test the case when
+// bootstrap load fails after some but not all tables are loaded.
+BehaviourInjection callerVerifier
+= new BehaviourInjection() {
+  int cntTables = 0;
+  @Nullable
+  @Override
+  public Boolean apply(@Nullable CallerArguments args) {
+cntTables++;
+if (args.dbName.equalsIgnoreCase(replicatedDbName) && cntTables > 
failAfterNumTables) {
+  injectionPathCalled = true;
+  LOG.warn("Verifier - DB : " + args.dbName + " TABLE : " + 
args.tblName);
+  return false;
+}
+return true;
+  }
+};
+
+InjectableBehaviourObjectStore.setAlterTableModifier(callerVerifier);
+try {
+  replica.loadFailure(replicatedDbName, tuple.dumpLocation);
+  callerVerifier.assertInjectionsPerformed(true, false);
+} finally {
+  InjectableBehaviourObjectStore.resetAlterTableModifier();
+}
 
 Review comment:
   I don't think we need to be strict about the exact number of tables loaded. 
All we are testing is whether there was a failure and whether the retry loaded 
the stats successfully. The current set of checks is enough for that.
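   The fail-then-retry check described in this comment can be sketched in 
isolation. This is only an illustration under stated assumptions: the class 
RetryLoadSketch and its load method are hypothetical stand-ins, not Hive code, 
and a table is "loaded" simply by adding its name to a set.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a resumable load; not part of Hive.
final class RetryLoadSketch {
  // Loads every table that is not yet in `loaded`. Throws once `failAfter`
  // tables have been loaded in this attempt, simulating an injected failure.
  static void load(List<String> tables, Set<String> loaded, int failAfter) {
    int loadedNow = 0;
    for (String table : tables) {
      if (loaded.contains(table)) {
        continue; // already loaded by an earlier attempt
      }
      if (loadedNow >= failAfter) {
        throw new IllegalStateException("injected failure before " + table);
      }
      loaded.add(table);
      loadedNow++;
    }
  }

  public static void main(String[] args) {
    List<String> tables = Arrays.asList("t1", "t2", "t3");
    Set<String> loaded = new LinkedHashSet<>();
    boolean failed = false;
    try {
      load(tables, loaded, 1); // first attempt fails part-way through
    } catch (IllegalStateException e) {
      failed = true;
    }
    load(tables, loaded, Integer.MAX_VALUE); // retry without injection
    System.out.println("failed=" + failed + " loaded=" + loaded);
  }
}
```

   The sketch checks only that a failure happened and that the retry left 
every table loaded, which is the property the comment says the test needs. It 
does not assert how many tables were loaded before the failure.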
 



Issue Time Tracking
---

Worklog Id: (was: 221203)
Time Spent: 10h  (was: 9h 50m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221179=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221179
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 01/Apr/19 09:38
Start Date: 01/Apr/19 09:38
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270786715
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -269,11 +294,23 @@ private String dumpLoadVerify(List tableNames, 
String lastReplicationId,
 WarehouseInstance.Tuple dumpTuple = primary.run("use " + primaryDbName)
 .dump(primaryDbName, lastReplicationId, withClauseList);
 
+
 // Load, if necessary changing configuration.
 if (parallelLoad) {
   replica.hiveConf.setBoolVar(HiveConf.ConfVars.EXECPARALLEL, true);
 }
 
+// Fail the load if testing the failure and retry scenario. Fail the load 
while setting
+// checkpoint for a table in the middle of list of tables.
+if (failRetry) {
+  if (lastReplicationId == null) {
+failBootstrapLoad(dumpTuple, tableNames.size()/2);
+  } else {
+failIncrementalLoad(dumpTuple, tableNames.size()/2);
 
 Review comment:
   We are counting only UpdateTableStats or UpdatePartStats events, not every 
event, so we fail only after encountering (number of tables)/2 events of those 
types. The load therefore cannot fail before any update stats event is applied. 
To be on the safe side, though, I have changed the code to fail after the second 
event so that we have at least one successful application before the failure. 
Since we perform multiple insert events per table, we can be sure that there are 
at least 2 events of each type.
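   The counting rule described in this comment can be sketched on its own. 
This is a minimal illustration under stated assumptions: the class 
FailAfterMatchingEvents is hypothetical and uses java.util.function.Predicate 
over plain event-type strings instead of Hive's BehaviourInjection and 
NotificationEvent. It follows the description in the comment (only events of 
the target type are counted), which may differ from the quoted revision of the 
patch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the injected verifier; not Hive's BehaviourInjection.
final class FailAfterMatchingEvents implements Predicate<String> {
  private final String targetType; // the only event type that is counted
  private final int allowed;       // matching events applied before failing
  private int seen = 0;

  FailAfterMatchingEvents(String targetType, int allowed) {
    this.targetType = targetType;
    this.allowed = allowed;
  }

  // Returns true when the event may be applied, false to inject the failure.
  @Override
  public boolean test(String eventType) {
    if (!targetType.equalsIgnoreCase(eventType)) {
      return true; // other event types are neither counted nor failed
    }
    seen++;
    return seen <= allowed;
  }

  // Applies events in order; returns how many were applied before the failure.
  static int applyUntilFailure(List<String> events, Predicate<String> verifier) {
    int applied = 0;
    for (String event : events) {
      if (!verifier.test(event)) {
        break;
      }
      applied++;
    }
    return applied;
  }

  public static void main(String[] args) {
    List<String> events = Arrays.asList(
        "INSERT", "UPDATE_TABLE_COLUMN_STAT", "INSERT", "UPDATE_TABLE_COLUMN_STAT");
    // Allow one stats event, then fail on the second one.
    System.out.println(applyUntilFailure(
        events, new FailAfterMatchingEvents("UPDATE_TABLE_COLUMN_STAT", 1)));
  }
}
```

   With one allowed stats event, the first stats update is applied and the 
second one triggers the failure, so both the successful and the failed 
application of the event are exercised.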
 



Issue Time Tracking
---

Worklog Id: (was: 221179)
Time Spent: 9h 50m  (was: 9h 40m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221122=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221122
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 01/Apr/19 06:50
Start Date: 01/Apr/19 06:50
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270732089
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -2950,21 +2957,32 @@ public Partition createPartition(Table tbl, 
Map partSpec) throws
 int size = addPartitionDesc.getPartitionCount();
 List in =
 new ArrayList(size);
-AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, 
tbl, true);
 long writeId;
 String validWriteIdList;
-if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-  writeId = tableSnapshot.getWriteId();
-  validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+// In case of replication, get the writeId from the source and use valid 
write Id list
+// for replication.
+if (addPartitionDesc.getReplicationSpec().isInReplicationScope() &&
+addPartitionDesc.getPartition(0).getWriteId() > 0) {
 
 Review comment:
   writeId will be 0 for non-transactional tables. Also, this is the 
createPartitions code, which may be executed with writeId = 0 for 
non-transactional modifications to partitions of transactional tables as well. 
The condition, which I borrowed from the old code, is required so that we don't 
create a valid writeId list or try to get a table snapshot when writeId is zero.
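   The guard discussed here can be reduced to a small predicate. This is a 
sketch only: ReplWriteIdGuard and useSourceWriteId are hypothetical names, and 
the two parameters stand in for 
addPartitionDesc.getReplicationSpec().isInReplicationScope() and 
addPartitionDesc.getPartition(0).getWriteId() from the quoted diff.

```java
// Hypothetical helper mirroring the condition in the quoted diff; not part of Hive.
final class ReplWriteIdGuard {
  // True only when the partitions arrive through replication and the source
  // supplied a positive writeId. A writeId of 0 indicates a non-transactional
  // table or a non-transactional change, so no valid writeId list is built.
  static boolean useSourceWriteId(boolean inReplicationScope, long sourceWriteId) {
    return inReplicationScope && sourceWriteId > 0;
  }

  public static void main(String[] args) {
    System.out.println(useSourceWriteId(true, 5L));  // replicated, transactional
    System.out.println(useSourceWriteId(true, 0L));  // replicated, writeId is zero
    System.out.println(useSourceWriteId(false, 5L)); // not a replication load
  }
}
```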
 



Issue Time Tracking
---

Worklog Id: (was: 221122)
Time Spent: 9h 40m  (was: 9.5h)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221121=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221121
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 01/Apr/19 06:49
Start Date: 01/Apr/19 06:49
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270732089
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -2950,21 +2957,32 @@ public Partition createPartition(Table tbl, 
Map partSpec) throws
 int size = addPartitionDesc.getPartitionCount();
 List in =
 new ArrayList(size);
-AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, 
tbl, true);
 long writeId;
 String validWriteIdList;
-if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-  writeId = tableSnapshot.getWriteId();
-  validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+// In case of replication, get the writeId from the source and use valid 
write Id list
+// for replication.
+if (addPartitionDesc.getReplicationSpec().isInReplicationScope() &&
+addPartitionDesc.getPartition(0).getWriteId() > 0) {
 
 Review comment:
   writeId will be 0 for non-transactional tables. Also, this is the 
createPartitions code, which may be executed for partitions created with 
writeId 0 for transactional tables as well. The condition, which I borrowed 
from the old code, is required so that we don't create a valid writeId list or 
try to get a table snapshot when writeId is zero.
 



Issue Time Tracking
---

Worklog Id: (was: 221121)
Time Spent: 9.5h  (was: 9h 20m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch
>
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221119=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221119
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 01/Apr/19 06:48
Start Date: 01/Apr/19 06:48
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270732089
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -2950,21 +2957,32 @@ public Partition createPartition(Table tbl, 
Map partSpec) throws
 int size = addPartitionDesc.getPartitionCount();
 List in =
 new ArrayList(size);
-AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, 
tbl, true);
 long writeId;
 String validWriteIdList;
-if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-  writeId = tableSnapshot.getWriteId();
-  validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+// In case of replication, get the writeId from the source and use valid 
write Id list
+// for replication.
+if (addPartitionDesc.getReplicationSpec().isInReplicationScope() &&
+addPartitionDesc.getPartition(0).getWriteId() > 0) {
 
 Review comment:
   writeId will be 0 for non-transactional tables. Also, this is the 
createPartitions code, which may be executed for partitions created with 
writeId 0 for transactional tables as well.
 



Issue Time Tracking
---

Worklog Id: (was: 221119)
Time Spent: 9h 20m  (was: 9h 10m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch
>
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221117=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221117
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 01/Apr/19 06:44
Start Date: 01/Apr/19 06:44
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270731373
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
 ##
 @@ -297,21 +303,34 @@ private ColumnStatisticsDesc getColumnStatsDesc(String 
dbName,
 
   private int persistColumnStats(Hive db) throws HiveException, MetaException, 
IOException {
 ColumnStatistics colStats = constructColumnStatsFromInput();
-ColumnStatisticsDesc colStatsDesc = colStats.getStatsDesc();
-// We do not support stats replication for a transactional table yet. If 
we are converting
-// a non-transactional table to a transactional table during replication, 
we might get
-// column statistics but we shouldn't update those.
-if (work.getColStats() != null &&
-
AcidUtils.isTransactionalTable(getHive().getTable(colStatsDesc.getDbName(),
-  
colStatsDesc.getTableName()))) {
-  LOG.debug("Skipped updating column stats for table " +
-TableName.getDbTable(colStatsDesc.getDbName(), 
colStatsDesc.getTableName()) +
-" because it is converted to a transactional table during 
replication.");
-  return 0;
-}
-
 SetPartitionsStatsRequest request =
 new SetPartitionsStatsRequest(Collections.singletonList(colStats));
+
+// Set writeId and validWriteId list for replicated statistics.
+if (work.getColStats() != null) {
+  String dbName = colStats.getStatsDesc().getDbName();
+  String tblName = colStats.getStatsDesc().getTableName();
+  Table tbl = db.getTable(dbName, tblName);
+  long writeId = work.getWriteId();
+  // If it's a transactional table on source and target, we will get a 
valid writeId
+  // associated with it. Otherwise it's a non-transactional table on 
source migrated to a
+  // transactional table on target, we need to craft a valid writeId here.
+  if (AcidUtils.isTransactionalTable(tbl)) {
+ValidWriteIdList writeIds;
+if (writeId <= 0) {
+  Long tmpWriteId = ReplUtils.getMigrationCurrentTblWriteId(conf);
+  if (tmpWriteId == null) {
+throw new HiveException("DDLTask : Write id is not set in the 
config by open txn task for migration");
+  }
+  writeId = tmpWriteId;
+}
+writeIds = new ValidReaderWriteIdList(TableName.getDbTable(dbName, 
tblName), new long[0],
 
 Review comment:
   work.getColStats() returns a non-null value only in the replication flow. 
This block of code is under that condition, so it executes only in the repl 
flow. Added a comment to that effect. Also added a comment per your suggestion.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221117)
Time Spent: 9h 10m  (was: 9h)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221113&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221113
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 01/Apr/19 06:38
Start Date: 01/Apr/19 06:38
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270730194
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -2950,21 +2957,32 @@ public Partition createPartition(Table tbl, Map partSpec) throws
     int size = addPartitionDesc.getPartitionCount();
     List in =
         new ArrayList(size);
-    AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
     long writeId;
     String validWriteIdList;
-    if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-      writeId = tableSnapshot.getWriteId();
-      validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+    // In case of replication, get the writeId from the source and use valid write Id list
+    // for replication.
+    if (addPartitionDesc.getReplicationSpec().isInReplicationScope() &&
+        addPartitionDesc.getPartition(0).getWriteId() > 0) {
+      writeId = addPartitionDesc.getPartition(0).getWriteId();
+      validWriteIdList = new ValidReaderWriteIdList(TableName.getDbTable(tbl.getDbName(),
 
 Review comment:
   Done.
 



Issue Time Tracking
---

Worklog Id: (was: 221113)
Time Spent: 9h  (was: 8h 50m)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221112
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 01/Apr/19 06:36
Start Date: 01/Apr/19 06:36
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270729852
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -694,7 +695,9 @@ public void alterTable(String catName, String dbName, String tblName, Table newT
       AcidUtils.TableSnapshot tableSnapshot = null;
       if (transactional) {
         if (replWriteId > 0) {
-          ValidWriteIdList writeIds = AcidUtils.getTableValidWriteIdListWithTxnList(conf, dbName, tblName);
+          ValidWriteIdList writeIds = new ValidReaderWriteIdList(TableName.getDbTable(dbName, tblName),
 
 Review comment:
   Done.
 



Issue Time Tracking
---

Worklog Id: (was: 221112)
Time Spent: 8h 50m  (was: 8h 40m)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221109
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 01/Apr/19 06:30
Start Date: 01/Apr/19 06:30
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270728648
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
 ##
 @@ -297,21 +303,34 @@ private ColumnStatisticsDesc getColumnStatsDesc(String dbName,
 
   private int persistColumnStats(Hive db) throws HiveException, MetaException, IOException {
     ColumnStatistics colStats = constructColumnStatsFromInput();
-    ColumnStatisticsDesc colStatsDesc = colStats.getStatsDesc();
-    // We do not support stats replication for a transactional table yet. If we are converting
-    // a non-transactional table to a transactional table during replication, we might get
-    // column statistics but we shouldn't update those.
-    if (work.getColStats() != null &&
-        AcidUtils.isTransactionalTable(getHive().getTable(colStatsDesc.getDbName(),
-            colStatsDesc.getTableName()))) {
-      LOG.debug("Skipped updating column stats for table " +
-          TableName.getDbTable(colStatsDesc.getDbName(), colStatsDesc.getTableName()) +
-          " because it is converted to a transactional table during replication.");
-      return 0;
-    }
-
     SetPartitionsStatsRequest request =
         new SetPartitionsStatsRequest(Collections.singletonList(colStats));
+
+    // Set writeId and validWriteId list for replicated statistics.
+    if (work.getColStats() != null) {
+      String dbName = colStats.getStatsDesc().getDbName();
+      String tblName = colStats.getStatsDesc().getTableName();
+      Table tbl = db.getTable(dbName, tblName);
+      long writeId = work.getWriteId();
+      // If it's a transactional table on source and target, we will get a valid writeId
+      // associated with it. Otherwise it's a non-transactional table on source migrated to a
+      // transactional table on target, we need to craft a valid writeId here.
+      if (AcidUtils.isTransactionalTable(tbl)) {
+        ValidWriteIdList writeIds;
+        if (writeId <= 0) {
 
 Review comment:
   We cannot set the writeId in the ColumnStatsUpdateWork because the writeId
for migration is available only after a transaction is opened for migration,
which doesn't happen at load time (when the work is created). Going by the gist
of your suggestion, I have set a flag in the work to indicate that the writeId
should be the one used for migration.
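
The timing constraint described in this comment can be illustrated with a small, self-contained sketch. All names here (`StatsUpdateWork`, `DeferredWriteId`, `resolve`) are hypothetical stand-ins and not the classes touched by the PR; the sketch only assumes that the work object carries a flag set at load time and that the writeId is looked up when the task actually runs.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a work object that is built at load time. */
final class StatsUpdateWork {
  private final long sourceWriteId;          // > 0 only if the source supplied one
  private final boolean useMigrationWriteId; // flag decided at load time

  StatsUpdateWork(long sourceWriteId, boolean useMigrationWriteId) {
    this.sourceWriteId = sourceWriteId;
    this.useMigrationWriteId = useMigrationWriteId;
  }

  long getSourceWriteId() { return sourceWriteId; }
  boolean usesMigrationWriteId() { return useMigrationWriteId; }
}

final class DeferredWriteId {
  private DeferredWriteId() {}

  /**
   * Resolves the writeId at execution time. The supplier models "read the
   * writeId published by the migration transaction"; it returns null when
   * no such transaction has been opened yet.
   */
  static long resolve(StatsUpdateWork work, Supplier<Long> migrationWriteId) {
    if (!work.usesMigrationWriteId()) {
      return work.getSourceWriteId();
    }
    Long id = migrationWriteId.get();
    if (id == null) {
      throw new IllegalStateException("Migration writeId is not available yet");
    }
    return id;
  }
}
```

Because the supplier is consulted only inside `resolve`, the work object can be created before any migration transaction exists, which is the situation the comment describes.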
 



Issue Time Tracking
---

Worklog Id: (was: 221109)
Time Spent: 8h 40m  (was: 8.5h)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220947&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220947
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 31/Mar/19 07:33
Start Date: 31/Mar/19 07:33
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270653325
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -269,11 +294,23 @@ private String dumpLoadVerify(List tableNames, String lastReplicationId,
     WarehouseInstance.Tuple dumpTuple = primary.run("use " + primaryDbName)
         .dump(primaryDbName, lastReplicationId, withClauseList);
 
+
     // Load, if necessary changing configuration.
     if (parallelLoad) {
       replica.hiveConf.setBoolVar(HiveConf.ConfVars.EXECPARALLEL, true);
     }
 
+    // Fail load if for testing failure and retry scenario. Fail the load while setting
+    // checkpoint for a table in the middle of list of tables.
+    if (failRetry) {
+      if (lastReplicationId == null) {
+        failBootstrapLoad(dumpTuple, tableNames.size()/2);
+      } else {
+        failIncrementalLoad(dumpTuple, tableNames.size()/2);
 
 Review comment:
   Events do not map one-to-one to tables, so failing after
tableNames.size()/2 events may fail the load even before an update stats event
is applied. If we want to fail the incremental load after a fixed event, we
need to get the event count by taking a dump at the source right after that
operation.
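
To make the concern concrete, here is a minimal sketch of deriving the failure point from the actual event stream rather than from the number of tables. `EventFailPoint` and `positionOfNth` are hypothetical helpers, and the event list is made-up sample data; only the event type name `UPDATE_TABLE_COLUMN_STAT` comes from the test code under review.

```java
import java.util.Arrays;
import java.util.List;

final class EventFailPoint {
  private EventFailPoint() {}

  /**
   * Returns the 1-based position of the n-th event of the given type in the
   * dumped event sequence, or -1 if there are fewer than n such events.
   */
  static int positionOfNth(List<String> eventTypes, String type, int n) {
    int seen = 0;
    for (int i = 0; i < eventTypes.size(); i++) {
      if (eventTypes.get(i).equalsIgnoreCase(type) && ++seen == n) {
        return i + 1;
      }
    }
    return -1;
  }

  /** Illustrative event sequence for two tables (sample data, not a real dump). */
  static List<String> sampleEvents() {
    return Arrays.asList(
        "CREATE_TABLE", "INSERT", "UPDATE_TABLE_COLUMN_STAT",
        "CREATE_TABLE", "INSERT", "UPDATE_TABLE_COLUMN_STAT");
  }
}
```

With this sample, two tables give `tableNames.size()/2 == 1`, while the first stats event sits at position 3, so a threshold based on the table count does not line up with the stats events.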
 



Issue Time Tracking
---

Worklog Id: (was: 220947)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220941&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220941
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 31/Mar/19 07:33
Start Date: 31/Mar/19 07:33
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270652691
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
 ##
 @@ -297,21 +303,34 @@ private ColumnStatisticsDesc getColumnStatsDesc(String dbName,
 
   private int persistColumnStats(Hive db) throws HiveException, MetaException, IOException {
     ColumnStatistics colStats = constructColumnStatsFromInput();
-    ColumnStatisticsDesc colStatsDesc = colStats.getStatsDesc();
-    // We do not support stats replication for a transactional table yet. If we are converting
-    // a non-transactional table to a transactional table during replication, we might get
-    // column statistics but we shouldn't update those.
-    if (work.getColStats() != null &&
-        AcidUtils.isTransactionalTable(getHive().getTable(colStatsDesc.getDbName(),
-            colStatsDesc.getTableName()))) {
-      LOG.debug("Skipped updating column stats for table " +
-          TableName.getDbTable(colStatsDesc.getDbName(), colStatsDesc.getTableName()) +
-          " because it is converted to a transactional table during replication.");
-      return 0;
-    }
-
     SetPartitionsStatsRequest request =
         new SetPartitionsStatsRequest(Collections.singletonList(colStats));
+
+    // Set writeId and validWriteId list for replicated statistics.
+    if (work.getColStats() != null) {
+      String dbName = colStats.getStatsDesc().getDbName();
+      String tblName = colStats.getStatsDesc().getTableName();
+      Table tbl = db.getTable(dbName, tblName);
+      long writeId = work.getWriteId();
+      // If it's a transactional table on source and target, we will get a valid writeId
+      // associated with it. Otherwise it's a non-transactional table on source migrated to a
+      // transactional table on target, we need to craft a valid writeId here.
+      if (AcidUtils.isTransactionalTable(tbl)) {
+        ValidWriteIdList writeIds;
+        if (writeId <= 0) {
+          Long tmpWriteId = ReplUtils.getMigrationCurrentTblWriteId(conf);
+          if (tmpWriteId == null) {
+            throw new HiveException("DDLTask : Write id is not set in the config by open txn task for migration");
+          }
+          writeId = tmpWriteId;
+        }
+        writeIds = new ValidReaderWriteIdList(TableName.getDbTable(dbName, tblName), new long[0],
 
 Review comment:
   This method of hardcoding the ValidWriteIdList makes sense only in the repl
flow. Otherwise, we need to go with the logic of getting it from HMS. We need
to check for that here and also add a comment on why this hardcoding logic
works for the repl flow.
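
The branching requested here can be sketched in isolation. `WriteIdSnapshot` below is a hypothetical, simplified model and not Hive's `ValidReaderWriteIdList`; it assumes that a crafted list for the repl flow has no open or aborted writeIds (matching the empty `new long[0]` in the diff) and uses the replicated writeId as its high watermark, while every other caller obtains the list from a metastore lookup, modeled as a supplier.

```java
import java.util.Arrays;
import java.util.function.Supplier;

/** Simplified, hypothetical model of a valid-writeId list. */
final class WriteIdSnapshot {
  private final long[] exceptions;   // sorted open/aborted writeIds
  private final long highWatermark;  // highest writeId considered

  WriteIdSnapshot(long[] exceptions, long highWatermark) {
    this.exceptions = exceptions.clone();
    Arrays.sort(this.exceptions);
    this.highWatermark = highWatermark;
  }

  /** A writeId is valid if it is at or below the watermark and not an exception. */
  boolean isValid(long writeId) {
    return writeId <= highWatermark && Arrays.binarySearch(exceptions, writeId) < 0;
  }

  /** Crafted list for the repl flow: no exceptions, watermark = replicated writeId. */
  static WriteIdSnapshot forReplication(long replicatedWriteId) {
    return new WriteIdSnapshot(new long[0], replicatedWriteId);
  }

  /** Uses the crafted list only in the repl flow; otherwise asks the metastore. */
  static WriteIdSnapshot choose(boolean inReplFlow, long replicatedWriteId,
                                Supplier<WriteIdSnapshot> fromMetastore) {
    return inReplFlow ? forReplication(replicatedWriteId) : fromMetastore.get();
  }
}
```

Keeping the decision in one place (`choose`) also gives a natural spot for the explanatory code comment the reviewer asks for.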
 



Issue Time Tracking
---

Worklog Id: (was: 220941)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220950&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220950
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 31/Mar/19 07:33
Start Date: 31/Mar/19 07:33
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270653503
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMigrationNoAutogather.java
 ##
 @@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Tests statistics replication for ACID tables.
+ */
+public class TestStatsReplicationScenariosMigrationNoAutogather extends TestStatsReplicationScenarios {
+  @Rule
+  public final TestName testName = new TestName();
+
+  protected static final Logger LOG = LoggerFactory.getLogger(TestReplicationScenarios.class);
 
 Review comment:
   LOG is not used and can be removed. Please check the other newly added test
classes as well.
 



Issue Time Tracking
---

Worklog Id: (was: 220950)
Time Spent: 8h 20m  (was: 8h 10m)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220948&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220948
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 31/Mar/19 07:33
Start Date: 31/Mar/19 07:33
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270653390
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, String lastReplicationId,
     return dumpTuple.lastReplicationId;
   }
 
+  /**
+   * Run a bootstrap that will fail.
+   * @param tuple the location of bootstrap dump
+   */
+  private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int failAfterNumTables) throws Throwable {
+    // fail setting ckpt directory property for the second table so that we test the case when
+    // bootstrap load fails after some but not all tables are loaded.
+    BehaviourInjection callerVerifier
+        = new BehaviourInjection() {
+      int cntTables = 0;
+      @Nullable
+      @Override
+      public Boolean apply(@Nullable CallerArguments args) {
+        cntTables++;
+        if (args.dbName.equalsIgnoreCase(replicatedDbName) && cntTables > failAfterNumTables) {
+          injectionPathCalled = true;
+          LOG.warn("Verifier - DB : " + args.dbName + " TABLE : " + args.tblName);
+          return false;
+        }
+        return true;
+      }
+    };
+
+    InjectableBehaviourObjectStore.setAlterTableModifier(callerVerifier);
+    try {
+      replica.loadFailure(replicatedDbName, tuple.dumpLocation);
+      callerVerifier.assertInjectionsPerformed(true, false);
+    } finally {
+      InjectableBehaviourObjectStore.resetAlterTableModifier();
+    }
 
 Review comment:
   We should add the necessary validations to check that only the expected
number of tables are loaded at this point.
 



Issue Time Tracking
---

Worklog Id: (was: 220948)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220946&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220946
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 31/Mar/19 07:33
Start Date: 31/Mar/19 07:33
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270652901
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -2950,21 +2957,32 @@ public Partition createPartition(Table tbl, Map partSpec) throws
     int size = addPartitionDesc.getPartitionCount();
     List in =
         new ArrayList(size);
-    AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
     long writeId;
     String validWriteIdList;
-    if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-      writeId = tableSnapshot.getWriteId();
-      validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+    // In case of replication, get the writeId from the source and use valid write Id list
+    // for replication.
+    if (addPartitionDesc.getReplicationSpec().isInReplicationScope() &&
+        addPartitionDesc.getPartition(0).getWriteId() > 0) {
+      writeId = addPartitionDesc.getPartition(0).getWriteId();
+      validWriteIdList = new ValidReaderWriteIdList(TableName.getDbTable(tbl.getDbName(),
 
 Review comment:
   Please add a comment on why this hardcoding logic works for the repl flow.
 



Issue Time Tracking
---

Worklog Id: (was: 220946)
Time Spent: 8h  (was: 7h 50m)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220945&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220945
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 31/Mar/19 07:33
Start Date: 31/Mar/19 07:33
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270652886
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -2950,21 +2957,32 @@ public Partition createPartition(Table tbl, Map partSpec) throws
     int size = addPartitionDesc.getPartitionCount();
     List in =
         new ArrayList(size);
-    AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
     long writeId;
     String validWriteIdList;
-    if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-      writeId = tableSnapshot.getWriteId();
-      validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+    // In case of replication, get the writeId from the source and use valid write Id list
+    // for replication.
+    if (addPartitionDesc.getReplicationSpec().isInReplicationScope() &&
+        addPartitionDesc.getPartition(0).getWriteId() > 0) {
 
 Review comment:
   I think that, in the replication flow and for a transactional table, the
second condition should always be true. If no valid writeId is obtained from
the source, then we need to fail; we cannot fall back to the default logic.
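
A minimal sketch of the fail-fast behaviour suggested here, as opposed to silently using the default path. `ReplWriteIdCheck` and `chooseWriteId` are hypothetical names; the boolean and long parameters stand in for the replication-scope check, the transactional-table check, the writeId carried from the source, and the writeId the default (table snapshot) logic would produce.

```java
final class ReplWriteIdCheck {
  private ReplWriteIdCheck() {}

  /**
   * In the replication flow for a transactional table, a positive writeId from
   * the source is mandatory; anything else is treated as an error instead of
   * falling back to the locally derived writeId.
   */
  static long chooseWriteId(boolean inReplScope, boolean transactional,
                            long sourceWriteId, long defaultWriteId) {
    if (inReplScope && transactional) {
      if (sourceWriteId <= 0) {
        throw new IllegalStateException(
            "No valid writeId received from the source for a transactional table");
      }
      return sourceWriteId;
    }
    // Outside the replication flow, keep the default logic.
    return defaultWriteId;
  }
}
```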
 



Issue Time Tracking
---

Worklog Id: (was: 220945)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220951&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220951
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 31/Mar/19 07:33
Start Date: 31/Mar/19 07:33
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270653475
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, 
String lastReplicationId,
 return dumpTuple.lastReplicationId;
   }
 
+  /**
+   * Run a bootstrap that will fail.
+   * @param tuple the location of bootstrap dump
+   */
+  private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int 
failAfterNumTables) throws Throwable {
+// fail setting ckpt directory property for the second table so that we 
test the case when
+// bootstrap load fails after some but not all tables are loaded.
+BehaviourInjection callerVerifier
+= new BehaviourInjection() {
+  int cntTables = 0;
+  @Nullable
+  @Override
+  public Boolean apply(@Nullable CallerArguments args) {
+cntTables++;
+if (args.dbName.equalsIgnoreCase(replicatedDbName) && cntTables > 
failAfterNumTables) {
+  injectionPathCalled = true;
+  LOG.warn("Verifier - DB : " + args.dbName + " TABLE : " + 
args.tblName);
+  return false;
+}
+return true;
+  }
+};
+
+InjectableBehaviourObjectStore.setAlterTableModifier(callerVerifier);
+try {
+  replica.loadFailure(replicatedDbName, tuple.dumpLocation);
+  callerVerifier.assertInjectionsPerformed(true, false);
+} finally {
+  InjectableBehaviourObjectStore.resetAlterTableModifier();
+}
+  }
+
+  private void failIncrementalLoad(WarehouseInstance.Tuple dumpTuple, int 
failAfterNumEvents) throws Throwable {
+// fail add notification when updating table stats after given number of 
such events. Thus we
+// test successful application as well as failed application of this event.
+BehaviourInjection callerVerifier
+= new BehaviourInjection() {
+  int cntEvents = 0;
+  @Override
+  public Boolean apply(NotificationEvent entry) {
+cntEvents++;
+if 
(entry.getEventType().equalsIgnoreCase(EventMessage.EventType.UPDATE_TABLE_COLUMN_STAT.toString())
 &&
+cntEvents > failAfterNumEvents) {
+  injectionPathCalled = true;
+  LOG.warn("Verifier - DB: " + entry.getDbName()
+  + " Table: " + entry.getTableName()
+  + " Event: " + entry.getEventType());
+  return false;
+}
+return true;
+  }
+};
+
+InjectableBehaviourObjectStore.setAddNotificationModifier(callerVerifier);
+try {
+  replica.loadFailure(replicatedDbName, dumpTuple.dumpLocation);
+} finally {
+  InjectableBehaviourObjectStore.resetAddNotificationModifier();
+}
+callerVerifier.assertInjectionsPerformed(true, false);
+
+// fail add notification when updating partition stats for the second time. Thus we test
+// successful application as well as failed application of this event.
+callerVerifier = new BehaviourInjection<NotificationEvent, Boolean>() {
+  int cntEvents = 1;
+
+  @Override
+  public Boolean apply(NotificationEvent entry) {
+cntEvents++;
+if (entry.getEventType().equalsIgnoreCase(EventMessage.EventType.UPDATE_PARTITION_COLUMN_STAT.toString()) &&
+cntEvents > failAfterNumEvents) {
+  injectionPathCalled = true;
+  LOG.warn("Verifier - DB: " + entry.getDbName()
+  + " Table: " + entry.getTableName()
+  + " Event: " + entry.getEventType());
+  return false;
+}
+return true;
+  }
+};
+
+InjectableBehaviourObjectStore.setAddNotificationModifier(callerVerifier);
+try {
+  replica.loadFailure(replicatedDbName, dumpTuple.dumpLocation);
+} finally {
+  InjectableBehaviourObjectStore.resetAddNotificationModifier();
+}
 
 Review comment:
   We should add validations to check that REPL LOAD fails at the right place.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 220951)
Time Spent: 8.5h  (was: 8h 20m)

> Stats replication for ACID tables.
> --

[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220944&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220944
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 31/Mar/19 07:33
Start Date: 31/Mar/19 07:33
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270652633
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
 ##
 @@ -297,21 +303,34 @@ private ColumnStatisticsDesc getColumnStatsDesc(String dbName,
 
   private int persistColumnStats(Hive db) throws HiveException, MetaException, IOException {
 ColumnStatistics colStats = constructColumnStatsFromInput();
-ColumnStatisticsDesc colStatsDesc = colStats.getStatsDesc();
-// We do not support stats replication for a transactional table yet. If we are converting
-// a non-transactional table to a transactional table during replication, we might get
-// column statistics but we shouldn't update those.
-if (work.getColStats() != null &&
-AcidUtils.isTransactionalTable(getHive().getTable(colStatsDesc.getDbName(),
-  colStatsDesc.getTableName()))) {
-  LOG.debug("Skipped updating column stats for table " +
-TableName.getDbTable(colStatsDesc.getDbName(), colStatsDesc.getTableName()) +
-" because it is converted to a transactional table during replication.");
-  return 0;
-}
-
 SetPartitionsStatsRequest request =
 new SetPartitionsStatsRequest(Collections.singletonList(colStats));
+
+// Set writeId and validWriteId list for replicated statistics.
+if (work.getColStats() != null) {
+  String dbName = colStats.getStatsDesc().getDbName();
+  String tblName = colStats.getStatsDesc().getTableName();
+  Table tbl = db.getTable(dbName, tblName);
+  long writeId = work.getWriteId();
+  // If it's a transactional table on source and target, we will get a valid writeId
+  // associated with it. Otherwise it's a non-transactional table on source migrated to a
+  // transactional table on target, we need to craft a valid writeId here.
+  if (AcidUtils.isTransactionalTable(tbl)) {
+ValidWriteIdList writeIds;
+if (writeId <= 0) {
 
 Review comment:
   Instead of assuming that "writeId <= 0" means the migration case, it is better if the caller sets the correct writeId in ColumnStatsUpdateWork itself.
   Otherwise, in a non-migration case where a bug in the caller passes a wrong writeId, we would throw an exception with a misleading message.
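
A minimal sketch of the suggestion above, assuming the caller knows whether it is handling a migration. `ReplStatsWriteIds`, `resolve`, and the `allocator` parameter are hypothetical names used only for illustration; they are not part of Hive or of this patch.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper, not part of Hive: the caller decides the writeId up front. */
final class ReplStatsWriteIds {
  private ReplStatsWriteIds() {
  }

  /**
   * Returns the writeId to store in the stats-update work.
   *
   * @param sourceWriteId writeId replicated from the source, if any
   * @param migratedToTransactional true only when a non-transactional source table
   *        became transactional on the target
   * @param allocator stand-in for whatever allocates a writeId on the target
   */
  static long resolve(long sourceWriteId, boolean migratedToTransactional, LongSupplier allocator) {
    if (migratedToTransactional) {
      // The migration case is stated explicitly instead of being inferred from writeId <= 0.
      return allocator.getAsLong();
    }
    if (sourceWriteId <= 0) {
      // A caller bug now surfaces with a message that names the real problem.
      throw new IllegalStateException(
          "Missing writeId for replicated stats of a transactional table: " + sourceWriteId);
    }
    return sourceWriteId;
  }

  public static void main(String[] args) {
    System.out.println(resolve(7L, false, () -> 100L)); // uses the replicated writeId
    System.out.println(resolve(0L, true, () -> 100L));  // uses the allocated writeId
  }
}
```

With this shape the task that persists the stats only consumes the writeId it is given, and a non-positive writeId in the non-migration path is reported as a caller error.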
 



Issue Time Tracking
---

Worklog Id: (was: 220944)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220942&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220942
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 31/Mar/19 07:33
Start Date: 31/Mar/19 07:33
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270653464
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, String lastReplicationId,
 return dumpTuple.lastReplicationId;
   }
 
+  /**
+   * Run a bootstrap that will fail.
+   * @param tuple the location of bootstrap dump
+   */
+  private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int failAfterNumTables) throws Throwable {
+// fail setting ckpt directory property for the second table so that we test the case when
+// bootstrap load fails after some but not all tables are loaded.
+BehaviourInjection<CallerArguments, Boolean> callerVerifier
+= new BehaviourInjection<CallerArguments, Boolean>() {
+  int cntTables = 0;
+  @Nullable
+  @Override
+  public Boolean apply(@Nullable CallerArguments args) {
+cntTables++;
+if (args.dbName.equalsIgnoreCase(replicatedDbName) && cntTables > failAfterNumTables) {
+  injectionPathCalled = true;
+  LOG.warn("Verifier - DB : " + args.dbName + " TABLE : " + args.tblName);
+  return false;
+}
+return true;
+  }
+};
+
+InjectableBehaviourObjectStore.setAlterTableModifier(callerVerifier);
+try {
+  replica.loadFailure(replicatedDbName, tuple.dumpLocation);
+  callerVerifier.assertInjectionsPerformed(true, false);
+} finally {
+  InjectableBehaviourObjectStore.resetAlterTableModifier();
+}
+  }
+
+  private void failIncrementalLoad(WarehouseInstance.Tuple dumpTuple, int failAfterNumEvents) throws Throwable {
+// fail add notification when updating table stats after given number of such events. Thus we
+// test successful application as well as failed application of this event.
+BehaviourInjection<NotificationEvent, Boolean> callerVerifier
+= new BehaviourInjection<NotificationEvent, Boolean>() {
+  int cntEvents = 0;
+  @Override
+  public Boolean apply(NotificationEvent entry) {
+cntEvents++;
 
 Review comment:
   The add-notification count on the target may not match the number of events from the source. So it is better to count the number of AlterTable calls that change the last_repl_id parameter, which is set once per event.
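
The counting idea above could be sketched as follows. This is an illustration only: `ReplEventCounter` is a hypothetical class, and the parameter key `repl.last.id` is an assumption about what the comment calls last_repl_id.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Objects;

/** Hypothetical helper, not part of Hive: counts applied events via the last repl id. */
final class ReplEventCounter {
  // Assumed parameter key; the review comment refers to it as "last_repl_id".
  static final String LAST_REPL_ID_KEY = "repl.last.id";

  private String lastSeenReplId;
  private int eventCount;

  /** Call once per alter-table; returns the number of distinct events seen so far. */
  int onAlterTable(Map<String, String> tableParams) {
    String replId = tableParams.get(LAST_REPL_ID_KEY);
    // Only a change of the last repl id marks a new event; other alters are ignored.
    if (replId != null && !Objects.equals(replId, lastSeenReplId)) {
      lastSeenReplId = replId;
      eventCount++;
    }
    return eventCount;
  }

  public static void main(String[] args) {
    ReplEventCounter counter = new ReplEventCounter();
    System.out.println(counter.onAlterTable(Collections.singletonMap(LAST_REPL_ID_KEY, "10")));
    System.out.println(counter.onAlterTable(Collections.singletonMap(LAST_REPL_ID_KEY, "10")));
    System.out.println(counter.onAlterTable(Collections.singletonMap(LAST_REPL_ID_KEY, "11")));
  }
}
```

An injected verifier could then compare the returned count with failAfterNumEvents instead of incrementing on every add-notification call.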
 



Issue Time Tracking
---

Worklog Id: (was: 220942)
Time Spent: 7h 50m  (was: 7h 40m)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220949&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220949
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 31/Mar/19 07:33
Start Date: 31/Mar/19 07:33
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270652817
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -694,7 +695,9 @@ public void alterTable(String catName, String dbName, String tblName, Table newT
   AcidUtils.TableSnapshot tableSnapshot = null;
   if (transactional) {
 if (replWriteId > 0) {
-  ValidWriteIdList writeIds = AcidUtils.getTableValidWriteIdListWithTxnList(conf, dbName, tblName);
+  ValidWriteIdList writeIds = new ValidReaderWriteIdList(TableName.getDbTable(dbName, tblName),
 
 Review comment:
   Please add a comment explaining why this hardcoded logic works for the repl flow.
 



Issue Time Tracking
---

Worklog Id: (was: 220949)
Time Spent: 8h 10m  (was: 8h)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220943&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220943
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 31/Mar/19 07:33
Start Date: 31/Mar/19 07:33
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270653379
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, String lastReplicationId,
 return dumpTuple.lastReplicationId;
   }
 
+  /**
+   * Run a bootstrap that will fail.
+   * @param tuple the location of bootstrap dump
+   */
+  private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int failAfterNumTables) throws Throwable {
+// fail setting ckpt directory property for the second table so that we test the case when
+// bootstrap load fails after some but not all tables are loaded.
+BehaviourInjection<CallerArguments, Boolean> callerVerifier
+= new BehaviourInjection<CallerArguments, Boolean>() {
+  int cntTables = 0;
+  @Nullable
+  @Override
+  public Boolean apply(@Nullable CallerArguments args) {
+cntTables++;
 
 Review comment:
   This stub will be called multiple times per table, as it is invoked by several methods in InjectableBehaviourObjectStore. The count needs to be incremented only when a new table is encountered.
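
One way to count each table only once is to track the table names already seen. The sketch below assumes that approach; `DistinctTableCounter` and `shouldFail` are hypothetical names, not part of Hive or of this patch.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper, not part of Hive: counts distinct tables seen by an injected stub. */
final class DistinctTableCounter {
  private final Set<String> seenTables = new HashSet<>();
  private final String targetDb;
  private final int failAfterNumTables;

  DistinctTableCounter(String targetDb, int failAfterNumTables) {
    this.targetDb = targetDb;
    this.failAfterNumTables = failAfterNumTables;
  }

  /** Returns true when the injected failure should fire for this call. */
  boolean shouldFail(String dbName, String tblName) {
    if (!targetDb.equalsIgnoreCase(dbName)) {
      return false;
    }
    // Repeated calls for the same table do not grow the set, so the count is per table.
    seenTables.add(tblName.toLowerCase(Locale.ROOT));
    return seenTables.size() > failAfterNumTables;
  }

  public static void main(String[] args) {
    DistinctTableCounter counter = new DistinctTableCounter("replDb", 1);
    System.out.println(counter.shouldFail("replDb", "t1")); // first table
    System.out.println(counter.shouldFail("replDb", "t1")); // same table again
    System.out.println(counter.shouldFail("replDb", "t2")); // second distinct table
  }
}
```

In the test stub, `apply` would return the negation of `shouldFail(args.dbName, args.tblName)` rather than incrementing a plain counter on every invocation.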
 



Issue Time Tracking
---

Worklog Id: (was: 220943)
Time Spent: 7h 50m  (was: 7h 40m)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220940&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220940
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 31/Mar/19 07:33
Start Date: 31/Mar/19 07:33
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270652339
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -216,16 +233,23 @@ private void verifyNoPartitionStatsReplicationForMetadataOnly(String tableName)
 String ndTableName = "ndTable";
 // Partitioned table without data during bootstrap and hence no stats.
 String ndPartTableName = "ndPTable";
+String tblCreateExtra = "";
+
+if (useAcidTables) {
 
 Review comment:
   We can add one more test set for MM (insert-only ACID) tables.
   Also, the ACID table stats tests should cover a few more operations:
   1. Delete and update on full ACID tables.
   2. Insert overwrite, truncate.
   3. LOAD DATA, import.
   4. CTAS.
   5. MERGE.
   6. ADD/REMOVE columns.
   7. Table/partition renames (need to check whether REPL LOAD of a rename event takes care of stats too).
   Note that we need to run dump and load after each operation to check that the stats are consistent.
 



Issue Time Tracking
---

Worklog Id: (was: 220940)
Time Spent: 7h 40m  (was: 7.5h)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220593&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220593
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 29/Mar/19 16:12
Start Date: 29/Mar/19 16:12
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270479016
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -987,10 +989,14 @@ public void createTable(Table tbl, boolean ifNotExists,
   tTbl.setPrivileges(principalPrivs);
 }
   }
-  // Set table snapshot to api.Table to make it persistent.
-  TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
-  if (tableSnapshot != null) {
-tbl.getTTable().setWriteId(tableSnapshot.getWriteId());
+  // Set table snapshot to api.Table to make it persistent. A transactional table being
+  // replicated may have a valid write Id copied from the source. Use that instead of
+  // crafting one on the replica.
+  if (tTbl.getWriteId() <= 0) {
 
 Review comment:
   You are right. We do not need it at creation time. We already have tests for that and they are working fine, i.e. the expected stats at both the table level and the column level are getting replicated.
 



Issue Time Tracking
---

Worklog Id: (was: 220593)
Time Spent: 7.5h  (was: 7h 20m)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220592&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220592
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 29/Mar/19 16:10
Start Date: 29/Mar/19 16:10
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270478318
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -359,17 +383,20 @@ private void testStatsReplicationCommon(boolean parallelBootstrap, boolean metad
   }
 
   @Test
-  public void testForNonAcidTables() throws Throwable {
+  public void testNonParallelBootstrapLoad() throws Throwable {
+LOG.info("Testing " + testName.getClass().getName() + "." + testName.getMethodName());
 testStatsReplicationCommon(false, false);
   }
 
   @Test
-  public void testForNonAcidTablesParallelBootstrapLoad() throws Throwable {
-testStatsReplicationCommon(true, false);
+  public void testForParallelBootstrapLoad() throws Throwable {
+LOG.info("Testing " + testName.getClass().getName() + "." + testName.getMethodName());
+testStatsReplicationCommon(true, false );
   }
 
   @Test
-  public void testNonAcidMetadataOnlyDump() throws Throwable {
+  public void testMetadataOnlyDump() throws Throwable {
 
 Review comment:
   Added a test for the first case. For the second case, the events for parallel inserts will be serialized and applied serially on the repl side, so this should not be a problem on repl. We may test whether the events are generated in a serialized fashion and have the same expected contents. But that should be done in a test which tests concurrent inserts (maybe we already have one somewhere) and not in a replication test.
 



Issue Time Tracking
---

Worklog Id: (was: 220592)
Time Spent: 7h 20m  (was: 7h 10m)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220591&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220591
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 29/Mar/19 16:08
Start Date: 29/Mar/19 16:08
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270477619
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws
 int size = addPartitionDesc.getPartitionCount();
 List in =
 new ArrayList(size);
-AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
 long writeId;
 String validWriteIdList;
-if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-  writeId = tableSnapshot.getWriteId();
-  validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+// In case of replication, get the writeId from the source and use valid write Id list
+// for replication.
+if (addPartitionDesc.getReplicationSpec() != null &&
+addPartitionDesc.getReplicationSpec().isInReplicationScope() &&
+addPartitionDesc.getPartition(0).getWriteId() > 0) {
+  writeId = addPartitionDesc.getPartition(0).getWriteId();
+  validWriteIdList =
 
 Review comment:
   Done.
 



Issue Time Tracking
---

Worklog Id: (was: 220591)
Time Spent: 7h 10m  (was: 7h)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220590&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220590
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 29/Mar/19 16:08
Start Date: 29/Mar/19 16:08
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270477567
 
 

 ##
 File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnCommonUtils.java
 ##
 @@ -84,6 +86,73 @@ public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, long
 return new ValidReadTxnList(exceptions, outAbortedBits, highWaterMark, minOpenTxnId);
   }
 
+  /**
+   * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse} to a
+   * {@link org.apache.hadoop.hive.common.ValidTxnList}.  This assumes that the caller intends to
+   * read the files, and thus treats both open and aborted transactions as invalid.
+   *
+   * This API is used by Hive replication which may have multiple transactions open at a time.
+   *
+   * @param txns open txn list from the metastore
+   * @param currentTxns Current transactions that the replication has opened.  If any of the
+   *transactions is greater than 0 it will be removed from the exceptions
+   *list so that the replication sees its own transaction as valid.
+   * @return a valid txn list.
+   */
+  public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns,
 
 Review comment:
   Done.
 



Issue Time Tracking
---

Worklog Id: (was: 220590)
Time Spent: 7h  (was: 6h 50m)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219857&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219857
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 28/Mar/19 07:24
Start Date: 28/Mar/19 07:24
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269880708
 
 

 ##
 File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnCommonUtils.java
 ##
 @@ -84,6 +86,73 @@ public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, long
 return new ValidReadTxnList(exceptions, outAbortedBits, highWaterMark, minOpenTxnId);
   }
 
+  /**
+   * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse} to a
+   * {@link org.apache.hadoop.hive.common.ValidTxnList}.  This assumes that the caller intends to
+   * read the files, and thus treats both open and aborted transactions as invalid.
+   *
+   * This API is used by Hive replication which may have multiple transactions open at a time.
+   *
+   * @param txns open txn list from the metastore
+   * @param currentTxns Current transactions that the replication has opened.  If any of the
+   *transactions is greater than 0 it will be removed from the exceptions
+   *list so that the replication sees its own transaction as valid.
+   * @return a valid txn list.
+   */
+  public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns,
 
 Review comment:
   Yes, I also think that, for REPL LOAD, we should always hardcode the ValidWriteIdList using the current writeId so that stats are always valid while applying the current event. Even if they are invalid, the subsequent alterTable/partition event would mark them so in the table/partition parameters.
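
To illustrate why a list built from the current writeId treats the replicated stats as valid, here is a simplified model. `SimpleValidWriteIdList` is a hypothetical stand-in written for this note; it is not Hive's ValidReaderWriteIdList and only mirrors the high-watermark and exception-list idea.

```java
import java.util.Arrays;

/** Simplified stand-in for a valid write-ID list; not Hive's ValidReaderWriteIdList. */
final class SimpleValidWriteIdList {
  private final long[] exceptions; // open or aborted write IDs, kept sorted
  private final long highWatermark;

  SimpleValidWriteIdList(long[] exceptions, long highWatermark) {
    this.exceptions = exceptions.clone();
    Arrays.sort(this.exceptions);
    this.highWatermark = highWatermark;
  }

  /** List for replaying one event: no exceptions, watermark set to the event's writeId. */
  static SimpleValidWriteIdList forReplicatedEvent(long currentWriteId) {
    return new SimpleValidWriteIdList(new long[0], currentWriteId);
  }

  /** A write ID is valid if it is at or below the watermark and not an exception. */
  boolean isWriteIdValid(long writeId) {
    return writeId <= highWatermark && Arrays.binarySearch(exceptions, writeId) < 0;
  }

  public static void main(String[] args) {
    SimpleValidWriteIdList list = forReplicatedEvent(5L);
    System.out.println(list.isWriteIdValid(5L)); // the event's own writeId
    System.out.println(list.isWriteIdValid(6L)); // above the watermark
  }
}
```

Because the exception list is empty and the watermark equals the event's writeId, stats written with that writeId always pass the validity check during the replay of that event.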
 



Issue Time Tracking
---

Worklog Id: (was: 219857)
Time Spent: 6h 50m  (was: 6h 40m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219855&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219855
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 28/Mar/19 07:21
Start Date: 28/Mar/19 07:21
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269880083
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws
 int size = addPartitionDesc.getPartitionCount();
 List in =
 new ArrayList(size);
-AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
 long writeId;
 String validWriteIdList;
-if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-  writeId = tableSnapshot.getWriteId();
-  validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+// In case of replication, get the writeId from the source and use valid write Id list
+// for replication.
+if (addPartitionDesc.getReplicationSpec() != null &&
+addPartitionDesc.getReplicationSpec().isInReplicationScope() &&
+addPartitionDesc.getPartition(0).getWriteId() > 0) {
+  writeId = addPartitionDesc.getPartition(0).getWriteId();
+  validWriteIdList =
 
 Review comment:
   Even the logic that creates the ValidWriteIdList from all repl-opened txns 
isn't right: it says the stats are valid for all of these open txns, but they 
aren't. It also sets the high water mark from the 0th index of the replTxnsList 
map, which might point to the wrong writeId compared to the current txn's 
writeId. So I think this logic should be removed anyway.
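
   The high-water-mark concern can be illustrated with a minimal, self-contained 
sketch. It does not use Hive's ValidWriteIdList classes; the class name, the 
method names, and the txn-to-writeId map below are hypothetical stand-ins for 
the replTxnsList bookkeeping mentioned in the comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the repl txn -> writeId bookkeeping; not Hive code.
final class HighWaterMarkSketch {

  // Picks the writeId of whichever entry happens to come first in the map.
  // This is the pattern the review comment objects to.
  static long hwmFromFirstEntry(Map<Long, Long> replTxnToWriteId) {
    return replTxnToWriteId.values().iterator().next();
  }

  // Picks the writeId that belongs to the transaction doing the stats update.
  static long hwmFromCurrentTxn(Map<Long, Long> replTxnToWriteId, long currentTxnId) {
    Long writeId = replTxnToWriteId.get(currentTxnId);
    if (writeId == null) {
      throw new IllegalArgumentException("No writeId for txn " + currentTxnId);
    }
    return writeId;
  }

  // Two open repl transactions, each with its own writeId (made-up values).
  static Map<Long, Long> sample() {
    Map<Long, Long> m = new LinkedHashMap<>();
    m.put(101L, 7L);
    m.put(102L, 9L);
    return m;
  }
}
```

   With the sample map, a stats update running in txn 102 would get writeId 7 
from the first-entry approach, although its own writeId is 9.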
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 219855)
Time Spent: 6h 40m  (was: 6.5h)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219854=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219854
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 28/Mar/19 07:20
Start Date: 28/Mar/19 07:20
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269879748
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException {
   } else {
 // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?).
 //   For any other updates, we don't want to do txn check on partitions when altering table.
-boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS;
+boolean isTxn = false;
+if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) {
+  // ADDPROPS is used to add repl.last.id during replication. That's not a transactional
+  // change.
+  Map props = alterTbl.getProps();
+  if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) {
 
 Review comment:
   Done. Instead of checking for repl.last.id, I now explicitly check whether 
the property is related to stats, and set isTxn only in the replication case.
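
   One possible reading of this reply, sketched without any Hive classes. The 
set of stats-related keys, the class name, and the boolean parameters are 
assumptions made for illustration; the actual check in the patch may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: decides isTxn from the properties being added.
final class StatsPropsSketch {

  // Assumed stats-related property keys; not taken from the patch.
  private static final Set<String> ASSUMED_STATS_KEYS = new HashSet<>(
      Arrays.asList("numRows", "totalSize", "rawDataSize", "numFiles"));

  // True if at least one of the properties is a stats property.
  static boolean touchesStats(Map<String, String> props) {
    if (props == null) {
      return false;
    }
    for (String key : props.keySet()) {
      if (ASSUMED_STATS_KEYS.contains(key)) {
        return true;
      }
    }
    return false;
  }

  // isTxn is set only for a partition-level ADDPROPS that updates stats
  // while running inside replication scope.
  static boolean isTxn(boolean hasPartSpec, boolean isAddProps,
                       boolean inReplScope, Map<String, String> props) {
    return hasPartSpec && isAddProps && inReplScope && touchesStats(props);
  }
}
```

   Under these assumptions, an ADDPROPS that only carries repl.last.id stays 
non-transactional, because none of its keys is a stats key.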
 



Issue Time Tracking
---

Worklog Id: (was: 219854)
Time Spent: 6.5h  (was: 6h 20m)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219844=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219844
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 28/Mar/19 06:51
Start Date: 28/Mar/19 06:51
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269874353
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws
 int size = addPartitionDesc.getPartitionCount();
 List in =
 new ArrayList(size);
-AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
 long writeId;
 String validWriteIdList;
-if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-  writeId = tableSnapshot.getWriteId();
-  validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+// In case of replication, get the writeId from the source and use valid write Id list
+// for replication.
+if (addPartitionDesc.getReplicationSpec() != null &&
+addPartitionDesc.getReplicationSpec().isInReplicationScope() &&
+addPartitionDesc.getPartition(0).getWriteId() > 0) {
+  writeId = addPartitionDesc.getPartition(0).getWriteId();
+  validWriteIdList =
 
 Review comment:
   Ok. Underneath, this code uses the valid write ID list created from the 
repl open transaction list, so this isn't wrong. But it may change as a result 
of your other comment.
 



Issue Time Tracking
---

Worklog Id: (was: 219844)
Time Spent: 6h 20m  (was: 6h 10m)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219842=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219842
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 28/Mar/19 06:42
Start Date: 28/Mar/19 06:42
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269872622
 
 

 ##
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnCommonUtils.java
 ##
 @@ -84,6 +86,73 @@ public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, long
 return new ValidReadTxnList(exceptions, outAbortedBits, highWaterMark, minOpenTxnId);
   }
 
+  /**
+   * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse} to a
+   * {@link org.apache.hadoop.hive.common.ValidTxnList}.  This assumes that the caller intends to
+   * read the files, and thus treats both open and aborted transactions as invalid.
+   *
+   * This API is used by Hive replication which may have multiple transactions open at a time.
+   *
+   * @param txns open txn list from the metastore
+   * @param currentTxns Current transactions that the replication has opened.  If any of the
+   *transactions is greater than 0 it will be removed from the exceptions
+   *list so that the replication sees its own transaction as valid.
+   * @return a valid txn list.
+   */
+  public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns,
 
 Review comment:
   If multiple transactions were running concurrently on the source, the dump 
will contain that many open-transaction events. Replaying those events opens 
the same number of transactions at a time on the target. So there can be 
multiple open transactions on the target even during repl load.
   
   The only link between CreateTableOperation#createTableReplaceMode and 
Hive#alterTable is the EnvironmentContext, so we could use it to pass a flag 
indicating that the valid writeId list should be created from the given 
writeId. But we use the EnvironmentContext to pass information only to the 
metastore, not to consume it in between. Using that kind of flag, we could 
construct the valid writeId list directly in the metastore, as we do for 
create table and partition. Does that look good?
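
   The flag-based proposal can be sketched roughly as follows. This is a 
sketch under stated assumptions, not the metastore code: a plain Map stands in 
for the EnvironmentContext properties, the flag key is hypothetical, and 
SimpleWriteIdList is a made-up simplification of a valid writeId list.

```java
import java.util.Map;

// Illustrative only: choose how the valid writeId list is built on the metastore side.
final class ReplWriteIdListSketch {

  // Hypothetical flag key; the real patch may use a different name or mechanism.
  static final String USE_GIVEN_WRITE_ID = "example.repl.useGivenWriteId";

  // Made-up simplification: a high water mark plus writeIds to be ignored.
  static final class SimpleWriteIdList {
    final long highWaterMark;
    final long[] exceptions;

    SimpleWriteIdList(long highWaterMark, long[] exceptions) {
      this.highWaterMark = highWaterMark;
      this.exceptions = exceptions;
    }
  }

  // If the flag is set and a positive writeId was replicated from the source,
  // build the list from that writeId alone; otherwise keep the local snapshot.
  static SimpleWriteIdList choose(Map<String, String> envProps, long givenWriteId,
                                  SimpleWriteIdList localSnapshot) {
    boolean flagSet = envProps != null
        && Boolean.parseBoolean(envProps.get(USE_GIVEN_WRITE_ID));
    if (flagSet && givenWriteId > 0) {
      return new SimpleWriteIdList(givenWriteId, new long[0]);
    }
    return localSnapshot;
  }
}
```

   The point of the design is that the intermediate layers only forward the 
flag; the decision is made in one place, where the list is built.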
 



Issue Time Tracking
---

Worklog Id: (was: 219842)
Time Spent: 6h 10m  (was: 6h)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219813=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219813
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 28/Mar/19 05:26
Start Date: 28/Mar/19 05:26
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269861070
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java
 ##
 @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) {
 throw new RuntimeException("Invalid table type : " + getDescType());
 }
   }
+
+  public Long getReplWriteId() {
+if (this.createTblDesc != null) {
+  return this.createTblDesc.getReplWriteId();
 
 Review comment:
   If we unify writeId and replWriteId in CreateTableDesc into one, then it's 
fine. In fact, they are one and the same, so there is no point in having two 
members for the same value.
 



Issue Time Tracking
---

Worklog Id: (was: 219813)
Time Spent: 6h  (was: 5h 50m)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219812=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219812
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 28/Mar/19 05:25
Start Date: 28/Mar/19 05:25
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269861070
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java
 ##
 @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) {
 throw new RuntimeException("Invalid table type : " + getDescType());
 }
   }
+
+  public Long getReplWriteId() {
+if (this.createTblDesc != null) {
+  return this.createTblDesc.getReplWriteId();
 
 Review comment:
   If we unify writeId and replWriteId in CreateTableDesc, then it's fine.
 



Issue Time Tracking
---

Worklog Id: (was: 219812)
Time Spent: 5h 50m  (was: 5h 40m)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219808=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219808
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 28/Mar/19 04:54
Start Date: 28/Mar/19 04:54
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269856891
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java
 ##
 @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) {
 throw new RuntimeException("Invalid table type : " + getDescType());
 }
   }
+
+  public Long getReplWriteId() {
+if (this.createTblDesc != null) {
+  return this.createTblDesc.getReplWriteId();
 
 Review comment:
   AFAIU, the reason we set replWriteId in CreateTableDesc is so that it can 
be passed everywhere CreateTableDesc is used. It's better not to create two 
paths for passing the same writeId, with the risk of them going out of sync 
with each other.
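
   A minimal sketch of the "single path" idea, assuming a descriptor that 
stores the writeId once and exposes it under both names. The class below is 
hypothetical and only mirrors the accessor names used in this thread; it is 
not the real CreateTableDesc.

```java
// Illustrative only: one stored value, so the two accessors cannot diverge.
final class CreateTableDescSketch {

  // Single member; 0 means "no writeId replicated from the source".
  private long writeId;

  void setReplWriteId(long writeId) {
    this.writeId = writeId;
  }

  // Both getters read the same field.
  long getReplWriteId() {
    return writeId;
  }

  long getWriteId() {
    return writeId;
  }
}
```

   Any caller that already holds the descriptor reads the same value, whichever 
accessor it uses.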
 



Issue Time Tracking
---

Worklog Id: (was: 219808)
Time Spent: 5h 40m  (was: 5.5h)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219788=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219788
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 28/Mar/19 03:16
Start Date: 28/Mar/19 03:16
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269844369
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException {
   } else {
 // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?).
 //   For any other updates, we don't want to do txn check on partitions when altering table.
-boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS;
+boolean isTxn = false;
+if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) {
+  // ADDPROPS is used to add repl.last.id during replication. That's not a transactional
+  // change.
+  Map props = alterTbl.getProps();
+  if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) {
 
 Review comment:
   I don't know. I would suggest keeping it non-transactional only in the repl 
flow, to avoid any impact elsewhere.
 



Issue Time Tracking
---

Worklog Id: (was: 219788)
Time Spent: 5.5h  (was: 5h 20m)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219787=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219787
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 28/Mar/19 03:12
Start Date: 28/Mar/19 03:12
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269843944
 
 

 ##
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -1894,6 +1898,16 @@ private void create_table_core(final RawStore ms, final Table tbl,
List checkConstraints)
 throws AlreadyExistsException, MetaException,
 InvalidObjectException, NoSuchObjectException, InvalidInputException {
+
+  ColumnStatistics colStats = null;
+  // If the given table has column statistics, save it here. We will update it later.
+  // We don't want it to be part of the Table object being created, lest the create table
 
 Review comment:
   I think it's fine. Ignore this comment.
 



Issue Time Tracking
---

Worklog Id: (was: 219787)
Time Spent: 5h 20m  (was: 5h 10m)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219786=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219786
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 28/Mar/19 03:11
Start Date: 28/Mar/19 03:11
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269843519
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java
 ##
 @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) {
 throw new RuntimeException("Invalid table type : " + getDescType());
 }
   }
+
+  public Long getReplWriteId() {
+if (this.createTblDesc != null) {
+  return this.createTblDesc.getReplWriteId();
 
 Review comment:
   I meant that prepareImport already takes writeId (which comes from the 
event message) as an input parameter; it is set in CreateTableDesc and later 
read back by getBaseAddPartitionDescFromPartition. Instead, writeId itself can 
be passed to getBaseAddPartitionDescFromPartition.
 



Issue Time Tracking
---

Worklog Id: (was: 219786)
Time Spent: 5h 10m  (was: 5h)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219785=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219785
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 28/Mar/19 03:10
Start Date: 28/Mar/19 03:10
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269257547
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -987,10 +989,14 @@ public void createTable(Table tbl, boolean ifNotExists,
   tTbl.setPrivileges(principalPrivs);
 }
   }
-  // Set table snapshot to api.Table to make it persistent.
-  TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
-  if (tableSnapshot != null) {
-tbl.getTTable().setWriteId(tableSnapshot.getWriteId());
+  // Set table snapshot to api.Table to make it persistent. A transactional table being
+  // replicated may have a valid write Id copied from the source. Use that instead of
+  // crafting one on the replica.
+  if (tTbl.getWriteId() <= 0) {
 
 Review comment:
   The DO_NOT_UPDATE_STATS flag should be set in createTableFlow as well. 
Otherwise, in autogather mode at the target, the stats will be updated 
automatically. I'm not sure it is needed, as the table itself is not yet in 
the metastore. Anyway, please check whether it is needed.
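
   The effect of such a flag can be sketched as below. A plain Map stands in 
for the context properties, and the key string is assumed to mirror the flag 
name used in the comment; how Hive actually evaluates the flag is not shown 
in this thread.

```java
import java.util.Map;

// Illustrative only: suppress automatic stats gathering when the flag is set.
final class DoNotUpdateStatsSketch {

  // Assumed key, mirroring the flag name mentioned in the review comment.
  static final String DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS";

  // Stats are auto-gathered only if autogather is on and the flag is absent or false.
  static boolean shouldAutoGather(boolean autoGatherEnabled, Map<String, String> envProps) {
    boolean suppressed = envProps != null
        && "true".equalsIgnoreCase(envProps.get(DO_NOT_UPDATE_STATS));
    return autoGatherEnabled && !suppressed;
  }
}
```

   With the flag set to "true", replicated stats would be left untouched on 
the target even when autogather is enabled.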
 



Issue Time Tracking
---

Worklog Id: (was: 219785)
Time Spent: 5h  (was: 4h 50m)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219784=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219784
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 28/Mar/19 03:09
Start Date: 28/Mar/19 03:09
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269843519
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java
 ##
 @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) {
 throw new RuntimeException("Invalid table type : " + getDescType());
 }
   }
+
+  public Long getReplWriteId() {
+if (this.createTblDesc != null) {
+  return this.createTblDesc.getReplWriteId();
 
 Review comment:
   I meant that prepareImport already takes writeId as an input parameter; it 
is set in CreateTableDesc and later read back by 
getBaseAddPartitionDescFromPartition. Instead, writeId itself can be passed to 
getBaseAddPartitionDescFromPartition.
 



Issue Time Tracking
---

Worklog Id: (was: 219784)
Time Spent: 4h 50m  (was: 4h 40m)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219266=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219266
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 27/Mar/19 12:03
Start Date: 27/Mar/19 12:03
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269523732
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException {
   } else {
 // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?).
 //   For any other updates, we don't want to do txn check on partitions when altering table.
-boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS;
+boolean isTxn = false;
+if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) {
+  // ADDPROPS is used to add repl.last.id during replication. That's not a transactional
+  // change.
+  Map props = alterTbl.getProps();
+  if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) {
 
 Review comment:
   The comment 
   
   // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?).
   //   For any other updates, we don't want to do txn check on partitions when altering table.
   
   itself looks wrong. I do not see any ADDPROPS usage that updates statistics 
properties. All of those seem to come through AddPartition, not alterTable, 
for partitioned tables. So maybe we can safely mark this as non-transactional 
always. Does that look right?
 



Issue Time Tracking
---

Worklog Id: (was: 219266)
Time Spent: 4h 40m  (was: 4.5h)



[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219260&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219260
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 27/Mar/19 11:56
Start Date: 27/Mar/19 11:56
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269523732
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc 
alterTbl) throws HiveException {
   } else {
 // Note: this is necessary for UPDATE_STATISTICS command, that 
operates via ADDPROPS (why?).
 //   For any other updates, we don't want to do txn check on 
partitions when altering table.
-boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == 
AlterTableTypes.ADDPROPS;
+boolean isTxn = false;
+if (alterTbl.getPartSpec() != null && alterTbl.getOp() == 
AlterTableTypes.ADDPROPS) {
+  // ADDPROPS is used to add repl.last.id during replication. That's 
not a transactional
+  // change.
+  Map props = alterTbl.getProps();
+  if (props.size() <= 1 && 
props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) {
 
 Review comment:
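   The comment 
   
   // Note: this is necessary for UPDATE_STATISTICS command, that 
operates via ADDPROPS (why?).
   //   For any other updates, we don't want to do txn check on 
partitions when altering table.
   
   itself is wrong. I do not see any ADDPROPS usage that updates 
statistics properties. All of those seem to come through AddPartition and not 
alterTable for a partitioned table. So maybe we can safely mark this as 
non-transactional always. Does that look right?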
 



Issue Time Tracking
---

Worklog Id: (was: 219260)
Time Spent: 4.5h  (was: 4h 20m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219250&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219250
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 27/Mar/19 11:11
Start Date: 27/Mar/19 11:11
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269508934
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/CreateTableDesc.java
 ##
 @@ -118,7 +118,8 @@
   List notNullConstraints;
   List defaultConstraints;
   List checkConstraints;
-  private ColumnStatistics colStats;
+  private ColumnStatistics colStats;  // For the sake of replication
+  private long writeId = -1; // For the sake of replication
 
 Review comment:
   I was initially afraid that there could be other side-effects of this 
change. Your suggestion will bring all writeId replication through replWriteId, 
which is good. Done.
 



Issue Time Tracking
---

Worklog Id: (was: 219250)
Time Spent: 4h 20m  (was: 4h 10m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219226&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219226
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 27/Mar/19 09:46
Start Date: 27/Mar/19 09:46
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269476213
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java
 ##
 @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) {
 throw new RuntimeException("Invalid table type : " + getDescType());
 }
   }
+
+  public Long getReplWriteId() {
+if (this.createTblDesc != null) {
+  return this.createTblDesc.getReplWriteId();
 
 Review comment:
   In getBaseAddPartitionDescFromPartition(), where we use this function, we 
don't have access to the event message. Instead, we pass the writeId 
through ImportTableDesc by calling setReplWriteId(). This function just 
introduces the missing getReplWriteId() method, symmetric to setReplWriteId(). 
If we used a local variable and passed it around, the local writeId 
could go out of sync with the one in ImportTableDesc.
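
   The point about keeping a single copy of the writeId can be illustrated 
with a minimal sketch. This is not the Hive code: CreateTableDescSketch and 
ImportTableDescSketch are hypothetical stand-ins for CreateTableDesc and 
ImportTableDesc, and the -1 fallback is an assumption, since the hunk above is 
cut off before the method's remaining return.

```java
// Hypothetical stand-in for CreateTableDesc; only the replication writeId is modelled.
class CreateTableDescSketch {
  private long replWriteId = -1; // assumption: -1 means "not set"

  void setReplWriteId(long replWriteId) {
    this.replWriteId = replWriteId;
  }

  Long getReplWriteId() {
    return replWriteId;
  }
}

// Hypothetical stand-in for ImportTableDesc: it stores no writeId of its own and
// always delegates, so there is exactly one copy that can be read or written.
class ImportTableDescSketch {
  private final CreateTableDescSketch createTblDesc;

  ImportTableDescSketch(CreateTableDescSketch createTblDesc) {
    this.createTblDesc = createTblDesc;
  }

  void setReplWriteId(long replWriteId) {
    if (createTblDesc != null) {
      createTblDesc.setReplWriteId(replWriteId);
    }
  }

  Long getReplWriteId() {
    if (createTblDesc != null) {
      return createTblDesc.getReplWriteId();
    }
    return -1L; // assumption: the real fallback is not visible in the hunk
  }

  public static void main(String[] args) {
    ImportTableDescSketch desc = new ImportTableDescSketch(new CreateTableDescSketch());
    desc.setReplWriteId(17L);
    // Any later reader sees the value that was set, with no local copy to drift.
    System.out.println(desc.getReplWriteId());
  }
}
```

   A caller that cached the value in a local variable would have to remember to 
refresh it after every setReplWriteId() call; reading through the getter avoids 
that.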
 



Issue Time Tracking
---

Worklog Id: (was: 219226)
Time Spent: 4h 10m  (was: 4h)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219220&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219220
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 27/Mar/19 09:27
Start Date: 27/Mar/19 09:27
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269468871
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, 
Map partSpec) throws
 int size = addPartitionDesc.getPartitionCount();
 List in =
 new ArrayList(size);
-AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, 
tbl, true);
 long writeId;
 String validWriteIdList;
-if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-  writeId = tableSnapshot.getWriteId();
-  validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+// In case of replication, get the writeId from the source and use valid 
write Id list
+// for replication.
+if (addPartitionDesc.getReplicationSpec() != null &&
 
 Review comment:
   Done.
 



Issue Time Tracking
---

Worklog Id: (was: 219220)
Time Spent: 4h  (was: 3h 50m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219218
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 27/Mar/19 09:24
Start Date: 27/Mar/19 09:24
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269467769
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -3539,10 +3573,19 @@ public boolean equals(Object obj) {
 }
 
 // Update partition column statistics if available
-for (Partition newPart : newParts) {
-  if (newPart.isSetColStats()) {
-updatePartitonColStatsInternal(tbl, newPart.getColStats(), null, 
newPart.getWriteId());
+int cnt = 0;
+for (ColumnStatistics partColStats: partsColStats) {
+  long writeId = partsWriteIds.get(cnt++);
+  // On replica craft a valid snapshot out of the writeId in the 
partition
+  String validWriteIds = null;
+  if (writeId > 0) {
+ValidWriteIdList vwil =
 
 Review comment:
   Done. Please check.
 



Issue Time Tracking
---

Worklog Id: (was: 219218)
Time Spent: 3h 50m  (was: 3h 40m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219217&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219217
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 27/Mar/19 09:24
Start Date: 27/Mar/19 09:24
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269467699
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, 
final Table tbl,
 
   // If the table has column statistics, update it into the metastore. 
This feature is used
   // by replication to replicate table level statistics.
-  if (tbl.isSetColStats()) {
-// We do not replicate statistics for a transactional table right now 
and hence we do not
-// expect a transactional table to have column statistics here. So 
passing null
-// validWriteIds is fine for now.
-updateTableColumnStatsInternal(tbl.getColStats(), null, 
tbl.getWriteId());
+  if (colStats != null) {
+// On replica craft a valid snapshot out of the writeId in the table.
+long writeId = tbl.getWriteId();
+String validWriteIds = null;
+if (writeId > 0) {
+  ValidWriteIdList vwil =
+  new 
ValidReaderWriteIdList(TableName.getDbTable(tbl.getDbName(),
 
 Review comment:
   Done. Please check.
 



Issue Time Tracking
---

Worklog Id: (was: 219217)
Time Spent: 3h 40m  (was: 3.5h)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219216&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219216
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 27/Mar/19 09:24
Start Date: 27/Mar/19 09:24
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269467626
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, 
final Table tbl,
 
   // If the table has column statistics, update it into the metastore. 
This feature is used
   // by replication to replicate table level statistics.
-  if (tbl.isSetColStats()) {
-// We do not replicate statistics for a transactional table right now 
and hence we do not
-// expect a transactional table to have column statistics here. So 
passing null
-// validWriteIds is fine for now.
-updateTableColumnStatsInternal(tbl.getColStats(), null, 
tbl.getWriteId());
+  if (colStats != null) {
+// On replica craft a valid snapshot out of the writeId in the table.
+long writeId = tbl.getWriteId();
+String validWriteIds = null;
+if (writeId > 0) {
+  ValidWriteIdList vwil =
 
 Review comment:
   Done.
 



Issue Time Tracking
---

Worklog Id: (was: 219216)
Time Spent: 3.5h  (was: 3h 20m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219208
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 27/Mar/19 09:14
Start Date: 27/Mar/19 09:14
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269463941
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -1894,6 +1898,16 @@ private void create_table_core(final RawStore ms, final 
Table tbl,
List checkConstraints)
 throws AlreadyExistsException, MetaException,
 InvalidObjectException, NoSuchObjectException, InvalidInputException {
+
+  ColumnStatistics colStats = null;
+  // If the given table has column statistics, save it here. We will 
update it later.
+  // We don't want it to be part of the Table object being created, lest 
the create table
 
 Review comment:
   " and also shouldn't be persisted". That's not true. We will persist the 
table stats, but later. If you let me know which part of the comment is complex 
(needs simplification), I will come up with alternate wording reflecting the 
same meaning.
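
   The ordering described here (stats set aside first, table created without 
them, stats persisted afterwards) can be sketched as follows. This is a 
simplified illustration and not the metastore code: TableSketch, 
CreateTableCoreSketch and the String standing in for ColumnStatistics are 
hypothetical, and clearing the stats on the table object is an assumption drawn 
from the code comment in the hunk above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the metastore Table object.
class TableSketch {
  final String name;
  String colStats; // stand-in for ColumnStatistics; null when absent

  TableSketch(String name, String colStats) {
    this.name = name;
    this.colStats = colStats;
  }
}

class CreateTableCoreSketch {
  // Returns the ordered list of steps performed, so the ordering is observable.
  static List<String> createTable(TableSketch tbl) {
    List<String> steps = new ArrayList<>();

    // 1. Save the stats and detach them from the object being created (assumption).
    String colStats = null;
    if (tbl.colStats != null) {
      colStats = tbl.colStats;
      tbl.colStats = null;
    }

    // 2. Create the table; at this point it carries no stats.
    steps.add("create " + tbl.name + " (stats attached: " + (tbl.colStats != null) + ")");

    // 3. Persist the saved stats afterwards, as a separate update.
    if (colStats != null) {
      steps.add("update stats for " + tbl.name + ": " + colStats);
    }
    return steps;
  }

  public static void main(String[] args) {
    for (String step : createTable(new TableSketch("t1", "ndv=3"))) {
      System.out.println(step);
    }
  }
}
```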
 



Issue Time Tracking
---

Worklog Id: (was: 219208)
Time Spent: 3h 20m  (was: 3h 10m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219202&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219202
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 27/Mar/19 08:41
Start Date: 27/Mar/19 08:41
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269452642
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/events/filesystem/FSTableEvent.java
 ##
 @@ -199,12 +199,15 @@ private AddPartitionDesc partitionDesc(Path fromPath,
   // Right now, we do not have a way of associating a writeId with 
statistics for a table
   // converted to a transactional table if it was non-transactional on the 
source. So, do not
 
 Review comment:
   Done. Looks like I missed pushing an entire commit fixing the comments. Done 
now.
 



Issue Time Tracking
---

Worklog Id: (was: 219202)
Time Spent: 3h 10m  (was: 3h)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219178&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219178
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 27/Mar/19 06:58
Start Date: 27/Mar/19 06:58
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269425412
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
 ##
 @@ -1247,17 +1244,37 @@ private static void createReplImportTasks(
   } else if (!replicationSpec.isMetadataOnly()
   && !shouldSkipDataCopyInReplScope(tblDesc, replicationSpec)) {
 x.getLOG().debug("adding dependent CopyWork/MoveWork for table");
-t.addDependentTask(loadTable(fromURI, table, 
replicationSpec.isReplace(),
-new Path(tblDesc.getLocation()), replicationSpec, x, writeId, 
stmtId));
+dependentTasks = new ArrayList<>(1);
+dependentTasks.add(loadTable(fromURI, table, 
replicationSpec.isReplace(),
+  new Path(tblDesc.getLocation()), 
replicationSpec,
+  x, writeId, stmtId));
   }
 
-  if (dropTblTask != null) {
-// Drop first and then create
-dropTblTask.addDependentTask(t);
-x.getTasks().add(dropTblTask);
+  // During replication, by the time we reply a commit transaction event, 
the table should
+  // have been already created when replaying previous events. So no need 
to create table
+  // again. For some reason we need create table task for partitioned 
table though.
 
 Review comment:
   Corrected. The partition case is already fixed, but the comment wasn't 
corrected.
 



Issue Time Tracking
---

Worklog Id: (was: 219178)
Time Spent: 3h  (was: 2h 50m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219177&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219177
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 27/Mar/19 06:52
Start Date: 27/Mar/19 06:52
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269424107
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -828,6 +828,8 @@ public void alterPartitions(String tblName, 
List newParts,
   new ArrayList();
 try {
   AcidUtils.TableSnapshot tableSnapshot = null;
+  // TODO: In case of replication use the writeId and valid write id list 
constructed for
 
 Review comment:
   I have addressed this comment and removed the TODO as well, but I hadn't 
committed the change, so it wasn't part of the PR. I have now updated the PR; 
this TODO is no longer there. Sorry.
 



Issue Time Tracking
---

Worklog Id: (was: 219177)
Time Spent: 2h 50m  (was: 2h 40m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219176&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219176
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 27/Mar/19 06:51
Start Date: 27/Mar/19 06:51
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269423978
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc 
alterTbl) throws HiveException {
   } else {
 // Note: this is necessary for UPDATE_STATISTICS command, that 
operates via ADDPROPS (why?).
 //   For any other updates, we don't want to do txn check on 
partitions when altering table.
-boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == 
AlterTableTypes.ADDPROPS;
+boolean isTxn = false;
+if (alterTbl.getPartSpec() != null && alterTbl.getOp() == 
AlterTableTypes.ADDPROPS) {
+  // ADDPROPS is used to add repl.last.id during replication. That's 
not a transactional
+  // change.
+  Map props = alterTbl.getProps();
+  if (props.size() <= 1 && 
props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) {
+isTxn = false;
+  } else {
+isTxn = true;
+  }
+}
+// TODO: Somehow we have to signal alterPartitions that it's part of 
replication and
+//  should use replication's valid writeid list instead of creating 
one.
 
 Review comment:
   I have addressed this comment and removed the TODO as well, but I hadn't 
committed the change, so it wasn't part of the PR. I have now updated the PR; 
this TODO is no longer there. Sorry.
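
   For readers following the hunk above, the isTxn decision it introduces can 
be sketched in isolation. This is a minimal sketch and not the code in 
DDLTask.java: the class IsTxnSketch, the method isTxn and its boolean 
parameters are hypothetical, and the literal "repl.last.id" is assumed to be 
what ReplicationSpec.KEY.CURR_STATE_ID.toString() returns, going by the code 
comment in the hunk.

```java
import java.util.HashMap;
import java.util.Map;

class IsTxnSketch {
  // Assumption: stands in for ReplicationSpec.KEY.CURR_STATE_ID.toString().
  static final String REPL_LAST_ID = "repl.last.id";

  /**
   * Returns true when the alter should be treated as a transactional change.
   * hasPartSpec models alterTbl.getPartSpec() != null, and isAddProps models
   * alterTbl.getOp() == AlterTableTypes.ADDPROPS.
   */
  static boolean isTxn(boolean hasPartSpec, boolean isAddProps, Map<String, String> props) {
    if (!hasPartSpec || !isAddProps) {
      return false; // the default set before the if in the hunk
    }
    // An ADDPROPS carrying at most one property, and that property being the
    // replication state id, is replication bookkeeping and not transactional.
    boolean onlyReplState = props.size() <= 1 && props.get(REPL_LAST_ID) != null;
    return !onlyReplState;
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put(REPL_LAST_ID, "42");
    System.out.println(isTxn(true, true, props)); // replication state id only

    props.put("someOtherProp", "x"); // hypothetical extra property
    System.out.println(isTxn(true, true, props)); // other properties present
  }
}
```

   Note that an empty property map with a partition spec and ADDPROPS still 
counts as transactional in this sketch, because the null check on the 
replication key fails, which matches the condition as written in the hunk.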
 



Issue Time Tracking
---

Worklog Id: (was: 219176)
Time Spent: 2h 40m  (was: 2.5h)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218858&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218858
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269136269
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
 ##
 @@ -1247,17 +1244,37 @@ private static void createReplImportTasks(
   } else if (!replicationSpec.isMetadataOnly()
   && !shouldSkipDataCopyInReplScope(tblDesc, replicationSpec)) {
 x.getLOG().debug("adding dependent CopyWork/MoveWork for table");
-t.addDependentTask(loadTable(fromURI, table, 
replicationSpec.isReplace(),
-new Path(tblDesc.getLocation()), replicationSpec, x, writeId, 
stmtId));
+dependentTasks = new ArrayList<>(1);
+dependentTasks.add(loadTable(fromURI, table, 
replicationSpec.isReplace(),
+  new Path(tblDesc.getLocation()), 
replicationSpec,
+  x, writeId, stmtId));
   }
 
-  if (dropTblTask != null) {
-// Drop first and then create
-dropTblTask.addDependentTask(t);
-x.getTasks().add(dropTblTask);
+  // During replication, by the time we reply a commit transaction event, 
the table should
+  // have been already created when replaying previous events. So no need 
to create table
+  // again. For some reason we need create table task for partitioned 
table though.
 
 Review comment:
   The comment says a create table task is needed for partitioned tables, but 
the code always skips it for a commit txn event. Which one is correct?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218858)
Time Spent: 1h 10m  (was: 1h)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218855=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218855
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269156935
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -1894,6 +1898,16 @@ private void create_table_core(final RawStore ms, final 
Table tbl,
List checkConstraints)
 throws AlreadyExistsException, MetaException,
 InvalidObjectException, NoSuchObjectException, InvalidInputException {
+
+  ColumnStatistics colStats = null;
+  // If the given table has column statistics, save it here. We will 
update it later.
+  // We don't want it to be part of the Table object being created, lest 
the create table
 
 Review comment:
   Please simplify the comment, e.g. "Column stats are not expected to be part 
of the create table event and shouldn't be persisted with it, so remove them 
from the Table object."
   
 



Issue Time Tracking
---

Worklog Id: (was: 218855)
Time Spent: 50m  (was: 40m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218867=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218867
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269247183
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -359,17 +383,20 @@ private void testStatsReplicationCommon(boolean 
parallelBootstrap, boolean metad
   }
 
   @Test
-  public void testForNonAcidTables() throws Throwable {
+  public void testNonParallelBootstrapLoad() throws Throwable {
+LOG.info("Testing " + testName.getClass().getName() + "." + 
testName.getMethodName());
 testStatsReplicationCommon(false, false);
   }
 
   @Test
-  public void testForNonAcidTablesParallelBootstrapLoad() throws Throwable {
-testStatsReplicationCommon(true, false);
+  public void testForParallelBootstrapLoad() throws Throwable {
+LOG.info("Testing " + testName.getClass().getName() + "." + 
testName.getMethodName());
+testStatsReplicationCommon(true, false );
   }
 
   @Test
-  public void testNonAcidMetadataOnlyDump() throws Throwable {
+  public void testMetadataOnlyDump() throws Throwable {
 
 Review comment:
   Add more tests for the following scenarios.
   1. REPL LOAD fails after replicating table or partition objects with stats 
but before setting the last replId. The retry then takes the alter 
table/partition replace flow, and stats should be valid after successful 
replication. This is needed for the non-transactional, transactional and 
migration cases.
   2. Parallel inserts with autogather enabled. This yields events where 
multiple txns are open at the time of the update stats event. Also try to 
simulate one stats update succeeding and another invalidating it due to 
concurrent writes.
 



Issue Time Tracking
---

Worklog Id: (was: 218867)
Time Spent: 2.5h  (was: 2h 20m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218865=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218865
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269262756
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java
 ##
 @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) {
 throw new RuntimeException("Invalid table type : " + getDescType());
 }
   }
+
+  public Long getReplWriteId() {
+if (this.createTblDesc != null) {
+  return this.createTblDesc.getReplWriteId();
 
 Review comment:
   This replWriteId is just a placeholder for the writeId from the event 
message. It need not be in CreateTableDesc; it can be kept in a local variable 
and passed around.
 



Issue Time Tracking
---

Worklog Id: (was: 218865)
Time Spent: 2h 10m  (was: 2h)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218860=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218860
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269220469
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, 
Map partSpec) throws
 int size = addPartitionDesc.getPartitionCount();
 List in =
 new ArrayList(size);
-AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, 
tbl, true);
 long writeId;
 String validWriteIdList;
-if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-  writeId = tableSnapshot.getWriteId();
-  validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+// In case of replication, get the writeId from the source and use valid 
write Id list
+// for replication.
+if (addPartitionDesc.getReplicationSpec() != null &&
+addPartitionDesc.getReplicationSpec().isInReplicationScope() &&
+addPartitionDesc.getPartition(0).getWriteId() > 0) {
+  writeId = addPartitionDesc.getPartition(0).getWriteId();
+  validWriteIdList =
 
 Review comment:
   In the replication flow, it is fine to use a hardcoded ValidWriteIdList, as 
we want to forcefully set this writeId into the table or partition objects. 
Getting it from the current state might be wrong, as we don't update 
ValidTxnList in conf for repl-created txns. 
   ValidWriteIdList is only used to check whether the writeId in metastore 
objects was updated by any concurrent inserts. In the repl load flow that is 
not possible: we replicate one event at a time, and in bootstrap no two 
threads write into the same table.
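
The selection rule described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code in the pull request: `ReplWriteIdSketch`, `Snapshot`, `chooseSnapshot` and the `name:writeId` string encoding are all hypothetical stand-ins, and the fallback of `-1` with a null list is an assumption about the non-replication branch.

```java
/** Illustrative only: these are stand-in types, not Hive classes. */
final class ReplWriteIdSketch {

  /** A write id together with the valid-write-id string stored alongside it. */
  static final class Snapshot {
    final long writeId;
    final String validWriteIds;

    Snapshot(long writeId, String validWriteIds) {
      this.writeId = writeId;
      this.validWriteIds = validWriteIds;
    }
  }

  /**
   * Picks the snapshot to persist with a table or partition.
   * In replication scope the source writeId wins and the valid write id list
   * is crafted from it, because no concurrent writer can touch the same
   * object while one event is replayed at a time.
   */
  static Snapshot chooseSnapshot(boolean inReplicationScope, long sourceWriteId,
                                 String fullTableName, Snapshot localSnapshot) {
    if (inReplicationScope && sourceWriteId > 0) {
      // Hypothetical encoding of a crafted list; the real format differs.
      return new Snapshot(sourceWriteId, fullTableName + ":" + sourceWriteId);
    }
    if (localSnapshot != null && localSnapshot.writeId > 0) {
      // Outside replication, trust the snapshot derived from current txn state.
      return localSnapshot;
    }
    // Assumed fallback: no usable write id.
    return new Snapshot(-1L, null);
  }
}
```

The point of the sketch is only the precedence: a positive writeId from the source overrides whatever the target's current transaction state would produce.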
 



Issue Time Tracking
---

Worklog Id: (was: 218860)
Time Spent: 1.5h  (was: 1h 20m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218863=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218863
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269169210
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, 
final Table tbl,
 
   // If the table has column statistics, update it into the metastore. 
This feature is used
   // by replication to replicate table level statistics.
-  if (tbl.isSetColStats()) {
-// We do not replicate statistics for a transactional table right now 
and hence we do not
-// expect a transactional table to have column statistics here. So 
passing null
-// validWriteIds is fine for now.
-updateTableColumnStatsInternal(tbl.getColStats(), null, 
tbl.getWriteId());
+  if (colStats != null) {
+// On replica craft a valid snapshot out of the writeId in the table.
+long writeId = tbl.getWriteId();
+String validWriteIds = null;
+if (writeId > 0) {
+  ValidWriteIdList vwil =
+  new 
ValidReaderWriteIdList(TableName.getDbTable(tbl.getDbName(),
 
 Review comment:
   Please add a comment explaining why the hardcoded ValidWriteIdList is used 
in this flow instead of taking the current state of txns.
 



Issue Time Tracking
---

Worklog Id: (was: 218863)
Time Spent: 1h 50m  (was: 1h 40m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218856=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218856
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269110947
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -828,6 +828,8 @@ public void alterPartitions(String tblName, 
List newParts,
   new ArrayList();
 try {
   AcidUtils.TableSnapshot tableSnapshot = null;
+  // TODO: In case of replication use the writeId and valid write id list 
constructed for
 
 Review comment:
   Is it done or still TODO?
 



Issue Time Tracking
---

Worklog Id: (was: 218856)
Time Spent: 1h  (was: 50m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218854=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218854
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269060256
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/CreateTableDesc.java
 ##
 @@ -118,7 +118,8 @@
   List notNullConstraints;
   List defaultConstraints;
   List checkConstraints;
-  private ColumnStatistics colStats;
+  private ColumnStatistics colStats;  // For the sake of replication
+  private long writeId = -1; // For the sake of replication
 
 Review comment:
   Can we re-use the replWriteId variable that we already have?
 



Issue Time Tracking
---

Worklog Id: (was: 218854)
Time Spent: 40m  (was: 0.5h)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218853=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218853
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269098036
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc 
alterTbl) throws HiveException {
   } else {
 // Note: this is necessary for UPDATE_STATISTICS command, that 
operates via ADDPROPS (why?).
 //   For any other updates, we don't want to do txn check on 
partitions when altering table.
-boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == 
AlterTableTypes.ADDPROPS;
+boolean isTxn = false;
+if (alterTbl.getPartSpec() != null && alterTbl.getOp() == 
AlterTableTypes.ADDPROPS) {
+  // ADDPROPS is used to add repl.last.id during replication. That's 
not a transactional
+  // change.
+  Map props = alterTbl.getProps();
+  if (props.size() <= 1 && 
props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) {
+isTxn = false;
+  } else {
+isTxn = true;
+  }
+}
+// TODO: Somehow we have to signal alterPartitions that it's part of 
replication and
+//  should use replication's valid writeid list instead of creating 
one.
 
 Review comment:
   What do you mean by "replication's valid writeid list" in this comment? Even 
in the repl flow, we get the ValidWriteIdList from HMS based on the incoming 
writeId in the event msg. Are you suggesting caching this ValidWriteIdList 
somewhere and using it instead of invoking the HMS API?
 



Issue Time Tracking
---

Worklog Id: (was: 218853)
Time Spent: 0.5h  (was: 20m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218864=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218864
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269223302
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, 
Map partSpec) throws
 int size = addPartitionDesc.getPartitionCount();
 List in =
 new ArrayList(size);
-AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, 
tbl, true);
 long writeId;
 String validWriteIdList;
-if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-  writeId = tableSnapshot.getWriteId();
-  validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+// In case of replication, get the writeId from the source and use valid 
write Id list
+// for replication.
+if (addPartitionDesc.getReplicationSpec() != null &&
 
 Review comment:
   addPartitionDesc.getReplicationSpec() is never null, so this check can be 
removed.
 



Issue Time Tracking
---

Worklog Id: (was: 218864)
Time Spent: 2h  (was: 1h 50m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218852=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218852
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269081532
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc 
alterTbl) throws HiveException {
   } else {
 // Note: this is necessary for UPDATE_STATISTICS command, that 
operates via ADDPROPS (why?).
 //   For any other updates, we don't want to do txn check on 
partitions when altering table.
-boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == 
AlterTableTypes.ADDPROPS;
+boolean isTxn = false;
+if (alterTbl.getPartSpec() != null && alterTbl.getOp() == 
AlterTableTypes.ADDPROPS) {
+  // ADDPROPS is used to add repl.last.id during replication. That's 
not a transactional
+  // change.
+  Map props = alterTbl.getProps();
+  if (props.size() <= 1 && 
props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) {
 
 Review comment:
   ReplUtils.REPL_CHECKPOINT_KEY is another prop we set in the repl flow that 
is not transactional. This check doesn't seem clean, as we might add more such 
alters in the repl flow in future. Can we check 
replicationSpec.isInReplicationScope() instead, or add another flag in 
AlterTableDesc to skip this?
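
To make the concern concrete, here is a small sketch contrasting a key-based check with a scope-based one. Everything in it is hypothetical: the class, both methods and the checkpoint key string are invented for illustration, `repl.last.id` is taken from the code comment in the diff, and the `getPartSpec()`/`ADDPROPS` precondition from the diff is left out for brevity.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: stand-in logic, not the DDLTask implementation. */
final class AlterTxnCheckSketch {

  // Property named in the diff's code comment.
  static final String LAST_REPL_ID_KEY = "repl.last.id";
  // Placeholder for a second replication-only property; the value is made up.
  static final String CHECKPOINT_KEY = "example.repl.checkpoint";

  /** Key-based check: only a lone repl.last.id update counts as non-transactional. */
  static boolean isTxnByProps(Map<String, String> props) {
    boolean onlyLastReplId = props.size() <= 1 && props.get(LAST_REPL_ID_KEY) != null;
    return !onlyLastReplId;
  }

  /** Scope-based check: any alter issued by replication skips the txn check. */
  static boolean isTxnByScope(boolean inReplicationScope) {
    return !inReplicationScope;
  }

  public static void main(String[] args) {
    Map<String, String> checkpointOnly = new HashMap<>();
    checkpointOnly.put(CHECKPOINT_KEY, "dump-dir");
    // The key-based check treats a checkpoint update as transactional.
    System.out.println("by props: " + isTxnByProps(checkpointOnly));
    // The scope-based check does not depend on which key is being set.
    System.out.println("by scope: " + isTxnByScope(true));
  }
}
```

With the key-based version, every new replication-only property would need its own special case; a scope flag avoids enumerating keys.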
   
 



Issue Time Tracking
---

Worklog Id: (was: 218852)
Time Spent: 20m  (was: 10m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218861=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218861
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269161871
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, 
final Table tbl,
 
   // If the table has column statistics, update it into the metastore. 
This feature is used
   // by replication to replicate table level statistics.
-  if (tbl.isSetColStats()) {
-// We do not replicate statistics for a transactional table right now 
and hence we do not
-// expect a transactional table to have column statistics here. So 
passing null
-// validWriteIds is fine for now.
-updateTableColumnStatsInternal(tbl.getColStats(), null, 
tbl.getWriteId());
+  if (colStats != null) {
+// On replica craft a valid snapshot out of the writeId in the table.
+long writeId = tbl.getWriteId();
+String validWriteIds = null;
+if (writeId > 0) {
+  ValidWriteIdList vwil =
 
 Review comment:
   Please use a meaningful name instead of "vwil".
 



Issue Time Tracking
---

Worklog Id: (was: 218861)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218866=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218866
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269257547
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -987,10 +989,14 @@ public void createTable(Table tbl, boolean ifNotExists,
   tTbl.setPrivileges(principalPrivs);
 }
   }
-  // Set table snapshot to api.Table to make it persistent.
-  TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, 
true);
-  if (tableSnapshot != null) {
-tbl.getTTable().setWriteId(tableSnapshot.getWriteId());
+  // Set table snapshot to api.Table to make it persistent. A 
transactional table being
+  // replicated may have a valid write Id copied from the source. Use that 
instead of
+  // crafting one on the replica.
+  if (tTbl.getWriteId() <= 0) {
 
 Review comment:
   The DO_NOT_UPDATE_STATS flag should be set in the create table flow as well. 
Otherwise, in autogather mode at the target, stats will be updated 
automatically.
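
A rough sketch of the behaviour this comment asks for, under stated assumptions: the parameter key is assumed to be the literal string `DO_NOT_UPDATE_STATS`, and `shouldAutoGather` is a hypothetical helper standing in for whatever decides to recompute stats on the target. Neither is taken from the pull request.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a stand-in for the autogather decision on the target. */
final class DoNotUpdateStatsSketch {

  // Assumed parameter key; check the real constant before relying on it.
  static final String DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS";

  /** Stats are recomputed only if autogather is on and the flag is not set. */
  static boolean shouldAutoGather(boolean autogatherEnabled, Map<String, String> tableParams) {
    boolean suppressed = Boolean.parseBoolean(tableParams.get(DO_NOT_UPDATE_STATS));
    return autogatherEnabled && !suppressed;
  }

  public static void main(String[] args) {
    Map<String, String> replicatedTable = new HashMap<>();
    // Without the flag, the target would overwrite the replicated stats.
    System.out.println("without flag: " + shouldAutoGather(true, replicatedTable));
    replicatedTable.put(DO_NOT_UPDATE_STATS, "true");
    // With the flag, the stats copied from the source are left alone.
    System.out.println("with flag: " + shouldAutoGather(true, replicatedTable));
  }
}
```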
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218866)
Time Spent: 2h 20m  (was: 2h 10m)

> Stats replication for ACID tables.
> ----------------------------------
>
>                 Key: HIVE-21109
>                 URL: https://issues.apache.org/jira/browse/HIVE-21109
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Ashutosh Bapat
>            Assignee: Ashutosh Bapat
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218857&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218857
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269103325
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/events/filesystem/FSTableEvent.java
 ##
 @@ -199,12 +199,15 @@ private AddPartitionDesc partitionDesc(Path fromPath,
       // Right now, we do not have a way of associating a writeId with statistics for a table
       // converted to a transactional table if it was non-transactional on the source. So, do not
 
 Review comment:
   Comment needs to be corrected.
 



Issue Time Tracking
---

Worklog Id: (was: 218857)

> Stats replication for ACID tables.
> ----------------------------------
>
>                 Key: HIVE-21109
>                 URL: https://issues.apache.org/jira/browse/HIVE-21109
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Ashutosh Bapat
>            Assignee: Ashutosh Bapat
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.





[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218859&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218859
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269154738
 
 

 ##
 File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnCommonUtils.java
 ##
 @@ -84,6 +86,73 @@ public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, long
     return new ValidReadTxnList(exceptions, outAbortedBits, highWaterMark, minOpenTxnId);
   }
 
+  /**
+   * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse} to a
+   * {@link org.apache.hadoop.hive.common.ValidTxnList}.  This assumes that the caller intends to
+   * read the files, and thus treats both open and aborted transactions as invalid.
+   *
+   * This API is used by Hive replication which may have multiple transactions open at a time.
+   *
+   * @param txns open txn list from the metastore
+   * @param currentTxns Current transactions that the replication has opened.  If any of the
+   *                    transactions is greater than 0 it will be removed from the exceptions
+   *                    list so that the replication sees its own transaction as valid.
+   * @return a valid txn list.
+   */
+  public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns,
 
 Review comment:
   The logic of treating all txns opened in a batch by an open txn event as 
current txns is incorrect. 
   The repl task opens multiple txns only when replicating the Hive Streaming 
case, where we allocate a batch of txns but use one at a time. Also, we don't 
update stats in that case. Even if we did update stats, the update should refer 
to one txn as the current txn, with the rest of the txns left open. 
   The replTxnIds cache in TxnManager should be removed as well. All callers 
should create a hardcoded ValidWriteIdList using the writeId received from the 
event msg.
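
   To illustrate the point about a single current txn, here is a minimal 
sketch in plain Java. It is not Hive code: the class CurrentTxnSketch and the 
method exceptions are hypothetical names, and the rule shown (every open txn 
except the one current txn stays in the exceptions list) is only an assumed 
reading of the comment above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; this is not Hive's TxnCommonUtils. */
public class CurrentTxnSketch {

  /**
   * Returns the txn ids a reader must treat as invalid: every open txn
   * except the single txn the caller is currently working in.
   * A currentTxn of 0 or less means "no current txn" (assumption).
   */
  public static List<Long> exceptions(List<Long> openTxns, long currentTxn) {
    List<Long> result = new ArrayList<>();
    for (long txn : openTxns) {
      if (currentTxn > 0 && txn == currentTxn) {
        continue; // the caller sees its own txn as valid
      }
      result.add(txn); // the rest of the batch stays open, hence invalid
    }
    return result;
  }

  public static void main(String[] args) {
    // A batch of three txns was opened, but only txn 11 is in use.
    List<Long> batch = Arrays.asList(10L, 11L, 12L);
    System.out.println(exceptions(batch, 11L)); // prints [10, 12]
  }
}
```

   With this shape, a batch opened for streaming never makes more than one 
txn visible to the reader at a time.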
 



Issue Time Tracking
---

Worklog Id: (was: 218859)
Time Spent: 1h 20m  (was: 1h 10m)






[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218862&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218862
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269172695
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -3539,10 +3573,19 @@ public boolean equals(Object obj) {
         }
 
         // Update partition column statistics if available
-        for (Partition newPart : newParts) {
-          if (newPart.isSetColStats()) {
-            updatePartitonColStatsInternal(tbl, newPart.getColStats(), null, newPart.getWriteId());
+        int cnt = 0;
+        for (ColumnStatistics partColStats: partsColStats) {
+          long writeId = partsWriteIds.get(cnt++);
+          // On replica craft a valid snapshot out of the writeId in the partition
+          String validWriteIds = null;
+          if (writeId > 0) {
+            ValidWriteIdList vwil =
 
 Review comment:
   Same as above.
 



Issue Time Tracking
---

Worklog Id: (was: 218862)
Time Spent: 1h 40m  (was: 1.5h)






[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=217846&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217846
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 25/Mar/19 06:45
Start Date: 25/Mar/19 06:45
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #579: 
HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579
 
 
   During bootstrap we use a method similar to the one used for non-ACID 
tables to transfer the statistics of an ACID table from source to replica. 
However, installing statistics of an ACID table requires a valid writeId and a 
writeId list. We use the table's or partition's latest writeId and a valid 
transaction list containing only that writeId to install the statistics in the 
metastore.
   
   During incremental replication, the writeId is obtained from the 
UpdateStats event, and a valid writeId list with that writeId marked as valid 
is used to install the column statistics. Table-level statistics are replicated 
by replaying the corresponding ALTER_TABLE/ALTER_PARTITION event.
   
   Further, this commit has the following related changes.
   
   1. The table or the partition associated with the commit transaction event 
should have been created when replaying the corresponding events before the 
commit transaction event. Thus there is no need to add tasks for creating the 
table or the partition.
   
   2. Maintain a list of open replicated transactions and use that to create 
the valid transactions list when replaying a replicated event.
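
   As a rough illustration of installing stats against a snapshot built from 
a single replicated writeId, here is a self-contained Java sketch. It does not 
use Hive's ValidWriteIdList classes: WriteIdSnapshotSketch, 
forReplicatedStats and isValid are hypothetical names, and modelling the 
snapshot as "high watermark equal to the replicated writeId, with no open or 
aborted exceptions" is an assumption made for this example, not something the 
description above states.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of a writeId snapshot; this is not Hive's ValidWriteIdList. */
public class WriteIdSnapshotSketch {
  private final long highWatermark;
  // Open or aborted writeIds at or below the watermark (assumed model).
  private final Set<Long> invalid;

  private WriteIdSnapshotSketch(long highWatermark, Set<Long> invalid) {
    this.highWatermark = highWatermark;
    this.invalid = new TreeSet<>(invalid);
  }

  /** Crafts a snapshot on the replica from the writeId shipped with the stats. */
  public static WriteIdSnapshotSketch forReplicatedStats(long writeId) {
    if (writeId <= 0) {
      throw new IllegalArgumentException("no replicated writeId: " + writeId);
    }
    return new WriteIdSnapshotSketch(writeId, Collections.<Long>emptySet());
  }

  /** A writeId is visible if it is positive, within the watermark, and not excluded. */
  public boolean isValid(long writeId) {
    return writeId > 0 && writeId <= highWatermark && !invalid.contains(writeId);
  }

  public static void main(String[] args) {
    WriteIdSnapshotSketch snapshot = forReplicatedStats(7L);
    System.out.println(snapshot.isValid(7L)); // prints true
    System.out.println(snapshot.isValid(8L)); // prints false
  }
}
```

   The check for writeId <= 0 mirrors the "writeId > 0" guards in the diff 
hunks quoted in the review comments; everything else is illustrative.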
 



Issue Time Tracking
---

Worklog Id: (was: 217846)
Time Spent: 10m
Remaining Estimate: 0h

> Stats replication for ACID tables.
> ----------------------------------
>
>                 Key: HIVE-21109
>                 URL: https://issues.apache.org/jira/browse/HIVE-21109
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Ashutosh Bapat
>            Assignee: Ashutosh Bapat
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.


