[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=225969=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-225969 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 11/Apr/19 05:41 Start Date: 11/Apr/19 05:41 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r274264405 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMM.java ## @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.parse; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; +import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.rules.TestName; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.HashMap; +import java.util.Map; + +/** + * Tests statistics replication for ACID tables. 
+ */ +public class TestStatsReplicationScenariosMM extends TestStatsReplicationScenarios { + @Rule + public final TestName testName = new TestName(); + + @BeforeClass + public static void classLevelSetup() throws Exception { +Map overrides = new HashMap<>(); +overrides.put(MetastoreConf.ConfVars.EVENT_MESSAGE_FACTORY.getHiveName(), +GzipJSONMessageEncoder.class.getCanonicalName()); +overrides.put(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY.varname, "true"); +overrides.put(HiveConf.ConfVars.HIVE_TXN_MANAGER.varname, + "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager"); + overrides.put(MetastoreConf.ConfVars.CAPABILITY_CHECK.getHiveName(),"false"); + overrides.put(HiveConf.ConfVars.REPL_BOOTSTRAP_DUMP_OPEN_TXN_TIMEOUT.varname,"1s"); +overrides.put(HiveConf.ConfVars.DYNAMICPARTITIONINGMODE.varname, "nonstrict"); + + +internalBeforeClassSetup(overrides, overrides, TestReplicationScenarios.class, true, "mm"); Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 225969) Time Spent: 13h 20m (was: 13h 10m) > Stats replication for ACID tables. > -- > > Key: HIVE-21109 > URL: https://issues.apache.org/jira/browse/HIVE-21109 > Project: Hive > Issue Type: Sub-task >Reporter: Ashutosh Bapat >Assignee: Ashutosh Bapat >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, > HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, > HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch, > HIVE-21109.09.patch, HIVE-21109.09.patch, HIVE-21109.10.patch > > Time Spent: 13h 20m > Remaining Estimate: 0h > > Transactional tables require a writeID associated with the stats update. 
This > writeId needs to be in sync with the writeId on the source and hence needs to > be replicated from the source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=225970=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-225970 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 11/Apr/19 05:41 Start Date: 11/Apr/19 05:41 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r274264424 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMMNoAutogather.java ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.parse; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; +import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.rules.TestName; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.HashMap; +import java.util.Map; + +/** + * Tests statistics replication for ACID tables. 
+ */ +public class TestStatsReplicationScenariosMMNoAutogather extends TestStatsReplicationScenarios { + @Rule + public final TestName testName = new TestName(); + + @BeforeClass + public static void classLevelSetup() throws Exception { +Map overrides = new HashMap<>(); +overrides.put(MetastoreConf.ConfVars.EVENT_MESSAGE_FACTORY.getHiveName(), +GzipJSONMessageEncoder.class.getCanonicalName()); +overrides.put(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY.varname, "true"); +overrides.put(HiveConf.ConfVars.HIVE_TXN_MANAGER.varname, + "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager"); + overrides.put(MetastoreConf.ConfVars.CAPABILITY_CHECK.getHiveName(),"false"); + overrides.put(HiveConf.ConfVars.REPL_BOOTSTRAP_DUMP_OPEN_TXN_TIMEOUT.varname,"1s"); +overrides.put(HiveConf.ConfVars.DYNAMICPARTITIONINGMODE.varname, "nonstrict"); +overrides.put("mapred.input.dir.recursive", "true"); + + +internalBeforeClassSetup(overrides, overrides, TestReplicationScenarios.class, false, "mm"); Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 225970) Time Spent: 13.5h (was: 13h 20m) > Stats replication for ACID tables. > -- > > Key: HIVE-21109 > URL: https://issues.apache.org/jira/browse/HIVE-21109 > Project: Hive > Issue Type: Sub-task >Reporter: Ashutosh Bapat >Assignee: Ashutosh Bapat >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, > HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, > HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch, > HIVE-21109.09.patch, HIVE-21109.09.patch, HIVE-21109.10.patch > > Time Spent: 13.5h > Remaining Estimate: 0h > > Transactional tables require a writeID associated with the stats update. 
This > writeId needs to be in sync with the writeId on the source and hence needs to > be replicated from the source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=225966=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-225966 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 11/Apr/19 05:36
Start Date: 11/Apr/19 05:36
Worklog Time Spent: 10m
Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r274263457

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ##
@@ -224,6 +236,17 @@ private void verifyNoPartitionStatsReplicationForMetadataOnly(String tableName)
     }
   }
+  private String getCreateTableProperties() {
+    if (acidTableKindToUse != null) {
+      if (acidTableKindToUse.equals("orc")) {

Review comment: Done.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 225966)
Time Spent: 13h 10m (was: 13h)

> Stats replication for ACID tables.
> ----------------------------------
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
> Issue Type: Sub-task
> Reporter: Ashutosh Bapat
> Assignee: Ashutosh Bapat
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch, HIVE-21109.09.patch, HIVE-21109.09.patch, HIVE-21109.10.patch
>
> Time Spent: 13h 10m
> Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This writeId needs to be in sync with the writeId on the source and hence needs to be replicated from the source.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=225642=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-225642 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 10/Apr/19 15:13
Start Date: 10/Apr/19 15:13
Worklog Time Spent: 10m
Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r273981487

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ##
@@ -224,6 +236,17 @@ private void verifyNoPartitionStatsReplicationForMetadataOnly(String tableName)
     }
   }
+  private String getCreateTableProperties() {
+    if (acidTableKindToUse != null) {
+      if (acidTableKindToUse.equals("orc")) {

Review comment: To be clear, the names can be "full_acid" and "mm_acid". In fact, an MM table can also be created on ORC data, so "orc" does not distinguish the two kinds.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 225642)
Time Spent: 13h (was: 12h 50m)

> Stats replication for ACID tables.
> ----------------------------------
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
> Issue Type: Sub-task
> Reporter: Ashutosh Bapat
> Assignee: Ashutosh Bapat
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch, HIVE-21109.09.patch, HIVE-21109.09.patch, HIVE-21109.10.patch
>
> Time Spent: 13h
> Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This writeId needs to be in sync with the writeId on the source and hence needs to be replicated from the source.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
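
Editor's note: the naming suggested in the review comment above could look roughly like the following sketch. This is not the code from pull request #579; the class `AcidTableKindSketch`, the enum `AcidTableKind`, and the method `createTableProperties` are hypothetical names chosen for illustration. The sketch assumes only Hive's documented table properties: `transactional=true` for full ACID tables (which are stored as ORC) and `transactional_properties=insert_only` for insert-only (MM) tables.

```java
/** Illustrative sketch only; not the code under review in pull request #579. */
final class AcidTableKindSketch {

    /** Hypothetical table kinds, named after the transactional mode rather than the file format. */
    enum AcidTableKind { FULL_ACID, MM_ACID }

    /**
     * Returns the clause appended to a CREATE TABLE statement for the given kind,
     * or an empty string for a non-transactional table (kind == null).
     */
    static String createTableProperties(AcidTableKind kind) {
        if (kind == null) {
            return "";
        }
        switch (kind) {
            case FULL_ACID:
                // Full ACID tables are stored as ORC.
                return " stored as orc tblproperties (\"transactional\"=\"true\")";
            case MM_ACID:
                // Insert-only (MM) tables are not tied to a single file format.
                return " tblproperties (\"transactional\"=\"true\", "
                    + "\"transactional_properties\"=\"insert_only\")";
            default:
                throw new IllegalArgumentException("Unknown table kind: " + kind);
        }
    }
}
```

Keying the choice on the transactional mode instead of a string such as "orc" avoids implying that the file format decides whether a table is full ACID or MM.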
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=225640=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-225640 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 10/Apr/19 15:13 Start Date: 10/Apr/19 15:13 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r273984253 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMM.java ## @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.parse; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; +import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.rules.TestName; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.HashMap; +import java.util.Map; + +/** + * Tests statistics replication for ACID tables. 
+ */ +public class TestStatsReplicationScenariosMM extends TestStatsReplicationScenarios { + @Rule + public final TestName testName = new TestName(); + + @BeforeClass + public static void classLevelSetup() throws Exception { +Map overrides = new HashMap<>(); +overrides.put(MetastoreConf.ConfVars.EVENT_MESSAGE_FACTORY.getHiveName(), +GzipJSONMessageEncoder.class.getCanonicalName()); +overrides.put(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY.varname, "true"); +overrides.put(HiveConf.ConfVars.HIVE_TXN_MANAGER.varname, + "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager"); + overrides.put(MetastoreConf.ConfVars.CAPABILITY_CHECK.getHiveName(),"false"); + overrides.put(HiveConf.ConfVars.REPL_BOOTSTRAP_DUMP_OPEN_TXN_TIMEOUT.varname,"1s"); +overrides.put(HiveConf.ConfVars.DYNAMICPARTITIONINGMODE.varname, "nonstrict"); + + +internalBeforeClassSetup(overrides, overrides, TestReplicationScenarios.class, true, "mm"); Review comment: This should use the current class name, TestStatsReplicationScenariosMM, instead of TestReplicationScenarios. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 225640) Time Spent: 12h 40m (was: 12.5h) > Stats replication for ACID tables. > -- > > Key: HIVE-21109 > URL: https://issues.apache.org/jira/browse/HIVE-21109 > Project: Hive > Issue Type: Sub-task >Reporter: Ashutosh Bapat >Assignee: Ashutosh Bapat >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, > HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, > HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch, > HIVE-21109.09.patch, HIVE-21109.09.patch, HIVE-21109.10.patch > > Time Spent: 12h 40m > Remaining Estimate: 0h > > Transactional tables require a writeID associated with the stats update. 
This > writeId needs to be in sync with the writeId on the source and hence needs to > be replicated from the source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=225641=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-225641 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 10/Apr/19 15:13 Start Date: 10/Apr/19 15:13 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r273984105 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMMNoAutogather.java ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.parse; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; +import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.rules.TestName; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.HashMap; +import java.util.Map; + +/** + * Tests statistics replication for ACID tables. 
+ */ +public class TestStatsReplicationScenariosMMNoAutogather extends TestStatsReplicationScenarios { + @Rule + public final TestName testName = new TestName(); + + @BeforeClass + public static void classLevelSetup() throws Exception { +Map overrides = new HashMap<>(); +overrides.put(MetastoreConf.ConfVars.EVENT_MESSAGE_FACTORY.getHiveName(), +GzipJSONMessageEncoder.class.getCanonicalName()); +overrides.put(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY.varname, "true"); +overrides.put(HiveConf.ConfVars.HIVE_TXN_MANAGER.varname, + "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager"); + overrides.put(MetastoreConf.ConfVars.CAPABILITY_CHECK.getHiveName(),"false"); + overrides.put(HiveConf.ConfVars.REPL_BOOTSTRAP_DUMP_OPEN_TXN_TIMEOUT.varname,"1s"); +overrides.put(HiveConf.ConfVars.DYNAMICPARTITIONINGMODE.varname, "nonstrict"); +overrides.put("mapred.input.dir.recursive", "true"); + + +internalBeforeClassSetup(overrides, overrides, TestReplicationScenarios.class, false, "mm"); Review comment: This should use the current class name, TestStatsReplicationScenariosMMNoAutogather, instead of TestReplicationScenarios. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 225641) Time Spent: 12h 50m (was: 12h 40m) > Stats replication for ACID tables. 
> -- > > Key: HIVE-21109 > URL: https://issues.apache.org/jira/browse/HIVE-21109 > Project: Hive > Issue Type: Sub-task >Reporter: Ashutosh Bapat >Assignee: Ashutosh Bapat >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, > HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, > HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch, > HIVE-21109.09.patch, HIVE-21109.09.patch, HIVE-21109.10.patch > > Time Spent: 12h 50m > Remaining Estimate: 0h > > Transactional tables require a writeID associated with the stats update. This > writeId needs to be in sync with the writeId on the source and hence needs to > be replicated from the source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=225073=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-225073 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 09/Apr/19 15:55 Start Date: 09/Apr/19 15:55 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r273563740 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMigration.java ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.parse; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; +import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.rules.TestName; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.HashMap; +import java.util.Map; + +/** + * Tests statistics replication for ACID tables. 
+ */ +public class TestStatsReplicationScenariosMigration extends TestStatsReplicationScenarios { + @Rule + public final TestName testName = new TestName(); + + protected static final Logger LOG = LoggerFactory.getLogger(TestReplicationScenarios.class); + + @BeforeClass + public static void classLevelSetup() throws Exception { +Map overrides = new HashMap<>(); +overrides.put(MetastoreConf.ConfVars.EVENT_MESSAGE_FACTORY.getHiveName(), +GzipJSONMessageEncoder.class.getCanonicalName()); + +HashMap replicaConfigs = new HashMap() {{ + put("hive.support.concurrency", "true"); + put("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager"); + put("hive.metastore.client.capability.check", "false"); + put("hive.repl.bootstrap.dump.open.txn.timeout", "1s"); + put("hive.exec.dynamic.partition.mode", "nonstrict"); + put("hive.strict.checks.bucketing", "false"); + put("hive.mapred.mode", "nonstrict"); + put("mapred.input.dir.recursive", "true"); + put("hive.metastore.disallow.incompatible.col.type.changes", "false"); + put("hive.strict.managed.tables", "true"); +}}; +replicaConfigs.putAll(overrides); + +HashMap primaryConfigs = new HashMap() {{ + put("hive.metastore.client.capability.check", "false"); + put("hive.repl.bootstrap.dump.open.txn.timeout", "1s"); + put("hive.exec.dynamic.partition.mode", "nonstrict"); + put("hive.strict.checks.bucketing", "false"); + put("hive.mapred.mode", "nonstrict"); + put("mapred.input.dir.recursive", "true"); + put("hive.metastore.disallow.incompatible.col.type.changes", "false"); + put("hive.support.concurrency", "false"); + put("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager"); + put("hive.strict.managed.tables", "false"); +}}; +primaryConfigs.putAll(overrides); + +internalBeforeClassSetup(primaryConfigs, replicaConfigs, Review comment: As long as the writeId associated with the stats is valid according to the given query's valid writeId list, the stats will be used if they are marked valid. 
Usually, when a writeId advances, the stats are marked invalid if the operation that advanced the writeId renders them inaccurate. In the case of migration, even though the writeId advances, the operation does not necessarily render the stats inaccurate. In that case, even if the writeId associated with the stats is behind the latest allocated one, the stats remain usable as long as (1) that writeId appears valid according to the query's valid writeId list and (2) the stats themselves are marked valid. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact
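
Editor's note: the two conditions described in the review comment above can be sketched as a small predicate. This is a simplified illustration under stated assumptions, not Hive's implementation: `TxnStatsSketch`, `WriteIdSnapshot`, and `canUseStats` are hypothetical names, and the snapshot is modeled as nothing more than a high watermark plus a set of open or aborted writeIds.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch only; the names and the snapshot model are assumptions, not Hive's API. */
final class TxnStatsSketch {

    /** Hypothetical, simplified view of the writeIds a query is allowed to read. */
    static final class WriteIdSnapshot {
        private final long highWatermark;
        private final Set<Long> openOrAborted;

        WriteIdSnapshot(long highWatermark, Set<Long> openOrAborted) {
            this.highWatermark = highWatermark;
            this.openOrAborted = Collections.unmodifiableSet(new HashSet<>(openOrAborted));
        }

        /** A writeId is readable if it is at or below the watermark and neither open nor aborted. */
        boolean isValid(long writeId) {
            return writeId <= highWatermark && !openOrAborted.contains(writeId);
        }
    }

    /**
     * Stats are usable when (1) the writeId they were written with is valid in the
     * query's snapshot and (2) the stats themselves are marked valid. The stats
     * writeId does not have to be the latest allocated writeId.
     */
    static boolean canUseStats(long statsWriteId, boolean statsMarkedValid, WriteIdSnapshot snapshot) {
        return statsMarkedValid && snapshot.isValid(statsWriteId);
    }
}
```

Under this model, stats written at writeId 5 stay usable for a query whose snapshot has a high watermark of 10, provided writeId 5 is neither open nor aborted and the stats are still marked valid.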
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=225063=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-225063 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 09/Apr/19 15:47
Start Date: 09/Apr/19 15:47
Worklog Time Spent: 10m
Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r273560079

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ##
@@ -216,16 +233,23 @@ private void verifyNoPartitionStatsReplicationForMetadataOnly(String tableName)
     String ndTableName = "ndTable";
     // Partitioned table without data during bootstrap and hence no stats.
     String ndPartTableName = "ndPTable";
+    String tblCreateExtra = "";
+
+    if (useAcidTables) {

Review comment: Done.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 225063)
Time Spent: 12h 20m (was: 12h 10m)

> Stats replication for ACID tables.
> ----------------------------------
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
> Issue Type: Sub-task
> Reporter: Ashutosh Bapat
> Assignee: Ashutosh Bapat
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch, HIVE-21109.09.patch, HIVE-21109.09.patch
>
> Time Spent: 12h 20m
> Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This writeId needs to be in sync with the writeId on the source and hence needs to be replicated from the source.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221688=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221688 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 02/Apr/19 11:25 Start Date: 02/Apr/19 11:25 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r271253403 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMigration.java ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.parse; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; +import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.rules.TestName; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.HashMap; +import java.util.Map; + +/** + * Tests statistics replication for ACID tables. 
+ */ +public class TestStatsReplicationScenariosMigration extends TestStatsReplicationScenarios { + @Rule + public final TestName testName = new TestName(); + + protected static final Logger LOG = LoggerFactory.getLogger(TestReplicationScenarios.class); + + @BeforeClass + public static void classLevelSetup() throws Exception { +Map overrides = new HashMap<>(); +overrides.put(MetastoreConf.ConfVars.EVENT_MESSAGE_FACTORY.getHiveName(), +GzipJSONMessageEncoder.class.getCanonicalName()); + +HashMap replicaConfigs = new HashMap() {{ + put("hive.support.concurrency", "true"); + put("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager"); + put("hive.metastore.client.capability.check", "false"); + put("hive.repl.bootstrap.dump.open.txn.timeout", "1s"); + put("hive.exec.dynamic.partition.mode", "nonstrict"); + put("hive.strict.checks.bucketing", "false"); + put("hive.mapred.mode", "nonstrict"); + put("mapred.input.dir.recursive", "true"); + put("hive.metastore.disallow.incompatible.col.type.changes", "false"); + put("hive.strict.managed.tables", "true"); +}}; +replicaConfigs.putAll(overrides); + +HashMap primaryConfigs = new HashMap() {{ + put("hive.metastore.client.capability.check", "false"); + put("hive.repl.bootstrap.dump.open.txn.timeout", "1s"); + put("hive.exec.dynamic.partition.mode", "nonstrict"); + put("hive.strict.checks.bucketing", "false"); + put("hive.mapred.mode", "nonstrict"); + put("mapred.input.dir.recursive", "true"); + put("hive.metastore.disallow.incompatible.col.type.changes", "false"); + put("hive.support.concurrency", "false"); + put("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager"); + put("hive.strict.managed.tables", "false"); +}}; +primaryConfigs.putAll(overrides); + +internalBeforeClassSetup(primaryConfigs, replicaConfigs, Review comment: Yeah. That's true. I have a question on how txn stats work. 
If the writeId in the table/partition is less than the latest allocated writeId for the table, will future queries use these stats for query optimization or not? If the writeId is not the latest, the stats are possibly not up to date, so I'm not sure how such stats are used. Please share if you have some idea. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221688) Time Spent: 12h (was: 11h 50m) > Stats replication for ACID tables. > -- > > Key: HIVE-21109 >
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221689=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221689 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 02/Apr/19 11:25 Start Date: 02/Apr/19 11:25 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r271253403 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMigration.java ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.parse; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; +import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.rules.TestName; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.HashMap; +import java.util.Map; + +/** + * Tests statistics replication for ACID tables. 
+ */ +public class TestStatsReplicationScenariosMigration extends TestStatsReplicationScenarios { + @Rule + public final TestName testName = new TestName(); + + protected static final Logger LOG = LoggerFactory.getLogger(TestReplicationScenarios.class); + + @BeforeClass + public static void classLevelSetup() throws Exception { +Map overrides = new HashMap<>(); +overrides.put(MetastoreConf.ConfVars.EVENT_MESSAGE_FACTORY.getHiveName(), +GzipJSONMessageEncoder.class.getCanonicalName()); + +HashMap replicaConfigs = new HashMap() {{ + put("hive.support.concurrency", "true"); + put("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager"); + put("hive.metastore.client.capability.check", "false"); + put("hive.repl.bootstrap.dump.open.txn.timeout", "1s"); + put("hive.exec.dynamic.partition.mode", "nonstrict"); + put("hive.strict.checks.bucketing", "false"); + put("hive.mapred.mode", "nonstrict"); + put("mapred.input.dir.recursive", "true"); + put("hive.metastore.disallow.incompatible.col.type.changes", "false"); + put("hive.strict.managed.tables", "true"); +}}; +replicaConfigs.putAll(overrides); + +HashMap primaryConfigs = new HashMap() {{ + put("hive.metastore.client.capability.check", "false"); + put("hive.repl.bootstrap.dump.open.txn.timeout", "1s"); + put("hive.exec.dynamic.partition.mode", "nonstrict"); + put("hive.strict.checks.bucketing", "false"); + put("hive.mapred.mode", "nonstrict"); + put("mapred.input.dir.recursive", "true"); + put("hive.metastore.disallow.incompatible.col.type.changes", "false"); + put("hive.support.concurrency", "false"); + put("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager"); + put("hive.strict.managed.tables", "false"); +}}; +primaryConfigs.putAll(overrides); + +internalBeforeClassSetup(primaryConfigs, replicaConfigs, Review comment: Yeah. That's true. I have a question on how txn stats work. 
If the writeId in the table/partition object is less than the latest allocated writeId in the table, will future queries use these stats for query optimization or not? If the writeId is not the latest, the stats are possibly not up to date, so I'm not sure how such stats are used. Please share if you have some idea. Issue Time Tracking --- Worklog Id: (was: 221689) Time Spent: 12h 10m (was: 12h) > Stats replication for ACID tables. > -- > > Key: HIVE-21109 >
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221664=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221664 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 02/Apr/19 09:41 Start Date: 02/Apr/19 09:41 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r271215497 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMigration.java ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.parse; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; +import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.rules.TestName; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.HashMap; +import java.util.Map; + +/** + * Tests statistics replication for ACID tables. 
+ */ +public class TestStatsReplicationScenariosMigration extends TestStatsReplicationScenarios { + @Rule + public final TestName testName = new TestName(); + + protected static final Logger LOG = LoggerFactory.getLogger(TestReplicationScenarios.class); + + @BeforeClass + public static void classLevelSetup() throws Exception { +Map overrides = new HashMap<>(); +overrides.put(MetastoreConf.ConfVars.EVENT_MESSAGE_FACTORY.getHiveName(), +GzipJSONMessageEncoder.class.getCanonicalName()); + +HashMap replicaConfigs = new HashMap() {{ + put("hive.support.concurrency", "true"); + put("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager"); + put("hive.metastore.client.capability.check", "false"); + put("hive.repl.bootstrap.dump.open.txn.timeout", "1s"); + put("hive.exec.dynamic.partition.mode", "nonstrict"); + put("hive.strict.checks.bucketing", "false"); + put("hive.mapred.mode", "nonstrict"); + put("mapred.input.dir.recursive", "true"); + put("hive.metastore.disallow.incompatible.col.type.changes", "false"); + put("hive.strict.managed.tables", "true"); +}}; +replicaConfigs.putAll(overrides); + +HashMap primaryConfigs = new HashMap() {{ + put("hive.metastore.client.capability.check", "false"); + put("hive.repl.bootstrap.dump.open.txn.timeout", "1s"); + put("hive.exec.dynamic.partition.mode", "nonstrict"); + put("hive.strict.checks.bucketing", "false"); + put("hive.mapred.mode", "nonstrict"); + put("mapred.input.dir.recursive", "true"); + put("hive.metastore.disallow.incompatible.col.type.changes", "false"); + put("hive.support.concurrency", "false"); + put("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager"); + put("hive.strict.managed.tables", "false"); +}}; +primaryConfigs.putAll(overrides); + +internalBeforeClassSetup(primaryConfigs, replicaConfigs, Review comment: The write Id associated with the column stats is stored in Table/Partition object. 
In the migration case, this writeId may not exactly match the writeId of the stats if an alter table event follows the column stats update. Such an alter table event would open a new transaction and update the writeId in the table/partition object. Issue Time Tracking --- Worklog Id: (was: 221664) Time Spent: 11h 50m (was: 11h 40m) > Stats replication for ACID tables. > -- > > Key: HIVE-21109 > URL:
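The drift described in the comment above can be sketched in isolation. This is a minimal model, not Hive code: `TableMeta`, `updateColumnStats`, `alterTable` and the extra `statsWriteId` field are all hypothetical names introduced here, and the only assumption taken from the thread is that the table/partition object holds a single writeId that a later alter table overwrites.

```java
// Hypothetical stand-in for a table/partition object; not Hive's actual class.
final class WriteIdDriftSketch {
  static final class TableMeta {
    long writeId;       // the single writeId slot on the object, shared by all metadata changes
    long statsWriteId;  // kept here only to expose the drift; assumed absent from the real object
  }

  // A column stats update records the writeId it ran under.
  static void updateColumnStats(TableMeta t, long writeId) {
    t.writeId = writeId;
    t.statsWriteId = writeId;
  }

  // An alter table event runs in a newer transaction and overwrites the shared slot.
  static void alterTable(TableMeta t, long writeId) {
    t.writeId = writeId;
  }

  // True when the recorded writeId no longer identifies the write that produced the stats.
  static boolean hasDrifted(TableMeta t) {
    return t.writeId != t.statsWriteId;
  }

  public static void main(String[] args) {
    TableMeta t = new TableMeta();
    updateColumnStats(t, 5L);
    System.out.println("after stats update, drifted = " + hasDrifted(t));
    alterTable(t, 6L);
    System.out.println("after alter table, drifted = " + hasDrifted(t));
  }
}
```

Under this model a test that compares the stored writeId with the writeId of the stats update would only hold when no alter table event follows the update.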
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221246=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221246 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 01/Apr/19 11:18 Start Date: 01/Apr/19 11:18 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270820010 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ## @@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, String lastReplicationId, return dumpTuple.lastReplicationId; } + /** + * Run a bootstrap that will fail. + * @param tuple the location of bootstrap dump + */ + private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int failAfterNumTables) throws Throwable { +// fail setting ckpt directory property for the second table so that we test the case when +// bootstrap load fails after some but not all tables are loaded. +BehaviourInjection callerVerifier += new BehaviourInjection() { + int cntTables = 0; + @Nullable + @Override + public Boolean apply(@Nullable CallerArguments args) { +cntTables++; +if (args.dbName.equalsIgnoreCase(replicatedDbName) && cntTables > failAfterNumTables) { + injectionPathCalled = true; + LOG.warn("Verifier - DB : " + args.dbName + " TABLE : " + args.tblName); + return false; +} +return true; + } +}; + +InjectableBehaviourObjectStore.setAlterTableModifier(callerVerifier); +try { + replica.loadFailure(replicatedDbName, tuple.dumpLocation); + callerVerifier.assertInjectionsPerformed(true, false); +} finally { + InjectableBehaviourObjectStore.resetAlterTableModifier(); +} + } + + private void failIncrementalLoad(WarehouseInstance.Tuple dumpTuple, int failAfterNumEvents) throws Throwable { +// fail add notification when updating table stats after given number of such events. 
Thus we +// test successful application as well as failed application of this event. +BehaviourInjection callerVerifier += new BehaviourInjection() { + int cntEvents = 0; + @Override + public Boolean apply(NotificationEvent entry) { +cntEvents++; Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221246) Time Spent: 11.5h (was: 11h 20m) > Stats replication for ACID tables. > -- > > Key: HIVE-21109 > URL: https://issues.apache.org/jira/browse/HIVE-21109 > Project: Hive > Issue Type: Sub-task >Reporter: Ashutosh Bapat >Assignee: Ashutosh Bapat >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, > HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, > HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch > > Time Spent: 11.5h > Remaining Estimate: 0h > > Transactional tables require a writeID associated with the stats update. This > writeId needs to be in sync with the writeId on the source and hence needs to > be replicated from the source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221247=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221247 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 01/Apr/19 11:18 Start Date: 01/Apr/19 11:18 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270820010 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ## @@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, String lastReplicationId, return dumpTuple.lastReplicationId; } + /** + * Run a bootstrap that will fail. + * @param tuple the location of bootstrap dump + */ + private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int failAfterNumTables) throws Throwable { +// fail setting ckpt directory property for the second table so that we test the case when +// bootstrap load fails after some but not all tables are loaded. +BehaviourInjection callerVerifier += new BehaviourInjection() { + int cntTables = 0; + @Nullable + @Override + public Boolean apply(@Nullable CallerArguments args) { +cntTables++; +if (args.dbName.equalsIgnoreCase(replicatedDbName) && cntTables > failAfterNumTables) { + injectionPathCalled = true; + LOG.warn("Verifier - DB : " + args.dbName + " TABLE : " + args.tblName); + return false; +} +return true; + } +}; + +InjectableBehaviourObjectStore.setAlterTableModifier(callerVerifier); +try { + replica.loadFailure(replicatedDbName, tuple.dumpLocation); + callerVerifier.assertInjectionsPerformed(true, false); +} finally { + InjectableBehaviourObjectStore.resetAlterTableModifier(); +} + } + + private void failIncrementalLoad(WarehouseInstance.Tuple dumpTuple, int failAfterNumEvents) throws Throwable { +// fail add notification when updating table stats after given number of such events. 
Thus we +// test successful application as well as failed application of this event. +BehaviourInjection callerVerifier += new BehaviourInjection() { + int cntEvents = 0; + @Override + public Boolean apply(NotificationEvent entry) { +cntEvents++; Review comment: OK. Please update the test to fail on the 2nd update stats event. Issue Time Tracking --- Worklog Id: (was: 221247) Time Spent: 11h 40m (was: 11.5h)
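The request to fail on the 2nd update stats event amounts to a counting injector that ignores unrelated events. The sketch below shows that pattern on its own; `CountingFailureInjector`, its constructor arguments and the plain string event types are hypothetical and do not reproduce the `BehaviourInjection` class from the patch.

```java
// Hypothetical helper, loosely modelled on the counting pattern in the quoted test code.
final class CountingFailureInjector {
  private final String targetEventType;
  private final int failAfterNumEvents;
  private int cntEvents = 0;
  private boolean injectionPathCalled = false;

  CountingFailureInjector(String targetEventType, int failAfterNumEvents) {
    this.targetEventType = targetEventType;
    this.failAfterNumEvents = failAfterNumEvents;
  }

  // Returns true to let the event through, false to make the caller fail.
  boolean apply(String eventType) {
    if (!targetEventType.equals(eventType)) {
      return true; // unrelated events are neither counted nor failed
    }
    cntEvents++;
    if (cntEvents > failAfterNumEvents) {
      injectionPathCalled = true;
      return false;
    }
    return true;
  }

  boolean wasInjectionPathCalled() {
    return injectionPathCalled;
  }

  public static void main(String[] args) {
    // With failAfterNumEvents = 1 the first matching event succeeds and the second fails.
    CountingFailureInjector injector = new CountingFailureInjector("update-stats", 1);
    System.out.println(injector.apply("update-stats"));
    System.out.println(injector.apply("alter-table"));
    System.out.println(injector.apply("update-stats"));
  }
}
```

Counting only the matching event type is what lets one run cover both a successful and a failed application of the same kind of event.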
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221245=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221245 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 01/Apr/19 11:17 Start Date: 01/Apr/19 11:17 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270819769 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ## @@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, String lastReplicationId, return dumpTuple.lastReplicationId; } + /** + * Run a bootstrap that will fail. + * @param tuple the location of bootstrap dump + */ + private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int failAfterNumTables) throws Throwable { +// fail setting ckpt directory property for the second table so that we test the case when +// bootstrap load fails after some but not all tables are loaded. +BehaviourInjection callerVerifier += new BehaviourInjection() { + int cntTables = 0; + @Nullable + @Override + public Boolean apply(@Nullable CallerArguments args) { +cntTables++; +if (args.dbName.equalsIgnoreCase(replicatedDbName) && cntTables > failAfterNumTables) { + injectionPathCalled = true; + LOG.warn("Verifier - DB : " + args.dbName + " TABLE : " + args.tblName); + return false; +} +return true; + } +}; + +InjectableBehaviourObjectStore.setAlterTableModifier(callerVerifier); +try { + replica.loadFailure(replicatedDbName, tuple.dumpLocation); + callerVerifier.assertInjectionsPerformed(true, false); +} finally { + InjectableBehaviourObjectStore.resetAlterTableModifier(); +} Review comment: Ok This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 221245) Time Spent: 11h 20m (was: 11h 10m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221244=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221244 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 01/Apr/19 11:17 Start Date: 01/Apr/19 11:17 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270819437 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ##
@@ -2950,21 +2957,32 @@ public Partition createPartition(Table tbl, Map partSpec) throws
     int size = addPartitionDesc.getPartitionCount();
     List in = new ArrayList(size);
-    AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
     long writeId;
     String validWriteIdList;
-    if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-      writeId = tableSnapshot.getWriteId();
-      validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+    // In case of replication, get the writeId from the source and use valid write Id list
+    // for replication.
+    if (addPartitionDesc.getReplicationSpec().isInReplicationScope() &&
+        addPartitionDesc.getPartition(0).getWriteId() > 0) {
Review comment: I think assuming that the table is non-transactional just because writeId is 0, and thereby ignoring the failure case, is not right. We can explicitly check whether it is a transactional table and then do the necessary checks. If writeId comes as 0 for a transactional table, that is an error flow.
Issue Time Tracking --- Worklog Id: (was: 221244) Time Spent: 11h 10m (was: 11h)
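A rough sketch of the explicit check suggested in this comment, taking the statement at face value and leaving the migration case aside. `ReplWriteIdCheckSketch` and `writeIdForReplicatedPartition` are hypothetical names, the boolean flag stands in for whatever transactional-table test the real code would use, and returning 0 for non-transactional tables is an assumption of the sketch.

```java
// Hypothetical sketch; the real code would consult the table object rather than a boolean.
final class ReplWriteIdCheckSketch {
  // Returns the writeId to use for a replicated add-partition.
  // Assumption: 0 means "no writeId", which is only legitimate for non-transactional tables.
  static long writeIdForReplicatedPartition(boolean transactional, long sourceWriteId) {
    if (!transactional) {
      return 0L; // decided by table type, not inferred from the writeId value
    }
    if (sourceWriteId <= 0) {
      // A transactional table without a valid writeId is treated as an error, not skipped.
      throw new IllegalStateException(
          "Transactional table replicated without a valid writeId: " + sourceWriteId);
    }
    return sourceWriteId;
  }

  public static void main(String[] args) {
    System.out.println(writeIdForReplicatedPartition(false, 0L));
    System.out.println(writeIdForReplicatedPartition(true, 7L));
    try {
      writeIdForReplicatedPartition(true, 0L);
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The point of the design is that a missing writeId surfaces as a failure instead of silently selecting the non-transactional path.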
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221241=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221241 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 01/Apr/19 11:13 Start Date: 01/Apr/19 11:13 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270818464 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java ##
@@ -297,21 +303,34 @@ private ColumnStatisticsDesc getColumnStatsDesc(String dbName,
   private int persistColumnStats(Hive db) throws HiveException, MetaException, IOException {
     ColumnStatistics colStats = constructColumnStatsFromInput();
-    ColumnStatisticsDesc colStatsDesc = colStats.getStatsDesc();
-    // We do not support stats replication for a transactional table yet. If we are converting
-    // a non-transactional table to a transactional table during replication, we might get
-    // column statistics but we shouldn't update those.
-    if (work.getColStats() != null &&
-        AcidUtils.isTransactionalTable(getHive().getTable(colStatsDesc.getDbName(),
-            colStatsDesc.getTableName()))) {
-      LOG.debug("Skipped updating column stats for table " +
-          TableName.getDbTable(colStatsDesc.getDbName(), colStatsDesc.getTableName()) +
-          " because it is converted to a transactional table during replication.");
-      return 0;
-    }
-
     SetPartitionsStatsRequest request = new SetPartitionsStatsRequest(Collections.singletonList(colStats));
+
+    // Set writeId and validWriteId list for replicated statistics.
+    if (work.getColStats() != null) {
+      String dbName = colStats.getStatsDesc().getDbName();
+      String tblName = colStats.getStatsDesc().getTableName();
+      Table tbl = db.getTable(dbName, tblName);
+      long writeId = work.getWriteId();
+      // If it's a transactional table on source and target, we will get a valid writeId
+      // associated with it. Otherwise it's a non-transactional table on source migrated to a
+      // transactional table on target, we need to craft a valid writeId here.
+      if (AcidUtils.isTransactionalTable(tbl)) {
+        ValidWriteIdList writeIds;
+        if (writeId <= 0) {
+          Long tmpWriteId = ReplUtils.getMigrationCurrentTblWriteId(conf);
+          if (tmpWriteId == null) {
+            throw new HiveException("DDLTask : Write id is not set in the config by open txn task for migration");
+          }
+          writeId = tmpWriteId;
+        }
+        writeIds = new ValidReaderWriteIdList(TableName.getDbTable(dbName, tblName), new long[0],
Review comment: I think this assumption could break in the future if someone uses this task to update stats outside the replication flow. I suggest adding an explicit check for repl scope.
Issue Time Tracking --- Worklog Id: (was: 221241) Time Spent: 11h (was: 10h 50m)
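One way to read the suggestion is to make replication scope an explicit precondition for crafting a writeId at all. The sketch below models only that decision: `StatsWriteIdSelectionSketch`, `selectWriteId` and its parameters are hypothetical, and the empty result for the non-replication path is an assumption meaning "leave the caller's normal handling untouched".

```java
// Hypothetical decision helper; none of these names come from the patch.
final class StatsWriteIdSelectionSketch {
  // Picks the writeId to attach to replicated column stats, if any.
  static java.util.OptionalLong selectWriteId(
      boolean inReplScope, boolean transactional, long sourceWriteId, Long migrationWriteId) {
    if (!inReplScope || !transactional) {
      // Outside replication, or for non-transactional tables, do not craft a writeId here.
      return java.util.OptionalLong.empty();
    }
    if (sourceWriteId > 0) {
      // Transactional on source and target: reuse the source writeId.
      return java.util.OptionalLong.of(sourceWriteId);
    }
    if (migrationWriteId == null) {
      // Migration case without a writeId from the open txn task.
      throw new IllegalStateException("Write id is not set for migration");
    }
    return java.util.OptionalLong.of(migrationWriteId);
  }

  public static void main(String[] args) {
    System.out.println(selectWriteId(false, true, 9L, null));
    System.out.println(selectWriteId(true, true, 9L, null));
    System.out.println(selectWriteId(true, true, 0L, 4L));
  }
}
```

Keeping the scope test as the first branch means a future non-replication caller cannot reach the migration fallback by accident.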
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221240=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221240 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 01/Apr/19 11:10 Start Date: 01/Apr/19 11:10 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270817825 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMigration.java ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.parse; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; +import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.rules.TestName; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.HashMap; +import java.util.Map; + +/** + * Tests statistics replication for ACID tables. 
+ */ +public class TestStatsReplicationScenariosMigration extends TestStatsReplicationScenarios { + @Rule + public final TestName testName = new TestName(); + + protected static final Logger LOG = LoggerFactory.getLogger(TestReplicationScenarios.class); + + @BeforeClass + public static void classLevelSetup() throws Exception { +Map overrides = new HashMap<>(); +overrides.put(MetastoreConf.ConfVars.EVENT_MESSAGE_FACTORY.getHiveName(), +GzipJSONMessageEncoder.class.getCanonicalName()); + +HashMap replicaConfigs = new HashMap() {{ + put("hive.support.concurrency", "true"); + put("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager"); + put("hive.metastore.client.capability.check", "false"); + put("hive.repl.bootstrap.dump.open.txn.timeout", "1s"); + put("hive.exec.dynamic.partition.mode", "nonstrict"); + put("hive.strict.checks.bucketing", "false"); + put("hive.mapred.mode", "nonstrict"); + put("mapred.input.dir.recursive", "true"); + put("hive.metastore.disallow.incompatible.col.type.changes", "false"); + put("hive.strict.managed.tables", "true"); +}}; +replicaConfigs.putAll(overrides); + +HashMap primaryConfigs = new HashMap() {{ + put("hive.metastore.client.capability.check", "false"); + put("hive.repl.bootstrap.dump.open.txn.timeout", "1s"); + put("hive.exec.dynamic.partition.mode", "nonstrict"); + put("hive.strict.checks.bucketing", "false"); + put("hive.mapred.mode", "nonstrict"); + put("mapred.input.dir.recursive", "true"); + put("hive.metastore.disallow.incompatible.col.type.changes", "false"); + put("hive.support.concurrency", "false"); + put("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager"); + put("hive.strict.managed.tables", "false"); +}}; +primaryConfigs.putAll(overrides); + +internalBeforeClassSetup(primaryConfigs, replicaConfigs, Review comment: In migration case, we shall validate if stats are associated with correct writeId. I think, in our tests, it should be pointing to last allocated writeId. 
Issue Time Tracking --- Worklog Id: (was: 221240) Time Spent: 10h 50m (was: 10h 40m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221210=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221210 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 01/Apr/19 10:05 Start Date: 01/Apr/19 10:05 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270797151 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ## @@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, String lastReplicationId, return dumpTuple.lastReplicationId; } + /** + * Run a bootstrap that will fail. + * @param tuple the location of bootstrap dump + */ + private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int failAfterNumTables) throws Throwable { +// fail setting ckpt directory property for the second table so that we test the case when +// bootstrap load fails after some but not all tables are loaded. +BehaviourInjection callerVerifier += new BehaviourInjection() { + int cntTables = 0; + @Nullable + @Override + public Boolean apply(@Nullable CallerArguments args) { +cntTables++; Review comment: Hmm. Thanks for catching this. Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221210) Time Spent: 10h 40m (was: 10.5h) > Stats replication for ACID tables. 
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221209=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221209 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 01/Apr/19 10:05 Start Date: 01/Apr/19 10:05 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270797067 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMigrationNoAutogather.java ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.parse; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; +import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.rules.TestName; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.HashMap; +import java.util.Map; + +/** + * Tests statistics replication for ACID tables. 
+ */ +public class TestStatsReplicationScenariosMigrationNoAutogather extends TestStatsReplicationScenarios { + @Rule + public final TestName testName = new TestName(); + + protected static final Logger LOG = LoggerFactory.getLogger(TestReplicationScenarios.class); Review comment: Removed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221209) Time Spent: 10.5h (was: 10h 20m) > Stats replication for ACID tables. > -- > > Key: HIVE-21109 > URL: https://issues.apache.org/jira/browse/HIVE-21109 > Project: Hive > Issue Type: Sub-task >Reporter: Ashutosh Bapat >Assignee: Ashutosh Bapat >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, > HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, > HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch > > Time Spent: 10.5h > Remaining Estimate: 0h > > Transactional tables require a writeID associated with the stats update. This > writeId needs to be in sync with the writeId on the source and hence needs to > be replicated from the source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221207=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221207 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 01/Apr/19 10:02 Start Date: 01/Apr/19 10:02 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270795998 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ## @@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, String lastReplicationId, return dumpTuple.lastReplicationId; } + /** + * Run a bootstrap that will fail. + * @param tuple the location of bootstrap dump + */ + private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int failAfterNumTables) throws Throwable { +// fail setting ckpt directory property for the second table so that we test the case when +// bootstrap load fails after some but not all tables are loaded. +BehaviourInjection callerVerifier += new BehaviourInjection() { + int cntTables = 0; + @Nullable + @Override + public Boolean apply(@Nullable CallerArguments args) { +cntTables++; +if (args.dbName.equalsIgnoreCase(replicatedDbName) && cntTables > failAfterNumTables) { + injectionPathCalled = true; + LOG.warn("Verifier - DB : " + args.dbName + " TABLE : " + args.tblName); + return false; +} +return true; + } +}; + +InjectableBehaviourObjectStore.setAlterTableModifier(callerVerifier); +try { + replica.loadFailure(replicatedDbName, tuple.dumpLocation); + callerVerifier.assertInjectionsPerformed(true, false); +} finally { + InjectableBehaviourObjectStore.resetAlterTableModifier(); +} + } + + private void failIncrementalLoad(WarehouseInstance.Tuple dumpTuple, int failAfterNumEvents) throws Throwable { +// fail add notification when updating table stats after given number of such events. 
Thus we +// test successful application as well as failed application of this event. +BehaviourInjection callerVerifier += new BehaviourInjection() { + int cntEvents = 0; + @Override + public Boolean apply(NotificationEvent entry) { +cntEvents++; +if (entry.getEventType().equalsIgnoreCase(EventMessage.EventType.UPDATE_TABLE_COLUMN_STAT.toString()) && +cntEvents > failAfterNumEvents) { + injectionPathCalled = true; + LOG.warn("Verifier - DB: " + entry.getDbName() + + " Table: " + entry.getTableName() + + " Event: " + entry.getEventType()); + return false; +} +return true; + } +}; + +InjectableBehaviourObjectStore.setAddNotificationModifier(callerVerifier); +try { + replica.loadFailure(replicatedDbName, dumpTuple.dumpLocation); +} finally { + InjectableBehaviourObjectStore.resetAddNotificationModifier(); +} +callerVerifier.assertInjectionsPerformed(true, false); + +// fail add notification when updating partition stats for for the second time. Thus we test +// successful application as well as failed application of this event. +callerVerifier = new BehaviourInjection() { + int cntEvents = 1; + + @Override + public Boolean apply(NotificationEvent entry) { +cntEvents++; +if (entry.getEventType().equalsIgnoreCase(EventMessage.EventType.UPDATE_PARTITION_COLUMN_STAT.toString()) && +cntEvents > failAfterNumEvents) { + injectionPathCalled = true; + LOG.warn("Verifier - DB: " + entry.getDbName() + + " Table: " + entry.getTableName() + + " Event: " + entry.getEventType()); + return false; +} +return true; + } +}; + +InjectableBehaviourObjectStore.setAddNotificationModifier(callerVerifier); +try { + replica.loadFailure(replicatedDbName, dumpTuple.dumpLocation); +} finally { + InjectableBehaviourObjectStore.resetAddNotificationModifier(); +} Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221206=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221206 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 01/Apr/19 10:00 Start Date: 01/Apr/19 10:00 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270795354 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ## @@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, String lastReplicationId, return dumpTuple.lastReplicationId; } + /** + * Run a bootstrap that will fail. + * @param tuple the location of bootstrap dump + */ + private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int failAfterNumTables) throws Throwable { +// fail setting ckpt directory property for the second table so that we test the case when +// bootstrap load fails after some but not all tables are loaded. +BehaviourInjection callerVerifier += new BehaviourInjection() { + int cntTables = 0; + @Nullable + @Override + public Boolean apply(@Nullable CallerArguments args) { +cntTables++; +if (args.dbName.equalsIgnoreCase(replicatedDbName) && cntTables > failAfterNumTables) { + injectionPathCalled = true; + LOG.warn("Verifier - DB : " + args.dbName + " TABLE : " + args.tblName); + return false; +} +return true; + } +}; + +InjectableBehaviourObjectStore.setAlterTableModifier(callerVerifier); +try { + replica.loadFailure(replicatedDbName, tuple.dumpLocation); + callerVerifier.assertInjectionsPerformed(true, false); +} finally { + InjectableBehaviourObjectStore.resetAlterTableModifier(); +} + } + + private void failIncrementalLoad(WarehouseInstance.Tuple dumpTuple, int failAfterNumEvents) throws Throwable { +// fail add notification when updating table stats after given number of such events. 
Thus we +// test successful application as well as failed application of this event. +BehaviourInjection callerVerifier += new BehaviourInjection() { + int cntEvents = 0; + @Override + public Boolean apply(NotificationEvent entry) { +cntEvents++; Review comment: This code has changed while I was working on another related comment. Again, we don't need to count the exact number of events. We need at least one successful event and one unsuccessful event.
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221203=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221203 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 01/Apr/19 09:58 Start Date: 01/Apr/19 09:58 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270794735 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ## @@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, String lastReplicationId, return dumpTuple.lastReplicationId; } + /** + * Run a bootstrap that will fail. + * @param tuple the location of bootstrap dump + */ + private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int failAfterNumTables) throws Throwable { +// fail setting ckpt directory property for the second table so that we test the case when +// bootstrap load fails after some but not all tables are loaded. +BehaviourInjection callerVerifier += new BehaviourInjection() { + int cntTables = 0; + @Nullable + @Override + public Boolean apply(@Nullable CallerArguments args) { +cntTables++; +if (args.dbName.equalsIgnoreCase(replicatedDbName) && cntTables > failAfterNumTables) { + injectionPathCalled = true; + LOG.warn("Verifier - DB : " + args.dbName + " TABLE : " + args.tblName); + return false; +} +return true; + } +}; + +InjectableBehaviourObjectStore.setAlterTableModifier(callerVerifier); +try { + replica.loadFailure(replicatedDbName, tuple.dumpLocation); + callerVerifier.assertInjectionsPerformed(true, false); +} finally { + InjectableBehaviourObjectStore.resetAlterTableModifier(); +} Review comment: I don't think we need to be really hard and fast about the exact number of tables loaded. All we are testing is whether there was a failure and the retry loaded the stats successfully. 
The current set of checks is enough for that.
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221179=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221179 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 01/Apr/19 09:38 Start Date: 01/Apr/19 09:38 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270786715 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ## @@ -269,11 +294,23 @@ private String dumpLoadVerify(List tableNames, String lastReplicationId, WarehouseInstance.Tuple dumpTuple = primary.run("use " + primaryDbName) .dump(primaryDbName, lastReplicationId, withClauseList); + // Load, if necessary changing configuration. if (parallelLoad) { replica.hiveConf.setBoolVar(HiveConf.ConfVars.EXECPARALLEL, true); } +// Fail load if for testing failure and retry scenario. Fail the load while setting +// checkpoint for a table in the middle of list of tables. +if (failRetry) { + if (lastReplicationId == null) { +failBootstrapLoad(dumpTuple, tableNames.size()/2); + } else { +failIncrementalLoad(dumpTuple, tableNames.size()/2); Review comment: We count only UpdateTableStats and UpdatePartStats events, not every event, so the load fails only after encountering (number of tables)/2 events of those types. It therefore cannot fail before an update stats event has been applied. To be on the safe side, though, I have changed the code to fail after the second event, so that we have at least one successful application before we fail. Since we perform multiple insert events per table, we can be sure that there are at least 2 events of each type. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221179) Time Spent: 9h 50m (was: 9h 40m) > Stats replication for ACID tables. > -- > > Key: HIVE-21109 > URL: https://issues.apache.org/jira/browse/HIVE-21109 > Project: Hive > Issue Type: Sub-task >Reporter: Ashutosh Bapat >Assignee: Ashutosh Bapat >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, > HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, > HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch > > Time Spent: 9h 50m > Remaining Estimate: 0h > > Transactional tables require a writeID associated with the stats update. This > writeId needs to be in sync with the writeId on the source and hence needs to > be replicated from the source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
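The "let some stats events succeed, then fail" idea from the review comment above can be sketched in isolation. This is only an illustration under stated assumptions: `FakeEvent` and `FailAfterMatching` are hypothetical names, not Hive classes, and the sketch uses the JDK's `Predicate` rather than the `BehaviourInjection` type in the diff. It counts only events of the target type, which is the behaviour the comment describes.

```java
import java.util.function.Predicate;

// Hypothetical stand-in for a metastore notification event (not Hive's NotificationEvent).
final class FakeEvent {
    final String type;

    FakeEvent(String type) {
        this.type = type;
    }
}

// Lets the first `allowed` events of the target type through and rejects every later one.
// Events of any other type always pass and are not counted.
final class FailAfterMatching implements Predicate<FakeEvent> {
    private final String targetType;
    private final int allowed;
    private int matched = 0;
    // Records that the failure path was actually exercised, so a test can assert on it.
    boolean injectionPathCalled = false;

    FailAfterMatching(String targetType, int allowed) {
        this.targetType = targetType;
        this.allowed = allowed;
    }

    @Override
    public boolean test(FakeEvent event) {
        if (!targetType.equalsIgnoreCase(event.type)) {
            return true;
        }
        matched++;
        if (matched > allowed) {
            injectionPathCalled = true;
            return false;
        }
        return true;
    }
}
```

With `allowed = 1`, the first matching event is applied and the second one fails, so both the successful and the failed application are exercised regardless of how many unrelated events are interleaved.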
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221122=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221122 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 01/Apr/19 06:50 Start Date: 01/Apr/19 06:50 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270732089 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -2950,21 +2957,32 @@ public Partition createPartition(Table tbl, Map partSpec) throws int size = addPartitionDesc.getPartitionCount(); List in = new ArrayList(size); -AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); long writeId; String validWriteIdList; -if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) { - writeId = tableSnapshot.getWriteId(); - validWriteIdList = tableSnapshot.getValidWriteIdList(); + +// In case of replication, get the writeId from the source and use valid write Id list +// for replication. +if (addPartitionDesc.getReplicationSpec().isInReplicationScope() && +addPartitionDesc.getPartition(0).getWriteId() > 0) { Review comment: writeId will be 0 for non-transactional tables. Also, this is the createPartitions code, which may be executed with writeId = 0 for non-transactional modifications to partitions of transactional tables as well. The condition, which I borrowed from the old code, is required so that we don't create a valid writeId list or try to get a table snapshot when writeId is zero. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221122) Time Spent: 9h 40m (was: 9.5h) > Stats replication for ACID tables. > -- > > Key: HIVE-21109 > URL: https://issues.apache.org/jira/browse/HIVE-21109 > Project: Hive > Issue Type: Sub-task >Reporter: Ashutosh Bapat >Assignee: Ashutosh Bapat >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, > HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, > HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch > > Time Spent: 9h 40m > Remaining Estimate: 0h > > Transactional tables require a writeID associated with the stats update. This > writeId needs to be in sync with the writeId on the source and hence needs to > be replicated from the source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
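The guard discussed in the review comment above (use the replicated writeId only when the operation is in replication scope and the writeId is positive) can be sketched as a small decision function. This is a hedged illustration, not the code in `Hive.java`: `WriteIdChoice`, `Source`, and `choose` are hypothetical names, and the fallback to a local table snapshot is an assumption based on the removed lines of the diff, since the `else` branch is not shown in this thread.

```java
// Hypothetical helper; the names here do not exist in Hive.
final class WriteIdChoice {
    enum Source { REPLICATED, LOCAL_SNAPSHOT, NONE }

    // inReplicationScope: whether the operation is part of a replication load.
    // replicatedWriteId:  writeId carried from the source; 0 for non-transactional changes.
    // snapshotWriteId:    writeId of a local table snapshot, or null if there is none (assumed fallback).
    static Source choose(boolean inReplicationScope, long replicatedWriteId, Long snapshotWriteId) {
        if (inReplicationScope && replicatedWriteId > 0) {
            return Source.REPLICATED;
        }
        if (snapshotWriteId != null && snapshotWriteId > 0) {
            return Source.LOCAL_SNAPSHOT;
        }
        // No usable writeId: do not build a valid writeId list at all.
        return Source.NONE;
    }
}
```

The `> 0` check is what keeps a replicated but non-transactional modification from producing a valid writeId list.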
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221121=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221121 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 01/Apr/19 06:49 Start Date: 01/Apr/19 06:49 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270732089 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -2950,21 +2957,32 @@ public Partition createPartition(Table tbl, Map partSpec) throws int size = addPartitionDesc.getPartitionCount(); List in = new ArrayList(size); -AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); long writeId; String validWriteIdList; -if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) { - writeId = tableSnapshot.getWriteId(); - validWriteIdList = tableSnapshot.getValidWriteIdList(); + +// In case of replication, get the writeId from the source and use valid write Id list +// for replication. +if (addPartitionDesc.getReplicationSpec().isInReplicationScope() && +addPartitionDesc.getPartition(0).getWriteId() > 0) { Review comment: writeId will be 0 for non-transactional tables. Also this is createPartitions code, which may get executed for partitions created when writeId 0 for transactional tables as well. The condition, which I borrowed from the old code is required so that we don't create a valid writeId list or try to get a table snapshot when writeId is zero. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221121) Time Spent: 9.5h (was: 9h 20m) > Stats replication for ACID tables. 
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221119=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221119 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 01/Apr/19 06:48 Start Date: 01/Apr/19 06:48 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270732089 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -2950,21 +2957,32 @@ public Partition createPartition(Table tbl, Map partSpec) throws int size = addPartitionDesc.getPartitionCount(); List in = new ArrayList(size); -AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); long writeId; String validWriteIdList; -if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) { - writeId = tableSnapshot.getWriteId(); - validWriteIdList = tableSnapshot.getValidWriteIdList(); + +// In case of replication, get the writeId from the source and use valid write Id list +// for replication. +if (addPartitionDesc.getReplicationSpec().isInReplicationScope() && +addPartitionDesc.getPartition(0).getWriteId() > 0) { Review comment: writeId will be 0 for non-transactional tables. Also this is createPartitions code, which may get executed for partitions created when writeId 0 for transactional tables as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221119) Time Spent: 9h 20m (was: 9h 10m) > Stats replication for ACID tables. 
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221117=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221117 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 01/Apr/19 06:44 Start Date: 01/Apr/19 06:44 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270731373 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java ## @@ -297,21 +303,34 @@ private ColumnStatisticsDesc getColumnStatsDesc(String dbName, private int persistColumnStats(Hive db) throws HiveException, MetaException, IOException { ColumnStatistics colStats = constructColumnStatsFromInput(); -ColumnStatisticsDesc colStatsDesc = colStats.getStatsDesc(); -// We do not support stats replication for a transactional table yet. If we are converting -// a non-transactional table to a transactional table during replication, we might get -// column statistics but we shouldn't update those. -if (work.getColStats() != null && - AcidUtils.isTransactionalTable(getHive().getTable(colStatsDesc.getDbName(), - colStatsDesc.getTableName( { - LOG.debug("Skipped updating column stats for table " + -TableName.getDbTable(colStatsDesc.getDbName(), colStatsDesc.getTableName()) + -" because it is converted to a transactional table during replication."); - return 0; -} - SetPartitionsStatsRequest request = new SetPartitionsStatsRequest(Collections.singletonList(colStats)); + +// Set writeId and validWriteId list for replicated statistics. +if (work.getColStats() != null) { + String dbName = colStats.getStatsDesc().getDbName(); + String tblName = colStats.getStatsDesc().getTableName(); + Table tbl = db.getTable(dbName, tblName); + long writeId = work.getWriteId(); + // If it's a transactional table on source and target, we will get a valid writeId + // associated with it. 
Otherwise it's a non-transactional table on source migrated to a + // transactional table on target, we need to craft a valid writeId here. + if (AcidUtils.isTransactionalTable(tbl)) { +ValidWriteIdList writeIds; +if (writeId <= 0) { + Long tmpWriteId = ReplUtils.getMigrationCurrentTblWriteId(conf); + if (tmpWriteId == null) { +throw new HiveException("DDLTask : Write id is not set in the config by open txn task for migration"); + } + writeId = tmpWriteId; +} +writeIds = new ValidReaderWriteIdList(TableName.getDbTable(dbName, tblName), new long[0], Review comment: work.getColStats() returns non-null value only in case of replication flow. This block of code is under that condition. So, it executes only in repl flow. Added a comment to that effect. Also added a comment per your suggestion. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221117) Time Spent: 9h 10m (was: 9h) > Stats replication for ACID tables. > -- > > Key: HIVE-21109 > URL: https://issues.apache.org/jira/browse/HIVE-21109 > Project: Hive > Issue Type: Sub-task >Reporter: Ashutosh Bapat >Assignee: Ashutosh Bapat >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, > HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, > HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch > > Time Spent: 9h 10m > Remaining Estimate: 0h > > Transactional tables require a writeID associated with the stats update. This > writeId needs to be in sync with the writeId on the source and hence needs to > be replicated from the source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
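The writeId handling in the `persistColumnStats` diff above can be summarised as a small resolution function. This is a minimal sketch assuming the logic shown in the diff: `StatsWriteIdResolver` and `resolve` are hypothetical names, `migrationWriteId` stands in for the value the diff reads through `ReplUtils.getMigrationCurrentTblWriteId(conf)`, and `IllegalStateException` replaces `HiveException` to keep the example self-contained. Returning the input unchanged for a non-transactional target is an assumption; the diff does not show that path.

```java
// Hypothetical helper mirroring the branch structure of the diff; not a Hive class.
final class StatsWriteIdResolver {
    static long resolve(boolean transactionalOnTarget, long replicatedWriteId, Long migrationWriteId) {
        if (!transactionalOnTarget) {
            // Assumption: nothing to resolve when the target table is not transactional.
            return replicatedWriteId;
        }
        if (replicatedWriteId > 0) {
            // Transactional on both source and target: reuse the source's writeId.
            return replicatedWriteId;
        }
        // Non-transactional on source, migrated to transactional on target:
        // the writeId must have been set up earlier by the open-transaction step.
        if (migrationWriteId == null) {
            throw new IllegalStateException("Write id is not set for migration");
        }
        return migrationWriteId;
    }
}
```

For example, `resolve(true, 0L, 42L)` yields the migration writeId, while `resolve(true, 0L, null)` fails fast instead of persisting stats without a writeId.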
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221113=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221113 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 01/Apr/19 06:38 Start Date: 01/Apr/19 06:38 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270730194 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -2950,21 +2957,32 @@ public Partition createPartition(Table tbl, Map partSpec) throws int size = addPartitionDesc.getPartitionCount(); List in = new ArrayList(size); -AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); long writeId; String validWriteIdList; -if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) { - writeId = tableSnapshot.getWriteId(); - validWriteIdList = tableSnapshot.getValidWriteIdList(); + +// In case of replication, get the writeId from the source and use valid write Id list +// for replication. +if (addPartitionDesc.getReplicationSpec().isInReplicationScope() && +addPartitionDesc.getPartition(0).getWriteId() > 0) { + writeId = addPartitionDesc.getPartition(0).getWriteId(); + validWriteIdList = new ValidReaderWriteIdList(TableName.getDbTable(tbl.getDbName(), Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221113) Time Spent: 9h (was: 8h 50m) > Stats replication for ACID tables. 
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221112=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221112 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 01/Apr/19 06:36 Start Date: 01/Apr/19 06:36 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270729852 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -694,7 +695,9 @@ public void alterTable(String catName, String dbName, String tblName, Table newT AcidUtils.TableSnapshot tableSnapshot = null; if (transactional) { if (replWriteId > 0) { - ValidWriteIdList writeIds = AcidUtils.getTableValidWriteIdListWithTxnList(conf, dbName, tblName); + ValidWriteIdList writeIds = new ValidReaderWriteIdList(TableName.getDbTable(dbName, tblName), Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221112) Time Spent: 8h 50m (was: 8h 40m) > Stats replication for ACID tables. > -- > > Key: HIVE-21109 > URL: https://issues.apache.org/jira/browse/HIVE-21109 > Project: Hive > Issue Type: Sub-task >Reporter: Ashutosh Bapat >Assignee: Ashutosh Bapat >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, > HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, > HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch > > Time Spent: 8h 50m > Remaining Estimate: 0h > > Transactional tables require a writeID associated with the stats update. This > writeId needs to be in sync with the writeId on the source and hence needs to > be replicated from the source. 
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221109=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221109 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 01/Apr/19 06:30 Start Date: 01/Apr/19 06:30 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270728648 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java ## @@ -297,21 +303,34 @@ private ColumnStatisticsDesc getColumnStatsDesc(String dbName, private int persistColumnStats(Hive db) throws HiveException, MetaException, IOException { ColumnStatistics colStats = constructColumnStatsFromInput(); -ColumnStatisticsDesc colStatsDesc = colStats.getStatsDesc(); -// We do not support stats replication for a transactional table yet. If we are converting -// a non-transactional table to a transactional table during replication, we might get -// column statistics but we shouldn't update those. -if (work.getColStats() != null && - AcidUtils.isTransactionalTable(getHive().getTable(colStatsDesc.getDbName(), - colStatsDesc.getTableName( { - LOG.debug("Skipped updating column stats for table " + -TableName.getDbTable(colStatsDesc.getDbName(), colStatsDesc.getTableName()) + -" because it is converted to a transactional table during replication."); - return 0; -} - SetPartitionsStatsRequest request = new SetPartitionsStatsRequest(Collections.singletonList(colStats)); + +// Set writeId and validWriteId list for replicated statistics. +if (work.getColStats() != null) { + String dbName = colStats.getStatsDesc().getDbName(); + String tblName = colStats.getStatsDesc().getTableName(); + Table tbl = db.getTable(dbName, tblName); + long writeId = work.getWriteId(); + // If it's a transactional table on source and target, we will get a valid writeId + // associated with it. 
Otherwise it's a non-transactional table on source migrated to a + // transactional table on target, we need to craft a valid writeId here. + if (AcidUtils.isTransactionalTable(tbl)) { +ValidWriteIdList writeIds; +if (writeId <= 0) { Review comment: We can not set writeId in the ColumnStatsUpdateWork because the writeId for migration is available only after a transaction is opened for migration, which doesn't happen at the load time (when the work is created). Going by the gist of your suggestion, I have set a flag in work to indicate that the writeId should be the one used for migration. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221109) Time Spent: 8h 40m (was: 8.5h) > Stats replication for ACID tables. > -- > > Key: HIVE-21109 > URL: https://issues.apache.org/jira/browse/HIVE-21109 > Project: Hive > Issue Type: Sub-task >Reporter: Ashutosh Bapat >Assignee: Ashutosh Bapat >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, > HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, > HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch > > Time Spent: 8h 40m > Remaining Estimate: 0h > > Transactional tables require a writeID associated with the stats update. This > writeId needs to be in sync with the writeId on the source and hence needs to > be replicated from the source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
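The reply above describes deferring the choice of writeId: the work item is created at load time, but the migration writeId only exists once the migration transaction has been opened. A minimal standalone sketch of that idea follows. It is not the code from the pull request: `StatsWorkSketch`, `useMigrationWriteId` and `resolveWriteId` are hypothetical names, and the migration writeId is modelled as a `Supplier<Long>` that yields null until the transaction is open.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a stats-update work item; not a Hive class. */
final class StatsWorkSketch {
    private final long sourceWriteId;          // writeId replicated from the source, or 0 if none
    private final boolean useMigrationWriteId; // flag decided at load time

    StatsWorkSketch(long sourceWriteId, boolean useMigrationWriteId) {
        this.sourceWriteId = sourceWriteId;
        this.useMigrationWriteId = useMigrationWriteId;
    }

    /** Called at execution time, when the migration transaction (if any) is open. */
    long resolveWriteId(Supplier<Long> migrationWriteId) {
        if (useMigrationWriteId) {
            Long id = migrationWriteId.get();
            if (id == null) {
                throw new IllegalStateException(
                    "Migration writeId is not set; was the migration transaction opened?");
            }
            return id;
        }
        if (sourceWriteId <= 0) {
            // A caller bug in the non-migration case gets its own message.
            throw new IllegalStateException("Invalid writeId from source: " + sourceWriteId);
        }
        return sourceWriteId;
    }
}
```

Keeping the flag separate from the numeric value means a bad writeId in the non-migration case is reported as such, rather than being mistaken for a migration.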
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220947=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220947 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 31/Mar/19 07:33 Start Date: 31/Mar/19 07:33 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270653325 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ## @@ -269,11 +294,23 @@ private String dumpLoadVerify(List tableNames, String lastReplicationId, WarehouseInstance.Tuple dumpTuple = primary.run("use " + primaryDbName) .dump(primaryDbName, lastReplicationId, withClauseList); + // Load, if necessary changing configuration. if (parallelLoad) { replica.hiveConf.setBoolVar(HiveConf.ConfVars.EXECPARALLEL, true); } +// Fail load if for testing failure and retry scenario. Fail the load while setting +// checkpoint for a table in the middle of list of tables. +if (failRetry) { + if (lastReplicationId == null) { +failBootstrapLoad(dumpTuple, tableNames.size()/2); + } else { +failIncrementalLoad(dumpTuple, tableNames.size()/2); Review comment: It is not directly mapped that one event per table. So, this value of tableNames.size()/2 may fail even before applying update state event. If we want to fail the incremental load after a fixed event, then need to get the event count by dumping it after that operation at source. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 220947) > Stats replication for ACID tables. 
> -- > > Key: HIVE-21109 > URL: https://issues.apache.org/jira/browse/HIVE-21109 > Project: Hive > Issue Type: Sub-task >Reporter: Ashutosh Bapat >Assignee: Ashutosh Bapat >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, > HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, > HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch > > Time Spent: 8h > Remaining Estimate: 0h > > Transactional tables require a writeID associated with the stats update. This > writeId needs to be in sync with the writeId on the source and hence needs to > be replicated from the source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
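The review comment above notes that events do not map one-to-one to tables, so a threshold derived from the table count may trigger the failure before any stats-update event has been applied. One possible alternative, shown here only as an assumption and not as what the pull request does, is to count only events of the type under test. `FailAfterMatchingEvents` is a hypothetical class, and the event-type strings are illustrative.

```java
import java.util.function.Predicate;

/** Returns true to let an event through and false to inject a failure. */
final class FailAfterMatchingEvents implements Predicate<String> {
    private final String targetType;
    private final int allowedMatches;
    private int matchesSeen = 0;
    private boolean failureInjected = false;

    FailAfterMatchingEvents(String targetType, int allowedMatches) {
        this.targetType = targetType;
        this.allowedMatches = allowedMatches;
    }

    @Override
    public boolean test(String eventType) {
        if (!targetType.equalsIgnoreCase(eventType)) {
            return true; // unrelated events are neither counted nor failed
        }
        matchesSeen++;
        if (matchesSeen > allowedMatches) {
            failureInjected = true;
            return false;
        }
        return true;
    }

    boolean failureInjected() {
        return failureInjected;
    }
}
```

With this shape, the first `allowedMatches` events of the target type are always applied successfully before the failure fires, regardless of how many other events precede them.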
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220941=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220941 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 31/Mar/19 07:33 Start Date: 31/Mar/19 07:33 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270652691 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java ## @@ -297,21 +303,34 @@ private ColumnStatisticsDesc getColumnStatsDesc(String dbName, private int persistColumnStats(Hive db) throws HiveException, MetaException, IOException { ColumnStatistics colStats = constructColumnStatsFromInput(); -ColumnStatisticsDesc colStatsDesc = colStats.getStatsDesc(); -// We do not support stats replication for a transactional table yet. If we are converting -// a non-transactional table to a transactional table during replication, we might get -// column statistics but we shouldn't update those. -if (work.getColStats() != null && - AcidUtils.isTransactionalTable(getHive().getTable(colStatsDesc.getDbName(), - colStatsDesc.getTableName( { - LOG.debug("Skipped updating column stats for table " + -TableName.getDbTable(colStatsDesc.getDbName(), colStatsDesc.getTableName()) + -" because it is converted to a transactional table during replication."); - return 0; -} - SetPartitionsStatsRequest request = new SetPartitionsStatsRequest(Collections.singletonList(colStats)); + +// Set writeId and validWriteId list for replicated statistics. +if (work.getColStats() != null) { + String dbName = colStats.getStatsDesc().getDbName(); + String tblName = colStats.getStatsDesc().getTableName(); + Table tbl = db.getTable(dbName, tblName); + long writeId = work.getWriteId(); + // If it's a transactional table on source and target, we will get a valid writeId + // associated with it. 
Otherwise it's a non-transactional table on source migrated to a + // transactional table on target, we need to craft a valid writeId here. + if (AcidUtils.isTransactionalTable(tbl)) { +ValidWriteIdList writeIds; +if (writeId <= 0) { + Long tmpWriteId = ReplUtils.getMigrationCurrentTblWriteId(conf); + if (tmpWriteId == null) { +throw new HiveException("DDLTask : Write id is not set in the config by open txn task for migration"); + } + writeId = tmpWriteId; +} +writeIds = new ValidReaderWriteIdList(TableName.getDbTable(dbName, tblName), new long[0], Review comment: Only in repl flow, this method of hardcoding ValidWriteIdList make sense. If not, then need to go with logic of getting it from HMS. Need to check it here and also add a comment on why this hardcoding logic works for repl flow. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 220941) > Stats replication for ACID tables. > -- > > Key: HIVE-21109 > URL: https://issues.apache.org/jira/browse/HIVE-21109 > Project: Hive > Issue Type: Sub-task >Reporter: Ashutosh Bapat >Assignee: Ashutosh Bapat >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, > HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, > HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch > > Time Spent: 7h 40m > Remaining Estimate: 0h > > Transactional tables require a writeID associated with the stats update. This > writeId needs to be in sync with the writeId on the source and hence needs to > be replicated from the source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
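The comment above asks that the hardcoded ValidWriteIdList be used only in the replication flow, and that the list be obtained from HMS otherwise. A rough sketch of that branching follows, under stated assumptions: `WriteIdListChooser` and `Snapshot` are hypothetical simplifications (not Hive's `ValidReaderWriteIdList`), the metastore lookup is passed in as a function, and treating the replicated writeId as the high watermark is an assumption of this sketch, not something the truncated diff confirms.

```java
import java.util.function.Function;

final class WriteIdListChooser {
    /** Hypothetical, simplified view of a valid-writeId list. */
    static final class Snapshot {
        final String table;
        final long highWatermark;
        final boolean fabricated; // true when built locally for the replication flow

        Snapshot(String table, long highWatermark, boolean fabricated) {
            this.table = table;
            this.highWatermark = highWatermark;
            this.fabricated = fabricated;
        }
    }

    static Snapshot choose(String table, boolean inReplFlow, long replWriteId,
                           Function<String, Snapshot> fetchFromMetastore) {
        if (inReplFlow) {
            if (replWriteId <= 0) {
                throw new IllegalArgumentException("Replication flow needs a positive writeId");
            }
            // Assumption: in the replication flow the list is built around the replicated writeId.
            return new Snapshot(table, replWriteId, true);
        }
        // Outside replication, defer to the metastore-backed lookup.
        return fetchFromMetastore.apply(table);
    }
}
```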
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220950=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220950 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 31/Mar/19 07:33 Start Date: 31/Mar/19 07:33 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270653503 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenariosMigrationNoAutogather.java ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.parse; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; +import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.rules.TestName; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.HashMap; +import java.util.Map; + +/** + * Tests statistics replication for ACID tables. 
+ */ +public class TestStatsReplicationScenariosMigrationNoAutogather extends TestStatsReplicationScenarios { + @Rule + public final TestName testName = new TestName(); + + protected static final Logger LOG = LoggerFactory.getLogger(TestReplicationScenarios.class); Review comment: LOG is not used and can be removed. Please check the other newly added test classes as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 220950) Time Spent: 8h 20m (was: 8h 10m) > Stats replication for ACID tables. > -- > > Key: HIVE-21109 > URL: https://issues.apache.org/jira/browse/HIVE-21109 > Project: Hive > Issue Type: Sub-task >Reporter: Ashutosh Bapat >Assignee: Ashutosh Bapat >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, > HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, > HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch > > Time Spent: 8h 20m > Remaining Estimate: 0h > > Transactional tables require a writeID associated with the stats update. This > writeId needs to be in sync with the writeId on the source and hence needs to > be replicated from the source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220948=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220948 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 31/Mar/19 07:33 Start Date: 31/Mar/19 07:33 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270653390 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ## @@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, String lastReplicationId, return dumpTuple.lastReplicationId; } + /** + * Run a bootstrap that will fail. + * @param tuple the location of bootstrap dump + */ + private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int failAfterNumTables) throws Throwable { +// fail setting ckpt directory property for the second table so that we test the case when +// bootstrap load fails after some but not all tables are loaded. +BehaviourInjection callerVerifier += new BehaviourInjection() { + int cntTables = 0; + @Nullable + @Override + public Boolean apply(@Nullable CallerArguments args) { +cntTables++; +if (args.dbName.equalsIgnoreCase(replicatedDbName) && cntTables > failAfterNumTables) { + injectionPathCalled = true; + LOG.warn("Verifier - DB : " + args.dbName + " TABLE : " + args.tblName); + return false; +} +return true; + } +}; + +InjectableBehaviourObjectStore.setAlterTableModifier(callerVerifier); +try { + replica.loadFailure(replicatedDbName, tuple.dumpLocation); + callerVerifier.assertInjectionsPerformed(true, false); +} finally { + InjectableBehaviourObjectStore.resetAlterTableModifier(); +} Review comment: Shall add necessary validations to see only so many tables are loaded at this point of time. This is an automated message from the Apache Git Service. 
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220946=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220946 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 31/Mar/19 07:33 Start Date: 31/Mar/19 07:33 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270652901 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -2950,21 +2957,32 @@ public Partition createPartition(Table tbl, Map partSpec) throws int size = addPartitionDesc.getPartitionCount(); List in = new ArrayList(size); -AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); long writeId; String validWriteIdList; -if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) { - writeId = tableSnapshot.getWriteId(); - validWriteIdList = tableSnapshot.getValidWriteIdList(); + +// In case of replication, get the writeId from the source and use valid write Id list +// for replication. +if (addPartitionDesc.getReplicationSpec().isInReplicationScope() && +addPartitionDesc.getPartition(0).getWriteId() > 0) { + writeId = addPartitionDesc.getPartition(0).getWriteId(); + validWriteIdList = new ValidReaderWriteIdList(TableName.getDbTable(tbl.getDbName(), Review comment: Pls add a comment on why this hardcoding logic works for repl flow. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 220946) Time Spent: 8h (was: 7h 50m) > Stats replication for ACID tables. 
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220945=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220945 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 31/Mar/19 07:33 Start Date: 31/Mar/19 07:33 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270652886 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -2950,21 +2957,32 @@ public Partition createPartition(Table tbl, Map partSpec) throws int size = addPartitionDesc.getPartitionCount(); List in = new ArrayList(size); -AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); long writeId; String validWriteIdList; -if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) { - writeId = tableSnapshot.getWriteId(); - validWriteIdList = tableSnapshot.getValidWriteIdList(); + +// In case of replication, get the writeId from the source and use valid write Id list +// for replication. +if (addPartitionDesc.getReplicationSpec().isInReplicationScope() && +addPartitionDesc.getPartition(0).getWriteId() > 0) { Review comment: I think, in replication flow and for transactional table, the 2nd condition should be true. If no valid writeId obtained from source, then need to fail. We cannot fall back to default logic. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 220945) > Stats replication for ACID tables. 
> -- > > Key: HIVE-21109 > URL: https://issues.apache.org/jira/browse/HIVE-21109 > Project: Hive > Issue Type: Sub-task >Reporter: Ashutosh Bapat >Assignee: Ashutosh Bapat >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, > HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, > HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch > > Time Spent: 7h 50m > Remaining Estimate: 0h > > Transactional tables require a writeID associated with the stats update. This > writeId needs to be in sync with the writeId on the source and hence needs to > be replicated from the source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
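The comment above argues that, in the replication flow for a transactional table, a missing writeId from the source should be an error rather than a reason to fall back to the default logic. That could be sketched as follows. `PartitionWriteIdSelector` is a hypothetical name, and the local snapshot lookup is modelled as a `LongSupplier`; none of this is taken from the pull request itself.

```java
import java.util.function.LongSupplier;

final class PartitionWriteIdSelector {
    static long select(boolean inReplScope, boolean transactional, long sourceWriteId,
                       LongSupplier localSnapshotWriteId) {
        if (inReplScope && transactional) {
            if (sourceWriteId <= 0) {
                // Fail fast: a replicated transactional table must carry the source writeId.
                throw new IllegalStateException(
                    "No valid writeId received from source for a transactional table");
            }
            return sourceWriteId;
        }
        // Default path (assumed here): use the writeId from the local table snapshot.
        return localSnapshotWriteId.getAsLong();
    }
}
```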
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220951=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220951 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 31/Mar/19 07:33 Start Date: 31/Mar/19 07:33 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270653475 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ## @@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, String lastReplicationId, return dumpTuple.lastReplicationId; } + /** + * Run a bootstrap that will fail. + * @param tuple the location of bootstrap dump + */ + private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int failAfterNumTables) throws Throwable { +// fail setting ckpt directory property for the second table so that we test the case when +// bootstrap load fails after some but not all tables are loaded. +BehaviourInjection callerVerifier += new BehaviourInjection() { + int cntTables = 0; + @Nullable + @Override + public Boolean apply(@Nullable CallerArguments args) { +cntTables++; +if (args.dbName.equalsIgnoreCase(replicatedDbName) && cntTables > failAfterNumTables) { + injectionPathCalled = true; + LOG.warn("Verifier - DB : " + args.dbName + " TABLE : " + args.tblName); + return false; +} +return true; + } +}; + +InjectableBehaviourObjectStore.setAlterTableModifier(callerVerifier); +try { + replica.loadFailure(replicatedDbName, tuple.dumpLocation); + callerVerifier.assertInjectionsPerformed(true, false); +} finally { + InjectableBehaviourObjectStore.resetAlterTableModifier(); +} + } + + private void failIncrementalLoad(WarehouseInstance.Tuple dumpTuple, int failAfterNumEvents) throws Throwable { +// fail add notification when updating table stats after given number of such events. 
Thus we +// test successful application as well as failed application of this event. +BehaviourInjection callerVerifier += new BehaviourInjection() { + int cntEvents = 0; + @Override + public Boolean apply(NotificationEvent entry) { +cntEvents++; +if (entry.getEventType().equalsIgnoreCase(EventMessage.EventType.UPDATE_TABLE_COLUMN_STAT.toString()) && +cntEvents > failAfterNumEvents) { + injectionPathCalled = true; + LOG.warn("Verifier - DB: " + entry.getDbName() + + " Table: " + entry.getTableName() + + " Event: " + entry.getEventType()); + return false; +} +return true; + } +}; + +InjectableBehaviourObjectStore.setAddNotificationModifier(callerVerifier); +try { + replica.loadFailure(replicatedDbName, dumpTuple.dumpLocation); +} finally { + InjectableBehaviourObjectStore.resetAddNotificationModifier(); +} +callerVerifier.assertInjectionsPerformed(true, false); + +// fail add notification when updating partition stats for for the second time. Thus we test +// successful application as well as failed application of this event. +callerVerifier = new BehaviourInjection() { + int cntEvents = 1; + + @Override + public Boolean apply(NotificationEvent entry) { +cntEvents++; +if (entry.getEventType().equalsIgnoreCase(EventMessage.EventType.UPDATE_PARTITION_COLUMN_STAT.toString()) && +cntEvents > failAfterNumEvents) { + injectionPathCalled = true; + LOG.warn("Verifier - DB: " + entry.getDbName() + + " Table: " + entry.getTableName() + + " Event: " + entry.getEventType()); + return false; +} +return true; + } +}; + +InjectableBehaviourObjectStore.setAddNotificationModifier(callerVerifier); +try { + replica.loadFailure(replicatedDbName, dumpTuple.dumpLocation); +} finally { + InjectableBehaviourObjectStore.resetAddNotificationModifier(); +} Review comment: Shall add validations to see if REPL LOAD fails at right place. This is an automated message from the Apache Git Service. 
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220944=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220944 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 31/Mar/19 07:33 Start Date: 31/Mar/19 07:33 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r270652633 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java ## @@ -297,21 +303,34 @@ private ColumnStatisticsDesc getColumnStatsDesc(String dbName, private int persistColumnStats(Hive db) throws HiveException, MetaException, IOException { ColumnStatistics colStats = constructColumnStatsFromInput(); -ColumnStatisticsDesc colStatsDesc = colStats.getStatsDesc(); -// We do not support stats replication for a transactional table yet. If we are converting -// a non-transactional table to a transactional table during replication, we might get -// column statistics but we shouldn't update those. -if (work.getColStats() != null && - AcidUtils.isTransactionalTable(getHive().getTable(colStatsDesc.getDbName(), - colStatsDesc.getTableName( { - LOG.debug("Skipped updating column stats for table " + -TableName.getDbTable(colStatsDesc.getDbName(), colStatsDesc.getTableName()) + -" because it is converted to a transactional table during replication."); - return 0; -} - SetPartitionsStatsRequest request = new SetPartitionsStatsRequest(Collections.singletonList(colStats)); + +// Set writeId and validWriteId list for replicated statistics. +if (work.getColStats() != null) { + String dbName = colStats.getStatsDesc().getDbName(); + String tblName = colStats.getStatsDesc().getTableName(); + Table tbl = db.getTable(dbName, tblName); + long writeId = work.getWriteId(); + // If it's a transactional table on source and target, we will get a valid writeId + // associated with it. 
Otherwise it's a non-transactional table on source migrated to a + // transactional table on target, we need to craft a valid writeId here. + if (AcidUtils.isTransactionalTable(tbl)) { +ValidWriteIdList writeIds; +if (writeId <= 0) { Review comment: Instead of having this assumption of "writeId <= 0 means migration case", it is better if the caller sets the correct writeId in ColumnStatsUpdateWork itself. If it is non-migration case and there is a bug in the caller and passes wrong writeId, then we throw incorrect exception message. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 220944) > Stats replication for ACID tables. > -- > > Key: HIVE-21109 > URL: https://issues.apache.org/jira/browse/HIVE-21109 > Project: Hive > Issue Type: Sub-task >Reporter: Ashutosh Bapat >Assignee: Ashutosh Bapat >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, > HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, > HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch > > Time Spent: 7h 50m > Remaining Estimate: 0h > > Transactional tables require a writeID associated with the stats update. This > writeId needs to be in sync with the writeId on the source and hence needs to > be replicated from the source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220942=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220942 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 31/Mar/19 07:33
Start Date: 31/Mar/19 07:33
Worklog Time Spent: 10m
Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270653464

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ##
@@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, String lastReplicationId,
     return dumpTuple.lastReplicationId;
   }
+  /**
+   * Run a bootstrap that will fail.
+   * @param tuple the location of bootstrap dump
+   */
+  private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int failAfterNumTables) throws Throwable {
+    // fail setting ckpt directory property for the second table so that we test the case when
+    // bootstrap load fails after some but not all tables are loaded.
+    BehaviourInjection callerVerifier
+        = new BehaviourInjection() {
+      int cntTables = 0;
+      @Nullable
+      @Override
+      public Boolean apply(@Nullable CallerArguments args) {
+        cntTables++;
+        if (args.dbName.equalsIgnoreCase(replicatedDbName) && cntTables > failAfterNumTables) {
+          injectionPathCalled = true;
+          LOG.warn("Verifier - DB : " + args.dbName + " TABLE : " + args.tblName);
+          return false;
+        }
+        return true;
+      }
+    };
+
+    InjectableBehaviourObjectStore.setAlterTableModifier(callerVerifier);
+    try {
+      replica.loadFailure(replicatedDbName, tuple.dumpLocation);
+      callerVerifier.assertInjectionsPerformed(true, false);
+    } finally {
+      InjectableBehaviourObjectStore.resetAlterTableModifier();
+    }
+  }
+
+  private void failIncrementalLoad(WarehouseInstance.Tuple dumpTuple, int failAfterNumEvents) throws Throwable {
+    // fail add notification when updating table stats after given number of such events. Thus we
+    // test successful application as well as failed application of this event.
+    BehaviourInjection callerVerifier
+        = new BehaviourInjection() {
+      int cntEvents = 0;
+      @Override
+      public Boolean apply(NotificationEvent entry) {
+        cntEvents++;

Review comment:
The add notification count in target may not match the number of events from source. So, better to count the number of AlterTable which changes last_repl_id parameters. It will be set once per event.

Issue Time Tracking
-------------------

Worklog Id: (was: 220942)
Time Spent: 7h 50m (was: 7h 40m)
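Editorial note on the review comment above: counting only those alter-table calls that actually change the replication ID could look roughly like the following. This is a minimal sketch under stated assumptions. ReplIdChangeCounter and onAlterTable are hypothetical names, and the replication ID is passed in as a plain string; how the test would extract it from the table parameters is not shown in the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hive's test utilities.
final class ReplIdChangeCounter {
    // Last replication ID observed for each table.
    private final Map<String, String> lastReplIdByTable = new HashMap<>();
    private int changes = 0;

    /**
     * Records one alter-table call and returns the number of calls so far
     * that changed a table's replication ID.
     */
    int onAlterTable(String tableName, String replId) {
        if (replId == null) {
            return changes; // this call did not set the parameter at all
        }
        String previous = lastReplIdByTable.put(tableName, replId);
        if (!replId.equals(previous)) {
            changes++;
        }
        return changes;
    }

    public static void main(String[] args) {
        ReplIdChangeCounter counter = new ReplIdChangeCounter();
        System.out.println(counter.onAlterTable("t1", "10")); // first event for t1
        System.out.println(counter.onAlterTable("t1", "10")); // same ID again, not counted
        System.out.println(counter.onAlterTable("t1", "11")); // next event
    }
}
```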
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220949=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220949 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 31/Mar/19 07:33
Start Date: 31/Mar/19 07:33
Worklog Time Spent: 10m
Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270652817

## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ##
@@ -694,7 +695,9 @@ public void alterTable(String catName, String dbName, String tblName, Table newT
       AcidUtils.TableSnapshot tableSnapshot = null;
       if (transactional) {
         if (replWriteId > 0) {
-          ValidWriteIdList writeIds = AcidUtils.getTableValidWriteIdListWithTxnList(conf, dbName, tblName);
+          ValidWriteIdList writeIds = new ValidReaderWriteIdList(TableName.getDbTable(dbName, tblName),

Review comment:
Pls add a comment on why this hardcoding logic works for repl flow.

Issue Time Tracking
-------------------

Worklog Id: (was: 220949)
Time Spent: 8h 10m (was: 8h)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220943=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220943 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 31/Mar/19 07:33
Start Date: 31/Mar/19 07:33
Worklog Time Spent: 10m
Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270653379

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ##
@@ -301,12 +338,106 @@ private String dumpLoadVerify(List tableNames, String lastReplicationId,
     return dumpTuple.lastReplicationId;
   }
+  /**
+   * Run a bootstrap that will fail.
+   * @param tuple the location of bootstrap dump
+   */
+  private void failBootstrapLoad(WarehouseInstance.Tuple tuple, int failAfterNumTables) throws Throwable {
+    // fail setting ckpt directory property for the second table so that we test the case when
+    // bootstrap load fails after some but not all tables are loaded.
+    BehaviourInjection callerVerifier
+        = new BehaviourInjection() {
+      int cntTables = 0;
+      @Nullable
+      @Override
+      public Boolean apply(@Nullable CallerArguments args) {
+        cntTables++;

Review comment:
This stub will be called multiple times per table as it is invoked by several methods in InjectableBehaviourObjectStore. Need to increment the count only if encounter a new table.

Issue Time Tracking
-------------------

Worklog Id: (was: 220943)
Time Spent: 7h 50m (was: 7h 40m)
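Editorial note on the review comment above: incrementing the count only when a new table is encountered can be done by remembering the tables already seen. The following is a minimal sketch, assuming the stub can identify a table by its database and table name. DistinctTableCounter and its methods are hypothetical names and are not part of the patch or of InjectableBehaviourObjectStore.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Hive: tracks how many distinct tables a stub has seen.
final class DistinctTableCounter {
    private final Set<String> seen = new HashSet<>();

    /** Records a call for the given table and returns the number of distinct tables so far. */
    int record(String dbName, String tblName) {
        // Hive identifiers are case-insensitive, so normalise before comparing.
        seen.add((dbName + "." + tblName).toLowerCase(Locale.ROOT));
        return seen.size();
    }

    /** True once more than failAfterNumTables distinct tables have been seen. */
    boolean shouldFail(String dbName, String tblName, int failAfterNumTables) {
        return record(dbName, tblName) > failAfterNumTables;
    }

    public static void main(String[] args) {
        DistinctTableCounter counter = new DistinctTableCounter();
        System.out.println(counter.shouldFail("db", "t1", 1)); // first table
        System.out.println(counter.shouldFail("db", "t1", 1)); // repeated call, same table
        System.out.println(counter.shouldFail("db", "t2", 1)); // second distinct table
    }
}
```

Repeated invocations for the same table leave the count unchanged, so the failure threshold refers to tables rather than to stub calls.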
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220940=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220940 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 31/Mar/19 07:33
Start Date: 31/Mar/19 07:33
Worklog Time Spent: 10m
Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270652339

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ##
@@ -216,16 +233,23 @@ private void verifyNoPartitionStatsReplicationForMetadataOnly(String tableName)
     String ndTableName = "ndTable";
     // Partitioned table without data during bootstrap and hence no stats.
     String ndPartTableName = "ndPTable";
+    String tblCreateExtra = "";
+
+    if (useAcidTables) {

Review comment:
We can add one more test set for MM (insert-only ACID tables).
Also, ACID table stats tests should cover few more operations.
1. Delete and update on Full ACID tables.
2. Insert overwrite, Truncate.
3. LOAD DATA, Import.
4. CTAS.
5. MERGE.
6. ADD/REMOVE columns.
7. Table/partition renames (Need to see, if REPL LOAD of rename event takes care of stats too.)
Note, we need to run dump and load after each operation to check if the stats are consistent.

Issue Time Tracking
-------------------

Worklog Id: (was: 220940)
Time Spent: 7h 40m (was: 7.5h)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220593=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220593 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/Mar/19 16:12
Start Date: 29/Mar/19 16:12
Worklog Time Spent: 10m
Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270479016

## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ##
@@ -987,10 +989,14 @@ public void createTable(Table tbl, boolean ifNotExists,
           tTbl.setPrivileges(principalPrivs);
         }
       }
-      // Set table snapshot to api.Table to make it persistent.
-      TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
-      if (tableSnapshot != null) {
-        tbl.getTTable().setWriteId(tableSnapshot.getWriteId());
+      // Set table snapshot to api.Table to make it persistent. A transactional table being
+      // replicated may have a valid write Id copied from the source. Use that instead of
+      // crafting one on the replica.
+      if (tTbl.getWriteId() <= 0) {

Review comment:
You are right. We do not need it at the creation time. We already have tests for that and they are working fine i.e. the expected stats both the table level and column level is getting replicated.

Issue Time Tracking
-------------------

Worklog Id: (was: 220593)
Time Spent: 7.5h (was: 7h 20m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220592=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220592 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/Mar/19 16:10
Start Date: 29/Mar/19 16:10
Worklog Time Spent: 10m
Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270478318

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ##
@@ -359,17 +383,20 @@ private void testStatsReplicationCommon(boolean parallelBootstrap, boolean metad
   }
   @Test
-  public void testForNonAcidTables() throws Throwable {
+  public void testNonParallelBootstrapLoad() throws Throwable {
+    LOG.info("Testing " + testName.getClass().getName() + "." + testName.getMethodName());
     testStatsReplicationCommon(false, false);
   }
   @Test
-  public void testForNonAcidTablesParallelBootstrapLoad() throws Throwable {
-    testStatsReplicationCommon(true, false);
+  public void testForParallelBootstrapLoad() throws Throwable {
+    LOG.info("Testing " + testName.getClass().getName() + "." + testName.getMethodName());
+    testStatsReplicationCommon(true, false );
   }
   @Test
-  public void testNonAcidMetadataOnlyDump() throws Throwable {
+  public void testMetadataOnlyDump() throws Throwable {

Review comment:
Added test for the first case. For second case, the events for parallel inserts will be serialized and applied serially on repl side. So this should be a problem on repl. We may test whether the events are generated in serialized fashion and have same expected contents. But that should be done a test which tests concurrent inserts (may be we already have it somewhere) and not in a replication test.

Issue Time Tracking
-------------------

Worklog Id: (was: 220592)
Time Spent: 7h 20m (was: 7h 10m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220591=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220591 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/Mar/19 16:08
Start Date: 29/Mar/19 16:08
Worklog Time Spent: 10m
Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270477619

## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ##
@@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws
     int size = addPartitionDesc.getPartitionCount();
     List in = new ArrayList(size);
-    AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
     long writeId;
     String validWriteIdList;
-    if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-      writeId = tableSnapshot.getWriteId();
-      validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+    // In case of replication, get the writeId from the source and use valid write Id list
+    // for replication.
+    if (addPartitionDesc.getReplicationSpec() != null &&
+        addPartitionDesc.getReplicationSpec().isInReplicationScope() &&
+        addPartitionDesc.getPartition(0).getWriteId() > 0) {
+      writeId = addPartitionDesc.getPartition(0).getWriteId();
+      validWriteIdList =

Review comment:
Done.

Issue Time Tracking
-------------------

Worklog Id: (was: 220591)
Time Spent: 7h 10m (was: 7h)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=220590=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-220590 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/Mar/19 16:08
Start Date: 29/Mar/19 16:08
Worklog Time Spent: 10m
Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270477567

## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnCommonUtils.java ##
@@ -84,6 +86,73 @@ public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, long
     return new ValidReadTxnList(exceptions, outAbortedBits, highWaterMark, minOpenTxnId);
   }
+  /**
+   * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse} to a
+   * {@link org.apache.hadoop.hive.common.ValidTxnList}. This assumes that the caller intends to
+   * read the files, and thus treats both open and aborted transactions as invalid.
+   *
+   * This API is used by Hive replication which may have multiple transactions open at a time.
+   *
+   * @param txns open txn list from the metastore
+   * @param currentTxns Current transactions that the replication has opened. If any of the
+   *                    transactions is greater than 0 it will be removed from the exceptions
+   *                    list so that the replication sees its own transaction as valid.
+   * @return a valid txn list.
+   */
+  public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns,

Review comment:
Done.

Issue Time Tracking
-------------------

Worklog Id: (was: 220590)
Time Spent: 7h (was: 6h 50m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219857=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219857 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Mar/19 07:24
Start Date: 28/Mar/19 07:24
Worklog Time Spent: 10m
Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269880708

## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnCommonUtils.java ##
@@ -84,6 +86,73 @@ public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, long
     return new ValidReadTxnList(exceptions, outAbortedBits, highWaterMark, minOpenTxnId);
   }
+  /**
+   * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse} to a
+   * {@link org.apache.hadoop.hive.common.ValidTxnList}. This assumes that the caller intends to
+   * read the files, and thus treats both open and aborted transactions as invalid.
+   *
+   * This API is used by Hive replication which may have multiple transactions open at a time.
+   *
+   * @param txns open txn list from the metastore
+   * @param currentTxns Current transactions that the replication has opened. If any of the
+   *                    transactions is greater than 0 it will be removed from the exceptions
+   *                    list so that the replication sees its own transaction as valid.
+   * @return a valid txn list.
+   */
+  public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns,

Review comment:
Yes, even I think, for REPL LOAD, we should always hardcode the ValidWriteIdList using current writeId so that stats are always valid while applying current event. Even if it is invalid, the subsequent alterTable/partition event would set it so in the table/partition parameters.

Issue Time Tracking
-------------------

Worklog Id: (was: 219857)
Time Spent: 6h 50m (was: 6h 40m)
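Editorial note on the review comment above: the effect of hardcoding a valid write-ID list from the current writeId can be illustrated with a simplified model. This is a sketch under stated assumptions, not Hive code. SimpleWriteIdList is a hypothetical stand-in that models only a high watermark and a sorted list of exceptions (open or aborted write IDs); it is not Hive's ValidReaderWriteIdList.

```java
import java.util.Arrays;

// Simplified stand-in for a valid write-ID list; NOT Hive's ValidReaderWriteIdList.
final class SimpleWriteIdList {
    private final long[] exceptions;  // sorted open/aborted write IDs
    private final long highWatermark; // highest write ID visible to the reader

    SimpleWriteIdList(long[] exceptions, long highWatermark) {
        this.exceptions = exceptions.clone();
        Arrays.sort(this.exceptions);
        this.highWatermark = highWatermark;
    }

    /** List built for one replicated event: no exceptions, watermark at the event's writeId. */
    static SimpleWriteIdList forReplicatedWrite(long writeId) {
        return new SimpleWriteIdList(new long[0], writeId);
    }

    /** A write ID is valid if it is at or below the watermark and not listed as an exception. */
    boolean isValid(long writeId) {
        if (writeId > highWatermark) {
            return false;
        }
        return Arrays.binarySearch(exceptions, writeId) < 0;
    }

    public static void main(String[] args) {
        SimpleWriteIdList list = SimpleWriteIdList.forReplicatedWrite(5);
        System.out.println(list.isValid(5)); // the event's own writeId
        System.out.println(list.isValid(6)); // above the watermark
    }
}
```

In this model the list built for the event always treats the event's own writeId as valid, which matches the intent stated in the comment that stats stay valid while the current event is applied.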
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219855=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219855 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Mar/19 07:21
Start Date: 28/Mar/19 07:21
Worklog Time Spent: 10m
Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269880083

## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ##
@@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws
     int size = addPartitionDesc.getPartitionCount();
     List in = new ArrayList(size);
-    AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
     long writeId;
     String validWriteIdList;
-    if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-      writeId = tableSnapshot.getWriteId();
-      validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+    // In case of replication, get the writeId from the source and use valid write Id list
+    // for replication.
+    if (addPartitionDesc.getReplicationSpec() != null &&
+        addPartitionDesc.getReplicationSpec().isInReplicationScope() &&
+        addPartitionDesc.getPartition(0).getWriteId() > 0) {
+      writeId = addPartitionDesc.getPartition(0).getWriteId();
+      validWriteIdList =

Review comment:
Even that logic to create ValidWriteIdList based on all repl opened txns isn't right as it says that stats are valid for all these open txns but it isn't. Also, it sets high water mark based on 0th index in the replTxnsList map which might be pointing to wrong writeId compared to current txn's writeId. So, I doubt, this logic should be anyways removed.

Issue Time Tracking
-------------------

Worklog Id: (was: 219855)
Time Spent: 6h 40m (was: 6.5h)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219854=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219854 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Mar/19 07:20
Start Date: 28/Mar/19 07:20
Worklog Time Spent: 10m
Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269879748

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ##
@@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException {
     } else {
       // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?).
       // For any other updates, we don't want to do txn check on partitions when altering table.
-      boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS;
+      boolean isTxn = false;
+      if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) {
+        // ADDPROPS is used to add repl.last.id during replication. That's not a transactional
+        // change.
+        Map props = alterTbl.getProps();
+        if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) {

Review comment:
Done. Instead of last.repl.id, I am explicitly checking if the property is related to stats and then set isTxn only in case of replication.

Issue Time Tracking
-------------------

Worklog Id: (was: 219854)
Time Spent: 6.5h (was: 6h 20m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219844=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219844 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Mar/19 06:51
Start Date: 28/Mar/19 06:51
Worklog Time Spent: 10m
Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269874353

## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ##
@@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws
     int size = addPartitionDesc.getPartitionCount();
     List in = new ArrayList(size);
-    AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
     long writeId;
     String validWriteIdList;
-    if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-      writeId = tableSnapshot.getWriteId();
-      validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+    // In case of replication, get the writeId from the source and use valid write Id list
+    // for replication.
+    if (addPartitionDesc.getReplicationSpec() != null &&
+        addPartitionDesc.getReplicationSpec().isInReplicationScope() &&
+        addPartitionDesc.getPartition(0).getWriteId() > 0) {
+      writeId = addPartitionDesc.getPartition(0).getWriteId();
+      validWriteIdList =

Review comment:
Ok. Underneath this code is using the valid write id list created using open transaction list of repl. So, this isn't wrong. But this may change subject to the changes because of other comment you have.

Issue Time Tracking
-------------------

Worklog Id: (was: 219844)
Time Spent: 6h 20m (was: 6h 10m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219842=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219842 ]

ASF GitHub Bot logged work on HIVE-21109:

Author: ASF GitHub Bot
Created on: 28/Mar/19 06:42
Start Date: 28/Mar/19 06:42
Worklog Time Spent: 10m
Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269872622

## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnCommonUtils.java ##
@@ -84,6 +86,73 @@ public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, long
   return new ValidReadTxnList(exceptions, outAbortedBits, highWaterMark, minOpenTxnId);
 }
+
+ /**
+ * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse} to a
+ * {@link org.apache.hadoop.hive.common.ValidTxnList}. This assumes that the caller intends to
+ * read the files, and thus treats both open and aborted transactions as invalid.
+ *
+ * This API is used by Hive replication which may have multiple transactions open at a time.
+ *
+ * @param txns open txn list from the metastore
+ * @param currentTxns Current transactions that the replication has opened. If any of the
+ *transactions is greater than 0 it will be removed from the exceptions
+ *list so that the replication sees its own transaction as valid.
+ * @return a valid txn list.
+ */
+ public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns,

Review comment:
If multiple transactions were running concurrently on the source, the dump will contain that many open-transaction events, and replaying those events will leave that many transactions open at the same time on the target. So, there could be multiple open transactions on the target even during repl load.

The only link between CreateTableOperation#createTableReplaceMode and Hive#alterTable is EnvironmentContext, so we could use this to pass a flag indicating that the valid writeId list should be created using the given writeId. But we are using EnvironmentContext to pass information only to the metastore, not to use it in between. We could construct the valid writeId list in the metastore directly, as we are doing for create table and partition, using that kind of flag. Does that look good?

Issue Time Tracking
---
Worklog Id: (was: 219842)
Time Spent: 6h 10m (was: 6h)
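The Javadoc in the hunk above describes removing the replication's own transactions from the exceptions list. A minimal sketch of that idea follows, using plain Java collections rather than Hive classes; the class name `ReplValidTxnSketch` and the method `invalidTxnsFor` are hypothetical and are not part of `TxnCommonUtils`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for illustration; this is not Hive's TxnCommonUtils. */
final class ReplValidTxnSketch {

  /**
   * Returns the sorted txn ids a reader must treat as invalid: every open or aborted
   * txn reported by the metastore, minus the txns the replication itself has opened.
   */
  static long[] invalidTxnsFor(List<Long> openOrAborted, List<Long> currentTxns) {
    Set<Long> own = new HashSet<>();
    for (long txn : currentTxns) {
      if (txn > 0) {            // ids <= 0 are treated as "no transaction"
        own.add(txn);
      }
    }
    return openOrAborted.stream()
        .filter(txn -> !own.contains(txn))
        .mapToLong(Long::longValue)
        .sorted()
        .toArray();
  }

  public static void main(String[] args) {
    // The replication has opened txns 7 and 12; txns 5 and 9 belong to someone else.
    long[] invalid = invalidTxnsFor(Arrays.asList(5L, 7L, 9L, 12L), Arrays.asList(7L, 12L, 0L));
    System.out.println(Arrays.toString(invalid)); // prints [5, 9]
  }
}
```

Taking a list of current transactions rather than a single id is what lets several replayed open-transaction events each see their own writes as valid.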
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219813=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219813 ]

ASF GitHub Bot logged work on HIVE-21109:

Author: ASF GitHub Bot
Created on: 28/Mar/19 05:26
Start Date: 28/Mar/19 05:26
Worklog Time Spent: 10m
Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269861070

## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java ##
@@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) {
     throw new RuntimeException("Invalid table type : " + getDescType());
   }
 }
+
+ public Long getReplWriteId() {
+if (this.createTblDesc != null) {
+ return this.createTblDesc.getReplWriteId();

Review comment:
If we unify writeId and replWriteId in CreateTableDesc into one, then it's fine. In fact, they are one and the same. So, no point in having 2 members for same value.

Issue Time Tracking
---
Worklog Id: (was: 219813)
Time Spent: 6h (was: 5h 50m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219812=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219812 ]

ASF GitHub Bot logged work on HIVE-21109:

Author: ASF GitHub Bot
Created on: 28/Mar/19 05:25
Start Date: 28/Mar/19 05:25
Worklog Time Spent: 10m
Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269861070

## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java ##
@@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) {
     throw new RuntimeException("Invalid table type : " + getDescType());
   }
 }
+
+ public Long getReplWriteId() {
+if (this.createTblDesc != null) {
+ return this.createTblDesc.getReplWriteId();

Review comment:
If we unify writeId and replWriteId in CreateTableDesc, then it's fine.

Issue Time Tracking
---
Worklog Id: (was: 219812)
Time Spent: 5h 50m (was: 5h 40m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219808=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219808 ]

ASF GitHub Bot logged work on HIVE-21109:

Author: ASF GitHub Bot
Created on: 28/Mar/19 04:54
Start Date: 28/Mar/19 04:54
Worklog Time Spent: 10m
Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269856891

## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java ##
@@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) {
     throw new RuntimeException("Invalid table type : " + getDescType());
   }
 }
+
+ public Long getReplWriteId() {
+if (this.createTblDesc != null) {
+ return this.createTblDesc.getReplWriteId();

Review comment:
AFAIU, the reason we set replWriteId in CreateTableDesc is that it can then be passed everywhere CreateTableDesc is used. It's better not to create two paths for passing the same writeId, with a risk of those going out of sync with each other.

Issue Time Tracking
---
Worklog Id: (was: 219808)
Time Spent: 5h 40m (was: 5.5h)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219788=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219788 ]

ASF GitHub Bot logged work on HIVE-21109:

Author: ASF GitHub Bot
Created on: 28/Mar/19 03:16
Start Date: 28/Mar/19 03:16
Worklog Time Spent: 10m
Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269844369

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ##
@@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException {
 } else {
 // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?).
 // For any other updates, we don't want to do txn check on partitions when altering table.
-boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS;
+boolean isTxn = false;
+if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) {
+ // ADDPROPS is used to add repl.last.id during replication. That's not a transactional
+ // change.
+ Map props = alterTbl.getProps();
+ if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) {

Review comment:
I don't know. I would suggest to keep it non-transactional only in repl flow to avoid any impacts.

Issue Time Tracking
---
Worklog Id: (was: 219788)
Time Spent: 5.5h (was: 5h 20m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219787=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219787 ]

ASF GitHub Bot logged work on HIVE-21109:

Author: ASF GitHub Bot
Created on: 28/Mar/19 03:12
Start Date: 28/Mar/19 03:12
Worklog Time Spent: 10m
Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269843944

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ##
@@ -1894,6 +1898,16 @@ private void create_table_core(final RawStore ms, final Table tbl,
 List checkConstraints)
 throws AlreadyExistsException, MetaException,
 InvalidObjectException, NoSuchObjectException, InvalidInputException {
+
+ ColumnStatistics colStats = null;
+ // If the given table has column statistics, save it here. We will update it later.
+ // We don't want it to be part of the Table object being created, lest the create table

Review comment:
I think, it's fine. Ignore this comment.

Issue Time Tracking
---
Worklog Id: (was: 219787)
Time Spent: 5h 20m (was: 5h 10m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219786=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219786 ]

ASF GitHub Bot logged work on HIVE-21109:

Author: ASF GitHub Bot
Created on: 28/Mar/19 03:11
Start Date: 28/Mar/19 03:11
Worklog Time Spent: 10m
Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269843519

## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java ##
@@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) {
     throw new RuntimeException("Invalid table type : " + getDescType());
   }
 }
+
+ public Long getReplWriteId() {
+if (this.createTblDesc != null) {
+ return this.createTblDesc.getReplWriteId();

Review comment:
I meant, prepareImport already takes writeId (which comes from event message) as input parameter which is being set in CreateTableDesc and later read back by getBaseAddPartitionDescFromPartition. Instead, writeId itself can be passed to getBaseAddPartitionDescFromPartition.

Issue Time Tracking
---
Worklog Id: (was: 219786)
Time Spent: 5h 10m (was: 5h)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219785=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219785 ]

ASF GitHub Bot logged work on HIVE-21109:

Author: ASF GitHub Bot
Created on: 28/Mar/19 03:10
Start Date: 28/Mar/19 03:10
Worklog Time Spent: 10m
Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269257547

## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ##
@@ -987,10 +989,14 @@ public void createTable(Table tbl, boolean ifNotExists,
     tTbl.setPrivileges(principalPrivs);
   }
 }
- // Set table snapshot to api.Table to make it persistent.
- TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
- if (tableSnapshot != null) {
-tbl.getTTable().setWriteId(tableSnapshot.getWriteId());
+ // Set table snapshot to api.Table to make it persistent. A transactional table being
+ // replicated may have a valid write Id copied from the source. Use that instead of
+ // crafting one on the replica.
+ if (tTbl.getWriteId() <= 0) {

Review comment:
DO_NOT_UPDATE_STATS flag should be set in createTableFlow as well. Or else in autogather mode at target, it will be updated automatically. Not sure if it is needed as table itself is not there in metastore. Anyways, please check if needed.

Issue Time Tracking
---
Worklog Id: (was: 219785)
Time Spent: 5h (was: 4h 50m)
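The code comment in the createTable hunk above says a replicated transactional table may already carry a writeId copied from the source, and that this should win over one crafted on the replica. The sketch below restates that rule in isolation. The class `CreateTableWriteIdSketch` and method `chooseWriteId` are hypothetical names, and the fallback when no snapshot exists is an assumption because the hunk is truncated.

```java
/** Illustrative stand-in; this is not Hive's Hive#createTable. */
final class CreateTableWriteIdSketch {

  /**
   * Picks the writeId to persist with a table being created.
   *
   * @param writeIdFromSource writeId already set on the table object (copied from the
   *                          source during replication), or a value <= 0 if none was set
   * @param snapshotWriteId   writeId of the locally computed table snapshot, or null if
   *                          no snapshot is available
   */
  static long chooseWriteId(long writeIdFromSource, Long snapshotWriteId) {
    if (writeIdFromSource > 0) {
      return writeIdFromSource;   // replication: keep the source's writeId
    }
    if (snapshotWriteId != null) {
      return snapshotWriteId;     // regular create: use the local snapshot
    }
    return writeIdFromSource;     // assumption: leave the value untouched otherwise
  }

  public static void main(String[] args) {
    System.out.println(chooseWriteId(42L, 7L));   // prints 42
    System.out.println(chooseWriteId(-1L, 7L));   // prints 7
    System.out.println(chooseWriteId(0L, null));  // prints 0
  }
}
```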
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219784=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219784 ]

ASF GitHub Bot logged work on HIVE-21109:

Author: ASF GitHub Bot
Created on: 28/Mar/19 03:09
Start Date: 28/Mar/19 03:09
Worklog Time Spent: 10m
Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269843519

## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java ##
@@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) {
     throw new RuntimeException("Invalid table type : " + getDescType());
   }
 }
+
+ public Long getReplWriteId() {
+if (this.createTblDesc != null) {
+ return this.createTblDesc.getReplWriteId();

Review comment:
I meant, prepareImport already takes writeId as input parameter which is being set in CreateTableDesc and later read back by getBaseAddPartitionDescFromPartition. Instead, writeId itself can be passed to getBaseAddPartitionDescFromPartition.

Issue Time Tracking
---
Worklog Id: (was: 219784)
Time Spent: 4h 50m (was: 4h 40m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219266=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219266 ]

ASF GitHub Bot logged work on HIVE-21109:

Author: ASF GitHub Bot
Created on: 27/Mar/19 12:03
Start Date: 27/Mar/19 12:03
Worklog Time Spent: 10m
Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269523732

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ##
@@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException {
 } else {
 // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?).
 // For any other updates, we don't want to do txn check on partitions when altering table.
-boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS;
+boolean isTxn = false;
+if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) {
+ // ADDPROPS is used to add repl.last.id during replication. That's not a transactional
+ // change.
+ Map props = alterTbl.getProps();
+ if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) {

Review comment:
The comment

  // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?).
  // For any other updates, we don't want to do txn check on partitions when altering table.

itself looks wrong. I do not see any ADDPROPS usage that updates statistics properties. All of those seem to come through AddPartition and not through alterTable for a partitioned table. So maybe we can safely mark this as non-transactional always. Does that look right?

Issue Time Tracking
---
Worklog Id: (was: 219266)
Time Spent: 4h 40m (was: 4.5h)
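The condition under discussion in the DDLTask hunk can be read as a small predicate. The sketch below is a standalone approximation: the class `AddPropsTxnCheckSketch` and its method are hypothetical, the key string `repl.last.id` is taken from the code comment in the hunk, and treating every other ADDPROPS on a partition as transactional is an assumption, since the hunk is cut off before the branch bodies.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; this is not the code in DDLTask. */
final class AddPropsTxnCheckSketch {
  // Property key named in the hunk's comment; assumed to be what CURR_STATE_ID maps to.
  static final String REPL_LAST_ID = "repl.last.id";

  /** Decides whether an alter-table request should go through the txn check. */
  static boolean isTransactional(boolean hasPartSpec, boolean isAddProps, Map<String, String> props) {
    if (!hasPartSpec || !isAddProps) {
      return false;               // only ADDPROPS on a partition is ever considered
    }
    // An ADDPROPS carrying nothing but the replication state id is bookkeeping only.
    boolean onlyReplState = props.size() <= 1 && props.get(REPL_LAST_ID) != null;
    return !onlyReplState;        // assumption: anything else stays transactional
  }

  public static void main(String[] args) {
    Map<String, String> replOnly = Collections.singletonMap(REPL_LAST_ID, "100");
    Map<String, String> mixed = new HashMap<>();
    mixed.put(REPL_LAST_ID, "100");
    mixed.put("some.other.prop", "x");

    System.out.println(isTransactional(true, true, replOnly));   // prints false
    System.out.println(isTransactional(true, true, mixed));      // prints true
    System.out.println(isTransactional(false, true, mixed));     // prints false
  }
}
```

Under the alternative raised in the review comment (always non-transactional), the predicate would collapse to a constant `false`.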
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219260=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219260 ]

ASF GitHub Bot logged work on HIVE-21109:

Author: ASF GitHub Bot
Created on: 27/Mar/19 11:56
Start Date: 27/Mar/19 11:56
Worklog Time Spent: 10m
Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269523732

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ##
@@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException {
 } else {
 // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?).
 // For any other updates, we don't want to do txn check on partitions when altering table.
-boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS;
+boolean isTxn = false;
+if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) {
+ // ADDPROPS is used to add repl.last.id during replication. That's not a transactional
+ // change.
+ Map props = alterTbl.getProps();
+ if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) {

Review comment:
itself is wrong. I do not see any ADDPROPS usage which is updating transactional properties. All those seem to come through AddPartition and not alterTable for partitioned table. So, may be we can safely mark this as non-transactional always. Does that look right?

Issue Time Tracking
---
Worklog Id: (was: 219260)
Time Spent: 4.5h (was: 4h 20m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219250=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219250 ]

ASF GitHub Bot logged work on HIVE-21109:

Author: ASF GitHub Bot
Created on: 27/Mar/19 11:11
Start Date: 27/Mar/19 11:11
Worklog Time Spent: 10m
Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269508934

## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/CreateTableDesc.java ##
@@ -118,7 +118,8 @@
 List notNullConstraints;
 List defaultConstraints;
 List checkConstraints;
- private ColumnStatistics colStats;
+ private ColumnStatistics colStats; // For the sake of replication
+ private long writeId = -1; // For the sake of replication

Review comment:
I was initially afraid that there could be other side-effects of this change. Your suggestion will bring all writeId replication through replWriteId, which is good. Done.

Issue Time Tracking
---
Worklog Id: (was: 219250)
Time Spent: 4h 20m (was: 4h 10m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219226=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219226 ]

ASF GitHub Bot logged work on HIVE-21109:

Author: ASF GitHub Bot
Created on: 27/Mar/19 09:46
Start Date: 27/Mar/19 09:46
Worklog Time Spent: 10m
Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269476213

## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java ##
@@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) {
     throw new RuntimeException("Invalid table type : " + getDescType());
   }
 }
+
+ public Long getReplWriteId() {
+if (this.createTblDesc != null) {
+ return this.createTblDesc.getReplWriteId();

Review comment:
In getBaseAddPartitionDescFromPartition() where we use this function, we don't have access to the event message. Instead we are passing the writeId through ImportTableDesc by calling setReplWriteId(). This function just introduces the missing getReplWriteId() method symmetric to setReplWriteId(). If we use local variable and pass it around there is a possibility that local writeId variable can go out of sync with that in ImportTableDesc.

Issue Time Tracking
---
Worklog Id: (was: 219226)
Time Spent: 4h 10m (was: 4h)
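The argument in the comment above is that the replicated writeId should live in one place and be read back through a getter symmetric to the setter, instead of also travelling as a separate local variable. A minimal sketch of that delegation follows. Both classes are hypothetical, trimmed-down stand-ins for `CreateTableDesc` and `ImportTableDesc`, and returning `null` when no descriptor is wrapped is an assumption because the hunk is truncated before that branch.

```java
/** Hypothetical stand-in for CreateTableDesc: holds the only copy of the writeId. */
final class CreateTableDescSketch {
  private long replWriteId = -1;

  void setReplWriteId(long id) {
    this.replWriteId = id;
  }

  long getReplWriteId() {
    return replWriteId;
  }
}

/** Hypothetical stand-in for ImportTableDesc: delegates instead of storing a second copy. */
final class ImportTableDescSketch {
  private final CreateTableDescSketch createTblDesc;

  ImportTableDescSketch(CreateTableDescSketch createTblDesc) {
    this.createTblDesc = createTblDesc;
  }

  void setReplWriteId(long id) {
    if (createTblDesc != null) {
      createTblDesc.setReplWriteId(id);
    }
  }

  /** Returns null when there is no wrapped descriptor (assumed behaviour). */
  Long getReplWriteId() {
    return createTblDesc != null ? Long.valueOf(createTblDesc.getReplWriteId()) : null;
  }

  public static void main(String[] args) {
    CreateTableDescSketch create = new CreateTableDescSketch();
    ImportTableDescSketch imported = new ImportTableDescSketch(create);
    imported.setReplWriteId(11L);
    // Both views report the same value because only one field stores it.
    System.out.println(imported.getReplWriteId() + " " + create.getReplWriteId()); // prints 11 11
  }
}
```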
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219220=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219220 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 27/Mar/19 09:27 Start Date: 27/Mar/19 09:27 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269468871 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws int size = addPartitionDesc.getPartitionCount(); List in = new ArrayList(size); -AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); long writeId; String validWriteIdList; -if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) { - writeId = tableSnapshot.getWriteId(); - validWriteIdList = tableSnapshot.getValidWriteIdList(); + +// In case of replication, get the writeId from the source and use valid write Id list +// for replication. +if (addPartitionDesc.getReplicationSpec() != null && Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 219220) Time Spent: 4h (was: 3h 50m) > Stats replication for ACID tables. 
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219218=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219218 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 27/Mar/19 09:24 Start Date: 27/Mar/19 09:24 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269467769 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -3539,10 +3573,19 @@ public boolean equals(Object obj) { } // Update partition column statistics if available -for (Partition newPart : newParts) { - if (newPart.isSetColStats()) { -updatePartitonColStatsInternal(tbl, newPart.getColStats(), null, newPart.getWriteId()); +int cnt = 0; +for (ColumnStatistics partColStats: partsColStats) { + long writeId = partsWriteIds.get(cnt++); + // On replica craft a valid snapshot out of the writeId in the partition + String validWriteIds = null; + if (writeId > 0) { +ValidWriteIdList vwil = Review comment: Done. Please check. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 219218) Time Spent: 3h 50m (was: 3h 40m) > Stats replication for ACID tables. 
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219217=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219217 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 27/Mar/19 09:24 Start Date: 27/Mar/19 09:24 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269467699 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, final Table tbl, // If the table has column statistics, update it into the metastore. This feature is used // by replication to replicate table level statistics. - if (tbl.isSetColStats()) { -// We do not replicate statistics for a transactional table right now and hence we do not -// expect a transactional table to have column statistics here. So passing null -// validWriteIds is fine for now. -updateTableColumnStatsInternal(tbl.getColStats(), null, tbl.getWriteId()); + if (colStats != null) { +// On replica craft a valid snapshot out of the writeId in the table. +long writeId = tbl.getWriteId(); +String validWriteIds = null; +if (writeId > 0) { + ValidWriteIdList vwil = + new ValidReaderWriteIdList(TableName.getDbTable(tbl.getDbName(), Review comment: Done. Please check. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 219217) Time Spent: 3h 40m (was: 3.5h) > Stats replication for ACID tables. 
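For readers following the thread: the create_table_core change reviewed above crafts a valid write-id snapshot on the replica from the writeId carried by the table, and skips that step when the writeId is not positive. The sketch below only illustrates that idea under stated assumptions. ReplWriteIdSnapshot and all of its methods are hypothetical names invented here, not Hive's ValidReaderWriteIdList API, and the "db.table" name format and the absence of open or aborted write ids are assumptions as well.

```java
import java.util.Optional;

/**
 * Hypothetical stand-in for a write-id snapshot crafted on the replica.
 * This is NOT Hive's ValidReaderWriteIdList; it only illustrates the idea.
 */
final class ReplWriteIdSnapshot {
    private final String fullTableName;
    private final long highWatermark;

    private ReplWriteIdSnapshot(String fullTableName, long highWatermark) {
        this.fullTableName = fullTableName;
        this.highWatermark = highWatermark;
    }

    /**
     * Builds a snapshot whose high watermark is the replicated writeId.
     * Returns empty when writeId <= 0, mirroring the "if (writeId > 0)" guard in the diff.
     */
    static Optional<ReplWriteIdSnapshot> fromReplicatedWriteId(
            String dbName, String tableName, long writeId) {
        if (writeId <= 0) {
            return Optional.empty();
        }
        // Assumption: the fully qualified table name is "db.table".
        return Optional.of(new ReplWriteIdSnapshot(dbName + "." + tableName, writeId));
    }

    /**
     * Assumption: no open or aborted write ids exist on the replica, so every
     * positive id up to the high watermark counts as valid.
     */
    boolean isWriteIdValid(long writeId) {
        return writeId > 0 && writeId <= highWatermark;
    }

    String fullTableName() {
        return fullTableName;
    }

    long highWatermark() {
        return highWatermark;
    }
}
```

Returning an empty Optional for a non-positive writeId plays the role of the null validWriteIds string in the diff: the stats update then carries no snapshot at all.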
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219216=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219216 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 27/Mar/19 09:24 Start Date: 27/Mar/19 09:24 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269467626 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, final Table tbl, // If the table has column statistics, update it into the metastore. This feature is used // by replication to replicate table level statistics. - if (tbl.isSetColStats()) { -// We do not replicate statistics for a transactional table right now and hence we do not -// expect a transactional table to have column statistics here. So passing null -// validWriteIds is fine for now. -updateTableColumnStatsInternal(tbl.getColStats(), null, tbl.getWriteId()); + if (colStats != null) { +// On replica craft a valid snapshot out of the writeId in the table. +long writeId = tbl.getWriteId(); +String validWriteIds = null; +if (writeId > 0) { + ValidWriteIdList vwil = Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 219216) Time Spent: 3.5h (was: 3h 20m) > Stats replication for ACID tables. 
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219208 ]

ASF GitHub Bot logged work on HIVE-21109:

Author: ASF GitHub Bot
Created on: 27/Mar/19 09:14
Start Date: 27/Mar/19 09:14
Worklog Time Spent: 10m
Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269463941

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ##
@@ -1894,6 +1898,16 @@ private void create_table_core(final RawStore ms, final Table tbl,
 List checkConstraints) throws AlreadyExistsException, MetaException, InvalidObjectException, NoSuchObjectException, InvalidInputException {
+
+ ColumnStatistics colStats = null;
+ // If the given table has column statistics, save it here. We will update it later.
+ // We don't want it to be part of the Table object being created, lest the create table

Review comment:
The suggested wording says " and also shouldn't be persisted". That's not true: we do persist the table stats, just later. If you let me know which part of the comment is too complex and needs simplification, I will come up with alternate wording that keeps the same meaning.

Issue Time Tracking
---
Worklog Id: (was: 219208)
Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219202=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219202 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 27/Mar/19 08:41 Start Date: 27/Mar/19 08:41 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269452642 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/events/filesystem/FSTableEvent.java ## @@ -199,12 +199,15 @@ private AddPartitionDesc partitionDesc(Path fromPath, // Right now, we do not have a way of associating a writeId with statistics for a table // converted to a transactional table if it was non-transactional on the source. So, do not Review comment: Done. Looks like I missed pushing an entire commit fixing the comments. Done now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 219202) Time Spent: 3h 10m (was: 3h) > Stats replication for ACID tables. > -- > > Key: HIVE-21109 > URL: https://issues.apache.org/jira/browse/HIVE-21109 > Project: Hive > Issue Type: Sub-task >Reporter: Ashutosh Bapat >Assignee: Ashutosh Bapat >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, > HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, > HIVE-21109.06.patch > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Transactional tables require a writeID associated with the stats update. This > writeId needs to be in sync with the writeId on the source and hence needs to > be replicated from the source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219178=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219178 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 27/Mar/19 06:58 Start Date: 27/Mar/19 06:58 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269425412 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java ## @@ -1247,17 +1244,37 @@ private static void createReplImportTasks( } else if (!replicationSpec.isMetadataOnly() && !shouldSkipDataCopyInReplScope(tblDesc, replicationSpec)) { x.getLOG().debug("adding dependent CopyWork/MoveWork for table"); -t.addDependentTask(loadTable(fromURI, table, replicationSpec.isReplace(), -new Path(tblDesc.getLocation()), replicationSpec, x, writeId, stmtId)); +dependentTasks = new ArrayList<>(1); +dependentTasks.add(loadTable(fromURI, table, replicationSpec.isReplace(), + new Path(tblDesc.getLocation()), replicationSpec, + x, writeId, stmtId)); } - if (dropTblTask != null) { -// Drop first and then create -dropTblTask.addDependentTask(t); -x.getTasks().add(dropTblTask); + // During replication, by the time we reply a commit transaction event, the table should + // have been already created when replaying previous events. So no need to create table + // again. For some reason we need create table task for partitioned table though. Review comment: Corrected. The partition case is already fixed, but the comment wasn't corrected. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 219178) Time Spent: 3h (was: 2h 50m) > Stats replication for ACID tables. > -- > > Key: HIVE-21109 > URL: https://issues.apache.org/jira/browse/HIVE-21109 > Project: Hive > Issue Type: Sub-task >Reporter: Ashutosh Bapat >Assignee: Ashutosh Bapat >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, > HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, > HIVE-21109.06.patch > > Time Spent: 3h > Remaining Estimate: 0h > > Transactional tables require a writeID associated with the stats update. This > writeId needs to be in sync with the writeId on the source and hence needs to > be replicated from the source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219177=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219177 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 27/Mar/19 06:52 Start Date: 27/Mar/19 06:52 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269424107 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -828,6 +828,8 @@ public void alterPartitions(String tblName, List newParts, new ArrayList(); try { AcidUtils.TableSnapshot tableSnapshot = null; + // TODO: In case of replication use the writeId and valid write id list constructed for Review comment: I have addressed this comment and removed it as well. But didn't commit the change and thus wasn't part of the PR. I have updated PR. This TODO is no more there. Sorry. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 219177) Time Spent: 2h 50m (was: 2h 40m) > Stats replication for ACID tables. > -- > > Key: HIVE-21109 > URL: https://issues.apache.org/jira/browse/HIVE-21109 > Project: Hive > Issue Type: Sub-task >Reporter: Ashutosh Bapat >Assignee: Ashutosh Bapat >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, > HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, > HIVE-21109.06.patch > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Transactional tables require a writeID associated with the stats update. This > writeId needs to be in sync with the writeId on the source and hence needs to > be replicated from the source. 
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=219176=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219176 ] ASF GitHub Bot logged work on HIVE-21109: - Author: ASF GitHub Bot Created on: 27/Mar/19 06:51 Start Date: 27/Mar/19 06:51 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269423978 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ## @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException { } else { // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?). // For any other updates, we don't want to do txn check on partitions when altering table. -boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS; +boolean isTxn = false; +if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) { + // ADDPROPS is used to add repl.last.id during replication. That's not a transactional + // change. + Map props = alterTbl.getProps(); + if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) { +isTxn = false; + } else { +isTxn = true; + } +} +// TODO: Somehow we have to signal alterPartitions that it's part of replication and +// should use replication's valid writeid list instead of creating one. Review comment: I have addressed this comment and removed it as well. But didn't commit the change and thus wasn't part of the PR. I have updated PR. This TODO is no more there. Sorry. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 219176) Time Spent: 2h 40m (was: 2.5h) > Stats replication for ACID tables. > -- > > Key: HIVE-21109 > URL: https://issues.apache.org/jira/browse/HIVE-21109 > Project: Hive > Issue Type: Sub-task >Reporter: Ashutosh Bapat >Assignee: Ashutosh Bapat >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, > HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, > HIVE-21109.06.patch > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Transactional tables require a writeID associated with the stats update. This > writeId needs to be in sync with the writeId on the source and hence needs to > be replicated from the source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
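The DDLTask hunk in the message above decides whether an ADDPROPS on a partition is a transactional change: it is not when the only property being added is the replication state id. A minimal sketch of that decision follows, written as a standalone helper so it can be read without the Hive classes. AddPropsTxnCheck and isTransactionalChange are hypothetical names, and the constant assumes that ReplicationSpec.KEY.CURR_STATE_ID.toString() yields "repl.last.id", the property name mentioned in the diff comment.

```java
import java.util.Map;

/** Hypothetical helper mirroring the isTxn decision in the reviewed DDLTask hunk. */
final class AddPropsTxnCheck {
    // Assumption: this is the string form of ReplicationSpec.KEY.CURR_STATE_ID.
    static final String REPL_LAST_ID = "repl.last.id";

    private AddPropsTxnCheck() {
    }

    /**
     * @param hasPartSpec whether the alter statement targets a partition
     * @param isAddProps  whether the operation is ADDPROPS
     * @param props       the properties being added
     * @return true if the change should go through the transactional check
     */
    static boolean isTransactionalChange(
            boolean hasPartSpec, boolean isAddProps, Map<String, String> props) {
        if (!hasPartSpec || !isAddProps) {
            return false;
        }
        // Adding only repl.last.id during replication is not a transactional change.
        boolean onlyReplState = props.size() <= 1 && props.get(REPL_LAST_ID) != null;
        return !onlyReplState;
    }
}
```

As in the diff, an empty property map on a partition-level ADDPROPS still counts as transactional, because the replication state id is absent.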
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218858&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218858 ]

ASF GitHub Bot logged work on HIVE-21109:

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m
Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269136269

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java ##
@@ -1247,17 +1244,37 @@ private static void createReplImportTasks(
 } else if (!replicationSpec.isMetadataOnly() && !shouldSkipDataCopyInReplScope(tblDesc, replicationSpec)) {
 x.getLOG().debug("adding dependent CopyWork/MoveWork for table");
-t.addDependentTask(loadTable(fromURI, table, replicationSpec.isReplace(),
-new Path(tblDesc.getLocation()), replicationSpec, x, writeId, stmtId));
+dependentTasks = new ArrayList<>(1);
+dependentTasks.add(loadTable(fromURI, table, replicationSpec.isReplace(),
+ new Path(tblDesc.getLocation()), replicationSpec,
+ x, writeId, stmtId));
 }
- if (dropTblTask != null) {
-// Drop first and then create
-dropTblTask.addDependentTask(t);
-x.getTasks().add(dropTblTask);
+ // During replication, by the time we reply a commit transaction event, the table should
+ // have been already created when replaying previous events. So no need to create table
+ // again. For some reason we need create table task for partitioned table though.

Review comment:
The comment says that a create table task is needed for a partitioned table, but the code always skips it for a commit txn event. Which one is correct?

Issue Time Tracking
---
Worklog Id: (was: 218858)
Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218855&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218855 ]

ASF GitHub Bot logged work on HIVE-21109:

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m
Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269156935

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ##
@@ -1894,6 +1898,16 @@ private void create_table_core(final RawStore ms, final Table tbl,
 List checkConstraints) throws AlreadyExistsException, MetaException, InvalidObjectException, NoSuchObjectException, InvalidInputException {
+
+ ColumnStatistics colStats = null;
+ // If the given table has column statistics, save it here. We will update it later.
+ // We don't want it to be part of the Table object being created, lest the create table

Review comment:
We could simplify the comment: "Column stats are not expected to be part of Create table event and also shouldn't be persisted. So remove it from Table object."

Issue Time Tracking
---
Worklog Id: (was: 218855)
Time Spent: 50m (was: 40m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218867&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218867 ]

ASF GitHub Bot logged work on HIVE-21109:

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m
Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269247183

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ##
@@ -359,17 +383,20 @@ private void testStatsReplicationCommon(boolean parallelBootstrap, boolean metad
 }
 @Test
- public void testForNonAcidTables() throws Throwable {
+ public void testNonParallelBootstrapLoad() throws Throwable {
+LOG.info("Testing " + testName.getClass().getName() + "." + testName.getMethodName());
 testStatsReplicationCommon(false, false);
 }
 @Test
- public void testForNonAcidTablesParallelBootstrapLoad() throws Throwable {
-testStatsReplicationCommon(true, false);
+ public void testForParallelBootstrapLoad() throws Throwable {
+LOG.info("Testing " + testName.getClass().getName() + "." + testName.getMethodName());
+testStatsReplicationCommon(true, false );
 }
 @Test
- public void testNonAcidMetadataOnlyDump() throws Throwable {
+ public void testMetadataOnlyDump() throws Throwable {

Review comment:
Add more tests for the following scenarios.
1. REPL LOAD fails after replicating table or partition objects with stats but before setting the last replId. The retry then takes the alter table/partition replace flows, and the stats should be valid after successful replication. We need this for all of the non-transactional, transactional and migration cases.
2. Parallel inserts with autogather enabled. This gives us events in which multiple txns are open at the time of the update stats event. Also, try to simulate one stats update succeeding and another one invalidating it due to concurrent writes.

Issue Time Tracking
---
Worklog Id: (was: 218867)
Time Spent: 2.5h (was: 2h 20m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218865&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218865 ]

ASF GitHub Bot logged work on HIVE-21109:

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m
Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269262756

## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java ##
@@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) {
 throw new RuntimeException("Invalid table type : " + getDescType());
 }
 }
+
+ public Long getReplWriteId() {
+if (this.createTblDesc != null) {
+ return this.createTblDesc.getReplWriteId();

Review comment:
This replWriteId is just a placeholder for the writeId from the event message. It need not be in CreateTableDesc; it can be kept in a local variable and passed around.

Issue Time Tracking
---
Worklog Id: (was: 218865)
Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218860=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218860 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------
    Author: ASF GitHub Bot
    Created on: 26/Mar/19 18:58
    Start Date: 26/Mar/19 18:58
    Worklog Time Spent: 10m
    Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
    URL: https://github.com/apache/hive/pull/579#discussion_r269220469

 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ##
 @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws
      int size = addPartitionDesc.getPartitionCount();
      List in = new ArrayList(size);
 -    AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
      long writeId;
      String validWriteIdList;
 -    if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
 -      writeId = tableSnapshot.getWriteId();
 -      validWriteIdList = tableSnapshot.getValidWriteIdList();
 +
 +    // In case of replication, get the writeId from the source and use valid write Id list
 +    // for replication.
 +    if (addPartitionDesc.getReplicationSpec() != null &&
 +        addPartitionDesc.getReplicationSpec().isInReplicationScope() &&
 +        addPartitionDesc.getPartition(0).getWriteId() > 0) {
 +      writeId = addPartitionDesc.getPartition(0).getWriteId();
 +      validWriteIdList =

 Review comment:
 In the replication flow, it is fine to use a hardcoded ValidWriteIdList, as we want to forcefully set this writeId into the table or partition objects. Getting it from the current state might be wrong, as we don't update the ValidTxnList in conf for repl-created txns. The ValidWriteIdList is only used to check whether the writeId in metastore objects has been updated by any concurrent inserts. In the repl load flow that is not possible: we replicate one event at a time, and in bootstrap no two threads write into the same table.

Issue Time Tracking
-------------------
    Worklog Id: (was: 218860)
    Time Spent: 1.5h (was: 1h 20m)
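The review comment above argues that a replica can build its write-id snapshot directly from the writeId carried by the event, rather than from the replica's current transaction state. A minimal sketch of that idea follows. It is a simplified model and not Hive's ValidReaderWriteIdList: the class `ReplWriteIdSnapshot` and its methods are hypothetical, and it assumes the snapshot uses the replicated writeId as the high watermark with no open or aborted exceptions.

```java
import java.util.Arrays;

/**
 * Simplified stand-in for a write-id snapshot built on the replica.
 * Hypothetical names throughout; this is not Hive's ValidReaderWriteIdList.
 */
final class ReplWriteIdSnapshot {
    private final String fullTableName;
    private final long highWatermark;
    // Sorted write ids at or below the high watermark that are NOT readable.
    private final long[] exceptions;

    private ReplWriteIdSnapshot(String fullTableName, long highWatermark, long[] exceptions) {
        this.fullTableName = fullTableName;
        this.highWatermark = highWatermark;
        this.exceptions = exceptions;
    }

    /**
     * Builds the "hardcoded" snapshot: the writeId received from the source
     * becomes the high watermark and the exception list is left empty
     * (an assumption of this sketch).
     */
    static ReplWriteIdSnapshot forReplicatedWriteId(String dbName, String tableName, long replWriteId) {
        if (replWriteId <= 0) {
            throw new IllegalArgumentException("expected a positive writeId, got " + replWriteId);
        }
        return new ReplWriteIdSnapshot(dbName + "." + tableName, replWriteId, new long[0]);
    }

    /** A write id is valid if it is positive, not above the watermark, and not an exception. */
    boolean isWriteIdValid(long writeId) {
        return writeId > 0
            && writeId <= highWatermark
            && Arrays.binarySearch(exceptions, writeId) < 0;
    }

    String fullTableName() {
        return fullTableName;
    }
}
```

Because the snapshot depends only on the event's writeId, it does not need the replica's ValidTxnList to be up to date, which is the concern raised in the comment.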
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218863=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218863 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------
    Author: ASF GitHub Bot
    Created on: 26/Mar/19 18:58
    Start Date: 26/Mar/19 18:58
    Worklog Time Spent: 10m
    Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
    URL: https://github.com/apache/hive/pull/579#discussion_r269169210

 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ##
 @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, final Table tbl,
        // If the table has column statistics, update it into the metastore. This feature is used
        // by replication to replicate table level statistics.
 -      if (tbl.isSetColStats()) {
 -        // We do not replicate statistics for a transactional table right now and hence we do not
 -        // expect a transactional table to have column statistics here. So passing null
 -        // validWriteIds is fine for now.
 -        updateTableColumnStatsInternal(tbl.getColStats(), null, tbl.getWriteId());
 +      if (colStats != null) {
 +        // On replica craft a valid snapshot out of the writeId in the table.
 +        long writeId = tbl.getWriteId();
 +        String validWriteIds = null;
 +        if (writeId > 0) {
 +          ValidWriteIdList vwil =
 +              new ValidReaderWriteIdList(TableName.getDbTable(tbl.getDbName(),

 Review comment:
 Shall add a comment on why the hardcoded validWriteList is used in this flow instead of taking current state of txns.

Issue Time Tracking
-------------------
    Worklog Id: (was: 218863)
    Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218856=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218856 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------
    Author: ASF GitHub Bot
    Created on: 26/Mar/19 18:58
    Start Date: 26/Mar/19 18:58
    Worklog Time Spent: 10m
    Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
    URL: https://github.com/apache/hive/pull/579#discussion_r269110947

 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ##
 @@ -828,6 +828,8 @@ public void alterPartitions(String tblName, List newParts,
          new ArrayList();
      try {
        AcidUtils.TableSnapshot tableSnapshot = null;
 +      // TODO: In case of replication use the writeId and valid write id list constructed for

 Review comment:
 Is it done or still TODO?

Issue Time Tracking
-------------------
    Worklog Id: (was: 218856)
    Time Spent: 1h (was: 50m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218854=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218854 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------
    Author: ASF GitHub Bot
    Created on: 26/Mar/19 18:58
    Start Date: 26/Mar/19 18:58
    Worklog Time Spent: 10m
    Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
    URL: https://github.com/apache/hive/pull/579#discussion_r269060256

 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/CreateTableDesc.java ##
 @@ -118,7 +118,8 @@
    List notNullConstraints;
    List defaultConstraints;
    List checkConstraints;
 -  private ColumnStatistics colStats;
 +  private ColumnStatistics colStats; // For the sake of replication
 +  private long writeId = -1; // For the sake of replication

 Review comment:
 Can we re-use the replWriteId variable that we already have?

Issue Time Tracking
-------------------
    Worklog Id: (was: 218854)
    Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218853=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218853 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------
    Author: ASF GitHub Bot
    Created on: 26/Mar/19 18:58
    Start Date: 26/Mar/19 18:58
    Worklog Time Spent: 10m
    Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
    URL: https://github.com/apache/hive/pull/579#discussion_r269098036

 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ##
 @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException {
      } else {
        // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?).
        // For any other updates, we don't want to do txn check on partitions when altering table.
 -      boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS;
 +      boolean isTxn = false;
 +      if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) {
 +        // ADDPROPS is used to add repl.last.id during replication. That's not a transactional
 +        // change.
 +        Map props = alterTbl.getProps();
 +        if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) {
 +          isTxn = false;
 +        } else {
 +          isTxn = true;
 +        }
 +      }
 +      // TODO: Somehow we have to signal alterPartitions that it's part of replication and
 +      // should use replication's valid writeid list instead of creating one.

 Review comment:
 What do you mean by replication's valid writeid list in this comment? Even in repl flow, we get validWriteIdList from HMS based on incoming writeId in the event msg. Are you suggesting to cache this ValidWriteIdList somewhere and use it instead of invoking HMS API?

Issue Time Tracking
-------------------
    Worklog Id: (was: 218853)
    Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218864=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218864 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------
    Author: ASF GitHub Bot
    Created on: 26/Mar/19 18:58
    Start Date: 26/Mar/19 18:58
    Worklog Time Spent: 10m
    Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
    URL: https://github.com/apache/hive/pull/579#discussion_r269223302

 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ##
 @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws
      int size = addPartitionDesc.getPartitionCount();
      List in = new ArrayList(size);
 -    AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
      long writeId;
      String validWriteIdList;
 -    if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
 -      writeId = tableSnapshot.getWriteId();
 -      validWriteIdList = tableSnapshot.getValidWriteIdList();
 +
 +    // In case of replication, get the writeId from the source and use valid write Id list
 +    // for replication.
 +    if (addPartitionDesc.getReplicationSpec() != null &&

 Review comment:
 addPartitionDesc.getReplicationSpec() will never be null. Can remove this check.

Issue Time Tracking
-------------------
    Worklog Id: (was: 218864)
    Time Spent: 2h (was: 1h 50m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218852=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218852 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------
    Author: ASF GitHub Bot
    Created on: 26/Mar/19 18:58
    Start Date: 26/Mar/19 18:58
    Worklog Time Spent: 10m
    Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
    URL: https://github.com/apache/hive/pull/579#discussion_r269081532

 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ##
 @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException {
      } else {
        // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?).
        // For any other updates, we don't want to do txn check on partitions when altering table.
 -      boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS;
 +      boolean isTxn = false;
 +      if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) {
 +        // ADDPROPS is used to add repl.last.id during replication. That's not a transactional
 +        // change.
 +        Map props = alterTbl.getProps();
 +        if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) {

 Review comment:
 ReplUtils.REPL_CHECKPOINT_KEY is another prop we set in the repl flow, and it is not transactional either. This check doesn't seem clean, as in future we might add more such alters in the repl flow. Can we check replicationSpec.isReplicationScope instead, or add another flag in AlterTableDesc to skip this?

Issue Time Tracking
-------------------
    Worklog Id: (was: 218852)
    Time Spent: 20m (was: 10m)
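The concern in the comment above can be illustrated with a small standalone sketch. It contrasts the property-based check from the hunk under review with an explicit flag of the kind the reviewer suggests. Everything here is hypothetical: the class and method names are invented, the checkpoint key string is a placeholder rather than the real value of ReplUtils.REPL_CHECKPOINT_KEY, and the flag is only one possible reading of "another flag in AlterTableDesc".

```java
import java.util.Map;

/** Hypothetical illustration; not code from the patch or from Hive. */
final class AlterTxnCheckSketch {
    // "repl.last.id" is the property named in the code comment of the hunk.
    static final String REPL_LAST_ID_KEY = "repl.last.id";
    // Placeholder value: the real constant's value is not given in the thread.
    static final String REPL_CHECKPOINT_KEY = "example.repl.checkpoint";

    /** Mirrors the check in the hunk: only a lone repl.last.id property skips the txn check. */
    static boolean isTxnByProps(boolean hasPartSpec, boolean isAddProps, Map<String, String> props) {
        if (!hasPartSpec || !isAddProps) {
            return false;
        }
        boolean onlyReplLastId = props.size() <= 1 && props.get(REPL_LAST_ID_KEY) != null;
        return !onlyReplLastId;
    }

    /** Flag-based alternative: the caller states explicitly that the alter is not transactional. */
    static boolean isTxnByFlag(boolean hasPartSpec, boolean isAddProps, boolean skipTxnCheck) {
        return hasPartSpec && isAddProps && !skipTxnCheck;
    }
}
```

With the property-based check, an ADDPROPS alter that carries only the checkpoint key is still treated as transactional, which is the case the reviewer points out. The flag-based variant does not need to know any property names, so it would not have to change when further replication-only alters are added.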
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218861=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218861 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------
    Author: ASF GitHub Bot
    Created on: 26/Mar/19 18:58
    Start Date: 26/Mar/19 18:58
    Worklog Time Spent: 10m
    Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
    URL: https://github.com/apache/hive/pull/579#discussion_r269161871

 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ##
 @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, final Table tbl,
        // If the table has column statistics, update it into the metastore. This feature is used
        // by replication to replicate table level statistics.
 -      if (tbl.isSetColStats()) {
 -        // We do not replicate statistics for a transactional table right now and hence we do not
 -        // expect a transactional table to have column statistics here. So passing null
 -        // validWriteIds is fine for now.
 -        updateTableColumnStatsInternal(tbl.getColStats(), null, tbl.getWriteId());
 +      if (colStats != null) {
 +        // On replica craft a valid snapshot out of the writeId in the table.
 +        long writeId = tbl.getWriteId();
 +        String validWriteIds = null;
 +        if (writeId > 0) {
 +          ValidWriteIdList vwil =

 Review comment:
 Shall use meaningful names instead of "vwil".

Issue Time Tracking
-------------------
    Worklog Id: (was: 218861)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218866=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218866 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------
    Author: ASF GitHub Bot
    Created on: 26/Mar/19 18:58
    Start Date: 26/Mar/19 18:58
    Worklog Time Spent: 10m
    Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
    URL: https://github.com/apache/hive/pull/579#discussion_r269257547

 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ##
 @@ -987,10 +989,14 @@ public void createTable(Table tbl, boolean ifNotExists,
            tTbl.setPrivileges(principalPrivs);
          }
        }
 -      // Set table snapshot to api.Table to make it persistent.
 -      TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
 -      if (tableSnapshot != null) {
 -        tbl.getTTable().setWriteId(tableSnapshot.getWriteId());
 +      // Set table snapshot to api.Table to make it persistent. A transactional table being
 +      // replicated may have a valid write Id copied from the source. Use that instead of
 +      // crafting one on the replica.
 +      if (tTbl.getWriteId() <= 0) {

 Review comment:
 DO_NOT_UPDATE_STATS flag should be set in createTableFlow as well. Or else in autogather mode at target, it will be updated automatically.

Issue Time Tracking
-------------------
    Worklog Id: (was: 218866)
    Time Spent: 2h 20m (was: 2h 10m)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218857=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218857 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------
    Author: ASF GitHub Bot
    Created on: 26/Mar/19 18:58
    Start Date: 26/Mar/19 18:58
    Worklog Time Spent: 10m
    Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
    URL: https://github.com/apache/hive/pull/579#discussion_r269103325

 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/events/filesystem/FSTableEvent.java ##
 @@ -199,12 +199,15 @@ private AddPartitionDesc partitionDesc(Path fromPath,
        // Right now, we do not have a way of associating a writeId with statistics for a table
        // converted to a transactional table if it was non-transactional on the source. So, do not

 Review comment:
 Comment needs to be corrected.

Issue Time Tracking
-------------------
    Worklog Id: (was: 218857)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218859=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218859 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------
    Author: ASF GitHub Bot
    Created on: 26/Mar/19 18:58
    Start Date: 26/Mar/19 18:58
    Worklog Time Spent: 10m
    Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
    URL: https://github.com/apache/hive/pull/579#discussion_r269154738

 ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnCommonUtils.java ##
 @@ -84,6 +86,73 @@ public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, long
      return new ValidReadTxnList(exceptions, outAbortedBits, highWaterMark, minOpenTxnId);
    }

 +  /**
 +   * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse} to a
 +   * {@link org.apache.hadoop.hive.common.ValidTxnList}. This assumes that the caller intends to
 +   * read the files, and thus treats both open and aborted transactions as invalid.
 +   *
 +   * This API is used by Hive replication which may have multiple transactions open at a time.
 +   *
 +   * @param txns open txn list from the metastore
 +   * @param currentTxns Current transactions that the replication has opened. If any of the
 +   *                    transactions is greater than 0 it will be removed from the exceptions
 +   *                    list so that the replication sees its own transaction as valid.
 +   * @return a valid txn list.
 +   */
 +  public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns,

 Review comment:
 The complete logic of considering all txns opened in a batch by open txn event as current txns is incorrect. Multiple txns are opened by repl task only for replicating Hive Streaming case where we allocate txns batch but use one at a time. Also, we don't update stats in that case. Even if we update stats, it should refer to one txn as current txn and rest of the txns are left open.
 Shall remove replTxnIds cache in TxnManager as well. All callers shall create a hardcoded ValidWriteIdList using the writeId received from event msg.

Issue Time Tracking
-------------------
    Worklog Id: (was: 218859)
    Time Spent: 1h 20m (was: 1h 10m)
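The reviewer's point that only one transaction should count as current, while the rest of an allocated batch stays open and therefore invalid for reads, can be sketched as follows. This is a simplified standalone model rather than the TxnCommonUtils code: the class `ReadTxnExceptionsSketch` and its method are hypothetical, and the open transactions are passed as a plain list instead of a GetOpenTxnsResponse.

```java
import java.util.List;

/** Hypothetical illustration of building a read snapshot's exception list. */
final class ReadTxnExceptionsSketch {
    /**
     * Returns the sorted txn ids a reader must treat as invalid: every open txn
     * at or below the high watermark, except the single current txn (if positive).
     */
    static long[] exceptionsFor(List<Long> openTxns, long highWatermark, long currentTxn) {
        return openTxns.stream()
            .mapToLong(Long::longValue)
            .filter(txn -> txn <= highWatermark)
            // Only the one current txn sees its own writes; a non-positive
            // currentTxn means the reader has no txn of its own.
            .filter(txn -> currentTxn <= 0 || txn != currentTxn)
            .sorted()
            .toArray();
    }
}
```

For a batch of open transactions 5, 6 and 7 with 6 as the current one, the exception list is {5, 7}: transaction 6 is readable, and the other two remain invalid until they commit.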
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218862=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218862 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------
    Author: ASF GitHub Bot
    Created on: 26/Mar/19 18:58
    Start Date: 26/Mar/19 18:58
    Worklog Time Spent: 10m
    Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
    URL: https://github.com/apache/hive/pull/579#discussion_r269172695

 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ##
 @@ -3539,10 +3573,19 @@ public boolean equals(Object obj) {
        }

        // Update partition column statistics if available
 -      for (Partition newPart : newParts) {
 -        if (newPart.isSetColStats()) {
 -          updatePartitonColStatsInternal(tbl, newPart.getColStats(), null, newPart.getWriteId());
 +      int cnt = 0;
 +      for (ColumnStatistics partColStats: partsColStats) {
 +        long writeId = partsWriteIds.get(cnt++);
 +        // On replica craft a valid snapshot out of the writeId in the partition
 +        String validWriteIds = null;
 +        if (writeId > 0) {
 +          ValidWriteIdList vwil =

 Review comment:
 Same as above.

Issue Time Tracking
-------------------
    Worklog Id: (was: 218862)
    Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=217846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217846 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------
    Author: ASF GitHub Bot
    Created on: 25/Mar/19 06:45
    Start Date: 25/Mar/19 06:45
    Worklog Time Spent: 10m
    Work Description: ashutosh-bapat commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
    URL: https://github.com/apache/hive/pull/579

 During bootstrap we use a method similar to the one for non-ACID tables to transfer statistics of an ACID table from source to replica. However, installing statistics of an ACID table requires a valid writeId and writeId list. We use the table/partition's latest writeId and a valid transaction list containing only that writeId to install the statistics in the metastore.

 During incremental replication, the writeId is obtained from the UpdateStats event, and a valid writeId list with that writeId marked as valid is used to install the column statistics. Table level statistics are replicated by replaying the corresponding ALTER_TABLE/ALTER_PARTITION event.

 Further, this commit has the following related changes.
 1. The table or the partition associated with the commit transaction event should have been created when replaying corresponding events before the commit transaction event. Thus there is no need to add tasks for creating the table or the partition.
 2. Maintain a list of open replicated transactions and use that to create a valid transactions list when replaying a replicated event.

Issue Time Tracking
-------------------
    Worklog Id: (was: 217846)
    Time Spent: 10m
    Remaining Estimate: 0h