[jira] [Commented] (HIVE-22111) Materialized view based on replicated table might not get refreshed
[ https://issues.apache.org/jira/browse/HIVE-22111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390802#comment-17390802 ]

Peter Vary commented on HIVE-22111:
-----------------------------------

CC: [~kkasa] - I have seen you working on mat views lately

> Materialized view based on replicated table might not get refreshed
> -------------------------------------------------------------------
>
>                 Key: HIVE-22111
>                 URL: https://issues.apache.org/jira/browse/HIVE-22111
>             Project: Hive
>          Issue Type: Bug
>          Components: Materialized views, repl
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Minor
>
> Consider the following scenario:
>  * create a base table which we replicate
>  * create a materialized view in the target hive based on the base table
>  * modify (delete/update) the base table in the source hive
>  * replicate the changes (delete/update) to the target hive
>  * query the materialized view in the target hive
>
> We do not refresh the data, since when the transaction is created by
> replication we set ctc_update_delete to 'N'.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22111) Materialized view based on replicated table might not get refreshed
[ https://issues.apache.org/jira/browse/HIVE-22111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16956488#comment-16956488 ]

Jesus Camacho Rodriguez commented on HIVE-22111:
------------------------------------------------

I have just seen this issue again. If we are replicating the creation metadata for the materialized views, then we cannot just set the flag to 'N'. As a workaround, we could set it to 'Y', even if that means we fall back to a full rebuild the first time we rebuild the materialized view in the cluster with the replica.

AFAIK, materialized view replication has other problems right now; see HIVE-18621 and HIVE-20543.
[jira] [Commented] (HIVE-22111) Materialized view based on replicated table might not get refreshed
[ https://issues.apache.org/jira/browse/HIVE-22111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907221#comment-16907221 ]

Peter Vary commented on HIVE-22111:
-----------------------------------

In the {{TxnHandler.commitTxn}} method, when we store the new commit generated by a replication event, we do this:
{code:java}
s = "insert into COMPLETED_TXN_COMPONENTS (ctc_txnid, ctc_database, " +
    "ctc_table, ctc_partition, ctc_writeid, ctc_update_delete) select tc_txnid," +
    " tc_database, tc_table, tc_partition, tc_writeid, '" + isUpdateDelete +
    "' from TXN_COMPONENTS where tc_txnid = " + txnid +
    //we only track compactor activity in TXN_COMPONENTS to handle the case where the
    //compactor txn aborts - so don't bother copying it to COMPLETED_TXN_COMPONENTS
    " AND tc_operation_type <> " + quoteChar(OperationType.COMPACT.sqlConst);
{code}
See: [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L1227-L1233]

In case of replication, {{isUpdateDelete}} is always 'N'.

{{TxnHandler.getMaterializationInvalidationInfo}} filters out components based on {{ctc_update_delete}}:
{code:java}
query.append("select ctc_update_delete from COMPLETED_TXN_COMPONENTS where ctc_update_delete='Y' AND (");
{code}
See: [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2021]

As I understand it, this means the materialized view misses the change: it is not updated, and queries against it might return wrong results.

We do not have correct update/delete information in case of replication, so the quick fix would be to set {{isUpdateDelete}} to 'Y' every time the commit comes from a replication event. If everything works as I expect, this means we might end up regenerating the materialized view unnecessarily on the target cluster, but we would ensure correct results even in this edge case.

[~jcamachorodriguez]: Would this be an acceptable tradeoff?

Thanks,
Peter
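
The interaction between the two statements quoted in this comment can be modelled in isolation. What follows is a minimal in-memory sketch, not the actual {{TxnHandler}} code: the class {{ReplicatedMvInvalidationSketch}}, the methods {{flagForCommit}} and {{hasInvalidatingChange}}, and the {{assumeUpdateDeleteOnRepl}} switch are all hypothetical names introduced only for illustration. It assumes that a non-replicated commit derives the flag from whether the transaction performed an update or delete, and that a replicated commit always stores 'N' unless the proposed quick fix is enabled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the behaviour discussed in HIVE-22111.
// None of these names exist in Hive; they only mirror the two quoted SQL statements.
class ReplicatedMvInvalidationSketch {

    /** Simplified stand-in for one row of COMPLETED_TXN_COMPONENTS. */
    static final class CompletedTxnComponent {
        final long txnId;
        final String database;
        final String table;
        final char updateDelete; // models ctc_update_delete: 'Y' or 'N'

        CompletedTxnComponent(long txnId, String database, String table, char updateDelete) {
            this.txnId = txnId;
            this.database = database;
            this.table = table;
            this.updateDelete = updateDelete;
        }
    }

    /**
     * Models the value written to ctc_update_delete at commit time.
     * assumeUpdateDeleteOnRepl is a hypothetical switch for the proposed quick fix.
     */
    static char flagForCommit(boolean fromReplication,
                              boolean hadUpdateOrDelete,
                              boolean assumeUpdateDeleteOnRepl) {
        if (fromReplication) {
            // Reported behaviour: always 'N'. Proposed fix: always 'Y'.
            return assumeUpdateDeleteOnRepl ? 'Y' : 'N';
        }
        return hadUpdateOrDelete ? 'Y' : 'N';
    }

    /** Models the ctc_update_delete='Y' filter used when checking for invalidating changes. */
    static boolean hasInvalidatingChange(List<CompletedTxnComponent> rows,
                                         String database, String table) {
        for (CompletedTxnComponent row : rows) {
            if (row.updateDelete == 'Y'
                    && row.database.equals(database)
                    && row.table.equals(table)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A replicated transaction that really contained an update or delete on db.base.
        List<CompletedTxnComponent> reported = new ArrayList<>();
        reported.add(new CompletedTxnComponent(1L, "db", "base",
                flagForCommit(true, true, false)));

        List<CompletedTxnComponent> withFix = new ArrayList<>();
        withFix.add(new CompletedTxnComponent(1L, "db", "base",
                flagForCommit(true, true, true)));

        System.out.println("reported behaviour sees change: "
                + hasInvalidatingChange(reported, "db", "base"));
        System.out.println("proposed fix sees change: "
                + hasInvalidatingChange(withFix, "db", "base"));
    }
}
```

In this model the first check reports no invalidating change even though the source transaction updated or deleted rows, which is the stale materialized view described above. With the switch enabled, every replicated commit counts as invalidating, including insert-only ones, which corresponds to the unnecessary regeneration mentioned as the cost of the quick fix.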
[jira] [Commented] (HIVE-22111) Materialized view based on replicated table might not get refreshed
[ https://issues.apache.org/jira/browse/HIVE-22111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907177#comment-16907177 ]

Peter Vary commented on HIVE-22111:
-----------------------------------

CC: [~jcamachorodriguez], [~sankarh]