[jira] [Commented] (HIVE-22111) Materialized view based on replicated table might not get refreshed

2021-07-30 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390802#comment-17390802
 ] 

Peter Vary commented on HIVE-22111:
---

CC: [~kkasa] - I have seen you working on materialized views lately

> Materialized view based on replicated table might not get refreshed
> ---
>
> Key: HIVE-22111
> URL: https://issues.apache.org/jira/browse/HIVE-22111
> Project: Hive
> Issue Type: Bug
> Components: Materialized views, repl
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Minor
>
> Consider the following scenario:
> * create a base table which we replicate
> * create a materialized view in the target hive based on the base table
> * modify (delete/update) the base table in the source hive
> * replicate the changes (delete/update) to the target hive
> * query the materialized view in the target hive
>  
> We do not refresh the data, since when the transaction is created by 
> replication we set ctc_update_delete to 'N'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22111) Materialized view based on replicated table might not get refreshed

2019-10-21 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16956488#comment-16956488
 ] 

Jesus Camacho Rodriguez commented on HIVE-22111:


I have just seen this issue again. If we are replicating the creation metadata 
for the materialized views, then we cannot simply set the flag to 'N'. As a 
workaround, we could set it to 'Y', even if that means we fall back to a full 
rebuild the first time we rebuild the materialized view in the cluster with 
the replica.
AFAIK, replication of materialized views has other problems right now, see 
HIVE-18621 and HIVE-20543.



[jira] [Commented] (HIVE-22111) Materialized view based on replicated table might not get refreshed

2019-08-14 Thread Peter Vary (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-22111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907221#comment-16907221
 ] 

Peter Vary commented on HIVE-22111:
---

In the {{TxnHandler.commitTxn}} method, when we store the new commit generated 
by a replication event, we do this:
{code:java}
  s = "insert into COMPLETED_TXN_COMPONENTS (ctc_txnid, ctc_database, " +
      "ctc_table, ctc_partition, ctc_writeid, ctc_update_delete) select tc_txnid," +
      " tc_database, tc_table, tc_partition, tc_writeid, '" + isUpdateDelete +
      "' from TXN_COMPONENTS where tc_txnid = " + txnid +
      // we only track compactor activity in TXN_COMPONENTS to handle the case where the
      // compactor txn aborts - so don't bother copying it to COMPLETED_TXN_COMPONENTS
      " AND tc_operation_type <> " + quoteChar(OperationType.COMPACT.sqlConst);
{code}
See: 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L1227-L1233]

In case of replication {{isUpdateDelete}} is always 'N'.

{{TxnHandler.getMaterializationInvalidationInfo}} filters out components based 
on {{ctc_update_delete}}.
{code:java}
  query.append("select ctc_update_delete from COMPLETED_TXN_COMPONENTS " +
      "where ctc_update_delete='Y' AND (");
{code}
See: 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2021]

As I understand it, the Materialized View will therefore miss the change: it 
will not be updated, and queries against it might return wrong results.
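To make the effect concrete, here is a minimal in-memory sketch of that filter. It is not the Hive implementation: the class {{InvalidationSketch}}, the nested {{CompletedTxnComponent}} and the method {{hasUpdateDelete}} are hypothetical names, and the SQL query is replaced by a loop over a list. Only the column name {{ctc_update_delete}} and its 'Y'/'N' values come from the code above; that a locally executed delete is stored with 'Y' is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory model for illustration; not part of Hive.
final class InvalidationSketch {

  // Stand-in for one row of COMPLETED_TXN_COMPONENTS (only the columns needed here).
  static final class CompletedTxnComponent {
    final String table;
    final char updateDelete; // mirrors ctc_update_delete: 'Y' or 'N'

    CompletedTxnComponent(String table, char updateDelete) {
      this.table = table;
      this.updateDelete = updateDelete;
    }
  }

  // Mimics the "ctc_update_delete='Y'" predicate: only rows flagged 'Y'
  // are reported as update/delete activity on the given table.
  static boolean hasUpdateDelete(List<CompletedTxnComponent> rows, String table) {
    for (CompletedTxnComponent row : rows) {
      if (row.table.equals(table) && row.updateDelete == 'Y') {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Assumption: a delete executed directly on the cluster is stored with 'Y'.
    List<CompletedTxnComponent> local =
        Arrays.asList(new CompletedTxnComponent("base", 'Y'));
    // The same delete applied through replication is stored with 'N'.
    List<CompletedTxnComponent> replicated =
        Arrays.asList(new CompletedTxnComponent("base", 'N'));

    System.out.println("local delete seen:      " + hasUpdateDelete(local, "base"));
    System.out.println("replicated delete seen: " + hasUpdateDelete(replicated, "base"));
  }
}
```

In this model the replicated delete is invisible to the check, which is the situation described in the scenario of this issue.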

We do not have correct UpdateDelete information in case of replication, so the 
quick fix would be to set isUpdateDelete to 'Y' whenever the commit comes from 
a replication event. If everything works as I expect, this means that we 
might end up regenerating the Materialized View unnecessarily on the target 
cluster, but we would ensure correct results even in this edge case. 
[~jcamachorodriguez]: Would this be an acceptable tradeoff?
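A rough sketch of how the quick fix could decide the flag value follows. It is only an illustration under assumptions, not a patch: {{ReplFlagSketch}}, {{updateDeleteFlag}} and both parameters are hypothetical, and it assumes the caller already knows whether the commit was produced by a replication event.

```java
// Hypothetical sketch of the proposed quick fix; not the actual TxnHandler code.
final class ReplFlagSketch {

  // Returns the value to store in ctc_update_delete.
  //   fromReplication: the commit was produced by a replication event
  //                    (assumed to be known to the caller).
  //   updateOrDelete:  the transaction is known to contain an update or delete;
  //                    this information is not reliable for replicated commits.
  static char updateDeleteFlag(boolean fromReplication, boolean updateOrDelete) {
    if (fromReplication) {
      // The real operation type is unknown here, so assume the worst case.
      return 'Y';
    }
    return updateOrDelete ? 'Y' : 'N';
  }

  public static void main(String[] args) {
    // A replicated insert-only commit is also flagged 'Y'. This is the
    // "unnecessary regeneration" cost of the quick fix.
    System.out.println(updateDeleteFlag(true, false));
    // Commits that do not come from replication keep their own flag.
    System.out.println(updateDeleteFlag(false, false));
    System.out.println(updateDeleteFlag(false, true));
  }
}
```

The tradeoff is visible in the first call: correctness is preserved for replicated updates and deletes at the price of treating every replicated commit as one.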

Thanks,
 Peter



[jira] [Commented] (HIVE-22111) Materialized view based on replicated table might not get refreshed

2019-08-14 Thread Peter Vary (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-22111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907177#comment-16907177
 ] 

Peter Vary commented on HIVE-22111:
---

CC: [~jcamachorodriguez], [~sankarh]
