[jira] [Assigned] (HIVE-19423) REPL LOAD creates staging directory in source dump directory instead of table data location

2018-05-04 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-19423:
--

Assignee: mahesh kumar behera

> REPL LOAD creates staging directory in source dump directory instead of table 
> data location
> ---
>
> Key: HIVE-19423
> URL: https://issues.apache.org/jira/browse/HIVE-19423
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: Hive, Repl
> Fix For: 3.0.0
>
>
> REPL LOAD creates the staging directory in the source dump directory instead of 
> the table data location. In case of replication from on-prem to cloud, this can 
> cause problems.
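Below is a minimal sketch of the intended placement. It assumes the staging directory should be derived from the target table's data location, keeping that location's scheme and authority, rather than from the dump directory. `StagingDirSketch` and `stagingDirFor` are hypothetical names, not Hive's API. `.hive-staging` is Hive's default value for `hive.exec.stagingdir`.

```java
import java.net.URI;

// Hypothetical helper (not Hive's API): place REPL LOAD's staging directory
// under the target table's data location instead of the dump directory.
final class StagingDirSketch {
    // Default value of hive.exec.stagingdir.
    static final String STAGING_PREFIX = ".hive-staging";

    private StagingDirSketch() {}

    /**
     * Returns a staging directory beneath tableLocation. Scheme and authority
     * (e.g. s3a://bucket) are kept, so staged files land on the same file
     * system as the table data. queryId is assumed to be URI-safe.
     */
    static URI stagingDirFor(URI tableLocation, String queryId) {
        String path = tableLocation.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        } else if (!path.endsWith("/")) {
            path = path + "/";
        }
        // An absolute-path reference inherits the base URI's scheme and authority.
        return tableLocation.resolve(path + STAGING_PREFIX + "_" + queryId);
    }
}
```

For a table at `s3a://bucket/warehouse/db1.db/t1`, the staging directory stays in the same bucket. A later move into the table directory therefore never has to cross from the on-prem dump location to the cloud store.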



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19423) REPL LOAD creates staging directory in source dump directory instead of table data location

2018-05-04 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19423:
---
Status: Patch Available  (was: Open)

[~sankarh] [~thejas]

Please review. Thanks.

> REPL LOAD creates staging directory in source dump directory instead of table 
> data location
> ---
>
> Key: HIVE-19423
> URL: https://issues.apache.org/jira/browse/HIVE-19423
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: Hive, Repl
> Fix For: 3.0.0
>
> Attachments: HIVE-19423.01.patch
>
>
> REPL LOAD creates the staging directory in the source dump directory instead of 
> the table data location. In case of replication from on-prem to cloud, this can 
> cause problems.





[jira] [Updated] (HIVE-19423) REPL LOAD creates staging directory in source dump directory instead of table data location

2018-05-04 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19423:
---
Attachment: HIVE-19423.01.patch

> REPL LOAD creates staging directory in source dump directory instead of table 
> data location
> ---
>
> Key: HIVE-19423
> URL: https://issues.apache.org/jira/browse/HIVE-19423
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: Hive, Repl
> Fix For: 3.0.0
>
> Attachments: HIVE-19423.01.patch
>
>
> REPL LOAD creates the staging directory in the source dump directory instead of 
> the table data location. In case of replication from on-prem to cloud, this can 
> cause problems.





[jira] [Updated] (HIVE-19423) REPL LOAD creates staging directory in source dump directory instead of table data location

2018-05-06 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19423:
---
Attachment: HIVE-19423.02.patch

> REPL LOAD creates staging directory in source dump directory instead of table 
> data location
> ---
>
> Key: HIVE-19423
> URL: https://issues.apache.org/jira/browse/HIVE-19423
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: Hive, Repl, pull-request-available
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-19423.01.patch, HIVE-19423.02.patch
>
>
> REPL LOAD creates the staging directory in the source dump directory instead of 
> the table data location. In case of replication from on-prem to cloud, this can 
> cause problems.





[jira] [Commented] (HIVE-19423) REPL LOAD creates staging directory in source dump directory instead of table data location

2018-05-06 Thread mahesh kumar behera (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465443#comment-16465443
 ] 

mahesh kumar behera commented on HIVE-19423:


[~sankarh]

 

Added a new patch with the fix. Please have a look.

> REPL LOAD creates staging directory in source dump directory instead of table 
> data location
> ---
>
> Key: HIVE-19423
> URL: https://issues.apache.org/jira/browse/HIVE-19423
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: Hive, Repl, pull-request-available
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-19423.01.patch, HIVE-19423.02.patch
>
>
> REPL LOAD creates the staging directory in the source dump directory instead of 
> the table data location. In case of replication from on-prem to cloud, this can 
> cause problems.





[jira] [Updated] (HIVE-19423) REPL LOAD creates staging directory in source dump directory instead of table data location

2018-05-07 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19423:
---
Attachment: HIVE-19423.02.patch

> REPL LOAD creates staging directory in source dump directory instead of table 
> data location
> ---
>
> Key: HIVE-19423
> URL: https://issues.apache.org/jira/browse/HIVE-19423
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: Hive, Repl, pull-request-available
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-19423.01.patch, HIVE-19423.02.patch
>
>
> REPL LOAD creates the staging directory in the source dump directory instead of 
> the table data location. In case of replication from on-prem to cloud, this can 
> cause problems.





[jira] [Updated] (HIVE-19423) REPL LOAD creates staging directory in source dump directory instead of table data location

2018-05-06 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19423:
---
Attachment: (was: HIVE-19423.02.patch)

> REPL LOAD creates staging directory in source dump directory instead of table 
> data location
> ---
>
> Key: HIVE-19423
> URL: https://issues.apache.org/jira/browse/HIVE-19423
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: Hive, Repl, pull-request-available
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-19423.01.patch, HIVE-19423.03.patch
>
>
> REPL LOAD creates the staging directory in the source dump directory instead of 
> the table data location. In case of replication from on-prem to cloud, this can 
> cause problems.





[jira] [Updated] (HIVE-19423) REPL LOAD creates staging directory in source dump directory instead of table data location

2018-05-06 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19423:
---
Attachment: HIVE-19423.03.patch

> REPL LOAD creates staging directory in source dump directory instead of table 
> data location
> ---
>
> Key: HIVE-19423
> URL: https://issues.apache.org/jira/browse/HIVE-19423
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: Hive, Repl, pull-request-available
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-19423.01.patch, HIVE-19423.03.patch
>
>
> REPL LOAD creates the staging directory in the source dump directory instead of 
> the table data location. In case of replication from on-prem to cloud, this can 
> cause problems.





[jira] [Updated] (HIVE-19423) REPL LOAD creates staging directory in source dump directory instead of table data location

2018-05-07 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19423:
---
Attachment: (was: HIVE-19423.03.patch)

> REPL LOAD creates staging directory in source dump directory instead of table 
> data location
> ---
>
> Key: HIVE-19423
> URL: https://issues.apache.org/jira/browse/HIVE-19423
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: Hive, Repl, pull-request-available
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-19423.01.patch, HIVE-19423.02.patch
>
>
> REPL LOAD creates the staging directory in the source dump directory instead of 
> the table data location. In case of replication from on-prem to cloud, this can 
> cause problems.





[jira] [Updated] (HIVE-19488) enable CM root based on db parameter, identifying a db as source of replication.

2018-05-13 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19488:
---
Attachment: HIVE-19488.03.patch

> enable CM root based on db parameter, identifying a db as source of 
> replication.
> 
>
> Key: HIVE-19488
> URL: https://issues.apache.org/jira/browse/HIVE-19488
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
> Attachments: HIVE-19488.01.patch, HIVE-19488.02.patch, 
> HIVE-19488.03.patch
>
>
> * Add a parameter at db level to identify whether it's a source of replication. 
> Beacon will set this.
>  * Enable CM root only for databases that are a source of a replication 
> policy; for other dbs, skip the CM root functionality.
>  * Prevent database drop if the parameter indicating it's a source of 
> replication is set.
>  * As part of the upgrade to this version, Beacon should set the property on all 
> existing database policies in effect.
>  * The parameter should be of the form: repl.source.for : List < policy 
> ids >
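Below is a minimal sketch of the three checks described above. It assumes the value of `repl.source.for` is a comma-separated list of policy ids, because the description only says "List < policy ids >". `ReplSourceCheck` and its methods are hypothetical, not Hive metastore APIs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the db-level "source of replication" checks.
final class ReplSourceCheck {
    static final String REPL_SOURCE_FOR = "repl.source.for";

    private ReplSourceCheck() {}

    /** Policy ids from the db parameters (assumed comma-separated). */
    static List<String> policyIds(Map<String, String> dbParams) {
        String value = dbParams.get(REPL_SOURCE_FOR);
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    /** CM root handling applies only when at least one policy id is set. */
    static boolean isSourceOfReplication(Map<String, String> dbParams) {
        return !policyIds(dbParams).isEmpty();
    }

    /** Rejects dropping a database that is still a replication source. */
    static void checkDropAllowed(String dbName, Map<String, String> dbParams) {
        if (isSourceOfReplication(dbParams)) {
            throw new IllegalStateException("Cannot drop database " + dbName
                    + ": it is a source of replication policies " + policyIds(dbParams));
        }
    }
}
```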





[jira] [Commented] (HIVE-18193) Migrate existing ACID tables to use write id per table rather than global transaction id

2018-05-07 Thread mahesh kumar behera (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466815#comment-16466815
 ] 

mahesh kumar behera commented on HIVE-18193:


With ACID replication not yet supported, if we upgrade all tables to ACID, won't 
that disable replication for all tables?

> Migrate existing ACID tables to use write id per table rather than global 
> transaction id
> 
>
> Key: HIVE-18193
> URL: https://issues.apache.org/jira/browse/HIVE-18193
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, Transactions
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Blocker
>  Labels: ACID, Upgrade
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-18193.01.patch, HIVE-18193.02.patch
>
>
> Dependent upon HIVE-18192.
> For existing ACID tables, we need to update the table-level write id 
> metatables/sequences so that any new operations on these tables work seamlessly, 
> without conflicting with data in existing base/delta files.
> 1. Create metadata tables such as NEXT_WRITE_ID and TXN_TO_WRITE_ID.
> 2. Add an entry for each ACID/MM table into NEXT_WRITE_ID, with NWI_NEXT 
> set to the current value of NEXT_TXN_ID.NTXN_NEXT.
> 3. Every currently open/aborted transaction should have an entry in TXN_TO_WRITE_ID 
> such that T2W_TXNID=T2W_WRITEID=Open/AbortedTxnId.
> 4. Add new columns TC_WRITEID in TXN_COMPONENTS and CTC_WRITEID in 
> COMPLETED_TXN_COMPONENTS to store the write id, which should be set to the 
> respective values of TC_TXNID and CTC_TXNID from the same row.
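Below is an in-memory sketch of steps 2 and 3. Two maps stand in for the NEXT_WRITE_ID and TXN_TO_WRITE_ID tables, whereas a real upgrade would run SQL against the metastore database. The description does not say which tables each open/aborted transaction row covers, so the txn-to-tables input is an assumption. `WriteIdMigrationSketch` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the write-id migration rules.
final class WriteIdMigrationSketch {
    /** Stand-in for NEXT_WRITE_ID: table name -> NWI_NEXT. */
    final Map<String, Long> nextWriteId = new HashMap<>();
    /** Stand-in for TXN_TO_WRITE_ID: "table#txnId" -> T2W_WRITEID. */
    final Map<String, Long> txnToWriteId = new HashMap<>();

    /**
     * @param acidTables        ACID/MM tables needing a NEXT_WRITE_ID row
     * @param ntxnNext          current NEXT_TXN_ID.NTXN_NEXT
     * @param openOrAbortedTxns open/aborted txn id -> tables it wrote (assumed input)
     */
    void migrate(List<String> acidTables, long ntxnNext,
                 Map<Long, List<String>> openOrAbortedTxns) {
        // Step 2: start every table's write ids at the global next txn id, so new
        // write ids cannot collide with txn ids already used in base/delta names.
        for (String table : acidTables) {
            nextWriteId.put(table, ntxnNext);
        }
        // Step 3: each open/aborted txn maps to a write id equal to its txn id.
        for (Map.Entry<Long, List<String>> e : openOrAbortedTxns.entrySet()) {
            for (String table : e.getValue()) {
                txnToWriteId.put(table + "#" + e.getKey(), e.getKey());
            }
        }
    }
}
```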





[jira] [Issue Comment Deleted] (HIVE-18193) Migrate existing ACID tables to use write id per table rather than global transaction id

2018-05-07 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-18193:
---
Comment: was deleted

(was: With ACID replication not yet supported, if we upgrade all tables to 
ACID, won't that disable replication for all tables?)

> Migrate existing ACID tables to use write id per table rather than global 
> transaction id
> 
>
> Key: HIVE-18193
> URL: https://issues.apache.org/jira/browse/HIVE-18193
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, Transactions
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Blocker
>  Labels: ACID, Upgrade
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-18193.01.patch, HIVE-18193.02.patch
>
>
> Dependent upon HIVE-18192.
> For existing ACID tables, we need to update the table-level write id 
> metatables/sequences so that any new operations on these tables work seamlessly, 
> without conflicting with data in existing base/delta files.
> 1. Create metadata tables such as NEXT_WRITE_ID and TXN_TO_WRITE_ID.
> 2. Add an entry for each ACID/MM table into NEXT_WRITE_ID, with NWI_NEXT 
> set to the current value of NEXT_TXN_ID.NTXN_NEXT.
> 3. Every currently open/aborted transaction should have an entry in TXN_TO_WRITE_ID 
> such that T2W_TXNID=T2W_WRITEID=Open/AbortedTxnId.
> 4. Add new columns TC_WRITEID in TXN_COMPONENTS and CTC_WRITEID in 
> COMPLETED_TXN_COMPONENTS to store the write id, which should be set to the 
> respective values of TC_TXNID and CTC_TXNID from the same row.





[jira] [Commented] (HIVE-19248) REPL LOAD couldn't copy file from source CM path and also doesn't throw error if file copy fails.

2018-05-07 Thread mahesh kumar behera (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466829#comment-16466829
 ] 

mahesh kumar behera commented on HIVE-19248:


HIVE-19248.02.patch looks fine to me 

> REPL LOAD couldn't copy file from source CM path and also doesn't throw error 
> if file copy fails.
> -
>
> Key: HIVE-19248
> URL: https://issues.apache.org/jira/browse/HIVE-19248
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, repl
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Blocker
>  Labels: DR, pull-request-available, replication
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-19248.01.patch, HIVE-19248.02.patch
>
>
> Hive replication uses Hadoop distcp to copy files from the primary to the replica 
> warehouse. If the HDFS block size is different across clusters, it causes file 
> copy failures.
> {code:java}
> 2018-04-09 14:32:06,690 ERROR [main] 
> org.apache.hadoop.tools.mapred.CopyMapper: Failure in copying 
> hdfs://chelsea/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/000259_0 to 
> hdfs://marilyn/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/.hive-staging_hive_2018-04-09_14-30-45_723_7153496419225102220-2/-ext-10001/000259_0
> java.io.IOException: File copy failed: 
> hdfs://chelsea/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/000259_0 
> --> 
> hdfs://marilyn/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/.hive-staging_hive_2018-04-09_14-30-45_723_7153496419225102220-2/-ext-10001/000259_0
>  at 
> org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:299)
>  at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:266)
>  at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:52)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
> Caused by: java.io.IOException: Couldn't run retriable-command: Copying 
> hdfs://chelsea/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/000259_0 to 
> hdfs://marilyn/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/.hive-staging_hive_2018-04-09_14-30-45_723_7153496419225102220-2/-ext-10001/000259_0
>  at 
> org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
>  at 
> org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:296)
>  ... 10 more
> Caused by: java.io.IOException: Check-sum mismatch between 
> hdfs://chelsea/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/000259_0 
> and 
> hdfs://marilyn/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/.hive-staging_hive_2018-04-09_14-30-45_723_7153496419225102220-2/-ext-10001/.distcp.tmp.attempt_1522833620762_4416_m_00_0.
>  Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
>  at 
> org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.compareCheckSums(RetriableFileCopyCommand.java:212)
>  at 
> org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:130)
>  at 
> org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:99)
>  at 
> org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
>  ... 11 more
> {code}
> Distcp failed because the CM path for the file doesn't point to the source file 
> system. So, the qualified CM root URI needs to be part of the file paths 
> listed in the dump.
> Also, REPL LOAD returns success even if distcp jobs failed.
> CopyUtils.doCopyRetry doesn't throw an error if the copy fails even after the 
> maximum number of attempts. 
> So, the following needs to be done:
>  # If copying multiple files fails for some reason, retry with the same set 
> of files again, but use the CM path if the original source file is missing or 
> modified (based on checksum). Let distcp skip the properly copied files; 
> FileUtil.copy will always overwrite the files.
>  # If the source path is moved to the CM path, delete the incorrectly copied 
> files.
>  # If the copy fails after the maximum number of attempts, throw an error.
>  
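Below is a minimal sketch of retry items 1 and 3 above. Item 2, deleting incorrectly copied files, is left out. `Copier` is a hypothetical stand-in for a distcp/FileUtil copy of the whole file set, assumed to skip files already copied correctly. `sourceMissingOrModified` stands in for the checksum comparison. None of these names are Hive APIs.

```java
import java.io.IOException;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: retry the same file set, switch to CM paths when the
// source is gone or changed, and fail instead of silently reporting success.
final class CopyRetrySketch {
    /** A file to copy: original source path plus its CM fallback path. */
    static final class ReplFile {
        final String sourcePath;
        final String cmPath;
        boolean useCmPath;

        ReplFile(String sourcePath, String cmPath) {
            this.sourcePath = sourcePath;
            this.cmPath = cmPath;
        }

        String effectivePath() {
            return useCmPath ? cmPath : sourcePath;
        }
    }

    /** Copies the whole set; returns true only if every file was copied. */
    interface Copier {
        boolean copyAll(List<ReplFile> files);
    }

    static void copyWithRetry(List<ReplFile> files, Copier copier,
                              Predicate<ReplFile> sourceMissingOrModified,
                              int maxAttempts) throws IOException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (attempt > 1) {
                // Before retrying, point missing/modified files at their CM copy.
                for (ReplFile f : files) {
                    if (!f.useCmPath && sourceMissingOrModified.test(f)) {
                        f.useCmPath = true;
                    }
                }
            }
            if (copier.copyAll(files)) {
                return;
            }
        }
        throw new IOException("Copy failed after " + maxAttempts + " attempts");
    }
}
```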





[jira] [Commented] (HIVE-19435) Incremental replication cause data loss if a table is dropped followed by create and insert-into with different partition type.

2018-05-10 Thread mahesh kumar behera (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470077#comment-16470077
 ] 

mahesh kumar behera commented on HIVE-19435:


Code changes look fine to me.

> Incremental replication cause data loss if a table is dropped followed by 
> create and insert-into with different partition type.
> ---
>
> Key: HIVE-19435
> URL: https://issues.apache.org/jira/browse/HIVE-19435
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, repl
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, pull-request-available, replication
> Fix For: 3.1.0
>
> Attachments: HIVE-19435.01.patch, HIVE-19435.02.patch, 
> HIVE-19435.03.patch
>
>
> If the incremental dump has a drop of a partitioned table followed by a 
> create/insert on a non-partitioned table with the same name, the data is not 
> replicated, as explained below.
> Let's say we have a partitioned table T1 which was already replicated to the 
> target.
> DROP_TABLE(T1) -> CREATE_TABLE(T1) (non-partitioned) -> INSERT(T1)(10) 
> After REPL LOAD, T1 doesn't have any data.
> The same holds for the non-partitioned to partitioned case and the partition 
> spec mismatch case as well.
>  





[jira] [Assigned] (HIVE-19488) enable CM root based on db parameter, identifying a db as source of replication.

2018-05-10 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-19488:
--


> enable CM root based on db parameter, identifying a db as source of 
> replication.
> 
>
> Key: HIVE-19488
> URL: https://issues.apache.org/jira/browse/HIVE-19488
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 3.1.0
>
>
> * Add a parameter at db level to identify whether it's a source of replication. 
> Beacon will set this.
>  * Enable CM root only for databases that are a source of a replication 
> policy; for other dbs, skip the CM root functionality.
>  * Prevent database drop if the parameter indicating it's a source of 
> replication is set.
>  * As part of the upgrade to this version, Beacon should set the property on all 
> existing database policies in effect.
>  * The parameter should be of the form: repl.source.for : List < policy 
> ids >





[jira] [Updated] (HIVE-19488) enable CM root based on db parameter, identifying a db as source of replication.

2018-05-10 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19488:
---
Status: Patch Available  (was: Open)

> enable CM root based on db parameter, identifying a db as source of 
> replication.
> 
>
> Key: HIVE-19488
> URL: https://issues.apache.org/jira/browse/HIVE-19488
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-19488.01.patch
>
>
> * Add a parameter at db level to identify whether it's a source of replication. 
> Beacon will set this.
>  * Enable CM root only for databases that are a source of a replication 
> policy; for other dbs, skip the CM root functionality.
>  * Prevent database drop if the parameter indicating it's a source of 
> replication is set.
>  * As part of the upgrade to this version, Beacon should set the property on all 
> existing database policies in effect.
>  * The parameter should be of the form: repl.source.for : List < policy 
> ids >





[jira] [Updated] (HIVE-19488) enable CM root based on db parameter, identifying a db as source of replication.

2018-05-10 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19488:
---
Attachment: HIVE-19488.01.patch

> enable CM root based on db parameter, identifying a db as source of 
> replication.
> 
>
> Key: HIVE-19488
> URL: https://issues.apache.org/jira/browse/HIVE-19488
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-19488.01.patch
>
>
> * Add a parameter at db level to identify whether it's a source of replication. 
> Beacon will set this.
>  * Enable CM root only for databases that are a source of a replication 
> policy; for other dbs, skip the CM root functionality.
>  * Prevent database drop if the parameter indicating it's a source of 
> replication is set.
>  * As part of the upgrade to this version, Beacon should set the property on all 
> existing database policies in effect.
>  * The parameter should be of the form: repl.source.for : List < policy 
> ids >





[jira] [Updated] (HIVE-19488) enable CM root based on db parameter, identifying a db as source of replication.

2018-05-12 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19488:
---
Attachment: (was: HIVE-19488.02.patch)

> enable CM root based on db parameter, identifying a db as source of 
> replication.
> 
>
> Key: HIVE-19488
> URL: https://issues.apache.org/jira/browse/HIVE-19488
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
> Attachments: HIVE-19488.01.patch
>
>
> * Add a parameter at db level to identify whether it's a source of replication. 
> Beacon will set this.
>  * Enable CM root only for databases that are a source of a replication 
> policy; for other dbs, skip the CM root functionality.
>  * Prevent database drop if the parameter indicating it's a source of 
> replication is set.
>  * As part of the upgrade to this version, Beacon should set the property on all 
> existing database policies in effect.
>  * The parameter should be of the form: repl.source.for : List < policy 
> ids >





[jira] [Assigned] (HIVE-19512) create thread local hive object for Movetask

2018-05-13 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-19512:
--


> create thread local hive object for Movetask
> 
>
> Key: HIVE-19512
> URL: https://issues.apache.org/jira/browse/HIVE-19512
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
>
> * Add a parameter at db level to identify whether it's a source of replication. 
> Beacon will set this.
>  * Enable CM root only for databases that are a source of a replication 
> policy; for other dbs, skip the CM root functionality.
>  * Prevent database drop if the parameter indicating it's a source of 
> replication is set.
>  * As part of the upgrade to this version, Beacon should set the property on all 
> existing database policies in effect.
>  * The parameter should be of the form: repl.source.for : List < policy 
> ids >





[jira] [Updated] (HIVE-19488) enable CM root based on db parameter, identifying a db as source of replication.

2018-05-13 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19488:
---
Attachment: (was: HIVE-19488.02.patch)

> enable CM root based on db parameter, identifying a db as source of 
> replication.
> 
>
> Key: HIVE-19488
> URL: https://issues.apache.org/jira/browse/HIVE-19488
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
> Attachments: HIVE-19488.01.patch, HIVE-19488.02.patch
>
>
> * Add a parameter at db level to identify whether it's a source of replication. 
> Beacon will set this.
>  * Enable CM root only for databases that are a source of a replication 
> policy; for other dbs, skip the CM root functionality.
>  * Prevent database drop if the parameter indicating it's a source of 
> replication is set.
>  * As part of the upgrade to this version, Beacon should set the property on all 
> existing database policies in effect.
>  * The parameter should be of the form: repl.source.for : List < policy 
> ids >





[jira] [Updated] (HIVE-19488) enable CM root based on db parameter, identifying a db as source of replication.

2018-05-12 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19488:
---
Attachment: HIVE-19488.02.patch

> enable CM root based on db parameter, identifying a db as source of 
> replication.
> 
>
> Key: HIVE-19488
> URL: https://issues.apache.org/jira/browse/HIVE-19488
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
> Attachments: HIVE-19488.01.patch, HIVE-19488.02.patch
>
>
> * Add a parameter at db level to identify whether it's a source of replication. 
> Beacon will set this.
>  * Enable CM root only for databases that are a source of a replication 
> policy; for other dbs, skip the CM root functionality.
>  * Prevent database drop if the parameter indicating it's a source of 
> replication is set.
>  * As part of the upgrade to this version, Beacon should set the property on all 
> existing database policies in effect.
>  * The parameter should be of the form: repl.source.for : List < policy 
> ids >





[jira] [Updated] (HIVE-19512) create thread local hive object for Movetask

2018-05-13 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19512:
---
Description: The MoveTask performs metastore operations. If the Hive 
object is shared between MoveTasks, the metastore throws an out-of-sequence 
error. So each MoveTask should have a Hive object of its own.   (was: * add a
parameter at db level to identify if its a source of replication. beacon will 
set this.

 * Enable CM root only for databases that are a source of a replication policy, 
for other db's skip the CM root functionality.

 * prevent database drop if the parameter indicating its source of a 
replication, is set.

 * as an upgrade to this version, beacon should set the property on all 
existing database policies, in affect.

 * the parameter should be of the form . –  repl.source.for : List < policy ids 
>)

> create thread local hive object for Movetask
> 
>
> Key: HIVE-19512
> URL: https://issues.apache.org/jira/browse/HIVE-19512
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
>
> The MoveTask performs metastore operations. If the Hive object is shared 
> between MoveTasks, the metastore throws an out-of-sequence error. So each 
> MoveTask should have a Hive object of its own.
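One way to give each MoveTask its own object is a `ThreadLocal`, assuming parallel tasks run on separate threads. This minimal sketch uses a hypothetical `MetaClient` in place of Hive's metastore-backed `Hive` object. It illustrates the idea and is not the patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: one client per thread instead of one shared client.
final class ThreadLocalClientSketch {
    /** Stand-in for a non-thread-safe metastore client. */
    static final class MetaClient {
        private static final AtomicInteger CREATED = new AtomicInteger();
        final int id = CREATED.incrementAndGet();
    }

    // Each thread lazily creates its own MetaClient on first access.
    private static final ThreadLocal<MetaClient> CLIENT =
            ThreadLocal.withInitial(MetaClient::new);

    private ThreadLocalClientSketch() {}

    static MetaClient get() {
        return CLIENT.get();
    }
}
```

Calls on the same thread reuse one client, while tasks on other threads never share it. This avoids interleaved requests on a single connection.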





[jira] [Updated] (HIVE-19512) If parallel execution is enabled, metastore is throwing out of sequence error.

2018-05-13 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19512:
---
Summary: If parallel execution is enabled, metastore is throwing out of 
sequence error.  (was: create thread local hive object for Movetask)

> If parallel execution is enabled, metastore is throwing out of sequence error.
> --
>
> Key: HIVE-19512
> URL: https://issues.apache.org/jira/browse/HIVE-19512
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
>
> The move task does metastore operations. If the Hive metadata object is 
> shared between Move tasks, the metastore throws an out-of-sequence error. So 
> each move task should have a Hive object of its own. 





[jira] [Updated] (HIVE-19512) If parallel execution is enabled, metastore is throwing out of sequence error.

2018-05-13 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19512:
---
Labels:   (was: pull-request-available)

> If parallel execution is enabled, metastore is throwing out of sequence error.
> --
>
> Key: HIVE-19512
> URL: https://issues.apache.org/jira/browse/HIVE-19512
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 3.1.0
>
>
> The move task does metastore operations. If the Hive metadata object is 
> shared between Move tasks, the metastore throws an out-of-sequence error. So 
> each move task should have a Hive object of its own. 





[jira] [Updated] (HIVE-19488) enable CM root based on db parameter, identifying a db as source of replication.

2018-05-12 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19488:
---
Attachment: HIVE-19488.02.patch

> enable CM root based on db parameter, identifying a db as source of 
> replication.
> 
>
> Key: HIVE-19488
> URL: https://issues.apache.org/jira/browse/HIVE-19488
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
> Attachments: HIVE-19488.01.patch, HIVE-19488.02.patch
>
>
> * add a parameter at db level to identify whether it is a source of 
> replication. beacon will set this.
>  * Enable CM root only for databases that are a source of a replication 
> policy; for other databases, skip the CM root functionality.
>  * prevent database drop if the parameter indicating it is a source of 
> replication is set.
>  * as an upgrade to this version, beacon should set the property on all 
> existing database policies in effect.
>  * the parameter should be of the form: repl.source.for : List<policy ids>
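A minimal sketch of the three checks described above (is this database a replication source, should CM root apply, may it be dropped). Only the parameter name `repl.source.for` comes from the ticket; storing the policy ids as a comma-separated string in the database parameters, and the `ReplSourceCheck` helper itself, are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the comma-separated encoding of policy ids is an assumption.
final class ReplSourceCheck {
    static final String SOURCE_OF_REPLICATION = "repl.source.for";

    private ReplSourceCheck() {}

    // Parse the policy ids recorded on the database (empty if none).
    static List<String> policyIds(Map<String, String> dbParams) {
        String raw = dbParams == null ? null : dbParams.get(SOURCE_OF_REPLICATION);
        if (raw == null || raw.trim().isEmpty()) {
            return Collections.emptyList();
        }
        List<String> ids = new ArrayList<>();
        for (String id : raw.split(",")) {
            if (!id.trim().isEmpty()) {
                ids.add(id.trim());
            }
        }
        return ids;
    }

    // CM root handling would be applied only when this returns true.
    static boolean isSourceOfReplication(Map<String, String> dbParams) {
        return !policyIds(dbParams).isEmpty();
    }

    // Refuse to drop a database while any policy still names it as a source.
    static void checkDropAllowed(String dbName, Map<String, String> dbParams) {
        if (isSourceOfReplication(dbParams)) {
            throw new IllegalStateException("Cannot drop database " + dbName
                    + ": it is a source of replication policies " + policyIds(dbParams));
        }
    }
}
```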





[jira] [Updated] (HIVE-19488) enable CM root based on db parameter, identifying a db as source of replication.

2018-05-13 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19488:
---
Attachment: HIVE-19488.02.patch

> enable CM root based on db parameter, identifying a db as source of 
> replication.
> 
>
> Key: HIVE-19488
> URL: https://issues.apache.org/jira/browse/HIVE-19488
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
> Attachments: HIVE-19488.01.patch, HIVE-19488.02.patch
>
>
> * add a parameter at db level to identify whether it is a source of 
> replication. beacon will set this.
>  * Enable CM root only for databases that are a source of a replication 
> policy; for other databases, skip the CM root functionality.
>  * prevent database drop if the parameter indicating it is a source of 
> replication is set.
>  * as an upgrade to this version, beacon should set the property on all 
> existing database policies in effect.
>  * the parameter should be of the form: repl.source.for : List<policy ids>





[jira] [Commented] (HIVE-19485) dump directory for non native tables should not be created

2018-05-17 Thread mahesh kumar behera (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478876#comment-16478876
 ] 

mahesh kumar behera commented on HIVE-19485:


The code changes look fine to me.

> dump directory for non native tables should not be created
> --
>
> Key: HIVE-19485
> URL: https://issues.apache.org/jira/browse/HIVE-19485
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: anishek
>Assignee: anishek
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
> Attachments: HIVE-19485.0.patch, HIVE-19485.1.patch, 
> HIVE-19485.2.patch
>
>






[jira] [Updated] (HIVE-19267) Create/Replicate ACID Write event

2018-05-19 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19267:
---
Attachment: HIVE-19267.05.patch

> Create/Replicate ACID Write event
> -
>
> Key: HIVE-19267
> URL: https://issues.apache.org/jira/browse/HIVE-19267
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl, Transactions
>Affects Versions: 3.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: ACID, DR, pull-request-available, replication
> Attachments: HIVE-19267.01.patch, HIVE-19267.02.patch, 
> HIVE-19267.03.patch, HIVE-19267.04.patch, HIVE-19267.05.patch
>
>
>  
> h1. Replicate ACID write Events
>  * Create a new EVENT_WRITE event, with a related message format, to log the 
> write operations within a txn along with the associated data.
>  * Log this event when performing any writes (insert into, insert overwrite, 
> load table, delete, update, merge, truncate) on a table/partition.
>  * If a single MERGE/UPDATE/INSERT/DELETE statement operates on multiple 
> partitions, then log one event per partition.
>  * DbNotificationListener should log this type of event to a special 
> metastore table named "MTxnWriteNotificationLog".
>  * This table should maintain a map of txn ID against the list of 
> tables/partitions written by the given txn.
>  * The entry for a given txn should be removed by the cleaner thread that 
> removes the expired events from EventNotificationTable.
> h1. Replicate Commit Txn operation (with writes)
> Add a new EVENT_COMMIT_TXN to log the metadata/data of all tables/partitions 
> modified within the txn.
> *Source warehouse:*
>  * This event should read the EVENT_WRITEs from the "MTxnWriteNotificationLog" 
> metastore table to consolidate the list of tables/partitions modified within 
> this txn scope.
>  * Based on the list of tables/partitions modified and the table Write ID, 
> compute the list of delta files added by this txn.
>  * Repl dump should read this message and dump the metadata and the list of 
> delta files.
> *Target warehouse:*
>  * Ensure snapshot isolation at the target for ongoing read txns, which 
> shouldn't see the data replicated from a committed txn. (Ensured by the open 
> and allocate write ID events.)
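The bookkeeping described above (one EVENT_WRITE per table or partition, a txn-to-writes mapping consulted at commit, and cleanup once events expire) can be sketched in memory as follows. The design in the ticket keeps this mapping in the "MTxnWriteNotificationLog" metastore table; here a `ConcurrentHashMap` stands in for that table, and `WriteEvent` and `TxnWriteLog` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical record of one EVENT_WRITE: one entry per table or partition written.
final class WriteEvent {
    final long txnId;
    final long writeId;
    final String table;      // e.g. "db.tbl"
    final String partition;  // null for an unpartitioned table

    WriteEvent(long txnId, long writeId, String table, String partition) {
        this.txnId = txnId;
        this.writeId = writeId;
        this.table = table;
        this.partition = partition;
    }
}

// In-memory stand-in for the txn-to-writes mapping kept in the metastore table.
final class TxnWriteLog {
    private final Map<Long, List<WriteEvent>> writesByTxn = new ConcurrentHashMap<>();

    // Called once per table/partition touched by a write statement.
    void logWrite(WriteEvent e) {
        writesByTxn.computeIfAbsent(e.txnId,
                k -> Collections.synchronizedList(new ArrayList<>())).add(e);
    }

    // At commit, consolidate everything the txn wrote so the commit event can
    // carry the metadata and the list of delta files to dump.
    List<WriteEvent> writesForCommit(long txnId) {
        List<WriteEvent> writes = writesByTxn.get(txnId);
        if (writes == null) {
            return Collections.emptyList();
        }
        synchronized (writes) {
            return new ArrayList<>(writes);
        }
    }

    // Mirrors the cleaner thread removing the entry once the events expire.
    void expire(long txnId) {
        writesByTxn.remove(txnId);
    }
}
```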





[jira] [Updated] (HIVE-19488) enable CM root based on db parameter, identifying a db as source of replication.

2018-05-20 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19488:
---
Attachment: HIVE-19488.04.patch

> enable CM root based on db parameter, identifying a db as source of 
> replication.
> 
>
> Key: HIVE-19488
> URL: https://issues.apache.org/jira/browse/HIVE-19488
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
> Attachments: HIVE-19488.01.patch, HIVE-19488.02.patch, 
> HIVE-19488.03.patch, HIVE-19488.04.patch
>
>
> * add a parameter at db level to identify whether it is a source of 
> replication. beacon will set this.
>  * Enable CM root only for databases that are a source of a replication 
> policy; for other databases, skip the CM root functionality.
>  * prevent database drop if the parameter indicating it is a source of 
> replication is set.
>  * as an upgrade to this version, beacon should set the property on all 
> existing database policies in effect.
>  * the parameter should be of the form: repl.source.for : List<policy ids>





[jira] [Updated] (HIVE-19267) Create/Replicate ACID Write event

2018-05-19 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19267:
---
Attachment: HIVE-19267.06.patch

> Create/Replicate ACID Write event
> -
>
> Key: HIVE-19267
> URL: https://issues.apache.org/jira/browse/HIVE-19267
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl, Transactions
>Affects Versions: 3.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: ACID, DR, pull-request-available, replication
> Attachments: HIVE-19267.01.patch, HIVE-19267.02.patch, 
> HIVE-19267.03.patch, HIVE-19267.04.patch, HIVE-19267.05.patch, 
> HIVE-19267.06.patch
>
>
>  
> h1. Replicate ACID write Events
>  * Create a new EVENT_WRITE event, with a related message format, to log the 
> write operations within a txn along with the associated data.
>  * Log this event when performing any writes (insert into, insert overwrite, 
> load table, delete, update, merge, truncate) on a table/partition.
>  * If a single MERGE/UPDATE/INSERT/DELETE statement operates on multiple 
> partitions, then log one event per partition.
>  * DbNotificationListener should log this type of event to a special 
> metastore table named "MTxnWriteNotificationLog".
>  * This table should maintain a map of txn ID against the list of 
> tables/partitions written by the given txn.
>  * The entry for a given txn should be removed by the cleaner thread that 
> removes the expired events from EventNotificationTable.
> h1. Replicate Commit Txn operation (with writes)
> Add a new EVENT_COMMIT_TXN to log the metadata/data of all tables/partitions 
> modified within the txn.
> *Source warehouse:*
>  * This event should read the EVENT_WRITEs from the "MTxnWriteNotificationLog" 
> metastore table to consolidate the list of tables/partitions modified within 
> this txn scope.
>  * Based on the list of tables/partitions modified and the table Write ID, 
> compute the list of delta files added by this txn.
>  * Repl dump should read this message and dump the metadata and the list of 
> delta files.
> *Target warehouse:*
>  * Ensure snapshot isolation at the target for ongoing read txns, which 
> shouldn't see the data replicated from a committed txn. (Ensured by the open 
> and allocate write ID events.)





[jira] [Updated] (HIVE-19488) enable CM root based on db parameter, identifying a db as source of replication.

2018-05-22 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19488:
---
Attachment: HIVE-19488.07.patch

> enable CM root based on db parameter, identifying a db as source of 
> replication.
> 
>
> Key: HIVE-19488
> URL: https://issues.apache.org/jira/browse/HIVE-19488
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
> Attachments: HIVE-19488.01.patch, HIVE-19488.02.patch, 
> HIVE-19488.03.patch, HIVE-19488.04.patch, HIVE-19488.05.patch, 
> HIVE-19488.06.patch, HIVE-19488.07.patch
>
>
> * add a parameter at db level to identify whether it is a source of 
> replication. beacon will set this.
>  * Enable CM root only for databases that are a source of a replication 
> policy; for other databases, skip the CM root functionality.
>  * prevent database drop if the parameter indicating it is a source of 
> replication is set.
>  * as an upgrade to this version, beacon should set the property on all 
> existing database policies in effect.
>  * the parameter should be of the form: repl.source.for : List<policy ids>





[jira] [Updated] (HIVE-19340) Disable timeout of transactions opened by replication task at target cluster

2018-05-24 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19340:
---
Attachment: HIVE-19340.03.patch

> Disable timeout of transactions opened by replication task at target cluster
> 
>
> Key: HIVE-19340
> URL: https://issues.apache.org/jira/browse/HIVE-19340
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl, Transactions
>Affects Versions: 3.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: ACID, DR, pull-request-available, replication
> Attachments: HIVE-19340.01.patch, HIVE-19340.02.patch, 
> HIVE-19340.03.patch
>
>
> The transactions opened by applying EVENT_OPEN_TXN should never be aborted 
> automatically due to a time-out. Aborting a transaction started by a 
> replication task may lead to an inconsistent state at the target, which needs 
> additional overhead to clean up. So, it is proposed to mark the transactions 
> opened by a replication task as special ones that shouldn't be aborted if the 
> heartbeat is lost. This helps ensure that all ABORT and COMMIT events will 
> always find the corresponding txn at the target to operate on.
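A minimal sketch of the proposed rule: the timeout reaper selects txns whose heartbeat has lapsed but never selects txns opened by replication. The `openedByReplication` flag stands in for however the target marks these txns, and `OpenTxn` and `TimeoutReaper` are hypothetical names, not Hive's transaction-handler code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical open-transaction record on the target cluster.
final class OpenTxn {
    final long txnId;
    final long lastHeartbeatMs;
    final boolean openedByReplication; // set when created by applying EVENT_OPEN_TXN

    OpenTxn(long txnId, long lastHeartbeatMs, boolean openedByReplication) {
        this.txnId = txnId;
        this.lastHeartbeatMs = lastHeartbeatMs;
        this.openedByReplication = openedByReplication;
    }
}

final class TimeoutReaper {
    private final long timeoutMs;

    TimeoutReaper(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Returns the ids of txns to abort for a lost heartbeat. Replication txns
    // are skipped so that later ABORT/COMMIT events always find them open.
    List<Long> txnsToAbort(List<OpenTxn> open, long nowMs) {
        List<Long> result = new ArrayList<>();
        for (OpenTxn t : open) {
            boolean timedOut = nowMs - t.lastHeartbeatMs > timeoutMs;
            if (timedOut && !t.openedByReplication) {
                result.add(t.txnId);
            }
        }
        return result;
    }
}
```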





[jira] [Updated] (HIVE-19708) Repl copy retrying with cm path even if the failure is due to network issue

2018-05-24 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19708:
---
Labels:   (was: pull-request-available)

> Repl copy retrying with cm path even if the failure is due to network issue
> ---
>
> Key: HIVE-19708
> URL: https://issues.apache.org/jira/browse/HIVE-19708
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 3.1.0
>
>
> * add a parameter at db level to identify whether it is a source of 
> replication. beacon will set this.
>  * Enable CM root only for databases that are a source of a replication 
> policy; for other databases, skip the CM root functionality.
>  * prevent database drop if the parameter indicating it is a source of 
> replication is set.
>  * as an upgrade to this version, beacon should set the property on all 
> existing database policies in effect.
>  * the parameter should be of the form: repl.source.for : List<policy ids>





[jira] [Assigned] (HIVE-19708) Repl copy retrying with cm path even if the failure is due to network issue

2018-05-24 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-19708:
--


> Repl copy retrying with cm path even if the failure is due to network issue
> ---
>
> Key: HIVE-19708
> URL: https://issues.apache.org/jira/browse/HIVE-19708
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
>
> * add a parameter at db level to identify whether it is a source of 
> replication. beacon will set this.
>  * Enable CM root only for databases that are a source of a replication 
> policy; for other databases, skip the CM root functionality.
>  * prevent database drop if the parameter indicating it is a source of 
> replication is set.
>  * as an upgrade to this version, beacon should set the property on all 
> existing database policies in effect.
>  * the parameter should be of the form: repl.source.for : List<policy ids>





[jira] [Commented] (HIVE-19499) Bootstrap REPL LOAD shall add tasks to create checkpoints for db/tables/partitions.

2018-05-25 Thread mahesh kumar behera (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490360#comment-16490360
 ] 

mahesh kumar behera commented on HIVE-19499:


The code changes look fine to me.

 

> Bootstrap REPL LOAD shall add tasks to create checkpoints for 
> db/tables/partitions.
> ---
>
> Key: HIVE-19499
> URL: https://issues.apache.org/jira/browse/HIVE-19499
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, pull-request-available, replication
> Fix For: 3.1.0
>
> Attachments: HIVE-19499.01.patch, HIVE-19499.02.patch
>
>
> Currently, bootstrap REPL LOAD expects the target database to be empty or 
> not exist in order to start a bootstrap load.
> But this adds overhead when there is a failure in the middle of a bootstrap 
> load, as there is no way to resume it from where it failed. So checkpoints 
> need to be created in tables/partitions to skip the completely loaded objects.
> Use the fully qualified path of the dump directory as a checkpoint 
> identifier. This should be added to the table / partition properties in Hive 
> via a task, as the last task in the DAG for table / partition creation.
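A minimal sketch of the checkpoint idea, assuming table and partition properties can be treated as a plain string map. The property key `repl.checkpoint` and the `BootstrapCheckpoint` helper are hypothetical, since the ticket does not name them.

```java
import java.util.Map;

// Hypothetical helper for resuming a bootstrap load from checkpoints.
final class BootstrapCheckpoint {
    // Hypothetical property key; the ticket does not specify one.
    static final String CHECKPOINT_KEY = "repl.checkpoint";

    private BootstrapCheckpoint() {}

    // True if a previous run of this same dump already finished loading the object.
    static boolean alreadyLoaded(Map<String, String> props, String dumpDir) {
        return props != null && dumpDir.equals(props.get(CHECKPOINT_KEY));
    }

    // Intended to run as the last task for the object, so the mark is only
    // present once the table or partition is fully loaded.
    static void markLoaded(Map<String, String> props, String dumpDir) {
        props.put(CHECKPOINT_KEY, dumpDir);
    }
}
```

Comparing against the full dump path means a checkpoint left by an older dump does not cause an object to be skipped during a newer bootstrap.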





[jira] [Updated] (HIVE-19708) Repl copy retrying with cm path even if the failure is due to network issue

2018-05-25 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19708:
---
Description: 
* During repl load
 ** for filesystem-based copying of a file, if the copy fails due to a connection 
error to the source NameNode, we should recreate the filesystem object.
 ** the retry logic for local file copy should be triggered using the original 
source file path (and not the CM root path), since the failure can be due to 
network issues between the DFSClient and the NN.

 * When listing files in tables / partitions to include them in _files, we 
should add retry logic for failures. The FileSystem object here should also be 
recreated, since the existing one might be in an inconsistent state.

  was:
* add a parameter at db level to identify whether it is a source of replication. 
beacon will set this.

 * Enable CM root only for databases that are a source of a replication policy; 
for other databases, skip the CM root functionality.

 * prevent database drop if the parameter indicating it is a source of 
replication is set.

 * as an upgrade to this version, beacon should set the property on all 
existing database policies in effect.

 * the parameter should be of the form: repl.source.for : List<policy ids>


> Repl copy retrying with cm path even if the failure is due to network issue
> ---
>
> Key: HIVE-19708
> URL: https://issues.apache.org/jira/browse/HIVE-19708
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 3.1.0
>
>
> * During repl load
>  ** for filesystem-based copying of a file, if the copy fails due to a 
> connection error to the source NameNode, we should recreate the filesystem 
> object.
>  ** the retry logic for local file copy should be triggered using the 
> original source file path (and not the CM root path), since the failure can 
> be due to network issues between the DFSClient and the NN.
>  * When listing files in tables / partitions to include them in _files, we 
> should add retry logic for failures. The FileSystem object here should also 
> be recreated, since the existing one might be in an inconsistent state.
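A minimal sketch of the retry policy described above: on a connection error, recreate the client and retry the original source path rather than switching to the CM root path. `CopyClient` is a hypothetical interface standing in for a filesystem client (it is not Hadoop's `FileSystem` API), and treating `FileNotFoundException` as "source deleted, fall back to the CM root copy" is an assumption about when the CM path should still be used.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.function.Supplier;

// Hypothetical abstraction over a filesystem client; not Hadoop's FileSystem API.
interface CopyClient {
    void copy(String src, String dst) throws IOException;
}

final class RetryingCopier {
    private final Supplier<CopyClient> clientFactory;
    private final int maxAttempts;
    private CopyClient client;

    RetryingCopier(Supplier<CopyClient> clientFactory, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.clientFactory = clientFactory;
        this.maxAttempts = maxAttempts;
        this.client = clientFactory.get();
    }

    // A connection error says nothing about whether the source file moved, so
    // retries keep using the ORIGINAL path with a freshly created client. Only a
    // missing source (assumed to mean it was deleted) uses the CM root copy.
    void copy(String srcPath, String cmPath, String dst) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                client.copy(srcPath, dst);
                return;
            } catch (FileNotFoundException missing) {
                client.copy(cmPath, dst);
                return;
            } catch (IOException e) {
                last = e;
                client = clientFactory.get(); // old client may be inconsistent
            }
        }
        throw last;
    }
}
```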





[jira] [Updated] (HIVE-19708) Repl copy retrying with cm path even if the failure is due to network issue

2018-05-25 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19708:
---
Attachment: HIVE-19708.01.patch

> Repl copy retrying with cm path even if the failure is due to network issue
> ---
>
> Key: HIVE-19708
> URL: https://issues.apache.org/jira/browse/HIVE-19708
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-19708.01.patch
>
>
> * During repl load
>  ** for filesystem-based copying of a file, if the copy fails due to a 
> connection error to the source NameNode, we should recreate the filesystem 
> object.
>  ** the retry logic for local file copy should be triggered using the 
> original source file path (and not the CM root path), since the failure can 
> be due to network issues between the DFSClient and the NN.
>  * When listing files in tables / partitions to include them in _files, we 
> should add retry logic for failures. The FileSystem object here should also 
> be recreated, since the existing one might be in an inconsistent state.





[jira] [Updated] (HIVE-19708) Repl copy retrying with cm path even if the failure is due to network issue

2018-05-25 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19708:
---
Status: Patch Available  (was: Open)

> Repl copy retrying with cm path even if the failure is due to network issue
> ---
>
> Key: HIVE-19708
> URL: https://issues.apache.org/jira/browse/HIVE-19708
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-19708.01.patch
>
>
> * During repl load
>  ** for filesystem-based copying of a file, if the copy fails due to a 
> connection error to the source NameNode, we should recreate the filesystem 
> object.
>  ** the retry logic for local file copy should be triggered using the 
> original source file path (and not the CM root path), since the failure can 
> be due to network issues between the DFSClient and the NN.
>  * When listing files in tables / partitions to include them in _files, we 
> should add retry logic for failures. The FileSystem object here should also 
> be recreated, since the existing one might be in an inconsistent state.





[jira] [Updated] (HIVE-19340) Disable timeout of transactions opened by replication task at target cluster

2018-05-22 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19340:
---
Attachment: HIVE-19340.02.patch

> Disable timeout of transactions opened by replication task at target cluster
> 
>
> Key: HIVE-19340
> URL: https://issues.apache.org/jira/browse/HIVE-19340
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl, Transactions
>Affects Versions: 3.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: ACID, DR, pull-request-available, replication
> Attachments: HIVE-19340.01.patch, HIVE-19340.02.patch
>
>
> The transactions opened by applying EVENT_OPEN_TXN should never be aborted 
> automatically due to a time-out. Aborting a transaction started by a 
> replication task may lead to an inconsistent state at the target, which needs 
> additional overhead to clean up. So, it is proposed to mark the transactions 
> opened by a replication task as special ones that shouldn't be aborted if the 
> heartbeat is lost. This helps ensure that all ABORT and COMMIT events will 
> always find the corresponding txn at the target to operate on.





[jira] [Updated] (HIVE-19488) enable CM root based on db parameter, identifying a db as source of replication.

2018-05-21 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19488:
---
Attachment: HIVE-19488.05.patch

> enable CM root based on db parameter, identifying a db as source of 
> replication.
> 
>
> Key: HIVE-19488
> URL: https://issues.apache.org/jira/browse/HIVE-19488
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
> Attachments: HIVE-19488.01.patch, HIVE-19488.02.patch, 
> HIVE-19488.03.patch, HIVE-19488.04.patch, HIVE-19488.05.patch
>
>
> * add a parameter at db level to identify whether it is a source of 
> replication. beacon will set this.
>  * Enable CM root only for databases that are a source of a replication 
> policy; for other databases, skip the CM root functionality.
>  * prevent database drop if the parameter indicating it is a source of 
> replication is set.
>  * as an upgrade to this version, beacon should set the property on all 
> existing database policies in effect.
>  * the parameter should be of the form: repl.source.for : List<policy ids>





[jira] [Updated] (HIVE-19488) enable CM root based on db parameter, identifying a db as source of replication.

2018-05-22 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19488:
---
Attachment: HIVE-19488.06.patch

> enable CM root based on db parameter, identifying a db as source of 
> replication.
> 
>
> Key: HIVE-19488
> URL: https://issues.apache.org/jira/browse/HIVE-19488
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
> Attachments: HIVE-19488.01.patch, HIVE-19488.02.patch, 
> HIVE-19488.03.patch, HIVE-19488.04.patch, HIVE-19488.05.patch, 
> HIVE-19488.06.patch
>
>
> * add a parameter at db level to identify whether it is a source of 
> replication. beacon will set this.
>  * Enable CM root only for databases that are a source of a replication 
> policy; for other databases, skip the CM root functionality.
>  * prevent database drop if the parameter indicating it is a source of 
> replication is set.
>  * as an upgrade to this version, beacon should set the property on all 
> existing database policies in effect.
>  * the parameter should be of the form: repl.source.for : List<policy ids>





[jira] [Updated] (HIVE-19569) alter table db1.t1 rename db2.t2 generates MetaStoreEventListener.onDropTable()

2018-06-09 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19569:
---
Attachment: HIVE-19569.02.patch

> alter table db1.t1 rename db2.t2 generates 
> MetaStoreEventListener.onDropTable()
> ---
>
> Key: HIVE-19569
> URL: https://issues.apache.org/jira/browse/HIVE-19569
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore, Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-19569.01.patch, HIVE-19569.02.patch
>
>
> When renaming a table within the same DB, this operation causes 
> {{MetaStoreEventListener.onAlterTable()}} to fire, but when changing the DB 
> name for a table it causes {{MetaStoreEventListener.onDropTable()}} + 
> {{MetaStoreEventListener.onCreateTable()}}.
> The files from the original table are moved to the new table location.  
> This creates confusing semantics, since any logic in {{onDropTable()}} doesn't 
> know about the larger context, i.e. that there will be a matching 
> {{onCreateTable()}}.
> In particular, this causes a problem for Acid tables, since files moved from 
> the old table use WriteIDs that are not meaningful in the context of the new 
> table.
> The current implementation is due to replication. This should ideally be 
> changed to raise a "not supported" error for tables that are marked for 
> replication.
> cc [~sankarh]
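A minimal sketch of the guard proposed in the ticket: reject a cross-database rename for a table that is marked for replication, and leave same-database renames alone. How a table is "marked for replication" is reduced to a boolean here, and `RenameGuard` is a hypothetical name, not Hive code.

```java
// Hypothetical guard for ALTER TABLE ... RENAME; not Hive's actual implementation.
final class RenameGuard {
    private RenameGuard() {}

    static void checkRename(String oldDb, String newDb, boolean markedForReplication) {
        // Hive database names are case-insensitive.
        boolean crossDb = !oldDb.equalsIgnoreCase(newDb);
        if (crossDb && markedForReplication) {
            throw new UnsupportedOperationException(
                    "Renaming a replicated table to another database is not supported: "
                    + oldDb + " -> " + newDb);
        }
    }
}
```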





[jira] [Updated] (HIVE-19725) Add ability to dump non-native tables in replication metadata dump

2018-06-09 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19725:
---
Attachment: HIVE-19725.04.patch

> Add ability to dump non-native tables in replication metadata dump
> --
>
> Key: HIVE-19725
> URL: https://issues.apache.org/jira/browse/HIVE-19725
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.0.0, 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: Repl, pull-request-available
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19725.01.patch, HIVE-19725.02.patch, 
> HIVE-19725.03.patch, HIVE-19725.04.patch
>
>
> If hive.repl.dump.metadata.only is set to true, allow dumping non-native 
> tables as well.
> Data dump for non-native tables should never be allowed.
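The rule above can be summarized as a small decision function. Only the setting name `hive.repl.dump.metadata.only` comes from the ticket; `DumpPolicy` is hypothetical, and skipping a non-native table entirely when a full (data) dump is requested is an assumption consistent with "data dump for non-native tables should never be allowed".

```java
// Hypothetical decision helper for what to dump for a single table.
final class DumpPolicy {
    enum Action { SKIP, METADATA_ONLY, METADATA_AND_DATA }

    private DumpPolicy() {}

    // metadataOnlyDump mirrors hive.repl.dump.metadata.only.
    static Action forTable(boolean nonNative, boolean metadataOnlyDump) {
        if (metadataOnlyDump) {
            return Action.METADATA_ONLY; // non-native tables are now included here
        }
        // Data for non-native tables is never dumped.
        return nonNative ? Action.SKIP : Action.METADATA_AND_DATA;
    }
}
```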





[jira] [Updated] (HIVE-19267) Create/Replicate ACID Write event

2018-06-09 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19267:
---
Attachment: HIVE-19267.13.patch

> Create/Replicate ACID Write event
> -
>
> Key: HIVE-19267
> URL: https://issues.apache.org/jira/browse/HIVE-19267
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl, Transactions
>Affects Versions: 3.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: ACID, DR, pull-request-available, replication
> Attachments: HIVE-19267.01.patch, HIVE-19267.02.patch, 
> HIVE-19267.03.patch, HIVE-19267.04.patch, HIVE-19267.05.patch, 
> HIVE-19267.06.patch, HIVE-19267.07.patch, HIVE-19267.08.patch, 
> HIVE-19267.09.patch, HIVE-19267.10.patch, HIVE-19267.11.patch, 
> HIVE-19267.12.patch, HIVE-19267.13.patch
>
>
>  
> h1. Replicate ACID write Events
>  * Create a new EVENT_WRITE event, with a related message format, to log the 
> write operations within a txn along with the associated data.
>  * Log this event when performing any write (insert into, insert overwrite, 
> load table, delete, update, merge, truncate) on a table/partition.
>  * If a single MERGE/UPDATE/INSERT/DELETE statement operates on multiple 
> partitions, then one event must be logged per partition.
>  * DbNotificationListener should log this type of event to a special metastore 
> table named "MTxnWriteNotificationLog".
>  * This table should maintain a map from txn ID to the list of 
> tables/partitions written by that txn.
>  * The entry for a given txn should be removed by the cleaner thread that 
> removes the expired events from EventNotificationTable.
> h1. Replicate Commit Txn operation (with writes)
> Add a new EVENT_COMMIT_TXN event to log the metadata/data of all 
> tables/partitions modified within the txn.
> *Source warehouse:*
>  * This event should read the EVENT_WRITEs from the "MTxnWriteNotificationLog" 
> metastore table to consolidate the list of tables/partitions modified within 
> this txn's scope.
>  * Using the list of modified tables/partitions and the table Write ID, compute 
> the list of delta files added by this txn.
>  * Repl dump should read this message and dump the metadata and the list of 
> delta files.
> *Target warehouse:*
>  * Ensure snapshot isolation at the target for ongoing read txns, which should 
> not see the data replicated from a committed txn. (This is ensured by the open 
> txn and allocate write ID events.)
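The bookkeeping described above (one EVENT_WRITE per table/partition, consolidated per txn at commit, purged by the cleaner) can be sketched with an in-memory map standing in for the "MTxnWriteNotificationLog" table. This is a hypothetical Java sketch, not the metastore code: `TxnWriteLog` and its methods are illustrative names, locations are simplified to "db.table/partition", and the delta directory name assumes Hive's zero-padded `delta_<writeId>_<writeId>` convention without the statement-ID suffix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for the "MTxnWriteNotificationLog" table. */
final class TxnWriteLog {

    /** One EVENT_WRITE entry: a table or partition written under a write ID. */
    static final class WriteEntry {
        final String table;      // "db.table" (simplified location)
        final String partition;  // null for unpartitioned tables
        final long writeId;

        WriteEntry(String table, String partition, long writeId) {
            this.table = table;
            this.partition = partition;
            this.writeId = writeId;
        }

        /** Relative delta location, assuming 7-digit padding and no statement-ID suffix. */
        String deltaDir() {
            String delta = String.format("delta_%07d_%07d", writeId, writeId);
            return partition == null ? table + "/" + delta : table + "/" + partition + "/" + delta;
        }
    }

    private final Map<Long, List<WriteEntry>> writesByTxn = new HashMap<>();

    /** Called once per table/partition written, mirroring one EVENT_WRITE each. */
    void recordWrite(long txnId, WriteEntry entry) {
        writesByTxn.computeIfAbsent(txnId, k -> new ArrayList<>()).add(entry);
    }

    /** At commit: consolidate this txn's writes into a de-duplicated list of deltas. */
    List<String> deltaDirsForCommit(long txnId) {
        Set<String> dirs = new LinkedHashSet<>(); // keeps first-seen order
        for (WriteEntry e : writesByTxn.getOrDefault(txnId, List.of())) {
            dirs.add(e.deltaDir());
        }
        return new ArrayList<>(dirs);
    }

    /** Cleaner step: drop a txn's entries once its events have expired. */
    void purge(long txnId) {
        writesByTxn.remove(txnId);
    }

    public static void main(String[] args) {
        TxnWriteLog log = new TxnWriteLog();
        // A MERGE touching two partitions logs one event per partition.
        log.recordWrite(42L, new WriteEntry("sales.orders", "dt=2018-06-01", 5L));
        log.recordWrite(42L, new WriteEntry("sales.orders", "dt=2018-06-02", 5L));
        System.out.println(log.deltaDirsForCommit(42L));
    }
}
```

Using a set at commit time means repeated writes to the same partition within one txn yield a single delta entry in the dump list.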





[jira] [Updated] (HIVE-19267) Create/Replicate ACID Write event

2018-06-09 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19267:
---
Attachment: HIVE-19267.12.patch

> Create/Replicate ACID Write event
> -
>
> Key: HIVE-19267
> URL: https://issues.apache.org/jira/browse/HIVE-19267
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl, Transactions
>Affects Versions: 3.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: ACID, DR, pull-request-available, replication
> Attachments: HIVE-19267.01.patch, HIVE-19267.02.patch, 
> HIVE-19267.03.patch, HIVE-19267.04.patch, HIVE-19267.05.patch, 
> HIVE-19267.06.patch, HIVE-19267.07.patch, HIVE-19267.08.patch, 
> HIVE-19267.09.patch, HIVE-19267.10.patch, HIVE-19267.11.patch, 
> HIVE-19267.12.patch
>
>





[jira] [Updated] (HIVE-19829) Incremental replication load should create tasks in execution phase rather than semantic phase

2018-06-10 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19829:
---
Attachment: HIVE-19829.02.patch

> Incremental replication load should create tasks in execution phase rather 
> than semantic phase
> --
>
> Key: HIVE-19829
> URL: https://issues.apache.org/jira/browse/HIVE-19829
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19829.01.patch, HIVE-19829.02.patch
>
>
> Split the incremental load into multiple iterations. In each iteration, create 
> a number of tasks equal to the configured value.
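One way to picture the split is a helper that cuts the pending events into ordered batches, each yielding at most the configured number of tasks, with the last batch possibly smaller. This Java sketch assumes one task per event for simplicity; `IncrementalLoadBatcher` and `iterations` are hypothetical names, and the real load may create several tasks for a single event.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of batching incremental-load work per iteration. */
final class IncrementalLoadBatcher {

    private IncrementalLoadBatcher() {}

    /**
     * Splits the events into consecutive batches of at most maxTasksPerIteration,
     * preserving order so events are still applied sequentially.
     */
    static <T> List<List<T>> iterations(List<T> events, int maxTasksPerIteration) {
        if (maxTasksPerIteration <= 0) {
            throw new IllegalArgumentException("maxTasksPerIteration must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < events.size(); start += maxTasksPerIteration) {
            int end = Math.min(start + maxTasksPerIteration, events.size());
            batches.add(new ArrayList<>(events.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> events = List.of("e1", "e2", "e3", "e4", "e5");
        for (List<String> batch : iterations(events, 2)) {
            // In the proposed flow, tasks for this batch would be created and run
            // in the execution phase before the next batch is built.
            System.out.println(batch); // [e1, e2], then [e3, e4], then [e5]
        }
    }
}
```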





[jira] [Updated] (HIVE-19267) Create/Replicate ACID Write event

2018-06-10 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19267:
---
Attachment: HIVE-19267.14.patch

> Create/Replicate ACID Write event
> -
>
> Key: HIVE-19267
> URL: https://issues.apache.org/jira/browse/HIVE-19267
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl, Transactions
>Affects Versions: 3.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: ACID, DR, pull-request-available, replication
> Attachments: HIVE-19267.01.patch, HIVE-19267.02.patch, 
> HIVE-19267.03.patch, HIVE-19267.04.patch, HIVE-19267.05.patch, 
> HIVE-19267.06.patch, HIVE-19267.07.patch, HIVE-19267.08.patch, 
> HIVE-19267.09.patch, HIVE-19267.10.patch, HIVE-19267.11.patch, 
> HIVE-19267.12.patch, HIVE-19267.13.patch, HIVE-19267.14.patch
>
>





[jira] [Updated] (HIVE-19725) Add ability to dump non-native tables in replication metadata dump

2018-06-10 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19725:
---
Attachment: HIVE-19725.05.patch

> Add ability to dump non-native tables in replication metadata dump
> --
>
> Key: HIVE-19725
> URL: https://issues.apache.org/jira/browse/HIVE-19725
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.0.0, 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: Repl, pull-request-available
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19725.01.patch, HIVE-19725.02.patch, 
> HIVE-19725.03.patch, HIVE-19725.04.patch, HIVE-19725.05.patch
>
>





[jira] [Updated] (HIVE-19829) Incremental replication load should create tasks in execution phase rather than semantic phase

2018-06-08 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19829:
---
Status: Patch Available  (was: Open)

> Incremental replication load should create tasks in execution phase rather 
> than semantic phase
> --
>
> Key: HIVE-19829
> URL: https://issues.apache.org/jira/browse/HIVE-19829
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19829.01.patch
>
>





[jira] [Assigned] (HIVE-19829) Incremental replication load should create tasks in execution phase rather than semantic phase

2018-06-08 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-19829:
--


> Incremental replication load should create tasks in execution phase rather 
> than semantic phase
> --
>
> Key: HIVE-19829
> URL: https://issues.apache.org/jira/browse/HIVE-19829
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 3.1.0, 4.0.0
>
>





[jira] [Updated] (HIVE-19829) Incremental replication load should create tasks in execution phase rather than semantic phase

2018-06-08 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19829:
---
Attachment: HIVE-19829.01.patch

> Incremental replication load should create tasks in execution phase rather 
> than semantic phase
> --
>
> Key: HIVE-19829
> URL: https://issues.apache.org/jira/browse/HIVE-19829
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19829.01.patch
>
>





[jira] [Commented] (HIVE-19829) Incremental replication load should create tasks in execution phase rather than semantic phase

2018-06-08 Thread mahesh kumar behera (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506413#comment-16506413
 ] 

mahesh kumar behera commented on HIVE-19829:


[~sankarh] Please review the patch

> Incremental replication load should create tasks in execution phase rather 
> than semantic phase
> --
>
> Key: HIVE-19829
> URL: https://issues.apache.org/jira/browse/HIVE-19829
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19829.01.patch
>
>





[jira] [Updated] (HIVE-19812) Disable external table replication by default via a configuration property

2018-06-08 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19812:
---
Attachment: HIVE-19812.02.patch

> Disable external table replication by default via a configuration property
> --
>
> Key: HIVE-19812
> URL: https://issues.apache.org/jira/browse/HIVE-19812
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19812.01.patch, HIVE-19812.02.patch
>
>
> Use a Hive config property to allow external table replication, and give the 
> property a default value that prevents external table replication.
> For metadata-only dumps, Hive repl always exports metadata for external tables.
>  
> REPL_DUMP_EXTERNAL_TABLES("hive.repl.dump.include.external.tables", false,
>     "Indicates if repl dump should include information about external tables. It should be \n"
>     + "used in conjunction with 'hive.repl.dump.metadata.only' set to false. if 'hive.repl.dump.metadata.only' \n"
>     + " is set to true then this config parameter has no effect as external table meta data is flushed \n"
>     + " always by default.")
> This should apply only to replication dump, not to export.
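The intended interaction of the two properties for REPL DUMP can be summarized as a small truth table. This is a hypothetical Java sketch; `ExternalTableDumpRule` and `Scope` are illustrative names, and it assumes that enabling `hive.repl.dump.include.external.tables` outside metadata-only mode dumps external tables like any other table.

```java
/** Hypothetical sketch of the external-table rule for REPL DUMP (not for EXPORT). */
final class ExternalTableDumpRule {

    enum Scope { NONE, METADATA_ONLY, METADATA_AND_DATA }

    private ExternalTableDumpRule() {}

    /**
     * @param metadataOnly    value of hive.repl.dump.metadata.only
     * @param includeExternal value of hive.repl.dump.include.external.tables (default false)
     */
    static Scope forExternalTable(boolean metadataOnly, boolean includeExternal) {
        if (metadataOnly) {
            // External table metadata is always dumped in metadata-only mode;
            // the include flag has no effect here.
            return Scope.METADATA_ONLY;
        }
        return includeExternal ? Scope.METADATA_AND_DATA : Scope.NONE;
    }

    public static void main(String[] args) {
        System.out.println(forExternalTable(false, false)); // NONE (the default)
        System.out.println(forExternalTable(false, true));  // METADATA_AND_DATA
        System.out.println(forExternalTable(true, false));  // METADATA_ONLY
    }
}
```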





[jira] [Updated] (HIVE-19267) Create/Replicate ACID Write event

2018-06-15 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19267:
---
Attachment: HIVE-19267.16.patch

> Create/Replicate ACID Write event
> -
>
> Key: HIVE-19267
> URL: https://issues.apache.org/jira/browse/HIVE-19267
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl, Transactions
>Affects Versions: 3.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: ACID, DR, pull-request-available, replication
> Attachments: HIVE-19267.01.patch, HIVE-19267.02.patch, 
> HIVE-19267.03.patch, HIVE-19267.04.patch, HIVE-19267.05.patch, 
> HIVE-19267.06.patch, HIVE-19267.07.patch, HIVE-19267.08.patch, 
> HIVE-19267.09.patch, HIVE-19267.10.patch, HIVE-19267.11.patch, 
> HIVE-19267.12.patch, HIVE-19267.13.patch, HIVE-19267.14.patch, 
> HIVE-19267.15.patch, HIVE-19267.16.patch
>
>





[jira] [Updated] (HIVE-19725) Add ability to dump non-native tables in replication metadata dump

2018-06-16 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19725:
---
Attachment: HIVE-19725.07.patch

> Add ability to dump non-native tables in replication metadata dump
> --
>
> Key: HIVE-19725
> URL: https://issues.apache.org/jira/browse/HIVE-19725
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.0.0, 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: Repl, pull-request-available
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19725.01.patch, HIVE-19725.02.patch, 
> HIVE-19725.03.patch, HIVE-19725.04.patch, HIVE-19725.05.patch, 
> HIVE-19725.06-branch-3.patch, HIVE-19725.07.patch
>
>





[jira] [Updated] (HIVE-19267) Create/Replicate ACID Write event

2018-06-18 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19267:
---
Attachment: HIVE-19267.17.patch

> Create/Replicate ACID Write event
> -
>
> Key: HIVE-19267
> URL: https://issues.apache.org/jira/browse/HIVE-19267
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl, Transactions
>Affects Versions: 3.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: ACID, DR, pull-request-available, replication
> Attachments: HIVE-19267.01.patch, HIVE-19267.02.patch, 
> HIVE-19267.03.patch, HIVE-19267.04.patch, HIVE-19267.05.patch, 
> HIVE-19267.06.patch, HIVE-19267.07.patch, HIVE-19267.08.patch, 
> HIVE-19267.09.patch, HIVE-19267.10.patch, HIVE-19267.11.patch, 
> HIVE-19267.12.patch, HIVE-19267.13.patch, HIVE-19267.14.patch, 
> HIVE-19267.15.patch, HIVE-19267.16.patch, HIVE-19267.17.patch
>
>





[jira] [Updated] (HIVE-19880) Repl Load to return recoverable vs non-recoverable error codes

2018-06-18 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19880:
---
Attachment: HIVE-19880.04-branch-3.patch

> Repl Load to return recoverable vs non-recoverable error codes
> --
>
> Key: HIVE-19880
> URL: https://issues.apache.org/jira/browse/HIVE-19880
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-19880.01.patch, HIVE-19880.04-branch-3.patch, 
> HIVE-19880.04.patch
>
>
> To enable bootstrap of large databases, the application must be able to keep 
> retrying the bootstrap load until it encounters a fatal error. Hive will decide 
> whether an error is fatal and communicate this to the application via error 
> codes.
> So there should be different error codes for recoverable and non-recoverable 
> failures, propagated to the application as part of running the repl load 
> command.
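On the application side, the proposed error codes would drive a retry loop like the one below. This Java sketch is hypothetical: `ReplLoadRetry` and `runWithRetry` are illustrative names, the repl load invocation is abstracted as an `IntSupplier` returning an error code (0 for success), and the recoverable/fatal classification is passed in as a predicate because Hive's actual code assignments are not stated here.

```java
import java.util.function.IntPredicate;
import java.util.function.IntSupplier;

/** Hypothetical client loop driven by recoverable vs non-recoverable error codes. */
final class ReplLoadRetry {

    static final int SUCCESS = 0;

    private ReplLoadRetry() {}

    /**
     * Re-runs the load until it succeeds, returns a fatal code, or maxAttempts
     * is exhausted. Returns the last code observed.
     */
    static int runWithRetry(IntSupplier replLoad, IntPredicate isRecoverable, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        int code = SUCCESS;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            code = replLoad.getAsInt();
            if (code == SUCCESS || !isRecoverable.test(code)) {
                return code; // finished, or fatal: stop retrying
            }
        }
        return code; // still recoverable, but attempts are exhausted
    }

    public static void main(String[] args) {
        IntPredicate recoverable = c -> c == 1; // hypothetical: 1 = recoverable, 2 = fatal
        int[] results = {1, 1, 0};             // two recoverable failures, then success
        int[] next = {0};
        System.out.println(runWithRetry(() -> results[next[0]++], recoverable, 5)); // 0
    }
}
```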





[jira] [Updated] (HIVE-19725) Add ability to dump non-native tables in replication metadata dump

2018-06-18 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19725:
---
Attachment: HIVE-19725.07-branch-3.patch

> Add ability to dump non-native tables in replication metadata dump
> --
>
> Key: HIVE-19725
> URL: https://issues.apache.org/jira/browse/HIVE-19725
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.0.0, 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: Repl, pull-request-available
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19725.01.patch, HIVE-19725.02.patch, 
> HIVE-19725.03.patch, HIVE-19725.04.patch, HIVE-19725.05.patch, 
> HIVE-19725.06-branch-3.patch, HIVE-19725.07-branch-3.patch, 
> HIVE-19725.07.patch
>
>





[jira] [Assigned] (HIVE-19880) Repl Load to return recoverable vs non-recoverable error codes

2018-06-13 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-19880:
--


> Repl Load to return recoverable vs non-recoverable error codes
> --
>
> Key: HIVE-19880
> URL: https://issues.apache.org/jira/browse/HIVE-19880
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 3.1.0, 4.0.0
>
>
> To enable bootstrap of large databases, beacon must be able to keep retrying 
> the bootstrap load until it encounters a fatal error. Hive will decide whether 
> an error is fatal and communicate this to beacon via error codes.
> So there should be different error codes for recoverable and non-recoverable 
> failures, propagated to beacon as part of running the repl load command.





[jira] [Commented] (HIVE-19880) Repl Load to return recoverable vs non-recoverable error codes

2018-06-13 Thread mahesh kumar behera (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511153#comment-16511153
 ] 

mahesh kumar behera commented on HIVE-19880:


[~sankarh]

Please review the patch

> Repl Load to return recoverable vs non-recoverable error codes
> --
>
> Key: HIVE-19880
> URL: https://issues.apache.org/jira/browse/HIVE-19880
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19880.01.patch
>
>





[jira] [Updated] (HIVE-19880) Repl Load to return recoverable vs non-recoverable error codes

2018-06-13 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19880:
---
Status: Patch Available  (was: Open)

> Repl Load to return recoverable vs non-recoverable error codes
> --
>
> Key: HIVE-19880
> URL: https://issues.apache.org/jira/browse/HIVE-19880
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19880.01.patch
>
>





[jira] [Updated] (HIVE-19880) Repl Load to return recoverable vs non-recoverable error codes

2018-06-13 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19880:
---
Attachment: HIVE-19880.01.patch

> Repl Load to return recoverable vs non-recoverable error codes
> --
>
> Key: HIVE-19880
> URL: https://issues.apache.org/jira/browse/HIVE-19880
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19880.01.patch
>
>





[jira] [Commented] (HIVE-19881) Allow metadata dump for database which are not source of replication

2018-06-13 Thread mahesh kumar behera (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511309#comment-16511309
 ] 

mahesh kumar behera commented on HIVE-19881:


[~sankarh]  [~anishek]

Please review the patch

> Allow metadata dump for database which are not source of replication
> 
>
> Key: HIVE-19881
> URL: https://issues.apache.org/jira/browse/HIVE-19881
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19881.01..patch
>
>
> If the dump is metadata-only, allow the dump even if the database is not a 
> source of replication.
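The relaxed precondition amounts to a single check before the dump starts. A minimal Java sketch, assuming the caller already knows whether the database is marked as a source of replication; `DumpPrecondition.check` is a hypothetical name, not Hive's method.

```java
/** Hypothetical sketch of the REPL DUMP precondition described above. */
final class DumpPrecondition {

    private DumpPrecondition() {}

    /**
     * @param isReplSource whether the database is marked as a source of replication
     * @param metadataOnly value of hive.repl.dump.metadata.only
     * @throws IllegalStateException for a full dump of a non-source database
     */
    static void check(String dbName, boolean isReplSource, boolean metadataOnly) {
        if (!isReplSource && !metadataOnly) {
            throw new IllegalStateException("Database " + dbName
                + " is not a source of replication; only a metadata-only dump is allowed");
        }
    }

    public static void main(String[] args) {
        check("sales", false, true); // allowed: metadata-only dump
        try {
            check("sales", false, false);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```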





[jira] [Updated] (HIVE-19340) Disable timeout of transactions opened by replication task at target cluster

2018-06-13 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19340:
---
Attachment: HIVE-19340.07-branch-3.patch

> Disable timeout of transactions opened by replication task at target cluster
> 
>
> Key: HIVE-19340
> URL: https://issues.apache.org/jira/browse/HIVE-19340
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl, Transactions
>Affects Versions: 3.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: ACID, DR, pull-request-available, replication
> Fix For: 4.0.0
>
> Attachments: HIVE-19340.01.patch, HIVE-19340.02.patch, 
> HIVE-19340.03-branch-3.patch, HIVE-19340.03.patch, 
> HIVE-19340.04-branch-3.patch, HIVE-19340.06-branch-3.patch, 
> HIVE-19340.06.patch, HIVE-19340.07-branch-3.patch
>
>
> Transactions opened by applying EVENT_OPEN_TXN should never be aborted 
> automatically due to a timeout. Aborting a transaction started by a replication 
> task may lead to an inconsistent state at the target, which requires additional 
> clean-up overhead. So, it is proposed to mark transactions opened by a 
> replication task as special ones that are not aborted if the heartbeat is 
> lost. This ensures that all ABORT and COMMIT events will always find the 
> corresponding txn at the target to operate on.
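The proposal can be illustrated with a timeout reaper that skips transactions flagged as opened by replication. This is a hypothetical Java sketch: `TxnTimeoutReaper`, `OpenTxn` and the `openedByReplication` flag are illustrative names, and the real transaction handler tracks txn state in the metastore database rather than in memory.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a txn timeout reaper that exempts replication-opened txns. */
final class TxnTimeoutReaper {

    static final class OpenTxn {
        final long txnId;
        final long lastHeartbeatMs;
        final boolean openedByReplication; // e.g. opened while applying EVENT_OPEN_TXN

        OpenTxn(long txnId, long lastHeartbeatMs, boolean openedByReplication) {
            this.txnId = txnId;
            this.lastHeartbeatMs = lastHeartbeatMs;
            this.openedByReplication = openedByReplication;
        }
    }

    private final Map<Long, OpenTxn> open = new LinkedHashMap<>();
    private final long timeoutMs;

    TxnTimeoutReaper(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    void register(OpenTxn txn) {
        open.put(txn.txnId, txn);
    }

    boolean isOpen(long txnId) {
        return open.containsKey(txnId);
    }

    /** Aborts txns whose heartbeat expired, except replication-opened ones; returns aborted IDs. */
    List<Long> abortTimedOut(long nowMs) {
        List<Long> aborted = new ArrayList<>();
        open.values().removeIf(txn -> {
            boolean expired = nowMs - txn.lastHeartbeatMs > timeoutMs;
            if (expired && !txn.openedByReplication) {
                aborted.add(txn.txnId);
                return true; // removed from the open set, i.e. aborted
            }
            return false;    // still open: not expired, or exempt from timeout
        });
        return aborted;
    }

    public static void main(String[] args) {
        TxnTimeoutReaper reaper = new TxnTimeoutReaper(300_000L);
        reaper.register(new OpenTxn(1L, 0L, false));
        reaper.register(new OpenTxn(2L, 0L, true)); // opened by REPL LOAD
        System.out.println(reaper.abortTimedOut(600_000L)); // [1]
        System.out.println(reaper.isOpen(2L));              // true
    }
}
```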





[jira] [Updated] (HIVE-19881) Allow metadata dump for database which are not source of replication

2018-06-13 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19881:
---
Status: Patch Available  (was: Open)

> Allow metadata dump for database which are not source of replication
> 
>
> Key: HIVE-19881
> URL: https://issues.apache.org/jira/browse/HIVE-19881
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19881.01..patch
>
>
> If the dump is metadata-only, allow the dump even if the db is not a source of
> replication.
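A hedged sketch of the relaxed precondition, written as a hypothetical `DumpPrecondition.check` helper. `metadataOnly` mirrors `hive.repl.dump.metadata.only`; how Hive actually determines whether a db is a replication source is left as a boolean input.

```java
// Hypothetical guard for illustration; the real check in Hive's REPL DUMP path may differ.
final class DumpPrecondition {
    /**
     * @param dbName              database being dumped
     * @param metadataOnly        value of hive.repl.dump.metadata.only
     * @param isReplicationSource whether the db is marked as a source of replication
     */
    static void check(String dbName, boolean metadataOnly, boolean isReplicationSource) {
        if (metadataOnly) {
            return; // metadata-only dumps are allowed for any database
        }
        if (!isReplicationSource) {
            throw new IllegalStateException(
                "Cannot dump database " + dbName + ": it is not a source of replication");
        }
    }
}
```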





[jira] [Assigned] (HIVE-19881) Allow metadata dump for database which are not source of replication

2018-06-13 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-19881:
--


> Allow metadata dump for database which are not source of replication
> 
>
> Key: HIVE-19881
> URL: https://issues.apache.org/jira/browse/HIVE-19881
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 3.1.0, 4.0.0
>
>
> If the dump is metadata-only, allow the dump even if the db is not a source of
> replication.





[jira] [Updated] (HIVE-19881) Allow metadata dump for database which are not source of replication

2018-06-13 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19881:
---
Attachment: HIVE-19881.01..patch

> Allow metadata dump for database which are not source of replication
> 
>
> Key: HIVE-19881
> URL: https://issues.apache.org/jira/browse/HIVE-19881
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19881.01..patch
>
>
> If the dump is metadata-only, allow the dump even if the db is not a source of
> replication.





[jira] [Updated] (HIVE-19829) Incremental replication load should create tasks in execution phase rather than semantic phase

2018-06-13 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19829:
---
Attachment: HIVE-19829.04.patch

> Incremental replication load should create tasks in execution phase rather 
> than semantic phase
> --
>
> Key: HIVE-19829
> URL: https://issues.apache.org/jira/browse/HIVE-19829
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19829.01.patch, HIVE-19829.02.patch, 
> HIVE-19829.03.patch, HIVE-19829.04.patch
>
>
> Split the incremental load into multiple iterations. In each iteration, create
> a number of tasks equal to the configured value.
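One way to picture the iteration split is a plain batching helper. This sketch assumes one load task per event and treats the configured value as a hypothetical `maxTasksPerIteration` argument; Hive's real task builder may count tasks differently.

```java
import java.util.ArrayList;
import java.util.List;

final class IncrementalLoadBatcher {
    /** Splits events into consecutive iterations of at most maxTasksPerIteration each. */
    static <E> List<List<E>> splitIntoIterations(List<E> events, int maxTasksPerIteration) {
        if (maxTasksPerIteration <= 0) {
            throw new IllegalArgumentException("maxTasksPerIteration must be positive");
        }
        List<List<E>> iterations = new ArrayList<>();
        for (int start = 0; start < events.size(); start += maxTasksPerIteration) {
            int end = Math.min(start + maxTasksPerIteration, events.size());
            // Copy so each iteration is independent of the source list.
            iterations.add(new ArrayList<>(events.subList(start, end)));
        }
        return iterations;
    }
}
```

With five events and a limit of 2, the load would run in three iterations of sizes 2, 2 and 1.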





[jira] [Updated] (HIVE-19725) Add ability to dump non-native tables in replication metadata dump

2018-06-13 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19725:
---
Attachment: HIVE-19725.06-branch-3.patch

> Add ability to dump non-native tables in replication metadata dump
> --
>
> Key: HIVE-19725
> URL: https://issues.apache.org/jira/browse/HIVE-19725
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.0.0, 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: Repl, pull-request-available
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19725.01.patch, HIVE-19725.02.patch, 
> HIVE-19725.03.patch, HIVE-19725.04.patch, HIVE-19725.05.patch, 
> HIVE-19725.06-branch-3.patch
>
>
> If hive.repl.dump.metadata.only is set to true, allow dumping non-native
> tables as well.
> Data dump for non-native tables should never be allowed.
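The rule can be sketched as a small decision function. `NonNativeTableDumpPolicy` and its `Action` enum are hypothetical; "non-native" here means a table backed by a storage handler rather than by Hive-managed files.

```java
// Hypothetical decision helper, not Hive's dump code.
final class NonNativeTableDumpPolicy {
    enum Action { SKIP, METADATA_ONLY, METADATA_AND_DATA }

    /**
     * @param isNonNative      true if the table is backed by a storage handler
     * @param metadataOnlyDump value of hive.repl.dump.metadata.only
     */
    static Action decide(boolean isNonNative, boolean metadataOnlyDump) {
        if (metadataOnlyDump) {
            return Action.METADATA_ONLY; // native and non-native tables both contribute metadata
        }
        if (isNonNative) {
            return Action.SKIP; // data dump for non-native tables is never allowed
        }
        return Action.METADATA_AND_DATA;
    }
}
```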





[jira] [Commented] (HIVE-19739) Bootstrap REPL LOAD to use checkpoints to validate and skip the loaded data/metadata.

2018-06-12 Thread mahesh kumar behera (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509614#comment-16509614
 ] 

mahesh kumar behera commented on HIVE-19739:


Patch 04 looks fine.

> Bootstrap REPL LOAD to use checkpoints to validate and skip the loaded 
> data/metadata.
> -
>
> Key: HIVE-19739
> URL: https://issues.apache.org/jira/browse/HIVE-19739
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, pull-request-available, replication
> Fix For: 4.0.0
>
> Attachments: HIVE-19739.01.patch, HIVE-19739.02.patch, 
> HIVE-19739.03.patch, HIVE-19739.04.patch
>
>
> Currently, bootstrap REPL LOAD adds checkpoint identifiers to the
> DB/table/partition object properties once the data/metadata related to the
> object is successfully loaded.
> If the Db exists and is not empty, we currently throw an exception, but this
> case needs to be supported for the retry scenario after a failure.
> If bootstrap load is retried using the same dump, then instead of throwing an
> error, we should use the checkpoint identifiers to check whether any of the
> tables/partitions are completely loaded. If yes, skip them; otherwise
> drop/create them again.
> If the bootstrap load is performed using a different dump, then it should throw
> an exception.
> Allow bootstrap on an empty Db only if the ckpt property is not set. Also, if
> bootstrap load has completed on the target Db, bootstrap retry shouldn't be
> allowed at all.
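A minimal sketch of the per-object retry decision described above, assuming the checkpoint property stores an identifier of the dump that completed the object. `BootstrapCheckpoint`, `CKPT_KEY` and its value are illustrative names, not Hive constants.

```java
import java.util.Map;

final class BootstrapCheckpoint {
    // Illustrative property key; the real checkpoint key used by Hive may differ.
    static final String CKPT_KEY = "example.repl.ckpt";

    enum Decision { RELOAD, SKIP, FAIL_DIFFERENT_DUMP }

    /**
     * @param props       properties of the target table/partition (null if it does not exist)
     * @param currentDump identifier of the dump being loaded, e.g. its directory
     */
    static Decision decide(Map<String, String> props, String currentDump) {
        String ckpt = (props == null) ? null : props.get(CKPT_KEY);
        if (ckpt == null) {
            return Decision.RELOAD; // never completed: drop (if present) and create again
        }
        if (ckpt.equals(currentDump)) {
            return Decision.SKIP; // fully loaded by an earlier attempt with the same dump
        }
        return Decision.FAIL_DIFFERENT_DUMP; // loaded from another dump: throw an exception
    }
}
```

The db-level rules (allow bootstrap on an empty Db only if no checkpoint is set, and refuse any retry once bootstrap has completed) would sit in front of this per-object check and are left out for brevity.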





[jira] [Updated] (HIVE-19812) Disable external table replication by default via a configuration property

2018-06-14 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19812:
---
Attachment: HIVE-19812.04.patch

> Disable external table replication by default via a configuration property
> --
>
> Key: HIVE-19812
> URL: https://issues.apache.org/jira/browse/HIVE-19812
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19812.01.patch, HIVE-19812.02.patch, 
> HIVE-19812.03.patch, HIVE-19812.04.patch
>
>
> Use a Hive config property to allow external table replication. By default,
> this property should be set to prevent external table replication.
> For metadata-only dumps, Hive repl always exports metadata for external tables.
>  
> REPL_DUMP_EXTERNAL_TABLES("hive.repl.dump.include.external.tables", false,
>     "Indicates if repl dump should include information about external tables. It should be \n"
>     + "used in conjunction with 'hive.repl.dump.metadata.only' set to false. if 'hive.repl.dump.metadata.only' \n"
>     + " is set to true then this config parameter has no effect as external table meta data is flushed \n"
>     + " always by default.")
> This should be done only for replication dump, not for export.
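The intended interaction between the two flags can be sketched as follows. The helper is hypothetical; the only names taken from the issue are the two config keys in the comments, and the `isReplDump` parameter assumes EXPORT keeps its existing behavior.

```java
// Hypothetical policy helper illustrating the proposed semantics.
final class ExternalTableDumpPolicy {
    /**
     * @param includeExternal value of hive.repl.dump.include.external.tables (default false)
     * @param metadataOnly    value of hive.repl.dump.metadata.only
     * @param isReplDump      true for REPL DUMP, false for EXPORT
     */
    static boolean shouldDumpExternalTable(boolean includeExternal, boolean metadataOnly,
                                           boolean isReplDump) {
        if (!isReplDump) {
            return true; // the new switch applies only to replication dump, not to EXPORT
        }
        if (metadataOnly) {
            return true; // external table metadata is always dumped; the switch has no effect
        }
        return includeExternal; // default false keeps external tables out of the dump
    }
}
```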





[jira] [Updated] (HIVE-19829) Incremental replication load should create tasks in execution phase rather than semantic phase

2018-06-15 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19829:
---
Attachment: HIVE-19829.06.patch

> Incremental replication load should create tasks in execution phase rather 
> than semantic phase
> --
>
> Key: HIVE-19829
> URL: https://issues.apache.org/jira/browse/HIVE-19829
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19829.01.patch, HIVE-19829.02.patch, 
> HIVE-19829.03.patch, HIVE-19829.04.patch, HIVE-19829.06.patch
>
>
> Split the incremental load into multiple iterations. In each iteration, create
> a number of tasks equal to the configured value.





[jira] [Updated] (HIVE-19569) alter table db1.t1 rename db2.t2 generates MetaStoreEventListener.onDropTable()

2018-06-14 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19569:
---
Attachment: HIVE-19569.04.patch

> alter table db1.t1 rename db2.t2 generates 
> MetaStoreEventListener.onDropTable()
> ---
>
> Key: HIVE-19569
> URL: https://issues.apache.org/jira/browse/HIVE-19569
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore, Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-19569.01.patch, HIVE-19569.02.patch, 
> HIVE-19569.03.patch, HIVE-19569.04.patch
>
>
> When renaming a table within the same DB, this operation causes
> {{MetaStoreEventListener.onAlterTable()}} to fire, but when changing the DB name
> for a table, it causes {{MetaStoreEventListener.onDropTable()}} +
> {{MetaStoreEventListener.onCreateTable()}}.
> The files from the original table are moved to the new table location.
> This creates confusing semantics, since any logic in {{onDropTable()}} doesn't
> know about the larger context, i.e. that there will be a matching
> {{onCreateTable()}}.
> In particular, this causes a problem for Acid tables, since files moved from the
> old table use WriteIDs that are not meaningful in the context of the new table.
> The current implementation is due to replication. This should ideally be changed
> to raise a "not supported" error for tables that are marked for replication.
> cc [~sankarh]
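A hedged sketch of the proposed check, using a hypothetical `RenameValidator`; how a table is determined to be marked for replication is left as a boolean input. Database names are compared case-insensitively on the assumption that Hive treats them as case-insensitive identifiers.

```java
// Hypothetical pre-check for ALTER TABLE ... RENAME TO; not the actual metastore code.
final class RenameValidator {
    static void validateRename(String oldDb, String newDb, boolean tableMarkedForReplication) {
        boolean crossDb = !oldDb.equalsIgnoreCase(newDb);
        if (crossDb && tableMarkedForReplication) {
            // A cross-db rename would otherwise surface as onDropTable() + onCreateTable().
            throw new UnsupportedOperationException(
                "Renaming a table across databases is not supported for tables marked for replication");
        }
    }
}
```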





[jira] [Updated] (HIVE-19812) Disable external table replication by default via a configuration property

2018-06-16 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19812:
---
Attachment: HIVE-19812.05.patch

> Disable external table replication by default via a configuration property
> --
>
> Key: HIVE-19812
> URL: https://issues.apache.org/jira/browse/HIVE-19812
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19812.01.patch, HIVE-19812.02.patch, 
> HIVE-19812.03.patch, HIVE-19812.04.patch, HIVE-19812.05.patch
>
>
> Use a Hive config property to allow external table replication. By default,
> this property should be set to prevent external table replication.
> For metadata-only dumps, Hive repl always exports metadata for external tables.
>  
> REPL_DUMP_EXTERNAL_TABLES("hive.repl.dump.include.external.tables", false,
>     "Indicates if repl dump should include information about external tables. It should be \n"
>     + "used in conjunction with 'hive.repl.dump.metadata.only' set to false. if 'hive.repl.dump.metadata.only' \n"
>     + " is set to true then this config parameter has no effect as external table meta data is flushed \n"
>     + " always by default.")
> This should be done only for replication dump, not for export.





[jira] [Assigned] (HIVE-19924) Tag distcp jobs run by Repl Load

2018-06-16 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-19924:
--


> Tag distcp jobs run by Repl Load
> 
>
> Key: HIVE-19924
> URL: https://issues.apache.org/jira/browse/HIVE-19924
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 3.1.0, 4.0.0
>
>
> Add tags in the jobconf for distcp-related jobs started by replication. This will
> allow Hive to kill these jobs in case Beacon retries, or HS2 dies and Beacon
> issues a kill command.
>  * One of the tags should definitely be the query_id that starts the job.
> With this flow, before retrying the bootstrap load, Beacon will issue a kill
> command to HS2 with the query id of the previously issued command. HS2 will
> then kill any running jobs on YARN tagged with that query_id.
>  * To get around the additional failure point mentioned above, the jobs
> can be tagged with an additional unique tag_id provided by Beacon in the WITH
> clause of the repl load command (to be used to tag distcp jobs). Enhance the kill
> API to take the tag as input and kill the jobs associated with that tag. The problem
> here is how to validate the association of the tag with a Hive query id, to
> make sure this API is not used to kill jobs run by other components; however,
> we can restrict this capability to admins, which should be acceptable.
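A minimal sketch of the tagging step. It models the job configuration as a `Map<String, String>` so it stays self-contained; with Hadoop on the classpath, the same key would be set on the distcp job's `Configuration`. `mapreduce.job.tags` is the MapReduce property whose comma-separated values become YARN application tags; `DistcpJobTagger` and the query id and Beacon tag values are placeholders.

```java
import java.util.Map;
import java.util.StringJoiner;

final class DistcpJobTagger {
    // MapReduce job property whose comma-separated values become YARN application tags.
    static final String JOB_TAGS = "mapreduce.job.tags";

    /** Appends the query id and an optional caller-supplied tag to the job's tags. */
    static void tagJob(Map<String, String> jobConf, String queryId, String externalTag) {
        StringJoiner tags = new StringJoiner(",");
        String existing = jobConf.get(JOB_TAGS);
        if (existing != null && !existing.isEmpty()) {
            tags.add(existing); // keep tags that were already set on the job
        }
        tags.add(queryId);
        if (externalTag != null && !externalTag.isEmpty()) {
            tags.add(externalTag); // e.g. a unique tag supplied in the REPL LOAD WITH clause
        }
        jobConf.put(JOB_TAGS, tags.toString());
    }
}
```

Killing by tag (finding the YARN applications that carry the tag and killing them) is not shown here.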





[jira] [Commented] (HIVE-19881) Allow metadata-only dump for database which are not source of replication

2018-06-17 Thread mahesh kumar behera (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16515065#comment-16515065
 ] 

mahesh kumar behera commented on HIVE-19881:


Test failures are not related to this patch; they have been failing for the last two builds.

> Allow metadata-only dump for database which are not source of replication
> -
>
> Key: HIVE-19881
> URL: https://issues.apache.org/jira/browse/HIVE-19881
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-19881.01-branch-3.patch, HIVE-19881.01..patch
>
>
> If the dump is metadata-only, allow the dump even if the db is not a source of
> replication.





[jira] [Updated] (HIVE-19569) alter table db1.t1 rename db2.t2 generates MetaStoreEventListener.onDropTable()

2018-06-17 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19569:
---
Attachment: HIVE-19569.01-branch-3.patch

> alter table db1.t1 rename db2.t2 generates 
> MetaStoreEventListener.onDropTable()
> ---
>
> Key: HIVE-19569
> URL: https://issues.apache.org/jira/browse/HIVE-19569
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore, Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-19569.01-branch-3.patch, HIVE-19569.01.patch, 
> HIVE-19569.02.patch, HIVE-19569.03.patch, HIVE-19569.04.patch
>
>
> When renaming a table within the same DB, this operation causes
> {{MetaStoreEventListener.onAlterTable()}} to fire, but when changing the DB name
> for a table, it causes {{MetaStoreEventListener.onDropTable()}} +
> {{MetaStoreEventListener.onCreateTable()}}.
> The files from the original table are moved to the new table location.
> This creates confusing semantics, since any logic in {{onDropTable()}} doesn't
> know about the larger context, i.e. that there will be a matching
> {{onCreateTable()}}.
> In particular, this causes a problem for Acid tables, since files moved from the
> old table use WriteIDs that are not meaningful in the context of the new table.
> The current implementation is due to replication. This should ideally be changed
> to raise a "not supported" error for tables that are marked for replication.
> cc [~sankarh]





[jira] [Updated] (HIVE-19829) Incremental replication load should create tasks in execution phase rather than semantic phase

2018-06-17 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19829:
---
Attachment: HIVE-19829.07.patch

> Incremental replication load should create tasks in execution phase rather 
> than semantic phase
> --
>
> Key: HIVE-19829
> URL: https://issues.apache.org/jira/browse/HIVE-19829
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-19829.01.patch, HIVE-19829.02.patch, 
> HIVE-19829.03.patch, HIVE-19829.04.patch, HIVE-19829.06.patch, 
> HIVE-19829.07.patch
>
>
> Split the incremental load into multiple iterations. In each iteration, create
> a number of tasks equal to the configured value.





[jira] [Updated] (HIVE-19812) Disable external table replication by default via a configuration property

2018-06-19 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19812:
---
Attachment: HIVE-19812.06.patch

> Disable external table replication by default via a configuration property
> --
>
> Key: HIVE-19812
> URL: https://issues.apache.org/jira/browse/HIVE-19812
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19812.01.patch, HIVE-19812.02.patch, 
> HIVE-19812.03.patch, HIVE-19812.04.patch, HIVE-19812.05.patch, 
> HIVE-19812.06.patch
>
>
> Use a Hive config property to allow external table replication. By default,
> this property should be set to prevent external table replication.
> For metadata-only dumps, Hive repl always exports metadata for external tables.
>  
> REPL_DUMP_EXTERNAL_TABLES("hive.repl.dump.include.external.tables", false,
>     "Indicates if repl dump should include information about external tables. It should be \n"
>     + "used in conjunction with 'hive.repl.dump.metadata.only' set to false. if 'hive.repl.dump.metadata.only' \n"
>     + " is set to true then this config parameter has no effect as external table meta data is flushed \n"
>     + " always by default.")
> This should be done only for replication dump, not for export.





[jira] [Commented] (HIVE-19881) Allow metadata dump for database which are not source of replication

2018-06-14 Thread mahesh kumar behera (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513253#comment-16513253
 ] 

mahesh kumar behera commented on HIVE-19881:


[~sankarh], can we commit patch 01?

> Allow metadata dump for database which are not source of replication
> 
>
> Key: HIVE-19881
> URL: https://issues.apache.org/jira/browse/HIVE-19881
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19881.01..patch
>
>
> If the dump is metadata-only, allow the dump even if the db is not a source of
> replication.





[jira] [Commented] (HIVE-19340) Disable timeout of transactions opened by replication task at target cluster

2018-06-14 Thread mahesh kumar behera (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513251#comment-16513251
 ] 

mahesh kumar behera commented on HIVE-19340:


TestTxnCommands2.testMultiInsertStatement is failing due to a MapReduce error, not
related to this patch. It passes in a local run.

The TestMiniDruidCliDriver.testCliDriver and
spark.client.rpc.TestRpc.testServerPort failures are also not related to this patch.

[~sankarh], please check if this can be committed.

> Disable timeout of transactions opened by replication task at target cluster
> 
>
> Key: HIVE-19340
> URL: https://issues.apache.org/jira/browse/HIVE-19340
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl, Transactions
>Affects Versions: 3.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: ACID, DR, pull-request-available, replication
> Fix For: 4.0.0
>
> Attachments: HIVE-19340.01.patch, HIVE-19340.02.patch, 
> HIVE-19340.03-branch-3.patch, HIVE-19340.03.patch, 
> HIVE-19340.04-branch-3.patch, HIVE-19340.06-branch-3.patch, 
> HIVE-19340.06.patch, HIVE-19340.07-branch-3.patch
>
>
> Transactions opened by applying EVENT_OPEN_TXN should never be aborted
> automatically due to a timeout. Aborting a transaction started by a replication
> task may lead to an inconsistent state at the target, which needs additional
> overhead to clean up. So, it is proposed to mark the transactions opened by the
> replication task as special ones that are not aborted if the heartbeat is
> lost. This ensures that all ABORT and COMMIT events always find the
> corresponding txn at the target to operate on.





[jira] [Updated] (HIVE-19267) Create/Replicate ACID Write event

2018-06-13 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19267:
---
Attachment: HIVE-19267.15.patch

> Create/Replicate ACID Write event
> -
>
> Key: HIVE-19267
> URL: https://issues.apache.org/jira/browse/HIVE-19267
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl, Transactions
>Affects Versions: 3.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: ACID, DR, pull-request-available, replication
> Attachments: HIVE-19267.01.patch, HIVE-19267.02.patch, 
> HIVE-19267.03.patch, HIVE-19267.04.patch, HIVE-19267.05.patch, 
> HIVE-19267.06.patch, HIVE-19267.07.patch, HIVE-19267.08.patch, 
> HIVE-19267.09.patch, HIVE-19267.10.patch, HIVE-19267.11.patch, 
> HIVE-19267.12.patch, HIVE-19267.13.patch, HIVE-19267.14.patch, 
> HIVE-19267.15.patch
>
>
>  
> h1. Replicate ACID write Events
>  * Create a new EVENT_WRITE event with a related message format to log the write
> operations within a txn, along with the associated data.
>  * Log this event when performing any writes (insert into, insert overwrite,
> load table, delete, update, merge, truncate) on a table/partition.
>  * If a single MERGE/UPDATE/INSERT/DELETE statement operates on multiple
> partitions, then one event needs to be logged per partition.
>  * DbNotificationListener should log this type of event to a special metastore
> table named "MTxnWriteNotificationLog".
>  * This table should maintain a map of txn ID against the list of
> tables/partitions written by the given txn.
>  * The entry for a given txn should be removed by the cleaner thread that
> removes the expired events from EventNotificationTable.
> h1. Replicate Commit Txn operation (with writes)
> Add a new EVENT_COMMIT_TXN event to log the metadata/data of all tables/partitions
> modified within the txn.
> *Source warehouse:*
>  * This event should read the EVENT_WRITEs from the "MTxnWriteNotificationLog"
> metastore table to consolidate the list of tables/partitions modified within
> this txn scope.
>  * Based on the list of tables/partitions modified and the table Write ID, the
> list of delta files added by this txn needs to be computed.
>  * Repl dump should read this message and dump the metadata and the delta files
> list.
> *Target warehouse:*
>  * Ensure snapshot isolation at the target for ongoing read txns, which shouldn't
> see the data replicated from the committed txn (ensured with the open txn and
> allocate write ID events).
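An in-memory sketch of the bookkeeping described above: one write entry per partition, consolidation of the distinct tables/partitions on commit, and removal by the cleaner. `TxnWriteLog` is a hypothetical stand-in for the "MTxnWriteNotificationLog" table, and the `table/partition` string keys are an illustrative encoding.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for the MTxnWriteNotificationLog table. */
final class TxnWriteLog {
    private final Map<Long, List<String>> writesByTxn = new HashMap<>();

    /** Logs one write entry per partition touched by a statement. */
    void logWrite(long txnId, String table, List<String> partitions) {
        List<String> entries = writesByTxn.computeIfAbsent(txnId, k -> new ArrayList<>());
        if (partitions.isEmpty()) {
            entries.add(table); // unpartitioned table: a single entry for the table
        } else {
            for (String p : partitions) {
                entries.add(table + "/" + p);
            }
        }
    }

    /** On commit: the distinct tables/partitions written by the txn, in first-write order. */
    Set<String> consolidateOnCommit(long txnId) {
        return new LinkedHashSet<>(writesByTxn.getOrDefault(txnId, new ArrayList<>()));
    }

    /** Cleaner: drop the entries for a txn once its events have expired. */
    void expire(long txnId) {
        writesByTxn.remove(txnId);
    }
}
```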





[jira] [Updated] (HIVE-19880) Repl Load to return recoverable vs non-recoverable error codes

2018-06-15 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19880:
---
Attachment: HIVE-19880.04.patch

> Repl Load to return recoverable vs non-recoverable error codes
> --
>
> Key: HIVE-19880
> URL: https://issues.apache.org/jira/browse/HIVE-19880
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19880.01.patch, HIVE-19880.04.patch
>
>
> To enable bootstrap of large databases, the application must be able to keep
> retrying the bootstrap load until it encounters a fatal error. Hive will decide
> whether an error is fatal or not, and will communicate this to the
> application via error codes.
> So there should be different error codes for recoverable vs non-recoverable
> failures, which should be propagated to the application as part of running the
> repl load command.
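A sketch of the application-side retry loop this enables. The error-code classification is deliberately passed in as a predicate, since the actual recoverable and non-recoverable codes are not defined in this issue; `ReplLoadRetry` and the `replLoad` supplier are hypothetical.

```java
import java.util.function.IntPredicate;
import java.util.function.IntSupplier;

final class ReplLoadRetry {
    /**
     * Re-runs a bootstrap REPL LOAD until it succeeds (code 0), returns a code the
     * classifier treats as non-recoverable, or maxAttempts is reached.
     * Returns the last error code observed.
     */
    static int runWithRetry(IntSupplier replLoad, IntPredicate isRecoverable, int maxAttempts) {
        int code = replLoad.getAsInt();
        for (int attempt = 1; code != 0 && isRecoverable.test(code) && attempt < maxAttempts; attempt++) {
            code = replLoad.getAsInt(); // recoverable failure: retry the same load
        }
        return code;
    }
}
```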





[jira] [Updated] (HIVE-19881) Allow metadata-only dump for database which are not source of replication

2018-06-15 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19881:
---
Attachment: HIVE-19881.01-branch-3.patch

> Allow metadata-only dump for database which are not source of replication
> -
>
> Key: HIVE-19881
> URL: https://issues.apache.org/jira/browse/HIVE-19881
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-19881.01-branch-3.patch, HIVE-19881.01..patch
>
>
> If the dump is metadata-only, allow the dump even if the db is not a source of
> replication.





[jira] [Updated] (HIVE-19812) Disable external table replication by default via a configuration property

2018-06-10 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19812:
---
Attachment: HIVE-19812.03.patch

> Disable external table replication by default via a configuration property
> --
>
> Key: HIVE-19812
> URL: https://issues.apache.org/jira/browse/HIVE-19812
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19812.01.patch, HIVE-19812.02.patch, 
> HIVE-19812.03.patch
>
>
> Use a Hive config property to allow external table replication. By default,
> this property should be set to prevent external table replication.
> For metadata-only dumps, Hive repl always exports metadata for external tables.
>  
> REPL_DUMP_EXTERNAL_TABLES("hive.repl.dump.include.external.tables", false,
>     "Indicates if repl dump should include information about external tables. It should be \n"
>     + "used in conjunction with 'hive.repl.dump.metadata.only' set to false. if 'hive.repl.dump.metadata.only' \n"
>     + " is set to true then this config parameter has no effect as external table meta data is flushed \n"
>     + " always by default.")
> This should be done only for replication dump, not for export.





[jira] [Commented] (HIVE-19815) Repl dump should not propagate the checkpoint and repl source properties

2018-06-10 Thread mahesh kumar behera (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507679#comment-16507679
 ] 

mahesh kumar behera commented on HIVE-19815:


The changes look fine to me, [~sankarh].

> Repl dump should not propagate the checkpoint and repl source properties
> 
>
> Key: HIVE-19815
> URL: https://issues.apache.org/jira/browse/HIVE-19815
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19815.01.patch, HIVE-19815.02.patch
>
>
> For replication scenarios of A -> B -> C, the repl dump on B should not include
> the checkpoint property when dumping out table information.
> Alter table/partition operations during incremental replication should not
> propagate it either.
> The db-level parameters set internally by replication should also not be
> propagated.
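A minimal sketch of the filtering on B, assuming the internal keys are known up front. `ReplPropertyFilter` is hypothetical, and the set of keys to strip (checkpoint, repl source, and so on) is passed in rather than hard-coded because the exact key names are not given in this issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class ReplPropertyFilter {
    /**
     * Returns a copy of an object's parameters without the keys that replication sets
     * internally, so that B's dump does not pass them on to C. The input is not modified.
     */
    static Map<String, String> stripInternal(Map<String, String> params, Set<String> internalKeys) {
        Map<String, String> copy = new HashMap<>(params);
        copy.keySet().removeAll(internalKeys); // removing from the key view removes the entries
        return copy;
    }
}
```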





[jira] [Updated] (HIVE-19829) Incremental replication load should create tasks in execution phase rather than semantic phase

2018-06-11 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19829:
---
Attachment: HIVE-19829.03.patch

> Incremental replication load should create tasks in execution phase rather 
> than semantic phase
> --
>
> Key: HIVE-19829
> URL: https://issues.apache.org/jira/browse/HIVE-19829
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19829.01.patch, HIVE-19829.02.patch, 
> HIVE-19829.03.patch
>
>
> Split the incremental load into multiple iterations. In each iteration, create
> a number of tasks equal to the configured value.





[jira] [Updated] (HIVE-19569) alter table db1.t1 rename db2.t2 generates MetaStoreEventListener.onDropTable()

2018-06-11 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19569:
---
Attachment: HIVE-19569.02.patch

> alter table db1.t1 rename db2.t2 generates 
> MetaStoreEventListener.onDropTable()
> ---
>
> Key: HIVE-19569
> URL: https://issues.apache.org/jira/browse/HIVE-19569
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore, Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-19569.01.patch, HIVE-19569.02.patch, 
> HIVE-19569.02.patch
>
>
> When renaming a table within the same DB, this operation causes 
> {{MetaStoreEventListener.onAlterTable()}} to fire, but when changing the DB name 
> for a table it causes {{MetaStoreEventListener.onDropTable()}} + 
> {{MetaStoreEventListener.onCreateTable()}}.
> The files from the original table are moved to the new table location.  
> This creates confusing semantics, since any logic in {{onDropTable()}} doesn't 
> know about the larger context, i.e. that there will be a matching 
> {{onCreateTable()}}.
> In particular, this causes a problem for Acid tables, since files moved from 
> the old table use WriteIDs that are not meaningful in the context of the new table.
> The current implementation is due to replication. This should ideally be changed 
> to raise a "not supported" error for tables that are marked for replication.
> cc [~sankarh]
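The proposed rule can be sketched as a rename handler that rejects cross-database renames of replicated tables. This is a hypothetical illustration, not the metastore code: `marked_for_replication` stands in for however a table is actually marked, and the sketch additionally assumes that every permitted rename is reported as a single alter event, which the issue does not state.

```python
from dataclasses import dataclass, field
from typing import List


class NotSupportedError(Exception):
    """Raised when a rename is rejected under the proposed rule."""


@dataclass
class Table:
    db: str
    name: str
    # Hypothetical flag; the issue does not say how replication is marked.
    marked_for_replication: bool = False


@dataclass
class EventLog:
    events: List[str] = field(default_factory=list)


def rename_table(table: Table, new_db: str, new_name: str, log: EventLog) -> Table:
    """Rename a table, emitting one alter event instead of drop + create."""
    if table.db != new_db and table.marked_for_replication:
        raise NotSupportedError(
            f"renaming {table.db}.{table.name} to another database is not "
            "supported for tables marked for replication"
        )
    # Assumption: every permitted rename, same-DB or cross-DB, is reported as a
    # single alter event so listeners see the whole operation in one callback.
    log.events.append(f"onAlterTable {table.db}.{table.name} -> {new_db}.{new_name}")
    return Table(new_db, new_name, table.marked_for_replication)
```

A same-database rename of a replicated table is still allowed, since it already maps to a single `onAlterTable()` call.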





[jira] [Updated] (HIVE-19569) alter table db1.t1 rename db2.t2 generates MetaStoreEventListener.onDropTable()

2018-06-11 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19569:
---
Attachment: (was: HIVE-19569.02.patch)

> alter table db1.t1 rename db2.t2 generates 
> MetaStoreEventListener.onDropTable()
> ---
>
> Key: HIVE-19569
> URL: https://issues.apache.org/jira/browse/HIVE-19569
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore, Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-19569.01.patch, HIVE-19569.02.patch
>
>
> When renaming a table within the same DB, this operation causes 
> {{MetaStoreEventListener.onAlterTable()}} to fire, but when changing the DB name 
> for a table it causes {{MetaStoreEventListener.onDropTable()}} + 
> {{MetaStoreEventListener.onCreateTable()}}.
> The files from the original table are moved to the new table location.  
> This creates confusing semantics, since any logic in {{onDropTable()}} doesn't 
> know about the larger context, i.e. that there will be a matching 
> {{onCreateTable()}}.
> In particular, this causes a problem for Acid tables, since files moved from 
> the old table use WriteIDs that are not meaningful in the context of the new table.
> The current implementation is due to replication. This should ideally be changed 
> to raise a "not supported" error for tables that are marked for replication.
> cc [~sankarh]





[jira] [Updated] (HIVE-19569) alter table db1.t1 rename db2.t2 generates MetaStoreEventListener.onDropTable()

2018-06-11 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19569:
---
Attachment: HIVE-19569.03.patch

> alter table db1.t1 rename db2.t2 generates 
> MetaStoreEventListener.onDropTable()
> ---
>
> Key: HIVE-19569
> URL: https://issues.apache.org/jira/browse/HIVE-19569
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore, Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-19569.01.patch, HIVE-19569.02.patch, 
> HIVE-19569.03.patch
>
>
> When renaming a table within the same DB, this operation causes 
> {{MetaStoreEventListener.onAlterTable()}} to fire, but when changing the DB name 
> for a table it causes {{MetaStoreEventListener.onDropTable()}} + 
> {{MetaStoreEventListener.onCreateTable()}}.
> The files from the original table are moved to the new table location.  
> This creates confusing semantics, since any logic in {{onDropTable()}} doesn't 
> know about the larger context, i.e. that there will be a matching 
> {{onCreateTable()}}.
> In particular, this causes a problem for Acid tables, since files moved from 
> the old table use WriteIDs that are not meaningful in the context of the new table.
> The current implementation is due to replication. This should ideally be changed 
> to raise a "not supported" error for tables that are marked for replication.
> cc [~sankarh]





[jira] [Updated] (HIVE-19340) Disable timeout of transactions opened by replication task at target cluster

2018-05-28 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19340:
---
Attachment: HIVE-19340.04-branch-3.patch

> Disable timeout of transactions opened by replication task at target cluster
> 
>
> Key: HIVE-19340
> URL: https://issues.apache.org/jira/browse/HIVE-19340
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl, Transactions
>Affects Versions: 3.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: ACID, DR, pull-request-available, replication
> Fix For: 4.0.0
>
> Attachments: HIVE-19340.01.patch, HIVE-19340.02.patch, 
> HIVE-19340.03-branch-3.patch, HIVE-19340.03.patch, 
> HIVE-19340.04-branch-3.patch
>
>
> The transactions opened by applying EVENT_OPEN_TXN should never be aborted 
> automatically due to a time-out. Aborting a transaction started by a replication 
> task may lead to an inconsistent state at the target, which needs additional 
> overhead to clean up. So, it is proposed to mark the transactions opened by a 
> replication task as special ones that shouldn't be aborted if the heartbeat is 
> lost. This helps to ensure all ABORT and COMMIT events will always find the 
> corresponding txn at the target to operate on.
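The proposal can be sketched as a timeout reaper that skips transactions opened by replication. This is a minimal in-memory sketch: the `Txn` record and its `opened_by_replication` marker are hypothetical stand-ins for however the metastore actually flags these transactions, and times are plain millisecond integers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Txn:
    txn_id: int
    last_heartbeat_ms: int
    # Hypothetical marker for txns opened by applying EVENT_OPEN_TXN.
    opened_by_replication: bool = False


def abort_timed_out(txns: Dict[int, Txn], now_ms: int, timeout_ms: int) -> List[int]:
    """Abort (remove) txns whose heartbeat expired; never touch replication txns."""
    expired = [
        txn.txn_id
        for txn in txns.values()
        if not txn.opened_by_replication and now_ms - txn.last_heartbeat_ms > timeout_ms
    ]
    for txn_id in expired:
        del txns[txn_id]
    return sorted(expired)
```

Because replication txns survive a lost heartbeat, the later ABORT or COMMIT event replayed from the source always finds its txn at the target.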





[jira] [Assigned] (HIVE-19725) Add ability to dump non-native tables in replication metadata dump

2018-05-28 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-19725:
--


> Add ability to dump non-native tables in replication metadata dump
> --
>
> Key: HIVE-19725
> URL: https://issues.apache.org/jira/browse/HIVE-19725
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.0.0, 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: Repl
> Fix For: 3.1.0, 3.0.1, 4.0.0
>
>
> Add a configuration in the WITH clause in REPL DUMP to include the ability to 
> dump non-native tables.
> Data dump for non-native tables should never be allowed.
> This configuration will be used along with "hive.repl.dump.metadata.only" by 
> DAS.





[jira] [Updated] (HIVE-19725) Add ability to dump non-native tables in replication metadata dump

2018-05-28 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19725:
---
Attachment: HIVE-19725.01.patch

> Add ability to dump non-native tables in replication metadata dump
> --
>
> Key: HIVE-19725
> URL: https://issues.apache.org/jira/browse/HIVE-19725
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.0.0, 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: Repl
> Fix For: 3.1.0, 3.0.1, 4.0.0
>
> Attachments: HIVE-19725.01.patch
>
>
> Add a configuration in the WITH clause in REPL DUMP to include the ability to 
> dump non-native tables.
> Data dump for non-native tables should never be allowed.
> This configuration will be used along with "hive.repl.dump.metadata.only" by 
> DAS.





[jira] [Updated] (HIVE-19725) Add ability to dump non-native tables in replication metadata dump

2018-05-28 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19725:
---
Description: 
If hive.repl.dump.metadata.only is set to true, allow dumping non-native tables 
also. This will be used by DAS.

Data dump for non-native tables should never be allowed.

  was:
Add a configuration in the WITH clause in REPL DUMP to include the ability to dump 
non-native tables.

Data dump for non-native tables should never be allowed.

This configuration will be used along with "hive.repl.dump.metadata.only" by DAS.


> Add ability to dump non-native tables in replication metadata dump
> --
>
> Key: HIVE-19725
> URL: https://issues.apache.org/jira/browse/HIVE-19725
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.0.0, 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: Repl
> Fix For: 3.1.0, 3.0.1, 4.0.0
>
> Attachments: HIVE-19725.01.patch
>
>
> If hive.repl.dump.metadata.only is set to true, allow dumping non-native 
> tables also. This will be used by DAS.
> Data dump for non-native tables should never be allowed.
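The decision rule in this description fits in one function. This is a sketch of the stated behavior, not Hive's implementation: `metadata_only` mirrors `hive.repl.dump.metadata.only`, and the sketch assumes that, when the flag is false, non-native tables are skipped entirely.

```python
from typing import NamedTuple


class DumpDecision(NamedTuple):
    dump_metadata: bool
    dump_data: bool


def decide_dump(is_native: bool, metadata_only: bool) -> DumpDecision:
    """Decide what REPL DUMP writes for one table."""
    if is_native:
        # Native tables: metadata always, data unless the dump is metadata-only.
        return DumpDecision(dump_metadata=True, dump_data=not metadata_only)
    # Non-native tables: metadata only when the metadata-only flag is set,
    # and their data is never dumped.
    return DumpDecision(dump_metadata=metadata_only, dump_data=False)
```

Keeping `dump_data=False` unconditional for non-native tables encodes the "never allowed" rule in one place rather than relying on the flag.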





[jira] [Updated] (HIVE-19708) Repl copy retrying with cm path even if the failure is due to network issue

2018-05-30 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19708:
---
Attachment: HIVE-19708.04.patch

> Repl copy retrying with cm path even if the failure is due to network issue
> ---
>
> Key: HIVE-19708
> URL: https://issues.apache.org/jira/browse/HIVE-19708
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
> Attachments: HIVE-19708.01.patch, HIVE-19708.02.patch, 
> HIVE-19708.04.patch
>
>
> * During repl load
>  ** for filesystem-based copying of a file, if the copy fails due to a 
> connection error to the source NameNode, we should recreate the filesystem 
> object.
>  ** the retry logic for local file copy should be triggered using the 
> original source file path (and not the CM root path), since the failure can be 
> due to network issues between the DFSClient and the NameNode.
>  * When listing files in tables/partitions to include them in _files, we 
> should add retry logic for when a failure occurs. The FileSystem object should 
> also be recreated here, since the existing one might be in an inconsistent state.
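The retry policy above can be sketched as follows. This is a hypothetical illustration with injected callables rather than Hadoop's `FileSystem` API: `make_fs` and `copy` are placeholders, and the sketch assumes that the CM-root fallback is meant only for a missing source file, while a connection error rebuilds the file system object and retries the original path.

```python
from typing import Callable, TypeVar

FS = TypeVar("FS")


def copy_with_retry(
    make_fs: Callable[[], FS],
    copy: Callable[[FS, str, str], None],
    src: str,
    cm_src: str,
    dst: str,
    max_attempts: int = 3,
) -> str:
    """Copy src to dst and return the path that was actually copied from."""
    fs = make_fs()
    path = src
    for _ in range(max_attempts):
        try:
            copy(fs, path, dst)
            return path
        except ConnectionError:
            # Network trouble: the file system object may be in a bad state,
            # so rebuild it and retry the same (original) source path.
            fs = make_fs()
        except FileNotFoundError:
            # The source file is gone, so fall back to the CM root copy.
            path = cm_src
    raise OSError(f"copy of {src} failed after {max_attempts} attempts")
```

Separating the two exception types is the point of the issue: a transient network failure says nothing about whether the original file still exists, so it should not trigger the CM-path fallback.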





[jira] [Updated] (HIVE-19725) Add ability to dump non-native tables in replication metadata dump

2018-05-31 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19725:
---
Attachment: HIVE-19725.02.patch

> Add ability to dump non-native tables in replication metadata dump
> --
>
> Key: HIVE-19725
> URL: https://issues.apache.org/jira/browse/HIVE-19725
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Affects Versions: 3.0.0, 3.1.0, 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: Repl, pull-request-available
> Fix For: 3.1.0, 3.0.1, 4.0.0
>
> Attachments: HIVE-19725.01.patch, HIVE-19725.02.patch
>
>
> If hive.repl.dump.metadata.only is set to true, allow dumping non-native 
> tables also. 
> Data dump for non-native tables should never be allowed.





[jira] [Updated] (HIVE-19488) Enable CM root based on db parameter, identifying a db as source of replication.

2018-05-27 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19488:
---
Attachment: HIVE-19488.08.patch

> Enable CM root based on db parameter, identifying a db as source of 
> replication.
> 
>
> Key: HIVE-19488
> URL: https://issues.apache.org/jira/browse/HIVE-19488
> Project: Hive
>  Issue Type: Task
>  Components: Hive, HiveServer2, repl
>Affects Versions: 3.1.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
> Attachments: HIVE-19488.01.patch, HIVE-19488.02.patch, 
> HIVE-19488.03.patch, HIVE-19488.04.patch, HIVE-19488.05.patch, 
> HIVE-19488.06.patch, HIVE-19488.07.patch, HIVE-19488.08.patch
>
>
> * Add a parameter at the db level to identify whether it is a source of 
> replication. Beacon will set this.
>  * Enable CM root only for databases that are a source of a replication 
> policy; for other dbs, skip the CM root functionality.
>  * Prevent a database drop if the parameter indicating it is a source of 
> replication is set.
>  * As part of the upgrade to this version, Beacon should set the property on 
> all existing database policies in effect.
>  * The parameter should be of the form: repl.source.for : List < policy 
> ids >
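The three checks driven by `repl.source.for` can be sketched together. The parameter name comes from the issue; its encoding as a comma-separated list of policy IDs is an assumption, as is the `DropNotAllowedError` exception used here for illustration.

```python
from typing import Dict, List

REPL_SOURCE_FOR = "repl.source.for"


class DropNotAllowedError(Exception):
    """Raised when dropping a database that is a replication source."""


def source_policies(db_params: Dict[str, str]) -> List[str]:
    """Parse the policy IDs stored in repl.source.for (assumed comma-separated)."""
    raw = db_params.get(REPL_SOURCE_FOR, "")
    return [policy.strip() for policy in raw.split(",") if policy.strip()]


def cm_enabled(db_params: Dict[str, str]) -> bool:
    """CM root is used only for databases that are the source of some policy."""
    return len(source_policies(db_params)) > 0


def check_drop_allowed(db_name: str, db_params: Dict[str, str]) -> None:
    """Reject DROP DATABASE while any replication policy uses this db as source."""
    policies = source_policies(db_params)
    if policies:
        raise DropNotAllowedError(
            f"database {db_name} is a replication source for: {', '.join(policies)}"
        )
```

Treating an empty value the same as a missing key means a database stops being a replication source once its last policy ID is removed.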





[jira] [Updated] (HIVE-19340) Disable timeout of transactions opened by replication task at target cluster

2018-05-27 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19340:
---
Attachment: HIVE-19340.03-branch-3.patch

> Disable timeout of transactions opened by replication task at target cluster
> 
>
> Key: HIVE-19340
> URL: https://issues.apache.org/jira/browse/HIVE-19340
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl, Transactions
>Affects Versions: 3.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: ACID, DR, pull-request-available, replication
> Fix For: 4.0.0
>
> Attachments: HIVE-19340.01.patch, HIVE-19340.02.patch, 
> HIVE-19340.03-branch-3.patch, HIVE-19340.03.patch
>
>
> The transactions opened by applying EVENT_OPEN_TXN should never be aborted 
> automatically due to a time-out. Aborting a transaction started by a replication 
> task may lead to an inconsistent state at the target, which needs additional 
> overhead to clean up. So, it is proposed to mark the transactions opened by a 
> replication task as special ones that shouldn't be aborted if the heartbeat is 
> lost. This helps to ensure all ABORT and COMMIT events will always find the 
> corresponding txn at the target to operate on.





[jira] [Updated] (HIVE-19267) Create/Replicate ACID Write event

2018-05-27 Thread mahesh kumar behera (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-19267:
---
Attachment: HIVE-19267.07.patch

> Create/Replicate ACID Write event
> -
>
> Key: HIVE-19267
> URL: https://issues.apache.org/jira/browse/HIVE-19267
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl, Transactions
>Affects Versions: 3.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: ACID, DR, pull-request-available, replication
> Attachments: HIVE-19267.01.patch, HIVE-19267.02.patch, 
> HIVE-19267.03.patch, HIVE-19267.04.patch, HIVE-19267.05.patch, 
> HIVE-19267.06.patch, HIVE-19267.07.patch
>
>
>  
> h1. Replicate ACID write Events
>  * Create a new EVENT_WRITE event with a related message format to log the write 
> operations within a txn along with the associated data.
>  * Log this event when performing any writes (insert into, insert overwrite, 
> load table, delete, update, merge, truncate) on a table/partition.
>  * If a single MERGE/UPDATE/INSERT/DELETE statement operates on multiple 
> partitions, then one event must be logged per partition.
>  * DbNotificationListener should log this type of event to a special metastore 
> table named "MTxnWriteNotificationLog".
>  * This table should maintain a map from txn ID to the list of 
> tables/partitions written by the given txn.
>  * The entry for a given txn should be removed by the cleaner thread that 
> removes the expired events from EventNotificationTable.
> h1. Replicate Commit Txn operation (with writes)
> Add a new EVENT_COMMIT_TXN to log the metadata/data of all tables/partitions 
> modified within the txn.
> *Source warehouse:*
>  * This event should read the EVENT_WRITEs from the "MTxnWriteNotificationLog" 
> metastore table to consolidate the list of tables/partitions modified within 
> this txn scope.
>  * Based on the list of tables/partitions modified and the table Write ID, the 
> list of delta files added by this txn must be computed.
>  * Repl dump should read this message and dump the metadata and the delta files 
> list.
> *Target warehouse:*
>  * Ensure snapshot isolation at the target for ongoing read txns, which shouldn't 
> see the data replicated from a committed txn. (Ensured with open and allocate 
> write ID events.)
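The bookkeeping described above (per-partition write events, a txn-to-targets map consolidated at commit, then cleaned up) can be sketched in memory. This is a hypothetical stand-in for the `MTxnWriteNotificationLog` table, not its real schema; it also assumes repeated writes to the same partition within one txn are recorded once.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (db, table, partition); partition is None for unpartitioned tables.
Target = Tuple[str, str, Optional[str]]


@dataclass
class TxnWriteLog:
    """In-memory stand-in for MTxnWriteNotificationLog (hypothetical shape)."""

    entries: Dict[int, List[Target]] = field(default_factory=lambda: defaultdict(list))

    def log_write(self, txn_id: int, db: str, table: str,
                  partitions: Optional[List[str]]) -> None:
        # One EVENT_WRITE per partition touched; unpartitioned tables log one.
        for part in partitions or [None]:
            target = (db, table, part)
            if target not in self.entries[txn_id]:  # assumed de-duplication
                self.entries[txn_id].append(target)

    def on_commit(self, txn_id: int) -> List[Target]:
        # EVENT_COMMIT_TXN consolidates every table/partition the txn wrote.
        return list(self.entries.get(txn_id, []))

    def clean(self, txn_id: int) -> None:
        # Done by the cleaner thread once the related events have expired.
        self.entries.pop(txn_id, None)
```

At commit time, the consolidated target list plus the table Write ID is what would drive the delta-file computation described for the source warehouse.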





[jira] [Commented] (HIVE-19340) Disable timeout of transactions opened by replication task at target cluster

2018-05-29 Thread mahesh kumar behera (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494559#comment-16494559
 ] 

mahesh kumar behera commented on HIVE-19340:


[~sershe] [~ekoifman] [~thejas] [~sankarh]

Sorry, I didn't update here with the discussion we had with Eugene. 
 # We should not allow users to abort it, as it's created by a replication task.
 # Only an admin/super user should be able to abort these transactions.
 # In fact, we need some authorization mechanism to validate any user 
aborting a transaction.
 # I have created an internal Jira 
([BUG-102193|https://hortonworks.jira.com/browse/BUG-102193]) to track the 
abort transaction issue.

    

> Disable timeout of transactions opened by replication task at target cluster
> 
>
> Key: HIVE-19340
> URL: https://issues.apache.org/jira/browse/HIVE-19340
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl, Transactions
>Affects Versions: 3.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: ACID, DR, pull-request-available, replication
> Fix For: 4.0.0
>
> Attachments: HIVE-19340.01.patch, HIVE-19340.02.patch, 
> HIVE-19340.03-branch-3.patch, HIVE-19340.03.patch, 
> HIVE-19340.04-branch-3.patch
>
>
> The transactions opened by applying EVENT_OPEN_TXN should never be aborted 
> automatically due to a time-out. Aborting a transaction started by a replication 
> task may lead to an inconsistent state at the target, which needs additional 
> overhead to clean up. So, it is proposed to mark the transactions opened by a 
> replication task as special ones that shouldn't be aborted if the heartbeat is 
> lost. This helps to ensure all ABORT and COMMIT events will always find the 
> corresponding txn at the target to operate on.




