[jira] [Commented] (OAK-1453) MongoMK failover support for replica sets (esp. shards)
[ https://issues.apache.org/jira/browse/OAK-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074477#comment-14074477 ]

Chetan Mehrotra commented on OAK-1453:
--

For reference from this [thread|https://groups.google.com/d/msg/mongodb-user/qSi8RmvcAUY/PjvrMGpeHDcJ]

bq. Now, if you are getting back success for the write and you are using writeConcern w:1 (acknowledged) but during the data load you are losing the primary, *unless you have used w:2 as write concern (waiting for replication to at least one other node before acknowledgement) you will potentially have some records that would be written to the primary, not replicated and then if you shut down the primary and another node becomes the primary, the original node will have to roll back that write (since it's not on the new primary)*. If that's what happened, you will be able to find a rollback directory in the data directory which will have the documents that were rolled back.

MongoMK failover support for replica sets (esp. shards)
---

Key: OAK-1453
URL: https://issues.apache.org/jira/browse/OAK-1453
Project: Jackrabbit Oak
Issue Type: Improvement
Components: mongomk
Reporter: Michael Marth
Assignee: Thomas Mueller
Labels: production, resilience
Fix For: 1.1

With OAK-759 we have introduced replica support in MongoMK. I think we still need to address the resilience for failover from primary to secondary:
Consider a case where Oak writes to the primary. Replication to secondary is ongoing. During that period the primary goes down and the secondary becomes primary. There could be some half-replicated MVCC revisions, which need to be either discarded or ignored after the failover.
This might not be an issue if there is only one shard, as the commit root is written last (and replicated last). But with 2 shards the replication state of these 2 shards could be inconsistent. Oak needs to handle such a situation without falling over.
If we can detect a Mongo failover we could query Mongo for which revisions are fully replicated to the new primary and discard the potentially half-replicated revisions.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
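To make the failure mode concrete, here is a minimal, self-contained sketch in Python. It is a toy model only, not Oak or MongoDB code: the `ReplicaSet` class, the `half_replicated` helper, the shard names and the document keys such as `/content@r2` are all hypothetical and chosen for illustration. It assumes two shards with one secondary each, and shows how a write acknowledged at w:1 can be rolled back on failover, leaving a revision whose commit root survived on one shard while its change on the other shard was lost.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaSet:
    """Toy replica set (one shard): a primary and a single secondary."""
    primary: dict = field(default_factory=dict)
    secondary: dict = field(default_factory=dict)

    def write(self, key, value, w=1):
        # w=1: acknowledged once the primary has the write.
        # w=2: acknowledged only after the secondary has it as well.
        self.primary[key] = value
        if w >= 2:
            self.secondary[key] = value

    def fail_over(self):
        # The secondary becomes primary. Writes it never received are
        # rolled back on the old primary and returned to the caller.
        rolled_back = {k: v for k, v in self.primary.items()
                       if k not in self.secondary}
        self.primary = dict(self.secondary)
        return rolled_back


def half_replicated(revisions, shards):
    """Return ids of revisions with a document missing on a current primary.

    `revisions` maps a revision id to the (shard name, key) pairs it wrote.
    """
    return sorted(
        rev for rev, docs in revisions.items()
        if any(key not in shards[name].primary for name, key in docs)
    )


def run(w):
    shards = {"a": ReplicaSet(), "b": ReplicaSet()}
    revisions = {
        "r1": [("a", "/content@r1"), ("b", "commit-root@r1")],
        "r2": [("a", "/content@r2"), ("b", "commit-root@r2")],
    }
    # r1 is written and fully replicated on both shards.
    for name, key in revisions["r1"]:
        shards[name].write(key, "data", w=2)
    # r2 is written with the write concern under test.
    for name, key in revisions["r2"]:
        shards[name].write(key, "data", w=w)
    # Shard b replicates in time; the primary of shard a crashes first.
    shards["b"].secondary.update(shards["b"].primary)
    rolled_back = shards["a"].fail_over()
    return rolled_back, half_replicated(revisions, shards)


if __name__ == "__main__":
    print(run(1))  # r2 lost its change on shard a
    print(run(2))  # nothing rolled back, no half-replicated revision
```

In this model the check after failover is a simple membership test per document. A real implementation would have to detect the failover and query the new primaries, which the sketch does not attempt.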
[jira] [Commented] (OAK-1453) MongoMK failover support for replica sets (esp. shards)
[ https://issues.apache.org/jira/browse/OAK-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074482#comment-14074482 ]

Chetan Mehrotra commented on OAK-1453:
--

A couple of observations on the importance of config servers when sharding is involved, based on this [thread|https://groups.google.com/d/msg/mongodb-user/Q0yRpr-kNco/DLMtpjZq36IJ]:
* You cannot simply randomise the list of config servers per mongos. The --configdb string needs to be the same across all mongos instances.
* Traffic between mongos and config servers can be very high (particularly the first config server) if balancing is going on.
* On AWS it might be tempting to use t1.micro for config servers, but that should not be done as they might require higher network throughput. Use m3.large instead.
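The first observation lends itself to a simple deployment check. The sketch below is only an illustration in Python: the function name `configdb_mismatches` and the host names are made up, and it assumes the --configdb value of each mongos has already been collected into a dictionary by some other means.

```python
def configdb_mismatches(mongos_configdb):
    """Return the mongos names whose --configdb string differs from the first.

    `mongos_configdb` maps a mongos name to the exact --configdb string it
    was started with. Strings are compared verbatim, so a different order
    of the same config servers counts as a mismatch.
    """
    names = list(mongos_configdb)
    if not names:
        return []
    reference = mongos_configdb[names[0]]
    return [n for n in names[1:] if mongos_configdb[n] != reference]


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    settings = {
        "mongos-1": "cfg1:27019,cfg2:27019,cfg3:27019",
        "mongos-2": "cfg1:27019,cfg2:27019,cfg3:27019",
        "mongos-3": "cfg2:27019,cfg1:27019,cfg3:27019",  # reordered list
    }
    print(configdb_mismatches(settings))
```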
[jira] [Commented] (OAK-1453) MongoMK failover support for replica sets (esp. shards)
[ https://issues.apache.org/jira/browse/OAK-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955028#comment-13955028 ]

Stefan Egli commented on OAK-1453:
--

OAK-1649 reports a problem on a save following a replica crash.

MongoMK failover support for replica sets (esp. shards)
---

Key: OAK-1453
URL: https://issues.apache.org/jira/browse/OAK-1453
Project: Jackrabbit Oak
Issue Type: Improvement
Components: mongomk
Reporter: Michael Marth
Assignee: Thomas Mueller
Priority: Critical
Labels: production, resilience
Fix For: 0.20
[jira] [Commented] (OAK-1453) MongoMK failover support for replica sets (esp. shards)
[ https://issues.apache.org/jira/browse/OAK-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955063#comment-13955063 ]

Stefan Egli commented on OAK-1453:
--

OAK-1650 reports two kinds of exceptions occurring when crashing the replica primary during a save of a large transaction.
[jira] [Commented] (OAK-1453) MongoMK failover support for replica sets (esp. shards)
[ https://issues.apache.org/jira/browse/OAK-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950836#comment-13950836 ]

Stefan Egli commented on OAK-1453:
--

As noted in OAK-1641, there is a 'nasty' runtime exception thrown during a crash/failover. Not sure what the expected behavior of this should be though?
[jira] [Commented] (OAK-1453) MongoMK failover support for replica sets (esp. shards)
[ https://issues.apache.org/jira/browse/OAK-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950790#comment-13950790 ]

Stefan Egli commented on OAK-1453:
--

OAK-1641 reports a failure in the google cache when a primary mongo crashes.