[jira] [Commented] (OAK-1453) MongoMK failover support for replica sets (esp. shards)

2014-07-25 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074477#comment-14074477
 ] 

Chetan Mehrotra commented on OAK-1453:
--

For reference, quoting from this 
[thread|https://groups.google.com/d/msg/mongodb-user/qSi8RmvcAUY/PjvrMGpeHDcJ]:

bq. Now, if you are getting back success for the write and you are using 
writeConcern w:1 (acknowledged) but during the data load you are losing the 
primary, *unless you have used w:2 as write concern (waiting for replication to 
at least one other node before acknowledgement) you will potentially have some 
records that would be written to the primary, not replicated and then if you 
shut down the primary and another node becomes the primary, the original node 
will have to roll back that write (since it's not on the new primary)*.   If 
that's what happened, you will be able to find a rollback directory in the 
data directory which will have the documents that were rolled back.
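
The difference between w:1 and w:2 described in the quote can be illustrated with a small toy model. This is only a sketch under stated assumptions: it does not use the MongoDB driver, and the class WriteConcernSketch, the method rolledBackAfterFailover and the "lag" parameter are made up for illustration. It models a primary and one secondary as plain lists and reports which acknowledged writes would be missing on the secondary when it takes over.

```java
import java.util.ArrayList;
import java.util.List;

public class WriteConcernSketch {

    /** Toy replica-set member: an ordered log of document ids (hypothetical model). */
    static final class Node {
        final List<Integer> log = new ArrayList<>();
    }

    /**
     * Writes `count` documents and then loses the primary.
     * With w=1 a write is acknowledged once the primary has it; with w=2 it is
     * acknowledged only after the secondary has it too. `lag` is the number of
     * most recent writes the secondary has not copied yet at failover time
     * (only relevant for w=1 in this model).
     * Returns the acknowledged ids that are missing on the new primary.
     */
    static List<Integer> rolledBackAfterFailover(int w, int count, int lag) {
        Node primary = new Node();
        Node secondary = new Node();
        List<Integer> acknowledged = new ArrayList<>();

        for (int id = 0; id < count; id++) {
            primary.log.add(id);
            if (w >= 2) {
                // w:2 waits for replication before acknowledging
                secondary.log.add(id);
            }
            acknowledged.add(id);
        }
        if (w < 2) {
            // asynchronous replication only got as far as count - lag
            int replicated = Math.max(0, count - lag);
            for (int id = 0; id < replicated; id++) {
                secondary.log.add(id);
            }
        }

        // The primary is lost and the secondary becomes the new primary.
        List<Integer> rolledBack = new ArrayList<>(acknowledged);
        rolledBack.removeAll(secondary.log);
        return rolledBack;
    }

    public static void main(String[] args) {
        System.out.println("w=1: " + rolledBackAfterFailover(1, 5, 2)); // [3, 4]
        System.out.println("w=2: " + rolledBackAfterFailover(2, 5, 2)); // []
    }
}
```

In this model the cost of w:2 is hidden: every write waits for the secondary, which is the latency trade-off a real deployment would have to weigh.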

 MongoMK failover support for replica sets (esp. shards)
 ---

 Key: OAK-1453
 URL: https://issues.apache.org/jira/browse/OAK-1453
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: mongomk
Reporter: Michael Marth
Assignee: Thomas Mueller
  Labels: production, resilience
 Fix For: 1.1


 With OAK-759 we have introduced replica support in MongoMK. I think we still 
 need to address the resilience for failover from primary to secondary:
 Consider a case where Oak writes to the primary. Replication to secondary is 
 ongoing. During that period the primary goes down and the secondary becomes 
 primary. There could be some half-replicated MVCC revisions, which need to 
 be either discarded or ignored after the failover.
 This might not be an issue if there is only one shard, as the commit root is 
 written last (and replicated last).
 But with 2 shards the replication state of these 2 shards could be 
 inconsistent. Oak needs to handle such a situation without falling over.
 If we can detect a Mongo failover we could query Mongo for which revisions are 
 fully replicated to the new primary and discard the potentially 
 half-replicated revisions.
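
 The proposed check could look roughly like the sketch below. It is not Oak 
 code: the class HalfReplicatedRevisions, the method revisionsToDiscard and 
 the idea that a surviving commit root lists the shards its commit touched 
 are all assumptions made for illustration. A revision is kept only if its 
 commit root is present on the new primary and every shard it touched still 
 holds its changes; everything else is reported for discarding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class HalfReplicatedRevisions {

    /**
     * @param commitRoots   revision -> shards the commit touched, taken from the
     *                      commit roots that survived the failover (hypothetical bookkeeping)
     * @param shardContents shard name -> revisions whose changes are present on
     *                      that shard's new primary
     * @return revisions that are not fully replicated, in sorted order
     */
    static List<String> revisionsToDiscard(Map<String, Set<String>> commitRoots,
                                           Map<String, Set<String>> shardContents) {
        // Collect every revision seen anywhere after the failover.
        Set<String> seen = new TreeSet<>(commitRoots.keySet());
        for (Set<String> revisions : shardContents.values()) {
            seen.addAll(revisions);
        }

        List<String> discard = new ArrayList<>();
        for (String revision : seen) {
            Set<String> touched = commitRoots.get(revision);
            // No commit root: the commit never completed on the new primary.
            boolean complete = touched != null;
            if (complete) {
                for (String shard : touched) {
                    Set<String> present = shardContents.get(shard);
                    if (present == null || !present.contains(revision)) {
                        complete = false; // one shard lost its part of the commit
                        break;
                    }
                }
            }
            if (!complete) {
                discard.add(revision);
            }
        }
        return discard;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> commitRoots = Map.of(
                "r1", Set.of("shard1", "shard2"),
                "r2", Set.of("shard1", "shard2"));
        Map<String, Set<String>> shardContents = Map.of(
                "shard1", Set.of("r1", "r2", "r3"),
                "shard2", Set.of("r1"));
        // r2 lost its changes on shard2, r3 has no commit root
        System.out.println(revisionsToDiscard(commitRoots, shardContents)); // [r2, r3]
    }
}
```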



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1453) MongoMK failover support for replica sets (esp. shards)

2014-07-25 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074482#comment-14074482
 ] 

Chetan Mehrotra commented on OAK-1453:
--

A couple of observations regarding the importance of config servers when 
sharding is involved, based on this 
[thread|https://groups.google.com/d/msg/mongodb-user/Q0yRpr-kNco/DLMtpjZq36IJ]:

* You cannot simply randomise the list of config servers per mongos. The 
--configdb string needs to be the same across all mongos instances.
* Traffic between mongos and the config servers can be very high (particularly 
to the first config server) while balancing is going on.
* On AWS it might be tempting to use t1.micro for the config servers, but that 
should not be done as they might require higher network throughput. Use 
m3.large instead.

 MongoMK failover support for replica sets (esp. shards)
 ---

 Key: OAK-1453
 URL: https://issues.apache.org/jira/browse/OAK-1453
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: mongomk
Reporter: Michael Marth
Assignee: Thomas Mueller
  Labels: production, resilience
 Fix For: 1.1




[jira] [Commented] (OAK-1453) MongoMK failover support for replica sets (esp. shards)

2014-03-31 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955028#comment-13955028
 ] 

Stefan Egli commented on OAK-1453:
--

OAK-1649 reports a problem on a save following a replica crash

 MongoMK failover support for replica sets (esp. shards)
 ---

 Key: OAK-1453
 URL: https://issues.apache.org/jira/browse/OAK-1453
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: mongomk
Reporter: Michael Marth
Assignee: Thomas Mueller
Priority: Critical
  Labels: production, resilience
 Fix For: 0.20




[jira] [Commented] (OAK-1453) MongoMK failover support for replica sets (esp. shards)

2014-03-31 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955063#comment-13955063
 ] 

Stefan Egli commented on OAK-1453:
--

OAK-1650 reports two kinds of exceptions occurring when the replica primary 
is crashed during a save of a large transaction.

 MongoMK failover support for replica sets (esp. shards)
 ---

 Key: OAK-1453
 URL: https://issues.apache.org/jira/browse/OAK-1453
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: mongomk
Reporter: Michael Marth
Assignee: Thomas Mueller
Priority: Critical
  Labels: production, resilience
 Fix For: 0.20




[jira] [Commented] (OAK-1453) MongoMK failover support for replica sets (esp. shards)

2014-03-28 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950836#comment-13950836
 ] 

Stefan Egli commented on OAK-1453:
--

As noted in OAK-1641, a 'nasty' runtime exception is thrown during a 
crash/failover. I am not sure what the expected behavior should be, though.

 MongoMK failover support for replica sets (esp. shards)
 ---

 Key: OAK-1453
 URL: https://issues.apache.org/jira/browse/OAK-1453
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: mongomk
Reporter: Michael Marth
Assignee: Thomas Mueller
Priority: Critical
  Labels: production, resilience
 Fix For: 0.20




[jira] [Commented] (OAK-1453) MongoMK failover support for replica sets (esp. shards)

2014-03-28 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950790#comment-13950790
 ] 

Stefan Egli commented on OAK-1453:
--

OAK-1641 reports a failure in the google cache when a primary mongo crashes

 MongoMK failover support for replica sets (esp. shards)
 ---

 Key: OAK-1453
 URL: https://issues.apache.org/jira/browse/OAK-1453
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: mongomk
Reporter: Michael Marth
Assignee: Thomas Mueller
Priority: Critical
  Labels: production, resilience
 Fix For: 0.20

