Re: RECOVERING A WAN INSTALLATION

evaristo.camar...@yahoo.es Mon, 28 Oct 2019 01:32:43 -0700

 Thanks a lot Barry and others,

You are right, some info was missing in my previous mails. Let me give a bit 
more detail:


- We have both PARTITIONED and REPLICATED regions (both of them are 
PERSISTENT) -> We have both parallel and serial senders with overflow to disk

- We are using off-heap

- We are using PDX serialization

- Our default setup is active/active with some data stickiness (in normal 
circumstances the same data is always handled by the same Geode instance). 
One instance takes all the traffic when there is a network split between the 
WAN instances or when a Geode cluster fails.

- We have custom conflict resolution, to minimize consistency issues.
 
I was checking your proposal and I have several comments / questions, so any 
feedback is really appreciated:

- When there are multiple regions, the process should be repeated for each 
region, and the senders should be started only when all regions have finished 
ingesting data and consuming events, right?

- I am not sure this approach scales well when clusters are big. We were 
thinking more of an export data / transfer / import data approach. I am not 
100% sure which is best. We will do some testing to find the best option. 
Your approach has the benefit that the time during which events are duplicated 
is much shorter, and I think that could avoid potential consistency issues.

 

Thanks,

/Evaristo





    On Thursday, 24 October 2019 23:47:03 CEST, Barry Oglesby 
<bogle...@pivotal.io> wrote:
 
 You could look into a blue-green-type strategy to re-populate the second WAN 
site.

This idea uses a Geode durable client that is connected to both sites. It 
connects to site1 using CQ and site2 using a proxy region. It basically takes 
initial results and events from site1 and puts them into site2.

If you're in the state where site1 is up and site2 is down, here are the steps:

1. Stop gateway sender in site1 so that no events are queued for site2. You can 
use gfsh stop gateway-sender to do this.

2. Restart locators and servers in site2

3. Stop gateway sender in site2 so that events from the durable client are not 
sent back to site1. You can use gfsh stop gateway-sender to do this.

At this point, the two sites are not really connected by the WAN.

4. Start durable client (set durable-client-id=migration-client)

This:
- creates a CQ connected to site1
- executes the CQ with initial results
- adds those results to site2 using the proxy region
- sends ready for events which starts the events flowing to the 
MigrationListener. Events received by the MigrationListener are added to site2 
using the proxy region.
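
The two phases listed above (initial results first, then live events) can be modelled in plain Java. This is only a sketch of the ordering, not the attached MigrationClient: it uses no Geode classes, a `Map` stands in for the site2 proxy region, and the names `MigrationFlowSketch`, `Event` and `migrate` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

/** Plain-Java model of the migration flow; no Geode classes are used. */
public class MigrationFlowSketch {

    /** A change made on the source site after the initial results were taken. */
    public static final class Event {
        final String key;
        final String value;
        final boolean destroy;

        public Event(String key, String value, boolean destroy) {
            this.key = key;
            this.value = value;
            this.destroy = destroy;
        }
    }

    /**
     * Copies the initial results into the target, then replays the queued
     * events in arrival order so later changes overwrite the initial values.
     */
    public static Map<String, String> migrate(Map<String, String> initialResults,
                                              Queue<Event> events) {
        Map<String, String> target = new LinkedHashMap<>();
        // Phase 1: initial results (stands in for the CQ initial result set).
        target.putAll(initialResults);
        // Phase 2: events (stands in for what the listener receives afterwards).
        Event e;
        while ((e = events.poll()) != null) {
            if (e.destroy) {
                target.remove(e.key);
            } else {
                target.put(e.key, e.value);
            }
        }
        return target;
    }

    public static void main(String[] args) {
        Map<String, String> initial = new LinkedHashMap<>();
        initial.put("k1", "v1");
        initial.put("k2", "v2");
        Queue<Event> events = new ArrayDeque<>();
        events.add(new Event("k1", "v1-updated", false)); // update after snapshot
        events.add(new Event("k2", null, true));          // destroy after snapshot
        events.add(new Event("k3", "v3", false));         // create after snapshot
        System.out.println(migrate(initial, events));     // {k1=v1-updated, k3=v3}
    }
}
```

The point of the model is that events are applied only after the initial results, so a value changed on site1 during the copy ends up with its latest state on site2.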

When steady state is achieved (meaning all the initial results are processed 
and only the MigrationListener is processing events):

5. Restart gateway sender in site1

6. Stop durable client

After you restart the gateway sender in site1 but before you stop the durable 
client, both will be sending events to site2. This will result in duplicate 
events in site2, so the shorter the time between these actions, the fewer 
duplicate events.

7. After the durable client has been stopped, restart the gateway sender in 
site2.
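
As a quick reference, the order of steps 1 to 7 can be written down as a small Java sketch that only builds the runbook text; it does not talk to Geode. The sender ids passed in are hypothetical placeholders, and the class name `WanRecoveryRunbookSketch` is made up for illustration. The gfsh commands shown are `stop gateway-sender` and `start gateway-sender` with the `--id` option.

```java
import java.util.ArrayList;
import java.util.List;

/** Builds the ordered recovery steps as text; sender ids are placeholders. */
public class WanRecoveryRunbookSketch {

    public static List<String> steps(String site1Sender, String site2Sender) {
        List<String> steps = new ArrayList<>();
        // 1. No events should be queued for site2 while it is repopulated.
        steps.add("site1: gfsh> stop gateway-sender --id=" + site1Sender);
        // 2. Bring the recovered site back up.
        steps.add("site2: restart locators and servers");
        // 3. Events written by the durable client must not flow back to site1.
        steps.add("site2: gfsh> stop gateway-sender --id=" + site2Sender);
        // 4. Copy initial results and then live events from site1 to site2.
        steps.add("client: start durable client (durable-client-id=migration-client)");
        // 5. Once only live events are being processed, resume WAN replication.
        steps.add("site1: gfsh> start gateway-sender --id=" + site1Sender);
        // 6. Keep the gap between steps 5 and 6 short to limit duplicates.
        steps.add("client: stop durable client");
        // 7. Re-enable replication from site2 back to site1.
        steps.add("site2: gfsh> start gateway-sender --id=" + site2Sender);
        return steps;
    }

    public static void main(String[] args) {
        List<String> steps = steps("sender-to-site2", "sender-to-site1");
        for (int i = 0; i < steps.size(); i++) {
            System.out.println((i + 1) + ". " + steps.get(i));
        }
    }
}
```

With several senders per site, the stop and start lines would presumably be repeated once per sender id.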

Notes / Caveats:

I attached the MigrationClient, MigrationListener and configuration files.

If you're using PDX serialization, you might have to work around JIRA 
GEODE-6271:

https://issues.apache.org/jira/browse/GEODE-6271

The MigrationClient does this in registerPdxTypesOnAllPools. If you're not 
using PDX serialization, you can remove this code.

You don't mention if any of your entities are persistent.

If your PdxTypes are persistent in site2, you won't need to work around JIRA 
GEODE-6271

If your senders are persistent, you may need to delete the disk files before 
restarting the senders.

Thanks,
Barry Oglesby


On Wed, Oct 23, 2019 at 10:36 PM evaristo.camar...@yahoo.es 
<evaristo.camar...@yahoo.es> wrote:

Thanks a lot. We will try this.

Sent from Yahoo Mail on Android
 
  On Wed, Oct 23, 2019 at 23:35, Jason Huynh <jhu...@pivotal.io> wrote:

Hi Evaristo,

I spoke with another committer, Anil, and from what we understand, the process 
that is described would work. I am not sure if this is the recommended way to 
do a restart, but we believe the steps outlined would get the intended outcome.

To clear a serial gateway, I believe stopping the gateway sender will clear 
its queue. However, for a parallel gateway sender I think the parallel queue 
gets cleared once the sender is restarted (so a stop and then a start). There 
may be other ways, such as destroying the gateway sender, but you'd probably 
have to detach it from the region first.

It sounds like a WAN GII feature would be useful and would help reduce the 
steps in this use case.

Please chime in if this response is wrong or can be improved.

Thanks,
-Jason

On Tue, Oct 22, 2019 at 1:26 PM evaristo.camar...@yahoo.es 
<evaristo.camar...@yahoo.es> wrote:


Hi there,

 

We are planning to use an installation with 2 Geode clusters connected via WAN, 
using gateway senders/receivers to keep them updated. The main reason is 
resiliency against disasters in a data center.

 

It is not clear to us how to recover a data center in case of disaster. This is 
the use case:

- One of the data centers has a problem (natural catastrophe)

- The other data center keeps running traffic and filling the gateway sender 
queues, which need to be stopped at some point to avoid filling up the disk 
resources.

 

At some point in time, the data center is ready to start recovery, which will 
require synchronizing the Geode copy. The procedure should be something like:

- Drain gateway sender queues in the copy providing service

- Start gateway senders

- Make a copy

- Transfer copy to the data center that will be recovered

- Import the copy

- Allow the data center to catch up via replication

- Start the copy again.

 

Does it make sense? Or is there a better way to do it? In case the answer is 
yes, is there any way to drain the gateway sender’s queues (both for parallel 
and serial GWs)?

 

Thanks in advance,

 

/Evaristo
