Re: RECOVERING A WAN INSTALLATION

Barry Oglesby Thu, 24 Oct 2019 14:48:42 -0700

You could look into a blue-green-type strategy to re-populate the second
WAN site.

This idea uses a Geode durable client that is connected to both sites. It
connects to site1 with a continuous query (CQ) and to site2 through a proxy
region. It takes the CQ's initial results and subsequent events from site1
and puts them into site2.

If you're in the state where site1 is up and site2 is down, then here are
the steps:

1. Stop gateway sender in site1 so that no events are queued for site2. You
can use gfsh stop gateway-sender to do this.

2. Restart locators and servers in site2

3. Stop gateway sender in site2 so that events from the durable client are
not sent back to site1. You can use gfsh stop gateway-sender to do this.

At this point, the two sites are not really connected by the WAN.

4. Start durable client (set durable-client-id=migration-client)

This:
- creates a CQ connected to site1
- executes the CQ with initial results
- adds those results to site2 using the proxy region
- sends ready-for-events, which starts the events flowing to the
MigrationListener. Events received by the MigrationListener are added to
site2 using the proxy region.

When steady state is achieved (meaning all the initial results are
processed and only the MigrationListener is processing events):

5. Restart gateway sender in site1

6. Stop durable client

After you restart the gateway sender in site1 but before you stop the
durable client, both will be sending events to site2. This will result in
duplicate events in site2, so the shorter the time between these actions,
the fewer duplicate events.

7. After the durable client has been stopped, restart the gateway sender in
site2.
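
The sender stop/start steps (1, 3, 5 and 7) could be scripted roughly like
this. It is only a sketch: the sender ids are hypothetical, and the mapping of
site1/site2 to the locator ports is inferred from the attached client cache
XML (the CQ pool is taken to be site1, the proxy region's pool site2). It
prints the gfsh commands by default rather than running them.

```shell
#!/bin/sh
# Sketch of the sender stop/start steps (1, 3, 5 and 7) as gfsh one-liners.
# Dry run by default: each command line is printed, not executed.
# Export GFSH=gfsh to run against a live cluster.
GFSH="${GFSH:-echo gfsh}"

# Inferred from the attached client cache XML: the CQ pool (ln-pool, port
# 10332) is site1 and the proxy-region pool (ny-pool, port 10331) is site2.
SITE1_LOCATOR="localhost[10332]"
SITE2_LOCATOR="localhost[10331]"

# Hypothetical sender ids; substitute the ids defined in your clusters.
SITE1_SENDER="to-site2"
SITE2_SENDER="to-site1"

# usage: sender <stop|start> <locator> <sender-id>
sender() {
  $GFSH -e "connect --locator=$2" -e "$1 gateway-sender --id=$3"
}

step1() { sender stop  "$SITE1_LOCATOR" "$SITE1_SENDER"; }  # no events queued for site2
step3() { sender stop  "$SITE2_LOCATOR" "$SITE2_SENDER"; }  # no events sent back to site1
step5() { sender start "$SITE1_LOCATOR" "$SITE1_SENDER"; }  # once steady state is reached
step7() { sender start "$SITE2_LOCATOR" "$SITE2_SENDER"; }  # after the durable client stops
```

Source the file and call step1, step3, and so on at the matching point in the
procedure. The steps are kept as separate functions because steps 2, 4 and 6
happen in between and are not covered here.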

Notes / Caveats:

I attached the MigrationClient, MigrationListener and configuration files.

If you're using PDX serialization, you might have to work around JIRA
GEODE-6271:

https://issues.apache.org/jira/browse/GEODE-6271

The MigrationClient does this in registerPdxTypesOnAllPools. If you're not
using PDX serialization, you can remove this code.

You don't mention if any of your entities are persistent.

If your PdxTypes are persistent in site2, you won't need to work around
JIRA GEODE-6271.

If your senders are persistent, you may need to delete the disk files
before restarting the senders.

Thanks,
Barry Oglesby

On Wed, Oct 23, 2019 at 10:36 PM evaristo.camar...@yahoo.es <
evaristo.camar...@yahoo.es> wrote:

> Thanks a lot. We will try this.
>
> On Wed, Oct 23, 2019 at 23:35, Jason Huynh
> <jhu...@pivotal.io> wrote:
> Hi Evaristo,
>
> I spoke with another committer, Anil, and from what we understand, the
> process you describe would work. I am not sure if this is the
> recommended way to do a restart, but we believe the steps outlined would get
> the intended outcome.
>
> To clear a serial gateway sender, I believe stopping the sender will
> clear its queue. However, for a parallel gateway sender I think the
> parallel queue gets cleared once the sender is restarted (so a stop and
> then a start). There may be other ways, such as destroying the gateway
> sender, but you'd probably have to detach it from the region first.
>
> It sounds like a WAN GII (get initial image) feature would be useful and
> would help reduce the steps in this use case.
>
> Please chime in if this response is wrong or can be improved.
>
> Thanks,
> -Jason
>
> On Tue, Oct 22, 2019 at 1:26 PM evaristo.camar...@yahoo.es <
> evaristo.camar...@yahoo.es> wrote:
>
> Hi there,
>
>
>
> We are planning to use an installation with two Geode clusters connected via
> WAN, using gateway senders/receivers to keep them updated. The main reason
> is resiliency against disasters in a data center.
>
>
>
> It is not clear to us how to recover a data center in case of disaster.
> This is the use case:
>
> - One of the data centers has a problem (natural catastrophe)
>
> - The other data center keeps running traffic and filling the gateway
> sender queues that need to be stopped at some point to avoid filling up the
> disk resources.
>
>
>
> At some point in time, the data center is ready to start recovery, which
> will require synchronizing the Geode copy. The procedure should be something
> like:
>
> - Drain the gateway sender queues in the copy that is providing service
>
> - Start gateway senders
>
> - Make a copy
>
> - Transfer copy to data center that will be recovered
>
> - Import the copy
>
> - Allow the data center to catch up via replication
>
> - Start the copy again.
>
>
>
> Does this make sense, or is there a better way to do it? If it does make
> sense, is there any way to drain the gateway sender’s queues (both for
> parallel and serial gateways)?
>
>
>
> Thanks in advance,
>
>
>
> /Evaristo
>
>
>
>

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE client-cache PUBLIC
  "-//GemStone Systems, Inc.//GemFire Declarative Caching 7.0//EN"
  "http://www.gemstone.com/dtd/cache7_0.dtd">
  
<client-cache>

  <pool name="ny-pool">
    <locator host="localhost" port="10331"/>
  </pool>

  <pool name="ln-pool" subscription-enabled="true">
    <locator host="localhost" port="10332"/>
  </pool>

  <region name="data" refid="PROXY">
    <region-attributes pool-name="ny-pool"/>
  </region>

</client-cache>

Attachments:
- gemfire-migration-client.properties
- MigrationClient.java
- MigrationListener.java
