Thanks a lot, Barry and others.

You are right, some info was missing in my previous mails. Let me detail a bit further:
- We have both PARTITIONED and REPLICATED regions (both of them are PERSISTENT) -> We have both parallel and serial senders with overflow to disk
- We are using off-heap
- We are using PDX serialization
- Our default setup is an active/active setup with some data stickiness (in normal circumstances the same data is always handled by the same Geode instance). One instance takes all the traffic during a network split between WAN instances or on failure of a Geode cluster.
- We have custom conflict resolution, to minimize consistency issues.

I was checking your proposal and I have several comments / questions, so any feedback is really appreciated:

- With multiple regions, the process should be repeated for each region, and the senders should be started only once all regions "have finished" ingesting data and consuming events, right?
- I am not sure this approach scales well when clusters are big. We were thinking more of an export data / transfer / import data approach.

I am not 100% sure which is best. We will do some testing to find the best option. Your approach has the benefit that the time during which events are duplicated is much shorter, and I think that could avoid potential consistency issues.

Thanks,

/Evaristo

On Thursday, 24 October 2019 at 23:47:03 CEST, Barry Oglesby <bogle...@pivotal.io> wrote:

You could look into a blue-green-type strategy to re-populate the second WAN site.

This idea uses a Geode durable client that is connected to both sites. It connects to site1 using a CQ and to site2 using a proxy region. It basically takes initial results and events from site1 and puts them into site2.

If you're in the state where site1 is up and site2 is down, then here are the steps:

1. Stop the gateway sender in site1 so that no events are queued for site2. You can use gfsh stop gateway-sender to do this.
2. Restart the locators and servers in site2.
3. Stop the gateway sender in site2 so that events from the durable client are not sent back to site1.
   You can use gfsh stop gateway-sender to do this.

At this point, the two sites are not really connected by the WAN.

4. Start the durable client (set durable-client-id=migration-client). This:
   - creates a CQ connected to site1
   - executes the CQ with initial results
   - adds those results to site2 using the proxy region
   - sends ready for events, which starts the events flowing to the MigrationListener
   Events received by the MigrationListener are added to site2 using the proxy region.

When steady state is achieved (meaning all the initial results are processed and only the MigrationListener is processing events):

5. Restart the gateway sender in site1.
6. Stop the durable client.
   After you restart the gateway sender in site1 but before you stop the durable client, both will be sending events to site2. This will result in duplicate events in site2, so the shorter the time between these actions, the fewer duplicate events.
7. After the durable client has been stopped, restart the gateway sender in site2.

Notes / Caveats:

I attached the MigrationClient, MigrationListener and configuration files.

If you're using PDX serialization, you might have to work around JIRA GEODE-6271:
https://issues.apache.org/jira/browse/GEODE-6271
The MigrationClient does this in registerPdxTypesOnAllPools. If you're not using PDX serialization, you can remove this code.

You don't mention if any of your entities are persistent. If your PdxTypes are persistent in site2, you won't need to work around JIRA GEODE-6271. If your senders are persistent, you may need to delete the disk files before restarting the senders.

Thanks,
Barry Oglesby

On Wed, Oct 23, 2019 at 10:36 PM evaristo.camar...@yahoo.es <evaristo.camar...@yahoo.es> wrote:

Thanks a lot. We will try this.

Sent from Yahoo Mail on Android

On Wed, Oct 23, 2019 at 23:35, Jason Huynh <jhu...@pivotal.io> wrote:

Hi Evaristo,

I spoke with another committer, Anil, and from what we understand, this process that is described would work.
I am not sure if this is the recommended way to do a restart, but we believe the steps outlined would get the intended outcome.

To clear a serial gateway, I believe stopping the gateway sender will clear its queue. However, for a parallel gateway sender I think the parallel queue gets cleared once the sender is restarted (so a stop and then a start). There may be other ways, such as destroying the gateway sender, but you'd probably have to detach it from the region first.

It sounds like a WAN GII feature would be useful here and would help reduce the steps in this use case.

Please chime in if this response is wrong or can be improved.

Thanks,
-Jason

On Tue, Oct 22, 2019 at 1:26 PM evaristo.camar...@yahoo.es <evaristo.camar...@yahoo.es> wrote:

Hi there,

We are planning to use an installation with 2 Geode clusters connected via WAN, using gateway senders/receivers to keep them updated. The main reason is resiliency against disasters in a data center.

It is not clear to us how to recover a data center in case of disaster. This is the use case:

- One of the data centers has a problem (natural catastrophe)
- The other data center keeps running traffic and filling the gateway sender queues, which need to be stopped at some point to avoid filling up the disk resources.

At some point in time, the data center is ready to start recovery, which will require synchronizing the Geode copy. The procedure should be something like:

- Drain gateway sender queues in the copy providing service
- Start gateway senders
- Make a copy
- Transfer the copy to the data center that will be recovered
- Import the copy
- Allow the data center to catch up via replication
- Start the copy again.

Does it make sense? Or is there a better way to do it?

In case the answer is yes, is there any way to drain the gateway senders' queues (both for parallel and serial GWs)?

Thanks in advance,

/Evaristo
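
For reference, the sender stop/start ordering from Barry's numbered steps could be scripted roughly as below. This is only a sketch under assumptions: the locator addresses and sender ids are made-up placeholders, the helper names are hypothetical, and by default it only prints the gfsh commands instead of running them (set GFSH=gfsh to execute for real). It relies on gfsh's -e option and on the connect, stop gateway-sender and start gateway-sender commands.

```shell
#!/bin/sh
# Dry-run sketch of the gateway sender stop/start ordering.
# Assumptions (not from the thread): the locator addresses and sender ids
# used here are placeholders; substitute your own.
SITE1_LOCATOR="${SITE1_LOCATOR:-site1-locator[10334]}"
SITE2_LOCATOR="${SITE2_LOCATOR:-site2-locator[10334]}"

# Prints the commands by default; set GFSH=gfsh to really run them.
GFSH="${GFSH:-echo gfsh}"

# sender_cmd ACTION LOCATOR SENDER_ID
# Connects to the given locator and stops or starts one gateway sender.
sender_cmd() {
  $GFSH -e "connect --locator=$2" -e "$1 gateway-sender --id=$3"
}

stop_sender()  { sender_cmd stop  "$1" "$2"; }
start_sender() { sender_cmd start "$1" "$2"; }

# Intended order (run one call per sender, serial and parallel alike):
#   stop_sender  "$SITE1_LOCATOR" to-site2   # step 1
#   ... restart site2 locators and servers   # step 2
#   stop_sender  "$SITE2_LOCATOR" to-site1   # step 3
#   ... run the durable client to steady state (step 4)
#   start_sender "$SITE1_LOCATOR" to-site2   # step 5
#   ... stop the durable client              # step 6
#   start_sender "$SITE2_LOCATOR" to-site1   # step 7
```

Keeping each step as a separate call makes it easy to pause between steps 5 and 6, where the window of duplicate events should be kept as short as possible.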