[ 
https://issues.apache.org/jira/browse/GEODE-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8248:
------------------------------
    Attachment: temporal.zip

> Member hangs waiting for missing disk-stores after gfsh shutdown
> ----------------------------------------------------------------
>
>                 Key: GEODE-8248
>                 URL: https://issues.apache.org/jira/browse/GEODE-8248
>             Project: Geode
>          Issue Type: Bug
>          Components: gfsh, persistence
>            Reporter: Juan Ramos
>            Priority: Major
>         Attachments: temporal.zip
>
>
> Let’s say I have two servers with a simple {{REPLICATE_PERSISTENT}} region and 
> I stop both using the {{gfsh shutdown}} command.
> According to the 
> [documentation|https://geode.apache.org/docs/guide/112/managing/disk_storage/starting_system_with_disk_stores.html],
>  I should be able to start either of the servers without any problems, as both 
> host the most up-to-date data. In reality, however, the startup hangs with the 
> following output:
> {noformat}
> (1) Executing - start server --name=server1 --locators=localhost[10334] --server-port=40401 --cache-xml-file=/temporal/cache.xml
> .........
> Region /TestRegion has potentially stale data. It is waiting for another member to recover the latest data.
> My persistent id:
>   DiskStore ID: 4d1abaf3-677d-4c52-b3f8-681e051f143c
>   Name: server1
>   Location: /temporal/server1/dataStore
> Members with potentially new data:
> [
>   DiskStore ID: 163dfaf7-a680-4154-a278-8cec40d57d80
>   Name: server2
>   Location: /temporal/server2/dataStore
> ]
> "main" #1 prio=5 os_prio=31 tid=0x00007f9b28809000 nid=0x1003 in Object.wait() [0x000070000ab04000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       at org.apache.geode.internal.cache.persistence.MembershipChangeListener.waitForChange(MembershipChangeListener.java:62)
>       - locked <0x0000000719df55e0> (a org.apache.geode.internal.cache.persistence.MembershipChangeListener)
>       at org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.waitForMembershipChangeForMissingDiskStores(PersistenceInitialImageAdvisor.java:218)
>       at org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.getAdvice(PersistenceInitialImageAdvisor.java:118)
>       at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:835)
>       at org.apache.geode.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:52)
>       at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1196)
>       at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1076)
>       at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3043)
>       at org.apache.geode.pdx.internal.PeerTypeRegistration.initialize(PeerTypeRegistration.java:198)
>       at org.apache.geode.pdx.internal.TypeRegistry.initialize(TypeRegistry.java:116)
>       at org.apache.geode.internal.cache.GemFireCacheImpl.initializePdxRegistry(GemFireCacheImpl.java:1449)
>       - locked <0x00000005c0593168> (a org.apache.geode.internal.cache.GemFireCacheImpl)
>       at org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:511)
>       at org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
>       at org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
>       at org.apache.geode.internal.cache.GemFireCacheImpl.initializeDeclarativeCache(GemFireCacheImpl.java:1388)
>       at org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1208)
>       at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
>       - locked <0x00000005c016a108> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
>       - locked <0x00000005c0043de0> (a java.lang.Class for org.apache.geode.internal.cache.InternalCacheBuilder)
>       at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
>       - locked <0x00000005c0043de0> (a java.lang.Class for org.apache.geode.internal.cache.InternalCacheBuilder)
>       at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
>       at org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
>       at org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
>       at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
>       at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:716)
>       at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:236)
> {noformat}
> We should either fix the problem and make sure the members fully synchronise 
> their data during the {{shutdown}} process so they don't have to wait on each 
> other or, if this is the expected behaviour, update the documentation 
> accordingly.
> The attached {{zip}} file contains a simple script to reproduce the issue. 
> The only thing that needs to be changed after downloading and uncompressing 
> the file is the {{GEMFIRE}} environment variable.
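> For readers without the attachment, the scenario described above can be 
> approximated with a sequence along the following lines. This is only a sketch 
> and not the attached script: the locator name, the second server's port and 
> the use of {{gfsh -e}} from a shell are assumptions, and {{/temporal/cache.xml}} 
> is assumed to declare the persistent {{TestRegion}}.
> {noformat}
> # Hypothetical reproduction sketch, NOT the script shipped in temporal.zip.
> # Assumed values: locator name "locator1", server2 port 40402.
> gfsh -e "start locator --name=locator1 --port=10334"
> gfsh -e "start server --name=server1 --locators=localhost[10334] --server-port=40401 --cache-xml-file=/temporal/cache.xml"
> gfsh -e "start server --name=server2 --locators=localhost[10334] --server-port=40402 --cache-xml-file=/temporal/cache.xml"
> # Stop both servers in one step; the locator is left running.
> gfsh -e "connect --locator=localhost[10334]" -e "shutdown"
> # Restart only one member; per this report, this is the step that hangs.
> gfsh -e "start server --name=server1 --locators=localhost[10334] --server-port=40401 --cache-xml-file=/temporal/cache.xml"
> {noformat}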



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
