Thanks Anthony.

We have already enabled synchronous disk writes to minimize data loss in the
event of a crash.

From: Anthony Baker <aba...@pivotal.io>
Reply-To: <user@geode.incubator.apache.org>
Date: Thursday, October 13, 2016 at 8:31 PM
To: <user@geode.incubator.apache.org>
Subject: Re: GemFire persisted data corruption - how to debug?

Hi Kapil,

By default, Geode writes data synchronously to other cluster members. If a
node crashes, as in your test, the update is preserved by the cluster even
without persistence. Synchronous disk writes can be turned on (see [1]), but
many users prefer to avoid the fsync performance penalty.
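To make that trade-off concrete, here is a minimal sketch using only JDK file I/O; it is not Geode's oplog code and does not show Geode configuration (see [1] for that). `AppendLog` and its `sync` flag are hypothetical names. With `sync=false`, a write typically returns once the bytes are handed to the OS page cache, so a power-off can still lose them; with `sync=true`, each append also calls `FileChannel.force(false)`, which asks the OS to push the file content to the storage device before returning. That extra device flush per write is the fsync cost.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical append-only log, for illustration only (NOT Geode's Oplog).
class AppendLog implements AutoCloseable {
    private final FileChannel channel;
    private final boolean sync;

    AppendLog(Path file, boolean sync) throws IOException {
        this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        this.sync = sync;
    }

    // Appends one newline-terminated record.
    void append(String record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            channel.write(buf); // usually completes once the OS has the bytes in memory
        }
        if (sync) {
            // Force file content to the storage device before returning.
            // This per-write flush is what makes synchronous writes slower.
            channel.force(false);
        }
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("append-log", ".txt");
        try (AppendLog log = new AppendLog(file, true)) {
            log.append("put key1=v1");
            log.append("put key2=v2");
        }
        System.out.print(Files.readString(file));
        Files.delete(file);
    }
}
```

Note that an fsync only narrows the window: a record that was mid-write when power was lost can still be partially on disk.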

Anthony

[1] https://cwiki.apache.org/confluence/display/GEODE/Native+Disk+Persistence

On Oct 13, 2016, at 6:46 PM, Kapil Goyal <goy...@vmware.com> wrote:

Hi Folks,

I am doing some crash testing with a single GemFire cache node: I power off
the VM where the cache is running and then bring it back up. On restart,
GemFire refuses to start, with this error:

Caused by: java.lang.NullPointerException
        at com.gemstone.gemfire.internal.util.concurrent.CustomEntryConcurrentHashMap.keyHash(CustomEntryConcurrentHashMap.java:228) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.AbstractRegionEntry$HashRegionEntryCreator.keyHashCode(AbstractRegionEntry.java:934) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.util.concurrent.CustomEntryConcurrentHashMap.get(CustomEntryConcurrentHashMap.java:1447) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.AbstractRegionMap.getEntry(AbstractRegionMap.java:368) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.AbstractLRURegionMap.getEntry(AbstractLRURegionMap.java:47) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.PlaceHolderDiskRegion.getDiskEntry(PlaceHolderDiskRegion.java:93) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.Oplog.readModifyEntry(Oplog.java:2779) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.Oplog.readCrf(Oplog.java:1957) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.Oplog.recoverCrf(Oplog.java:2270) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:459) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:367) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2065) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.DiskStoreImpl.initializeIfNeeded(DiskStoreImpl.java:2052) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.DiskStoreImpl.doInitialRecovery(DiskStoreImpl.java:2057) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:135) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.xmlcache.CacheCreation.createDiskStore(CacheCreation.java:650) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:425) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:331) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4248) ~[gemfire-8.2.0.2.jar:?]
        at org.springframework.data.gemfire.CacheFactoryBean.init(CacheFactoryBean.java:306) ~[spring-data-gemfire-1.5.2.RELEASE.jar:1.5.2.RELEASE]
        at org.springframework.data.gemfire.CacheFactoryBean.getObject(CacheFactoryBean.java:455) ~[spring-data-gemfire-1.5.2.RELEASE.jar:1.5.2.RELEASE]

This suggests that the GemFire data on disk is corrupted, so I used 'gfsh' to verify:

gfsh>validate offline-disk-store --name=nsxDiskStore --disk-dirs=/common/nsxapi/data/self

Validating nsxDiskStore
/nsx_sys/ArrayListIDPriorityModel: entryCount=0
/nsx_sys/Crl: entryCount=0
/nsx_sys/Certificate: entryCount=1
......
Error in validating disk store nsxDiskStore is : null

This confirms that the disk store is corrupted, but it doesn't give any more
information to debug further. How do I go about debugging this? Have you seen
this before, and are there any fixes or workarounds available?
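One plausible failure mode for a power-off test (not confirmed for this trace) is a torn write: the VM dies while a record is half-written, so the tail of the newest log file holds a partial record. A reader that trusts the bytes blindly can then misparse it, whereas a reader that checks a length bound and a per-record checksum can stop at the last intact record. The sketch below uses a hypothetical record format (4-byte length, 8-byte CRC32, payload) that is NOT GemFire's oplog layout; `RecordLog`, `append` and `recover` are made-up names that only illustrate torn-write detection.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical record format, for illustration only (not GemFire's oplog):
//   [int length][long crc32 of payload][payload bytes]
final class RecordLog {
    private static final int HEADER = 4 + 8;

    private RecordLog() {}

    static void append(Path file, String payload) throws IOException {
        byte[] data = payload.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(data);
        ByteBuffer buf = ByteBuffer.allocate(HEADER + data.length);
        buf.putInt(data.length).putLong(crc.getValue()).put(data);
        Files.write(file, buf.array(), StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Returns every intact record, stopping at the first truncated or corrupt
    // one instead of failing the whole recovery.
    static List<String> recover(Path file) throws IOException {
        List<String> records = new ArrayList<>();
        long remaining = Files.size(file);
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            while (remaining >= HEADER) {
                int length = in.readInt();
                long expected = in.readLong();
                remaining -= HEADER;
                if (length < 0 || length > remaining) {
                    break; // payload cut short (torn write) or garbage length
                }
                byte[] data = new byte[length];
                in.readFully(data);
                remaining -= length;
                CRC32 crc = new CRC32();
                crc.update(data);
                if (crc.getValue() != expected) {
                    break; // bytes present but not what was written
                }
                records.add(new String(data, StandardCharsets.UTF_8));
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("record-log", ".bin");
        append(file, "put k1=v1");
        append(file, "put k2=v2");
        append(file, "put k3=v3");
        // Simulate a power-off during the last write by chopping off its tail.
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.truncate(ch.size() - 5);
        }
        System.out.println(recover(file)); // [put k1=v1, put k2=v2]
        Files.delete(file);
    }
}
```

The design choice here is to bound every length by the bytes actually left in the file before allocating, so a garbage length from a torn header ends recovery cleanly instead of throwing.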

Thanks
Kapil
