Hi Dan,

Thanks for providing more information about the inner workings of disk-store.

We enabled sync-writes right from the start, and it was still enabled when the
VM was powered off.

I was able to enable trace logging, but couldn't make out much from the logs I
got. I also tried attaching a debugger to the validation code, and it just
helped me reach the same conclusion as you: a key is null.

Regards
Kapil

From: Dan Smith <dsm...@pivotal.io>
Reply-To: "user@geode.incubator.apache.org" <user@geode.incubator.apache.org>
Date: Friday, October 14, 2016 at 1:47 PM
To: "user@geode.incubator.apache.org" <user@geode.incubator.apache.org>
Subject: Re: GemFire persisted data corruption - how to debug?

Hi Kapil,

Hmm, that's not good. It looks like it somehow ended up with a key that is
null. Did you have synchronous writes enabled before powering off the VM?

With or without synchronous writes enabled, this is neither expected nor a
known issue. Every write to the disk store files records an end-of-record
marker, so an incomplete write or something like that shouldn't be able to
cause this.
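To illustrate the idea, here is a minimal sketch of an append-only log in which every record is followed by an end-of-record marker, and recovery stops at the first record whose marker is missing. The record layout (a 4-byte length, the payload, then one marker byte) and the EOR_MARKER value are hypothetical and are not GemFire's actual oplog format; FileChannel.force is used only as a rough analogue of what a synchronous write guarantees.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EorLog {
    // Hypothetical marker byte; GemFire's real oplog format differs.
    static final byte EOR_MARKER = 0x7F;

    // Appends one record laid out as [int length][payload][EOR_MARKER].
    // When sync is true, force() pushes the bytes to the device before
    // returning, roughly what a synchronous disk write promises.
    static void append(FileChannel ch, byte[] payload, boolean sync) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length + 1);
        buf.putInt(payload.length).put(payload).put(EOR_MARKER);
        buf.flip();
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
        if (sync) {
            ch.force(false);
        }
    }

    // Replays records until the data runs out or a record lacks its marker,
    // so a torn write at the tail is dropped instead of being replayed.
    static List<String> recover(byte[] data) {
        List<String> records = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(data);
        while (buf.remaining() >= 4) {
            int len = buf.getInt();
            if (len < 0 || buf.remaining() < len + 1) {
                break; // incomplete record: stop recovery here
            }
            byte[] payload = new byte[len];
            buf.get(payload);
            if (buf.get() != EOR_MARKER) {
                break; // marker missing or wrong: treat as a torn write
            }
            records.add(new String(payload, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("oplog", ".crf");
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            append(ch, "key1=a".getBytes(StandardCharsets.UTF_8), true);
            append(ch, "key2=b".getBytes(StandardCharsets.UTF_8), true);
        }
        byte[] all = Files.readAllBytes(file);
        // Simulate a power-off mid-write by chopping off the last 3 bytes.
        byte[] torn = Arrays.copyOf(all, all.length - 3);
        System.out.println(recover(all));  // [key1=a, key2=b]
        System.out.println(recover(torn)); // [key1=a]
        Files.delete(file);
    }
}
```

Under this scheme a power-off can lose the last, unmarked record, but it should not surface a half-written record (for example one with a null key), which is why the failure above is surprising.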

One way to get more debugging information is to enable trace logging when you 
validate that offline disk store. But this prints out a lot of debugging 
information, so it might be hard to follow what's going on:
gfsh validate offline-disk-store --J=-Dgemfire.log-level=trace --name=nsxDiskStore --disk-dirs=/common/nsxapi/data/self
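Because the trace output is so verbose, it may help to keep only the lines leading up to the failure. The sketch below is a small, hypothetical helper (not part of gfsh or GemFire) that prints each line matching a pattern such as NullPointerException together with a few preceding lines of context; it assumes the trace output has already been saved to a file whose path is passed on the command line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Pattern;

public class TraceContext {
    // Returns every line matching `pattern`, each preceded by up to `before`
    // lines of context, so the records read just before a failure stand out.
    static List<String> withContext(List<String> lines, Pattern pattern, int before) {
        List<String> out = new ArrayList<>();
        Deque<String> window = new ArrayDeque<>();
        for (String line : lines) {
            if (pattern.matcher(line).find()) {
                out.addAll(window);
                out.add(line);
                window.clear();
            } else {
                window.addLast(line);
                if (window.size() > before) {
                    window.removeFirst(); // keep only the most recent lines
                }
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java TraceContext validate-trace.log NullPointerException 20
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        Pattern pattern = Pattern.compile(args[1]);
        int before = args.length > 2 ? Integer.parseInt(args[2]) : 20;
        withContext(lines, pattern, before).forEach(System.out::println);
    }
}
```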

-Dan



On Fri, Oct 14, 2016 at 10:59 AM, Anilkumar Gingade <aging...@pivotal.io> wrote:
Kapil,

This is related to GemFire 8.2. If it's a critical issue, please reach out to
GemFire support so that it will be tracked and addressed in time.

Can you reproduce this with Geode versions? That helps to identify or
eliminate the version where the issue is happening.

Is it possible for you to share your .xml and persistent files?

Thanks,
-Anil.




On Thu, Oct 13, 2016 at 6:46 PM, Kapil Goyal <goy...@vmware.com> wrote:
Hi Folks,

I am doing some crash testing with a single cache node of GemFire, where I
power off the VM on which the cache is running and then bring it back up. Upon
restart, GemFire refuses to come up with this error:

Caused by: java.lang.NullPointerException
        at com.gemstone.gemfire.internal.util.concurrent.CustomEntryConcurrentHashMap.keyHash(CustomEntryConcurrentHashMap.java:228) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.AbstractRegionEntry$HashRegionEntryCreator.keyHashCode(AbstractRegionEntry.java:934) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.util.concurrent.CustomEntryConcurrentHashMap.get(CustomEntryConcurrentHashMap.java:1447) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.AbstractRegionMap.getEntry(AbstractRegionMap.java:368) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.AbstractLRURegionMap.getEntry(AbstractLRURegionMap.java:47) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.PlaceHolderDiskRegion.getDiskEntry(PlaceHolderDiskRegion.java:93) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.Oplog.readModifyEntry(Oplog.java:2779) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.Oplog.readCrf(Oplog.java:1957) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.Oplog.recoverCrf(Oplog.java:2270) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:459) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:367) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2065) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.DiskStoreImpl.initializeIfNeeded(DiskStoreImpl.java:2052) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.DiskStoreImpl.doInitialRecovery(DiskStoreImpl.java:2057) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:135) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.xmlcache.CacheCreation.createDiskStore(CacheCreation.java:650) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:425) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:331) ~[gemfire-8.2.0.2.jar:?]
        at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4248) ~[gemfire-8.2.0.2.jar:?]
        at org.springframework.data.gemfire.CacheFactoryBean.init(CacheFactoryBean.java:306) ~[spring-data-gemfire-1.5.2.RELEASE.jar:1.5.2.RELEASE]
        at org.springframework.data.gemfire.CacheFactoryBean.getObject(CacheFactoryBean.java:455) ~[spring-data-gemfire-1.5.2.RELEASE.jar:1.5.2.RELEASE]
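The top frames show the NullPointerException being thrown while hashing a key during recovery (Oplog.readModifyEntry, then PlaceHolderDiskRegion.getDiskEntry, then CustomEntryConcurrentHashMap.keyHash). The JDK's java.util.concurrent.ConcurrentHashMap also rejects null keys, so the sketch below uses it as a stand-in for GemFire's internal map to show the mechanism; treating the two as equivalent is an assumption, and the helper names and the recordId parameter are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyDemo {
    // Hypothetical stand-in for a recovery-time lookup: the key comes from a
    // record read off disk. ConcurrentHashMap.get(null) hashes the key and
    // throws NullPointerException, much like the keyHash frame above.
    static String lookupRecoveredKey(ConcurrentHashMap<Object, String> entries, Object key) {
        return entries.get(key);
    }

    // The same lookup with an explicit guard, which replaces the bare NPE with
    // an error that identifies the bad record (recordId is hypothetical).
    static String lookupChecked(ConcurrentHashMap<Object, String> entries, Object key, long recordId) {
        if (key == null) {
            throw new IllegalStateException("null key in recovered record " + recordId);
        }
        return entries.get(key);
    }

    public static void main(String[] args) {
        ConcurrentHashMap<Object, String> entries = new ConcurrentHashMap<>();
        entries.put("k1", "v1");
        System.out.println(lookupRecoveredKey(entries, "k1")); // v1
        try {
            lookupRecoveredKey(entries, null);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, as in the stack trace");
        }
        try {
            lookupChecked(entries, null, 42L);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // null key in recovered record 42
        }
    }
}
```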

It hints at GemFire data on disk being corrupted, so I used 'gfsh' to verify:

gfsh>validate offline-disk-store --name=nsxDiskStore --disk-dirs=/common/nsxapi/data/self

Validating nsxDiskStore
/nsx_sys/ArrayListIDPriorityModel: entryCount=0
/nsx_sys/Crl: entryCount=0
/nsx_sys/Certificate: entryCount=1
......
Error in validating disk store nsxDiskStore is : null

This confirms that the disk-store is corrupted, but doesn't give any more 
information to debug this further. How do I go about debugging this? Have you 
seen this before and are there any fixes/workarounds available?

Thanks
Kapil

