Re: NullPointerException while create region during server restart

2021-07-08 Thread Kirk Lund
Hi Mario,

I would guess that *getPdxRegistry()* is returning a null until after the
registry has finished initializing. Just a guess though.

Here's a spreadsheet a couple of us created and used as a reference for
some work about a year and a half ago. The source code line numbers
probably aren't correct anymore, but the order of steps and general details
should still be accurate. As you'll see, the PDX region (aka the PDX
registry) is created at step 19 of the spreadsheet.

Step 26 is the creation of the CacheServerMXBean.
Step 29 marks 'Online' status change.

You need to wait for all servers to reach 'Online' on Step 29 of the
spreadsheet before making any changes like creating regions.

To understand how to identify 'Online', take a look at these two acceptance
tests:
1. geode-assembly/src/acceptanceTest/java/org/apache/geode/launcher/ServerStartupOnlineTest.java
2. geode-assembly/src/acceptanceTest/java/org/apache/geode/launcher/ServerStartupNotificationTest.java
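
The "wait for 'Online' before creating regions" advice above can be sketched
from the client side as a bounded wait. This is only an illustration, not code
from Geode or from the tests listed above: the class AwaitOnline, the method
awaitCondition, and the allServersOnline probe are hypothetical names, and the
probe is a stand-in for whatever check you use to detect the 'Online' status.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

class AwaitOnline {

  /**
   * Polls the given check until it returns true or the timeout expires.
   * Returns true if the condition was met, false on timeout or interrupt.
   */
  public static boolean awaitCondition(BooleanSupplier check, Duration timeout,
      Duration pollInterval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out before the condition became true
      }
      try {
        Thread.sleep(pollInterval.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    // Hypothetical probe: pretends every server reports 'Online' after 200 ms.
    BooleanSupplier allServersOnline = () -> System.currentTimeMillis() - start >= 200;

    boolean online =
        awaitCondition(allServersOnline, Duration.ofSeconds(5), Duration.ofMillis(50));
    System.out.println(online
        ? "all servers online: safe to create regions"
        : "timed out waiting for servers to come online");
  }
}
```

The wait lives in the tool or script that issues "create region", not inside
the server, and it always has a deadline so a server that never comes up does
not hang the caller.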

-Kirk

On Tue, Jul 6, 2021 at 12:06 AM Mario Kevo wrote:

> Hi Geode devs,
>
> I opened a new ticket, https://issues.apache.org/jira/browse/GEODE-9409,
> regarding a NullPointerException when creating a region while one of the
> servers is restarting.
> If we run the "create region" command through gfsh while a server is
> starting, it passes, but if the server is being restarted, it fails. The
> difference is that on a restart we kill the server and start it again. As
> it already has a server directory, it takes more time to come up, as
> expected.
> In that case, the "create region" command can run while the cache is not
> yet fully created. That can lead to the NullPointerException, because
> region creation fetches the pdxRegistry from the cache in findDiskStore,
> but sometimes the registry is not initialized in the cache yet, so any
> method called on it throws a NullPointerException.
> This is the part of the code where the exception is thrown:
>
> DiskStoreImpl findDiskStore(RegionAttributes regionAttributes,
>     InternalRegionArguments internalRegionArgs) {
>   // validate that persistent type registry is persistent
>   if (getAttributes().getDataPolicy().withPersistence()) {
>     getCache().getPdxRegistry().creatingPersistentRegion();
>   }
>
> As I already mentioned, getPdxRegistry() (LocalRegion.java) returns null if
> the registry is not yet initialized in create() (CacheCreation.java):
>
> DiskStoreAttributesCreation pdxRegDSC = initializePdxDiskStore(cache);
>
> cache.initializePdxRegistry();
>
> createDiskStores(cache, pdxRegDSC);
>
> I tried some fixes, but without success.
> It passes if we add a retry with a sleep, but that is not acceptable.
>
> Does anyone have an idea how to wait until pdxRegistry is initialized, or
> something else that would help us avoid this problem?
>
> BR,
> Mario
>


Re: Re: NullPointerException while create region during server restart

2021-07-08 Thread Anthony Baker
One thing you might check is why the create region request from gfsh was 
allowed to proceed before initialization was complete.  That is, cluster config 
and all associated configuration like the pdx registry should be created before 
any *new configuration* requests are processed.

I’m not sure what the code path looks like but that might be a place to start 
investigating.

Anthony
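
One way to picture the ordering described above (no new configuration requests
until initialization is complete) is a one-shot gate that request handlers wait
on. This is a minimal sketch under stated assumptions, not Geode's actual code
path: InitializationGate, markInitialized, and awaitInitialized are
hypothetical names, and the startup thread below only simulates initialization.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class InitializationGate {

  // Opens exactly once, when startup has finished initializing shared state.
  private final CountDownLatch initialized = new CountDownLatch(1);

  /** Called by the startup thread once initialization is complete. */
  public void markInitialized() {
    initialized.countDown();
  }

  /** Blocks until the gate opens; returns false on timeout or interrupt. */
  public boolean awaitInitialized(long timeout, TimeUnit unit) {
    try {
      return initialized.await(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt flag
      return false;
    }
  }

  public static void main(String[] args) {
    InitializationGate gate = new InitializationGate();

    // Simulated startup work (hypothetical stand-in for cache initialization).
    Thread startup = new Thread(() -> {
      try {
        Thread.sleep(100);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      gate.markInitialized();
    });
    startup.start();

    // A request handler waits on the gate instead of touching half-built state.
    if (gate.awaitInitialized(5, TimeUnit.SECONDS)) {
      System.out.println("initialized: processing create region request");
    } else {
      System.out.println("not initialized in time: rejecting request");
    }
  }
}
```

Unlike a retry loop with sleeps, the waiting thread is released as soon as the
startup thread signals completion, and the timeout gives the caller a clear
failure instead of an NPE.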


> On Jul 8, 2021, at 4:27 AM, Mario Kevo wrote:
> 
> Hi Anthony,
> 
> It happens while the server is starting and creating the cache (while it
> fills in the contents of the cache based on the creation object's state). The
> NPE occurs when the "create region" command is executed before pdxRegistry is
> initialized. This is the part of the code where pdxRegistry is initialized:
> https://github.com/Nordix/geode/blob/develop/geode-core/src/main/java/org/apache/geode/internal/cache/xmlcache/CacheCreation.java#L529
> 
> Until this part of the code has executed, pdxRegistry is null, and
> findDiskStore throws the NPE.
> 
> 
> BR,
> Mario
> 
> From: Anthony Baker
> Sent: July 7, 2021, 17:58
> To: dev@geode.apache.org
> Subject: Re: NullPointerException while create region during server restart
> 
> When the NPE occurs, has the server completed its bootstrapping from cluster 
> configuration yet?
> 
> Anthony



Re: NullPointerException while create region during server restart

2021-07-08 Thread Mario Kevo
Hi Anthony,

It happens while the server is starting and creating the cache (while it
fills in the contents of the cache based on the creation object's state). The
NPE occurs when the "create region" command is executed before pdxRegistry is
initialized. This is the part of the code where pdxRegistry is initialized:
https://github.com/Nordix/geode/blob/develop/geode-core/src/main/java/org/apache/geode/internal/cache/xmlcache/CacheCreation.java#L529

Until this part of the code has executed, pdxRegistry is null, and
findDiskStore throws the NPE.
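
To make the failure mode concrete, here is a small self-contained sketch of
the same pattern with an explicit guard. It is not Geode code: Registry, Cache,
and validatePersistentRegion are hypothetical stand-ins that only mirror the
shape of getCache().getPdxRegistry().creatingPersistentRegion(), and the guard
is one possible way to turn the NPE into a descriptive error, not a proposed
fix for GEODE-9409.

```java
class PdxRegistryGuard {

  /** Hypothetical stand-in for the registry type; not Geode's class. */
  static class Registry {
    void creatingPersistentRegion() {
      // validation would happen here
    }
  }

  /** Hypothetical stand-in for the cache; the registry is null until initialized. */
  static class Cache {
    private volatile Registry registry;

    Registry getPdxRegistry() {
      return registry;
    }

    void initializePdxRegistry() {
      registry = new Registry();
    }
  }

  /** Fails fast with a descriptive message instead of a bare NPE. */
  public static void validatePersistentRegion(Cache cache) {
    Registry registry = cache.getPdxRegistry();
    if (registry == null) {
      throw new IllegalStateException(
          "PDX registry is not initialized yet; cache startup is still in progress");
    }
    registry.creatingPersistentRegion();
  }

  public static void main(String[] args) {
    Cache cache = new Cache();
    try {
      validatePersistentRegion(cache); // registry not initialized yet
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    cache.initializePdxRegistry();
    validatePersistentRegion(cache); // succeeds once the registry exists
    System.out.println("accepted after initialization");
  }
}
```

A guard like this does not remove the race; it only makes the early request
fail with a clear reason that the caller can act on.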


BR,
Mario

From: Anthony Baker
Sent: July 7, 2021, 17:58
To: dev@geode.apache.org
Subject: Re: NullPointerException while create region during server restart

When the NPE occurs, has the server completed its bootstrapping from cluster 
configuration yet?

Anthony





Native client concourse pipeline is failing

2021-07-08 Thread Mario Salazar de Torres
Hi everyone,

I am trying to run the CI for geode-native, but I am observing some issues:

  * https://concourse.apachegeode-ci.info/teams/main/pipelines/geode-native-develop-pr/jobs/build-rhel-7-debug/builds/307
  * https://concourse.apachegeode-ci.info/teams/main/pipelines/geode-native-develop-pr/jobs/build-rhel-7-debug/builds/308

The error seems to be:
Put "/volumes/8f996ea5-e894-4f0d-40ba-b61fbe0e2b7e/stream-out?path=.": context canceled
From my limited experience with this, it seems to be related to the Concourse
system itself. Is there anyone who can look at it?

Thanks,
Mario!