NoAvailableServers after geode server restart

vas aj Mon, 24 Aug 2020 11:59:44 -0700

Hi all,

After I restart the Geode cluster, which has a region of type
*PARTITION_REDUNDANT_PERSISTENT*, the following is seen in the logs:


*server-1 logs*
..........................................................
Region /ukCustomers (and any colocated sub-regions) has potentially stale
data.  Buckets [1, 6, 8] are waiting for another offline member to recover
the latest data. My persistent id is:
  DiskStore ID: 30596906-c97c-4279-89ea-46d088ed27f6
  Name: stay-wrong-zeta
  Location: /10.1.2.28:/scripts/data-1
Offline members with potentially new data:[
  DiskStore ID: f4d5a2f6-7254-4749-ba9f-1831d8215634
  Location: /10.1.2.22:/scripts/data-4
  Buckets: [1, 6, 8]
]Use the gfsh show missing-disk-stores command to see all disk stores that
are being waited on by other members.
..........
Region /ukCustomers has successfully completed waiting for other members to
recover the latest data. My persistent member information:
  DiskStore ID: 30596906-c97c-4279-89ea-46d088ed27f6
  Name: stay-wrong-zeta
  Location: /10.1.2.28:/scripts/data-1

................
Server in /stay-wrong-zeta on server-1-7bfcbd6c7b-b54wb[40404] as
stay-wrong-zeta is currently online.
Process ID: 23
Uptime: 1 minute 48 seconds
Geode Version: 1.11.0
Java Version: 1.8.0_212
Log File: /stay-wrong-zeta/stay-wrong-zeta.log
JVM Arguments: -Dgemfire.locators=locator-1[10334],locator-2[10334]
-Dgemfire.start-dev-rest-api=false -Dgemfire.use-cluster-configuration=true
-Dgemfire.cache-xml-file=/scripts/cache-1.xml -Dgemfire.log-level=error
-Xms512m -Xmx512m -XX:+UseG1GC
-Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true
-Dsun.rmi.dgc.server.gcInterval=9223372036854775806
Class-Path:
/geode/lib/geode-core-1.11.0.jar:/scripts/classpath/domain.jar:/scripts/classpath/spatial4j-0.7.jar:/scripts/classpath/geode-configs.jar:/scripts/classpath/lucene-sandbox-6.6.2.jar:/geode/lib/geode-dependencies.jar

*server-2 logs*
...................................
Region /ukCustomers (and any colocated sub-regions) has potentially stale
data.  Buckets [0, 1, 3] are waiting for another offline member to recover
the latest data. My persistent id is:
  DiskStore ID: 2455d3c8-d852-4dac-a743-25ae62f5892c
  Name: kick-drab-bat
  Location: /10.1.2.30:/scripts/data-2
Offline members with potentially new data:[
  DiskStore ID: f4d5a2f6-7254-4749-ba9f-1831d8215634
  Location: /10.1.2.22:/scripts/data-4
  Buckets: [0, 1, 3]
]Use the gfsh show missing-disk-stores command to see all disk stores that
are being waited on by other members.
..........
Region /ukCustomers has successfully completed waiting for other members to
recover the latest data. My persistent member information:
  DiskStore ID: 2455d3c8-d852-4dac-a743-25ae62f5892c
  Name: kick-drab-bat
  Location: /10.1.2.30:/scripts/data-2

..............
Server in /kick-drab-bat on server-2-9cbbd877c-gl6c4[40405] as
kick-drab-bat is currently online.
Process ID: 23
Uptime: 1 minute 15 seconds
Geode Version: 1.11.0
Java Version: 1.8.0_212
Log File: /kick-drab-bat/kick-drab-bat.log
JVM Arguments: -Dgemfire.locators=locator-1[10334],locator-2[10334]
-Dgemfire.start-dev-rest-api=false -Dgemfire.use-cluster-configuration=true
-Dgemfire.cache-xml-file=/scripts/cache-2.xml -Dgemfire.log-level=error
-Xms512m -Xmx512m -XX:+UseG1GC
-Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true
-Dsun.rmi.dgc.server.gcInterval=9223372036854775806
Class-Path:
/geode/lib/geode-core-1.11.0.jar:/scripts/classpath/domain.jar:/scripts/classpath/spatial4j-0.7.jar:/scripts/classpath/geode-configs.jar:/scripts/classpath/lucene-sandbox-6.6.2.jar:/geode/lib/geode-dependencies.jar

When I try to connect to the Geode server using *client-cache*, it throws
the following error:

org.apache.geode.cache.client.NoAvailableServersException: null
at
org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.borrowConnection(ConnectionManagerImpl.java:277)
at
org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:125)
at
org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:108)
at
org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:772)
at
org.apache.geode.cache.client.internal.PutAllOp.execute(PutAllOp.java:100)
at
org.apache.geode.cache.client.internal.ServerRegionProxy.putAll(ServerRegionProxy.java:592)
at
org.apache.geode.internal.cache.LocalRegion.basicPutAll(LocalRegion.java:8913)
at org.apache.geode.internal.cache.LocalRegion.putAll(LocalRegion.java:8846)
at org.apache.geode.internal.cache.LocalRegion.putAll(LocalRegion.java:8858)

. . .
. . .
. . .

However, telnet <<remote hostname>> 40404 works fine.
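
For reference, the telnet check above can be repeated for both server ports with a small Java sketch. It only confirms that a TCP connection can be opened; it says nothing about whether the Geode server accepts client operations. The class name `PortCheck`, the 2-second timeout, and the `localhost` default are illustrative assumptions; ports 40404 and 40405 are the ones shown in the server logs.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical default; pass the remote hostname as the first argument.
        String host = args.length > 0 ? args[0] : "localhost";
        // Server ports taken from the logs: server-1 on 40404, server-2 on 40405.
        int[] ports = {40404, 40405};
        for (int port : ports) {
            System.out.println(host + ":" + port + " reachable=" + canConnect(host, port, 2000));
        }
    }
}
```

Running it from the client machine, with the same hostnames the client resolves for `${server1.url}` and `${server2.url}`, checks the second server port as well, which the telnet test did not cover.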

*What has gone wrong?*

*client-cache.xml* is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<client-cache>
    <pool name="writeCachePool">
        <server host="${server1.url}" port="${server1.port}"/>
        <server host="${server2.url}" port="${server2.port}"/>
    </pool>
    <region name="ukCustomers" refid="PROXY"/>
</client-cache>

Server 1 is restarted using the command:
args: ["gfsh", "start", "server",
"--locators=locator-1[10334],locator-2[10334]",
"--rebalance=true","--server-port=40404", "--log-level=error",
"--J=-Xms512m", "--J=-Xmx512m", "--J=-XX:+UseG1GC",
"--classpath=/scripts/classpath/domain.jar",
"--cache-xml-file=/scripts/cache-1.xml"]

where cache-1.xml is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<cache version="1.0" is-server="true">
    <disk-store name="disk-store-1" compaction-threshold="40"
                max-oplog-size="1024" queue-size="10000"
                time-interval="2000" write-buffer-size="65536"
                disk-usage-warning-percentage="80"
                disk-usage-critical-percentage="98">
        <disk-dirs>
            <disk-dir>/scripts/data-1</disk-dir>
        </disk-dirs>
    </disk-store>
    <region name="ukCustomers" refid="PARTITION_REDUNDANT_PERSISTENT">
        <region-attributes data-policy="persistent-partition"
                           disk-store-name="disk-store-1"
                           statistics-enabled="true"
                           disk-synchronous="true">
            <partition-attributes redundant-copies="1"
                                  recovery-delay="5000"
                                  startup-recovery-delay="5000"/>
        </region-attributes>
    </region>
</cache>

Server 2 is restarted in a similar manner with cache-2.xml. However,
in cache-2.xml the disk-dir is /scripts/data-2
and disk-store-name="disk-store-2".
