Hi all,
I am deploying Geode on Kubernetes with 2 locators and 5 servers, and I have
observed some very strange inconsistencies while running the application.
Locators are started with the command:
gfsh start locator --name="${HOSTNAME}" --connect=false \
  --locators="${LOCATORS}" --port=10334
Servers are started with the command:
gfsh start server --bind-address=$ip --name="${HOSTNAME}" \
  --cache-xml-file=/geode/config/cache.xml --groups memory --initial-heap=20g \
  --max-heap=20g --eviction-heap-percentage=60 --critical-heap-percentage=80 \
  --J=-XX:+UseConcMarkSweepGC --J=-XX:CMSInitiatingOccupancyFraction=60 \
  --locators="${LOCATORS}"
where cache.xml looks like the following:
<?xml version="1.0" encoding="UTF-8"?>
<cache
    xmlns="http://geode.apache.org/schema/cache"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://geode.apache.org/schema/cache
        http://geode.apache.org/schema/cache/cache-1.0.xsd"
    version="1.0">
  <disk-store name="transactions_overflow">
    <disk-dirs>
      <disk-dir>/data/transactions_overflow</disk-dir>
    </disk-dirs>
  </disk-store>
  <region name="transactions">
    <region-attributes refid="PARTITION_OVERFLOW">
      <partition-attributes total-num-buckets="23"
          redundant-copies="3"/>
    </region-attributes>
    <region-attributes disk-store-name="transactions_overflow"></region-attributes>
    <region-attributes>
      <eviction-attributes>
        <lru-entry-count action="overflow-to-disk"/>
      </eviction-attributes>
    </region-attributes>
  </region>
</cache>
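For reference, here is the bucket arithmetic these partition settings should imply (total-num-buckets="23", redundant-copies="3", 5 servers). This is only a sketch: it assumes the partition-attributes actually take effect and that buckets end up spread perfectly evenly, which Geode does not strictly guarantee. The class and method names are made up for illustration.

```java
// Hypothetical helper: bucket-copy arithmetic for the "transactions" region.
// Assumes every bucket gets its full set of copies and an even spread.
public class BucketCopies {

    // Each bucket has one primary copy plus `redundantCopies` secondaries.
    static int totalCopies(int buckets, int redundantCopies) {
        return buckets * (redundantCopies + 1);
    }

    // Smallest and largest per-server share under perfectly even placement.
    static int[] perServerRange(int totalCopies, int servers) {
        int low = totalCopies / servers;
        int high = (totalCopies % servers == 0) ? low : low + 1;
        return new int[] {low, high};
    }

    public static void main(String[] args) {
        int buckets = 23;         // total-num-buckets
        int redundantCopies = 3;  // redundant-copies
        int servers = 5;          // servers in the cluster

        int total = totalCopies(buckets, redundantCopies);
        int[] range = perServerRange(total, servers);
        // Prints: 92 bucket copies, 18-19 per server
        System.out.printf("%d bucket copies, %d-%d per server%n",
                total, range[0], range[1]);
    }
}
```

Note that 4 copies per bucket means full redundancy needs at least 4 servers, since two copies of the same bucket are never placed on the same member.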
Here's how I start the cluster:
1/ Start the first locator and wait for it to start up fully
2/ Start the second locator and wait for it to start up fully
3/ Start all 5 servers at the same time
The cluster comes up cleanly: I can see 2 locators and 5 servers when
connecting to either locator via gfsh.
However, when I start populating my region with data, my client gets
inconsistent results.
I run the following code:
while (true) {
    long start = System.currentTimeMillis();
    ClientCache cache = new ClientCacheFactory()
        .set("cache-xml-file", "cache.xml")
        .create();
    TransactionGeoDAO geoDAO = new TransactionGeoDAO(cache);
    HashMap<String, PositionStore> records = geoDAO.getTransactions(date);
    LOGGER.info(String.format("Timetaken %s, Number of records %s",
        System.currentTimeMillis() - start, records.size()));
    cache.close();
}
The query in getTransactions is: select * from /transactions where date=%s
The results are very inconsistent, even after I stop publishing data:
716676 INFO com.tata.mo.Reconcile - Timetaken 483, Number of records 2446
717879 INFO com.tata.mo.Reconcile - Timetaken 1203, Number of records 2593
718290 INFO com.tata.mo.Reconcile - Timetaken 411, Number of records 2057
718810 INFO com.tata.mo.Reconcile - Timetaken 520, Number of records 2593
719180 INFO com.tata.mo.Reconcile - Timetaken 370, Number of records 2446
719834 INFO com.tata.mo.Reconcile - Timetaken 654, Number of records 2057
720374 INFO com.tata.mo.Reconcile - Timetaken 540, Number of records 2446
721579 INFO com.tata.mo.Reconcile - Timetaken 1205, Number of records 2593
722255 INFO com.tata.mo.Reconcile - Timetaken 676, Number of records 2057
722733 INFO com.tata.mo.Reconcile - Timetaken 478, Number of records 2057
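Summarized, the ten runs in that log return only three distinct counts, each of which keeps recurring. A small sketch that tallies the logged "Number of records" values (the class name is made up; the numbers are copied from the log):

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: tallies how often each record count appears in the log.
public class RecordCountTally {

    static Map<Integer, Integer> tally(int[] counts) {
        Map<Integer, Integer> freq = new TreeMap<>(); // sorted by count value
        for (int c : counts) {
            freq.merge(c, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        // "Number of records" values from the ten log lines, in order.
        int[] counts = {2446, 2593, 2057, 2593, 2446, 2057, 2446, 2593, 2057, 2057};
        System.out.println(tally(counts)); // {2057=4, 2446=3, 2593=3}
    }
}
```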
Here's my cache.xml for the client:
<!DOCTYPE client-cache PUBLIC
  "-//GemStone Systems, Inc.//GemFire Declarative Caching 6.5//EN"
  "http://www.gemstone.com/dtd/cache8_0.dtd">
<client-cache>
  <pool name="myPool">
    <locator host="locator1" port="10334"/>
    <locator host="locator2" port="10334"/>
  </pool>
  <region name="transactions" refid="PROXY"/>
</client-cache>
However, if I don't use the --cache-xml-file=/geode/config/cache.xml option
and instead create the region with gfsh once all servers are up, the
results are consistent.
Besides the above, I sometimes get the following errors:
1/ NoAvailableServersException
All locators and servers are still running, and "list members" still shows
the correct members:
ERROR StatusLogger Unrecognized conversion specifier [n] starting at position 56 in conversion pattern.
Exception in thread "main" org.apache.geode.cache.client.NoAvailableServersException
2/ When I try to restart a server, it fails with an error about a missing
region /transactions_overflow, although it is defined in the cache.xml file.
Could anyone please help me check whether I am deploying this the right
way?
--
Thanks
Kien