Hi,
FYI: I have filed a JIRA ticket for this, but I thought someone here might be aware of a solution or workaround, so I am posting it to the list as well.
In one of our projects we are using Geode. Here is a summary of how we use it:
- Geode servers (release 1.1.1) host multiple regions.
- Clients subscribe to the data in these regions.
- Clients register interest in all entries, so they receive updates for every
entry, from creation through modification to deletion.
- One of the regions usually holds 5-10 million entries with a TTL of 24 hours.
Most entries are added one after another within about an hour, so when the TTL
kicks in they are often all destroyed within an hour as well.
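For context, the server-side expiration in the last bullet is configured roughly as below. This is a hypothetical cache.xml fragment, not our actual config: the region name is made up, and note that entry expiration requires statistics to be enabled on the region.

```
<region name="exampleRegion" refid="PARTITION">
  <region-attributes statistics-enabled="true">
    <entry-time-to-live>
      <!-- 24 hours, in seconds; expired entries are destroyed, which
           generates the destroy events that must be delivered to clients -->
      <expiration-attributes timeout="86400" action="destroy"/>
    </entry-time-to-live>
  </region-attributes>
</region>
```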
Problem:
Every now and then we observe the following message:
Client queue for
_gfe_non_durable_client_with_id_x.x.x.x(14229:loner):42754:e4266fc4_2_queue
client is full.
This seems to happen when the TTL kicks in: entries start expiring (being
destroyed), and the resulting destroy events must be delivered to the clients.
We see that the updates flow for a while, but then they suddenly stop and the
queue size starts growing. This has become a major obstacle to the smooth
functioning of our production setup. Any help would be much appreciated.
I did some groundwork by downloading and reading the code. I see references to
two issues, #37581 and #51400, but I am unable to view the actual JIRA tickets
(they require login credentials). Hopefully this helps someone looking at the
issue.
Here is the pertinent code:
  @Override
  @edu.umd.cs.findbugs.annotations.SuppressWarnings("TLW_TWO_LOCK_WAIT")
  void checkQueueSizeConstraint() throws InterruptedException {
    if (this.haContainer instanceof HAContainerMap && isPrimary()) { // Fix for bug 39413
      if (Thread.interrupted())
        throw new InterruptedException();
      synchronized (this.putGuard) {
        if (putPermits <= 0) {
          synchronized (this.permitMon) {
            if (reconcilePutPermits() <= 0) {
              if (region.getSystem().getConfig().getRemoveUnresponsiveClient()) {
                isClientSlowReciever = true;
              } else {
                try {
                  long logFrequency = CacheClientNotifier.DEFAULT_LOG_FREQUENCY;
                  CacheClientNotifier ccn = CacheClientNotifier.getInstance();
                  if (ccn != null) { // check needed for junit tests
                    logFrequency = ccn.getLogFrequency();
                  }
                  if ((this.maxQueueSizeHitCount % logFrequency) == 0) {
                    logger.warn(LocalizedMessage.create(
                        LocalizedStrings.HARegionQueue_CLIENT_QUEUE_FOR_0_IS_FULL,
                        new Object[] {region.getName()}));
                    this.maxQueueSizeHitCount = 0;
                  }
                  ++this.maxQueueSizeHitCount;
                  this.region.checkReadiness(); // fix for bug 37581
                  // TODO: wait called while holding two locks
                  this.permitMon.wait(CacheClientNotifier.eventEnqueueWaitTime);
                  this.region.checkReadiness(); // fix for bug 37581
                  // Fix for #51400. Allow the queue to grow beyond its
                  // capacity/maxQueueSize, if it is taking a long time to
                  // drain the queue, either due to a slower client or the
                  // deadlock scenario mentioned in the ticket.
                  reconcilePutPermits();
                  if ((this.maxQueueSizeHitCount % logFrequency) == 1) {
                    logger.info(LocalizedMessage
                        .create(LocalizedStrings.HARegionQueue_RESUMING_WITH_PROCESSING_PUTS));
                  }
                } catch (InterruptedException ex) {
                  // TODO: The line below is meaningless. Comment it out later
                  this.permitMon.notifyAll();
                  throw ex;
                }
              }
            }
          } // synchronized (this.permitMon)
        } // if (putPermits <= 0)
        --putPermits;
      } // synchronized (this.putGuard)
    }
  }
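To make the flow above easier to follow, here is a minimal, self-contained sketch of the put-permit pattern as I understand it. This is my own illustration, not Geode code: the names PutPermits, tryAcquire, and release are made up, and unlike the real method it uses a single monitor rather than the putGuard/permitMon pair that the TLW_TWO_LOCK_WAIT suppression refers to.

```java
// Sketch of a bounded put-permit scheme: producers take one permit per
// queued event and block (with a timeout) when permits run out; the
// dispatcher returns permits as events drain and wakes waiting producers.
class PutPermits {
  private int permits;

  PutPermits(int capacity) {
    this.permits = capacity;
  }

  // Called by the putter thread before enqueueing an event.
  // Returns false if no permit became available within timeoutMillis.
  synchronized boolean tryAcquire(long timeoutMillis) {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (permits <= 0) {
      long remaining = deadline - System.currentTimeMillis();
      if (remaining <= 0) {
        return false; // queue stayed full for the whole wait
      }
      try {
        wait(remaining);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status
        return false;
      }
    }
    --permits;
    return true;
  }

  // Called by the dispatcher after an event is removed from the queue.
  synchronized void release() {
    ++permits;
    notifyAll();
  }
}
```

In this shape a timed-out producer simply fails the put, whereas the real code instead calls reconcilePutPermits() after the wait and lets the queue grow past maxQueueSize (the #51400 fix), which is why the warning can repeat while the queue keeps growing.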
Thanks
Mangesh