Gregory Chase created GEODE-3709:
Summary: Geode Version: 1.1.1 In one of the project we a...
Key: GEODE-3709
URL: https://issues.apache.org/jira/browse/GEODE-3709
Project: Geode
Issue Type: Improvement
Reporter: Gregory Chase
Geode Version: 1.1.1
In one of our projects we are using Geode. Here is a summary of how we use it:
- Geode servers host multiple regions.
- Clients subscribe to the data from these regions.
- Clients register interest in all entries, so they receive updates for every entry from creation through modification to deletion.
- One of the regions usually holds 5-10 million entries with a TTL of 24 hours. Most entries are added within an hour's span, one after another, so when the TTL kicks in they are often destroyed within an hour as well.
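For reference, a region with this shape can be created in gfsh roughly as follows. The region name and type here are assumptions for illustration; 86400 seconds corresponds to the 24-hour TTL described above:

```
create region --name=exampleRegion --type=REPLICATE \
  --entry-time-to-live-expiration=86400 \
  --entry-time-to-live-expiration-action=destroy
```

With the destroy action, every expiring entry generates a destroy event that must be delivered to each subscribed client, which is the traffic pattern that fills the client queues.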
Problem:
Every now and then we observe the following message:
Client queue for _gfe_non_durable_client_with_id_x.x.x.x(14229:loner):42754:e4266fc4_2_queue client is full.
This seems to happen when the TTL kicks in on the region with 5-10 million entries. Entries start getting expired (destroyed), and the resulting destroy events must be delivered to the clients. We see the updates flow for a while, but then they suddenly stop and the queue size starts growing. This is becoming a major issue for the smooth functioning of our production setup. Any help will be much appreciated.
I did some groundwork by downloading and reading the code. I see references to two issues, #37581 and #51400, but I am unable to view the actual JIRA tickets (they require login credentials). Hopefully this helps someone looking at the issue.
Here is the pertinent code:

@Override
@edu.umd.cs.findbugs.annotations.SuppressWarnings("TLW_TWO_LOCK_WAIT")
void checkQueueSizeConstraint() throws InterruptedException {
  if (this.haContainer instanceof HAContainerMap && isPrimary()) { // Fix for bug 39413
    if (Thread.interrupted())
      throw new InterruptedException();
    synchronized (this.putGuard) {
      if (putPermits <= 0) {
        synchronized (this.permitMon) {
          if (reconcilePutPermits() <= 0) {
            if (region.getSystem().getConfig().getRemoveUnresponsiveClient()) {
              isClientSlowReciever = true;
            } else {
              try {
                long logFrequency = CacheClientNotifier.DEFAULT_LOG_FREQUENCY;
                CacheClientNotifier ccn = CacheClientNotifier.getInstance();
                if (ccn != null) { // check needed for junit tests
                  logFrequency = ccn.getLogFrequency();
                }
                if ((this.maxQueueSizeHitCount % logFrequency) == 0) {
                  logger.warn(LocalizedMessage.create(
                      LocalizedStrings.HARegionQueue_CLIENT_QUEUE_FOR_0_IS_FULL,
                      new Object[] {region.getName()}));
                  this.maxQueueSizeHitCount = 0;
                }
                ++this.maxQueueSizeHitCount;
                this.region.checkReadiness(); // fix for bug 37581
                // TODO: wait called while holding two locks
                this.permitMon.wait(CacheClientNotifier.eventEnqueueWaitTime);
                this.region.checkReadiness(); // fix for bug 37581
                // Fix for #51400. Allow the queue to grow beyond its
                // capacity/maxQueueSize, if it is taking a long time to
                // drain the queue, either due to a slower client or the
                // deadlock scenario mentioned in the ticket.
                reconcilePutPermits();
                if ((this.maxQueueSizeHitCount % logFrequency) == 1) {
                  logger.info(LocalizedMessage
                      .create(LocalizedStrings.HARegionQueue_RESUMING_WITH_PROCESSING_PUTS));
                }
              } catch (InterruptedException ex) {
                // TODO: The line below is meaningless. Comment it out later
                this.permitMon.notifyAll();
                throw ex;
              }
            }
          }
        } // synchronized (this.permitMon)
      } // if (putPermits <= 0)
      --putPermits;
    } // synchronized (this.putGuard)
  }
}
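To make the flow above easier to follow, here is a minimal, self-contained sketch of the timed two-lock wait pattern used in checkQueueSizeConstraint(). This is my own simplification, not Geode code (the real method also reconciles permits and throttles its logging): a put that finds no permits waits on a monitor for a bounded time and then proceeds anyway, which is why the queue can grow past maxQueueSize under sustained destroy traffic.

```java
// Simplified illustration of the permit mechanism; class and method names
// are hypothetical, only the locking shape mirrors the Geode snippet.
public class PermitSketch {
  private final Object putGuard = new Object();  // serializes producers
  private final Object permitMon = new Object(); // guards the permit count
  private int putPermits;                        // remaining queue capacity
  private final long waitMillis;                 // bounded wait, like eventEnqueueWaitTime

  public PermitSketch(int capacity, long waitMillis) {
    this.putPermits = capacity;
    this.waitMillis = waitMillis;
  }

  // Called before enqueuing an event. Returns true if a permit was free,
  // false if the wait timed out and the put proceeded anyway -- in that
  // case putPermits goes negative and the queue exceeds its capacity.
  public boolean acquirePutPermit() {
    synchronized (putGuard) {
      boolean hadPermit = true;
      if (putPermits <= 0) {
        synchronized (permitMon) { // wait while holding both locks (TLW_TWO_LOCK_WAIT)
          if (putPermits <= 0) {
            try {
              permitMon.wait(waitMillis); // timed wait: never blocks forever
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
            }
            hadPermit = putPermits > 0;
          }
        }
      }
      putPermits--; // may go negative after a timeout
      return hadPermit;
    }
  }

  // Called by the consumer when an event is drained from the queue.
  public void releasePutPermit() {
    synchronized (permitMon) {
      putPermits++;
      permitMon.notifyAll();
    }
  }
}
```

In the slow-client scenario from this report, the consumer side stops calling the release path fast enough, so every producer ends up in the timed wait and the "queue is full" warning is logged each time the wait expires.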
*Reporter*: Mangesh Deshmukh
*E-mail*: [mailto:mdeshm...@quotient.com]
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)