Just to clarify why I think increasing the queue size further may not help: I
have also looked at the stats, and the pattern is that the
“receiveBytes-CacheClientUpdaterStats” on the client side suddenly ceases to
show any activity from the moment we see the “client queue full” message on the
server side.

From: Mangesh Deshmukh <[email protected]>
Date: Wednesday, September 27, 2017 at 3:38 PM
To: "[email protected]" <[email protected]>
Subject: Re: Subscription Queue Full

Hi Anil,

Thanks for responding.
We tried increasing the SubscriptionQueue size as well; as of now, the size is
configured to be 500k. From the code comments it looks like some kind of
deadlock is happening. Perhaps folks familiar with the code given below will be
able to shed some light.

Thanks,
Mangesh


From: Anilkumar Gingade <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, September 27, 2017 at 3:09 PM
To: "[email protected]" <[email protected]>
Subject: Re: Subscription Queue Full

You can try increasing the subscription queue...
Following are some of the steps to manage the subscription queue:
http://gemfire.docs.pivotal.io/geode/developing/events/limit_server_subscription_queue_size.html
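
For illustration, the per-client subscription queue can be capped on the server
via the ClientSubscriptionConfig API so that overflow goes to a disk store
instead of growing in memory (a rough sketch; the port, capacity and disk store
name below are illustrative):

   import org.apache.geode.cache.Cache;
   import org.apache.geode.cache.server.CacheServer;
   import org.apache.geode.cache.server.ClientSubscriptionConfig;

   public class SubscriptionQueueLimitSketch {
     static void startServerWithBoundedQueues(Cache cache) throws Exception {
       CacheServer server = cache.addCacheServer();
       server.setPort(40404);                                 // illustrative port
       ClientSubscriptionConfig subscription = server.getClientSubscriptionConfig();
       subscription.setEvictionPolicy("entry");               // evict by number of queued events
       subscription.setCapacity(500000);                      // keep at most 500k events in memory
       subscription.setDiskStoreName("subscriptionOverflow"); // the named disk store must already exist
       server.start();
     }
   }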

-Anil.


On Wed, Sep 27, 2017 at 2:58 PM, Mangesh Deshmukh 
<[email protected]> wrote:
Hi,

FYI: I have filed a JIRA ticket on this, but I thought someone here might be
aware of a solution or workaround for this problem, so I am posting it here as
well.

In one of our projects we are using Geode. Here is a summary of how we use it:
- Geode servers (Release 1.1.1) have multiple regions.
- Clients subscribe to the data from these regions.
- Clients register interest in all the entries, so they receive updates for
every entry from creation through modification to deletion.
- One of the regions usually has 5-10 million entries with a TTL of 24 hours.
Most entries are added one after another within about an hour, so when the TTL
kicks in they are also destroyed within about an hour.
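
Roughly, the setup looks like the sketch below (region name, locator address
and region shortcuts are illustrative; Geode 1.1.1 still uses the
registerInterest("ALL_KEYS") form):

   import org.apache.geode.cache.*;
   import org.apache.geode.cache.client.*;

   public class SubscriptionSetupSketch {
     // Server side: region whose entries are destroyed 24 hours after creation.
     static Region<String, Object> createServerRegion(Cache cache) {
       RegionFactory<String, Object> factory = cache.createRegionFactory(RegionShortcut.REPLICATE);
       factory.setStatisticsEnabled(true); // entry expiration requires per-entry statistics
       factory.setEntryTimeToLive(new ExpirationAttributes(24 * 60 * 60, ExpirationAction.DESTROY));
       return factory.create("exampleRegion");
     }

     // Client side: subscription-enabled pool plus interest in every key, so the
     // client receives create/update/destroy events for all entries in the region.
     static Region<String, Object> createClientRegion() {
       ClientCache clientCache = new ClientCacheFactory()
           .addPoolLocator("locator-host", 10334)
           .setPoolSubscriptionEnabled(true) // required for the client's subscription queue
           .create();
       Region<String, Object> region = clientCache
           .<String, Object>createClientRegionFactory(ClientRegionShortcut.CACHING_PROXY)
           .create("exampleRegion");
       region.registerInterest("ALL_KEYS");
       return region;
     }
   }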

Problem:
Every now and then we observe the following message:
                Client queue for _gfe_non_durable_client_with_id_x.x.x.x(14229:loner):42754:e4266fc4_2_queue client is full.
This seems to happen when the TTL kicks in: entries start getting evicted
(destroyed), and the resulting updates must be sent to the clients. We see that
the updates do flow for a while, but then they suddenly stop and the queue size
starts growing. This is becoming a major issue for the smooth functioning of
our production setup. Any help will be much appreciated.

I did some groundwork by downloading and looking at the code. I see references
to two issues, #37581 and #51400, but I am unable to view the actual JIRA
tickets (they need login credentials). Hopefully this helps someone looking at
the issue.
Here is the pertinent code:

    @Override
    @edu.umd.cs.findbugs.annotations.SuppressWarnings("TLW_TWO_LOCK_WAIT")
    void checkQueueSizeConstraint() throws InterruptedException {
      if (this.haContainer instanceof HAContainerMap && isPrimary()) { // Fix for bug 39413
        if (Thread.interrupted())
          throw new InterruptedException();
        synchronized (this.putGuard) {
          if (putPermits <= 0) {
            synchronized (this.permitMon) {
              if (reconcilePutPermits() <= 0) {
                if (region.getSystem().getConfig().getRemoveUnresponsiveClient()) {
                  isClientSlowReciever = true;
                } else {
                  try {
                    long logFrequency = CacheClientNotifier.DEFAULT_LOG_FREQUENCY;
                    CacheClientNotifier ccn = CacheClientNotifier.getInstance();
                    if (ccn != null) { // check needed for junit tests
                      logFrequency = ccn.getLogFrequency();
                    }
                    if ((this.maxQueueSizeHitCount % logFrequency) == 0) {
                      logger.warn(LocalizedMessage.create(
                          LocalizedStrings.HARegionQueue_CLIENT_QUEUE_FOR_0_IS_FULL,
                          new Object[] {region.getName()}));
                      this.maxQueueSizeHitCount = 0;
                    }
                    ++this.maxQueueSizeHitCount;
                    this.region.checkReadiness(); // fix for bug 37581
                    // TODO: wait called while holding two locks
                    this.permitMon.wait(CacheClientNotifier.eventEnqueueWaitTime);
                    this.region.checkReadiness(); // fix for bug 37581
                    // Fix for #51400. Allow the queue to grow beyond its
                    // capacity/maxQueueSize, if it is taking a long time to
                    // drain the queue, either due to a slower client or the
                    // deadlock scenario mentioned in the ticket.
                    reconcilePutPermits();
                    if ((this.maxQueueSizeHitCount % logFrequency) == 1) {
                      logger.info(LocalizedMessage
                          .create(LocalizedStrings.HARegionQueue_RESUMING_WITH_PROCESSING_PUTS));
                    }
                  } catch (InterruptedException ex) {
                    // TODO: The line below is meaningless. Comment it out later
                    this.permitMon.notifyAll();
                    throw ex;
                  }
                }
              }
            } // synchronized (this.permitMon)
          } // if (putPermits <= 0)
          --putPermits;
        } // synchronized (this.putGuard)
      }
    }
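
As a side note on the getRemoveUnresponsiveClient() branch above: with the
remove-unresponsive-client property set on the server, the server drops a
client whose queue stays full instead of blocking here (a rough sketch,
assuming the standard property name):

   import java.util.Properties;
   import org.apache.geode.cache.Cache;
   import org.apache.geode.cache.CacheFactory;

   public class RemoveUnresponsiveClientSketch {
     static Cache createServerCache() {
       Properties props = new Properties();
       // Makes the getRemoveUnresponsiveClient() check above return true, so a client
       // whose subscription queue stays full is dropped rather than waited on.
       props.setProperty("remove-unresponsive-client", "true");
       return new CacheFactory(props).create();
     }
   }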


Thanks
Mangesh

