Re:Flooded topology after a full GC

Ramin Farajollah (BLOOMBERG/ 731 LEX) Thu, 19 Dec 2019 06:59:59 -0800

Correction: HotSpot 8 (not OpenJDK 8)

From: [email protected] At: 12/19/19 09:56:34 To: [email protected]
Subject: Flooded topology after a full GC


Hi,

We use an object pool for the messages carried in tuples. It has been effective 
at reducing the GCs caused by creating these heavy objects.

After a full GC (~30 sec), the ZooKeeper connection is suspended and then 
restored by Curator. This is followed by a huge rise in the number of objects 
(presumably in flight), which leads to more frequent full GCs and the eventual 
crash of the topology.

I'm trying to understand what triggers the huge rise immediately after the STW 
pause of the full GC and the Curator reconnect. My guess is that all tuples 
failed due to the zk timeout and were resent. In addition, ack/fail signals may 
be exacerbating the situation.

My questions are:
1) How to determine if tuples are resent?
2) How to determine if acks/fails contribute to the traffic?
3) Without back pressure, are excess tuples silently discarded from the 
outbound or the inbound queues?
4) What happens to the failed tuples? (I need a hook to release the objects).
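
To make questions 2 and 4 concrete, here is a minimal sketch of the kind of hook 
I have in mind. It assumes the tuples are emitted from a spout with a message 
id, so that the spout's ack(Object msgId) and fail(Object msgId) callbacks are 
invoked for them. The class PooledMessageTracker and all of its methods are 
hypothetical names, not part of Storm, and the sketch assumes each message id is 
unique per emit.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical helper (not a Storm class): tracks pooled objects by message id. */
class PooledMessageTracker<T> {
    private final ConcurrentLinkedQueue<T> free = new ConcurrentLinkedQueue<>();
    private final Map<Object, T> inFlight = new ConcurrentHashMap<>();
    private final Supplier<T> factory;

    // Counters that could be logged periodically to see ack/fail volume.
    final AtomicLong acked = new AtomicLong();
    final AtomicLong failed = new AtomicLong();

    PooledMessageTracker(int initialSize, Supplier<T> factory) {
        this.factory = factory;
        for (int i = 0; i < initialSize; i++) {
            free.add(factory.get());
        }
    }

    /** Borrow an object for msgId; allocates a new one if the pool is empty. */
    T borrow(Object msgId) {
        T obj = free.poll();
        if (obj == null) {
            obj = factory.get();
        }
        inFlight.put(msgId, obj); // assumes msgId is unique per emit
        return obj;
    }

    /** Intended to be called from the spout's ack(Object msgId). */
    void onAck(Object msgId) {
        acked.incrementAndGet();
        release(msgId);
    }

    /** Intended to be called from the spout's fail(Object msgId). */
    void onFail(Object msgId) {
        failed.incrementAndGet();
        release(msgId);
    }

    // Return the object to the pool; unknown ids are ignored.
    private void release(Object msgId) {
        T obj = inFlight.remove(msgId);
        if (obj != null) {
            free.add(obj);
        }
    }

    int inFlightCount() { return inFlight.size(); }
    int freeCount() { return free.size(); }
}
```

Comparing inFlightCount() and the failed counter before and after a full GC 
would show whether the rise comes from failed tuples. If the spout re-emits a 
message in fail(), it would borrow again under a new message id.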

Details:
- OpenJDK 8
- Storm 1.2.3
- Curator 2.12.0
- zk session timeout 40000 ms, connection timeout 1500 ms
- Initially the cache is adequate (8 GB)
