Thanks a lot, Guozhang. I've now upgraded to 0.8.2-beta, and the issue seems to be gone.

András

On 11/3/2014 4:45 PM, Guozhang Wang wrote:
Hi Andras,

Could you try 0.8.2-beta and see if this issue shows up again? We fixed a
couple of purgatory issues (e.g. KAFKA-1616
<https://issues.apache.org/jira/browse/KAFKA-1616>) in 0.8.2, but I do not
recall any of them causing an OOM.

Guozhang

On Mon, Nov 3, 2014 at 5:42 AM, András Serény <sereny.and...@gravityrd.com>
wrote:

Hi Kafka users,

we're running a cluster of two Kafka 0.8.1.1 brokers, with a replication
factor of two for each topic.

When both brokers are up, after a short while the FetchRequestPurgatory
on the leader starts to grow without bound (visible both in a heap dump and
in the "FetchRequestPurgatory"."PurgatorySize" JMX metric), eventually
leading to an OOM error. When one of the brokers is shut down, the
purgatory stops growing, and the remaining broker runs fine. In
https://issues.apache.org/jira/browse/KAFKA-1016, I see that this can happen
when a fetcher specifies too large a max wait time, but we don't override
replica.fetch.wait.max.ms, so it stays at the default of 500 ms.
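For reference, a minimal Java sketch of watching that metric over JMX instead of taking repeated heap dumps. It assumes the broker was started with JMX enabled (e.g. by setting JMX_PORT before launching it) on a hypothetical port 9999, and that the gauge is registered under the quoted 0.8.1-style MBean name shown in the code; both are assumptions, so confirm the exact name in jconsole for your broker version. The class name `PurgatoryMonitor` and the helper `readPurgatorySize` are illustrative only, not part of Kafka.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class PurgatoryMonitor {
    // ASSUMPTION: 0.8.1-style quoted MBean name for the "FetchRequestPurgatory"."PurgatorySize"
    // gauge; confirm the exact name (quotes and key order included) in jconsole for your broker.
    static final String PURGATORY_SIZE_MBEAN =
            "\"kafka.server\":type=\"FetchRequestPurgatory\",name=\"PurgatorySize\"";

    // Reads the gauge's current value; Yammer gauges expose it as the "Value" attribute.
    static long readPurgatorySize(MBeanServerConnection conn) throws Exception {
        Object value = conn.getAttribute(new ObjectName(PURGATORY_SIZE_MBEAN), "Value");
        return ((Number) value).longValue();
    }

    public static void main(String[] args) throws Exception {
        // HYPOTHETICAL endpoint: assumes the broker exposes JMX on localhost:9999.
        String url = args.length > 0
                ? args[0]
                : "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi";
        try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(url))) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            long previous = -1;
            // Sample every 5 s; a value that keeps increasing matches the symptom described above.
            for (int i = 0; i < 12; i++) {
                long size = readPurgatorySize(conn);
                String trend = (previous >= 0 && size > previous) ? " (growing)" : "";
                System.out.println("PurgatorySize=" + size + trend);
                previous = size;
                Thread.sleep(5_000);
            }
        }
    }
}
```

Running it against the leader while both brokers are up, and again after stopping one of them, should show whether the gauge keeps climbing in the first case and levels off in the second.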

Do you have any suggestions as to what could be causing this and how to fix it?

Thanks a lot,
András



