The controller is not running on this node. We found that one of the newly
added producers was publishing only to a specific partition that lives on
this node, which explains the high utilization on this node alone. The
publisher was running in an async thread, invoking send() asynchronously,
but the messages never showed up in the Kafka consumer; it did, however,
result in high CPU utilization. Even though the producer was disabled, the
async threads kept running, and CPU only normalized after the application
hosting the producer was restarted.
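For context, the async path was roughly of this shape. This is only a
minimal sketch (broker address, topic name, and class name are
placeholders, not our actual code), with a Callback attached to send() so
that a failed write is at least logged instead of disappearing silently:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AsyncProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        try {
            // Fire-and-forget send, but with a callback so failures are visible.
            producer.send(new ProducerRecord<>("new-topic", "key", "value"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            // A failed async send surfaces here instead of being lost.
                            exception.printStackTrace();
                        } else {
                            System.out.printf("wrote %s-%d@%d%n",
                                    metadata.topic(), metadata.partition(), metadata.offset());
                        }
                    });
        } finally {
            // flush() blocks until buffered records are actually sent;
            // close() releases the producer's I/O thread so it does not keep running.
            producer.flush();
            producer.close();
        }
    }
}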

The producer and broker were in two data centers, and I am not sure whether
the latency caused the async thread to complete before the Kafka producer's
I/O thread could write the message to the broker. When we switched to a
synchronous send(message).get() it worked fine. We were publishing perhaps
500 messages every 15 minutes, so I am not sure why that alone drove CPU
utilization so high when the messages were never even consumed from the
broker. We are rewriting the producer implementation; it seems an async
send() invoked from an async Java thread does not work reliably when there
is network latency between the producer and the broker.
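Roughly, the blocking variant that worked looks like this (again just a
sketch with placeholder names, not the exact code); the Future returned by
send() is awaited, so the calling thread cannot complete before the broker
acknowledges the write:

import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SyncSendSketch {
    // Hypothetical helper: send() returns a Future<RecordMetadata>, and get()
    // blocks until the broker acknowledges the record (or throws on failure).
    static RecordMetadata sendAndWait(Producer<String, String> producer,
                                      String topic, String key, String value)
            throws ExecutionException, InterruptedException {
        return producer.send(new ProducerRecord<>(topic, key, value)).get();
    }
}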

On Sun, Aug 5, 2018 at 10:16 AM, Manjunath N <manj...@gmail.com> wrote:

> After you deleted the topic, was it a clean delete? Did you verify in
> ZooKeeper and in the Kafka logs directory? If not, you may need to do some
> clean up if there are inconsistencies between the Kafka logs dir and ZooKeeper.
> Did you try moving the replica assignment for this topic to different
> machines to see if it behaves the same way on other machines for this
> particular topic?
> Check the number of open file handles before you start writing to this
> topic and after you kick off the writer/producer, on all the replica
> machines for this topic.
> Check how many log files are being created for the partition segments in
> the Kafka logs dir.
> Is there anything in the log files to trace back this behavior? If you
> could check and share any errors or warnings, it would help.
>
>
> > On Aug 5, 2018, at 4:13 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > bq. only one specific node is showing this issue
> >
> > > Is the controller running on this node? Updating the metrics is expensive.
> >
> > Cheers
> >
> > On Sat, Aug 4, 2018 at 3:00 PM Abhijith Sreenivasan <
> > abhijithonl...@gmail.com> wrote:
> >
> >> Hello
> >>
> >> We are seeing high CPU usage for the Kafka process. I am using version
> >> 0.11. It has 5 topics, of which 1 was newly created. We attempted to
> >> publish a message to this new topic; it did not show up in the consumer,
> >> but there were no errors on the publisher end. Not sure why the message
> >> did not show up in the consumer.
> >>
> >> This ran for a couple of days (30K messages) before we noticed 100%+ CPU
> >> usage. We tried deleting the topic (topic deletion is enabled in the
> >> config); it was marked for deletion, but afterwards usage rose to the
> >> levels shown below, 240%+. We restarted the process many times and
> >> disabled the publisher/producer, but it made no difference. After some
> >> time (1 or 2 hours) we get a "Too many open files" error and the process
> >> shuts down.
> >>
> >> We have 3 nodes running Kafka and 3 other nodes running ZK, but only one
> >> specific node is showing this issue (the one where the new topic's
> >> partition is present).
> >>
> >> Still debugging and this is a prod environment.. please help!
> >>
> >> Thanks,
> >> Abhi
> >>
> >> top - 17:47:43 up 289 days, 18:54,  2 users,  load average: 2.65, 2.72, 2.52
> >> Tasks: 144 total,   1 running, 143 sleeping,   0 stopped,   0 zombie
> >> %Cpu(s): 37.5 us, 19.4 sy,  0.0 ni, 37.0 id,  0.0 wa,  0.0 hi,  5.4 si,  0.7 st
> >> KiB Mem : 16266464 total,  1431916 free,  5769976 used,  9064572 buff/cache
> >> KiB Swap:        0 total,        0 free,        0 used.  9230548 avail Mem
> >>
> >>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> >> 32058 root      20   0 5898348 1.078g  15548 S 253.0  6.9  99:03.68 java
> >>    10 root      20   0       0      0      0 S   0.3  0.0 921:42.77 rcu_sche
> >>
>
>
