You can take a profile with Java Flight Recorder if you are running Java 11, or with async-profiler otherwise. See below for the latter:
https://issues.apache.org/jira/browse/KAFKA-9339?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=17013400#comment-17013400

It's worth filing a JIRA and discussing it there.

Ismael

On Sun, Jan 12, 2020 at 10:28 PM Navneeth Krishnan <reachnavnee...@gmail.com> wrote:

> Hi Ismael,
>
> We were previously running on 0.10.2.1 with 8 brokers running around 80%
> CPU. But now we have upgraded to 2.3 with 16 brokers. It's the same message
> rate, topics, producers and consumers but the CPU is still >80%. How can we
> troubleshoot to find where exactly the problem is?
>
> Thanks
>
> On Wed, Jan 8, 2020 at 10:33 AM Ismael Juma <ism...@juma.me.uk> wrote:
>
> > Has the behavior changed after an upgrade or has it been consistent since
> > the start?
> >
> > Ismael
> >
> > On Thu, Jan 2, 2020 at 4:18 PM Navneeth Krishnan <reachnavnee...@gmail.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > We have a kafka cluster with 12 nodes and we are pretty much seeing 90%
> > > cpu usage on all the nodes. Here is all the information. Need some help
> > > figuring out what the problem is and how to overcome this issue.
> > >
> > > *Cluster:*
> > > Kafka version: 2.3.0
> > > Number of brokers in cluster: 12
> > > Node type: 4 vCores 32GB mem
> > > Network In: 10Mbps per broker
> > > Network Out: 16Mbps per broker
> > > Topics: 10 (approximately)
> > > Partitions: 20 (Max), some has only partitions
> > > Replication Factor: 3
> > >
> > > *CPU Usage:*
> > > [image: image.png]
> > >
> > > *VMStat*
> > >
> > > [root]# vmstat 1 10
> > >
> > > procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
> > >  r  b   swpd   free   buff    cache   si   so    bi    bo    in    cs us sy id wa st
> > >  8  0      0 234444  19064 24046980    0    0    17  2026     1     3 38 33 28  0  1
> > >  7  0      0 256444  19036 24023880    0    0   768     0 64027 22708 44 40 16  0  1
> > >  7  0      0 245356  19052 24034560    0    0   256   472 63509 23276 44 39 17  0  1
> > >  7  0      0 235096  19052 24046616    0    0     0     0 62277 22516 46 38 15  0  1
> > >  8  0      0 260548  19036 24020084    0    0   516 49888 62364 22894 43 38 18  0  1
> > >  5  0      0 249232  19036 24030924    0    0   512     0 61022 24589 41 39 20  0  1
> > >  6  0      0 238072  19036 24042512    0    0  1024     0 63358 23063 44 38 17  0  0
> > >  5  0      0 262904  19052 24017972    0    0     0   440 63078 23499 46 37 17  0  1
> > >  7  0      0 250324  19052 24030008    0    0     0     0 64615 22617 48 38 14  0  1
> > >  6  0      0 237920  19052 24042372    0    0  1024 48900 63223 23029 42 40 18  0  1
> > >
> > > *IO Stat:*
> > >
> > > [root]# iostat -m
> > >
> > > Linux 4.14.72-73.55.amzn2.x86_64 (loc-kafka11.internal.dnaspaces.io) 01/02/2020 _x86_64_ (4 CPU)
> > >
> > > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> > >           38.11    0.00   33.09    0.11    0.61   28.08
> > >
> > > Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> > > xvda              2.36         0.01         0.01      26760      43360
> > > nvme0n1           0.00         0.00         0.00          2          0
> > > xvdf             70.95         0.06         7.67     185908   25205338
> > >
> > > *Top Kafka broker threads:*
> > > [image: image.png]
> > >
> > > *Top 3:*
> > >
> > > "data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-0" #60 prio=5 os_prio=0 tid=0x00007f8b1ab56000 nid=0x581f runnable [0x00007f8a886ce000]
> > >
> > > "data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-2" #62 prio=5 os_prio=0 tid=0x00007f8b1ab59000 nid=0x5821 runnable [0x00007f8a6aefd000]
> > >
> > > "data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-1" #61 prio=5 os_prio=0 tid=0x00007f8b1ab57800 nid=0x5820 runnable [0x00007f8a885cd000]
> > >
> > > It doesn't look like GC or IO is the problem.
> > >
> > > Thanks
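
For anyone matching the busiest threads in a per-thread CPU listing against a thread dump: the nid field in a HotSpot thread dump is the native (OS) thread id in hexadecimal, while tools such as top -H show it in decimal. Below is a minimal Python sketch of that conversion, assuming the thread header lines have the shape quoted above. The names native_thread_ids and SAMPLE_DUMP are made up for illustration and are not part of Kafka or the JDK.

```python
import re

# Matches the header line of one thread in a HotSpot thread dump, e.g.
#   "name" #60 prio=5 os_prio=0 tid=0x... nid=0x581f runnable [0x...]
# Captures the quoted thread name and the hexadecimal nid.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)".*?\bnid=0x(?P<nid>[0-9a-fA-F]+)')


def native_thread_ids(dump_text):
    """Return {decimal native thread id: thread name} for each thread header line."""
    ids = {}
    for line in dump_text.splitlines():
        match = THREAD_HEADER.match(line.strip())
        if match:
            ids[int(match.group("nid"), 16)] = match.group("name")
    return ids


# Sample input: the three header lines quoted in the original message.
SAMPLE_DUMP = '''
"data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-0" #60 prio=5 os_prio=0 tid=0x00007f8b1ab56000 nid=0x581f runnable [0x00007f8a886ce000]
"data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-2" #62 prio=5 os_prio=0 tid=0x00007f8b1ab59000 nid=0x5821 runnable [0x00007f8a6aefd000]
"data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-1" #61 prio=5 os_prio=0 tid=0x00007f8b1ab57800 nid=0x5820 runnable [0x00007f8a885cd000]
'''

if __name__ == "__main__":
    # Print decimal thread id next to the thread name, lowest id first.
    for tid, name in sorted(native_thread_ids(SAMPLE_DUMP).items()):
        print(tid, name)
```

The decimal ids printed this way can be compared with the thread id column of a per-thread CPU listing to confirm which broker threads are consuming the CPU before taking a profile.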