Re: Increased CPU usage with 0.8.2-beta

2015-02-16 Thread Solon Gordon
I tested the new patch out and am seeing comparable CPU usage to the previous patch. As far as I can see, heap usage is also comparable between the two patches, though I will say that both look significantly better than 0.8.1.1 (~250MB vs. ~1GB). I'll report back if any new issues come up as I

Re: Increased CPU usage with 0.8.2-beta

2015-02-14 Thread Mathias Söderberg
Jun, I updated our brokers earlier today with the mentioned patch. A week ago our brokers used ~380% CPU (out of 400%) quite consistently, and now they're varying between 250-325% (probably running a bit high right now as we have some consumers catching up quite some lag), so there's definitely

Re: Increased CPU usage with 0.8.2-beta

2015-02-13 Thread Todd Palino
I'm checking into this on our side. The version we're working on jumping to right now is not the 0.8.2 release version, but it is significantly ahead of 0.8.1.1. We've got it deployed on one cluster and I'm making sure it's balanced right now before I take a look at all the metrics. I'll fill in

Re: Increased CPU usage with 0.8.2-beta

2015-02-13 Thread Jay Kreps
We can reproduce this issue, have a theory as to the cause, and are working on a fix. Here is the ticket to track it: https://issues.apache.org/jira/browse/KAFKA-1952 I would recommend people hold off on 0.8.2 upgrades until we have a handle on this. -Jay On Fri, Feb 13, 2015 at 1:47 PM, Solon

Re: Increased CPU usage with 0.8.2-beta

2015-02-13 Thread Solon Gordon
Thanks for the fast response. I did a quick test and initial results look promising. When I swapped in the patched version, CPU usage dropped from ~150% to ~65%. Still a bit higher than what I see with 0.8.1.1 but much more reasonable. I'll do more testing on Monday but wanted to get you some

Re: Increased CPU usage with 0.8.2-beta

2015-02-12 Thread Jay Kreps
This is a serious issue, we'll take a look. -Jay On Thu, Feb 12, 2015 at 3:19 PM, Solon Gordon so...@knewton.com wrote: I saw a very similar jump in CPU usage when I tried upgrading from 0.8.1.1 to 0.8.2.0 today in a test environment. The Kafka cluster there is two m1.larges handling 2,000

Re: Increased CPU usage with 0.8.2-beta

2015-02-03 Thread Mathias Söderberg
Jun, I re-ran the hprof test, for about 30 minutes again, for 0.8.2.0-rc2 with the same version of snappy that 0.8.1.1 used. Attached the logs. Unfortunately there wasn't any improvement as the node running 0.8.2.0-rc2 still had a higher load and CPU usage. Best regards, Mathias On Tue Feb 03

Re: Increased CPU usage with 0.8.2-beta

2015-02-03 Thread Jun Rao
Mathias, The new hprof doesn't reveal anything new to me. We did fix the logic in using Purgatory in 0.8.2, which could potentially drive up the CPU usage a bit. To verify that, could you do your test on a single broker (with replication factor 1) btw 0.8.1 and 0.8.2 and see if there is any

Re: Increased CPU usage with 0.8.2-beta

2015-02-02 Thread Mathias Söderberg
Hi all, I ran the same hprof test on 0.8.1.1, and also did a re-run on 0.8.2.0-rc2, attached logs from both runs. Both runs lasted for 30-40 minutes. The configurations used can be found over here: https://gist.github.com/mthssdrbrg/5fcb9fbdb851d8cc66a2. The configuration used for the first run

Re: Increased CPU usage with 0.8.2-beta

2015-02-02 Thread Mathias Söderberg
Jun, Yeah, sure, I'll take it for a spin tomorrow. On Mon Feb 02 2015 at 11:08:42 PM Jun Rao j...@confluent.io wrote: Mathias, Thanks for the info. I took a quick look. The biggest difference I saw is the org.xerial.snappy.SnappyNative.rawCompress() call. In 0.8.1.1, it uses about 0.05% of

Re: Increased CPU usage with 0.8.2-beta

2015-02-02 Thread Jun Rao
Mathias, Thanks for the info. I took a quick look. The biggest difference I saw is the org.xerial.snappy.SnappyNative.rawCompress() call. In 0.8.1.1, it uses about 0.05% of the CPU. In 0.8.2.0, it uses about 0.10% of the CPU. We did upgrade snappy from 1.0.5 in 0.8.1.1 to 1.1.1.6 in 0.8.2.0.

Re: Increased CPU usage with 0.8.2-beta

2015-02-02 Thread Jaikiran Pai
On Monday 02 February 2015 11:03 PM, Jun Rao wrote: Jaikiran, The fix you provided is probably unnecessary. The channel that we use in SimpleConsumer (BlockingChannel) is configured to be blocking. So even though the read from the socket is in a loop, each read blocks if there are no bytes

Re: Increased CPU usage with 0.8.2-beta

2015-02-02 Thread Jay Kreps
Actually that fetch call blocks on the server side. That is, if there is no data, the server will wait until data arrives or the timeout occurs to send a response. This is done to help simplify the client development. If that isn't happening it is likely a bug or a configuration change in the
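The server-side blocking described above can be pictured with a small sketch. This is not Kafka's implementation; `LongPollLog` and its methods are hypothetical names used only to illustrate the idea of a fetch that waits until data arrives or a timeout occurs, without spinning on the CPU while it waits.

```python
import threading
import time


class LongPollLog:
    """Toy in-memory log whose fetch() waits for data instead of returning empty."""

    def __init__(self):
        self._messages = []
        self._cond = threading.Condition()

    def append(self, message):
        with self._cond:
            self._messages.append(message)
            self._cond.notify_all()  # wake any fetcher waiting for new data

    def fetch(self, offset, max_wait_s):
        """Return messages from offset onward, waiting up to max_wait_s for some to arrive."""
        with self._cond:
            # wait_for() sleeps (no busy loop) until the predicate holds
            # or the timeout expires, whichever comes first.
            self._cond.wait_for(lambda: len(self._messages) > offset,
                                timeout=max_wait_s)
            return list(self._messages[offset:])


if __name__ == "__main__":
    log = LongPollLog()
    # A producer appends after 0.1 s; the first fetch blocks until then.
    threading.Timer(0.1, log.append, args=("hello",)).start()
    print(log.fetch(offset=0, max_wait_s=1.0))

    # Nothing new arrives, so this fetch returns an empty list after the timeout.
    start = time.monotonic()
    print(log.fetch(offset=1, max_wait_s=0.2), f"{time.monotonic() - start:.2f}s")
```

With this shape the client can call fetch in a plain loop: an idle topic costs one parked thread per request rather than a stream of empty responses.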

Re: Increased CPU usage with 0.8.2-beta

2015-02-02 Thread Jay Kreps
Ah, yeah, you're right. That is just wait time, not CPU time. We should check that profile; it must be something else on the list. -Jay On Mon, Feb 2, 2015 at 9:33 AM, Jun Rao j...@confluent.io wrote: Hi, Mathias, From the hprof output, it seems that the top CPU consumers are socketAccept()
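The distinction between wait time and CPU time can be seen in a small, self-contained sketch. It is only an analogy: `time.sleep` stands in for a thread parked in a blocking call such as epollWait(), and the function names are hypothetical. A sampling profiler that counts how often a thread is found in a given frame will attribute many samples to the blocked call even though it consumes almost no CPU.

```python
import time


def measure(fn):
    """Return (wall_seconds, cpu_seconds) spent in fn()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def blocked_wait():
    time.sleep(0.2)  # stands in for a thread parked in a blocking wait


def busy_spin():
    deadline = time.perf_counter() + 0.2
    while time.perf_counter() < deadline:
        pass  # burns CPU for the whole interval


if __name__ == "__main__":
    for fn in (blocked_wait, busy_spin):
        wall, cpu = measure(fn)
        print(f"{fn.__name__}: wall={wall:.2f}s cpu={cpu:.2f}s")
```

Both functions take about the same wall-clock time, but only the spinning one should show CPU time close to its wall time.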

Re: Increased CPU usage with 0.8.2-beta

2015-02-02 Thread Jun Rao
Hi, Mathias, From the hprof output, it seems that the top CPU consumers are socketAccept() and epollWait(). As far as I am aware, there haven't been any significant changes in the socket server code btw 0.8.1 and 0.8.2. Could you run the same hprof test on 0.8.1 so that we can see the difference?

Re: Increased CPU usage with 0.8.2-beta

2015-02-01 Thread Jaikiran Pai
Hi Mathias, Looking at that thread dump, I think the potential culprit is this one: TRACE 303545: (thread=200049) sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line) sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

Re: Increased CPU usage with 0.8.2-beta

2015-01-26 Thread Mathias Söderberg
Hi Neha, I sent an e-mail earlier today, but noticed now that it didn't actually go through. Anyhow, I've attached two files, one with output from a 10 minute run and one with output from a 30 minute run. Realized that maybe I should've done one or two runs with 0.8.1.1 as well, but

Re: Increased CPU usage with 0.8.2-beta

2014-12-09 Thread Mathias Söderberg
Hi Neha, Yeah sure. I'm not familiar with hprof, so any particular options I should include or just run with defaults? Best regards, Mathias On Mon Dec 08 2014 at 7:41:32 PM Neha Narkhede n...@confluent.io wrote: Thanks for reporting the issue. Would you mind running hprof and sending the

Re: Increased CPU usage with 0.8.2-beta

2014-12-09 Thread Neha Narkhede
The following should be sufficient: java -agentlib:hprof=cpu=samples,depth=100,interval=20,lineno=y,thread=y,file=kafka.hprof classname You would need to start the Kafka server with the settings above and let it run for some time, until you observe the problem. On Tue, Dec 9, 2014 at 3:47 AM, Mathias Söderberg
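For readers unfamiliar with hprof, the agent string in the message above can be broken into its individual options. The sketch below is only an illustration: it rebuilds the same `-agentlib:hprof=...` argument from a dictionary so that each setting can carry a comment. The helper name `hprof_agent_arg` is hypothetical, and the comments give the usual meaning of these options in the JDK's hprof agent.

```python
# Option values exactly as given in the message above; comments are explanatory.
HPROF_OPTIONS = {
    "cpu": "samples",       # profile CPU by periodic stack sampling
    "depth": "100",         # record up to 100 frames per stack trace
    "interval": "20",       # sampling interval in milliseconds
    "lineno": "y",          # include line numbers in traces
    "thread": "y",          # include the thread id in traces
    "file": "kafka.hprof",  # write the report to this file
}


def hprof_agent_arg(options):
    """Join the options into a single -agentlib:hprof=... JVM argument."""
    joined = ",".join(f"{key}={value}" for key, value in options.items())
    return f"-agentlib:hprof={joined}"


if __name__ == "__main__":
    print(hprof_agent_arg(HPROF_OPTIONS))
```

How the resulting argument reaches the broker JVM depends on how the server is started, so that part is left out here.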

Increased CPU usage with 0.8.2-beta

2014-12-08 Thread Mathias Söderberg
Good day, I upgraded a Kafka cluster from v0.8.1.1 to v0.8.2-beta and noticed that the CPU usage on the broker machines went up by roughly 40%, from ~60% to ~100% and am wondering if anyone else has experienced something similar? The load average also went up by 2x-3x. We're running on EC2 and
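The figures quoted above can be read in two ways, and the small sketch below spells out the arithmetic. It assumes the "~60%" and "~100%" values are utilization readings before and after the upgrade; the function name `cpu_change` is hypothetical.

```python
def cpu_change(before_pct, after_pct):
    """Return (percentage-point change, relative change in percent)."""
    points = after_pct - before_pct
    relative = points / before_pct * 100.0
    return points, relative


if __name__ == "__main__":
    points, relative = cpu_change(60.0, 100.0)
    print(f"{points:.0f} percentage points, {relative:.0f}% relative increase")
```

Under that reading, the "roughly 40%" in the report is a difference of 40 percentage points, which corresponds to a relative increase of about two thirds.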

Re: Increased CPU usage with 0.8.2-beta

2014-12-08 Thread Neha Narkhede
Thanks for reporting the issue. Would you mind running hprof and sending the output? On Mon, Dec 8, 2014 at 1:25 AM, Mathias Söderberg mathias.soederb...@gmail.com wrote: Good day, I upgraded a Kafka cluster from v0.8.1.1 to v0.8.2-beta and noticed that the CPU usage on the broker machines