The problem is gone, but I'm unsure of the root cause. The client library I use recently added support for the new producer. Switching to that seems to have sidestepped the problem.
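(For anyone landing here later: the "new producer" is the Java client that shipped with 0.8.2, org.apache.kafka.clients.producer.KafkaProducer. A minimal, illustrative config sketch for it — the broker address is taken from the logs below, the other values are just common starting points, not a recommendation:

```
# producer.properties -- hypothetical sketch for the new (Java) producer client
bootstrap.servers=ip-10-0-9-105.ec2.internal:9092
acks=1
retries=3
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
```

Unlike the old SyncProducer, the new client does its network I/O on a background thread with non-blocking sockets, which may be why the stalls below no longer surface.)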
On Tue, Jun 30, 2015 at 12:53 PM, Shayne S <shaynest...@gmail.com> wrote:
> Thanks for responding Gwen.
>
> There is sometimes an error in server.log, but it isn't 1 to 1 with the
> connection resets on the producer (although this is a socket from that
> box):
>
> [2015-06-30 15:26:09,544] ERROR Closing socket for /10.0.6.244 because of
> error (kafka.network.Processor)
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>         at kafka.utils.Utils$.read(Utils.scala:380)
>         at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>         at kafka.network.Processor.read(SocketServer.scala:444)
>         at kafka.network.Processor.run(SocketServer.scala:340)
>         at java.lang.Thread.run(Thread.java:745)
>
> Maybe interesting, I also see the following in the server.log within the
> mentioned 15-16 minute quiet window:
>
> [2015-06-30 14:41:55,978] INFO Closing socket connection to /10.0.6.244. (kafka.network.Processor)
> [2015-06-30 14:41:55,979] INFO Closing socket connection to /10.0.6.244. (kafka.network.Processor)
> [2015-06-30 14:41:55,982] INFO Closing socket connection to /10.0.6.244. (kafka.network.Processor)
> [2015-06-30 14:57:27,308] INFO Closing socket connection to /10.0.6.244. (kafka.network.Processor)
> [2015-06-30 14:57:27,309] INFO Closing socket connection to /10.0.6.244. (kafka.network.Processor)
> [2015-06-30 14:57:27,312] INFO Closing socket connection to /10.0.6.244. (kafka.network.Processor)
>
> On Tue, Jun 30, 2015 at 12:44 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>> Do you see any errors on the brokers when this happens?
>>
>> On Tue, Jun 30, 2015 at 10:14 AM, Shayne S <shaynest...@gmail.com> wrote:
>> > This problem is intermittent, and I'm not sure what is causing it. Some days
>> > everything runs non-stop with no issues; some days I get the following.
>> >
>> > Setup:
>> > - Single broker
>> > - Running 0.8.2.1
>> >
>> > When the problem is presenting, anywhere from 5,000 to 30,000 messages
>> > may be processed in a few seconds, then everything stops for 15-16
>> > minutes before I see this:
>> >
>> > 2015-06-30 16:48:03,737 [async-thread-macro-1] INFO
>> > kafka.producer.SyncProducer - Disconnecting from
>> > ip-10-0-9-105.ec2.internal:9092
>> > 2015-06-30 16:48:03,744 [async-thread-macro-1] WARN
>> > kafka.producer.async.DefaultEventHandler - Failed to send producer request
>> > with correlation id 42474 to broker 0 with data for partitions
>> > [refrigerators_archive_features,0]
>> > java.io.IOException: Connection timed out
>> >         at sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
>> >         at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
>> >         at sun.nio.ch.IOUtil.write(IOUtil.java:148)
>> >         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:524)
>> >         at java.nio.channels.SocketChannel.write(SocketChannel.java:493)
>> >         at kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56)
>> >         at kafka.network.Send$class.writeCompletely(Transmission.scala:75)
>> >         at kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26)
>> >         at kafka.network.BlockingChannel.send(BlockingChannel.scala:92)
>> >         at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:72)
>> >         at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:71)
>> >         at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SyncProducer.scala:102)
>> >         at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:102)
>> >         at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:102)
>> >         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>> >         at kafka.producer.SyncProducer$$anonfun$send$1.apply$mcV$sp(SyncProducer.scala:101)
>> >         at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:101)
>> >         at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:101)
>> >         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>> >         at kafka.producer.SyncProducer.send(SyncProducer.scala:100)
>> >         at kafka.producer.async.DefaultEventHandler.kafka$producer$async$DefaultEventHandler$$send(DefaultEventHandler.scala:255)
>> >         at kafka.producer.async.DefaultEventHandler$$anonfun$dispatchSerializedData$2.apply(DefaultEventHandler.scala:106)
>> >         at kafka.producer.async.DefaultEventHandler$$anonfun$dispatchSerializedData$2.apply(DefaultEventHandler.scala:100)
>> >         ........
>> >
>> > And the cycle repeats. Another few thousand messages and then it stops
>> > once more.
>> > A thread dump shows this:
>> >
>> > "async-thread-macro-3" daemon prio=10 tid=0x00007f90383a5800 nid=0x6a8
>> > runnable [0x00007f9025872000]
>> >    java.lang.Thread.State: RUNNABLE
>> >         at sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
>> >         at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
>> >         at sun.nio.ch.IOUtil.write(IOUtil.java:148)
>> >         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:524)
>> >         - locked <0x000000008dd16c70> (a java.lang.Object)
>> >         at java.nio.channels.SocketChannel.write(SocketChannel.java:493)
>> >         at kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56)
>> >         at kafka.network.Send$class.writeCompletely(Transmission.scala:75)
>> >         at kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26)
>> >         at kafka.network.BlockingChannel.send(BlockingChannel.scala:92)
>> >         at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:72)
>> >         at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:71)
>> >         - locked <0x000000008dd16b30> (a java.lang.Object)
>> >         at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SyncProducer.scala:102)
>> >         at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:102)
>> >         at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:102)
>> >         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>> >         at kafka.producer.SyncProducer$$anonfun$send$1.apply$mcV$sp(SyncProducer.scala:101)
>> >         at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:101)
>> >         at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:101)
>> >         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>> >         at kafka.producer.SyncProducer.send(SyncProducer.scala:100)
>> >         at kafka.producer.async.DefaultEventHandler.kafka$producer$async$DefaultEventHandler$$send(DefaultEventHandler.scala:255)
>> >         at kafka.producer.async.DefaultEventHandler$$anonfun$dispatchSerializedData$2.apply(DefaultEventHandler.scala:106)
>> >         at kafka.producer.async.DefaultEventHandler$$anonfun$dispatchSerializedData$2.apply(DefaultEventHandler.scala:100)
>> >         .........
>> >
>> > I found an issue that sounds similar and is fixed in 0.8.3, but it doesn't
>> > sound like my producer is getting stuck in the same place:
>> > https://issues.apache.org/jira/browse/KAFKA-1733
>> >
>> > Thanks in advance!
>> > Shayne
>> >