Ewen,

Thanks for your prompt reply.

I've read the issue, but I think this is a different problem. While the
logs were growing, all consumers were shut down, and all nodes (brokers
and consumers alike) were disconnected because of the network problem.
Besides, our newly created cluster holds no more than 2GB of data in
total.

Anyway, thanks a lot.

On Sat, Oct 18, 2014 at 3:43 PM, Ewen Cheslack-Postava wrote:

> This looks very similar to the error and stacktrace I see when
> reproducing https://issues.apache.org/jira/browse/KAFKA-1196 -- that's
> an overflow where the data returned in a FetchResponse exceeds 2GB. (It
> triggers the error you're seeing because FetchResponse's size overflows
> to become negative, which breaks tests for whether data has finished
> sending.) I haven't tested against 0.8.1.1, but it looks identical
> modulo line #'s. If it's the same issue, unfortunately it won't fix
> itself, so that log will just keep growing with more error messages as
> the consumer keeps reconnecting, requesting data, then triggering the
> error in the broker which forcibly disconnects the consumer.
>
> I'm not certain what to suggest here since KAFKA-1196 still needs a lot
> of refinement. But given the 0.8.1.1 code I don't think there's much
> choice but to try to reduce the amount of data that will be returned.
> One way to do that is to reduce the # of partitions read in the
> FetchRequest (i.e. make sure FetchRequests address fewer
> TopicAndPartitions, maybe putting each TopicAndPartition in its own
> request). An alternative would be to use more recent offsets (i.e. don't
> start from the oldest data available in Kafka). A recent enough offset
> should result in a < 2GB response.
>
> -Ewen
>
> On Sat, Oct 18, 2014, at 12:07 AM, xingcan wrote:
> > Hi, all
> >
> > Recently, I upgrade my Kafka cluster to 0.8.1.1 and set replication with
> > num.replica.fetchers=5. Last night, there's something wrong with the
> > network. Soon, I found the server.log files (not data log!) on every node
> > reached 4GB in an hour.
> > I am not sure if it's my inappropriate configuration or other reason. Can
> > anybody help me with this. Thanks~
> >
> > log file tail
> > ============================================
> > [2014-10-16 20:59:59,994] ERROR Closing socket for /192.168.1.66 because of error (kafka.network.Processor)
> > kafka.common.KafkaException: This operation cannot be completed on a complete request.
> >     at kafka.network.Transmission$class.expectIncomplete(Transmission.scala:34)
> >     at kafka.api.FetchResponseSend.expectIncomplete(FetchResponse.scala:191)
> >     at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:214)
> >     at kafka.network.Processor.write(SocketServer.scala:375)
> >     at kafka.network.Processor.run(SocketServer.scala:247)
> >     at java.lang.Thread.run(Thread.java:745)
> > [2014-10-16 20:59:59,994] ERROR Closing socket for /192.168.1.66 because of error (kafka.network.Processor)
> > kafka.common.KafkaException: This operation cannot be completed on a complete request.
> >     at kafka.network.Transmission$class.expectIncomplete(Transmission.scala:34)
> >     at kafka.api.FetchResponseSend.expectIncomplete(FetchResponse.scala:191)
> >     at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:214)
> >     at kafka.network.Processor.write(SocketServer.scala:375)
> >     at kafka.network.Processor.run(SocketServer.scala:247)
> >     at java.lang.Thread.run(Thread.java:745)
> > [2014-10-16 20:59:59,994] ERROR Closing socket for /192.168.1.65 because of error (kafka.network.Processor)
> > kafka.common.KafkaException: This operation cannot be completed on a complete request.
> >     at kafka.network.Transmission$class.expectIncomplete(Transmission.scala:34)
> >     at kafka.api.FetchResponseSend.expectIncomplete(FetchResponse.scala:191)
> >     at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:214)
> >     at kafka.network.Processor.write(SocketServer.scala:375)
> >     at kafka.network.Processor.run(SocketServer.scala:247)
> >     at java.lang.Thread.run(Thread.java:745)
> >
> > --
> > *Xingcan*
>

--
*Xingcan*
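
For reference, the overflow described in the quoted reply (a response size that exceeds 2GB and turns negative) can be illustrated with a small Java sketch. This is not Kafka's code: the class and method names are hypothetical, and the only assumption taken from the reply is that the total size is accumulated in a 32-bit signed integer.

```java
public class FetchSizeOverflow {
    // Hypothetical helper: sums per-partition byte counts into a 32-bit int,
    // the way a size field typed as a signed int would.
    static int totalSizeAsInt(int[] partitionSizes) {
        int total = 0;
        for (int size : partitionSizes) {
            total += size; // wraps silently once the sum passes Integer.MAX_VALUE
        }
        return total;
    }

    // Same sum with a 64-bit accumulator, for comparison.
    static long totalSizeAsLong(int[] partitionSizes) {
        long total = 0L;
        for (int size : partitionSizes) {
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        // Three partitions of 1 GiB each: 3 GiB in total, above the ~2 GiB int limit.
        int oneGiB = 1 << 30;
        int[] sizes = {oneGiB, oneGiB, oneGiB};

        System.out.println("int sum:  " + totalSizeAsInt(sizes));  // negative after wrap-around
        System.out.println("long sum: " + totalSizeAsLong(sizes)); // true total in bytes
    }
}
```

Any "bytes remaining" check that compares against a total computed like `totalSizeAsInt` would misbehave once the sum wraps, which is consistent with the suggestion in the reply to fetch fewer partitions per request or start from more recent offsets.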
