Re: How to run the three producers test
Hmm, yeah, some error logs would be nice, as Gwen pointed out. Do any of your brokers fall out of the ISR when sending messages? It seems like your setup should be fine, so I'm not entirely sure.

On Tue, Jul 14, 2015 at 1:31 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:

Jiefu,

I am performing these tests on a 6-node cluster in CloudLab (an infrastructure built for scientific research). I use 2 nodes as producers, 2 as brokers only, and 2 as consumers. I have tested each machine individually and they work well. I did not use AWS.

Thank you!

On Tue, Jul 14, 2015 at 4:20 PM, JIEFU GONG jg...@berkeley.edu wrote:

Yuheng, are you performing these tests locally or using a service such as AWS? I'd try using each separate machine individually first, connecting to the ZK/Kafka servers and ensuring that each is able to first log and consume messages independently.

On Tue, Jul 14, 2015 at 1:17 PM, Gwen Shapira gshap...@cloudera.com wrote:

Are there any errors on the broker logs?

On Tue, Jul 14, 2015 at 11:57 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:

Jiefu,

Thank you. The three producers can run at the same time. I mean, should they be started at exactly the same time? (I have three consoles, one for each of the three machines, and I just start the command manually one by one.) Or does a small variation in the starting time not matter?

Gwen and Jiefu,

I have started the three producers on three machines. After a while, all of them give a java.net.ConnectException:

[2015-07-14 12:56:46,352] WARN Error in I/O with producer0-link-0/192.168.1.1 (org.apache.kafka.common.network.Selector)
java.net.ConnectException: Connection refused..
[2015-07-14 12:56:48,056] WARN Error in I/O with producer1-link-0/192.168.1.2 (org.apache.kafka.common.network.Selector)
java.net.ConnectException: Connection refused.

What could be the cause? Thank you guys!
Re: How to run the three producers test
Someone may correct me if I am incorrect, but how much disk space do you have on these nodes? Your exception 'No space left on device' from one of your brokers seems to suggest that you're full (after all you're writing 500 million records). If this is the case I believe the expected behavior for Kafka is to reject any more attempts to write data?
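A quick way to answer the disk space question on each broker is sketched below. This is only an illustration: the helper names and the 90% threshold are assumptions, and the path passed in would be whichever directory the broker's log.dirs setting points to.

```python
import shutil


def disk_usage_percent(path: str) -> float:
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return 100.0 * usage.used / usage.total


def is_nearly_full(path: str, threshold_percent: float = 90.0) -> bool:
    """True when usage is at or above the (assumed) warning threshold."""
    return disk_usage_percent(path) >= threshold_percent


if __name__ == "__main__":
    # "." is a placeholder; on a broker, pass a directory listed in log.dirs.
    print(f"{disk_usage_percent('.'):.1f}% used, nearly full: {is_nearly_full('.')}")
```

Running this on every broker before and during the test should show whether one of them is approaching the point where 'No space left on device' appears.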
Re: How to run the three producers test
I checked the logs on the brokers, it seems that the zookeeper or the kafka server process is not running on this broker...Thank you guys. I will see if it happens again.
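A "Connection refused" on the producer side usually means nothing is listening on the target host and port, which fits a stopped broker or ZooKeeper process. A minimal sketch of a reachability check follows; the function name and the hosts in the demo are assumptions, while 9092 and 2181 are the usual default Kafka and ZooKeeper ports.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


if __name__ == "__main__":
    # Hypothetical endpoints; replace "localhost" with your broker and ZK hosts.
    for name, host, port in [("kafka", "localhost", 9092), ("zookeeper", "localhost", 2181)]:
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{name} {host}:{port} is {state}")
```

Running it from each producer machine shows whether the failure is specific to one host or affects every client.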
Re: How to run the three producers test
Also, the log in another broker (not the bootstrap) says:

[2015-07-14 15:18:41,220] FATAL [Replica Manager on Broker 1]: Error writing to highwatermark file: (kafka.server.ReplicaManager)
[2015-07-14 15:18:40,160] ERROR Closing socket for /130.127.133.47 because of error (kafka.network.Processor)
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at kafka.utils.Utils$.read(Utils.scala:380)
    at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
    at kafka.network.Processor.read(SocketServer.scala:444)
    at kafka.network.Processor.run(SocketServer.scala:340)
    at java.lang.Thread.run(Thread.java:745)
java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:345)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
    at sun.nio.cs.StreamEncoder.implFlushBuffe
(END)
Re: How to run the three producers test
Hi Jiefu, Gwen,

I am running the Throughput versus stored data test:

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 500 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196

After around 50,000,000 messages were sent, I got a bunch of connection refused errors, as I mentioned before. I checked the logs on the broker and here is what I see:

[2015-07-14 15:11:23,578] WARN Partition [test,4] on broker 5: No checkpointed highwatermark is found for partition [test,4] (kafka.cluster.Partition)
[2015-07-14 15:12:33,298] INFO Rolled new log segment for 'test-4' in 4 ms. (kafka.log.Log)
[2015-07-14 15:12:33,299] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log)
[2015-07-14 15:13:39,529] INFO Rolled new log segment for 'test-4' in 1 ms. (kafka.log.Log)
[2015-07-14 15:13:39,531] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log)
[2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-4' in 3 ms. (kafka.log.Log)
[2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log)
[2015-07-14 15:15:51,478] INFO Rolled new log segment for 'test-4' in 1 ms. (kafka.log.Log)
[2015-07-14 15:15:51,479] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log)
[2015-07-14 15:16:52,589] INFO Rolled new log segment for 'test-4' in 1 ms. (kafka.log.Log)
[2015-07-14 15:16:52,590] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log)
[2015-07-14 15:17:57,406] INFO Rolled new log segment for 'test-4' in 1 ms. (kafka.log.Log)
[2015-07-14 15:17:57,407] INFO Rolled new log segment for 'test-0' in 0 ms. (kafka.log.Log)
[2015-07-14 15:18:39,792] FATAL [KafkaApi-5] Halting due to unrecoverable I/O error while handling produce request: (kafka.server.KafkaApis)
kafka.common.KafkaStorageException: I/O exception in append to log 'test-0'
    at kafka.log.Log.append(Log.scala:266)
    at kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:379)
    at kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:365)
    at kafka.utils.Utils$.inLock(Utils.scala:535)
    at kafka.utils.Utils$.inReadLock(Utils.scala:541)
    at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:365)
    at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:291)
    at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:282)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.coll

Can you help me with this problem? Thanks.
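With this many INFO lines around the failure, it can help to filter a broker log down to its ERROR and FATAL entries. The sketch below assumes the "[timestamp] LEVEL message" layout seen in the logs in this thread; the names LogEntry and severe_entries are made up for the example.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    message: str


# Matches lines such as "[2015-07-14 15:18:39,792] FATAL [KafkaApi-5] Halting ..."
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] (\w+) (.*)$")


def severe_entries(lines: Iterable[str], levels=("ERROR", "FATAL")) -> Iterator[LogEntry]:
    """Yield entries whose level is in `levels`; stack-trace lines are skipped."""
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group(2) in levels:
            yield LogEntry(*match.groups())


if __name__ == "__main__":
    sample = [
        "[2015-07-14 15:17:57,407] INFO Rolled new log segment for 'test-0' in 0 ms. (kafka.log.Log)",
        "[2015-07-14 15:18:39,792] FATAL [KafkaApi-5] Halting due to unrecoverable I/O error "
        "while handling produce request: (kafka.server.KafkaApis)",
        "kafka.common.KafkaStorageException: I/O exception in append to log 'test-0'",
    ]
    for entry in severe_entries(sample):
        print(entry.timestamp, entry.level, entry.message)
```

The same function can be fed an open file object for a broker's server log, since iterating a file yields its lines.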
Re: How to run the three producers test
But is there a way to let kafka override the old data if the disk is filled? Or is it not necessary to use this figure? Thanks.
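As far as I know, Kafka does not overwrite old data in place when a disk fills up. Instead, the broker deletes whole old log segments once a retention limit is reached: log.retention.hours limits age and log.retention.bytes limits size per partition. The sketch below is a rough rule of thumb for picking a size limit, not an official formula. The helper name, the 1.5 TB budget, and the count of 6 partition replicas per broker are assumptions for illustration; one segment of headroom is subtracted because the active segment is not deleted.

```python
def retention_bytes_per_partition(disk_budget_bytes: int,
                                  replicas_on_broker: int,
                                  segment_bytes: int = 1024 ** 3) -> int:
    """Suggest a log.retention.bytes value that keeps a broker within its disk budget.

    The budget is split evenly across the partition replicas hosted on the broker,
    minus one segment of headroom per partition.
    """
    limit = disk_budget_bytes // replicas_on_broker - segment_bytes
    if limit <= 0:
        raise ValueError("disk budget too small for this many partition replicas")
    return limit


if __name__ == "__main__":
    budget = 1_500_000_000_000  # assumed: 1.5 TB of the broker's disks reserved for Kafka
    replicas = 6                # assumed: partition replicas hosted on this broker
    segment = 1024 ** 3         # 1 GiB, the usual default for log.segment.bytes
    # Lines in the form they would take in the broker's server.properties
    print(f"log.segment.bytes={segment}")
    print(f"log.retention.bytes={retention_bytes_per_partition(budget, replicas, segment)}")
```

With a limit like this in place, a long-running test would discard its oldest data instead of filling the disk, at the cost of no longer holding the full data set the benchmark figure was meant to measure.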
Re: How to run the three producers test
Jiefu,

I agree with you. I checked the hardware specs of my machines; each one of them has:

RAM   256GB ECC Memory (16x 16 GB DDR4 1600MT/s dual rank RDIMMs)
Disk  Two 1 TB 7.2K RPM 3G SATA HDDs

The throughput versus stored data test uses 5*10^10 messages, which has a total volume of 5TB. I set the replication factor to 3, which means the total size including replicas would be 15TB, which apparently overwhelmed the two brokers I use.

Thanks.

best,
Yuheng
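The arithmetic above can be checked with a few lines. This is a back-of-the-envelope sketch: it assumes 100-byte records (the record size argument in the test command), ignores per-message and index overhead, and treats both 1 TB disks on each of the two brokers as fully available to Kafka. The function names are invented for the example.

```python
TB = 10 ** 12  # decimal terabyte, matching how disk sizes are usually quoted


def required_bytes(num_records: int, record_size_bytes: int, replication_factor: int) -> int:
    """Raw payload volume across all replicas, ignoring protocol and index overhead."""
    return num_records * record_size_bytes * replication_factor


def cluster_capacity_bytes(num_brokers: int, disks_per_broker: int, disk_size_bytes: int) -> int:
    """Total raw disk capacity of the cluster, assuming every disk is used for Kafka."""
    return num_brokers * disks_per_broker * disk_size_bytes


if __name__ == "__main__":
    needed = required_bytes(5 * 10 ** 10, 100, 3)
    available = cluster_capacity_bytes(2, 2, 1 * TB)
    print(f"needed: {needed / TB:.0f} TB, raw capacity: {available / TB:.0f} TB")
    print("fits" if needed <= available else "does not fit")
```

Even the unreplicated 5 TB is more than the 4 TB of raw disk on two brokers, so the test would need fewer records, more brokers, or a retention limit.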
Re: How to run the three producers test
Jiefu,

Even now that there is enough disk space (less than 18%), I still get an error when I run it. The log says:

[2015-07-14 23:08:48,735] FATAL Fatal error during KafkaServerStartable startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 6000
    at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:880)
    at org.I0Itec.zkclient.ZkClient.init(ZkClient.java:98)
    at org.I0Itec.zkclient.ZkClient.init(ZkClient.java:84)
    at kafka.server.KafkaServer.initZk(KafkaServer.scala:157)
    at kafka.server.KafkaServer.startup(KafkaServer.scala:82)
    at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:29)
    at kafka.Kafka$.main(Kafka.scala:46)
    at kafka.Kafka.main(Kafka.scala)
[2015-07-14 23:08:48,737] INFO [Kafka Server 1], shutting down (kafka.server.KafkaServer)

I have checked that ZooKeeper is running fine. Can anyone help me understand why I get this error? Thanks.

On Tue, Jul 14, 2015 at 10:24 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:

But is there a way to let Kafka overwrite the old data if the disk is filled? Or is it not necessary to use this figure? Thanks.

On Tue, Jul 14, 2015 at 10:14 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:

Jiefu, I agree with you. I checked the hardware specs of my machines. Each one has:

RAM: 256 GB ECC memory (16x 16 GB DDR4 1600 MT/s dual-rank RDIMMs)
Disk: two 1 TB 7.2K RPM 3G SATA HDDs

The throughput versus stored data test uses 5*10^10 messages, a total volume of 5 TB. I set the replication factor to 3, which means the total size including replicas would be 15 TB, which apparently overwhelmed the two brokers I use. Thanks. best, Yuheng

On Tue, Jul 14, 2015 at 6:06 PM, JIEFU GONG jg...@berkeley.edu wrote:

Someone may correct me if I am incorrect, but how much disk space do you have on these nodes? Your exception 'No space left on device' from one of your brokers seems to suggest that you're full (after all you're writing 500 million records). If this is the case I believe the expected behavior for Kafka is to reject any more attempts to write data?

On Tue, Jul 14, 2015 at 2:27 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:

Also, the log in another broker (not the bootstrap) says:

[2015-07-14 15:18:41,220] FATAL [Replica Manager on Broker 1]: Error writing to highwatermark file: (kafka.server.ReplicaManager)
[2015-07-14 15:18:40,160] ERROR Closing socket for /130.127.133.47 because of error (kafka.network.Processor)
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at kafka.utils.Utils$.read(Utils.scala:380)
    at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
    at kafka.network.Processor.read(SocketServer.scala:444)
    at kafka.network.Processor.run(SocketServer.scala:340)
    at java.lang.Thread.run(Thread.java:745)
java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:345)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
    at sun.nio.cs.StreamEncoder.implFlushBuffe (END)

On Tue, Jul 14, 2015 at 5:24 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:

Hi Jiefu, Gwen,

I am running the Throughput versus stored data test:

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 500 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196

After around 50,000,000 messages were sent, I got a bunch of connection refused errors, as I mentioned before. I checked the logs on the broker and here is what I see:

[2015-07-14 15:11:23,578] WARN Partition [test,4] on broker 5: No checkpointed highwatermark is found for partition [test,4] (kafka.cluster.Partition)
[2015-07-14 15:12:33,298] INFO Rolled new log segment for 'test-4' in 4 ms. (kafka.log.Log)
[2015-07-14 15:12:33,299] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log)
[2015-07-14 15:13:39,529] INFO Rolled new log segment for 'test-4' in 1 ms. (kafka.log.Log)
[2015-07-14 15:13:39,531] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log)
[2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-4' in 3 ms. (kafka.log.Log)
[2015-07-14
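The capacity arithmetic discussed in this message can be checked with a few lines of Python. This is only a rough sketch using the figures quoted in the thread: the 100-byte record size is what 5 TB over 5*10^10 messages implies, "1 TB" is assumed to mean 10^12 bytes, and index files, OS usage, and protocol overhead are ignored. The function and constant names are illustrative, not part of any Kafka tool.

```python
# Figures quoted in the thread; names are illustrative only.
NUM_MESSAGES = 5 * 10**10      # messages in the throughput-vs-stored-data test
RECORD_SIZE_BYTES = 100        # implied by 5 TB / 5*10^10 messages
REPLICATION_FACTOR = 3
NUM_BROKERS = 2
DISKS_PER_BROKER = 2
DISK_SIZE_BYTES = 10**12       # assumption: "1 TB" means 10^12 bytes

TB = 10**12


def required_bytes(num_messages, record_size, replication_factor):
    """Payload bytes stored across the cluster, ignoring all overhead."""
    return num_messages * record_size * replication_factor


def cluster_capacity_bytes(brokers, disks_per_broker, disk_size):
    """Raw capacity, assuming every disk is used entirely for Kafka logs."""
    return brokers * disks_per_broker * disk_size


needed = required_bytes(NUM_MESSAGES, RECORD_SIZE_BYTES, REPLICATION_FACTOR)
available = cluster_capacity_bytes(NUM_BROKERS, DISKS_PER_BROKER, DISK_SIZE_BYTES)

print(f"needed:    {needed / TB:.0f} TB")
print(f"available: {available / TB:.0f} TB")
print("fits" if needed <= available else "does not fit")
```

Under these assumptions the test needs 15 TB against roughly 4 TB of raw disk, which is consistent with the 'No space left on device' error reported above.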
Re: How to run the three producers test
Yuheng,

Yes, if you read the blog post it specifies that he's using three separate machines. There's no reason the producers cannot be started at the same time, I believe.

On Tue, Jul 14, 2015 at 11:42 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:

Hi,

I am running the performance test for Kafka: https://gist.github.com/jkreps/c7ddb4041ef62a900e6c

For the Three Producers, 3x async replication scenario, the command is the same as for one producer:

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 5000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196

So how do I run the test for three producers? Do I just run them on three separate servers at the same time? Will this cause errors, since the three producers can't be started at exactly the same time? Thanks. best, Yuheng

--
Jiefu Gong
University of California, Berkeley | Class of 2017
B.A Computer Science | College of Letters and Sciences
jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427
Re: How to run the three producers test
You need to run 3 of those at the same time. We don't expect any errors, but if you run into anything, let us know and we'll try to help.

Gwen

On Tue, Jul 14, 2015 at 11:42 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:

Hi,

I am running the performance test for Kafka: https://gist.github.com/jkreps/c7ddb4041ef62a900e6c

For the Three Producers, 3x async replication scenario, the command is the same as for one producer:

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 5000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196

So how do I run the test for three producers? Do I just run them on three separate servers at the same time? Will this cause errors, since the three producers can't be started at exactly the same time? Thanks. best, Yuheng
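One way to start the three producers without typing into three consoles by hand is to launch them from a single driver script. The following is a minimal sketch, not something from the thread: it assumes passwordless ssh to the producer machines, and the host names (`producer-a` and so on) and the `/opt/kafka` install path are hypothetical placeholders. The producer command itself is copied from the original question.

```python
import shlex
import subprocess

# Command copied from the thread; adjust bootstrap.servers for your cluster.
PRODUCER_CMD = [
    "bin/kafka-run-class.sh",
    "org.apache.kafka.clients.tools.ProducerPerformance",
    "test", "5000", "100", "-1",
    "acks=1",
    "bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092",
    "buffer.memory=67108864",
    "batch.size=8196",
]

# Hypothetical host names and install path, for illustration only.
PRODUCER_HOSTS = ["producer-a", "producer-b", "producer-c"]
KAFKA_HOME = "/opt/kafka"


def ssh_command(host, argv, workdir):
    """Wrap argv so that it runs on `host` from `workdir` via ssh."""
    remote = f"cd {shlex.quote(workdir)} && {shlex.join(argv)}"
    return ["ssh", host, remote]


def launch_all(commands):
    """Start every command before waiting on any, then return the exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


commands = [ssh_command(h, PRODUCER_CMD, KAFKA_HOME) for h in PRODUCER_HOSTS]
for cmd in commands:
    print(shlex.join(cmd))  # dry run: only show what would be executed

# To actually start the producers, call: exit_codes = launch_all(commands)
```

Because `launch_all` creates all processes before waiting on the first one, the producers start within a short interval of each other rather than with the delay of switching between consoles.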
Re: How to run the three producers test
Jiefu,

Thank you. The three producers can run at the same time. What I mean is: should they be started at exactly the same time? (I have a console open on each of the three machines and I just start the command manually, one by one.) Or does a small variation in the starting time not matter?

Gwen and Jiefu,

I have started the three producers on three machines. After a while, all of them give a java.net.ConnectException:

[2015-07-14 12:56:46,352] WARN Error in I/O with producer0-link-0/192.168.1.1 (org.apache.kafka.common.network.Selector)
java.net.ConnectException: Connection refused..
[2015-07-14 12:56:48,056] WARN Error in I/O with producer1-link-0/192.168.1.2 (org.apache.kafka.common.network.Selector)
java.net.ConnectException: Connection refused.

What could be the cause? Thank you guys!

On Tue, Jul 14, 2015 at 2:47 PM, JIEFU GONG jg...@berkeley.edu wrote:

Yuheng,

Yes, if you read the blog post it specifies that he's using three separate machines. There's no reason the producers cannot be started at the same time, I believe.

On Tue, Jul 14, 2015 at 11:42 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:

Hi,

I am running the performance test for Kafka: https://gist.github.com/jkreps/c7ddb4041ef62a900e6c

For the Three Producers, 3x async replication scenario, the command is the same as for one producer:

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 5000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196

So how do I run the test for three producers? Do I just run them on three separate servers at the same time? Will this cause errors, since the three producers can't be started at exactly the same time? Thanks. best, Yuheng

--
Jiefu Gong
University of California, Berkeley | Class of 2017
B.A Computer Science | College of Letters and Sciences
jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427
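A "Connection refused" warning generally means that nothing accepted a TCP connection at the address and port the client tried. Before digging into Kafka itself, it can help to confirm from each producer machine that the broker and ZooKeeper ports are reachable. The sketch below is an illustration only: `can_connect` and `check_endpoints` are hypothetical helper names, and the example host names are placeholders. Port 9092 appears in the thread's command; 2181 is the usual ZooKeeper default and may differ in your setup.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each (host, port) pair to whether it accepted a connection."""
    return {(host, port): can_connect(host, port, timeout) for host, port in endpoints}


# Example usage (placeholder host names, replace with your own):
# results = check_endpoints([("broker0.example.com", 9092),
#                            ("zk0.example.com", 2181)])
# for (host, port), ok in results.items():
#     print(f"{host}:{port} {'reachable' if ok else 'NOT reachable'}")
```

If a port is not reachable from the producer machine, the problem is more likely in the network setup, the listening address, or a stopped broker than in the producer command.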