Re: How to run the three producers test

Yuheng Du Tue, 14 Jul 2015 19:25:13 -0700

But is there a way to let kafka override the old data if the disk is
filled? Or is it not necessary to use this figure? Thanks.


On Tue, Jul 14, 2015 at 10:14 PM, Yuheng Du <yuheng.du.h...@gmail.com>
wrote:

> Jiefu,
>
> I agree with you. I checked the hardware specs of my machines, each one of
> them has:
>
> RAM
>
>
>
> 256GB ECC Memory (16x 16 GB DDR4 1600MT/s dual rank RDIMMs
>
> Disk
>
>
>
> Two 1 TB 7.2K RPM 3G SATA HDDs
>
> For the throughput versus stored data test, it uses 5*10^10 messages,
> which has the total volume of 5TB, I made the replication factor to be 3,
> which means the total size including replicas would be 15TB, which
> apparently overwhelmed the two brokers I use.
>
> Thanks.
>
> best,
> Yuheng
>
> On Tue, Jul 14, 2015 at 6:06 PM, JIEFU GONG <jg...@berkeley.edu> wrote:
>
>> Someone may correct me if I am incorrect, but how much disk space do you
>> have on these nodes? Your exception 'No space left on device' from one of
>> your brokers seems to suggest that you're full (after all you're writing
>> 500 million records). If this is the case I believe the expected behavior
>> for Kafka is to reject any more attempts to write data?
>>
>> On Tue, Jul 14, 2015 at 2:27 PM, Yuheng Du <yuheng.du.h...@gmail.com>
>> wrote:
>>
>> > Also, the log in another broker (not the bootstrap) says:
>> >
>> > [2015-07-14 15:18:41,220] FATAL [Replica Manager on Broker 1]: Error
>> > writing to highwatermark file:  (kafka.server.ReplicaManager)
>> >
>> >
>> > [2015-07-14 15:18:40,160] ERROR Closing socket for /130.127.133.47
>> because
>> > of error (kafka.network.Process
>> >
>> > or)
>> >
>> > java.io.IOException: Connection reset by peer
>> >
>> >         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> >
>> >         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>> >
>> >         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>> >
>> >         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>> >
>> >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>> >
>> >         at kafka.utils.Utils$.read(Utils.scala:380)
>> >
>> >         at
>> >
>> >
>> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>> >
>> >         at kafka.network.Processor.read(SocketServer.scala:444)
>> >
>> >         at kafka.network.Processor.run(SocketServer.scala:340)
>> >
>> >         at java.lang.Thread.run(Thread.java:745)
>> >
>> > ........
>> >
>> > java.io.IOException: No space left on device
>> >
>> >         at java.io.FileOutputStream.writeBytes(Native Method)
>> >
>> >         at java.io.FileOutputStream.write(FileOutputStream.java:345)
>> >
>> >         at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
>> >
>> >         at sun.nio.cs.StreamEncoder.implFlushBuffe
>> >
>> > (END)
>> >
>> > On Tue, Jul 14, 2015 at 5:24 PM, Yuheng Du <yuheng.du.h...@gmail.com>
>> > wrote:
>> >
>> > > Hi Jiefu, Gwen,
>> > >
>> > > I am running the Throughput versus stored data test:
>> > > bin/kafka-run-class.sh
>> org.apache.kafka.clients.tools.ProducerPerformance
>> > > test 50000000000 100 -1 acks=1 bootstrap.servers=
>> > > esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864
>> > batch.size=8196
>> > >
>> > > After around 50,000,000 messages were sent, I got a bunch of
>> connection
>> > > refused error as I mentioned before. I checked the logs on the broker
>> and
>> > > here is what I see:
>> > >
>> > > [2015-07-14 15:11:23,578] WARN Partition [test,4] on broker 5: No
>> > > checkpointed highwatermark is found for partition [test,4]
>> > > (kafka.cluster.Partition)
>> > >
>> > > [2015-07-14 15:12:33,298] INFO Rolled new log segment for 'test-4' in
>> 4
>> > > ms. (kafka.log.Log)
>> > >
>> > > [2015-07-14 15:12:33,299] INFO Rolled new log segment for 'test-0' in
>> 1
>> > > ms. (kafka.log.Log)
>> > >
>> > > [2015-07-14 15:13:39,529] INFO Rolled new log segment for 'test-4' in
>> 1
>> > > ms. (kafka.log.Log)
>> > >
>> > > [2015-07-14 15:13:39,531] INFO Rolled new log segment for 'test-0' in
>> 1
>> > > ms. (kafka.log.Log)
>> > >
>> > > [2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-4' in
>> 3
>> > > ms. (kafka.log.Log)
>> > >
>> > > [2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-0' in
>> 1
>> > > ms. (kafka.log.Log)
>> > >
>> > > [2015-07-14 15:15:51,478] INFO Rolled new log segment for 'test-4' in
>> 1
>> > > ms. (kafka.log.Log)
>> > >
>> > > [2015-07-14 15:15:51,479] INFO Rolled new log segment for 'test-0' in
>> 1
>> > > ms. (kafka.log.Log)
>> > >
>> > > [2015-07-14 15:16:52,589] INFO Rolled new log segment for 'test-4' in
>> 1
>> > > ms. (kafka.log.Log)
>> > >
>> > > [2015-07-14 15:16:52,590] INFO Rolled new log segment for 'test-0' in
>> 1
>> > > ms. (kafka.log.Log)
>> > >
>> > > [2015-07-14 15:17:57,406] INFO Rolled new log segment for 'test-4' in
>> 1
>> > > ms. (kafka.log.Log)
>> > >
>> > > [2015-07-14 15:17:57,407] INFO Rolled new log segment for 'test-0' in
>> 0
>> > > ms. (kafka.log.Log)
>> > >
>> > > [2015-07-14 15:18:39,792] FATAL [KafkaApi-5] Halting due to
>> unrecoverable
>> > > I/O error while handling produce request:  (kafka.server.KafkaApis)
>> > >
>> > > kafka.common.KafkaStorageException: I/O exception in append to log
>> > 'test-0'
>> > >
>> > >         at kafka.log.Log.append(Log.scala:266)
>> > >
>> > >         at
>> > >
>> >
>> kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:379)
>> > >
>> > >         at
>> > >
>> >
>> kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:365)
>> > >
>> > >         at kafka.utils.Utils$.inLock(Utils.scala:535)
>> > >
>> > >         at kafka.utils.Utils$.inReadLock(Utils.scala:541)
>> > >
>> > >         at
>> > > kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:365)
>> > >
>> > >         at
>> > >
>> >
>> kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:291)
>> > >
>> > >         at
>> > >
>> >
>> kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:282)
>> > >
>> > >         at
>> > >
>> >
>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>> > >
>> > >         at
>> > >
>> >
>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>> > >
>> > >         at
>> > >
>> >
>> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>> > >
>> > >         at
>> > >
>> >
>> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>> > >
>> > >         at scala.coll
>> > >
>> > >
>> > >
>> > > Can you help me with this problem? Thanks.
>> > >
>> > > On Tue, Jul 14, 2015 at 5:12 PM, Yuheng Du <yuheng.du.h...@gmail.com>
>> > > wrote:
>> > >
>> > >> I checked the logs on the brokers, it seems that the zookeeper or the
>> > >> kafka server process is not running on this broker...Thank you guys.
>> I
>> > will
>> > >> see if it happens again.
>> > >>
>> > >> On Tue, Jul 14, 2015 at 4:53 PM, JIEFU GONG <jg...@berkeley.edu>
>> wrote:
>> > >>
>> > >>> Hmm..yeah some error logs would be nice like Gwen pointed out. Do
>> any
>> > of
>> > >>> your brokers fall out of the ISR when sending messages? It seems
>> like
>> > >>> your
>> > >>> setup should be fine, so I'm not entirely sure.
>> > >>>
>> > >>> On Tue, Jul 14, 2015 at 1:31 PM, Yuheng Du <
>> yuheng.du.h...@gmail.com>
>> > >>> wrote:
>> > >>>
>> > >>> > Jiefu,
>> > >>> >
>> > >>> > I am performing these tests on a 6 nodes cluster in cloudlab (a
>> > >>> > infrastructure built for scientific research). I use 2 nodes as
>> > >>> producers,
>> > >>> > 2 as brokers only, and 2 as consumers. I have tested for each
>> > >>> individual
>> > >>> > machines and they work well. I did not use AWS. Thank you!
>> > >>> >
>> > >>> > On Tue, Jul 14, 2015 at 4:20 PM, JIEFU GONG <jg...@berkeley.edu>
>> > >>> wrote:
>> > >>> >
>> > >>> > > Yuheng, are you performing these tests locally or using a
>> service
>> > >>> such as
>> > >>> > > AWS? I'd try using each separate machine individually first,
>> > >>> connecting
>> > >>> > to
>> > >>> > > the ZK/Kafka servers and ensuring that each is able to first log
>> > and
>> > >>> > > consume messages independently.
>> > >>> > >
>> > >>> > > On Tue, Jul 14, 2015 at 1:17 PM, Gwen Shapira <
>> > gshap...@cloudera.com
>> > >>> >
>> > >>> > > wrote:
>> > >>> > >
>> > >>> > > > Are there any errors on the broker logs?
>> > >>> > > >
>> > >>> > > > On Tue, Jul 14, 2015 at 11:57 AM, Yuheng Du <
>> > >>> yuheng.du.h...@gmail.com>
>> > >>> > > > wrote:
>> > >>> > > > > Jiefu,
>> > >>> > > > >
>> > >>> > > > > Thank you. The three producers can run at the same time. I
>> mean
>> > >>> > should
>> > >>> > > > they
>> > >>> > > > > be started at exactly the same time? (I have three consoles
>> > from
>> > >>> each
>> > >>> > > of
>> > >>> > > > > the three machines and I just start the console command
>> > manually
>> > >>> one
>> > >>> > by
>> > >>> > > > > one) Or a small variation of the starting time won't matter?
>> > >>> > > > >
>> > >>> > > > > Gwen and Jiefu,
>> > >>> > > > >
>> > >>> > > > > I have started the three producers at three machines, after
>> a
>> > >>> while,
>> > >>> > > all
>> > >>> > > > of
>> > >>> > > > > them gives a java.net.ConnectException:
>> > >>> > > > >
>> > >>> > > > > [2015-07-14 12:56:46,352] WARN Error in I/O with
>> > >>> producer0-link-0/
>> > >>> > > > > 192.168.1.1 (org.apache.kafka.common.network.Selector)
>> > >>> > > > >
>> > >>> > > > > java.net.ConnectException: Connection refused......
>> > >>> > > > >
>> > >>> > > > > [2015-07-14 12:56:48,056] WARN Error in I/O with
>> > >>> producer1-link-0/
>> > >>> > > > > 192.168.1.2 (org.apache.kafka.common.network.Selector)
>> > >>> > > > >
>> > >>> > > > > java.net.ConnectException: Connection refused.....
>> > >>> > > > >
>> > >>> > > > > What could be the cause?
>> > >>> > > > >
>> > >>> > > > > Thank you guys!
>> > >>> > > > >
>> > >>> > > > >
>> > >>> > > > >
>> > >>> > > > >
>> > >>> > > > > On Tue, Jul 14, 2015 at 2:47 PM, JIEFU GONG <
>> > jg...@berkeley.edu>
>> > >>> > > wrote:
>> > >>> > > > >
>> > >>> > > > >> Yuheng,
>> > >>> > > > >>
>> > >>> > > > >> Yes, if you read the blog post it specifies that he's using
>> > >>> three
>> > >>> > > > separate
>> > >>> > > > >> machines. There's no reason the producers cannot be
>> started at
>> > >>> the
>> > >>> > > same
>> > >>> > > > >> time, I believe.
>> > >>> > > > >>
>> > >>> > > > >> On Tue, Jul 14, 2015 at 11:42 AM, Yuheng Du <
>> > >>> > yuheng.du.h...@gmail.com
>> > >>> > > >
>> > >>> > > > >> wrote:
>> > >>> > > > >>
>> > >>> > > > >> > Hi,
>> > >>> > > > >> >
>> > >>> > > > >> > I am running the performance test for kafka.
>> > >>> > > > >> > https://gist.github.com/jkreps
>> > >>> > > > >> > /c7ddb4041ef62a900e6c
>> > >>> > > > >> >
>> > >>> > > > >> > For the "Three Producers, 3x async replication" scenario,
>> > the
>> > >>> > > command
>> > >>> > > > is
>> > >>> > > > >> > the same as one producer:
>> > >>> > > > >> >
>> > >>> > > > >> > bin/kafka-run-class.sh
>> > >>> > > > org.apache.kafka.clients.tools.ProducerPerformance
>> > >>> > > > >> > test 50000000 100 -1 acks=1
>> > >>> > > > >> > bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
>> > >>> > > > >> > buffer.memory=67108864 batch.size=8196
>> > >>> > > > >> >
>> > >>> > > > >> > So How to I run the test for three producers? Do I just
>> run
>> > >>> them
>> > >>> > on
>> > >>> > > > three
>> > >>> > > > >> > separate servers at the same time? Will there be some
>> error
>> > in
>> > >>> > this
>> > >>> > > > way
>> > >>> > > > >> > since the three producers can't be started at the same
>> time?
>> > >>> > > > >> >
>> > >>> > > > >> > Thanks.
>> > >>> > > > >> >
>> > >>> > > > >> > best,
>> > >>> > > > >> > Yuheng
>> > >>> > > > >> >
>> > >>> > > > >>
>> > >>> > > > >>
>> > >>> > > > >>
>> > >>> > > > >> --
>> > >>> > > > >>
>> > >>> > > > >> Jiefu Gong
>> > >>> > > > >> University of California, Berkeley | Class of 2017
>> > >>> > > > >> B.A Computer Science | College of Letters and Sciences
>> > >>> > > > >>
>> > >>> > > > >> jg...@berkeley.edu <elise...@berkeley.edu> | (925)
>> 400-3427
>> > >>> > > > >>
>> > >>> > > >
>> > >>> > >
>> > >>> > >
>> > >>> > >
>> > >>> > > --
>> > >>> > >
>> > >>> > > Jiefu Gong
>> > >>> > > University of California, Berkeley | Class of 2017
>> > >>> > > B.A Computer Science | College of Letters and Sciences
>> > >>> > >
>> > >>> > > jg...@berkeley.edu <elise...@berkeley.edu> | (925) 400-3427
>> > >>> > >
>> > >>> >
>> > >>>
>> > >>>
>> > >>>
>> > >>> --
>> > >>>
>> > >>> Jiefu Gong
>> > >>> University of California, Berkeley | Class of 2017
>> > >>> B.A Computer Science | College of Letters and Sciences
>> > >>>
>> > >>> jg...@berkeley.edu <elise...@berkeley.edu> | (925) 400-3427
>> > >>>
>> > >>
>> > >>
>> > >
>> >
>>
>>
>>
>> --
>>
>> Jiefu Gong
>> University of California, Berkeley | Class of 2017
>> B.A Computer Science | College of Letters and Sciences
>>
>> jg...@berkeley.edu <elise...@berkeley.edu> | (925) 400-3427
>>
>
>

Re: How to run the three producers test

Reply via email to