Kafka Streams or KSQL design question
Hi folks,

As far as I know, Kafka Streams runs as a separate process that reads data from a topic, transforms it, and writes to another topic if needed. In that case, how does this process support high-throughput streams, and how does it balance load in terms of message traffic and computing resources for stream processing? Regarding KSQL, is there any query optimization in place or on the roadmap?

Thanks,
Will
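A note on the scaling model, since it answers the throughput/load-balance part: Kafka Streams is a library, not a cluster service. You run as many instances of your application as you like, and the partitions of the input topics are divided among the instances as stream tasks, so throughput scales with partition count and instance count. A toy sketch of that partition-to-instance spreading (illustrative only — not the real StreamsPartitionAssignor):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TaskAssignmentSketch {
    // Spread partitions (stream tasks) round-robin across the application
    // instances that share an application.id -- roughly how work is balanced.
    static Map<String, List<Integer>> assign(List<String> instances, int partitions) {
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        for (String inst : instances) out.put(inst, new ArrayList<>());
        for (int p = 0; p < partitions; p++) {
            out.get(instances.get(p % instances.size())).add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        // 6 input partitions across 2 instances -> 3 stream tasks each
        System.out.println(assign(Arrays.asList("app-1", "app-2"), 6));
        // prints {app-1=[0, 2, 4], app-2=[1, 3, 5]}
    }
}
```

Rebalancing works the same way in reverse: when an instance dies, its partitions are redistributed to the surviving instances.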
Re: Kafka as a data ingest
In terms of big files, which are quite common in HDFS, does a Connect task process the same file in parallel, the way MapReduce deals with file splits? I do not think so. In that case, a Kafka Connect implementation has no advantage for reading a single big file unless you also use MapReduce.

Sent from my iPhone

On Jan 10, 2017, at 02:41, Ewen Cheslack-Postava wrote:
>> However, I'm trying to figure out if I can use Kafka to read Hadoop file.
>
> The question is a bit unclear as to whether you mean "use Kafka to send
> data to a Hadoop file" or "use Kafka to read a Hadoop file into a Kafka
> topic". But in both cases, Kafka Connect provides a good option.
>
> The more common use case is sending data that you have in Kafka into HDFS.
> In that case,
> http://docs.confluent.io/3.1.1/connect/connect-hdfs/docs/hdfs_connector.html
> is a good option.
>
> If you want the less common case of sending data from HDFS files into a
> stream of Kafka records, I'm not aware of a connector for doing that yet
> but it is definitely possible. Kafka Connect takes care of a lot of the
> details for you, so all you have to do is read the file and emit Connect's
> SourceRecords containing the data from the file. Most other details are
> handled for you.
>
> -Ewen
>
>> On Mon, Jan 9, 2017 at 9:18 PM, Sharninder wrote:
>>
>> If you want to know if "kafka" can read hadoop files, then no. But you can
>> write your own producer that reads from hdfs any which way and pushes to
>> kafka. We use kafka as the ingestion pipeline's main queue. Read from
>> various sources and push everything to kafka.
>>
>> On Tue, Jan 10, 2017 at 6:26 AM, Cas Apanowicz <cas.apanow...@it-horizon.com> wrote:
>>
>>> Hi,
>>>
>>> I have a general understanding of the main Kafka functionality as a streaming
>>> tool. However, I'm trying to figure out if I can use Kafka to read a Hadoop file.
>>> Can you please advise?
>>> Thanks
>>>
>>> Cas
>>
>> --
>> Sharninder
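For comparison, the reason MapReduce can parallelize one big file is that it computes byte-range input splits up front; a hypothetical Connect source wanting the same parallelism would have to do that split bookkeeping itself before assigning ranges to tasks. A self-contained sketch of the split arithmetic (the 128 MB split size is just an example value):

```java
import java.util.ArrayList;
import java.util.List;

public class FileSplitSketch {
    // Compute (offset, length) byte ranges the way MapReduce splits a large
    // file, so independent tasks could each read one range in parallel.
    static List<long[]> splits(long fileSize, long splitSize) {
        List<long[]> out = new ArrayList<>();
        for (long off = 0; off < fileSize; off += splitSize) {
            out.add(new long[]{off, Math.min(splitSize, fileSize - off)});
        }
        return out;
    }

    public static void main(String[] args) {
        // A 300 MB file with 128 MB splits -> ranges of 128 MB, 128 MB, 44 MB
        for (long[] s : splits(300L << 20, 128L << 20)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```

Note that for text or Avro data the real splitter also has to align ranges to record boundaries, which is exactly the work MapReduce input formats already do.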
Kafka Connect distributed start failed
Hi folks, I tried to start Kafka Connect in distributed mode as follows, but it fails with the error below. Standalone mode is fine. This happens on Confluent Kafka versions 3.0.1 and 3.1. Does anyone know the cause of this error? Thanks, Will

security.protocol = PLAINTEXT
internal.key.converter = class org.apache.kafka.connect.json.JsonConverter
access.control.allow.methods =
ssl.keymanager.algorithm = SunX509
metrics.sample.window.ms = 3
(org.apache.kafka.connect.runtime.distributed.DistributedConfig:178)
[2016-12-05 21:24:14,457] INFO Logging initialized @991ms (org.eclipse.jetty.util.log:186)
Exception in thread "main" org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
    at org.apache.kafka.connect.runtime.distributed.WorkerGroupMember.<init>(WorkerGroupMember.java:125)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder.<init>(DistributedHerder.java:148)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder.<init>(DistributedHerder.java:130)
    at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:84)
Caused by: java.lang.NoSuchMethodError: org.apache.kafka.clients.consumer.internals.AbstractCoordinator.<init>(Lorg/apache/kafka/clients/consumer/internals/ConsumerNetworkClient;Ljava/lang/String;IIILorg/apache/kafka/common/metrics/Metrics;Ljava/lang/String;Lorg/apache/kafka/common/utils/Time;J)V
    at org.apache.kafka.connect.runtime.distributed.WorkerCoordinator.<init>(WorkerCoordinator.java:77)
    at org.apache.kafka.connect.runtime.distributed.WorkerGroupMember.<init>(WorkerGroupMember.java:105)
    ... 3 more
How to collect Connect metrics
Hi folks, How can I collect Kafka Connect metrics from Confluent? Are there any APIs to use? In addition, if one file is very big, can multiple tasks work on the same file simultaneously? Thanks, Will
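On the metrics question: like the brokers and clients, a Connect worker JVM exposes its metrics as JMX MBeans, so one option is to enable JMX on the worker (e.g. via the JMX_PORT environment variable) and poll the beans. A minimal stdlib sketch of querying an MBeanServer; it runs against the local platform server here, but for a worker you would attach remotely with JMXConnectorFactory, and the kafka.* domain pattern mentioned in the comment is an assumption to adjust for your version:

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class JmxMetricsSketch {
    // Query an MBeanServer for beans matching a pattern. Against a Connect
    // worker you would obtain the server via JMXConnectorFactory from the
    // worker's JMX port instead of using the local platform server.
    static Set<ObjectName> find(MBeanServer server, String pattern) {
        try {
            return server.queryNames(new ObjectName(pattern), null);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(pattern, e);
        }
    }

    public static void main(String[] args) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // "java.lang:*" always exists; on a worker JVM try e.g. "kafka.*:*"
        // (exact metric domains vary by version -- an assumption to verify).
        for (ObjectName name : find(server, "java.lang:type=Memory")) {
            System.out.println(name);
        }
    }
}
```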
Re: Flink read avro from Kafka Connect Issue
By using the kafka-avro-console-consumer I am able to get the full messages produced by Kafka Connect with AvroConverter, but from Flink I get no output except the schema. By using the producer with DefaultEncoder, the kafka-avro-console-consumer throws exceptions as shown, but the Flink consumer works. My target, though, is to have Flink consume the Avro data produced by Kafka Connect.

> On Nov 2, 2016, at 7:31 PM, Will Du <will...@gmail.com> wrote:
>
> Hi folks,
> I am trying to consume Avro data from Kafka in Flink. The data is produced by Kafka Connect using AvroConverter. I have created an AvroDeserializationSchema.java <https://gist.github.com/datafibers/ae9d624b6db44865ae14defe8a838123> used by the Flink consumer. Then I use the following code to read it.
>
> public static void main(String[] args) throws Exception {
>     StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
>     Properties properties = new Properties();
>     properties.setProperty("bootstrap.servers", "localhost:9092");
>     properties.setProperty("zookeeper.connect", "localhost:2181");
>     Schema schema = new Parser().parse("{"
>         + "\"name\": \"test\", "
>         + "\"type\": \"record\", "
>         + "\"fields\": ["
>         + "  { \"name\": \"name\", \"type\": \"string\" },"
>         + "  { \"name\": \"symbol\", \"type\": \"string\" },"
>         + "  { \"name\": \"exchange\", \"type\": \"string\" }"
>         + "]}");
>
>     AvroDeserializationSchema avroSchema = new AvroDeserializationSchema<>(schema);
>     FlinkKafkaConsumer09 kafkaConsumer = new FlinkKafkaConsumer09<>("myavrotopic", avroSchema, properties);
>     DataStream messageStream = env.addSource(kafkaConsumer);
>     messageStream.rebalance().print();
>     env.execute("Flink AVRO KAFKA Test");
> }
>
> Once I run the code, I get only the schema information, as follows.
>
> {"name":"", "symbol":"", "exchange":""}
> {"name":"", "symbol":"", "exchange":""}
> {"name":"", "symbol":"", "exchange":""}
> {"name":"", "symbol":"", "exchange":""}
>
> Could anyone help to find out why I cannot decode it?
>
> Further troubleshooting: I found that if I use a kafka producer here <https://gist.github.com/datafibers/d063b255b50fa34515c0ac9e24d4485c> to send the Avro data, especially using kafka.serializer.DefaultEncoder, the above code gets the correct result. Does anybody know how to either set DefaultEncoder in Kafka Connect or set it when writing a customized Kafka Connect source? Or, the other way around, how should I modify AvroDeserializationSchema.java instead?
>
> Thanks, I'll post this to the Flink user group as well.
> Will
Flink read avro from Kafka Connect Issue
On Nov 2, 2016, at 7:31 PM, Will Du <will...@gmail.com> wrote:

Hi folks,
I am trying to consume Avro data from Kafka in Flink. The data is produced by Kafka Connect using AvroConverter. I have created an AvroDeserializationSchema.java <https://gist.github.com/datafibers/ae9d624b6db44865ae14defe8a838123> used by the Flink consumer. Then I use the following code to read it.

public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    Properties properties = new Properties();
    properties.setProperty("bootstrap.servers", "localhost:9092");
    properties.setProperty("zookeeper.connect", "localhost:2181");
    Schema schema = new Parser().parse("{"
        + "\"name\": \"test\", "
        + "\"type\": \"record\", "
        + "\"fields\": ["
        + "  { \"name\": \"name\", \"type\": \"string\" },"
        + "  { \"name\": \"symbol\", \"type\": \"string\" },"
        + "  { \"name\": \"exchange\", \"type\": \"string\" }"
        + "]}");

    AvroDeserializationSchema avroSchema = new AvroDeserializationSchema<>(schema);
    FlinkKafkaConsumer09 kafkaConsumer = new FlinkKafkaConsumer09<>("myavrotopic", avroSchema, properties);
    DataStream messageStream = env.addSource(kafkaConsumer);
    messageStream.rebalance().print();
    env.execute("Flink AVRO KAFKA Test");
}

Once I run the code, I get only the schema information, as follows.

{"name":"", "symbol":"", "exchange":""}
{"name":"", "symbol":"", "exchange":""}
{"name":"", "symbol":"", "exchange":""}
{"name":"", "symbol":"", "exchange":""}

Could anyone help to find out why I cannot decode it?

Further troubleshooting: I found that if I use a kafka producer here <https://gist.github.com/datafibers/d063b255b50fa34515c0ac9e24d4485c> to send the Avro data, especially using kafka.serializer.DefaultEncoder, the above code gets the correct result. Does anybody know how to either set DefaultEncoder in Kafka Connect or set it when writing a customized Kafka Connect source? Or, the other way around, how should I modify AvroDeserializationSchema.java instead?

Thanks, I'll post this to the Flink user group as well.
Will
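A likely culprit worth checking here: Connect's AvroConverter writes the Confluent Schema Registry wire format, which prepends a magic byte 0x00 and a 4-byte schema id before the Avro-encoded body. A plain Avro decoder that starts at byte 0 will then misparse the record, which matches the empty-field symptom above (and also why DefaultEncoder, which writes raw bytes, works). A sketch of locating the real payload before handing the bytes to a DatumReader (assumes the Confluent wire format):

```java
import java.nio.ByteBuffer;

public class WireFormatSketch {
    // Confluent wire format: [0x00 magic][4-byte schema id][Avro payload].
    // A plain Avro decoder must start at the returned offset, not at 0.
    static int avroPayloadOffset(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        if (buf.get() != 0x00) {
            throw new IllegalArgumentException("not Confluent wire format");
        }
        int schemaId = buf.getInt(); // id registered in Schema Registry
        System.out.println("schema id = " + schemaId);
        return 5; // Avro-encoded body starts here
    }

    public static void main(String[] args) {
        byte[] record = {0x00, 0, 0, 0, 42, /* avro bytes would follow */ 0x06};
        System.out.println("payload offset = " + avroPayloadOffset(record));
        // prints schema id = 42, then payload offset = 5
    }
}
```

So one fix is to skip those 5 bytes inside AvroDeserializationSchema (or use Confluent's KafkaAvroDecoder, which understands the header) rather than switching Connect to DefaultEncoder.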
connection time out
Hi guys,

I was running a single-node broker in a cluster. When I run the producer from another cluster, I get a connection timeout error. I can reach port 9092 and other ports on the broker machine from the producer; I just can't publish any messages. The command I used to run the producer is:

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance speedx2 50 100 -1 acks=1 bootstrap.servers=130.127.xxx.xxx:9092 buffer.memory=67184 batch.size=8196

Can anyone suggest what the problem might be? Thank you!

best,
Yuheng
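One classic cause of exactly this symptom (port reachable, topic gets created, but publishing times out) is the broker advertising a hostname or internal IP that the remote producer cannot reach: the client's first metadata request succeeds, but it then reconnects to whatever address the metadata advertises. If that is the case here, setting the advertised address in server.properties may help (0.8.x-era property names shown; the addresses are placeholders):

```properties
# server.properties on the broker (addresses are placeholders)
advertised.host.name=130.127.xxx.xxx
advertised.port=9092

# On 0.9+ brokers the equivalent setting is:
# advertised.listeners=PLAINTEXT://130.127.xxx.xxx:9092
```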
Re: connection time out
Also, I can see the topic "speedx2" being created on the broker, but no message data is coming through.

On Sun, Nov 29, 2015 at 7:00 PM, Yuheng Du <yuheng.du.h...@gmail.com> wrote:
> Hi guys,
>
> I was running a single node broker in a cluster. And when I run the
> producer in another cluster, I got connection time out error.
>
> I can ping into port 9092 and other ports on the broker machine from the
> producer. I just can't publish any messages. The command I used to run the
> producer is:
>
> bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
> speedx2 50 100 -1 acks=1 bootstrap.servers=130.127.xxx.xxx:9092
> buffer.memory=67184 batch.size=8196
>
> Can anyone suggest what the problem might be?
>
> Thank you!
>
> best,
> Yuheng
Re: producer api
Thank you, Erik. But in my setup, only one node in my broker cluster has a public IP, so I can only use one bootstrap broker for now.

On Mon, Sep 14, 2015 at 3:50 PM, Helleren, Erik <erik.helle...@cmegroup.com> wrote:
> You only need one of the brokers to connect for publishing. Kafka will
> tell the client about all the other brokers. But best practices state
> that including all of them is best.
> -Erik
>
> On 9/14/15, 2:46 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
> > I am writing a kafka producer application in java. I want the producer to
> > publish data to a cluster of 6 brokers. Is there a way to specify only the
> > load balancing node but not all the brokers list?
> >
> > For example, like in the benchmarking kafka commands:
> >
> > bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
> > test 5000 100 -1 acks=-1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
> > buffer.memory=67108864 batch.size=64000
> >
> > it only specifies the bootstrap server node but not all the broker list like:
> >
> > Properties props = new Properties();
> > props.put("metadata.broker.list", "broker1:9092,broker2:9092");
> >
> > Thanks for replying.
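To make Erik's point concrete: bootstrap.servers is only used for the first metadata request; the client then learns every broker's address from the metadata response, so a single reachable entry point is functionally enough, and extra entries only add redundancy for the case where that one node is down. A minimal config sketch (the hostname is a placeholder):

```java
import java.util.Properties;

public class ProducerConfigSketch {
    // Build the minimal producer config: one reachable bootstrap broker is
    // enough, because the client discovers the rest of the cluster from the
    // metadata response.
    static Properties minimalConfig(String bootstrapBroker) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapBroker);
        props.setProperty("acks", "1");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder host: use your one publicly reachable broker here.
        Properties props = minimalConfig("public-broker.example.com:9092");
        // new KafkaProducer<>(props) would go here (needs kafka-clients on
        // the classpath).
        System.out.println(props.getProperty("bootstrap.servers"));
    }
}
```

Note that the remaining brokers must still be reachable from the producer for actual sends; bootstrapping through one public node does not proxy traffic to the others.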
producer api
I am writing a Kafka producer application in Java. I want the producer to publish data to a cluster of 6 brokers. Is there a way to specify only the load-balancing node rather than the full broker list?

For example, in the benchmarking kafka commands:

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 5000 100 -1 acks=-1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=64000

it only specifies the bootstrap server node, not the full broker list like:

Properties props = new Properties();
props.put("metadata.broker.list", "broker1:9092,broker2:9092");

Thanks for replying.
Re: latency test
Thank you Erik. In my test I am using fixed 200-byte messages, and I run 500k messages per producer on 92 physically isolated producers. Each test run takes about 20 minutes. As the broker cluster is migrated onto a new physical cluster, I will perform my test and get the latency results in the next couple of weeks. I will keep you posted. Thanks.

On Wed, Sep 9, 2015 at 4:58 PM, Helleren, Erik <erik.helle...@cmegroup.com> wrote:
> Yes, and that can really hurt average performance. All the partitions
> were nearly identical up to the 99%’ile, and had very good performance at
> that level, hovering around a few millis. But when looking beyond the
> 99%’ile, there was that clear fork in the distribution where a set of 3
> partitions surged upwards. This could be for a dozen different reasons:
> network blips, noisy networks, location in the network, resource
> contention on that broker, etc. But it affected that one broker more than
> others. And the reasons for my cluster displaying this behavior could be
> very different than the reasons for any other cluster.
>
> It's worth noting that this was mostly a latency test rather than a stress
> test. There was a single kafka producer object, very small message sizes
> (100 bytes), and it was only pushing through around 5MB/s worth of data.
> And the client was configured to minimize the amount of data that would be
> on the internal queue/buffer waiting to be sent. The messages that were
> being sent were composed of 10-byte ascii ‘words’ selected randomly
> from a dictionary of 1000 words, which benefits compression while still
> resulting in likely unique messages. And the test I ran was only for 6
> min, and I did not do the work required to see if there was a burst of
> slower messages which caused this behavior, or if it was a consistent
> issue with that node.
> -Erik
>
> On 9/9/15, 2:24 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
> > So are you suggesting that the long delays beyond the 99th percentile
> > happen in the slower partitions that are further away? Thanks.
> >
> > On Wed, Sep 9, 2015 at 3:15 PM, Helleren, Erik <erik.helle...@cmegroup.com> wrote:
> >> So, I did my own latency test on a cluster of 3 nodes, and there is a
> >> significant difference around the 99%’ile and higher for partitions when
> >> measuring the ack time when configured for a single ack. The graph
> >> that I wish I could attach or post clearly shows that around 1/3 of the
> >> partitions significantly diverge from the other two. So, at least in my
> >> case, one of my brokers is further than the others.
> >> -Erik
> >>
> >> On 9/4/15, 1:06 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
> >> > No problem. Thanks for your advice. I think it would be fun to explore.
> >> > I only know how to program in java though. Hope it will work.
> >> >
> >> > On Fri, Sep 4, 2015 at 2:03 PM, Helleren, Erik <erik.helle...@cmegroup.com> wrote:
> >> >> I think the suggestion is to have partitions/brokers >= 1, so 32 should
> >> >> be enough.
> >> >>
> >> >> As for latency tests, there isn’t a lot of code to do a latency test.
> >> >> If you just want to measure ack time it's around 100 lines. I will try
> >> >> to push out some good latency testing code to github, but my company is
> >> >> scared of open sourcing code… so it might be a while…
> >> >> -Erik
> >> >>
> >> >> On 9/4/15, 12:55 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
> >> >> > Thanks for your reply Erik. I am running some more tests according to
> >> >> > your suggestions now and I will share my results here. Is it
> >> >> > necessary to use a fixed number of partitions (32 partitions maybe)
> >> >> > for my test?
> >> >> >
> >> >> > I am testing 2, 4, 8, 16 and 32 broker scenarios, all of them running
> >> >> > on individual physical nodes. So I think using at least 32 partitions
> >> >> > will make more sense? I have seen latencies increase as the number of
> >> >> > partitions goes up in my experiments.
Re: latency test
So are you suggesting that the long delays beyond the 99th percentile happen in the slower partitions that are further away? Thanks.

On Wed, Sep 9, 2015 at 3:15 PM, Helleren, Erik <erik.helle...@cmegroup.com> wrote:
> So, I did my own latency test on a cluster of 3 nodes, and there is a
> significant difference around the 99%’ile and higher for partitions when
> measuring the ack time when configured for a single ack. The graph
> that I wish I could attach or post clearly shows that around 1/3 of the
> partitions significantly diverge from the other two. So, at least in my
> case, one of my brokers is further than the others.
> -Erik
>
> On 9/4/15, 1:06 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
> > No problem. Thanks for your advice. I think it would be fun to explore. I
> > only know how to program in java though. Hope it will work.
> >
> > On Fri, Sep 4, 2015 at 2:03 PM, Helleren, Erik <erik.helle...@cmegroup.com> wrote:
> >> I think the suggestion is to have partitions/brokers >= 1, so 32 should
> >> be enough.
> >>
> >> As for latency tests, there isn’t a lot of code to do a latency test. If
> >> you just want to measure ack time it's around 100 lines. I will try to
> >> push out some good latency testing code to github, but my company is
> >> scared of open sourcing code… so it might be a while…
> >> -Erik
> >>
> >> On 9/4/15, 12:55 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
> >> > Thanks for your reply Erik. I am running some more tests according to
> >> > your suggestions now and I will share my results here. Is it necessary
> >> > to use a fixed number of partitions (32 partitions maybe) for my test?
> >> >
> >> > I am testing 2, 4, 8, 16 and 32 broker scenarios, all of them running
> >> > on individual physical nodes. So I think using at least 32 partitions
> >> > will make more sense? I have seen latencies increase as the number of
> >> > partitions goes up in my experiments.
> >> >
> >> > To get the latency of each event recorded, are you suggesting that I
> >> > rewrite my own test program (in Java perhaps), or can I just modify the
> >> > standard test program provided by kafka
> >> > (https://gist.github.com/jkreps/c7ddb4041ef62a900e6c)? I guess I need
> >> > to rebuild the source if I modify the standard java test program
> >> > ProducerPerformance provided in kafka, right? Now this standard program
> >> > only has average latencies and percentile latencies but no per-event
> >> > latencies.
> >> >
> >> > Thanks.
> >> >
> >> > On Fri, Sep 4, 2015 at 1:42 PM, Helleren, Erik <erik.helle...@cmegroup.com> wrote:
> >> >> That is an excellent question! There are a bunch of ways to monitor
> >> >> jitter and see when that is happening. Here are a few:
> >> >>
> >> >> - You could slice the histogram every few seconds, save it out with a
> >> >> timestamp, and then look at how they compare. This would be mostly
> >> >> manual, or you can graph line charts of the percentiles over time in
> >> >> excel where each percentile would be a series. If you are using HDR
> >> >> Histogram, you should look at how to use the Recorder class to do this
> >> >> coupled with a ScheduledExecutorService.
> >> >>
> >> >> - You can just save the starting timestamp of the event and the latency
> >> >> of each event. If you put it into a CSV, you can just load it up into
> >> >> excel and graph as an XY chart. That way you can see every point during
> >> >> the running of your program and you can see trends. You want to be
> >> >> careful about this one, especially of writing to a file in the callback
> >> >> that kafka provides.
> >> >>
> >> >> Also, I have noticed that most of the very slow observations are at
> >> >> startup. But don’t trust me, trust the data and share your findings.
> >> >> Also, having a 99.9 percentile provides a pretty good standard for
> >> >> typical poor case performance.
When a message is exposed to the consumer
According to section 3.1 of the paper "Kafka: a Distributed Messaging System for Log Processing", "a message is only exposed to the consumers after it is flushed". Is this still true in current kafka, i.e. is a message only available to consumers after it is flushed to disk? Thanks.
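For what it's worth, in current Kafka consumer visibility is tied to replication rather than fsync: a message becomes readable once it is committed, i.e. replicated to the in-sync replicas up to the partition's high watermark; the paper predates this design. Flushing to disk is a separate, configurable concern, and the defaults effectively leave it to replication plus the OS page cache (values shown are approximate defaults):

```properties
# server.properties -- flush is decoupled from consumer visibility
# log.flush.interval.messages=9223372036854775807  # force flush only after this many messages
# log.flush.interval.ms=                           # unset: rely on replication + OS flushing
```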
Re: latency test
When I use 32 partitions, the 4-broker latency becomes larger than the 8-broker latency. So is it always true that using more brokers gives lower latency when the number of partitions is at least the number of brokers? Thanks.

On Thu, Sep 3, 2015 at 10:45 PM, Yuheng Du <yuheng.du.h...@gmail.com> wrote:
> I am running a producer latency test. When using 92 producers on 92
> physical nodes publishing to 4 brokers, the latency is slightly lower than
> using 8 brokers; I am using 8 partitions for the topic.
>
> I have rerun the test and it gives me the same result: the 4-broker
> scenario still has lower latency than the 8-broker scenario.
>
> It is weird because I tested 1 broker, 2 brokers, 4 brokers, 8 brokers, 16
> brokers and 32 brokers. For the rest of the cases the latency decreases as
> the number of brokers increases.
>
> 4 brokers/8 brokers is the only pair that doesn't satisfy this rule. What
> could be the cause?
>
> I am using a 200-byte message, and the test has each producer publish 500k
> messages to a given topic. Every test run where I change the number of
> brokers, I use a new topic.
>
> Thanks for any advice.
Re: latency test
Thank you Erik! That is helpful!

But I also see jitter in the maximum latencies when running the experiment. The average end-to-acknowledgement latency from producer to broker is around 5ms when using 92 producers and 4 brokers, and the 99.9 percentile latency is 58ms, but the maximum latency goes up to 1359 ms. How can I locate the source of this jitter? Thanks.

On Fri, Sep 4, 2015 at 10:54 AM, Helleren, Erik <erik.helle...@cmegroup.com> wrote:
> Well… not to be contrarian, but latency depends much more on the latency
> between the producer and the broker that is the leader for the partition
> you are publishing to. At least when your brokers are not saturated with
> messages, and acks are set to 1. If acks are set to ALL, latency on a
> non-saturated kafka cluster will be: round trip latency from producer to
> the leader for the partition + max(slowest round trip latency to a replica
> of that partition). If a cluster is saturated with messages, we have to
> assume that all partitions receive an equal distribution of messages to
> avoid linear algebra and queueing theory models. I don't like linear
> algebra :P
>
> Since you are probably putting all your latencies into a single histogram
> per producer, or worse, just an average, this pattern would have been
> obscured. Obligatory lecture about measuring latency by Gil Tene
> (https://www.youtube.com/watch?v=9MKY4KypBzg). To verify this hypothesis,
> you should re-write the benchmark to plot the latencies for each write to
> a partition for each producer into a histogram. (HDR Histogram is pretty
> good for that.) This would give you producers*partitions histograms,
> which might be unwieldy for that many producers. But wait, there is hope!
>
> To verify that this hypothesis holds, you just have to see that there is a
> significant difference between different partitions on a SINGLE producing
> client. So, pick one producing client at random and use the data from
> that. The easy way to do that is just plot all the partition latency
> histograms on top of each other in the same plot, so that you have a
> pretty plot to show people. If you don't want to set up plotting, you can
> just compare the medians (50th percentile) of the partitions' histograms.
> If there is a lot of variance, your latency anomaly is explained by
> brokers 4-7 being slower than nodes 0-3! If there isn't a lot of variance
> at 50%, look at higher percentiles. And if higher percentiles for all the
> partitions look the same, this hypothesis is disproved.
>
> If you want to make a general statement about latency of writing to kafka,
> you can merge all the histograms into a single histogram and plot that.
>
> To Yuheng's credit, more brokers always results in more throughput. But
> throughput and latency are two different creatures. It's worth noting that
> kafka is designed to be high throughput first and low latency second. And
> it does a really good job at both.
>
> Disclaimer: I might not like linear algebra, but I do like statistics.
> Let me know if there are topics that need more explanation above that
> aren't covered by Gil's lecture.
> -Erik
>
> On 9/4/15, 9:03 AM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
> > When I use 32 partitions, the 4-broker latency becomes larger than the
> > 8-broker latency.
> >
> > So is it always true that using more brokers gives lower latency when
> > the number of partitions is at least the number of brokers?
> >
> > Thanks.
> >
> > On Thu, Sep 3, 2015 at 10:45 PM, Yuheng Du <yuheng.du.h...@gmail.com> wrote:
> >> I am running a producer latency test. When using 92 producers on 92
> >> physical nodes publishing to 4 brokers, the latency is slightly lower
> >> than using 8 brokers; I am using 8 partitions for the topic.
> >>
> >> I have rerun the test and it gives me the same result: the 4-broker
> >> scenario still has lower latency than the 8-broker scenario.
> >>
> >> It is weird because I tested 1 broker, 2 brokers, 4 brokers, 8 brokers,
> >> 16 brokers and 32 brokers. For the rest of the cases the latency
> >> decreases as the number of brokers increases.
> >>
> >> 4 brokers/8 brokers is the only pair that doesn't satisfy this rule.
> >> What could be the cause?
> >>
> >> I am using a 200-byte message, and the test has each producer publish
> >> 500k messages to a given topic. Every test run where I change the number
> >> of brokers, I use a new topic.
> >>
> >> Thanks for any advice.
Re: When a message is exposed to the consumer
Can't read it. Sorry

On Fri, Sep 4, 2015 at 12:08 PM, Roman Shramkov <roman_shram...@epam.com> wrote:
> Её ай н Анны уйг
>
> sent from a mobile device, please excuse brevity and typos
>
> ----User Yuheng Du wrote----
>
> According to the section 3.1 of the paper "Kafka: a Distributed Messaging
> System for Log Processing":
>
> "a message is only exposed to the consumers after it is flushed"?
>
> Is it still true in the current kafka? like the message can only be
> available after it is flushed to disk?
>
> Thanks.
Re: latency test
Thanks for your reply Erik. I am running some more tests according to your suggestions now and I will share my results here. Is it necessary to use a fixed number of partitions (32 partitions maybe) for my test?

I am testing 2, 4, 8, 16 and 32 broker scenarios, all of them running on individual physical nodes. So I think using at least 32 partitions will make more sense? I have seen latencies increase as the number of partitions goes up in my experiments.

To get the latency of each event recorded, are you suggesting that I rewrite my own test program (in Java perhaps), or can I just modify the standard test program provided by kafka (https://gist.github.com/jkreps/c7ddb4041ef62a900e6c)? I guess I need to rebuild the source if I modify the standard java test program ProducerPerformance provided in kafka, right? Now this standard program only has average latencies and percentile latencies but no per-event latencies.

Thanks.

On Fri, Sep 4, 2015 at 1:42 PM, Helleren, Erik <erik.helle...@cmegroup.com> wrote:
> That is an excellent question! There are a bunch of ways to monitor
> jitter and see when that is happening. Here are a few:
>
> - You could slice the histogram every few seconds, save it out with a
> timestamp, and then look at how they compare. This would be mostly
> manual, or you can graph line charts of the percentiles over time in excel
> where each percentile would be a series. If you are using HDR Histogram,
> you should look at how to use the Recorder class to do this coupled with a
> ScheduledExecutorService.
>
> - You can just save the starting timestamp of the event and the latency of
> each event. If you put it into a CSV, you can just load it up into excel
> and graph as an XY chart. That way you can see every point during the
> running of your program and you can see trends. You want to be careful
> about this one, especially of writing to a file in the callback that kafka
> provides.
>
> Also, I have noticed that most of the very slow observations are at
> startup. But don’t trust me, trust the data and share your findings.
> Also, having a 99.9 percentile provides a pretty good standard for typical
> poor case performance. Average is borderline useless, 50%’ile is a better
> typical case because that’s the number that says “half of events will be
> this slow or faster”, or for values that are high like 99.9%’ile, “0.1% of
> all events will be slower than this”.
> -Erik
>
> On 9/4/15, 12:05 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
> > Thank you Erik! That is helpful!
> >
> > But I also see jitter in the maximum latencies when running the
> > experiment.
> >
> > The average end-to-acknowledgement latency from producer to broker is
> > around 5ms when using 92 producers and 4 brokers, and the 99.9 percentile
> > latency is 58ms, but the maximum latency goes up to 1359 ms. How can I
> > locate the source of this jitter?
> >
> > Thanks.
> >
> > On Fri, Sep 4, 2015 at 10:54 AM, Helleren, Erik <erik.helle...@cmegroup.com> wrote:
> >> Well… not to be contrarian, but latency depends much more on the latency
> >> between the producer and the broker that is the leader for the partition
> >> you are publishing to. At least when your brokers are not saturated with
> >> messages, and acks are set to 1. If acks are set to ALL, latency on a
> >> non-saturated kafka cluster will be: round trip latency from producer to
> >> the leader for the partition + max(slowest round trip latency to a
> >> replica of that partition). If a cluster is saturated with messages, we
> >> have to assume that all partitions receive an equal distribution of
> >> messages to avoid linear algebra and queueing theory models. I don't
> >> like linear algebra :P
> >>
> >> Since you are probably putting all your latencies into a single
> >> histogram per producer, or worse, just an average, this pattern would
> >> have been obscured. Obligatory lecture about measuring latency by Gil
> >> Tene (https://www.youtube.com/watch?v=9MKY4KypBzg). To verify this
> >> hypothesis, you should re-write the benchmark to plot the latencies for
> >> each write to a partition for each producer into a histogram. (HDR
> >> Histogram is pretty good for that.) This would give you
> >> producers*partitions histograms, which might be unwieldy for that many
> >> producers. But wait, there is hope!
> >>
> >> To verify that this hypothesis holds, you just have to see that there is
> >> a significant difference between different partitions on a SINGLE
> >> producing client.
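The per-event percentile bookkeeping described above needs very little code. A stdlib sketch of recording latencies and reading off a percentile by nearest rank (HdrHistogram is the better tool for real runs; this only shows the idea):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LatencyPercentiles {
    // Nearest-rank percentile over recorded latency samples.
    static long percentile(List<Long> samples, double p) {
        List<Long> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(0, rank - 1));
    }

    public static void main(String[] args) {
        // Fake per-ack latencies; in the benchmark these would come from the
        // producer's send callback (send time vs. ack time).
        List<Long> latenciesMs = new ArrayList<>();
        for (long v = 1; v <= 1000; v++) latenciesMs.add(v);
        System.out.println("p50   = " + percentile(latenciesMs, 50));
        System.out.println("p99.9 = " + percentile(latenciesMs, 99.9));
    }
}
```

Slicing this per partition (one sample list keyed by partition id) gives exactly the per-partition comparison Erik suggests for spotting a slow broker.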
Re: latency test
No problem. Thanks for your advice. I think it would be fun to explore. I only know how to program in java though. Hope it will work. On Fri, Sep 4, 2015 at 2:03 PM, Helleren, Erik <erik.helle...@cmegroup.com> wrote: > I thing the suggestion is to have partitions/brokers >=1, so 32 should be > enough. > > As for latency tests, there isn’t a lot of code to do a latency test. If > you just want to measure ack time its around 100 lines. I will try to > push out some good latency testing code to github, but my company is > scared of open sourcing code… so it might be a while… > -Erik > > > On 9/4/15, 12:55 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote: > > >Thanks for your reply Erik. I am running some more tests according to your > >suggestions now and I will share with my results here. Is it necessary to > >use a fixed number of partitions (32 partitions maybe) for my test? > > > >I am testing 2, 4, 8, 16 and 32 brokers scenarios, all of them are running > >on individual physical nodes. So I think using at least 32 partitions will > >make more sense? I have seen latencies increase as the number of > >partitions > >goes up in my experiments. > > > >To get the latency of each event data recorded, are you suggesting that I > >rewrite my own test program (in Java perhaps) or I can just modify the > >standard test program provided by kafka ( > >https://gist.github.com/jkreps/c7ddb4041ef62a900e6c )? I guess I need to > >rebuild the source if I modify the standard java test program > >ProducerPerformance provided in kafka, right? Now this standard program > >only has average latencies and percentile latencies but no per event > >latencies. > > > >Thanks. > > > >On Fri, Sep 4, 2015 at 1:42 PM, Helleren, Erik > ><erik.helle...@cmegroup.com> > >wrote: > > > >> That is an excellent question! There are a bunch of ways to monitor > >> jitter and see when that is happening. 
Here are a few: > >> > >> - You could slice the histogram every few seconds, save it out with a > >> timestamp, and then look at how they compare. This would be mostly > >> manual, or you can graph line charts of the percentiles over time in Excel, > >> where each percentile would be a series. If you are using HdrHistogram, > >> you should look at how to use the Recorder class to do this coupled with a > >> ScheduledExecutorService. > >> > >> - You can just save the starting timestamp and the latency of > >> each event. If you put them into a CSV, you can load it into Excel > >> and graph it as an XY chart. That way you can see every point during the > >> run of your program and spot trends. You want to be careful > >> about this one, especially about writing to a file in the callback that Kafka > >> provides. > >> > >> Also, I have noticed that most of the very slow observations are at > >> startup. But don't trust me, trust the data and share your findings. > >> Also, the 99.9th percentile provides a pretty good standard for the typical > >> poor case. Average is borderline useless; the 50th percentile is a better > >> typical case because that's the number that says "half of events will be > >> this slow or faster", and for high values like the 99.9th percentile, "0.1% of > >> all events will be slower than this". > >> -Erik > >> > >> On 9/4/15, 12:05 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote: > >> > >> >Thank you Erik! That's helpful! > >> > > >> >But I also see jitter in the maximum latencies when running the > >> >experiment. > >> > > >> >The average end-to-acknowledgement latency from producer to broker is > >> >around 5 ms when using 92 producers and 4 brokers, and the 99.9th percentile > >> >latency is 58 ms, but the maximum latency goes up to 1359 ms. How do I locate > >> >the source of this jitter? > >> > > >> >Thanks.
> >> > > >> >On Fri, Sep 4, 2015 at 10:54 AM, Helleren, Erik > >> ><erik.helle...@cmegroup.com> > >> >wrote: > >> > > >> >> Well… not to be contrarian, but latency depends much more on the latency > >> >> between the producer and the broker that is the leader for the partition > >> >> you are pub
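Erik's point about percentiles versus averages can be illustrated with plain Java. This is a minimal sketch using only the standard library (HdrHistogram's Recorder is the better tool for real runs); the sample values are made up to mirror the numbers in the thread, where a single 1359 ms outlier coexists with a 5 ms typical case:

```java
import java.util.Arrays;

public class LatencyStats {
    // Return the latency value at the given percentile (0-100) from a sorted copy.
    static long percentile(long[] latenciesMs, double pct) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(pct / 100.0 * sorted.length) - 1;
        return sorted[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        // 999 fast sends plus one slow outlier (e.g. a GC pause or leader change)
        long[] latenciesMs = new long[1000];
        Arrays.fill(latenciesMs, 5);
        latenciesMs[999] = 1359;

        System.out.println("avg   = " + Arrays.stream(latenciesMs).average().orElse(0));
        System.out.println("p50   = " + percentile(latenciesMs, 50.0));
        System.out.println("p99.9 = " + percentile(latenciesMs, 99.9));
        System.out.println("max   = " + Arrays.stream(latenciesMs).max().getAsLong());
    }
}
```

Here the single outlier drags the average above the 50th and even the 99.9th percentile, which is exactly why the max and the percentiles should be reported separately.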
latency test
I am running a producer latency test. When using 92 producers on 92 physical nodes publishing to 4 brokers, the latency is slightly lower than when using 8 brokers. I am using 8 partitions for the topic. I have rerun the test and it gives me the same result: the 4-broker scenario still has lower latency than the 8-broker scenario. It is weird because I tested 1 broker, 2 brokers, 4 brokers, 8 brokers, 16 brokers and 32 brokers, and for the rest of the cases the latency decreases as the number of brokers increases. 4 brokers/8 brokers is the only pair that doesn't follow this rule. What could be the cause? I am using 200-byte messages, and the test lets each producer publish 500k messages to a given topic. Every time I change the number of brokers, I use a new topic. Thanks for any advice.
Re: Reduce latency
Also, when I set the target throughput to be 1 record/s, the actual test results show I got an average of 579.86 records per second among all my producers. How did that happen? Why isn't this number 1? Thanks. On Tue, Aug 18, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: Thank you Jay, that really helps! Kishore, where can you monitor whether the network is busy on IO in VisualVM? Thanks. I am running 90 producer processes on 90 physical machines in the experiment. On Tue, Aug 18, 2015 at 1:19 AM, Jay Kreps j...@confluent.io wrote: Yuheng, From the command you gave it looks like you are configuring the perf test to send data as fast as possible (the -1 for target throughput). This means it will always queue up a bunch of unsent data until the buffer is exhausted and then block. The larger the buffer, the bigger the queue. This is where the latency comes from. This is exactly what you would expect and what the buffering is supposed to do. If you want to measure latency this test doesn't really make sense; you need to measure at some fixed throughput. Instead of -1, enter the target throughput you want to measure latency at (e.g. 10 records/sec). -Jay On Thu, Aug 13, 2015 at 12:18 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Thank you Alvaro. How do I use sync producers? I am running the standard ProducerPerformance test from Kafka to measure the latency of each message sent from producer to broker only. The command is like bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196 For running producers, where should I put the producer.type=sync configuration? In config/server.properties? Also, does this mean we are using a batch size of 1? Which version of Kafka are you using? Thanks.
On Thu, Aug 13, 2015 at 3:01 PM, Alvaro Gareppe agare...@gmail.com wrote: Are you measuring latency as the time between producer and consumer? In that case, the ack shouldn't affect the latency, because even though your producer is not going to wait for the ack, the consumer will only get the message after it's committed on the server. Regarding latency, my best results occur with sync producers, but the throughput is much lower in that case. As for not flushing to disk, I'm pretty sure that's not an option in Kafka (correct me if I'm wrong). Regards, Alvaro Gareppe On Thu, Aug 13, 2015 at 12:59 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Also, the latency results show no major difference when using ack=0 or ack=1. Why is that? On Thu, Aug 13, 2015 at 11:51 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: I am running an experiment where 92 producers are publishing data into 6 brokers and 10 consumers are reading online data simultaneously. What should I do to reduce the latency? Currently when I run the producer performance test the average latency is around 10 s. Should I disable log.flush? How do I do that? Thanks. -- Ing. Alvaro Gareppe agare...@gmail.com
Re: Reduce latency
I see. Thank you Tao. But now I don't get what Jay said, that my latency test only makes sense if I set a fixed throughput. Why do I need to set a fixed throughput for my test instead of just setting the expected throughput to -1 (as much as possible)? Thanks. On Tue, Aug 18, 2015 at 2:43 PM, Tao Feng fengta...@gmail.com wrote: Hi Yuheng, The 1 record/s is just a param for ProducerPerformance setting your producer's target throughput. It only takes effect, to do throttling, if you try to send more than 1 record/s. The actual throughput of the test depends on your producer config and your setup. -Tao On Tue, Aug 18, 2015 at 11:34 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: Also, when I set the target throughput to be 1 record/s, the actual test results show I got an average of 579.86 records per second among all my producers. How did that happen? Why isn't this number 1? Thanks. On Tue, Aug 18, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: Thank you Jay, that really helps! Kishore, where can you monitor whether the network is busy on IO in VisualVM? Thanks. I am running 90 producer processes on 90 physical machines in the experiment. On Tue, Aug 18, 2015 at 1:19 AM, Jay Kreps j...@confluent.io wrote: Yuheng, From the command you gave it looks like you are configuring the perf test to send data as fast as possible (the -1 for target throughput). This means it will always queue up a bunch of unsent data until the buffer is exhausted and then block. The larger the buffer, the bigger the queue. This is where the latency comes from. This is exactly what you would expect and what the buffering is supposed to do. If you want to measure latency this test doesn't really make sense; you need to measure at some fixed throughput. Instead of -1, enter the target throughput you want to measure latency at (e.g. 10 records/sec). -Jay On Thu, Aug 13, 2015 at 12:18 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Thank you Alvaro. How do I use sync producers?
I am running the standard ProducerPerformance test from kafka to measure the latency of each message to send from producer to broker only. The command is like bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196 For running producers, where should I put the producer.type=sync configuration into? The config/server.properties? Also Does this mean we are using batch size of 1? Which version of Kafka are you using? thanks. On Thu, Aug 13, 2015 at 3:01 PM, Alvaro Gareppe agare...@gmail.com wrote: Are you measuring latency as time between producer and consumer ? In that case, the ack shouldn't affect the latency, cause even tough your producer is not going to wait for the ack, the consumer will only get the message after its commited in the server. About latency my best result occur with sync producers, but the throughput is much lower in that case. About not flushing to disk I'm pretty sure that it's not an option in kafka (correct me if I'm wrong) Regards, Alvaro Gareppe On Thu, Aug 13, 2015 at 12:59 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Also, the latency results show no major difference when using ack=0 or ack=1. Why is that? On Thu, Aug 13, 2015 at 11:51 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: I am running an experiment where 92 producers is publishing data into 6 brokers and 10 consumer are reading online data simultaneously. How should I do to reduce the latency? Currently when I run the producer performance test the average latency is around 10s. Should I disable log.flush? How to do that? Thanks. -- Ing. Alvaro Gareppe agare...@gmail.com
Re: Reduce latency
I see. So does the internal queue override the producer buffer size configuration? When the buffer is full, the producer will block sending, right? On Tue, Aug 18, 2015 at 3:52 PM, Tao Feng fengta...@gmail.com wrote: From what I understand, if you set the throughput to -1, ProducerPerformance will push records as fast as possible into an internal per-topic-per-partition queue. In the background there is a sender IO thread handling the actual record sending. If you push records into the queue faster than the send rate, the queue will become longer and longer, and eventually record latency will become meaningless for a latency-focused test. On Tue, Aug 18, 2015 at 11:48 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: I see. Thank you Tao. But now I don't get what Jay said, that my latency test only makes sense if I set a fixed throughput. Why do I need to set a fixed throughput for my test instead of just setting the expected throughput to -1 (as much as possible)? Thanks. On Tue, Aug 18, 2015 at 2:43 PM, Tao Feng fengta...@gmail.com wrote: Hi Yuheng, The 1 record/s is just a param for ProducerPerformance setting your producer's target throughput. It only takes effect, to do throttling, if you try to send more than 1 record/s. The actual throughput of the test depends on your producer config and your setup. -Tao On Tue, Aug 18, 2015 at 11:34 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: Also, when I set the target throughput to be 1 record/s, the actual test results show I got an average of 579.86 records per second among all my producers. How did that happen? Why isn't this number 1? Thanks. On Tue, Aug 18, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: Thank you Jay, that really helps! Kishore, where can you monitor whether the network is busy on IO in VisualVM? Thanks. I am running 90 producer processes on 90 physical machines in the experiment.
On Tue, Aug 18, 2015 at 1:19 AM, Jay Kreps j...@confluent.io wrote: Yuheng, From the command you gave it looks like you are configuring the perf test to send data as fast as possible (the -1 for target throughput). This means it will always queue up a bunch of unsent data until the buffer is exhausted and then block. The larger the buffer, the bigger the queue. This is where the latency comes from. This is exactly what you would expect and what the buffering is supposed to do. If you want to measure latency this test doesn't really make sense, you need to measure with some fixed throughput. Instead of -1 enter the target throughput you want to measure latency at (e.g. 10 records/sec). -Jay On Thu, Aug 13, 2015 at 12:18 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Thank you Alvaro, How to use sync producers? I am running the standard ProducerPerformance test from kafka to measure the latency of each message to send from producer to broker only. The command is like bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196 For running producers, where should I put the producer.type=sync configuration into? The config/server.properties? Also Does this mean we are using batch size of 1? Which version of Kafka are you using? thanks. On Thu, Aug 13, 2015 at 3:01 PM, Alvaro Gareppe agare...@gmail.com wrote: Are you measuring latency as time between producer and consumer ? In that case, the ack shouldn't affect the latency, cause even tough your producer is not going to wait for the ack, the consumer will only get the message after its commited in the server. About latency my best result occur with sync producers, but the throughput is much lower in that case. 
About not flushing to disk I'm pretty sure that it's not an option in kafka (correct me if I'm wrong) Regards, Alvaro Gareppe On Thu, Aug 13, 2015 at 12:59 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Also, the latency results show no major difference when using ack=0 or ack=1. Why is that? On Thu, Aug 13, 2015 at 11:51 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: I am running an experiment where 92 producers is publishing data into 6 brokers and 10 consumer are reading online data simultaneously. How should I do to reduce the latency? Currently when I
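Jay's advice to measure latency at a fixed throughput amounts to pacing sends instead of letting them queue. A minimal pacing loop might look like the following sketch (the real ProducerPerformance tool has its own throttler, and the actual `producer.send()` call is elided here as a comment):

```java
public class FixedRateSender {
    // Nanoseconds between sends for a given target records/sec.
    static long intervalNanos(int recordsPerSec) {
        return 1_000_000_000L / recordsPerSec;
    }

    public static void main(String[] args) throws InterruptedException {
        int targetRate = 10; // records/sec: the fixed throughput to measure latency at
        long interval = intervalNanos(targetRate);
        long nextSendAt = System.nanoTime();
        for (int i = 0; i < 3; i++) {
            long sleepNanos = nextSendAt - System.nanoTime();
            if (sleepNanos > 0) {
                // Wait for this record's slot instead of queueing it immediately
                Thread.sleep(sleepNanos / 1_000_000, (int) (sleepNanos % 1_000_000));
            }
            // producer.send(record, callback) would go here; because sends are
            // paced, the measured ack latency is not inflated by time the record
            // spends waiting in the test's own unsent-data queue
            nextSendAt += interval;
        }
    }
}
```

With pacing in place, each record's latency reflects the broker round trip rather than the depth of the local buffer.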
Re: Reduce latency
Thank you Jay, that really helps! Kishore, Where you can monitor whether the network is busy on IO in visual vm? Thanks. I am running 90 producer process on 90 physical machines in the experiment. On Tue, Aug 18, 2015 at 1:19 AM, Jay Kreps j...@confluent.io wrote: Yuheng, From the command you gave it looks like you are configuring the perf test to send data as fast as possible (the -1 for target throughput). This means it will always queue up a bunch of unsent data until the buffer is exhausted and then block. The larger the buffer, the bigger the queue. This is where the latency comes from. This is exactly what you would expect and what the buffering is supposed to do. If you want to measure latency this test doesn't really make sense, you need to measure with some fixed throughput. Instead of -1 enter the target throughput you want to measure latency at (e.g. 10 records/sec). -Jay On Thu, Aug 13, 2015 at 12:18 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Thank you Alvaro, How to use sync producers? I am running the standard ProducerPerformance test from kafka to measure the latency of each message to send from producer to broker only. The command is like bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196 For running producers, where should I put the producer.type=sync configuration into? The config/server.properties? Also Does this mean we are using batch size of 1? Which version of Kafka are you using? thanks. On Thu, Aug 13, 2015 at 3:01 PM, Alvaro Gareppe agare...@gmail.com wrote: Are you measuring latency as time between producer and consumer ? In that case, the ack shouldn't affect the latency, cause even tough your producer is not going to wait for the ack, the consumer will only get the message after its commited in the server. 
About latency my best result occur with sync producers, but the throughput is much lower in that case. About not flushing to disk I'm pretty sure that it's not an option in kafka (correct me if I'm wrong) Regards, Alvaro Gareppe On Thu, Aug 13, 2015 at 12:59 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Also, the latency results show no major difference when using ack=0 or ack=1. Why is that? On Thu, Aug 13, 2015 at 11:51 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: I am running an experiment where 92 producers is publishing data into 6 brokers and 10 consumer are reading online data simultaneously. How should I do to reduce the latency? Currently when I run the producer performance test the average latency is around 10s. Should I disable log.flush? How to do that? Thanks. -- Ing. Alvaro Gareppe agare...@gmail.com
Re: Reduce latency
Thank you Kishore, I made the buffer twice the size of the batch size and the latency has reduced significantly. But is there only one IO thread sending the batches? Can I increase the number of threads sending the batches so more than one batch can be sent at the same time? Thanks. On Thu, Aug 13, 2015 at 5:38 PM, Kishore Senji kse...@gmail.com wrote: Your batch.size is 8196 and your buffer.memory is 67108864. This means 67108864/8196 ~ 8188 batches are in memory ready to be sent. There is only one IO thread sending them. I would guess that the IO thread (kafka-producer-network-thread) would be busy. Please check it in VisualVM. In steady state, every batch has to wait for the previous 8187 batches to be done before it gets a chance to be sent out, but the latency is counted from the time it is added to the queue. This is the reason you are seeing very high end-to-end latency. Make buffer.memory only twice the batch.size so that while one batch is in flight, you have another batch ready to go (the KafkaProducer will block further sends when there is no memory, and this way batches are not waiting in the queue unnecessarily). Also, maybe you want to increase batch.size further; you will get even better throughput with more or less the same latency (as there is no shortage of events in the test program). On Thu, Aug 13, 2015 at 1:13 PM Yuheng Du yuheng.du.h...@gmail.com wrote: Yes there is. But if we are using the ProducerPerformance test, it's configured by giving input when running the test command. Did you write a Java program to test the latency? Thanks. On Thu, Aug 13, 2015 at 3:54 PM, Alvaro Gareppe agare...@gmail.com wrote: I'm using the last one, but not using ProducerPerformance; I created my own. But I think there is a producer.properties file in the config folder in Kafka.. is that configuration not for this tester?
On Thu, Aug 13, 2015 at 4:18 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Thank you Alvaro, How to use sync producers? I am running the standard ProducerPerformance test from kafka to measure the latency of each message to send from producer to broker only. The command is like bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196 For running producers, where should I put the producer.type=sync configuration into? The config/server.properties? Also Does this mean we are using batch size of 1? Which version of Kafka are you using? thanks. On Thu, Aug 13, 2015 at 3:01 PM, Alvaro Gareppe agare...@gmail.com wrote: Are you measuring latency as time between producer and consumer ? In that case, the ack shouldn't affect the latency, cause even tough your producer is not going to wait for the ack, the consumer will only get the message after its commited in the server. About latency my best result occur with sync producers, but the throughput is much lower in that case. About not flushing to disk I'm pretty sure that it's not an option in kafka (correct me if I'm wrong) Regards, Alvaro Gareppe On Thu, Aug 13, 2015 at 12:59 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Also, the latency results show no major difference when using ack=0 or ack=1. Why is that? On Thu, Aug 13, 2015 at 11:51 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: I am running an experiment where 92 producers is publishing data into 6 brokers and 10 consumer are reading online data simultaneously. How should I do to reduce the latency? Currently when I run the producer performance test the average latency is around 10s. Should I disable log.flush? How to do that? Thanks. -- Ing. Alvaro Gareppe agare...@gmail.com -- Ing. Alvaro Gareppe agare...@gmail.com
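Kishore's arithmetic above can be checked directly; it also shows why the observed ~10 s average latency is dominated by queueing in the producer rather than by the broker. The 1 ms per-batch send time below is an assumption for illustration, not a measurement:

```java
public class BufferMath {
    public static void main(String[] args) {
        long bufferMemory = 67_108_864L; // buffer.memory from the test command
        long batchSize = 8_196L;         // batch.size from the test command

        long queuedBatches = bufferMemory / batchSize; // 8188
        System.out.println("batches queued ahead of a new batch: " + queuedBatches);

        // With a single sender IO thread, a batch enqueued at steady state waits
        // behind all earlier batches. Assuming ~1 ms to send each batch:
        double perBatchSendMs = 1.0; // illustrative assumption
        System.out.println("approx queueing delay: "
                + queuedBatches * perBatchSendMs + " ms"); // roughly 8 seconds
    }
}
```

Shrinking buffer.memory to about twice batch.size, as Kishore suggests, caps the queue at one waiting batch, so the reported latency approaches the actual send time.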
Re: use page cache as much as possible
So if I understand correctly, even if I delay flushing, the consumer will get the messages as soon as the broker receives them and put them into page cache (assuming producer doesn't wait for acks from brokers)? And will the decrease of log.flush interval help reduce latency between producer and consumer? Thanks. On Fri, Aug 14, 2015 at 11:57 AM, Kishore Senji kse...@gmail.com wrote: Thank you Gwen for correcting me. This document ( https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Replication) in Writes section also has specified the same thing as you have mentioned. One thing is not clear to me as to what happens when the Replicas add the message to memory but the leader fails before acking to the producer. Later the leader replica is chosen to be the leader for the partition, it will advance the HW to its LEO (which has the message). The producer can resend the same message thinking it failed and there will be a duplicate message. Is my understanding correct here? On Thu, Aug 13, 2015 at 10:50 PM, Gwen Shapira g...@confluent.io wrote: On Thu, Aug 13, 2015 at 4:10 PM, Kishore Senji kse...@gmail.com wrote: Consumers can only fetch data up to the committed offset and the reason is reliability and durability on a broker crash (some consumers might get the new data and some may not as the data is not yet committed and lost). Data will be committed when it is flushed. So if you delay the flushing, consumers won't get those messages until that time. As far as I know, this is not accurate. A message is considered committed when all ISR replicas received it (this much is documented). This doesn't need to include writing to disk, which will happen asynchronously. Even though you flush periodically based on log.flush.interval.messages and log.flush.interval.ms, if the segment file is in the pagecache, the consumers will still benefit from that pagecache and OS wouldn't read it again from disk. 
On Thu, Aug 13, 2015 at 2:54 PM Yuheng Du yuheng.du.h...@gmail.com wrote: Hi, As I understand it, kafka brokers will store the incoming messages into pagecache as much as possible and then flush them into disk, right? But in my experiment where 90 producers is publishing data into 6 brokers, I see that the log directory on disk where broker stores the data is constantly increasing (every seconds.) So why this is happening? Does this has to do with the default log.flush.interval setting? I want the broker to write to disk less often when serving some on-line consumers to reduce latency. I tested in my broker the disk write speed is around 110MB/s. Thanks for any replies.
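For reference, the broker-side flush knobs discussed above live in server.properties. By default Kafka leaves flushing to the OS and relies on replication for durability, which is usually what you want; forcing frequent fsyncs tends to hurt rather than help latency. The values below are illustrative only:

```properties
# server.properties (broker). Both settings are effectively unset by default,
# so the OS decides when dirty pages are written back; consumers read from
# the page cache either way once the message is committed.
log.flush.interval.messages=10000
log.flush.interval.ms=1000
```

Lowering these intervals makes the broker fsync more often, which increases disk I/O; it does not gate when consumers see messages, since commit is governed by ISR replication, as Gwen notes above.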
Reduce latency
I am running an experiment where 92 producers are publishing data into 6 brokers and 10 consumers are reading online data simultaneously. What should I do to reduce the latency? Currently when I run the producer performance test the average latency is around 10 s. Should I disable log.flush? How do I do that? Thanks.
Re: Reduce latency
Also, the latency results show no major difference when using ack=0 or ack=1. Why is that? On Thu, Aug 13, 2015 at 11:51 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: I am running an experiment where 92 producers is publishing data into 6 brokers and 10 consumer are reading online data simultaneously. How should I do to reduce the latency? Currently when I run the producer performance test the average latency is around 10s. Should I disable log.flush? How to do that? Thanks.
use page cache as much as possible
Hi, As I understand it, Kafka brokers will store incoming messages in the page cache as much as possible and then flush them to disk, right? But in my experiment, where 90 producers are publishing data into 6 brokers, I see that the log directory on disk where the broker stores the data is constantly growing (every second). Why is this happening? Does it have to do with the default log.flush.interval setting? I want the broker to write to disk less often when serving on-line consumers, to reduce latency. I tested that the disk write speed on my broker is around 110 MB/s. Thanks for any replies.
Variation of producer latency in ProducerPerformance test
Hi, I am running a test in which 92 producers each publish 53000 records of size 254 bytes to 2 brokers. The average latency shown by each producer varies widely. For some producers, the average latency to send the 53000 records is as low as 38 ms; for others, it is as high as 13067 ms. How can this be explained? To get the lowest latencies, what batch size and other important configs should I use? Thanks!
Kafka vs RabbitMQ latency
Hi guys, I was reading a paper today in which the latency of Kafka and RabbitMQ is compared: http://downloads.hindawi.com/journals/js/2015/468047.pdf To my surprise, Kafka has shown some large variations in latency as the number of records per second increases. So I am curious why that is. Also, in the ProducerPerformance test: bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196 Setting acks=1 means the producer will wait for an ack from the leader replica, right? Could that be the reason which affects latency? If I set it to 0, the producers will send as fast as possible, so throughput should increase and latency decrease in the test results? Thanks for answering. best,
multiple producer throughput
Hi, I am running 40 producers on a 40-node cluster. The messages are sent to 6 brokers in another cluster. The producers are running the ProducerPerformance test. When 20 nodes are running, the throughput is around 13 MB/s; when running 40 nodes, the throughput is around 9 MB/s. I have set log.retention.ms=9000 to delete unwanted messages, just to avoid filling the disk. So I want to know how I should tune the system to get better throughput. Thanks.
Re: deleting data automatically
Thank you! On Mon, Jul 27, 2015 at 1:43 PM, Ewen Cheslack-Postava e...@confluent.io wrote: As I mentioned, adjusting any settings such that files are small enough that you don't get the benefits of append-only writes or file creation/deletion become a bottleneck might affect performance. It looks like the default setting for log.segment.bytes is 1GB, so given fast enough cleanup of old logs, you may not need to adjust that setting -- assuming you have a reasonable amount of storage, you'll easily fit many dozen log files of that size. -Ewen On Mon, Jul 27, 2015 at 10:36 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: Thank you! what performance impacts will it be if I change log.segment.bytes? Thanks. On Mon, Jul 27, 2015 at 1:25 PM, Ewen Cheslack-Postava e...@confluent.io wrote: I think log.cleanup.interval.mins was removed in the first 0.8 release. It sounds like you're looking at outdated docs. Search for log.retention.check.interval.ms here: http://kafka.apache.org/documentation.html As for setting the values too low hurting performance, I'd guess it's probably only an issue if you set them extremely small, such that file creation and cleanup become a bottleneck. -Ewen On Mon, Jul 27, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: If I want to get higher throughput, should I increase the log.segment.bytes? I don't see log.retention.check.interval.ms, but there is log.cleanup.interval.mins, is that what you mean? If I set log.roll.ms or log.cleanup.interval.mins too small, will it hurt the throughput? Thanks. On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava e...@confluent.io wrote: You'll want to set the log retention policy via log.retention.{ms,minutes,hours} or log.retention.bytes. If you want really aggressive collection (e.g., on the order of seconds, as you specified), you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and log.retention.check.interval.ms. 
On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Hi, I am testing the kafka producer performance. So I created a queue and writes a large amount of data to that queue. Is there a way to delete the data automatically after some time, say whenever the data size reaches 50GB or the retention time exceeds 10 seconds, it will be deleted so my disk won't get filled and new data can't be written in? Thanks.! -- Thanks, Ewen -- Thanks, Ewen -- Thanks, Ewen
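Ewen's suggestions above map to a handful of server.properties entries. A hedged example of an aggressive-cleanup configuration (values are illustrative; retention only removes whole closed segments, and very small segments can make file creation/deletion a bottleneck, as he notes):

```properties
# server.properties (broker-wide defaults)
# delete a partition's oldest segments once it exceeds ~50 GB
log.retention.bytes=53687091200
# ...or once segments are older than 10 seconds
log.retention.ms=10000
# roll segments at 1 GB (the default); smaller segments become deletable sooner
log.segment.bytes=1073741824
# how often the broker checks for segments eligible for deletion
log.retention.check.interval.ms=5000
```

Note that log.retention.bytes applies per partition, so total disk usage for a topic scales with its partition count.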
Re: deleting data automatically
If I want to get higher throughput, should I increase the log.segment.bytes? I don't see log.retention.check.interval.ms, but there is log.cleanup.interval.mins, is that what you mean? If I set log.roll.ms or log.cleanup.interval.mins too small, will it hurt the throughput? Thanks. On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava e...@confluent.io wrote: You'll want to set the log retention policy via log.retention.{ms,minutes,hours} or log.retention.bytes. If you want really aggressive collection (e.g., on the order of seconds, as you specified), you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and log.retention.check.interval.ms. On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Hi, I am testing the kafka producer performance. So I created a queue and writes a large amount of data to that queue. Is there a way to delete the data automatically after some time, say whenever the data size reaches 50GB or the retention time exceeds 10 seconds, it will be deleted so my disk won't get filled and new data can't be written in? Thanks.! -- Thanks, Ewen
Re: deleting data automatically
Thank you! What performance impact will there be if I change log.segment.bytes? Thanks. On Mon, Jul 27, 2015 at 1:25 PM, Ewen Cheslack-Postava e...@confluent.io wrote: I think log.cleanup.interval.mins was removed in the first 0.8 release. It sounds like you're looking at outdated docs. Search for log.retention.check.interval.ms here: http://kafka.apache.org/documentation.html As for setting the values too low hurting performance, I'd guess it's probably only an issue if you set them extremely small, such that file creation and cleanup become a bottleneck. -Ewen On Mon, Jul 27, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: If I want to get higher throughput, should I increase log.segment.bytes? I don't see log.retention.check.interval.ms, but there is log.cleanup.interval.mins; is that what you mean? If I set log.roll.ms or log.cleanup.interval.mins too small, will it hurt throughput? Thanks. On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava e...@confluent.io wrote: You'll want to set the log retention policy via log.retention.{ms,minutes,hours} or log.retention.bytes. If you want really aggressive collection (e.g., on the order of seconds, as you specified), you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and log.retention.check.interval.ms. On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Hi, I am testing Kafka producer performance. So I created a queue and write a large amount of data to that queue. Is there a way to delete the data automatically after some time, say whenever the data size reaches 50 GB or the retention time exceeds 10 seconds, so my disk won't fill up and block new writes? Thanks! -- Thanks, Ewen -- Thanks, Ewen
Re: multiple producer throughput
The message size is 100 bytes and each producer sends out 50 million messages. Those are the numbers used by the Kafka benchmarking post: http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines Thanks.

On Mon, Jul 27, 2015 at 4:15 PM, Prabhjot Bharaj prabhbha...@gmail.com wrote:
Hi, Have you tried with acks=1 and -1 as well? Please share the numbers and the message size. Regards, Prabcs

On Jul 27, 2015 10:24 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Hi, I am running 40 producers on a 40-node cluster. The messages are sent to 6 brokers in another cluster. The producers are running the ProducerPerformance test. With 20 nodes running, the throughput is around 13MB/s; with 40 nodes, it is around 9MB/s. I have set log.retention.ms=9000 to delete the unwanted messages, just to keep the disk from filling up. How should I tune the system to get better throughput? Thanks.
Re: producer performance test on multiple nodes
I deleted the queue and recreated it before running the test. Things are working after restarting the broker cluster, thanks!

On Fri, Jul 24, 2015 at 12:06 PM, Gwen Shapira gshap...@cloudera.com wrote:
Does topic speedx1 exist?

On Fri, Jul 24, 2015 at 7:09 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Hi, I am trying to run 20 performance tests on 10 nodes using pbsdsh. The messages are sent to a 6-broker cluster. It seems to work for a while. When I delete the test queue and rerun the test, the broker does not seem to process incoming messages:

[yuhengd@node1739 kafka_2.10-0.8.2.1]$ bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance speedx1 5000 100 -1 acks=1 bootstrap.servers=130.127.133.72:9092 buffer.memory=67108864 batch.size=8196
1 records sent, 0.0 records/sec (0.00 MB/sec), 6.0 ms avg latency, 6.0 max latency.
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 79 ms.

Can anyone suggest what the problem is? Or what configuration should I change? Thanks!
producer performance test on multiple nodes
Hi, I am trying to run 20 performance tests on 10 nodes using pbsdsh. The messages are sent to a 6-broker cluster. It seems to work for a while. When I delete the test queue and rerun the test, the broker does not seem to process incoming messages:

[yuhengd@node1739 kafka_2.10-0.8.2.1]$ bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance speedx1 5000 100 -1 acks=1 bootstrap.servers=130.127.133.72:9092 buffer.memory=67108864 batch.size=8196
1 records sent, 0.0 records/sec (0.00 MB/sec), 6.0 ms avg latency, 6.0 max latency.
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 79 ms.

Can anyone suggest what the problem is? Or what configuration should I change? Thanks!
deleting data automatically
Hi, I am testing Kafka producer performance, so I created a queue and wrote a large amount of data to that queue. Is there a way to delete the data automatically after some time, say whenever the data size reaches 50GB or the retention time exceeds 10 seconds, so my disk won't get filled and new data can't be written? Thanks!
Re: broker data directory
Thank you, Nicolas!

On Tue, Jul 21, 2015 at 10:46 AM, Nicolas Phung nicolas.ph...@gmail.com wrote:
Yes indeed. # A comma seperated list of directories under which to store log files log.dirs=/var/lib/kafka You can put several disks/partitions too. Regards,

On Tue, Jul 21, 2015 at 4:37 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Just wanna make sure: in server.properties, the configuration log.dirs=/tmp/kafka-logs specifies the directory where the log (data) is stored, right? If I want the data to be saved elsewhere, this is the configuration I need to change, right? Thanks for answering. best,
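For reference, a sketch of the server.properties change being discussed, with multiple disks as Nicolas suggests (the paths are examples):

```properties
# Comma-separated list of directories for Kafka log data;
# partitions are spread across the listed directories
log.dirs=/data/disk1/kafka-logs,/data/disk2/kafka-logs
```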
broker data directory
Just wanna make sure: in server.properties, the configuration log.dirs=/tmp/kafka-logs specifies the directory where the log (data) is stored, right? If I want the data to be saved elsewhere, this is the configuration I need to change, right? Thanks for answering. best,
Re: latency performance test
Hi Ewen, Thank you for your patient explaining. It is very helpful. Can we assume that the long latency of ProducerPerformance comes from queuing delay in the buffer and it is related to buffer size? Thank you! best, Yuheng On Thu, Jul 16, 2015 at 12:21 AM, Ewen Cheslack-Postava e...@confluent.io wrote: The tests are meant to evaluate different things and the way they send messages is the source of the difference. EndToEndLatency works with a single message at a time. It produces the message then waits for the consumer to receive it. This approach guarantees there is no delay due to queuing. The goal with this test is to evaluate the *minimum* latency. ProducerPerformance focuses on achieving maximum throughput. This means it will enqueue lots of records so it will always have more data to send (and can use batching to increase the throughput). Unlike EndToEndLatency, this means records may just sit in a queue on the producer for awhile because the maximum number of in flight requests has been reached and it needs to wait for responses for those requests. Since EndToEndLatency only ever has one record outstanding, it will never encounter this case. Batching itself doesn't increase the latency because it only occurs when the producer is either a) already unable to send messages anyway or b) linger.ms is greater than 0, but the tests use the default setting that doesn't linger at all. In your example for ProducerPerformance, you have 100 byte records and will buffer up to 64MB. Given the batch size of 8K and default producer settings of 5 in flight requests, you can roughly think of one round trip time handling 5 * 8K = 40K bytes of data. If your roundtrip is 1ms, then if your buffer is full at 64MB it will take you 64 MB / (40 KB/ms) = 1638ms = 1.6s. That means that the record that was added at the end of the buffer had to just sit in the buffer for 1.6s before it was sent off to the broker. 
And if your buffer is consistently full (which it should be for ProducerPerformance since it's sending as fast as it can), that means *every* record waits that long. Of course, these numbers are estimates and depend on my having used 1ms, but hopefully they make it clear why you can see relatively large latencies. -Ewen

On Wed, Jul 15, 2015 at 1:38 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Hi, I have run the end-to-end latency test and the ProducerPerformance test on my Kafka cluster according to https://gist.github.com/jkreps/c7ddb4041ef62a900e6c In the end-to-end latency test, the latency was around 2ms. In the ProducerPerformance test, using batch size 8196 to send 50,000,000 records:

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance speedx1 5000 100 -1 acks=1 bootstrap.servers=192.168.1.1:9092 buffer.memory=67108864 batch.size=8196

the results show that max latency is 3617ms and avg latency is 626.7ms. I wanna know why the latency in the ProducerPerformance test is significantly larger than in the end-to-end test. Is it because of batching? Are the definitions of these two latencies different? I looked at the source code and I believe the latency is measured as the time for the producer.send() call to complete. So does this latency include transmission delay, transfer delay, and what other components? Thanks. best, Yuheng

-- Thanks, Ewen
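Ewen's back-of-envelope estimate can be checked directly. The 1ms round trip is his assumed figure; the 64MB buffer, ~8KB batches, and 5 in-flight requests come from the test command and the producer defaults he cites:

```python
# Worst-case queuing delay for a record entering a full producer buffer,
# following Ewen's model: the buffer drains at roughly
# (in-flight requests * batch size) bytes per round trip.
buffer_bytes = 64 * 1024 * 1024   # buffer.memory=67108864 (64MB)
batch_bytes = 8 * 1024            # batch.size of roughly 8KB
in_flight = 5                     # default max in-flight requests per connection
round_trip_ms = 1.0               # Ewen's assumed round-trip time

drain_rate = in_flight * batch_bytes / round_trip_ms  # ~40KB drained per ms
drain_ms = buffer_bytes / drain_rate

print(f"worst-case queuing delay: {drain_ms:.1f} ms")  # 1638.4 ms
```

This reproduces the ~1.6s figure in the reply, and is the same order of magnitude as the 626.7ms average latency reported by ProducerPerformance below.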
Re: kafka benchmark tests
Jiefu, Have you tried to run benchmark_test.py? I ran it and it fails to import ducktape.services.service:

yuhengdu@consumer0:/packages/kafka_2.10-0.8.2.1$ python benchmark_test.py
Traceback (most recent call last):
  File "benchmark_test.py", line 16, in <module>
    from ducktape.services.service import Service
ImportError: No module named ducktape.services.service

Can you help me on getting it to work, Ewen? Thanks. best, Yuheng

On Tue, Jul 14, 2015 at 11:28 PM, Ewen Cheslack-Postava e...@confluent.io wrote:
@Jiefu, yes! The patch is functional, I think it's just waiting on a bit of final review after the last round of changes. You can definitely use it for your own benchmarking, and we'd love to see patches for any additional tests we missed in the first pass! -Ewen

On Tue, Jul 14, 2015 at 10:53 AM, JIEFU GONG jg...@berkeley.edu wrote:
Yuheng, I would recommend looking here: http://kafka.apache.org/documentation.html#brokerconfigs and scrolling down to get a better understanding of the default settings and what they mean -- it'll tell you what the different options for acks do. Ewen, Thank you immensely for your thoughts, they shed a lot of insight into the issue. Though it is understandable that your specific results need to be verified, it seems that the KIP-25 patch is functional and I can use it for my own benchmarking purposes? Is that correct? Thanks again!

On Tue, Jul 14, 2015 at 8:22 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Also, I guess setting the target throughput to -1 means let it be as high as possible?

On Tue, Jul 14, 2015 at 10:36 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Thanks. If I set acks=1 in the producer config options in bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196?
Does that mean for each message generated at the producer, the producer will wait until the broker sends the ack back, then send another message? Thanks. Yuheng

On Tue, Jul 14, 2015 at 10:06 AM, Manikumar Reddy ku...@nmsworks.co.in wrote:
Yes, "A list of Kafka Server host/port pairs to use for establishing the initial connection to the Kafka cluster" https://kafka.apache.org/documentation.html#newproducerconfigs

On Tue, Jul 14, 2015 at 7:29 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Does anyone know what bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 means in the following test command: bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196? What is bootstrap.servers? Is it the Kafka server that I am running the test against? Thanks. Yuheng

On Tue, Jul 14, 2015 at 12:18 AM, Ewen Cheslack-Postava e...@confluent.io wrote:
I implemented (nearly) the same basic set of tests in the system test framework we started at Confluent and that is going to move into Kafka -- see the wip patch for KIP-25 here: https://github.com/apache/kafka/pull/70 In particular, that test is implemented in benchmark_test.py: https://github.com/apache/kafka/pull/70/files#diff-ca984778cf9943407645eb6784f19dc8 Hopefully once that's merged people can reuse that benchmark (and add to it!) so they can easily run the same benchmarks across different hardware. Here are some results from an older version of that test on m3.2xlarge instances on EC2 using local ephemeral storage (I think...
it's been a while since I ran these numbers and I didn't document methodology that carefully):

INFO:_.KafkaBenchmark:=
INFO:_.KafkaBenchmark:BENCHMARK RESULTS
INFO:_.KafkaBenchmark:=
INFO:_.KafkaBenchmark:Single producer, no replication: 684097.470208 rec/sec (65.24 MB/s)
INFO:_.KafkaBenchmark:Single producer, async 3x replication: 667494.359673 rec/sec (63.66 MB/s)
INFO:_.KafkaBenchmark:Single producer, sync 3x replication: 116485.764275 rec/sec (11.11 MB/s)
INFO:_.KafkaBenchmark:Three producers, async 3x replication: 1696519.022182 rec/sec (161.79 MB/s)
INFO:_.KafkaBenchmark:Message size:
INFO:_.KafkaBenchmark: 10: 1637825.195625 rec/sec (15.62 MB/s)
INFO:_.KafkaBenchmark: 100: 605504.877911 rec/sec (57.75 MB/s)
INFO:_.KafkaBenchmark: 1000: 90351.817570 rec/sec (86.17 MB/s)
INFO:_.KafkaBenchmark: 10000: 8306.180862 rec/sec (79.21 MB/s)
INFO:_.KafkaBenchmark: 10
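The MB/s column in these results is just the record rate times the record size (100 bytes, except in the message-size sweep, where the size is the label of each row); a quick sanity check against the first line:

```python
def mb_per_sec(records_per_sec: float, record_bytes: int) -> float:
    """Convert a record rate to MB/s, where 1 MB = 1024 * 1024 bytes."""
    return records_per_sec * record_bytes / (1024 * 1024)

# Single producer, no replication: 684097.470208 rec/sec at 100 bytes
print(round(mb_per_sec(684097.470208, 100), 2))  # 65.24
```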
Re: Latency test
Tao, if I run the following command on the command line:

bin/kafka-run-class.sh kafka.tools.TestEndToEndLatency 192.168.1.3:9092 192.168.1.1:2181 speedx3 5000 100 1 -Xmx1024m

it says the usage is not correct. So where should I put the -Xmx1024m option? Thanks.

On Wed, Jul 15, 2015 at 3:44 AM, Tao Feng fengta...@gmail.com wrote:
(Please correct me if I am wrong.) Based on TestEndToEndLatency (https://github.com/apache/kafka/blob/trunk/core/src/test/scala/kafka/tools/TestEndToEndLatency.scala), consumer_fetch_max_wait corresponds to fetch.wait.max.ms in the consumer config (http://kafka.apache.org/documentation.html#consumerconfigs). The default value listed in the documentation is 100 (ms). To add Java heap space to the JVM, pass -Xmx$Size (max heap size) as a JVM option.

On Wed, Jul 15, 2015 at 12:29 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Tao, Thanks. The example on https://gist.github.com/jkreps/c7ddb4041ef62a900e6c is outdated already. The error message shows: USAGE: java kafka.tools.TestEndToEndLatency$ broker_list zookeeper_connect topic num_messages consumer_fetch_max_wait producer_acks Can anyone tell me what should be put in consumer_fetch_max_wait? Thanks.

On Tue, Jul 14, 2015 at 5:21 PM, Tao Feng fengta...@gmail.com wrote:
I think the ProducerPerformance microbenchmark only measures between client and brokers (producer to brokers) and provides latency information.

On Tue, Jul 14, 2015 at 11:05 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Currently, the latency test from Kafka tests the end-to-end latency between producers and consumers. Is there a way to test the producer-to-broker and broker-to-consumer delay separately? Thanks.
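Putting Tao's answer together with the usage string, a complete invocation would look like this; the hosts, topic, and message count are the ones from this thread, and 100 is the documented fetch.wait.max.ms default:

```shell
# broker_list  zookeeper_connect  topic  num_messages  consumer_fetch_max_wait  producer_acks
bin/kafka-run-class.sh kafka.tools.TestEndToEndLatency \
  192.168.1.3:9092 192.168.1.1:2181 speedx3 5000 100 1
```

The -Xmx1024m flag does not belong in this argument list; it is a JVM option, not a tool argument, which is why appending it triggers the usage error.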
Re: Latency test
I got a Java out-of-heap error when running the end-to-end latency test:

yuhengdu@consumer0:/packages/kafka_2.10-0.8.2.1$ bin/kafka-run-class.sh kafka.tools.TestEndToEndLatency 192.168.1.3:9092 192.168.1.1:2181 speedx3 5000 100 1
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
 at kafka.tools.TestEndToEndLatency$.main(TestEndToEndLatency.scala:69)
 at kafka.tools.TestEndToEndLatency.main(TestEndToEndLatency.scala)

What command should I use to add Java heap space to the JVM? Thanks! Yuheng

On Wed, Jul 15, 2015 at 3:29 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Tao, Thanks. The example on https://gist.github.com/jkreps/c7ddb4041ef62a900e6c is outdated already. The error message shows: USAGE: java kafka.tools.TestEndToEndLatency$ broker_list zookeeper_connect topic num_messages consumer_fetch_max_wait producer_acks Can anyone tell me what should be put in consumer_fetch_max_wait? Thanks.

On Tue, Jul 14, 2015 at 5:21 PM, Tao Feng fengta...@gmail.com wrote:
I think the ProducerPerformance microbenchmark only measures between client and brokers (producer to brokers) and provides latency information.

On Tue, Jul 14, 2015 at 11:05 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Currently, the latency test from Kafka tests the end-to-end latency between producers and consumers. Is there a way to test the producer-to-broker and broker-to-consumer delay separately? Thanks.
latency performance test
Hi, I have run the end-to-end latency test and the ProducerPerformance test on my Kafka cluster according to https://gist.github.com/jkreps/c7ddb4041ef62a900e6c In the end-to-end latency test, the latency was around 2ms. In the ProducerPerformance test, using batch size 8196 to send 50,000,000 records:

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance speedx1 5000 100 -1 acks=1 bootstrap.servers=192.168.1.1:9092 buffer.memory=67108864 batch.size=8196

the results show that max latency is 3617ms and avg latency is 626.7ms. I wanna know why the latency in the ProducerPerformance test is significantly larger than in the end-to-end test. Is it because of batching? Are the definitions of these two latencies different? I looked at the source code and I believe the latency is measured as the time for the producer.send() call to complete. So does this latency include transmission delay, transfer delay, and what other components? Thanks. best, Yuheng
Re: Latency test
Tao, Thanks. The example on https://gist.github.com/jkreps/c7ddb4041ef62a900e6c is outdated already. The error message shows: USAGE: java kafka.tools.TestEndToEndLatency$ broker_list zookeeper_connect topic num_messages consumer_fetch_max_wait producer_acks Can anyone tell me what should be put in consumer_fetch_max_wait? Thanks.

On Tue, Jul 14, 2015 at 5:21 PM, Tao Feng fengta...@gmail.com wrote:
I think the ProducerPerformance microbenchmark only measures between client and brokers (producer to brokers) and provides latency information.

On Tue, Jul 14, 2015 at 11:05 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Currently, the latency test from Kafka tests the end-to-end latency between producers and consumers. Is there a way to test the producer-to-broker and broker-to-consumer delay separately? Thanks.
Re: Latency test
I have run the end-to-end latency test and the ProducerPerformance test on my Kafka cluster according to https://gist.github.com/jkreps/c7ddb4041ef62a900e6c In the end-to-end latency test, the latency was around 2ms. In the ProducerPerformance test, using batch size 8196 to send 50,000,000 records:

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance speedx1 5000 100 -1 acks=1 bootstrap.servers=192.168.1.1:9092 buffer.memory=67108864 batch.size=8196

the results show that max latency is 3617ms and avg latency is 626.7ms. I wanna know why the latency in the ProducerPerformance test is significantly larger than in the end-to-end test. Is it because of batching? Are the definitions of these two latencies different? I looked at the source code and I believe the latency is measured as the time for the producer.send() call to complete. So does this latency include transmission delay, transfer delay, and what other components? Thanks. best, Yuheng

On Wed, Jul 15, 2015 at 3:51 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Tao, if I run the following command on the command line: bin/kafka-run-class.sh kafka.tools.TestEndToEndLatency 192.168.1.3:9092 192.168.1.1:2181 speedx3 5000 100 1 -Xmx1024m it says the usage is not correct. So where should I put the -Xmx1024m option? Thanks.

On Wed, Jul 15, 2015 at 3:44 AM, Tao Feng fengta...@gmail.com wrote:
(Please correct me if I am wrong.) Based on TestEndToEndLatency (https://github.com/apache/kafka/blob/trunk/core/src/test/scala/kafka/tools/TestEndToEndLatency.scala), consumer_fetch_max_wait corresponds to fetch.wait.max.ms in the consumer config (http://kafka.apache.org/documentation.html#consumerconfigs). The default value listed in the documentation is 100 (ms). To add Java heap space to the JVM, pass -Xmx$Size (max heap size) as a JVM option.

On Wed, Jul 15, 2015 at 12:29 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Tao, Thanks. The example on https://gist.github.com/jkreps/c7ddb4041ef62a900e6c is outdated already. 
The error message shows: USAGE: java kafka.tools.TestEndToEndLatency$ broker_list zookeeper_connect topic num_messages consumer_fetch_max_wait producer_acks Can anyone tell me what should be put in consumer_fetch_max_wait? Thanks.

On Tue, Jul 14, 2015 at 5:21 PM, Tao Feng fengta...@gmail.com wrote:
I think the ProducerPerformance microbenchmark only measures between client and brokers (producer to brokers) and provides latency information.

On Tue, Jul 14, 2015 at 11:05 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Currently, the latency test from Kafka tests the end-to-end latency between producers and consumers. Is there a way to test the producer-to-broker and broker-to-consumer delay separately? Thanks.
kafka TestEndtoEndLatency
In the Kafka performance tests https://gist.github.com/jkreps/c7ddb4041ef62a900e6c the TestEndtoEndLatency results are typically around 2ms, while ProducerPerformance normally shows an average latency of several hundred ms when using batch size 8196. Are both results talking about end-to-end latency, and is the ProducerPerformance latency just larger because of batching of several messages? What message size does the TestEndtoEndLatency test use? Thanks for any ideas. best,
Re: kafka benchmark tests
Hi Geoffrey, Thank you for your helpful information. Do I have to install the virtual machines? I am using a Mac as the test-driver machine; can I use a Linux machine to run the test driver too? Thanks. best, Yuheng

On Wed, Jul 15, 2015 at 2:55 PM, Geoffrey Anderson ge...@confluent.io wrote:
Hi Yuheng, Running these tests requires a tool we've created at Confluent called 'ducktape', which you need to install with the command: pip install ducktape==0.2.0 Running the tests locally requires some setup (creation of virtual machines etc.) which is outlined here: https://github.com/apache/kafka/pull/70/files#diff-62f0ff60ede3b78b9c95624e2f61d6c1 The instructions in the quickstart show you how to run the tests on a cluster of virtual machines (on a single host). Once you have a cluster up and running, you'll be able to run the test you're interested in: cd kafka/tests ducktape kafkatest/tests/benchmark_test.py Definitely keep us posted about which parts are difficult, annoying, or confusing about this process and we'll do our best to help. Thanks, Geoff

On Wed, Jul 15, 2015 at 12:49 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Jiefu, Have you tried to run benchmark_test.py? I ran it and it fails to import ducktape.services.service:

yuhengdu@consumer0:/packages/kafka_2.10-0.8.2.1$ python benchmark_test.py
Traceback (most recent call last):
  File "benchmark_test.py", line 16, in <module>
    from ducktape.services.service import Service
ImportError: No module named ducktape.services.service

Can you help me on getting it to work, Ewen? Thanks. best, Yuheng

On Tue, Jul 14, 2015 at 11:28 PM, Ewen Cheslack-Postava e...@confluent.io wrote:
@Jiefu, yes! The patch is functional, I think it's just waiting on a bit of final review after the last round of changes. You can definitely use it for your own benchmarking, and we'd love to see patches for any additional tests we missed in the first pass! 
-Ewen On Tue, Jul 14, 2015 at 10:53 AM, JIEFU GONG jg...@berkeley.edu wrote: Yuheng, I would recommend looking here: http://kafka.apache.org/documentation.html#brokerconfigs and scrolling down to get a better understanding of the default settings and what they mean -- it'll tell you what different options for acks does. Ewen, Thank you immensely for your thoughts, they shed a lot of insight into the issue. Though it is understandable that your specific results need to be verified, it seems that the KIP-25 patch is functional and I can use it for my own benchmarking purposes? Is that correct? Thanks again! On Tue, Jul 14, 2015 at 8:22 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: Also, I guess setting the target throughput to -1 means let it be as high as possible? On Tue, Jul 14, 2015 at 10:36 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: Thanks. If I set the acks=1 in the producer config options in bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1 acks=1 bootstrap.servers= esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196? Does that mean for each message generated at the producer, the producer will wait until the broker sends the ack back, then send another message? Thanks. Yuheng On Tue, Jul 14, 2015 at 10:06 AM, Manikumar Reddy ku...@nmsworks.co.in wrote: Yes, A list of Kafka Server host/port pairs to use for establishing the initial connection to the Kafka cluster https://kafka.apache.org/documentation.html#newproducerconfigs On Tue, Jul 14, 2015 at 7:29 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Does anyone know what is bootstrap.servers= esv4-hcl198.grid.linkedin.com:9092 means in the following test command: bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1 acks=1 bootstrap.servers= esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196? what is bootstrap.servers? Is it the kafka server that I am running a test at? Thanks. 
Yuheng On Tue, Jul 14, 2015 at 12:18 AM, Ewen Cheslack-Postava e...@confluent.io wrote: I implemented (nearly) the same basic set of tests in the system test framework we started at Confluent and that is going to move into Kafka -- see the wip patch for KIP-25 here: https://github.com/apache/kafka/pull/70 In particular, that test is implemented in benchmark_test.py: https://github.com/apache/kafka
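Condensing Geoff's instructions from this thread into one place (this assumes the KIP-25 branch is checked out and the Vagrant-based cluster from the linked README is already provisioned):

```shell
# Install the ducktape test framework
pip install ducktape==0.2.0

# From the Kafka checkout, run the benchmark suite against the cluster
cd kafka/tests
ducktape kafkatest/tests/benchmark_test.py
```

The ImportError above is exactly what running benchmark_test.py looks like when the pip install step has been skipped.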
Re: kafka TestEndtoEndLatency
Guozhang, Thank you for explaining. I see that in ProducerPerformance, callback functions are used to get the latency metrics. For TestEndtoEndLatency, does message size matter? What does this end-to-end latency comprise, besides transferring a packet from source to destination (typically around 0.2 ms for a 100-byte message)? Thanks. Yuheng

On Wed, Jul 15, 2015 at 3:01 PM, Guozhang Wang wangg...@gmail.com wrote:
Yuheng, Only TestEndtoEndLatency's numbers are end to end; for ProducerPerformance the latency is the send-to-ack latency, which increases as batch size increases. Guozhang

On Wed, Jul 15, 2015 at 11:36 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:
In the Kafka performance tests https://gist.github.com/jkreps/c7ddb4041ef62a900e6c the TestEndtoEndLatency results are typically around 2ms, while ProducerPerformance normally shows an average latency of several hundred ms when using batch size 8196. Are both results talking about end-to-end latency, and is the ProducerPerformance latency just larger because of batching of several messages? What message size does the TestEndtoEndLatency test use? Thanks for any ideas. best,

-- -- Guozhang
Re: kafka TestEndtoEndLatency
Thanks. Here is the source code snippet of the EndtoEndLatency test:

for (i <- 0 until numMessages) {
  val begin = System.nanoTime
  producer.send(new ProducerRecord[Array[Byte],Array[Byte]](topic, message))
  val received = iter.next
  val elapsed = System.nanoTime - begin
  // poor man's progress bar
  if (i % 1000 == 0) println(i + "\t" + elapsed / 1000.0 / 1000.0)
  totalTime += elapsed
  latencies(i) = (elapsed / 1000 / 1000)
}

iter.next reads a message using a consumer process. Is the producer.send() function blocking? I think message size should still matter here, am I right? In the source code, the message is val message = "hello there beautiful".getBytes, which is around 21 bytes. I see that in this EndtoEndLatency test, no acks setting is involved, right? Thanks. Yuheng

On Wed, Jul 15, 2015 at 3:43 PM, Guozhang Wang wangg...@gmail.com wrote:
The end-to-end latency records the transfer of a message from producer to broker, then to consumer. I cannot remember the details now, but I think the EndtoEndLatency test records the latency as an average, hence it is small. Guozhang

On Wed, Jul 15, 2015 at 12:28 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Guozhang, Thank you for explaining. I see that in ProducerPerformance, callback functions are used to get the latency metrics. For TestEndtoEndLatency, does message size matter? What does this end-to-end latency comprise, besides transferring a packet from source to destination (typically around 0.2 ms for a 100-byte message)? Thanks. Yuheng

On Wed, Jul 15, 2015 at 3:01 PM, Guozhang Wang wangg...@gmail.com wrote:
Yuheng, Only TestEndtoEndLatency's numbers are end to end; for ProducerPerformance the latency is the send-to-ack latency, which increases as batch size increases. 
Guozhang

On Wed, Jul 15, 2015 at 11:36 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:
In the Kafka performance tests https://gist.github.com/jkreps/c7ddb4041ef62a900e6c the TestEndtoEndLatency results are typically around 2ms, while ProducerPerformance normally shows an average latency of several hundred ms when using batch size 8196. Are both results talking about end-to-end latency, and is the ProducerPerformance latency just larger because of batching of several messages? What message size does the TestEndtoEndLatency test use? Thanks for any ideas. best,

-- -- Guozhang

-- -- Guozhang
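The measurement pattern in the quoted snippet (timestamp, send one message, block until the consumer reads it back, timestamp again) can be sketched in Python, with an in-process queue standing in for the Kafka producer/consumer pair; everything here is illustrative, not the Kafka API:

```python
import queue
import time

def end_to_end_latencies(transport: queue.Queue, message: bytes, n: int) -> list:
    """One-message-at-a-time round trips, mirroring TestEndToEndLatency:
    only one record is ever outstanding, so no queuing delay is measured."""
    latencies_ms = []
    for _ in range(n):
        begin = time.perf_counter_ns()
        transport.put(message)       # stands in for producer.send(...)
        _received = transport.get()  # stands in for iter.next on the consumer
        elapsed = time.perf_counter_ns() - begin
        latencies_ms.append(elapsed / 1_000_000)
    return latencies_ms

lat = end_to_end_latencies(queue.Queue(), b"hello there beautiful", 1000)
print(f"avg: {sum(lat) / len(lat):.4f} ms, max: {max(lat):.4f} ms")
```

This one-record-in-flight structure is why the test reports ~2ms while ProducerPerformance, which keeps its buffer full, reports hundreds of ms.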
Re: kafka benchmark tests
Hi Geoffrey, Thank you for your detailed explanation; it is really helpful. I am thinking of going with the second way: since I have bare-metal access to all the nodes in the cluster, it's probably better to run real slave machines instead of virtual machines (correct me if I am wrong). Each of my nodes has 256G of RAM and 2T of disk space; how large will the slave virtual machines be, and how much memory will they take? Thank you! best, Yuheng

On Wed, Jul 15, 2015 at 4:19 PM, Geoffrey Anderson ge...@confluent.io wrote:
Hi Yuheng, Yes, you should be able to run on either Mac or Linux. The test cluster consists of a test-driver machine and some number of slave machines. Right now, there are roughly two ways to set up the slave machines: 1) Slave machines are virtual machines *on* the test-driver machine. 2) Slave machines are external to the test-driver machine. 1 is the simplest to set up, but yes, it does require installation of the virtual machines on the test-driver machine. The installation of these machines is outlined in the quickstart I mentioned (here is a better link for the test README: https://github.com/confluentinc/kafka/tree/KAFKA-2276/tests). The tool we're using to bring up the slave virtual machines is called Vagrant, so the vagrant steps in the quickstart are really telling you how to install the virtual machines. Hope that helps! Cheers, Geoff

On Wed, Jul 15, 2015 at 12:13 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Hi Geoffrey, Thank you for your helpful information. Do I have to install the virtual machines? I am using a Mac as the test-driver machine; can I use a Linux machine to run the test driver too? Thanks. best, Yuheng

On Wed, Jul 15, 2015 at 2:55 PM, Geoffrey Anderson ge...@confluent.io wrote:
Hi Yuheng, Running these tests requires a tool we've created at Confluent called 'ducktape', which you need to install with the command: pip install ducktape==0.2.0 Running the tests locally requires some setup (creation of virtual machines etc.) 
which is outlined here: https://github.com/apache/kafka/pull/70/files#diff-62f0ff60ede3b78b9c95624e2f61d6c1 The instructions in the quickstart show you how to run the tests on a cluster of virtual machines (on a single host). Once you have a cluster up and running, you'll be able to run the test you're interested in: cd kafka/tests ducktape kafkatest/tests/benchmark_test.py Definitely keep us posted about which parts are difficult, annoying, or confusing about this process and we'll do our best to help. Thanks, Geoff

On Wed, Jul 15, 2015 at 12:49 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Jiefu, Have you tried to run benchmark_test.py? I ran it and it fails to import ducktape.services.service:

yuhengdu@consumer0:/packages/kafka_2.10-0.8.2.1$ python benchmark_test.py
Traceback (most recent call last):
  File "benchmark_test.py", line 16, in <module>
    from ducktape.services.service import Service
ImportError: No module named ducktape.services.service

Can you help me on getting it to work, Ewen? Thanks. best, Yuheng

On Tue, Jul 14, 2015 at 11:28 PM, Ewen Cheslack-Postava e...@confluent.io wrote:
@Jiefu, yes! The patch is functional, I think it's just waiting on a bit of final review after the last round of changes. You can definitely use it for your own benchmarking, and we'd love to see patches for any additional tests we missed in the first pass! -Ewen

On Tue, Jul 14, 2015 at 10:53 AM, JIEFU GONG jg...@berkeley.edu wrote:
Yuheng, I would recommend looking here: http://kafka.apache.org/documentation.html#brokerconfigs and scrolling down to get a better understanding of the default settings and what they mean -- it'll tell you what the different options for acks do. Ewen, Thank you immensely for your thoughts, they shed a lot of insight into the issue. Though it is understandable that your specific results need to be verified, it seems that the KIP-25 patch is functional and I can use it for my own benchmarking purposes? Is that correct? Thanks again! 
On Tue, Jul 14, 2015 at 8:22 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: Also, I guess setting the target throughput to -1 means let it be as high as possible? On Tue, Jul 14, 2015 at 10:36 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: Thanks. If I set the acks=1 in the producer config options in bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1 acks=1 bootstrap.servers= esv4-hcl198
Re: Latency test
Thank you, Tao! On Jul 15, 2015 6:27 PM, Tao Feng fengta...@gmail.com wrote: Sorry Yuheng, you should change it in $KAFKA_HEAP_OPTS. On Wed, Jul 15, 2015 at 3:09 PM, Tao Feng fengta...@gmail.com wrote: Hi Yuheng, You could add the -Xmx1024m in https://github.com/apache/kafka/blob/trunk/bin/kafka-run-class.sh under KAFKA_JVM_PERFORMANCE_OPTS. On Wed, Jul 15, 2015 at 12:51 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: Tao, If I run the following command on the command line: bin/kafka-run-class.sh kafka.tools.TestEndToEndLatency 192.168.1.3:9092 192.168.1.1:2181 speedx3 5000 100 1 -Xmx1024m it reports an error. So where should I put the -Xmx1024m option? Thanks. On Wed, Jul 15, 2015 at 3:44 AM, Tao Feng fengta...@gmail.com wrote: (Please correct me if I am wrong.) Based on TestEndToEndLatency (https://github.com/apache/kafka/blob/trunk/core/src/test/scala/kafka/tools/TestEndToEndLatency.scala), consumer_fetch_max_wait corresponds to fetch.wait.max.ms in the consumer config (http://kafka.apache.org/documentation.html#consumerconfigs). The default value listed in the documentation is 100 (ms). To add Java heap space to the JVM, pass -Xmx<size> (max heap size) as a JVM option. On Wed, Jul 15, 2015 at 12:29 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: Tao, Thanks. The example on https://gist.github.com/jkreps/c7ddb4041ef62a900e6c is outdated already. The error message shows: USAGE: java kafka.tools.TestEndToEndLatency$ broker_list zookeeper_connect topic num_messages consumer_fetch_max_wait producer_acks Can anyone tell me what should be put in consumer_fetch_max_wait? Thanks. On Tue, Jul 14, 2015 at 5:21 PM, Tao Feng fengta...@gmail.com wrote: I think the ProducerPerformance microbenchmark only measures between clients and brokers (producer to brokers) and provides latency information. On Tue, Jul 14, 2015 at 11:05 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: Currently, the latency test from Kafka tests the end-to-end latency between producers and consumers.
Is there a way to test the producer-to-broker and broker-to-consumer delay separately? Thanks.
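One way to split the end-to-end figure, following the shape of the measurement loop discussed above: record a timestamp when the message is sent, when the producer's acknowledgement arrives, and when the consumer receives it. The sketch below stands in an in-memory queue for the broker so it is self-contained; with a real cluster the ack timestamp would be taken in the producer's send callback. All names here are illustrative.

```python
# Sketch: split end-to-end latency into a producer->broker leg (send to ack)
# and a broker->consumer leg (ack to receive). The "broker" is an in-memory
# queue so the example runs anywhere; the timings are therefore tiny.
import time
from queue import Queue

broker = Queue()

def send(msg):
    t_send = time.monotonic()
    broker.put(msg)              # stand-in for producer.send(...)
    t_ack = time.monotonic()     # stand-in for the ack callback firing
    return t_send, t_ack

def receive():
    msg = broker.get()           # stand-in for the consumer polling
    return msg, time.monotonic()

t_send, t_ack = send(b"x" * 100)
_, t_recv = receive()

produce_leg = (t_ack - t_send) * 1000   # producer -> broker, ms
consume_leg = (t_recv - t_ack) * 1000   # broker -> consumer, ms
end_to_end = (t_recv - t_send) * 1000
print(f"produce {produce_leg:.3f} ms, consume {consume_leg:.3f} ms, e2e {end_to_end:.3f} ms")
```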
Re: kafka benchmark tests
Thanks. If I set acks=1 in the producer config options in bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196, does that mean that for each message generated at the producer, the producer will wait until the broker sends the ack back before sending another message? Thanks. Yuheng On Tue, Jul 14, 2015 at 10:06 AM, Manikumar Reddy ku...@nmsworks.co.in wrote: Yes, A list of Kafka server host/port pairs to use for establishing the initial connection to the Kafka cluster: https://kafka.apache.org/documentation.html#newproducerconfigs On Tue, Jul 14, 2015 at 7:29 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Does anyone know what bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 means in the following test command: bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196? What is bootstrap.servers? Is it the Kafka server that I am running the test against? Thanks. Yuheng On Tue, Jul 14, 2015 at 12:18 AM, Ewen Cheslack-Postava e...@confluent.io wrote: I implemented (nearly) the same basic set of tests in the system test framework we started at Confluent and that is going to move into Kafka -- see the WIP patch for KIP-25 here: https://github.com/apache/kafka/pull/70 In particular, that test is implemented in benchmark_test.py: https://github.com/apache/kafka/pull/70/files#diff-ca984778cf9943407645eb6784f19dc8 Hopefully once that's merged people can reuse that benchmark (and add to it!) so they can easily run the same benchmarks across different hardware. Here are some results from an older version of that test on m3.2xlarge instances on EC2 using local ephemeral storage (I think...
it's been awhile since I ran these numbers and I didn't document methodology that carefully):
INFO:_.KafkaBenchmark:=================
INFO:_.KafkaBenchmark:BENCHMARK RESULTS
INFO:_.KafkaBenchmark:=================
INFO:_.KafkaBenchmark:Single producer, no replication: 684097.470208 rec/sec (65.24 MB/s)
INFO:_.KafkaBenchmark:Single producer, async 3x replication: 667494.359673 rec/sec (63.66 MB/s)
INFO:_.KafkaBenchmark:Single producer, sync 3x replication: 116485.764275 rec/sec (11.11 MB/s)
INFO:_.KafkaBenchmark:Three producers, async 3x replication: 1696519.022182 rec/sec (161.79 MB/s)
INFO:_.KafkaBenchmark:Message size:
INFO:_.KafkaBenchmark: 10: 1637825.195625 rec/sec (15.62 MB/s)
INFO:_.KafkaBenchmark: 100: 605504.877911 rec/sec (57.75 MB/s)
INFO:_.KafkaBenchmark: 1000: 90351.817570 rec/sec (86.17 MB/s)
INFO:_.KafkaBenchmark: 10000: 8306.180862 rec/sec (79.21 MB/s)
INFO:_.KafkaBenchmark: 100000: 978.403499 rec/sec (93.31 MB/s)
INFO:_.KafkaBenchmark:Throughput over long run, data > memory:
INFO:_.KafkaBenchmark: Time block 0: 684725.151324 rec/sec (65.30 MB/s)
INFO:_.KafkaBenchmark:Single consumer: 701031.14 rec/sec (56.830500 MB/s)
INFO:_.KafkaBenchmark:Three consumers: 3304011.014900 rec/sec (267.830800 MB/s)
INFO:_.KafkaBenchmark:Producer + consumer:
INFO:_.KafkaBenchmark: Producer: 624984.375391 rec/sec (59.60 MB/s)
INFO:_.KafkaBenchmark: Consumer: 624984.375391 rec/sec (59.60 MB/s)
INFO:_.KafkaBenchmark:End-to-end latency: median 2.00 ms, 99%: 4.00 ms, 99.9%: 19.00 ms
Don't trust these numbers for anything; they were a quick one-off test. I'm just pasting the output so you get some idea of what the results might look like. Once we merge the KIP-25 patch, Confluent will be running the tests regularly and results will be available publicly, so we'll be able to keep better tabs on performance, albeit for only a specific class of hardware.
For the batch.size question -- I'm not sure the results in the blog post actually have different settings; it could be accidental divergence between the script and the blog post. The post specifically notes that tuning the batch size in the synchronous case might help, but that he didn't do that. If you're trying to benchmark the *optimal* throughput, tuning the batch size would make sense. Since synchronous replication will have higher latency and there's a limit to how many requests can be in flight at once, you'll want a larger batch size to compensate for the additional latency. However, in practice the increase you see may be negligible. Somebody who has spent more time tweaking producer performance may have more insight. -Ewen On Mon, Jul 13, 2015 at 10:08 AM, JIEFU GONG jg...@berkeley.edu wrote: Hi all, I was wondering if any of you guys have done
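The MB/s figures in the benchmark output above follow directly from the rec/sec figures: bytes per second is records per second times the 100-byte message size, and the tool reports mebibytes (2^20 bytes). A quick cross-check:

```python
# Cross-check of the rec/sec vs MB/s figures in the benchmark output above:
# MB/s = rec/sec * message_size_bytes / 2**20 (the tool reports mebibytes).
MSG_SIZE = 100  # bytes, per the ProducerPerformance arguments used here

def mb_per_sec(recs_per_sec, msg_size=MSG_SIZE):
    return recs_per_sec * msg_size / 2**20

print(round(mb_per_sec(684097.470208), 2))  # single producer, no replication -> 65.24
print(round(mb_per_sec(667494.359673), 2))  # async 3x replication -> 63.66
print(round(mb_per_sec(116485.764275), 2))  # sync 3x replication -> 11.11
```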
Re: kafka benchmark tests
Also, I guess setting the target throughput to -1 means letting it be as high as possible? On Tue, Jul 14, 2015 at 10:36 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: Thanks. If I set acks=1 in the producer config options in bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196, does that mean that for each message generated at the producer, the producer will wait until the broker sends the ack back before sending another message? Thanks. Yuheng On Tue, Jul 14, 2015 at 10:06 AM, Manikumar Reddy ku...@nmsworks.co.in wrote: Yes, A list of Kafka server host/port pairs to use for establishing the initial connection to the Kafka cluster: https://kafka.apache.org/documentation.html#newproducerconfigs On Tue, Jul 14, 2015 at 7:29 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Does anyone know what bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 means in the following test command: bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196? What is bootstrap.servers? Is it the Kafka server that I am running the test against? Thanks. Yuheng On Tue, Jul 14, 2015 at 12:18 AM, Ewen Cheslack-Postava e...@confluent.io wrote: I implemented (nearly) the same basic set of tests in the system test framework we started at Confluent and that is going to move into Kafka -- see the WIP patch for KIP-25 here: https://github.com/apache/kafka/pull/70 In particular, that test is implemented in benchmark_test.py: https://github.com/apache/kafka/pull/70/files#diff-ca984778cf9943407645eb6784f19dc8 Hopefully once that's merged people can reuse that benchmark (and add to it!) so they can easily run the same benchmarks across different hardware.
Re: kafka benchmark tests
Does anyone know what bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 means in the following test command: bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196? What is bootstrap.servers? Is it the Kafka server that I am running the test against? Thanks. Yuheng On Tue, Jul 14, 2015 at 12:18 AM, Ewen Cheslack-Postava e...@confluent.io wrote: I implemented (nearly) the same basic set of tests in the system test framework we started at Confluent and that is going to move into Kafka -- see the WIP patch for KIP-25 here: https://github.com/apache/kafka/pull/70 In particular, that test is implemented in benchmark_test.py: https://github.com/apache/kafka/pull/70/files#diff-ca984778cf9943407645eb6784f19dc8 Hopefully once that's merged people can reuse that benchmark (and add to it!) so they can easily run the same benchmarks across different hardware. Here are some results from an older version of that test on m3.2xlarge instances on EC2 using local ephemeral storage (I think...
it's been awhile since I ran these numbers and I didn't document methodology that carefully); the full results are identical to the benchmark output quoted earlier in this thread.
-Ewen On Mon, Jul 13, 2015 at 10:08 AM, JIEFU GONG jg...@berkeley.edu wrote: Hi all, I was wondering if any of you guys have done benchmarks on Kafka performance before, and whether they or their details (# nodes in cluster, # records / size(s) of messages, etc.) could be shared. For comparison purposes, I am trying to benchmark Kafka against some similar services such as Kinesis or Scribe. Additionally, I was wondering if anyone could shed some light on the benchmarks Jay Kreps has openly published here: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines Specifically, I am unsure why, between his tests of 3x synchronous replication and 3x async replication, he changed the batch.size, as well as why he is seemingly publishing to incorrect topics. Configs: https://gist.github.com/jkreps/c7ddb4041ef62a900e6c Any help is greatly appreciated! -- Jiefu Gong University of California, Berkeley | Class of 2017 B.A. Computer Science | College of Letters and Sciences
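Ewen's point in this thread, that a larger batch size compensates for the higher acknowledgement latency of synchronous replication, follows from a Little's-law style bound: with at most N requests in flight, each carrying one batch, throughput is capped near N * batch_bytes / latency. A back-of-envelope sketch; the latencies and the in-flight limit below are illustrative assumptions, not measurements:

```python
# Back-of-envelope bound on producer throughput when requests must be
# acknowledged: with `in_flight` outstanding requests of `batch_bytes` each
# and a round-trip latency `rtt_s`, throughput is at most
# in_flight * batch_bytes / rtt_s. All numbers are illustrative assumptions.
def max_throughput_mb(batch_bytes, rtt_s, in_flight=5):
    return in_flight * batch_bytes / rtt_s / 2**20

async_cap = max_throughput_mb(batch_bytes=8196, rtt_s=0.002)  # assume ~2 ms ack
sync_cap = max_throughput_mb(batch_bytes=8196, rtt_s=0.010)   # assume ~10 ms ack
print(f"async cap ~{async_cap:.1f} MB/s, sync cap ~{sync_cap:.1f} MB/s")
# A 5x larger batch restores the cap under the 5x higher latency:
print(f"sync cap with 5x batch ~{max_throughput_mb(5 * 8196, 0.010):.1f} MB/s")
```

This is only an upper bound on what the producer can push through one connection; it illustrates why the synchronous-replication numbers in the blog post are so much lower at the same batch size, not what any given cluster will actually achieve.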
Re: How to run the three producers test
I checked the logs on the brokers; it seems that the zookeeper or the kafka server process is not running on this broker... Thank you guys. I will see if it happens again. On Tue, Jul 14, 2015 at 4:53 PM, JIEFU GONG jg...@berkeley.edu wrote: Hmm.. yeah, some error logs would be nice, like Gwen pointed out. Do any of your brokers fall out of the ISR when sending messages? It seems like your setup should be fine, so I'm not entirely sure. On Tue, Jul 14, 2015 at 1:31 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Jiefu, I am performing these tests on a 6-node cluster in CloudLab (an infrastructure built for scientific research). I use 2 nodes as producers, 2 as brokers only, and 2 as consumers. I have tested each individual machine and they work well. I did not use AWS. Thank you! On Tue, Jul 14, 2015 at 4:20 PM, JIEFU GONG jg...@berkeley.edu wrote: Yuheng, are you performing these tests locally or using a service such as AWS? I'd try using each separate machine individually first, connecting to the ZK/Kafka servers and ensuring that each is able to first log and consume messages independently. On Tue, Jul 14, 2015 at 1:17 PM, Gwen Shapira gshap...@cloudera.com wrote: Are there any errors in the broker logs? On Tue, Jul 14, 2015 at 11:57 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: Jiefu, Thank you. The three producers can run at the same time. I mean, should they be started at exactly the same time? (I have three consoles, one from each of the three machines, and I just start the console command manually one by one.) Or does a small variation in the starting time not matter? Gwen and Jiefu, I have started the three producers on three machines; after a while, all of them give a java.net.ConnectException: [2015-07-14 12:56:46,352] WARN Error in I/O with producer0-link-0/ 192.168.1.1 (org.apache.kafka.common.network.Selector) java.net.ConnectException: Connection refused..
[2015-07-14 12:56:48,056] WARN Error in I/O with producer1-link-0/ 192.168.1.2 (org.apache.kafka.common.network.Selector) java.net.ConnectException: Connection refused. What could be the cause? Thank you guys! On Tue, Jul 14, 2015 at 2:47 PM, JIEFU GONG jg...@berkeley.edu wrote: Yuheng, Yes, if you read the blog post it specifies that he's using three separate machines. There's no reason the producers cannot be started at the same time, I believe. On Tue, Jul 14, 2015 at 11:42 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: Hi, I am running the performance test for Kafka: https://gist.github.com/jkreps/c7ddb4041ef62a900e6c For the Three Producers, 3x async replication scenario, the command is the same as for one producer: bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 5000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196 So how do I run the test for three producers? Do I just run them on three separate servers at the same time? Will there be some error in this way, since the three producers can't be started at exactly the same time? Thanks. best, Yuheng -- Jiefu Gong University of California, Berkeley | Class of 2017 B.A. Computer Science | College of Letters and Sciences jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427
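As Jiefu says above, the three-producer scenario is just the single-producer command run concurrently on three machines; exact simultaneity is not required. A small sketch that builds the identical per-host command line (host names, topic, record count, and the bootstrap address are placeholders, not values from the blog post):

```python
# Build the same ProducerPerformance command for each producer host; in the
# three-producer test these are simply started on three machines at roughly
# the same time (a small skew in start times does not matter). The host
# names, topic, record count, and bootstrap address are placeholders.
BOOTSTRAP = "broker-0.example.com:9092"  # placeholder bootstrap server
HOSTS = ["producer0", "producer1", "producer2"]

def perf_command(topic="test", records=50_000_000, size=100):
    return (
        "bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance "
        f"{topic} {records} {size} -1 acks=1 bootstrap.servers={BOOTSTRAP} "
        "buffer.memory=67108864 batch.size=8196"
    )

commands = {host: perf_command() for host in HOSTS}
for host, cmd in commands.items():
    print(f"{host}: {cmd}")
```

In practice one would run each printed command over ssh (or a tool like pssh) on its host; the per-producer results are then summed to get the aggregate throughput.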
Re: How to run the three producers test
Also, the log in another broker (not the bootstrap) says: [2015-07-14 15:18:41,220] FATAL [Replica Manager on Broker 1]: Error writing to highwatermark file: (kafka.server.ReplicaManager) [2015-07-14 15:18:40,160] ERROR Closing socket for /130.127.133.47 because of error (kafka.network.Processor) java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) at sun.nio.ch.IOUtil.read(IOUtil.java:197) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) at kafka.utils.Utils$.read(Utils.scala:380) at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54) at kafka.network.Processor.read(SocketServer.scala:444) at kafka.network.Processor.run(SocketServer.scala:340) at java.lang.Thread.run(Thread.java:745) java.io.IOException: No space left on device at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:345) at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) at sun.nio.cs.StreamEncoder.implFlushBuffe On Tue, Jul 14, 2015 at 5:24 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Hi Jiefu, Gwen, I am running the Throughput versus stored data test: bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 500 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196 After around 50,000,000 messages were sent, I got a bunch of connection refused errors as I mentioned before. I checked the logs on the broker and here is what I see: [2015-07-14 15:11:23,578] WARN Partition [test,4] on broker 5: No checkpointed highwatermark is found for partition [test,4] (kafka.cluster.Partition) [2015-07-14 15:12:33,298] INFO Rolled new log segment for 'test-4' in 4 ms.
(kafka.log.Log) [2015-07-14 15:12:33,299] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log) [2015-07-14 15:13:39,529] INFO Rolled new log segment for 'test-4' in 1 ms. (kafka.log.Log) [2015-07-14 15:13:39,531] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log) [2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-4' in 3 ms. (kafka.log.Log) [2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log) [2015-07-14 15:15:51,478] INFO Rolled new log segment for 'test-4' in 1 ms. (kafka.log.Log) [2015-07-14 15:15:51,479] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log) [2015-07-14 15:16:52,589] INFO Rolled new log segment for 'test-4' in 1 ms. (kafka.log.Log) [2015-07-14 15:16:52,590] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log) [2015-07-14 15:17:57,406] INFO Rolled new log segment for 'test-4' in 1 ms. (kafka.log.Log) [2015-07-14 15:17:57,407] INFO Rolled new log segment for 'test-0' in 0 ms. 
(kafka.log.Log) [2015-07-14 15:18:39,792] FATAL [KafkaApi-5] Halting due to unrecoverable I/O error while handling produce request: (kafka.server.KafkaApis) kafka.common.KafkaStorageException: I/O exception in append to log 'test-0' at kafka.log.Log.append(Log.scala:266) at kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:379) at kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:365) at kafka.utils.Utils$.inLock(Utils.scala:535) at kafka.utils.Utils$.inReadLock(Utils.scala:541) at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:365) at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:291) at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:282) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.coll Can you help me with this problem? Thanks. On Tue, Jul 14, 2015 at 5:12 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: I checked the logs on the brokers, it seems that the zookeeper or the kafka server process is not running on this broker...Thank you guys. I will see if it happens again. On Tue, Jul 14, 2015 at 4:53 PM, JIEFU GONG jg...@berkeley.edu wrote: Hmm..yeah some error logs would be nice like Gwen pointed out. Do any of your brokers fall out of the ISR when sending messages? It seems like your setup should be fine, so I'm not entirely sure. On Tue, Jul 14, 2015 at 1:31 PM
Re: How to run the three producers test
Hi Jiefu, Gwen, I am running the Throughput versus stored data test: bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 500 100 -1 acks=1 bootstrap.servers= esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196 After around 50,000,000 messages were sent, I got a bunch of connection refused error as I mentioned before. I checked the logs on the broker and here is what I see: [2015-07-14 15:11:23,578] WARN Partition [test,4] on broker 5: No checkpointed highwatermark is found for partition [test,4] (kafka.cluster.Partition) [2015-07-14 15:12:33,298] INFO Rolled new log segment for 'test-4' in 4 ms. (kafka.log.Log) [2015-07-14 15:12:33,299] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log) [2015-07-14 15:13:39,529] INFO Rolled new log segment for 'test-4' in 1 ms. (kafka.log.Log) [2015-07-14 15:13:39,531] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log) [2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-4' in 3 ms. (kafka.log.Log) [2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log) [2015-07-14 15:15:51,478] INFO Rolled new log segment for 'test-4' in 1 ms. (kafka.log.Log) [2015-07-14 15:15:51,479] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log) [2015-07-14 15:16:52,589] INFO Rolled new log segment for 'test-4' in 1 ms. (kafka.log.Log) [2015-07-14 15:16:52,590] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log) [2015-07-14 15:17:57,406] INFO Rolled new log segment for 'test-4' in 1 ms. (kafka.log.Log) [2015-07-14 15:17:57,407] INFO Rolled new log segment for 'test-0' in 0 ms. 
(kafka.log.Log) [2015-07-14 15:18:39,792] FATAL [KafkaApi-5] Halting due to unrecoverable I/O error while handling produce request: (kafka.server.KafkaApis) kafka.common.KafkaStorageException: I/O exception in append to log 'test-0' at kafka.log.Log.append(Log.scala:266) at kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:379) at kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:365) at kafka.utils.Utils$.inLock(Utils.scala:535) at kafka.utils.Utils$.inReadLock(Utils.scala:541) at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:365) at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:291) at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:282) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.coll Can you help me with this problem? Thanks. On Tue, Jul 14, 2015 at 5:12 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: I checked the logs on the brokers, it seems that the zookeeper or the kafka server process is not running on this broker...Thank you guys. I will see if it happens again. On Tue, Jul 14, 2015 at 4:53 PM, JIEFU GONG jg...@berkeley.edu wrote: Hmm..yeah some error logs would be nice like Gwen pointed out. Do any of your brokers fall out of the ISR when sending messages? It seems like your setup should be fine, so I'm not entirely sure. On Tue, Jul 14, 2015 at 1:31 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Jiefu, I am performing these tests on a 6 nodes cluster in cloudlab (a infrastructure built for scientific research). I use 2 nodes as producers, 2 as brokers only, and 2 as consumers. I have tested for each individual machines and they work well. 
I did not use AWS. Thank you! On Tue, Jul 14, 2015 at 4:20 PM, JIEFU GONG jg...@berkeley.edu wrote: Yuheng, are you performing these tests locally or using a service such as AWS? I'd try using each separate machine individually first, connecting to the ZK/Kafka servers and ensuring that each is able to first log and consume messages independently. On Tue, Jul 14, 2015 at 1:17 PM, Gwen Shapira gshap...@cloudera.com wrote: Are there any errors in the broker logs? On Tue, Jul 14, 2015 at 11:57 AM, Yuheng Du yuheng.du.h...@gmail.com wrote: Jiefu, Thank you. The three producers can run at the same time. I mean, should they be started at exactly the same time? (I have three consoles, one from each of the three machines, and I just start the console command manually one by one.) Or does a small variation in the starting time not matter? Gwen and Jiefu, I have started the three producers on three machines; after a while, all of them give a java.net.ConnectException: [2015-07-14 12:56:46,352] WARN Error in I/O with producer0-link
Re: How to run the three producers test
But is there a way to let Kafka overwrite the old data if the disk is filled? Or is it not necessary to use this figure? Thanks. On Tue, Jul 14, 2015 at 10:14 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Jiefu, I agree with you. I checked the hardware specs of my machines; each one of them has: RAM: 256 GB ECC memory (16x 16 GB DDR4 1600 MT/s dual-rank RDIMMs); Disk: two 1 TB 7.2K RPM 3G SATA HDDs. For the throughput-versus-stored-data test, it uses 5*10^10 messages, which have a total volume of 5 TB. I set the replication factor to 3, which means the total size including replicas would be 15 TB, which apparently overwhelmed the two brokers I use. Thanks. best, Yuheng On Tue, Jul 14, 2015 at 6:06 PM, JIEFU GONG jg...@berkeley.edu wrote: Someone may correct me if I am incorrect, but how much disk space do you have on these nodes? Your exception 'No space left on device' from one of your brokers seems to suggest that you're full (after all, you're writing 500 million records). If this is the case, I believe the expected behavior for Kafka is to reject any more attempts to write data?
On Tue, Jul 14, 2015 at 2:27 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Also, the log in another broker (not the bootstrap) says: [2015-07-14 15:18:41,220] FATAL [Replica Manager on Broker 1]: Error writing to highwatermark file: (kafka.server.ReplicaManager) [2015-07-14 15:18:40,160] ERROR Closing socket for /130.127.133.47 because of error (kafka.network.Process or) java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) at sun.nio.ch.IOUtil.read(IOUtil.java:197) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) at kafka.utils.Utils$.read(Utils.scala:380) at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54) at kafka.network.Processor.read(SocketServer.scala:444) at kafka.network.Processor.run(SocketServer.scala:340) at java.lang.Thread.run(Thread.java:745) java.io.IOException: No space left on device at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:345) at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) at sun.nio.cs.StreamEncoder.implFlushBuffe (END) On Tue, Jul 14, 2015 at 5:24 PM, Yuheng Du yuheng.du.h...@gmail.com wrote: Hi Jiefu, Gwen, I am running the Throughput versus stored data test: bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 500 100 -1 acks=1 bootstrap.servers= esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196 After around 50,000,000 messages were sent, I got a bunch of connection refused error as I mentioned before. I checked the logs on the broker and here is what I see: [2015-07-14 15:11:23,578] WARN Partition [test,4] on broker 5: No checkpointed highwatermark is found for partition [test,4] (kafka.cluster.Partition) [2015-07-14 15:12:33,298] INFO Rolled new log segment for 'test-4' in 4 ms. 
(kafka.log.Log) [2015-07-14 15:12:33,299] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log) [2015-07-14 15:13:39,529] INFO Rolled new log segment for 'test-4' in 1 ms. (kafka.log.Log) [2015-07-14 15:13:39,531] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log) [2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-4' in 3 ms. (kafka.log.Log) [2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log) [2015-07-14 15:15:51,478] INFO Rolled new log segment for 'test-4' in 1 ms. (kafka.log.Log) [2015-07-14 15:15:51,479] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log) [2015-07-14 15:16:52,589] INFO Rolled new log segment for 'test-4' in 1 ms. (kafka.log.Log) [2015-07-14 15:16:52,590] INFO Rolled new log segment for 'test-0' in 1 ms. (kafka.log.Log) [2015-07-14 15:17:57,406] INFO Rolled new log segment for 'test-4' in 1 ms. (kafka.log.Log) [2015-07-14 15:17:57,407] INFO Rolled new log segment for 'test-0' in 0 ms. (kafka.log.Log) [2015-07-14 15:18:39,792] FATAL [KafkaApi-5] Halting due to unrecoverable I/O error while handling produce request: (kafka.server.KafkaApis) kafka.common.KafkaStorageException: I/O exception in append to log 'test-0' at kafka.log.Log.append(Log.scala:266) at kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:379) at kafka.cluster.Partition
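On the question of overwriting old data before the disk fills: Kafka's size-based retention does this per partition — the oldest log segments are deleted once a partition exceeds log.retention.bytes. A sketch of the relevant broker settings (the 100 GB cap and the temp-file path are illustrative values, not from the thread):

```shell
# Write an example retention config to a scratch file (path is illustrative;
# in practice these lines go into config/server.properties).
cat > /tmp/retention-example.properties <<'EOF'
# Delete the oldest segments once a single partition grows past ~100 GB:
log.retention.bytes=107374182400
# Keep time-based retention as a backstop (default is 168 hours):
log.retention.hours=168
EOF
cat /tmp/retention-example.properties
```

Note that the cap applies per partition, so with many partitions per disk the value has to be sized accordingly.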
Re: How to run the three producers test
Jiefu, I agree with you. I checked the hardware specs of my machines; each one of them has: RAM: 256 GB ECC Memory (16x 16 GB DDR4 1600MT/s dual rank RDIMMs); Disk: Two 1 TB 7.2K RPM 3G SATA HDDs. For the throughput versus stored data test, it uses 5*10^10 messages, which have a total volume of 5 TB; since I set the replication factor to 3, the total size including replicas would be 15 TB, which apparently overwhelmed the two brokers I use. Thanks. best, Yuheng
On Tue, Jul 14, 2015 at 6:06 PM, JIEFU GONG jg...@berkeley.edu wrote:
Someone may correct me if I am incorrect, but how much disk space do you have on these nodes? Your exception 'No space left on device' from one of your brokers seems to suggest that you're full (after all, you're writing 500 million records). If this is the case, I believe the expected behavior for Kafka is to reject any more attempts to write data?
On Tue, Jul 14, 2015 at 2:27 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:
[earlier quoted messages and broker logs trimmed; they are reproduced in full in the message above]
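The storage estimate above is easy to verify; a quick check of the thread's numbers (5*10^10 records of 100 bytes each, replication factor 3, decimal TB):

```shell
records=50000000000     # 5*10^10 messages, per the thread
bytes_per_record=100
replication=3
raw_tb=$(( records * bytes_per_record / 1000000000000 ))
total_tb=$(( raw_tb * replication ))
echo "raw: ${raw_tb} TB, with replication: ${total_tb} TB"
```

This prints `raw: 5 TB, with replication: 15 TB` — well past the roughly 4 TB of raw disk across two brokers with two 1 TB drives each, which is why the brokers ran out of space.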
Re: How to run the three producers test
Jiefu, Now even if the disk space is enough (less than 18% used), when I run it, it still gives me an error. The logs say:
[2015-07-14 23:08:48,735] FATAL Fatal error during KafkaServerStartable startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 6000
  at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:880)
  at org.I0Itec.zkclient.ZkClient.init(ZkClient.java:98)
  at org.I0Itec.zkclient.ZkClient.init(ZkClient.java:84)
  at kafka.server.KafkaServer.initZk(KafkaServer.scala:157)
  at kafka.server.KafkaServer.startup(KafkaServer.scala:82)
  at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:29)
  at kafka.Kafka$.main(Kafka.scala:46)
  at kafka.Kafka.main(Kafka.scala)
[2015-07-14 23:08:48,737] INFO [Kafka Server 1], shutting down (kafka.server.KafkaServer)
I have checked that the zookeeper is running fine. Can anyone help explain why I got this error? Thanks.
On Tue, Jul 14, 2015 at 10:24 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:
[earlier quoted messages and broker logs trimmed; they are reproduced in full in the messages above]
How to run the three producers test
Hi, I am running the performance test for kafka: https://gist.github.com/jkreps/c7ddb4041ef62a900e6c
For the "Three Producers, 3x async replication" scenario, the command is the same as for one producer: bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 5000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
So how do I run the test for three producers? Do I just run them on three separate servers at the same time? Will there be some error if the three producers can't be started at exactly the same time? Thanks. best, Yuheng
Re: How to run the three producers test
Jiefu, Thank you. I understand the three producers can run at the same time; I mean, should they be started at exactly the same time? (I have three consoles, one for each of the three machines, and I just start the command manually on each one.) Or does a small variation in the starting time not matter?
Gwen and Jiefu, I have started the three producers on three machines; after a while, all of them give a java.net.ConnectException:
[2015-07-14 12:56:46,352] WARN Error in I/O with producer0-link-0/ 192.168.1.1 (org.apache.kafka.common.network.Selector) java.net.ConnectException: Connection refused.
[2015-07-14 12:56:48,056] WARN Error in I/O with producer1-link-0/ 192.168.1.2 (org.apache.kafka.common.network.Selector) java.net.ConnectException: Connection refused.
What could be the cause? Thank you guys!
On Tue, Jul 14, 2015 at 2:47 PM, JIEFU GONG jg...@berkeley.edu wrote:
Yuheng, Yes, if you read the blog post it specifies that he's using three separate machines. There's no reason the producers cannot be started at the same time, I believe.
On Tue, Jul 14, 2015 at 11:42 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:
[original question trimmed; see the message above]
-- Jiefu Gong University of California, Berkeley | Class of 2017 B.A Computer Science | College of Letters and Sciences jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427
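For what it's worth, exact start-time alignment isn't needed for a throughput benchmark — the steady-state rate dominates a long run. A common pattern is to launch all three producers from one console in the background and wait for them. A minimal sketch (the hostnames and the echoed command are placeholders for the real ssh + ProducerPerformance invocations):

```shell
# Launch one load generator per host concurrently; stand-in echo replaces
# something like: ssh "$host" bin/kafka-run-class.sh ...ProducerPerformance ... &
for host in producer0 producer1 producer2; do
  echo "launching producer on $host" &
done
wait   # block until all three background jobs finish
echo "all producers done"
```

The `&`/`wait` pair starts the three runs within milliseconds of each other, which is more than close enough for this test.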
Latency test
Currently, the latency test in kafka measures the end-to-end latency between producers and consumers. Is there a way to test the producer-to-broker and broker-to-consumer delays separately? Thanks.
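One hedged approach (not from the thread): time a synchronous send — with acks=1 the round trip to the broker approximates the producer-to-broker leg — and subtract it from the end-to-end figure for a rough broker-to-consumer estimate. The sketch below only shows the timing harness, with sleep standing in for the blocking send; it assumes GNU date for nanosecond timestamps:

```shell
# Time one blocking operation; with a real client this would be a send()
# that returns only after the broker acks the write (acks=1).
start_ns=$(date +%s%N)
sleep 0.05            # stand-in for the blocking send
end_ns=$(date +%s%N)
echo "producer->broker ack latency: $(( (end_ns - start_ns) / 1000000 )) ms"
```

This is only an approximation — the ack path and the consumer fetch path aren't identical — but it separates the two legs well enough for a first look.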
Re: performance benchmarking of kafka
Hi, Appreciate your response. It works now! It was just a typo in the class name :(. It really has nothing to do with whether you are using the binaries or the source version of kafka. Thanks everyone!
On Mon, Jul 13, 2015 at 11:18 PM, tao xiao xiaotao...@gmail.com wrote:
org.apache.kafka.clients.tools.ProducerPerformance resides in kafka-clients-0.8.2.1.jar. You need to make sure the jar exists in $KAFKA_HOME/libs/. I use kafka_2.10-0.8.2.1 too and here is the output: % bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance USAGE: java org.apache.kafka.clients.tools.ProducerPerformance topic_name num_records record_size target_records_sec [prop_name=prop_value]*
On Tue, 14 Jul 2015 at 05:08 Yuheng Du yuheng.du.h...@gmail.com wrote:
I am using the binaries of kafka_2.10-0.8.2.1. Could that be the problem? Should I use the source of kafka-0.8.2.1-src.tgz on each of my machines, build them, and run the test? Thanks.
On Mon, Jul 13, 2015 at 4:37 PM, JIEFU GONG jg...@berkeley.edu wrote:
You may need to open up your run-class.sh in a text editor and modify the classpath -- I believe I had a similar error before.
On Mon, Jul 13, 2015 at 1:16 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:
[original question trimmed; see the message below]
performance benchmarking of kafka
Hi guys, I am trying to replicate the benchmarking test of kafka at http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines . When I run bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1 acks=1 bootstrap.servers=192.168.1.1:9092 buffer.memory=67108864 batch.size=8196 I get the following error: Error: Could not find or load main class org.apache.kafka.client.tools.ProducerPerformance What should I fix? Thank you!
Re: performance benchmarking of kafka
Thank you. I see that run-class.sh has the following lines: for file in $base_dir/clients/build/libs/kafka-clients*.jar; do CLASSPATH=$CLASSPATH:$file done So I believe all the jars in the libs/ directory have already been included in the classpath? Which jar does the ProducerPerformance class reside in? Thanks.
On Mon, Jul 13, 2015 at 4:37 PM, JIEFU GONG jg...@berkeley.edu wrote:
You may need to open up your run-class.sh in a text editor and modify the classpath -- I believe I had a similar error before.
On Mon, Jul 13, 2015 at 1:16 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:
[original question trimmed; see the message above]
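Two notes here. First, as the resolution earlier in this digest shows, the failure was a class-name typo: org.apache.kafka.client.tools vs. org.apache.kafka.clients.tools. Second, the quoted loop only adds jars matching clients/build/libs/kafka-clients*.jar (a source-build path), not everything under libs/; the glob behaviour itself is easy to check locally (the directory and jar name below are made up for the demo):

```shell
# Recreate the run-class.sh classpath loop against a dummy jar.
mkdir -p demo/clients/build/libs
touch demo/clients/build/libs/kafka-clients-0.8.2.1.jar
CLASSPATH=""
for file in demo/clients/build/libs/kafka-clients*.jar; do
  CLASSPATH="$CLASSPATH:$file"
done
echo "$CLASSPATH"    # the matching jar shows up on the classpath
rm -rf demo          # clean up the demo tree
```

If the glob matches nothing, unquoted shell globs are left as the literal pattern, which is another way such a loop can silently produce a broken classpath.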
Re: performance benchmarking of kafka
I am using the binaries of kafka_2.10-0.8.2.1. Could that be the problem? Should I use the source of kafka-0.8.2.1-src.tgz on each of my machines, build them, and run the test? Thanks.
On Mon, Jul 13, 2015 at 4:37 PM, JIEFU GONG jg...@berkeley.edu wrote:
You may need to open up your run-class.sh in a text editor and modify the classpath -- I believe I had a similar error before.
On Mon, Jul 13, 2015 at 1:16 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:
[original question trimmed; see the message above]
Re: A kafka web monitor
Hi Wan, I tried to install DCMonitor, but when I clone the project it gives me "Permission denied; the remote end hung up unexpectedly". Can you provide any suggestions for this issue? Thanks. best, Yuheng
On Mon, Mar 23, 2015 at 8:54 AM, Wan Wei flowbeha...@gmail.com wrote:
We have made a simple web console to monitor some kafka information, like consumer offset and logsize. https://github.com/shunfei/DCMonitor Hope you like it and offer your help to make it better :) Regards Flow
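A guess at the clone failure (the thread doesn't confirm the cause): "Permission denied … the remote end hung up unexpectedly" typically appears when cloning over SSH without a registered key. A public repo like this one also clones over HTTPS, which needs no key — rewriting the remote URL is enough:

```shell
# Convert a GitHub SSH remote URL to its HTTPS equivalent.
ssh_url="git@github.com:shunfei/DCMonitor.git"
https_url=$(printf '%s\n' "$ssh_url" | sed -E 's#^git@github\.com:#https://github.com/#')
echo "$https_url"    # prints https://github.com/shunfei/DCMonitor.git
```

Then `git clone` the printed URL instead of the SSH one.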
kafka topic information
I am wondering where a kafka cluster keeps the topic metadata (name, partitions, replication, etc.)? How does a server recover a topic's metadata and messages after a restart, and what data will be lost? Thanks to anyone who can answer my questions. best, Yuheng
Re: kafka topic information
Harsha, Thanks for the reply. So what if the zookeeper cluster fails? Will the topic information be lost? What fault-tolerance mechanism does zookeeper offer? best,
On Mon, Mar 9, 2015 at 11:36 AM, Harsha ka...@harsha.io wrote:
Yuheng, kafka keeps cluster metadata in zookeeper, along with topic metadata as well. You can use zookeeper-shell.sh or zkCli.sh to check zk nodes; /brokers/topics will give you the list of topics. -- Harsha
On March 9, 2015 at 8:20:59 AM, Yuheng Du (yuheng.du.h...@gmail.com) wrote:
[original question trimmed; see the message above]
Re: kafka topic information
Thanks, got it! best, Yuheng
On Mon, Mar 9, 2015 at 11:52 AM, Harsha ka...@harsha.io wrote:
In general users are expected to run a zookeeper cluster of 3 or 5 nodes. Zookeeper requires a quorum of servers running, which means at least ceil(n/2) servers need to be up. With 3 zookeeper nodes, at least 2 zk nodes must be up at any time, i.e. your cluster can function fine in case of 1 machine failure; with 5 there should be at least 3 nodes up and running. For more info on zookeeper you can look here: http://zookeeper.apache.org/doc/r3.4.6/ http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html -- Harsha
On March 9, 2015 at 8:39:00 AM, Yuheng Du (yuheng.du.h...@gmail.com) wrote:
[earlier quoted messages trimmed; see the messages above]
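Harsha's quorum rule can be written down directly: an ensemble of n nodes needs a majority of floor(n/2) + 1 up (for odd n this equals his ceil(n/2)), so it tolerates n minus that many failures:

```shell
# Majority quorum size for a ZooKeeper ensemble of n nodes.
quorum() { echo $(( $1 / 2 + 1 )); }
for n in 3 5 7; do
  q=$(quorum "$n")
  echo "ensemble of $n -> needs $q up, tolerates $(( n - q )) failures"
done
```

This is also why even-sized ensembles are discouraged: 4 nodes tolerate only 1 failure, the same as 3, while adding more coordination overhead.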
Re: Set up kafka cluster
will do, Thanks!
On Thu, Mar 5, 2015 at 3:35 PM, Gwen Shapira gshap...@cloudera.com wrote:
Did you take a look at the quick-start guide? https://kafka.apache.org/082/quickstart.html It shows how to set up a single node, how to validate that it's working, and then how to set up a multi-node cluster. Good luck!
On Thu, Mar 5, 2015 at 12:30 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:
[earlier quoted messages trimmed; see the messages below]
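For reference, the multi-broker part of that quickstart boils down to giving each broker its own id, port, and log directory while pointing all of them at the same zookeeper. A sketch generating three such override files (all ids, ports, and paths are example values):

```shell
# Generate per-broker config files for a 3-broker cluster (0.8.x-style
# 'port' setting; values are examples, adjust for real hosts).
for id in 0 1 2; do
  cat > /tmp/server-$id.properties <<EOF
broker.id=$id
port=$(( 9092 + id ))
log.dirs=/tmp/kafka-logs-$id
zookeeper.connect=localhost:2181
EOF
done
grep broker.id /tmp/server-*.properties
```

Each broker is then started with its own file, e.g. `bin/kafka-server-start.sh /tmp/server-0.properties`; on three separate machines the ports can stay identical and only broker.id must differ.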
Re: Set up kafka cluster
Thank you Gwen, I also need the kafka cluster to continue providing message brokering service to a Storm cluster after the benchmarking. I am fairly new to cluster setups, so is there an instruction telling me how to set up the three-node kafka cluster before running the benchmark? That would be really helpful. Thanks again. best, Yuheng
On Thu, Mar 5, 2015 at 3:22 PM, Gwen Shapira gshap...@cloudera.com wrote:
Jay Kreps has a gist with step-by-step instructions for reproducing the benchmarks used by LinkedIn: https://gist.github.com/jkreps/c7ddb4041ef62a900e6c And the blog with the results: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines Gwen
On Thu, Mar 5, 2015 at 12:16 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:
[original question trimmed; see the message below]
Set up kafka cluster
Hi everyone, I am trying to set up a kafka cluster consisting of three machines, on which I want to run a benchmarking program. Can anyone recommend a step-by-step tutorial/instruction on how to do it? Thanks. best, Yuheng