Kafka Streams or KSQL design question

2018-07-16 Thread Will Du
Hi folks,
As far as I know, Kafka Streams runs as a separate process that reads data from 
a topic, transforms it, and writes to another topic if needed. In this case, how 
does this process support high-throughput streams as well as load balancing of 
message traffic and computing resources for stream processing?

Regarding KSQL, is there any query optimization in place or on the roadmap?

Thanks,
Will
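On the first question: Kafka Streams runs as a library inside ordinary application instances rather than as a managed cluster. Its unit of parallelism is the stream task, one per input topic partition, and the consumer-group rebalance protocol spreads those tasks across however many instances are started, which is where both throughput scaling and load balancing come from. A stdlib-only toy of that spreading (a simplified illustration; the real StreamsPartitionAssignor is stickier and more involved):

```java
import java.util.*;

// Toy model of Kafka Streams load balancing: one task per input
// partition, tasks spread round-robin across app instances.
// (Simplified sketch, not the real StreamsPartitionAssignor.)
class TaskSpread {
    static Map<Integer, List<Integer>> assign(int partitions, int instances) {
        Map<Integer, List<Integer>> out = new HashMap<>();
        for (int i = 0; i < instances; i++) out.put(i, new ArrayList<>());
        for (int p = 0; p < partitions; p++) out.get(p % instances).add(p);
        return out;
    }
    public static void main(String[] args) {
        // 6 input partitions spread over 3 instances -> 2 tasks each;
        // starting more instances (up to the partition count) rebalances load.
        System.out.println(assign(6, 3));
    }
}
```

Under this model, throughput is capped by the input topic's partition count: adding instances beyond that leaves some idle.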



Re: Kafka as a data ingest

2017-01-10 Thread Will Du
In terms of big files, which are common in HDFS, does a Connect task process 
the same file in parallel, the way MapReduce deals with file splits? I do not 
think so. In that case, a Kafka Connect implementation has no advantage for 
reading a single big file unless you also use MapReduce.

Sent from my iPhone
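Will's point is that nothing in Connect computes input splits for a single file; a hypothetical multi-task file source would have to do that itself, MapReduce-style. A stdlib sketch of the byte-range arithmetic such a source would need (names and the fixed split size are illustrative assumptions):

```java
import java.util.*;

// Sketch: MapReduce-style byte-range splits over one big file, which a
// hypothetical multi-task file source would have to compute itself --
// stock Kafka Connect does not split a single file across tasks.
class FileSplits {
    // Returns {start, length} pairs covering fileLen in chunks of splitSize.
    static List<long[]> splits(long fileLen, long splitSize) {
        List<long[]> out = new ArrayList<>();
        for (long start = 0; start < fileLen; start += splitSize) {
            out.add(new long[]{start, Math.min(splitSize, fileLen - start)});
        }
        return out;
    }
    public static void main(String[] args) {
        // A 1000-byte file in 300-byte splits -> 4 ranges, last one short.
        for (long[] s : splits(1000, 300)) {
            System.out.println(s[0] + "+" + s[1]);
        }
    }
}
```

A real implementation would also have to handle record boundaries that straddle split edges, which is where most of MapReduce's InputFormat complexity lives.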

On Jan 10, 2017, at 02:41, Ewen Cheslack-Postava  wrote:

>> However, I'm trying to figure out if I can use Kafka to read Hadoop file.
> 
> The question is a bit unclear as to whether you mean "use Kafka to send
> data to a Hadoop file" or "use Kafka to read a Hadoop file into a Kafka
> topic". But in both cases, Kafka Connect provides a good option.
> 
> The more common use case is sending data that you have in Kafka into HDFS.
> In that case,
> http://docs.confluent.io/3.1.1/connect/connect-hdfs/docs/hdfs_connector.html
> is a good option.
> 
> If you want the less common case of sending data from HDFS files into a
> stream of Kafka records, I'm not aware of a connector for doing that yet
> but it is definitely possible. Kafka Connect takes care of a lot of the
> details for you so all you have to do is read the file and emit Connect's
> SourceRecords containing the data from the file. Most other details are
> handled for you.
> 
> -Ewen
> 
>> On Mon, Jan 9, 2017 at 9:18 PM, Sharninder  wrote:
>> 
>> If you want to know if "kafka" can read hadoop files, then no. But you can
>> write your own producer that reads from hdfs any which way and pushes to
>> kafka. We use kafka as the ingestion pipeline's main queue. Read from
>> various sources and push everything to kafka.
>> 
>> 
>> On Tue, Jan 10, 2017 at 6:26 AM, Cas Apanowicz <
>> cas.apanow...@it-horizon.com
>>> wrote:
>> 
>>> Hi,
>>> 
>>> I have a general understanding of Kafka's main functionality as a streaming
>>> tool.
>>> However, I'm trying to figure out if I can use Kafka to read Hadoop file.
>>> Can you please advise?
>>> Thanks
>>> 
>>> Cas
>>> 
>>> 
>> 
>> 
>> --
>> --
>> Sharninder
>> 


Kafka connect distributed start failed

2016-12-05 Thread Will Du
Hi folks,
I am trying to start Kafka Connect in distributed mode as follows, and it fails 
with the error below. Standalone mode is fine. This happens on both the 3.0.1 and 
3.1 versions of Confluent Kafka. Does anyone know the cause of this error?
Thanks,
Will

security.protocol = PLAINTEXT
internal.key.converter = class org.apache.kafka.connect.json.JsonConverter
access.control.allow.methods =
ssl.keymanager.algorithm = SunX509
metrics.sample.window.ms = 3
 (org.apache.kafka.connect.runtime.distributed.DistributedConfig:178)
[2016-12-05 21:24:14,457] INFO Logging initialized @991ms 
(org.eclipse.jetty.util.log:186)
Exception in thread "main" org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
	at org.apache.kafka.connect.runtime.distributed.WorkerGroupMember.<init>(WorkerGroupMember.java:125)
	at org.apache.kafka.connect.runtime.distributed.DistributedHerder.<init>(DistributedHerder.java:148)
	at org.apache.kafka.connect.runtime.distributed.DistributedHerder.<init>(DistributedHerder.java:130)
	at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:84)
Caused by: java.lang.NoSuchMethodError: org.apache.kafka.clients.consumer.internals.AbstractCoordinator.<init>(Lorg/apache/kafka/clients/consumer/internals/ConsumerNetworkClient;Ljava/lang/String;IIILorg/apache/kafka/common/metrics/Metrics;Ljava/lang/String;Lorg/apache/kafka/common/utils/Time;J)V
	at org.apache.kafka.connect.runtime.distributed.WorkerCoordinator.<init>(WorkerCoordinator.java:77)
	at org.apache.kafka.connect.runtime.distributed.WorkerGroupMember.<init>(WorkerGroupMember.java:105)
	... 3 more

How to collect connect metrics

2016-12-03 Thread Will Du
Hi folks,
How can I collect Kafka Connect metrics from Confluent? Is there any API to 
use?
In addition, if one file is very big, can multiple tasks work on the same 
file simultaneously?

Thanks,
Will
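One route worth noting (an assumption about the setup, not confirmed in this thread): Kafka components generally expose their metrics as JMX MBeans, so a JMX client attached to the worker JVM can read them. A stdlib-only sketch of such a query; the `kafka.connect` domain named in the comment is an assumption and varies by Kafka/Confluent version:

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Sketch: reading metrics over JMX from inside (or attached to) a worker
// JVM. The "kafka.connect" domain mentioned below is an assumption; exact
// MBean names differ across Kafka/Confluent versions.
class JmxPeek {
    static Set<ObjectName> query(String pattern) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return server.queryNames(new ObjectName(pattern), null);
        } catch (javax.management.MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }
    public static void main(String[] args) {
        // Inside a Connect worker one would query e.g. "kafka.connect:*".
        // Here we query a JVM-standard MBean so the sketch runs anywhere.
        System.out.println(query("java.lang:type=Runtime"));
    }
}
```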



Re: Flink read avro from Kafka Connect Issue

2016-11-02 Thread Will Du
Using the kafka-avro-console-consumer I am able to get the full message from 
Kafka Connect with AvroConverter, but Flink outputs nothing except the schema.

Using the producer with DefaultEncoder, the kafka-avro-console-consumer 
throws exceptions, but the Flink consumer works. My target, however, is to have 
Flink consume the Avro data produced by Kafka Connect.

> On Nov 2, 2016, at 7:31 PM, Will Du <will...@gmail.com> wrote:
> [...]



Flink read avro from Kafka Connect Issue

2016-11-02 Thread Will Du

On Nov 2, 2016, at 7:31 PM, Will Du <will...@gmail.com> wrote:

Hi folks,
I am trying to consume Avro data from Kafka in Flink. The data is produced by 
Kafka Connect using AvroConverter. I have created an 
AvroDeserializationSchema.java 
<https://gist.github.com/datafibers/ae9d624b6db44865ae14defe8a838123> used by 
the Flink consumer. Then I use the following code to read it.

public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    Properties properties = new Properties();
    properties.setProperty("bootstrap.servers", "localhost:9092");
    properties.setProperty("zookeeper.connect", "localhost:2181");

    Schema schema = new Parser().parse("{"
        + "\"name\": \"test\", "
        + "\"type\": \"record\", "
        + "\"fields\": ["
        + "  { \"name\": \"name\", \"type\": \"string\" },"
        + "  { \"name\": \"symbol\", \"type\": \"string\" },"
        + "  { \"name\": \"exchange\", \"type\": \"string\" }"
        + "]}");

    AvroDeserializationSchema<GenericRecord> avroSchema =
        new AvroDeserializationSchema<>(schema);
    FlinkKafkaConsumer09<GenericRecord> kafkaConsumer =
        new FlinkKafkaConsumer09<>("myavrotopic", avroSchema, properties);
    DataStream<GenericRecord> messageStream = env.addSource(kafkaConsumer);
    messageStream.rebalance().print();
    env.execute("Flink AVRO KAFKA Test");
}

Once I run the code, I only get the field names with empty values, as follows.
{"name":"", "symbol":"", "exchange":""}
{"name":"", "symbol":"", "exchange":""}
{"name":"", "symbol":"", "exchange":""}
{"name":"", "symbol":"", "exchange":""}

Could anyone help find out why I cannot decode it?

Further troubleshooting found that if I use the Kafka producer here 
<https://gist.github.com/datafibers/d063b255b50fa34515c0ac9e24d4485c> to send 
the Avro data, specifically using kafka.serializer.DefaultEncoder, the above 
code gets the correct result. Does anybody know how to either set DefaultEncoder 
in Kafka Connect or set it when writing a customized connector? Or, the other 
way around, how should I modify AvroDeserializationSchema.java instead?

Thanks, I’ll post this to the Flink user group as well.
Will
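A likely explanation, offered here as an assumption to verify rather than a confirmed diagnosis: Connect's AvroConverter, when paired with the Confluent Schema Registry, prepends a 5-byte header to every value (a magic byte 0x0 plus a 4-byte big-endian schema id) before the Avro payload. A plain Avro decoder fed the raw bytes misparses the record, which would match the empty-string fields above; the DefaultEncoder path writes raw bytes with no header, which would explain why plain decoding works there. A stdlib sketch of that wire format, which a custom DeserializationSchema would strip before handing the rest to an Avro DatumReader:

```java
import java.nio.ByteBuffer;

// Sketch of the Confluent Schema Registry wire format assumed above:
// [magic 0x0][4-byte big-endian schema id][avro payload].
// A Flink DeserializationSchema would skip these 5 bytes before
// passing the remaining bytes to an Avro DatumReader.
class WireFormat {
    static final byte MAGIC = 0x0;

    // Returns the schema id; the Avro body then starts at offset 5.
    static int schemaId(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        if (buf.get() != MAGIC) {
            throw new IllegalArgumentException("not Confluent wire format");
        }
        return buf.getInt(); // bytes 1..4, big-endian
    }
    public static void main(String[] args) {
        byte[] msg = {0x0, 0, 0, 0, 42, /* avro body... */ 1, 2, 3};
        System.out.println("schema id = " + schemaId(msg));
    }
}
```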

connection time out

2015-11-29 Thread Yuheng Du
Hi guys,

I was running a single-node broker in a cluster, and when I run the
producer from another cluster, I get a connection timeout error.

I can ping into port 9092 and other ports on the broker machine from the
producer. I just can't publish any messages. The command I used to run the
producer is:

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
speedx2 50 100 -1 acks=1 bootstrap.servers=130.127.xxx.xxx:9092
buffer.memory=67184 batch.size=8196

Can anyone suggest what the problem might be?


Thank you!


best,

Yuheng
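One common cause that fits these symptoms (an assumption, not confirmed in this thread): the broker advertises an internal hostname in its metadata, so the remote producer can reach port 9092 for the bootstrap connection (which is enough to trigger topic auto-creation) but then tries to produce to an unreachable advertised address and times out. A config sketch; the address shown mirrors the redacted one above and the property names depend on broker version:

```properties
# server.properties -- sketch; adjust to your broker version
# Older brokers (0.8/0.9):
advertised.host.name=130.127.xxx.xxx
advertised.port=9092
# Newer brokers (0.10+):
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://130.127.xxx.xxx:9092
```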


Re: connection time out

2015-11-29 Thread Yuheng Du
Also, I can see the topic "speedx2" being created on the broker, but no
message data is coming through.

On Sun, Nov 29, 2015 at 7:00 PM, Yuheng Du <yuheng.du.h...@gmail.com> wrote:

> [...]


Re: producer api

2015-09-14 Thread Yuheng Du
Thank you Erik. But in my setup, only one node in my broker cluster has a
public IP, so I can only use one bootstrap broker for now.


On Mon, Sep 14, 2015 at 3:50 PM, Helleren, Erik <erik.helle...@cmegroup.com>
wrote:

> You only need one of the brokers to connect for publishing.  Kafka will
> tell the client about all the other brokers.  But best practice is to
> include all of them.
> -Erik
>
> On 9/14/15, 2:46 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
>
> >[...]


producer api

2015-09-14 Thread Yuheng Du
I am writing a Kafka producer application in Java. I want the producer to
publish data to a cluster of 6 brokers. Is there a way to specify only a
load-balancing node rather than the full broker list?

For example, in the Kafka benchmarking commands:

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
test 5000 100 -1 acks=-1 bootstrap.servers=
esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=64000

it only specifies the bootstrap server node, not the whole broker list
like:

Properties props = new Properties();

props.put("metadata.broker.list", "broker1:9092,broker2:9092");



Thanks for replying.
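For context: `metadata.broker.list` is the old Scala-producer setting; the new Java producer used by ProducerPerformance takes `bootstrap.servers`, which only needs one reachable broker because the client learns the full cluster from the first metadata response. A config sketch (host names are placeholders):

```java
import java.util.Properties;

// Sketch: new-producer configuration. bootstrap.servers is used only for
// the initial metadata fetch; the client then discovers all 6 brokers
// from cluster metadata. Host names here are placeholders.
class ProducerProps {
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // one reachable node is enough
        props.put("acks", "1");
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.ByteArraySerializer");
        return props;
    }
    public static void main(String[] args) {
        System.out.println(build().getProperty("bootstrap.servers"));
    }
}
```

Listing more than one bootstrap broker is still good practice, since the single listed node becomes a startup single point of failure.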


Re: latency test

2015-09-09 Thread Yuheng Du
Thank you Erik.

In my test I am using fixed 200-byte messages and I run 500k messages per
producer on 92 physically isolated producers. Each test run takes about 20
minutes. As the broker cluster is migrated into a new physical cluster, I
will perform my test and get the latency results in the next couple of
weeks.

I will keep you posted.

Thanks.

On Wed, Sep 9, 2015 at 4:58 PM, Helleren, Erik <erik.helle...@cmegroup.com>
wrote:

> Yes, and that can really hurt average performance.  All the partitions
> were nearly identical up to the 99%’ile, and had very good performance at
> that level hovering around a few milli’s.  But when looking beyond the
> 99%’ile, there was that clear fork in the distribution where a set of 3
> partitions surged upwards.  This could be for a dozen different reasons:
> Network blips, noisy networks, location in the network, resource
> contention on that broker, etc.  But it affected that one broker more than
> others.  And the reasons for my cluster displaying this behavior could be
> very different than the reason for any other cluster.
>
> Its worth noting that this was mostly a latency test over a stress test.
> There was a single kafka producer object, very small message sizes (100
> bytes), and it was only pushing through around 5MB/s worth of data. And
> the client was configured to minimize the amount of data that would be on
> the internal queue/buffer waiting to be sent.  The messages that were
> being sent were comprised of 10-byte ASCII ‘words’ selected randomly
> from a dictionary of 1000 words, which benefits compression while still
> resulting in likely unique messages.  And the test I ran was only for 6
> min, and I did not do the work required to see if there was a burst of
> slower messages which caused this behavior, or if it was a consistent
> issue with that node.
> -Erik
>
> [...]

Re: latency test

2015-09-09 Thread Yuheng Du
So are you suggesting that the long delays in the top 1% of latencies
happen in the slower partitions that are further away? Thanks.

On Wed, Sep 9, 2015 at 3:15 PM, Helleren, Erik <erik.helle...@cmegroup.com>
wrote:

> So, I did my own latency test on a cluster of 3 nodes, and there is a
> significant difference around the 99%’ile and higher for partitions when
> measuring the ack time when configured for a single ack.  The graph
> that I wish I could attach or post clearly shows that around 1/3 of the
> partitions significantly diverge from the other two.  So, at least in my
> case, one of my brokers is further than the others.
> -Erik
>
> [...]

When a message is exposed to the consumer

2015-09-04 Thread Yuheng Du
According to the section 3.1 of the paper "Kafka: a Distributed Messaging
System for Log Processing":

"a message is only exposed to the consumers after it is flushed"?

Is this still true in current Kafka, i.e. can a message only be
consumed after it has been flushed to disk?

Thanks.


Re: latency test

2015-09-04 Thread Yuheng Du
When using 32 partitions, the 4-broker latency becomes larger than the
8-broker latency.

So is it always true that using more brokers gives lower latency, as long as
the number of partitions is at least the number of brokers?

Thanks.

On Thu, Sep 3, 2015 at 10:45 PM, Yuheng Du <yuheng.du.h...@gmail.com> wrote:

> I am running a producer latency test. When using 92 producers in 92
> physical node publishing to 4 brokers, the latency is slightly lower than
> using 8 brokers, I am using 8 partitions for the topic.
>
> I have rerun the test and it gives me the same result, the 4 brokers
> scenario still has lower latency than the 8 brokers scenarios.
>
> It is weird because I tested 1 broker, 2 brokers, 4 brokers, 8 brokers, 16
> brokers and 32 brokers. In all the other cases the latency decreases as
> the number of brokers increases.
>
> 4 brokers/8 brokers is the only pair that doesn't satisfy this rule. What
> could be the cause?
>
> I am using a 200 bytes message, the test let each producer publishes 500k
> messages to a given topic. Every test run when I change the number of
> brokers, I use a new topic.
>
> Thanks for any advice.
>


Re: latency test

2015-09-04 Thread Yuheng Du
Thank you Erik! That is helpful!

But also I see jitters of the maximum latencies when running the
experiment.

The average end-to-acknowledgement latency from producer to broker is
around 5 ms when using 92 producers and 4 brokers, and the 99.9th-percentile
latency is 58 ms, but the maximum latency goes up to 1359 ms. How do I locate
the source of this jitter?

Thanks.

On Fri, Sep 4, 2015 at 10:54 AM, Helleren, Erik <erik.helle...@cmegroup.com>
wrote:

> Well… not to be contrarian, but latency depends much more on the latency
> between the producer and the broker that is the leader for the partition
> you are publishing to.  At least when your brokers are not saturated with
> messages, and acks are set to 1.  If acks are set to ALL, latency on a
> non-saturated kafka cluster will be: Round Trip Latency from producer to
> leader for partition + Max( slowest Round Trip Latency to a replicas of
> that partition).  If a cluster is saturated with messages, we have to
> assume that all partitions receive an equal distribution of messages to
> avoid linear algebra and queueing theory models.  I don't like linear
> algebra :P
>
> Since you are probably putting all your latencies into a single histogram
> per producer, or worse, just an average, this pattern would have been
> obscured.  Obligatory lecture about measuring latency by Gil Tene
> (https://www.youtube.com/watch?v=9MKY4KypBzg).  To verify this hypothesis,
> you should re-write the benchmark to plot the latencies for each write to
> a partition for each producer into a histogram. (HDR Histogram is pretty
> good for that).  This would give you producers*partitions histograms,
> which might be unwieldy for that many producers. But wait, there is hope!
>
> To verify that this hypothesis holds, you just have to see that there is a
> significant difference between different partitions on a SINGLE producing
> client. So, pick one producing client at random and use the data from
> that. The easy way to do that is just plot all the partition latency
> histograms on top of each other in the same plot, that way you have a
> pretty plot to show people.  If you don't want to set up plotting, you can
> just compare the medians (50th percentile) of the partitions' histograms.
>  If there is a lot of variance, your latency anomaly is explained by
> brokers 4-7 being slower than nodes 0-3!  If there isn't a lot of variance
> at 50%, look at higher percentiles.  And if higher percentiles for all the
> partitions look the same, this hypothesis is disproved.
>
> If you want to make a general statement about latency of writing to kafka,
> you can merge all the histograms into a single histogram and plot that.
>
> To Yuheng's credit, more brokers always results in more throughput. But
> throughput and latency are two different creatures.  Its worth noting that
> kafka is designed to be high throughput first and low latency second.  And
> it does a really good job at both.
>
> Disclaimer: I might not like linear algebra, but I do like statistics.
> Let me know if there are topics that need more explanation above that
> aren't covered by Gil's lecture.
> -Erik
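Erik's latency model above can be written down as a toy calculation, under the same assumptions he states (unsaturated cluster, fixed round-trip times):

```java
// Toy version of the latency model quoted above (unsaturated cluster):
//   acks=1  : round trip to the partition leader
//   acks=all: round trip to leader + slowest leader->replica round trip
class AckLatency {
    static double acksAll(double rttToLeaderMs, double[] replicaRttsMs) {
        double slowest = 0;
        for (double r : replicaRttsMs) slowest = Math.max(slowest, r);
        return rttToLeaderMs + slowest;
    }
    public static void main(String[] args) {
        // e.g. 2 ms to the leader, replicas at 1 ms and 5 ms -> 7 ms total
        System.out.println(acksAll(2.0, new double[]{1.0, 5.0}));
    }
}
```

The model makes the asymmetry visible: with acks=all, one slow replica dominates the ack time even if the leader itself is fast.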
>
> [...]


Re: When a message is exposed to the consumer

2015-09-04 Thread Yuheng Du
Can't read it. Sorry

On Fri, Sep 4, 2015 at 12:08 PM, Roman Shramkov <roman_shram...@epam.com>
wrote:

> Её ай н Анны уйг
>
> sent from a mobile device, please excuse brevity and typos
>
>
> ----Пользователь Yuheng Du написал 
>
> According to the section 3.1 of the paper "Kafka: a Distributed Messaging
> System for Log Processing":
>
> "a message is only exposed to the consumers after it is flushed"?
>
> Is it still true in the current kafka? like the message can only be
> available after it is flushed to disk?
>
> Thanks.
>


Re: latency test

2015-09-04 Thread Yuheng Du
Thanks for your reply Erik. I am running some more tests according to your
suggestions now and I will share my results here. Is it necessary to
use a fixed number of partitions (32, maybe) for my test?

I am testing 2-, 4-, 8-, 16- and 32-broker scenarios, all of them running
on individual physical nodes, so I think using at least 32 partitions makes
more sense? I have seen latencies increase as the number of partitions
goes up in my experiments.

To get the latency of each event recorded, are you suggesting that I
write my own test program (in Java perhaps), or can I just modify the
standard test program provided by Kafka (
https://gist.github.com/jkreps/c7ddb4041ef62a900e6c )? I guess I need to
rebuild the source if I modify the standard Java test program
ProducerPerformance provided in Kafka, right? Right now this standard program
only reports average and percentile latencies, but no per-event
latencies.

Thanks.

On Fri, Sep 4, 2015 at 1:42 PM, Helleren, Erik <erik.helle...@cmegroup.com>
wrote:

> That is an excellent question!  There are a bunch of ways to monitor
> jitter and see when that is happening.  Here are a few:
>
> - You could slice the histogram every few seconds, save it out with a
> timestamp, and then look at how they compare.  This would be mostly
> manual, or you can graph line charts of the percentiles over time in excel
> where each percentile would be a series.  If you are using HDR Histogram,
> you should look at how to use the Recorder class to do this coupled with a
> ScheduledExecutorService.
>
> - You can just save the starting timestamp of the event and the latency of
> each event.  If you put it into a CSV, you can just load it up into excel
> and graph as a XY chart.  That way you can see every point during the
> running of your program and you can see trends.  You want to be careful
> about this one, especially of writing to a file in the callback that kafka
> provides.
>
> Also, I have noticed that most of the very slow observations are at
> startup.  But don’t trust me, trust the data and share your findings.
> Also, having a 99.9 percentile provides a pretty good standard for typical
> poor case performance.  Average is borderline useless, 50%’ile is a better
> typical case because that’s the number that says “half of events will be
> this slow or faster”, or for values that are high like 99.9%’ile, “0.1% of
> all events will be slower than this”.
> -Erik
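Erik's suggestion of recording per-event latencies and then reading off percentiles can be sketched without HdrHistogram, by plain sorting; this is fine for offline analysis of a few hundred thousand samples, though HdrHistogram's Recorder is the better tool for recording while the test runs. The sample values below are made up:

```java
import java.util.Arrays;

// Sketch: per-event latencies -> percentile summary via sorting
// (nearest-rank method). Good enough for offline analysis of a test
// run's samples; HdrHistogram is better for online recording.
class Percentiles {
    // q in (0,1]; nearest-rank percentile of the samples.
    static long percentile(long[] latenciesMs, double q) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(q * sorted.length) - 1;
        return sorted[Math.max(idx, 0)];
    }
    public static void main(String[] args) {
        long[] sample = {5, 3, 58, 4, 1359, 6, 5, 4}; // ms, made-up values
        System.out.println("p50  = " + percentile(sample, 0.50));
        System.out.println("p999 = " + percentile(sample, 0.999));
    }
}
```

As the thread notes, the average hides exactly the tail this exposes: one 1359 ms outlier barely moves the mean but owns the high percentiles.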
>
> On 9/4/15, 12:05 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
>
> >Thank you Erik! That's is helpful!
> >
> >But also I see jitters of the maximum latencies when running the
> >experiment.
> >
> >The average end to acknowledgement latency from producer to broker is
> >around 5ms when using 92 producers and 4 brokers, and the 99.9 percentile
> >latency is 58ms, but the maximum latency goes up to 1359 ms. How to locate
> >the source of this jitter?
> >
> >Thanks.
> >
> >On Fri, Sep 4, 2015 at 10:54 AM, Helleren, Erik
> ><erik.helle...@cmegroup.com>
> >wrote:
> >
> >> WellŠ not to be contrarian, but latency depends much more on the latency
> >> between the producer and the broker that is the leader for the partition
> >> you are publishing to.  At least when your brokers are not saturated
> >>with
> >> messages, and acks are set to 1.  If acks are set to ALL, latency on an
> >> non-saturated kafka cluster will be: Round Trip Latency from producer to
> >> leader for partition + Max( slowest Round Trip Latency to a replicas of
> >> that partition).  If a cluster is saturated with messages, we have to
> >> assume that all partitions receive an equal distribution of messages to
> >> avoid linear algebra and queueing theory models.  I don¹t like linear
> >> algebra :P
> >>
> >> Since you are probably putting all your latencies into a single
> >>histogram
> >> per producer, or worse, just an average, this pattern would have been
> >> obscured.  Obligatory lecture about measuring latency by Gil Tene
> >> (https://www.youtube.com/watch?v=9MKY4KypBzg).  To verify this
> >>hypothesis,
> >> you should re-write the benchmark to plot the latencies for each write
> >>to
> >> a partition for each producer into a histogram. (HRD histogram is pretty
> >> good for that).  This would give you producers*partitions histograms,
> >> which might be unwieldy for that many producers. But wait, there is
> >>hope!
> >>
> >> To verify that this hypothesis holds, you just have to see that there
> >>is a
> >> significant difference between different partitions on a S
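The percentile definitions in the quoted advice above can be sketched in plain Java. This is a minimal nearest-rank implementation for illustration only, not the HdrHistogram code Erik recommends:

```java
import java.util.Arrays;

class Percentiles {
    // Nearest-rank percentile over a sorted array: the smallest observed
    // value such that at least p percent of all observations are <= it.
    static long percentile(long[] sortedLatencies, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sortedLatencies.length);
        return sortedLatencies[Math.max(rank - 1, 0)];
    }

    public static void main(String[] args) {
        long[] latenciesMs = new long[1000];
        for (int i = 0; i < latenciesMs.length; i++) {
            latenciesMs[i] = i + 1; // synthetic latencies: 1..1000 ms
        }
        Arrays.sort(latenciesMs);
        // p50: half of all events were this slow or faster.
        System.out.println("p50   = " + percentile(latenciesMs, 50.0));
        // p99.9: only 0.1% of all events were slower than this.
        System.out.println("p99.9 = " + percentile(latenciesMs, 99.9));
    }
}
```

As the quoted text notes, the average obscures exactly the tail this computation exposes.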

Re: latency test

2015-09-04 Thread Yuheng Du
No problem. Thanks for your advice. I think it would be fun to explore. I
only know how to program in java though. Hope it will work.

On Fri, Sep 4, 2015 at 2:03 PM, Helleren, Erik <erik.helle...@cmegroup.com>
wrote:

> I think the suggestion is to have partitions/brokers >=1, so 32 should be
> enough.
>
> As for latency tests, there isn’t a lot of code to do a latency test.  If
> you just want to measure ack time it's around 100 lines.  I will try to
> push out some good latency testing code to github, but my company is
> scared of open sourcing code… so it might be a while…
> -Erik
>
>
> On 9/4/15, 12:55 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
>
> >Thanks for your reply Erik. I am running some more tests according to your
> >suggestions now and I will share my results here. Is it necessary to
> >use a fixed number of partitions (32 partitions maybe) for my test?
> >
> >I am testing 2, 4, 8, 16 and 32 brokers scenarios, all of them are running
> >on individual physical nodes. So I think using at least 32 partitions will
> >make more sense? I have seen latencies increase as the number of
> >partitions
> >goes up in my experiments.
> >
> >To get the latency of each event data recorded, are you suggesting that I
> >rewrite my own test program (in Java perhaps) or I can just modify the
> >standard test program provided by kafka (
> >https://gist.github.com/jkreps/c7ddb4041ef62a900e6c )? I guess I need to
> >rebuild the source if I modify the standard java test program
> >ProducerPerformance provided in kafka, right? Now this standard program
> >only has average latencies and percentile latencies but no per event
> >latencies.
> >
> >Thanks.
> >
> >On Fri, Sep 4, 2015 at 1:42 PM, Helleren, Erik
> ><erik.helle...@cmegroup.com>
> >wrote:
> >
> >> That is an excellent question!  There are a bunch of ways to monitor
> >> jitter and see when that is happening.  Here are a few:
> >>
> >> - You could slice the histogram every few seconds, save it out with a
> >> timestamp, and then look at how they compare.  This would be mostly
> >> manual, or you can graph line charts of the percentiles over time in
> >>excel
> >> where each percentile would be a series.  If you are using HDR
> >>Histogram,
> >> you should look at how to use the Recorder class to do this coupled
> >>with a
> >> ScheduledExecutorService.
> >>
> >> - You can just save the starting timestamp of the event and the latency
> >>of
> >> each event.  If you put it into a CSV, you can just load it up into
> >>excel
> >> and graph as a XY chart.  That way you can see every point during the
> >> running of your program and you can see trends.  You want to be careful
> >> about this one, especially of writing to a file in the callback that
> >>kafka
> >> provides.
> >>
> >> Also, I have noticed that most of the very slow observations are at
> >> startup.  But don’t trust me, trust the data and share your findings.
> >> Also, having a 99.9 percentile provides a pretty good standard for
> >>typical
> >> poor case performance.  Average is borderline useless, 50%’ile is a
> >>better
> >> typical case because that’s the number that says “half of events will be
> >> this slow or faster”, or for values that are high like 99.9%’ile, “0.1%
> >>of
> >> all events will be slower than this”.
> >> -Erik
> >>
> >> On 9/4/15, 12:05 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
> >>
> >> >Thank you Erik! That's helpful!
> >> >
> >> >But also I see jitters of the maximum latencies when running the
> >> >experiment.
> >> >
> >> >The average end to acknowledgement latency from producer to broker is
> >> >around 5ms when using 92 producers and 4 brokers, and the 99.9
> >>percentile
> >> >latency is 58ms, but the maximum latency goes up to 1359 ms. How to
> >>locate
> >> >the source of this jitter?
> >> >
> >> >Thanks.
> >> >
> >> >On Fri, Sep 4, 2015 at 10:54 AM, Helleren, Erik
> >> ><erik.helle...@cmegroup.com>
> >> >wrote:
> >> >
> >> >> Well… not to be contrarian, but latency depends much more on the
> >>latency
> >> >> between the producer and the broker that is the leader for the
> >>partition
> >> >> you are pub
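Erik's suggestion of slicing a histogram every few seconds (HdrHistogram's Recorder driven by a ScheduledExecutorService) can be approximated with JDK classes only. A rough stand-in, assuming power-of-two latency buckets are precise enough for the comparison:

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Crude stand-in for HdrHistogram's Recorder: latencies fall into
// power-of-two buckets, and getIntervalCounts() atomically drains the
// counts so each slice covers only the time since the previous call.
// In a real test, a ScheduledExecutorService would call
// getIntervalCounts() every few seconds and save each slice with a
// timestamp for later comparison.
class IntervalHistogram {
    private final AtomicLongArray buckets = new AtomicLongArray(64);

    void record(long latencyMicros) {
        // Bucket index = position of the highest set bit (1-based).
        int bucket = 64 - Long.numberOfLeadingZeros(Math.max(latencyMicros, 1));
        buckets.incrementAndGet(Math.min(bucket, 63));
    }

    long[] getIntervalCounts() {
        long[] slice = new long[64];
        for (int i = 0; i < 64; i++) {
            slice[i] = buckets.getAndSet(i, 0); // drain: next slice starts fresh
        }
        return slice;
    }

    public static void main(String[] args) {
        IntervalHistogram h = new IntervalHistogram();
        h.record(1000); // ~1 ms, lands in bucket 10
        h.record(1000);
        System.out.println("events in ~1ms bucket: " + h.getIntervalCounts()[10]);
    }
}
```

Graphing the per-slice percentiles over time (one series per percentile, as suggested above) then makes jitter windows visible.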

latency test

2015-09-03 Thread Yuheng Du
I am running a producer latency test. When using 92 producers on 92
physical nodes publishing to 4 brokers, the latency is slightly lower than
when using 8 brokers. I am using 8 partitions for the topic.

I have rerun the test and it gives me the same result: the 4 brokers
scenario still has lower latency than the 8 brokers scenario.

It is weird because I tested 1 broker, 2 brokers, 4 brokers, 8 brokers, 16
brokers and 32 brokers. For the rest of the cases the latency decreases as
the number of brokers increases.

4 brokers/8 brokers is the only pair that doesn't satisfy this rule. What
could be the cause?

I am using 200-byte messages, and the test lets each producer publish 500k
messages to a given topic. Every test run when I change the number of
brokers, I use a new topic.

Thanks for any advice.


Re: Reduce latency

2015-08-18 Thread Yuheng Du
Also, when I set the target throughput to be 1 records/s, the actual
test results show I got an average of 579.86 records per second among all
my producers. How did that happen? Why is this number not 1 then?
Thanks.

On Tue, Aug 18, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com
wrote:

 Thank you Jay, that really helps!

 Kishore, where can I monitor whether the network is busy on IO in Visual
 VM? Thanks. I am running 90 producer processes on 90 physical machines in the
 experiment.

 On Tue, Aug 18, 2015 at 1:19 AM, Jay Kreps j...@confluent.io wrote:

 Yuheng,

 From the command you gave it looks like you are configuring the perf test
 to send data as fast as possible (the -1 for target throughput). This
 means
 it will always queue up a bunch of unsent data until the buffer is
 exhausted and then block. The larger the buffer, the bigger the queue.
 This
 is where the latency comes from. This is exactly what you would expect and
 what the buffering is supposed to do.

 If you want to measure latency this test doesn't really make sense, you
 need to measure with some fixed throughput. Instead of -1 enter the target
 throughput you want to measure latency at (e.g. 10 records/sec).

 -Jay

 On Thu, Aug 13, 2015 at 12:18 PM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Thank you Alvaro,
 
  How to use sync producers? I am running the standard ProducerPerformance
  test from kafka to measure the latency of each message to send from
  producer to broker only.
  The command is like bin/kafka-run-class.sh
  org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1
  acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
  buffer.memory=67108864 batch.size=8196
 
  For running producers, where should I put the producer.type=sync
  configuration into? The config/server.properties? Also Does this mean we
  are using batch size of 1? Which version of Kafka are you using?
  thanks.
 
  On Thu, Aug 13, 2015 at 3:01 PM, Alvaro Gareppe agare...@gmail.com
  wrote:
 
   Are you measuring latency as time between producer and consumer ?
  
   In that case, the ack shouldn't affect the latency, cause even tough
 your
   producer is not going to wait for the ack, the consumer will only get
 the
   message after its commited in the server.
  
   About latency my best result occur with sync producers, but the
  throughput
   is much lower in that case.
  
   About not flushing to disk I'm pretty sure that it's not an option in
  kafka
   (correct me if I'm wrong)
  
   Regards,
   Alvaro Gareppe
  
   On Thu, Aug 13, 2015 at 12:59 PM, Yuheng Du yuheng.du.h...@gmail.com
 
   wrote:
  
Also, the latency results show no major difference when using ack=0
 or
ack=1. Why is that?
   
On Thu, Aug 13, 2015 at 11:51 AM, Yuheng Du 
 yuheng.du.h...@gmail.com
wrote:
   
 I am running an experiment where 92 producers is publishing data
  into 6
 brokers and 10 consumer are reading online data simultaneously.

 How should I do to reduce the latency? Currently when I run the
   producer
 performance test the average latency is around 10s.

 Should I disable log.flush? How to do that? Thanks.

   
  
  
  
   --
   Ing. Alvaro Gareppe
   agare...@gmail.com
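Jay's point above — that latency only means something at a fixed send rate — is what the perf tool's positive-throughput mode implements. A minimal sketch of such a throttle, for illustration only (this is not the actual ProducerPerformance code):

```java
// Delays each send until its scheduled slot so records go out at a steady
// target rate instead of piling up in the producer's internal buffer.
class Throttler {
    private final long startNanos;
    private final double targetPerSec;
    private long sent = 0;

    Throttler(double targetPerSec) {
        this.targetPerSec = targetPerSec;
        this.startNanos = System.nanoTime();
    }

    // Call once before each send; sleeps if we are ahead of schedule.
    void throttle() {
        sent++;
        long dueNanos = startNanos + (long) (sent / targetPerSec * 1_000_000_000L);
        long waitNanos = dueNanos - System.nanoTime();
        if (waitNanos > 0) {
            try {
                Thread.sleep(waitNanos / 1_000_000L, (int) (waitNanos % 1_000_000L));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        Throttler t = new Throttler(1000.0); // target: 1000 records/sec
        long t0 = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            t.throttle();
            // producer.send(record) would go here
        }
        long elapsedMs = (System.nanoTime() - t0) / 1_000_000L;
        System.out.println("100 records took ~" + elapsedMs + " ms");
    }
}
```

With the rate pinned this way, each recorded latency reflects broker round-trip time rather than time spent queued behind unsent batches.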
  
 





Re: Reduce latency

2015-08-18 Thread Yuheng Du
I see. Thank you Tao. But now I don't get what Jay said, that my latency
test only makes sense if I set a fixed throughput. Why do I need to set a
fixed throughput for my test instead of just setting the expected throughput to
be -1 (as much as possible)?

Thanks.

On Tue, Aug 18, 2015 at 2:43 PM, Tao Feng fengta...@gmail.com wrote:

 Hi Yuheng,

 The 1 record/s is just a param for ProducerPerformance for your
 producer target tput. It only takes effect to do the throttling if you
 try to send more than 1 record/s.  The actual tput of the test
 depends on your producer config and your setup.

 -Tao

 On Tue, Aug 18, 2015 at 11:34 AM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Also, When I set the target throughput to be 1 records/s, The actual
  test results show I got an average of 579.86 records per second among all
  my producers. How did that happen? Why this number is not 1 then?
  Thanks.
 
  On Tue, Aug 18, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com
  wrote:
 
   Thank you Jay, that really helps!
  
   Kishore, Where you can monitor whether the network is busy on IO in
  visual
   vm? Thanks. I am running 90 producer process on 90 physical machines in
  the
   experiment.
  
   On Tue, Aug 18, 2015 at 1:19 AM, Jay Kreps j...@confluent.io wrote:
  
   Yuheng,
  
   From the command you gave it looks like you are configuring the perf
  test
   to send data as fast as possible (the -1 for target throughput). This
   means
   it will always queue up a bunch of unsent data until the buffer is
   exhausted and then block. The larger the buffer, the bigger the queue.
   This
   is where the latency comes from. This is exactly what you would expect
  and
   what the buffering is supposed to do.
  
   If you want to measure latency this test doesn't really make sense,
 you
   need to measure with some fixed throughput. Instead of -1 enter the
  target
   throughput you want to measure latency at (e.g. 10 records/sec).
  
   -Jay
  
   On Thu, Aug 13, 2015 at 12:18 PM, Yuheng Du yuheng.du.h...@gmail.com
 
   wrote:
  
Thank you Alvaro,
   
How to use sync producers? I am running the standard
  ProducerPerformance
test from kafka to measure the latency of each message to send from
producer to broker only.
The command is like bin/kafka-run-class.sh
org.apache.kafka.clients.tools.ProducerPerformance test7 5000
 100
  -1
acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
buffer.memory=67108864 batch.size=8196
   
For running producers, where should I put the producer.type=sync
configuration into? The config/server.properties? Also Does this
 mean
  we
are using batch size of 1? Which version of Kafka are you using?
thanks.
   
On Thu, Aug 13, 2015 at 3:01 PM, Alvaro Gareppe agare...@gmail.com
 
wrote:
   
 Are you measuring latency as time between producer and consumer ?

 In that case, the ack shouldn't affect the latency, cause even
 tough
   your
 producer is not going to wait for the ack, the consumer will only
  get
   the
 message after its commited in the server.

 About latency my best result occur with sync producers, but the
throughput
 is much lower in that case.

 About not flushing to disk I'm pretty sure that it's not an option
  in
kafka
 (correct me if I'm wrong)

 Regards,
 Alvaro Gareppe

 On Thu, Aug 13, 2015 at 12:59 PM, Yuheng Du 
  yuheng.du.h...@gmail.com
   
 wrote:

  Also, the latency results show no major difference when using
  ack=0
   or
  ack=1. Why is that?
 
  On Thu, Aug 13, 2015 at 11:51 AM, Yuheng Du 
   yuheng.du.h...@gmail.com
  wrote:
 
   I am running an experiment where 92 producers is publishing
 data
into 6
   brokers and 10 consumer are reading online data
 simultaneously.
  
   How should I do to reduce the latency? Currently when I run
 the
 producer
   performance test the average latency is around 10s.
  
   Should I disable log.flush? How to do that? Thanks.
  
 



 --
 Ing. Alvaro Gareppe
 agare...@gmail.com

   
  
  
  
 



Re: Reduce latency

2015-08-18 Thread Yuheng Du
I see. So does the internal queue override the producer buffer size
configuration? When the buffer is full the producer will block sending, right?

On Tue, Aug 18, 2015 at 3:52 PM, Tao Feng fengta...@gmail.com wrote:

 From what I understand, if you set the throughput to -1, the
 ProducerPerformance will push records as much as possible to an internal
 per-topic, per-partition queue. In the background there is a sender IO
 thread handling the actual record sending process. If you push records to
 the queue faster than the send rate, your queue will become longer and
 longer, and eventually record latency will become meaningless for a
 latency-purpose test.


 On Tue, Aug 18, 2015 at 11:48 AM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  I see. Thank you Tao. But now I don't get it what Jay said that my
 latency
  test only makes sense if I set a fixed throughput. Why do I need to set a
  fixed throughput for my test instead of just set the expected throughput
 to
  be -1 (as much as possible)?
 
  Thanks.
 
  On Tue, Aug 18, 2015 at 2:43 PM, Tao Feng fengta...@gmail.com wrote:
 
   Hi Yuheng,
  
   The 1 record/s is just a param for producerperformance for your
   producer target tput. It only takes effect to do the throttling if you
   tries to send more than 1 record/s.  The actual tput of the test
   depends on your producer config and your setup.
  
   -Tao
  
   On Tue, Aug 18, 2015 at 11:34 AM, Yuheng Du yuheng.du.h...@gmail.com
   wrote:
  
Also, When I set the target throughput to be 1 records/s, The
  actual
test results show I got an average of 579.86 records per second among
  all
my producers. How did that happen? Why this number is not 1 then?
Thanks.
   
On Tue, Aug 18, 2015 at 10:03 AM, Yuheng Du 
 yuheng.du.h...@gmail.com
wrote:
   
 Thank you Jay, that really helps!

 Kishore, Where you can monitor whether the network is busy on IO in
visual
 vm? Thanks. I am running 90 producer process on 90 physical
 machines
  in
the
 experiment.

 On Tue, Aug 18, 2015 at 1:19 AM, Jay Kreps j...@confluent.io
 wrote:

 Yuheng,

 From the command you gave it looks like you are configuring the
 perf
test
 to send data as fast as possible (the -1 for target throughput).
  This
 means
 it will always queue up a bunch of unsent data until the buffer is
 exhausted and then block. The larger the buffer, the bigger the
  queue.
 This
 is where the latency comes from. This is exactly what you would
  expect
and
 what the buffering is supposed to do.

 If you want to measure latency this test doesn't really make
 sense,
   you
 need to measure with some fixed throughput. Instead of -1 enter
 the
target
 throughput you want to measure latency at (e.g. 10
 records/sec).

 -Jay

 On Thu, Aug 13, 2015 at 12:18 PM, Yuheng Du 
  yuheng.du.h...@gmail.com
   
 wrote:

  Thank you Alvaro,
 
  How to use sync producers? I am running the standard
ProducerPerformance
  test from kafka to measure the latency of each message to send
  from
  producer to broker only.
  The command is like bin/kafka-run-class.sh
  org.apache.kafka.clients.tools.ProducerPerformance test7
 5000
   100
-1
  acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
  buffer.memory=67108864 batch.size=8196
 
  For running producers, where should I put the producer.type=sync
  configuration into? The config/server.properties? Also Does this
   mean
we
  are using batch size of 1? Which version of Kafka are you using?
  thanks.
 
  On Thu, Aug 13, 2015 at 3:01 PM, Alvaro Gareppe 
  agare...@gmail.com
   
  wrote:
 
   Are you measuring latency as time between producer and
 consumer
  ?
  
   In that case, the ack shouldn't affect the latency, cause even
   tough
 your
   producer is not going to wait for the ack, the consumer will
  only
get
 the
   message after its commited in the server.
  
   About latency my best result occur with sync producers, but
 the
  throughput
   is much lower in that case.
  
   About not flushing to disk I'm pretty sure that it's not an
  option
in
  kafka
   (correct me if I'm wrong)
  
   Regards,
   Alvaro Gareppe
  
   On Thu, Aug 13, 2015 at 12:59 PM, Yuheng Du 
yuheng.du.h...@gmail.com
 
   wrote:
  
Also, the latency results show no major difference when
 using
ack=0
 or
ack=1. Why is that?
   
On Thu, Aug 13, 2015 at 11:51 AM, Yuheng Du 
 yuheng.du.h...@gmail.com
wrote:
   
 I am running an experiment where 92 producers is
 publishing
   data
  into 6
 brokers and 10 consumer are reading online data
   simultaneously.

 How should I do to reduce the latency? Currently when I
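Tao's description above — batches queue in a bounded buffer, and the producer blocks once the buffer is full — can be illustrated with a plain BlockingQueue. This is an analogy under that description, not the KafkaProducer accumulator code:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Analogy for the producer's accumulator: once buffer.memory worth of
// batches is queued and the single sender IO thread falls behind, further
// sends block until space frees up (or a timeout such as max.block.ms hits).
class BoundedBufferDemo {
    static boolean offerBatch(BlockingQueue<byte[]> q, long timeoutMs) {
        try {
            return q.offer(new byte[8196], timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<byte[]> accumulator = new ArrayBlockingQueue<>(2);
        offerBatch(accumulator, 10); // batch 1 queued
        offerBatch(accumulator, 10); // batch 2 queued; buffer now full
        System.out.println("third batch queued: " + offerBatch(accumulator, 10)); // false
        accumulator.poll();          // "sender thread" ships one batch
        System.out.println("after drain, queued: " + offerBatch(accumulator, 10)); // true
    }
}
```

The latency measured from enqueue time therefore grows with queue depth, which is why the saturated (-1 throughput) runs report seconds-long averages.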

Re: Reduce latency

2015-08-18 Thread Yuheng Du
Thank you Jay, that really helps!

Kishore, where can I monitor whether the network is busy on IO in Visual
VM? Thanks. I am running 90 producer processes on 90 physical machines in the
experiment.

On Tue, Aug 18, 2015 at 1:19 AM, Jay Kreps j...@confluent.io wrote:

 Yuheng,

 From the command you gave it looks like you are configuring the perf test
 to send data as fast as possible (the -1 for target throughput). This means
 it will always queue up a bunch of unsent data until the buffer is
 exhausted and then block. The larger the buffer, the bigger the queue. This
 is where the latency comes from. This is exactly what you would expect and
 what the buffering is supposed to do.

 If you want to measure latency this test doesn't really make sense, you
 need to measure with some fixed throughput. Instead of -1 enter the target
 throughput you want to measure latency at (e.g. 10 records/sec).

 -Jay

 On Thu, Aug 13, 2015 at 12:18 PM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Thank you Alvaro,
 
  How to use sync producers? I am running the standard ProducerPerformance
  test from kafka to measure the latency of each message to send from
  producer to broker only.
  The command is like bin/kafka-run-class.sh
  org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1
  acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
  buffer.memory=67108864 batch.size=8196
 
  For running producers, where should I put the producer.type=sync
  configuration into? The config/server.properties? Also Does this mean we
  are using batch size of 1? Which version of Kafka are you using?
  thanks.
 
  On Thu, Aug 13, 2015 at 3:01 PM, Alvaro Gareppe agare...@gmail.com
  wrote:
 
   Are you measuring latency as time between producer and consumer ?
  
   In that case, the ack shouldn't affect the latency, cause even tough
 your
   producer is not going to wait for the ack, the consumer will only get
 the
   message after its commited in the server.
  
   About latency my best result occur with sync producers, but the
  throughput
   is much lower in that case.
  
   About not flushing to disk I'm pretty sure that it's not an option in
  kafka
   (correct me if I'm wrong)
  
   Regards,
   Alvaro Gareppe
  
   On Thu, Aug 13, 2015 at 12:59 PM, Yuheng Du yuheng.du.h...@gmail.com
   wrote:
  
Also, the latency results show no major difference when using ack=0
 or
ack=1. Why is that?
   
On Thu, Aug 13, 2015 at 11:51 AM, Yuheng Du 
 yuheng.du.h...@gmail.com
wrote:
   
 I am running an experiment where 92 producers is publishing data
  into 6
 brokers and 10 consumer are reading online data simultaneously.

 How should I do to reduce the latency? Currently when I run the
   producer
 performance test the average latency is around 10s.

 Should I disable log.flush? How to do that? Thanks.

   
  
  
  
   --
   Ing. Alvaro Gareppe
   agare...@gmail.com
  
 



Re: Reduce latency

2015-08-17 Thread Yuheng Du
Thank you Kishore, I made the buffer twice the size of the batch size and
the latency has reduced significantly.

But is there only one IO thread sending the batches? Can I increase
the number of threads sending the batches so more than one batch could be
sent at the same time?

Thanks.



On Thu, Aug 13, 2015 at 5:38 PM, Kishore Senji kse...@gmail.com wrote:

 Your batch.size is 8196 and your buffer.memory is 67108864. This means
 67108864/8196
 ~ 8188 batches are in memory ready to be sent. There is only one IO
 thread sending them. I would guess that the IO thread (
 kafka-producer-network-thread) would be busy. Please check it in VisualVM.

 In steady state, every batch has to wait for the previous 8187 batches to
 be done before it gets a chance to be sent out, but the latency is counted
 from the time is added to the queue. This is the reason that you are seeing
 very high end-to-end latency.

 Set the buffer.memory to only twice that of the batch.size so that
 while one is in flight, you can have another batch ready to go (and the
 KafkaProducer would block to send more when there is no memory, and this way
 the batches are not waiting in the queue unnecessarily). Also maybe you
 want to increase the batch.size further; you will get even better
 throughput with more or less same latency (as there is no shortage of
 events in the test program).

 On Thu, Aug 13, 2015 at 1:13 PM Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Yes there is. But if we are using the ProducerPerformance test, it's
 configured
  by giving input when running the test command. Did you write a Java
 program
  to test the latency? Thanks.
 
  On Thu, Aug 13, 2015 at 3:54 PM, Alvaro Gareppe agare...@gmail.com
  wrote:
 
   I'm using the last one, but not the ProducerPerformance; I created my
   own. But I think there is a producer.properties file in the config folder
  in
   kafka.. is that configuration not for this tester?
  
   On Thu, Aug 13, 2015 at 4:18 PM, Yuheng Du yuheng.du.h...@gmail.com
   wrote:
  
Thank you Alvaro,
   
How to use sync producers? I am running the standard
  ProducerPerformance
test from kafka to measure the latency of each message to send from
producer to broker only.
The command is like bin/kafka-run-class.sh
org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100
  -1
acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
buffer.memory=67108864 batch.size=8196
   
For running producers, where should I put the producer.type=sync
configuration into? The config/server.properties? Also Does this mean
  we
are using batch size of 1? Which version of Kafka are you using?
thanks.
   
On Thu, Aug 13, 2015 at 3:01 PM, Alvaro Gareppe agare...@gmail.com
wrote:
   
 Are you measuring latency as time between producer and consumer ?

 In that case, the ack shouldn't affect the latency, cause even
 tough
   your
 producer is not going to wait for the ack, the consumer will only
 get
   the
 message after its commited in the server.

 About latency my best result occur with sync producers, but the
throughput
 is much lower in that case.

 About not flushing to disk I'm pretty sure that it's not an option
 in
kafka
 (correct me if I'm wrong)

 Regards,
 Alvaro Gareppe

 On Thu, Aug 13, 2015 at 12:59 PM, Yuheng Du 
  yuheng.du.h...@gmail.com
 wrote:

  Also, the latency results show no major difference when using
 ack=0
   or
  ack=1. Why is that?
 
  On Thu, Aug 13, 2015 at 11:51 AM, Yuheng Du 
   yuheng.du.h...@gmail.com
  wrote:
 
   I am running an experiment where 92 producers is publishing
 data
into 6
   brokers and 10 consumer are reading online data simultaneously.
  
   How should I do to reduce the latency? Currently when I run the
 producer
   performance test the average latency is around 10s.
  
   Should I disable log.flush? How to do that? Thanks.
  
 



 --
 Ing. Alvaro Gareppe
 agare...@gmail.com

   
  
  
  
   --
   Ing. Alvaro Gareppe
   agare...@gmail.com
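Kishore's arithmetic above checks out; a quick sanity check, with the values taken from the quoted test command:

```java
class BatchMath {
    public static void main(String[] args) {
        long bufferMemory = 67_108_864L; // buffer.memory from the test command
        long batchSize = 8_196L;         // batch.size from the test command
        // Batches that can sit queued ahead of the one batch in flight:
        System.out.println("batches in memory: " + bufferMemory / batchSize); // 8188
        // Kishore's suggestion: buffer only two batches deep.
        System.out.println("suggested buffer.memory: " + 2 * batchSize); // 16392
    }
}
```

In steady state each batch waits behind all the others, so shrinking the buffer to two batches removes almost all of that queueing delay, which matches the latency improvement reported above.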
  
 



Re: use page cache as much as possible

2015-08-14 Thread Yuheng Du
So if I understand correctly, even if I delay flushing, the consumer will
get the messages as soon as the broker receives them and puts them into the page
cache (assuming the producer doesn't wait for acks from brokers)?

And will decreasing the log.flush interval help reduce latency between
producer and consumer?

Thanks.


On Fri, Aug 14, 2015 at 11:57 AM, Kishore Senji kse...@gmail.com wrote:

 Thank you Gwen for correcting me. This document (
 https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Replication) in
 Writes section also has specified the same thing as you have mentioned.
 One thing is not clear to me: what happens when the replicas add the
 message to memory but the leader fails before acking to the producer? Later,
 when a replica is chosen to be the leader for the partition, it will
 advance the HW to its LEO (which has the message). The producer can resend
 the same message thinking it failed and there will be a duplicate message.
 Is my understanding correct here?

 On Thu, Aug 13, 2015 at 10:50 PM, Gwen Shapira g...@confluent.io wrote:

  On Thu, Aug 13, 2015 at 4:10 PM, Kishore Senji kse...@gmail.com wrote:
 
   Consumers can only fetch data up to the committed offset and the reason
  is
   reliability and durability on a broker crash (some consumers might get
  the
   new data and some may not as the data is not yet committed and lost).
  Data
   will be committed when it is flushed. So if you delay the flushing,
   consumers won't get those messages until that time.
  
 
  As far as I know, this is not accurate.
 
  A message is considered committed when all ISR replicas received it (this
  much is documented). This doesn't need to include writing to disk, which
  will happen asynchronously.
 
 
  
   Even though you flush periodically based on log.flush.interval.messages
  and
   log.flush.interval.ms, if the segment file is in the pagecache, the
   consumers will still benefit from that pagecache and OS wouldn't read
 it
   again from disk.
  
   On Thu, Aug 13, 2015 at 2:54 PM Yuheng Du yuheng.du.h...@gmail.com
   wrote:
  
Hi,
   
As I understand it, kafka brokers will store the incoming messages
 into
pagecache as much as possible and then flush them into disk, right?
   
But in my experiment where 90 producers is publishing data into 6
   brokers,
I see that the log directory on disk where broker stores the data is
constantly increasing (every seconds.) So why this is happening? Does
   this
has to do with the default log.flush.interval setting?
   
I want the broker to write to disk less often when serving some
 on-line
consumers to reduce latency. I tested in my broker the disk write
 speed
   is
around 110MB/s.
   
Thanks for any replies.
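For reference, the flush knobs discussed in this thread are broker settings in config/server.properties. A sketch with purely illustrative values, not recommendations:

```properties
# Force a flush after this many messages accumulate in a log.
log.flush.interval.messages=10000
# ...or after this much time has passed, whichever comes first.
log.flush.interval.ms=1000
```

As noted in the replies, flushing less often does not delay consumers, since committed messages are served from the page cache regardless of when they hit disk.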
   
  
 



Reduce latency

2015-08-13 Thread Yuheng Du
I am running an experiment where 92 producers are publishing data into 6
brokers and 10 consumers are reading online data simultaneously.

What should I do to reduce the latency? Currently when I run the producer
performance test the average latency is around 10s.

Should I disable log.flush? How to do that? Thanks.


Re: Reduce latency

2015-08-13 Thread Yuheng Du
Also, the latency results show no major difference when using ack=0 or
ack=1. Why is that?

On Thu, Aug 13, 2015 at 11:51 AM, Yuheng Du yuheng.du.h...@gmail.com
wrote:

 I am running an experiment where 92 producers are publishing data into 6
 brokers and 10 consumers are reading online data simultaneously.

 What should I do to reduce the latency? Currently when I run the producer
 performance test the average latency is around 10s.

 Should I disable log.flush? How to do that? Thanks.



use page cache as much as possible

2015-08-13 Thread Yuheng Du
Hi,

As I understand it, kafka brokers will store the incoming messages into
pagecache as much as possible and then flush them to disk, right?

But in my experiment where 90 producers are publishing data into 6 brokers,
I see that the log directory on disk where the broker stores the data is
constantly increasing (every second). So why is this happening? Does this
have to do with the default log.flush.interval setting?

I want the broker to write to disk less often when serving some on-line
consumers to reduce latency. I tested that the disk write speed on my broker is
around 110MB/s.

Thanks for any replies.


Variation of producer latency in ProducerPerformance test

2015-08-11 Thread Yuheng Du
Hi,

I am running a test which 92 producers each publish 53000 records of size
254 bytes to 2 brokers.

The average latency shown by each producer varies widely. For some
producers, the average latency is as low as 38ms to send the 53000 records;
but for others, the average latency is as high as 13067ms.

How can this be explained? To get the lowest latencies, what batch size
and other important configs should I use?

Thanks!


Kafka vs RabbitMQ latency

2015-08-04 Thread Yuheng Du
Hi guys,

I was reading a paper today in which the latency of kafka and rabbitmq is
compared:
http://downloads.hindawi.com/journals/js/2015/468047.pdf

To my surprise, kafka has shown some large variations of latency as the
number of records per second increases.

So I am curious why that is. Also, in the
ProducerPerformanceTest: bin/kafka-run-class.sh
org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1
*acks=1* bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
buffer.memory=67108864 batch.size=8196

Setting acks = 1 means the producer will wait for an ack from the leader replica,
right? Could that be the reason it affects latency? If I set it to 0, will it
make the producers send as fast as possible, so that the throughput
can increase and latency decrease in the test results?

Thanks for answering.

best,
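The acks settings being compared map to producer configuration. A sketch of the relevant properties, with the non-acks values copied from the command line quoted above:

```properties
# acks=0: fire-and-forget, no broker acknowledgement (lowest latency).
# acks=1: wait for the partition leader to append the record.
# acks=all: wait for all in-sync replicas (highest durability, highest latency).
acks=1
bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
buffer.memory=67108864
batch.size=8196
```

Note that with acks=0 the producer gets no acknowledgement at all, so ack-based latency stops being measurable, even though throughput may rise.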


multiple producer throughput

2015-07-27 Thread Yuheng Du
Hi,

I am running 40 producers on a 40-node cluster. The messages are sent to 6
brokers in another cluster. The producers are running ProducerPerformance
test.

When 20 nodes are running, the throughput is around 13MB/s and when running
40 nodes, the throughput is around 9MB/s.

I have set log.retention.ms=9000 to delete the unwanted messages, just to
avoid the disk space to be filled.

So I want to know how I should tune the system to get better throughput
results. Thanks.


Re: deleting data automatically

2015-07-27 Thread Yuheng Du
Thank you!

On Mon, Jul 27, 2015 at 1:43 PM, Ewen Cheslack-Postava e...@confluent.io
wrote:

 As I mentioned, adjusting any settings such that files are small enough
 that you don't get the benefits of append-only writes or file
 creation/deletion become a bottleneck might affect performance. It looks
 like the default setting for log.segment.bytes is 1GB, so given fast enough
 cleanup of old logs, you may not need to adjust that setting -- assuming
 you have a reasonable amount of storage, you'll easily fit many dozen log
 files of that size.

 -Ewen

 On Mon, Jul 27, 2015 at 10:36 AM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Thank you! What performance impact will there be if I change
  log.segment.bytes? Thanks.
 
  On Mon, Jul 27, 2015 at 1:25 PM, Ewen Cheslack-Postava 
 e...@confluent.io
  wrote:
 
   I think log.cleanup.interval.mins was removed in the first 0.8 release.
  It
   sounds like you're looking at outdated docs. Search for
   log.retention.check.interval.ms here:
   http://kafka.apache.org/documentation.html
  
   As for setting the values too low hurting performance, I'd guess it's
   probably only an issue if you set them extremely small, such that file
   creation and cleanup become a bottleneck.
  
   -Ewen
  
   On Mon, Jul 27, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com
   wrote:
  
If I want to get higher throughput, should I increase the
log.segment.bytes?
   
I don't see log.retention.check.interval.ms, but there is
log.cleanup.interval.mins, is that what you mean?
   
If I set log.roll.ms or log.cleanup.interval.mins too small, will it
   hurt
the throughput? Thanks.
   
On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava 
   e...@confluent.io

wrote:
   
 You'll want to set the log retention policy via
 log.retention.{ms,minutes,hours} or log.retention.bytes. If you
 want
really
 aggressive collection (e.g., on the order of seconds, as you
   specified),
 you might also need to adjust log.segment.bytes/log.roll.{ms,hours}
  and
 log.retention.check.interval.ms.

 On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du 
  yuheng.du.h...@gmail.com
 wrote:

  Hi,
 
   I am testing the kafka producer performance. So I created a queue
   and wrote a large amount of data to that queue.
 
  Is there a way to delete the data automatically after some time,
  say
  whenever the data size reaches 50GB or the retention time exceeds
  10
  seconds, it will be deleted so my disk won't get filled and new
  data
 can't
  be written in?
 
   Thanks!
 



 --
 Thanks,
 Ewen

   
  
  
  
   --
   Thanks,
   Ewen
  
 



 --
 Thanks,
 Ewen



Re: deleting data automatically

2015-07-27 Thread Yuheng Du
If I want to get higher throughput, should I increase the
log.segment.bytes?

I don't see log.retention.check.interval.ms, but there is
log.cleanup.interval.mins, is that what you mean?

If I set log.roll.ms or log.cleanup.interval.mins too small, will it hurt
the throughput? Thanks.

On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava e...@confluent.io
wrote:

 You'll want to set the log retention policy via
 log.retention.{ms,minutes,hours} or log.retention.bytes. If you want really
 aggressive collection (e.g., on the order of seconds, as you specified),
 you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and
 log.retention.check.interval.ms.
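
 A concrete server.properties sketch of the settings just listed (values are
 illustrative only, not recommendations; they pair the 10-second retention
 asked about in the original question with frequent cleanup checks):

 ```properties
 # Delete log segments whose data is older than 10 seconds.
 log.retention.ms=10000
 # Or cap total retained size per partition (50 GB), whichever limit hits first.
 log.retention.bytes=53687091200
 # Roll segments at 100 MB so old data becomes eligible for deletion sooner.
 log.segment.bytes=104857600
 # Check for deletable segments every 5 seconds.
 log.retention.check.interval.ms=5000
 ```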

 On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Hi,
 
  I am testing the kafka producer performance. So I created a queue and
  wrote a large amount of data to that queue.
 
  Is there a way to delete the data automatically after some time, say
  whenever the data size reaches 50GB or the retention time exceeds 10
  seconds, it will be deleted so my disk won't get filled and new data
 can't
  be written in?
 
  Thanks!
 



 --
 Thanks,
 Ewen



Re: deleting data automatically

2015-07-27 Thread Yuheng Du
Thank you! What performance impact will there be if I change
log.segment.bytes? Thanks.

On Mon, Jul 27, 2015 at 1:25 PM, Ewen Cheslack-Postava e...@confluent.io
wrote:

 I think log.cleanup.interval.mins was removed in the first 0.8 release. It
 sounds like you're looking at outdated docs. Search for
 log.retention.check.interval.ms here:
 http://kafka.apache.org/documentation.html

 As for setting the values too low hurting performance, I'd guess it's
 probably only an issue if you set them extremely small, such that file
 creation and cleanup become a bottleneck.

 -Ewen

 On Mon, Jul 27, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  If I want to get higher throughput, should I increase the
  log.segment.bytes?
 
  I don't see log.retention.check.interval.ms, but there is
  log.cleanup.interval.mins, is that what you mean?
 
  If I set log.roll.ms or log.cleanup.interval.mins too small, will it
 hurt
  the throughput? Thanks.
 
  On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava 
 e...@confluent.io
  
  wrote:
 
   You'll want to set the log retention policy via
   log.retention.{ms,minutes,hours} or log.retention.bytes. If you want
  really
   aggressive collection (e.g., on the order of seconds, as you
 specified),
   you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and
   log.retention.check.interval.ms.
  
   On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du yuheng.du.h...@gmail.com
   wrote:
  
Hi,
   
I am testing the kafka producer performance. So I created a queue and
wrote a large amount of data to that queue.
   
Is there a way to delete the data automatically after some time, say
whenever the data size reaches 50GB or the retention time exceeds 10
seconds, it will be deleted so my disk won't get filled and new data
   can't
be written in?
   
Thanks!
   
  
  
  
   --
   Thanks,
   Ewen
  
 



 --
 Thanks,
 Ewen



Re: multiple producer throughput

2015-07-27 Thread Yuheng Du
The message size is 100 bytes and each producer sends out 50 million
messages. These are the numbers used in the Kafka benchmarking post.
http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

Thanks.

On Mon, Jul 27, 2015 at 4:15 PM, Prabhjot Bharaj prabhbha...@gmail.com
wrote:

 Hi,

 Have you tried with acks=1 and -1 as well?
 Please share the numbers and the message size

 Regards,
 Prabcs
 On Jul 27, 2015 10:24 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:

  Hi,
 
  I am running 40 producers on 40 nodes cluster. The messages are sent to 6
  brokers in another cluster. The producers are running ProducerPerformance
  test.
 
  When 20 nodes are running, the throughput is around 13MB/s and when
 running
  40 nodes, the throughput is around 9MB/s.
 
  I have set log.retention.ms=9000 to delete the unwanted messages, just
 to
  avoid the disk space to be filled.
 
   So I want to know how I should tune the system to get better throughput.
   Thanks.
 



Re: properducertest on multiple nodes

2015-07-24 Thread Yuheng Du
I deleted the queue and recreated it before I run the test. Things are
working after restart the broker cluster, thanks!

On Fri, Jul 24, 2015 at 12:06 PM, Gwen Shapira gshap...@cloudera.com
wrote:

 Does topic speedx1 exist?

 On Fri, Jul 24, 2015 at 7:09 AM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:
  Hi,
 
   I am trying to run 20 performance tests on 10 nodes using pbsdsh.
 
   The messages are sent to a 6-broker cluster. It seems to work for a
  while. When I delete the test queue and rerun the test, the broker does
 not
  seem to process incoming messages:
 
  [yuhengd@node1739 kafka_2.10-0.8.2.1]$ bin/kafka-run-class.sh
  org.apache.kafka.clients.tools.ProducerPerformance speedx1 5000 100
 -1
  acks=1 bootstrap.servers=130.127.133.72:9092 buffer.memory=67108864
  batch.size=8196
 
  1 records sent, 0.0 records/sec (0.00 MB/sec), 6.0 ms avg latency,
  6.0 max latency.
 
  org.apache.kafka.common.errors.TimeoutException: Failed to update
 metadata
  after 79 ms.
 
 
  Can anyone suggest what the problem is? Or what configuration should I
  change? Thanks!



properducertest on multiple nodes

2015-07-24 Thread Yuheng Du
Hi,

I am trying to run 20 performance tests on 10 nodes using pbsdsh.

The messages are sent to a 6-broker cluster. It seems to work for a
while. When I delete the test queue and rerun the test, the broker does not
seem to process incoming messages:

[yuhengd@node1739 kafka_2.10-0.8.2.1]$ bin/kafka-run-class.sh
org.apache.kafka.clients.tools.ProducerPerformance speedx1 5000 100 -1
acks=1 bootstrap.servers=130.127.133.72:9092 buffer.memory=67108864
batch.size=8196

1 records sent, 0.0 records/sec (0.00 MB/sec), 6.0 ms avg latency,
6.0 max latency.

org.apache.kafka.common.errors.TimeoutException: Failed to update metadata
after 79 ms.


Can anyone suggest what the problem is? Or what configuration should I
change? Thanks!


deleting data automatically

2015-07-24 Thread Yuheng Du
Hi,

I am testing the kafka producer performance. So I created a queue and
wrote a large amount of data to that queue.

Is there a way to delete the data automatically after some time, say
whenever the data size reaches 50GB or the retention time exceeds 10
seconds, it will be deleted so my disk won't get filled and new data can't
be written in?

Thanks!


Re: broker data directory

2015-07-21 Thread Yuheng Du
Thank you, Nicolas!

On Tue, Jul 21, 2015 at 10:46 AM, Nicolas Phung nicolas.ph...@gmail.com
wrote:

 Yes indeed.

 # A comma seperated list of directories under which to store log files
 log.dirs=/var/lib/kafka

 You can put several disk/partitions too.

 Regards,

 On Tue, Jul 21, 2015 at 4:37 PM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Just wanna make sure, in server.properties, the configuration
  log.dirs=/tmp/kafka-logs
 
   specifies the directory where the log (data) is stored, right?
 
  If I want the data to be saved elsewhere, this is the configuration I
 need
  to change, right?
 
  Thanks for answering.
 
  best,
 



broker data directory

2015-07-21 Thread Yuheng Du
Just wanna make sure, in server.properties, the configuration
log.dirs=/tmp/kafka-logs

specifies the directory where the log (data) is stored, right?

If I want the data to be saved elsewhere, this is the configuration I need
to change, right?

Thanks for answering.

best,


Re: latency performance test

2015-07-16 Thread Yuheng Du
Hi Ewen,

Thank you for your patient explaining. It is very helpful.

Can we assume that the long latency of ProducerPerformance comes from
queuing delay in the buffer and it is related to buffer size?

Thank you!

best,
Yuheng

On Thu, Jul 16, 2015 at 12:21 AM, Ewen Cheslack-Postava e...@confluent.io
wrote:

 The tests are meant to evaluate different things and the way they send
 messages is the source of the difference.

 EndToEndLatency works with a single message at a time. It produces the
 message then waits for the consumer to receive it. This approach guarantees
 there is no delay due to queuing. The goal with this test is to evaluate
 the *minimum* latency.

 ProducerPerformance focuses on achieving maximum throughput. This means it
 will enqueue lots of records so it will always have more data to send (and
 can use batching to increase the throughput). Unlike EndToEndLatency, this
 means records may just sit in a queue on the producer for awhile because
 the maximum number of in flight requests has been reached and it needs to
 wait for responses for those requests. Since EndToEndLatency only ever has
 one record outstanding, it will never encounter this case.

 Batching itself doesn't increase the latency because it only occurs when
 the producer is either a) already unable to send messages anyway or b)
 linger.ms is greater than 0, but the tests use the default setting that
 doesn't linger at all.

 In your example for ProducerPerformance, you have 100 byte records and will
 buffer up to 64MB. Given the batch size of 8K and default producer settings
 of 5 in flight requests, you can roughly think of one round trip time
 handling 5 * 8K = 40K bytes of data. If your roundtrip is 1ms, then if your
 buffer is full at 64MB it will take you 64 MB / (40 KB/ms) = 1638ms = 1.6s.
 That means that the record that was added at the end of the buffer had to
 just sit in the buffer for 1.6s before it was sent off to the broker. And
 if your buffer is consistently full (which it should be for
 ProducerPerformance since it's sending as fast as it can), that means
 *every* record waits that long.
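
 The back-of-envelope estimate above can be reproduced directly. All numbers
 are the illustrative assumptions from this thread (8K batches, 5 in-flight
 requests, a 1 ms round trip), not measurements:

 ```python
 # Rough queuing-delay estimate for a full producer buffer, following the
 # numbers in the paragraph above (assumptions, not measurements).
 buffer_bytes = 64 * 1024 * 1024   # buffer.memory = 67108864
 batch_bytes = 8 * 1024            # batch.size, roughly 8K
 in_flight = 5                     # default number of in-flight requests
 rtt_ms = 1.0                      # assumed producer->broker round trip

 drain_rate = in_flight * batch_bytes / rtt_ms   # bytes drained per ms (~40 KB/ms)
 queue_delay_ms = buffer_bytes / drain_rate      # wait time for the last queued record
 print(round(queue_delay_ms))                    # ~1638 ms, i.e. ~1.6 s
 ```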

 Of course, these numbers are estimates, depend on my having used 1ms, but
 hopefully should make it clear why you can see relatively large latencies.

 -Ewen


 On Wed, Jul 15, 2015 at 1:38 AM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Hi,
 
  I have run the end to end latency test and the producerPerformance test
 on
  my kafka cluster according to
  https://gist.github.com/jkreps/c7ddb4041ef62a900e6c
 
   In the end-to-end latency test, the latency was around 2ms. In the
   ProducerPerformance test, if I use batch size 8196 to send 50,000,000
   records:
 
  bin/kafka-run-class.sh
 org.apache.kafka.clients.tools.ProducerPerformance
  speedx1 5000 100 -1 acks=1 bootstrap.servers=192.168.1.1:9092
  buffer.memory=67108864 batch.size=8196
 
 
   The results show that max latency is 3617ms, avg latency 626.7ms. I want to
   know why the latency in the ProducerPerformance test is significantly larger
   than in the end-to-end test. Is it because of batching? Are the definitions
   of these two latencies different? I looked at the source code and I believe
   the latency is measured as the time for the producer.send() function to
   complete. So does this latency include transmission delay, transfer delay,
   and what other components?
 
 
  Thanks.
 
 
  best,
 
  Yuheng
 



 --
 Thanks,
 Ewen



Re: kafka benchmark tests

2015-07-15 Thread Yuheng Du
Jiefu,

Have you tried to run benchmark_test.py? I ran it and it complains about a
missing ducktape.services.service module:

yuhengdu@consumer0:/packages/kafka_2.10-0.8.2.1$ python benchmark_test.py

Traceback (most recent call last):

  File "benchmark_test.py", line 16, in <module>

from ducktape.services.service import Service

ImportError: No module named ducktape.services.service


Can you help me get it working, Ewen? Thanks.


best,

Yuheng

On Tue, Jul 14, 2015 at 11:28 PM, Ewen Cheslack-Postava e...@confluent.io
wrote:

 @Jiefu, yes! The patch is functional, I think it's just waiting on a bit of
 final review after the last round of changes. You can definitely use it for
 your own benchmarking, and we'd love to see patches for any additional
 tests we missed in the first pass!

 -Ewen

 On Tue, Jul 14, 2015 at 10:53 AM, JIEFU GONG jg...@berkeley.edu wrote:

  Yuheng,
  I would recommend looking here:
  http://kafka.apache.org/documentation.html#brokerconfigs and scrolling
  down
  to get a better understanding of the default settings and what they mean
 --
  it'll tell you what different options for acks does.
 
  Ewen,
  Thank you immensely for your thoughts, they shed a lot of insight into
 the
  issue. Though it is understandable that your specific results need to be
  verified, it seems that the KIP-25 patch is functional and I can use it
 for
  my own benchmarking purposes? Is that correct? Thanks again!
 
  On Tue, Jul 14, 2015 at 8:22 AM, Yuheng Du yuheng.du.h...@gmail.com
  wrote:
 
   Also, I guess setting the target throughput to -1 means let it be as
 high
   as possible?
  
   On Tue, Jul 14, 2015 at 10:36 AM, Yuheng Du yuheng.du.h...@gmail.com
   wrote:
  
Thanks. If I set the acks=1 in the producer config options in
bin/kafka-run-class.sh
  org.apache.kafka.clients.tools.ProducerPerformance
test7 5000 100 -1 acks=1 bootstrap.servers=
esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864
   batch.size=8196?
   
Does that mean for each message generated at the producer, the
 producer
will wait until the broker sends the ack back, then send another
  message?
   
Thanks.
   
Yuheng
   
On Tue, Jul 14, 2015 at 10:06 AM, Manikumar Reddy 
  ku...@nmsworks.co.in
wrote:
   
Yes, A list of  Kafka Server host/port pairs to use for establishing
  the
initial connection to the Kafka cluster
   
https://kafka.apache.org/documentation.html#newproducerconfigs
   
On Tue, Jul 14, 2015 at 7:29 PM, Yuheng Du 
 yuheng.du.h...@gmail.com
wrote:
   
 Does anyone know what is bootstrap.servers=
 esv4-hcl198.grid.linkedin.com:9092 means in the following test
   command:

 bin/kafka-run-class.sh
org.apache.kafka.clients.tools.ProducerPerformance
 test7 5000 100 -1 acks=1 bootstrap.servers=
 esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864
batch.size=8196?

 what is bootstrap.servers? Is it the kafka server that I am
 running
  a
test
 at?

 Thanks.

 Yuheng




 On Tue, Jul 14, 2015 at 12:18 AM, Ewen Cheslack-Postava 
e...@confluent.io
 
 wrote:

  I implemented (nearly) the same basic set of tests in the system
   test
  framework we started at Confluent and that is going to move into
Kafka --
  see the wip patch for KIP-25 here:
 https://github.com/apache/kafka/pull/70
  In particular, that test is implemented in benchmark_test.py:
 
 

   
  
 
 https://github.com/apache/kafka/pull/70/files#diff-ca984778cf9943407645eb6784f19dc8
 
  Hopefully once that's merged people can reuse that benchmark
 (and
   add
to
  it!) so they can easily run the same benchmarks across different
 hardware.
  Here are some results from an older version of that test on
   m3.2xlarge
  instances on EC2 using local ephemeral storage (I think... it's
  been
 awhile
  since I ran these numbers and I didn't document methodology that
  carefully):
 
  INFO:_.KafkaBenchmark:=
  INFO:_.KafkaBenchmark:BENCHMARK RESULTS
  INFO:_.KafkaBenchmark:=
  INFO:_.KafkaBenchmark:Single producer, no replication:
  684097.470208
  rec/sec (65.24 MB/s)
  INFO:_.KafkaBenchmark:Single producer, async 3x replication:
  667494.359673 rec/sec (63.66 MB/s)
  INFO:_.KafkaBenchmark:Single producer, sync 3x replication:
  116485.764275 rec/sec (11.11 MB/s)
  INFO:_.KafkaBenchmark:Three producers, async 3x replication:
  1696519.022182 rec/sec (161.79 MB/s)
  INFO:_.KafkaBenchmark:Message size:
  INFO:_.KafkaBenchmark: 10: 1637825.195625 rec/sec (15.62
 MB/s)
  INFO:_.KafkaBenchmark: 100: 605504.877911 rec/sec (57.75
 MB/s)
  INFO:_.KafkaBenchmark: 1000: 90351.817570 rec/sec (86.17
 MB/s)
  INFO:_.KafkaBenchmark: 1: 8306.180862 rec/sec (79.21
 MB/s)
  INFO:_.KafkaBenchmark: 10

Re: Latency test

2015-07-15 Thread Yuheng Du
Tao,

If I run the following command on the command line:
bin/kafka-run-class.sh kafka.tools.TestEndToEndLatency 192.168.1.3:9092
192.168.1.1:2181 speedx3 5000 100 1 -Xmx1024m

it reports that the usage is not correct. So where should I put the -Xmx1024m
option? Thanks.



On Wed, Jul 15, 2015 at 3:44 AM, Tao Feng fengta...@gmail.com wrote:

 (Please correct me if I am wrong.) Based on  TestEndToEndLatency(

 https://github.com/apache/kafka/blob/trunk/core/src/test/scala/kafka/tools/TestEndToEndLatency.scala
 ),
 consumer_fetch_max_wait corresponds to fetch.wait.max.ms in consumer
 config(
 http://kafka.apache.org/documentation.html#consumerconfigs). The default
 value listed at document is 100(ms).

 To add Java heap space to the JVM, pass -Xmx$Size (max heap size) as a JVM
 option.
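
 Putting the two pieces of advice above together, one possible invocation
 might look like this. This is a sketch, assuming kafka-run-class.sh reads
 the standard KAFKA_HEAP_OPTS environment variable for JVM options; the
 hosts, topic name, and message count are placeholders:

 ```sh
 # consumer_fetch_max_wait=100 (ms), producer_acks=1.
 # The heap size goes in KAFKA_HEAP_OPTS, not as a positional argument.
 KAFKA_HEAP_OPTS="-Xmx1024m" bin/kafka-run-class.sh kafka.tools.TestEndToEndLatency \
   broker1:9092 zk1:2181 mytopic 50000 100 1
 ```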

 On Wed, Jul 15, 2015 at 12:29 AM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Tao,
 
  Thanks. The example on
 https://gist.github.com/jkreps/c7ddb4041ef62a900e6c
  is outdated already. The error message shows:
 
  USAGE: java kafka.tools.TestEndToEndLatency$ broker_list
 zookeeper_connect
  topic num_messages consumer_fetch_max_wait producer_acks
 
 
  Can anyone tell me what should be put in consumer_fetch_max_wait?
 Thanks.
 
  On Tue, Jul 14, 2015 at 5:21 PM, Tao Feng fengta...@gmail.com wrote:
 
   I think ProducerPerformance microbenchmark  only measure between client
  to
   brokers(producer to brokers) and provide latency information.
  
   On Tue, Jul 14, 2015 at 11:05 AM, Yuheng Du yuheng.du.h...@gmail.com
   wrote:
  
Currently, the latency test from Kafka tests the end-to-end latency
   between
producers and consumers.
   
Is there  a way to test the producer to broker  and broker to
 consumer
delay separately?
   
Thanks.
   
  
 



Re: Latency test

2015-07-15 Thread Yuheng Du
I got a Java out-of-heap error when running the end-to-end latency test:

yuhengdu@consumer0:/packages/kafka_2.10-0.8.2.1$ bin/kafka-run-class.sh
kafka.tools.TestEndToEndLatency 192.168.1.3:9092 192.168.1.1:2181 speedx3
5000 100 1

Exception in thread main java.lang.OutOfMemoryError: Java heap space

at kafka.tools.TestEndToEndLatency$.main(TestEndToEndLatency.scala:69)

at kafka.tools.TestEndToEndLatency.main(TestEndToEndLatency.scala)


What should I do to add Java heap space to the JVM? Thanks!


Yuheng

On Wed, Jul 15, 2015 at 3:29 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:

 Tao,

 Thanks. The example on https://gist.github.com/jkreps/c7ddb4041ef62a900e6c
 is outdated already. The error message shows:

 USAGE: java kafka.tools.TestEndToEndLatency$ broker_list zookeeper_connect
 topic num_messages consumer_fetch_max_wait producer_acks


 Can anyone tell me what should be put in consumer_fetch_max_wait? Thanks.

 On Tue, Jul 14, 2015 at 5:21 PM, Tao Feng fengta...@gmail.com wrote:

 I think ProducerPerformance microbenchmark  only measure between client to
 brokers(producer to brokers) and provide latency information.

 On Tue, Jul 14, 2015 at 11:05 AM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Currently, the latency test from Kafka tests the end-to-end latency
 between
  producers and consumers.
 
  Is there  a way to test the producer to broker  and broker to consumer
  delay separately?
 
  Thanks.
 





latency performance test

2015-07-15 Thread Yuheng Du
Hi,

I have run the end to end latency test and the producerPerformance test on
my kafka cluster according to
https://gist.github.com/jkreps/c7ddb4041ef62a900e6c

In the end-to-end latency test, the latency was around 2ms. In the
ProducerPerformance test, if I use batch size 8196 to send 50,000,000 records:

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
speedx1 5000 100 -1 acks=1 bootstrap.servers=192.168.1.1:9092
buffer.memory=67108864 batch.size=8196


The results show that max latency is 3617ms, avg latency 626.7ms. I want to
know why the latency in the ProducerPerformance test is significantly larger
than in the end-to-end test. Is it because of batching? Are the definitions of
these two latencies different? I looked at the source code and I believe
the latency is measured as the time for the producer.send() function to
complete. So does this latency include transmission delay, transfer delay,
and what other components?


Thanks.


best,

Yuheng


Re: Latency test

2015-07-15 Thread Yuheng Du
Tao,

Thanks. The example on https://gist.github.com/jkreps/c7ddb4041ef62a900e6c
is outdated already. The error message shows:

USAGE: java kafka.tools.TestEndToEndLatency$ broker_list zookeeper_connect
topic num_messages consumer_fetch_max_wait producer_acks


Can anyone tell me what should be put in consumer_fetch_max_wait? Thanks.

On Tue, Jul 14, 2015 at 5:21 PM, Tao Feng fengta...@gmail.com wrote:

 I think ProducerPerformance microbenchmark  only measure between client to
 brokers(producer to brokers) and provide latency information.

 On Tue, Jul 14, 2015 at 11:05 AM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Currently, the latency test from Kafka tests the end-to-end latency
 between
  producers and consumers.
 
  Is there  a way to test the producer to broker  and broker to consumer
  delay separately?
 
  Thanks.
 



Re: Latency test

2015-07-15 Thread Yuheng Du
I have run the end to end latency test and the producerPerformance test on
my kafka cluster according to
https://gist.github.com/jkreps/c7ddb4041ef62a900e6c

In the end-to-end latency test, the latency was around 2ms. In the
ProducerPerformance test, if I use batch size 8196 to send 50,000,000 records:

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
speedx1 5000 100 -1 acks=1 bootstrap.servers=192.168.1.1:9092
buffer.memory=67108864 batch.size=8196


The results show that max latency is 3617ms, avg latency 626.7ms. I want to
know why the latency in the ProducerPerformance test is significantly larger
than in the end-to-end test. Is it because of batching? Are the definitions of
these two latencies different? I looked at the source code and I believe
the latency is measured as the time for the producer.send() function to
complete. So does this latency include transmission delay, transfer delay,
and what other components?


Thanks.


best,

Yuheng

On Wed, Jul 15, 2015 at 3:51 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:

 Tao,

  If I run the following command on the command line:
  bin/kafka-run-class.sh kafka.tools.TestEndToEndLatency 192.168.1.3:9092
  192.168.1.1:2181 speedx3 5000 100 1 -Xmx1024m

  it reports that the usage is not correct. So where should I put the -Xmx1024m
  option? Thanks.



 On Wed, Jul 15, 2015 at 3:44 AM, Tao Feng fengta...@gmail.com wrote:

 (Please correct me if I am wrong.) Based on  TestEndToEndLatency(

 https://github.com/apache/kafka/blob/trunk/core/src/test/scala/kafka/tools/TestEndToEndLatency.scala
 ),
 consumer_fetch_max_wait corresponds to fetch.wait.max.ms in consumer
 config(
 http://kafka.apache.org/documentation.html#consumerconfigs). The default
 value listed at document is 100(ms).

 To add java heap space to jvm, put -Xmx$Size(max heap size) for your jvm
 option.

 On Wed, Jul 15, 2015 at 12:29 AM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Tao,
 
  Thanks. The example on
 https://gist.github.com/jkreps/c7ddb4041ef62a900e6c
  is outdated already. The error message shows:
 
  USAGE: java kafka.tools.TestEndToEndLatency$ broker_list
 zookeeper_connect
  topic num_messages consumer_fetch_max_wait producer_acks
 
 
  Can anyone tell me what should be put in consumer_fetch_max_wait?
 Thanks.
 
  On Tue, Jul 14, 2015 at 5:21 PM, Tao Feng fengta...@gmail.com wrote:
 
   I think ProducerPerformance microbenchmark  only measure between
 client
  to
   brokers(producer to brokers) and provide latency information.
  
   On Tue, Jul 14, 2015 at 11:05 AM, Yuheng Du yuheng.du.h...@gmail.com
 
   wrote:
  
Currently, the latency test from Kafka tests the end-to-end latency
   between
producers and consumers.
   
Is there  a way to test the producer to broker  and broker to
 consumer
delay separately?
   
Thanks.
   
  
 





kafka TestEndtoEndLatency

2015-07-15 Thread Yuheng Du
In the Kafka performance tests
https://gist.github.com/jkreps/c7ddb4041ef62a900e6c

The TestEndtoEndLatency results are typically around 2ms, while the
ProducerPerformance normally has an average latency around several hundred ms
when using batch size 8196.

Are both results talking about end-to-end latency, and is the
ProducerPerformance latency just larger because of batching of
several messages?

What message size does the TestEndtoEndLatency test use?

Thanks for any ideas.

best,


Re: kafka benchmark tests

2015-07-15 Thread Yuheng Du
Hi Geoffrey,

Thank you for your helpful information. Do I have to install the virtual
machines? I am using a Mac as the test-driver machine, or I can use a Linux
machine to run the test driver too.

Thanks.

best,
Yuheng

On Wed, Jul 15, 2015 at 2:55 PM, Geoffrey Anderson ge...@confluent.io
wrote:

 Hi Yuheng,

 Running these tests requires a tool we've created at Confluent called
 'ducktape', which you need to install with the command:
 pip install ducktape==0.2.0

 Running the tests locally requires some setup (creation of virtual machines
 etc.) which is outlined here:

 https://github.com/apache/kafka/pull/70/files#diff-62f0ff60ede3b78b9c95624e2f61d6c1
 The instructions in the quickstart show you how to run the tests on cluster
 of virtual machines (on a single host)

 Once you have a cluster up and running, you'll be able to run the test
 you're interested in:
 cd kafka/tests
 ducktape kafkatest/tests/benchmark_test.py

 Definitely keep us posted about which parts are difficult, annoying, or
 confusing about this process and we'll do our best to help.

 Thanks,
 Geoff



 On Wed, Jul 15, 2015 at 12:49 AM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Jiefu,
 
  Have you tried to run benchmark_test.py? I ran it and it asks me for the
  ducktape.services.service
 
  yuhengdu@consumer0:/packages/kafka_2.10-0.8.2.1$ python
 benchmark_test.py
 
  Traceback (most recent call last):
 
    File "benchmark_test.py", line 16, in <module>
 
  from ducktape.services.service import Service
 
  ImportError: No module named ducktape.services.service
 
 
  Can you help me get it working, Ewen? Thanks.
 
 
  best,
 
  Yuheng
 
  On Tue, Jul 14, 2015 at 11:28 PM, Ewen Cheslack-Postava 
 e...@confluent.io
  
  wrote:
 
   @Jiefu, yes! The patch is functional, I think it's just waiting on a
 bit
  of
   final review after the last round of changes. You can definitely use it
  for
   your own benchmarking, and we'd love to see patches for any additional
   tests we missed in the first pass!
  
   -Ewen
  
   On Tue, Jul 14, 2015 at 10:53 AM, JIEFU GONG jg...@berkeley.edu
 wrote:
  
Yuheng,
I would recommend looking here:
http://kafka.apache.org/documentation.html#brokerconfigs and
 scrolling
down
to get a better understanding of the default settings and what they
  mean
   --
it'll tell you what different options for acks does.
   
Ewen,
Thank you immensely for your thoughts, they shed a lot of insight
 into
   the
issue. Though it is understandable that your specific results need to
  be
verified, it seems that the KIP-25 patch is functional and I can use
 it
   for
my own benchmarking purposes? Is that correct? Thanks again!
   
On Tue, Jul 14, 2015 at 8:22 AM, Yuheng Du yuheng.du.h...@gmail.com
 
wrote:
   
 Also, I guess setting the target throughput to -1 means let it be
 as
   high
 as possible?

 On Tue, Jul 14, 2015 at 10:36 AM, Yuheng Du 
  yuheng.du.h...@gmail.com
 wrote:

  Thanks. If I set the acks=1 in the producer config options in
  bin/kafka-run-class.sh
org.apache.kafka.clients.tools.ProducerPerformance
  test7 5000 100 -1 acks=1 bootstrap.servers=
  esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864
 batch.size=8196?
 
  Does that mean for each message generated at the producer, the
   producer
  will wait until the broker sends the ack back, then send another
message?
 
  Thanks.
 
  Yuheng
 
  On Tue, Jul 14, 2015 at 10:06 AM, Manikumar Reddy 
ku...@nmsworks.co.in
  wrote:
 
  Yes, A list of  Kafka Server host/port pairs to use for
  establishing
the
  initial connection to the Kafka cluster
 
  https://kafka.apache.org/documentation.html#newproducerconfigs
 
  On Tue, Jul 14, 2015 at 7:29 PM, Yuheng Du 
   yuheng.du.h...@gmail.com
  wrote:
 
   Does anyone know what is bootstrap.servers=
   esv4-hcl198.grid.linkedin.com:9092 means in the following
 test
 command:
  
   bin/kafka-run-class.sh
  org.apache.kafka.clients.tools.ProducerPerformance
   test7 5000 100 -1 acks=1 bootstrap.servers=
   esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864
  batch.size=8196?
  
   what is bootstrap.servers? Is it the kafka server that I am
   running
a
  test
   at?
  
   Thanks.
  
   Yuheng
  
  
  
  
   On Tue, Jul 14, 2015 at 12:18 AM, Ewen Cheslack-Postava 
  e...@confluent.io
   
   wrote:
  
I implemented (nearly) the same basic set of tests in the
  system
 test
framework we started at Confluent and that is going to move
  into
  Kafka --
see the wip patch for KIP-25 here:
   https://github.com/apache/kafka/pull/70
In particular, that test is implemented in
 benchmark_test.py:
   
   
  
 

   
  
 
 https://github.com/apache/kafka

Re: kafka TestEndtoEndLatency

2015-07-15 Thread Yuheng Du
Guozhang,

Thank you for explaining. I see that in ProducerPerformance, callback
functions were used to get the latency metrics.
For TestEndtoEndLatency, does message size matter? What does this end-to-end
latency comprise, besides transferring a packet from source to
destination (typically around 0.2 ms for a 100-byte message)?

Thanks.

Yuheng

On Wed, Jul 15, 2015 at 3:01 PM, Guozhang Wang wangg...@gmail.com wrote:

 Yuheng,

 Only TestEndtoEndLatency's number are end to end, for ProducerPerformance
 the latency is for the send-to-ack latency, which increases as batch size
 increases.

 Guozhang

 On Wed, Jul 15, 2015 at 11:36 AM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  In the Kafka performance tests
  https://gist.github.com/jkreps/c7ddb4041ef62a900e6c
 
  The TestEndtoEndLatency results are typically around 2ms, while the
  ProducerPerformance normally has an average latency around several hundred
 ms
  when using batch size 8196.
 
  Are both results talking about end-to-end latency, and is the
  ProducerPerformance latency just larger because of batching of
  several messages?
 
  What message size does the TestEndtoEndLatency test use?
 
  Thanks for any ideas.
 
  best,
 



 --
 -- Guozhang



Re: kafka TestEndtoEndLatency

2015-07-15 Thread Yuheng Du
Thanks. Here is the source code snippet of the EndtoEndLatency test:

for (i <- 0 until numMessages) {
  val begin = System.nanoTime
  producer.send(new ProducerRecord[Array[Byte], Array[Byte]](topic, message))
  val received = iter.next
  val elapsed = System.nanoTime - begin
  // poor man's progress bar
  if (i % 1000 == 0) println(i + "\t" + elapsed / 1000.0 / 1000.0)
  totalTime += elapsed
  latencies(i) = (elapsed / 1000 / 1000)
}
iter.next reads a message using a consumer process. Is the producer.send()
function blocking? I think the message size should still matter here, am I
right?
In the source code, it is val message = "hello there beautiful".getBytes,
which is around 21 bytes.

I see that in this EndtoEndLatency test, no acks setting is involved, right?

Thanks.
Yuheng
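
For what it's worth, the measurement logic of that loop can be sketched in
Python with the send/receive stubbed out -- only the timing and the
per-message bookkeeping are the point here, not a real Kafka client:

```python
import random
import statistics
import time

def fake_send_and_receive():
    # Stand-in for producer.send(...) followed by iter.next on the
    # consumer side; sleeps a pseudo-random amount so there is
    # something to time.
    time.sleep(random.uniform(0.0001, 0.0005))

num_messages = 200
latencies_ms = []
for i in range(num_messages):
    begin = time.perf_counter_ns()          # analogous to System.nanoTime
    fake_send_and_receive()
    elapsed_ns = time.perf_counter_ns() - begin
    latencies_ms.append(elapsed_ns / 1e6)   # ns -> ms, as in the Scala code

print("median ms:", round(statistics.median(latencies_ms), 3))
print("max ms:", round(max(latencies_ms), 3))
```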

On Wed, Jul 15, 2015 at 3:43 PM, Guozhang Wang wangg...@gmail.com wrote:

 The end-to-end latency record the transferring of a message from producer
 to broker, then to consumer.

 I cannot remember the details now, but I think the EndtoEndLatency test
 records the latency as an average, hence it is small.

 Guozhang

 On Wed, Jul 15, 2015 at 12:28 PM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Guozhang,
 
  Thank you for explaining. I see that in ProducerPerformance call back
  functions were used to get the latency metrics.
  For the TestEndtoEndLatency, does message size matter? What this
 end-to-end
  latency comprise of, besides transferring a package from source to
  destination (typically around 0.2 ms for 100bytes message)?
 
  Thanks.
 
  Yuheng
 
  On Wed, Jul 15, 2015 at 3:01 PM, Guozhang Wang wangg...@gmail.com
 wrote:
 
   Yuheng,
  
   Only TestEndtoEndLatency's number are end to end, for
 ProducerPerformance
   the latency is for the send-to-ack latency, which increases as batch
 size
   increases.
  
   Guozhang
  
   On Wed, Jul 15, 2015 at 11:36 AM, Yuheng Du yuheng.du.h...@gmail.com
   wrote:
  
In kafka performance tests https://gist.github.com/jkreps
/c7ddb4041ef62a900e6c
   
The TestEndtoEndLatency results are typically around 2ms, while the
 ProducerPerformance normally has an average latency around several hundred ms
 when using batch size 8196.
   
Are both results talking about end to end latency? and the
ProducerPerformance results' latency is just larger because of
 batching
   of
several messages?
   
What is the message size of the TestEndtoEndLatency test uses?
   
Thanks for any ideas.
   
best,
   
  
  
  
   --
   -- Guozhang
  
 



 --
 -- Guozhang



Re: kafka benchmark tests

2015-07-15 Thread Yuheng Du
Hi Geoffrey,

Thank you for your detailed explaining. They are really helpful.

I am thinking of going with the second option: since I have bare-metal access
to all the nodes in the cluster, it's probably better to run real slave
machines instead of virtual machines. (correct me if I am wrong)

Each of my nodes has 256 GB RAM and 2 TB disk space; how large will the slave
virtual machines be, and how much memory will they take?

Thank you!

best,
Yuheng


On Wed, Jul 15, 2015 at 4:19 PM, Geoffrey Anderson ge...@confluent.io
wrote:

 Hi Yuheng,

 Yes, you should be able to run on either mac or linux.

 The test cluster consists of a test-driver machine and some number of slave
 machines. Right now, there are roughly two ways to set up the slave
 machines:

 1) Slave machines are virtual machines *on* the test-driver machine.
 2) Slave machines are external to the test-driver machine.

 1 is the simplest to set up, but yes it does require installation of the
 virtual machines on the test-driver machine.

 The installation of these machines is outlined in the quickstart I
 mentioned (here is a better link for the test README:
 https://github.com/confluentinc/kafka/tree/KAFKA-2276/tests).

 The tool we're using to bring up the slave virtual machines is called
 vagrant, so the vagrant steps in the quickstart are really telling you
 how to install the virtual machines.

 Hope that helps!

 Cheers,
 Geoff




 On Wed, Jul 15, 2015 at 12:13 PM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Hi Geoffrey,
 
  Thank you for your helpful information. Do I have to install the virtual
  machines? I am using Mac as the testdriver machine or I can use a linux
  machine to run testdriver too.
 
  Thanks.
 
  best,
  Yuheng
 
  On Wed, Jul 15, 2015 at 2:55 PM, Geoffrey Anderson ge...@confluent.io
  wrote:
 
   Hi Yuheng,
  
   Running these tests requires a tool we've created at Confluent called
   'ducktape', which you need to install with the command:
   pip install ducktape==0.2.0
  
   Running the tests locally requires some setup (creation of virtual
  machines
   etc.) which is outlined here:
  
  
 
 https://github.com/apache/kafka/pull/70/files#diff-62f0ff60ede3b78b9c95624e2f61d6c1
   The instructions in the quickstart show you how to run the tests on
  cluster
   of virtual machines (on a single host)
  
   Once you have a cluster up and running, you'll be able to run the test
   you're interested in:
   cd kafka/tests
   ducktape kafkatest/tests/benchmark_test.py
  
   Definitely keep us posted about which parts are difficult, annoying, or
   confusing about this process and we'll do our best to help.
  
   Thanks,
   Geoff
  
  
  
   On Wed, Jul 15, 2015 at 12:49 AM, Yuheng Du yuheng.du.h...@gmail.com
   wrote:
  
Jiefu,
   
 Have you tried to run benchmark_test.py? I ran it and it fails to import
 ducktape.services.service:
   
yuhengdu@consumer0:/packages/kafka_2.10-0.8.2.1$ python
   benchmark_test.py
   
Traceback (most recent call last):
   
  File benchmark_test.py, line 16, in module
   
from ducktape.services.service import Service
   
ImportError: No module named ducktape.services.service
   
   
Can you help me on getting it to work, Ewen? Thanks.
   
   
best,
   
Yuheng
   
On Tue, Jul 14, 2015 at 11:28 PM, Ewen Cheslack-Postava 
   e...@confluent.io

wrote:
   
 @Jiefu, yes! The patch is functional, I think it's just waiting on
 a
   bit
of
 final review after the last round of changes. You can definitely
 use
  it
for
 your own benchmarking, and we'd love to see patches for any
  additional
 tests we missed in the first pass!

 -Ewen

 On Tue, Jul 14, 2015 at 10:53 AM, JIEFU GONG jg...@berkeley.edu
   wrote:

  Yuheng,
  I would recommend looking here:
  http://kafka.apache.org/documentation.html#brokerconfigs and
   scrolling
  down
  to get a better understanding of the default settings and what
 they
mean
 --
  it'll tell you what different options for acks does.
 
  Ewen,
  Thank you immensely for your thoughts, they shed a lot of insight
   into
 the
  issue. Though it is understandable that your specific results
 need
  to
be
  verified, it seems that the KIP-25 patch is functional and I can
  use
   it
 for
  my own benchmarking purposes? Is that correct? Thanks again!
 
  On Tue, Jul 14, 2015 at 8:22 AM, Yuheng Du 
  yuheng.du.h...@gmail.com
   
  wrote:
 
   Also, I guess setting the target throughput to -1 means let it
 be
   as
 high
   as possible?
  
   On Tue, Jul 14, 2015 at 10:36 AM, Yuheng Du 
yuheng.du.h...@gmail.com
   wrote:
  
Thanks. If I set the acks=1 in the producer config options in
bin/kafka-run-class.sh
  org.apache.kafka.clients.tools.ProducerPerformance
test7 5000 100 -1 acks=1 bootstrap.servers=
esv4-hcl198

Re: Latency test

2015-07-15 Thread Yuheng Du
Thank you, Tao!
On Jul 15, 2015 6:27 PM, Tao Feng fengta...@gmail.com wrote:

 Sorry Yuheng, you should change it in $KAFKA_HEAP_OPTS.

 On Wed, Jul 15, 2015 at 3:09 PM, Tao Feng fengta...@gmail.com wrote:

  Hi Yuheng,
 
  You could add the -Xmx1024m in
  https://github.com/apache/kafka/blob/trunk/bin/kafka-run-class.sh
  KAFKA_JVM_PERFORMANCE_OPTS.
 
 
 
  On Wed, Jul 15, 2015 at 12:51 AM, Yuheng Du yuheng.du.h...@gmail.com
  wrote:
 
  Tao,
 
  If I am running on the command line the following command
  bin/kafka-run-class.sh kafka.tools.TestEndToEndLatency
 192.168.1.3:9092
  192.168.1.1:2181 speedx3 5000 100 1 -Xmx1024m
 
  It prompted that the usage is incorrect. So where should I put the -Xmx1024m
  option? Thanks.
 
 
 
  On Wed, Jul 15, 2015 at 3:44 AM, Tao Feng fengta...@gmail.com wrote:
 
   (Please correct me if I am wrong.) Based on  TestEndToEndLatency(
  
  
 
 https://github.com/apache/kafka/blob/trunk/core/src/test/scala/kafka/tools/TestEndToEndLatency.scala
   ),
   consumer_fetch_max_wait corresponds to fetch.wait.max.ms in consumer
   config(
   http://kafka.apache.org/documentation.html#consumerconfigs). The
  default
   value listed at document is 100(ms).
  
   To add java heap space to jvm, put -Xmx$Size(max heap size) for your
 jvm
   option.
  
   On Wed, Jul 15, 2015 at 12:29 AM, Yuheng Du yuheng.du.h...@gmail.com
 
   wrote:
  
Tao,
   
Thanks. The example on
   https://gist.github.com/jkreps/c7ddb4041ef62a900e6c
is outdated already. The error message shows:
   
USAGE: java kafka.tools.TestEndToEndLatency$ broker_list
   zookeeper_connect
topic num_messages consumer_fetch_max_wait producer_acks
   
   
 Can anyone tell me what should be put in consumer_fetch_max_wait?
   Thanks.
   
On Tue, Jul 14, 2015 at 5:21 PM, Tao Feng fengta...@gmail.com
  wrote:
   
 I think ProducerPerformance microbenchmark  only measure between
  client
to
 brokers(producer to brokers) and provide latency information.

 On Tue, Jul 14, 2015 at 11:05 AM, Yuheng Du 
  yuheng.du.h...@gmail.com
 wrote:

  Currently, the latency test from kafka test the end to end
 latency
 between
  producers and consumers.
 
  Is there  a way to test the producer to broker  and broker to
   consumer
  delay seperately?
 
  Thanks.
 

   
  
 
 
 



Re: kafka benchmark tests

2015-07-14 Thread Yuheng Du
Thanks. If I set the acks=1 in the producer config options in
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
test7 5000 100 -1 acks=1 bootstrap.servers=
esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196?

Does that mean for each message generated at the producer, the producer
will wait until the broker sends the ack back, then send another message?

Thanks.

Yuheng
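
Not quite -- acks=1 means each produce *request* waits for the leader's
acknowledgement, but the new producer packs records into batches
(batch.size) and keeps multiple requests in flight, so it is not one
message per round trip. A back-of-the-envelope sketch with assumed,
illustrative numbers:

```python
# With batch.size=8196 and 100-byte records, one produce request
# carries roughly 81 records, so a single round trip acks ~81 messages.
record_bytes = 100
batch_bytes = 8196
records_per_batch = batch_bytes // record_bytes
print(records_per_batch)                     # 81

# Assumed 2 ms send-to-ack round trip and 5 requests in flight
# (illustrative values, not measurements):
rtt_ms = 2
in_flight = 5
max_records_per_s = records_per_batch * in_flight * 1000 // rtt_ms
print(max_records_per_s)                     # 202500 under these assumptions
```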

On Tue, Jul 14, 2015 at 10:06 AM, Manikumar Reddy ku...@nmsworks.co.in
wrote:

 Yes, A list of  Kafka Server host/port pairs to use for establishing the
 initial connection to the Kafka cluster

 https://kafka.apache.org/documentation.html#newproducerconfigs

 On Tue, Jul 14, 2015 at 7:29 PM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Does anyone know what is bootstrap.servers=
  esv4-hcl198.grid.linkedin.com:9092 means in the following test command:
 
  bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
  test7 5000 100 -1 acks=1 bootstrap.servers=
  esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864
 batch.size=8196?
 
  what is bootstrap.servers? Is it the kafka server that I am running a
 test
  at?
 
  Thanks.
 
  Yuheng
 
 
 
 
  On Tue, Jul 14, 2015 at 12:18 AM, Ewen Cheslack-Postava 
 e...@confluent.io
  
  wrote:
 
   I implemented (nearly) the same basic set of tests in the system test
   framework we started at Confluent and that is going to move into Kafka
 --
   see the wip patch for KIP-25 here:
  https://github.com/apache/kafka/pull/70
   In particular, that test is implemented in benchmark_test.py:
  
  
 
 https://github.com/apache/kafka/pull/70/files#diff-ca984778cf9943407645eb6784f19dc8
  
   Hopefully once that's merged people can reuse that benchmark (and add
 to
   it!) so they can easily run the same benchmarks across different
  hardware.
   Here are some results from an older version of that test on m3.2xlarge
   instances on EC2 using local ephemeral storage (I think... it's been
  awhile
   since I ran these numbers and I didn't document methodology that
   carefully):
  
   INFO:_.KafkaBenchmark:=
   INFO:_.KafkaBenchmark:BENCHMARK RESULTS
   INFO:_.KafkaBenchmark:=
   INFO:_.KafkaBenchmark:Single producer, no replication: 684097.470208
   rec/sec (65.24 MB/s)
   INFO:_.KafkaBenchmark:Single producer, async 3x replication:
   667494.359673 rec/sec (63.66 MB/s)
   INFO:_.KafkaBenchmark:Single producer, sync 3x replication:
   116485.764275 rec/sec (11.11 MB/s)
   INFO:_.KafkaBenchmark:Three producers, async 3x replication:
   1696519.022182 rec/sec (161.79 MB/s)
   INFO:_.KafkaBenchmark:Message size:
   INFO:_.KafkaBenchmark: 10: 1637825.195625 rec/sec (15.62 MB/s)
   INFO:_.KafkaBenchmark: 100: 605504.877911 rec/sec (57.75 MB/s)
   INFO:_.KafkaBenchmark: 1000: 90351.817570 rec/sec (86.17 MB/s)
    INFO:_.KafkaBenchmark: 10000: 8306.180862 rec/sec (79.21 MB/s)
    INFO:_.KafkaBenchmark: 100000: 978.403499 rec/sec (93.31 MB/s)
    INFO:_.KafkaBenchmark:Throughput over long run, data > memory:
   INFO:_.KafkaBenchmark: Time block 0: 684725.151324 rec/sec (65.30
  MB/s)
   INFO:_.KafkaBenchmark:Single consumer: 701031.14 rec/sec (56.830500
   MB/s)
   INFO:_.KafkaBenchmark:Three consumers: 3304011.014900 rec/sec
 (267.830800
   MB/s)
   INFO:_.KafkaBenchmark:Producer + consumer:
   INFO:_.KafkaBenchmark: Producer: 624984.375391 rec/sec (59.60 MB/s)
   INFO:_.KafkaBenchmark: Consumer: 624984.375391 rec/sec (59.60 MB/s)
   INFO:_.KafkaBenchmark:End-to-end latency: median 2.00 ms, 99%
   4.00 ms, 99.9% 19.00 ms
  
    Don't trust these numbers for anything, they were a quick one-off test.
  I'm
   just pasting the output so you get some idea of what the results might
  look
   like. Once we merge the KIP-25 patch, Confluent will be running the
 tests
   regularly and results will be available publicly so we'll be able to
 keep
   better tabs on performance, albeit for only a specific class of
 hardware.
  
   For the batch.size question -- I'm not sure the results in the blog
 post
   actually have different settings, it could be accidental divergence
  between
   the script and the blog post. The post specifically notes that tuning
 the
   batch size in the synchronous case might help, but that he didn't do
  that.
   If you're trying to benchmark the *optimal* throughput, tuning the
 batch
   size would make sense. Since synchronous replication will have higher
   latency and there's a limit to how many requests can be in flight at
  once,
   you'll want a larger batch size to compensate for the additional
 latency.
   However, in practice the increase you see may be negligible. Somebody
 who
   has spent more time fiddling with tweaking producer performance may
 have
   more insight.
  
   -Ewen
  
   On Mon, Jul 13, 2015 at 10:08 AM, JIEFU GONG jg...@berkeley.edu
 wrote:
  
Hi all,
   
I was wondering if any of you guys have done

Re: kafka benchmark tests

2015-07-14 Thread Yuheng Du
Also, I guess setting the target throughput to -1 means let it be as high
as possible?

On Tue, Jul 14, 2015 at 10:36 AM, Yuheng Du yuheng.du.h...@gmail.com
wrote:

 Thanks. If I set the acks=1 in the producer config options in
 bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
 test7 5000 100 -1 acks=1 bootstrap.servers=
 esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196?

 Does that mean for each message generated at the producer, the producer
 will wait until the broker sends the ack back, then send another message?

 Thanks.

 Yuheng

 On Tue, Jul 14, 2015 at 10:06 AM, Manikumar Reddy ku...@nmsworks.co.in
 wrote:

 Yes, A list of  Kafka Server host/port pairs to use for establishing the
 initial connection to the Kafka cluster

 https://kafka.apache.org/documentation.html#newproducerconfigs

 On Tue, Jul 14, 2015 at 7:29 PM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Does anyone know what is bootstrap.servers=
  esv4-hcl198.grid.linkedin.com:9092 means in the following test command:
 
  bin/kafka-run-class.sh
 org.apache.kafka.clients.tools.ProducerPerformance
  test7 5000 100 -1 acks=1 bootstrap.servers=
  esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864
 batch.size=8196?
 
  what is bootstrap.servers? Is it the kafka server that I am running a
 test
  at?
 
  Thanks.
 
  Yuheng
 
 
 
 
  On Tue, Jul 14, 2015 at 12:18 AM, Ewen Cheslack-Postava 
 e...@confluent.io
  
  wrote:
 
   I implemented (nearly) the same basic set of tests in the system test
   framework we started at Confluent and that is going to move into
 Kafka --
   see the wip patch for KIP-25 here:
  https://github.com/apache/kafka/pull/70
   In particular, that test is implemented in benchmark_test.py:
  
  
 
 https://github.com/apache/kafka/pull/70/files#diff-ca984778cf9943407645eb6784f19dc8
  
   Hopefully once that's merged people can reuse that benchmark (and add
 to
   it!) so they can easily run the same benchmarks across different
  hardware.
   Here are some results from an older version of that test on m3.2xlarge
   instances on EC2 using local ephemeral storage (I think... it's been
  awhile
   since I ran these numbers and I didn't document methodology that
   carefully):
  
   INFO:_.KafkaBenchmark:=
   INFO:_.KafkaBenchmark:BENCHMARK RESULTS
   INFO:_.KafkaBenchmark:=
   INFO:_.KafkaBenchmark:Single producer, no replication: 684097.470208
   rec/sec (65.24 MB/s)
   INFO:_.KafkaBenchmark:Single producer, async 3x replication:
   667494.359673 rec/sec (63.66 MB/s)
   INFO:_.KafkaBenchmark:Single producer, sync 3x replication:
   116485.764275 rec/sec (11.11 MB/s)
   INFO:_.KafkaBenchmark:Three producers, async 3x replication:
   1696519.022182 rec/sec (161.79 MB/s)
   INFO:_.KafkaBenchmark:Message size:
   INFO:_.KafkaBenchmark: 10: 1637825.195625 rec/sec (15.62 MB/s)
   INFO:_.KafkaBenchmark: 100: 605504.877911 rec/sec (57.75 MB/s)
   INFO:_.KafkaBenchmark: 1000: 90351.817570 rec/sec (86.17 MB/s)
    INFO:_.KafkaBenchmark: 10000: 8306.180862 rec/sec (79.21 MB/s)
    INFO:_.KafkaBenchmark: 100000: 978.403499 rec/sec (93.31 MB/s)
    INFO:_.KafkaBenchmark:Throughput over long run, data > memory:
   INFO:_.KafkaBenchmark: Time block 0: 684725.151324 rec/sec (65.30
  MB/s)
   INFO:_.KafkaBenchmark:Single consumer: 701031.14 rec/sec
 (56.830500
   MB/s)
   INFO:_.KafkaBenchmark:Three consumers: 3304011.014900 rec/sec
 (267.830800
   MB/s)
   INFO:_.KafkaBenchmark:Producer + consumer:
   INFO:_.KafkaBenchmark: Producer: 624984.375391 rec/sec (59.60
 MB/s)
   INFO:_.KafkaBenchmark: Consumer: 624984.375391 rec/sec (59.60
 MB/s)
   INFO:_.KafkaBenchmark:End-to-end latency: median 2.00 ms, 99%
   4.00 ms, 99.9% 19.00 ms
  
    Don't trust these numbers for anything, they were a quick one-off test.
  I'm
   just pasting the output so you get some idea of what the results might
  look
   like. Once we merge the KIP-25 patch, Confluent will be running the
 tests
   regularly and results will be available publicly so we'll be able to
 keep
   better tabs on performance, albeit for only a specific class of
 hardware.
  
   For the batch.size question -- I'm not sure the results in the blog
 post
   actually have different settings, it could be accidental divergence
  between
   the script and the blog post. The post specifically notes that tuning
 the
   batch size in the synchronous case might help, but that he didn't do
  that.
   If you're trying to benchmark the *optimal* throughput, tuning the
 batch
   size would make sense. Since synchronous replication will have higher
   latency and there's a limit to how many requests can be in flight at
  once,
   you'll want a larger batch size to compensate for the additional
 latency.
   However, in practice the increase you see may be negligible. Somebody
 who
   has spent more time fiddling with tweaking producer performance may

Re: kafka benchmark tests

2015-07-14 Thread Yuheng Du
Does anyone know what is bootstrap.servers=
esv4-hcl198.grid.linkedin.com:9092 means in the following test command:

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
test7 5000 100 -1 acks=1 bootstrap.servers=
esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196?

what is bootstrap.servers? Is it the kafka server that I am running a test
at?

Thanks.

Yuheng




On Tue, Jul 14, 2015 at 12:18 AM, Ewen Cheslack-Postava e...@confluent.io
wrote:

 I implemented (nearly) the same basic set of tests in the system test
 framework we started at Confluent and that is going to move into Kafka --
 see the wip patch for KIP-25 here: https://github.com/apache/kafka/pull/70
 In particular, that test is implemented in benchmark_test.py:

 https://github.com/apache/kafka/pull/70/files#diff-ca984778cf9943407645eb6784f19dc8

 Hopefully once that's merged people can reuse that benchmark (and add to
 it!) so they can easily run the same benchmarks across different hardware.
 Here are some results from an older version of that test on m3.2xlarge
 instances on EC2 using local ephemeral storage (I think... it's been awhile
 since I ran these numbers and I didn't document methodology that
 carefully):

 INFO:_.KafkaBenchmark:=
 INFO:_.KafkaBenchmark:BENCHMARK RESULTS
 INFO:_.KafkaBenchmark:=
 INFO:_.KafkaBenchmark:Single producer, no replication: 684097.470208
 rec/sec (65.24 MB/s)
 INFO:_.KafkaBenchmark:Single producer, async 3x replication:
 667494.359673 rec/sec (63.66 MB/s)
 INFO:_.KafkaBenchmark:Single producer, sync 3x replication:
 116485.764275 rec/sec (11.11 MB/s)
 INFO:_.KafkaBenchmark:Three producers, async 3x replication:
 1696519.022182 rec/sec (161.79 MB/s)
 INFO:_.KafkaBenchmark:Message size:
 INFO:_.KafkaBenchmark: 10: 1637825.195625 rec/sec (15.62 MB/s)
 INFO:_.KafkaBenchmark: 100: 605504.877911 rec/sec (57.75 MB/s)
 INFO:_.KafkaBenchmark: 1000: 90351.817570 rec/sec (86.17 MB/s)
 INFO:_.KafkaBenchmark: 10000: 8306.180862 rec/sec (79.21 MB/s)
 INFO:_.KafkaBenchmark: 100000: 978.403499 rec/sec (93.31 MB/s)
 INFO:_.KafkaBenchmark:Throughput over long run, data > memory:
 INFO:_.KafkaBenchmark: Time block 0: 684725.151324 rec/sec (65.30 MB/s)
 INFO:_.KafkaBenchmark:Single consumer: 701031.14 rec/sec (56.830500
 MB/s)
 INFO:_.KafkaBenchmark:Three consumers: 3304011.014900 rec/sec (267.830800
 MB/s)
 INFO:_.KafkaBenchmark:Producer + consumer:
 INFO:_.KafkaBenchmark: Producer: 624984.375391 rec/sec (59.60 MB/s)
 INFO:_.KafkaBenchmark: Consumer: 624984.375391 rec/sec (59.60 MB/s)
 INFO:_.KafkaBenchmark:End-to-end latency: median 2.00 ms, 99%
 4.00 ms, 99.9% 19.00 ms
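
 (As a quick sanity check, the MB/s columns above follow directly from
 rec/sec times the message size -- 100 bytes in the producer tests --
 divided by 2^20, i.e. MB here reads as MiB:)

```python
def mb_per_s(rec_per_sec, msg_bytes=100):
    # Reported MB/s = records/sec * bytes/record, expressed in MiB.
    return round(rec_per_sec * msg_bytes / (1024 * 1024), 2)

# Single producer, no replication: 684097.470208 rec/sec
print(mb_per_s(684097.470208))           # 65.24
# Message-size row with 8306.180862 rec/sec matches 10000-byte records
print(mb_per_s(8306.180862, 10000))      # 79.21
```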

 Don't trust these numbers for anything, they were a quick one-off test. I'm
 just pasting the output so you get some idea of what the results might look
 like. Once we merge the KIP-25 patch, Confluent will be running the tests
 regularly and results will be available publicly so we'll be able to keep
 better tabs on performance, albeit for only a specific class of hardware.

 For the batch.size question -- I'm not sure the results in the blog post
 actually have different settings, it could be accidental divergence between
 the script and the blog post. The post specifically notes that tuning the
 batch size in the synchronous case might help, but that he didn't do that.
 If you're trying to benchmark the *optimal* throughput, tuning the batch
 size would make sense. Since synchronous replication will have higher
 latency and there's a limit to how many requests can be in flight at once,
 you'll want a larger batch size to compensate for the additional latency.
 However, in practice the increase you see may be negligible. Somebody who
 has spent more time fiddling with tweaking producer performance may have
 more insight.

 -Ewen

 On Mon, Jul 13, 2015 at 10:08 AM, JIEFU GONG jg...@berkeley.edu wrote:

  Hi all,
 
  I was wondering if any of you guys have done benchmarks on Kafka
  performance before, and if they or their details (# nodes in cluster, #
  records / size(s) of messages, etc.) could be shared.
 
  For comparison purposes, I am trying to benchmark Kafka against some
  similar services such as Kinesis or Scribe. Additionally, I was wondering
  if anyone could shed some insight on Jay Kreps' benchmarks that he has
  openly published here:
 
 
 https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
 
  Specifically, I am unsure of why between his tests of 3x synchronous
  replication and 3x async replication he changed the batch.size, as well
 as
  why he is seemingly publishing to incorrect topics:
 
  Configs:
  https://gist.github.com/jkreps/c7ddb4041ef62a900e6c
 
  Any help is greatly appreciated!
 
 
 
  --
 
  Jiefu Gong
  University of California, Berkeley | Class of 2017
  B.A Computer Science | College of Letters and Sciences
 
  

Re: How to run the three producers test

2015-07-14 Thread Yuheng Du
I checked the logs on the brokers, it seems that the zookeeper or the kafka
server process is not running on this broker...Thank you guys. I will see
if it happens again.

On Tue, Jul 14, 2015 at 4:53 PM, JIEFU GONG jg...@berkeley.edu wrote:

 Hmm..yeah some error logs would be nice like Gwen pointed out. Do any of
 your brokers fall out of the ISR when sending messages? It seems like your
 setup should be fine, so I'm not entirely sure.

 On Tue, Jul 14, 2015 at 1:31 PM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Jiefu,
 
  I am performing these tests on a 6 nodes cluster in cloudlab (a
  infrastructure built for scientific research). I use 2 nodes as
 producers,
  2 as brokers only, and 2 as consumers. I have tested for each individual
  machines and they work well. I did not use AWS. Thank you!
 
  On Tue, Jul 14, 2015 at 4:20 PM, JIEFU GONG jg...@berkeley.edu wrote:
 
   Yuheng, are you performing these tests locally or using a service such
 as
   AWS? I'd try using each separate machine individually first, connecting
  to
   the ZK/Kafka servers and ensuring that each is able to first log and
   consume messages independently.
  
   On Tue, Jul 14, 2015 at 1:17 PM, Gwen Shapira gshap...@cloudera.com
   wrote:
  
Are there any errors on the broker logs?
   
On Tue, Jul 14, 2015 at 11:57 AM, Yuheng Du 
 yuheng.du.h...@gmail.com
wrote:
 Jiefu,

 Thank you. The three producers can run at the same time. I mean
  should
they
 be started at exactly the same time? (I have three consoles from
 each
   of
 the three machines and I just start the console command manually
 one
  by
 one) Or a small variation of the starting time won't matter?

 Gwen and Jiefu,

 I have started the three producers at three machines, after a
 while,
   all
of
 them gives a java.net.ConnectException:

 [2015-07-14 12:56:46,352] WARN Error in I/O with producer0-link-0/
 192.168.1.1 (org.apache.kafka.common.network.Selector)

 java.net.ConnectException: Connection refused..

 [2015-07-14 12:56:48,056] WARN Error in I/O with producer1-link-0/
 192.168.1.2 (org.apache.kafka.common.network.Selector)

 java.net.ConnectException: Connection refused.

 What could be the cause?

 Thank you guys!




 On Tue, Jul 14, 2015 at 2:47 PM, JIEFU GONG jg...@berkeley.edu
   wrote:

 Yuheng,

 Yes, if you read the blog post it specifies that he's using three
separate
 machines. There's no reason the producers cannot be started at the
   same
 time, I believe.

 On Tue, Jul 14, 2015 at 11:42 AM, Yuheng Du 
  yuheng.du.h...@gmail.com
   
 wrote:

  Hi,
 
  I am running the performance test for kafka.
  https://gist.github.com/jkreps
  /c7ddb4041ef62a900e6c
 
  For the Three Producers, 3x async replication scenario, the
   command
is
  the same as one producer:
 
  bin/kafka-run-class.sh
org.apache.kafka.clients.tools.ProducerPerformance
  test 5000 100 -1 acks=1
  bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
  buffer.memory=67108864 batch.size=8196
 
  So How to I run the test for three producers? Do I just run them
  on
three
  separate servers at the same time? Will there be some error in
  this
way
  since the three producers can't be started at the same time?
 
  Thanks.
 
  best,
  Yuheng
 



 --

 Jiefu Gong
 University of California, Berkeley | Class of 2017
 B.A Computer Science | College of Letters and Sciences

 jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427

   
  
  
  
   --
  
   Jiefu Gong
   University of California, Berkeley | Class of 2017
   B.A Computer Science | College of Letters and Sciences
  
   jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427
  
 



 --

 Jiefu Gong
 University of California, Berkeley | Class of 2017
 B.A Computer Science | College of Letters and Sciences

 jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427



Re: How to run the three producers test

2015-07-14 Thread Yuheng Du
Also, the log in another broker (not the bootstrap) says:

[2015-07-14 15:18:41,220] FATAL [Replica Manager on Broker 1]: Error
writing to highwatermark file:  (kafka.server.ReplicaManager)


[2015-07-14 15:18:40,160] ERROR Closing socket for /130.127.133.47 because
of error (kafka.network.Processor)

java.io.IOException: Connection reset by peer

at sun.nio.ch.FileDispatcherImpl.read0(Native Method)

at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)

at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)

at sun.nio.ch.IOUtil.read(IOUtil.java:197)

at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)

at kafka.utils.Utils$.read(Utils.scala:380)

at
kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)

at kafka.network.Processor.read(SocketServer.scala:444)

at kafka.network.Processor.run(SocketServer.scala:340)

at java.lang.Thread.run(Thread.java:745)



java.io.IOException: No space left on device

at java.io.FileOutputStream.writeBytes(Native Method)

at java.io.FileOutputStream.write(FileOutputStream.java:345)

at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)

at sun.nio.cs.StreamEncoder.implFlushBuffe


On Tue, Jul 14, 2015 at 5:24 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:

 Hi Jiefu, Gwen,

 I am running the Throughput versus stored data test:
 bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
 test 500 100 -1 acks=1 bootstrap.servers=
 esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196

 After around 50,000,000 messages were sent, I got a bunch of connection
 refused error as I mentioned before. I checked the logs on the broker and
 here is what I see:

 [2015-07-14 15:11:23,578] WARN Partition [test,4] on broker 5: No
 checkpointed highwatermark is found for partition [test,4]
 (kafka.cluster.Partition)

 [2015-07-14 15:12:33,298] INFO Rolled new log segment for 'test-4' in 4
 ms. (kafka.log.Log)

 [2015-07-14 15:12:33,299] INFO Rolled new log segment for 'test-0' in 1
 ms. (kafka.log.Log)

 [2015-07-14 15:13:39,529] INFO Rolled new log segment for 'test-4' in 1
Re: How to run the three producers test

2015-07-14 Thread Yuheng Du
Hi Jiefu, Gwen,

I am running the Throughput versus stored data test:
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
test 500 100 -1 acks=1 bootstrap.servers=
esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196

After around 50,000,000 messages were sent, I got a bunch of connection
refused error as I mentioned before. I checked the logs on the broker and
here is what I see:

[2015-07-14 15:11:23,578] WARN Partition [test,4] on broker 5: No
checkpointed highwatermark is found for partition [test,4]
(kafka.cluster.Partition)

[2015-07-14 15:12:33,298] INFO Rolled new log segment for 'test-4' in 4 ms.
(kafka.log.Log)

[2015-07-14 15:12:33,299] INFO Rolled new log segment for 'test-0' in 1 ms.
(kafka.log.Log)

[2015-07-14 15:13:39,529] INFO Rolled new log segment for 'test-4' in 1 ms.
(kafka.log.Log)

[2015-07-14 15:13:39,531] INFO Rolled new log segment for 'test-0' in 1 ms.
(kafka.log.Log)

[2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-4' in 3 ms.
(kafka.log.Log)

[2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-0' in 1 ms.
(kafka.log.Log)

[2015-07-14 15:15:51,478] INFO Rolled new log segment for 'test-4' in 1 ms.
(kafka.log.Log)

[2015-07-14 15:15:51,479] INFO Rolled new log segment for 'test-0' in 1 ms.
(kafka.log.Log)

[2015-07-14 15:16:52,589] INFO Rolled new log segment for 'test-4' in 1 ms.
(kafka.log.Log)

[2015-07-14 15:16:52,590] INFO Rolled new log segment for 'test-0' in 1 ms.
(kafka.log.Log)

[2015-07-14 15:17:57,406] INFO Rolled new log segment for 'test-4' in 1 ms.
(kafka.log.Log)

[2015-07-14 15:17:57,407] INFO Rolled new log segment for 'test-0' in 0 ms.
(kafka.log.Log)

[2015-07-14 15:18:39,792] FATAL [KafkaApi-5] Halting due to unrecoverable
I/O error while handling produce request:  (kafka.server.KafkaApis)

kafka.common.KafkaStorageException: I/O exception in append to log 'test-0'

at kafka.log.Log.append(Log.scala:266)

at
kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:379)

at
kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:365)

at kafka.utils.Utils$.inLock(Utils.scala:535)

at kafka.utils.Utils$.inReadLock(Utils.scala:541)

at
kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:365)

at
kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:291)

at
kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:282)

at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

at
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)

at
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)

at scala.coll



Can you help me with this problem? Thanks.

On Tue, Jul 14, 2015 at 5:12 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:

 I checked the logs on the brokers, it seems that the zookeeper or the
 kafka server process is not running on this broker...Thank you guys. I will
 see if it happens again.

 On Tue, Jul 14, 2015 at 4:53 PM, JIEFU GONG jg...@berkeley.edu wrote:

 Hmm..yeah some error logs would be nice like Gwen pointed out. Do any of
 your brokers fall out of the ISR when sending messages? It seems like your
 setup should be fine, so I'm not entirely sure.

 On Tue, Jul 14, 2015 at 1:31 PM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Jiefu,
 
  I am performing these tests on a 6 nodes cluster in cloudlab (a
  infrastructure built for scientific research). I use 2 nodes as
 producers,
  2 as brokers only, and 2 as consumers. I have tested for each individual
  machines and they work well. I did not use AWS. Thank you!
 
  On Tue, Jul 14, 2015 at 4:20 PM, JIEFU GONG jg...@berkeley.edu wrote:
 
   Yuheng, are you performing these tests locally or using a service
 such as
   AWS? I'd try using each separate machine individually first,
 connecting
  to
   the ZK/Kafka servers and ensuring that each is able to first log and
   consume messages independently.
  
   On Tue, Jul 14, 2015 at 1:17 PM, Gwen Shapira gshap...@cloudera.com
   wrote:
  
Are there any errors on the broker logs?
   
On Tue, Jul 14, 2015 at 11:57 AM, Yuheng Du 
 yuheng.du.h...@gmail.com
wrote:
 Jiefu,

 Thank you. The three producers can run at the same time. I mean
  should
they
 be started at exactly the same time? (I have three consoles from
 each
   of
 the three machines and I just start the console command manually
 one
  by
 one) Or a small variation of the starting time won't matter?

 Gwen and Jiefu,

 I have started the three producers at three machines, after a
 while,
   all
of
 them gives a java.net.ConnectException:

 [2015-07-14 12:56:46,352] WARN Error in I/O with producer0-link

Re: How to run the three producers test

2015-07-14 Thread Yuheng Du
But is there a way to let kafka overwrite the old data when the disk
fills up? Or is it unnecessary to use this figure for the test? Thanks.
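For reference, Kafka can cap on-disk usage with size- or time-based retention, deleting the oldest segments before the disk fills. A sketch of the relevant broker settings (the values below are illustrative, not from this thread):

```
# server.properties (illustrative values)
log.retention.bytes=107374182400   # cap per partition (~100 GB); oldest segments deleted first
log.retention.hours=168            # or time-based retention: keep 7 days
log.segment.bytes=1073741824       # roll 1 GB segments so deletion can reclaim space
```

Note that log.retention.bytes applies per partition, so the effective cap per broker is roughly that value times the number of partitions it hosts.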

On Tue, Jul 14, 2015 at 10:14 PM, Yuheng Du yuheng.du.h...@gmail.com
wrote:

 Jiefu,

 I agree with you. I checked the hardware specs of my machines, each one of
 them has:

 RAM



 256 GB ECC memory (16x 16 GB DDR4 1600 MT/s dual-rank RDIMMs)

 Disk



 Two 1 TB 7.2K RPM 3G SATA HDDs

 For the throughput-versus-stored-data test, it uses 5*10^10 messages,
 for a total volume of 5 TB. I set the replication factor to 3, so the
 total size including replicas would be 15 TB, which apparently
 overwhelmed the two brokers I use.

 Thanks.

 best,
 Yuheng

 On Tue, Jul 14, 2015 at 6:06 PM, JIEFU GONG jg...@berkeley.edu wrote:

 Someone may correct me if I am incorrect, but how much disk space do you
 have on these nodes? Your exception 'No space left on device' from one of
 your brokers seems to suggest that you're full (after all you're writing
 500 million records). If this is the case I believe the expected behavior
 for Kafka is to reject any more attempts to write data?

 On Tue, Jul 14, 2015 at 2:27 PM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Also, the log in another broker (not the bootstrap) says:
 
  [2015-07-14 15:18:41,220] FATAL [Replica Manager on Broker 1]: Error
  writing to highwatermark file:  (kafka.server.ReplicaManager)
 
 
  [2015-07-14 15:18:40,160] ERROR Closing socket for /130.127.133.47
 because
  of error (kafka.network.Process
 
  or)
 
  java.io.IOException: Connection reset by peer
 
  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
 
  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
 
  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
 
  at sun.nio.ch.IOUtil.read(IOUtil.java:197)
 
  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
 
  at kafka.utils.Utils$.read(Utils.scala:380)
 
  at
 
 
 kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
 
  at kafka.network.Processor.read(SocketServer.scala:444)
 
  at kafka.network.Processor.run(SocketServer.scala:340)
 
  at java.lang.Thread.run(Thread.java:745)
 
  
 
  java.io.IOException: No space left on device
 
  at java.io.FileOutputStream.writeBytes(Native Method)
 
  at java.io.FileOutputStream.write(FileOutputStream.java:345)
 
  at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
 
  at sun.nio.cs.StreamEncoder.implFlushBuffe
 
  (END)
 
  On Tue, Jul 14, 2015 at 5:24 PM, Yuheng Du yuheng.du.h...@gmail.com
  wrote:
 
   Hi Jiefu, Gwen,
  
   I am running the Throughput versus stored data test:
   bin/kafka-run-class.sh
 org.apache.kafka.clients.tools.ProducerPerformance
   test 500 100 -1 acks=1 bootstrap.servers=
   esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864
  batch.size=8196
  
   After around 50,000,000 messages were sent, I got a bunch of
 connection
   refused error as I mentioned before. I checked the logs on the broker
 and
   here is what I see:
  
   [2015-07-14 15:11:23,578] WARN Partition [test,4] on broker 5: No
   checkpointed highwatermark is found for partition [test,4]
   (kafka.cluster.Partition)
  
   [2015-07-14 15:12:33,298] INFO Rolled new log segment for 'test-4' in
 4
   ms. (kafka.log.Log)
  
   [2015-07-14 15:12:33,299] INFO Rolled new log segment for 'test-0' in
 1
   ms. (kafka.log.Log)
  
   [2015-07-14 15:13:39,529] INFO Rolled new log segment for 'test-4' in
 1
   ms. (kafka.log.Log)
  
   [2015-07-14 15:13:39,531] INFO Rolled new log segment for 'test-0' in
 1
   ms. (kafka.log.Log)
  
   [2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-4' in
 3
   ms. (kafka.log.Log)
  
   [2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-0' in
 1
   ms. (kafka.log.Log)
  
   [2015-07-14 15:15:51,478] INFO Rolled new log segment for 'test-4' in
 1
   ms. (kafka.log.Log)
  
   [2015-07-14 15:15:51,479] INFO Rolled new log segment for 'test-0' in
 1
   ms. (kafka.log.Log)
  
   [2015-07-14 15:16:52,589] INFO Rolled new log segment for 'test-4' in
 1
   ms. (kafka.log.Log)
  
   [2015-07-14 15:16:52,590] INFO Rolled new log segment for 'test-0' in
 1
   ms. (kafka.log.Log)
  
   [2015-07-14 15:17:57,406] INFO Rolled new log segment for 'test-4' in
 1
   ms. (kafka.log.Log)
  
   [2015-07-14 15:17:57,407] INFO Rolled new log segment for 'test-0' in
 0
   ms. (kafka.log.Log)
  
   [2015-07-14 15:18:39,792] FATAL [KafkaApi-5] Halting due to
 unrecoverable
   I/O error while handling produce request:  (kafka.server.KafkaApis)
  
   kafka.common.KafkaStorageException: I/O exception in append to log
  'test-0'
  
   at kafka.log.Log.append(Log.scala:266)
  
   at
  
 
 kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:379)
  
   at
  
 
 kafka.cluster.Partition

Re: How to run the three producers test

2015-07-14 Thread Yuheng Du
Jiefu,

I agree with you. I checked the hardware specs of my machines, each one of
them has:

RAM



256 GB ECC memory (16x 16 GB DDR4 1600 MT/s dual-rank RDIMMs)

Disk



Two 1 TB 7.2K RPM 3G SATA HDDs

For the throughput-versus-stored-data test, it uses 5*10^10 messages, for a
total volume of 5 TB. I set the replication factor to 3, so the total size
including replicas would be 15 TB, which apparently overwhelmed the two
brokers I use.
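The sizing above can be checked with back-of-envelope arithmetic (a POSIX-shell sketch; the available-space figure assumes both 1 TB disks on each broker are usable for logs):

```shell
# Back-of-envelope check of the storage math (64-bit shell arithmetic, in GB).
MSGS=50000000000          # 5*10^10 messages
BYTES_PER_MSG=100
REPLICATION=3
need_gb=$(( MSGS / 1000000000 * BYTES_PER_MSG * REPLICATION ))  # total with replicas
have_gb=$(( 2 * 2 * 1000 ))     # 2 brokers x two 1 TB disks
echo "need ${need_gb} GB, have ${have_gb} GB"   # -> need 15000 GB, have 4000 GB
```

So the test needs roughly 15 TB of log space against about 4 TB available, which matches the "No space left on device" failures seen in the logs.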

Thanks.

best,
Yuheng

On Tue, Jul 14, 2015 at 6:06 PM, JIEFU GONG jg...@berkeley.edu wrote:

 Someone may correct me if I am incorrect, but how much disk space do you
 have on these nodes? Your exception 'No space left on device' from one of
 your brokers seems to suggest that you're full (after all you're writing
 500 million records). If this is the case I believe the expected behavior
 for Kafka is to reject any more attempts to write data?

 On Tue, Jul 14, 2015 at 2:27 PM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Also, the log in another broker (not the bootstrap) says:
 
  [2015-07-14 15:18:41,220] FATAL [Replica Manager on Broker 1]: Error
  writing to highwatermark file:  (kafka.server.ReplicaManager)
 
 
  [2015-07-14 15:18:40,160] ERROR Closing socket for /130.127.133.47
 because
  of error (kafka.network.Process
 
  or)
 
  java.io.IOException: Connection reset by peer
 
  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
 
  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
 
  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
 
  at sun.nio.ch.IOUtil.read(IOUtil.java:197)
 
  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
 
  at kafka.utils.Utils$.read(Utils.scala:380)
 
  at
 
 
 kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
 
  at kafka.network.Processor.read(SocketServer.scala:444)
 
  at kafka.network.Processor.run(SocketServer.scala:340)
 
  at java.lang.Thread.run(Thread.java:745)
 
  
 
  java.io.IOException: No space left on device
 
  at java.io.FileOutputStream.writeBytes(Native Method)
 
  at java.io.FileOutputStream.write(FileOutputStream.java:345)
 
  at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
 
  at sun.nio.cs.StreamEncoder.implFlushBuffe
 
  (END)
 
  On Tue, Jul 14, 2015 at 5:24 PM, Yuheng Du yuheng.du.h...@gmail.com
  wrote:
 
   Hi Jiefu, Gwen,
  
   I am running the Throughput versus stored data test:
   bin/kafka-run-class.sh
 org.apache.kafka.clients.tools.ProducerPerformance
   test 500 100 -1 acks=1 bootstrap.servers=
   esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864
  batch.size=8196
  
   After around 50,000,000 messages were sent, I got a bunch of connection
   refused error as I mentioned before. I checked the logs on the broker
 and
   here is what I see:
  
   [2015-07-14 15:11:23,578] WARN Partition [test,4] on broker 5: No
   checkpointed highwatermark is found for partition [test,4]
   (kafka.cluster.Partition)
  
   [2015-07-14 15:12:33,298] INFO Rolled new log segment for 'test-4' in 4
   ms. (kafka.log.Log)
  
   [2015-07-14 15:12:33,299] INFO Rolled new log segment for 'test-0' in 1
   ms. (kafka.log.Log)
  
   [2015-07-14 15:13:39,529] INFO Rolled new log segment for 'test-4' in 1
   ms. (kafka.log.Log)
  
   [2015-07-14 15:13:39,531] INFO Rolled new log segment for 'test-0' in 1
   ms. (kafka.log.Log)
  
   [2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-4' in 3
   ms. (kafka.log.Log)
  
   [2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-0' in 1
   ms. (kafka.log.Log)
  
   [2015-07-14 15:15:51,478] INFO Rolled new log segment for 'test-4' in 1
   ms. (kafka.log.Log)
  
   [2015-07-14 15:15:51,479] INFO Rolled new log segment for 'test-0' in 1
   ms. (kafka.log.Log)
  
   [2015-07-14 15:16:52,589] INFO Rolled new log segment for 'test-4' in 1
   ms. (kafka.log.Log)
  
   [2015-07-14 15:16:52,590] INFO Rolled new log segment for 'test-0' in 1
   ms. (kafka.log.Log)
  
   [2015-07-14 15:17:57,406] INFO Rolled new log segment for 'test-4' in 1
   ms. (kafka.log.Log)
  
   [2015-07-14 15:17:57,407] INFO Rolled new log segment for 'test-0' in 0
   ms. (kafka.log.Log)
  
   [2015-07-14 15:18:39,792] FATAL [KafkaApi-5] Halting due to
 unrecoverable
   I/O error while handling produce request:  (kafka.server.KafkaApis)
  
   kafka.common.KafkaStorageException: I/O exception in append to log
  'test-0'
  
   at kafka.log.Log.append(Log.scala:266)
  
   at
  
 
 kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:379)
  
   at
  
 
 kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:365)
  
   at kafka.utils.Utils$.inLock(Utils.scala:535)
  
   at kafka.utils.Utils$.inReadLock(Utils.scala:541

Re: How to run the three producers test

2015-07-14 Thread Yuheng Du
Jiefu,

Now, even when the disk space is sufficient (less than 18% used), when I run

it still gives me an error; the logs say:

[2015-07-14 23:08:48,735] FATAL Fatal error during KafkaServerStartable
startup. Prepare to shutdown (kafka.server.KafkaServerStartable)

org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to
zookeeper server within timeout: 6000

at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:880)

at org.I0Itec.zkclient.ZkClient.init(ZkClient.java:98)

at org.I0Itec.zkclient.ZkClient.init(ZkClient.java:84)

at kafka.server.KafkaServer.initZk(KafkaServer.scala:157)

at kafka.server.KafkaServer.startup(KafkaServer.scala:82)

at
kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:29)

at kafka.Kafka$.main(Kafka.scala:46)

at kafka.Kafka.main(Kafka.scala)

[2015-07-14 23:08:48,737] INFO [Kafka Server 1], shutting down
(kafka.server.KafkaServer)

I have checked that ZooKeeper is running fine. Can anyone explain why I
get this error? Thanks.
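A quick way to verify that the broker can actually reach ZooKeeper before startup is a connectivity probe (a sketch; 'ruok' is ZooKeeper's built-in four-letter health command, answered with "imok" — newer ZooKeeper versions may require it to be whitelisted, and the host/port below are the quickstart defaults, not from this thread):

```shell
# Probe a ZooKeeper server; prints the server's reply, or nothing if unreachable.
zk_ok() {                         # usage: zk_ok HOST PORT
  printf 'ruok' | nc -w 2 "$1" "$2" 2>/dev/null
}

# Usage:
#   [ "$(zk_ok localhost 2181)" = "imok" ] && echo "zookeeper reachable"
```

If the probe fails from the broker host, the ZkTimeoutException above is a network/firewall or zookeeper.connect configuration issue rather than a Kafka problem.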

On Tue, Jul 14, 2015 at 10:24 PM, Yuheng Du yuheng.du.h...@gmail.com
wrote:

 But is there a way to let kafka override the old data if the disk is
 filled? Or is it not necessary to use this figure? Thanks.

 On Tue, Jul 14, 2015 at 10:14 PM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

 Jiefu,

 I agree with you. I checked the hardware specs of my machines, each one
 of them has:

 RAM



 256 GB ECC memory (16x 16 GB DDR4 1600 MT/s dual-rank RDIMMs)

 Disk



 Two 1 TB 7.2K RPM 3G SATA HDDs

 For the throughput-versus-stored-data test, it uses 5*10^10 messages,
 for a total volume of 5 TB. I set the replication factor to 3, so the
 total size including replicas would be 15 TB, which apparently
 overwhelmed the two brokers I use.

 Thanks.

 best,
 Yuheng

 On Tue, Jul 14, 2015 at 6:06 PM, JIEFU GONG jg...@berkeley.edu wrote:

 Someone may correct me if I am incorrect, but how much disk space do you
 have on these nodes? Your exception 'No space left on device' from one of
 your brokers seems to suggest that you're full (after all you're writing
 500 million records). If this is the case I believe the expected behavior
 for Kafka is to reject any more attempts to write data?

 On Tue, Jul 14, 2015 at 2:27 PM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Also, the log in another broker (not the bootstrap) says:
 
  [2015-07-14 15:18:41,220] FATAL [Replica Manager on Broker 1]: Error
  writing to highwatermark file:  (kafka.server.ReplicaManager)
 
 
  [2015-07-14 15:18:40,160] ERROR Closing socket for /130.127.133.47
 because
  of error (kafka.network.Process
 
  or)
 
  java.io.IOException: Connection reset by peer
 
  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
 
  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
 
  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
 
  at sun.nio.ch.IOUtil.read(IOUtil.java:197)
 
  at
 sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
 
  at kafka.utils.Utils$.read(Utils.scala:380)
 
  at
 
 
 kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
 
  at kafka.network.Processor.read(SocketServer.scala:444)
 
  at kafka.network.Processor.run(SocketServer.scala:340)
 
  at java.lang.Thread.run(Thread.java:745)
 
  
 
  java.io.IOException: No space left on device
 
  at java.io.FileOutputStream.writeBytes(Native Method)
 
  at java.io.FileOutputStream.write(FileOutputStream.java:345)
 
  at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
 
  at sun.nio.cs.StreamEncoder.implFlushBuffe
 
  (END)
 
  On Tue, Jul 14, 2015 at 5:24 PM, Yuheng Du yuheng.du.h...@gmail.com
  wrote:
 
   Hi Jiefu, Gwen,
  
   I am running the Throughput versus stored data test:
   bin/kafka-run-class.sh
 org.apache.kafka.clients.tools.ProducerPerformance
   test 500 100 -1 acks=1 bootstrap.servers=
   esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864
  batch.size=8196
  
   After around 50,000,000 messages were sent, I got a bunch of
 connection
   refused error as I mentioned before. I checked the logs on the
 broker and
   here is what I see:
  
   [2015-07-14 15:11:23,578] WARN Partition [test,4] on broker 5: No
   checkpointed highwatermark is found for partition [test,4]
   (kafka.cluster.Partition)
  
   [2015-07-14 15:12:33,298] INFO Rolled new log segment for 'test-4'
 in 4
   ms. (kafka.log.Log)
  
   [2015-07-14 15:12:33,299] INFO Rolled new log segment for 'test-0'
 in 1
   ms. (kafka.log.Log)
  
   [2015-07-14 15:13:39,529] INFO Rolled new log segment for 'test-4'
 in 1
   ms. (kafka.log.Log)
  
   [2015-07-14 15:13:39,531] INFO Rolled new log segment for 'test-0'
 in 1
   ms. (kafka.log.Log)
  
   [2015-07-14 15:14:48,502] INFO Rolled new log segment for 'test-4'
 in 3
   ms. (kafka.log.Log)
  
   [2015-07-14

How to run the three producers test

2015-07-14 Thread Yuheng Du
Hi,

I am running the performance test for kafka:
https://gist.github.com/jkreps/c7ddb4041ef62a900e6c

For the Three Producers, 3x async replication scenario, the command is
the same as one producer:

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
test 5000 100 -1 acks=1
bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
buffer.memory=67108864 batch.size=8196

So how do I run the test for three producers? Do I just run them on three
separate servers at the same time? Will there be errors this way, since
the three producers can't be started at exactly the same time?
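For what it's worth, the three launches can be scripted so they start within milliseconds of each other. A dry-run sketch (the host names and remote kafka directory are assumptions; LAUNCH is 'echo' here so the script only prints what it would do — set LAUNCH=ssh on a real cluster):

```shell
#!/bin/sh
# Start the same ProducerPerformance command on three machines at roughly
# the same time by backgrounding each launch and waiting for all of them.
LAUNCH="echo"                        # dry run; use LAUNCH="ssh" for real
HOSTS="producer1 producer2 producer3"
CMD="cd kafka_2.10-0.8.2.1 && bin/kafka-run-class.sh \
org.apache.kafka.clients.tools.ProducerPerformance \
test 5000 100 -1 acks=1 \
bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 \
buffer.memory=67108864 batch.size=8196"

for h in $HOSTS; do
  $LAUNCH "$h" "$CMD" &   # background each launch so they start together
done
wait                      # returns once all three producers have finished
```

Small start-time skew is irrelevant for a throughput test that runs for minutes, so starting the consoles manually one by one also works.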

Thanks.

best,
Yuheng


Re: How to run the three producers test

2015-07-14 Thread Yuheng Du
Jiefu,

Thank you. The three producers can run at the same time. I mean, should they
be started at exactly the same time? (I have three consoles, one for each of
the three machines, and I start the command manually one by one.) Or does a
small variation in start time not matter?

Gwen and Jiefu,

I have started the three producers on three machines; after a while, all of
them give a java.net.ConnectException:

[2015-07-14 12:56:46,352] WARN Error in I/O with producer0-link-0/
192.168.1.1 (org.apache.kafka.common.network.Selector)

java.net.ConnectException: Connection refused..

[2015-07-14 12:56:48,056] WARN Error in I/O with producer1-link-0/
192.168.1.2 (org.apache.kafka.common.network.Selector)

java.net.ConnectException: Connection refused.

What could be the cause?

Thank you guys!




On Tue, Jul 14, 2015 at 2:47 PM, JIEFU GONG jg...@berkeley.edu wrote:

 Yuheng,

 Yes, if you read the blog post it specifies that he's using three separate
 machines. There's no reason the producers cannot be started at the same
 time, I believe.

 On Tue, Jul 14, 2015 at 11:42 AM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Hi,
 
  I am running the performance test for kafka:
  https://gist.github.com/jkreps/c7ddb4041ef62a900e6c
 
  For the Three Producers, 3x async replication scenario, the command is
  the same as one producer:
 
  bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
  test 5000 100 -1 acks=1
  bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
  buffer.memory=67108864 batch.size=8196
 
  So how do I run the test for three producers? Do I just run them on three
  separate servers at the same time? Will there be errors this way, since
  the three producers can't be started at exactly the same time?
 
  Thanks.
 
  best,
  Yuheng
 



 --

 Jiefu Gong
 University of California, Berkeley | Class of 2017
 B.A Computer Science | College of Letters and Sciences

 jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427



Latency test

2015-07-14 Thread Yuheng Du
Currently, the latency test from Kafka measures the end-to-end latency
between producers and consumers.

Is there a way to test the producer-to-broker and broker-to-consumer
delays separately?

Thanks.
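One common way to split the end-to-end number is to embed the producer's send timestamp in the message payload and compare it with the broker's log-append time and the consumer's receive time, assuming the clocks are synchronized (e.g. via NTP). The arithmetic, with purely illustrative timestamps:

```shell
# Splitting end-to-end latency using three synchronized timestamps (ms).
# t_send:   taken by the producer and embedded in the message payload
# t_append: broker log-append time
# t_recv:   taken by the consumer on receipt
t_send=1000
t_append=1007
t_recv=1012
echo "producer->broker:  $((t_append - t_send)) ms"
echo "broker->consumer:  $((t_recv - t_append)) ms"
echo "end-to-end:        $((t_recv - t_send)) ms"
```

The weak point is clock skew between machines, which directly biases each half of the split even when the end-to-end figure is accurate.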


Re: performance benchmarking of kafka

2015-07-13 Thread Yuheng Du
Hi,

I appreciate your response. It works now! It was just a typo in the class
name (org.apache.kafka.client.tools instead of org.apache.kafka.clients.tools) :(.

It really has nothing to do with whether you are using the binaries or the
source version of kafka.

Thanks everyone!

On Mon, Jul 13, 2015 at 11:18 PM, tao xiao xiaotao...@gmail.com wrote:

 org.apache.kafka.clients.tools.ProducerPerformance resides in
 kafka-clients-0.8.2.1.jar.
 You need to make sure the jar exists in $KAFKA_HOME/libs/. I use
 kafka_2.10-0.8.2.1
 too and here is the output

 % bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance

 USAGE: java org.apache.kafka.clients.tools.ProducerPerformance topic_name
 num_records record_size target_records_sec [prop_name=prop_value]*



 On Tue, 14 Jul 2015 at 05:08 Yuheng Du yuheng.du.h...@gmail.com wrote:

  I am using the binaries of kafka_2.10-0.8.2.1. Could that be the problem?
  Should I copy the source from kafka-0.8.2.1-src.tgz to each of my machines,
  build it, and run the test?
  Thanks.
 
  On Mon, Jul 13, 2015 at 4:37 PM, JIEFU GONG jg...@berkeley.edu wrote:
 
   You may need to open up your run-class.sh in a text editor and modify
 the
   classpath -- I believe I had a similar error before.
  
   On Mon, Jul 13, 2015 at 1:16 PM, Yuheng Du yuheng.du.h...@gmail.com
   wrote:
  
Hi guys,
   
I am trying to replicate the test of benchmarking kafka at
   
   
  
 
 http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
.
   
When I run
   
bin/kafka-run-class.sh
  org.apache.kafka.clients.tools.ProducerPerformance
test7 5000 100 -1 acks=1 bootstrap.servers=192.168.1.1:9092
buffer.memory=67108864 batch.size=8196
   
and I got the following error:
Error: Could not find or load main class
org.apache.kafka.client.tools.ProducerPerformance
   
What should I fix? Thank you!
   
  
  
  
   --
  
   Jiefu Gong
   University of California, Berkeley | Class of 2017
   B.A Computer Science | College of Letters and Sciences
  
   jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427
  
 



performance benchmarking of kafka

2015-07-13 Thread Yuheng Du
Hi guys,

I am trying to replicate the test of benchmarking kafka at
http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
.

When I run

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
test7 5000 100 -1 acks=1 bootstrap.servers=192.168.1.1:9092
buffer.memory=67108864 batch.size=8196

and I got the following error:
Error: Could not find or load main class
org.apache.kafka.client.tools.ProducerPerformance

What should I fix? Thank you!


Re: performance benchmarking of kafka

2015-07-13 Thread Yuheng Du
Thank you. I see that in run-class.sh, they have the following lines:

 for file in $base_dir/clients/build/libs/kafka-clients*.jar;
 do
   CLASSPATH=$CLASSPATH:$file
 done

So I believe all the jars in the libs/ directory have already been included
in the classpath?

Which directory does the ProducerPerformance class reside in?

Thanks.
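When a "Could not find or load main class" error comes up, it can help to check which jar on the classpath actually contains the class. A sketch (the helper names are made up for illustration, and it relies on the 'unzip' tool being installed):

```shell
# Locate which jar in a directory provides a fully qualified class.
class_to_path() { echo "$(echo "$1" | tr . /).class"; }

find_class_jar() {   # usage: find_class_jar DIR fully.qualified.ClassName
  entry=$(class_to_path "$2")
  for j in "$1"/*.jar; do
    [ -f "$j" ] || continue                       # skip when no jars match
    unzip -l "$j" 2>/dev/null | grep -q "$entry" && echo "$j"
  done
}

# Usage (paths illustrative):
#   find_class_jar libs org.apache.kafka.clients.tools.ProducerPerformance
```

If nothing is printed for the directory run-class.sh scans, the class genuinely isn't on the classpath; if a jar is printed, the class name in the command is probably misspelled.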

On Mon, Jul 13, 2015 at 4:37 PM, JIEFU GONG jg...@berkeley.edu wrote:

 You may need to open up your run-class.sh in a text editor and modify the
 classpath -- I believe I had a similar error before.

 On Mon, Jul 13, 2015 at 1:16 PM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Hi guys,
 
  I am trying to replicate the test of benchmarking kafka at
 
 
 http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
  .
 
  When I run
 
  bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
  test7 5000 100 -1 acks=1 bootstrap.servers=192.168.1.1:9092
  buffer.memory=67108864 batch.size=8196
 
  and I got the following error:
  Error: Could not find or load main class
  org.apache.kafka.client.tools.ProducerPerformance
 
  What should I fix? Thank you!
 



 --

 Jiefu Gong
 University of California, Berkeley | Class of 2017
 B.A Computer Science | College of Letters and Sciences

 jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427



Re: performance benchmarking of kafka

2015-07-13 Thread Yuheng Du
I am using the binaries of kafka_2.10-0.8.2.1. Could that be the problem?
Should I use the source of kafka-0.8.2.1-src.tgz to each of my machiines,
build them and run the test?
Thanks.

On Mon, Jul 13, 2015 at 4:37 PM, JIEFU GONG jg...@berkeley.edu wrote:

 You may need to open up your run-class.sh in a text editor and modify the
 classpath -- I believe I had a similar error before.

 On Mon, Jul 13, 2015 at 1:16 PM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  Hi guys,
 
  I am trying to replicate the test of benchmarking kafka at
 
 
 http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
  .
 
  When I run
 
  bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
  test7 5000 100 -1 acks=1 bootstrap.servers=192.168.1.1:9092
  buffer.memory=67108864 batch.size=8196
 
  and I got the following error:
  Error: Could not find or load main class
  org.apache.kafka.client.tools.ProducerPerformance
 
  What should I fix? Thank you!
 



 --

 Jiefu Gong
 University of California, Berkeley | Class of 2017
 B.A Computer Science | College of Letters and Sciences

 jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427



Re: A kafka web monitor

2015-03-27 Thread Yuheng Du
Hi Wan,

I tried to install DCMonitor, but when I try to clone the project it gives
me "Permission denied" and "the remote end hung up unexpectedly". Can
you suggest what might be causing this?

Thanks.

best,
Yuheng

On Mon, Mar 23, 2015 at 8:54 AM, Wan Wei flowbeha...@gmail.com wrote:

 We have make a simple web console to monitor some kafka informations like
 consumer offset, logsize.

 https://github.com/shunfei/DCMonitor


 Hope you like it and offer your help to make it better :)
 Regards
 Flow



kafka topic information

2015-03-09 Thread Yuheng Du
I am wondering where a Kafka cluster keeps the topic metadata (name,
partitions, replication, etc.). How does a server recover a topic's
metadata and messages after a restart, and what data will be lost?

Thanks for anyone to answer my questions.

best,
Yuheng


Re: kafka topic information

2015-03-09 Thread Yuheng Du
Harsha,

Thanks for the reply. So what if the ZooKeeper cluster fails? Will the topic
information be lost? What fault-tolerance mechanisms does ZooKeeper offer?

best,

On Mon, Mar 9, 2015 at 11:36 AM, Harsha ka...@harsha.io wrote:

 Yuheng,
   Kafka keeps cluster metadata in ZooKeeper, along with the topic
 metadata. You can use zookeeper-shell.sh or zkCli.sh to inspect the zk
 nodes; /brokers/topics will give you the list of topics.
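The checks described above can be sketched as one-off commands (shown for illustration against a ZooKeeper at the quickstart's default localhost:2181, with a topic named 'test'; not run here):

```
bin/zookeeper-shell.sh localhost:2181 ls /brokers/topics
bin/zookeeper-shell.sh localhost:2181 get /brokers/topics/test
```

The first lists all topic znodes; the second shows the partition-to-replica assignment stored for a single topic.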

 --
 Harsha


 On March 9, 2015 at 8:20:59 AM, Yuheng Du (yuheng.du.h...@gmail.com)
 wrote:

 I am wondering where does kafka cluster keep the topic metadata (name,
 partition, replication, etc)? How does a server recover the topic's
 metadata and messages after restart and what data will be lost?

 Thanks for anyone to answer my questions.

 best,
 Yuheng




Re: kafka topic information

2015-03-09 Thread Yuheng Du
Thanks, got it!

best,
Yuheng

On Mon, Mar 9, 2015 at 11:52 AM, Harsha ka...@harsha.io wrote:

 In general users are expected to run a ZooKeeper cluster of 3 or 5 nodes.
 ZooKeeper requires a quorum of servers to be running, i.e. a majority
 (floor(n/2) + 1) of the n servers must be up. For 3 ZooKeeper nodes there
 need to be at least 2 up at any time, so your cluster keeps functioning
 through 1 machine failure; with 5 nodes, at least 3 must be up and
 running. For more info on ZooKeeper you can look here:
 http://zookeeper.apache.org/doc/r3.4.6/
 http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html
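The quorum arithmetic Harsha describes can be checked directly (a POSIX-shell sketch):

```shell
# Minimum number of ZooKeeper servers that must be up for an ensemble of n:
# quorum(n) = floor(n/2) + 1, i.e. a strict majority.
quorum() { echo $(( $1 / 2 + 1 )); }

quorum 3   # -> 2: a 3-node ensemble tolerates 1 failure
quorum 5   # -> 3: a 5-node ensemble tolerates 2 failures
```

This is also why even-sized ensembles are discouraged: 4 nodes need a quorum of 3 and so tolerate only 1 failure, the same as 3 nodes, while costing an extra machine.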


 --
 Harsha

 On March 9, 2015 at 8:39:00 AM, Yuheng Du (yuheng.du.h...@gmail.com)
 wrote:

 Harsha,

 Thanks for reply. So what if the zookeeper cluster fails? Will the topics
 information be lost? What fault-tolerant mechanism does zookeeper offer?

 best,

 On Mon, Mar 9, 2015 at 11:36 AM, Harsha ka...@harsha.io wrote:

  Yuheng,
kafka keeps cluster metadata in zookeeper along with topic
 metadata as well. You can use zookeeper-shell.sh or zkCli.sh to check zk
 nodes, /brokers/topics will give you the list of topics .

  --
 Harsha


 On March 9, 2015 at 8:20:59 AM, Yuheng Du (yuheng.du.h...@gmail.com)
 wrote:

  I am wondering where does kafka cluster keep the topic metadata (name,
 partition, replication, etc)? How does a server recover the topic's
 metadata and messages after restart and what data will be lost?

 Thanks for anyone to answer my questions.

 best,
 Yuheng





Re: Set up kafka cluster

2015-03-05 Thread Yuheng Du
will do, Thanks!

On Thu, Mar 5, 2015 at 3:35 PM, Gwen Shapira gshap...@cloudera.com wrote:

 Did you take a look at the quick-start guide?
 https://kafka.apache.org/082/quickstart.html

 It shows how to set up a single node, how to validate that it's working,
 and then how to set up a multi-node cluster.

 Good luck!

 On Thu, Mar 5, 2015 at 12:30 PM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:
  Thank you Gwen,
 
   I also need the kafka cluster to continue to provide message brokering
  service to a Storm cluster after the benchmarking. I am fairly new to
  cluster setups. So are there instructions telling me how to set up the
  three-node kafka cluster before running the benchmark? That would be
  really helpful.
  
   Thanks again.
 
  best,
  Yuheng
 
  On Thu, Mar 5, 2015 at 3:22 PM, Gwen Shapira gshap...@cloudera.com
 wrote:
 
  Jay Kreps has a gist with step by step instructions for reproducing
  the benchmarks used by LinkedIn:
  https://gist.github.com/jkreps/c7ddb4041ef62a900e6c
 
  And the blog with the results:
 
 
 https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
 
  Gwen
 
  On Thu, Mar 5, 2015 at 12:16 PM, Yuheng Du yuheng.du.h...@gmail.com
  wrote:
   Hi everyone,
  
    I am trying to set up a kafka cluster consisting of three machines. I
   want to run a benchmarking program on them. Can anyone recommend a
   step-by-step tutorial or instructions on how I can do it?
  
   Thanks.
  
   best,
   Yuheng
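
[Editor's note: for anyone following the quick-start guide linked above, the
multi-broker step boils down to giving each broker its own id, port, and log
directory in server.properties. A sketch for a three-broker cluster — the
hostnames and paths are illustrative, not from the thread:]

```properties
# Broker 1 of 3. Repeat with broker.id=2 and 3, ports 9093/9094
# (if brokers share a host), and distinct log.dirs.
broker.id=1
port=9092
log.dirs=/tmp/kafka-logs-1
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
```

All three brokers point at the same zookeeper.connect string, which is what
makes them join one cluster rather than three separate ones.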
 



Re: Set up kafka cluster

2015-03-05 Thread Yuheng Du
Thank you Gwen,

I also need the kafka cluster to continue to provide message brokering
service to a Storm cluster after the benchmarking. I am fairly new to
cluster setups. So are there instructions telling me how to set up the
three-node kafka cluster before running the benchmark? That would be really
helpful.

Thanks again.

best,
Yuheng

On Thu, Mar 5, 2015 at 3:22 PM, Gwen Shapira gshap...@cloudera.com wrote:

 Jay Kreps has a gist with step by step instructions for reproducing
 the benchmarks used by LinkedIn:
 https://gist.github.com/jkreps/c7ddb4041ef62a900e6c

 And the blog with the results:

 https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

 Gwen

 On Thu, Mar 5, 2015 at 12:16 PM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:
  Hi everyone,
 
   I am trying to set up a kafka cluster consisting of three machines. I
  want to run a benchmarking program on them. Can anyone recommend a
  step-by-step tutorial or instructions on how I can do it?
 
  Thanks.
 
  best,
  Yuheng



Set up kafka cluster

2015-03-05 Thread Yuheng Du
Hi everyone,

I am trying to set up a kafka cluster consisting of three machines. I want
to run a benchmarking program on them. Can anyone recommend a step-by-step
tutorial or instructions on how I can do it?

Thanks.

best,
Yuheng