Re: why did Kafka choose pull instead of push for a consumer ?

2016-09-28 Thread Gerard Klijs

@Kant,
Did you measure the latency while doing the test? I would expect there to be
some trade-off between latency and throughput. Using only the default
configuration makes it difficult to compare. It would also be interesting to
see the relative changes when the number of brokers is increased from 1 to 3,
or even more.
I took a quick look on Google, but could not find any good comparison like
that.
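
A minimal sketch of reporting latency and throughput from the same run,
assuming each message carries a send timestamp and the consumer records a
receive timestamp on a synchronized clock. The function name `summarize` and
the percentile indexing are illustrative choices, not part of any benchmark
tool mentioned in this thread.

```python
def summarize(send_times, recv_times):
    """Return (throughput in msgs/sec, p50 latency, p99 latency) for one run.

    Both lists hold timestamps in seconds, one entry per message, same order.
    """
    latencies = sorted(r - s for s, r in zip(send_times, recv_times))
    elapsed = max(recv_times) - min(send_times)   # wall-clock span of the run
    throughput = len(latencies) / elapsed
    p50 = latencies[len(latencies) // 2]
    # Simple index-based percentile; clamp so tiny samples stay in range.
    p99 = latencies[min(len(latencies) - 1, int(len(latencies) * 0.99))]
    return throughput, p50, p99
```

Reporting the tail (p99) next to the throughput figure makes the trade-off
visible: a configuration that batches more can raise throughput while pushing
the tail latency up.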

On Fri, Sep 23, 2016 at 2:11 PM Tauzell, Dave 
wrote:

> Kafka writes each message, but the OS writes those to the in-memory disk
> cache.  Kafka periodically calls fsync() to tell the OS to force the disk
> cache to actual disk.  Kafka gets high availability by replicating messages
> to other brokers, so that the messages are in memory on several machines at
> once.  If all the replicas fail around the same time, you could lose data.
>
> -Dave

RE: why did Kafka choose pull instead of push for a consumer ?

2016-09-23 Thread Tauzell, Dave

Kafka writes each message, but the OS writes those to the in-memory disk cache.
Kafka periodically calls fsync() to tell the OS to force the disk cache to
actual disk.  Kafka gets high availability by replicating messages to other
brokers, so that the messages are in memory on several machines at once.  If all
the replicas fail around the same time, you could lose data.

-Dave
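
The write-then-periodic-fsync behaviour described above can be illustrated
with a small sketch. This is not Kafka's code: the class name
`PeriodicFsyncLog` and the `flush_every` parameter are hypothetical, and the
only point is the gap between a write (which lands in the OS page cache) and
an fsync (which forces it to the device).

```python
import os
import tempfile


class PeriodicFsyncLog:
    """Append-only file that calls fsync every `flush_every` messages."""

    def __init__(self, path, flush_every):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.flush_every = flush_every
        self.unsynced = 0      # messages written since the last fsync
        self.fsync_calls = 0

    def append(self, payload: bytes):
        os.write(self.fd, payload + b"\n")   # data is now in the OS page cache
        self.unsynced += 1
        if self.unsynced >= self.flush_every:
            self.sync()

    def sync(self):
        os.fsync(self.fd)                    # force cached pages to the device
        self.fsync_calls += 1
        self.unsynced = 0

    def close(self):
        os.close(self.fd)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "segment.log")
    log = PeriodicFsyncLog(path, flush_every=3)
    for i in range(7):
        log.append(b"message-%d" % i)
    print("fsync calls:", log.fsync_calls, "unsynced messages:", log.unsynced)
    log.close()
```

Messages counted in `unsynced` are the ones a single machine could lose on a
power failure, which is why replication to other brokers matters in the
explanation above.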


Re: why did Kafka choose pull instead of push for a consumer ?

2016-09-23 Thread kant kodali

@Gerard
Here are my initial benchmarks:
Producer on Machine 1 (m4.xlarge on AWS)
Broker on Machine 2 (m4.xlarge on AWS)
Consumer on Machine 3 (m4.xlarge on AWS)
Data size: 1.2KB
Receive throughput: ~24K
Kafka receive throughput: ~58K (same exact configuration)
All the benchmarks I ran used the default options. What the Pulsar guys are
saying is that Kafka doesn't persist every message by default; instead it
batches them for a period of time and then persists them, so if the JVM crashes
before it persists, all the messages in the batch are lost, whereas Pulsar
guarantees strong durability by storing every message to a write-ahead log, so
messages are never lost.
My question now is: what settings do I need to change in Kafka so that it will
store every message? That way I am comparing apples to apples.
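
A hedged sketch of the settings usually involved in this question. The config
names `flush.messages` (topic level) and `acks` (producer) are real Kafka
settings; the values, the helper `to_properties`, and the claim that this
gives a fair comparison with Pulsar are assumptions to verify against your
own versions.

```python
def to_properties(config):
    """Render a dict as Java-style .properties lines, sorted by key."""
    return "\n".join(f"{key}={value}" for key, value in sorted(config.items()))


# Topic-level override: fsync the log after every single message.
topic_config = {"flush.messages": 1}

# Producer side: wait until all in-sync replicas have acknowledged the write.
producer_config = {"acks": "all"}

if __name__ == "__main__":
    print(to_properties(topic_config))
    print(to_properties(producer_config))
```

Kafka's own documentation generally recommends relying on replication rather
than a per-message fsync for durability, so expect throughput to drop
noticeably with `flush.messages=1`. With a single broker, as in the benchmark
above, there are no other replicas and only the flush setting changes what
survives a crash.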
 






Re: why did Kafka choose pull instead of push for a consumer ?

2016-09-23 Thread Gerard Klijs

I haven't tried it myself, and very likely won't in the near future, but
since it's also distributed I guess that with a large enough cluster you
will be able to handle any load. Some of the things Kafka might be better at
are more connectors being available, a better at-least-once guarantee, and
better monitoring options. I really don't know, but if latency is really
important Pulsar might be better. They used Kafka before at Yahoo and maybe
still do for some stuff; recent work on https://github.com/yahoo/kafka-manager
seems to suggest so.
Alternatively you could configure a Kafka topic/producer/consumer to limit
latency, and that may also be enough to get a low enough latency. It would
certainly be interesting to compare the two, with the same hardware, and
with high load.
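
As a rough illustration of which client settings add waiting time, here is a
toy model. The names `linger.ms`, `fetch.min.bytes` and `fetch.max.wait.ms`
are real Kafka client settings; the defaults shown are the documented ones
from the Kafka 0.10 era (check your version), and the function
`configured_wait_bound_ms` is a made-up helper that ignores network, disk,
replication and processing time.

```python
# Assumed defaults (Kafka 0.10-era documentation); verify for your version.
PRODUCER_DEFAULTS = {"linger.ms": 0}
CONSUMER_DEFAULTS = {"fetch.min.bytes": 1, "fetch.max.wait.ms": 500}


def configured_wait_bound_ms(producer, consumer, message_bytes):
    """Upper bound (ms) on waiting that batching settings add to one isolated message."""
    p = {**PRODUCER_DEFAULTS, **producer}
    c = {**CONSUMER_DEFAULTS, **consumer}
    wait = p["linger.ms"]                     # producer may hold the record this long
    if message_bytes < c["fetch.min.bytes"]:  # broker holds the fetch for more bytes...
        wait += c["fetch.max.wait.ms"]        # ...or until this timeout expires
    return wait


if __name__ == "__main__":
    print(configured_wait_bound_ms({}, {}, 1200))
    print(configured_wait_bound_ms({"linger.ms": 50}, {"fetch.min.bytes": 65536}, 1200))
```

Under this model the defaults add no configured waiting, so settings that
trade latency for throughput are the ones to look at first when tuning.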


Re: why did Kafka choose pull instead of push for a consumer ?

2016-09-22 Thread kant kodali

@Gerard Thanks for this. It looks good. Are there any benchmarks on this,
throughput-wise?

Re: why did Kafka choose pull instead of push for a consumer ?

2016-09-22 Thread Gerard Klijs

We have a simple application producing 1 msg/sec, did nothing to optimise
the performance, and see about a 10 msec delay between producer and consumer.
When low latency is important, maybe Pulsar is a better fit:
https://www.datanami.com/2016/09/07/yahoos-new-pulsar-kafka-competitor/ .

On Tue, Sep 20, 2016 at 2:24 PM Michael Freeman 
wrote:

> Thanks for sharing Radek, great article.
>
> Michael

Re: why did Kafka choose pull instead of push for a consumer ?

2016-09-20 Thread Michael Freeman

Thanks for sharing Radek, great article.

Michael

> On 17 Sep 2016, at 21:13, Radoslaw Gruchalski  wrote:
> 
> Please read this article:
> https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
> 
> –
> Best regards,
> Radek Gruchalski
> ra...@gruchalski.com
> 
> 
> On September 17, 2016 at 9:49:43 PM, kant kodali (kanth...@gmail.com) wrote:
> 
> Still, it should be possible to implement this using reactive streams, right?
> Could you please enlighten me on what some of the major differences are that
> you see between a commit log and a message queue? I see them being different
> only in the implementation but not functionality-wise, so I would be glad to
> hear your thoughts.
>
> On Sat, Sep 17, 2016 12:39 PM, Radoslaw Gruchalski ra...@gruchalski.com
> wrote:
> Kafka is not a queue. It’s a distributed commit log.
>
> –
> Best regards,
> Radek Gruchalski
> ra...@gruchalski.com
>
> On September 17, 2016 at 9:23:09 PM, kant kodali (kanth...@gmail.com)
> wrote:
>
> Hmm... Looks like Kafka is written in Scala. There is this thing called
> reactive streams, where a slow consumer can apply back pressure if it is
> consuming slowly. Even with Java this is possible with a library called
> RxJava, and these ideas will be incorporated in Java 9 as well.
>
> I still don't see why they would pick poll just to solve this one problem
> while compromising on others. Poll just doesn't sound realtime. I heard from
> some people that they would set poll to 100ms. Well, 1) that is a lot of
> time. 2) Financial applications require microsecond latency. Kafka, from what
> I understand, looks like it has a very high latency, and here is the article:
> http://bravenewgeek.com/dissecting-message-queues/ I usually don't go by
> articles, but I ran my own experiments on different queues and my numbers are
> very close to this article, so I would say whoever wrote this article has
> done a good job. 3) Poll does generate unnecessary traffic if the data isn't
> available.
>
> Finally, I'm still not sure why they would pick poll(). Or do they plan on
> introducing reactive streams?
> Thanks,
> kant
>
> On Sat, Sep 17, 2016 5:14 AM, Radoslaw Gruchalski ra...@gruchalski.com
> wrote:
>
> I'm only guessing here whether this is the reason:
>
> Pull is much more sensible when a lot of data is pushed through. It allows
> consumers to consume at their own pace; slow consumers do not slow the
> complete system down.
>
> --
> Best regards,
> Rad
>
> On Sat, Sep 17, 2016 at 11:18 AM +0200, "kant kodali" wrote:
>
> Why did Kafka choose pull instead of push for a consumer? Push sounds like it
> is more realtime to me than poll, and also wouldn't poll just keep polling
> even when there are no messages in the broker, causing more traffic? Please
> enlighten me.
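
On the worry that polling means busy traffic and a fixed delay: Kafka consumer
fetches are long polls, meaning the request waits at the broker until data is
available (`fetch.min.bytes`) or a timeout expires (`fetch.max.wait.ms`). The
sketch below only imitates that behaviour with Python's standard `queue`
module; the `poll` helper and the 20 ms producer delay are made up for
illustration.

```python
import queue
import threading
import time


def poll(broker, timeout_s):
    """Block until a record is available or timeout_s elapses (a long poll)."""
    try:
        return broker.get(timeout=timeout_s)
    except queue.Empty:
        return None


broker = queue.Queue()
# Hypothetical producer that publishes one record about 20 ms from now.
threading.Timer(0.02, broker.put, args=("record-1",)).start()

start = time.monotonic()
record = poll(broker, timeout_s=0.1)   # 100 ms is an upper bound, not a fixed delay
waited_ms = (time.monotonic() - start) * 1000
print(record, f"returned after about {waited_ms:.0f} ms")
```

The consumer is handed the record as soon as it arrives rather than after the
full timeout, and an idle consumer issues roughly one request per timeout
instead of spinning. Because the consumer decides when to call `poll` again,
a slow consumer simply asks less often, which is the back-pressure property
mentioned earlier in the thread.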

Re: why did Kafka choose pull instead of push for a consumer ?

2016-09-17 Thread Radoslaw Gruchalski

Please read this article:
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

–
Best regards,
Radek Gruchalski
ra...@gruchalski.com

On September 17, 2016 at 9:49:43 PM, kant kodali (kanth...@gmail.com) wrote:

Still it should be possible to implement using reactive streams right.
Could you please enlighten me on what are the some major differences you
see between a commit log and a message queue? I see them being different
only in the implementation but not functionality wise so I would be glad to
hear your thoughts.

On Sat, Sep 17, 2016 12:39 PM, Radoslaw Gruchalski ra...@gruchalski.com
wrote:

Kafka is not a queue. It’s a distributed commit log.

–
Best regards,
Radek Gruchalski
ra...@gruchalski.com

On September 17, 2016 at 9:23:09 PM, kant kodali (kanth...@gmail.com)
wrote:

Hmm...Looks like Kafka is written in Scala. There is this thing called
reactive streams where a slow consumer can apply back pressure if they are
consuming slow. Even with Java this is possible with a Library called RxJava
and these ideas will be incorporated in Java 9 as well.

I still don't see why they would pick poll just to solve this one problem
and compensating on others. Poll just don't sound realtime. I heard from
some people that they would set poll to 100ms. Well 1) that is a lot of
time. 2) Financial applications requires micro second latency. Kafka from
what I understand looks like has a very high latency and here is the
article. http://bravenewgeek.com/dissecting-message-queues/ I usually don't
go by articles but I ran my own experiments on different queues and my
numbers are very close to this article so I would say whoever wrote this
article has done a good Job. 3) poll does generate unnecessary traffic in
case if the data isn't available.

Finally still not sure why they would pick poll() ? or do they plan on
introducing reactive streams? Thanks, kant

On Sat, Sep 17, 2016 5:14 AM, Radoslaw Gruchalski ra...@gruchalski.com
wrote:

I'm only guessing here regarding if this is the reason:

Pull is much more sensible when a lot of data is pushed through. It allows
consumers consuming at their own pace, slow consumers do not slow the
complete system down.

Best regards,
Rad

On Sat, Sep 17, 2016 at 11:18 AM +0200, "kant kodali" wrote:

why did Kafka choose pull instead of push for a consumer? push sounds like
it is more realtime to me than poll and also wouldn't poll just keeps
polling even when they are no messages in the broker causing more traffic?
please enlighten me

Re: why did Kafka choose pull instead of push for a consumer ?

2016-09-17 Thread Ofir Manor

There are two distinct questions...
1. Regarding reactive streams, Akka has an implementation for Kafka:
https://github.com/akka/reactive-kafka
2. Kafka is not a queue. For example, it does not implement a "dequeue"
operation.
Message management and retention are not based on whether a message was
consumed. A topic may have a one-month retention, so messages will be
deleted after one month, even if you consumed them after five seconds. That
makes sense, as any number of consumers may read the same messages, and
each may decide to move backwards or forwards in the log based on its
needs.
So, Kafka separates producing messages, consuming messages and managing the
logs.
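
To make the queue-versus-log distinction concrete, here is a minimal in-memory
sketch. It is not Kafka code: `ToyLog`, `ToyConsumer` and their methods are
hypothetical names invented for illustration. It only models the two points
above: retention depends on record age rather than on consumption, and each
consumer keeps its own position and can move it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """Append-only log; retention depends on record age, not on consumption."""
    retention_seconds: float
    records: list = field(default_factory=list)  # (offset, timestamp, value)
    next_offset: int = 0

    def append(self, value, now):
        self.records.append((self.next_offset, now, value))
        self.next_offset += 1

    def expire(self, now):
        # Drop records older than the retention window, consumed or not.
        self.records = [r for r in self.records
                        if now - r[1] <= self.retention_seconds]

    def read(self, offset, max_records=10):
        # Reading never removes anything: there is no "dequeue".
        return [r for r in self.records if r[0] >= offset][:max_records]


@dataclass
class ToyConsumer:
    """Each consumer tracks its own position and may move it freely."""
    log: ToyLog
    position: int = 0

    def poll(self, max_records=10):
        batch = self.log.read(self.position, max_records)
        if batch:
            self.position = batch[-1][0] + 1
        return [value for _, _, value in batch]

    def seek(self, offset):
        self.position = offset


log = ToyLog(retention_seconds=60)
for i, t in enumerate([0, 10, 20]):
    log.append(f"m{i}", now=t)

a, b = ToyConsumer(log), ToyConsumer(log)
first = a.poll()         # consumer a reads everything
again = b.poll()         # consumer b still sees the same records
a.seek(1)
replay = a.poll()        # a moves backwards and re-reads from offset 1
log.expire(now=75)       # records older than 60 s go away regardless of reads
b.seek(0)
after_expiry = b.poll()  # only the record still inside the window remains
```

In a classic queue, the first `poll()` would have removed the messages and the
second consumer would have seen nothing.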

Ofir Manor

Co-Founder & CTO | Equalum

Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io

On Sat, Sep 17, 2016 at 10:49 PM, kant kodali  wrote:

> Still it should be possible to implement using reactive streams right.
> Could you please enlighten me on what are the some major differences you
> see between a commit log and a message queue? I see them being different
> only in the implementation but not functionality wise so I would be glad to
> hear your thoughts.
>
> On Sat, Sep 17, 2016 12:39 PM, Radoslaw Gruchalski ra...@gruchalski.com
> wrote:
> Kafka is not a queue. It’s a distributed commit log.
>
> –
> Best regards,
> Radek Gruchalski
> ra...@gruchalski.com
>
> On September 17, 2016 at 9:23:09 PM, kant kodali (kanth...@gmail.com)
> wrote:
>
> Hmm...Looks like Kafka is written in Scala. There is this thing called
> reactive streams where a slow consumer can apply back pressure if they are
> consuming slow. Even with Java this is possible with a Library called
> RxJava and these ideas will be incorporated in Java 9 as well.
>
> I still don't see why they would pick poll just to solve this one problem
> and compensating on others. Poll just don't sound realtime. I heard from
> some people that they would set poll to 100ms. Well 1) that is a lot of
> time. 2) Financial applications requires micro second latency. Kafka from
> what I understand looks like has a very high latency and here is the
> article. http://bravenewgeek.com/dissecting-message-queues/ I usually don't
> go by articles but I ran my own experiments on different queues and my
> numbers are very close to this article so I would say whoever wrote this
> article has done a good Job. 3) poll does generate unnecessary traffic in
> case if the data isn't available.
>
> Finally still not sure why they would pick poll() ? or do they plan on
> introducing reactive streams? Thanks, kant
>
> On Sat, Sep 17, 2016 5:14 AM, Radoslaw Gruchalski ra...@gruchalski.com
> wrote:
>
> I'm only guessing here regarding if this is the reason:
>
> Pull is much more sensible when a lot of data is pushed through. It allows
> consumers consuming at their own pace, slow consumers do not slow the
> complete system down.
>
> --
> Best regards,
> Rad
>
> On Sat, Sep 17, 2016 at 11:18 AM +0200, "kant kodali" wrote:
>
> why did Kafka choose pull instead of push for a consumer? push sounds like
> it is more realtime to me than poll and also wouldn't poll just keeps
> polling even when they are no messages in the broker causing more traffic?
> please enlighten me

Re: why did Kafka choose pull instead of push for a consumer ?

2016-09-17 Thread kant kodali

Still, it should be possible to implement this using reactive streams, right?
Could you please enlighten me on the major differences you see between a
commit log and a message queue? I see them as different only in
implementation, not in functionality, so I would be glad to hear your
thoughts.
 
On Sat, Sep 17, 2016 12:39 PM, Radoslaw Gruchalski ra...@gruchalski.com
wrote:

Kafka is not a queue. It’s a distributed commit log.

–
Best regards,
Radek Gruchalski
ra...@gruchalski.com

On September 17, 2016 at 9:23:09 PM, kant kodali (kanth...@gmail.com) wrote:

Hmm...Looks like Kafka is written in Scala. There is this thing called
reactive streams where a slow consumer can apply back pressure if they are
consuming slow. Even with Java this is possible with a Library called RxJava
and these ideas will be incorporated in Java 9 as well.

I still don't see why they would pick poll just to solve this one problem
and compensating on others. Poll just don't sound realtime. I heard from
some people that they would set poll to 100ms. Well 1) that is a lot of
time. 2) Financial applications requires micro second latency. Kafka from
what I understand looks like has a very high latency and here is the
article. http://bravenewgeek.com/dissecting-message-queues/ I usually don't
go by articles but I ran my own experiments on different queues and my
numbers are very close to this article so I would say whoever wrote this
article has done a good Job. 3) poll does generate unnecessary traffic in
case if the data isn't available.

Finally still not sure why they would pick poll() ? or do they plan on
introducing reactive streams? Thanks, kant

On Sat, Sep 17, 2016 5:14 AM, Radoslaw Gruchalski ra...@gruchalski.com
wrote:

I'm only guessing here regarding if this is the reason:

Pull is much more sensible when a lot of data is pushed through. It allows
consumers consuming at their own pace, slow consumers do not slow the
complete system down.

--
Best regards,
Rad

On Sat, Sep 17, 2016 at 11:18 AM +0200, "kant kodali" wrote:

why did Kafka choose pull instead of push for a consumer? push sounds like
it is more realtime to me than poll and also wouldn't poll just keeps
polling even when they are no messages in the broker causing more traffic?
please enlighten me

Re: why did Kafka choose pull instead of push for a consumer ?

2016-09-17 Thread Radoslaw Gruchalski

Kafka is not a queue. It’s a distributed commit log.

–
Best regards,
Radek Gruchalski
ra...@gruchalski.com


On September 17, 2016 at 9:23:09 PM, kant kodali (kanth...@gmail.com) wrote:

Hmm...Looks like Kafka is written in Scala. There is this thing called
reactive streams where a slow consumer can apply back pressure if they are
consuming slow. Even with Java this is possible with a Library called RxJava
and these ideas will be incorporated in Java 9 as well.

I still don't see why they would pick poll just to solve this one problem
and compensating on others. Poll just don't sound realtime. I heard from
some people that they would set poll to 100ms. Well 1) that is a lot of
time. 2) Financial applications requires micro second latency. Kafka from
what I understand looks like has a very high latency and here is the
article. http://bravenewgeek.com/dissecting-message-queues/ I usually don't
go by articles but I ran my own experiments on different queues and my
numbers are very close to this article so I would say whoever wrote this
article has done a good Job. 3) poll does generate unnecessary traffic in
case if the data isn't available.

Finally still not sure why they would pick poll() ? or do they plan on
introducing reactive streams? Thanks, kant

On Sat, Sep 17, 2016 5:14 AM, Radoslaw Gruchalski ra...@gruchalski.com
wrote:

I'm only guessing here regarding if this is the reason:

Pull is much more sensible when a lot of data is pushed through. It allows
consumers consuming at their own pace, slow consumers do not slow the
complete system down.

--
Best regards,
Rad

On Sat, Sep 17, 2016 at 11:18 AM +0200, "kant kodali" wrote:

why did Kafka choose pull instead of push for a consumer? push sounds like
it is more realtime to me than poll and also wouldn't poll just keeps
polling even when they are no messages in the broker causing more traffic?
please enlighten me

Re: why did Kafka choose pull instead of push for a consumer ?

2016-09-17 Thread kant kodali

Hmm... Looks like Kafka is written in Scala. There is this thing called
reactive streams, where a slow consumer can apply back pressure if it is
consuming slowly. Even in Java this is possible with a library called RxJava,
and these ideas will be incorporated in Java 9 as well.

I still don't see why they would pick poll just to solve this one problem
while compromising on others. Poll just doesn't sound realtime. I heard from
some people that they would set poll to 100ms. Well, 1) that is a lot of
time. 2) Financial applications require microsecond latency. From what I
understand, Kafka has very high latency, and here is the article:
http://bravenewgeek.com/dissecting-message-queues/ I usually don't go by
articles, but I ran my own experiments on different queues and my numbers are
very close to this article, so I would say whoever wrote it has done a good
job. 3) Poll does generate unnecessary traffic when no data is available.

Finally, I'm still not sure why they would pick poll(). Or do they plan on
introducing reactive streams?

Thanks,
kant
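
On points 1) and 3), it may help to see that a poll with a timeout does not
have to mean a busy loop or a fixed delay. As far as I understand the Kafka
consumer, the timeout passed to `poll()` is an upper bound on waiting, and the
broker can hold a fetch request until data arrives (the consumer settings
`fetch.min.bytes` and `fetch.max.wait.ms` control this). The sketch below is
not Kafka code: `ToyBroker` is a hypothetical in-memory stand-in that only
illustrates the long-poll idea.

```python
import threading
import time


class ToyBroker:
    """In-memory stand-in for a broker that parks fetches until data arrives."""

    def __init__(self):
        self._records = []
        self._cond = threading.Condition()

    def append(self, value):
        with self._cond:
            self._records.append(value)
            self._cond.notify_all()  # wake any parked fetch

    def fetch(self, offset, max_wait):
        """Return records from `offset`; wait up to `max_wait` seconds if none."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._records) > offset,
                                timeout=max_wait)
            return self._records[offset:]


broker = ToyBroker()
# A producer writes one record 50 ms from now.
threading.Timer(0.05, broker.append, args=("hello",)).start()

start = time.monotonic()
records = broker.fetch(offset=0, max_wait=1.0)  # one request, no busy loop
elapsed = time.monotonic() - start              # well under max_wait

empty = broker.fetch(offset=1, max_wait=0.1)    # nothing new: [] after the wait
```

The fetch returns as soon as the record is appended, so the maximum wait
bounds idle traffic rather than adding latency to every message.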
 
On Sat, Sep 17, 2016 5:14 AM, Radoslaw Gruchalski ra...@gruchalski.com
wrote:

I'm only guessing here regarding if this is the reason:

Pull is much more sensible when a lot of data is pushed through. It allows
consumers consuming at their own pace, slow consumers do not slow the complete
system down.

--
Best regards,
Rad

On Sat, Sep 17, 2016 at 11:18 AM +0200, "kant kodali" wrote:

why did Kafka choose pull instead of push for a consumer? push sounds like it
is more realtime to me than poll and also wouldn't poll just keeps polling even
when they are no messages in the broker causing more traffic? please enlighten
me

Re: why did Kafka choose pull instead of push for a consumer ?

2016-09-17 Thread Radoslaw Gruchalski

I'm only guessing here whether this is the reason:
Pull is much more sensible when a lot of data is pushed through. It allows
consumers to consume at their own pace, so slow consumers do not slow the
whole system down.
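
A rough way to picture this is the toy simulation below. It is only a sketch
under simplifying assumptions: `ToyPartition` and `run` are hypothetical
names, time is modelled as discrete ticks, and nothing here is Kafka's real
API. The producer appends without ever waiting, and each consumer pulls as
many records per tick as it chooses.

```python
class ToyPartition:
    """Append-only list standing in for one partition of a topic."""

    def __init__(self):
        self.records = []

    def append(self, value):
        self.records.append(value)  # the producer never waits for consumers

    def fetch(self, offset, max_records):
        return self.records[offset:offset + max_records]


def run(ticks, produce_per_tick, batch_sizes):
    """batch_sizes maps a consumer name to how many records it pulls per tick."""
    partition = ToyPartition()
    offsets = {name: 0 for name in batch_sizes}
    for tick in range(ticks):
        for i in range(produce_per_tick):
            partition.append((tick, i))
        for name, size in batch_sizes.items():
            batch = partition.fetch(offsets[name], size)
            offsets[name] += len(batch)  # each consumer advances its own offset
    total = len(partition.records)
    lag = {name: total - offset for name, offset in offsets.items()}
    return total, lag


produced, lag = run(ticks=10, produce_per_tick=5,
                    batch_sizes={"fast": 5, "slow": 1})
```

Here the slow consumer simply falls behind (its lag grows), while the producer
and the fast consumer are unaffected. With push, the broker would have to
buffer, drop, or throttle on behalf of the slow consumer.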

-- 
Best regards,
Rad




On Sat, Sep 17, 2016 at 11:18 AM +0200, "kant kodali" wrote:

why did Kafka choose pull instead of push for a consumer? push sounds like it
is more realtime to me than poll and also wouldn't poll just keeps polling even
when they are no messages in the broker causing more traffic? please enlighten
me
