Re: why did Kafka choose pull instead of push for a consumer ?
@Kant, did you measure the latency while running the test? I would expect some trade-off between latency and throughput. Using only the default configuration makes it difficult to compare. It would also be interesting to see how the results change when the number of brokers goes from 1 to 3, or even more. I took a quick look on Google, but could not find a good comparison like that.

On Fri, Sep 23, 2016 at 2:11 PM Tauzell, Dave wrote:

> Kafka writes each message, but the OS writes those to the in-memory disk
> cache. Kafka periodically calls fsync() to tell the OS to force the disk
> cache to actual disk. Kafka gets high availability by replicating messages
> to other brokers so that the messages are in memory on several machines at
> once. If all the replicas fail around the same time you could lose data.
>
> -Dave
RE: why did Kafka choose pull instead of push for a consumer ?
Kafka writes each message, but the OS writes those to the in-memory disk cache. Kafka periodically calls fsync() to tell the OS to force the disk cache to actual disk. Kafka gets high availability by replicating messages to other brokers so that the messages are in memory on several machines at once. If all the replicas fail around the same time you could lose data.

-Dave

-Original Message-
From: kant kodali [mailto:kanth...@gmail.com]
Sent: Friday, September 23, 2016 5:18 AM
To: users@kafka.apache.org
Subject: Re: why did Kafka choose pull instead of push for a consumer ?

@Gerard
Here are my initial benchmarks:

Producer on Machine 1 (m4.xlarge on AWS)
Broker on Machine 2 (m4.xlarge on AWS)
Consumer on Machine 3 (m4.xlarge on AWS)
Data size: 1.2KB
Receive throughput: ~24K
Kafka receive throughput: ~58K (same exact configuration)

All the benchmarks I ran are with default options. So what the Pulsar guys are saying is that Kafka doesn't persist every message by default; instead it batches them for a period of time and then persists, so if the JVM crashes before it persists, all the messages in the batch are lost, whereas Pulsar guarantees strong durability by storing every message to a write-ahead log so messages are never lost.

My question now is: what settings do I need to change in Kafka so it will store every message? That way I am comparing apples to apples.
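For the question quoted above about which settings make Kafka persist every message, here is a minimal sketch in Python. It only assembles configuration text and does not talk to a broker. The setting names (`flush.messages`, `min.insync.replicas`, `acks`) are Kafka configuration names as I understand them; none of them is mentioned in this thread, and the values and the helper `to_properties` are illustrative assumptions, so check them against the documentation for your Kafka version.

```python
# Sketch only: builds properties text, does not contact a Kafka cluster.
# Setting names are assumed from the Kafka docs, not taken from this thread.

TOPIC_OVERRIDES = {
    # Force an fsync after every message instead of relying on periodic flushes.
    "flush.messages": "1",
    # Illustrative value: needs a replication factor of at least 2 to accept writes.
    "min.insync.replicas": "2",
}

PRODUCER_SETTINGS = {
    # Wait until all in-sync replicas have acknowledged the write.
    "acks": "all",
}


def to_properties(settings):
    """Render a dict as sorted key=value lines (hypothetical helper)."""
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))


if __name__ == "__main__":
    print("# topic-level overrides")
    print(to_properties(TOPIC_OVERRIDES))
    print("# producer")
    print(to_properties(PRODUCER_SETTINGS))
```

Forcing an fsync per message would be expected to cost a good deal of throughput, which is the trade-off between replication and per-message flushing that Dave's reply points at.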
Re: why did Kafka choose pull instead of push for a consumer ?
@Gerard
Here are my initial benchmarks:

Producer on Machine 1 (m4.xlarge on AWS)
Broker on Machine 2 (m4.xlarge on AWS)
Consumer on Machine 3 (m4.xlarge on AWS)
Data size: 1.2KB
Receive throughput: ~24K
Kafka receive throughput: ~58K (same exact configuration)

All the benchmarks I ran are with default options. So what the Pulsar guys are saying is that Kafka doesn't persist every message by default; instead it batches them for a period of time and then persists, so if the JVM crashes before it persists, all the messages in the batch are lost, whereas Pulsar guarantees strong durability by storing every message to a write-ahead log so messages are never lost.

My question now is: what settings do I need to change in Kafka so it will store every message? That way I am comparing apples to apples.

On Fri, Sep 23, 2016 12:06 AM, Gerard Klijs gerard.kl...@dizzit.com wrote:

I haven't tried it myself, nor am I very likely to in the near future, but since it's also distributed I guess that with a large enough cluster you will be able to handle any load. Some of the things Kafka might be better at are more connectors available, a better at-least-once guarantee, and better monitoring options. I really don't know, but if latency is really important Pulsar might be better. They used Kafka before at Yahoo and maybe still do for some stuff; recent work on https://github.com/yahoo/kafka-manager seems to suggest so.

Alternatively you could configure a Kafka topic/producer/consumer to limit latency, and that may also be enough to get a low enough latency. It would certainly be interesting to compare the two, with the same hardware, and with high load.
Re: why did Kafka choose pull instead of push for a consumer ?
I haven't tried it myself, nor am I very likely to in the near future, but since it's also distributed I guess that with a large enough cluster you will be able to handle any load. Some of the things Kafka might be better at are more connectors available, a better at-least-once guarantee, and better monitoring options. I really don't know, but if latency is really important Pulsar might be better. They used Kafka before at Yahoo and maybe still do for some stuff; recent work on https://github.com/yahoo/kafka-manager seems to suggest so.

Alternatively you could configure a Kafka topic/producer/consumer to limit latency, and that may also be enough to get a low enough latency. It would certainly be interesting to compare the two, with the same hardware, and with high load.

On Thu, Sep 22, 2016 at 6:01 PM kant kodali wrote:

> @Gerard Thanks for this. It looks good. Any benchmarks on this,
> throughput-wise?
Re: why did Kafka choose pull instead of push for a consumer ?
@Gerard Thanks for this. It looks good. Any benchmarks on this, throughput-wise?

On Thu, Sep 22, 2016 7:45 AM, Gerard Klijs gerard.kl...@dizzit.com wrote:

We have a simple application producing 1 msg/sec, did nothing to optimise the performance, and have about a 10 msec delay between consumer and producer. When low latency is important, maybe Pulsar is a better fit: https://www.datanami.com/2016/09/07/yahoos-new-pulsar-kafka-competitor/ .
Re: why did Kafka choose pull instead of push for a consumer ?
We have a simple application producing 1 msg/sec, did nothing to optimise the performance, and have about a 10 msec delay between consumer and producer. When low latency is important, maybe Pulsar is a better fit: https://www.datanami.com/2016/09/07/yahoos-new-pulsar-kafka-competitor/ .

On Tue, Sep 20, 2016 at 2:24 PM Michael Freeman wrote:

> Thanks for sharing Radek, great article.
>
> Michael
Re: why did Kafka choose pull instead of push for a consumer ?
Thanks for sharing Radek, great article.

Michael

> On 17 Sep 2016, at 21:13, Radoslaw Gruchalski wrote:
>
> Please read this article:
> https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
>
> –
> Best regards,
> Radek Gruchalski
> ra...@gruchalski.com
Re: why did Kafka choose pull instead of push for a consumer ?
Please read this article:
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

–
Best regards,
Radek Gruchalski
ra...@gruchalski.com

On September 17, 2016 at 9:49:43 PM, kant kodali (kanth...@gmail.com) wrote:

Still, it should be possible to implement using reactive streams, right? Could you please enlighten me on what major differences you see between a commit log and a message queue? I see them being different only in the implementation but not functionality-wise, so I would be glad to hear your thoughts.

On Sat, Sep 17, 2016 12:39 PM, Radoslaw Gruchalski ra...@gruchalski.com wrote:

Kafka is not a queue. It’s a distributed commit log.

–
Best regards,
Radek Gruchalski
ra...@gruchalski.com

On September 17, 2016 at 9:23:09 PM, kant kodali (kanth...@gmail.com) wrote:

Hmm... Looks like Kafka is written in Scala. There is this thing called reactive streams, where a slow consumer can apply back pressure if it is consuming slowly. Even with Java this is possible with a library called RxJava, and these ideas will be incorporated in Java 9 as well.

I still don't see why they would pick poll just to solve this one problem while compromising on others. Poll just doesn't sound realtime. I heard from some people that they would set poll to 100ms. Well, 1) that is a lot of time. 2) Financial applications require microsecond latency. Kafka, from what I understand, looks like it has a very high latency, and here is the article: http://bravenewgeek.com/dissecting-message-queues/ I usually don't go by articles, but I ran my own experiments on different queues and my numbers are very close to this article, so I would say whoever wrote this article has done a good job. 3) Poll does generate unnecessary traffic if the data isn't available.

Finally, still not sure why they would pick poll()? Or do they plan on introducing reactive streams?

Thanks,
kant

On Sat, Sep 17, 2016 5:14 AM, Radoslaw Gruchalski ra...@gruchalski.com wrote:

I'm only guessing here whether this is the reason:

Pull is much more sensible when a lot of data is pushed through. It allows consumers to consume at their own pace; slow consumers do not slow the complete system down.

--
Best regards,
Rad

On Sat, Sep 17, 2016 at 11:18 AM +0200, "kant kodali" wrote:

Why did Kafka choose pull instead of push for a consumer? Push sounds like it is more realtime to me than poll, and also wouldn't poll just keep polling even when there are no messages in the broker, causing more traffic? Please enlighten me.
Re: why did Kafka choose pull instead of push for a consumer ?
There are two distinct questions...

1. Regarding reactive streams, Akka has an implementation for Kafka: https://github.com/akka/reactive-kafka
2. Kafka is not a queue. For example, it does not implement a "dequeue" operation. Message management and retention are not based on whether a message was consumed. A topic may have a one-month retention, so messages will be deleted after one month, even if you consumed them after five seconds. That makes sense, as any number of consumers may read the same messages, and each may decide to move backwards or forwards in the log based on its needs.

So, Kafka separates producing messages, consuming messages and managing the logs.

Ofir Manor
Co-Founder & CTO | Equalum
Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io

On Sat, Sep 17, 2016 at 10:49 PM, kant kodali wrote:
> Still, it should be possible to implement using reactive streams, right? Could you please enlighten me on what major differences you see between a commit log and a message queue? I see them as differing only in implementation, not in functionality, so I would be glad to hear your thoughts.
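To make the log-versus-queue distinction above a bit more concrete, here is a minimal in-memory sketch. It is not the Kafka API: `CommitLog`, `LogConsumer`, `expire`, and the integer timestamps are all hypothetical names chosen for illustration, and the sketch assumes records are appended in timestamp order. It only shows the three properties described in the reply: reads do not remove records, each consumer keeps its own offset and can rewind, and retention is driven by record age rather than by consumption.

```python
from collections import deque


class CommitLog:
    """Toy append-only log (hypothetical, not Kafka): reads never remove records."""

    def __init__(self):
        self._base = 0      # offset of the oldest record still retained
        self._records = []  # (timestamp, value) pairs, assumed appended in time order

    def append(self, value, timestamp):
        self._records.append((timestamp, value))
        return self._base + len(self._records) - 1  # offset of the new record

    def read(self, offset, max_records=10):
        # If the requested offset was already expired, start at the oldest record.
        start = max(offset, self._base)
        i = start - self._base
        values = [v for _, v in self._records[i:i + max_records]]
        return start + len(values), values  # (next offset, batch)

    def expire(self, cutoff):
        """Drop records older than cutoff, regardless of who has read them."""
        kept = [(t, v) for t, v in self._records if t >= cutoff]
        self._base += len(self._records) - len(kept)
        self._records = kept


class LogConsumer:
    """Each consumer owns its position; the log knows nothing about consumers."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=10):
        self.offset, batch = self.log.read(self.offset, max_records)
        return batch

    def seek(self, offset):
        self.offset = offset


log = CommitLog()
for i in range(5):
    log.append(f"msg-{i}", timestamp=i)

a, b = LogConsumer(log), LogConsumer(log)
first_a = a.poll(max_records=3)   # a reads the first three records
all_b = b.poll()                  # b independently reads all five
a.seek(0)
replay_a = a.poll(max_records=2)  # a rewinds and re-reads from the start

log.expire(cutoff=2)              # retention by age, not by consumption
after_expiry = LogConsumer(log).poll()

# Contrast: a queue's dequeue removes the item for everyone.
queue = deque(["msg-0", "msg-1"])
taken = queue.popleft()
```

In the queue at the end, `popleft()` leaves only `msg-1` for any other reader, whereas both log consumers saw every record until retention removed the oldest ones.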
Re: why did Kafka choose pull instead of push for a consumer ?
Still, it should be possible to implement using reactive streams, right? Could you please enlighten me on what major differences you see between a commit log and a message queue? I see them as differing only in implementation, not in functionality, so I would be glad to hear your thoughts.

On Sat, Sep 17, 2016 12:39 PM, Radoslaw Gruchalski ra...@gruchalski.com wrote:
> Kafka is not a queue. It’s a distributed commit log.
>
> –
> Best regards,
> Radek Gruchalski
> ra...@gruchalski.com
Re: why did Kafka choose pull instead of push for a consumer ?
Kafka is not a queue. It’s a distributed commit log.

–
Best regards,
Radek Gruchalski
ra...@gruchalski.com

On September 17, 2016 at 9:23:09 PM, kant kodali (kanth...@gmail.com) wrote:
> Hmm... Looks like Kafka is written in Scala. There is this thing called reactive streams, where a slow consumer can apply back pressure. Even with Java this is possible with a library called RxJava, and these ideas will be incorporated in Java 9 as well.
>
> I still don't see why they would pick poll just to solve this one problem while compromising on others. Poll just doesn't sound realtime. I heard from some people that they set poll to 100 ms. Well:
>
> 1) That is a lot of time.
> 2) Financial applications require microsecond latency. Kafka, from what I understand, has very high latency, and here is the article: http://bravenewgeek.com/dissecting-message-queues/ I usually don't go by articles, but I ran my own experiments on different queues and my numbers are very close to this article, so I would say whoever wrote it has done a good job.
> 3) Poll generates unnecessary traffic when data isn't available.
>
> Finally, I'm still not sure why they would pick poll(). Or do they plan on introducing reactive streams?
>
> Thanks,
> kant
Re: why did Kafka choose pull instead of push for a consumer ?
Hmm... Looks like Kafka is written in Scala. There is this thing called reactive streams, where a slow consumer can apply back pressure. Even with Java this is possible with a library called RxJava, and these ideas will be incorporated in Java 9 as well.

I still don't see why they would pick poll just to solve this one problem while compromising on others. Poll just doesn't sound realtime. I heard from some people that they set poll to 100 ms. Well:

1) That is a lot of time.
2) Financial applications require microsecond latency. Kafka, from what I understand, has very high latency, and here is the article: http://bravenewgeek.com/dissecting-message-queues/ I usually don't go by articles, but I ran my own experiments on different queues and my numbers are very close to this article, so I would say whoever wrote it has done a good job.
3) Poll generates unnecessary traffic when data isn't available.

Finally, I'm still not sure why they would pick poll(). Or do they plan on introducing reactive streams?

Thanks,
kant

On Sat, Sep 17, 2016 5:14 AM, Radoslaw Gruchalski ra...@gruchalski.com wrote:
> I'm only guessing whether this is the reason: pull is much more sensible when a lot of data is pushed through. It allows consumers to consume at their own pace, so slow consumers do not slow the whole system down.
>
> --
> Best regards,
> Rad
Re: why did Kafka choose pull instead of push for a consumer ?
I'm only guessing whether this is the reason: pull is much more sensible when a lot of data is pushed through. It allows consumers to consume at their own pace, so slow consumers do not slow the whole system down.

--
Best regards,
Rad

On Sat, Sep 17, 2016 at 11:18 AM +0200, "kant kodali" wrote:
> Why did Kafka choose pull instead of push for a consumer? Push sounds more realtime to me than poll, and wouldn't poll just keep polling even when there are no messages in the broker, causing more traffic? Please enlighten me.
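As a rough illustration of the "own pace" argument above, the following sketch simulates one producer and two pull-based consumers in lock-step ticks. Everything here is hypothetical (`Broker`, `PullConsumer`, `batch_size`, the tick loop) and is not how Kafka is implemented; it only shows that when each consumer asks for as much as it can handle, a slow consumer accumulates lag without holding back the fast one or the producer.

```python
class Broker:
    """Toy broker (hypothetical): keeps an append-only list and serves fetches."""

    def __init__(self):
        self.log = []

    def publish(self, value):
        self.log.append(value)

    def fetch(self, offset, max_records):
        # The consumer decides how much it wants; the broker never pushes.
        return self.log[offset:offset + max_records]


class PullConsumer:
    def __init__(self, name, broker, batch_size):
        self.name = name
        self.broker = broker
        self.batch_size = batch_size  # how many records it can handle per tick
        self.offset = 0
        self.processed = []

    def tick(self):
        batch = self.broker.fetch(self.offset, self.batch_size)
        self.processed.extend(batch)
        self.offset += len(batch)

    def lag(self):
        # Records published but not yet read by this consumer.
        return len(self.broker.log) - self.offset


broker = Broker()
fast = PullConsumer("fast", broker, batch_size=10)
slow = PullConsumer("slow", broker, batch_size=1)

for tick in range(5):
    for n in range(4):  # the producer publishes 4 records per tick
        broker.publish(f"t{tick}-r{n}")
    fast.tick()
    slow.tick()

print(fast.lag(), slow.lag())  # prints "0 15"
```

The slow consumer ends up 15 records behind while the fast one is fully caught up. With a push model, the broker would instead have to buffer, drop, or throttle on behalf of the slow consumer.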