Re: Apache Kafka website "videos" page clarification

2020-10-13 Thread Ben Stopford
I don't think the process here is perfect, but I do think whatever process
we have should be self-policing, i.e. committers shouldn't need to watch and
vet every video that someone chooses to submit to work out whether it's good
quality, vendor-neutral, and relevant.

The Kafka Summit approach is at least self-vetting and, in my opinion, it's
the best compromise between being low maintenance, fair, and not open to
manipulation. I appreciate that there are some forms of content it might
not cover, but I think it's fair to say that the vast majority of
high-quality Kafka-related video content comes out of Kafka Summit, so
overall it seems like a good compromise. Reusing a pre-existing Apache
committee and community process rather than putting yet another burden on
the committers, etc. also just seems like a good, pragmatic approach.

B

On Fri, 9 Oct 2020 at 11:15, Paolo Patierno  wrote:

> Hi Ben,
>
> Thanks for the answer and sorry to come back to this so late.
>
>
> From what you described, it seems that the process to get Kafka-related
> videos onto the Apache Kafka community website is to go through the Kafka
> Summit.
>
> You have to submit a talk, it has to be accepted by the committee and,
> finally, it has to get the most votes.
>
> To be honest I don’t see it as a transparent and open community approach.
>
> Requiring a submission is problematic because there could be good content,
> such as something based around a screencast, that wouldn’t necessarily make
> for a good conference talk. Also, to present at Kafka Summit my
> understanding is that I have to make a contributor agreement with
> Confluent. Such agreements are not necessary to contribute to Apache.
>
> While the most recent Kafka Summit was free, for previous summits only
> people paying for the summit would have been able to vote.
>
> I would say that in this way we are missing a lot of good content talking
> about Apache Kafka out there that would be beneficial to the overall
> community.
>
> I can understand that we want high-quality content on the website, but
> going through the usual open way with a pull request and reviews from the
> community and committers would be a more transparent process in my opinion.
> It would allow the author of the content to iteratively improve it,
> rather than having to wait months for a new Kafka Summit CFP, review, and
> so on.
>
>
> Thanks,
>
> Paolo
>
> Paolo Patierno
> Principal Software Engineer @ Red Hat
> Microsoft MVP on Azure
>
> Twitter : @ppatierno<http://twitter.com/ppatierno>
> Linkedin : paolopatierno<http://it.linkedin.com/in/paolopatierno>
> Blog : DevExperience<http://paolopatierno.wordpress.com/>
> 
> From: Ben Stopford 
> Sent: Thursday, September 10, 2020 12:54 PM
> To: Kafka Users 
> Subject: Re: Apache Kafka website "videos" page clarification
>
> Hi Paolo
>
> The reason for using Kafka Summit videos is that there is an extensive,
> community-driven selection process behind Kafka Summit, run by the
> Kafka Summit Program Committee. This is then further filtered by the
> community itself: those members of the community who attend the summit and
> vote.
>
> This seems the best way to share content about AK without having a complex
> review process for each talk that someone might want to include on the AK
> website.
>
> Hope that makes sense
>
> Ben
>
> On Wed, 9 Sep 2020 at 14:00, Paolo Patierno  wrote:
>
> > Hi all!
> > I have just noticed the new content on the Apache Kafka website about
> > books, papers, podcasts, and videos ... congratulations and great work
> > putting them all together!! It's an impressive list!!
> > On the videos page I read this:
> >
> > The following talks, with video recordings and slides available, achieved
> > the best ratings by the community at the Kafka Summit conferences from
> 2018
> > onwards. Thanks to all the speakers for their hard work!
> >
> > Does it mean that it's not possible to publish videos talking about Apache
> > Kafka (upstream community project) that were delivered outside of Kafka
> > Summit (e.g. KubeCon or any other conference)?
> > In the end, it would always be Apache Kafka content, right?
> > What's the purpose of that video page?
> >
> > Thanks in advance for the clarification
> >
> > Paolo Patierno
> > Principal Software Engineer @ Red Hat
> > Microsoft MVP on Azure
> >
> > Twitter : @ppatierno<http://twitter.com/ppatierno>
> > Linkedin : paolopatierno<http://it.linkedin.com/in/paolopatierno>
> > Blog : DevExperience<http://paolopatierno.wordpress.com/>
> >
>
>
> --
>
> Ben Stopford
>


-- 

Ben Stopford


Re: Apache Kafka website "videos" page clarification

2020-09-10 Thread Ben Stopford
Hi Paolo

The reason for using Kafka Summit videos is that there is an extensive,
community-driven selection process behind Kafka Summit, run by the
Kafka Summit Program Committee. This is then further filtered by the
community itself: those members of the community who attend the summit and
vote.

This seems the best way to share content about AK without having a complex
review process for each talk that someone might want to include on the AK
website.

Hope that makes sense

Ben

On Wed, 9 Sep 2020 at 14:00, Paolo Patierno  wrote:

> Hi all!
> I have just noticed the new content on the Apache Kafka website about
> books, papers, podcasts, and videos ... congratulations and great work
> putting them all together!! It's an impressive list!!
> On the videos page I read this:
>
> The following talks, with video recordings and slides available, achieved
> the best ratings by the community at the Kafka Summit conferences from 2018
> onwards. Thanks to all the speakers for their hard work!
>
> Does it mean that it's not possible to publish videos talking about Apache
> Kafka (upstream community project) that were delivered outside of Kafka
> Summit (e.g. KubeCon or any other conference)?
> In the end, it would always be Apache Kafka content, right?
> What's the purpose of that video page?
>
> Thanks in advance for the clarification
>
> Paolo Patierno
> Principal Software Engineer @ Red Hat
> Microsoft MVP on Azure
>
> Twitter : @ppatierno<http://twitter.com/ppatierno>
> Linkedin : paolopatierno<http://it.linkedin.com/in/paolopatierno>
> Blog : DevExperience<http://paolopatierno.wordpress.com/>
>


-- 

Ben Stopford


Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo

2020-08-20 Thread Ben Stopford
> > > > > What I would suggest is to first agree on what the KStreams logo is
> > > > > supposed to convey to the reader.  Here's my personal take:
> > > > >
> > > > > Objective 1: First and foremost, the KStreams logo should make it
> > clear
> > > > and
> > > > > obvious that KStreams is an official and integral part of Apache
> > Kafka.
> > > > > This applies to both what is depicted and how it is depicted (like
> > > font,
> > > > > line art, colors).
> > > > > Objective 2: The logo should allude to the role of KStreams in the
> > > Kafka
> > > > > project, which is the processing part.  That is, "doing something
> > > useful
> > > > to
> > > > > the data in Kafka".
> > > > >
> > > > > The "circling arrow" aspect of the current otter logos does allude
> to
> > > > > "continuous processing", which is going in the direction of (2),
> but
> > > the
> > > > > logos do not meet (1) in my opinion.
> > > > >
> > > > > -Michael
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Aug 18, 2020 at 10:34 PM Matthias J. Sax  >
> > > > wrote:
> > > > >
> > > > > > Adding the user mailing list -- I think we should accept votes
> on
> > > both
> > > > > > lists for this special case, as it's not a technical decision.
> > > > > >
> > > > > > @Boyang: as mentioned by Bruno, can we maybe add black/white
> > options
> > > > for
> > > > > > both proposals, too?
> > > > > >
> > > > > > I also agree that Design B is not ideal with regard to the Kafka
> > > logo.
> > > > > > Would it be possible to change Design B accordingly?
> > > > > >
> > > > > > I am not a font expert, but the fonts in both designs are
> different
> > > and
> > > > I
> > > > > > am wondering if there is an official Apache Kafka font that we
> > should
> > > > > > reuse to make sure that the logos align -- I would expect that
> both
> > > > > > logos (including "Apache Kafka" and "Kafka Streams" names) will
> be
> > > used
> > > > > > next to each other and it would look awkward if the font differs.
> > > > > >
> > > > > >
> > > > > > -Matthias
> > > > > >
> > > > > > On 8/18/20 11:28 AM, Navinder Brar wrote:
> > > > > > > Hi,
> > > > > > > Thanks for the KIP, really like the idea. I am +1(non-binding)
> > on A
> > > > > > mainly because I felt like you have to tilt your head to realize
> > the
> > > > > > otter's head in B.
> > > > > > > Regards,Navinder
> > > > > > >
> > > > > > > On Tuesday, 18 August, 2020, 11:44:20 pm IST, Guozhang
> Wang <
> > > > > > wangg...@gmail.com> wrote:
> > > > > > >
> > > > > > >  I'm leaning towards design B primarily because it reminds me
> of
> > > the
> > > > > > Firefox
> > > > > > > logo which I like a lot. But I also share Adam's concern that
> it
> > > > should
> > > > > > > better not obscure the Kafka logo --- so if we can tweak a bit
> to
> > > fix
> > > > > it
> > > > > > my
> > > > > > > vote goes to B, otherwise A :)
> > > > > > >
> > > > > > >
> > > > > > > Guozhang
> > > > > > >
> > > > > > > On Tue, Aug 18, 2020 at 9:48 AM Bruno Cadonna <
> > br...@confluent.io>
> > > > > > wrote:
> > > > > > >
> > > > > > >> Thanks for the KIP!
> > > > > > >>
> > > > > > >> I am +1 (non-binding) for A.
> > > > > > >>
> > > > > > >> I would also like to hear opinions whether the logo should be
> > > > > colorized
> > > > > > >> or just black and white.
> > > > > > >>
> > > > > > >> Best,
> > > > > > >> Bruno
> > > > > > >>
> > > > > > >>
> > > > > > >> On 15.08.20 16:05, Adam Bellemare wrote:
> > > > > > >>> I prefer Design B, but given that I missed the discussion
> > > thread, I
> > > > > > think
> > > > > > >>> it would be better without the Otter obscuring any part of
> the
> > > > Kafka
> > > > > > >> logo.
> > > > > > >>>
> > > > > > >>> On Thu, Aug 13, 2020 at 6:31 PM Boyang Chen <
> > > > > > reluctanthero...@gmail.com>
> > > > > > >>> wrote:
> > > > > > >>>
> > > > > > >>>> Hello everyone,
> > > > > > >>>>
> > > > > > >>>> I would like to start a vote thread for KIP-657:
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-657%3A+Add+Customized+Kafka+Streams+Logo
> > > > > > >>>>
> > > > > > >>>> This KIP is aiming to add a new logo for the Kafka Streams
> > > > library.
> > > > > > And
> > > > > > >> we
> > > > > > >>>> prepared two candidates with a cute otter. You could look up
> > the
> > > > KIP
> > > > > > to
> > > > > > >>>> find those logos.
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>> Please post your vote against these two customized logos.
> For
> > > > > > example, I
> > > > > > >>>> would vote for *design-A (binding)*.
> > > > > > >>>>
> > > > > > >>>> This vote thread shall be open for one week to collect
> enough
> > > > votes
> > > > > to
> > > > > > >> call
> > > > > > >>>> for a winner. Still, feel free to post any question you may
> > have
> > > > > > >> regarding
> > > > > > >>>> this KIP, thanks!
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> >
> > Michael G. Noll
> >
> > Principal Technologist, Office of the CTO
> >
> > <https://www.confluent.io>
> >
>
>
> --
>
> <https://www.confluent.io>
>
> Antony Stubbs
>
> Principal Consulting Engineer / Solutions Architect
>
>
> Follow us:  Blog
> <
> https://confluent.io/blog?utm_source=footer_medium=email_campaign=ch.email-signature_type.community_content.blog
> >
> • Slack <https://slackpass.io/confluentcommunity> • Twitter
> <https://twitter.com/ConfluentInc> • YouTube <
> https://youtube.com/confluent>
>


-- 

Ben Stopford

Lead Technologist, Office of the CTO

<https://www.confluent.io>


Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo

2020-08-19 Thread Ben Stopford
t is depicted (like
> font,
> > > line art, colors).
> > > Objective 2: The logo should allude to the role of KStreams in the
> Kafka
> > > project, which is the processing part.  That is, "doing something
> useful
> > to
> > > the data in Kafka".
> > >
> > > The "circling arrow" aspect of the current otter logos does allude to
> > > "continuous processing", which is going in the direction of (2), but
> the
> > > logos do not meet (1) in my opinion.
> > >
> > > -Michael
> > >
> > >
> > >
> > >
> > > On Tue, Aug 18, 2020 at 10:34 PM Matthias J. Sax 
> > wrote:
> > >
> > > > Adding the user mailing list -- I think we should accept votes on
> both
> > > > lists for this special case, as it's not a technical decision.
> > > >
> > > > @Boyang: as mentioned by Bruno, can we maybe add black/white options
> > for
> > > > both proposals, too?
> > > >
> > > > I also agree that Design B is not ideal with regard to the Kafka
> logo.
> > > > Would it be possible to change Design B accordingly?
> > > >
> > > > I am not a font expert, but the fonts in both designs are different
> and
> > I
> > > > am wondering if there is an official Apache Kafka font that we should
> > > > reuse to make sure that the logos align -- I would expect that both
> > > > logos (including "Apache Kafka" and "Kafka Streams" names) will be
> used
> > > > next to each other and it would look awkward if the font differs.
> > > >
> > > >
> > > > -Matthias
> > > >
> > > > On 8/18/20 11:28 AM, Navinder Brar wrote:
> > > > > Hi,
> > > > > Thanks for the KIP, really like the idea. I am +1(non-binding) on A
> > > > mainly because I felt like you have to tilt your head to realize the
> > > > otter's head in B.
> > > > > Regards,Navinder
> > > > >
> > > > > On Tuesday, 18 August, 2020, 11:44:20 pm IST, Guozhang Wang <
> > > > wangg...@gmail.com> wrote:
> > > > >
> > > > >  I'm leaning towards design B primarily because it reminds me of
> the
> > > > Firefox
> > > > > logo which I like a lot. But I also share Adam's concern that it
> > should
> > > > > better not obscure the Kafka logo --- so if we can tweak a bit to
> fix
> > > it
> > > > my
> > > > > vote goes to B, otherwise A :)
> > > > >
> > > > >
> > > > > Guozhang
> > > > >
> > > > > On Tue, Aug 18, 2020 at 9:48 AM Bruno Cadonna 
> > > > wrote:
> > > > >
> > > > >> Thanks for the KIP!
> > > > >>
> > > > >> I am +1 (non-binding) for A.
> > > > >>
> > > > >> I would also like to hear opinions whether the logo should be
> > > colorized
> > > > >> or just black and white.
> > > > >>
> > > > >> Best,
> > > > >> Bruno
> > > > >>
> > > > >>
> > > > >> On 15.08.20 16:05, Adam Bellemare wrote:
> > > > >>> I prefer Design B, but given that I missed the discussion
> thread, I
> > > > think
> > > > >>> it would be better without the Otter obscuring any part of the
> > Kafka
> > > > >> logo.
> > > > >>>
> > > > >>> On Thu, Aug 13, 2020 at 6:31 PM Boyang Chen <
> > > > reluctanthero...@gmail.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>>> Hello everyone,
> > > > >>>>
> > > > >>>> I would like to start a vote thread for KIP-657:
> > > > >>>>
> > > > >>>>
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-657%3A+Add+Customized+Kafka+Streams+Logo
> > > > >>>>
> > > > >>>> This KIP is aiming to add a new logo for the Kafka Streams
> > library.
> > > > And
> > > > >> we
> > > > >>>> prepared two candidates with a cute otter. You could look up the
> > KIP
> > > > to
> > > > >>>> find those logos.
> > > > >>>>
> > > > >>>>
> > > > >>>> Please post your vote against these two customized logos. For
> > > > example, I
> > > > >>>> would vote for *design-A (binding)*.
> > > > >>>>
> > > > >>>> This vote thread shall be open for one week to collect enough
> > votes
> > > to
> > > > >> call
> > > > >>>> for a winner. Still, feel free to post any question you may have
> > > > >> regarding
> > > > >>>> this KIP, thanks!
> > > > >>>>
> > > > >>>
> > > > >>
> > > > >
> > > > >
> > > >
> > > >
> > >
> >
>
>
> --
> -- Guozhang
>


-- 

Ben Stopford

Lead Technologist, Office of the CTO

<https://www.confluent.io>


Re: New Website Layout

2020-08-12 Thread Ben Stopford
thanks Folks

On Wed, 12 Aug 2020 at 07:28, Luke Chen  wrote:

> Hi Tom, Ben,
> PR is ready to address the issue you saw:
> https://github.com/apache/kafka-site/pull/292
>
> Thanks.
>
> On Wed, Aug 12, 2020 at 1:09 AM Tom Bentley  wrote:
>
> > Hi Ben,
> >
> > Thanks for fixing that. Another problem I've just noticed is a couple of
> > garbled headings. E.g. scroll down from
> > https://kafka.apache.org/documentation.html#design_compactionbasics and
> > the
> > "What guarantees does log compaction provide?" section is rendering as
> >
> > $1 class="anchor-heading">$8$9$10
> > <https://kafka.apache.org/documentation.html#$4>
> >
> > with the . Similar thing in
> > https://kafka.apache.org/documentation.html#design_quotas. The source
> HTML
> > looks OK to me.
> >
> > Kind regards,
> >
> > Tom
> >
> > On Mon, Aug 10, 2020 at 2:15 PM Ben Stopford  wrote:
> >
> > > Good spot. Thanks.
> > >
> > > On Thu, 6 Aug 2020 at 18:59, Ben Weintraub  wrote:
> > >
> > > > Plus one to Tom's request - the ability to easily generate links to
> > > > specific config options is extremely valuable.
> > > >
> > > > On Thu, Aug 6, 2020 at 10:09 AM Tom Bentley 
> > wrote:
> > > >
> > > > > Hi Ben,
> > > > >
> > > > > The documentation for the configs (broker, producer etc) used to
> > > function
> > > > > as links as well as anchors, which made the url fragments more
> > > > > discoverable, because you could click on the link and then
> copy+paste
> > > the
> > > > > browser URL:
> > > > >
> > > > > 
> > > > >   <a ... href="#batch.size">batch.size</a>
> > > > > 
> > > > >
> > > > > What seems to have happened with the new layout is the <a> tags are
> > > > > empty, and no longer enclose the config name,
> > > > >
> > > > > 
> > > > >   <a ... href="#batch.size"></a>
> > > > >   batch.size
> > > > > 
> > > > >
> > > > > meaning you can't click on the link to copy and paste the URL.
> Could
> > > the
> > > > > old behaviour be restored?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Tom
> > > > >
> > > > > On Wed, Aug 5, 2020 at 12:43 PM Luke Chen 
> wrote:
> > > > >
> > > > > > When entering streams doc, it'll always show:
> > > > > > *You're viewing documentation for an older version of Kafka -
> check
> > > out
> > > > > our
> > > > > > current documentation here.*
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Aug 5, 2020 at 6:44 PM Ben Stopford 
> > > wrote:
> > > > > >
> > > > > > > Thanks for the PR and feedback, Mickael. Appreciated.
> > > > > > >
> > > > > > > On Wed, 5 Aug 2020 at 10:49, Mickael Maison <
> > > > mickael.mai...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thank you, it looks great!
> > > > > > > >
> > > > > > > > I found a couple of small issues:
> > > > > > > > - It's not rendering correctly with http.
> > > > > > > > - It's printing "called" to the console. I opened a PR to
> > remove
> > > > the
> > > > > > > > console.log() call:
> > > https://github.com/apache/kafka-site/pull/278
> > > > > > > >
> > > > > > > > On Wed, Aug 5, 2020 at 9:45 AM Ben Stopford <
> b...@confluent.io>
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > The new website layout has gone live as you may have seen.
> > > There
> > > > > are
> > > > > > a
> > > > > > > > > couple of rendering issues in the streams developer guide
> > that
> > > > > we're
> > > > > > > > > getting addressed. If anyone spots anything else could they
> > > > please
> > > > > > > reply
> > > > > > > > to
> > > > > > > > > this thread.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > Ben
> > > > > > > > >
> > > > > > > > > On Fri, 26 Jun 2020 at 11:48, Ben Stopford <
> b...@confluent.io
> > >
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hey folks
> > > > > > > > > >
> > > > > > > > > > We've made some updates to the website's look and feel.
> > There
> > > > is
> > > > > a
> > > > > > > > staged
> > > > > > > > > > version in the link below.
> > > > > > > > > >
> > > > > > > > > >
> https://ec2-13-57-18-236.us-west-1.compute.amazonaws.com/
> > > > > > > > > > username: kafka
> > > > > > > > > > password: streaming
> > > > > > > > > >
> > > > > > > > > > Comments welcomed.
> > > > > > > > > >
> > > > > > > > > > Ben
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Ben Stopford
> > > > > > >
> > > > > > > Lead Technologist, Office of the CTO
> > > > > > >
> > > > > > > <https://www.confluent.io>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Ben Stopford
> > >
> > > Lead Technologist, Office of the CTO
> > >
> > > <https://www.confluent.io>
> > >
> >
>


-- 

Ben Stopford

Lead Technologist, Office of the CTO

<https://www.confluent.io>


Re: New Website Layout

2020-08-10 Thread Ben Stopford
Good spot. Thanks.

On Thu, 6 Aug 2020 at 18:59, Ben Weintraub  wrote:

> Plus one to Tom's request - the ability to easily generate links to
> specific config options is extremely valuable.
>
> On Thu, Aug 6, 2020 at 10:09 AM Tom Bentley  wrote:
>
> > Hi Ben,
> >
> > The documentation for the configs (broker, producer etc) used to function
> > as links as well as anchors, which made the url fragments more
> > discoverable, because you could click on the link and then copy+paste the
> > browser URL:
> >
> > 
> >   <a ... href="#batch.size">batch.size</a>
> > 
> >
> > What seems to have happened with the new layout is the <a> tags are
> > empty, and no longer enclose the config name,
> >
> > 
> >   <a ... href="#batch.size"></a>
> >   batch.size
> > 
> >
> > meaning you can't click on the link to copy and paste the URL. Could the
> > old behaviour be restored?
> >
> > Thanks,
> >
> > Tom
> >
> > On Wed, Aug 5, 2020 at 12:43 PM Luke Chen  wrote:
> >
> > > When entering streams doc, it'll always show:
> > > *You're viewing documentation for an older version of Kafka - check out
> > our
> > > current documentation here.*
> > >
> > >
> > >
> > > On Wed, Aug 5, 2020 at 6:44 PM Ben Stopford  wrote:
> > >
> > > > Thanks for the PR and feedback, Mickael. Appreciated.
> > > >
> > > > On Wed, 5 Aug 2020 at 10:49, Mickael Maison <
> mickael.mai...@gmail.com>
> > > > wrote:
> > > >
> > > > > Thank you, it looks great!
> > > > >
> > > > > I found a couple of small issues:
> > > > > - It's not rendering correctly with http.
> > > > > - It's printing "called" to the console. I opened a PR to remove
> the
> > > > > console.log() call: https://github.com/apache/kafka-site/pull/278
> > > > >
> > > > > On Wed, Aug 5, 2020 at 9:45 AM Ben Stopford 
> > wrote:
> > > > > >
> > > > > > The new website layout has gone live as you may have seen. There
> > are
> > > a
> > > > > > couple of rendering issues in the streams developer guide that
> > we're
> > > > > > getting addressed. If anyone spots anything else could they
> please
> > > > reply
> > > > > to
> > > > > > this thread.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > Ben
> > > > > >
> > > > > > On Fri, 26 Jun 2020 at 11:48, Ben Stopford 
> > wrote:
> > > > > >
> > > > > > > Hey folks
> > > > > > >
> > > > > > > We've made some updates to the website's look and feel. There
> is
> > a
> > > > > staged
> > > > > > > version in the link below.
> > > > > > >
> > > > > > > https://ec2-13-57-18-236.us-west-1.compute.amazonaws.com/
> > > > > > > username: kafka
> > > > > > > password: streaming
> > > > > > >
> > > > > > > Comments welcomed.
> > > > > > >
> > > > > > > Ben
> > > > > > >
> > > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Ben Stopford
> > > >
> > > > Lead Technologist, Office of the CTO
> > > >
> > > > <https://www.confluent.io>
> > > >
> > >
> >
>


-- 

Ben Stopford

Lead Technologist, Office of the CTO

<https://www.confluent.io>


Re: New Website Layout

2020-08-05 Thread Ben Stopford
Thanks for the PR and feedback, Mickael. Appreciated.

On Wed, 5 Aug 2020 at 10:49, Mickael Maison 
wrote:

> Thank you, it looks great!
>
> I found a couple of small issues:
> - It's not rendering correctly with http.
> - It's printing "called" to the console. I opened a PR to remove the
> console.log() call: https://github.com/apache/kafka-site/pull/278
>
> On Wed, Aug 5, 2020 at 9:45 AM Ben Stopford  wrote:
> >
> > The new website layout has gone live as you may have seen. There are a
> > couple of rendering issues in the streams developer guide that we're
> > getting addressed. If anyone spots anything else could they please reply
> to
> > this thread.
> >
> > Thanks
> >
> > Ben
> >
> > On Fri, 26 Jun 2020 at 11:48, Ben Stopford  wrote:
> >
> > > Hey folks
> > >
> > > We've made some updates to the website's look and feel. There is a
> staged
> > > version in the link below.
> > >
> > > https://ec2-13-57-18-236.us-west-1.compute.amazonaws.com/
> > > username: kafka
> > > password: streaming
> > >
> > > Comments welcomed.
> > >
> > > Ben
> > >
> > >
>


-- 

Ben Stopford

Lead Technologist, Office of the CTO

<https://www.confluent.io>


Re: New Website Layout

2020-08-05 Thread Ben Stopford
The new website layout has gone live as you may have seen. There are a
couple of rendering issues in the streams developer guide that we're
getting addressed. If anyone spots anything else could they please reply to
this thread.

Thanks

Ben

On Fri, 26 Jun 2020 at 11:48, Ben Stopford  wrote:

> Hey folks
>
> We've made some updates to the website's look and feel. There is a staged
> version in the link below.
>
> https://ec2-13-57-18-236.us-west-1.compute.amazonaws.com/
> username: kafka
> password: streaming
>
> Comments welcomed.
>
> Ben
>
>


New Website Layout

2020-06-26 Thread Ben Stopford
Hey folks

We've made some updates to the website's look and feel. There is a staged
version in the link below.

https://ec2-13-57-18-236.us-west-1.compute.amazonaws.com/
username: kafka
password: streaming

Comments welcomed.

Ben


Re: kafka compacted topic

2017-11-30 Thread Ben Stopford
The CPU/IO required to complete a compaction phase will grow as the log
grows, but you can manage this via the cleaner's various configs. Check out
the properties starting with log.cleaner in the docs (
https://kafka.apache.org/documentation). All databases that implement LSM
storage have a similar overhead (Cassandra, HBase, RocksDB, etc.). Note that the
first (active) segment is never compacted.
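
As an illustration (the values below are examples, not recommendations), the
relevant broker properties look roughly like this:

# number of background log cleaner threads
log.cleaner.threads=2
# cap the cleaner's combined read/write rate (bytes/sec) so compaction can't saturate the disks
log.cleaner.io.max.bytes.per.second=10485760
# only recompact a log once at least this fraction of it is dirty (uncleaned)
log.cleaner.min.cleanable.ratio=0.5
# memory used across cleaner threads for the dedupe (offset map) buffers
log.cleaner.dedupe.buffer.size=134217728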

On Thu, Nov 30, 2017 at 6:59 AM Kane Kim  wrote:

> I want to confirm whether Kafka has to re-compact all log segments; as the log
> grows, doesn't it become slower as well?
>
> On Tue, Nov 28, 2017 at 11:33 PM, Jakub Scholz  wrote:
>
> > There is quite a nice section on this in the documentation -
> > http://kafka.apache.org/documentation/#compaction ... I think it should
> > answer your questions.
> >
> > On Wed, Nov 29, 2017 at 7:19 AM, Kane Kim  wrote:
> >
> > > How does kafka log compaction work?
> > > Does it compact all of the log files periodically against new changes?
> > >
> >
>


Re: GDPR appliance

2017-11-28 Thread Ben Stopford
You should also be able to manage this with a compacted topic. If you give
each message a unique key, you'd then be able to delete or overwrite
specific records; Kafka will delete them from disk when compaction runs. If
you need to partition for ordering purposes, you'd need to use a custom
partitioner that extracts a partition key from the unique key before it
does the hash.
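
As a rough sketch, assuming a String key of the form "<partition-key>:<unique-id>"
(the class name, key format and getters are illustrative; only the Partitioner
interface itself is Kafka's):

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

import java.util.Map;

public class PartitionKeyPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        // Hash only the part of the key before the ':' so all records for the
        // same entity land on one partition, while the full key stays unique
        // so individual records can later be overwritten or tombstoned.
        String partitionKey = ((String) key).split(":", 2)[0];
        int numPartitions = cluster.partitionsForTopic(topic).size();
        return Utils.toPositive(Utils.murmur2(partitionKey.getBytes())) % numPartitions;
    }

    @Override
    public void configure(Map<String, ?> configs) {}

    @Override
    public void close() {}
}

You'd plug this in via the producer's partitioner.class setting; deleting a
record then just means producing a null value (a tombstone) with the same
unique key and letting compaction remove it.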

B

On Sun, Nov 26, 2017 at 10:40 AM Wim Van Leuven <
wim.vanleu...@highestpoint.biz> wrote:

> Thanks, Lars, for the most interesting read!
>
>
>
> On Sun, 26 Nov 2017 at 00:38 Lars Albertsson  wrote:
>
> > Hi David,
> >
> > You might find this presentation useful:
> > https://www.slideshare.net/lallea/protecting-privacy-in-practice
> >
> > It explains privacy building blocks primarily in a batch processing
> > context, but most of the principles are applicable for stream
> > processing as well, e.g. splitting non-PII and PII data ("ejected
> > record" slide), encrypting PII data ("lost key" slide).
> >
> > Regards,
> >
> >
> >
> > Lars Albertsson
> > Data engineering consultant
> > www.mapflat.com
> > https://twitter.com/lalleal
> > +46 70 7687109
> > Calendar: http://www.mapflat.com/calendar
> >
> >
> > On Wed, Nov 22, 2017 at 7:46 PM, David Espinosa 
> wrote:
> > > Hi all,
> > > I would like to double-check with you how we want to apply some GDPR
> > > requirements to my Kafka topics, in particular the "right to be forgotten",
> > > which forces us to delete some data contained in the messages. So not
> > > deleting the message, but editing it.
> > > To do that, my intention is to replicate the topic and apply a
> > > transformation over it, with frameworks like Kafka Streams or Apache Storm.
> > >
> > > Did anybody had to solve this problem?
> > >
> > > Thanks in advance.
> >
>


Re: Replication throttling

2017-10-05 Thread Ben Stopford
Typically you don't want replication throttling enabled all the time: if a
broker drops out of the ISR for whatever reason, catch-up will be impeded.
Having said that, this may not be an issue if the throttle is quite mild
and your max write rate is well below the network limit, but it is
safer to enable/disable when you perform maintenance.
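
As a sketch (broker id, ZooKeeper address and rate are illustrative), the
KIP-73 throttle can be switched on before maintenance and removed once
catch-up is complete:

# throttle replication on broker 0 to ~10MB/s (the *.throttled.replicas topic
# configs, which kafka-reassign-partitions.sh --throttle sets for you, control
# which replicas the rate applies to)
bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type brokers --entity-name 0 \
  --alter --add-config 'leader.replication.throttled.rate=10485760,follower.replication.throttled.rate=10485760'

# remove the throttle when maintenance / catch-up is done
bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type brokers --entity-name 0 \
  --alter --delete-config 'leader.replication.throttled.rate,follower.replication.throttled.rate'
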
B

On Tue, 19 Sep 2017, 18:14 Ivan Simonenko, 
wrote:

> Hi, Kafka Users,
>
> In the documentation for replication throttling it is mentioned that it
> should be removed after partitions moved or a broker completed bootstrap (
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas#KIP-73ReplicationQuotas-2.HowdoIthrottleabootstrappingbroker
> ?
> and  https://kafka.apache.org/documentation/ )
>
> Assuming that throttling is configured based on network or drive
> throughput, it seems to be a good idea to have it always enabled. Why is it
> recommended to turn it off?
>
> In addition, we noticed that during broker bootstrap, if throttling is not
> configured and the broker gets overloaded, then we
> face NotLeaderForPartitionException for a prolonged time span. This doesn't
> happen with replication throttling set 10-20% below max throughput.
>
> Best,
> Ivan Simonenko
>


Re: Event sourcing with Kafka and Kafka Streams. How to deal with atomicity

2017-07-24 Thread Ben Stopford
No worries Jose ;-)

So there are a few ways you could do this, but I think it’s important that
you manage a single “stock level” state store, backed by a changelog. Use
this for validation, and keep it up to date at the same time. You should
also ensure the input topic(s) are partitioned by productId so any update
to, or validation of, the same product will be sequenced. This effectively
ensures the mutations of the quantities in stock will be atomic.
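
If the incoming topics aren't already keyed that way, a rough sketch of the
re-keying step might be (topic names and the getProductId accessor are
assumptions):

// Re-key by productId and repartition through an intermediate topic so that
// all events for the same product are handled by the same stream task.
KStream<ProductId, Event> ordersByProduct =
    builder.<OrderId, Event>stream(orderTopic)
           .selectKey((orderId, event) -> event.getProductId())
           .through(ordersByProductTopic);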

So say we have two inputs: OrderRequests, StockUpdates

Order requests need to validate that there is sufficient stock, via the
product store, then decrement the stock value in that store:

public Event validateInventory(OrderRequestEvent order,
                               KeyValueStore<ProductId, Long> store) {

    Long stockCount = store.get(order.product);

    if (stockCount - order.quantity >= 0) {
        //decrement the stock level held in the store
        store.put(order.product, stockCount - order.quantity);
        return new OrderValidatedEvent(Validation.Passed);
    } else {
        return new OrderValidatedEvent(Validation.Failed);
    }
}

Stock updates need to increase the stock value in the product store as new
stock arrives.

public void updateStockStore(StockUpdateEvent update,
                             KeyValueStore<ProductId, Long> store) {

    Long current = store.get(update.product);

    store.put(update.product, current + update.amount);
}

To do the processing we merge the input streams, then push them into a
transformer that uses a single state store to manage the mapping between
products and their stock levels.

KStream<ProductId, Event> unvalidatedOrdersStream = builder.stream(orderTopic);

KStream<ProductId, Event> stockStream = builder.stream(stockUpdateTopic);

StateStoreSupplier productStore = Stores.create(productStoreName)...build()

KStream<ProductId, Event> orderOutputs =
    unvalidatedOrdersStream.outerJoin(stockStream, ...)
        .transform(StockCheckTransformer::new, productStoreName)
        .filter((key, value) -> value != null);

orderOutputs.to(validatedOrdersTopic);


With the transformer both managing and validating against the stock levels.

StockCheckTransformer { ....

    public KeyValue<ProductId, Event> transform(ProductId key, Event event) {

        if (event.isStockUpdate()) {
            //stock arriving: bump the stock level held in the store
            updateStockStore(parseStockUpdate(event), productStore);
            return null;
        } else if (event.isOrderRequest()) {
            //order: validate against, and decrement, the stock level
            return KeyValue.pair(key,
                validateInventory(parseOrderReq(event), productStore));
        }
        return null;
    }
}

Now the stock levels will be held in the changelog topic that backs the
product store, which we can reuse if we wish.

I think we could also optimise this code a bit by splitting it into two
transformers via streams.branch(..).

Regarding EoS. This doesn’t add any magic to your processing logic. It just
guarantees that your stock count will be accurate in the face of failure
(i.e. you don’t need to manage idempotence yourself).

B


On Sat, Jul 22, 2017 at 12:52 PM José Antonio Iñigo <
joseantonio.in...@gmail.com> wrote:

> Hi Garret,
>
> At the moment, to simplify the problem I only have one topic, orders, where
> I add products and decrement them based on ProductAdded and ProductReserved
> events.
>
> Yesterday I was reading about EoS but I don't know if it'll solve the
> problem. Dividing the query-update in two steps means that the event
> ordering could be:
>
> OrderPlaced (query stock ok)
> OrderPlaced (query stock ok)
> ProductReserved (update stock)
> ProductReserved (update stock)
>
> Regarding EoS this sequence is correct, the messages are delivered once in
> the order in which they were generated. The problem is the order itself: if
> there were a way to query-update-store-generate-event in one step to
> produce instead the following sequence of events there wouldn't be any
> problem:
>
> OrderPlaced->ProductReserved (query stock ok + Update stock store +
> reserved event)
> OrderPlaced->ProductNoStock (query stock fail so no update and out-of-stock
> event)
>
> Regards
>
> On Sat, 22 Jul 2017 at 05:35, Garrett Barton 
> wrote:
>
> > Could you take in both topics via the same stream? Meaning don't do a
> kafka
> > streams join, literally just read both streams. If KStream can't do this
> > (dunno, haven't tried), then a simple upstream merge job to throw them into 1
> > topic with the same partitioning scheme.
> >
> > I'd assume you would have the products stream that would be some kind of
> > incrementer on state (within the local state store).  The Orders stream
> > would act as a decrement to the same stream task.  Exactly once semantics
> > and you skirt the issue of having to wait for the update to come back
> > around.
> >
> > Thoughts?
> >
> > On Fri, Jul 21, 2017 at 6:15 PM, José Antonio Iñigo <
> > joseantonio.in...@gmail.com> wrote:
> >
> > > Hi Chris,
> > >
> > >
> > >
> > > *"if I understand your problem correctly, the issue is that you need
> > > todecrement the stock count when you reserve it, rather than splitting
> > it*
> > > *into a second phase."*
> > >
> > 

Re: Event sourcing with Kafka and Kafka Streams. How to deal with atomicity

2017-07-21 Thread Ben Stopford
Hi Jose
If I understand your problem correctly, the issue is that you need to
decrement the stock count when you reserve it, rather than splitting it
into a second phase. You can do this via the DSL with a Transformer. There's
a related example below. Alternatively you could do it with the Processor
API. As Michal suggested, you can now do this atomically with the EoS
feature.

Hi Chris
Kafka Streams provides the index that gives you the findByPk semantic,
although for high-performance use cases the Kafka broker is often used
alone, via the Memory Image approach.
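
For example, a findByPk-style lookup against a Kafka Streams state store
looks roughly like this (the streams instance, store name and types are
assumed):

// Query the state store materialised by the topology, via Interactive Queries.
ReadOnlyKeyValueStore<ProductId, Long> store =
    streams.store("product-store", QueryableStoreTypes.keyValueStore());

Long stock = store.get(productId);   // the findByPk-style lookup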

B
https://github.com/confluentinc/examples/blob/master/kafka-streams/src/test/java/io/confluent/examples/streams/StateStoresInTheDSLIntegrationTest.java
https://martinfowler.com/bliki/MemoryImage.html


On Fri, Jul 21, 2017 at 3:15 PM Michal Borowiecki <
michal.borowie...@openbet.com> wrote:

> With Kafka Streams you get those and atomicity via Exactly-once-Semantics.
>
> Michał
>
>
> On 21/07/17 14:51, Chris Richardson wrote:
> > Hi,
> >
> > I like Kafka but I don't understand the claim that it can be used for
> Event
> > Sourcing (here <
> http://microservices.io/patterns/data/event-sourcing.html>
> > and here )
> >
> > One part of the event sourcing is the ability to subscribe to events
> > published by aggregates and clearly Kafka works well there.
> >
> > But the other part of Event Sourcing is "database" like functionality,
> > which includes
> >
> > - findEventsByPrimaryKey() - needed to be able to reconstruct an
> > aggregate from its events - the essence of event sourcing
> > - Atomic updates - for updating aggregates (findEventsByPrimaryKey()
> > - business logic - insertNewEvents()) in order to handle this kind of
> > scenario.
> >
> > The approach we have taken is to implement event sourcing using a
> database
> > and Kafka.
> > For instance: see
> >
> https://blog.eventuate.io/2016/10/06/eventuate-local-event-sourcing-and-cqrs-with-spring-boot-apache-kafka-and-mysql/
> >
> > Chris
> >
>
> --
> Michal Borowiecki
> Senior Software Engineer L4
> T: +44 208 742 1600
> E: michal.borowie...@openbet.com
> DL: +44 203 249 8448
> W: www.openbet.com
>
>


Re: Resetting offsets

2017-05-03 Thread Ben Stopford
Hu is correct, there isn't anything currently, but there is an active
proposal:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-122%3A+Add+Reset+Consumer+Group+Offsets+tooling
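
A minimal sketch of the client-side approach Hu Xi describes below (group and
topic names are placeholders; imports and error handling omitted):

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "my-group");                 // the group whose offsets you want to reset
props.put("enable.auto.commit", "false");
props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

// all other consumers in this group must be stopped first
KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);

List<TopicPartition> partitions = new ArrayList<>();
for (PartitionInfo p : consumer.partitionsFor("my-topic"))
    partitions.add(new TopicPartition(p.topic(), p.partition()));

consumer.assign(partitions);
consumer.seekToBeginning(partitions);              // or seek(tp, someOffset) per partition

Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
for (TopicPartition tp : partitions)
    offsets.put(tp, new OffsetAndMetadata(consumer.position(tp)));

consumer.commitSync(offsets);                      // persist the reset offsets for the group
consumer.close();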

On Wed, May 3, 2017 at 1:23 PM Hu Xi  wrote:

> Seems there is no command line out of box, but if you could write a simple
> Java client application that firstly calls 'seek' or 'seekToBeginning' to
> reset offsets to what you expect and then invoke commitSync to commit the
> offsets.
>
>
> 
> From: Paul van der Linden
> Sent: 3 May 2017 18:28
> To: users@kafka.apache.org
> Subject: Resetting offsets
>
> I'm trying to reset the offsets for all partitions for all topics for a
> consumer group, but I can't seem to find a working way.
>
> The command line tool provides a tool to remove a consumer group (which
> would be fine in this occasion), but this is not working with the "new"
> style consumer groups. I tried to set consumer offsets with a client, which
> also didn't work (we are using confluent-kafka-python with librdkafka).
>
> Is there any way to reset the offsets (preferable with python or a command
> line tool)?
>
> Thanks
>


Re: Kafka running in VPN

2017-03-23 Thread Ben Stopford
The bootstrap servers are only used to make an initial connection. From
there, the clients request metadata, which provides them with a 'map' of
the cluster. The addresses in the metadata are those registered in
Zookeeper by each broker; they don't relate to the bootstrap list in any
way. You can change these from the default addresses the brokers bind to
via the advertised.listeners / listeners broker properties.

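For example (hostnames illustrative), a broker inside the VPN might set:

# the address the broker binds to
listeners=PLAINTEXT://0.0.0.0:9092
# the address registered in ZooKeeper and returned to clients in metadata;
# it must be one the clients can actually reach (e.g. through the tunnel)
advertised.listeners=PLAINTEXT://broker1.vpn.internal:9092
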
-B

On Wed, Mar 22, 2017 at 1:28 PM Subroto Sanyal 
wrote:

> Hello
>
> I am using Kafka-0.10.0
> My Kafka brokers are running on a 3 node cluster in a VPN. The consumer and
> producer are not part of VPN; so I use ssh tunnels and update the bootstrap
> servers on the Kafka Client (consumer and producer) config accordingly.
> The Kafka client is able to initiate the primary connection and I see a
> topic is also getting created successfully, but further communication is
> not happening.
> With a little debugging I found out that there is something like a Metadata
> Updater which gets the brokers' advertised addresses, and further communication
> uses the same InetSocketAddress.
>
> Is there any way to let the Kafka Clients not use/update the broker
> address?
>
> --
> Cheers,
> *Subroto Sanyal*
>


Re: Relationship replica.fetch.max.bytes and message.max.bytes

2017-03-23 Thread Ben Stopford
Hi Kostas - The docs for replica.fetch.max.bytes should be helpful here:

The number of bytes of messages to attempt to fetch for each partition.
This is not an absolute maximum, if the first message in the first
non-empty partition of the fetch is larger than this value, the message
will still be returned to ensure that progress can be made.
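
A common, conservative setup is therefore simply to keep the two in step
(values illustrative):

# largest message the broker will accept
message.max.bytes=2097152
# let replica fetches request at least that much per partition
replica.fetch.max.bytes=2097152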

-B

On Thu, Mar 23, 2017 at 3:27 AM Kostas Christidis  wrote:

> Can replica.fetch.max.bytes be equal to message.max.bytes?
>
> 1. The defaults in the official Kafka documentation [1] have the
> parameter "replica.fetch.max.bytes" set to a higher value than
> "message.max.bytes". However, nothing in the description of these
> parameters implies that equality would be wrong.
>
> 2. The relevant passage in pg. 41 in the Definitive Guide book [2]
> does not imply that the former needs to be larger than the latter
> either.
>
> 3. A Cloudera doc [3] however notes that: "replica.fetch.max.bytes
> [...] must be larger than message.max.bytes, or a broker can accept
> messages it cannot replicate, potentially resulting in data loss."
>
> 4. The only other reference I could find to this strict inequality was
> this StackOverflow comment [4].
>
> So:
>
> Does replica.fetch.max.bytes *have* to be strictly larger than
> message.max.bytes?
>
> If so, what is the technical reason behind this?
>
> Thank you.
>
> [1] https://kafka.apache.org/documentation/
> [2] https://shop.oreilly.com/product/0636920044123.do
> [3]
> https://www.cloudera.com/documentation/kafka/latest/topics/kafka_performance.html
> [4] http://stackoverflow.com/a/39026744/2363529
>


Re: KIP-122: Add a tool to Reset Consumer Group Offsets

2017-02-08 Thread Ben Stopford
Yes - using a tool like this to skip a set of consumer groups over a
corrupt/bad message is definitely appealing.

B

On Wed, Feb 8, 2017 at 9:37 PM Gwen Shapira  wrote:

> I like the --reset-to-earliest and --reset-to-latest. In general,
> since the JSON route is the most challenging for users, we want to
> provide a lot of ways to do useful things without going there.
>
> Two things that can help:
>
> 1. A lot of times, users want to skip a few messages that cause issues
> and continue. Maybe just specifying the topic, partition and delta
> would be better than having to find the offset, write a JSON file and
> validate the JSON, etc.
>
> 2. Thinking if there are other common use-cases that we can make easy
> rather than just one generic but not very usable method.
>
> Gwen
>
> On Wed, Feb 8, 2017 at 3:25 AM, Jorge Esteban Quilcate Otoya
>  wrote:
> > Thanks for the feedback!
> >
> > @Onur, @Gwen:
> >
> > Agree. Actually, in the first draft I considered having it inside
> > ´kafka-consumer-groups.sh´, but I decided to propose it as a standalone tool
> > to describe it clearly and focus it on reset functionality.
> >
> > But now that you mentioned, it does make sense to have it in
> > ´kafka-consumer-groups.sh´. How would be a consistent way to introduce
> it?
> >
> > Maybe something like this:
> >
> > ´kafka-consumer-groups.sh --reset-offset --generate --group cg1 --topics
> t1
> > --reset-from 2017-01-01T00:00:00.000 --output plan.json´
> >
> > ´kafka-consumer-groups.sh --reset-offset --verify --reset-json-file
> > plan.json´
> >
> > ´kafka-consumer-groups.sh --reset-offset --execute --reset-json-file
> > plan.json´
> >
> > ´kafka-consumer-groups.sh --reset-offset --generate-and-execute --group
> cg1
> > --topics t1 --reset-from 2017-01-01T00:00:00.000´
> >
> > @Gwen:
> >
> >> It looks exactly like the replica assignment tool
> >
> > It was influenced by ;-) I use the generate-verify-execute process here
> to
> > make sure user will be aware of the result of this operation. At the
> > beginning we considered only add a couple of options to Consumer Group
> > Command:
> >
> > --rewind-to-timestamp and --rewind-to-period
> >
> > @Onur:
> >
> >> You can actually get away with overriding while members of the group
> are live
> > with method 2 by using group information from DescribeGroupsRequest.
> >
> > This means that we need to have Consumer Group stopped before executing
> and
> > start a new consumer internally to do this? Therefore, we won't be able
> to
> > consider executing reset when ConsumerGroup is active? (trying to relate
> it
> > with @Dong 5th question)
> >
> > @Dong:
> >
> >> Should we allow user to use wildcard to reset offset of all groups for a
> > given topic as well?
> >
> > I haven't thought about this scenario. Could be interesting. Following the
> > recommendation to add it into the Consumer Group Command, in this case the Group
> > argument will be optional if there is only 1 topic. I think for multiple
> > topics it won't be that useful.
> >
> >> Should we allow user to specify timestamp per topic partition in the
> json
> > file as well?
> >
> > Don't think this would be valid from the tool, but if a Reset Plan is
> > generated, and the user wants to set the offset for a specific partition to
> > another offset (possibly based on another timestamp), and execute it, it
> > will be up to her/him.
> >
> >> Should the script take some credential file to make sure that this
> > operation is authenticated given the potential impact of this operation?
> >
> > Haven't tried to secure brokers yet, but the tool should support
> > authorization if it's enabled in the broker.
> >
> >> Should we provide constant to reset committed offset to earliest/latest
> > offset of a partition, e.g. -1 indicates earliest offset and -2 indicates
> > latest offset.
> >
> > I will go for something like ´--reset-to-earliest´ and
> ´--reset-to-latest´
> >
> >> Should we allow dynamic change of the comitted offset when consumer are
> > running, such that consumer will seek to the newly committed offset and
> > start consuming from there?
> >
> > Not sure about this. I would recommend keeping it simple and asking the user to
> > stop consumers first. But I would consider it if the trade-offs are
> > clear.
> >
> > @Matthias
> >
> > Added :). And thanks a lot for your help to define this KIP!
> >
> >
> >
> > El mié., 8 feb. 2017 a las 7:47, Gwen Shapira ()
> > escribió:
> >
> >> As long as the CLI is a bit consistent? Like, not just adding 3
> >> arguments and a JSON parser to the existing tool, right?
> >>
> >> On Tue, Feb 7, 2017 at 10:29 PM, Onur Karaman
> >>  wrote:
> >> > I think it makes sense to just add the feature to
> >> kafka-consumer-groups.sh
> >> >
> >> > On Tue, Feb 7, 2017 at 10:24 PM, Gwen Shapira 
> wrote:
> >> >
> >> >> Thanks for the KIP. I'm super happy about adding the capability.
> >> >>
> >> >> I hate 

Re: [ANNOUNCE] New committer: Grant Henke

2017-01-11 Thread Ben Stopford
Congrats Grant!!
On Wed, 11 Jan 2017 at 20:01, Ismael Juma  wrote:

> Congratulations Grant, well deserved. :)
>
> Ismael
>
> On 11 Jan 2017 7:51 pm, "Gwen Shapira"  wrote:
>
> > The PMC for Apache Kafka has invited Grant Henke to join as a
> > committer and we are pleased to announce that he has accepted!
> >
> > Grant contributed 88 patches, 90 code reviews, countless great
> > comments on discussions, a much-needed cleanup to our protocol and the
> > on-going and critical work on the Admin protocol. Throughout this, he
> > displayed great technical judgment, high-quality work and willingness
> > to contribute where needed to make Apache Kafka awesome.
> >
> > Thank you for your contributions, Grant :)
> >
> > --
> > Gwen Shapira
> > Product Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter | blog
> >
>


[VOTE] KIP-106 - Default unclean.leader.election.enabled True => False

2017-01-11 Thread Ben Stopford
Looks like there was a good consensus on the discuss thread for KIP-106, so
let's move to a vote.

Please chime in if you would like to change the default for
unclean.leader.election.enabled from true to false.

https://cwiki.apache.org/confluence/display/KAFKA/%5BWIP%5D+KIP-106+-+Change+Default+unclean.leader.election.enabled+from+True+to+False

B


Re: [VOTE] Vote for KIP-101 - Leader Epochs

2017-01-11 Thread Ben Stopford
OK - my mistake was mistaken! There is consensus. This KIP has been
accepted.

On Wed, Jan 11, 2017 at 6:48 PM Ben Stopford <b...@confluent.io> wrote:

> Sorry - my mistake. Looks like I still need one more binding vote. Is
> there a committer out there that could add their vote?
>
> B
>
> On Wed, Jan 11, 2017 at 6:44 PM Ben Stopford <b...@confluent.io> wrote:
>
> So I believe we can mark this as Accepted. I've updated the KIP page.
> Thanks for the input everyone.
>
> On Fri, Jan 6, 2017 at 9:31 AM Ben Stopford <b...@confluent.io> wrote:
>
> Thanks Joel. I'll fix up the pics to make them consistent on nomenclature.
>
>
> B
>
> On Fri, Jan 6, 2017 at 2:39 AM Joel Koshy <jjkosh...@gmail.com> wrote:
>
> (adding the dev list back - as it seems to have gotten dropped earlier in
> this thread)
>
> On Thu, Jan 5, 2017 at 6:36 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
>
> > +1
> >
> > This is a very well-written KIP!
> > Minor: there is still a mix of terms in the doc that references the
> > earlier LeaderGenerationRequest (which is what I'm assuming it was
> > called in previous versions of the wiki). Same for the diagrams which I'm
> > guessing are a little harder to make consistent with the text.
> >
> >
> >
> > On Thu, Jan 5, 2017 at 5:54 PM, Jun Rao <j...@confluent.io> wrote:
> >
> >> Hi, Ben,
> >>
> >> Thanks for the updated KIP. +1
> >>
> >> 1) In OffsetForLeaderEpochResponse, start_offset probably should be
> >> end_offset since it's the end offset of that epoch.
> >> 3) That's fine. We can fix KAFKA-1120 separately.
> >>
> >> Jun
> >>
> >>
> >> On Thu, Jan 5, 2017 at 11:11 AM, Ben Stopford <b...@confluent.io> wrote:
> >>
> >> > Hi Jun
> >> >
> >> > Thanks for raising these points. Thorough as ever!
> >> >
> >> > 1) Changes made as requested.
> >> > 2) Done.
> >> > 3) My plan for handling returning leaders is simply to force the
> >> Leader
> >> > Epoch to increment if a leader returns. I don't plan to fix KAFKA-1120
> >> as
> >> > part of this KIP. It is really a separate issue with wider
> implications.
> >> > I'd be happy to add KAFKA-1120 into the release though if we have
> time.
> >> > 4) Agreed. Not sure exactly how that's going to play out, but I think
> >> we're
> >> > on the same page.
> >> >
> >> > Please could
> >> >
> >> > Cheers
> >> > B
> >> >
> >> > On Thu, Jan 5, 2017 at 12:50 AM Jun Rao <j...@confluent.io> wrote:
> >> >
> >> > > Hi, Ben,
> >> > >
> >> > > Thanks for the proposal. Looks good overall. A few comments below.
> >> > >
> >> > > 1. For LeaderEpochRequest, we need to include topic right? We
> probably
> >> > want
> >> > > to follow other requests by nesting partition inside topic? For
> >> > > LeaderEpochResponse,
> >> > > do we need to return leader_epoch? I was thinking that we could just
> >> > return
> >> > > an end_offset, which is the next offset of the last message in the
> >> > > requested leader generation. Finally, would
> >> OffsetForLeaderEpochRequest
> >> > be
> >> > > a better name?
> >> > >
> >> > > 2. We should bump up both the produce request and the fetch request
> >> > > protocol version since both include the message set.
> >> > >
> >> > > 3. Extending LeaderEpoch to include Returning Leaders: To support
> >> this,
> >> > do
> >> > > you plan to use the approach that stores  CZXID in the broker
> >> > registration
> >> > > and including the CZXID of the leader in /brokers/topics/[topic]/
> >> > > partitions/[partitionId]/state in ZK?
> >> > >
> >> > > 4. Since there are a few other KIPs involving message format too, it
> >> > would
> >> > > be useful to consider if we could combine the message format changes
> >> in
> >> > the
> >> > > same release.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Jun
> >> > >
> >> > >
> >> > > On Wed, Jan 4, 2017 at 9:24 AM, Ben Stopford <b...@confluent.io>
> >> wrote:
> >> > >
> >> > > > Hi All
> >> > > >
> >> > > > We’re having some problems with this thread being subsumed by the
> >> > > > [Discuss] thread. Hopefully this one will appear distinct. If you
> >> see
> >> > > more
> >> > > > than one, please use this one.
> >> > > >
> >> > > > KIP-101 should now be ready for a vote. As a reminder the KIP
> >> proposes
> >> > a
> >> > > > change to the replication protocol to remove the potential for
> >> replicas
> >> > > to
> >> > > > diverge.
> >> > > >
> >> > > > The KIP can be found here:  https://cwiki.apache.org/confl
> >> > > > uence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+
> >> > > > use+Leader+Epoch+rather+than+High+Watermark+for+Truncation <
> >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-
> >> > > > +Alter+Replication+Protocol+to+use+Leader+Epoch+rather+
> >> > > > than+High+Watermark+for+Truncation>
> >> > > >
> >> > > > Please let us know your vote.
> >> > > >
> >> > > > B
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
>


Re: [VOTE] Vote for KIP-101 - Leader Epochs

2017-01-11 Thread Ben Stopford
Sorry - my mistake. Looks like I still need one more binding vote. Is there
a committer out there that could add their vote?

B

On Wed, Jan 11, 2017 at 6:44 PM Ben Stopford <b...@confluent.io> wrote:

> So I believe we can mark this as Accepted. I've updated the KIP page.
> Thanks for the input everyone.
>
> On Fri, Jan 6, 2017 at 9:31 AM Ben Stopford <b...@confluent.io> wrote:
>
> Thanks Joel. I'll fix up the pics to make them consistent on nomenclature.
>
>
> B
>
> On Fri, Jan 6, 2017 at 2:39 AM Joel Koshy <jjkosh...@gmail.com> wrote:
>
> (adding the dev list back - as it seems to have gotten dropped earlier in
> this thread)
>
> On Thu, Jan 5, 2017 at 6:36 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
>
> > +1
> >
> > This is a very well-written KIP!
> > Minor: there is still a mix of terms in the doc that references the
> > earlier LeaderGenerationRequest (which is what I'm assuming it was
> > called in previous versions of the wiki). Same for the diagrams which I'm
> > guessing are a little harder to make consistent with the text.
> >
> >
> >
> > On Thu, Jan 5, 2017 at 5:54 PM, Jun Rao <j...@confluent.io> wrote:
> >
> >> Hi, Ben,
> >>
> >> Thanks for the updated KIP. +1
> >>
> >> 1) In OffsetForLeaderEpochResponse, start_offset probably should be
> >> end_offset since it's the end offset of that epoch.
> >> 3) That's fine. We can fix KAFKA-1120 separately.
> >>
> >> Jun
> >>
> >>
> >> On Thu, Jan 5, 2017 at 11:11 AM, Ben Stopford <b...@confluent.io> wrote:
> >>
> >> > Hi Jun
> >> >
> >> > Thanks for raising these points. Thorough as ever!
> >> >
> >> > 1) Changes made as requested.
> >> > 2) Done.
> >> > 3) My plan for handling returning leaders is simply to force the
> >> Leader
> >> > Epoch to increment if a leader returns. I don't plan to fix KAFKA-1120
> >> as
> >> > part of this KIP. It is really a separate issue with wider
> implications.
> >> > I'd be happy to add KAFKA-1120 into the release though if we have
> time.
> >> > 4) Agreed. Not sure exactly how that's going to play out, but I think
> >> we're
> >> > on the same page.
> >> >
> >> > Please could
> >> >
> >> > Cheers
> >> > B
> >> >
> >> > On Thu, Jan 5, 2017 at 12:50 AM Jun Rao <j...@confluent.io> wrote:
> >> >
> >> > > Hi, Ben,
> >> > >
> >> > > Thanks for the proposal. Looks good overall. A few comments below.
> >> > >
> >> > > 1. For LeaderEpochRequest, we need to include topic right? We
> probably
> >> > want
> >> > > to follow other requests by nesting partition inside topic? For
> >> > > LeaderEpochResponse,
> >> > > do we need to return leader_epoch? I was thinking that we could just
> >> > return
> >> > > an end_offset, which is the next offset of the last message in the
> >> > > requested leader generation. Finally, would
> >> OffsetForLeaderEpochRequest
> >> > be
> >> > > a better name?
> >> > >
> >> > > 2. We should bump up both the produce request and the fetch request
> >> > > protocol version since both include the message set.
> >> > >
> >> > > 3. Extending LeaderEpoch to include Returning Leaders: To support
> >> this,
> >> > do
> >> > > you plan to use the approach that stores  CZXID in the broker
> >> > registration
> >> > > and including the CZXID of the leader in /brokers/topics/[topic]/
> >> > > partitions/[partitionId]/state in ZK?
> >> > >
> >> > > 4. Since there are a few other KIPs involving message format too, it
> >> > would
> >> > > be useful to consider if we could combine the message format changes
> >> in
> >> > the
> >> > > same release.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Jun
> >> > >
> >> > >
> >> > > On Wed, Jan 4, 2017 at 9:24 AM, Ben Stopford <b...@confluent.io>
> >> wrote:
> >> > >
> >> > > > Hi All
> >> > > >
> >> > > > We’re having some problems with this thread being subsumed by the
> >> > > > [Discuss] thread. Hopefully this one will appear distinct. If you
> >> see
> >> > > more
> >> > > > than one, please use this one.
> >> > > >
> >> > > > KIP-101 should now be ready for a vote. As a reminder the KIP
> >> proposes
> >> > a
> >> > > > change to the replication protocol to remove the potential for
> >> replicas
> >> > > to
> >> > > > diverge.
> >> > > >
> >> > > > The KIP can be found here:  https://cwiki.apache.org/confl
> >> > > > uence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+
> >> > > > use+Leader+Epoch+rather+than+High+Watermark+for+Truncation <
> >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-
> >> > > > +Alter+Replication+Protocol+to+use+Leader+Epoch+rather+
> >> > > > than+High+Watermark+for+Truncation>
> >> > > >
> >> > > > Please let us know your vote.
> >> > > >
> >> > > > B
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
>


Re: [VOTE] Vote for KIP-101 - Leader Epochs

2017-01-06 Thread Ben Stopford
Thanks Joel. I'll fix up the pics to make them consistent on nomenclature.

B

On Fri, Jan 6, 2017 at 2:39 AM Joel Koshy <jjkosh...@gmail.com> wrote:

> (adding the dev list back - as it seems to have gotten dropped earlier in
> this thread)
>
> On Thu, Jan 5, 2017 at 6:36 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
>
> > +1
> >
> > This is a very well-written KIP!
> > Minor: there is still a mix of terms in the doc that references the
> > earlier LeaderGenerationRequest (which is what I'm assuming it was
> > called in previous versions of the wiki). Same for the diagrams which I'm
> > guessing are a little harder to make consistent with the text.
> >
> >
> >
> > On Thu, Jan 5, 2017 at 5:54 PM, Jun Rao <j...@confluent.io> wrote:
> >
> >> Hi, Ben,
> >>
> >> Thanks for the updated KIP. +1
> >>
> >> 1) In OffsetForLeaderEpochResponse, start_offset probably should be
> >> end_offset since it's the end offset of that epoch.
> >> 3) That's fine. We can fix KAFKA-1120 separately.
> >>
> >> Jun
> >>
> >>
> >> On Thu, Jan 5, 2017 at 11:11 AM, Ben Stopford <b...@confluent.io> wrote:
> >>
> >> > Hi Jun
> >> >
> >> > Thanks for raising these points. Thorough as ever!
> >> >
> >> > 1) Changes made as requested.
> >> > 2) Done.
> >> > 3) My plan for handling returning leaders is simply to force the
> >> Leader
> >> > Epoch to increment if a leader returns. I don't plan to fix KAFKA-1120
> >> as
> >> > part of this KIP. It is really a separate issue with wider
> implications.
> >> > I'd be happy to add KAFKA-1120 into the release though if we have
> time.
> >> > 4) Agreed. Not sure exactly how that's going to play out, but I think
> >> we're
> >> > on the same page.
> >> >
> >> > Please could
> >> >
> >> > Cheers
> >> > B
> >> >
> >> > On Thu, Jan 5, 2017 at 12:50 AM Jun Rao <j...@confluent.io> wrote:
> >> >
> >> > > Hi, Ben,
> >> > >
> >> > > Thanks for the proposal. Looks good overall. A few comments below.
> >> > >
> >> > > 1. For LeaderEpochRequest, we need to include topic right? We
> probably
> >> > want
> >> > > to follow other requests by nesting partition inside topic? For
> >> > > LeaderEpochResponse,
> >> > > do we need to return leader_epoch? I was thinking that we could just
> >> > return
> >> > > an end_offset, which is the next offset of the last message in the
> >> > > requested leader generation. Finally, would
> >> OffsetForLeaderEpochRequest
> >> > be
> >> > > a better name?
> >> > >
> >> > > 2. We should bump up both the produce request and the fetch request
> >> > > protocol version since both include the message set.
> >> > >
> >> > > 3. Extending LeaderEpoch to include Returning Leaders: To support
> >> this,
> >> > do
> >> > > you plan to use the approach that stores  CZXID in the broker
> >> > registration
> >> > > and including the CZXID of the leader in /brokers/topics/[topic]/
> >> > > partitions/[partitionId]/state in ZK?
> >> > >
> >> > > 4. Since there are a few other KIPs involving message format too, it
> >> > would
> >> > > be useful to consider if we could combine the message format changes
> >> in
> >> > the
> >> > > same release.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Jun
> >> > >
> >> > >
> >> > > On Wed, Jan 4, 2017 at 9:24 AM, Ben Stopford <b...@confluent.io>
> >> wrote:
> >> > >
> >> > > > Hi All
> >> > > >
> >> > > > We’re having some problems with this thread being subsumed by the
> >> > > > [Discuss] thread. Hopefully this one will appear distinct. If you
> >> see
> >> > > more
> >> > > > than one, please use this one.
> >> > > >
> >> > > > KIP-101 should now be ready for a vote. As a reminder the KIP
> >> proposes
> >> > a
> >> > > > change to the replication protocol to remove the potential for
> >> replicas
> >> > > to
> >> > > > diverge.
> >> > > >
> >> > > > The KIP can be found here:  https://cwiki.apache.org/confl
> >> > > > uence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+
> >> > > > use+Leader+Epoch+rather+than+High+Watermark+for+Truncation <
> >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-
> >> > > > +Alter+Replication+Protocol+to+use+Leader+Epoch+rather+
> >> > > > than+High+Watermark+for+Truncation>
> >> > > >
> >> > > > Please let us know your vote.
> >> > > >
> >> > > > B
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>


Re: [VOTE] Vote for KIP-101 - Leader Epochs

2017-01-05 Thread Ben Stopford
Hi Jun

Thanks for raising these points. Thorough as ever!

1) Changes made as requested.
2) Done.
3) My plan for handling returning leaders is simply to force the Leader
Epoch to increment if a leader returns. I don't plan to fix KAFKA-1120 as
part of this KIP. It is really a separate issue with wider implications.
I'd be happy to add KAFKA-1120 into the release though if we have time.
4) Agreed. Not sure exactly how that's going to play out, but I think we're
on the same page.

Please could

Cheers
B

On Thu, Jan 5, 2017 at 12:50 AM Jun Rao <j...@confluent.io> wrote:

> Hi, Ben,
>
> Thanks for the proposal. Looks good overall. A few comments below.
>
> 1. For LeaderEpochRequest, we need to include topic right? We probably want
> to follow other requests by nesting partition inside topic? For
> LeaderEpochResponse,
> do we need to return leader_epoch? I was thinking that we could just return
> an end_offset, which is the next offset of the last message in the
> requested leader generation. Finally, would OffsetForLeaderEpochRequest be
> a better name?
>
> 2. We should bump up both the produce request and the fetch request
> protocol version since both include the message set.
>
> 3. Extending LeaderEpoch to include Returning Leaders: To support this, do
> you plan to use the approach that stores  CZXID in the broker registration
> and including the CZXID of the leader in /brokers/topics/[topic]/
> partitions/[partitionId]/state in ZK?
>
> 4. Since there are a few other KIPs involving message format too, it would
> be useful to consider if we could combine the message format changes in the
> same release.
>
> Thanks,
>
> Jun
>
>
> On Wed, Jan 4, 2017 at 9:24 AM, Ben Stopford <b...@confluent.io> wrote:
>
> > Hi All
> >
> > We’re having some problems with this thread being subsumed by the
> > [Discuss] thread. Hopefully this one will appear distinct. If you see
> more
> > than one, please use this one.
> >
> > KIP-101 should now be ready for a vote. As a reminder the KIP proposes a
> > change to the replication protocol to remove the potential for replicas
> to
> > diverge.
> >
> > The KIP can be found here:  https://cwiki.apache.org/confl
> > uence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+
> > use+Leader+Epoch+rather+than+High+Watermark+for+Truncation <
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-
> > +Alter+Replication+Protocol+to+use+Leader+Epoch+rather+
> > than+High+Watermark+for+Truncation>
> >
> > Please let us know your vote.
> >
> > B
> >
> >
> >
> >
> >
>


[VOTE] Vote for KIP-101 - Leader Epochs

2017-01-04 Thread Ben Stopford
Hi All

We’re having some problems with this thread being subsumed by the [Discuss] 
thread. Hopefully this one will appear distinct. If you see more than one, 
please use this one. 

KIP-101 should now be ready for a vote. As a reminder the KIP proposes a change 
to the replication protocol to remove the potential for replicas to diverge.

The KIP can be found here:  
https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+use+Leader+Epoch+rather+than+High+Watermark+for+Truncation
 


Please let us know your vote. 

B






Re: Questions about single consumer per partition approach

2016-12-21 Thread Ben Stopford
Hi Alexi

Typically you would use a key to guarantee that messages with the same key
have a global ordering, rather than using manual assignment. Kafka will
send all messages with the same key to the same partition. If you need
global ordering, spanning all messages from a single producer, you can use
a single partition topic. This will limit you to one active consumer per
consumer group as the consumer group protocol guarantees that a partition
can only be assigned to one consumer, within a group, at one time.

B
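
To make the keyed approach concrete, below is a minimal Java producer sketch;
the broker address, topic name and keys are placeholders, and any reasonably
recent kafka-clients release should accept it.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class KeyedProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Records with the same key hash to the same partition, so their
                // relative order is preserved for any consumer of that partition.
                producer.send(new ProducerRecord<>("orders", "order-42", "created"));
                producer.send(new ProducerRecord<>("orders", "order-42", "paid"));
                producer.send(new ProducerRecord<>("orders", "order-42", "shipped"));
            }
        }
    }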

On Wed, Dec 21, 2016 at 4:36 AM Alexei Levashov <
alexei.levas...@arrayent.com> wrote:

> Hello,
> I have a few newbie questions about usage of Kafka as a messaging system.
> Kafka version - 0.10.1.0.
>
> 1 - Let's assume that I want to ensure time sequence of events i.e. if
> message A from producer was published at time t1 to partition P and message
> B from the same producer published to partition P at time t2,
> I want to consume message A before message B, provided t1 < t2.
> Question1,
> Do I have any choice except one consumer per partition?
>
> 2. - If I have one consumer per partition and use
>  consumer.assign(partitionList) call to assign consumer to a partition do I
> still need group membership for this single consumer?
>  I didn't find clear description what is the protocol of interaction
> between GroupCoordinator and PartitionLeader
> <
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Client-side+Assignment+Proposal
> >
> will be in case of  "manual" partition assignment.
>  On one hand the API documentation
> <
> https://kafka.apache.org/0101/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> >
> says
> that :
>  "Manual partition assignment does not use group coordination, so
> consumer failures will not cause assigned partitions to be rebalanced.
>   Each consumer acts independently even if it shares a groupId with
> another consumer.
>   To avoid offset commit conflicts, you should usually ensure that the
> groupId is unique for each consumer instance."
>
>   On the other hand I am still consuming messages in
> consumer.poll(timeout) loop and inside this poll() call consumer should
> send heartbeats to coordinator.
>
> Question 2 .
>  If consumer doesn't send these heartbeats for [*session.timeout.ms
> *] period of time  should the partition
> ownership be revoked or not?
>
>  If no - does it mean I have to use homegrown heartbeats for consumer
> state monitoring? How would the application know that the consumer thread
> is dead?
>  If yes - what callback to notify the application can I use?
> ConsumerRebalanceListener is available only for group subscription.
>
> Thank you.
>


Re: Halting because log truncation is not allowed for topic __consumer_offsets

2016-12-19 Thread Ben Stopford
Hi Jun

This should only be possible in situations where there is a crash or
something happens to the underlying disks (assuming clean leader election).
I've not come across others. The assumption, as I understand it, is that
the underlying issue stems from KAFKA-1211, which is being addressed in KIP-101.
If you can reproduce it in a more general scenario we would be very
interested.

All the best
B


On Mon, Dec 19, 2016 at 12:35 AM Jun MA  wrote:

> Would be grateful to hear opinions from experts out there. Thanks in
> advance
>
> > On Dec 17, 2016, at 11:06 AM, Jun MA  wrote:
> >
> > Hi,
> >
> > We saw the following FATAL error in 2 of our brokers (3 in total, the
> active controller doesn’t have this) and they crashed in the same time.
> >
> > [2016-12-16 16:12:47,085] FATAL [ReplicaFetcherThread-0-3], Halting
> because log truncation is not allowed for topic __consumer_offsets, Current
> leader 3's latest offset 5910081 is less than replica 1's latest offset
> 5910082 (kafka.server.ReplicaFetcherThread)
> >
> > Our solution is set topic __consumer_offsets
> unclean.leader.election.enable=true temporarily, and restart brokers. In
> this way we potentially lose offsets of some topics. Is there any better
> solutions?
> >
> > I saw related tickets https://issues.apache.org/jira/browse/KAFKA-3861 <
> https://issues.apache.org/jira/browse/KAFKA-3861>,
> https://issues.apache.org/jira/browse/KAFKA-3410 <
> https://issues.apache.org/jira/browse/KAFKA-3410> and understand why
> brokers crashed. But we didn’t see any scenarios mentioned in above
> tickets. Is there any other reason why this happened? There’s no broker
> restart involved in our case. How can we prevent it from happening?
> >
> > We’re using 0.9.0 with unclean.leader.election.enable=false and
> min.insync.replicas=2.
> >
> > Thanks,
> > Jun
>
>


Re: Training Kafka and ZooKeeper - Monitoring and Operability

2016-10-11 Thread Ben Stopford
Useful resource Nico, Thanks

B

On Tuesday, 11 October 2016, Nicolas Motte <lingusi...@gmail.com> wrote:

> Hi everyone,
>
> I created a training for Application Management and OPS teams in my
> company.
> Some sections are specific to our deployment, but most of them are generic
> and explain how Kafka and ZooKeeper work.
>
> I uploaded it on SlideShare, I thought it might be useful to other people:
> http://fr.slideshare.net/NicolasMotte/training-kafka-
> and-zookeeper-monitoring-and-operability
>
> In the description you will get a link to the version with audio
> description.
>
> Cheers
> Nico
>


-- 
Ben Stopford


Re: rate-limiting on rebalancing, or sync from non-leaders?

2016-07-04 Thread Ben Stopford
Hi Charity

There will be a KIP for this coming out shortly. 

All the best
B


> On 4 Jul 2016, at 13:14, Alexis Midon  wrote:
> 
> Same here at Airbnb. Moving data is the biggest operational challenge
> because of the network bandwidth cannibalization.
> I was hoping that rate limiting would apply to replica fetchers too.
> 
> On Sun, Jul 3, 2016 at 15:38 Tom Crayford  wrote:
> 
>> Hi Charity,
>> 
>> I'm not sure about the roadmap. The way we (and linkedin/dropbox/netflix)
>> handle rebalances right now is to do a small handful of partitions at a
>> time (LinkedIn does 10 partitions at a time the last I heard), not a big
>> bang rebalance of all the partitions in the cluster. That's not perfect and
>> not great throttling, and I agree that it's something Kafka desperately
>> needs to work on.
>> 
>> Thanks
>> 
>> Tom Crayford
>> Heroku Kafka
>> 
>> On Sun, Jul 3, 2016 at 2:00 AM, Charity Majors  wrote:
>> 
>>> Hi there,
>>> 
>>> I'm curious if there's anything on the Kafka roadmap for adding
>>> rate-limiting or max-throughput for rebalancing processes.
>>> 
>>> Alternately, if you have RF>2, maybe a setting to instruct followers to
>>> sync from other followers?
>>> 
>>> I'm super impressed with how fast and efficient the kafka data
>> rebalancing
>>> process is, but also fear for the future when it's battling for resources
>>> against high production trafffic.  :)
>>> 
>> 



Re: Coordinator lost for consumer groups

2016-07-01 Thread Ben Stopford
You might try increasing the log.cleaner.dedupe.buffer.size. This should 
increase the deduplication yield for each scan.

If you haven’t seen them there are some notes on log compaction here: 
https://cwiki.apache.org/confluence/display/KAFKA/Log+Compaction 



> On 1 Jul 2016, at 10:10, Sathyakumar Seshachalam 
>  wrote:
> 
> The problem still persists. But note that I was running old consumer (Zk
> based) to describe consumers. Running ./kafka-consumer-groups.sh
> kafka-groups.sh --bootstrap-server 10.211.16.215 --group groupX --describe,
> I get the below error. So none of the consumer groups seem to have a
> coordinator now.
> Error while executing consumer group command This is not the correct
> coordinator for this group.
> org.apache.kafka.common.errors.NotCoordinatorForGroupException: This is not
> the correct coordinator for this group.
> 
> On Fri, Jul 1, 2016 at 2:15 PM, Sathyakumar Seshachalam <
> sathyakumar_seshacha...@trimble.com> wrote:
> 
>> And I am willing to suspend log compaction and restart the brokers, but am
>> worried if that will leave the system in a recoverable state or If I just
>> have to wait it out.
>> 
>> On Fri, Jul 1, 2016 at 2:06 PM, Sathyakumar Seshachalam <
>> sathyakumar_seshacha...@trimble.com> wrote:
>> 
>>> Hi,
>>> 
>>> I have 3 Kafka nodes (running 0.9.0) that all had active consumers and
>>> producers.
>>> 
>>> Now all these had uncompacted __consumer_offsets group that grew to 1.8
>>> TB. So I restarted these nodes with a log.cleaner.enabled to save some
>>> space. Since then consumers have stalled.
>>> 
>>> When I do a ./kafka-consumer-groups.sh kafka-groups.sh --zookeeper
>>> 10.211.16.215 --group groupX --describe,
>>> 
>>> I get
>>> "Could not fetch offset from kafka for group GroupX partition [GroupX, 0]
>>> due to kafka.common.NotCoordinatorForConsumerException". Note that the
>>> compaction is still in progress. And I get this for most consumer groups.
>>> 
>>> Any clues how to fix this ?
>>> 
>>> Regards,
>>> Sathya
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 



Re: How many connections per consumer/producer

2016-06-30 Thread Ben Stopford
Hi Dhiaraj

That shouldn’t be the case. As I understand it both the producer and consumer 
hold a single connection to each broker they need to communicate with. Multiple 
produce requests can be sent through a single connection in the producer (the 
number being configurable with max.in.flight.requests.per.connection and 
non-blocking io is used). Consumers (new consumer) send a single request to 
each broker. 

Hope that helps

B
> On 30 Jun 2016, at 11:03, dhiraj prajapati  wrote:
> 
> Hi,
> I am using new Kafka Consumer and Producer APIs (version 0.9.0.1)
> I see that my consumer as well as producer has multiple connections
> established with kafka brokers. Why is this so?
> Does the consumer and producer APIs use connection pooling? If yes, where
> do I configure the pool size?
> 
> Regards,
> Dhiraj



Re: Setting max fetch size for the console consumer

2016-06-24 Thread Ben Stopford
It’s actually more than one setting: 
http://stackoverflow.com/questions/21020347/kafka-sending-a-15mb-message 


B
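
For the newer Java consumer the client-side knob is max.partition.fetch.bytes;
a rough sketch follows, with an arbitrary 20 MB example figure. The broker-side
message.max.bytes and replica.fetch.max.bytes must be raised to match, as the
answer linked above describes.

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class LargeMessageConsumerConfig {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder
            props.put("group.id", "large-message-test");        // placeholder
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            // Allow fetches of up to ~20 MB per partition (example figure only).
            props.put("max.partition.fetch.bytes", "20971520");
            // Broker side (server.properties), not shown here:
            //   message.max.bytes=20971520
            //   replica.fetch.max.bytes=20971520
            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("replicated_twice"));
                // ... poll as usual ...
            }
        }
    }
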
> On 24 Jun 2016, at 14:31, Tauzell, Dave  wrote:
> 
> How do I set the maximum fetch size for the console consumer?
> 
> I'm getting this error when doing some testing with large messages:
> 
> kafka.common.MessageSizeTooLargeException: Found a message larger than the 
> maximum fetch size of this consumer on topic replicated_twice partition 28 at 
> fetch offset 11596. Increase the fetch size, or decrease the maximum message 
> size the broker will allow.
> 
> 
> 
> Dave Tauzell | Senior Software Engineer | Surescripts
> O: 651.855.3042 | www.surescripts.com |   
> dave.tauz...@surescripts.com
> Connect with us: Twitter I 
> LinkedIn I 
> Facebook I 
> YouTube
> 
> 
> This e-mail and any files transmitted with it are confidential, may contain 
> sensitive information, and are intended solely for the use of the individual 
> or entity to whom they are addressed. If you have received this e-mail in 
> error, please notify the sender by reply e-mail immediately and destroy all 
> copies of the e-mail and any attachments.



Re: is kafka the right choice

2016-06-24 Thread Ben Stopford
correction: elevates => alleviates

> On 24 Jun 2016, at 11:13, Ben Stopford <b...@confluent.io> wrote:
> 
> Kafka uses a long poll 
> <http://kafka.apache.org/documentation.html#design_pull>. So requests 
> effectively block on the server, if there is insufficient data available. 
> This elevates many of the issues associated with traditional polling 
> approaches. 
> 
> Service-based applications often require directed channels to do request 
> response. People do use Kafka in this way, Philippe gave a good example 
> below.  You just need to be aware that, should you have a lot of services 
> that need to interact, it could involve creating a lot of topics.  Kafka 
> topics are persistent and generally long lived. They shouldn’t be considered 
> ephemeral imho. 
> 
> B
> 
> 
>> On 23 Jun 2016, at 17:35, Philippe Derome <phder...@gmail.com 
>> <mailto:phder...@gmail.com>> wrote:
>> 
>> See Keyhole Software blog and particularly John Boardman's presentation of
>> sample app with responsive web client using WebSockets connecting to a
>> netty embedded web server that itself uses producer and consumer clients
>> with a Kafka infrastructure (@johnwboardman). On first look, it seems like
>> a valid approach. Behind the web server are services that are Kafka apps
>> interacting with external web APIs.
>> 
>> Anecdotally quite a few companies post jobs with Kafka playing a role in a
>> micro architecture solution.
>> 
>> I'll now let experts speak...
>> On 23 Jun 2016 11:47 a.m., "Pranay Suresh" <pranay.sur...@gmail.com 
>> <mailto:pranay.sur...@gmail.com>> wrote:
>> 
>>> Hey Kafka experts,
>>> 
>>> After having read Jay Kreps awesome Kafka reading(
>>> 
>>> https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
>>>  
>>> <https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying>
>>> )
>>> I have a doubt.
>>> 
>>> For communication between browsers (lets say collaborative editing, chat
>>> etc.) is Kafka the right choice ? Especially given that Kafka consumers are
>>> designed to pull , rather than a callback style push. For low latency
>>> possibly ephemeral data/events is Kafka a good choice ? Can I have a
>>> browser open a socket into a webserver and each request initiate a consumer
>>> to consume from kafka (by polling?) OR is Kafka designed and/or meant to be
>>> used for a separate usecase ?
>>> 
>>> Any feedback is appreciated. Let the bashing begin!
>>> 
>>> Many Thanks,
>>> pranay
>>> 
> 



Re: is kafka the right choice

2016-06-24 Thread Ben Stopford
Kafka uses a long poll <http://kafka.apache.org/documentation.html#design_pull>. So requests 
effectively block on the server, if there is insufficient data available. This 
elevates many of the issues associated with traditional polling approaches. 

Service-based applications often require directed channels to do request 
response. People do use Kafka in this way, Philippe gave a good example below.  
You just need to be aware that, should you have a lot of services that need to 
interact, it could involve creating a lot of topics.  Kafka topics are 
persistent and generally long lived. They shouldn’t be considered ephemeral 
imho. 

B
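
A rough illustration of how the long poll surfaces in the Java consumer:
fetch.min.bytes and fetch.max.wait.ms control how long the broker parks a fetch
before responding, and poll() blocks the client for at most the timeout passed
to it. The broker address, group id and topic below are placeholders.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class LongPollExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder
            props.put("group.id", "long-poll-demo");            // placeholder
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("fetch.min.bytes", "1");       // reply as soon as any data is available
            props.put("fetch.max.wait.ms", "500");   // otherwise the broker parks the fetch up to 500 ms

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("events"));   // placeholder topic
                while (true) {
                    // Blocks for up to one second if nothing is available - no busy polling.
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.println(record.key() + " -> " + record.value());
                    }
                }
            }
        }
    }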


> On 23 Jun 2016, at 17:35, Philippe Derome  wrote:
> 
> See Keyhole Software blog and particularly John Boardman's presentation of
> sample app with responsive web client using WebSockets connecting to a
> netty embedded web server that itself uses producer and consumer clients
> with a Kafka infrastructure (@johnwboardman). On first look, it seems like
> a valid approach. Behind the web server are services that are Kafka apps
> interacting with external web APIs.
> 
> Anecdotally quite a few companies post jobs with Kafka playing a role in a
> micro architecture solution.
> 
> I'll now let experts speak...
> On 23 Jun 2016 11:47 a.m., "Pranay Suresh"  wrote:
> 
>> Hey Kafka experts,
>> 
>> After having read Jay Kreps awesome Kafka reading(
>> 
>> https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
>> )
>> I have a doubt.
>> 
>> For communication between browsers (lets say collaborative editing, chat
>> etc.) is Kafka the right choice ? Especially given that Kafka consumers are
>> designed to pull , rather than a callback style push. For low latency
>> possibly ephemeral data/events is Kafka a good choice ? Can I have a
>> browser open a socket into a webserver and each request initiate a consumer
>> to consume from kafka (by polling?) OR is Kafka designed and/or meant to be
>> used for a separate usecase ?
>> 
>> Any feedback is appreciated. Let the bashing begin!
>> 
>> Many Thanks,
>> pranay
>> 



Re: Quotas feature Kafka 0.9.0.1

2016-06-09 Thread Ben Stopford
Hi Liju

Alas we can’t use quotas directly to throttle replication. The problem is that, 
currently, fetch requests from followers include critical traffic (the 
replication of produce requests) as well as non-critical traffic (brokers 
catching up, etc.), so we can’t apply the current quotas mechanism directly. We’re 
looking at this problem now though and there’ll be a KIP coming out shortly. 

All the best

Ben

> On 8 Jun 2016, at 08:34, Rajini Sivaram  wrote:
> 
> Liju,
> 
> Quotas are not applied to the replica fetch followers.
> 
> Regards,
> 
> Rajini
> 
> On Fri, Jun 3, 2016 at 7:25 PM, Liju John  wrote:
> 
>> Hi ,
>> 
>> We are exploring the new quotas feature with Kafka 0.9.01.
>> Could you please let me know if quotas feature works for fetch follower as
>> well ?
>> We see that when a broker is down for a long time and brought back , the
>> replica catches up aggressively , impacting the whole cluster.
>> Would it be possible to throttle Fetch follower as well with quotas?
>> 
>> 
>> Regards,
>> Liju John
>> 



Re: Rebalancing issue while Kafka scaling

2016-06-01 Thread Ben Stopford
Pretty much. It’s not actually related to zookeeper. 

Generalising a bit, replication factor 2 means Kafka can lose 1 machine and be 
ok. 

B
> On 1 Jun 2016, at 12:46, Hafsa Asif <hafsa.a...@matchinguu.com> wrote:
> 
> So, it means that I should create topics with at least replication-factor=2
> inspite of how many servers in a kafka cluster. If any server goes down or
> slows down then zookeeper will not go out-of-sync.
> Currently, my all topics are with eplication-factor= 1 and I got an issue
> that Zookeeper goes out of sync. So, increasing replication-factor will
> solve the issue?
> 
> Hafsa
> 
> 2016-06-01 12:57 GMT+02:00 Ben Stopford <b...@confluent.io>:
> 
>> Hi Hafa
>> 
>> If you create a topic with replication-factor = 2, you can lose one of
>> them without losing data, so long as they were "in sync". Replicas can fall
>> out of sync if one of the machines runs slow. The system tracks in sync
>> replicas. These are exposed by JMX too. Check out the docs on replication
>> for more details:
>> 
>> http://kafka.apache.org/090/documentation.html#replication <
>> http://kafka.apache.org/090/documentation.html#replication>
>> 
>> B
>> 
>>> On 1 Jun 2016, at 10:45, Hafsa Asif <hafsa.a...@matchinguu.com> wrote:
>>> 
>>> Hello Jayesh,
>>> 
>>> Thank you very much for such a good description. My further questions are
>>> (just to be my self clear about the concept).
>>> 
>>> 1. If I have only one partition in a 'Topic' in a Kafka with following
>>> configuration,
>>> bin/kafka-topics.sh --create --zookeeper localhost:2181
>>> --replication-factor 1 --partitions 1 --topic mytopic1
>>> Then still I need to rebalance topic partitions while node
>> adding/removing
>>> in Kafka cluster?
>>> 
>>> 2. What is the actual meaning of this line 'if all your topics have
>> atleast
>>> 2 insync replicas'. My understanding is that, I need to create replica of
>>> each topic in each server. e.g: I have two servers in a Kafka cluster
>> then
>>> I need to create topic 'mytopic1' in both servers. It helps to get rid of
>>> any problem while removing any of the server.
>>> 
>>> I will look in detail into your provided link. Many thanks for this.
>>> 
>>> Looking forward for the answers from also other Kafka ninjas as well :)
>>> 
>>> Best,
>>> Hafsa
>>> 
>>> 2016-05-31 18:50 GMT+02:00 Thakrar, Jayesh <jthak...@conversantmedia.com
>>> :
>>> 
>>>> Hafsa, Florin
>>>> 
>>>> First thing first, it is possible to scale a Kafka cluster up or down
>>>> (i.e. add/remove servers).
>>>> And as has been noted in this thread, after you add a server to a
>> cluster,
>>>> you need to rebalance the topic partitions in order to put the newly
>> added
>>>> server into use.
>>>> And similarly, before you remove a server, it is advised that you drain
>>>> off the data from the server to be removed (its not a hard requirement,
>> if
>>>> all your topics have atleast 2 insync replicas, including the server
>> being
>>>> removed and you intend to rebalance after server removal).
>>>> 
>>>> However, "automating" the rebalancing of topic partitions is not
>> trivial.
>>>> 
>>>> There is a KIP out there to help with the rebalancing , but lacks
>> details
>>>> -
>>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New+reassignment+partition+logic+for+rebalancing
>>>> My guess is due to its non-trivial nature AND the number of cases one
>>>> needs to take care of - e.g. scaling up by 5% v/s scaling up by 50% in
>> say,
>>>> a 20 node cluster.
>>>> Furthermore, to be really effective, one needs to be cognizant of the
>>>> partition sizes, and with rack-awareness, the task becomes even more
>>>> involved.
>>>> 
>>>> Regards,
>>>> Jayesh
>>>> 
>>>> -Original Message-
>>>> From: Spico Florin [mailto:spicoflo...@gmail.com]
>>>> Sent: Tuesday, May 31, 2016 9:44 AM
>>>> To: users@kafka.apache.org
>>>> Subject: Re: Rebalancing issue while Kafka scaling
>>>> 
>>>> Hi!
>>>> What version of Kafka you are using? What do you mean by "Kafka needs
>>>> rebalacing?" Rebalancing of what? Can you please be more specific

Re: Dynamic bootstrap.servers with multiple data centers

2016-06-01 Thread Ben Stopford
Hey Danny

Currently the bootstrap servers are only used when the client initialises 
(there’s a bit of discussion around the issue in the jira below if you’re 
interested). To implement failover you’d need to catch a timeout exception in 
your client code, consult your service discovery mechanism, and reinitialise 
the client. 

KAFKA-3068 

B
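
A rough sketch of the wrapper idea, assuming a recent Java producer.
ServiceDiscovery is a hypothetical stand-in for whatever mechanism (such as the
ZooKeeper watch described in the question) supplies the current bootstrap list.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.errors.TimeoutException;

    public class FailoverProducer {
        private final ServiceDiscovery discovery;   // hypothetical: returns the current bootstrap list
        private KafkaProducer<String, String> producer;

        FailoverProducer(ServiceDiscovery discovery) {
            this.discovery = discovery;
            this.producer = build(discovery.currentBootstrapServers());
        }

        void send(String topic, String key, String value) {
            try {
                producer.send(new ProducerRecord<>(topic, key, value)).get();
            } catch (Exception e) {
                if (e instanceof TimeoutException || e.getCause() instanceof TimeoutException) {
                    // Cluster looks unreachable: close, re-resolve, and rebuild the client.
                    producer.close();
                    producer = build(discovery.currentBootstrapServers());
                } else {
                    throw new RuntimeException(e);
                }
            }
        }

        private static KafkaProducer<String, String> build(String bootstrapServers) {
            Properties props = new Properties();
            props.put("bootstrap.servers", bootstrapServers);
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            return new KafkaProducer<>(props);
        }

        interface ServiceDiscovery {                // hypothetical interface, for illustration only
            String currentBootstrapServers();
        }
    }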

> On 31 May 2016, at 22:09, Danny Bahir  wrote:
> 
> Hello,
> 
> Working on a multi data center Kafka installation in which all clusters have 
> the same topics, the producers will be able to connect to any of the 
> clusters. Would like the ability to dynamically control the set of clusters a 
> producer will be able to connect to, that will allow to gracefully take a 
> cluster offline for maintenance.
> Current design is to have one zk cluster that is across all data centers and 
> will have info regarding what in which cluster a service is available.
> 
> In the case of Kafka it will house the info needed to populate 
> bootstrap.servers, a wrapper will be placed around the Kafka producer and 
> will watch this ZK value. When the value will change the producer instance 
> with the old value will be shut down and a new producer with the new 
> bootstrap.servers info will replace it.
> 
> Is there a best practice for achieving this?
> 
> Is there a way to dynamically update bootstrap.servers?
> 
> Does the producer always go to the same machine from bootstrap.servers when 
> it refreshes the MetaData after metadata.max.age.ms has expired?
> 
> Thanks!



Re: Rebalancing issue while Kafka scaling

2016-06-01 Thread Ben Stopford
Hi Hafa

If you create a topic with replication-factor = 2, you can lose one of the replicas 
without losing data, so long as they were "in sync". Replicas can fall out of 
sync if one of the machines runs slow. The system tracks in sync replicas. 
These are exposed by JMX too. Check out the docs on replication for more 
details:

http://kafka.apache.org/090/documentation.html#replication 


B

> On 1 Jun 2016, at 10:45, Hafsa Asif  wrote:
> 
> Hello Jayesh,
> 
> Thank you very much for such a good description. My further questions are
> (just to be my self clear about the concept).
> 
> 1. If I have only one partition in a 'Topic' in a Kafka with following
> configuration,
> bin/kafka-topics.sh --create --zookeeper localhost:2181
> --replication-factor 1 --partitions 1 --topic mytopic1
> Then still I need to rebalance topic partitions while node adding/removing
> in Kafka cluster?
> 
> 2. What is the actual meaning of this line 'if all your topics have atleast
> 2 insync replicas'. My understanding is that, I need to create replica of
> each topic in each server. e.g: I have two servers in a Kafka cluster then
> I need to create topic 'mytopic1' in both servers. It helps to get rid of
> any problem while removing any of the server.
> 
> I will look in detail into your provided link. Many thanks for this.
> 
> Looking forward for the answers from also other Kafka ninjas as well :)
> 
> Best,
> Hafsa
> 
> 2016-05-31 18:50 GMT+02:00 Thakrar, Jayesh :
> 
>> Hafsa, Florin
>> 
>> First thing first, it is possible to scale a Kafka cluster up or down
>> (i.e. add/remove servers).
>> And as has been noted in this thread, after you add a server to a cluster,
>> you need to rebalance the topic partitions in order to put the newly added
>> server into use.
>> And similarly, before you remove a server, it is advised that you drain
>> off the data from the server to be removed (its not a hard requirement, if
>> all your topics have atleast 2 insync replicas, including the server being
>> removed and you intend to rebalance after server removal).
>> 
>> However, "automating" the rebalancing of topic partitions is not trivial.
>> 
>> There is a KIP out there to help with the rebalancing , but lacks details
>> -
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New+reassignment+partition+logic+for+rebalancing
>> My guess is due to its non-trivial nature AND the number of cases one
>> needs to take care of - e.g. scaling up by 5% v/s scaling up by 50% in say,
>> a 20 node cluster.
>> Furthermore, to be really effective, one needs to be cognizant of the
>> partition sizes, and with rack-awareness, the task becomes even more
>> involved.
>> 
>> Regards,
>> Jayesh
>> 
>> -Original Message-
>> From: Spico Florin [mailto:spicoflo...@gmail.com]
>> Sent: Tuesday, May 31, 2016 9:44 AM
>> To: users@kafka.apache.org
>> Subject: Re: Rebalancing issue while Kafka scaling
>> 
>> Hi!
>>  What version of Kafka you are using? What do you mean by "Kafka needs
>> rebalacing?" Rebalancing of what? Can you please be more specific.
>> 
>> Regards,
>> Florin
>> 
>> 
>> 
>> On Tue, May 31, 2016 at 4:58 PM, Hafsa Asif 
>> wrote:
>> 
>>> Hello Folks,
>>> 
>>> Today , my team members shows concern that whenever we increase node
>>> in Kafka cluster, Kafka needs rebalancing. The rebalancing is sort of
>>> manual and not-good step whenever scaling happens. Second, if Kafka
>>> scales up then it cannot be scale down. Please provide us proper
>>> guidance over this issue, may be we have not enough configuration
>> properties.
>>> 
>>> Hafsa
>>> 
>> 
>> 
>> 
>> 
>> This email and any files included with it may contain privileged,
>> proprietary and/or confidential information that is for the sole use
>> of the intended recipient(s).  Any disclosure, copying, distribution,
>> posting, or use of the information contained in or attached to this
>> email is prohibited unless permitted by the sender.  If you have
>> received this email in error, please immediately notify the sender
>> via return email, telephone, or fax and destroy this original transmission
>> and its included files without reading or saving it in any manner.
>> Thank you.
>> 



Failing between mirrored clusters

2016-05-11 Thread Ben Stopford
Hi

I’m looking at failing-over from one cluster to another, connected via mirror 
maker, where the __consumer_offsets topic is also mirrored. 

In theory this should allow consumers to be restarted to point at the secondary 
cluster such that they resume from the same offset they reached in the primary 
cluster. Retries in MM will cause the offsets to diverge somewhat, which would 
in turn cause some reprocessing of messages on failover, but this should be a 
better option than resorting to the earliest/latest offset.  

Does anyone have experience doing this?  

Thanks 

Ben

Re: Tune Kafka offsets.load.buffer.size

2016-04-20 Thread Ben Stopford
If you have a relatively small number of consumers you might further reduce 
offsets.topic.segment.bytes. The active segment is not compacted. 
B
> On 18 Apr 2016, at 23:45, Muqtafi Akhmad  wrote:
> 
> dear Kafka users,
> 
> Is there any tips about how to configure *offsets.load.buffer.size*
> configuration to speed up the offsets loading after leader change? The
> default value for this configuration is 5242880
> 
> Thank you,
> 
> -- 
> Muqtafi Akhmad
> Software Engineer
> Traveloka



Re: steps path to kafka mastery

2016-03-29 Thread Ben Stopford
Not sure which book you read but, based on the first few chapters, this book is a worthy investment. 

B
> On 29 Mar 2016, at 03:40, S Ahmed  wrote:
> 
> Hello,
> 
> This may be a silly question for some but here goes :)
> 
> Without real production experience, what steps do you suggest one take to
> really have some solid skillz in kafka?
> 
> I tend to learn in a structured way, but it just seems that since kafka is
> a general purpose tool there isn't really a course per say that teaches you
> all things kafka.
> 
> There are books but the one I read was more of a tutorial on certain
> aspects of kafka but it doesn't give you the insights of building a real
> production system.
> 
> Any suggestions or tips?



Re: Kafka Streams scaling questions

2016-03-22 Thread Ben Stopford
Hi Kishore

In general I think it’s up to you to choose keys that keep related data 
together, but also give you reasonable load balancing. I’m afraid that I’m not 
sure I fully followed your explanation of how Storm solves this problem more 
efficiently though.

I noticed you asked:  "How would this work for a multi-tenant stream processing 
where people want to write multiple stream jobs on the same set of data?” - I 
think this is simply standard consumer group behaviour. Different applications 
would get different consumer groups (application ids in KStreams), giving them 
independent parallelism over the same data. 

In theory there is another knob to consider. Consumers (actually the leader 
consumer) can control which partitions they get assigned. KStreams already uses 
this feature to do things like create standby replicas, but I don’t think (though 
I may be wrong) this helps you with your problem directly.  

All the best

B 
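
For later readers, a minimal sketch of the windowed per-key count discussed in
this thread, written against a Kafka Streams 3.x API that post-dates it. Topic
names, the one-minute window and the alert threshold are illustrative.

    import java.time.Duration;
    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.TimeWindows;

    public class ConsumerCallCounter {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "call-counter");        // placeholder
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // Records are assumed to be keyed by consumer_id upstream, so each task
            // only ever counts the keys in its own partitions.
            KStream<String, String> calls = builder.stream("access-log-by-consumer-id");
            calls.groupByKey()
                 .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
                 .count()
                 .toStream()
                 .filter((windowedId, count) -> count > 1000)   // example threshold
                 .foreach((windowedId, count) ->
                         System.out.println("ALERT " + windowedId.key() + " made " + count + " calls"));

            new KafkaStreams(builder.build(), props).start();
        }
    }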

> On 21 Mar 2016, at 03:12, Kishore Senji  wrote:
> 
> I will scale back the question to get some replies :)
> 
> Suppose the use-case is to build a monitoring platform -
> For log aggregation from thousands of nodes, I believe that a Kafka topic
> should be partitioned n-ways and the data should be sprayed in a
> round-robin fashion to get a good even distribution of data (because we
> don't know upfront how the data is sliced by semantically and we don't know
> whether the key for semantic partitioning gives a even distribution of
> data). Later in stream processing, the appropriate group-bys would be done
> on the same source of data to support various ways of slicing.
> 
> 
> http://kafka.apache.org/documentation.html#design_loadbalancing - "This can
> be done at random, implementing a kind of random load balancing, or it can
> be done by some semantic partitioning function"
> http://kafka.apache.org/documentation.html#basic_ops_modify_topic - "Be
> aware that one use case for partitions is to semantically partition data,
> and adding partitions doesn't change the partitioning of existing data so
> this may disturb consumers if they rely on that partition"
> 
> The above docs caution the use of semantic partitioning as it can lead to
> uneven distribution (hotspots) if the semantic key does not give even
> distribution, plus on a flex up of partitions the data would now be in two
> partitions. For these reasons, I strongly believe the data should be pushed
> to Kafka in a round-robin fashion and later a Stream processing framework
> should use the appropriate group-bys (this also gives us the flexibility to
> slice in different ways as well at runtime)
> 
> KStreams let us do stream processing on a partition of data. So to do
> windowed aggregation, the data for the same key should be in the same
> partition. This means to use KStreams we have to use Semantic partitioning,
> which will have the above issues as shown in Kafka docs. So my question is -
> 
> If we want to use KStreams how should we deal with "Load balancing" (it can
> happen that the semantic partitioning can overload a single partition and
> so Kafka partition will be overloaded as well as the KStream task)  and
> "Flex up of partitions" (more than one partition will have data for a given
> key and so the windowed aggregations result in incorrect data)?
> 
> Thanks,
> Kishore.
> 
> On Thu, Mar 17, 2016 at 4:28 PM, Kishore Senji  wrote:
> 
>> Hi All,
>> 
>> I read through the doc on KStreams here:
>> http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
>> 
>> 
>> I was wondering about how an use-case that I have be solved with KStream?
>> 
>> Use-case: Logs are obtained from a service pool. It contains many nodes.
>> We want to alert if a particular consumer (identified by consumer id) is
>> making calls to the service more than X number of times in the last 1 min.
>> The data is available in logs similar to access logs for example. The
>> window is a sliding window. We have to check back 60s from the current
>> event and see if in total (with the current event) it would exceed the
>> threshold. Logs are pushed to Kafka using a random partitioner whose range
>> is [1 to n] where n is the total number of partitions.
>> 
>> One way of achieving this is to push data in to the first Kafka topic
>> (using random partitioning) and then a set of KStream tasks re-shuffling
>> the data on consumer_id in to the second topic. The next set of KStream
>> tasks operate on the second topic (1 task/partition) and do the
>> aggregation. If this is an acceptable solution, here are my questions on
>> scaling.
>> 
>> 
>>   - I can see that the second topic is prone to hotspots. If we get
>>   billions of requests for a given consumer_id and only few hundreds for
>>   another consumer_id, the second kafka 

Re: Would Kafka streams be a good choice for a collaborative web app?

2016-03-21 Thread Ben Stopford
It sounds like a fairly typical pub-sub use case where you’d likely be choosing 
Kafka because of its scalable data retention and built-in fault tolerance. As 
such it’s a reasonable choice.
> On 21 Mar 2016, at 17:07, Mark van Leeuwen  wrote:
> 
> Hi Sandesh,
> 
> Thanks for the suggestions. I've looked at them now :-)
> 
> The core problem that needs to be solved with my app is keeping a full 
> replayable history of changes, transmitting latest state to web apps when 
> they start, then keeping them in sync with latest state as changes are made 
> by all current clients, preferably without polling. That's why keeping track 
> of offsets with each client seemed the way to go.
> 
> Not sure how stream processing engines help with that - but happy to be 
> advised otherwise.
> 
> Cheers.
> 
> On 22/03/16 02:35, Sandesh Hegde wrote:
>> Hello Mark,
>> 
>> Have you looked at one of the streaming engines like Apache Apex, Flink?
>> 
>> Thanks
>> 
>> On Mon, Mar 21, 2016 at 7:56 AM Gerard Klijs 
>> wrote:
>> 
>>> Hi Mark,
>>> 
>>> I don't think it would be a good solution with the latencies to and from
>>> the server your running from in mind. This is less of a problem is your app
>>> is only mainly used in one region.
>>> 
>>> I recently went to a Firebase event, and it seems a lot more fitting. It
>>> also allows the user to see it's own changes real-time, and provides
>>> several authentication options, and has servers world-wide.
>>> 
>>> On Mon, Mar 21, 2016 at 7:53 AM Mark van Leeuwen  wrote:
>>> 
 Hi,
 
 I'm soon to begin design and dev of a collaborative web app where
 changes made by one user should appear to other users in near real time.
 
 I'm new to Kafka, but having read a bit about Kafka streams I'm
 wondering if it would be a good solution. Change events produced by one
 user would be published to multiple consumer clients over a websocket,
 each having their own offset.
 
 Would this be viable?
 
 Are there any considerations I should be aware of?
 
 Thanks,
 Mark
 
 
> 



Re: Question regarding compression of topics in Kafka

2016-03-19 Thread Ben Stopford
Yes, it will compress the data stored on the file system if you specify 
compression in the producer. You can check whether the data is compressed on 
disk by running the following command in the data directory. 
kafka-run-class kafka.tools.DumpLogSegments --print-data-log --files 
latest-log-file.log
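
For comparison, producer-side compression can be enabled roughly as below in the
Java client; with the broker left at its default compression.type of "producer",
batches are then stored compressed on disk. The topic and broker address are
placeholders.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class CompressedProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder
            props.put("compression.type", "gzip");              // or snappy / lz4 / zstd on newer brokers
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Batches from this producer are gzip-compressed on the wire and,
                // with compression.type=producer on the broker, kept compressed on disk.
                producer.send(new ProducerRecord<>("gzip-topic", "key", "value"));
            }
        }
    }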

> On 17 Mar 2016, at 23:59, R P  wrote:
> 
> Hello All,
>   Does kafka support compressing storage logs stored in log dir?
> What does compression.type=(gzip/snappy) in server.properties do?
> 
> Based on documents I am assuming that it will compress the logs on local 
> file system.
> I ran a quick experiment and found that my logs stored on local disk are 
> not getting compressed.
> Size of data stored on disk is same with or without compression.
> 
> I am using following configuration properties in server.properties 
> config file.
> 
> compression.type=gzip
> compressed.topics="gzip-topic"
> 
> Thanks for reading and appreciate any responses.
> 
> Thanks,
> R P



Re: question about time-delay

2016-03-16 Thread Ben Stopford
Kafka’s defaults are set for low latency so that’s probably a reasonable 
measure of your lower bound latency for that message size. 

B
> On 15 Mar 2016, at 16:26, 杜俊霖  wrote:
> 
> Hello,I have some question when I use kafka transfer data。
> In my test,I create a producer and a consumer and test the time-delay 
> from producer to consumer  with 1000 bytes data. It takes me about 3ms.
> but when i ping to broker ,the time-delay is about 0.1ms.
> what configuration I can do to make sure the time-delay under 1ms when 
> transfer data from producer to consumer. 
> best wishes!
> 
> 
> 
>  



Re: Kafka Applicability - Large Messages

2016-03-14 Thread Ben Stopford
Becket did a good talk at the last Kafka meetup on how LinkedIn handles the 
large message problem. 

http://www.slideshare.net/JiangjieQin/handle-large-messages-in-apache-kafka-58692297
 


> On 14 Mar 2016, at 09:42, Jens Rantil  wrote:
> 
> Just making it more explicit: AFAIK, all Kafka consumers I've seen loads the 
> incoming messages into memory. Unless you make it possible to stream it do 
> disk or something you need to make sure your consumers has the available 
> memory.
> 
> Cheers,
> Jens
> 
> On Fri, Mar 4, 2016 at 6:07 PM Cees de Groot  > wrote:
> 1GB sounds like a tad steep, you may want to do some testing, as Kafka
> needs to be told that such large messages can arrive and broker will then
> pre-allocate buffers for that. Personally, I'd stop short of low megabytes,
> anything bigger can be dropped off in e.g. S3 and then you just queue a
> link for further processing.
> 
> I'm not saying it's impossible, Kafka handles large messages better than
> most other tools out there, but you do want to do a test setup to make sure
> that it'll handle the sort of traffic you fling at it in any case.
> 
> On Fri, Mar 4, 2016 at 4:26 AM, Mahesh Dharmasena  >
> wrote:
> 
> > We have a client with several thousand stores which send and receive
> > messages to main system that resides on the headquarters.
> >
> > A single Store sends and receive around 50 to 100 messages per day.
> >
> > Average Message size could be from 2KB to 1GB.
> >
> > Please let me know whether I can adapt Apache Kafka for the solution?
> >
> >
> > - Mahesh.
> >
> 
> 
> 
> --
> 
> *Cees de Groot*
> PRINCIPAL SOFTWARE ENGINEER
> pagerduty.com 
> c...@pagerduty.com   >
> +1(416)435-4085
> 
> -- 
> Henrik Hedvall
> Lead Designer
> henrik.hedv...@tink.se 
> +46 72 505 57 59
> 
> Tink AB
> Wallingatan 5
> 111 60 Stockholm, Sweden 
> www.tink.se 
> 
> 
> 



Re: Kafka topics with infinite retention?

2016-03-14 Thread Ben Stopford
A couple of things:

- Compacted topics provide a useful way to retain meaningful datasets inside 
the broker without them growing indefinitely. If you have an update-in-place use 
case, where the event sourced approach doesn’t buy you much, these will keep 
the reload time down when you regenerate materialised views.  
- When going down the master data store route, a few different problems can 
conflate: disaster recovery, historic backups, and regenerating data in 
non-production environments.  

B
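
As a concrete illustration of the compacted-topic option, a sketch using the
Java AdminClient (added in Kafka 0.11, after this thread); the topic name,
partition count, replication factor and configs are placeholders.

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateCompactedTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder

            try (AdminClient admin = AdminClient.create(props)) {
                NewTopic topic = new NewTopic("user-profiles", 12, (short) 3)   // placeholder sizing
                        .configs(Map.of(
                                "cleanup.policy", "compact",         // keep only the latest value per key
                                "min.cleanable.dirty.ratio", "0.1"   // compact more eagerly (example)
                        ));
                admin.createTopics(Collections.singletonList(topic)).all().get();
            }
        }
    }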


> On 14 Mar 2016, at 09:56, Jens Rantil  wrote:
> 
> This is definitely an interesting use case. However, you need to be aware
> that changing the broker topology won't rebalance the preexisting data from
> the previous brokers. That is, you risk loosing data.
> 
> Cheers,
> Jens
> 
> On Wed, Mar 9, 2016 at 2:10 PM Daniel Schierbeck 
> wrote:
> 
>> I'm considering an architecture where Kafka acts as the primary datastore,
>> with infinite retention of messages. The messages in this case will be
>> domain events that must not be lost. Different downstream consumers would
>> ingest the events and build up various views on them, e.g. aggregated
>> stats, indexes by various properties, full text search, etc.
>> 
>> The important bit is that I'd like to avoid having a separate datastore for
>> long-term archival of events, since:
>> 
>> 1) I want to make it easy to spin up new materialized views based on past
>> events, and only having to deal with Kafka is simpler.
>> 2) Instead of having some sort of two-phased import process where I need to
>> first import historical data and then do a switchover to the Kafka topics,
>> I'd rather just start from offset 0 in the Kafka topics.
>> 3) I'd like to be able to use standard tooling where possible, and most
>> tools for ingesting events into e.g. Spark Streaming would be difficult to
>> use unless all the data was in Kafka.
>> 
>> I'd like to know if anyone here has tried this use case. Based on the
>> presentations by Jay Kreps and Martin Kleppmann I would expect that someone
>> had actually implemented some of the ideas they're been pushing. I'd also
>> like to know what sort of problems Kafka would pose for long-term storage –
>> would I need special storage nodes, or would replication be sufficient to
>> ensure durability?
>> 
>> Daniel Schierbeck
>> Senior Staff Engineer, Zendesk
>> 
> -- 
> 
> Jens Rantil
> Backend Developer @ Tink
> 
> Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
> For urgent matters you can reach me at +46-708-84 18 32.



Re: Exactly-once publication behaviour

2016-02-19 Thread Ben Stopford
Hi Andrew

There are plans to add exactly once behaviour. This will likely be a little 
more than idempotent producers, with the motivation being to provide better 
delivery guarantees for Connect, Streams and Mirror Maker. 

B
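
For later readers: this work shipped in Kafka 0.11 as the idempotent and
transactional producer (KIP-98). A minimal sketch of the transactional API
follows; the transactional.id, topic and broker address are placeholders.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ExactlyOncePublisher {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");      // placeholder
            props.put("enable.idempotence", "true");               // broker de-duplicates retried batches
            props.put("transactional.id", "payments-publisher-1"); // placeholder; must be stable per producer
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.initTransactions();
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("payments", "key", "value"));
                producer.commitTransaction();   // or abortTransaction() on failure
            }
        }
    }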

 

> On 19 Feb 2016, at 13:54, Andrew Schofield 
>  wrote:
> 
> When publishing messages to Kafka, you make a choice between at-most-once and 
> at-least-once delivery, depending on whether you wait for acknowledgments and 
> whether you retry on failures. In most cases, those options are good enough. 
> However, some systems offer exactly-once reliability too. Although my view is 
> that the practical use of exactly-once is limited in the situations that 
> Kafka is generally used for, when you're connecting other systems to Kafka or 
> bridging between protocols, I think there is value in propagating the 
> reliability level that the other system expects.
> 
> As a consumer, you can manage your offset and get exactly-once delivery, or 
> more likely exactly-once processing, of the messages.
> 
> I've read about idempotent producers 
> (https://cwiki.apache.org/confluence/display/KAFKA/Idempotent+Producer) and I 
> know there's been some discussion about transactions too.
> 
> Is there a plan to provide the tools to enable exactly-once publication 
> behaviour? Is this a planned enhancement to Kafka Connect? Is there already 
> some technique that people are using effectively to get exactly-once?
> 
> Andrew Schofield



Re: Kafka response ordering guarantees

2016-02-17 Thread Ben Stopford
So long as you set max.inflight.requests.per.connection = 1 Kafka should 
provide strong ordering within a partition (so use the same key for messages 
that should retain their order). There is a bug currently raised against this 
feature, though, where an edge case can cause ordering issues. 

https://issues.apache.org/jira/browse/KAFKA-3197  
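
A sketch of the producer settings implied above, assuming a recent Java client.
Note that on 0.11+ clients, enabling idempotence preserves per-partition
ordering on retries without limiting in-flight requests to one.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;

    public class OrderedProducerConfig {
        public static KafkaProducer<String, String> create() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");           // placeholder
            props.put("acks", "all");                                   // wait for the in-sync replicas
            props.put("retries", Integer.toString(Integer.MAX_VALUE));  // retry rather than drop
            props.put("max.in.flight.requests.per.connection", "1");    // no reordering on retry
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            return new KafkaProducer<>(props);
        }
    }
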
> On 17 Feb 2016, at 07:17, Ivan Dyachkov  wrote:
> 
> Hello all.
> 
> I'm developing a kafka client and have a question about kafka server 
> guarantees.
> 
> A statement from 
> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-Network
>  makes me a bit confused:
> 
> "The server guarantees that on a single TCP connection, requests will be 
> processed in the order they are sent and responses will return in that order 
> as well. The broker's request processing allows only a single in-flight 
> request per connection in order to guarantee this ordering. Note that clients 
> can (and ideally should) use non-blocking IO to implement request pipelining 
> and achieve higher throughput. i.e., clients can send requests even while 
> awaiting responses for preceding requests since the outstanding requests will 
> be buffered in the underlying OS socket buffer. All requests are initiated by 
> the client, and result in a corresponding response message from the server 
> except where noted."
> 
> Does this mean that when a client is sending more than one in-flight request 
> per connection, the server does not guarantee that responses will be sent in 
> the same order as requests?
> 
> In other words, if I have a strictly monotonically increasing integer as a 
> correlation id for all requests, can I rely on Kafka that correlation id in 
> responses will also have this property?
> 
> Thanks.
> 
> /Ivan



Re: Replication Factor and number of brokers

2016-02-17 Thread Ben Stopford
If you create a topic with more replicas than brokers it should throw an
error, but if you lose a broker you'd have under-replicated partitions.

B

On Tuesday, 16 February 2016, Alex Loddengaard  wrote:

> Hi Sean, you'll want equal or more brokers than your replication factor.
> Meaning, if your replication factor is 3, you'll want 3 or more brokers.
>
> I'm not sure what Kafka will do if you have fewer brokers than your
> replication factor. It will either give you the highest replication factor
> it can (in this case, the number of brokers), or it will put more than one
> replica on some brokers. My guess is the former, but again, I'm not sure.
>
> Hope this helps.
>
> Alex
>
> On Tue, Feb 16, 2016 at 7:47 AM, Damian Guy  > wrote:
>
> > Then you'll have under-replicated partitions. However, even if you have 3
> > brokers with a replication factor of 2 and you lose a single broker
> you'll
> > still likely have under-replicated partitions.
> > Partitions are assigned to brokers, 1 broker will be the leader and n
> > brokers will be followers. If any of the brokers with replicas of the
> > partition on it crash then you'll have under-replicated partitions.
> >
> >
> > On 16 February 2016 at 14:45, Sean Morris (semorris)  >
> > wrote:
> >
> > > So if I have a replication factor of 2, but only 2 brokers, then
> > > replication works, but what if I lose one broker?
> > >
> > > Thanks,
> > > Sean
> > >
> > > On 2/16/16, 9:14 AM, "Damian Guy" >
> wrote:
> > >
> > > >Hi,
> > > >
> > > >You need to have at least replication factor brokers.
> > > >replication factor  = 1 is no replication.
> > > >
> > > >HTH,
> > > >Damian
> > > >
> > > >On 16 February 2016 at 14:08, Sean Morris (semorris) <
> > semor...@cisco.com >
> > > >wrote:
> > > >
> > > >> Should your number of brokers be atleast one more then your
> > replication
> > > >> factor of your topic(s)?
> > > >>
> > > >> So if I have a replication factor of 2, I need atleast 3 brokers?
> > > >>
> > > >> Thanks,
> > > >> Sean
> > > >>
> > > >>
> > >
> > >
> >
>
>
>
> --
> *Alex Loddengaard | **Solutions Architect | Confluent*
> *Download Apache Kafka and Confluent Platform: www.confluent.io/download
> *
>


Re: Rebalancing during the long-running tasks

2016-02-16 Thread Ben Stopford
I think you’ll find some useful context in this KIP Jason wrote. It’s pretty 
good. 

https://cwiki.apache.org/confluence/display/KAFKA/KIP-41%3A+KafkaConsumer+Max+Records
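For illustration only, the kind of consumer configuration this leads to might
look like the sketch below. It assumes a client version that includes the
max.poll.records setting proposed in KIP-41; the broker address, group name and
values are placeholders.

import java.util.Properties;

public class LongRunningTaskConsumerConfig {
    // Illustrative settings only; broker address, group id and values are placeholders.
    public static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "long-task-workers");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");   // commit explicitly once a task finishes
        props.put("max.poll.records", "1");         // keep each poll() cycle short (KIP-41 setting)
        props.put("session.timeout.ms", "30000");   // still needs to cover the longest single task
        return props;
    }
}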
 



> On 16 Feb 2016, at 07:15, Насыров Ренат  wrote:
> 
> Hello!
> 
> I'm trying to use kafka for long-running tasks processing. The tasks can be 
> very short (less than a second) or very long (about 10 minutes). I've got one 
> consumer group for the single queue, and one or more consumers. Sometimes 
> consumers manage to commit their offsets before rebalancing, sometimes not 
> (and fail). Accordning to this document ( 
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design
>  ), in worst case (when all the consumers are on very long tasks) it goes as 
> follows:
> 
> 1) Consumers get long tasks from the queue.
> 2) Consumers performing their long-running tasks.
> 3) Session timeout happens.
> 4) Group coordinator performs a rebalance; the current generation number is 
> increased.
> 5) Consumers complete their long-running tasks and commit.
> 6) GroupCoordinator returns IllegalGeneration errors to consumers and does 
> not allow to commit the offsets.
> 7) Consumers reconnect and get the very messages from the previous 
> generation, thus stucking in forever loop.
> 
> Suggestions:
> 
> 1) Commit first, then process. Inacceptable in my case because it leads to 
> at-most-once semantics.
> 2) Increase session timeout limit. Not desired because task duration can 
> negatively affect the effectiveness of rebalance.
> 
> Is there any proper way to complete long-running tasks?



Re: Kafka as master data store

2016-02-15 Thread Ben Stopford
Ted - it depends on your domain. More conservative approaches to long lived 
data protect against data corruption, which generally means snapshots and cold 
storage.  


> On 15 Feb 2016, at 21:31, Ted Swerve  wrote:
> 
> HI Ben, Sharninder,
> 
> Thanks for your responses, I appreciate it.
> 
> Ben - thanks for the tips on settings. A backup could certainly be a
> possibility, although if only with similar durability guarantees, I'm not
> sure what the purpose would be?
> 
> Sharninder - yes, we would only be using the logs as forward-only streams -
> i.e. picking an offset to read from and moving forwards - and would be
> setting retention time to essentially infinite.
> 
> Regards,
> Ted.
> 
> On Tue, Feb 16, 2016 at 5:05 AM, Sharninder Khera 
> wrote:
> 
>> This topic comes up often on this list. Kafka can be used as a datastore
>> if that’s what your application wants with the caveat that Kafka isn’t
>> designed to keep data around forever. There is a default retention time
>> after which older data gets deleted. The high level consumer essentially
>> reads data as a stream and while you can do sort of random access with the
>> low level consumer, its not ideal.
>> 
>> 
>> 
>>> On 15-Feb-2016, at 10:26 PM, Ted Swerve  wrote:
>>> 
>>> Hello,
>>> 
>>> Is it viable to use infinite-retention Kafka topics as a master data
>>> store?  I'm not talking massive volumes of data here, but still
>> potentially
>>> extending into tens of terabytes.
>>> 
>>> Are there any drawbacks or pitfalls to such an approach?  It seems like a
>>> compelling design, but there seem to be mixed messages about its
>>> suitability for this kind of role.
>>> 
>>> Regards,
>>> Ted
>> 
>> 



Re: Kafka as master data store

2016-02-15 Thread Ben Stopford
Hi Ted

This is an interesting question. 

Kafka has similar resilience properties to other distributed stores such as 
Cassandra, which are used as master data stores (obviously without the query 
functions). You’d need to set unclean.leader.election.enable=false and 
configure sufficient replication to get good resiliency. 

One objection to doing this would be that the majority of Kafka usage is for 
transitory data. This is fair and I’ve not seen Kafka used as a master data 
store per se. I have seen it used for reliable messaging, which means not 
losing data and hence requires similar properties. Certainly there is nothing I 
can think of that would suggest Kafka would be any worse than other distributed 
data stores, but to further mitigate concerns, you could use Connect to create 
a backup in HDFS, SAN etc. 

All the best

B 



> On 15 Feb 2016, at 08:56, Ted Swerve  wrote:
> 
> Hello,
> 
> Is it viable to use infinite-retention Kafka topics as a master data
> store?  I'm not talking massive volumes of data here, but still potentially
> extending into tens of terabytes.
> 
> Are there any drawbacks or pitfalls to such an approach?  It seems like a
> compelling design, but there seem to be mixed messages about its
> suitability for this kind of role.
> 
> Regards,
> Ted



Re: Kafka 0.8.2.0 Log4j

2016-02-12 Thread Ben Stopford
Check you’re setting the Kafka log4j properties.  

-Dlog4j.configuration=file:config/log4j.properties

B 
> On 12 Feb 2016, at 07:33, Joe San  wrote:
> 
> How could I get rid of this warning?
> 
> log4j:WARN No appenders could be found for logger
> (kafka.utils.VerifiableProperties).
> log4j:WARN Please initialize the log4j system properly.
> 
> Any ideas how to get rid of this warning?



Re: How to retrieve the HighWaterMark

2016-02-11 Thread Ben Stopford
Hi Florian

I think you should be able to get it by calling consumer.seekToEnd() followed 
by consumer.position() for each topic partition. 
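A minimal sketch (the broker address, group id, topic and partition are
placeholders; the Collection-based seekToEnd signature assumes a more recent
client, older ones take varargs):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class HighWaterMarkExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "hwm-check");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);
            consumer.assign(Collections.singletonList(tp));
            consumer.seekToEnd(Collections.singletonList(tp));   // move to the log end
            long highWaterMark = consumer.position(tp);          // offset of the next message to be appended
            System.out.println("High watermark for " + tp + ": " + highWaterMark);
        }
    }
}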

B

> On 10 Feb 2016, at 09:23, Florian Hussonnois  wrote:
> 
> Hi all,
> 
> I'm looking for a way to retrieve the HighWaterMark using the new API.
> 
> Is that possible ?
> 
> Thank you in advance
> 
> -- 
> Florian HUSSONNOIS



Re: How to retrieve the HighWaterMark

2016-02-11 Thread Ben Stopford
As an aside - you should also be able to validate this against the 
replication-offset-checkpoint file for each topic partition, server side.

> On 11 Feb 2016, at 09:02, Ben Stopford <b...@confluent.io> wrote:
> 
> Hi Florian
> 
> I think you should be able to get it by calling consumer.seekToEnd() followed 
> by consumer.position() for each topic partition. 
> 
> B
> 
>> On 10 Feb 2016, at 09:23, Florian Hussonnois <fhussonn...@gmail.com> wrote:
>> 
>> Hi all,
>> 
>> I'm looking for a way to retrieve the HighWaterMark using the new API.
>> 
>> Is that possible ?
>> 
>> Thank you in advance
>> 
>> -- 
>> Florian HUSSONNOIS
> 



Re: Communication between Kafka clients & Kafka

2016-01-22 Thread Ben Stopford
Hey Praveen

Kafka uses a binary protocol over TCP. You can find details of the specifics 
here if you’re interested.
 if you’re interested.

All the best 

B
> On 22 Jan 2016, at 08:00, praveen S  wrote:
> 
> Do Kafka clients(producers & consumers) use rpc to communicate with the
> Kafka cluster.?
> 
> Regards,
> Praveen



Re: Possible WAN Replication Setup

2016-01-17 Thread Ben Stopford
Jason 

Don’t forget that Kafka relies on redundant replicas for fault tolerance rather 
than disk persistence, so your single instances might lose messages straight 
out of the box if they’re not terminated cleanly. You could set flush.messages 
to 1 though. Don’t forget about Zookeeper either. That has to go somewhere. 

For what it’s worth I’ve seen one installation move away from this type of 
pattern as it was a little painful to manage. Your mileage may vary though. But 
you’re certainly not alone with wanting to do something like this. There is a 
buffering producer on the roadmap, although it may end up being a slightly 
different thing. 

B  


> On 16 Jan 2016, at 00:12, Jason J. W. Williams  
> wrote:
> 
> Hey Luke,
> 
> Thank you for the reply and encouragement. I'm going to start hacking on a
> small PoC.
> 
> -J
> 
> On Fri, Jan 15, 2016 at 12:01 PM, Luke Steensen <
> luke.steen...@braintreepayments.com> wrote:
> 
>> Not an expert, but that sounds like a very reasonable use case for Kafka.
>> The log.retention.* configs on the edge brokers should cover your TTL
>> needs.
>> 
>> 
>> On Thu, Jan 14, 2016 at 3:37 PM, Jason J. W. Williams <
>> jasonjwwilli...@gmail.com> wrote:
>> 
>>> Hello,
>>> 
>>> We historically have been a RabbitMQ environment, but we're looking at
>>> using Kafka for a new project and I'm wondering if the following
>>> topology/setup would work well in Kafka (for RMQ we'd use federation):
>>> 
>>> * Multiple remote datacenters consisting each of a single server running
>> an
>>> HTTP application that receives client data and generates events. Each
>>> server would also run single-node Kafka "cluster". The application would
>>> write events as messages into the single-node Kafka "cluster" running on
>>> the same machine.
>>> * A hub datacenter that the remote data centers are connected to via SSL.
>>> The hub data center would run a multi-node Kafka cluster (3 nodes).
>>> * Use mirrormaker in the hub data center to mirror event messages from
>> each
>>> of the remote single-node servers into the hub's central Kafka cluster,
>>> where all of the real consumers are listening.
>>> 
>>> The problem set is each of the remote servers is collecting data from
>>> customers over HTTP and returning responses, but those remote servers are
>>> also generating events from those customer interactions. We want to
>> publish
>>> those events into a central hub data center for analytics. We want the
>>> event messages at the remote servers to queue up when their network
>>> connections to the hub data center is unreliable, and automatically relay
>>> queued messages to the hub data center when the network comes
>> back...making
>>> the event relay system tolerant to WAN network faults. We'd also want to
>>> set up some kind of TTL on queued messages, so if the WAN connection to
>> the
>>> hub is down for an extended period of time, the messages queued on the
>>> remote servers don't build up infinitely.
>>> 
>>> Any thoughts on if this setup is advisable/inadvisable with Kafka (or any
>>> other thoughts on it) would be greatly appreciated.
>>> 
>>> -J
>>> 
>> 



Re: reassign __consumer_offsets partitions

2015-12-17 Thread Ben Stopford
Hi Damian

The reassignment should treat the offsets topic as any other topic. I did a 
quick test and it seemed to work for me. Do you see anything suspicious in the 
controller log?

B
> On 16 Dec 2015, at 14:51, Damian Guy  wrote:
> 
> Hi,
> 
> 
> We have had some temporary nodes in our kafka cluster and i now need to
> move assigned partitions off of those nodes onto the permanent members. I'm
> familiar with the kafka-reassign-partitions script, but ... How do i get it
> to work with the __consumer_offsets partition? It currently seems to ignore
> it.
> 
> Thanks,
> Damian



Re: kafka connection with zookeeper

2015-12-12 Thread Ben Stopford
Hi Sadanand

Kafka secures its connection with Zookeeper via SASL and it’s a little 
different to the way brokers secure connections between themselves and with 
clients. 

There’s more info here: 
http://docs.confluent.io/2.0.0/kafka/zookeeper-authentication.html 


This applies to the latest release of Kafka

Best regards

Ben
> On 11 Dec 2015, at 19:54, sadanand.kulka...@wipro.com wrote:
> 
> Hi there!
> 
>  Couldn’t find anything so far! please help..
> 
>  - Can Kafka broker communicates with Zookeeper using secure (SSL/TLLS) 
> protocol?  If yes, which zookeeper and Kafka versions supports it?
> 
> Thanks a ton!
> 
> Appreciate for your time and help!
> 
> Regards
> Sadanand
> The information contained in this electronic message and any attachments to 
> this message are intended for the exclusive use of the addressee(s) and may 
> contain proprietary, confidential or privileged information. If you are not 
> the intended recipient, you should not disseminate, distribute or copy this 
> e-mail. Please notify the sender immediately and destroy all copies of this 
> message and any attachments. WARNING: Computer viruses can be transmitted via 
> email. The recipient should check this email and any attachments for the 
> presence of viruses. The company accepts no liability for any damage caused 
> by any virus transmitted by this email. www.wipro.com



Re: SSL - kafka producer cannot publish to topic

2015-12-11 Thread Ben Stopford
Yes - that’s correct Ismael. I think what Shri was saying was that he got it 
working when he added the SSL properties to the file he passed into the Console 
Producer.

> On 11 Dec 2015, at 17:06, Ismael Juma  wrote:
> 
> Hi Shrikant,
> 
> On Thu, Dec 10, 2015 at 9:03 PM, Shrikant Patel  wrote:
> 
>> Figured it out.
>> 
>> I was adding the ssl properties to producer.properties. We need to add
>> this to separate file and provide that file as input to procuder bat\sh
>> script --producer.config client-ssl.properties.
>> 
>> It seems the kafka.tools.ConsoleProducer class needs to have
>> --producer.config parameter pointing to just ssl configuration. It does not
>> pick it up from producer.properties.
>> 
> 
> This is not correct, the properties file passed to `producer.config` can
> have any producer configuration, not just SSL.
> 
> You mentioned `producer.properties`, but I don't see any mention of it in
> your scripts so that's the reason why it wasn't working as far as I can
> see. Am I missing something?
> 
> Best,
> Ismael



Re: Doubt regarding Encryption and Authentication using SSL

2015-12-09 Thread Ben Stopford
Hi Ritesh

You just need to create yourself a text file called client-ssl.properties or 
similar in the directory you’re running from.  In that file you put your SSL 
client information. Something like this:

security.protocol = SSL
ssl.truststore.location = /var/private/ssl/kafka.client.truststore.jks
ssl.truststore.password = test1234

If you prefer you can pass these on the command line with the 
producer/consumer-property option too. 

There’s some documentation here if you’d like more info. 

All the best

Ben


> On 9 Dec 2015, at 14:17, Ritesh Sinha  
> wrote:
> 
> Hi,
> 
> 
> I am following the kafka documentation to create encryption and
> authentication  while sending message to kafka by ssl
> 
> I got stuck at these commands
> 
> kafka-console-producer.sh --broker-list localhost:9093 --topic test
> --producer.config *client-ssl.properties*
> 
> kafka-console-consumer.sh --bootstrap-server localhost:9093 --topic
> test --new-consumer --consumer.config *client-ssl.properties*
> 
> *I*t is asking for *client-ssl.properties* for producer and consumer
> config. I am not sure what these files are.I am able to follow these
> steps :
> 
> Generate SSL key and certificate for each Kafka broker
> Creating your own CA
> 
> Signing the certificate
> Configuring Kafka Brokers
> 
> Can anyone help me in understanding what file does producer config needs
> exactly?
> 
> Thanks in Advance



Re: Error while sending data to kafka producer

2015-12-09 Thread Ben Stopford
what is your server config?

> On 9 Dec 2015, at 18:21, Ritesh Sinha  
> wrote:
> 
> Hi,
> 
> I am trying to send message to kafka producer using encryption and
> authentication.After creating the key and everything successfully.While
> passing the value through console i am getting this error:
> 
> ERROR Error when sending message to topic test with key: null, value: 2
> bytes with error: Failed to update metadata after 6 ms.
> (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
> 
> I am using this command :
> 
> kafka-console-producer.sh --broker-list localhost:9093 --topic test
> --producer.config *client-ssl.properties*
> 
> 
> content of client-ssl.properties:
> 
> security.protocol = SSL
> ssl.truststore.location =
> /home/ritesh/software/kafka_2.10-0.9.0.0/kafka.client.truststore.jks
> ssl.truststore.password = test1234
> 
> 
> What could be the issue ?



Re: Error while sending data to kafka producer

2015-12-09 Thread Ben Stopford
Hi Ritesh

Your config on both sides looks fine. There may be something wrong with your 
truststore, although you should see exceptions in either the client or server 
log files if that is the case. 

As you appear to be running locally, try creating the JKS files using the shell 
script included here 
(http://docs.confluent.io/2.0.0/kafka/ssl.html#signing-the-certificate), 
entering the password test1234 whenever prompted, then use these JKS files in 
your client and broker config. If executed correctly this example should 
definitely work. 

Ben 


> On 9 Dec 2015, at 18:50, Ritesh Sinha  
> wrote:
> 
> This is my server config.
> 
> On Thu, Dec 10, 2015 at 12:19 AM, Ritesh Sinha <
> kumarriteshranjansi...@gmail.com> wrote:
> 
>> # Licensed to the Apache Software Foundation (ASF) under one or more
>> # contributor license agreements.  See the NOTICE file distributed with
>> # this work for additional information regarding copyright ownership.
>> # The ASF licenses this file to You under the Apache License, Version 2.0
>> # (the "License"); you may not use this file except in compliance with
>> # the License.  You may obtain a copy of the License at
>> #
>> #http://www.apache.org/licenses/LICENSE-2.0
>> #
>> # Unless required by applicable law or agreed to in writing, software
>> # distributed under the License is distributed on an "AS IS" BASIS,
>> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> # See the License for the specific language governing permissions and
>> # limitations under the License.
>> # see kafka.server.KafkaConfig for additional details and defaults
>> 
>> # Server Basics #
>> 
>> # The id of the broker. This must be set to a unique integer for each
>> broker.
>> broker.id=0
>> 
>> # Socket Server Settings
>> #
>> 
>> listeners=PLAINTEXT://:9092,SSL://localhost:9093
>> 
>> # The port the socket server listens on
>> #port=9092
>> 
>> # Hostname the broker will bind to. If not set, the server will bind to
>> all interfaces
>> #host.name=localhost
>> 
>> # Hostname the broker will advertise to producers and consumers. If not
>> set, it uses the
>> # value for "host.name" if configured.  Otherwise, it will use the value
>> returned from
>> # java.net.InetAddress.getCanonicalHostName().
>> #advertised.host.name=
>> 
>> # The port to publish to ZooKeeper for clients to use. If this is not set,
>> # it will publish the same port that the broker binds to.
>> #advertised.port=
>> 
>> # The number of threads handling network requests
>> num.network.threads=3
>> 
>> # The number of threads doing disk I/O
>> num.io.threads=8
>> 
>> # The send buffer (SO_SNDBUF) used by the socket server
>> socket.send.buffer.bytes=102400
>> 
>> # The receive buffer (SO_RCVBUF) used by the socket server
>> socket.receive.buffer.bytes=102400
>> 
>> # The maximum size of a request that the socket server will accept
>> (protection against OOM)
>> socket.request.max.bytes=104857600
>> 
>> 
>> # Log Basics #
>> 
>> # A comma seperated list of directories under which to store log files
>> log.dirs=/tmp/kafka-logs
>> 
>> # The default number of log partitions per topic. More partitions allow
>> greater
>> # parallelism for consumption, but this will also result in more files
>> across
>> # the brokers.
>> num.partitions=1
>> 
>> # The number of threads per data directory to be used for log recovery at
>> startup and flushing at shutdown.
>> # This value is recommended to be increased for installations with data
>> dirs located in RAID array.
>> num.recovery.threads.per.data.dir=1
>> 
>> # Log Flush Policy
>> #
>> 
>> # Messages are immediately written to the filesystem but by default we
>> only fsync() to sync
>> # the OS cache lazily. The following configurations control the flush of
>> data to disk.
>> # There are a few important trade-offs here:
>> #1. Durability: Unflushed data may be lost if you are not using
>> replication.
>> #2. Latency: Very large flush intervals may lead to latency spikes
>> when the flush does occur as there will be a lot of data to flush.
>> #3. Throughput: The flush is generally the most expensive operation,
>> and a small flush interval may lead to exceessive seeks.
>> # The settings below allow one to configure the flush policy to flush data
>> after a period of time or
>> # every N messages (or both). This can be done globally and overridden on
>> a per-topic basis.
>> 
>> # The number of messages to accept before forcing a flush of data to disk
>> #log.flush.interval.messages=1
>> 
>> # The maximum amount of time a message can sit in a log before we force a
>> flush
>> #log.flush.interval.ms=1000
>> 
>> 

Re: producer-consumer issues during deployments

2015-11-26 Thread Ben Stopford
Hi Prabhjot

I may have slightly misunderstood your question so apologies if that’s the 
case. The general approach to releases is to use a rolling upgrade where you 
take one machine offline at a time, restart it, wait for it to come online (you 
can monitor this via JMX) then move onto the next. If you’re taking multiple 
machines offline at the same time you need to be careful about where the 
replicas for those machines reside. You can examine these individually for each 
topic via kafka-topics.sh. 

Regarding your questions the following points may be of use:

- Only one replica (the leader) will be available for writing at any one time 
in Kafka. If you offline machines then Kafka will switch over to use replicas 
on other machines if they are available. 
- The behaviour of produce requests will depend on the acknowledgment setting 
the producer provides, the setting for minimum in-sync replicas and how many 
replicas remain standing after the failure. There are a few things going on 
here but they’re explained quite well here. 
- Consumers consume from the leader also, so if the leader for a partition is 
online then you will be able to consume from it. If the leader is on a machine 
that goes offline then consumption will pause whilst leadership switches over 
to a replica.  

All the best
B

> On 26 Nov 2015, at 17:58, Prabhjot Bharaj  wrote:
> 
> Hi,
> 
> Request your expertise on these doubts of mine
> 
> Thanks,
> Prabhjot
> 
> On Thu, Nov 26, 2015 at 4:43 PM, Prabhjot Bharaj 
> wrote:
> 
>> Hi,
>> 
>> We arrange our kafka machines in groups and deploy these phases.
>> 
>> For kafka, we’ll have to map groups with phases. During each phase of the
>> release, all the machines in that group can go down.
>> 
>> When this happens, there are a couple of cases:-
>> 
>>   1. All replicas are residing in a group of machines which will all go
>>   down in this phase
>>  - Affect on Producer –
>> - What happens to the produce requests (whether produce can
>> dynamically keep writing to the remaining partitions now)
>> - What happens to the already queued requests which were being
>> sent to the earlier replicas – they will fail (we’ll have to use 
>> producer
>> callback feature to take care of retrying in case the above step
>> works fine)
>>  - Affect on Consumer -
>> - Can the consumers consume from a lesser number of partitions?
>> - Does the consumer 'consume' api gives any callback/failure
>> when all replicas of a partition go down?
>> 
>> If you have come across any of the above cases, please provide how you
>> solved the problem ? or whether everything works just well with Kafka
>> during deployments and my cases described above are all invalid or handled
>> by kafka and its clients internally ?
>> 
>> Thanks,
>> Prabhjot
>> 
> 
> 
> 
> -- 
> -
> "There are only 10 types of people in the world: Those who understand
> binary, and those who don't"



Re: kafka java producer security to access kerberos

2015-11-09 Thread Ben Stopford
Hi Surrender 

Try using the producer-property option to specify the relevant ssl properties 
as a set of key-value pairs.

There is some helpful info here too. 

B

> On 9 Nov 2015, at 15:48, Kudumula, Surender  wrote:
> 
> Hi all
> Just a quick question is there any way we can pass the security protocol 
> property to java producer client
> ./bin/kafka-console-producer.sh --broker-list 
> c6401.ambari.apache.org:6667,c6402.ambari.apache.org:6667 --topic test_topic 
> --security-protocol PLAINTEXTSASL
> Any reply will be appreciated thanks
> 
> Regards
> 
> Surender Kudumula
> 
> 



Re: Getting started with Kafka using Java client

2015-10-12 Thread Ben Stopford
Hi Tarun

There is an examples section in the kafka project here which shows the Consumer, 
SimpleConsumer and Producer. These are just clients so you’ll need ZK and a 
Kafka server running to get them going. You probably don’t need to worry about 
the SimpleConsumerDemo which is an older way of interacting with Kafka.

Another option is to run a client from inside a test. There is a very simple 
example here.

B
> On 12 Oct 2015, at 13:52, Tarun Sapra  wrote:
> 
> Hi All,
> 
> Can someone please point me to a well documented github code example for
> using Kafka 's java client. So far I haven't been able to find much related
> to using Java client. Most of the examples are using command prompt for
> producing and consuming messages. It's a bit frustrating that though Kafka
> has been generating so much hype in Big Data space but still lacks good
> examples on the internet
> 
> Thanks & Regards,
> Tarun



Re: New Consumer - discover consumer groups

2015-10-12 Thread Ben Stopford
I double checked with Jun but there is currently no direct API for consumer 
group discovery. I expect you already know this but you can get a consumer’s 
offset in the new API. You could also derive the info you need from the offsets 
topic.
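If you already know the group id, a rough sketch of deriving the lag for one
partition with the new consumer could look like this (it does not solve
discovery itself; the broker address, topic and group names are placeholders):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ConsumerLagExample {
    // Lag for one partition of one known group = log end offset - committed offset.
    public static long lag(String group, String topic, int partition) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", group);
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition(topic, partition);
            consumer.assign(Collections.singletonList(tp));

            OffsetAndMetadata committed = consumer.committed(tp);   // last committed offset for this group
            consumer.seekToEnd(Collections.singletonList(tp));
            long logEnd = consumer.position(tp);                     // current log end offset

            return committed == null ? logEnd : logEnd - committed.offset();
        }
    }
}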

B 


> On 12 Oct 2015, at 17:09, Damian Guy  wrote:
> 
> Hi,
> 
> Assuming i am using the latest kafka (trunk), exclusively with the new
> consumer, and i want to monitor consumer lag across all groups - how would
> i go about discovering the consumer groups? Is there an API call for this?
> 
> Thanks,
> Damian



Re: kafka metrics emitting to graphite

2015-10-11 Thread Ben Stopford
Hi Sunil

Try using JMXTrans to expose Kafka’s internal JMX metrics to graphite. 

https://github.com/jmxtrans/jmxtrans 

B
> On 11 Oct 2015, at 11:19, sunil kalva  wrote:
> 
> How to configure, to emit kafka broker metrics to graphite.
> 
> t
> SunilKalva



Re: [!!Mass Mail]Re: Dual commit with zookeeper and kafka

2015-10-10 Thread Ben Stopford
Dual commits are only really designed to aid migration of consumers that 
started out committing offsets to ZK and want to move offset storage to Kafka. 
They are completely separate offset storage mechanisms. I suggest you pick one 
and use that. I’d suggest Kafka. 

In the crash case you describe, offsets would be different in ZK & Kafka. The 
consumer would pick the larger of the two offsets to use.

> On 10 Oct 2015, at 11:11, Рябков Алексей Николаевич <a.ryab...@ntc-vulkan.ru> 
> wrote:
> 
> Hello!
> 
>>> Whether you commit offsets to Kafka itself (stored in offsets topic) or ZK 
>>> or both depends on the settings in two properties: offset.storage and 
>>> >>dual.commit.enabled (here 
>>> <http://kafka.apache.org/documentation.html#consumerconfigs>). Currently ZK 
>>> is the default.
> 
> Thanks for explanation but I have more questions...
> 1. As far as I understand when I use offset.storage=kafka  and 
> dual.commit.enabled in consumer (I mean java-based) then when I call commit 
> method commit then internally client commit offset  to kafka and then to 
> zk Is it correct?
> 2. if first is correct ...then  what happened if client crash between this  2 
> operations (commit 2 kafka and commit 2 zookeeper)?
> 3. What happened if broker crash And my client support only zookeeper? (I 
> guess that I read message from beggining) Is it correct?  Or broker 
> explicitly sync kafka commit with zookeeper when started?
> 
> Thanks in advance, Aleksey 
> 
> -Original Message-
> From: Ben Stopford [mailto:b...@confluent.io] 
> Sent: Friday, October 9, 2015 5:54 PM
> To: users@kafka.apache.org
> Subject: [!!Mass Mail]Re: Dual commit with zookeeper and kafka
> 
> Hi Alexey
> 
> Whether you commit offsets to Kafka itself (stored in offsets topic) or ZK or 
> both depends on the settings in two properties: offset.storage and 
> dual.commit.enabled (here 
> <http://kafka.apache.org/documentation.html#consumerconfigs>). Currently ZK 
> is the default. 
> 
> Commits happen either periodically (auto.commit.enable=true && 
> auto.commit.interval.ms) or explicitly. For example the high level consumer 
> has a commitOffsets method. I expect the python bindings have a similar 
> function. The rest proxy offers this function too. 
> 
> Hope that helps 
> 
> B
>   
> 
>> On 6 Oct 2015, at 23:06, Рябков Алексей Николаевич <a.ryab...@ntc-vulkan.ru> 
>> wrote:
>> 
>> Hello!
>> 
>> Can somebody explain me how to  use multiple consumers with different commit 
>> storage...
>> For example, java-based consumers use kafka commit storage...
>> python-based consumers use zookeeper commit storage
>> 
>> My question is:
>> Is it true that when one consumer commit to kafka,  server also commit 
>> automatically to zookeeper (so all consumers see the same commit) Is 
>> it true that when one consumer commit to zookeeper,  server also 
>> commit automatically to kafka (so all consumers see the same commit) 
>> Is it true that after server restart it read commits from kafka 
>> storage and publish (sync) commit to zookeeper (so after server 
>> restart all consumers can see also the same commit)
>> 
>> Thanks in advance, Aleksey
> 



Re: Dual commit with zookeeper and kafka

2015-10-09 Thread Ben Stopford
Hi Alexey

Whether you commit offsets to Kafka itself (stored in the offsets topic) or ZK or 
both depends on the settings in two properties: offsets.storage and 
dual.commit.enabled (here 
<http://kafka.apache.org/documentation.html#consumerconfigs>). Currently ZK is 
the default. 
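As an illustration only, the old high-level consumer settings for Kafka-based
offset storage look something like this (the ZK address and group id are
placeholders):

import java.util.Properties;

public class OldConsumerOffsetStorageConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");   // placeholder
        props.put("group.id", "my-group");                   // placeholder
        props.put("offsets.storage", "kafka");      // commit to the __consumer_offsets topic
        props.put("dual.commit.enabled", "false");  // true only while migrating away from ZK storage
        return props;
    }
}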

Commits happen either periodically (auto.commit.enable=true && 
auto.commit.interval.ms) or explicitly. For example the high level consumer has 
a commitOffsets method. I expect the python bindings have a similar function. 
The rest proxy offers this function too. 

Hope that helps 

B


> On 6 Oct 2015, at 23:06, Рябков Алексей Николаевич  
> wrote:
> 
> Hello!
> 
> Can somebody explain me how to  use multiple consumers with different commit 
> storage...
> For example, java-based consumers use kafka commit storage...
> python-based consumers use zookeeper commit storage
> 
> My question is:
> Is it true that when one consumer commit to kafka,  server also commit 
> automatically to zookeeper (so all consumers see the same commit)
> Is it true that when one consumer commit to zookeeper,  server also commit 
> automatically to kafka (so all consumers see the same commit)
> Is it true that after server restart it read commits from kafka storage and 
> publish (sync) commit to zookeeper (so after server restart all consumers can 
> see also the same commit) 
> 
> Thanks in advance, Aleksey



Re: Kafka Mirror to consume data from beginning of topics

2015-10-09 Thread Ben Stopford
Hi Leo 

Set auto.offset.reset=smallest in your consumer.config

B
> On 8 Oct 2015, at 18:47, Clelio De Souza  wrote:
> 
> Hi there,
> 
> I am trying to setup a Kafka Mirror mechanism, but it seems the consumer
> from the source Kafka cluster only reads from new incoming data to the
> topics, i.e. it does not read historically saved data in the topics.
> 
> Is there a way to define the consumer of Kafka Mirror to read from the
> beginning of the topics of my source Kafka cluster?
> 
> Many thanks in advance!
> 
> Cheers,
> Leo



Re: number of topics given many consumers and groups within the data

2015-09-30 Thread Ben Stopford
I agree. The only reason I can think of for the custom partitioning route would 
be if your group concept were to grow to a point where a topic-per-category 
strategy became prohibitive. This seems unlikely based on what you’ve said. I 
should also add that Todd is spot on regarding the SimpleConsumer not being 
something you’d want to pursue at this time. There is however a new consumer on 
trunk which makes these things a little easier. 


> On 30 Sep 2015, at 19:05, Pradeep Gollakota <pradeep...@gmail.com> wrote:
> 
> To add a little more context to Shaun's question, we have around 400
> customers. Each customer has a stream of events. Some customers generate a
> lot of data while others don't. We need to ensure that each customer's data
> is sorted globally by timestamp.
> 
> We have two use cases around consumption:
> 
> 1. A user may consume an individual customers data
> 2. A user may consume data for all customers
> 
> Given these two use cases, I think the better strategy is to have a
> separate topic per customer as Todd suggested.
> 
> On Wed, Sep 30, 2015 at 9:26 AM, Todd Palino <tpal...@gmail.com> wrote:
> 
>> So I disagree with the idea to use custom partitioning, depending on your
>> requirements. Having a consumer consume from a single partition is not
>> (currently) that easy. If you don't care which consumer gets which
>> partition (group), then it's not that bad. You have 20 partitions, you have
>> 20 consumers, and you use custom partitioning as noted. The consumers use
>> the high level consumer with a single group, each one will get one
>> partition each, and it's pretty straightforward. If a consumer crashes, you
>> will end up with two partitions on one of the remaining consumers. If this
>> is OK, this is a decent solution.
>> 
>> If, however, you require that each consumer always have the same group of
>> data, and you need to know what that group is beforehand, it's more
>> difficult. You need to use the simple consumer to do it, which means you
>> need to implement a lot of logic for error and status code handling
>> yourself, and do it right. In this case, I think your idea of using 400
>> separate topics is sound. This way you can still use the high level
>> consumer, which takes care of the error handling for you, and your data is
>> separated out by topic.
>> 
>> Provided it is not an issue to implement it in your producer, I would go
>> with the separate topics. Alternately, if you're not sure you always want
>> separate topics, you could go with something similar to your second idea,
>> but have a consumer read the single topic and split the data out into 400
>> separate topics in Kafka (no need for Cassandra or Redis or anything else).
>> Then your real consumers can all consume their separate topics. Reading and
>> writing the data one extra time is much better than rereading all of it 400
>> times and throwing most of it away.
>> 
>> -Todd
>> 
>> 
>> On Wed, Sep 30, 2015 at 9:06 AM, Ben Stopford <b...@confluent.io> wrote:
>> 
>>> Hi Shaun
>>> 
>>> You might consider using a custom partition assignment strategy to push
>>> your different “groups" to different partitions. This would allow you
>> walk
>>> the middle ground between "all consumers consume everything” and “one
>> topic
>>> per consumer” as you vary the number of partitions in the topic, albeit
>> at
>>> the cost of a little extra complexity.
>>> 
>>> Also, not sure if you’ve seen it but there is quite a good section in the
>>> FAQ here <
>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowmanytopicscanIhave
>> ?>
>>> on topic and partition sizing.
>>> 
>>> B
>>> 
>>>> On 29 Sep 2015, at 18:48, Shaun Senecal <shaun.sene...@lithium.com>
>>> wrote:
>>>> 
>>>> Hi
>>>> 
>>>> 
>>>> I heave read Jay Kreps post regarding the number of topics that can be
>>> handled by a broker (
>>> https://www.quora.com/How-many-topics-can-be-created-in-Apache-Kafka),
>>> and it has left me with more questions that I dont see answered anywhere
>>> else.
>>>> 
>>>> 
>>>> We have a data stream which will be consumed by many consumers (~400).
>>> We also have many "groups" within our data.  A group in the data
>>> corresponds 1:1 with what the consumers would consume, so consumer A only
>>> ever see group A messages, consumer B only consumes group B messages,
>> etc.
>>>&

Re: number of topics given many consumers and groups within the data

2015-09-30 Thread Ben Stopford
Hi Shaun

You might consider using a custom partition assignment strategy to push your 
different “groups" to different partitions. This would allow you to walk the 
middle ground between "all consumers consume everything” and “one topic per 
consumer” as you vary the number of partitions in the topic, albeit at the cost 
of a little extra complexity.

Also, not sure if you’ve seen it but there is quite a good section in the FAQ here 
<https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowmanytopicscanIhave?> 
on topic and partition sizing. 

B

> On 29 Sep 2015, at 18:48, Shaun Senecal  wrote:
> 
> Hi
> 
> 
> I heave read Jay Kreps post regarding the number of topics that can be 
> handled by a broker 
> (https://www.quora.com/How-many-topics-can-be-created-in-Apache-Kafka), and 
> it has left me with more questions that I dont see answered anywhere else.
> 
> 
> We have a data stream which will be consumed by many consumers (~400).  We 
> also have many "groups" within our data.  A group in the data corresponds 1:1 
> with what the consumers would consume, so consumer A only ever see group A 
> messages, consumer B only consumes group B messages, etc.
> 
> 
> The downstream consumers will be consuming via a websocket API, so the API 
> server will be the thing consuming from kafka.
> 
> 
> If I use a single topic with, say, 20 partitions, the consumers in the API 
> server would need to re-read the same messages over and over for each 
> consumer, which seems like a waste of network and a potential bottleneck.
> 
> 
> Alternatively, I could use a single topic with 20 partitions and have a 
> single consumer in the API put the messages into cassandra/redis (as 
> suggested by Jay), and serve out the downstream consumer streams that way.  
> However, that requires using a secondary sorted storage, which seems like a 
> waste (and added complexity) given that Kafka already has the data exactly as 
> I need it.  Especially if cassandra/redis are required to maintain a long TTL 
> on the stream.
> 
> 
> Finally, I could use 1 topic per group, each with a single partition.  This 
> would result in 400 topics on the broker, but would allow the API server to 
> simply serve the stream for each consumer directly from kafka and wont 
> require additional machinery to serve out the requests.
> 
> 
> The 400 topic solution makes the most sense to me (doesnt require extra 
> services, doesnt waste resources), but seem to conflict with best practices, 
> so I wanted to ask the community for input.  Has anyone done this before?  
> What makes the most sense here?
> 
> 
> 
> 
> Thanks
> 
> 
> Shaun



Re: Which perf-test tool?

2015-09-23 Thread Ben Stopford
Both classes work ok. I prefer the Java one simply because it has better output 
and it does less overriding of default values.

However, in both cases you probably need to tweak settings to suit your use 
case. Most notably: 
acks
batch.size
linger.ms
based on whether you are interested in latency or throughput. That is usually 
sufficient for producer performance measurement. 

At present nothing is deprecated but there are some changes going in to clean 
these up a little. 

B


> On 23 Sep 2015, at 10:14, Markus Jais  wrote:
> 
> Hello,
> 
> I have a question about performance testing:
> 
> Performance tests for producers can run (using a Java class) with:
> 
> bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
> 
> but there is also:
> ./bin/kafka-producer-perf-test.sh
> This is calling a Scala Class called kafka.tools.ProducerPerformance.
> 
> Which tool is recommend for 0.8.2.1 and newer?
> 
> What are the differences and limitations? I couldn't find anything in the 
> Kafka docs.
> 
> They seem to be similar according to the source code.
> Is one of the tools deprecated?
> 
> Best,
> 
> Markus



Re: 0.8.x

2015-08-26 Thread Ben Stopford
Hi Damian

Just clarifying - you’re saying you currently have Kafka 0.7.x running with 
dedicated broker addresses (bypassing ZK) and hitting a VIP which you use for 
load balancing writes. Is that correct?

Are you worried about something specific in the 0.8.x way of doing things (ZK 
under that level of load etc) or are you just looking for experiences running 
0.8.x with that many producers?

B


 On 25 Aug 2015, at 10:29, Damian Guy damian@gmail.com wrote:
 
 Hi,
 
 We currently run 0.7.x on our clusters and are now finally getting around
 to upgrading to kafka latest.  One thing that has been holding us back is
 that we can no longer use a VIP to front the clusters. I understand we
 could use a VIP for metadata lookups, but we have 100,000 + producers to at
 least one of our clusters.
 So, my question is: How is Kafka 0.8.x going to handle 100,000+ producers?
 Any recommendations on setup etc?
 
 Thanks for the help,
 Damian



Re: 0.8.x

2015-08-26 Thread Ben Stopford
That’s a fair few connections indeed! 

You may be able to take the route of pinning producers to specific partitions. 
KIP-22 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-22+-+Expose+a+Partitioner+interface+in+the+new+producer
 made this easier by exposing a partitioner interface to producers. That should 
let you balance load. 
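A very rough sketch of such a partitioner is below. The pinned.partition config
name is made up for this example, and the class would be registered on the
producer via its partitioner.class setting:

import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

public class PinnedPartitioner implements Partitioner {
    private int pinnedPartition;

    @Override
    public void configure(Map<String, ?> configs) {
        // "pinned.partition" is a hypothetical config name for this sketch
        pinnedPartition = Integer.parseInt(String.valueOf(configs.get("pinned.partition")));
    }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        return pinnedPartition % numPartitions;   // stay within the topic's partition count
    }

    @Override
    public void close() {}
}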

Also it may be worth adding that the code all uses non-blocking IO. I don’t 
have hard numbers here though.  

Has anyone else worked with 0.8.x at this level of load?

B 


 On 26 Aug 2015, at 10:57, Ben Stopford b...@confluent.io wrote:
 
 Hi Damian
 
 Just clarifying - you’re saying you currently have Kafka 0.7.x running with a 
 dedicated broker addresses (bypassing ZK) and hitting a VIP which you use for 
 load balancing writes. Is that correct?
 
 Are you worried about something specific in the 0.8.x way of doing things (ZK 
 under that level of load etc) or are you just looking for experiences running 
 0.8.x with that many producers?
 
 B
 
 
 On 25 Aug 2015, at 10:29, Damian Guy damian@gmail.com wrote:
 
 Hi,
 
 We currently run 0.7.x on our clusters and are now finally getting around
 to upgrading to kafka latest.  One thing that has been holding us back is
 that we can no longer use a VIP to front the clusters. I understand we
 could use a VIP for metadata lookups, but we have 100,000 + producers to at
 least one of our clusters.
 So, my question is: How is Kafka 0.8.x going to handle 100,000+ producers?
 Any recommendations on setup etc?
 
 Thanks for the help,
 Damian
 



Re: is SSL support feature ready to use in kafka-truck branch

2015-08-21 Thread Ben Stopford
Hi Qi

Trunk seems fairly stable. 

There are guidelines here which include how to generate keys: 
https://cwiki.apache.org/confluence/display/KAFKA/Deploying+SSL+for+Kafka

Your server config needs these properties (also on the webpage):

listeners=PLAINTEXT://:9092,SSL://:9093

ssl.protocol = TLS
ssl.keystore.type = JKS
ssl.keystore.location = path/keystore.jks
ssl.keystore.password = pass
ssl.key.password = pass
ssl.truststore.type = JKS
ssl.truststore.location = path/truststore.jks
ssl.truststore.password = pass

To get yourself going it’s easiest to just generate a set of certs locally and 
spark up the console producer/consumer pair. You’ll need the latest cut from 
trunk (from today) to get a console consumer that works. 

Hope that helps

Ben


 On 21 Aug 2015, at 07:10, Qi Xu shkir...@gmail.com wrote:
 
 Hi folks,
 I tried to clone the latest version of kafka truck and try to enable the
 SSL. The server.properties seems not having any security related settings,
 and it seems there's no other config file relevant to SSL either.
 So may I know is this feature ready to use now in truck branch?
 
 BTW, we're using the SSL feature from the branch :
 https://github.com/relango/kafka/tree/0.8.2. Is there any significant
 difference between Kafka-truck and relango's branch?
 
 Thanks,
 Qi



Re: thread that handle client request in Kafka brokers

2015-08-17 Thread Ben Stopford
Hi Tao

This is unlikely to be a problem. The producer is threadsafe (see here 
http://kafka.apache.org/082/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html)
 so you can happily share it between your pool of message producers. Kafka also 
provides a range of facilities for improving parallelism in your pipe. These 
include batching messages for bulk transfer, along with callbacks to provide 
acknowledgement. Kafka obviously provides a great degree of parallelism across 
brokers...
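A small sketch of the shared-producer pattern (the broker address, topic and
payloads are placeholders):

import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SharedProducerExample {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // One producer instance shared by all worker threads; KafkaProducer is thread safe
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        ExecutorService pool = Executors.newFixedThreadPool(8);

        for (int i = 0; i < 100; i++) {
            final int n = i;
            pool.submit(() -> {
                producer.send(
                        new ProducerRecord<>("events", "key-" + n, "payload-" + n),
                        (metadata, exception) -> {   // callback acts as the acknowledgement
                            if (exception != null) {
                                exception.printStackTrace();
                            }
                        });
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        producer.close();   // flushes any batched records
    }
}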

Hope that helps. 

B



 On 15 Aug 2015, at 17:19, Tao Feng fengta...@gmail.com wrote:
 
 Hi All,
 
 I was told that only single thread handle client producer request in Kafka
 broker which may potentially be a performance problem with request queue-up
 if we have many small requests. I am wondering which part of the code I
 should read to understand the above logic
 
 Thanks,
 -Tao



Re: kafka log flush questions

2015-08-07 Thread Ben Stopford
Hi Tao

1. I am wondering if the fsync operation is called by the last two routines
internally?
= Yes

2. If log.flush.interval.ms is not specified, is it true that Kafka let OS
to handle pagecache flush in background?
= Yes

3. If we specify ack=1 and ack=-1 in new producer, do those request only
persist in pagecache or actual disk?
= acknowledgements do not imply a channel is flushed. ack = -1 will increase 
durability through redundancy. If you really want to control fsync you can 
configure Kafka to force a flush after a defined number of messages using 
log.flush.interval.messages. I should add that this isn’t generally the best 
approach. You’ll get much better performance if you use multiple redundant 
replicas to manage your durability concerns, if you can. 
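For illustration, the producer side of that trade-off looks something like this
(the broker address is a placeholder; the broker-side flush setting is shown
only as a comment):

import java.util.Properties;

public class DurabilityConfigExample {
    // Broker-side, log.flush.interval.messages=1 would force a flush per message,
    // but leaning on acks and replication is usually the better trade-off.
    public static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");    // same as -1: wait for the full set of in-sync replicas
        props.put("retries", "3");   // retry transient failures rather than dropping the record
        return props;
    }
}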

B

 On 7 Aug 2015, at 05:49, Tao Feng fengta...@gmail.com wrote:
 
 Hi ,
 
 I am trying to understand the Kafka log flush behavior. My understanding is
 when the broker specifies broker config param log.flush.interval.ms, it
 will specify log config param flush.ms internally. In logManager logic,
 when the log exceed flush.ms, it will call Log.flush which will call
 FileChannel.force(true) and MappedByteBuffer.flush() .
 
 Couple of questions:
 1. I am wondering if the fsync operation is called by the last two routines
 internally?
 2. If log.flush.interval.ms is not specified, is it true that Kafka let OS
 to handle pagecache flush in background?
 3. If we specify ack=1 and ack=-1 in new producer, do those request only
 persist in pagecache or actual disk?
 
 Thanks,
 -Tao



Re: Broker side consume-request filtering

2015-08-06 Thread Ben Stopford
Yes - this is basically what is termed selectors in JMS or routing-keys 
in AMQP. 

My guess is that a KIP to implement this kind of server side filtering would 
not make it through. Kafka is a producer-centric firehose. Server side 
filtering wouldn’t really fit well with the original design goals. There are 
other broker-centric solutions out there already too that do this kind of thing.

Client side filtering seems more plausible architecturally, but would it really 
add much? You’d still have to supply a custom filter.
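A client-side filter is really just a predicate applied after poll(). As a
sketch (the broker address, group, topic and the key being matched are all
placeholders):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class FilteringConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "filtering-consumer");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    // The "selector": drop everything whose key we are not interested in
                    if ("interesting".equals(record.key())) {
                        System.out.println(record.value());
                    }
                }
            }
        }
    }
}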

Having said all that, I may be completely wrong, I’m relatively new to this 
group. If anyone else has a more complete view please chip in. 

You could also promote the question to the dev group to reach a wider audience. 

B

 

 On 6 Aug 2015, at 18:55, Alvaro Gareppe agare...@gmail.com wrote:
 
 Is this discussion open ? Cause this is exactly what I’m looking for.