See
https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example

Thanks,

Jun


On Wed, Oct 2, 2013 at 9:42 PM, Jason Rosenberg <j...@squareup.com> wrote:

> Jun,
>
> Thanks, can you point me to the client code to issue a metadata request!
>
> Jason
>
>
> On Thu, Oct 3, 2013 at 12:24 AM, Jun Rao <jun...@gmail.com> wrote:
>
> > It's fixable. Since we plan to rewrite the consumer client code in the
> near
> > future, it could be considered at that point.
> >
> > If you issue a metadata request with an empty topic list, you will get
> back
> > the metadata of all topics.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, Oct 2, 2013 at 1:28 PM, Jason Rosenberg <j...@squareup.com>
> wrote:
> >
> > > How hard would it be to fix this issue, where we have a topic filter
> that
> > > matches multiple topics, for the load to be distributed over multiple
> > > threads, and over multiple consumers?  For some reason, I had thought
> > this
> > > issue was fixed in 0.8, but I guess not?
> > >
> > > I am currently using a single partition, for multiple topics.  I worry
> > that
> > > it won't scale ultimately to only ever have one thread on one consumer
> > > doing all the work......We could move to multiple partitions, but for
> > > ordering reasons in some use cases, this is not always ideal.
> > >
> > > Perhaps I can come up with some sort of dynamic topic sniffer, and have
> > it
> > > evenly divide the available topics between the available consumers (and
> > > threads per consumer)!  Is there a simple api within the kafka client
> > code,
> > > for getting the list of topics?
> > >
> > > Jason
> > >
> > >
> > > On Fri, Aug 30, 2013 at 11:41 PM, Jun Rao <jun...@gmail.com> wrote:
> > >
> > > > It seems to me option 1) is easer. Option 2) has the same issue as
> > option
> > > > 1) since you have to manage different while lists.
> > > >
> > > > A more general solution is probably to change the consumer
> distribution
> > > > model to divide partitions across topics. That way, one can create as
> > > many
> > > > streams as total # partitions for all topics. We can look into that
> in
> > > the
> > > > future.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Fri, Aug 30, 2013 at 8:24 AM, Rajasekar Elango <
> > > rela...@salesforce.com
> > > > >wrote:
> > > >
> > > > > Yeah. The actual bottleneck is actually number of topics that match
> > the
> > > > > topic filter. Num of streams is going be shared between all topics
> > it's
> > > > > consuming from. I thought about following ideas to work around
> this.
> > (I
> > > > am
> > > > > basically referring to mirrormaker consumer in examples).
> > > > >
> > > > > Option 1). Instead of running one mirrormaker process with topic
> > filter
> > > > > ".+", We can start multiple mirrormaker process with topic filter
> > > > matching
> > > > > each topic (Eg: mirrormaker1 => whitelist topic1.* , mirrormaker2
> > > > > => whitelist topic2.* etc)
> > > > >
> > > > > But this adds some operations overhead to start and manage multiple
> > > > > processes on the host.
> > > > >
> > > > > Option 2) Modify mirrormaker code to support list of whitelist
> > filters
> > > > and
> > > > > it should create message streams for  each filter
> > > > > (call createMessageStreamsByFilter for each filter).
> > > > >
> > > > > What would be your recommendation..? If adding feature to
> mirrormaker
> > > is
> > > > > worth kafka, we can do option 2.
> > > > >
> > > > > Thanks,
> > > > > Raja.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Aug 30, 2013 at 10:34 AM, Jun Rao <jun...@gmail.com>
> wrote:
> > > > >
> > > > > > Right, but if you set #partitions in each topic to 16, you can
> use
> > a
> > > > > total
> > > > > > of 16 streams.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Thu, Aug 29, 2013 at 9:08 PM, Rajasekar Elango <
> > > > > rela...@salesforce.com
> > > > > > >wrote:
> > > > > >
> > > > > > > With option 1) I can't really use 8 streams in each consumer,
> If
> > I
> > > do
> > > > > > only
> > > > > > > one consumer seem to be doing all work. So I had to actually
> use
> > > > total
> > > > > 8
> > > > > > > streams with 4 for each consumer.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Aug 30, 2013 at 12:01 AM, Jun Rao <jun...@gmail.com>
> > > wrote:
> > > > > > >
> > > > > > > > The drawback of 2), as you said is no auto failover. I was
> > > > suggesting
> > > > > > > that
> > > > > > > > you use 16 partitions. Then you can use option 1) with 8
> > streams
> > > in
> > > > > > each
> > > > > > > > consumer.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Aug 29, 2013 at 8:51 PM, Rajasekar Elango <
> > > > > > > rela...@salesforce.com
> > > > > > > > >wrote:
> > > > > > > >
> > > > > > > > > Hi Jun,
> > > > > > > > >
> > > > > > > > > If you read my previous posts, based on current re
> balancing
> > > > logic,
> > > > > > if
> > > > > > > we
> > > > > > > > > consumer from topic filter, consumer actively use all
> > streams.
> > > > Can
> > > > > > you
> > > > > > > > > provide your recommendation of option 1 vs option 2 in my
> > > > previous
> > > > > > > post?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Raja.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Aug 29, 2013 at 11:42 PM, Jun Rao <
> jun...@gmail.com>
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > You can always use more partitions to get more
> parallelism
> > in
> > > > the
> > > > > > > > > > consumers.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Jun
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Thu, Aug 29, 2013 at 12:44 PM, Rajasekar Elango
> > > > > > > > > > <rela...@salesforce.com>wrote:
> > > > > > > > > >
> > > > > > > > > > > So what is best way to load balance multiple consumers
> > > > > consuming
> > > > > > > from
> > > > > > > > > > topic
> > > > > > > > > > > filter.
> > > > > > > > > > >
> > > > > > > > > > > Let's say we have 4 topics with 8 partitions and 2
> > > consumers.
> > > > > > > > > > >
> > > > > > > > > > > Option 1) To load balance consumers, we can set
> > > num.streams=4
> > > > > so
> > > > > > > that
> > > > > > > > > > both
> > > > > > > > > > > consumers split 8 partitions. but can only use half of
> > > > consumer
> > > > > > > > > streams.
> > > > > > > > > > >
> > > > > > > > > > > Option 2) Configure mutually exclusive topic filter
> regex
> > > > such
> > > > > > > that 2
> > > > > > > > > > > topics will match consumer1 and 2 topics will match
> > > > consumer2.
> > > > > > Now
> > > > > > > we
> > > > > > > > > can
> > > > > > > > > > > set num.streams=8 and fully utilize consumer streams. I
> > > > believe
> > > > > > > this
> > > > > > > > > will
> > > > > > > > > > > improve performance, but if consumer dies, we will not
> > get
> > > > any
> > > > > > data
> > > > > > > > > from
> > > > > > > > > > > the topic used by that consumer.
> > > > > > > > > > >
> > > > > > > > > > > What would be your recommendation?
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Raja.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Aug 29, 2013 at 12:42 PM, Neha Narkhede <
> > > > > > > > > neha.narkh...@gmail.com
> > > > > > > > > > > >wrote:
> > > > > > > > > > >
> > > > > > > > > > > > >> 2) When I started mirrormaker with num.streams=16,
> > > looks
> > > > > > like
> > > > > > > 16
> > > > > > > > > > > > consumer
> > > > > > > > > > > > threads were created, but only 8 are showing up as
> > active
> > > > as
> > > > > > > owner
> > > > > > > > in
> > > > > > > > > > > > consumer offset tracker and all topics/partitions are
> > > > > > distributed
> > > > > > > > > > > between 8
> > > > > > > > > > > > consumer threads.
> > > > > > > > > > > >
> > > > > > > > > > > > This is because currently the consumer rebalancing
> > > process
> > > > of
> > > > > > > > > assigning
> > > > > > > > > > > > partitions to consumer streams is at a per topic
> level.
> > > > > Unless
> > > > > > > you
> > > > > > > > > have
> > > > > > > > > > > at
> > > > > > > > > > > > least one topic with 16 partitions, the remaining 8
> > > threads
> > > > > > will
> > > > > > > > not
> > > > > > > > > do
> > > > > > > > > > > any
> > > > > > > > > > > > work. This is not ideal and we want to look into a
> > better
> > > > > > > > rebalancing
> > > > > > > > > > > > algorithm. Though it is a big change and we prefer
> > doing
> > > it
> > > > > as
> > > > > > > part
> > > > > > > > > of
> > > > > > > > > > > the
> > > > > > > > > > > > consumer client rewrite.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Neha
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Aug 29, 2013 at 8:03 AM, Rajasekar Elango <
> > > > > > > > > > > rela...@salesforce.com
> > > > > > > > > > > > >wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > So my understanding is num of active streams that a
> > > > > consumer
> > > > > > > can
> > > > > > > > > > > utilize
> > > > > > > > > > > > is
> > > > > > > > > > > > > number of partitions in topic. This is fine if we
> > > > consumer
> > > > > > from
> > > > > > > > > > > specific
> > > > > > > > > > > > > topic. But if we consumer from TopicFilter, I
> thought
> > > > > > consumer
> > > > > > > > > should
> > > > > > > > > > > > able
> > > > > > > > > > > > > to utilize (number of topics that match filter *
> > number
> > > > of
> > > > > > > > > partitions
> > > > > > > > > > > in
> > > > > > > > > > > > > topic) . But looks like number of streams that
> > consumer
> > > > can
> > > > > > use
> > > > > > > > is
> > > > > > > > > > > > limited
> > > > > > > > > > > > > by just number if partitions in topic although it's
> > > > > consuming
> > > > > > > > from
> > > > > > > > > > > > multiple
> > > > > > > > > > > > > topic.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Here what I observed with 1 mirrormaker consuming
> > from
> > > > > > > whitelist
> > > > > > > > > > '.+'.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The white list matches 5 topics and each topic has
> 8
> > > > > > > partitions.
> > > > > > > > I
> > > > > > > > > > used
> > > > > > > > > > > > > consumer offset checker to look at owner of
> > each/topic
> > > > > > > partition.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1) When I started mirrormaker with num.streams=8,
> all
> > > > > > > > > > topics/partitions
> > > > > > > > > > > > are
> > > > > > > > > > > > > distributed between 8 consumer threads.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2) When I started mirrormaker with num.streams=16,
> > > looks
> > > > > like
> > > > > > > 16
> > > > > > > > > > > consumer
> > > > > > > > > > > > > threads were created, but only 8 are showing up as
> > > active
> > > > > as
> > > > > > > > owner
> > > > > > > > > in
> > > > > > > > > > > > > consumer offset tracker and all topics/partitions
> are
> > > > > > > distributed
> > > > > > > > > > > > between 8
> > > > > > > > > > > > > consumer threads.
> > > > > > > > > > > > >
> > > > > > > > > > > > > So this could be bottleneck for consumers as
> although
> > > we
> > > > > > > > > partitioned
> > > > > > > > > > > > topic,
> > > > > > > > > > > > > if we are consuming from topic filter it can't
> > utilize
> > > > much
> > > > > > of
> > > > > > > > > > > > parallelism
> > > > > > > > > > > > > with num of streams. Am i missing something, is
> > there a
> > > > way
> > > > > > to
> > > > > > > > make
> > > > > > > > > > > > > cosumers/mirrormakers to utilize more number of
> > active
> > > > > > streams?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Raja.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Raja.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Thanks,
> > > > > > > > > Raja.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Thanks,
> > > > > > > Raja.
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Thanks,
> > > > > Raja.
> > > > >
> > > >
> > >
> >
>

Reply via email to