The bottleneck in this case is writing to HBase. You can use a thread pool to
do the HBase writes, create another pool of consumer threads, or add more
consumer processes.
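
A minimal sketch of what I mean (old HTable API; the table name, column family
and key scheme below are made up, and real code needs proper error handling).
Each Kafka consumer thread just calls write() instead of talking to HBase
itself:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.LinkedBlockingQueue;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseWriterPool {
        // Consumer threads enqueue raw itinerary bytes here instead of
        // writing to HBase directly; a bounded queue applies back pressure.
        private final BlockingQueue<byte[]> queue =
            new LinkedBlockingQueue<byte[]>(100000);
        private final Configuration conf = HBaseConfiguration.create();

        public HBaseWriterPool(int numWriters) {
            ExecutorService writers = Executors.newFixedThreadPool(numWriters);
            for (int i = 0; i < numWriters; i++) {
                writers.submit(new Runnable() {
                    public void run() {
                        try {
                            // HTable is not thread safe, so each writer owns one.
                            HTable table = new HTable(conf, "itineraries");
                            table.setAutoFlush(false);    // buffer puts client side
                            List<byte[]> batch = new ArrayList<byte[]>();
                            while (true) {
                                batch.clear();
                                batch.add(queue.take());    // block for at least one
                                queue.drainTo(batch, 999);  // then take up to a batch
                                for (byte[] value : batch) {
                                    Put put = new Put(rowKeyFor(value));
                                    put.add(Bytes.toBytes("d"), Bytes.toBytes("itin"), value);
                                    table.put(put);
                                }
                                table.flushCommits();       // one flush per batch
                            }
                        } catch (Exception e) {
                            e.printStackTrace();            // retry/alert in real code
                        }
                    }
                });
            }
        }

        // Called by each Kafka consumer thread; blocks if HBase falls behind.
        public void write(byte[] itinerary) throws InterruptedException {
            queue.put(itinerary);
        }

        private static byte[] rowKeyFor(byte[] value) {
            return value;    // placeholder key scheme, application specific
        }
    }

The bounded queue is deliberate: when HBase falls behind, the consumer threads
slow down instead of buffering messages without limit.
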
Regards,

Libo


-----Original Message-----
From: Graeme Wallace [mailto:graeme.wall...@farecompare.com] 
Sent: Wednesday, October 02, 2013 4:36 PM
To: users
Subject: Re: Strategies for improving Consumer throughput

Yes, the consumers are definitely behind - we can see that from examining the offsets.


On Wed, Oct 2, 2013 at 1:59 PM, Joe Stein <crypt...@gmail.com> wrote:

> Are you sure the consumers are behind? Could the pause be because the
> stream is empty, i.e. producing messages is what is behind the consumption?
>
> If you shut off your consumers for 5 minutes and then start them again,
> do the consumers behave the same way?
>
> /*******************************************
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> ********************************************/
>
>
> On Wed, Oct 2, 2013 at 3:54 PM, Graeme Wallace <graeme.wall...@farecompare.com> wrote:
>
> > Hi All,
> >
> > We've got processes that produce many millions of itineraries per minute.
> > We would like to get them into HBase (so we can query for chunks of them
> > later), so our idea was to write each itinerary as a message into Kafka.
> > That way we can have not only consumers that write to HBase, but also
> > other consumers that provide some sort of real-time monitoring service
> > and an archive service.
> >
> > Problem is, we don't really know enough about how best to do this
> > effectively with Kafka so that the producers can run flat out and the
> > consumers can run flat out too. We've tried having one topic with
> > multiple partitions to match the spindles on our broker h/w (12 on each)
> > and setting up a thread per partition on the consumer side.
> >
> > At the moment, our particular problem is that the consumers just can't
> > keep up. We can see from logging that the consumer threads seem to run
> > in bursts, then pause (as yet we don't know what the pause is - we don't
> > think it's GC). Anyway, does what we are doing with one topic and
> > multiple partitions sound correct? Or do we need to change? Any tricks
> > to speed up consumption? (We've tried changing the fetch size - it
> > doesn't help much.) Am I correct in assuming we can have one thread per
> > partition for consumption?
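> >
> > For concreteness, the thread-per-partition pattern we mean is roughly
> > this (0.8 high-level consumer; the topic, group and handle() are
> > placeholders):
> >
> >     import java.util.Collections;
> >     import java.util.List;
> >     import java.util.Properties;
> >     import java.util.concurrent.ExecutorService;
> >     import java.util.concurrent.Executors;
> >     import kafka.consumer.Consumer;
> >     import kafka.consumer.ConsumerConfig;
> >     import kafka.consumer.KafkaStream;
> >     import kafka.javaapi.consumer.ConsumerConnector;
> >     import kafka.message.MessageAndMetadata;
> >
> >     public class ItineraryConsumer {
> >         public static void main(String[] args) {
> >             Properties props = new Properties();
> >             props.put("zookeeper.connect", "zk1:2181");    // placeholder
> >             props.put("group.id", "itinerary-consumers");  // placeholder
> >             ConsumerConnector connector =
> >                 Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
> >
> >             int numStreams = 12;  // matches the partition count per topic
> >             List<KafkaStream<byte[], byte[]>> streams = connector
> >                 .createMessageStreams(
> >                     Collections.singletonMap("itineraries", numStreams))
> >                 .get("itineraries");
> >
> >             // One thread per stream, so one thread per partition.
> >             ExecutorService pool = Executors.newFixedThreadPool(numStreams);
> >             for (final KafkaStream<byte[], byte[]> stream : streams) {
> >                 pool.submit(new Runnable() {
> >                     public void run() {
> >                         for (MessageAndMetadata<byte[], byte[]> msg : stream) {
> >                             handle(msg.message());  // HBase write happens here today
> >                         }
> >                     }
> >                 });
> >             }
> >         }
> >
> >         static void handle(byte[] itinerary) {
> >             // currently a synchronous HBase put; this is the slow part
> >         }
> >     }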
> >
> > Thanks in advance,
> >
> > Graeme
> >
> > --
> > Graeme Wallace
> > CTO
> > FareCompare.com
> > O: 972 588 1414
> > M: 214 681 9018
> >
>



--
Graeme Wallace
CTO
FareCompare.com
O: 972 588 1414
M: 214 681 9018
