Asaf, thanks for your explanation.  This actually makes complete sense, as
we have 2 replicas.  So the math works out when taking this into
consideration.
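
For anyone who finds this thread later, here is a rough sketch of the
arithmetic.  It assumes the broker bytes-out metric counts follower
(replication) fetches as well as consumer fetches, as Asaf describes; the
function name and parameters are made up for illustration and are not part of
any Kafka API.

```python
def expected_bytes_out(bytes_in, replication_factor, consumer_groups):
    """Estimate cluster-wide bytes out for a given bytes in.

    Assumption: every record is sent out once per follower replica
    (replication_factor - 1 copies) and once per consumer group.
    """
    replication_out = bytes_in * (replication_factor - 1)
    consumer_out = bytes_in * consumer_groups
    return replication_out + consumer_out


# Our setup: replication factor 2, one consumer per topic.
ratio = expected_bytes_out(10, replication_factor=2, consumer_groups=1) / 10
print(ratio)  # 2.0
```

With a replication factor of 2 and a single consumer, this gives the roughly
2x ratio we see under normal operations.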

Thanks!
Jorge

On Sat, Apr 16, 2016 at 9:32 PM, Asaf Mesika <asaf.mes...@gmail.com> wrote:

> Another thought: brokers also send incoming data out to replicas. So a
> 10-byte record will be written out once for replication and once more to a
> consumer, for 20 bytes out. Makes sense?
> On Thu, 14 Apr 2016 at 02:46 Jorge Rodriguez <jo...@bloomreach.com> wrote:
>
> > Thanks for your response Asaf.  I have 4 brokers.  These measurements are
> > from the kafka brokers.
> >
> > The measurement on this graph comes from Kafka.  It is a sum across all 4
> > brokers of the metric
> > kafka.server.BrokerTopicMetrics.BytesInPerSec.1MinuteRate.
> >
> > But I also have a system metric which I feed independently using the
> > collectd "interface" plugin.  The bytes out and in match the ones reported
> > by Kafka fairly well.  There is also a corresponding increase in network
> > packets sent.
> >
> > Also, on the Spark Streaming side, I can see that during these spikes, the
> > number of received packets and bytes also spikes.
> >
> > So during the spikes, I believe that some of the fetch requests are
> > perhaps failing and we hit a retry.  I am debugging that currently and I
> > think it's related to the STW GC which happens on Spark Streaming
> > occasionally.  Working on some GC tuning should alleviate this.
> >
> > However, even if this is the case, it would not explain why, under normal
> > operations, the number of bytes out is 2x the number of bytes in.  Since I
> > only have 1 consumer for each topic, I would expect the numbers to be
> > fairly close.  Do you
> >
> >
> >
> >
> > On Tue, Apr 12, 2016 at 8:31 PM, Asaf Mesika <asaf.mes...@gmail.com>
> > wrote:
> >
> > > Where exactly do you get the measurement from? Your broker? Do you have
> > > only one? Your producer? Your spark job?
> > > On Mon, 11 Apr 2016 at 23:54 Jorge Rodriguez <jo...@bloomreach.com> wrote:
> > >
> > > > We are running a Kafka cluster for our real-time pixel processing
> > > > pipeline.  The data is produced from our pixel servers into Kafka, and
> > > > then consumed by a Spark Streaming application.  Based on this, I would
> > > > expect that the bytes in vs bytes out should be roughly equal, as each
> > > > message should be consumed once.
> > > >
> > > > Under normal operations, the bytes out is a little less than 2X the
> > > > bytes in.  Does anyone know why this is?  We do use a replication
> > > > factor of 2.
> > > >
> > > > Occasionally, we get a spike in bytes out, but bytes in remain the
> > > > same (see image below).  This correlates with a significant delay in
> > > > processing time on the Spark Streaming side.
> > > >
> > > > Below is a chart of kafka reported bytes out vs in.  The system level
> > > > network metrics show the same information (transferred bytes spike).
> > > >
> > > > Could anyone provide some tips for debugging/getting to the bottom of
> > > > this issue?
> > > >
> > > > Thanks,
> > > > Jorge
> > > >
> > > > *Kafka reported Bytes in Per topic and for all topics vs Kafka bytes
> > > > out:*
> > > >
> > > > [image: Inline image 1]
> > > >
> > >
> >
>
