Hi Ismael,

Thanks again for sending that link yesterday! I tried it this morning,
and the change completely fixed the problem. The symptom we observed was
not increased CPU usage but a much larger memory heap requirement. Once
I set log.message.format.version to the version of our clients (the
exact change is sketched just after the list), the following occurred:

1. ISRs went to full replication for each partition
2. Memory heap usage went down by a factor of 6
3. Storm throughput went up by a factor of 5
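
For anyone who hits the same issue, the change itself is a one-liner in
the broker config (a sketch; adjust the version string to match whatever
your clients actually run):

    # server.properties: write messages in the older on-disk format so
    # 0.9.0.1 consumers no longer require down-conversion
    log.message.format.version=0.9.0

A rolling restart of the brokers is needed for the setting to take
effect.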

Our cluster looks great now. Thanks again for pointing me to the docs
where the config issue is described; much appreciated!

--John

On Mon, Jul 10, 2017 at 12:26 PM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi John,
>
> Yes, down conversion when consuming messages does increase JVM heap usage
> as we have to load the data into the JVM heap to convert it. If down
> conversion is not needed, we are able to send the data without copying it
> to the JVM heap.
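
That matches what we saw. To check my understanding, here is my mental
model in rough Java terms (purely illustrative; this is not Kafka's
actual code, and rewriteInOldFormat is a made-up placeholder):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.channels.WritableByteChannel;

    public class DownConversionSketch {

        // Fast path: formats match, so log segment bytes can go straight
        // from the file to the socket (sendfile) without ever being
        // copied onto the JVM heap.
        static void zeroCopySend(FileChannel log, WritableByteChannel socket,
                                 long offset, long count) throws IOException {
            log.transferTo(offset, count, socket);
        }

        // Slow path: format mismatch, so the bytes are read into a heap
        // buffer, rewritten in the older format, and sent as a copy.
        static void downConvertAndSend(FileChannel log, WritableByteChannel socket,
                                       long offset, int count) throws IOException {
            ByteBuffer onHeap = ByteBuffer.allocate(count); // heap allocation
            log.read(onHeap, offset);
            onHeap.flip();
            socket.write(rewriteInOldFormat(onHeap));
        }

        // Stand-in only; the real conversion is far more involved.
        static ByteBuffer rewriteInOldFormat(ByteBuffer in) {
            return in;
        }
    }

Doing that heap allocation for every fetch on every partition would
explain the heap growth we saw.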
>
> Ismael
>
> On Sun, Jul 9, 2017 at 4:23 PM, John Yost <hokiege...@gmail.com> wrote:
>
> > Hi Ismael,
> >
> > Gotcha, will do. Okay, in reading the docs you linked, that may explain
> > what we're seeing. When we upgraded to 0.10.0, we did not upgrade the
> > clients from 0.9.0.1, so while the message format is the default--in
> > this case, 0.10.0--the message format expected by the consumers is
> > pre-0.10.0. While I am not seeing increased CPU utilization, it appears
> > that the memory requirements for the brokers have changed with the
> > upgrade, given that I had to increase the broker memory heap size from
> > 6GB to 10GB to prevent out-of-memory errors.
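
For reference, we size the broker heap via KAFKA_HEAP_OPTS, which the
stock kafka-server-start.sh script picks up (a sketch; exact env wiring
depends on your deployment):

    export KAFKA_HEAP_OPTS="-Xms10g -Xmx10g"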
> >
> > Would the message format difference result in consumed and/or produced
> > messages piling up in a buffer and, consequently, increase the broker
> > memory heap requirement due to the format mismatch? That would be great,
> > because it would mean we just need to set log.message.format.version to
> > 0.9.0 until we upgrade the clients.
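
If memory serves, this can also be tried per topic first via the
topic-level message.format.version override (a sketch, assuming
0.10-era tooling and ZooKeeper on localhost:2181; my-topic is a
placeholder for one of our topics):

    bin/kafka-configs.sh --zookeeper localhost:2181 \
      --entity-type topics --entity-name my-topic \
      --alter --add-config message.format.version=0.9.0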
> >
> > --John
> >
> > On Sun, Jul 9, 2017 at 10:46 AM, Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > > Hi John,
> > >
> > > Please read the upgrade documentation for the relevant versions:
> > >
> > > http://kafka.apache.org/documentation.html#upgrade
> > >
> > > Also, let's try to keep the discussion in one thread. I asked some
> > > questions in the related "0.10.1 memory and garbage collection issues"
> > > thread that you started.
> > >
> > > Ismael
> > >
> > > On Sun, Jul 9, 2017 at 3:30 PM, John Yost <hokiege...@gmail.com>
> > > wrote:
> > >
> > > > Hi Everyone,
> > > >
> > > > Ever since we upgraded from 0.9.0.1 to 0.10.0, our five-node Kafka
> > > > cluster has been unstable. Specifically, whereas a 6GB memory heap
> > > > worked fine before, all five brokers crashed with out-of-memory
> > > > errors within an hour of the upgrade. I boosted the memory heap to
> > > > 10GB, which fixed the OOM errors, but now it appears the GC pauses
> > > > are preventing the cluster from maintaining more than one ISR. I
> > > > realize I could raise the replica lag settings to improve the ISR
> > > > numbers, but that's treating the symptom, not the root problem.
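
For anyone else hitting this: the Kafka ops docs suggest G1 settings
along these lines for the brokers, passed via KAFKA_JVM_PERFORMANCE_OPTS
(values illustrative, not a recommendation for every workload):

    -XX:+UseG1GC -XX:MaxGCPauseMillis=20
    -XX:InitiatingHeapOccupancyPercent=35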
> > > >
> > > > There appears to be a change in the memory requirements somewhere
> > > > in the Kafka stack--it could be on the producer side as well--but I
> > > > want to rule out any configuration issues on the broker side. Is
> > > > anyone aware of any 0.9 defaults in particular that I should change
> > > > for 0.10.x to resolve the root problem(s) behind these observations?
> > > >
> > > > --John
> > > >
> > >
> >
>
