Do you see the data loss warning after a controlled shutdown? It isn't clear from your original message whether the warning is associated with a shutdown operation.
We have a test setup similar to what you are describing - i.e., continuous
rolling bounces of a test cluster (while there is traffic flowing into it
through mirror makers). For each broker: wait until the under-replicated-partition
count on every broker is zero, then proceed to do a controlled shutdown of
that broker (a rough sketch of such a loop is appended after the quoted
thread below).

Thanks,

Joel

On Wed, Apr 09, 2014 at 09:02:45AM -0400, Alex Gray wrote:
> Thanks Joel and Guozhang!
> The data retention is 72 hours.
> Graceful shutdown is done via SIGTERM, and
> controlled.shutdown.enable=true is in the config.
> I do see 'Controlled shutdown succeeded' in the broker log when I
> shut it down.
>
> With both your responses, I feel as if the brokers are indeed set up and
> functioning correctly.
>
> I want to ask the developers if I can write a script that
> gracefully restarts each broker randomly throughout the entire day,
> 24/7 :)
>
> That should weed out any issues.
>
> Thanks guys,
>
> Alex
>
>
> On Tue Apr 8 20:38:15 2014, Joel Koshy wrote:
> >Also, when you say "graceful shutdown" do you mean you issue SIGTERM? Do
> >you have controlled.shutdown.enable=true in the broker config? If that
> >is set and the controlled shutdown succeeds (i.e., if you see
> >'Controlled shutdown succeeded' in the broker log) then you shouldn't
> >be seeing the data loss warning in your controller log during the
> >shutdown and restarts. Or are you seeing it at other times as well?
> >
> >WRT the OffsetOutOfRangeException: is your broker down for a long
> >period? Do you have a very low retention setting for your topics? Or
> >are you bringing up a consumer that has been down for a long period?
> >
> >Thanks,
> >
> >Joel
> >
> >On Tue, Apr 08, 2014 at 04:58:08PM -0700, Guozhang Wang wrote:
> >>Hi Alex,
> >>
> >>1. There is no "cool-off" time, since the rebalance should be done before
> >>the server completes shutdown.
> >>
> >>2. The logs are indicating there is possible data loss, which is "expected"
> >>if your producer's request.required.acks config is <= 1 but not == -1. If you do not
> >>want data loss, you can change that config value in your producer clients
> >>to be > 1, which will effectively trade some latency and availability for
> >>consistency.
> >>
> >>Guozhang
> >>
> >>
> >>On Tue, Apr 8, 2014 at 9:51 AM, Alex Gray <alex.g...@inin.com> wrote:
> >>
> >>>We have 3 Zookeepers and 3 Kafka brokers, version 0.8.0.
> >>>
> >>>I gracefully shut down one of the Kafka brokers.
> >>>
> >>>Question 1: Should I wait some time before starting the broker back up,
> >>>or can I restart it as soon as possible? In other words, do I have to wait
> >>>for the other brokers to "re-balance (or whatever they do)" before starting
> >>>it back up?
> >>>
> >>>Question 2: Every once in a while, I get the following exception when the
> >>>Kafka broker is starting up. Is this bad? Searching around the
> >>>newsgroups, I could not get a definitive answer. Examples:
> >>>http://grokbase.com/t/kafka/users/13cq54bx5q/understanding-offsetoutofrangeexceptions
> >>>http://grokbase.com/t/kafka/users/1413hp296y/trouble-recovering-after-a-crashed-broker
> >>>
> >>>Here is the exception:
> >>>[2014-04-08 00:02:40,555] ERROR [KafkaApi-3] Error when processing fetch
> >>>request for partition [KeyPairGenerated,0] offset 514 from consumer with
> >>>correlation id 85 (kafka.server.KafkaApis)
> >>>kafka.common.OffsetOutOfRangeException: Request for offset 514 but we
> >>>only have log segments in the range 0 to 0.
> >>>        at kafka.log.Log.read(Log.scala:429)
> >>>        at kafka.server.KafkaApis.kafka$server$KafkaApis$$readMessageSet(KafkaApis.scala:388)
> >>>        at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1.apply(KafkaApis.scala:334)
> >>>        at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1.apply(KafkaApis.scala:330)
> >>>        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
> >>>        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
> >>>        at scala.collection.immutable.Map$Map1.foreach(Map.scala:105)
> >>>        at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
> >>>        at scala.collection.immutable.Map$Map1.map(Map.scala:93)
> >>>        at kafka.server.KafkaApis.kafka$server$KafkaApis$$readMessageSets(KafkaApis.scala:330)
> >>>        at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:296)
> >>>        at kafka.server.KafkaApis.handle(KafkaApis.scala:66)
> >>>        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)
> >>>        at java.lang.Thread.run(Thread.java:722)
> >>>
> >>>And in the controller.log, I see every once in a while something like:
> >>>
> >>>controller.log.2014-04-01-04:[2014-04-01 04:42:41,713] WARN [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [KeyPairGenerated,0]. Elect leader 3 from live brokers 3. There's potential data loss. (kafka.controller.OfflinePartitionLeaderSelector)
> >>>
> >>>(Which I did via: grep "data loss" *)
> >>>
> >>>I'm not a programmer: I am the admin for these machines, and I just want
> >>>to make sure everything is cool.
> >>>Oh, the server.properties has:
> >>>default.replication.factor=3
> >>>
> >>>Thanks,
> >>>
> >>>Alex
> >>>
> >>>
> >>
> >>
> >>--
> >>-- Guozhang
> >
> >
>
> --
> *Alex Gray* | DevOps Engineer, PureCloud
> Phone +1.317.493.4291 | mobile +1.857.636.2810
> *Interactive Intelligence*
> Deliberately Innovative
> www.inin.com <http://www.inin.com/>
>
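For anyone wanting to automate the rolling bounce described at the top of
this message, here is a minimal sketch of that loop. It is not from the
original thread: urp_count, stop_broker, and start_broker are hypothetical
placeholders you would wire up to your own tooling (for example reading the
broker's UnderReplicatedPartitions gauge exposed over JMX by ReplicaManager,
and sending SIGTERM to the broker process).

# Minimal sketch, assuming three brokers and the placeholder helpers below.
# The only logic it encodes is Joel's procedure: never touch the next broker
# until every broker reports zero under-replicated partitions, bounce one
# broker at a time via SIGTERM (controlled shutdown), and wait for the
# cluster to settle before moving on.
import time

BROKERS = ["kafka1", "kafka2", "kafka3"]   # assumption: your broker hosts

def urp_count(broker):
    """Placeholder: return the broker's UnderReplicatedPartitions count,
    e.g. read over JMX from ReplicaManager or from your monitoring system."""
    raise NotImplementedError

def stop_broker(broker):
    """Placeholder: send SIGTERM to the Kafka process on `broker` so that
    controlled shutdown (controlled.shutdown.enable=true) migrates leaders."""
    raise NotImplementedError

def start_broker(broker):
    """Placeholder: start the Kafka process on `broker` again."""
    raise NotImplementedError

def wait_for_zero_urp(poll_seconds=10):
    # Block until *every* broker reports zero under-replicated partitions,
    # i.e. all replicas are back in the ISR.
    while any(urp_count(b) > 0 for b in BROKERS):
        time.sleep(poll_seconds)

for broker in BROKERS:
    wait_for_zero_urp()     # only take a broker down when the cluster is healthy
    stop_broker(broker)
    start_broker(broker)
    wait_for_zero_urp()     # let the restarted broker catch back up

Run repeatedly (e.g. from cron) this gives the continuous 24/7 bounce Alex
mentions; the property that matters is that at most one broker is ever down
and all replicas are back in the ISR before the next bounce starts.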