Thanks Joel and Guozhang!
The data retention is 72 hours.
Graceful shutdown is done via SIGTERM, and controlled.shutdown.enable=true is in the config. I do see 'Controlled shutdown succeeded' in the broker log when I shut it down.

With both your responses, I feel the brokers are indeed set up and functioning correctly.

I want to ask the developers if I can write a script that gracefully restarts each broker at random times throughout the day, 24/7 :)

That should weed out any issues.
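Something along these lines would do it; this is only a rough sketch, and the host names, ssh access, and service commands are assumptions about how the brokers are actually managed:

    import scala.sys.process._
    import scala.util.Random

    object RollingRestart {
      // Assumed broker host names and service commands; adjust to the real setup.
      val brokers  = Seq("kafka1", "kafka2", "kafka3")
      val stopCmd  = "sudo service kafka stop"   // must deliver SIGTERM so controlled shutdown runs
      val startCmd = "sudo service kafka start"

      def main(args: Array[String]): Unit = {
        while (true) {
          val broker = brokers(Random.nextInt(brokers.size))
          // Graceful stop, give the cluster a minute to re-elect leaders, then start again.
          Seq("ssh", broker, stopCmd).!
          Thread.sleep(60L * 1000)
          Seq("ssh", broker, startCmd).!
          // Wait a random 30-120 minutes before restarting the next broker.
          Thread.sleep((30 + Random.nextInt(91)) * 60L * 1000)
        }
      }
    }

A real version should probably also verify that the restarted broker has rejoined the ISR for its partitions before touching the next one.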

Thanks guys,

Alex


On Tue Apr  8 20:38:15 2014, Joel Koshy wrote:
Also, when you say "graceful shutdown" do you mean you issue SIGTERM? Do
you have controlled.shutdown.enable=true in the broker config? If that
is set and the controlled shutdown succeeds (i.e., if you see
'Controlled shutdown succeeded' in the broker log) then you shouldn't
be seeing the data loss warning in your controller log during the
shutdown and restarts. Or are you seeing it at other times as well?

WRT the OffsetOutOfRangeException: is your broker down for a long
period? Do you have a very low retention setting for your topics? Or
are you bringing up a consumer that has been down for a long period?

Thanks,

Joel

On Tue, Apr 08, 2014 at 04:58:08PM -0700, Guozhang Wang wrote:
Hi Alex,

1. There is no "cool-off" time, since the rebalance should be done before
the server completes shutdown.

2. The logs indicate there is possible data loss, which is "expected"
if your producer's request.required.acks config is <= 1 but not == -1. If you do
not want data loss, you can change that config value in your producer clients
to be > 1 (or -1, which waits for all in-sync replicas), which will effectively
trade some latency and availability for consistency.
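
For reference, a minimal sketch of that setting, assuming the producers use the Scala producer API that ships with 0.8 (broker list, topic, and message are placeholders):

    import java.util.Properties
    import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

    object DurableProducerExample {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("metadata.broker.list", "broker1:9092,broker2:9092,broker3:9092")
        props.put("serializer.class", "kafka.serializer.StringEncoder")
        // -1 waits for all in-sync replicas to acknowledge each message,
        // trading some latency/availability for not losing acknowledged data.
        props.put("request.required.acks", "-1")

        val producer = new Producer[String, String](new ProducerConfig(props))
        producer.send(new KeyedMessage[String, String]("KeyPairGenerated", "some-key", "some-value"))
        producer.close()
      }
    }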

Guozhang


On Tue, Apr 8, 2014 at 9:51 AM, Alex Gray <alex.g...@inin.com> wrote:

We have 3 Zookeepers and 3 Kafka Brokers, version 0.8.0.

I gracefully shut down one of the Kafka brokers.

Question 1:  Should I wait some time before starting the broker back up,
or can I restart it as soon as possible?  In other words, do I have to wait
for the other brokers to "re-balance (or whatever they do)" before starting
it back up?

Question 2: Every once in a while, I get the following exception when the
kafka broker is starting up.  Is this bad?  Searching around the
newsgroups, I could not get a definitive answer. Example:
http://grokbase.com/t/kafka/users/13cq54bx5q/understanding-offsetoutofrangeexceptions
http://grokbase.com/t/kafka/users/1413hp296y/trouble-recovering-after-a-crashed-broker

Here is the exception:
[2014-04-08 00:02:40,555] ERROR [KafkaApi-3] Error when processing fetch request for partition [KeyPairGenerated,0] offset 514 from consumer with correlation id 85 (kafka.server.KafkaApis)
kafka.common.OffsetOutOfRangeException: Request for offset 514 but we only have log segments in the range 0 to 0.
     at kafka.log.Log.read(Log.scala:429)
     at kafka.server.KafkaApis.kafka$server$KafkaApis$$readMessageSet(KafkaApis.scala:388)
     at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1.apply(KafkaApis.scala:334)
     at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1.apply(KafkaApis.scala:330)
     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
     at scala.collection.immutable.Map$Map1.foreach(Map.scala:105)
     at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
     at scala.collection.immutable.Map$Map1.map(Map.scala:93)
     at kafka.server.KafkaApis.kafka$server$KafkaApis$$readMessageSets(KafkaApis.scala:330)
     at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:296)
     at kafka.server.KafkaApis.handle(KafkaApis.scala:66)
     at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)
     at java.lang.Thread.run(Thread.java:722)

And in the controller.log, I see every once in a while something like:

controller.log.2014-04-01-04:[2014-04-01 04:42:41,713] WARN [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [KeyPairGenerated,0]. Elect leader 3 from live brokers 3. There's potential data loss. (kafka.controller.OfflinePartitionLeaderSelector)

(I found that via: grep "data loss" *)

I'm not a programmer: I am the admin for these machines, and I just want
to make sure everything is cool.
Oh, the server.properties has:
default.replication.factor=3

Thanks,

Alex




--
-- Guozhang


--
*Alex Gray* | DevOps Engineer, PureCloud
Phone +1.317.493.4291 | mobile +1.857.636.2810
*Interactive Intelligence*
Deliberately Innovative
www.inin.com
