Good morning Jun. A correction regarding the open file handle limit: I was wrong. I re-ran the command ulimit -Hn and it shows 10240. Which brings me to the next question: how do I calculate the number of open file handles Kafka requires? What do you use for this setting?
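The arithmetic behind Jun's explanation can be sketched roughly. With 150 topics × 36 partitions × replication factor 3 spread over 3 brokers, each broker hosts about 150 × 36 × 3 / 3 = 5,400 partition replicas, not counting the 4 original topics. If every replica keeps at least its active segment file open, that is already at least 5,400 descriptors per broker. Rolled segments, sockets and JVM jars all add to that. The function below is a hedged sketch under those assumptions, not Kafka's own formula. `estimate_fds`, the per-partition segment count, the files-per-segment factor and the overhead allowance are hypothetical knobs, not Kafka settings. Also note that the limit a running process actually hits is the soft limit (ulimit -Sn). ulimit -Hn only shows the ceiling the soft limit may be raised to.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: rough per-broker open-file estimate for partition data.
# ASSUMPTIONS: replicas are spread evenly across brokers; every segment file
# stays open; files_per_segment=2 is a pessimistic bound that also counts an
# .index file per segment; overhead is a guess for sockets, jars, etc.
# Usage: estimate_fds TOPICS PARTITIONS REPLICATION BROKERS [SEGMENTS] [FILES_PER_SEGMENT] [OVERHEAD]
estimate_fds() {
  local topics=$1 partitions=$2 replication=$3 brokers=$4
  local segments=${5:-1} files_per_segment=${6:-1} overhead=${7:-0}
  # Partition replicas hosted by one broker (integer division, rounded up).
  local replicas_per_broker=$(( (topics * partitions * replication + brokers - 1) / brokers ))
  echo $(( replicas_per_broker * segments * files_per_segment + overhead ))
}

# Numbers from this thread: 150 topics, 36 partitions, 3 replicas, 3 brokers.
estimate_fds 150 36 3 3        # prints 5400 (one segment per partition)
estimate_fds 150 36 3 3 1 2    # prints 10800 (also counting one .index per segment)
```

Because segments keep rolling as data arrives, an estimate like this is a floor. It should be combined with the monitoring Jun suggests rather than used to pick an exact limit.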
Thanks,
Vadim

On Wed, Aug 14, 2013 at 8:19 AM, Vadim Keylis <vkeylis2...@gmail.com> wrote:
> Good morning Jun. We are using Kafka 0.8, which I built from trunk in June
> or early July. I forgot to mention that running ulimit on the hosts shows
> the open file handle limit set to unlimited. What are the ways to recover
> from the last error and restart Kafka? How can I delete a topic while the
> Kafka service is down on all hosts? How many topics can Kafka support
> without hitting the too-many-open-files exception? What did you set the
> open file handle limit to in your cluster?
>
> Thanks so much,
> Vadim
>
> Sent from my iPhone
>
> On Aug 14, 2013, at 7:38 AM, Jun Rao <jun...@gmail.com> wrote:
>
> > The first error is caused by too many open file handles. Kafka keeps each
> > of the segment files open on the broker. So, the more topics/partitions
> > you have, the more file handles you need. You probably need to increase
> > the open file handle limit and also monitor the number of open file
> > handles so that you can get an alert when it gets close to the limit.
> >
> > Not sure why you get the second error on restart. Are you using the 0.8
> > beta1 release?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Tue, Aug 13, 2013 at 11:04 PM, Vadim Keylis <vkeylis2...@gmail.com> wrote:
> >
> >> We have a 3-node Kafka cluster. I initially created 4 topics.
> >> I wrote a small shell script to create 150 topics:
> >>
> >> # $1: file with whitespace-separated topic names; $2: ZooKeeper host
> >> TOPICS=$(< "$1")
> >> for topic in $TOPICS
> >> do
> >>   echo "/usr/local/kafka/bin/kafka-create-topic.sh --replica 3 --topic $topic --zookeeper $2:2181/kafka --partition 36"
> >>   /usr/local/kafka/bin/kafka-create-topic.sh --replica 3 --topic "$topic" --zookeeper "$2:2181/kafka" --partition 36
> >> done
> >>
> >> Ten minutes later I saw messages like this:
> >> [2013-08-13 11:43:58,944] INFO [ReplicaFetcherManager on broker 7] Removing fetcher for partition [m3_registration,0] (kafka.server.ReplicaFetcherManager)
> >> followed by
> >> [2013-08-13 11:44:00,067] WARN [ReplicaFetcherThread-0-8], error for partition [m3_registration,22] to broker 8 (kafka.server.ReplicaFetcherThread)
> >> kafka.common.NotLeaderForPartitionException
> >>
> >> A few minutes later came the following messages, which overwhelmed the
> >> logging system:
> >> [2013-08-13 11:46:35,916] ERROR error in loggedRunnable (kafka.utils.Utils$)
> >> java.io.FileNotFoundException: /home/kafka/data7/replication-offset-checkpoint.tmp (Too many open files)
> >>         at java.io.FileOutputStream.open(Native Method)
> >>         at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
> >>
> >> I restarted the service after discovering the problem. After a few
> >> minutes of attempting to recover, the Kafka service crashed with the
> >> following error:
> >>
> >> [2013-08-13 17:20:08,953] INFO [Log Manager on Broker 7] Loading log 'm3_registration-29' (kafka.log.LogManager)
> >> [2013-08-13 17:20:08,992] FATAL Fatal error during KafkaServerStable startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
> >> java.lang.IllegalStateException: Found log file with no corresponding index file.
> >>
> >> There was no activity on the cluster after the topics were added.
> >> What could have caused the crash and triggered the too-many-open-files
> >> exception?
> >> What is the best way to recover so that the Kafka service can be
> >> restarted? (I am not sure the delete topic command will work in this
> >> particular case, since none of the 3 services will start.) How can we
> >> prevent this in the future?
> >>
> >> Thanks so much in advance,
> >> Vadim
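Jun's suggestion to monitor the number of open file handles and alert near the limit could look roughly like the sketch below. It assumes Linux /proc and that the broker's JVM command line contains the main class kafka.Kafka, as when it is started via kafka-server-start.sh. It must run as the broker's user or root to read /proc/<pid>/fd. `check_broker`, `fd_usage_pct` and the 80% threshold are hypothetical names and values, and the echo to stderr stands in for whatever alerting system is actually in use.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: integer percentage of the limit in use (rounded down).
fd_usage_pct() {
  local used=$1 limit=$2
  echo $(( used * 100 / limit ))
}

# Hypothetical check. ASSUMPTIONS: Linux /proc, broker main class "kafka.Kafka",
# and stderr output as a placeholder for a real alert.
check_broker() {
  local threshold=${1:-80} pid used limit pct
  pid=$(pgrep -f 'kafka\.Kafka' | head -n 1) || { echo "no broker process found" >&2; return 1; }
  # Each entry in /proc/<pid>/fd is one open descriptor (files and sockets).
  used=$(find /proc/"$pid"/fd -mindepth 1 -maxdepth 1 | wc -l)
  # Soft limit: 4th field of the "Max open files" line in /proc/<pid>/limits.
  limit=$(awk '/^Max open files/ {print $4}' /proc/"$pid"/limits)
  if [ "$limit" = "unlimited" ]; then
    echo "pid $pid: $used open fds (no limit)"
    return 0
  fi
  pct=$(fd_usage_pct "$used" "$limit")
  echo "pid $pid: $used of $limit open fds (${pct}%)"
  if [ "$pct" -ge "$threshold" ]; then
    echo "ALERT: open file descriptors at ${pct}% of limit" >&2
    return 2
  fi
}
```

Running something like `check_broker 80` from cron on each broker would give an early warning before the broker starts failing with "Too many open files". Reading the limit from /proc/<pid>/limits, rather than from ulimit in a shell, reports the soft limit the broker process actually inherited.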
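The restart failure ("Found log file with no corresponding index file") suggests that at least one segment was left without its index. A read-only sketch for locating such segments before deciding how to recover is shown below. It assumes that each partition directory (for example m3_registration-29) sits directly under a data directory such as /home/kafka/data7. It also assumes that each segment is a pair of files named <baseoffset>.log and <baseoffset>.index. `find_orphan_segments` is a hypothetical helper. It only prints paths and does not delete or repair anything.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print every <offset>.log segment that has no matching
# <offset>.index in the same partition directory.
# ASSUMPTION: layout is <data_dir>/<topic>-<partition>/<offset>.{log,index}.
# Read-only: nothing is modified.
find_orphan_segments() {
  local data_dir
  for data_dir in "$@"; do
    # -print0 with read -d '' keeps unusual file names intact.
    find "$data_dir" -mindepth 2 -maxdepth 2 -type f -name '*.log' -print0 |
      while IFS= read -r -d '' log; do
        [ -e "${log%.log}.index" ] || echo "$log"
      done
  done
}
```

For example, `find_orphan_segments /home/kafka/data7` would list the offending segments in that data directory. What to do with them is a separate decision that this sketch deliberately leaves out.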