RE: Too many open files in kafka 0.9
Following https://issues.apache.org/jira/browse/KAFKA-3806, I have adjusted offset.retention.minutes, and it seems to have solved my issue.

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Wednesday, 29 November 2017 19:41
To: users@kafka.apache.org
Subject: Re: Too many open files in kafka 0.9

There is KAFKA-3317, which is still open. Have you seen this?

http://search-hadoop.com/m/Kafka/uyzND1KvOlt1p5UcE?subj=Re+Brokers+is+down+by+java+io+IOException+Too+many+open+files+

On Wed, Nov 29, 2017 at 8:55 AM, REYMOND Jean-max (BPCE-IT - SYNCHRONE TECHNOLOGIES) <jean-max.reymond.prestata...@bpce-it.fr> wrote:
> [...]
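For readers applying the same fix: the broker property is spelled offsets.retention.minutes in Kafka's server.properties. A minimal sketch with illustrative values (not taken from the thread):

    # server.properties -- values are examples only
    # Committed consumer offsets expire after this many minutes. The old
    # default of 1440 (24 hours) is shorter than many log retention
    # settings, which is the mismatch KAFKA-3806 discusses; 10080 = 7 days.
    offsets.retention.minutes=10080
    # Regular topic log retention, reduced from 10 days to 2 days in this thread.
    log.retention.hours=48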
RE: Too many open files in kafka 0.9
Thanks for your valuable advice. Yes, we had already raised the nofile parameter in limits.conf, but after one week came the big crash. Specifically, on the broken node the __consumer_offsets-XX directories are never deleted, and after 20 hours we have 70 GB of these directories and files. That is the big difference from the other brokers. So, is it safe to remove these __consumer_offsets-XX directories if they have not been accessed for one day?

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Wednesday, 29 November 2017 19:41
To: users@kafka.apache.org
Subject: Re: Too many open files in kafka 0.9

There is KAFKA-3317, which is still open. Have you seen this?

http://search-hadoop.com/m/Kafka/uyzND1KvOlt1p5UcE?subj=Re+Brokers+is+down+by+java+io+IOException+Too+many+open+files+

On Wed, Nov 29, 2017 at 8:55 AM, REYMOND Jean-max (BPCE-IT - SYNCHRONE TECHNOLOGIES) <jean-max.reymond.prestata...@bpce-it.fr> wrote:
> [...]
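Before removing anything by hand, it may help to compare the suspect broker against a healthy one by measuring the offsets partitions directly. A hedged sketch, assuming the data directory shown in the logs below (/pfic/kafka/data/kafka_data):

    # Size of each __consumer_offsets partition directory, largest first.
    du -sh /pfic/kafka/data/kafka_data/__consumer_offsets-* | sort -rh | head
    # File count per partition directory (segments plus indexes).
    for d in /pfic/kafka/data/kafka_data/__consumer_offsets-*; do
      echo "$d: $(find "$d" -type f | wc -l) files"
    done

Deleting partition directories out from under a running broker is risky; the usual approach is to fix retention/expiry so the broker's own cleaner deletes the segments.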
Re: Too many open files in kafka 0.9
There is KAFKA-3317, which is still open. Have you seen this?

http://search-hadoop.com/m/Kafka/uyzND1KvOlt1p5UcE?subj=Re+Brokers+is+down+by+java+io+IOException+Too+many+open+files+

On Wed, Nov 29, 2017 at 8:55 AM, REYMOND Jean-max (BPCE-IT - SYNCHRONE TECHNOLOGIES) <jean-max.reymond.prestata...@bpce-it.fr> wrote:
> [...]
Too many open files in kafka 0.9
We have a cluster with 3 brokers and kafka 0.9.0.1. One week ago, we decided to reduce log.retention.hours from 10 days to 2 days. We restarted the cluster and it was OK. But one broker accumulated more and more data every day, and two days later it crashed with the message "too many open files"; lsof returned 7400 open files. We adjusted the limit to 1 and it crashed again. So we removed all the data from our data repository and restarted, and after a few minutes the cluster was OK. But now, after 6 hours, the two valid brokers have 72 GB and the other broker has 90 GB. lsof -p xxx returns 1030 and the count is growing continuously. I am sure that tomorrow morning we will have a crash.

In the server.log of the broken broker:

[2017-11-29 17:28:51,360] INFO Rolled new log segment for '__consumer_offsets-27' in 1 ms. (kafka.log.Log)
[2017-11-29 17:31:28,836] INFO Rolled new log segment for '__consumer_offsets-8' in 1 ms. (kafka.log.Log)
[2017-11-29 17:35:22,100] INFO Rolled new log segment for '__consumer_offsets-12' in 1 ms. (kafka.log.Log)
[2017-11-29 17:37:55,984] INFO Rolled new log segment for '__consumer_offsets-11' in 1 ms. (kafka.log.Log)
[2017-11-29 17:38:30,600] INFO [Group Metadata Manager on Broker 2]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-11-29 17:39:55,836] INFO Rolled new log segment for '__consumer_offsets-16' in 1 ms. (kafka.log.Log)
[2017-11-29 17:43:38,300] INFO Rolled new log segment for '__consumer_offsets-48' in 1 ms. (kafka.log.Log)
[2017-11-29 17:44:21,110] INFO Rolled new log segment for '__consumer_offsets-36' in 1 ms. (kafka.log.Log)
[2017-11-29 17:48:30,600] INFO [Group Metadata Manager on Broker 2]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)

And at the same time on a valid broker:

[2017-11-29 17:44:46,704] INFO Deleting index /pfic/kafka/data/kafka_data/__consumer_offsets-48/002686063378.index.deleted (kafka.log.OffsetIndex)
[2017-11-29 17:44:47,341] INFO Deleting segment 2687254936 from log __consumer_offsets-48. (kafka.log.Log)
[2017-11-29 17:44:47,376] INFO Deleting index /pfic/kafka/data/kafka_data/__consumer_offsets-48/002687254936.index.deleted (kafka.log.OffsetIndex)
[2017-11-29 17:45:32,991] INFO Deleting segment 0 from log __consumer_offsets-36. (kafka.log.Log)
[2017-11-29 17:45:32,991] INFO Deleting segment 1769617973 from log __consumer_offsets-36. (kafka.log.Log)
[2017-11-29 17:45:32,993] INFO Deleting index /pfic/kafka/data/kafka_data/__consumer_offsets-36/.index.deleted (kafka.log.OffsetIndex)
[2017-11-29 17:45:32,993] INFO Deleting index /pfic/kafka/data/kafka_data/__consumer_offsets-36/001769617973.index.deleted (kafka.log.OffsetIndex)
[2017-11-29 17:45:33,593] INFO Deleting segment 1770704579 from log __consumer_offsets-36. (kafka.log.Log)
[2017-11-29 17:45:33,627] INFO Deleting index /pfic/kafka/data/kafka_data/__consumer_offsets-36/001770704579.index.deleted (kafka.log.OffsetIndex)
[2017-11-29 17:45:58,394] INFO [Group Metadata Manager on Broker 0]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)

So, the broken broker never deletes a segment. Of course, the three brokers have the same configuration. What is happening?

Thanks for your advice,

Jean-Max REYMOND
BPCE Infogérance & Technologies
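A simple way to confirm the leak before the predicted crash is to sample the broker's descriptor count over time. A sketch (the pid lookup is illustrative; substitute the broker pid used with `lsof -p xxx` above):

    # Sample the broker's open-descriptor count every minute.
    PID=$(pgrep -f kafka.Kafka)   # or the known broker pid
    while true; do
      echo "$(date '+%F %T') $(ls /proc/$PID/fd | wc -l) open fds"
      sleep 60
    done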
Re: Brokers is down by “java.io.IOException: Too many open files”
I've seen that tuning network settings within the OS can help mitigate some of the "Too many open files" issue as well. Try changing the following items on the OS so that used network connections close as quickly as possible, keeping file-handle use down:

sysctl -w net.ipv4.tcp_fin_timeout=10

By default, this value is 60 seconds. Reducing it to 10 seconds allows socket-related file handles to be released sooner.

sysctl -w net.ipv4.tcp_synack_retries=3

By default, this value is 5. Setting it to 3 decreases the time it takes for a failed passive TCP connection to time out, which releases resources sooner.

Additionally, be sure that your zookeeper account's NOFILE ulimit is also set high enough that it can service requests for network connections at a rate comparable to the Kafka brokers. The network parameters above help zookeeper as well, so look into applying them on your zookeeper nodes too.

Finally, make sure the accounts that run your producer and consumer processes also have an appropriate NOFILE ulimit, and that the nodes where they run use the network settings above.

Thank you,
Jeff Groves

On 5/17/17, 1:11 AM, "Yang Cui" <y...@freewheel.tv> wrote:
> [...]
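Note that `sysctl -w` changes are lost on reboot; to persist them they would normally go in /etc/sysctl.conf (or a file under /etc/sysctl.d/):

    # /etc/sysctl.conf -- persist the tuning above across reboots
    net.ipv4.tcp_fin_timeout = 10
    net.ipv4.tcp_synack_retries = 3
    # Reload without rebooting:
    #   sysctl -p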
Re: Brokers is down by “java.io.IOException: Too many open files”
Hi Caleb,

We had already set the max-open-files limit to 100,000 before this error happened. Normally the file descriptor count is about 20,000, but at times it suddenly jumps to a much higher count.

This is our monitor of the Kafka FD info:

2017-05-17-05:04:19 FD_total_num:19261 FD_pair_num:15256 FD_ads_num:3191 FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 149 REG 18877
2017-05-17-05:04:31 FD_total_num:19267 FD_pair_num:15259 FD_ads_num:3192 FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 152 REG 18883
2017-05-17-05:04:44 FD_total_num:19272 FD_pair_num:15263 FD_ads_num:3197 FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 148 REG 18892
2017-05-17-05:04:57 FD_total_num:19280 FD_pair_num:15268 FD_ads_num:3197 FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 150 REG 18895
2017-05-17-05:05:09 FD_total_num:19277 FD_pair_num:15271 FD_ads_num:3197 FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 152 REG 18898
2017-05-17-05:05:21 FD_total_num:19223 FD_pair_num:15217 FD_ads_num:3189 FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 158 REG 18836
2017-05-17-05:05:34 FD_total_num:19235 FD_pair_num:15223 FD_ads_num:3189 FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 158 REG 18842

On 13/05/2017, 9:57 AM, "Caleb Welton" <ca...@autonomic.ai> wrote:
> [...]
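A per-type breakdown like the FD_Type monitor above can be reproduced with a small lsof pipeline; the stray "TYPE 1" entry is lsof's own header row being counted. A sketch (pid is a placeholder):

    # Count the broker's open descriptors by type (REG = segment/index
    # files, IPv4 = sockets), matching the monitor output above.
    lsof -p <pid> | awk '{print $5}' | sort | uniq -c | sort -rn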
Re: Brokers is down by “java.io.IOException: Too many open files”
If you're using a systemd-based OS you'll actually need to set it in the unit file:

LimitNOFILE=10

https://kafka.apache.org/documentation/#upgrade_10_1_breaking contains some changes regarding file handles as well.

__

Sam Pegler
PRODUCTION ENGINEER

On 13 May 2017 at 02:57, Caleb Welton <ca...@autonomic.ai> wrote:
> [...]
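For context, services started by systemd do not go through PAM, so /etc/security/limits.conf has no effect on them; the limit belongs in the unit or a drop-in. A sketch, assuming a unit named kafka.service (name and value are illustrative):

    # systemctl edit kafka.service   -> creates a drop-in override
    [Service]
    LimitNOFILE=100000
    # Apply it:
    #   systemctl daemon-reload && systemctl restart kafka.service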
Re: Brokers is down by “java.io.IOException: Too many open files”
You need to up your OS open file limits; something like this should work:

# /etc/security/limits.conf
* - nofile 65536

On Fri, May 12, 2017 at 6:34 PM, Yang Cui <y...@freewheel.tv> wrote:
> [...]
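Since limits.conf is applied by PAM at login, it is worth confirming the value actually reaches the account the broker runs as. One common check (the kafka user name is an assumption):

    # Log in as the broker's user through PAM and print the limit.
    su - kafka -c 'ulimit -n'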
Brokers is down by “java.io.IOException: Too many open files”
Our Kafka cluster has been brought down by the problem "java.io.IOException: Too many open files" three times in 3 weeks.

We have encountered this problem on both the 0.9.0.1 and 0.10.2.1 versions.

The error looks like:

java.io.IOException: Too many open files
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
        at kafka.network.Acceptor.accept(SocketServer.scala:340)
        at kafka.network.Acceptor.run(SocketServer.scala:283)
        at java.lang.Thread.run(Thread.java:745)

Has anyone encountered a similar problem?
Re: Too many open files
What does the output of:

lsof -p <pid>

show on that specific node?

-Jaikiran

On Monday 12 September 2016 10:03 PM, Michael Sparr wrote:
> [...]
Re: Too many open files
What does the output of:

lsof -p <pid>

show?

-Jaikiran

On Monday 12 September 2016 10:03 PM, Michael Sparr wrote:
> [...]
Too many open files
5-node Kafka cluster, bare metal, Ubuntu 14.04.x LTS, on 64 GB RAM, 8-core, 960 GB SSD boxes, and a single node in the cluster is filling its logs with the following:

[2016-09-12 09:34:49,522] ERROR Error while accepting connection (kafka.network.Acceptor)
java.io.IOException: Too many open files
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
        at kafka.network.Acceptor.accept(SocketServer.scala:323)
        at kafka.network.Acceptor.run(SocketServer.scala:268)
        at java.lang.Thread.run(Thread.java:745)

No other node in the cluster has this issue. A separate application server runs consumers/producers using librdkafka + the confluent kafka python library, with a few million messages published to under 100 topics. For days now the /var/log/kafka/kafka.server.log.N files have been filling up with this message, using up all the space on a single server node in the cluster.

I have soft/hard limits at 65,535 for all users, so `ulimit -n` reveals 65535.

Is there a setting I should add to the librdkafka config in the Python producer clients to shorten socket connections even further to avoid this, or is something else going on? Should I file this as an issue in a GitHub repo, and if so, which project?

Thanks!
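Independent of the descriptor problem, the disk-filling server logs can be capped. Kafka ships a log4j 1.x config in config/log4j.properties; a hedged sketch switching its server-log appender to a size-capped one (the appender name matches the stock file, but the path and sizes here are illustrative):

    # config/log4j.properties
    log4j.appender.kafkaAppender=org.apache.log4j.RollingFileAppender
    log4j.appender.kafkaAppender.File=/var/log/kafka/kafka.server.log
    log4j.appender.kafkaAppender.MaxFileSize=100MB
    log4j.appender.kafkaAppender.MaxBackupIndex=10
    log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
    log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n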
Re: Too Many Open Files
What are the producers/consumers for the Kafka cluster? Remember that it's not just files but also sockets that add to the count. I once saw issues when we had a network switch problem and Storm consumers: the switch would cause connectivity problems between the Kafka brokers, zookeepers, and clients, causing a flood of connections from everyone to each other.

On 8/1/16, 7:14 AM, "Scott Thibault" <scott.thiba...@multiscalehn.com> wrote:
> [...]
Re: Too Many Open Files
Did you verify that the process has the correct limit applied?

cat /proc/<pid>/limits

--Scott Thibault

On Sun, Jul 31, 2016 at 4:14 PM, Kessiler Rodrigues <kessi...@callinize.com> wrote:
> [...]
Re: Too Many Open Files
Hey guys, I've got a solution for this. The kafka process wasn't getting the limits config because I was running it under supervisor. I changed that, and right now I'm using systemd to bring kafka up and running! In systemd services you can set your FD limit using a property called "LimitNOFILE".

Thanks for all your help!

> On Aug 1, 2016, at 5:04 AM, Anirudh P <panirudh2...@gmail.com> wrote:
> [...]
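For anyone staying on supervisor rather than moving to systemd, supervisord has its own descriptor setting that can silently cap child processes. A sketch (value illustrative):

    ; /etc/supervisord.conf
    [supervisord]
    ; supervisord requires (and raises its rlimit to) this many descriptors
    ; at startup; children such as the broker inherit the resulting limit.
    minfds=1000000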
Re: Too Many Open Files
I agree with Steve. We had a similar problem where we set the ulimit to a certain value but it was getting overridden. It only worked when we set the ulimit after logging in as root. You might want to give that a try if you have not done so already.

- Anirudh

On Mon, Aug 1, 2016 at 1:19 PM, Steve Miller <st...@idrathernotsay.com> wrote:
> [...]
Re: Too Many Open Files
Can you run lsof -p (pid) for whatever the pid is for your Kafka process?

For the fd limits you've set, I don't think subtlety is required: if there are a millionish lines in the output, the fd limit is where you think it is; and if it's a lot lower than that, the limit isn't being applied properly somehow (maybe you are running this under, say, supervisord, and its config is lowering the limit; or the limits for root are as you say but the limits for the kafka user aren't being set properly; that sort of thing).

If you do have 1M lines in the output, at least this might give you a place to start figuring out what's open and why.

    -Steve

> On Jul 31, 2016, at 4:14 PM, Kessiler Rodrigues <kessi...@callinize.com> wrote:
> [...]
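A sketch of the two checks Steve describes, comparing the real descriptor count against the limit the kernel actually applied to the process (pid is a placeholder):

    # How many descriptors does the broker really have open?
    lsof -p <pid> | wc -l
    # What limit is actually applied to that process?
    grep 'Max open files' /proc/<pid>/limits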
Re: Too Many Open Files
Gwen,

Is there any particular reason why "inactive" files (no consumers or producers for a topic) need to be open?

Chris

--
Learn microservices - http://learnmicroservices.io
Microservices application platform http://eventuate.io

On Fri, Jul 29, 2016 at 6:33 PM, Gwen Shapira <g...@confluent.io> wrote:
> [...]
RE: Too Many Open Files
Maybe you are exhausting your sockets, not file handles, for some reason?

From: Kessiler Rodrigues [kessi...@callinize.com]
Sent: 31 July 2016 22:14
To: users@kafka.apache.org
Subject: Re: Too Many Open Files

> [...]
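To test the socket-exhaustion theory directly, the socket share of the count can be inspected on the broker host. A sketch, assuming the default broker port 9092:

    # TCP connections touching the broker port, grouped by state.
    ss -tan | grep ':9092' | awk '{print $1}' | sort | uniq -c
    # Or count socket descriptors held by the broker process itself:
    lsof -p <pid> | grep -c IPv4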
Re: Too Many Open Files
I’m still experiencing this issue… Here are the kafka logs. [2016-07-31 20:10:35,658] ERROR Error while accepting connection (kafka.network.Acceptor) java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250) at kafka.network.Acceptor.accept(SocketServer.scala:323) at kafka.network.Acceptor.run(SocketServer.scala:268) at java.lang.Thread.run(Thread.java:745) [2016-07-31 20:10:35,658] ERROR Error while accepting connection (kafka.network.Acceptor) java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250) at kafka.network.Acceptor.accept(SocketServer.scala:323) at kafka.network.Acceptor.run(SocketServer.scala:268) at java.lang.Thread.run(Thread.java:745) [2016-07-31 20:10:35,658] ERROR Error while accepting connection (kafka.network.Acceptor) java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250) at kafka.network.Acceptor.accept(SocketServer.scala:323) at kafka.network.Acceptor.run(SocketServer.scala:268) at java.lang.Thread.run(Thread.java:745) My ulimit is 1 million, how is that possible? Can someone help with this? > On Jul 30, 2016, at 5:05 AM, Kessiler Rodrigues <kessi...@callinize.com> > wrote: > > I have changed it a bit. > > I have 10 brokers and 20k topics with 1 partition each. > > I looked at the kaka’s logs dir and I only have 3318 files. > > I’m doing some tests to see how many topics/partitions I can have, but it is > throwing too many files once it hits 15k topics.. > > Any thoughts? > > > >> On Jul 29, 2016, at 10:33 PM, Gwen Shapira <g...@confluent.io> wrote: >> >> woah, it looks like you have 15,000 replicas per broker? >> >> You can go into the directory you configured for kafka's log.dir and >> see how many files you have there. Depending on your segment size and >> retention policy, you could have hundreds of files per partition >> there... >> >> Make sure you have at least that many file handles and then also add >> handles for the client connections. >> >> 1 million file handles sound like a lot, but you are running lots of >> partitions per broker... >> >> We normally don't see more than maybe 4000 per broker and most >> clusters have a lot fewer, so consider adding brokers and spreading >> partitions around a bit. >> >> Gwen >> >> On Fri, Jul 29, 2016 at 12:00 PM, Kessiler Rodrigues >> <kessi...@callinize.com> wrote: >>> Hi guys, >>> >>> I have been experiencing some issues on kafka, where its throwing too many >>> open files. >>> >>> I have around of 6k topics and 5 partitions each. >>> >>> My cluster was made with 6 brokers. All of them are running Ubuntu 16 and >>> the file limits settings are: >>> >>> `cat /proc/sys/fs/file-max` >>> 200 >>> >>> `ulimit -n` >>> 100 >>> >>> Anyone has experienced it before? >
Re: Too Many Open Files
I have changed it a bit.

I have 10 brokers and 20k topics with 1 partition each.

I looked at kafka’s log dir and I only have 3318 files.

I’m doing some tests to see how many topics/partitions I can have, but it is throwing too many open files once it hits 15k topics.

Any thoughts?

> On Jul 29, 2016, at 10:33 PM, Gwen Shapira <g...@confluent.io> wrote:
>
> woah, it looks like you have 15,000 replicas per broker?
>
> You can go into the directory you configured for kafka's log.dir and
> see how many files you have there. Depending on your segment size and
> retention policy, you could have hundreds of files per partition
> there...
>
> Make sure you have at least that many file handles and then also add
> handles for the client connections.
>
> 1 million file handles sounds like a lot, but you are running lots of
> partitions per broker...
>
> We normally don't see more than maybe 4000 per broker and most
> clusters have a lot fewer, so consider adding brokers and spreading
> partitions around a bit.
>
> Gwen
>
> On Fri, Jul 29, 2016 at 12:00 PM, Kessiler Rodrigues
> <kessi...@callinize.com> wrote:
>> Hi guys,
>>
>> I have been experiencing some issues on kafka, where it's throwing too
>> many open files.
>>
>> I have around 6k topics with 5 partitions each.
>>
>> My cluster was made with 6 brokers. All of them are running Ubuntu 16
>> and the file limit settings are:
>>
>> `cat /proc/sys/fs/file-max`
>> 200
>>
>> `ulimit -n`
>> 100
>>
>> Has anyone experienced it before?
Re: Too Many Open Files
woah, it looks like you have 15,000 replicas per broker?

You can go into the directory you configured for kafka's log.dir and see how many files you have there. Depending on your segment size and retention policy, you could have hundreds of files per partition there...

Make sure you have at least that many file handles and then also add handles for the client connections.

1 million file handles sounds like a lot, but you are running lots of partitions per broker...

We normally don't see more than maybe 4000 per broker and most clusters have a lot fewer, so consider adding brokers and spreading partitions around a bit.

Gwen

On Fri, Jul 29, 2016 at 12:00 PM, Kessiler Rodrigues <kessi...@callinize.com> wrote:
> Hi guys,
>
> I have been experiencing some issues on kafka, where it's throwing too many open files.
>
> I have around 6k topics with 5 partitions each.
>
> My cluster was made with 6 brokers. All of them are running Ubuntu 16 and the file limit settings are:
>
> `cat /proc/sys/fs/file-max`
> 200
>
> `ulimit -n`
> 100
>
> Has anyone experienced it before?
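To put numbers on the advice above, something like the sketch below counts segment files per partition directory; /var/kafka-logs stands in for whatever log.dir points at in your server.properties:

# Partition directories with the most files, largest first
for d in /var/kafka-logs/*/; do
  printf '%6d %s\n' "$(ls "$d" | wc -l)" "$d"
done | sort -rn | head

# Total log plus index files the broker has to keep open
find /var/kafka-logs -type f \( -name '*.log' -o -name '*.index' \) | wc -l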
Too Many Open Files
Hi guys,

I have been experiencing some issues on kafka, where it's throwing too many open files.

I have around 6k topics with 5 partitions each.

My cluster was made with 6 brokers. All of them are running Ubuntu 16 and the file limit settings are:

`cat /proc/sys/fs/file-max`
200

`ulimit -n`
100

Has anyone experienced it before?
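One caveat worth checking before anything else: `ulimit -n` in an interactive shell is not necessarily the limit the running broker inherited. Querying the process itself avoids that trap; a sketch, assuming the broker runs the kafka.Kafka main class:

PID=$(pgrep -f kafka.Kafka)

# The open-file limit the broker process actually got at launch
grep 'open files' /proc/$PID/limits

# Descriptors it currently holds
ls /proc/$PID/fd | wc -l

# System-wide handles: allocated, free, maximum
cat /proc/sys/fs/file-nr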
java.io.IOException: Too many open files error
Hi all,

We are testing our production kafka and getting this error:

[2015-01-15 19:03:45,057] ERROR Error in acceptor (kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241)
at kafka.network.Acceptor.accept(SocketServer.scala:200)
at kafka.network.Acceptor.run(SocketServer.scala:154)
at java.lang.Thread.run(Thread.java:745)

I noticed some other developers had similar issues; one suggestion was:

"Without knowing the intricacies of Kafka, I think the default open file descriptors is 1024 on unix. This can be changed by setting a higher ulimit value (typically 8192 but sometimes even 10). Before modifying the ulimit I would recommend you check the number of sockets stuck in TIME_WAIT mode. In this case, it looks like the broker has too many open sockets. This could be because you have a rogue client connecting and disconnecting repeatedly. You might have to reduce the TIME_WAIT state to 30 seconds or lower."

We increased the open file handles by inserting

kafka - nofile 10

in /etc/security/limits.conf.

Is that the right way to change the open file descriptors? In addition, it says to reduce TIME_WAIT; where do I change this setting? Or is there any other solution for this issue?

thanks

--
Alec Li
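Before tuning anything TIME_WAIT-related, it helps to measure how many sockets are actually stuck there. A sketch; 9092 is the default broker port and is an assumption about this setup:

# Broker-port sockets grouped by TCP state
netstat -tan | grep ':9092' | awk '{print $6}' | sort | uniq -c | sort -rn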
Re: java.io.IOException: Too many open files error
You may find this article useful for troubleshooting and modifying TIME_WAIT: http://www.linuxbrigade.com/reduce-time_wait-socket-connections/ The line you have for increasing file limit is fine, but you may also need to increase the limit system wide: insert fs.file-max = 10 in /etc/sysctl.conf Gwen On Thu, Jan 15, 2015 at 12:30 PM, Sa Li sal...@gmail.com wrote: Hi, all We test our production kafka, and getting such error [2015-01-15 19:03:45,057] ERROR Error in acceptor (kafka.network.Acceptor) java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept( ServerSocketChannelImpl.java:241) at kafka.network.Acceptor.accept(SocketServer.scala:200) at kafka.network.Acceptor.run(SocketServer.scala:154) at java.lang.Thread.run(Thread.java:745) I noticed some other developers had similar issues, one suggestion was Without knowing the intricacies of Kafka, i think the default open file descriptors is 1024 on unix. This can be changed by setting a higher ulimit value ( typically 8192 but sometimes even 10 ). Before modifying the ulimit I would recommend you check the number of sockets stuck in TIME_WAIT mode. In this case, it looks like the broker has too many open sockets. This could be because you have a rogue client connecting and disconnecting repeatedly. You might have to reduce the TIME_WAIT state to 30 seconds or lower. We increase the open file handles by doing this: insert kafka - nofile 10 in /etc/security/limits.conf Is that right to change the open file descriptors? In addition, it says to reduce the TIME_WAIT, where about to change this state? Or any other solution for this issue? thanks -- Alec Li
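Spelled out, the two changes Gwen describes look roughly like this; the numeric values below are placeholders for illustration, not recommendations:

# Per-user limit for the account running the broker, in /etc/security/limits.conf:
#   kafka  -  nofile  100000

# System-wide ceiling, in /etc/sysctl.conf:
#   fs.file-max = 200000

# Apply the sysctl change without a reboot, then verify
sudo sysctl -w fs.file-max=200000
sysctl fs.file-max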
Re: java.io.IOException: Too many open files error
Hi Sa Li,

Depending on your system, that configuration entry needs to be modified. The first parameter after the insert is the username that you use to run kafka. It might be your own username or something else; in the following example it is called kafkauser. On top of that, I also like to use both soft and hard limits: when you hit the soft limit, the system will log a meaningful message in dmesg so you can see what is happening.

kafkauser soft nofile 8
kafkauser hard nofile 10

Hope that helps,
Istvan

On Thu, Jan 15, 2015 at 12:30 PM, Sa Li sal...@gmail.com wrote:

Hi all,

We are testing our production kafka and getting this error:

[2015-01-15 19:03:45,057] ERROR Error in acceptor (kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241)
at kafka.network.Acceptor.accept(SocketServer.scala:200)
at kafka.network.Acceptor.run(SocketServer.scala:154)
at java.lang.Thread.run(Thread.java:745)

I noticed some other developers had similar issues; one suggestion was: "Without knowing the intricacies of Kafka, I think the default open file descriptors is 1024 on unix. This can be changed by setting a higher ulimit value (typically 8192 but sometimes even 10). Before modifying the ulimit I would recommend you check the number of sockets stuck in TIME_WAIT mode. In this case, it looks like the broker has too many open sockets. This could be because you have a rogue client connecting and disconnecting repeatedly. You might have to reduce the TIME_WAIT state to 30 seconds or lower."

We increased the open file handles by inserting kafka - nofile 10 in /etc/security/limits.conf. Is that the right way to change the open file descriptors? In addition, it says to reduce TIME_WAIT; where do I change this setting? Or is there any other solution for this issue?

thanks

--
Alec Li

--
the sun shines for all
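Note that limits.conf only takes effect on a fresh login session, so it is worth verifying the limits as the service user and then again on the restarted broker process itself. A sketch, reusing the kafkauser name from the example above:

# Soft and hard open-file limits as seen by the service account
su - kafkauser -c 'ulimit -Sn; ulimit -Hn'

# After restarting the broker: the limits the JVM actually inherited
grep 'open files' /proc/$(pgrep -f kafka.Kafka)/limits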
Re: java.io.IOException: Too many open files error
Thanks for the reply. I have changed the configuration and am running it to see if any errors come up.

SL

On Thu, Jan 15, 2015 at 3:34 PM, István lecc...@gmail.com wrote:

Hi Sa Li,

Depending on your system, that configuration entry needs to be modified. The first parameter after the insert is the username that you use to run kafka. It might be your own username or something else; in the following example it is called kafkauser. On top of that, I also like to use both soft and hard limits: when you hit the soft limit, the system will log a meaningful message in dmesg so you can see what is happening.

kafkauser soft nofile 8
kafkauser hard nofile 10

Hope that helps,
Istvan

On Thu, Jan 15, 2015 at 12:30 PM, Sa Li sal...@gmail.com wrote:

Hi all,

We are testing our production kafka and getting this error:

[2015-01-15 19:03:45,057] ERROR Error in acceptor (kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241)
at kafka.network.Acceptor.accept(SocketServer.scala:200)
at kafka.network.Acceptor.run(SocketServer.scala:154)
at java.lang.Thread.run(Thread.java:745)

I noticed some other developers had similar issues; one suggestion was: "Without knowing the intricacies of Kafka, I think the default open file descriptors is 1024 on unix. This can be changed by setting a higher ulimit value (typically 8192 but sometimes even 10). Before modifying the ulimit I would recommend you check the number of sockets stuck in TIME_WAIT mode. In this case, it looks like the broker has too many open sockets. This could be because you have a rogue client connecting and disconnecting repeatedly. You might have to reduce the TIME_WAIT state to 30 seconds or lower."

We increased the open file handles by inserting kafka - nofile 10 in /etc/security/limits.conf. Is that the right way to change the open file descriptors? In addition, it says to reduce TIME_WAIT; where do I change this setting? Or is there any other solution for this issue?

thanks

--
Alec Li

--
the sun shines for all

--
Alec Li
Re: Too Many Open Files Broker Error
Hi Jun,

That was the problem. It was actually the Ubuntu upstart job overwriting the limit. Thank you very much for your help.

Paul Lung

On 7/9/14, 1:58 PM, Jun Rao jun...@gmail.com wrote:

Is it possible your container wrapper somehow overrides the file handler limit?

Thanks,
Jun

On Wed, Jul 9, 2014 at 9:59 AM, Lung, Paul pl...@ebay.com wrote:

Yup. In fact, I just ran the test program again while the Kafka broker is still running, using the same user of course. I was able to get up to 10K connections with the test program. The test program uses the same java NIO library that the broker does. So the machine is capable of handling that many connections. The only issue I saw was that the NIO ServerSocketChannel is a bit slow at accepting connections when the total connection count is around 4K, but this could be due to the fact that I put the ServerSocketChannel in the same Selector as the 4K SocketChannels. So sometimes on the client side, I see:

java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:122)
at sun.nio.ch.IOUtil.write(IOUtil.java:93)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:352)
at FdTest$ClientThread.run(FdTest.java:108)

But all I have to do is sleep for a bit on the client, and then retry again. However, 4K does seem like a magic number, since that seems to be the number that the Kafka broker machine can handle before it gives me the "Too Many Open Files" error and eventually crashes.

Paul Lung

On 7/8/14, 9:29 PM, Jun Rao jun...@gmail.com wrote:

Does your test program run as the same user as Kafka broker?

Thanks,
Jun

On Tue, Jul 8, 2014 at 1:42 PM, Lung, Paul pl...@ebay.com wrote:

Hi Guys,

I'm seeing the following errors from the 0.8.1.1 broker. This occurs most often on the Controller machine. Then the controller process crashes, and the controller bounces to other machines, which causes those machines to crash. Looking at the file descriptors being held by the process, it's only around 4000 or so (looking at . There aren't a whole lot of connections in TIME_WAIT states, and I've increased the ephemeral port range to "16000 - 64000" via /proc/sys/net/ipv4/ip_local_port_range. I've written a Java test program to see how many sockets and files I can open. The socket is definitely limited by the ephemeral port range, which was around 22K at the time. But I can open tons of files, since the open file limit of the user is set to 100K. So given that I can theoretically open 48K sockets and probably 90K files, and I only see around 4K total for the Kafka broker, I'm really confused as to why I'm seeing this error. Is there some internal Kafka limit that I don't know about?
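For anyone who hits the same trap: upstart jobs do not go through pam_limits, so /etc/security/limits.conf is ignored and the limit has to be raised in the job definition itself. A sketch of the relevant stanza; the path and numbers are examples, not taken from this thread:

# /etc/init/kafka.conf (upstart job definition)
# soft and hard nofile limits for everything this job spawns
limit nofile 100000 100000

exec /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties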
Paul Lung java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java: 16 3) at kafka.network.Acceptor.accept(SocketServer.scala:200) at kafka.network.Acceptor.run(SocketServer.scala:154) at java.lang.Thread.run(Thread.java:679) [2014-07-08 13:07:21,534] ERROR Error in acceptor (kafka.network.Acceptor) java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java: 16 3) at kafka.network.Acceptor.accept(SocketServer.scala:200) at kafka.network.Acceptor.run(SocketServer.scala:154) at java.lang.Thread.run(Thread.java:679) [2014-07-08 13:07:21,563] ERROR [ReplicaFetcherThread-3-2124488], Error for partition [bom__021active_80__32__miniactiveitem_lvs_qn,0] to broker 2124488:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread) [2014-07-08 13:07:21,558] FATAL [Replica Manager on Broker 2140112]: Error writing to highwatermark file: (kafka.server.ReplicaManager) java.io.FileNotFoundException: /ebay/cronus/software/cronusapp_home/kafka/kafka-logs/replication-offse t- checkpoint.tmp (Too many open files) at java.io.FileOutputStream.open(Native Method) at java.io.FileOutputStream.init(FileOutputStream.java:209) at java.io.FileOutputStream.init(FileOutputStream.java:160) at java.io.FileWriter.init(FileWriter.java:90) at kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37) at kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(R ep licaManager.scala:447) at kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(R ep licaManager.scala:444
Re: Too Many Open Files Broker Error
I have the same problem. I didn't dig deeper, but I saw this happen when I launched kafka in daemon mode; I found that the daemon mode just launches kafka with nohup. Not quite clear why this happens.

On Wed, Jul 9, 2014 at 9:59 AM, Lung, Paul pl...@ebay.com wrote:

Yup. In fact, I just ran the test program again while the Kafka broker is still running, using the same user of course. I was able to get up to 10K connections with the test program. The test program uses the same java NIO library that the broker does. So the machine is capable of handling that many connections. The only issue I saw was that the NIO ServerSocketChannel is a bit slow at accepting connections when the total connection count is around 4K, but this could be due to the fact that I put the ServerSocketChannel in the same Selector as the 4K SocketChannels. So sometimes on the client side, I see:

java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:122)
at sun.nio.ch.IOUtil.write(IOUtil.java:93)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:352)
at FdTest$ClientThread.run(FdTest.java:108)

But all I have to do is sleep for a bit on the client, and then retry again. However, 4K does seem like a magic number, since that seems to be the number that the Kafka broker machine can handle before it gives me the "Too Many Open Files" error and eventually crashes.

Paul Lung

On 7/8/14, 9:29 PM, Jun Rao jun...@gmail.com wrote:

Does your test program run as the same user as Kafka broker?

Thanks,
Jun

On Tue, Jul 8, 2014 at 1:42 PM, Lung, Paul pl...@ebay.com wrote:

Hi Guys,

I'm seeing the following errors from the 0.8.1.1 broker. This occurs most often on the Controller machine. Then the controller process crashes, and the controller bounces to other machines, which causes those machines to crash. Looking at the file descriptors being held by the process, it's only around 4000 or so (looking at . There aren't a whole lot of connections in TIME_WAIT states, and I've increased the ephemeral port range to "16000 - 64000" via /proc/sys/net/ipv4/ip_local_port_range. I've written a Java test program to see how many sockets and files I can open. The socket is definitely limited by the ephemeral port range, which was around 22K at the time. But I can open tons of files, since the open file limit of the user is set to 100K. So given that I can theoretically open 48K sockets and probably 90K files, and I only see around 4K total for the Kafka broker, I'm really confused as to why I'm seeing this error. Is there some internal Kafka limit that I don't know about?
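If the daemon mode really is just nohup, the broker inherits whatever ulimit the launching shell happened to have, which would explain the difference. One workaround is to raise the soft limit inside the start script before the JVM launches; a sketch with example paths and values:

#!/bin/sh
# Raise the soft nofile limit (up to the hard limit) so the JVM
# inherits it no matter which shell launched this script
ulimit -n 100000 || echo "could not raise nofile limit" >&2

nohup /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties \
  > /var/log/kafka/nohup.out 2>&1 &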
Paul Lung java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:16 3) at kafka.network.Acceptor.accept(SocketServer.scala:200) at kafka.network.Acceptor.run(SocketServer.scala:154) at java.lang.Thread.run(Thread.java:679) [2014-07-08 13:07:21,534] ERROR Error in acceptor (kafka.network.Acceptor) java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:16 3) at kafka.network.Acceptor.accept(SocketServer.scala:200) at kafka.network.Acceptor.run(SocketServer.scala:154) at java.lang.Thread.run(Thread.java:679) [2014-07-08 13:07:21,563] ERROR [ReplicaFetcherThread-3-2124488], Error for partition [bom__021active_80__32__miniactiveitem_lvs_qn,0] to broker 2124488:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread) [2014-07-08 13:07:21,558] FATAL [Replica Manager on Broker 2140112]: Error writing to highwatermark file: (kafka.server.ReplicaManager) java.io.FileNotFoundException: /ebay/cronus/software/cronusapp_home/kafka/kafka-logs/replication-offset- checkpoint.tmp (Too many open files) at java.io.FileOutputStream.open(Native Method) at java.io.FileOutputStream.init(FileOutputStream.java:209) at java.io.FileOutputStream.init(FileOutputStream.java:160) at java.io.FileWriter.init(FileWriter.java:90) at kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37) at kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(Rep licaManager.scala:447) at kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(Rep licaManager.scala:444) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(Trav ersableLike.scala
Re: Too Many Open Files Broker Error
I don't know if that is your problem, but I had this output when my brokers couldn't talk to each others... The zookeeper were using the FQDN but my brokers didn't know the FQDN of the other brokers... If you look at you brokers info in zk (get /brokers/ids/#ID_OF_BROKER) can you ping/connect to the value of the key host from your other brokers? François Langelier Étudiant en génie Logiciel - École de Technologie Supérieure http://www.etsmtl.ca/ Capitaine Club Capra http://capra.etsmtl.ca/ VP-Communication - CS Games http://csgames.org 2014 Jeux de Génie http://www.jdgets.com/ 2011 à 2014 Argentier Fraternité du Piranha http://fraternitedupiranha.com/ 2012-2014 Comité Organisateur Olympiades ÉTS 2012 Compétition Québécoise d'Ingénierie 2012 - Compétition Senior On 9 July 2014 15:17, hsy...@gmail.com hsy...@gmail.com wrote: I have the same problem. I didn't dig deeper but I saw this happen when I launch kafka in daemon mode. I found the daemon mode is just launch kafka with nohup. Not quite clear why this happen. On Wed, Jul 9, 2014 at 9:59 AM, Lung, Paul pl...@ebay.com wrote: Yup. In fact, I just ran the test program again while the Kafak broker is still running, using the same user of course. I was able to get up to 10K connections with the test program. The test program uses the same java NIO library that the broker does. So the machine is capable of handling that many connections. The only issue I saw was that the NIO ServerSocketChannel is a bit slow at accepting connections when the total connection goes around 4K, but this could be due to the fact that I put the ServerSocketChannel in the same Selector as the 4K SocketChannels. So sometimes on the client side, I see: java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcher.write0(Native Method) at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:122) at sun.nio.ch.IOUtil.write(IOUtil.java:93) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:352) at FdTest$ClientThread.run(FdTest.java:108) But all I have to do is sleep for a bit on the client, and then retry again. However, 4K does seem like a magic number, since that¹s seems to be the number that the Kafka broker machine can handle before it gives me the ³Too Many Open Files² error and eventually crashes. Paul Lung On 7/8/14, 9:29 PM, Jun Rao jun...@gmail.com wrote: Does your test program run as the same user as Kafka broker? Thanks, Jun On Tue, Jul 8, 2014 at 1:42 PM, Lung, Paul pl...@ebay.com wrote: Hi Guys, I¹m seeing the following errors from the 0.8.1.1 broker. This occurs most often on the Controller machine. Then the controller process crashes, and the controller bounces to other machines, which causes those machines to crash. Looking at the file descriptors being held by the process, it¹s only around 4000 or so(looking at . There aren¹t a whole lot of connections in TIME_WAIT states, and I¹ve increased the ephemeral port range to ³16000 64000² via /proc/sys/net/ipv4/ip_local_port_range². I¹ve written a Java test program to see how many sockets and files I can open. The socket is definitely limited by the ephemeral port range, which was around 22K at the time. But I can open tons of files, since the open file limit of the user is set to 100K. So given that I can theoretically open 48K sockets and probably 90K files, and I only see around 4K total for the Kafka broker, I¹m really confused as to why I¹m seeing this error. Is there some internal Kafka limit that I don¹t know about? 
Paul Lung java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:16 3) at kafka.network.Acceptor.accept(SocketServer.scala:200) at kafka.network.Acceptor.run(SocketServer.scala:154) at java.lang.Thread.run(Thread.java:679) [2014-07-08 13:07:21,534] ERROR Error in acceptor (kafka.network.Acceptor) java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:16 3) at kafka.network.Acceptor.accept(SocketServer.scala:200) at kafka.network.Acceptor.run(SocketServer.scala:154) at java.lang.Thread.run(Thread.java:679) [2014-07-08 13:07:21,563] ERROR [ReplicaFetcherThread-3-2124488], Error for partition [bom__021active_80__32__miniactiveitem_lvs_qn,0] to broker 2124488:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread) [2014-07
Re: Too Many Open Files Broker Error
Is it possible your container wrapper somehow overrides the file handler limit? Thanks, Jun On Wed, Jul 9, 2014 at 9:59 AM, Lung, Paul pl...@ebay.com wrote: Yup. In fact, I just ran the test program again while the Kafak broker is still running, using the same user of course. I was able to get up to 10K connections with the test program. The test program uses the same java NIO library that the broker does. So the machine is capable of handling that many connections. The only issue I saw was that the NIO ServerSocketChannel is a bit slow at accepting connections when the total connection goes around 4K, but this could be due to the fact that I put the ServerSocketChannel in the same Selector as the 4K SocketChannels. So sometimes on the client side, I see: java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcher.write0(Native Method) at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:122) at sun.nio.ch.IOUtil.write(IOUtil.java:93) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:352) at FdTest$ClientThread.run(FdTest.java:108) But all I have to do is sleep for a bit on the client, and then retry again. However, 4K does seem like a magic number, since that¹s seems to be the number that the Kafka broker machine can handle before it gives me the ³Too Many Open Files² error and eventually crashes. Paul Lung On 7/8/14, 9:29 PM, Jun Rao jun...@gmail.com wrote: Does your test program run as the same user as Kafka broker? Thanks, Jun On Tue, Jul 8, 2014 at 1:42 PM, Lung, Paul pl...@ebay.com wrote: Hi Guys, I¹m seeing the following errors from the 0.8.1.1 broker. This occurs most often on the Controller machine. Then the controller process crashes, and the controller bounces to other machines, which causes those machines to crash. Looking at the file descriptors being held by the process, it¹s only around 4000 or so(looking at . There aren¹t a whole lot of connections in TIME_WAIT states, and I¹ve increased the ephemeral port range to ³16000 64000² via /proc/sys/net/ipv4/ip_local_port_range². I¹ve written a Java test program to see how many sockets and files I can open. The socket is definitely limited by the ephemeral port range, which was around 22K at the time. But I can open tons of files, since the open file limit of the user is set to 100K. So given that I can theoretically open 48K sockets and probably 90K files, and I only see around 4K total for the Kafka broker, I¹m really confused as to why I¹m seeing this error. Is there some internal Kafka limit that I don¹t know about? 
Paul Lung java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:16 3) at kafka.network.Acceptor.accept(SocketServer.scala:200) at kafka.network.Acceptor.run(SocketServer.scala:154) at java.lang.Thread.run(Thread.java:679) [2014-07-08 13:07:21,534] ERROR Error in acceptor (kafka.network.Acceptor) java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:16 3) at kafka.network.Acceptor.accept(SocketServer.scala:200) at kafka.network.Acceptor.run(SocketServer.scala:154) at java.lang.Thread.run(Thread.java:679) [2014-07-08 13:07:21,563] ERROR [ReplicaFetcherThread-3-2124488], Error for partition [bom__021active_80__32__miniactiveitem_lvs_qn,0] to broker 2124488:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread) [2014-07-08 13:07:21,558] FATAL [Replica Manager on Broker 2140112]: Error writing to highwatermark file: (kafka.server.ReplicaManager) java.io.FileNotFoundException: /ebay/cronus/software/cronusapp_home/kafka/kafka-logs/replication-offset- checkpoint.tmp (Too many open files) at java.io.FileOutputStream.open(Native Method) at java.io.FileOutputStream.init(FileOutputStream.java:209) at java.io.FileOutputStream.init(FileOutputStream.java:160) at java.io.FileWriter.init(FileWriter.java:90) at kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37) at kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(Rep licaManager.scala:447) at kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(Rep licaManager.scala:444) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(Trav ersableLike.scala:772) at scala.collection.immutable.Map$Map1.foreach(Map.scala:109
Re: Too Many Open Files Broker Error
Does your test program run as the same user as Kafka broker? Thanks, Jun On Tue, Jul 8, 2014 at 1:42 PM, Lung, Paul pl...@ebay.com wrote: Hi Guys, I’m seeing the following errors from the 0.8.1.1 broker. This occurs most often on the Controller machine. Then the controller process crashes, and the controller bounces to other machines, which causes those machines to crash. Looking at the file descriptors being held by the process, it’s only around 4000 or so(looking at . There aren’t a whole lot of connections in TIME_WAIT states, and I’ve increased the ephemeral port range to “16000 – 64000” via /proc/sys/net/ipv4/ip_local_port_range”. I’ve written a Java test program to see how many sockets and files I can open. The socket is definitely limited by the ephemeral port range, which was around 22K at the time. But I can open tons of files, since the open file limit of the user is set to 100K. So given that I can theoretically open 48K sockets and probably 90K files, and I only see around 4K total for the Kafka broker, I’m really confused as to why I’m seeing this error. Is there some internal Kafka limit that I don’t know about? Paul Lung java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163) at kafka.network.Acceptor.accept(SocketServer.scala:200) at kafka.network.Acceptor.run(SocketServer.scala:154) at java.lang.Thread.run(Thread.java:679) [2014-07-08 13:07:21,534] ERROR Error in acceptor (kafka.network.Acceptor) java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163) at kafka.network.Acceptor.accept(SocketServer.scala:200) at kafka.network.Acceptor.run(SocketServer.scala:154) at java.lang.Thread.run(Thread.java:679) [2014-07-08 13:07:21,563] ERROR [ReplicaFetcherThread-3-2124488], Error for partition [bom__021active_80__32__miniactiveitem_lvs_qn,0] to broker 2124488:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread) [2014-07-08 13:07:21,558] FATAL [Replica Manager on Broker 2140112]: Error writing to highwatermark file: (kafka.server.ReplicaManager) java.io.FileNotFoundException: /ebay/cronus/software/cronusapp_home/kafka/kafka-logs/replication-offset-checkpoint.tmp (Too many open files) at java.io.FileOutputStream.open(Native Method) at java.io.FileOutputStream.init(FileOutputStream.java:209) at java.io.FileOutputStream.init(FileOutputStream.java:160) at java.io.FileWriter.init(FileWriter.java:90) at kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37) at kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:447) at kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:444) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) at scala.collection.immutable.Map$Map1.foreach(Map.scala:109) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) at kafka.server.ReplicaManager.checkpointHighWatermarks(ReplicaManager.scala:444) at kafka.server.ReplicaManager$$anonfun$1.apply$mcV$sp(ReplicaManager.scala:94) at kafka.utils.KafkaScheduler$$anon$1.run(KafkaScheduler.scala:100) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351) at 
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:679)
Re: Getting java.io.IOException: Too many open files
Ok. What I just saw was that when the controller machine reaches around 4100+ files, it crashes. Then I think the controller bounced between 2 other machines, taking them down too, and then circled back to the original machine.

Paul Lung

On 6/24/14, 10:51 PM, Lung, Paul pl...@ebay.com wrote:

The controller machine has 3500 or so, while the other machines have around 1600.

Paul Lung

On 6/24/14, 10:31 PM, Prakash Gowri Shankor prakash.shan...@gmail.com wrote:

How many files does each broker itself have open? You can find this from 'ls -l /proc/processid/fd'

On Tue, Jun 24, 2014 at 10:18 PM, Lung, Paul pl...@ebay.com wrote:

Hi All,

I just upgraded my cluster from 0.8.1 to 0.8.1.1. I'm seeing the following error messages on the same 3 brokers once in a while:

[2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
at kafka.network.Acceptor.accept(SocketServer.scala:200)
at kafka.network.Acceptor.run(SocketServer.scala:154)
at java.lang.Thread.run(Thread.java:679)
[2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
at kafka.network.Acceptor.accept(SocketServer.scala:200)
at kafka.network.Acceptor.run(SocketServer.scala:154)
at java.lang.Thread.run(Thread.java:679)

When this happens, these 3 brokers essentially go out of sync when you do a "kafka-topics.sh --describe". I tracked the number of open files by doing "watch -n 1 'sudo lsof | wc -l'", which basically counts all open files on the system. The numbers for the systems are basically in the 6000 range, with one system going to 9000. I presume the 9000 machine is the controller. Looking at the ulimit of the user, both the hard limit and the soft limit for open files is 100,000. Using sysctl, the max file is fs.file-max = 9774928. So we seem to be way under the limit.

What am I missing here? Is there some JVM limit around 10K open files or something?

Paul Lung
Re: Getting java.io.IOException: Too many open files
Hi Prakash, How many open files do you expect a broker to be able to handle? It seems like this broker is crashing at around 4100 or so open files. Thanks, Paul Lung On 6/24/14, 11:08 PM, Lung, Paul pl...@ebay.com wrote: Ok. What I just saw was that when the controller machine reaches around 4100+ files, it crashes. Then I think the controller bounced between 2 other machines, taking them down too, and the circled back to the original machine. Paul Lung On 6/24/14, 10:51 PM, Lung, Paul pl...@ebay.com wrote: The controller machine has 3500 or so, while the other machines have around 1600. Paul Lung On 6/24/14, 10:31 PM, Prakash Gowri Shankor prakash.shan...@gmail.com wrote: How many files does each broker itself have open ? You can find this from 'ls -l /proc/processid/fd' On Tue, Jun 24, 2014 at 10:18 PM, Lung, Paul pl...@ebay.com wrote: Hi All, I just upgraded my cluster from 0.8.1 to 0.8.1.1. I¹m seeing the following error messages on the same 3 brokers once in a while: [2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor) java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java: 1 6 3) at kafka.network.Acceptor.accept(SocketServer.scala:200) at kafka.network.Acceptor.run(SocketServer.scala:154) at java.lang.Thread.run(Thread.java:679) [2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor) java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java: 1 6 3) at kafka.network.Acceptor.accept(SocketServer.scala:200) at kafka.network.Acceptor.run(SocketServer.scala:154) at java.lang.Thread.run(Thread.java:679) When this happens, these 3 brokers essentially go out of sync when you do a ³kafka-topics.sh ‹describe². I tracked the number of open files by doing ³watch n 1 Œsudo lsof | wc l¹², which basically counts all open files on the system. The numbers for the systems are basically in the 6000 range, with one system going to 9000. I presume the 9000 machine is the controller. Looking at the ulimit of the user, both the hard limit and the soft limit for open files is 100,000. Using sysctl, the max file is fs.file-max = 9774928. So we seem to be way under the limit. What am I missing here? Is there some JVM limit around 10K open files or something? Paul Lung
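Rather than counting every handle on the box with lsof, watching just the broker's own descriptor directory gives a cleaner growth curve to compare against the crash point. A sketch, assuming one broker JVM per host:

# Sample the broker's descriptor count once per second; a steady climb
# toward the 'Max open files' value in /proc/<pid>/limits predicts the crash
watch -n 1 "ls /proc/$(pgrep -f kafka.Kafka)/fd | wc -l"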
Re: Getting java.io.IOException: Too many open files
Without knowing the intricacies of Kafka, I think the default open file descriptors is 1024 on unix. This can be changed by setting a higher ulimit value (typically 8192 but sometimes even 10). Before modifying the ulimit I would recommend you check the number of sockets stuck in TIME_WAIT mode. In this case, it looks like the broker has too many open sockets. This could be because you have a rogue client connecting and disconnecting repeatedly. You might have to reduce the TIME_WAIT state to 30 seconds or lower.

On Wed, Jun 25, 2014 at 10:19 AM, Lung, Paul pl...@ebay.com wrote:

Hi Prakash,

How many open files do you expect a broker to be able to handle? It seems like this broker is crashing at around 4100 or so open files.

Thanks,
Paul Lung

On 6/24/14, 11:08 PM, Lung, Paul pl...@ebay.com wrote:

Ok. What I just saw was that when the controller machine reaches around 4100+ files, it crashes. Then I think the controller bounced between 2 other machines, taking them down too, and then circled back to the original machine.

Paul Lung

On 6/24/14, 10:51 PM, Lung, Paul pl...@ebay.com wrote:

The controller machine has 3500 or so, while the other machines have around 1600.

Paul Lung

On 6/24/14, 10:31 PM, Prakash Gowri Shankor prakash.shan...@gmail.com wrote:

How many files does each broker itself have open? You can find this from 'ls -l /proc/processid/fd'

On Tue, Jun 24, 2014 at 10:18 PM, Lung, Paul pl...@ebay.com wrote:

Hi All,

I just upgraded my cluster from 0.8.1 to 0.8.1.1. I'm seeing the following error messages on the same 3 brokers once in a while:

[2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
at kafka.network.Acceptor.accept(SocketServer.scala:200)
at kafka.network.Acceptor.run(SocketServer.scala:154)
at java.lang.Thread.run(Thread.java:679)
[2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
at kafka.network.Acceptor.accept(SocketServer.scala:200)
at kafka.network.Acceptor.run(SocketServer.scala:154)
at java.lang.Thread.run(Thread.java:679)

When this happens, these 3 brokers essentially go out of sync when you do a "kafka-topics.sh --describe". I tracked the number of open files by doing "watch -n 1 'sudo lsof | wc -l'", which basically counts all open files on the system. The numbers for the systems are basically in the 6000 range, with one system going to 9000. I presume the 9000 machine is the controller. Looking at the ulimit of the user, both the hard limit and the soft limit for open files is 100,000. Using sysctl, the max file is fs.file-max = 9774928. So we seem to be way under the limit.

What am I missing here? Is there some JVM limit around 10K open files or something?

Paul Lung
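On the TIME_WAIT side, Linux offers no direct knob for the TIME_WAIT duration; the tunables people usually reach for are below. A sketch only, and tcp_tw_reuse in particular changes connection semantics, so treat it with care:

# Shorten the FIN-WAIT-2 timeout (often conflated with TIME_WAIT)
sudo sysctl -w net.ipv4.tcp_fin_timeout=30

# Allow TIME_WAIT sockets to be reused for new outbound connections
sudo sysctl -w net.ipv4.tcp_tw_reuse=1

# Persist by adding the same keys to /etc/sysctl.conf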
Getting java.io.IOException: Too many open files
Hi All,

I just upgraded my cluster from 0.8.1 to 0.8.1.1. I’m seeing the following error messages on the same 3 brokers once in a while:

[2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
at kafka.network.Acceptor.accept(SocketServer.scala:200)
at kafka.network.Acceptor.run(SocketServer.scala:154)
at java.lang.Thread.run(Thread.java:679)
[2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
at kafka.network.Acceptor.accept(SocketServer.scala:200)
at kafka.network.Acceptor.run(SocketServer.scala:154)
at java.lang.Thread.run(Thread.java:679)

When this happens, these 3 brokers essentially go out of sync when you do a "kafka-topics.sh --describe". I tracked the number of open files by doing "watch -n 1 'sudo lsof | wc -l'", which basically counts all open files on the system. The numbers for the systems are basically in the 6000 range, with one system going to 9000. I presume the 9000 machine is the controller. Looking at the ulimit of the user, both the hard limit and the soft limit for open files is 100,000. Using sysctl, the max file is fs.file-max = 9774928. So we seem to be way under the limit.

What am I missing here? Is there some JVM limit around 10K open files or something?

Paul Lung
Re: Getting java.io.IOException: Too many open files
How many files does each broker itself have open ? You can find this from 'ls -l /proc/processid/fd' On Tue, Jun 24, 2014 at 10:18 PM, Lung, Paul pl...@ebay.com wrote: Hi All, I just upgraded my cluster from 0.8.1 to 0.8.1.1. I’m seeing the following error messages on the same 3 brokers once in a while: [2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor) java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163) at kafka.network.Acceptor.accept(SocketServer.scala:200) at kafka.network.Acceptor.run(SocketServer.scala:154) at java.lang.Thread.run(Thread.java:679) [2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor) java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163) at kafka.network.Acceptor.accept(SocketServer.scala:200) at kafka.network.Acceptor.run(SocketServer.scala:154) at java.lang.Thread.run(Thread.java:679) When this happens, these 3 brokers essentially go out of sync when you do a “kafka-topics.sh —describe”. I tracked the number of open files by doing “watch –n 1 ‘sudo lsof | wc –l’”, which basically counts all open files on the system. The numbers for the systems are basically in the 6000 range, with one system going to 9000. I presume the 9000 machine is the controller. Looking at the ulimit of the user, both the hard limit and the soft limit for open files is 100,000. Using sysctl, the max file is fs.file-max = 9774928. So we seem to be way under the limit. What am I missing here? Is there some JVM limit around 10K open files or something? Paul Lung
Re: Getting java.io.IOException: Too many open files
The controller machine has 3500 or so, while the other machines have around 1600. Paul Lung On 6/24/14, 10:31 PM, Prakash Gowri Shankor prakash.shan...@gmail.com wrote: How many files does each broker itself have open ? You can find this from 'ls -l /proc/processid/fd' On Tue, Jun 24, 2014 at 10:18 PM, Lung, Paul pl...@ebay.com wrote: Hi All, I just upgraded my cluster from 0.8.1 to 0.8.1.1. I¹m seeing the following error messages on the same 3 brokers once in a while: [2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor) java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:16 3) at kafka.network.Acceptor.accept(SocketServer.scala:200) at kafka.network.Acceptor.run(SocketServer.scala:154) at java.lang.Thread.run(Thread.java:679) [2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor) java.io.IOException: Too many open files at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:16 3) at kafka.network.Acceptor.accept(SocketServer.scala:200) at kafka.network.Acceptor.run(SocketServer.scala:154) at java.lang.Thread.run(Thread.java:679) When this happens, these 3 brokers essentially go out of sync when you do a ³kafka-topics.sh ‹describe². I tracked the number of open files by doing ³watch n 1 Œsudo lsof | wc l¹², which basically counts all open files on the system. The numbers for the systems are basically in the 6000 range, with one system going to 9000. I presume the 9000 machine is the controller. Looking at the ulimit of the user, both the hard limit and the soft limit for open files is 100,000. Using sysctl, the max file is fs.file-max = 9774928. So we seem to be way under the limit. What am I missing here? Is there some JVM limit around 10K open files or something? Paul Lung
Re: too many open files - broker died
Thanks, Jun. On Sat, Nov 2, 2013 at 8:31 PM, Jun Rao jun...@gmail.com wrote: The # of required open file handlers is # client socket connections + # log segment and index files. Thanks, Jun On Fri, Nov 1, 2013 at 10:28 PM, Kane Kane kane.ist...@gmail.com wrote: I had only 1 topic with 45 partitions replicated across 3 brokers. After several hours of uploading some data to kafka 1 broker died with the following exception. I guess i can fix it raising limit for open files, but I wonder how it happened under described circumstances. [2013-11-02 00:19:14,862] INFO Reconnect due to socket error: null (kafka.consumer.SimpleConsumer) [2013-11-02 00:19:14,706] INFO Reconnect due to socket error: null (kafka.consumer.SimpleConsumer) [2013-11-02 00:19:05,150] INFO Reconnect due to socket error: null (kafka.consumer.SimpleConsumer) [2013-11-02 00:09:08,569] FATAL [ReplicaFetcherThread-0-2], Disk error while replicating data. (kafka.server.ReplicaFetcherThread) kafka.common.KafkaStorageException: I/O exception in append to log 'perf1-4' at kafka.log.Log.append(Unknown Source) at kafka.server.ReplicaFetcherThread.processPartitionData(Unknown Source) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(Unknown Source) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(Unknown Source) at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224) at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply$mcV$sp(Unknown Source) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(Unknown Source) at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(Unknown Source) at kafka.utils.Utils$.inLock(Unknown Source) at kafka.server.AbstractFetcherThread.processFetchRequest(Unknown Source) at kafka.server.AbstractFetcherThread.doWork(Unknown Source) at kafka.utils.ShutdownableThread.run(Unknown Source) Caused by: java.io.FileNotFoundException: /disk1/kafka-logs/perf1-4/00010558.index (Too many open files) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.init(RandomAccessFile.java:241) at kafka.log.OffsetIndex$$anonfun$resize$1.apply(Unknown Source) at kafka.log.OffsetIndex$$anonfun$resize$1.apply(Unknown Source) at kafka.utils.Utils$.inLock(Unknown Source) at kafka.log.OffsetIndex.resize(Unknown Source) at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(Unknown Source) at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(Unknown Source) at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(Unknown Source) at kafka.utils.Utils$.inLock(Unknown Source) at kafka.log.OffsetIndex.trimToValidSize(Unknown Source) at kafka.log.Log.roll(Unknown Source) at kafka.log.Log.maybeRoll(Unknown Source)
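Jun's formula makes the numbers easy to sanity-check: with replication factor 3 across 3 brokers, each broker hosts all 45 partitions, and every on-disk segment costs two handles (one .log plus one .index). A back-of-the-envelope sketch; the socket and segment counts are made up for illustration:

# fds ~= client_sockets + partitions_hosted * segments_per_partition * 2
echo $(( 200 + 45 * 40 * 2 ))   # -> 3800, already close to a 4096 ulimit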
too many open files - broker died
I had only 1 topic with 45 partitions replicated across 3 brokers. After several hours of uploading some data to kafka, 1 broker died with the following exception. I guess I can fix it by raising the limit for open files, but I wonder how it happened under the described circumstances.

[2013-11-02 00:19:14,862] INFO Reconnect due to socket error: null (kafka.consumer.SimpleConsumer)
[2013-11-02 00:19:14,706] INFO Reconnect due to socket error: null (kafka.consumer.SimpleConsumer)
[2013-11-02 00:19:05,150] INFO Reconnect due to socket error: null (kafka.consumer.SimpleConsumer)
[2013-11-02 00:09:08,569] FATAL [ReplicaFetcherThread-0-2], Disk error while replicating data. (kafka.server.ReplicaFetcherThread)
kafka.common.KafkaStorageException: I/O exception in append to log 'perf1-4'
at kafka.log.Log.append(Unknown Source)
at kafka.server.ReplicaFetcherThread.processPartitionData(Unknown Source)
at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(Unknown Source)
at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(Unknown Source)
at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply$mcV$sp(Unknown Source)
at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(Unknown Source)
at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(Unknown Source)
at kafka.utils.Utils$.inLock(Unknown Source)
at kafka.server.AbstractFetcherThread.processFetchRequest(Unknown Source)
at kafka.server.AbstractFetcherThread.doWork(Unknown Source)
at kafka.utils.ShutdownableThread.run(Unknown Source)
Caused by: java.io.FileNotFoundException: /disk1/kafka-logs/perf1-4/00010558.index (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.init(RandomAccessFile.java:241)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(Unknown Source)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(Unknown Source)
at kafka.utils.Utils$.inLock(Unknown Source)
at kafka.log.OffsetIndex.resize(Unknown Source)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(Unknown Source)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(Unknown Source)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(Unknown Source)
at kafka.utils.Utils$.inLock(Unknown Source)
at kafka.log.OffsetIndex.trimToValidSize(Unknown Source)
at kafka.log.Log.roll(Unknown Source)
at kafka.log.Log.maybeRoll(Unknown Source)
Re: Too many open files
Are you using the java or non-java producer? Are you using ZK based, broker-list based, or VIP based producer? Thanks, Jun On Wed, Sep 25, 2013 at 10:06 PM, Nicolas Berthet nicolasbert...@maaii.comwrote: Jun, I observed similar kind of things recently. (didn't notice before because our file limit is huge) I have a set of brokers in a datacenter, and producers in different data centers. At some point I got disconnections, from the producer perspective I had something like 15 connections to the broker. On the other hand on the broker side, I observed hundreds of connections from the producer in an ESTABLISHED state. We had some default settings for the socket timeout on the OS level, which we reduced hoping it would prevent the issue in the future. I'm not sure if the issue is from the broker or OS configuration though. I'm still keeping the broker under observation for the time being. Note that, for clients in the same datacenter, we didn't see this issue, the socket count matches on both ends. Nicolas Berthet -Original Message- From: Jun Rao [mailto:jun...@gmail.com] Sent: Thursday, September 26, 2013 12:39 PM To: users@kafka.apache.org Subject: Re: Too many open files If a client is gone, the broker should automatically close those broken sockets. Are you using a hardware load balancer? Thanks, Jun On Wed, Sep 25, 2013 at 4:48 PM, Mark static.void@gmail.com wrote: FYI if I kill all producers I don't see the number of open files drop. I still see all the ESTABLISHED connections. Is there a broker setting to automatically kill any inactive TCP connections? On Sep 25, 2013, at 4:30 PM, Mark static.void@gmail.com wrote: Any other ideas? On Sep 25, 2013, at 9:06 AM, Jun Rao jun...@gmail.com wrote: We haven't seen any socket leaks with the java producer. If you have lots of unexplained socket connections in established mode, one possible cause is that the client created new producer instances, but didn't close the old ones. Thanks, Jun On Wed, Sep 25, 2013 at 6:08 AM, Mark static.void@gmail.com wrote: No. We are using the kafka-rb ruby gem producer. https://github.com/acrosa/kafka-rb Now that you asked that question I need to ask. Is there a problem with the java producer? Sent from my iPhone On Sep 24, 2013, at 9:01 PM, Jun Rao jun...@gmail.com wrote: Are you using the java producer client? Thanks, Jun On Tue, Sep 24, 2013 at 5:33 PM, Mark static.void@gmail.com wrote: Our 0.7.2 Kafka cluster keeps crashing with: 2013-09-24 17:21:47,513 - [kafka-acceptor:Acceptor@153] - Error in acceptor java.io.IOException: Too many open The obvious fix is to bump up the number of open files but I'm wondering if there is a leak on the Kafka side and/or our application side. We currently have the ulimit set to a generous 4096 but obviously we are hitting this ceiling. What's a recommended value? We are running rails and our Unicorn workers are connecting to our Kafka cluster via round-robin load balancing. We have about 1500 workers to that would be 1500 connections right there but they should be split across our 3 nodes. 
Re: Too many open files
We are using a hardware load balancer with a VIP based ruby producer.

On Sep 26, 2013, at 7:37 AM, Jun Rao <jun...@gmail.com> wrote:

> Are you using the java or non-java producer? Are you using a ZK based, broker-list based, or VIP based producer?
Re: Too many open files
What OS settings did you change? How high is your huge file limit?

On Sep 25, 2013, at 10:06 PM, Nicolas Berthet <nicolasbert...@maaii.com> wrote:

> I observed a similar kind of thing recently (I didn't notice it before because our file limit is huge).
RE: Too many open files
Hi Mark,

I'm using CentOS 6.2. My file limit is something like 500k; the value is arbitrary.

One of the things I've changed so far is the TCP keepalive parameters, with moderate success:

net.ipv4.tcp_keepalive_time
net.ipv4.tcp_keepalive_intvl
net.ipv4.tcp_keepalive_probes

I still notice an abnormal number of ESTABLISHED connections. I've been doing some searching and came across this page (http://www.lognormal.com/blog/2012/09/27/linux-tcpip-tuning/). I'll change net.netfilter.nf_conntrack_tcp_timeout_established as indicated there; it looks closer to a solution to my issue.

Are you also experiencing the issue in a cross-datacenter context?

Best regards,

Nicolas Berthet

-----Original Message-----
From: Mark [mailto:static.void@gmail.com]
Sent: Friday, September 27, 2013 6:08 AM
To: users@kafka.apache.org
Subject: Re: Too many open files

What OS settings did you change? How high is your huge file limit?
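For readers wanting to try the same tuning, a minimal sketch of the sysctls Nicolas mentions. The values here are illustrative assumptions, not recommendations; the kernel defaults are noted in the comments. Persist the settings in /etc/sysctl.conf if they help:

    # Probe idle connections sooner than the 2-hour default
    sysctl -w net.ipv4.tcp_keepalive_time=600     # default 7200 s
    sysctl -w net.ipv4.tcp_keepalive_intvl=60     # default 75 s
    sysctl -w net.ipv4.tcp_keepalive_probes=5     # default 9 probes

    # Only relevant when conntrack/NAT sits between client and broker
    sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=3600   # default 432000 s (5 days)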
Re: Too many open files
No. We are using the kafka-rb ruby gem producer: https://github.com/acrosa/kafka-rb

Now that you've asked that question, I need to ask: is there a problem with the java producer?

Sent from my iPhone

On Sep 24, 2013, at 9:01 PM, Jun Rao <jun...@gmail.com> wrote:

> Are you using the java producer client?
Re: Too many open files
We haven't seen any socket leaks with the java producer. If you have lots of unexplained socket connections in the ESTABLISHED state, one possible cause is that the client created new producer instances but didn't close the old ones.

Thanks,

Jun

On Wed, Sep 25, 2013 at 6:08 AM, Mark <static.void@gmail.com> wrote:

> No. We are using the kafka-rb ruby gem producer. https://github.com/acrosa/kafka-rb
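One way to check for the scenario Jun describes is to group established broker connections by owning process on the producer host; a single process holding hundreds of sockets suggests producer instances are being created but never closed. A minimal sketch, assuming the default broker port 9092 and lsof's usual nine-column output:

    # On the producer host: ESTABLISHED connections to the broker port, grouped by PID
    lsof -nP -iTCP -sTCP:ESTABLISHED | awk '$9 ~ /:9092$/ {print $2}' | sort | uniq -c | sort -rn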
Re: Too many open files
Any other ideas?

On Sep 25, 2013, at 9:06 AM, Jun Rao <jun...@gmail.com> wrote:

> We haven't seen any socket leaks with the java producer. If you have lots of unexplained socket connections in established mode, one possible cause is that the client created new producer instances, but didn't close the old ones.
Re: Too many open files
FYI, if I kill all producers I don't see the number of open files drop; I still see all the ESTABLISHED connections. Is there a broker setting to automatically kill any inactive TCP connections?

On Sep 25, 2013, at 4:30 PM, Mark <static.void@gmail.com> wrote:

> Any other ideas?
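The broker-side counterpart of the check above is to group ESTABLISHED sockets by remote address; if these counts stay flat after the producers are killed, the broker is holding sockets whose peers are already gone. A sketch (it strips the trailing port before grouping):

    # On the broker: ESTABLISHED connections grouped by remote host
    netstat -tn | awk '$6 == "ESTABLISHED" {print $5}' | sed 's/:[0-9]*$//' | sort | uniq -c | sort -rn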
Re: Too many open files
If a client is gone, the broker should automatically close those broken sockets. Are you using a hardware load balancer?

Thanks,

Jun

On Wed, Sep 25, 2013 at 4:48 PM, Mark <static.void@gmail.com> wrote:

> FYI if I kill all producers I don't see the number of open files drop. I still see all the ESTABLISHED connections.
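A caveat on "should automatically close": on an idle socket the kernel only notices a dead peer when TCP keepalive fires, and with stock Linux settings (tcp_keepalive_time = 7200 s) that can take well over two hours. The per-socket timers can be inspected with ss; a sketch assuming the broker listens on port 9092:

    # Show keepalive/retransmit timers for the broker's established sockets
    ss -o state established '( sport = :9092 )'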
Too many open files
Our 0.7.2 Kafka cluster keeps crashing with:

2013-09-24 17:21:47,513 - [kafka-acceptor:Acceptor@153] - Error in acceptor
java.io.IOException: Too many open files

The obvious fix is to bump up the number of open files, but I'm wondering if there is a leak on the Kafka side and/or our application side. We currently have the ulimit set to a generous 4096, but obviously we are hitting this ceiling. What's a recommended value?

We are running Rails, and our Unicorn workers connect to our Kafka cluster via round-robin load balancing. We have about 1500 workers, so that would be 1500 connections right there, but they should be split across our 3 nodes. Instead, netstat shows thousands of connections that look like this:

tcp    0    0 kafka1.mycompany.:XmlIpcRegSvc    :::10.99.99.1:22503    ESTABLISHED
tcp    0    0 kafka1.mycompany.:XmlIpcRegSvc    :::10.99.99.1:48398    ESTABLISHED
tcp    0    0 kafka1.mycompany.:XmlIpcRegSvc    :::10.99.99.2:29617    ESTABLISHED
tcp    0    0 kafka1.mycompany.:XmlIpcRegSvc    :::10.99.99.1:32444    ESTABLISHED
tcp    0    0 kafka1.mycompany.:XmlIpcRegSvc    :::10.99.99.1:34415    ESTABLISHED
tcp    0    0 kafka1.mycompany.:XmlIpcRegSvc    :::10.99.99.1:56901    ESTABLISHED
tcp    0    0 kafka1.mycompany.:XmlIpcRegSvc    :::10.99.99.2:45349    ESTABLISHED

Has anyone come across this problem before? Is this a 0.7.2 leak, LB misconfiguration… ?

Thanks
Re: Too many open files
Are you using the java producer client?

Thanks,

Jun

On Tue, Sep 24, 2013 at 5:33 PM, Mark <static.void@gmail.com> wrote:

> Our 0.7.2 Kafka cluster keeps crashing with:
>
> 2013-09-24 17:21:47,513 - [kafka-acceptor:Acceptor@153] - Error in acceptor
> java.io.IOException: Too many open files
Re: java.net.SocketException: Too many open files
We've had this problem with Zookeeper. Setting ulimit properly can occasionally be tricky, because you need to log out and re-ssh into the box for the changes to take effect on the processes you start next. Another problem we've hit was that our puppet service was running in the background and silently restoring settings to their original values, which would bite us a while later when we'd need to restart a service (currently running processes keep the limit they had at start time).

You can double-check that your processes are running with the ulimit you expect by finding their PID (using ps) and then doing:

sudo cat /proc/PID/limits

If you don't see the value you configured in the "Max open files" line, then something somewhere prevented your process from using the number of file handles you want it to.

Of course, what I just said doesn't address the possibility that there could be some sort of file handle leak somewhere in the 0.8 code... though I guess such a bug would have surfaced in heavy-duty environments such as LinkedIn's, if it existed.

--

Felix

On Fri, Aug 2, 2013 at 12:07 AM, Jun Rao <jun...@gmail.com> wrote:

> If you do netstat, what hosts are those connections for and what state are those connections in?
java.net.SocketException: Too many open files
Hi,

In the producer I am continuously getting the exception java.net.SocketException: Too many open files, even though I added the line below to /etc/security/limits.conf:

kafka-0.8.0-beta1-src - nofile 983040

ERROR Producer connection to localhost:9093 unsuccessful (kafka.producer.SyncProducer)
java.net.SocketException: Too many open files

Please help me resolve this.

Thanks,

Sujitha
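For reference, a well-formed limits.conf entry looks like the sketch below. The "kafka" domain is a hypothetical user name (it must match the account the producer process actually runs as), the value is only an example, and pam_limits applies it at login time, so the process has to be restarted from a fresh session before it takes effect:

    # /etc/security/limits.conf -- format: <domain> <type> <item> <value>
    kafka  soft  nofile  98304   # user name and value are example assumptions
    kafka  hard  nofile  98304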
Re: java.net.SocketException: Too many open files
If you do netstat, what hosts are those connections for, and what state are those connections in?

Thanks,

Jun

On Thu, Aug 1, 2013 at 9:04 AM, Nandigam, Sujitha <snandi...@verisign.com> wrote:

> In the producer I am continuously getting the exception java.net.SocketException: Too many open files, even though I added a nofile line to /etc/security/limits.conf.