RE: Too many open files in kafka 0.9

2017-12-07 Thread REYMOND Jean-max (BPCE-IT - SYNCHRONE TECHNOLOGIES)
According to 
https://issues.apache.org/jira/browse/KAFKA-3806

I have adjusted offset.retention.minutes, and it seems that this has solved my issue.
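
For reference, this is a broker-side setting in server.properties; a minimal sketch (the value below is illustrative, not the one we used, and in 0.9 the broker property is spelled offsets.retention.minutes):

# server.properties -- retention of committed consumer offsets
# broker default is 1440 (24 hours); KAFKA-3806 discusses tuning it
# together with log retention
offsets.retention.minutes=2880

The brokers need a restart to pick the change up.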

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Wednesday, 29 November 2017 19:41
To: users@kafka.apache.org
Subject: Re: Too many open files in kafka 0.9

There is KAFKA-3317 which is still open.

Have you seen this ?

http://search-hadoop.com/m/Kafka/uyzND1KvOlt1p5UcE?subj=Re+Brokers+is+down+by+java+io+IOException+Too+many+open+files+

On Wed, Nov 29, 2017 at 8:55 AM, REYMOND Jean-max (BPCE-IT - SYNCHRONE
TECHNOLOGIES) <jean-max.reymond.prestata...@bpce-it.fr> wrote:

> We have a cluster with 3 brokers and kafka 0.9.0.1. One week ago, we 
> decide to adjust log.retention.hours from 10 days to 2 days. Stop and 
> go the cluster and it is ok. But for one broker, we have every day 
> more and more datas and two days later crash with message too many 
> open files. lsof return 7400 opened files. We adjust to 1 and 
> crash again. So, in our data repository, we remove all the datas and 
> run again and after a few minutes, cluster is OK. But now, after atfer 
> 6 hours, the two valid brokers have 72 GB and the other broker has 90 
> GB. lsof -p xxx returns 1030 and it is growing continously. I am sure 
> that tomorrow morning, we will have a crash.
>
> In the server.log of the broken broker,
>
> [2017-11-29 17:28:51,360] INFO Rolled new log segment for 
> '__consumer_offsets-27' in 1 ms. (kafka.log.Log)
> [2017-11-29 17:31:28,836] INFO Rolled new log segment for 
> '__consumer_offsets-8' in 1 ms. (kafka.log.Log)
> [2017-11-29 17:35:22,100] INFO Rolled new log segment for 
> '__consumer_offsets-12' in 1 ms. (kafka.log.Log)
> [2017-11-29 17:37:55,984] INFO Rolled new log segment for 
> '__consumer_offsets-11' in 1 ms. (kafka.log.Log)
> [2017-11-29 17:38:30,600] INFO [Group Metadata Manager on Broker 2]:
> Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.
> GroupMetadataManager)
> [2017-11-29 17:39:55,836] INFO Rolled new log segment for 
> '__consumer_offsets-16' in 1 ms. (kafka.log.Log)
> [2017-11-29 17:43:38,300] INFO Rolled new log segment for 
> '__consumer_offsets-48' in 1 ms. (kafka.log.Log)
> [2017-11-29 17:44:21,110] INFO Rolled new log segment for 
> '__consumer_offsets-36' in 1 ms. (kafka.log.Log)
> [2017-11-29 17:48:30,600] INFO [Group Metadata Manager on Broker 2]:
> Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.
> GroupMetadataManager)
>
> And in the same time on a valid broker
>
> [2017-11-29 17:44:46,704] INFO Deleting index 
> /pfic/kafka/data/kafka_data/__consumer_offsets-48/
> 002686063378.index.deleted (kafka.log.OffsetIndex)
> [2017-11-29 17:44:47,341] INFO Deleting segment 2687254936 from log 
> __consumer_offsets-48. (kafka.log.Log)
> [2017-11-29 17:44:47,376] INFO Deleting index 
> /pfic/kafka/data/kafka_data/__consumer_offsets-48/
> 002687254936.index.deleted (kafka.log.OffsetIndex)
> [2017-11-29 17:45:32,991] INFO Deleting segment 0 from log 
> __consumer_offsets-36. (kafka.log.Log)
> [2017-11-29 17:45:32,991] INFO Deleting segment 1769617973 from log 
> __consumer_offsets-36. (kafka.log.Log)
> [2017-11-29 17:45:32,993] INFO Deleting index 
> /pfic/kafka/data/kafka_data/__consumer_offsets-36/
> .index.deleted (kafka.log.OffsetIndex)
> [2017-11-29 17:45:32,993] INFO Deleting index 
> /pfic/kafka/data/kafka_data/__consumer_offsets-36/
> 001769617973.index.deleted (kafka.log.OffsetIndex)
> [2017-11-29 17:45:33,593] INFO Deleting segment 1770704579 from log 
> __consumer_offsets-36. (kafka.log.Log)
> [2017-11-29 17:45:33,627] INFO Deleting index 
> /pfic/kafka/data/kafka_data/__consumer_offsets-36/
> 001770704579.index.deleted (kafka.log.OffsetIndex)
> [2017-11-29 17:45:58,394] INFO [Group Metadata Manager on Broker 0]:
> Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.
> GroupMetadataManager)
>
> So, the broken broker never delete a segment. Of course, the three 
> brokers have the same configuration.
> What's happen ?
> Thanks for your advices,
>
>
> Jean-Max REYMOND
> BPCE Infogérance & Technologies
>

RE: Too many open files in kafka 0.9

2017-11-30 Thread REYMOND Jean-max (BPCE-IT - SYNCHRONE TECHNOLOGIES)
Thank you for your valuable advice.
Yes, we had already increased the nofile parameter in limits.conf, but after one week came the big crash.
To be precise, on the broken node the __consumer_offsets-XX directories are never deleted, and after 20 hours we have 70 GB of these directories and files. That is the big difference from the other brokers. So, is it safe to remove these __consumer_offsets-XX directories if they have not been accessed for a day?
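
Before removing anything, it is worth comparing the __consumer_offsets footprint across the brokers to confirm which node is the outlier; a rough sketch, using the data directory that appears in the logs quoted below (adjust to your own log.dirs):

# total size per __consumer_offsets partition on this broker
du -sh /pfic/kafka/data/kafka_data/__consumer_offsets-* | sort -h | tail
# number of log segments still on disk; a healthy broker should be deleting old ones
find /pfic/kafka/data/kafka_data/__consumer_offsets-* -name '*.log' | wc -l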

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Wednesday, 29 November 2017 19:41
To: users@kafka.apache.org
Subject: Re: Too many open files in kafka 0.9

There is KAFKA-3317 which is still open.

Have you seen this ?

http://search-hadoop.com/m/Kafka/uyzND1KvOlt1p5UcE?subj=Re+Brokers+is+down+by+java+io+IOException+Too+many+open+files+

On Wed, Nov 29, 2017 at 8:55 AM, REYMOND Jean-max (BPCE-IT - SYNCHRONE
TECHNOLOGIES) <jean-max.reymond.prestata...@bpce-it.fr> wrote:

> We have a cluster with 3 brokers and kafka 0.9.0.1. One week ago, we 
> decide to adjust log.retention.hours from 10 days to 2 days. Stop and 
> go the cluster and it is ok. But for one broker, we have every day 
> more and more datas and two days later crash with message too many 
> open files. lsof return 7400 opened files. We adjust to 1 and 
> crash again. So, in our data repository, we remove all the datas and 
> run again and after a few minutes, cluster is OK. But now, after atfer 
> 6 hours, the two valid brokers have 72 GB and the other broker has 90 
> GB. lsof -p xxx returns 1030 and it is growing continously. I am sure 
> that tomorrow morning, we will have a crash.
>
> In the server.log of the broken broker,
>
> [2017-11-29 17:28:51,360] INFO Rolled new log segment for 
> '__consumer_offsets-27' in 1 ms. (kafka.log.Log)
> [2017-11-29 17:31:28,836] INFO Rolled new log segment for 
> '__consumer_offsets-8' in 1 ms. (kafka.log.Log)
> [2017-11-29 17:35:22,100] INFO Rolled new log segment for 
> '__consumer_offsets-12' in 1 ms. (kafka.log.Log)
> [2017-11-29 17:37:55,984] INFO Rolled new log segment for 
> '__consumer_offsets-11' in 1 ms. (kafka.log.Log)
> [2017-11-29 17:38:30,600] INFO [Group Metadata Manager on Broker 2]:
> Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.
> GroupMetadataManager)
> [2017-11-29 17:39:55,836] INFO Rolled new log segment for 
> '__consumer_offsets-16' in 1 ms. (kafka.log.Log)
> [2017-11-29 17:43:38,300] INFO Rolled new log segment for 
> '__consumer_offsets-48' in 1 ms. (kafka.log.Log)
> [2017-11-29 17:44:21,110] INFO Rolled new log segment for 
> '__consumer_offsets-36' in 1 ms. (kafka.log.Log)
> [2017-11-29 17:48:30,600] INFO [Group Metadata Manager on Broker 2]:
> Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.
> GroupMetadataManager)
>
> And in the same time on a valid broker
>
> [2017-11-29 17:44:46,704] INFO Deleting index 
> /pfic/kafka/data/kafka_data/__consumer_offsets-48/
> 002686063378.index.deleted (kafka.log.OffsetIndex)
> [2017-11-29 17:44:47,341] INFO Deleting segment 2687254936 from log 
> __consumer_offsets-48. (kafka.log.Log)
> [2017-11-29 17:44:47,376] INFO Deleting index 
> /pfic/kafka/data/kafka_data/__consumer_offsets-48/
> 002687254936.index.deleted (kafka.log.OffsetIndex)
> [2017-11-29 17:45:32,991] INFO Deleting segment 0 from log 
> __consumer_offsets-36. (kafka.log.Log)
> [2017-11-29 17:45:32,991] INFO Deleting segment 1769617973 from log 
> __consumer_offsets-36. (kafka.log.Log)
> [2017-11-29 17:45:32,993] INFO Deleting index 
> /pfic/kafka/data/kafka_data/__consumer_offsets-36/
> .index.deleted (kafka.log.OffsetIndex)
> [2017-11-29 17:45:32,993] INFO Deleting index 
> /pfic/kafka/data/kafka_data/__consumer_offsets-36/
> 001769617973.index.deleted (kafka.log.OffsetIndex)
> [2017-11-29 17:45:33,593] INFO Deleting segment 1770704579 from log 
> __consumer_offsets-36. (kafka.log.Log)
> [2017-11-29 17:45:33,627] INFO Deleting index 
> /pfic/kafka/data/kafka_data/__consumer_offsets-36/
> 001770704579.index.deleted (kafka.log.OffsetIndex)
> [2017-11-29 17:45:58,394] INFO [Group Metadata Manager on Broker 0]:
> Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.
> GroupMetadataManager)
>
> So, the broken broker never delete a segment. Of course, the three 
> brokers have the same configuration.
> What's happen ?
> Thanks for your advices,
>
>
> Jean-Max REYMOND
> BPCE Infogérance & Technologies
>
>

Re: Too many open files in kafka 0.9

2017-11-29 Thread Ted Yu
There is KAFKA-3317, which is still open.

Have you seen this?

http://search-hadoop.com/m/Kafka/uyzND1KvOlt1p5UcE?subj=Re+Brokers+is+down+by+java+io+IOException+Too+many+open+files+

On Wed, Nov 29, 2017 at 8:55 AM, REYMOND Jean-max (BPCE-IT - SYNCHRONE
TECHNOLOGIES) <jean-max.reymond.prestata...@bpce-it.fr> wrote:

> We have a cluster with 3 brokers and kafka 0.9.0.1. One week ago, we
> decide to adjust log.retention.hours from 10 days to 2 days. Stop and go
> the cluster and it is ok. But for one broker, we have every day more and
> more datas and two days later crash with message too many open files. lsof
> return 7400 opened files. We adjust to 1 and crash again. So, in our
> data repository, we remove all the datas and run again and after a few
> minutes, cluster is OK. But now, after atfer 6 hours, the two valid brokers
> have 72 GB and the other broker has 90 GB. lsof -p xxx returns 1030 and it
> is growing continously. I am sure that tomorrow morning, we will have a
> crash.
>
> In the server.log of the broken broker,
>
> [2017-11-29 17:28:51,360] INFO Rolled new log segment for
> '__consumer_offsets-27' in 1 ms. (kafka.log.Log)
> [2017-11-29 17:31:28,836] INFO Rolled new log segment for
> '__consumer_offsets-8' in 1 ms. (kafka.log.Log)
> [2017-11-29 17:35:22,100] INFO Rolled new log segment for
> '__consumer_offsets-12' in 1 ms. (kafka.log.Log)
> [2017-11-29 17:37:55,984] INFO Rolled new log segment for
> '__consumer_offsets-11' in 1 ms. (kafka.log.Log)
> [2017-11-29 17:38:30,600] INFO [Group Metadata Manager on Broker 2]:
> Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.
> GroupMetadataManager)
> [2017-11-29 17:39:55,836] INFO Rolled new log segment for
> '__consumer_offsets-16' in 1 ms. (kafka.log.Log)
> [2017-11-29 17:43:38,300] INFO Rolled new log segment for
> '__consumer_offsets-48' in 1 ms. (kafka.log.Log)
> [2017-11-29 17:44:21,110] INFO Rolled new log segment for
> '__consumer_offsets-36' in 1 ms. (kafka.log.Log)
> [2017-11-29 17:48:30,600] INFO [Group Metadata Manager on Broker 2]:
> Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.
> GroupMetadataManager)
>
> And in the same time on a valid broker
>
> [2017-11-29 17:44:46,704] INFO Deleting index
> /pfic/kafka/data/kafka_data/__consumer_offsets-48/
> 002686063378.index.deleted (kafka.log.OffsetIndex)
> [2017-11-29 17:44:47,341] INFO Deleting segment 2687254936 from log
> __consumer_offsets-48. (kafka.log.Log)
> [2017-11-29 17:44:47,376] INFO Deleting index
> /pfic/kafka/data/kafka_data/__consumer_offsets-48/
> 002687254936.index.deleted (kafka.log.OffsetIndex)
> [2017-11-29 17:45:32,991] INFO Deleting segment 0 from log
> __consumer_offsets-36. (kafka.log.Log)
> [2017-11-29 17:45:32,991] INFO Deleting segment 1769617973 from log
> __consumer_offsets-36. (kafka.log.Log)
> [2017-11-29 17:45:32,993] INFO Deleting index
> /pfic/kafka/data/kafka_data/__consumer_offsets-36/
> .index.deleted (kafka.log.OffsetIndex)
> [2017-11-29 17:45:32,993] INFO Deleting index
> /pfic/kafka/data/kafka_data/__consumer_offsets-36/
> 001769617973.index.deleted (kafka.log.OffsetIndex)
> [2017-11-29 17:45:33,593] INFO Deleting segment 1770704579 from log
> __consumer_offsets-36. (kafka.log.Log)
> [2017-11-29 17:45:33,627] INFO Deleting index
> /pfic/kafka/data/kafka_data/__consumer_offsets-36/
> 001770704579.index.deleted (kafka.log.OffsetIndex)
> [2017-11-29 17:45:58,394] INFO [Group Metadata Manager on Broker 0]:
> Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.
> GroupMetadataManager)
>
> So, the broken broker never delete a segment. Of course, the three brokers
> have the same configuration.
> What's happen ?
> Thanks for your advices,
>
>
> Jean-Max REYMOND
> BPCE Infogérance & Technologies
>


Too many open files in kafka 0.9

2017-11-29 Thread REYMOND Jean-max (BPCE-IT - SYNCHRONE TECHNOLOGIES)
We have a cluster with 3 brokers and kafka 0.9.0.1. One week ago, we decided to adjust log.retention.hours from 10 days to 2 days. We stopped and restarted the cluster and it was OK. But on one broker we have more and more data every day, and two days later it crashed with the message "too many open files". lsof returned 7400 open files. We adjusted the limit to 1 and it crashed again. So, in our data repository, we removed all the data and started again, and after a few minutes the cluster was OK. But now, after 6 hours, the two valid brokers have 72 GB and the other broker has 90 GB. lsof -p xxx returns 1030 and it is growing continuously. I am sure that tomorrow morning we will have a crash.
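
For anyone watching for the same pattern, a minimal way to track the broker's descriptor count over time (a sketch; it assumes a single Kafka broker process on the host):

PID=$(pgrep -f kafka.Kafka)
while true; do
  echo "$(date '+%F %T') open_fds=$(ls /proc/$PID/fd | wc -l)"
  sleep 60
done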

In the server.log of the broken broker,

[2017-11-29 17:28:51,360] INFO Rolled new log segment for 
'__consumer_offsets-27' in 1 ms. (kafka.log.Log)
[2017-11-29 17:31:28,836] INFO Rolled new log segment for 
'__consumer_offsets-8' in 1 ms. (kafka.log.Log)
[2017-11-29 17:35:22,100] INFO Rolled new log segment for 
'__consumer_offsets-12' in 1 ms. (kafka.log.Log)
[2017-11-29 17:37:55,984] INFO Rolled new log segment for 
'__consumer_offsets-11' in 1 ms. (kafka.log.Log)
[2017-11-29 17:38:30,600] INFO [Group Metadata Manager on Broker 2]: Removed 0 
expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-11-29 17:39:55,836] INFO Rolled new log segment for 
'__consumer_offsets-16' in 1 ms. (kafka.log.Log)
[2017-11-29 17:43:38,300] INFO Rolled new log segment for 
'__consumer_offsets-48' in 1 ms. (kafka.log.Log)
[2017-11-29 17:44:21,110] INFO Rolled new log segment for 
'__consumer_offsets-36' in 1 ms. (kafka.log.Log)
[2017-11-29 17:48:30,600] INFO [Group Metadata Manager on Broker 2]: Removed 0 
expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)

And in the same time on a valid broker

[2017-11-29 17:44:46,704] INFO Deleting index 
/pfic/kafka/data/kafka_data/__consumer_offsets-48/002686063378.index.deleted
 (kafka.log.OffsetIndex)
[2017-11-29 17:44:47,341] INFO Deleting segment 2687254936 from log 
__consumer_offsets-48. (kafka.log.Log)
[2017-11-29 17:44:47,376] INFO Deleting index 
/pfic/kafka/data/kafka_data/__consumer_offsets-48/002687254936.index.deleted
 (kafka.log.OffsetIndex)
[2017-11-29 17:45:32,991] INFO Deleting segment 0 from log 
__consumer_offsets-36. (kafka.log.Log)
[2017-11-29 17:45:32,991] INFO Deleting segment 1769617973 from log 
__consumer_offsets-36. (kafka.log.Log)
[2017-11-29 17:45:32,993] INFO Deleting index 
/pfic/kafka/data/kafka_data/__consumer_offsets-36/.index.deleted
 (kafka.log.OffsetIndex)
[2017-11-29 17:45:32,993] INFO Deleting index 
/pfic/kafka/data/kafka_data/__consumer_offsets-36/001769617973.index.deleted
 (kafka.log.OffsetIndex)
[2017-11-29 17:45:33,593] INFO Deleting segment 1770704579 from log 
__consumer_offsets-36. (kafka.log.Log)
[2017-11-29 17:45:33,627] INFO Deleting index 
/pfic/kafka/data/kafka_data/__consumer_offsets-36/001770704579.index.deleted
 (kafka.log.OffsetIndex)
[2017-11-29 17:45:58,394] INFO [Group Metadata Manager on Broker 0]: Removed 0 
expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)

So, the broken broker never deletes a segment. Of course, the three brokers have the same configuration.
What is happening?
Thanks for your advice,


Jean-Max REYMOND
BPCE Infogérance & Technologies



Re: Brokers is down by “java.io.IOException: Too many open files”

2017-05-17 Thread Jeffrey Groves
I’ve seen that tuning network settings in the OS can also help mitigate some of the “Too many open files” issues.

Try changing the following OS settings so that used network connections are closed as quickly as possible, which keeps file handle usage down:





sysctl -w net.ipv4.tcp_fin_timeout=10


By default, this value is 60 seconds. Reducing it to 10 seconds allows socket-related file handles to be released sooner.



sysctl -w net.ipv4.tcp_synack_retries=3



By default, this value is 5. Setting it to 3 decreases the time it takes for a failed passive TCP connection to time out, releasing resources sooner.




Additionally, make sure that the NOFILE ulimit for your ZooKeeper account is also set high enough that it can service network connection requests at a level comparable to the Kafka broker. The network parameters above help ZooKeeper too, so consider applying them on your ZooKeeper nodes as well.



Finally, make sure the account that runs your producer and consumer processes also has an appropriate NOFILE ulimit, and that the nodes where those processes run use the network configurations above.
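
To make those settings persistent across reboots, a minimal sketch (the file name is an assumption; any file under /etc/sysctl.d works, or /etc/sysctl.conf itself):

# /etc/sysctl.d/99-kafka-network.conf
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_synack_retries = 3

# apply without a reboot, then verify
sysctl -p /etc/sysctl.d/99-kafka-network.conf
sysctl net.ipv4.tcp_fin_timeout net.ipv4.tcp_synack_retries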







Thank you,



Jeff Groves





On 5/17/17, 1:11 AM, "Yang Cui" <y...@freewheel.tv> wrote:



Hi Caleb,



  We already set the number of max open files to 100,000 before this error 
happened.



  Normally, the file description is about 20,000, but in some time, it 
suddenly jump to so many count.



  This is our monitor about Kafka FD info:



  2017-05-17-05:04:19 FD_total_num:19261 FD_pair_num:15256 FD_ads_num:3191 
FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 149 REG 18877

  2017-05-17-05:04:31 FD_total_num:19267 FD_pair_num:15259 FD_ads_num:3192 
FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 152 REG 18883

  2017-05-17-05:04:44 FD_total_num:19272 FD_pair_num:15263 FD_ads_num:3197 
FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 148 REG 18892

  2017-05-17-05:04:57 FD_total_num:19280 FD_pair_num:15268 FD_ads_num:3197 
FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 150 REG 18895

  2017-05-17-05:05:09 FD_total_num:19277 FD_pair_num:15271 FD_ads_num:3197 
FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 152 REG 18898

  2017-05-17-05:05:21 FD_total_num:19223 FD_pair_num:15217 FD_ads_num:3189 
FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 158 REG 18836

  2017-05-17-05:05:34 FD_total_num:19235 FD_pair_num:15223 FD_ads_num:3189 
FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 158 REG 18842







On 13/05/2017, 9:57 AM, "Caleb Welton" <ca...@autonomic.ai> wrote:



You need to up your OS open file limits, something like this should 
work:



# /etc/security/limits.conf

* - nofile 65536









On Fri, May 12, 2017 at 6:34 PM, Yang Cui <y...@freewheel.tv> wrote:



> Our Kafka cluster is broken down by  the problem 
“java.io.IOException: Too

> many open files”  three times in 3 weeks.

>

> We encounter these problem on both 0.9.0.1 and 0.10.2.1 version.

>

> The error is like:

    >

> java.io.IOException: Too many open files

> at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)

> at sun.nio.ch.ServerSocketChannelImpl.accept(

> ServerSocketChannelImpl.java:422)

> at sun.nio.ch.ServerSocketChannelImpl.accept(

> ServerSocketChannelImpl.java:250)

> at kafka.network.Acceptor.accept(SocketServer.scala:340)

> at kafka.network.Acceptor.run(SocketServer.scala:283)

> at java.lang.Thread.run(Thread.java:745)

>

> Is someone encounter the similar problem?

>

>

>








Re: Brokers is down by “java.io.IOException: Too many open files”

2017-05-16 Thread Yang Cui
Hi Caleb, 

  We already set the number of max open files to 100,000 before this error 
happened.
  
  Normally, the file descriptor count is about 20,000, but at some point it suddenly jumps to a much higher count.

  This is our monitoring output for the Kafka FD info:
  
  2017-05-17-05:04:19 FD_total_num:19261 FD_pair_num:15256 FD_ads_num:3191 
FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 149 REG 18877
  2017-05-17-05:04:31 FD_total_num:19267 FD_pair_num:15259 FD_ads_num:3192 
FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 152 REG 18883
  2017-05-17-05:04:44 FD_total_num:19272 FD_pair_num:15263 FD_ads_num:3197 
FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 148 REG 18892
  2017-05-17-05:04:57 FD_total_num:19280 FD_pair_num:15268 FD_ads_num:3197 
FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 150 REG 18895
  2017-05-17-05:05:09 FD_total_num:19277 FD_pair_num:15271 FD_ads_num:3197 
FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 152 REG 18898
  2017-05-17-05:05:21 FD_total_num:19223 FD_pair_num:15217 FD_ads_num:3189 
FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 158 REG 18836
  2017-05-17-05:05:34 FD_total_num:19235 FD_pair_num:15223 FD_ads_num:3189 
FD_Type:TYPE 1 DIR 2 unix 2 sock 4 CHR 7 a_inode 73 FIFO 146 IPv4 158 REG 18842
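
For anyone who wants the same breakdown without a custom script, a rough equivalent of that monitor using plain lsof (illustrative; the FD_total_num/FD_Type labels above are from a custom monitoring script, not lsof output):

PID=$(pgrep -f kafka.Kafka)
echo "$(date '+%F %T') FD_total_num:$(ls /proc/$PID/fd | wc -l)"
# count descriptors by type (REG, IPv4, FIFO, ...), like the FD_Type column above
lsof -p "$PID" | awk 'NR>1 {print $5}' | sort | uniq -c | sort -n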
  
  

On 13/05/2017, 9:57 AM, "Caleb Welton" <ca...@autonomic.ai> wrote:

You need to up your OS open file limits, something like this should work:

# /etc/security/limits.conf
* - nofile 65536




On Fri, May 12, 2017 at 6:34 PM, Yang Cui <y...@freewheel.tv> wrote:

> Our Kafka cluster is broken down by  the problem “java.io.IOException: Too
> many open files”  three times in 3 weeks.
>
> We encounter these problem on both 0.9.0.1 and 0.10.2.1 version.
>
> The error is like:
    >
    > java.io.IOException: Too many open files
> at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> at sun.nio.ch.ServerSocketChannelImpl.accept(
> ServerSocketChannelImpl.java:422)
> at sun.nio.ch.ServerSocketChannelImpl.accept(
> ServerSocketChannelImpl.java:250)
> at kafka.network.Acceptor.accept(SocketServer.scala:340)
> at kafka.network.Acceptor.run(SocketServer.scala:283)
> at java.lang.Thread.run(Thread.java:745)
>
> Is someone encounter the similar problem?
>
>
>




Re: Brokers is down by “java.io.IOException: Too many open files”

2017-05-15 Thread Sam Pegler
If you're using a systemd-based OS you'll actually need to set it in the unit file.

LimitNOFILE=10

https://kafka.apache.org/documentation/#upgrade_10_1_breaking contains some
changes re file handles as well.
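
For a concrete illustration, a systemd drop-in override is usually the cleanest place for it (sketch only; the unit name kafka.service and the limit value are assumptions):

# /etc/systemd/system/kafka.service.d/limits.conf
[Service]
LimitNOFILE=100000

# reload unit files and restart so the broker picks it up
systemctl daemon-reload
systemctl restart kafka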


__

Sam Pegler

PRODUCTION ENGINEER

T. +44(0) 07 562 867 486

<http://www.infectiousmedia.com/>
3-7 Herbal Hill / London / EC1R 5EJ
www.infectiousmedia.com



On 13 May 2017 at 02:57, Caleb Welton <ca...@autonomic.ai> wrote:

> You need to up your OS open file limits, something like this should work:
>
> # /etc/security/limits.conf
> * - nofile 65536
>
>
>
>
> On Fri, May 12, 2017 at 6:34 PM, Yang Cui <y...@freewheel.tv> wrote:
>
> > Our Kafka cluster is broken down by  the problem “java.io.IOException:
> Too
> > many open files”  three times in 3 weeks.
> >
> > We encounter these problem on both 0.9.0.1 and 0.10.2.1 version.
> >
> > The error is like:
> >
> > java.io.IOException: Too many open files
> > at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> > at sun.nio.ch.ServerSocketChannelImpl.accept(
> > ServerSocketChannelImpl.java:422)
> > at sun.nio.ch.ServerSocketChannelImpl.accept(
> > ServerSocketChannelImpl.java:250)
> > at kafka.network.Acceptor.accept(SocketServer.scala:340)
> > at kafka.network.Acceptor.run(SocketServer.scala:283)
> > at java.lang.Thread.run(Thread.java:745)
> >
> > Is someone encounter the similar problem?
> >
> >
> >
>


Re: Brokers is down by “java.io.IOException: Too many open files”

2017-05-12 Thread Caleb Welton
You need to up your OS open file limits, something like this should work:

# /etc/security/limits.conf
* - nofile 65536




On Fri, May 12, 2017 at 6:34 PM, Yang Cui <y...@freewheel.tv> wrote:

> Our Kafka cluster is broken down by  the problem “java.io.IOException: Too
> many open files”  three times in 3 weeks.
>
> We encounter these problem on both 0.9.0.1 and 0.10.2.1 version.
>
> The error is like:
>
> java.io.IOException: Too many open files
> at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> at sun.nio.ch.ServerSocketChannelImpl.accept(
> ServerSocketChannelImpl.java:422)
> at sun.nio.ch.ServerSocketChannelImpl.accept(
> ServerSocketChannelImpl.java:250)
> at kafka.network.Acceptor.accept(SocketServer.scala:340)
> at kafka.network.Acceptor.run(SocketServer.scala:283)
> at java.lang.Thread.run(Thread.java:745)
>
> Is someone encounter the similar problem?
>
>
>


Brokers is down by “java.io.IOException: Too many open files”

2017-05-12 Thread Yang Cui
Our Kafka cluster has been brought down by the problem “java.io.IOException: Too many open files” three times in 3 weeks.

We encounter this problem on both the 0.9.0.1 and 0.10.2.1 versions.

The error is like:

java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at kafka.network.Acceptor.accept(SocketServer.scala:340)
at kafka.network.Acceptor.run(SocketServer.scala:283)
at java.lang.Thread.run(Thread.java:745)

Has anyone encountered a similar problem?




Re: Too many open files

2016-09-14 Thread Jaikiran Pai

What does the output of:

lsof -p <pid>

show on that specific node?

-Jaikiran

On Monday 12 September 2016 10:03 PM, Michael Sparr wrote:

5-node Kafka cluster, bare metal, Ubuntu 14.04.x LTS with 64GB RAM, 8-core, 
960GB SSD boxes and a single node in cluster is filling logs with the following:

[2016-09-12 09:34:49,522] ERROR Error while accepting connection 
(kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at kafka.network.Acceptor.accept(SocketServer.scala:323)
at kafka.network.Acceptor.run(SocketServer.scala:268)
at java.lang.Thread.run(Thread.java:745)

No other nodes in cluster have this issue. Separate application server has 
consumers/producers using librdkafka + confluent kafka python library with a 
few million messages published to under 100 topics.

For days now the /var/log/kafka/kafka.server.log.N are filling up server with this 
message and using up all space on only a single server node in cluster. I have 
soft/hard limits at 65,535 for all users so > ulimit -n reveals 65535

Is there a setting I should add from librdkafka config in the Python producer 
clients to shorten socket connections even further to avoid this or something 
else going on?

Should I write this as issue in Github repo and if so, which project?


Thanks!






Re: Too many open files

2016-09-14 Thread Jaikiran Pai

What does the output of:

lsof -p <pid>

show?

-Jaikiran

On Monday 12 September 2016 10:03 PM, Michael Sparr wrote:

5-node Kafka cluster, bare metal, Ubuntu 14.04.x LTS with 64GB RAM, 8-core, 
960GB SSD boxes and a single node in cluster is filling logs with the following:

[2016-09-12 09:34:49,522] ERROR Error while accepting connection 
(kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at kafka.network.Acceptor.accept(SocketServer.scala:323)
at kafka.network.Acceptor.run(SocketServer.scala:268)
at java.lang.Thread.run(Thread.java:745)

No other nodes in cluster have this issue. Separate application server has 
consumers/producers using librdkafka + confluent kafka python library with a 
few million messages published to under 100 topics.

For days now the /var/log/kafka/kafka.server.log.N are filling up server with this 
message and using up all space on only a single server node in cluster. I have 
soft/hard limits at 65,535 for all users so > ulimit -n reveals 65535

Is there a setting I should add from librdkafka config in the Python producer 
clients to shorten socket connections even further to avoid this or something 
else going on?

Should I write this as issue in Github repo and if so, which project?


Thanks!






Too many open files

2016-09-12 Thread Michael Sparr
5-node Kafka cluster, bare metal, Ubuntu 14.04.x LTS with 64GB RAM, 8-core, 
960GB SSD boxes and a single node in cluster is filling logs with the following:

[2016-09-12 09:34:49,522] ERROR Error while accepting connection 
(kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at kafka.network.Acceptor.accept(SocketServer.scala:323)
at kafka.network.Acceptor.run(SocketServer.scala:268)
at java.lang.Thread.run(Thread.java:745)

No other nodes in cluster have this issue. Separate application server has 
consumers/producers using librdkafka + confluent kafka python library with a 
few million messages published to under 100 topics.

For days now, the /var/log/kafka/kafka.server.log.N files have been filling up the server with this message and using up all the space on only a single server node in the cluster. I have soft/hard limits at 65,535 for all users, so ulimit -n reveals 65535.

Is there a setting I should add to the librdkafka config in the Python producer clients to close socket connections even sooner and avoid this, or is something else going on?

Should I file this as an issue in a GitHub repo, and if so, which project?


Thanks!
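
Not an answer from this thread, but one standard broker-side lever for idle client sockets is connections.max.idle.ms, which makes the broker close connections that have been idle longer than the configured time; a sketch with an illustrative value (check your broker version's documentation before relying on it):

# server.properties -- close client connections idle for more than 5 minutes
# (the broker default is 600000 ms, i.e. 10 minutes)
connections.max.idle.ms=300000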



Re: Too Many Open Files

2016-08-01 Thread Thakrar, Jayesh
What are the producers/consumers for the Kafka cluster?
Remember that it's not just files but also sockets that add to the count.

We saw issues when we had a network switch problem and Storm consumers. The switch would cause connectivity problems between the Kafka brokers, ZooKeeper nodes and clients, triggering a flood of connections from everyone to everyone else.

On 8/1/16, 7:14 AM, "Scott Thibault" <scott.thiba...@multiscalehn.com> wrote:

Did you verify that the process has the correct limit applied?
cat /proc/<pid>/limits

--Scott Thibault


On Sun, Jul 31, 2016 at 4:14 PM, Kessiler Rodrigues <kessi...@callinize.com>
wrote:

> I’m still experiencing this issue…
>
> Here are the kafka logs.
>
> [2016-07-31 20:10:35,658] ERROR Error while accepting connection
> (kafka.network.Acceptor)
    > java.io.IOException: Too many open files
> at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> at
> 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
> at
> 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
> at kafka.network.Acceptor.accept(SocketServer.scala:323)
> at kafka.network.Acceptor.run(SocketServer.scala:268)
> at java.lang.Thread.run(Thread.java:745)
> [2016-07-31 20:10:35,658] ERROR Error while accepting connection
    > (kafka.network.Acceptor)
> java.io.IOException: Too many open files
> at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> at
> 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
> at
> 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
> at kafka.network.Acceptor.accept(SocketServer.scala:323)
> at kafka.network.Acceptor.run(SocketServer.scala:268)
> at java.lang.Thread.run(Thread.java:745)
> [2016-07-31 20:10:35,658] ERROR Error while accepting connection
> (kafka.network.Acceptor)
> java.io.IOException: Too many open files
> at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> at
> 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
> at
> 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
> at kafka.network.Acceptor.accept(SocketServer.scala:323)
> at kafka.network.Acceptor.run(SocketServer.scala:268)
> at java.lang.Thread.run(Thread.java:745)
>
> My ulimit is 1 million, how is that possible?
>
> Can someone help with this?
>
>
> > On Jul 30, 2016, at 5:05 AM, Kessiler Rodrigues <kessi...@callinize.com>
> wrote:
> >
> > I have changed it a bit.
> >
> > I have 10 brokers and 20k topics with 1 partition each.
> >
> > I looked at the kaka’s logs dir and I only have 3318 files.
> >
> > I’m doing some tests to see how many topics/partitions I can have, but
> it is throwing too many files once it hits 15k topics..
> >
> > Any thoughts?
> >
> >
> >
> >> On Jul 29, 2016, at 10:33 PM, Gwen Shapira <g...@confluent.io> wrote:
> >>
> >> woah, it looks like you have 15,000 replicas per broker?
> >>
> >> You can go into the directory you configured for kafka's log.dir and
> >> see how many files you have there. Depending on your segment size and
> >> retention policy, you could have hundreds of files per partition
> >> there...
> >>
> >> Make sure you have at least that many file handles and then also add
> >> handles for the client connections.
> >>
> >> 1 million file handles sound like a lot, but you are running lots of
> >> partitions per broker...
> >>
> >> We normally don't see more than maybe 4000 per broker and most
> >> clusters have a lot fewer, so consider adding brokers and spreading
> >> partitions around a bit.
> >>
> >> Gwen
> >>
> >> On Fri, Jul 29, 2016 at 12:00 PM, Kessiler Rodrigues
> >> <kessi...@callinize.com> wrote:
> >>> Hi guys,
> >>>
> >>> I have been experiencing some issues on kafka, where its throwing too
> many open files.
> >>>
> >>> I have around of 6k topics and 5 partitions each.
> >>>
> >>> My cluster was made with 6

Re: Too Many Open Files

2016-08-01 Thread Scott Thibault
Did you verify that the process has the correct limit applied?
cat /proc/<pid>/limits
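
Spelled out for a running broker, a quick check (sketch; the pgrep pattern assumes the standard kafka.Kafka main class):

PID=$(pgrep -f kafka.Kafka)
# the limit the process is actually running with
grep 'open files' /proc/$PID/limits
# versus what it currently holds open
ls /proc/$PID/fd | wc -l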

--Scott Thibault


On Sun, Jul 31, 2016 at 4:14 PM, Kessiler Rodrigues <kessi...@callinize.com>
wrote:

> I’m still experiencing this issue…
>
> Here are the kafka logs.
>
> [2016-07-31 20:10:35,658] ERROR Error while accepting connection
> (kafka.network.Acceptor)
> java.io.IOException: Too many open files
> at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
> at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
> at kafka.network.Acceptor.accept(SocketServer.scala:323)
> at kafka.network.Acceptor.run(SocketServer.scala:268)
> at java.lang.Thread.run(Thread.java:745)
> [2016-07-31 20:10:35,658] ERROR Error while accepting connection
> (kafka.network.Acceptor)
> java.io.IOException: Too many open files
> at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
> at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
> at kafka.network.Acceptor.accept(SocketServer.scala:323)
> at kafka.network.Acceptor.run(SocketServer.scala:268)
> at java.lang.Thread.run(Thread.java:745)
> [2016-07-31 20:10:35,658] ERROR Error while accepting connection
> (kafka.network.Acceptor)
> java.io.IOException: Too many open files
> at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
> at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
> at kafka.network.Acceptor.accept(SocketServer.scala:323)
> at kafka.network.Acceptor.run(SocketServer.scala:268)
> at java.lang.Thread.run(Thread.java:745)
>
> My ulimit is 1 million, how is that possible?
>
> Can someone help with this?
>
>
> > On Jul 30, 2016, at 5:05 AM, Kessiler Rodrigues <kessi...@callinize.com>
> wrote:
> >
> > I have changed it a bit.
> >
> > I have 10 brokers and 20k topics with 1 partition each.
> >
> > I looked at the kaka’s logs dir and I only have 3318 files.
> >
> > I’m doing some tests to see how many topics/partitions I can have, but
> it is throwing too many files once it hits 15k topics..
> >
> > Any thoughts?
> >
> >
> >
> >> On Jul 29, 2016, at 10:33 PM, Gwen Shapira <g...@confluent.io> wrote:
> >>
> >> woah, it looks like you have 15,000 replicas per broker?
> >>
> >> You can go into the directory you configured for kafka's log.dir and
> >> see how many files you have there. Depending on your segment size and
> >> retention policy, you could have hundreds of files per partition
> >> there...
> >>
> >> Make sure you have at least that many file handles and then also add
> >> handles for the client connections.
> >>
> >> 1 million file handles sound like a lot, but you are running lots of
> >> partitions per broker...
> >>
> >> We normally don't see more than maybe 4000 per broker and most
> >> clusters have a lot fewer, so consider adding brokers and spreading
> >> partitions around a bit.
> >>
> >> Gwen
> >>
> >> On Fri, Jul 29, 2016 at 12:00 PM, Kessiler Rodrigues
> >> <kessi...@callinize.com> wrote:
> >>> Hi guys,
> >>>
> >>> I have been experiencing some issues on kafka, where its throwing too
> many open files.
> >>>
> >>> I have around of 6k topics and 5 partitions each.
> >>>
> >>> My cluster was made with 6 brokers. All of them are running Ubuntu 16
> and the file limits settings are:
> >>>
> >>> `cat  /proc/sys/fs/file-max`
> >>> 200
> >>>
> >>> `ulimit -n`
> >>> 100
> >>>
> >>> Anyone has experienced it before?
> >
>
>




Re: Too Many Open Files

2016-08-01 Thread Kessiler Rodrigues
Hey guys,

I found a solution for this. The kafka process wasn’t picking up the limits config because I was running it under supervisor.

I changed that, and now I’m using systemd to bring kafka up and keep it running!

On systemd services you can set your FD limit using a property called “LimitNOFILE”.

Thanks for all your help!
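
For anyone who has to stay on supervisor instead, the corresponding knob there is minfds in the [supervisord] section (this is supervisord's documented option, not something tested in this thread); for comparison with the systemd route, values are illustrative:

# supervisord.conf -- supervisord raises its own fd limit; child processes inherit it
[supervisord]
minfds=100000

# systemd unit for kafka -- the route described above
[Service]
LimitNOFILE=100000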


> On Aug 1, 2016, at 5:04 AM, Anirudh P <panirudh2...@gmail.com> wrote:
> 
> I agree with Steve. We had a similar problem where we set the ulimit to a
> certain value but it was getting overridden.
> It only worked when we set the ulimit after logging in as root. You might
> want to give that a try if you have not done so already
> 
> - Anirudh
> 
> On Mon, Aug 1, 2016 at 1:19 PM, Steve Miller <st...@idrathernotsay.com>
> wrote:
> 
>> Can you run lsof -p (pid) for whatever the pid is for your Kafka process?
>> 
>> For the fd limits you've set, I don't think subtlety is required: if
>> there's a millionish lines in the output, the fd limit you set is where you
>> think it is, and if it's a lot lower than that, the limit isn't being
>> applied properly somehow (maybe you are running this under, say,
>> supervisord, and maybe its config is lowering the limit, or the limits for
>> root are as you say but the limits for the kafka user aren't being set
>> properly, that sort of thing).
>> 
>> If you do have 1M lines in the output, at least this might give you a
>> place to start figuring out what's open and why.
>> 
>>-Steve
>> 
>>> On Jul 31, 2016, at 4:14 PM, Kessiler Rodrigues <kessi...@callinize.com>
>> wrote:
>>> 
>>> I’m still experiencing this issue…
>>> 
>>> Here are the kafka logs.
>>> 
>>> [2016-07-31 20:10:35,658] ERROR Error while accepting connection
>> (kafka.network.Acceptor)
>>> java.io.IOException: Too many open files
>>>   at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>>>   at
>> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>>>   at
>> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>>>   at kafka.network.Acceptor.accept(SocketServer.scala:323)
>>>   at kafka.network.Acceptor.run(SocketServer.scala:268)
>>>   at java.lang.Thread.run(Thread.java:745)
>>> [2016-07-31 20:10:35,658] ERROR Error while accepting connection
>> (kafka.network.Acceptor)
>>> java.io.IOException: Too many open files
>>>   at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>>>   at
>> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>>>   at
>> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>>>   at kafka.network.Acceptor.accept(SocketServer.scala:323)
>>>   at kafka.network.Acceptor.run(SocketServer.scala:268)
>>>   at java.lang.Thread.run(Thread.java:745)
>>> [2016-07-31 20:10:35,658] ERROR Error while accepting connection
>> (kafka.network.Acceptor)
>>> java.io.IOException: Too many open files
>>>   at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>>>   at
>> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>>>   at
>> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>>>   at kafka.network.Acceptor.accept(SocketServer.scala:323)
>>>   at kafka.network.Acceptor.run(SocketServer.scala:268)
>>>   at java.lang.Thread.run(Thread.java:745)
>>> 
>>> My ulimit is 1 million, how is that possible?
>>> 
>>> Can someone help with this?
>>> 
>>> 
>>>> On Jul 30, 2016, at 5:05 AM, Kessiler Rodrigues <kessi...@callinize.com>
>> wrote:
>>>> 
>>>> I have changed it a bit.
>>>> 
>>>> I have 10 brokers and 20k topics with 1 partition each.
>>>> 
>>>> I looked at the kaka’s logs dir and I only have 3318 files.
>>>> 
>>>> I’m doing some tests to see how many topics/partitions I can have, but
>> it is throwing too many files once it hits 15k topics..
>>>> 
>>>> Any thoughts?
>>>> 
>>>> 
>>>> 
>>>>> On Jul 29, 2016, at 10:33 PM, Gwen Shapira <g...@confluent.io> wrote:
>>>>> 
>>>>> woah, it looks like you have 15,000 replicas per broker?
>>>>> 
>>>>> You can go into the directory you configured for kafka's log.dir and
>>>>> see how many files you have there. Depending on your segment size and
>>>>> retent

Re: Too Many Open Files

2016-08-01 Thread Anirudh P
I agree with Steve. We had a similar problem where we set the ulimit to a
certain value but it was getting overridden.
It only worked when we set the ulimit after logging in as root. You might
want to give that a try if you have not done so already

- Anirudh

On Mon, Aug 1, 2016 at 1:19 PM, Steve Miller <st...@idrathernotsay.com>
wrote:

> Can you run lsof -p (pid) for whatever the pid is for your Kafka process?
>
> For the fd limits you've set, I don't think subtlety is required: if
> there's a millionish lines in the output, the fd limit you set is where you
> think it is, and if it's a lot lower than that, the limit isn't being
> applied properly somehow (maybe you are running this under, say,
> supervisord, and maybe its config is lowering the limit, or the limits for
> root are as you say but the limits for the kafka user aren't being set
> properly, that sort of thing).
>
> If you do have 1M lines in the output, at least this might give you a
> place to start figuring out what's open and why.
>
> -Steve
>
> > On Jul 31, 2016, at 4:14 PM, Kessiler Rodrigues <kessi...@callinize.com>
> wrote:
> >
> > I’m still experiencing this issue…
> >
> > Here are the kafka logs.
> >
> > [2016-07-31 20:10:35,658] ERROR Error while accepting connection
> (kafka.network.Acceptor)
> > java.io.IOException: Too many open files
> >at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> >at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
> >at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
> >at kafka.network.Acceptor.accept(SocketServer.scala:323)
> >at kafka.network.Acceptor.run(SocketServer.scala:268)
> >at java.lang.Thread.run(Thread.java:745)
> > [2016-07-31 20:10:35,658] ERROR Error while accepting connection
> (kafka.network.Acceptor)
> > java.io.IOException: Too many open files
> >at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> >at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
> >at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
> >at kafka.network.Acceptor.accept(SocketServer.scala:323)
> >at kafka.network.Acceptor.run(SocketServer.scala:268)
> >at java.lang.Thread.run(Thread.java:745)
> > [2016-07-31 20:10:35,658] ERROR Error while accepting connection
> (kafka.network.Acceptor)
> > java.io.IOException: Too many open files
> >at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> >at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
> >at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
> >at kafka.network.Acceptor.accept(SocketServer.scala:323)
> >at kafka.network.Acceptor.run(SocketServer.scala:268)
> >at java.lang.Thread.run(Thread.java:745)
> >
> > My ulimit is 1 million, how is that possible?
> >
> > Can someone help with this?
> >
> >
> >> On Jul 30, 2016, at 5:05 AM, Kessiler Rodrigues <kessi...@callinize.com>
> wrote:
> >>
> >> I have changed it a bit.
> >>
> >> I have 10 brokers and 20k topics with 1 partition each.
> >>
> >> I looked at the kaka’s logs dir and I only have 3318 files.
> >>
> >> I’m doing some tests to see how many topics/partitions I can have, but
> it is throwing too many files once it hits 15k topics..
> >>
> >> Any thoughts?
> >>
> >>
> >>
> >>> On Jul 29, 2016, at 10:33 PM, Gwen Shapira <g...@confluent.io> wrote:
> >>>
> >>> woah, it looks like you have 15,000 replicas per broker?
> >>>
> >>> You can go into the directory you configured for kafka's log.dir and
> >>> see how many files you have there. Depending on your segment size and
> >>> retention policy, you could have hundreds of files per partition
> >>> there...
> >>>
> >>> Make sure you have at least that many file handles and then also add
> >>> handles for the client connections.
> >>>
> >>> 1 million file handles sound like a lot, but you are running lots of
> >>> partitions per broker...
> >>>
> >>> We normally don't see more than maybe 4000 per broker and most
> >>> clusters have a lot fewer, so consider adding brokers and spreading
> >>> partitions around a bit.
> >>>
> >>> Gwen
> >>>
> >>> On Fri, Jul 29, 2016 at 12:00 PM, Kessiler Rodrigues
> >>> <kessi...@callinize.com> wrote:
> >>>> Hi guys,
> >>>>
> >>>> I have been experiencing some issues on kafka, where its throwing too
> many open files.
> >>>>
> >>>> I have around of 6k topics and 5 partitions each.
> >>>>
> >>>> My cluster was made with 6 brokers. All of them are running Ubuntu 16
> and the file limits settings are:
> >>>>
> >>>> `cat  /proc/sys/fs/file-max`
> >>>> 200
> >>>>
> >>>> `ulimit -n`
> >>>> 100
> >>>>
> >>>> Anyone has experienced it before?
> >
>
>


Re: Too Many Open Files

2016-08-01 Thread Steve Miller
Can you run lsof -p (pid) for whatever the pid is for your Kafka process?

For the fd limits you've set, I don't think subtlety is required: if there's a 
millionish lines in the output, the fd limit you set is where you think it is, 
and if it's a lot lower than that, the limit isn't being applied properly 
somehow (maybe you are running this under, say, supervisord, and maybe its 
config is lowering the limit, or the limits for root are as you say but the 
limits for the kafka user aren't being set properly, that sort of thing).

If you do have 1M lines in the output, at least this might give you a place to 
start figuring out what's open and why.

-Steve

> On Jul 31, 2016, at 4:14 PM, Kessiler Rodrigues <kessi...@callinize.com> 
> wrote:
> 
> I’m still experiencing this issue…
> 
> Here are the kafka logs.
> 
> [2016-07-31 20:10:35,658] ERROR Error while accepting connection 
> (kafka.network.Acceptor)
> java.io.IOException: Too many open files
>at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>at 
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>at 
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>at kafka.network.Acceptor.accept(SocketServer.scala:323)
>at kafka.network.Acceptor.run(SocketServer.scala:268)
>at java.lang.Thread.run(Thread.java:745)
> [2016-07-31 20:10:35,658] ERROR Error while accepting connection 
> (kafka.network.Acceptor)
> java.io.IOException: Too many open files
>at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>at 
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>at 
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>at kafka.network.Acceptor.accept(SocketServer.scala:323)
>at kafka.network.Acceptor.run(SocketServer.scala:268)
>at java.lang.Thread.run(Thread.java:745)
> [2016-07-31 20:10:35,658] ERROR Error while accepting connection 
> (kafka.network.Acceptor)
> java.io.IOException: Too many open files
>at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>at 
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>at 
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>at kafka.network.Acceptor.accept(SocketServer.scala:323)
>at kafka.network.Acceptor.run(SocketServer.scala:268)
>at java.lang.Thread.run(Thread.java:745)
> 
> My ulimit is 1 million, how is that possible?
> 
> Can someone help with this? 
> 
> 
>> On Jul 30, 2016, at 5:05 AM, Kessiler Rodrigues <kessi...@callinize.com> 
>> wrote:
>> 
>> I have changed it a bit.
>> 
>> I have 10 brokers and 20k topics with 1 partition each. 
>> 
>> I looked at the kaka’s logs dir and I only have 3318 files.
>> 
>> I’m doing some tests to see how many topics/partitions I can have, but it is 
>> throwing too many files once it hits 15k topics..
>> 
>> Any thoughts?
>> 
>> 
>> 
>>> On Jul 29, 2016, at 10:33 PM, Gwen Shapira <g...@confluent.io> wrote:
>>> 
>>> woah, it looks like you have 15,000 replicas per broker?
>>> 
>>> You can go into the directory you configured for kafka's log.dir and
>>> see how many files you have there. Depending on your segment size and
>>> retention policy, you could have hundreds of files per partition
>>> there...
>>> 
>>> Make sure you have at least that many file handles and then also add
>>> handles for the client connections.
>>> 
>>> 1 million file handles sound like a lot, but you are running lots of
>>> partitions per broker...
>>> 
>>> We normally don't see more than maybe 4000 per broker and most
>>> clusters have a lot fewer, so consider adding brokers and spreading
>>> partitions around a bit.
>>> 
>>> Gwen
>>> 
>>> On Fri, Jul 29, 2016 at 12:00 PM, Kessiler Rodrigues
>>> <kessi...@callinize.com> wrote:
>>>> Hi guys,
>>>> 
>>>> I have been experiencing some issues on kafka, where its throwing too many 
>>>> open files.
>>>> 
>>>> I have around of 6k topics and 5 partitions each.
>>>> 
>>>> My cluster was made with 6 brokers. All of them are running Ubuntu 16 and 
>>>> the file limits settings are:
>>>> 
>>>> `cat  /proc/sys/fs/file-max`
>>>> 200
>>>> 
>>>> `ulimit -n`
>>>> 100
>>>> 
>>>> Anyone has experienced it before?
> 



Re: Too Many Open Files

2016-07-31 Thread Chris Richardson
Gwen,

Is there any particular reason why "inactive" (no consumers or producers
for a topic)  files need to be open?

Chris

-- 
Learn microservices - http://learnmicroservices.io
Microservices application platform http://eventuate.io

On Fri, Jul 29, 2016 at 6:33 PM, Gwen Shapira <g...@confluent.io> wrote:

> woah, it looks like you have 15,000 replicas per broker?
>
> You can go into the directory you configured for kafka's log.dir and
> see how many files you have there. Depending on your segment size and
> retention policy, you could have hundreds of files per partition
> there...
>
> Make sure you have at least that many file handles and then also add
> handles for the client connections.
>
> 1 million file handles sound like a lot, but you are running lots of
> partitions per broker...
>
> We normally don't see more than maybe 4000 per broker and most
> clusters have a lot fewer, so consider adding brokers and spreading
> partitions around a bit.
>
> Gwen
>
> On Fri, Jul 29, 2016 at 12:00 PM, Kessiler Rodrigues
> <kessi...@callinize.com> wrote:
> > Hi guys,
> >
> > I have been experiencing some issues on kafka, where its throwing too
> many open files.
> >
> > I have around of 6k topics and 5 partitions each.
> >
> > My cluster was made with 6 brokers. All of them are running Ubuntu 16
> and the file limits settings are:
> >
> > `cat  /proc/sys/fs/file-max`
> > 200
> >
> >  `ulimit -n`
> > 100
> >
> > Anyone has experienced it before?
>


RE: Too Many Open Files

2016-07-31 Thread Krzysztof Nawara
Maybe you are exhausting your sockets, not file handles for some reason? 


From: Kessiler Rodrigues [kessi...@callinize.com]
Sent: 31 July 2016 22:14
To: users@kafka.apache.org
Subject: Re: Too Many Open Files

I’m still experiencing this issue…

Here are the kafka logs.

[2016-07-31 20:10:35,658] ERROR Error while accepting connection 
(kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at kafka.network.Acceptor.accept(SocketServer.scala:323)
at kafka.network.Acceptor.run(SocketServer.scala:268)
at java.lang.Thread.run(Thread.java:745)
[2016-07-31 20:10:35,658] ERROR Error while accepting connection 
(kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at kafka.network.Acceptor.accept(SocketServer.scala:323)
at kafka.network.Acceptor.run(SocketServer.scala:268)
at java.lang.Thread.run(Thread.java:745)
[2016-07-31 20:10:35,658] ERROR Error while accepting connection 
(kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at kafka.network.Acceptor.accept(SocketServer.scala:323)
at kafka.network.Acceptor.run(SocketServer.scala:268)
at java.lang.Thread.run(Thread.java:745)

My ulimit is 1 million, how is that possible?

Can someone help with this?

> On Jul 30, 2016, at 5:05 AM, Kessiler Rodrigues <kessi...@callinize.com> 
> wrote:
>
> I have changed it a bit.
>
> I have 10 brokers and 20k topics with 1 partition each.
>
> I looked at the kafka’s logs dir and I only have 3318 files.
>
> I’m doing some tests to see how many topics/partitions I can have, but it is 
> throwing too many files once it hits 15k topics..
>
> Any thoughts?
>
>
>
>> On Jul 29, 2016, at 10:33 PM, Gwen Shapira <g...@confluent.io> wrote:
>>
>> woah, it looks like you have 15,000 replicas per broker?
>>
>> You can go into the directory you configured for kafka's log.dir and
>> see how many files you have there. Depending on your segment size and
>> retention policy, you could have hundreds of files per partition
>> there...
>>
>> Make sure you have at least that many file handles and then also add
>> handles for the client connections.
>>
>> 1 million file handles sound like a lot, but you are running lots of
>> partitions per broker...
>>
>> We normally don't see more than maybe 4000 per broker and most
>> clusters have a lot fewer, so consider adding brokers and spreading
>> partitions around a bit.
>>
>> Gwen
>>
>> On Fri, Jul 29, 2016 at 12:00 PM, Kessiler Rodrigues
>> <kessi...@callinize.com> wrote:
>>> Hi guys,
>>>
>>> I have been experiencing some issues on kafka, where its throwing too many 
>>> open files.
>>>
>>> I have around of 6k topics and 5 partitions each.
>>>
>>> My cluster was made with 6 brokers. All of them are running Ubuntu 16 and 
>>> the file limits settings are:
>>>
>>> `cat  /proc/sys/fs/file-max`
>>> 200
>>>
>>> `ulimit -n`
>>> 100
>>>
>>> Anyone has experienced it before?
>
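
Krzysztof's point is straightforward to verify: break the broker's open descriptors down by type and see whether sockets or regular files dominate. A sketch using lsof (the PID lookup via jps is just an example):

  BROKER_PID=$(jps | awk '/Kafka/ {print $1}')
  lsof -p $BROKER_PID | wc -l                      # everything
  lsof -p $BROKER_PID -a -i TCP | wc -l            # TCP sockets only
  lsof -p $BROKER_PID | awk '$5 == "REG"' | wc -l  # regular files only (column 5 is TYPE)

If the TCP count is the one climbing, the limit being exhausted is effectively connections, not log files.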


Re: Too Many Open Files

2016-07-31 Thread Kessiler Rodrigues
I’m still experiencing this issue…

Here are the kafka logs.

[2016-07-31 20:10:35,658] ERROR Error while accepting connection 
(kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at kafka.network.Acceptor.accept(SocketServer.scala:323)
at kafka.network.Acceptor.run(SocketServer.scala:268)
at java.lang.Thread.run(Thread.java:745)
[2016-07-31 20:10:35,658] ERROR Error while accepting connection 
(kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at kafka.network.Acceptor.accept(SocketServer.scala:323)
at kafka.network.Acceptor.run(SocketServer.scala:268)
at java.lang.Thread.run(Thread.java:745)
[2016-07-31 20:10:35,658] ERROR Error while accepting connection 
(kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at kafka.network.Acceptor.accept(SocketServer.scala:323)
at kafka.network.Acceptor.run(SocketServer.scala:268)
at java.lang.Thread.run(Thread.java:745)

My ulimit is 1 million, how is that possible?

Can someone help with this? 


> On Jul 30, 2016, at 5:05 AM, Kessiler Rodrigues <kessi...@callinize.com> 
> wrote:
> 
> I have changed it a bit.
> 
> I have 10 brokers and 20k topics with 1 partition each. 
> 
> I looked at the kafka’s logs dir and I only have 3318 files.
> 
> I’m doing some tests to see how many topics/partitions I can have, but it is 
> throwing too many files once it hits 15k topics..
> 
> Any thoughts?
> 
> 
> 
>> On Jul 29, 2016, at 10:33 PM, Gwen Shapira <g...@confluent.io> wrote:
>> 
>> woah, it looks like you have 15,000 replicas per broker?
>> 
>> You can go into the directory you configured for kafka's log.dir and
>> see how many files you have there. Depending on your segment size and
>> retention policy, you could have hundreds of files per partition
>> there...
>> 
>> Make sure you have at least that many file handles and then also add
>> handles for the client connections.
>> 
>> 1 million file handles sound like a lot, but you are running lots of
>> partitions per broker...
>> 
>> We normally don't see more than maybe 4000 per broker and most
>> clusters have a lot fewer, so consider adding brokers and spreading
>> partitions around a bit.
>> 
>> Gwen
>> 
>> On Fri, Jul 29, 2016 at 12:00 PM, Kessiler Rodrigues
>> <kessi...@callinize.com> wrote:
>>> Hi guys,
>>> 
>>> I have been experiencing some issues on kafka, where its throwing too many 
>>> open files.
>>> 
>>> I have around of 6k topics and 5 partitions each.
>>> 
>>> My cluster was made with 6 brokers. All of them are running Ubuntu 16 and 
>>> the file limits settings are:
>>> 
>>> `cat  /proc/sys/fs/file-max`
>>> 200
>>> 
>>> `ulimit -n`
>>> 1000000
>>> 
>>> Anyone has experienced it before?
> 
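
One thing worth ruling out when ulimit looks fine but the broker still dies: the limit printed in an interactive shell is not necessarily the limit the broker was started with, since init scripts, supervisors and su/sudo can each apply their own. The running process's effective limit can be read directly from /proc (the PID lookup is an example):

  BROKER_PID=$(jps | awk '/Kafka/ {print $1}')
  grep 'Max open files' /proc/$BROKER_PID/limits

If that number is much lower than the shell's ulimit -n, the limit has to be raised wherever the broker is actually launched from.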



Re: Too Many Open Files

2016-07-30 Thread Kessiler Rodrigues
I have changed it a bit.

I have 10 brokers and 20k topics with 1 partition each. 

I looked at the kafka’s logs dir and I only have 3318 files.

I’m doing some tests to see how many topics/partitions I can have, but it is 
throwing too many files once it hits 15k topics..

Any thoughts?



> On Jul 29, 2016, at 10:33 PM, Gwen Shapira <g...@confluent.io> wrote:
> 
> woah, it looks like you have 15,000 replicas per broker?
> 
> You can go into the directory you configured for kafka's log.dir and
> see how many files you have there. Depending on your segment size and
> retention policy, you could have hundreds of files per partition
> there...
> 
> Make sure you have at least that many file handles and then also add
> handles for the client connections.
> 
> 1 million file handles sound like a lot, but you are running lots of
> partitions per broker...
> 
> We normally don't see more than maybe 4000 per broker and most
> clusters have a lot fewer, so consider adding brokers and spreading
> partitions around a bit.
> 
> Gwen
> 
> On Fri, Jul 29, 2016 at 12:00 PM, Kessiler Rodrigues
> <kessi...@callinize.com> wrote:
>> Hi guys,
>> 
>> I have been experiencing some issues on kafka, where its throwing too many 
>> open files.
>> 
>> I have around of 6k topics and 5 partitions each.
>> 
>> My cluster was made with 6 brokers. All of them are running Ubuntu 16 and 
>> the file limits settings are:
>> 
>> `cat  /proc/sys/fs/file-max`
>> 200
>> 
>> `ulimit -n`
>> 1000000
>> 
>> Anyone has experienced it before?



Re: Too Many Open Files

2016-07-29 Thread Gwen Shapira
woah, it looks like you have 15,000 replicas per broker?

You can go into the directory you configured for kafka's log.dir and
see how many files you have there. Depending on your segment size and
retention policy, you could have hundreds of files per partition
there...

Make sure you have at least that many file handles and then also add
handles for the client connections.

1 million file handles sound like a lot, but you are running lots of
partitions per broker...

We normally don't see more than maybe 4000 per broker and most
clusters have a lot fewer, so consider adding brokers and spreading
partitions around a bit.

Gwen

On Fri, Jul 29, 2016 at 12:00 PM, Kessiler Rodrigues
<kessi...@callinize.com> wrote:
> Hi guys,
>
> I have been experiencing some issues on kafka, where its throwing too many 
> open files.
>
> I have around of 6k topics and 5 partitions each.
>
> My cluster was made with 6 brokers. All of them are running Ubuntu 16 and the 
> file limits settings are:
>
> `cat  /proc/sys/fs/file-max`
> 200
>
>  `ulimit -n`
> 1000000
>
> Anyone has experienced it before?


Too Many Open Files

2016-07-29 Thread Kessiler Rodrigues
Hi guys,

I have been experiencing some issues on kafka, where its throwing too many open 
files.

I have around of 6k topics and 5 partitions each.

My cluster was made with 6 brokers. All of them are running Ubuntu 16 and the 
file limits settings are:

`cat  /proc/sys/fs/file-max`
200

 `ulimit -n`
1000000

Anyone has experienced it before? 

java.io.IOException: Too many open files error

2015-01-15 Thread Sa Li
Hi, all

We test our production kafka, and getting such error

[2015-01-15 19:03:45,057] ERROR Error in acceptor (kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(
ServerSocketChannelImpl.java:241)
at kafka.network.Acceptor.accept(SocketServer.scala:200)
at kafka.network.Acceptor.run(SocketServer.scala:154)
at java.lang.Thread.run(Thread.java:745)

I noticed some other developers had similar issues, one suggestion was 

Without knowing the intricacies of Kafka, i think the default open file
descriptors is 1024 on unix. This can be changed by setting a higher ulimit
value ( typically 8192 but sometimes even 10 ).
Before modifying the ulimit I would recommend you check the number of
sockets stuck in TIME_WAIT mode. In this case, it looks like the broker has
too many open sockets. This could be because you have a rogue client
connecting and disconnecting repeatedly.
You might have to reduce the TIME_WAIT state to 30 seconds or lower.



We increase the open file handles by doing this:

insert kafka - nofile 10 in /etc/security/limits.conf

Is that right to change the open file descriptors?  In addition, it says to
reduce the TIME_WAIT, where about to change this state? Or any other
solution for this issue?

thanks



-- 

Alec Li


Re: java.io.IOException: Too many open files error

2015-01-15 Thread Gwen Shapira
You may find this article useful for troubleshooting and modifying TIME_WAIT:
http://www.linuxbrigade.com/reduce-time_wait-socket-connections/

The line you have for increasing file limit is fine, but you may also
need to increase the limit system wide:
insert fs.file-max = 10 in /etc/sysctl.conf

Gwen

On Thu, Jan 15, 2015 at 12:30 PM, Sa Li sal...@gmail.com wrote:
 Hi, all

 We test our production kafka, and getting such error

 [2015-01-15 19:03:45,057] ERROR Error in acceptor (kafka.network.Acceptor)
 java.io.IOException: Too many open files
 at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
 at sun.nio.ch.ServerSocketChannelImpl.accept(
 ServerSocketChannelImpl.java:241)
 at kafka.network.Acceptor.accept(SocketServer.scala:200)
 at kafka.network.Acceptor.run(SocketServer.scala:154)
 at java.lang.Thread.run(Thread.java:745)

 I noticed some other developers had similar issues, one suggestion was 

 Without knowing the intricacies of Kafka, i think the default open file
 descriptors is 1024 on unix. This can be changed by setting a higher ulimit
 value ( typically 8192 but sometimes even 10 ).
 Before modifying the ulimit I would recommend you check the number of
 sockets stuck in TIME_WAIT mode. In this case, it looks like the broker has
 too many open sockets. This could be because you have a rogue client
 connecting and disconnecting repeatedly.
 You might have to reduce the TIME_WAIT state to 30 seconds or lower.

 

 We increase the open file handles by doing this:

 insert kafka - nofile 10 in /etc/security/limits.conf

 Is that right to change the open file descriptors?  In addition, it says to
 reduce the TIME_WAIT, where about to change this state? Or any other
 solution for this issue?

 thanks



 --

 Alec Li
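
To act on Gwen's answer it helps to measure before tuning. A sketch of the usual checks on the broker host (the sysctl names are standard Linux ones; treat the reuse setting as a commonly suggested option, not a blanket recommendation):

  # sockets currently stuck in TIME_WAIT
  ss -tan state time-wait | wc -l

  # system-wide file handle ceiling and current allocation
  sysctl fs.file-max
  cat /proc/sys/fs/file-nr

  # often suggested when TIME_WAIT is the problem: reuse TIME_WAIT sockets for new outbound connections
  sysctl -w net.ipv4.tcp_tw_reuse=1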


Re: java.io.IOException: Too many open files error

2015-01-15 Thread István
Hi Sa Li,

Depending on your system that configuration entry needs to be modified. The
first parameter after the insert is the username what you use to run kafka.
It might be your own username or something else, in the following example
it is called kafkauser. On the top of that I also like to use soft and hard
limits, when you hit the soft limit the system will log a meaningful
message in dmesg so you can see what is happening.

kafkauser soft nofile 8
kafkauser hard nofile 10

Hope that helps,
Istvan

On Thu, Jan 15, 2015 at 12:30 PM, Sa Li sal...@gmail.com wrote:

 Hi, all

 We test our production kafka, and getting such error

 [2015-01-15 19:03:45,057] ERROR Error in acceptor (kafka.network.Acceptor)
 java.io.IOException: Too many open files
 at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
 at sun.nio.ch.ServerSocketChannelImpl.accept(
 ServerSocketChannelImpl.java:241)
 at kafka.network.Acceptor.accept(SocketServer.scala:200)
 at kafka.network.Acceptor.run(SocketServer.scala:154)
 at java.lang.Thread.run(Thread.java:745)

 I noticed some other developers had similar issues, one suggestion was 

 Without knowing the intricacies of Kafka, i think the default open file
 descriptors is 1024 on unix. This can be changed by setting a higher ulimit
 value ( typically 8192 but sometimes even 10 ).
 Before modifying the ulimit I would recommend you check the number of
 sockets stuck in TIME_WAIT mode. In this case, it looks like the broker has
 too many open sockets. This could be because you have a rogue client
 connecting and disconnecting repeatedly.
 You might have to reduce the TIME_WAIT state to 30 seconds or lower.

 

 We increase the open file handles by doing this:

 insert kafka - nofile 10 in /etc/security/limits.conf

 Is that right to change the open file descriptors?  In addition, it says to
 reduce the TIME_WAIT, where about to change this state? Or any other
 solution for this issue?

 thanks



 --

 Alec Li




-- 
the sun shines for all
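
Putting István's and Gwen's suggestions together, the whole change usually looks like the sketch below. The user name and numbers are examples only, the limits.conf entry takes effect on the next login of that user, and a broker started by an init system may need the limit set in its service definition instead.

  # /etc/security/limits.conf
  kafkauser soft nofile 80000
  kafkauser hard nofile 100000

  # /etc/sysctl.conf (system-wide ceiling), then apply with: sysctl -p
  fs.file-max = 200000

  # verify from a fresh session as kafkauser
  ulimit -Sn
  ulimit -Hn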


Re: java.io.IOException: Too many open files error

2015-01-15 Thread Sa Li
Thanks for the reply, I have change the configuration and running to see if
any errors come out.

SL

On Thu, Jan 15, 2015 at 3:34 PM, István lecc...@gmail.com wrote:

 Hi Sa Li,

 Depending on your system that configuration entry needs to be modified. The
 first parameter after the insert is the username what you use to run kafka.
 It might be your own username or something else, in the following example
 it is called kafkauser. On the top of that I also like to use soft and hard
 limits, when you hit the soft limit the system will log a meaningful
 message in dmesg so you can see what is happening.

 kafkauser soft nofile 8
 kafkauser hard nofile 10

 Hope that helps,
 Istvan

 On Thu, Jan 15, 2015 at 12:30 PM, Sa Li sal...@gmail.com wrote:

  Hi, all
 
  We test our production kafka, and getting such error
 
  [2015-01-15 19:03:45,057] ERROR Error in acceptor
 (kafka.network.Acceptor)
  java.io.IOException: Too many open files
  at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
  at sun.nio.ch.ServerSocketChannelImpl.accept(
  ServerSocketChannelImpl.java:241)
  at kafka.network.Acceptor.accept(SocketServer.scala:200)
  at kafka.network.Acceptor.run(SocketServer.scala:154)
  at java.lang.Thread.run(Thread.java:745)
 
  I noticed some other developers had similar issues, one suggestion was 
 
  Without knowing the intricacies of Kafka, i think the default open file
  descriptors is 1024 on unix. This can be changed by setting a higher
 ulimit
  value ( typically 8192 but sometimes even 10 ).
  Before modifying the ulimit I would recommend you check the number of
  sockets stuck in TIME_WAIT mode. In this case, it looks like the broker
 has
  too many open sockets. This could be because you have a rogue client
  connecting and disconnecting repeatedly.
  You might have to reduce the TIME_WAIT state to 30 seconds or lower.
 
  
 
  We increase the open file handles by doing this:
 
  insert kafka - nofile 10 in /etc/security/limits.conf
 
  Is that right to change the open file descriptors?  In addition, it says
 to
  reduce the TIME_WAIT, where about to change this state? Or any other
  solution for this issue?
 
  thanks
 
 
 
  --
 
  Alec Li
 



 --
 the sun shines for all




-- 

Alec Li


Re: Too Many Open Files Broker Error

2014-07-10 Thread Lung, Paul
Hi Jun,

That was the problem. It was actually the Ubuntu upstart job over writing
the limit. Thank you very much for your help.

Paul Lung

On 7/9/14, 1:58 PM, Jun Rao jun...@gmail.com wrote:

Is it possible your container wrapper somehow overrides the file handler
limit?

Thanks,

Jun


On Wed, Jul 9, 2014 at 9:59 AM, Lung, Paul pl...@ebay.com wrote:

  Yup. In fact, I just ran the test program again while the Kafka broker
is
 still running, using the same user of course. I was able to get up to
10K
 connections with the test program. The test program uses the same java
NIO
 library that the broker does. So the machine is capable of handling that
 many connections. The only issue I saw was that the NIO
 ServerSocketChannel is a bit slow at accepting connections when the
total
 connection goes around 4K, but this could be due to the fact that I put
 the ServerSocketChannel in the same Selector as the 4K SocketChannels.
So
 sometimes on the client side, I see:

 java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcher.write0(Native Method)
 at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
 at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:122)
 at sun.nio.ch.IOUtil.write(IOUtil.java:93)
 at 
sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:352)
 at FdTest$ClientThread.run(FdTest.java:108)


 But all I have to do is sleep for a bit on the client, and then retry
 again. However, 4K does seem like a magic number, since that’s seems to
be
 the number that the Kafka broker machine can handle before it gives me
the
 “Too Many Open Files” error and eventually crashes.

 Paul Lung

 On 7/8/14, 9:29 PM, Jun Rao jun...@gmail.com wrote:

 Does your test program run as the same user as Kafka broker?
 
 Thanks,
 
 Jun
 
 
 On Tue, Jul 8, 2014 at 1:42 PM, Lung, Paul pl...@ebay.com wrote:
 
  Hi Guys,
 
  I’m seeing the following errors from the 0.8.1.1 broker. This occurs
 most
  often on the Controller machine. Then the controller process crashes,
 and
  the controller bounces to other machines, which causes those
machines to
  crash. Looking at the file descriptors being held by the process,
it’s
 only
  around 4000 or so(looking at . There aren’t a whole lot of
connections
 in
  TIME_WAIT states, and I’ve increased the ephemeral port range to
“16000
 –
  64000” via /proc/sys/net/ipv4/ip_local_port_range”. I’ve written a
Java
  test program to see how many sockets and files I can open. The
socket is
  definitely limited by the ephemeral port range, which was around 22K
at
 the
  time. But I
  can open tons of files, since the open file limit of the user is set
to
  100K.
 
  So given that I can theoretically open 48K sockets and probably 90K
 files,
  and I only see around 4K total for the Kafka broker, I’m really
 confused as
  to why I’m seeing this error. Is there some internal Kafka limit
 that I
  don’t know about?
 
  Paul Lung
 
 
 
  java.io.IOException: Too many open files
 
  at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
 
  at
 
 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:
16
 3)
 
  at kafka.network.Acceptor.accept(SocketServer.scala:200)
 
  at kafka.network.Acceptor.run(SocketServer.scala:154)
 
  at java.lang.Thread.run(Thread.java:679)
 
  [2014-07-08 13:07:21,534] ERROR Error in acceptor
 (kafka.network.Acceptor)
 
  java.io.IOException: Too many open files
 
  at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
 
  at
 
 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:
16
 3)
 
  at kafka.network.Acceptor.accept(SocketServer.scala:200)
 
  at kafka.network.Acceptor.run(SocketServer.scala:154)
 
  at java.lang.Thread.run(Thread.java:679)
 
  [2014-07-08 13:07:21,563] ERROR [ReplicaFetcherThread-3-2124488],
Error
  for partition
[bom__021active_80__32__miniactiveitem_lvs_qn,0]
 to
  broker 2124488:class kafka.common.NotLeaderForPartitionException
  (kafka.server.ReplicaFetcherThread)
 
  [2014-07-08 13:07:21,558] FATAL [Replica Manager on Broker 2140112]:
 Error
  writing to highwatermark file:  (kafka.server.ReplicaManager)
 
  java.io.FileNotFoundException:
 
 
/ebay/cronus/software/cronusapp_home/kafka/kafka-logs/replication-offse
t-
 checkpoint.tmp
  (Too many open files)
 
  at java.io.FileOutputStream.open(Native Method)
 
  at java.io.FileOutputStream.init(FileOutputStream.java:209)
 
  at java.io.FileOutputStream.init(FileOutputStream.java:160)
 
  at java.io.FileWriter.init(FileWriter.java:90)
 
  at
 kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37)
 
  at
 
 
kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(R
ep
 licaManager.scala:447)
 
  at
 
 
kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(R
ep
 licaManager.scala:444
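
For anyone hitting what Paul did: when the broker is launched from an Upstart job (or any other supervisor), the job's own limit applies and silently overrides limits.conf. A sketch of where the setting lives, with the systemd equivalent for comparison (values are examples):

  # /etc/init/kafka.conf (Upstart job)
  limit nofile 100000 100000

  # systemd unit file, [Service] section
  LimitNOFILE=100000

After changing it, the effective value can be confirmed in /proc/<broker pid>/limits rather than with ulimit in a shell.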

Re: Too Many Open Files Broker Error

2014-07-09 Thread hsy...@gmail.com
I have the same problem. I didn't dig deeper but I saw this happen when I
launch kafka in daemon mode. I found the daemon mode is just launch kafka
with nohup. Not quite clear why this happen.


On Wed, Jul 9, 2014 at 9:59 AM, Lung, Paul pl...@ebay.com wrote:

 Yup. In fact, I just ran the test program again while the Kafka broker is
 still running, using the same user of course. I was able to get up to 10K
 connections with the test program. The test program uses the same java NIO
 library that the broker does. So the machine is capable of handling that
 many connections. The only issue I saw was that the NIO
 ServerSocketChannel is a bit slow at accepting connections when the total
 connection goes around 4K, but this could be due to the fact that I put
 the ServerSocketChannel in the same Selector as the 4K SocketChannels. So
 sometimes on the client side, I see:

 java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcher.write0(Native Method)
 at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
 at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:122)
 at sun.nio.ch.IOUtil.write(IOUtil.java:93)
 at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:352)
 at FdTest$ClientThread.run(FdTest.java:108)


 But all I have to do is sleep for a bit on the client, and then retry
 again. However, 4K does seem like a magic number, since that’s seems to be
 the number that the Kafka broker machine can handle before it gives me the
 “Too Many Open Files” error and eventually crashes.

 Paul Lung

 On 7/8/14, 9:29 PM, Jun Rao jun...@gmail.com wrote:

 Does your test program run as the same user as Kafka broker?
 
 Thanks,
 
 Jun
 
 
 On Tue, Jul 8, 2014 at 1:42 PM, Lung, Paul pl...@ebay.com wrote:
 
  Hi Guys,
 
  I’m seeing the following errors from the 0.8.1.1 broker. This occurs
 most
  often on the Controller machine. Then the controller process crashes,
 and
  the controller bounces to other machines, which causes those machines to
  crash. Looking at the file descriptors being held by the process, it’s
 only
  around 4000 or so(looking at . There aren’t a whole lot of connections
 in
  TIME_WAIT states, and I’ve increased the ephemeral port range to “16000
 –
  64000” via /proc/sys/net/ipv4/ip_local_port_range”. I’ve written a Java
  test program to see how many sockets and files I can open. The socket is
  definitely limited by the ephemeral port range, which was around 22K at
 the
  time. But I
  can open tons of files, since the open file limit of the user is set to
  100K.
 
  So given that I can theoretically open 48K sockets and probably 90K
 files,
  and I only see around 4K total for the Kafka broker, I’m really
 confused as
  to why I’m seeing this error. Is there some internal Kafka limit that I
  don’t know about?
 
  Paul Lung
 
 
 
  java.io.IOException: Too many open files
 
  at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
 
  at
 
 sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:16
 3)
 
  at kafka.network.Acceptor.accept(SocketServer.scala:200)
 
  at kafka.network.Acceptor.run(SocketServer.scala:154)
 
  at java.lang.Thread.run(Thread.java:679)
 
  [2014-07-08 13:07:21,534] ERROR Error in acceptor
 (kafka.network.Acceptor)
 
  java.io.IOException: Too many open files
 
  at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
 
  at
 
 sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:16
 3)
 
  at kafka.network.Acceptor.accept(SocketServer.scala:200)
 
  at kafka.network.Acceptor.run(SocketServer.scala:154)
 
  at java.lang.Thread.run(Thread.java:679)
 
  [2014-07-08 13:07:21,563] ERROR [ReplicaFetcherThread-3-2124488], Error
  for partition [bom__021active_80__32__miniactiveitem_lvs_qn,0]
 to
  broker 2124488:class kafka.common.NotLeaderForPartitionException
  (kafka.server.ReplicaFetcherThread)
 
  [2014-07-08 13:07:21,558] FATAL [Replica Manager on Broker 2140112]:
 Error
  writing to highwatermark file:  (kafka.server.ReplicaManager)
 
  java.io.FileNotFoundException:
 
 /ebay/cronus/software/cronusapp_home/kafka/kafka-logs/replication-offset-
 checkpoint.tmp
  (Too many open files)
 
  at java.io.FileOutputStream.open(Native Method)
 
  at java.io.FileOutputStream.init(FileOutputStream.java:209)
 
  at java.io.FileOutputStream.init(FileOutputStream.java:160)
 
  at java.io.FileWriter.init(FileWriter.java:90)
 
  at
 kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37)
 
  at
 
 kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(Rep
 licaManager.scala:447)
 
  at
 
 kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(Rep
 licaManager.scala:444)
 
  at
 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(Trav
 ersableLike.scala

Re: Too Many Open Files Broker Error

2014-07-09 Thread François Langelier
I don't know if that is your problem, but I had this output when my brokers
couldn't talk to each others...

The zookeeper were using the FQDN but my brokers didn't know the FQDN of
the other brokers...

If you look at you brokers info in zk (get /brokers/ids/#ID_OF_BROKER) can
you ping/connect to the value of the key host from your other brokers?



François Langelier
Étudiant en génie Logiciel - École de Technologie Supérieure
http://www.etsmtl.ca/
Capitaine Club Capra http://capra.etsmtl.ca/
VP-Communication - CS Games http://csgames.org 2014
Jeux de Génie http://www.jdgets.com/ 2011 à 2014
Argentier Fraternité du Piranha http://fraternitedupiranha.com/ 2012-2014
Comité Organisateur Olympiades ÉTS 2012
Compétition Québécoise d'Ingénierie 2012 - Compétition Senior


On 9 July 2014 15:17, hsy...@gmail.com hsy...@gmail.com wrote:

 I have the same problem. I didn't dig deeper but I saw this happen when I
 launch kafka in daemon mode. I found the daemon mode is just launch kafka
 with nohup. Not quite clear why this happen.


 On Wed, Jul 9, 2014 at 9:59 AM, Lung, Paul pl...@ebay.com wrote:

  Yup. In fact, I just ran the test program again while the Kafka broker is
  still running, using the same user of course. I was able to get up to 10K
  connections with the test program. The test program uses the same java
 NIO
  library that the broker does. So the machine is capable of handling that
  many connections. The only issue I saw was that the NIO
  ServerSocketChannel is a bit slow at accepting connections when the total
  connection goes around 4K, but this could be due to the fact that I put
  the ServerSocketChannel in the same Selector as the 4K SocketChannels. So
  sometimes on the client side, I see:
 
  java.io.IOException: Connection reset by peer
  at sun.nio.ch.FileDispatcher.write0(Native Method)
  at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
  at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:122)
  at sun.nio.ch.IOUtil.write(IOUtil.java:93)
  at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:352)
  at FdTest$ClientThread.run(FdTest.java:108)
 
 
  But all I have to do is sleep for a bit on the client, and then retry
  again. However, 4K does seem like a magic number, since that’s seems to
 be
  the number that the Kafka broker machine can handle before it gives me
 the
  “Too Many Open Files” error and eventually crashes.
 
  Paul Lung
 
  On 7/8/14, 9:29 PM, Jun Rao jun...@gmail.com wrote:
 
  Does your test program run as the same user as Kafka broker?
  
  Thanks,
  
  Jun
  
  
  On Tue, Jul 8, 2014 at 1:42 PM, Lung, Paul pl...@ebay.com wrote:
  
   Hi Guys,
  
   I’m seeing the following errors from the 0.8.1.1 broker. This occurs
  most
   often on the Controller machine. Then the controller process crashes,
  and
   the controller bounces to other machines, which causes those machines
 to
   crash. Looking at the file descriptors being held by the process, it’s
  only
   around 4000 or so(looking at . There aren’t a whole lot of connections
  in
   TIME_WAIT states, and I’ve increased the ephemeral port range to
 “16000
  –
   64000” via /proc/sys/net/ipv4/ip_local_port_range”. I’ve written a
 Java
   test program to see how many sockets and files I can open. The socket
 is
   definitely limited by the ephemeral port range, which was around 22K
 at
  the
   time. But I
   can open tons of files, since the open file limit of the user is set
 to
   100K.
  
   So given that I can theoretically open 48K sockets and probably 90K
  files,
   and I only see around 4K total for the Kafka broker, I’m really
  confused as
   to why I’m seeing this error. Is there some internal Kafka limit that
 I
   don’t know about?
  
   Paul Lung
  
  
  
   java.io.IOException: Too many open files
  
   at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
  
   at
  
 
 sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:16
  3)
  
   at kafka.network.Acceptor.accept(SocketServer.scala:200)
  
   at kafka.network.Acceptor.run(SocketServer.scala:154)
  
   at java.lang.Thread.run(Thread.java:679)
  
   [2014-07-08 13:07:21,534] ERROR Error in acceptor
  (kafka.network.Acceptor)
  
   java.io.IOException: Too many open files
  
   at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
  
   at
  
 
 sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:16
  3)
  
   at kafka.network.Acceptor.accept(SocketServer.scala:200)
  
   at kafka.network.Acceptor.run(SocketServer.scala:154)
  
   at java.lang.Thread.run(Thread.java:679)
  
   [2014-07-08 13:07:21,563] ERROR [ReplicaFetcherThread-3-2124488],
 Error
   for partition [bom__021active_80__32__miniactiveitem_lvs_qn,0]
  to
   broker 2124488:class kafka.common.NotLeaderForPartitionException
   (kafka.server.ReplicaFetcherThread)
  
   [2014-07
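
To run the check François describes, read the broker registration out of ZooKeeper and confirm the advertised host is resolvable and reachable from each of the other brokers. A sketch (ZooKeeper address, broker id, host name and port are all examples):

  bin/zookeeper-shell.sh zk1:2181 get /brokers/ids/0

  # from another broker, test the host/port returned above
  ping -c 1 broker0.example.com
  nc -vz broker0.example.com 9092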

Re: Too Many Open Files Broker Error

2014-07-09 Thread Jun Rao
Is it possible your container wrapper somehow overrides the file handler
limit?

Thanks,

Jun


On Wed, Jul 9, 2014 at 9:59 AM, Lung, Paul pl...@ebay.com wrote:

 Yup. In fact, I just ran the test program again while the Kafka broker is
 still running, using the same user of course. I was able to get up to 10K
 connections with the test program. The test program uses the same java NIO
 library that the broker does. So the machine is capable of handling that
 many connections. The only issue I saw was that the NIO
 ServerSocketChannel is a bit slow at accepting connections when the total
 connection goes around 4K, but this could be due to the fact that I put
 the ServerSocketChannel in the same Selector as the 4K SocketChannels. So
 sometimes on the client side, I see:

 java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcher.write0(Native Method)
 at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
 at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:122)
 at sun.nio.ch.IOUtil.write(IOUtil.java:93)
 at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:352)
 at FdTest$ClientThread.run(FdTest.java:108)


 But all I have to do is sleep for a bit on the client, and then retry
 again. However, 4K does seem like a magic number, since that’s seems to be
 the number that the Kafka broker machine can handle before it gives me the
 “Too Many Open Files” error and eventually crashes.

 Paul Lung

 On 7/8/14, 9:29 PM, Jun Rao jun...@gmail.com wrote:

 Does your test program run as the same user as Kafka broker?
 
 Thanks,
 
 Jun
 
 
 On Tue, Jul 8, 2014 at 1:42 PM, Lung, Paul pl...@ebay.com wrote:
 
  Hi Guys,
 
  I’m seeing the following errors from the 0.8.1.1 broker. This occurs
 most
  often on the Controller machine. Then the controller process crashes,
 and
  the controller bounces to other machines, which causes those machines to
  crash. Looking at the file descriptors being held by the process, it’s
 only
  around 4000 or so(looking at . There aren’t a whole lot of connections
 in
  TIME_WAIT states, and I’ve increased the ephemeral port range to “16000
 –
  64000” via /proc/sys/net/ipv4/ip_local_port_range”. I’ve written a Java
  test program to see how many sockets and files I can open. The socket is
  definitely limited by the ephemeral port range, which was around 22K at
 the
  time. But I
  can open tons of files, since the open file limit of the user is set to
  100K.
 
  So given that I can theoretically open 48K sockets and probably 90K
 files,
  and I only see around 4K total for the Kafka broker, I’m really
 confused as
  to why I’m seeing this error. Is there some internal Kafka limit that I
  don’t know about?
 
  Paul Lung
 
 
 
  java.io.IOException: Too many open files
 
  at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
 
  at
 
 sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:16
 3)
 
  at kafka.network.Acceptor.accept(SocketServer.scala:200)
 
  at kafka.network.Acceptor.run(SocketServer.scala:154)
 
  at java.lang.Thread.run(Thread.java:679)
 
  [2014-07-08 13:07:21,534] ERROR Error in acceptor
 (kafka.network.Acceptor)
 
  java.io.IOException: Too many open files
 
  at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
 
  at
 
 sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:16
 3)
 
  at kafka.network.Acceptor.accept(SocketServer.scala:200)
 
  at kafka.network.Acceptor.run(SocketServer.scala:154)
 
  at java.lang.Thread.run(Thread.java:679)
 
  [2014-07-08 13:07:21,563] ERROR [ReplicaFetcherThread-3-2124488], Error
  for partition [bom__021active_80__32__miniactiveitem_lvs_qn,0]
 to
  broker 2124488:class kafka.common.NotLeaderForPartitionException
  (kafka.server.ReplicaFetcherThread)
 
  [2014-07-08 13:07:21,558] FATAL [Replica Manager on Broker 2140112]:
 Error
  writing to highwatermark file:  (kafka.server.ReplicaManager)
 
  java.io.FileNotFoundException:
 
 /ebay/cronus/software/cronusapp_home/kafka/kafka-logs/replication-offset-
 checkpoint.tmp
  (Too many open files)
 
  at java.io.FileOutputStream.open(Native Method)
 
  at java.io.FileOutputStream.init(FileOutputStream.java:209)
 
  at java.io.FileOutputStream.init(FileOutputStream.java:160)
 
  at java.io.FileWriter.init(FileWriter.java:90)
 
  at
 kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37)
 
  at
 
 kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(Rep
 licaManager.scala:447)
 
  at
 
 kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(Rep
 licaManager.scala:444)
 
  at
 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(Trav
 ersableLike.scala:772)
 
  at scala.collection.immutable.Map$Map1.foreach(Map.scala:109

Re: Too Many Open Files Broker Error

2014-07-08 Thread Jun Rao
Does your test program run as the same user as Kafka broker?

Thanks,

Jun


On Tue, Jul 8, 2014 at 1:42 PM, Lung, Paul pl...@ebay.com wrote:

 Hi Guys,

 I’m seeing the following errors from the 0.8.1.1 broker. This occurs most
 often on the Controller machine. Then the controller process crashes, and
 the controller bounces to other machines, which causes those machines to
 crash. Looking at the file descriptors being held by the process, it’s only
 around 4000 or so(looking at . There aren’t a whole lot of connections in
 TIME_WAIT states, and I’ve increased the ephemeral port range to “16000 –
 64000” via /proc/sys/net/ipv4/ip_local_port_range”. I’ve written a Java
 test program to see how many sockets and files I can open. The socket is
 definitely limited by the ephemeral port range, which was around 22K at the
 time. But I
 can open tons of files, since the open file limit of the user is set to
 100K.

 So given that I can theoretically open 48K sockets and probably 90K files,
 and I only see around 4K total for the Kafka broker, I’m really confused as
 to why I’m seeing this error. Is there some internal Kafka limit that I
 don’t know about?

 Paul Lung



 java.io.IOException: Too many open files

 at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)

 at
 sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)

 at kafka.network.Acceptor.accept(SocketServer.scala:200)

 at kafka.network.Acceptor.run(SocketServer.scala:154)

 at java.lang.Thread.run(Thread.java:679)

 [2014-07-08 13:07:21,534] ERROR Error in acceptor (kafka.network.Acceptor)

 java.io.IOException: Too many open files

 at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)

 at
 sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)

 at kafka.network.Acceptor.accept(SocketServer.scala:200)

 at kafka.network.Acceptor.run(SocketServer.scala:154)

 at java.lang.Thread.run(Thread.java:679)

 [2014-07-08 13:07:21,563] ERROR [ReplicaFetcherThread-3-2124488], Error
 for partition [bom__021active_80__32__miniactiveitem_lvs_qn,0] to
 broker 2124488:class kafka.common.NotLeaderForPartitionException
 (kafka.server.ReplicaFetcherThread)

 [2014-07-08 13:07:21,558] FATAL [Replica Manager on Broker 2140112]: Error
 writing to highwatermark file:  (kafka.server.ReplicaManager)

 java.io.FileNotFoundException:
 /ebay/cronus/software/cronusapp_home/kafka/kafka-logs/replication-offset-checkpoint.tmp
 (Too many open files)

 at java.io.FileOutputStream.open(Native Method)

 at java.io.FileOutputStream.init(FileOutputStream.java:209)

 at java.io.FileOutputStream.init(FileOutputStream.java:160)

 at java.io.FileWriter.init(FileWriter.java:90)

 at kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37)

 at
 kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:447)

 at
 kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:444)

 at
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)

 at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)

 at
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)

 at
 kafka.server.ReplicaManager.checkpointHighWatermarks(ReplicaManager.scala:444)

 at
 kafka.server.ReplicaManager$$anonfun$1.apply$mcV$sp(ReplicaManager.scala:94)

 at kafka.utils.KafkaScheduler$$anon$1.run(KafkaScheduler.scala:100)

 at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

 at
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)

 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)

 at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)

 at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)

 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)

 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

 at java.lang.Thread.run(Thread.java:679)
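
For reference, the ephemeral port range change Paul mentions is normally made like this (the range is the one from his message; it only affects locally allocated client-side ports, not the broker's listening port):

  # immediate
  sysctl -w net.ipv4.ip_local_port_range="16000 64000"

  # persistent, in /etc/sysctl.conf
  net.ipv4.ip_local_port_range = 16000 64000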






Re: Getting java.io.IOException: Too many open files

2014-06-25 Thread Lung, Paul
Ok. What I just saw was that when the controller machine reaches around
4100+ files, it crashes. Then I think the controller bounced between 2
other machines, taking them down too, and then circled back to the original
machine.

Paul Lung

On 6/24/14, 10:51 PM, Lung, Paul pl...@ebay.com wrote:

The controller machine has 3500 or so, while the other machines have
around 1600.

Paul Lung

On 6/24/14, 10:31 PM, Prakash Gowri Shankor prakash.shan...@gmail.com
wrote:

How many files does each broker itself have open ? You can find this from
'ls -l /proc/processid/fd'




On Tue, Jun 24, 2014 at 10:18 PM, Lung, Paul pl...@ebay.com wrote:

 Hi All,


 I just upgraded my cluster from 0.8.1 to 0.8.1.1. I’m seeing the
following
 error messages on the same 3 brokers once in a while:


 [2014-06-24 21:43:44,711] ERROR Error in acceptor
(kafka.network.Acceptor)

 java.io.IOException: Too many open files

 at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)

 at
 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:1
6
3)

 at kafka.network.Acceptor.accept(SocketServer.scala:200)

 at kafka.network.Acceptor.run(SocketServer.scala:154)

 at java.lang.Thread.run(Thread.java:679)

 [2014-06-24 21:43:44,711] ERROR Error in acceptor
(kafka.network.Acceptor)

 java.io.IOException: Too many open files

 at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)

 at
 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:1
6
3)

 at kafka.network.Acceptor.accept(SocketServer.scala:200)

 at kafka.network.Acceptor.run(SocketServer.scala:154)

 at java.lang.Thread.run(Thread.java:679)

 When this happens, these 3 brokers essentially go out of sync when you
do
 a “kafka-topics.sh —describe”.

 I tracked the number of open files by doing “watch –n 1 ‘sudo lsof | wc
 –l’”, which basically counts all open files on the system. The numbers
for
 the systems are basically in the 6000 range, with one system going to
9000.
 I presume the 9000 machine is the controller. Looking at the ulimit of
the
 user, both the hard limit and the soft limit for open files is 100,000.
 Using sysctl, the max file is fs.file-max = 9774928. So we seem to be
way
 under the limit.

 What am I missing here? Is there some JVM limit around 10K open files
or
 something?

 Paul Lung





Re: Getting java.io.IOException: Too many open files

2014-06-25 Thread Lung, Paul
Hi Prakash,

How many open files do you expect a broker to be able to handle? It seems
like this broker is crashing at around 4100 or so open files.

Thanks,
Paul Lung

On 6/24/14, 11:08 PM, Lung, Paul pl...@ebay.com wrote:

Ok. What I just saw was that when the controller machine reaches around
4100+ files, it crashes. Then I think the controller bounced between 2
other machines, taking them down too, and then circled back to the original
machine.

Paul Lung

On 6/24/14, 10:51 PM, Lung, Paul pl...@ebay.com wrote:

The controller machine has 3500 or so, while the other machines have
around 1600.

Paul Lung

On 6/24/14, 10:31 PM, Prakash Gowri Shankor prakash.shan...@gmail.com
wrote:

How many files does each broker itself have open ? You can find this
from
'ls -l /proc/processid/fd'




On Tue, Jun 24, 2014 at 10:18 PM, Lung, Paul pl...@ebay.com wrote:

 Hi All,


 I just upgraded my cluster from 0.8.1 to 0.8.1.1. I’m seeing the
following
 error messages on the same 3 brokers once in a while:


 [2014-06-24 21:43:44,711] ERROR Error in acceptor
(kafka.network.Acceptor)

 java.io.IOException: Too many open files

 at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)

 at
 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:
1
6
3)

 at kafka.network.Acceptor.accept(SocketServer.scala:200)

 at kafka.network.Acceptor.run(SocketServer.scala:154)

 at java.lang.Thread.run(Thread.java:679)

 [2014-06-24 21:43:44,711] ERROR Error in acceptor
(kafka.network.Acceptor)

 java.io.IOException: Too many open files

 at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)

 at
 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:
1
6
3)

 at kafka.network.Acceptor.accept(SocketServer.scala:200)

 at kafka.network.Acceptor.run(SocketServer.scala:154)

 at java.lang.Thread.run(Thread.java:679)

 When this happens, these 3 brokers essentially go out of sync when you
do
 a “kafka-topics.sh —describe”.

 I tracked the number of open files by doing “watch –n 1 ‘sudo lsof |
wc
 –l’”, which basically counts all open files on the system. The numbers
for
 the systems are basically in the 6000 range, with one system going to
9000.
 I presume the 9000 machine is the controller. Looking at the ulimit of
the
 user, both the hard limit and the soft limit for open files is
100,000.
 Using sysctl, the max file is fs.file-max = 9774928. So we seem to be
way
 under the limit.

 What am I missing here? Is there some JVM limit around 10K open files
or
 something?

 Paul Lung






Re: Getting java.io.IOException: Too many open files

2014-06-25 Thread Prakash Gowri Shankor
Without knowing the intricacies of Kafka, i think the default open file
descriptors is 1024 on unix. This can be changed by setting a higher ulimit
value ( typically 8192 but sometimes even 10 ).
Before modifying the ulimit I would recommend you check the number of
sockets stuck in TIME_WAIT mode. In this case, it looks like the broker has
too many open sockets. This could be because you have a rogue client
connecting and disconnecting repeatedly.
You might have to reduce the TIME_WAIT state to 30 seconds or lower.



On Wed, Jun 25, 2014 at 10:19 AM, Lung, Paul pl...@ebay.com wrote:

 Hi Prakash,

 How many open files do you expect a broker to be able to handle? It seems
 like this broker is crashing at around 4100 or so open files.

 Thanks,
 Paul Lung

 On 6/24/14, 11:08 PM, Lung, Paul pl...@ebay.com wrote:

 Ok. What I just saw was that when the controller machine reaches around
 4100+ files, it crashes. Then I think the controller bounced between 2
 other machines, taking them down too, and then circled back to the original
 machine.
 
 Paul Lung
 
 On 6/24/14, 10:51 PM, Lung, Paul pl...@ebay.com wrote:
 
 The controller machine has 3500 or so, while the other machines have
 around 1600.
 
 Paul Lung
 
 On 6/24/14, 10:31 PM, Prakash Gowri Shankor prakash.shan...@gmail.com
 
 wrote:
 
 How many files does each broker itself have open ? You can find this
 from
 'ls -l /proc/processid/fd'
 
 
 
 
 On Tue, Jun 24, 2014 at 10:18 PM, Lung, Paul pl...@ebay.com wrote:
 
  Hi All,
 
 
  I just upgraded my cluster from 0.8.1 to 0.8.1.1. I’m seeing the
 following
  error messages on the same 3 brokers once in a while:
 
 
  [2014-06-24 21:43:44,711] ERROR Error in acceptor
 (kafka.network.Acceptor)
 
  java.io.IOException: Too many open files
 
  at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
 
  at
 
 sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:
 1
 6
 3)
 
  at kafka.network.Acceptor.accept(SocketServer.scala:200)
 
  at kafka.network.Acceptor.run(SocketServer.scala:154)
 
  at java.lang.Thread.run(Thread.java:679)
 
  [2014-06-24 21:43:44,711] ERROR Error in acceptor
 (kafka.network.Acceptor)
 
  java.io.IOException: Too many open files
 
  at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
 
  at
 
 sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:
 1
 6
 3)
 
  at kafka.network.Acceptor.accept(SocketServer.scala:200)
 
  at kafka.network.Acceptor.run(SocketServer.scala:154)
 
  at java.lang.Thread.run(Thread.java:679)
 
  When this happens, these 3 brokers essentially go out of sync when you
 do
  a “kafka-topics.sh —describe”.
 
  I tracked the number of open files by doing “watch –n 1 ‘sudo lsof |
 wc
  –l’”, which basically counts all open files on the system. The numbers
 for
  the systems are basically in the 6000 range, with one system going to
 9000.
  I presume the 9000 machine is the controller. Looking at the ulimit of
 the
  user, both the hard limit and the soft limit for open files is
 100,000.
  Using sysctl, the max file is fs.file-max = 9774928. So we seem to be
 way
  under the limit.
 
  What am I missing here? Is there some JVM limit around 10K open files
 or
  something?
 
  Paul Lung
 
 
 




Getting java.io.IOException: Too many open files

2014-06-24 Thread Lung, Paul
Hi All,


I just upgraded my cluster from 0.8.1 to 0.8.1.1. I’m seeing the following 
error messages on the same 3 brokers once in a while:


[2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor)

java.io.IOException: Too many open files

at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)

at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)

at kafka.network.Acceptor.accept(SocketServer.scala:200)

at kafka.network.Acceptor.run(SocketServer.scala:154)

at java.lang.Thread.run(Thread.java:679)

[2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor)

java.io.IOException: Too many open files

at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)

at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)

at kafka.network.Acceptor.accept(SocketServer.scala:200)

at kafka.network.Acceptor.run(SocketServer.scala:154)

at java.lang.Thread.run(Thread.java:679)

When this happens, these 3 brokers essentially go out of sync when you do a 
“kafka-topics.sh —describe”.

I tracked the number of open files by doing “watch –n 1 ‘sudo lsof | wc –l’”, 
which basically counts all open files on the system. The numbers for the 
systems are basically in the 6000 range, with one system going to 9000. I 
presume the 9000 machine is the controller. Looking at the ulimit of the user, 
both the hard limit and the soft limit for open files is 100,000. Using sysctl, 
the max file is fs.file-max = 9774928. So we seem to be way under the limit.

What am I missing here? Is there some JVM limit around 10K open files or 
something?

Paul Lung


Re: Getting java.io.IOException: Too many open files

2014-06-24 Thread Prakash Gowri Shankor
How many files does each broker itself have open ? You can find this from
'ls -l /proc/processid/fd'




On Tue, Jun 24, 2014 at 10:18 PM, Lung, Paul pl...@ebay.com wrote:

 Hi All,


 I just upgraded my cluster from 0.8.1 to 0.8.1.1. I’m seeing the following
 error messages on the same 3 brokers once in a while:


 [2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor)

 java.io.IOException: Too many open files

 at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)

 at
 sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)

 at kafka.network.Acceptor.accept(SocketServer.scala:200)

 at kafka.network.Acceptor.run(SocketServer.scala:154)

 at java.lang.Thread.run(Thread.java:679)

 [2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor)

 java.io.IOException: Too many open files

 at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)

 at
 sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)

 at kafka.network.Acceptor.accept(SocketServer.scala:200)

 at kafka.network.Acceptor.run(SocketServer.scala:154)

 at java.lang.Thread.run(Thread.java:679)

 When this happens, these 3 brokers essentially go out of sync when you do
 a “kafka-topics.sh —describe”.

 I tracked the number of open files by doing “watch –n 1 ‘sudo lsof | wc
 –l’”, which basically counts all open files on the system. The numbers for
 the systems are basically in the 6000 range, with one system going to 9000.
 I presume the 9000 machine is the controller. Looking at the ulimit of the
 user, both the hard limit and the soft limit for open files is 100,000.
 Using sysctl, the max file is fs.file-max = 9774928. So we seem to be way
 under the limit.

 What am I missing here? Is there some JVM limit around 10K open files or
 something?

 Paul Lung



Re: Getting java.io.IOException: Too many open files

2014-06-24 Thread Lung, Paul
The controller machine has 3500 or so, while the other machines have
around 1600.

Paul Lung

On 6/24/14, 10:31 PM, Prakash Gowri Shankor prakash.shan...@gmail.com
wrote:

How many files does each broker itself have open ? You can find this from
'ls -l /proc/processid/fd'




On Tue, Jun 24, 2014 at 10:18 PM, Lung, Paul pl...@ebay.com wrote:

 Hi All,


 I just upgraded my cluster from 0.8.1 to 0.8.1.1. I’m seeing the
following
 error messages on the same 3 brokers once in a while:


 [2014-06-24 21:43:44,711] ERROR Error in acceptor
(kafka.network.Acceptor)

 java.io.IOException: Too many open files

 at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)

 at
 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:16
3)

 at kafka.network.Acceptor.accept(SocketServer.scala:200)

 at kafka.network.Acceptor.run(SocketServer.scala:154)

 at java.lang.Thread.run(Thread.java:679)

 [2014-06-24 21:43:44,711] ERROR Error in acceptor
(kafka.network.Acceptor)

 java.io.IOException: Too many open files

 at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)

 at
 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:16
3)

 at kafka.network.Acceptor.accept(SocketServer.scala:200)

 at kafka.network.Acceptor.run(SocketServer.scala:154)

 at java.lang.Thread.run(Thread.java:679)

 When this happens, these 3 brokers essentially go out of sync when you
do
 a “kafka-topics.sh —describe”.

 I tracked the number of open files by doing “watch –n 1 ‘sudo lsof | wc
 –l’”, which basically counts all open files on the system. The numbers
for
 the systems are basically in the 6000 range, with one system going to
9000.
 I presume the 9000 machine is the controller. Looking at the ulimit of
the
 user, both the hard limit and the soft limit for open files is 100,000.
 Using sysctl, the max file is fs.file-max = 9774928. So we seem to be
way
 under the limit.

 What am I missing here? Is there some JVM limit around 10K open files or
 something?

 Paul Lung




Re: too many open files - broker died

2013-11-02 Thread Kane Kane
Thanks, Jun.

On Sat, Nov 2, 2013 at 8:31 PM, Jun Rao jun...@gmail.com wrote:
 The # of required open file handlers is # client socket connections + # log
 segment and index files.

 Thanks,

 Jun


 On Fri, Nov 1, 2013 at 10:28 PM, Kane Kane kane.ist...@gmail.com wrote:

 I had only 1 topic with 45 partitions replicated across 3 brokers.
 After several hours of uploading some data to kafka 1 broker died with
 the following exception.
 I guess i can fix it raising limit for open files, but I wonder how it
 happened under described circumstances.


 [2013-11-02 00:19:14,862] INFO Reconnect due to socket error: null
 (kafka.consumer.SimpleConsumer)
 [2013-11-02 00:19:14,706] INFO Reconnect due to socket error: null
 (kafka.consumer.SimpleConsumer)

 [2013-11-02 00:19:05,150] INFO Reconnect due to socket error: null
 (kafka.consumer.SimpleConsumer)
 [2013-11-02 00:09:08,569] FATAL [ReplicaFetcherThread-0-2], Disk error
 while replicating data. (kafka.server.ReplicaFetcherThread)
 kafka.common.KafkaStorageException: I/O exception in append to log
 'perf1-4'
 at kafka.log.Log.append(Unknown Source)
 at kafka.server.ReplicaFetcherThread.processPartitionData(Unknown
 Source)
 at
 kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(Unknown
 Source)
 at
 kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(Unknown
 Source)
 at
 scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
 at
 scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
 at
 kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply$mcV$sp(Unknown
 Source)
 at
 kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(Unknown
 Source)
 at
 kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(Unknown
 Source)
 at kafka.utils.Utils$.inLock(Unknown Source)
 at kafka.server.AbstractFetcherThread.processFetchRequest(Unknown
 Source)
 at kafka.server.AbstractFetcherThread.doWork(Unknown Source)
 at kafka.utils.ShutdownableThread.run(Unknown Source)
 Caused by: java.io.FileNotFoundException:
 /disk1/kafka-logs/perf1-4/00010558.index (Too many open
 files)
 at java.io.RandomAccessFile.open(Native Method)
 at java.io.RandomAccessFile.init(RandomAccessFile.java:241)
 at kafka.log.OffsetIndex$$anonfun$resize$1.apply(Unknown Source)
 at kafka.log.OffsetIndex$$anonfun$resize$1.apply(Unknown Source)
 at kafka.utils.Utils$.inLock(Unknown Source)
 at kafka.log.OffsetIndex.resize(Unknown Source)
 at
 kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(Unknown
 Source)
 at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(Unknown
 Source)
 at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(Unknown
 Source)
 at kafka.utils.Utils$.inLock(Unknown Source)
 at kafka.log.OffsetIndex.trimToValidSize(Unknown Source)
 at kafka.log.Log.roll(Unknown Source)
 at kafka.log.Log.maybeRoll(Unknown Source)
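
Jun's formula turns into a quick back-of-the-envelope check. For Kane's setup (45 partitions replicated across 3 brokers, so roughly 45 partition replicas hosted per broker) the file side is segments per partition times two, since every .log has a matching .index, plus one descriptor per client and replica-fetcher socket. A sketch with assumed example numbers:

  REPLICAS_PER_BROKER=45      # partitions hosted on this broker
  SEGMENTS_PER_PARTITION=50   # example only: depends on segment size and retention
  CONNECTIONS=500             # example only: producers, consumers, replica fetchers

  echo $(( REPLICAS_PER_BROKER * SEGMENTS_PER_PARTITION * 2 + CONNECTIONS ))
  # compare the result with `ulimit -n` for the user running the broker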



too many open files - broker died

2013-11-01 Thread Kane Kane
I had only 1 topic with 45 partitions replicated across 3 brokers.
After several hours of uploading some data to Kafka, one broker died with
the following exception.
I guess I can fix it by raising the limit for open files, but I wonder how
it happened under the circumstances described.


[2013-11-02 00:19:14,862] INFO Reconnect due to socket error: null
(kafka.consumer.SimpleConsumer)
[2013-11-02 00:19:14,706] INFO Reconnect due to socket error: null
(kafka.consumer.SimpleConsumer)

[2013-11-02 00:19:05,150] INFO Reconnect due to socket error: null
(kafka.consumer.SimpleConsumer)
[2013-11-02 00:09:08,569] FATAL [ReplicaFetcherThread-0-2], Disk error
while replicating data. (kafka.server.ReplicaFetcherThread)
kafka.common.KafkaStorageException: I/O exception in append to log 'perf1-4'
at kafka.log.Log.append(Unknown Source)
at kafka.server.ReplicaFetcherThread.processPartitionData(Unknown
Source)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(Unknown
Source)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(Unknown
Source)
at 
scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
at 
scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply$mcV$sp(Unknown
Source)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(Unknown
Source)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(Unknown
Source)
at kafka.utils.Utils$.inLock(Unknown Source)
at kafka.server.AbstractFetcherThread.processFetchRequest(Unknown
Source)
at kafka.server.AbstractFetcherThread.doWork(Unknown Source)
at kafka.utils.ShutdownableThread.run(Unknown Source)
Caused by: java.io.FileNotFoundException:
/disk1/kafka-logs/perf1-4/00010558.index (Too many open
files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(Unknown Source)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(Unknown Source)
at kafka.utils.Utils$.inLock(Unknown Source)
at kafka.log.OffsetIndex.resize(Unknown Source)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(Unknown
Source)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(Unknown
Source)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(Unknown
Source)
at kafka.utils.Utils$.inLock(Unknown Source)
at kafka.log.OffsetIndex.trimToValidSize(Unknown Source)
at kafka.log.Log.roll(Unknown Source)
at kafka.log.Log.maybeRoll(Unknown Source)


Re: Too many open files

2013-09-26 Thread Jun Rao
Are you using the java or non-java producer? Are you using ZK based,
broker-list based, or VIP based producer?

Thanks,

Jun


On Wed, Sep 25, 2013 at 10:06 PM, Nicolas Berthet
nicolasbert...@maaii.com wrote:

 Jun,

 I observed similar kind of things recently. (didn't notice before because
 our file limit is huge)

 I have a set of brokers in a datacenter, and producers in different data
 centers.

 At some point I got disconnections, from the producer perspective I had
 something like 15 connections to the broker. On the other hand on the
 broker side, I observed hundreds of connections from the producer in an
 ESTABLISHED state.

 We had some default settings for the socket timeout on the OS level, which
 we reduced hoping it would prevent the issue in the future. I'm not sure if
 the issue is from the broker or OS configuration though. I'm still keeping
 the broker under observation for the time being.

 Note that, for clients in the same datacenter, we didn't see this issue,
 the socket count matches on both ends.

 Nicolas Berthet

 -Original Message-
 From: Jun Rao [mailto:jun...@gmail.com]
 Sent: Thursday, September 26, 2013 12:39 PM
 To: users@kafka.apache.org
 Subject: Re: Too many open files

 If a client is gone, the broker should automatically close those broken
 sockets. Are you using a hardware load balancer?

 Thanks,

 Jun


 On Wed, Sep 25, 2013 at 4:48 PM, Mark static.void@gmail.com wrote:

  FYI if I kill all producers I don't see the number of open files drop.
  I still see all the ESTABLISHED connections.
 
  Is there a broker setting to automatically kill any inactive TCP
  connections?
 
 
  On Sep 25, 2013, at 4:30 PM, Mark static.void@gmail.com wrote:
 
   Any other ideas?
  
   On Sep 25, 2013, at 9:06 AM, Jun Rao jun...@gmail.com wrote:
  
   We haven't seen any socket leaks with the java producer. If you
   have
  lots
   of unexplained socket connections in established mode, one possible
  cause
   is that the client created new producer instances, but didn't close
   the
  old
   ones.
  
   Thanks,
  
   Jun
  
  
   On Wed, Sep 25, 2013 at 6:08 AM, Mark static.void@gmail.com
  wrote:
  
   No. We are using the kafka-rb ruby gem producer.
   https://github.com/acrosa/kafka-rb
  
   Now that you asked that question I need to ask. Is there a problem
   with the java producer?
  
   Sent from my iPhone
  
   On Sep 24, 2013, at 9:01 PM, Jun Rao jun...@gmail.com wrote:
  
   Are you using the java producer client?
  
   Thanks,
  
   Jun
  
  
   On Tue, Sep 24, 2013 at 5:33 PM, Mark
   static.void@gmail.com
   wrote:
  
   Our 0.7.2 Kafka cluster keeps crashing with:
  
   2013-09-24 17:21:47,513 -  [kafka-acceptor:Acceptor@153] - Error
   in acceptor
java.io.IOException: Too many open
  
   The obvious fix is to bump up the number of open files but I'm
  wondering
   if there is a leak on the Kafka side and/or our application
   side. We currently have the ulimit set to a generous 4096 but
   obviously we are hitting this ceiling. What's a recommended value?
  
   We are running rails and our Unicorn workers are connecting to
   our
  Kafka
   cluster via round-robin load balancing. We have about 1500
   workers to
   that
   would be 1500 connections right there but they should be split
   across
   our 3
   nodes. Instead Netstat shows thousands of connections that look
   like
   this:
  
   tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
   10.99.99.1:22503ESTABLISHED
   tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
   10.99.99.1:48398ESTABLISHED
   tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
   10.99.99.2:29617ESTABLISHED
   tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
   10.99.99.1:32444ESTABLISHED
   tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
   10.99.99.1:34415ESTABLISHED
   tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
   10.99.99.1:56901ESTABLISHED
   tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
   10.99.99.2:45349ESTABLISHED
  
   Has anyone come across this problem before? Is this a 0.7.2
   leak, LB misconfiguration... ?
  
   Thanks
  
  
 
 



Re: Too many open files

2013-09-26 Thread Mark
We are using a hardware load balancer with a VIP-based Ruby producer.

On Sep 26, 2013, at 7:37 AM, Jun Rao jun...@gmail.com wrote:

 Are you using the java or non-java producer? Are you using ZK based,
 broker-list based, or VIP based producer?
 
 Thanks,
 
 Jun
 
 
 On Wed, Sep 25, 2013 at 10:06 PM, Nicolas Berthet
 nicolasbert...@maaii.com wrote:
 
 Jun,
 
 I observed similar kind of things recently. (didn't notice before because
 our file limit is huge)
 
 I have a set of brokers in a datacenter, and producers in different data
 centers.
 
 At some point I got disconnections, from the producer perspective I had
 something like 15 connections to the broker. On the other hand on the
 broker side, I observed hundreds of connections from the producer in an
 ESTABLISHED state.
 
 We had some default settings for the socket timeout on the OS level, which
 we reduced hoping it would prevent the issue in the future. I'm not sure if
 the issue is from the broker or OS configuration though. I'm still keeping
 the broker under observation for the time being.
 
 Note that, for clients in the same datacenter, we didn't see this issue,
 the socket count matches on both ends.
 
 Nicolas Berthet
 
 -Original Message-
 From: Jun Rao [mailto:jun...@gmail.com]
 Sent: Thursday, September 26, 2013 12:39 PM
 To: users@kafka.apache.org
 Subject: Re: Too many open files
 
 If a client is gone, the broker should automatically close those broken
 sockets. Are you using a hardware load balancer?
 
 Thanks,
 
 Jun
 
 
 On Wed, Sep 25, 2013 at 4:48 PM, Mark static.void@gmail.com wrote:
 
 FYI if I kill all producers I don't see the number of open files drop.
 I still see all the ESTABLISHED connections.
 
 Is there a broker setting to automatically kill any inactive TCP
 connections?
 
 
 On Sep 25, 2013, at 4:30 PM, Mark static.void@gmail.com wrote:
 
 Any other ideas?
 
 On Sep 25, 2013, at 9:06 AM, Jun Rao jun...@gmail.com wrote:
 
 We haven't seen any socket leaks with the java producer. If you
 have
 lots
 of unexplained socket connections in established mode, one possible
 cause
 is that the client created new producer instances, but didn't close
 the
 old
 ones.
 
 Thanks,
 
 Jun
 
 
 On Wed, Sep 25, 2013 at 6:08 AM, Mark static.void@gmail.com
 wrote:
 
 No. We are using the kafka-rb ruby gem producer.
 https://github.com/acrosa/kafka-rb
 
 Now that you asked that question I need to ask. Is there a problem
 with the java producer?
 
 Sent from my iPhone
 
 On Sep 24, 2013, at 9:01 PM, Jun Rao jun...@gmail.com wrote:
 
 Are you using the java producer client?
 
 Thanks,
 
 Jun
 
 
 On Tue, Sep 24, 2013 at 5:33 PM, Mark
 static.void@gmail.com
 wrote:
 
 Our 0.7.2 Kafka cluster keeps crashing with:
 
 2013-09-24 17:21:47,513 -  [kafka-acceptor:Acceptor@153] - Error
 in acceptor
 java.io.IOException: Too many open
 
 The obvious fix is to bump up the number of open files but I'm
 wondering
 if there is a leak on the Kafka side and/or our application
 side. We currently have the ulimit set to a generous 4096 but
 obviously we are hitting this ceiling. What's a recommended value?
 
 We are running rails and our Unicorn workers are connecting to
 our
 Kafka
 cluster via round-robin load balancing. We have about 1500
 workers to
 that
 would be 1500 connections right there but they should be split
 across
 our 3
 nodes. Instead Netstat shows thousands of connections that look
 like
 this:
 
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:22503ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:48398ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.2:29617ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:32444ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:34415ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:56901ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.2:45349ESTABLISHED
 
 Has anyone come across this problem before? Is this a 0.7.2
 leak, LB misconfiguration... ?
 
 Thanks
 
 
 
 
 



Re: Too many open files

2013-09-26 Thread Mark
What OS settings did you change? How high is your huge file limit?


On Sep 25, 2013, at 10:06 PM, Nicolas Berthet nicolasbert...@maaii.com wrote:

 Jun,
 
 I observed similar kind of things recently. (didn't notice before because our 
 file limit is huge)
 
 I have a set of brokers in a datacenter, and producers in different data 
 centers. 
 
 At some point I got disconnections, from the producer perspective I had 
 something like 15 connections to the broker. On the other hand on the broker 
 side, I observed hundreds of connections from the producer in an ESTABLISHED 
 state.
 
 We had some default settings for the socket timeout on the OS level, which we 
 reduced hoping it would prevent the issue in the future. I'm not sure if the 
 issue is from the broker or OS configuration though. I'm still keeping the 
 broker under observation for the time being.
 
 Note that, for clients in the same datacenter, we didn't see this issue, the 
 socket count matches on both ends.
 
 Nicolas Berthet 
 
 -Original Message-
 From: Jun Rao [mailto:jun...@gmail.com] 
 Sent: Thursday, September 26, 2013 12:39 PM
 To: users@kafka.apache.org
 Subject: Re: Too many open files
 
 If a client is gone, the broker should automatically close those broken 
 sockets. Are you using a hardware load balancer?
 
 Thanks,
 
 Jun
 
 
 On Wed, Sep 25, 2013 at 4:48 PM, Mark static.void@gmail.com wrote:
 
 FYI if I kill all producers I don't see the number of open files drop. 
 I still see all the ESTABLISHED connections.
 
 Is there a broker setting to automatically kill any inactive TCP 
 connections?
 
 
 On Sep 25, 2013, at 4:30 PM, Mark static.void@gmail.com wrote:
 
 Any other ideas?
 
 On Sep 25, 2013, at 9:06 AM, Jun Rao jun...@gmail.com wrote:
 
 We haven't seen any socket leaks with the java producer. If you 
 have
 lots
 of unexplained socket connections in established mode, one possible
 cause
 is that the client created new producer instances, but didn't close 
 the
 old
 ones.
 
 Thanks,
 
 Jun
 
 
 On Wed, Sep 25, 2013 at 6:08 AM, Mark static.void@gmail.com
 wrote:
 
 No. We are using the kafka-rb ruby gem producer.
 https://github.com/acrosa/kafka-rb
 
 Now that you asked that question I need to ask. Is there a problem 
 with the java producer?
 
 Sent from my iPhone
 
 On Sep 24, 2013, at 9:01 PM, Jun Rao jun...@gmail.com wrote:
 
 Are you using the java producer client?
 
 Thanks,
 
 Jun
 
 
 On Tue, Sep 24, 2013 at 5:33 PM, Mark 
 static.void@gmail.com
 wrote:
 
 Our 0.7.2 Kafka cluster keeps crashing with:
 
 2013-09-24 17:21:47,513 -  [kafka-acceptor:Acceptor@153] - Error 
 in acceptor
 java.io.IOException: Too many open
 
 The obvious fix is to bump up the number of open files but I'm
 wondering
 if there is a leak on the Kafka side and/or our application 
 side. We currently have the ulimit set to a generous 4096 but 
 obviously we are hitting this ceiling. What's a recommended value?
 
 We are running rails and our Unicorn workers are connecting to 
 our
 Kafka
 cluster via round-robin load balancing. We have about 1500 
 workers to
 that
 would be 1500 connections right there but they should be split 
 across
 our 3
 nodes. Instead Netstat shows thousands of connections that look 
 like
 this:
 
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:22503ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:48398ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.2:29617ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:32444ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:34415ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:56901ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.2:45349ESTABLISHED
 
 Has anyone come across this problem before? Is this a 0.7.2 
 leak, LB misconfiguration... ?
 
 Thanks
 
 
 
 



RE: Too many open files

2013-09-26 Thread Nicolas Berthet
Hi Mark,

I'm using CentOS 6.2. My file limit is something like 500k; the value is
arbitrary.

One of the things I have changed so far is the TCP keepalive parameters,
with moderate success so far:

net.ipv4.tcp_keepalive_time
net.ipv4.tcp_keepalive_intvl
net.ipv4.tcp_keepalive_probes

I still notice an abnormal number of ESTABLISHED connections. I've been doing
some searching and came across this page
(http://www.lognormal.com/blog/2012/09/27/linux-tcpip-tuning/).

I'll change net.netfilter.nf_conntrack_tcp_timeout_established as indicated
there; it looks closer to a solution to my issue.

Are you also experiencing the issue in a cross-datacenter context?

Best regards,

Nicolas Berthet 
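
For reference, that kind of tuning looks roughly like the following; the
values are illustrative assumptions, not the ones used above, and should be
tuned per environment (and added to /etc/sysctl.conf so they survive a
reboot):

  sysctl -w net.ipv4.tcp_keepalive_time=600      # start probing a connection after 10 minutes idle
  sysctl -w net.ipv4.tcp_keepalive_intvl=60      # then probe every 60 seconds
  sysctl -w net.ipv4.tcp_keepalive_probes=5      # give up and close after 5 failed probes
  sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=3600   # conntrack entry lifetime for established flows

With shorter keepalives the kernel eventually notices peers that disappeared
without closing the connection, which is the usual cause of sockets stuck in
ESTABLISHED on only one end.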


-Original Message-
From: Mark [mailto:static.void@gmail.com] 
Sent: Friday, September 27, 2013 6:08 AM
To: users@kafka.apache.org
Subject: Re: Too many open files

What OS settings did you change? How high is your huge file limit?


On Sep 25, 2013, at 10:06 PM, Nicolas Berthet nicolasbert...@maaii.com wrote:

 Jun,
 
 I observed similar kind of things recently. (didn't notice before 
 because our file limit is huge)
 
 I have a set of brokers in a datacenter, and producers in different data 
 centers. 
 
 At some point I got disconnections, from the producer perspective I had 
 something like 15 connections to the broker. On the other hand on the broker 
 side, I observed hundreds of connections from the producer in an ESTABLISHED 
 state.
 
 We had some default settings for the socket timeout on the OS level, which we 
 reduced hoping it would prevent the issue in the future. I'm not sure if the 
 issue is from the broker or OS configuration though. I'm still keeping the 
 broker under observation for the time being.
 
 Note that, for clients in the same datacenter, we didn't see this issue, the 
 socket count matches on both ends.
 
 Nicolas Berthet
 
 -Original Message-
 From: Jun Rao [mailto:jun...@gmail.com]
 Sent: Thursday, September 26, 2013 12:39 PM
 To: users@kafka.apache.org
 Subject: Re: Too many open files
 
 If a client is gone, the broker should automatically close those broken 
 sockets. Are you using a hardware load balancer?
 
 Thanks,
 
 Jun
 
 
 On Wed, Sep 25, 2013 at 4:48 PM, Mark static.void@gmail.com wrote:
 
 FYI if I kill all producers I don't see the number of open files drop. 
 I still see all the ESTABLISHED connections.
 
 Is there a broker setting to automatically kill any inactive TCP 
 connections?
 
 
 On Sep 25, 2013, at 4:30 PM, Mark static.void@gmail.com wrote:
 
 Any other ideas?
 
 On Sep 25, 2013, at 9:06 AM, Jun Rao jun...@gmail.com wrote:
 
 We haven't seen any socket leaks with the java producer. If you 
 have
 lots
 of unexplained socket connections in established mode, one possible
 cause
 is that the client created new producer instances, but didn't close 
 the
 old
 ones.
 
 Thanks,
 
 Jun
 
 
 On Wed, Sep 25, 2013 at 6:08 AM, Mark static.void@gmail.com
 wrote:
 
 No. We are using the kafka-rb ruby gem producer.
 https://github.com/acrosa/kafka-rb
 
 Now that you asked that question I need to ask. Is there a problem 
 with the java producer?
 
 Sent from my iPhone
 
 On Sep 24, 2013, at 9:01 PM, Jun Rao jun...@gmail.com wrote:
 
 Are you using the java producer client?
 
 Thanks,
 
 Jun
 
 
 On Tue, Sep 24, 2013 at 5:33 PM, Mark 
 static.void@gmail.com
 wrote:
 
 Our 0.7.2 Kafka cluster keeps crashing with:
 
 2013-09-24 17:21:47,513 -  [kafka-acceptor:Acceptor@153] - Error 
 in acceptor
 java.io.IOException: Too many open
 
 The obvious fix is to bump up the number of open files but I'm
 wondering
 if there is a leak on the Kafka side and/or our application 
 side. We currently have the ulimit set to a generous 4096 but 
 obviously we are hitting this ceiling. What's a recommended value?
 
 We are running rails and our Unicorn workers are connecting to 
 our
 Kafka
 cluster via round-robin load balancing. We have about 1500 
 workers to
 that
 would be 1500 connections right there but they should be split 
 across
 our 3
 nodes. Instead Netstat shows thousands of connections that look 
 like
 this:
 
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:22503ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:48398ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.2:29617ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:32444ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:34415ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:56901ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.2:45349ESTABLISHED
 
 Has anyone come across this problem before? Is this a 0.7.2 
 leak, LB misconfiguration... ?
 
 Thanks
 
 
 
 



Re: Too many open files

2013-09-25 Thread Mark
No. We are using the kafka-rb ruby gem producer. 
https://github.com/acrosa/kafka-rb

Now that you asked that question I need to ask. Is there a problem with the 
java producer?

Sent from my iPhone

 On Sep 24, 2013, at 9:01 PM, Jun Rao jun...@gmail.com wrote:
 
 Are you using the java producer client?
 
 Thanks,
 
 Jun
 
 
 On Tue, Sep 24, 2013 at 5:33 PM, Mark static.void@gmail.com wrote:
 
 Our 0.7.2 Kafka cluster keeps crashing with:
 
 2013-09-24 17:21:47,513 -  [kafka-acceptor:Acceptor@153] - Error in
 acceptor
java.io.IOException: Too many open
 
 The obvious fix is to bump up the number of open files but I'm wondering
 if there is a leak on the Kafka side and/or our application side. We
 currently have the ulimit set to a generous 4096 but obviously we are
 hitting this ceiling. What's a recommended value?
 
 We are running rails and our Unicorn workers are connecting to our Kafka
 cluster via round-robin load balancing. We have about 1500 workers to that
 would be 1500 connections right there but they should be split across our 3
 nodes. Instead Netstat shows thousands of connections that look like this:
 
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::10.99.99.1:22503   
  ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::10.99.99.1:48398   
  ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::10.99.99.2:29617   
  ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::10.99.99.1:32444   
  ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::10.99.99.1:34415   
  ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::10.99.99.1:56901   
  ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::10.99.99.2:45349   
  ESTABLISHED
 
 Has anyone come across this problem before? Is this a 0.7.2 leak, LB
 misconfiguration… ?
 
 Thanks


Re: Too many open files

2013-09-25 Thread Jun Rao
We haven't seen any socket leaks with the java producer. If you have lots
of unexplained socket connections in established mode, one possible cause
is that the client created new producer instances, but didn't close the old
ones.

Thanks,

Jun


On Wed, Sep 25, 2013 at 6:08 AM, Mark static.void@gmail.com wrote:

 No. We are using the kafka-rb ruby gem producer.
 https://github.com/acrosa/kafka-rb

 Now that you asked that question I need to ask. Is there a problem with
 the java producer?

 Sent from my iPhone

  On Sep 24, 2013, at 9:01 PM, Jun Rao jun...@gmail.com wrote:
 
  Are you using the java producer client?
 
  Thanks,
 
  Jun
 
 
  On Tue, Sep 24, 2013 at 5:33 PM, Mark static.void@gmail.com
 wrote:
 
  Our 0.7.2 Kafka cluster keeps crashing with:
 
  2013-09-24 17:21:47,513 -  [kafka-acceptor:Acceptor@153] - Error in
  acceptor
 java.io.IOException: Too many open
 
  The obvious fix is to bump up the number of open files but I'm wondering
  if there is a leak on the Kafka side and/or our application side. We
  currently have the ulimit set to a generous 4096 but obviously we are
  hitting this ceiling. What's a recommended value?
 
  We are running rails and our Unicorn workers are connecting to our Kafka
  cluster via round-robin load balancing. We have about 1500 workers to
 that
  would be 1500 connections right there but they should be split across
 our 3
  nodes. Instead Netstat shows thousands of connections that look like
 this:
 
  tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:22503ESTABLISHED
  tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:48398ESTABLISHED
  tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.2:29617ESTABLISHED
  tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:32444ESTABLISHED
  tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:34415ESTABLISHED
  tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:56901ESTABLISHED
  tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.2:45349ESTABLISHED
 
  Has anyone come across this problem before? Is this a 0.7.2 leak, LB
  misconfiguration… ?
 
  Thanks



Re: Too many open files

2013-09-25 Thread Mark
Any other ideas?

On Sep 25, 2013, at 9:06 AM, Jun Rao jun...@gmail.com wrote:

 We haven't seen any socket leaks with the java producer. If you have lots
 of unexplained socket connections in established mode, one possible cause
 is that the client created new producer instances, but didn't close the old
 ones.
 
 Thanks,
 
 Jun
 
 
 On Wed, Sep 25, 2013 at 6:08 AM, Mark static.void@gmail.com wrote:
 
 No. We are using the kafka-rb ruby gem producer.
 https://github.com/acrosa/kafka-rb
 
 Now that you asked that question I need to ask. Is there a problem with
 the java producer?
 
 Sent from my iPhone
 
 On Sep 24, 2013, at 9:01 PM, Jun Rao jun...@gmail.com wrote:
 
 Are you using the java producer client?
 
 Thanks,
 
 Jun
 
 
 On Tue, Sep 24, 2013 at 5:33 PM, Mark static.void@gmail.com
 wrote:
 
 Our 0.7.2 Kafka cluster keeps crashing with:
 
 2013-09-24 17:21:47,513 -  [kafka-acceptor:Acceptor@153] - Error in
 acceptor
   java.io.IOException: Too many open
 
 The obvious fix is to bump up the number of open files but I'm wondering
 if there is a leak on the Kafka side and/or our application side. We
 currently have the ulimit set to a generous 4096 but obviously we are
 hitting this ceiling. What's a recommended value?
 
 We are running rails and our Unicorn workers are connecting to our Kafka
 cluster via round-robin load balancing. We have about 1500 workers to
 that
 would be 1500 connections right there but they should be split across
 our 3
 nodes. Instead Netstat shows thousands of connections that look like
 this:
 
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:22503ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:48398ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.2:29617ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:32444ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:34415ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:56901ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.2:45349ESTABLISHED
 
 Has anyone come across this problem before? Is this a 0.7.2 leak, LB
 misconfiguration… ?
 
 Thanks
 



Re: Too many open files

2013-09-25 Thread Mark
FYI if I kill all producers I don't see the number of open files drop. I still 
see all the ESTABLISHED connections.

Is there a broker setting to automatically kill any inactive TCP connections?


On Sep 25, 2013, at 4:30 PM, Mark static.void@gmail.com wrote:

 Any other ideas?
 
 On Sep 25, 2013, at 9:06 AM, Jun Rao jun...@gmail.com wrote:
 
 We haven't seen any socket leaks with the java producer. If you have lots
 of unexplained socket connections in established mode, one possible cause
 is that the client created new producer instances, but didn't close the old
 ones.
 
 Thanks,
 
 Jun
 
 
 On Wed, Sep 25, 2013 at 6:08 AM, Mark static.void@gmail.com wrote:
 
 No. We are using the kafka-rb ruby gem producer.
 https://github.com/acrosa/kafka-rb
 
 Now that you asked that question I need to ask. Is there a problem with
 the java producer?
 
 Sent from my iPhone
 
 On Sep 24, 2013, at 9:01 PM, Jun Rao jun...@gmail.com wrote:
 
 Are you using the java producer client?
 
 Thanks,
 
 Jun
 
 
 On Tue, Sep 24, 2013 at 5:33 PM, Mark static.void@gmail.com
 wrote:
 
 Our 0.7.2 Kafka cluster keeps crashing with:
 
 2013-09-24 17:21:47,513 -  [kafka-acceptor:Acceptor@153] - Error in
 acceptor
  java.io.IOException: Too many open
 
 The obvious fix is to bump up the number of open files but I'm wondering
 if there is a leak on the Kafka side and/or our application side. We
 currently have the ulimit set to a generous 4096 but obviously we are
 hitting this ceiling. What's a recommended value?
 
 We are running rails and our Unicorn workers are connecting to our Kafka
 cluster via round-robin load balancing. We have about 1500 workers to
 that
 would be 1500 connections right there but they should be split across
 our 3
 nodes. Instead Netstat shows thousands of connections that look like
 this:
 
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:22503ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:48398ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.2:29617ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:32444ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:34415ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.1:56901ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
 10.99.99.2:45349ESTABLISHED
 
 Has anyone come across this problem before? Is this a 0.7.2 leak, LB
 misconfiguration… ?
 
 Thanks
 
 



Re: Too many open files

2013-09-25 Thread Jun Rao
If a client is gone, the broker should automatically close those broken
sockets. Are you using a hardware load balancer?

Thanks,

Jun


On Wed, Sep 25, 2013 at 4:48 PM, Mark static.void@gmail.com wrote:

 FYI if I kill all producers I don't see the number of open files drop. I
 still see all the ESTABLISHED connections.

 Is there a broker setting to automatically kill any inactive TCP
 connections?


 On Sep 25, 2013, at 4:30 PM, Mark static.void@gmail.com wrote:

  Any other ideas?
 
  On Sep 25, 2013, at 9:06 AM, Jun Rao jun...@gmail.com wrote:
 
  We haven't seen any socket leaks with the java producer. If you have
 lots
  of unexplained socket connections in established mode, one possible
 cause
  is that the client created new producer instances, but didn't close the
 old
  ones.
 
  Thanks,
 
  Jun
 
 
  On Wed, Sep 25, 2013 at 6:08 AM, Mark static.void@gmail.com
 wrote:
 
  No. We are using the kafka-rb ruby gem producer.
  https://github.com/acrosa/kafka-rb
 
  Now that you asked that question I need to ask. Is there a problem with
  the java producer?
 
  Sent from my iPhone
 
  On Sep 24, 2013, at 9:01 PM, Jun Rao jun...@gmail.com wrote:
 
  Are you using the java producer client?
 
  Thanks,
 
  Jun
 
 
  On Tue, Sep 24, 2013 at 5:33 PM, Mark static.void@gmail.com
  wrote:
 
  Our 0.7.2 Kafka cluster keeps crashing with:
 
  2013-09-24 17:21:47,513 -  [kafka-acceptor:Acceptor@153] - Error in
  acceptor
   java.io.IOException: Too many open
 
  The obvious fix is to bump up the number of open files but I'm
 wondering
  if there is a leak on the Kafka side and/or our application side. We
  currently have the ulimit set to a generous 4096 but obviously we are
  hitting this ceiling. What's a recommended value?
 
  We are running rails and our Unicorn workers are connecting to our
 Kafka
  cluster via round-robin load balancing. We have about 1500 workers to
  that
  would be 1500 connections right there but they should be split across
  our 3
  nodes. Instead Netstat shows thousands of connections that look like
  this:
 
  tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
  10.99.99.1:22503ESTABLISHED
  tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
  10.99.99.1:48398ESTABLISHED
  tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
  10.99.99.2:29617ESTABLISHED
  tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
  10.99.99.1:32444ESTABLISHED
  tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
  10.99.99.1:34415ESTABLISHED
  tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
  10.99.99.1:56901ESTABLISHED
  tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::
  10.99.99.2:45349ESTABLISHED
 
  Has anyone come across this problem before? Is this a 0.7.2 leak, LB
  misconfiguration… ?
 
  Thanks
 
 




Too many open files

2013-09-24 Thread Mark
Our 0.7.2 Kafka cluster keeps crashing with:

2013-09-24 17:21:47,513 -  [kafka-acceptor:Acceptor@153] - Error in acceptor
java.io.IOException: Too many open 

The obvious fix is to bump up the number of open files, but I'm wondering if
there is a leak on the Kafka side and/or our application side. We currently
have the ulimit set to a generous 4096, but obviously we are hitting this
ceiling. What's a recommended value?

We are running Rails, and our Unicorn workers connect to our Kafka cluster
via round-robin load balancing. We have about 1500 workers, so that would be
1500 connections right there, but they should be split across our 3 nodes.
Instead netstat shows thousands of connections that look like this:

tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc :::10.99.99.1:22503    ESTABLISHED
tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc :::10.99.99.1:48398    ESTABLISHED
tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc :::10.99.99.2:29617    ESTABLISHED
tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc :::10.99.99.1:32444    ESTABLISHED
tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc :::10.99.99.1:34415    ESTABLISHED
tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc :::10.99.99.1:56901    ESTABLISHED
tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc :::10.99.99.2:45349    ESTABLISHED

Has anyone come across this problem before? Is this a 0.7.2 leak, LB 
misconfiguration… ?

Thanks
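
One way to tell whether a few worker hosts (or the load balancer itself)
account for the pile-up is to group the established connections by remote
address; a sketch that assumes netstat output shaped like the lines above:

  netstat -tan | awk '$6 == "ESTABLISHED" {print $5}' | sed 's/:[0-9]*$//' | sort | uniq -c | sort -rn | head

If the count for a worker host is far above the number of live Unicorn
processes on it, connections are being opened faster than they are closed on
that side.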

Re: Too many open files

2013-09-24 Thread Jun Rao
Are you using the java producer client?

Thanks,

Jun


On Tue, Sep 24, 2013 at 5:33 PM, Mark static.void@gmail.com wrote:

 Our 0.7.2 Kafka cluster keeps crashing with:

 2013-09-24 17:21:47,513 -  [kafka-acceptor:Acceptor@153] - Error in
 acceptor
 java.io.IOException: Too many open

 The obvious fix is to bump up the number of open files but I'm wondering
 if there is a leak on the Kafka side and/or our application side. We
 currently have the ulimit set to a generous 4096 but obviously we are
 hitting this ceiling. What's a recommended value?

 We are running rails and our Unicorn workers are connecting to our Kafka
 cluster via round-robin load balancing. We have about 1500 workers to that
 would be 1500 connections right there but they should be split across our 3
 nodes. Instead Netstat shows thousands of connections that look like this:

 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::10.99.99.1:22503
 ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::10.99.99.1:48398
 ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::10.99.99.2:29617
 ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::10.99.99.1:32444
 ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::10.99.99.1:34415
 ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::10.99.99.1:56901
 ESTABLISHED
 tcp0  0 kafka1.mycompany.:XmlIpcRegSvc :::10.99.99.2:45349
 ESTABLISHED

 Has anyone come across this problem before? Is this a 0.7.2 leak, LB
 misconfiguration… ?

 Thanks


Re: java.net.SocketException: Too many open files

2013-08-02 Thread Felix GV
We've had this problem with Zookeeper...

Setting ulimit properly can occasionally be tricky because you need to
log out and re-ssh into the box for the changes to take effect on the next
processes you start up. Another problem we've hit was that our puppet
service was running in the background and silently restoring settings to
their original values, which would bite us a while later when we needed to
restart a service (currently running processes keep the limit they had at
start time).

You can double-check that your processes are running with the ulimit you
expect them to by finding out their PID (using ps) and then doing sudo cat
/proc/PID/limits

If you don't see the value you configured on the "Max open files" line,
then something somewhere prevented your process from using the number of
file handles you want it to.

Of course, what I just said doesn't address the possibility that there
could be some sort of file handle leak somewhere in the 0.8 code... though
I guess such a bug would have surfaced in heavy-duty environments such as
LinkedIn's, if it existed.

--
Felix
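
A minimal version of that check, assuming the broker or producer process can
be found with ps (the grep pattern is a placeholder):

  PID=$(ps -ef | grep -v grep | grep kafka.Kafka | awk '{print $2}' | head -n 1)
  sudo cat /proc/$PID/limits | grep 'Max open files'

If the value printed is lower than what /etc/security/limits.conf says, the
process was started before the new limit took effect, or something like the
puppet run described above quietly put the old value back.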


On Fri, Aug 2, 2013 at 12:07 AM, Jun Rao jun...@gmail.com wrote:

 If you do netstat, what hosts are those connections for and what state are
 those connections in?

 Thanks,

 Jun


 On Thu, Aug 1, 2013 at 9:04 AM, Nandigam, Sujitha snandi...@verisign.com
 wrote:

  Hi,
 
  In producer I was continuously getting this exception
  java.net.SocketException: Too many open files
  even though I added the below line to /etc/security/limits.conf
 
 
 
  kafka-0.8.0-beta1-src    -    nofile    983040
 
 
  ERROR Producer connection to localhost:9093 unsuccessful
  (kafka.producer.SyncProducer)
  java.net.SocketException: Too many open files
 
  Please help me how to resolve this.
 
  Thanks,
  Sujitha
  This message (including any attachments) is intended only for the use of
  the individual or entity to which it is addressed, and may contain
  information that is non-public, proprietary, privileged, confidential and
  exempt from disclosure under applicable law or may be constituted as
  attorney work product. If you are not the intended recipient, you are
  hereby notified that any use, dissemination, distribution, or copying of
  this communication is strictly prohibited. If you have received this
  message in error, notify sender immediately and delete this message
  immediately.
 



java.net.SocketException: Too many open files

2013-08-01 Thread Nandigam, Sujitha
Hi,

In the producer I was continuously getting the exception
java.net.SocketException: Too many open files, even though I added the line
below to /etc/security/limits.conf:



kafka-0.8.0-beta1-src    -    nofile    983040


ERROR Producer connection to localhost:9093 unsuccessful 
(kafka.producer.SyncProducer)
java.net.SocketException: Too many open files

Please help me how to resolve this.

Thanks,
Sujitha
This message (including any attachments) is intended only for the use of the 
individual or entity to which it is addressed, and may contain information that 
is non-public, proprietary, privileged, confidential and exempt from disclosure 
under applicable law or may be constituted as attorney work product. If you are 
not the intended recipient, you are hereby notified that any use, 
dissemination, distribution, or copying of this communication is strictly 
prohibited. If you have received this message in error, notify sender 
immediately and delete this message immediately.
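
For comparison, a limits.conf entry has four whitespace-separated fields
(domain, type, item, value), and the domain is normally the login user or
group that runs the process. A sketch, assuming the producer runs as a user
named kafka; the user name and value are placeholders:

  # /etc/security/limits.conf
  kafka    soft    nofile    983040
  kafka    hard    nofile    983040

  # log out, SSH back in (PAM applies the limits at login), then verify:
  ulimit -n

If the producer is launched by an init script or another daemon rather than a
login shell, the limit has to be raised in that service's environment instead.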


Re: java.net.SocketException: Too many open files

2013-08-01 Thread Jun Rao
If you do netstat, what hosts are those connections for and what state are
those connections in?

Thanks,

Jun


On Thu, Aug 1, 2013 at 9:04 AM, Nandigam, Sujitha snandi...@verisign.com wrote:

 Hi,

 In producer I was continuously getting this exception
 java.net.SocketException: Too many open files
 even though I added the below line to /etc/security/limits.conf



 kafka-0.8.0-beta1-src    -    nofile    983040


 ERROR Producer connection to localhost:9093 unsuccessful
 (kafka.producer.SyncProducer)
 java.net.SocketException: Too many open files

 Please help me how to resolve this.

 Thanks,
 Sujitha
 This message (including any attachments) is intended only for the use of
 the individual or entity to which it is addressed, and may contain
 information that is non-public, proprietary, privileged, confidential and
 exempt from disclosure under applicable law or may be constituted as
 attorney work product. If you are not the intended recipient, you are
 hereby notified that any use, dissemination, distribution, or copying of
 this communication is strictly prohibited. If you have received this
 message in error, notify sender immediately and delete this message
 immediately.
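
A quick way to gather what Jun asks for here is to break the connections down
by TCP state; a sketch assuming a Linux host:

  netstat -tan | awk 'NR > 2 {print $6}' | sort | uniq -c | sort -rn    # connection counts by state

A large CLOSE_WAIT count usually means the local process is not closing its
end of dead connections, while many ESTABLISHED sockets with no matching
socket on the peer point at clients that vanished without a proper close.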