Re: Traffic Router Fail - Too Many Open Sockets
Hi,

I think I found the root cause.
We had an HTTPS delivery service with no SSL keys. As a result, the router was not able to fully load the configuration.
The bad delivery service was deleted and a new cr-config snapshot was taken. However, the router was by then stuck, still trying to get the old certificate. It therefore kept rejecting the new config and logged the messages below.

Could this be the reason the connections towards the monitor are not being closed properly?

Nir

INFO 2018-02-14T07:33:33.020 [New I/O worker #2] com.comcast.cdn.traffic_control.traffic_router.core.config.ConfigHandler - Waiting for https certificates to support new config 5cb1f62b
INFO 2018-02-14T07:33:34.021 [New I/O worker #2] com.comcast.cdn.traffic_control.traffic_router.core.config.ConfigHandler - Waiting for https certificates to support new config 5cb1f62b
INFO 2018-02-14T07:33:34.990 [pool-5-thread-1] com.comcast.cdn.traffic_control.traffic_router.core.monitor.TrafficMonitorWatcher - Loading properties from /opt/traffic_router/conf/traffic_monitor.properties
INFO 2018-02-14T07:33:34.994 [New I/O worker #3] com.comcast.cdn.traffic_control.traffic_router.core.config.ConfigHandler - Entered processConfig
INFO 2018-02-14T07:33:35.021 [New I/O worker #2] com.comcast.cdn.traffic_control.traffic_router.core.config.ConfigHandler - Exiting processConfig: processing of config with timestamp Wed Feb 14 07:31:46 UTC 2018 was cancelled
WARN 2018-02-14T07:33:35.021 [New I/O worker #2] com.comcast.cdn.traffic_control.traffic_router.core.util.PeriodicResourceUpdater - File rejected: /opt/traffic_router/db/cr-config.json

On Wed, Feb 14, 2018 at 9:51 PM, Nir Sopher wrote:

> Hi,
>
> I implemented the fix and the issue was resolved, until today :)
>
> I have 2 routers, and both got stuck at the same time due to a connection leak, with connections in "CLOSE_WAIT" towards the monitors.
> The only messages in catalina.out were:
>
> WARNING: Imported handshake data with alias
> Feb 13, 2018 2:04:49 PM com.comcast.cdn.traffic_control.traffic_router.secure.CertificateRegistry importCertificateDataList
>
> Can it be that in some rare, probably failing, situations, the monitor does not close the connection?
>
> Nir
Re: Traffic Router Fail - Too Many Open Sockets
Hi,

I implemented the fix and the issue was resolved, until today :)

I have 2 routers, and both got stuck at the same time due to a connection leak, with connections in "CLOSE_WAIT" towards the monitors.
The only messages in catalina.out were:

WARNING: Imported handshake data with alias
Feb 13, 2018 2:04:49 PM com.comcast.cdn.traffic_control.traffic_router.secure.CertificateRegistry importCertificateDataList

Can it be that in some rare, probably failing, situations, the monitor does not close the connection?

Nir

On Thu, Feb 1, 2018 at 11:27 PM, Nir Sopher wrote:

> Great,
> Thanks!
> Nir
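On the CLOSE_WAIT question: a socket enters CLOSE_WAIT when the remote side has already closed the connection and the local process has not yet called close() on its own end. On the router, that state therefore normally points at a descriptor the router-side code has not released, rather than at the monitor failing to close. Below is a minimal sketch of the mechanism. It is not Traffic Router code: the class and method names are made up for illustration, and a loopback server stands in for the monitor.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class CloseWaitSketch {
    /**
     * Connects to a loopback server that closes its side immediately,
     * reads until end-of-stream, then closes the client socket.
     * Returns the result of the final read (-1 means the peer closed).
     */
    static int readUntilPeerCloses() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                // Stand-in for the monitor: accept, then close right away (sends FIN).
                server.accept().close();
                // Stand-in for the router: read() returns -1 once the FIN arrives.
                // From the FIN's arrival until close(), this socket is in CLOSE_WAIT.
                return client.getInputStream().read();
            } // try-with-resources closes the client and releases its descriptor
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("final read = " + readUntilPeerCloses());
    }
}
```

If the client socket were kept open after the read returned -1, it would stay in CLOSE_WAIT and keep holding a file descriptor, which is how a leak of this kind can eventually surface as "Too many open files".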
Re: Traffic Router Fail - Too Many Open Sockets
Great,
Thanks!
Nir

On Thu, Feb 1, 2018 at 11:12 PM, Jeffrey Martin wrote:

> Hi Nir,
> This issue is described in:
>
> Jira: https://issues.apache.org/jira/browse/TC-197
> and GitHub: https://github.com/apache/incubator-trafficcontrol/issues/916
>
> I will be working on a pull request to address this issue in 2.2. The workaround is in the second link above.
> Jeff
Re: Traffic Router Fail - Too Many Open Sockets
Hi Nir,

This issue is described in:

Jira: https://issues.apache.org/jira/browse/TC-197
and GitHub: https://github.com/apache/incubator-trafficcontrol/issues/916

I will be working on a pull request to address this issue in 2.2. The workaround is in the second link above.

Jeff

On Thu, Feb 1, 2018 at 4:09 PM, Jeffrey Martin wrote:

> Hi Nir,
Re: Traffic Router Fail - Too Many Open Sockets
Hi Nir,

On Thu, Feb 1, 2018 at 4:01 PM, Nir Sopher wrote:

> Hi,
>
> One of my routers got stuck today and was unable to answer HTTP requests (routing and API).
> While investigating the issue, I found catalina.log full of messages complaining about failures to open a socket due to too many open files. See the example below.
> No issues were found in the log before that point, beyond periodic warnings about pulling the certificates every 5 minutes.
>
> When trying to understand what these open files were, I found about 4k open connections in "CLOSE_WAIT" towards the monitor.
> Note: I'm running TC 2.1 RC3 with the golang traffic-monitor.
>
> Has anyone encountered a similar issue?
> Are the warnings for pulling the certificates a normal thing?
>
> Thanks,
> Nir
>
> Feb 01, 2018 7:33:09 AM com.comcast.cdn.traffic_control.traffic_router.secure.CertificateRegistry importCertificateDataList
> WARNING: Imported handshake data with alias my-ds.my-cdn.com
> Feb 01, 2018 8:43:13 AM org.apache.tomcat.util.net.NioEndpoint$Acceptor run
> SEVERE: Socket accept failed
> java.io.IOException: Too many open files
>     at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>     at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>     at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>     at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:1309)
>     at java.lang.Thread.run(Thread.java:745)
>
> Feb 01, 2018 8:43:14 AM org.apache.tomcat.util.net.NioEndpoint$Acceptor run
> SEVERE: Socket accept failed
> java.io.IOException: Too many open files
>     at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>     at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>     at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>     at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:1309)
>     at java.lang.Thread.run(Thread.java:745)
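For anyone wanting to repeat the "what are these open files" check from the original report, here is a rough sketch that counts CLOSE_WAIT connections towards a given remote port by reading the Linux /proc/net/tcp table. It is only an illustration: the class and method names are hypothetical, the monitor's port is not stated anywhere in this thread (the default of 80 below is an assumption), and only the IPv4 table is read.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class CloseWaitCounter {
    // In /proc/net/tcp the "st" column holds the TCP state in hex; 08 is CLOSE_WAIT.
    private static final String CLOSE_WAIT = "08";

    /** Counts rows in CLOSE_WAIT whose remote port equals remotePort. */
    static long countCloseWait(List<String> procNetTcpLines, int remotePort) {
        return procNetTcpLines.stream()
                .skip(1) // first row is the column header
                .map(line -> line.trim().split("\\s+"))
                // fields: [0] slot, [1] local address, [2] remote address, [3] state
                .filter(f -> f.length > 3 && CLOSE_WAIT.equals(f[3]))
                .filter(f -> portOf(f[2]) == remotePort)
                .count();
    }

    // Addresses look like "0100007F:1F90": hex IP, a colon, then the hex port.
    private static int portOf(String hexAddress) {
        return Integer.parseInt(hexAddress.substring(hexAddress.indexOf(':') + 1), 16);
    }

    public static void main(String[] args) throws IOException {
        // Assumed port; pass the real monitor port as the first argument.
        int monitorPort = args.length > 0 ? Integer.parseInt(args[0]) : 80;
        List<String> table = Files.readAllLines(Paths.get("/proc/net/tcp"));
        System.out.println(countCloseWait(table, monitorPort));
    }
}
```

A count that keeps growing between runs would be consistent with the leak described in this thread.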