[jira] [Assigned] (FLUME-3078) Expose Monitoring Metrics For Netcat Source
[ https://issues.apache.org/jira/browse/FLUME-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja reassigned FLUME-3078: -- Assignee: Siddharth Ahuja > Expose Monitoring Metrics For Netcat Source > --- > > Key: FLUME-3078 > URL: https://issues.apache.org/jira/browse/FLUME-3078 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: 1.7.0 > Environment: Linux >Reporter: Gangaraju >Assignee: Siddharth Ahuja > > Currently for Netcat source , no metrics being shown if we query using the > HTTP. ( -DFlume.monitoring.type=http). > It will be better , if we can get the details on important stats and errors > for the Netcat source. > We are looking for following stats: > EventRecievedCount > EventAcceptedCount > StartTime > StopTime > OpenConnectionCount > ConnectionsFailed -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064494#comment-16064494 ] Siddharth Ahuja commented on FLUME-2905: Hey [~sati], thanks for your reply and comments on the review. I have uploaded a new patch to this JIRA based on your comments. I hope this is enough to get the fix out of the way! Thanks in advance for reviewing :) > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: 1.6.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2905-0.patch, FLUME-2905-1.patch, > FLUME-2905-2.patch, FLUME-2905-3.patch, FLUME-2905-4.patch, > FLUME-2905-5.patch, FLUME-2905-6.patch > > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > {code} > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167) > ... 9 more > {code} > The source's start() is then called again leading to another socket being > opened but not closed and so on. This leads to file descriptor (socket) leaks. > This can be easily reproduced as follows: > 1. Set Netcat as the source in flume agent configuration. > 2. Set the bind port for the netcat source to a port which is already in use. > e.g. in my case I used 50010 which is the port for DataNode's XCeiver > Protocol in use by the HDFS service. > 3. Start flume agent and perform "lsof -p | wc -l". Notice > the file descriptors keep on growing due to socket leaks with errors like: > "can't identify protocol". -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064494#comment-16064494 ] Siddharth Ahuja edited comment on FLUME-2905 at 6/27/17 9:00 AM: - Hey [~sati], thanks for your reply and comments on the review. I have uploaded a new patch (patch #6) to this JIRA based on your comments. I hope this is enough to get the fix out of the way! Thanks in advance for reviewing :) was (Author: sahuja): Hey [~sati], thanks for your reply and comments on the review. I have uploaded a new patch to this JIRA based on your comments. I hope this is enough to get the fix out of the way! Thanks in advance for reviewing :) > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: 1.6.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2905-0.patch, FLUME-2905-1.patch, > FLUME-2905-2.patch, FLUME-2905-3.patch, FLUME-2905-4.patch, > FLUME-2905-5.patch, FLUME-2905-6.patch > > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > {code} > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167) > ... 9 more > {code} > The source's start() is then called again leading to another socket being > opened but not closed and so on. This leads to file descriptor (socket) leaks. > This can be easily reproduced as follows: > 1. Set Netcat as the source in flume agent configuration. > 2. Set the bind port for the netcat source to a port which is already in use. > e.g. in my case I used 50010 which is the port for DataNode's XCeiver > Protocol in use by the HDFS service. > 3. Start flume agent and perform "lsof -p | wc -l". Notice > the file descriptors keep on growing due to socket leaks with errors like: > "can't identify protocol". -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja updated FLUME-2905: --- Attachment: FLUME-2905-6.patch > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: 1.6.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2905-0.patch, FLUME-2905-1.patch, > FLUME-2905-2.patch, FLUME-2905-3.patch, FLUME-2905-4.patch, > FLUME-2905-5.patch, FLUME-2905-6.patch > > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > {code} > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167) > ... 9 more > {code} > The source's start() is then called again leading to another socket being > opened but not closed and so on. This leads to file descriptor (socket) leaks. > This can be easily reproduced as follows: > 1. Set Netcat as the source in flume agent configuration. > 2. Set the bind port for the netcat source to a port which is already in use. > e.g. in my case I used 50010 which is the port for DataNode's XCeiver > Protocol in use by the HDFS service. > 3. Start flume agent and perform "lsof -p | wc -l". Notice > the file descriptors keep on growing due to socket leaks with errors like: > "can't identify protocol". -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058799#comment-16058799 ] Siddharth Ahuja commented on FLUME-2905: Hey [~sati], just added the review request on the Review Board. Not sure if I have done everything as per the process. Would be great if you could check. Thanks in advance! > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: 1.6.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2905-0.patch, FLUME-2905-1.patch, > FLUME-2905-2.patch, FLUME-2905-3.patch, FLUME-2905-4.patch, FLUME-2905-5.patch > > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > {code} > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167) > ... 9 more > {code} > The source's start() is then called again leading to another socket being > opened but not closed and so on. This leads to file descriptor (socket) leaks. > This can be easily reproduced as follows: > 1. Set Netcat as the source in flume agent configuration. > 2. Set the bind port for the netcat source to a port which is already in use. > e.g. in my case I used 50010 which is the port for DataNode's XCeiver > Protocol in use by the HDFS service. > 3. Start flume agent and perform "lsof -p | wc -l". Notice > the file descriptors keep on growing due to socket leaks with errors like: > "can't identify protocol". -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja updated FLUME-2905: --- Attachment: FLUME-2905-5.patch > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: 1.6.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2905-0.patch, FLUME-2905-1.patch, > FLUME-2905-2.patch, FLUME-2905-3.patch, FLUME-2905-4.patch, FLUME-2905-5.patch > > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > {code} > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167) > ... 9 more > {code} > The source's start() is then called again leading to another socket being > opened but not closed and so on. This leads to file descriptor (socket) leaks. > This can be easily reproduced as follows: > 1. Set Netcat as the source in flume agent configuration. > 2. Set the bind port for the netcat source to a port which is already in use. > e.g. in my case I used 50010 which is the port for DataNode's XCeiver > Protocol in use by the HDFS service. > 3. Start flume agent and perform "lsof -p | wc -l". Notice > the file descriptors keep on growing due to socket leaks with errors like: > "can't identify protocol". -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058745#comment-16058745 ] Siddharth Ahuja edited comment on FLUME-2905 at 6/22/17 5:02 AM: - Hi [~sati], thanks a lot for your review. Please find my answers for your points: * For point 1. - "calling stop() after writing out the exception", I have moved stop() after logging the exception but just before it gets thrown. * For point 2. - We should possibly have a dedicated JIRA for removing the "return" statement from the stop() method as this would be a different issue to what I am trying to fix in this JIRA which is to prevent socket leaks if a port is already bound. Also, it would make tracking easier with a new JIRA as otherwise any issues (if any) arising from this removal will be discussed in this JIRA which is a side-track from the original issue that is already potentially resolved. What do you think? * For point 3 - I believe I have nothing to do here. I have gone on and created a new patch - FLUME-2905-5.patch for your review. I have tested that to ensure that there are no leaks and the junit also passes for me. I will try and add this to review board (haven't done that yet) soon. Thanks once again. was (Author: sahuja): Hi [~sati], thanks a lot for your review. Please find my answers for your points: * For point 1. - "calling stop() after writing out the exception", I have moved stop() after logging the exception but just before it gets thrown. * For point 2. - We should possibly have a dedicated JIRA for removing the "return" statement from the stop() method as this would be a different issue to what I am trying to fix in this JIRA which is to prevent socket leaks if a port is already bound. Also, it would make tracking easier with a new JIRA as otherwise any issues (if any) arising from this removal will be discussed in this JIRA which is a side-track from the original issue that is already potentially resolved. What do you think? * For point 3 - I believe I have nothing to do here. I have gone on and created a new patch - FLUME-2905-5.patch for your review and I have also tested that to ensure that there are no leaks. I will try and add this to review board (haven't done that yet) soon. Thanks once again. > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: 1.6.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2905-0.patch, FLUME-2905-1.patch, > FLUME-2905-2.patch, FLUME-2905-3.patch, FLUME-2905-4.patch > > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > {code} > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at
[jira] [Comment Edited] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058745#comment-16058745 ] Siddharth Ahuja edited comment on FLUME-2905 at 6/22/17 4:55 AM: - Hi [~sati], thanks a lot for your review. Please find my answers for your points: * For point 1. - "calling stop() after writing out the exception", I have moved stop() after logging the exception but just before it gets thrown. * For point 2. - We should possibly have a dedicated JIRA for removing the "return" statement from the stop() method as this would be a different issue to what I am trying to fix in this JIRA which is to prevent socket leaks if a port is already bound. Also, it would make tracking easier with a new JIRA as otherwise any issues (if any) arising from this removal will be discussed in this JIRA which is a side-track from the original issue that is already potentially resolved. What do you think? * For point 3 - I believe I have nothing to do here. I have gone on and created a new patch - FLUME-2905-5.patch for your review and I have also tested that to ensure that there are no leaks. I will try and add this to review board (haven't done that yet) soon. Thanks once again. was (Author: sahuja): Hi [~sati], thanks a lot for your review. Please find my answers for your points: * For point 1. - "calling stop() after writing out the exception", I have moved stop() after logging the exception but just before it gets thrown. * For point 2. - We should possibly have a dedicated JIRA for removing the "return" statement from the stop() method as this would be a different issue to what I am trying to fix in this JIRA which is to prevent socket leaks if a port is already bound. Also, it would make tracking easier with a new JIRA as otherwise any issues (if any) arising from this removal will be discussed in this JIRA which is a side-track from the original issue that is already potentially resolved. What do you think? * For point 3 - I believe I have nothing to do here. I have gone on and created a new patch - FLUME-2905-5.patch for your review and I have also tested that. I will try and add this to review board (haven't done that yet) soon. Thanks once again. > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: 1.6.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2905-0.patch, FLUME-2905-1.patch, > FLUME-2905-2.patch, FLUME-2905-3.patch, FLUME-2905-4.patch > > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > {code} > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) >
[jira] [Commented] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058745#comment-16058745 ] Siddharth Ahuja commented on FLUME-2905: Hi [~sati], thanks a lot for your review. Please find my answers for your points: * For point 1. - "calling stop() after writing out the exception", I have moved stop() after logging the exception but just before it gets thrown. * For point 2. - We should possibly have a dedicated JIRA for removing the "return" statement from the stop() method as this would be a different issue to what I am trying to fix in this JIRA which is to prevent socket leaks if a port is already bound. Also, it would make tracking easier with a new JIRA as otherwise any issues (if any) arising from this removal will be discussed in this JIRA which is a side-track from the original issue that is already potentially resolved. What do you think? * For point 3 - I believe I have nothing to do here. I have gone on and created a new patch - FLUME-2905-5.patch for your review. I will try and add this to review board (haven't done that yet) soon. Thanks once again. > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: 1.6.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2905-0.patch, FLUME-2905-1.patch, > FLUME-2905-2.patch, FLUME-2905-3.patch, FLUME-2905-4.patch > > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > {code} > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167) > ... 9 more > {code} > The source's start() is then called again leading to another socket being > opened but not closed and so on. This leads to file descriptor (socket) leaks. > This can be easily reproduced as follows: > 1. Set Netcat as the source in flume agent configuration. > 2. Set the bind port for the netcat source to a port which is already in use. > e.g. in my case I used 50010 which is the port for DataNode's XCeiver > Protocol in use by the HDFS service. > 3. Start flume agent and perform "lsof -p | wc -l". Notice > the file descriptors keep on growing due to socket leaks with errors like: > "can't identify protocol". -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058745#comment-16058745 ] Siddharth Ahuja edited comment on FLUME-2905 at 6/22/17 4:54 AM: - Hi [~sati], thanks a lot for your review. Please find my answers for your points: * For point 1. - "calling stop() after writing out the exception", I have moved stop() after logging the exception but just before it gets thrown. * For point 2. - We should possibly have a dedicated JIRA for removing the "return" statement from the stop() method as this would be a different issue to what I am trying to fix in this JIRA which is to prevent socket leaks if a port is already bound. Also, it would make tracking easier with a new JIRA as otherwise any issues (if any) arising from this removal will be discussed in this JIRA which is a side-track from the original issue that is already potentially resolved. What do you think? * For point 3 - I believe I have nothing to do here. I have gone on and created a new patch - FLUME-2905-5.patch for your review and I have also tested that. I will try and add this to review board (haven't done that yet) soon. Thanks once again. was (Author: sahuja): Hi [~sati], thanks a lot for your review. Please find my answers for your points: * For point 1. - "calling stop() after writing out the exception", I have moved stop() after logging the exception but just before it gets thrown. * For point 2. - We should possibly have a dedicated JIRA for removing the "return" statement from the stop() method as this would be a different issue to what I am trying to fix in this JIRA which is to prevent socket leaks if a port is already bound. Also, it would make tracking easier with a new JIRA as otherwise any issues (if any) arising from this removal will be discussed in this JIRA which is a side-track from the original issue that is already potentially resolved. What do you think? * For point 3 - I believe I have nothing to do here. I have gone on and created a new patch - FLUME-2905-5.patch for your review. I will try and add this to review board (haven't done that yet) soon. Thanks once again. > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: 1.6.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2905-0.patch, FLUME-2905-1.patch, > FLUME-2905-2.patch, FLUME-2905-3.patch, FLUME-2905-4.patch > > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > {code} > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at
[jira] [Updated] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja updated FLUME-2905: --- Attachment: FLUME-2905-4.patch > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: 1.6.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2905-0.patch, FLUME-2905-1.patch, > FLUME-2905-2.patch, FLUME-2905-3.patch, FLUME-2905-4.patch > > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > {code} > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167) > ... 9 more > {code} > The source's start() is then called again leading to another socket being > opened but not closed and so on. This leads to file descriptor (socket) leaks. > This can be easily reproduced as follows: > 1. Set Netcat as the source in flume agent configuration. > 2. Set the bind port for the netcat source to a port which is already in use. > e.g. in my case I used 50010 which is the port for DataNode's XCeiver > Protocol in use by the HDFS service. > 3. Start flume agent and perform "lsof -p | wc -l". Notice > the file descriptors keep on growing due to socket leaks with errors like: > "can't identify protocol". -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050088#comment-16050088 ] Siddharth Ahuja commented on FLUME-2905: Hi [~denes],[~jarcec], I have just updated the junit again with my latest patch, would have you time to review this for me please? Hopefully, the junits work this time around! Thank you in advance! > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: 1.6.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2905-0.patch, FLUME-2905-1.patch, > FLUME-2905-2.patch, FLUME-2905-3.patch > > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > {code} > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167) > ... 9 more > {code} > The source's start() is then called again leading to another socket being > opened but not closed and so on. This leads to file descriptor (socket) leaks. > This can be easily reproduced as follows: > 1. Set Netcat as the source in flume agent configuration. > 2. Set the bind port for the netcat source to a port which is already in use. > e.g. in my case I used 50010 which is the port for DataNode's XCeiver > Protocol in use by the HDFS service. > 3. Start flume agent and perform "lsof -p | wc -l". Notice > the file descriptors keep on growing due to socket leaks with errors like: > "can't identify protocol". -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049006#comment-16049006 ] Siddharth Ahuja commented on FLUME-2905: Hi [~denes], I have just created another patch (patch #3) with a minor change to my junit tests. The Junits are passing in my local build, as such, it would be great if you could test them out again. Thanks in advance for your help! > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: 1.6.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2905-0.patch, FLUME-2905-1.patch, > FLUME-2905-2.patch, FLUME-2905-3.patch > > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > {code} > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167) > ... 9 more > {code} > The source's start() is then called again leading to another socket being > opened but not closed and so on. This leads to file descriptor (socket) leaks. > This can be easily reproduced as follows: > 1. Set Netcat as the source in flume agent configuration. > 2. Set the bind port for the netcat source to a port which is already in use. > e.g. in my case I used 50010 which is the port for DataNode's XCeiver > Protocol in use by the HDFS service. > 3. Start flume agent and perform "lsof -p | wc -l". Notice > the file descriptors keep on growing due to socket leaks with errors like: > "can't identify protocol". -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja updated FLUME-2905: --- Attachment: FLUME-2905-3.patch > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: 1.6.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2905-0.patch, FLUME-2905-1.patch, > FLUME-2905-2.patch, FLUME-2905-3.patch > > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > {code} > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167) > ... 9 more > {code} > The source's start() is then called again leading to another socket being > opened but not closed and so on. This leads to file descriptor (socket) leaks. > This can be easily reproduced as follows: > 1. Set Netcat as the source in flume agent configuration. > 2. Set the bind port for the netcat source to a port which is already in use. > e.g. in my case I used 50010 which is the port for DataNode's XCeiver > Protocol in use by the HDFS service. > 3. Start flume agent and perform "lsof -p | wc -l". Notice > the file descriptors keep on growing due to socket leaks with errors like: > "can't identify protocol". -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLUME-2966) NULL text in a TextMessage from a JMS source in Flume can lead to NPE
[ https://issues.apache.org/jira/browse/FLUME-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466170#comment-15466170 ] Siddharth Ahuja commented on FLUME-2966: To provide a bit more context on this as requested, I raised this JIRA based on my investigation of the NPE stacktrace encountered by one of the customers running IBM Websphere MQ with Flume 1.5.0 as per below: {code} ERROR org.apache.flume.source.jms.JMSSource Unexpected error processing events java.lang.NullPointerException at org.apache.flume.source.jms.DefaultJMSMessageConverter.convert(DefaultJMSMessageConverter.java:101) at org.apache.flume.source.jms.JMSMessageConsumer.take(JMSMessageConsumer.java:124) at org.apache.flume.source.jms.JMSSource.doProcess(JMSSource.java:261) at org.apache.flume.source.AbstractPollableSource.process(AbstractPollableSource.java:58) at org.apache.flume.source.PollableSourceRunner$PollingRunner.run(PollableSourceRunner.java:137) at java.lang.Thread.run(Thread.java:745) {code} Please find my analysis for the above as per below: a) Based on code inspection at https://github.com/apache/flume/blob/flume-1.5/flume-ng-sources/flume-jms-source/src/main/java/org/apache/flume/source/jms/DefaultJMSMessageConverter.java#L101, event and textMessage cannot be NULL as a brand new event has been created just before at https://github.com/apache/flume/blob/flume-1.5/flume-ng-sources/flume-jms-source/src/main/java/org/apache/flume/source/jms/DefaultJMSMessageConverter.java#L74 and message has been used earlier at https://github.com/apache/flume/blob/flume-1.5/flume-ng-sources/flume-jms-source/src/main/java/org/apache/flume/source/jms/DefaultJMSMessageConverter.java#L77. Therefore, if any of them were null, we would have NPE'd with a different stacktrace. So the only remaining possibility is that textMessage.getText() is null. b) To confirm if we can have code using MQ frameworks send null text messages to Flume I got ActiveMQ going in Eclipse where I created a sample producer and consumer based on http://activemq.apache.org/hello-world.html. The Producer creates a text message (similar to this case) with a "null" text string for the destination queue as per below: {code} … // Create the destination (Topic or Queue) Destination destination = session.createQueue("TEST.FOO"); // Create a MessageProducer from the Session to the Topic or Queue MessageProducer producer = session.createProducer(destination); producer.setDeliveryMode(DeliveryMode.NON_PERSISTENT); // Create a null text message //String text = "Hello world! From: " + Thread.currentThread().getName() + " : " + this.hashCode(); String text = null; TextMessage message = session.createTextMessage(text); … {code} When I run the ActiveMQ application with this Producer, there are no errors/exceptions creating that text message with a null string. However, my consumer fails with an NPE while trying to get bytes off the null text as per below: {code} … if (message instanceof TextMessage) { TextMessage textMessage = (TextMessage) message; String text = textMessage.getText(); text.getBytes(Charset.defaultCharset());NULL text in a TextMessage from a JMS source in Flume can lead to NPE > - > > Key: FLUME-2966 > URL: https://issues.apache.org/jira/browse/FLUME-2966 > Project: Flume > Issue Type: Bug >Affects Versions: v1.5.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: App.java, FLUME-2966-0.patch, FLUME-2966-1.patch > > > Code at >
[jira] [Updated] (FLUME-2966) NULL text in a TextMessage from a JMS source in Flume can lead to NPE
[ https://issues.apache.org/jira/browse/FLUME-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja updated FLUME-2966: --- Attachment: FLUME-2966-1.patch App.java > NULL text in a TextMessage from a JMS source in Flume can lead to NPE > - > > Key: FLUME-2966 > URL: https://issues.apache.org/jira/browse/FLUME-2966 > Project: Flume > Issue Type: Bug >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: App.java, FLUME-2966-0.patch, FLUME-2966-1.patch > > > Code at > https://github.com/apache/flume/blob/trunk/flume-ng-sources/flume-jms-source/src/main/java/org/apache/flume/source/jms/DefaultJMSMessageConverter.java#L103 > does not check for a NULL text in a TextMessage from a Flume JMS source. > This can lead to a NullPointerException here: > {code}textMessage.getText().getBytes(charset){code} while trying to > de-reference a null text from the textmessage. > We should probably skip these like the NULL Objects in the ObjectMessage just > below at: > https://github.com/apache/flume/blob/trunk/flume-ng-sources/flume-jms-source/src/main/java/org/apache/flume/source/jms/DefaultJMSMessageConverter.java#L107. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2966) NULL text in a TextMessage from a JMS source in Flume can lead to NPE
[ https://issues.apache.org/jira/browse/FLUME-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15463788#comment-15463788 ] Siddharth Ahuja commented on FLUME-2966: Sorry, was away on leave [~bessbd]. Just posted a patch for review. Please kindly advise if I needed to do something else for getting this reviewed.Thanks! > NULL text in a TextMessage from a JMS source in Flume can lead to NPE > - > > Key: FLUME-2966 > URL: https://issues.apache.org/jira/browse/FLUME-2966 > Project: Flume > Issue Type: Bug >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2966-0.patch > > > Code at > https://github.com/apache/flume/blob/trunk/flume-ng-sources/flume-jms-source/src/main/java/org/apache/flume/source/jms/DefaultJMSMessageConverter.java#L103 > does not check for a NULL text in a TextMessage from a Flume JMS source. > This can lead to a NullPointerException here: > {code}textMessage.getText().getBytes(charset){code} while trying to > de-reference a null text from the textmessage. > We should probably skip these like the NULL Objects in the ObjectMessage just > below at: > https://github.com/apache/flume/blob/trunk/flume-ng-sources/flume-jms-source/src/main/java/org/apache/flume/source/jms/DefaultJMSMessageConverter.java#L107. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLUME-2966) NULL text in a TextMessage from a JMS source in Flume can lead to NPE
[ https://issues.apache.org/jira/browse/FLUME-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja updated FLUME-2966: --- Attachment: FLUME-2966-0.patch > NULL text in a TextMessage from a JMS source in Flume can lead to NPE > - > > Key: FLUME-2966 > URL: https://issues.apache.org/jira/browse/FLUME-2966 > Project: Flume > Issue Type: Bug >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2966-0.patch > > > Code at > https://github.com/apache/flume/blob/trunk/flume-ng-sources/flume-jms-source/src/main/java/org/apache/flume/source/jms/DefaultJMSMessageConverter.java#L103 > does not check for a NULL text in a TextMessage from a Flume JMS source. > This can lead to a NullPointerException here: > {code}textMessage.getText().getBytes(charset){code} while trying to > de-reference a null text from the textmessage. > We should probably skip these like the NULL Objects in the ObjectMessage just > below at: > https://github.com/apache/flume/blob/trunk/flume-ng-sources/flume-jms-source/src/main/java/org/apache/flume/source/jms/DefaultJMSMessageConverter.java#L107. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2966) NULL text in a TextMessage from a JMS source in Flume can lead to NPE
[ https://issues.apache.org/jira/browse/FLUME-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411150#comment-15411150 ] Siddharth Ahuja commented on FLUME-2966: Ah, I managed to get it done myself!:) > NULL text in a TextMessage from a JMS source in Flume can lead to NPE > - > > Key: FLUME-2966 > URL: https://issues.apache.org/jira/browse/FLUME-2966 > Project: Flume > Issue Type: Bug >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > > Code at > https://github.com/apache/flume/blob/trunk/flume-ng-sources/flume-jms-source/src/main/java/org/apache/flume/source/jms/DefaultJMSMessageConverter.java#L103 > does not check for a NULL text in a TextMessage from a Flume JMS source. > This can lead to a NullPointerException here: > {code}textMessage.getText().getBytes(charset){code} while trying to > de-reference a null text from the textmessage. > We should probably skip these like the NULL Objects in the ObjectMessage just > below at: > https://github.com/apache/flume/blob/trunk/flume-ng-sources/flume-jms-source/src/main/java/org/apache/flume/source/jms/DefaultJMSMessageConverter.java#L107. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (FLUME-2966) NULL text in a TextMessage from a JMS source in Flume can lead to NPE
[ https://issues.apache.org/jira/browse/FLUME-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja reassigned FLUME-2966: -- Assignee: Siddharth Ahuja > NULL text in a TextMessage from a JMS source in Flume can lead to NPE > - > > Key: FLUME-2966 > URL: https://issues.apache.org/jira/browse/FLUME-2966 > Project: Flume > Issue Type: Bug >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > > Code at > https://github.com/apache/flume/blob/trunk/flume-ng-sources/flume-jms-source/src/main/java/org/apache/flume/source/jms/DefaultJMSMessageConverter.java#L103 > does not check for a NULL text in a TextMessage from a Flume JMS source. > This can lead to a NullPointerException here: > {code}textMessage.getText().getBytes(charset){code} while trying to > de-reference a null text from the textmessage. > We should probably skip these like the NULL Objects in the ObjectMessage just > below at: > https://github.com/apache/flume/blob/trunk/flume-ng-sources/flume-jms-source/src/main/java/org/apache/flume/source/jms/DefaultJMSMessageConverter.java#L107. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2966) NULL text in a TextMessage from a JMS source in Flume can lead to NPE
[ https://issues.apache.org/jira/browse/FLUME-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411148#comment-15411148 ] Siddharth Ahuja commented on FLUME-2966: Hi [~bessbd], thanks for checking. Would love to be able to work on the patch if you don't mind. Would it be possible for you to assign this to me? Thank you! > NULL text in a TextMessage from a JMS source in Flume can lead to NPE > - > > Key: FLUME-2966 > URL: https://issues.apache.org/jira/browse/FLUME-2966 > Project: Flume > Issue Type: Bug >Reporter: Siddharth Ahuja > > Code at > https://github.com/apache/flume/blob/trunk/flume-ng-sources/flume-jms-source/src/main/java/org/apache/flume/source/jms/DefaultJMSMessageConverter.java#L103 > does not check for a NULL text in a TextMessage from a Flume JMS source. > This can lead to a NullPointerException here: > {code}textMessage.getText().getBytes(charset){code} while trying to > de-reference a null text from the textmessage. > We should probably skip these like the NULL Objects in the ObjectMessage just > below at: > https://github.com/apache/flume/blob/trunk/flume-ng-sources/flume-jms-source/src/main/java/org/apache/flume/source/jms/DefaultJMSMessageConverter.java#L107. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLUME-2966) NULL text in a TextMessage from a JMS source in Flume can lead to NPE
Siddharth Ahuja created FLUME-2966: -- Summary: NULL text in a TextMessage from a JMS source in Flume can lead to NPE Key: FLUME-2966 URL: https://issues.apache.org/jira/browse/FLUME-2966 Project: Flume Issue Type: Bug Reporter: Siddharth Ahuja Code at https://github.com/apache/flume/blob/trunk/flume-ng-sources/flume-jms-source/src/main/java/org/apache/flume/source/jms/DefaultJMSMessageConverter.java#L103 does not check for a NULL text in a TextMessage from a Flume JMS source. This can lead to a NullPointerException here: {code}textMessage.getText().getBytes(charset){code} while trying to de-reference a null text from the textmessage. We should probably skip these like the NULL Objects in the ObjectMessage just below at: https://github.com/apache/flume/blob/trunk/flume-ng-sources/flume-jms-source/src/main/java/org/apache/flume/source/jms/DefaultJMSMessageConverter.java#L107. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297911#comment-15297911 ] Siddharth Ahuja commented on FLUME-2905: I have attached a new patch : FLUME-2905-2 for the above, thanks again for the review. > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: v1.6.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2905-0.patch, FLUME-2905-1.patch, > FLUME-2905-2.patch > > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > {code} > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167) > ... 9 more > {code} > The source's start() is then called again leading to another socket being > opened but not closed and so on. This leads to file descriptor (socket) leaks. > This can be easily reproduced as follows: > 1. Set Netcat as the source in flume agent configuration. > 2. Set the bind port for the netcat source to a port which is already in use. > e.g. in my case I used 50010 which is the port for DataNode's XCeiver > Protocol in use by the HDFS service. > 3. Start flume agent and perform "lsof -p | wc -l". Notice > the file descriptors keep on growing due to socket leaks with errors like: > "can't identify protocol". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja updated FLUME-2905: --- Attachment: FLUME-2905-2.patch > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: v1.6.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2905-0.patch, FLUME-2905-1.patch, > FLUME-2905-2.patch > > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > {code} > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167) > ... 9 more > {code} > The source's start() is then called again leading to another socket being > opened but not closed and so on. This leads to file descriptor (socket) leaks. > This can be easily reproduced as follows: > 1. Set Netcat as the source in flume agent configuration. > 2. Set the bind port for the netcat source to a port which is already in use. > e.g. in my case I used 50010 which is the port for DataNode's XCeiver > Protocol in use by the HDFS service. > 3. Start flume agent and perform "lsof -p | wc -l". Notice > the file descriptors keep on growing due to socket leaks with errors like: > "can't identify protocol". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297905#comment-15297905 ] Siddharth Ahuja commented on FLUME-2905: Thanks [~jarcec], I have found the issue here. It is to do with how I set up my source in TestNetcatSource.java. As I created a new NetcatSource in my test method and did not set up the channelProcessor for it manually it failed for you (but for some reason it worked fine for me in both my eclipse & command line environment). Regardless, I have modified the test so that it re-uses the existing source object with the channelProcessor already set up through the setup() method of the test suite. The point of the test is to ensure that when we start up a source that tries to bind on a port which is already bound we should stop the source. Creating a "new" source in the test suite is not important and we can re-use the already existing source. > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: v1.6.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2905-0.patch, FLUME-2905-1.patch > > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > {code} > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167) > ... 9 more > {code} > The source's start() is then called again leading to another socket being > opened but not closed and so on. This leads to file descriptor (socket) leaks. > This can be easily reproduced as follows: > 1. Set Netcat as the source in flume agent configuration. > 2. Set the bind port for the netcat source to a port which is already in use. > e.g. in my case I used 50010 which is the port for DataNode's XCeiver > Protocol in use by the HDFS service. > 3. Start flume agent and perform "lsof -p | wc -l". Notice > the file descriptors keep on growing due to socket leaks with errors like: > "can't identify protocol". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja updated FLUME-2905: --- Attachment: FLUME-2905-1.patch > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: v1.6.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2905-0.patch, FLUME-2905-1.patch > > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > {code} > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167) > ... 9 more > {code} > The source's start() is then called again leading to another socket being > opened but not closed and so on. This leads to file descriptor (socket) leaks. > This can be easily reproduced as follows: > 1. Set Netcat as the source in flume agent configuration. > 2. Set the bind port for the netcat source to a port which is already in use. > e.g. in my case I used 50010 which is the port for DataNode's XCeiver > Protocol in use by the HDFS service. > 3. Start flume agent and perform "lsof -p | wc -l". Notice > the file descriptors keep on growing due to socket leaks with errors like: > "can't identify protocol". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275659#comment-15275659 ] Siddharth Ahuja commented on FLUME-2905: Attached the patch that stops the source and cleans up the socket when a BindException is encountered if a port is already in use by invoking stop() . Also, moved the creation of the thread pool near to where it gets used so that it does not have to be cleaned up during the stop(). JUnit is also provided. Tested the patch as follows: • Started HDFS (datanode) service that binds to the port 50010, so port is in use. • Started updated flume-agent with configuration containing Netcat source and File roll sink with bind port as 50010. • Investigated flume logs and found the BindException due to port already in use. • Ran "ps auxx|grep -i flume" to get the flume process id. • Ran "lsof -p | wc -l" multiple times to check if file descriptors are increasing. They are stable. • Stopped HDFS service to free up port 50010. • Noticed from flume-agent logs that source is finally started successfully with socket bound to 50010. • From a new terminal, ran "nc localhost 50010" and entered some text. • The text is written on the local filesystem successfully. > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: v1.6.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > Attachments: FLUME-2905-0.patch > > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > {code} > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167) > ... 9 more > {code} > The source's start() is then called again leading to another socket being > opened but not closed and so on. This leads to file descriptor (socket) leaks. > This can be easily reproduced as follows: > 1. Set Netcat as the source in flume agent configuration. > 2. Set the bind port for the netcat source to a port which is already in use. > e.g. in my case I used 50010 which is the port for DataNode's XCeiver > Protocol in use by the HDFS service. > 3. Start flume agent and perform "lsof -p | wc -l". Notice > the file descriptors keep on growing due to socket leaks with errors like: > "can't identify protocol". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja updated FLUME-2905: --- Description: During the flume agent start-up, the flume configuration containing the NetcatSource is parsed and the source's start() is called. If there is an issue while binding the channel's socket to a local address to configure the socket to listen for connections following exception is thrown but the socket open just before is not closed. {code} 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - Exception follows. org.apache.flume.FlumeException: java.net.BindException: Address already in use at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:444) at sun.nio.ch.Net.bind(Net.java:436) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167) ... 9 more {code} The source's start() is then called again leading to another socket being opened but not closed and so on. This leads to file descriptor (socket) leaks. This can be easily reproduced as follows: 1. Set Netcat as the source in flume agent configuration. 2. Set the bind port for the netcat source to a port which is already in use. e.g. in my case I used 50010 which is the port for DataNode's XCeiver Protocol in use by the HDFS service. 3. Start flume agent and perform "lsof -p | wc -l". Notice the file descriptors keep on growing due to socket leaks with errors like: "can't identify protocol". was: During the flume agent start-up, the flume configuration containing the NetcatSource is parsed and the source's start() is called. If there is an issue while binding the channel's socket to a local address to configure the socket to listen for connections following exception is thrown but the socket open just before is not closed. 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - Exception follows. org.apache.flume.FlumeException: java.net.BindException: Address already in use at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:444) at sun.nio.ch.Net.bind(Net.java:436) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) at
[jira] [Assigned] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja reassigned FLUME-2905: -- Assignee: Siddharth Ahuja > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: v1.6.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167) > ... 9 more > The source's start() is then called again leading to another socket being > opened but not closed and so on. This leads to file descriptor (socket) leaks. > This can be easily reproduced as follows: > 1. Set Netcat as the source in flume agent configuration. > 2. Set the bind port for the netcat source to a port which is already in use. > e.g. in my case I used 50010 which is the port for DataNode's XCeiver > Protocol in use by the HDFS service. > 3. Start flume agent and perform "lsof -p | wc -l". Notice > the file descriptors keep on growing due to socket leaks with errors like: > "can't identify protocol". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266049#comment-15266049 ] Siddharth Ahuja commented on FLUME-2905: Trying to get it assigned to myself... > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > - > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources >Affects Versions: v1.6.0 >Reporter: Siddharth Ahuja > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167) > ... 9 more > The source's start() is then called again leading to another socket being > opened but not closed and so on. This leads to file descriptor (socket) leaks. > This can be easily reproduced as follows: > 1. Set Netcat as the source in flume agent configuration. > 2. Set the bind port for the netcat source to a port which is already in use. > e.g. in my case I used 50010 which is the port for DataNode's XCeiver > Protocol in use by the HDFS service. > 3. Start flume agent and perform "lsof -p | wc -l". Notice > the file descriptors keep on growing due to socket leaks with errors like: > "can't identify protocol". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
Siddharth Ahuja created FLUME-2905: -- Summary: NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks Key: FLUME-2905 URL: https://issues.apache.org/jira/browse/FLUME-2905 Project: Flume Issue Type: Bug Components: Sinks+Sources Affects Versions: v1.6.0 Reporter: Siddharth Ahuja During the flume agent start-up, the flume configuration containing the NetcatSource is parsed and the source's start() is called. If there is an issue while binding the channel's socket to a local address to configure the socket to listen for connections following exception is thrown but the socket open just before is not closed. 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - Exception follows. org.apache.flume.FlumeException: java.net.BindException: Address already in use at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:444) at sun.nio.ch.Net.bind(Net.java:436) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167) ... 9 more The source's start() is then called again leading to another socket being opened but not closed and so on. This leads to file descriptor (socket) leaks. This can be easily reproduced as follows: 1. Set Netcat as the source in flume agent configuration. 2. Set the bind port for the netcat source to a port which is already in use. e.g. in my case I used 50010 which is the port for DataNode's XCeiver Protocol in use by the HDFS service. 3. Start flume agent and perform "lsof -p | wc -l". Notice the file descriptors keep on growing due to socket leaks with errors like: "can't identify protocol". -- This message was sent by Atlassian JIRA (v6.3.4#6332)