[jira] [Updated] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size
[ https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated HDFS-9669:
-----------------------------
    Fix Version/s: 2.8.0

> TcpPeerServer should respect ipc.server.listen.queue.size
> ---------------------------------------------------------
>
>                 Key: HDFS-9669
>                 URL: https://issues.apache.org/jira/browse/HDFS-9669
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.7.2
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>             Fix For: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1
>
>         Attachments: HDFS-9669.0.patch, HDFS-9669.1.patch, HDFS-9669.1.patch
>
>
> During periods of high traffic we are seeing:
> {code}
> 16/01/19 23:40:40 WARN hdfs.DFSClient: Connection failure: Failed to connect to /10.138.178.47:50010 for file /MYPATH/MYFILE for block BP-1935559084-10.138.112.27-1449689748174:blk_1080898601_7375294:java.io.IOException: Connection reset by peer
> java.io.IOException: Connection reset by peer
>       at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>       at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>       at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>       at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>       at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
>       at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>       at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>       at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>       at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>       at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:109)
>       at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
> {code}
> At the time that this happens there are far fewer xceivers than configured.
> On most JDKs, binding without an explicit backlog makes 50 the total backlog
> at any time. This effectively means that any GC + busy time will result in
> TCP resets.
> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/net/ServerSocket.java#l370



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
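
The backlog behaviour described in the issue can be sketched in plain Java. This is only an illustration, not the attached patch: the class name `ListenBacklogSketch`, the helper names, the use of `java.util.Properties` as a stand-in for a Hadoop configuration object, and the fallback value of 128 are all assumptions made for the example. Only the key `ipc.server.listen.queue.size` and the default backlog of 50 come from the issue and the linked `ServerSocket` source.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.Properties;

class ListenBacklogSketch {
    // Configuration key named in the issue title.
    static final String LISTEN_QUEUE_KEY = "ipc.server.listen.queue.size";
    // Fallback used by this sketch only (an assumption, not taken from the patch).
    static final int FALLBACK_BACKLOG = 128;

    // Reads the requested backlog from a Properties object, which stands in
    // for whatever configuration object the real server uses.
    static int backlogFrom(Properties conf) {
        String raw = conf.getProperty(LISTEN_QUEUE_KEY, String.valueOf(FALLBACK_BACKLOG));
        return Integer.parseInt(raw);
    }

    // Binds an unbound ServerSocket with an explicit backlog. Calling
    // bind(addr) without the second argument would leave the backlog at the
    // JDK default (50 in the OpenJDK source linked from the issue).
    static ServerSocket bindWithBacklog(InetSocketAddress addr, int backlog) throws IOException {
        ServerSocket server = new ServerSocket();
        server.bind(addr, backlog);
        return server;
    }

    public static void main(String[] args) throws IOException {
        Properties conf = new Properties();
        conf.setProperty(LISTEN_QUEUE_KEY, "256");
        int backlog = backlogFrom(conf);
        // Port 0 asks the OS for any free port, so the sketch does not collide
        // with a running service.
        try (ServerSocket server = bindWithBacklog(new InetSocketAddress("127.0.0.1", 0), backlog)) {
            System.out.println("listening on port " + server.getLocalPort()
                + " with requested backlog " + backlog);
        }
    }
}
```

The backlog passed to `bind` is a request; the operating system may cap it (on Linux, for example, by `net.core.somaxconn`), so raising the configured value alone may not be enough on every host.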
[jira] [Updated] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size
[ https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated HDFS-9669:
------------------------------
    Fix Version/s: 2.6.5

Cherry-picked it to 2.6.5 (trivial).

> TcpPeerServer should respect ipc.server.listen.queue.size
> ---------------------------------------------------------
>
>                 Key: HDFS-9669
>                 URL: https://issues.apache.org/jira/browse/HDFS-9669
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.7.2
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>             Fix For: 2.7.3, 2.6.5, 3.0.0-alpha1
>
>         Attachments: HDFS-9669.0.patch, HDFS-9669.1.patch, HDFS-9669.1.patch
[jira] [Updated] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size
[ https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Trezzo updated HDFS-9669:
-------------------------------
    Target Version/s: 2.7.3, 2.6.5  (was: 2.7.3)

> TcpPeerServer should respect ipc.server.listen.queue.size
> ---------------------------------------------------------
>
>                 Key: HDFS-9669
>                 URL: https://issues.apache.org/jira/browse/HDFS-9669
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.7.2
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>             Fix For: 2.7.3
>
>         Attachments: HDFS-9669.0.patch, HDFS-9669.1.patch, HDFS-9669.1.patch
[jira] [Updated] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size
[ https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-9669:
---------------------------------------
          Resolution: Fixed
       Fix Version/s: 2.7.3
    Target Version/s: 2.7.3
              Status: Resolved  (was: Patch Available)

> TcpPeerServer should respect ipc.server.listen.queue.size
> ---------------------------------------------------------
>
>                 Key: HDFS-9669
>                 URL: https://issues.apache.org/jira/browse/HDFS-9669
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.7.2
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>             Fix For: 2.7.3
>
>         Attachments: HDFS-9669.0.patch, HDFS-9669.1.patch, HDFS-9669.1.patch
[jira] [Updated] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size
[ https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-9669:
---------------------------------------
    Issue Type: Improvement  (was: Bug)

> TcpPeerServer should respect ipc.server.listen.queue.size
> ---------------------------------------------------------
>
>                 Key: HDFS-9669
>                 URL: https://issues.apache.org/jira/browse/HDFS-9669
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.7.2
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>         Attachments: HDFS-9669.0.patch, HDFS-9669.1.patch, HDFS-9669.1.patch
[jira] [Updated] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size
[ https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliott Clark updated HDFS-9669:
--------------------------------
    Attachment: HDFS-9669.1.patch

Checkstyle nit.

> TcpPeerServer should respect ipc.server.listen.queue.size
> ---------------------------------------------------------
>
>                 Key: HDFS-9669
>                 URL: https://issues.apache.org/jira/browse/HDFS-9669
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.2
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>         Attachments: HDFS-9669.0.patch, HDFS-9669.1.patch, HDFS-9669.1.patch
[jira] [Updated] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size
[ https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliott Clark updated HDFS-9669:
--------------------------------
    Attachment: HDFS-9669.1.patch

> TcpPeerServer should respect ipc.server.listen.queue.size
> ---------------------------------------------------------
>
>                 Key: HDFS-9669
>                 URL: https://issues.apache.org/jira/browse/HDFS-9669
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.2
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>         Attachments: HDFS-9669.0.patch, HDFS-9669.1.patch
[jira] [Updated] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size
[ https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliott Clark updated HDFS-9669:
--------------------------------
    Affects Version/s: 2.7.2
               Status: Patch Available  (was: Open)

> TcpPeerServer should respect ipc.server.listen.queue.size
> ---------------------------------------------------------
>
>                 Key: HDFS-9669
>                 URL: https://issues.apache.org/jira/browse/HDFS-9669
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.2
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>         Attachments: HDFS-9669.0.patch
[jira] [Updated] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size
[ https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliott Clark updated HDFS-9669:
--------------------------------
    Attachment: HDFS-9669.0.patch

Straightforward patch to make sure that all the places that bind use the listen backlog setting.

> TcpPeerServer should respect ipc.server.listen.queue.size
> ---------------------------------------------------------
>
>                 Key: HDFS-9669
>                 URL: https://issues.apache.org/jira/browse/HDFS-9669
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>         Attachments: HDFS-9669.0.patch