[jira] [Updated] (HDFS-10472) NameNode Rpc Reader Thread crash, and cluster hang.

    [ https://issues.apache.org/jira/browse/HDFS-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenFolin updated HDFS-10472:
-----------------------------
    Attachment: HDFS-10472.patch

add catch throwable

> NameNode Rpc Reader Thread crash, and cluster hang.
> ---------------------------------------------------
>
>                 Key: HDFS-10472
>                 URL: https://issues.apache.org/jira/browse/HDFS-10472
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, namenode
>    Affects Versions: 2.5.0, 2.6.0, 2.8.0, 2.7.2, 2.6.2, 2.6.4
>            Reporter: ChenFolin
>              Labels: patch
>         Attachments: HDFS-10472.patch
>
>
> My cluster hung yesterday because the RPC server Reader threads crashed.
> As a result, all RPC requests timed out, including DataNode heartbeats.
> The method doRunLoop catches only InterruptedException and IOException,
> so any other exception thrown inside the loop terminates the Reader thread:
>     while (running) {
>       SelectionKey key = null;
>       try {
>         // consume as many connections as currently queued to avoid
>         // unbridled acceptance of connections that starves the select
>         int size = pendingConnections.size();
>         for (int i=size; i>0; i--) {
>           Connection conn = pendingConnections.take();
>           conn.channel.register(readSelector, SelectionKey.OP_READ, conn);
>         }
>         readSelector.select();
>         Iterator<SelectionKey> iter =
>             readSelector.selectedKeys().iterator();
>         while (iter.hasNext()) {
>           key = iter.next();
>           iter.remove();
>           if (key.isValid()) {
>             if (key.isReadable()) {
>               doRead(key);
>             }
>           }
>           key = null;
>         }
>       } catch (InterruptedException e) {
>         if (running) {                      // unexpected -- log it
>           LOG.info(Thread.currentThread().getName()
>               + " unexpectedly interrupted", e);
>         }
>       } catch (IOException ex) {
>         LOG.error("Error in Reader", ex);
>       }
>     }

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDFS-10472) NameNode Rpc Reader Thread crash, and cluster hang.

    [ https://issues.apache.org/jira/browse/HDFS-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenFolin updated HDFS-10472:
-----------------------------
          Labels: patch  (was: )
    Release Note: catch throwable
          Status: Patch Available  (was: Open)

add catch throwable

> NameNode Rpc Reader Thread crash, and cluster hang.
> ---------------------------------------------------
>
>                 Key: HDFS-10472
>                 URL: https://issues.apache.org/jira/browse/HDFS-10472
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, namenode
>    Affects Versions: 2.6.4, 2.6.2, 2.7.2, 2.6.0, 2.5.0, 2.8.0
>            Reporter: ChenFolin
>              Labels: patch
>
> My cluster hung yesterday because the RPC server Reader threads crashed.
> As a result, all RPC requests timed out, including DataNode heartbeats.
> The method doRunLoop catches only InterruptedException and IOException,
> so any other exception thrown inside the loop terminates the Reader thread:
>     while (running) {
>       SelectionKey key = null;
>       try {
>         // consume as many connections as currently queued to avoid
>         // unbridled acceptance of connections that starves the select
>         int size = pendingConnections.size();
>         for (int i=size; i>0; i--) {
>           Connection conn = pendingConnections.take();
>           conn.channel.register(readSelector, SelectionKey.OP_READ, conn);
>         }
>         readSelector.select();
>         Iterator<SelectionKey> iter =
>             readSelector.selectedKeys().iterator();
>         while (iter.hasNext()) {
>           key = iter.next();
>           iter.remove();
>           if (key.isValid()) {
>             if (key.isReadable()) {
>               doRead(key);
>             }
>           }
>           key = null;
>         }
>       } catch (InterruptedException e) {
>         if (running) {                      // unexpected -- log it
>           LOG.info(Thread.currentThread().getName()
>               + " unexpectedly interrupted", e);
>         }
>       } catch (IOException ex) {
>         LOG.error("Error in Reader", ex);
>       }
>     }
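
The attached patch is not reproduced in these messages, so the following is only a rough sketch of what "add catch throwable" could look like. It is not Hadoop code: ReaderLoopSketch, ReadTask, and runLoop are hypothetical names, and each ReadTask stands in for one doRead(key) call. Unlike the quoted loop, the sketch stops on interruption to keep the example short.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a stand-in for the Reader loop, not Hadoop code. */
class ReaderLoopSketch {

    /** Hypothetical unit of work standing in for doRead(key). */
    interface ReadTask {
        void run() throws IOException, InterruptedException;
    }

    /**
     * Runs each task inside its own try block and returns how many
     * tasks finished without throwing.
     */
    static int runLoop(List<ReadTask> tasks) {
        int completed = 0;
        for (ReadTask task : tasks) {
            try {
                task.run();
                completed++;
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop processing.
                Thread.currentThread().interrupt();
                break;
            } catch (IOException e) {
                System.err.println("Error in Reader: " + e);
            } catch (Throwable t) {
                // The added clause: log unexpected failures and keep looping
                // instead of letting them terminate the thread.
                System.err.println("Unexpected error in Reader: " + t);
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        List<ReadTask> tasks = Arrays.asList(
            () -> { },
            () -> { throw new IllegalStateException("simulated bug"); },
            () -> { throw new IOException("simulated I/O failure"); },
            () -> { });
        System.out.println("completed = " + runLoop(tasks));
    }
}
```

Without the final catch clause, the IllegalStateException would propagate out of runLoop and the remaining tasks would never run, which mirrors the Reader thread dying in the report. Catching Throwable also swallows Errors such as OutOfMemoryError, so whether to log and continue or to abort the process in that case is a design choice the sketch does not settle.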