[ 
https://issues.apache.org/jira/browse/HDFS-13828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amithsha resolved HDFS-13828.
-----------------------------
    Resolution: Not A Problem

> DataNode breaching Xceiver Count
> --------------------------------
>
>                 Key: HDFS-13828
>                 URL: https://issues.apache.org/jira/browse/HDFS-13828
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.7.1
>            Reporter: Amithsha
>            Priority: Critical
>
> We observed DataNodes breaching the xceiver limit of 4096 on a particular
> set of 5 to 8 nodes in a 900-node cluster.
> We stopped the DataNode services on those nodes and let their blocks
> re-replicate across the cluster. Even after that, we observed the same issue
> on a new set of nodes.
> Q1: Why does this happen only on particular nodes? After those nodes are
> decommissioned, their data should be replicated across the cluster, so why
> does the issue appear again on a different set of nodes?
> Assumptions:
> Reads of a particular block or piece of data on those nodes might be the
> cause, but in that case the decommission should have mitigated it, and it
> did not. The MR jobs involved are triggered from Hive, so we suspect the
> query may be referring to the same block multiple times in different stages
> and creating this issue.
> From the thread dump:
> The DataNode thread dump shows that, of the 4090+ xceiver threads created on
> that node, nearly 4000 belonged to multiple mappers of the same AppId, with
> state "no operation".
>  
> Any suggestions on this?
>  
>  
>  
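
The per-application tally described in the thread-dump paragraph can be
reproduced with a short script. This is only a sketch in Python, not part of
the original report. It assumes the dump comes from jstack, that xceiver
thread names contain "DataXceiver for client", and that MapReduce task clients
are named "DFSClient_attempt_<clusterTimestamp>_<jobId>_m_..."; the function
name, the sample lines, and the "(other client)" label are illustrative.

```python
import re
from collections import Counter

# Assumed client-name shape for MapReduce tasks inside the thread name:
#   DFSClient_attempt_<clusterTimestamp>_<jobId>_<m|r>_<taskId>_<attemptNo>...
ATTEMPT_RE = re.compile(r"DFSClient_attempt_(\d+)_(\d+)_[mr]_\d+_\d+")


def count_xceivers_by_app(dump_text):
    """Count DataXceiver threads per YARN application in a jstack dump."""
    counts = Counter()
    for line in dump_text.splitlines():
        # jstack thread headers start with the quoted thread name.
        if not line.startswith('"') or "DataXceiver for client" not in line:
            continue
        match = ATTEMPT_RE.search(line)
        if match:
            # Rebuild the application id from the task attempt id.
            app = "application_{}_{}".format(match.group(1), match.group(2))
        else:
            # Non-MapReduce clients, or threads with no client name yet.
            app = "(other client)"
        counts[app] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = "\n".join([
        '"DataXceiver for client DFSClient_attempt_1533000000000_0042_m_000007_0_1_1 at /10.0.0.5:40001 [x]" #101 daemon',
        '"DataXceiver for client DFSClient_attempt_1533000000000_0042_m_000009_0_2_1 at /10.0.0.6:40002 [x]" #102 daemon',
        '"DataXceiver for client DFSClient_NONMAPREDUCE_12345_1 at /10.0.0.7:40003 [x]" #103 daemon',
    ])
    for app, n in count_xceivers_by_app(sample).most_common():
        print(app, n)
```

If one application accounts for most of the xceiver threads, as in the report
above, that points to the job's read pattern rather than to the DataNode.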



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
