[jira] [Comment Edited] (HADOOP-9640) RPC Congestion Control with FairCallQueue

2019-02-08 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763774#comment-16763774
 ] 

Wei-Chiu Chuang edited comment on HADOOP-9640 at 2/8/19 5:33 PM:
-

Works for me. Thanks for the help here! We didn't enable FairCallQueue and the 
related features in CDH5 and we just started looking into this feature for 
CDH6. I am reviewing HADOOP-10286 (as well as HDFS-12345)


was (Author: jojochuang):
Works for me. Thanks for the help here! We didn't enable FairCallQueue and the 
related features in CDH5 and we just started looking into this feature. I am 
reviewing HADOOP-10286 (as well as HDFS-12345)

> RPC Congestion Control with FairCallQueue
> -
>
> Key: HADOOP-9640
> URL: https://issues.apache.org/jira/browse/HADOOP-9640
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.2.0, 3.0.0-alpha1
>Reporter: Xiaobo Peng
>Assignee: Chris Li
>Priority: Major
>  Labels: hdfs, qos, rpc
> Attachments: FairCallQueue-PerformanceOnCluster.pdf, 
> MinorityMajorityPerformance.pdf, NN-denial-of-service-updated-plan.pdf, 
> faircallqueue.patch, faircallqueue2.patch, faircallqueue3.patch, 
> faircallqueue4.patch, faircallqueue5.patch, faircallqueue6.patch, 
> faircallqueue7_with_runtime_swapping.patch, 
> rpc-congestion-control-draft-plan.pdf
>
>
> For an easy-to-read summary see: 
> http://www.ebaytechblog.com/2014/08/21/quality-of-service-in-hadoop/
> Several production Hadoop cluster incidents occurred where the Namenode was 
> overloaded and failed to respond. 
> We can improve quality of service for users during namenode peak loads by 
> replacing the FIFO call queue with a [Fair Call 
> Queue|https://issues.apache.org/jira/secure/attachment/12616864/NN-denial-of-service-updated-plan.pdf].
>  (this plan supersedes rpc-congestion-control-draft-plan).
> Excerpted from the communication of one incident, “The map task of a user was 
> creating huge number of small files in the user directory. Due to the heavy 
> load on NN, the JT also was unable to communicate with NN...The cluster 
> became responsive only once the job was killed.”
> Excerpted from the communication of another incident, “Namenode was 
> overloaded by GetBlockLocation requests (Correction: should be getFileInfo 
> requests. the job had a bug that called getFileInfo for a nonexistent file in 
> an endless loop). All other requests to namenode were also affected by this 
> and hence all jobs slowed down. Cluster almost came to a grinding 
> halt…Eventually killed jobtracker to kill all jobs that are running.”
> Excerpted from HDFS-945, “We've seen defective applications cause havoc on 
> the NameNode, for e.g. by doing 100k+ 'listStatus' on very large directories 
> (60k files) etc.”



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-9640) RPC Congestion Control with FairCallQueue

2018-06-28 Thread Yiqun Lin (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526157#comment-16526157
 ] 

Yiqun Lin edited comment on HADOOP-9640 at 6/28/18 9:39 AM:


Hi all,
 Any progress on this issue? This looks like a good feature and can be used in 
production env, any documentation where we can learn from that how to configure 
using this? Appreciate for this, :).


was (Author: linyiqun):
Hi all,
Any progress on this issue? This looks like a good feature and can be used in 
production env, any documentation where we can learn from that how to configure 
using this? Appropriate for this, :).

> RPC Congestion Control with FairCallQueue
> -
>
> Key: HADOOP-9640
> URL: https://issues.apache.org/jira/browse/HADOOP-9640
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.2.0, 3.0.0-alpha1
>Reporter: Xiaobo Peng
>Assignee: Chris Li
>Priority: Major
>  Labels: hdfs, qos, rpc
> Attachments: FairCallQueue-PerformanceOnCluster.pdf, 
> MinorityMajorityPerformance.pdf, NN-denial-of-service-updated-plan.pdf, 
> faircallqueue.patch, faircallqueue2.patch, faircallqueue3.patch, 
> faircallqueue4.patch, faircallqueue5.patch, faircallqueue6.patch, 
> faircallqueue7_with_runtime_swapping.patch, 
> rpc-congestion-control-draft-plan.pdf
>
>
> For an easy-to-read summary see: 
> http://www.ebaytechblog.com/2014/08/21/quality-of-service-in-hadoop/
> Several production Hadoop cluster incidents occurred where the Namenode was 
> overloaded and failed to respond. 
> We can improve quality of service for users during namenode peak loads by 
> replacing the FIFO call queue with a [Fair Call 
> Queue|https://issues.apache.org/jira/secure/attachment/12616864/NN-denial-of-service-updated-plan.pdf].
>  (this plan supersedes rpc-congestion-control-draft-plan).
> Excerpted from the communication of one incident, “The map task of a user was 
> creating huge number of small files in the user directory. Due to the heavy 
> load on NN, the JT also was unable to communicate with NN...The cluster 
> became responsive only once the job was killed.”
> Excerpted from the communication of another incident, “Namenode was 
> overloaded by GetBlockLocation requests (Correction: should be getFileInfo 
> requests. the job had a bug that called getFileInfo for a nonexistent file in 
> an endless loop). All other requests to namenode were also affected by this 
> and hence all jobs slowed down. Cluster almost came to a grinding 
> halt…Eventually killed jobtracker to kill all jobs that are running.”
> Excerpted from HDFS-945, “We've seen defective applications cause havoc on 
> the NameNode, for e.g. by doing 100k+ 'listStatus' on very large directories 
> (60k files) etc.”



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org