[ 
https://issues.apache.org/jira/browse/HDFS-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063802#comment-17063802
 ] 

Haibin Huang edited comment on HDFS-14783 at 3/21/20, 7:28 AM:
---------------------------------------------------------------

Thanks [~elgoiri] for the suggestion. I agree that changing the behavior of 
SampleStat is not a good idea, so I removed the timestamp and used another way 
to detect an expired SampleStat in DataNodePeerMetrics. When dn1 doesn't send 
any packets to dn2 for a long time, the SampleStat in DataNodePeerMetrics won't 
change, so the same metrics info will be generated every time 
org.apache.hadoop.metrics2.lib.MutableRollingAverages#rollOverAvgs() runs:
{code:java}
final SumAndCount sumAndCount = new SumAndCount(
    rate.lastStat().total(),
    rate.lastStat().numSamples());
/* put newest sum and count to the end */
if (!deque.offerLast(sumAndCount)) {
  // deque is full: drop the oldest window, then append the newest one
  deque.pollFirst();
  deque.offerLast(sumAndCount);
}
{code}
This fills the deque with identical sumAndCount entries. So we only need to 
check whether all members of the deque are the same: if they are, the 
SampleStat hasn't changed in the last 36*300_000 ms (3 hours). I think we can 
use this to detect an expired SampleStat in DataNodePeerMetrics.
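A minimal Java sketch of this check, assuming a bounded LinkedBlockingDeque whose capacity equals the number of windows (the 36 in 36*300_000 ms). SumAndCount here is a hypothetical stand-in holding only a sum and a count, not Hadoop's class, and rollOver/isExpired are illustrative names, not methods from the actual patch. The check only makes sense once the deque is full, since a partially filled deque of identical entries has not yet covered the whole period.

{code:java}
import java.util.concurrent.LinkedBlockingDeque;

public class StaleSampleCheck {

  /** Hypothetical stand-in for Hadoop's SumAndCount: a (sum, count) pair. */
  static final class SumAndCount {
    private final double sum;
    private final long count;

    SumAndCount(double sum, long count) {
      this.sum = sum;
      this.count = count;
    }

    double getSum() { return sum; }
    long getCount() { return count; }
  }

  /** Same rollover step as quoted above: append, evicting the oldest if full. */
  static void rollOver(LinkedBlockingDeque<SumAndCount> deque, SumAndCount latest) {
    if (!deque.offerLast(latest)) {
      deque.pollFirst();
      deque.offerLast(latest);
    }
  }

  /**
   * Returns true when every window slot is occupied and all entries carry the
   * same sum and count, i.e. nothing new was recorded for the whole period.
   */
  static boolean isExpired(LinkedBlockingDeque<SumAndCount> deque) {
    if (deque.remainingCapacity() > 0) {
      return false; // not enough history yet to decide
    }
    SumAndCount first = deque.peekFirst();
    for (SumAndCount sc : deque) {
      if (Double.compare(sc.getSum(), first.getSum()) != 0
          || sc.getCount() != first.getCount()) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    int numWindows = 36; // assumed window count, matching 36*300_000 ms
    LinkedBlockingDeque<SumAndCount> deque = new LinkedBlockingDeque<>(numWindows);

    rollOver(deque, new SumAndCount(120.0, 4));
    System.out.println(isExpired(deque)); // false: deque not full yet

    // No new packets: every rollover pushes the same stat again.
    for (int i = 0; i < numWindows; i++) {
      rollOver(deque, new SumAndCount(500.0, 10));
    }
    System.out.println(isExpired(deque)); // true: all 36 entries identical
  }
}
{code}

Comparing values instead of timestamps leaves SampleStat untouched; the trade-off is that a peer whose traffic happened to produce identical totals in every window would also look expired, which seems unlikely with real latency sums.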


> Expired SampleStat needs to be removed from SlowPeersReport
> -----------------------------------------------------------
>
>                 Key: HDFS-14783
>                 URL: https://issues.apache.org/jira/browse/HDFS-14783
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Haibin Huang
>            Assignee: Haibin Huang
>            Priority: Major
>         Attachments: HDFS-14783, HDFS-14783-001.patch, HDFS-14783-002.patch, 
> HDFS-14783-003.patch, HDFS-14783-004.patch
>
>
> SlowPeersReport is calculated from the SampleStat between two DNs, and it 
> appears in the NN's JMX like this:
> {code:java}
> "SlowPeersReport" :[{"SlowNode":"dn2","ReportingNodes":["dn1"]}]
> {code}
> The SampleStat is stored in a LinkedBlockingDeque<SumAndCount>, and it won't 
> be removed until the queue is full and a newer one is generated. Therefore, 
> if dn1 doesn't send any packets to dn2 for a long time, the old SampleStat 
> will stay in the queue and will still be used to calculate slow peers. I 
> think these old SampleStats should be considered expired and ignored when 
> generating a new SlowPeersReport.
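The problem described above can be reproduced with a short Java sketch. It assumes, as a simplification, that a peer's average latency is the total sum divided by the total sample count over all entries in the deque; SumAndCount is a hypothetical stand-in for Hadoop's class, and the capacity of 3 is chosen only to keep the example small.

{code:java}
import java.util.concurrent.LinkedBlockingDeque;

public class StaleAverageDemo {

  /** Hypothetical stand-in for Hadoop's SumAndCount. */
  static final class SumAndCount {
    final double sum;  // total latency in the window, in ms
    final long count;  // number of packets sampled in the window

    SumAndCount(double sum, long count) {
      this.sum = sum;
      this.count = count;
    }
  }

  /** Assumed aggregation: total sum divided by total sample count. */
  static double average(Iterable<SumAndCount> windows) {
    double totalSum = 0;
    long totalCount = 0;
    for (SumAndCount w : windows) {
      totalSum += w.sum;
      totalCount += w.count;
    }
    return totalCount == 0 ? 0.0 : totalSum / totalCount;
  }

  public static void main(String[] args) {
    LinkedBlockingDeque<SumAndCount> deque = new LinkedBlockingDeque<>(3);

    // dn1 saw slow writes to dn2 in one window: 10 packets, 3000 ms in total.
    deque.offerLast(new SumAndCount(3000.0, 10));

    // dn1 then stops sending to dn2. Nothing is offered or polled, so this
    // stale 300 ms average is reported no matter how much time passes.
    System.out.println(average(deque)); // 300.0

    // Only once newer windows fill the deque is the old entry evicted.
    for (int i = 0; i < 3; i++) {
      SumAndCount fresh = new SumAndCount(100.0, 10);
      if (!deque.offerLast(fresh)) {
        deque.pollFirst();
        deque.offerLast(fresh);
      }
    }
    System.out.println(average(deque)); // 10.0
  }
}
{code}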



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
