[ 
https://issues.apache.org/jira/browse/YARN-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345298#comment-15345298
 ] 

Varun Vasudev commented on YARN-5274:
-------------------------------------

The node health script is meant for the health of the node. It can't mark a 
single disk as bad. Just to clarify - I'm more interested in using the health 
check that smartctl provides rather than the disk lifetime features.

The health test to determine whether a disk is usable should be valid whether 
the disk is an HDD or an SSD. We shouldn't use smartctl if it doesn't apply to 
the storage in question; in that case we should fall back on the existing checks.
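
To make the fallback idea concrete, here is a minimal sketch in Python (not 
NodeManager code). It assumes `smartctl -H <device>` prints either an ATA-style 
"SMART overall-health self-assessment test result: ..." line or a SCSI-style 
"SMART Health Status: ..." line. The names `DiskHealth`, 
`classify_smartctl_output`, `check_disk` and `existing_check` are hypothetical, 
and treating any verdict other than PASSED/OK as failing is an assumption.

```python
import shutil
import subprocess
from enum import Enum


class DiskHealth(Enum):
    HEALTHY = "healthy"
    FAILING = "failing"
    UNKNOWN = "unknown"  # smartctl gave no verdict: rely on the existing checks


def classify_smartctl_output(output: str) -> DiskHealth:
    """Classify the text printed by `smartctl -H <device>` (assumed format)."""
    for line in output.splitlines():
        lowered = line.lower()
        # ATA-style:  "SMART overall-health self-assessment test result: PASSED"
        # SCSI-style: "SMART Health Status: OK"
        if "self-assessment test result" in lowered or "smart health status" in lowered:
            if ":" not in line:
                continue
            verdict = line.split(":", 1)[1].strip().upper()
            if verdict in ("PASSED", "OK"):
                return DiskHealth.HEALTHY
            # Assumption: any other verdict means the disk should not be used.
            return DiskHealth.FAILING
    return DiskHealth.UNKNOWN


def check_disk(device: str, existing_check) -> bool:
    """Return True if the disk looks usable.

    `existing_check` is a hypothetical callable standing in for the current
    directory checks; it is always consulted unless smartctl reports failure.
    """
    if shutil.which("smartctl") is None:
        return existing_check(device)  # smartctl not installed: fall back
    try:
        result = subprocess.run(
            ["smartctl", "-H", device],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return existing_check(device)  # could not run smartctl: fall back
    if classify_smartctl_output(result.stdout) is DiskHealth.FAILING:
        return False
    # HEALTHY or UNKNOWN: smartctl only vetoes, it never replaces the checks.
    return existing_check(device)
```

In this sketch smartctl can only mark a disk as bad; a healthy or missing 
verdict still goes through the existing checks, so storage that smartctl does 
not cover behaves exactly as it does today.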

Today YARN will use disks with bad sectors in JBOD setups where smartctl would 
have identified the bad disks. Even worse, things like shuffle will sometimes 
hang on a disk read, leading to a big slowdown in job execution. Where explicit 
monitoring does not exist, the NM can take some proactive steps to detect bad 
disks.

> Use smartctl to determine health of disks
> -----------------------------------------
>
>                 Key: YARN-5274
>                 URL: https://issues.apache.org/jira/browse/YARN-5274
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Varun Vasudev
>
> It would be nice to add support for smartctl (on machines where it is 
> available) to determine disk health for the YARN local and log dirs (if 
> smartctl is applicable). The current disk checking mechanism misses out on 
> issues like bad sectors, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
