[
https://issues.apache.org/jira/browse/YARN-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345298#comment-15345298
]
Varun Vasudev commented on YARN-5274:
-------------------------------------
The node health script is meant for the health of the node. It can't mark a
single disk as bad. Just to clarify - I'm more interested in using the health
check that smartctl provides rather than the disk lifetime features.
The health test used to decide whether a disk is usable should be valid
whether the disk is an HDD or an SSD. We shouldn't use smartctl if it doesn't
apply to the storage in question; in that case we should fall back on the
existing checks.
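To make the "use smartctl where it applies, otherwise fall back" idea concrete,
here is a minimal sketch, not NodeManager code. It assumes the block device
behind a directory is already known (mapping a YARN local or log dir to a
device is not shown). The names DiskHealth, classify_smartctl_output and
check_disk are hypothetical. It relies on `smartctl -H`, on bit 3 of the
smartctl exit status (disk failing), and on the "test result: PASSED" (ATA) and
"SMART Health Status: OK" (SCSI) report lines. Treating any other status text
as a failure is an assumption.

```python
import shutil
import subprocess
from enum import Enum


class DiskHealth(Enum):
    HEALTHY = "healthy"
    FAILING = "failing"
    UNKNOWN = "unknown"  # smartctl gave no verdict; caller uses the existing checks


def classify_smartctl_output(returncode, stdout):
    """Interpret the exit status and text output of `smartctl -H <device>`."""
    # Bit 3 of the exit status means the SMART status check reported a failing disk.
    if returncode & 0x08:
        return DiskHealth.FAILING
    for line in stdout.splitlines():
        line = line.rstrip()
        # ATA devices report "... self-assessment test result: PASSED".
        if "test result:" in line:
            return DiskHealth.HEALTHY if line.endswith("PASSED") else DiskHealth.FAILING
        # SCSI devices report "SMART Health Status: OK".
        if "SMART Health Status:" in line:
            return DiskHealth.HEALTHY if line.endswith("OK") else DiskHealth.FAILING
    # No health line at all: SMART is unsupported or the device could not be opened.
    return DiskHealth.UNKNOWN


def check_disk(device, timeout_s=10.0):
    """Run smartctl if it is installed; return UNKNOWN when it does not apply."""
    if shutil.which("smartctl") is None:
        return DiskHealth.UNKNOWN
    try:
        proc = subprocess.run(
            ["smartctl", "-H", device],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # do not let a hung disk block the checker
        )
    except (OSError, subprocess.TimeoutExpired):
        return DiskHealth.UNKNOWN
    return classify_smartctl_output(proc.returncode, proc.stdout)
```

A caller would mark the directory bad only on DiskHealth.FAILING and would
treat DiskHealth.UNKNOWN as "run the existing directory checks", so machines
without smartctl behave as they do today.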
Today YARN will use disks with bad sectors in JBOD setups where smartctl would
have identified the bad disks. Even worse, things like shuffle will sometimes
hang on a disk read, leading to a big slowdown in job execution. Where explicit
monitoring does not exist, the NM can take some proactive steps to detect bad
disks.
> Use smartctl to determine health of disks
> -----------------------------------------
>
> Key: YARN-5274
> URL: https://issues.apache.org/jira/browse/YARN-5274
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Reporter: Varun Vasudev
>
> It would be nice to add support for smartctl (on machines where it is
> available) to determine disk health for the YARN local and log dirs (if
> smartctl is applicable). The current disk checking mechanism misses
> issues like bad sectors, etc.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)