Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/23030#discussion_r233946890
--- Diff:
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
---
@@ -598,13 +598,24 @@ private[yarn] class YarnAllocator(
(false, s"Container ${containerId}${onHostStr} was preempted.")
// Should probably still count memory exceeded exit codes towards task failures
case VMEM_EXCEEDED_EXIT_CODE =>
- (true, memLimitExceededLogMessage(
- completedContainer.getDiagnostics,
- VMEM_EXCEEDED_PATTERN))
+ val suggestion = if (conf.getBoolean(YarnConfiguration.NM_VMEM_CHECK_ENABLED,
+ YarnConfiguration.DEFAULT_NM_VMEM_CHECK_ENABLED)) {
+ s"Consider disabling ${YarnConfiguration.NM_VMEM_CHECK_ENABLED} because of YARN-4714"
+ } else {
+ ""
+ }
+ val matcher = VMEM_EXCEEDED_PATTERN.matcher(completedContainer.getDiagnostics)
+ val diag = if (matcher.find()) " " + matcher.group() + "." else ""
+ val message =
+ s"Container killed by YARN for exceeding virtual memory limits.$diag $suggestion."
+ (true, message)
case PMEM_EXCEEDED_EXIT_CODE =>
- (true, memLimitExceededLogMessage(
- completedContainer.getDiagnostics,
- PMEM_EXCEEDED_PATTERN))
+ val suggestion = s"Consider boosting ${EXECUTOR_MEMORY_OVERHEAD.key}"
--- End diff --
Why isn't boosting the memory overhead a suggestion in the vmem case too? It
can help, even if it's a sledgehammer. Otherwise, you're basically telling the
user in that case: "whatcha gonna do? it failed."
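
Something along these lines is what I have in mind. This is only a rough sketch, not the actual `YarnAllocator` code. The object and method names are made up for illustration, and the two key strings are hard-coded stand-ins (assumed values) for `YarnConfiguration.NM_VMEM_CHECK_ENABLED` and `EXECUTOR_MEMORY_OVERHEAD.key` so that it runs without Spark or YARN on the classpath:

```scala
object VmemMessageSketch {
  // Assumed stand-ins for YarnConfiguration.NM_VMEM_CHECK_ENABLED and
  // EXECUTOR_MEMORY_OVERHEAD.key; the real code should keep using the constants.
  val VmemCheckKey = "yarn.nodemanager.vmem-check-enabled"
  val MemoryOverheadKey = "spark.executor.memoryOverhead"

  // Hypothetical helper: builds the vmem-exceeded message so that the overhead
  // suggestion is always present, and the YARN-4714 hint is added only when
  // the vmem check is enabled.
  def vmemExceededMessage(diag: String, vmemCheckEnabled: Boolean): String = {
    val overhead = s"Consider boosting $MemoryOverheadKey"
    val suggestion =
      if (vmemCheckEnabled) {
        s"$overhead or disabling $VmemCheckKey because of YARN-4714"
      } else {
        overhead
      }
    s"Container killed by YARN for exceeding virtual memory limits.$diag $suggestion."
  }
}
```

That way the user gets at least one actionable hint whichever way the check is configured.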
---