Github user rekhajoshm commented on the pull request:
https://github.com/apache/spark/pull/7602#issuecomment-125819970
@andrewor14 thanks for checking. IMO the OOM can only happen if the system is running low on memory and/or the GC is spending over 98% of its time collecting while still recovering less than 2% of the heap (the JVM's GC overhead limit). That can happen when the GC reclaims memory but new objects keep getting created on the heap simultaneously; in our case, strings created in loops and recursive calls. I did test the DAG visualization on a few jobs and saw no OOM/heap issue. That said, whether this issue shows up varies with job complexity, machine configuration, and parallel jobs.
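
To illustrate the string-in-a-loop pattern I mean, here is a minimal sketch (illustrative only, not code from this patch):

```scala
// Hypothetical example: repeated concatenation allocates a new String
// per iteration, producing garbage the collector must reclaim while new
// objects keep arriving.
def buildLabel(parts: Seq[String]): String = {
  var label = ""
  for (p <- parts) label += p + ","   // each += allocates a fresh String
  label
}

// Reusing one mutable buffer keeps allocations roughly constant instead.
def buildLabelBuffered(parts: Seq[String]): String = {
  val sb = new StringBuilder
  for (p <- parts) sb.append(p).append(',')
  sb.toString
}
```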
While increasing -Xmx is a viable option, this patch aims to align with the best practice of allocating objects sparingly rather than depending entirely on the GC: even once de-scoped objects become eligible for collection, there is no guarantee they are reclaimed immediately. For just-in-case scenarios I had also added a SparkException to catch the OOM, since some users perceive a raw stacktrace as a system flaw that failed to anticipate a foreseeable concern; in any case, I agreed with @JoshRosen and removed the catch.
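
For reference, the -Xmx route mentioned above would just use the standard Spark memory options (not part of this patch), e.g.:

```sh
# Raise the driver heap instead of reducing allocations:
spark-submit --driver-memory 4g ...
# or equivalently in spark-defaults.conf:
# spark.driver.memory  4g
```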
Please review/approve, @andrewor14 @JoshRosen. Thanks.
