Github user aarondav commented on the pull request:
https://github.com/apache/spark/pull/7028#issuecomment-118447192
The problem with any approach that wraps the exception is that we no longer
throw an exception of the original type; instead we always throw a
SparkException (as in your PR). This could be considered an API-breaking
change, and one that would only surface at runtime.
The benefit of appending stack trace elements is that it can be applied to
any exception without affecting callers or the DAGScheduler.
I think that it is very intuitive to join the stacks before and after an
event loop, and it has the expected semantics of "code leaving the area I know
and entering into scary Spark internals". However, I agree that the fact that
we're joining a "user-readable" stack instead of the actual stack may be
confusing.
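For concreteness, here is a minimal sketch of what "appending stack trace elements" could look like (the helper name and the way the driver-side frames are captured are my own illustration, not code from this PR):
```scala
// Sketch only: mutate the remote exception's stack trace in place, so callers
// still catch the original exception type, with the driver-side frames
// appended after the executor-side frames.
def appendDriverFrames(remoteException: Throwable): Throwable = {
  // Frames of the thread currently handling the failure on the driver.
  val driverFrames = Thread.currentThread().getStackTrace
  remoteException.setStackTrace(remoteException.getStackTrace ++ driverFrames)
  remoteException
}
```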
A compromise would be to make the stack look like this:
```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.RuntimeException: uh-oh!
    at org.apache.spark.scheduler.DAGSchedulerSuite$$anonfun$37$$anonfun$38$$anonfun$apply$mcJ$sp$2.apply(DAGSchedulerSuite.scala:883)
    at org.apache.spark.scheduler.DAGSchedulerSuite$$anonfun$37$$anonfun$38$$anonfun$apply$mcJ$sp$2.apply(DAGSchedulerSuite.scala:883)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1627)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1095)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1095)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1774)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1774)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1294)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1285)
    at ===== DAGScheduler EventLoop Submission =====.()
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:558)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1741)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1759)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1774)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1788)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1095)
    ...
```
The difference from the current version is that this also includes the
Spark internals leading up to the actual EventLoop itself, which makes the
stack appear more natural (but also uglier due to the several indirections
through runJob). The other difference is that it would return a more intuitive
(but less useful) stack trace in situations where callSite is currently used in
Spark (such as when starting a new job in Spark Streaming).
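For illustration only, a rough sketch of how the splice and the separator frame above could be assembled (the names here are hypothetical; only the marker text comes from the example stack):
```scala
// Hypothetical splice for the compromise layout shown above: executor frames,
// then the DAGScheduler failure-handling frames, then a synthetic marker,
// then the frames recorded when the job was submitted to the event loop.
// An empty method/file name and a negative line number render as "=====...=====.()".
val eventLoopMarker = new StackTraceElement(
  "===== DAGScheduler EventLoop Submission =====", "", "", -1)

def spliceStacks(
    executorFrames: Array[StackTraceElement],
    schedulerFrames: Array[StackTraceElement],
    submissionFrames: Array[StackTraceElement]): Array[StackTraceElement] =
  executorFrames ++ schedulerFrames ++ Array(eventLoopMarker) ++ submissionFrames
```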
I believe this change would address your biggest concern (that the stack
trace becomes magical): as I said, I think this stack trace is actually
very readable, and I'm willing to trade off some usefulness for less magic
because it still gets us much farther than we are today.