GitHub user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-66868725
IMO, this patch still needs a lot of work before it will be ready to merge.
I'm not convinced that telling me which RDD referenced the unserializable
object, by itself, will be a helpful debugging tool. In many cases, it's
obvious which object is non-serializable: for instance, say that I try to
serialize a database connection pool instance. If it's an
explicitly-referenced user-created object, then it's usually not too hard to
find out the source of the reference. The hard cases are where implicit
references to non-serializable objects like SparkContext have been included in
the closure. In these cases, I might only have one RDD in my dependency chain
and still run into serialization issues, in which case I don't feel that this
patch's current approach will be very helpful to me for debugging. It would be
much more useful to print a chain of references to a non-serializable object of
the appropriate type.
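To make the "chain of references" idea concrete, here is a rough sketch in plain Java. It is not Spark code and not what this patch does. It walks an object graph with reflection, following the fields that default Java serialization would write, and reports the path from the root to the first object that does not implement `java.io.Serializable`. `SerializationPathFinder`, `Driver`, `AddOffset`, and `FakeContext` are hypothetical names made up for illustration. `FakeContext` stands in for something like SparkContext. The sketch assumes default serialization only: it ignores custom `writeObject`/`writeReplace` hooks, and it only looks inside JDK classes when they are collections, maps, or arrays.

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collection;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not a Spark API.
public class SerializationPathFinder {

    // Returns the reference chain from root to the first non-Serializable object, or null.
    public static List<String> findPath(Object root) {
        return search(root, "root", new IdentityHashMap<>());
    }

    private static List<String> search(Object obj, String edge, Map<Object, Boolean> visited) {
        if (obj == null || visited.containsKey(obj)) {
            return null; // nothing to write, or already explored (handles cycles)
        }
        visited.put(obj, Boolean.TRUE);
        String step = edge + ": " + obj.getClass().getName();
        if (!(obj instanceof Serializable)) {
            List<String> path = new ArrayList<>();
            path.add(step); // found the culprit; callers prepend their own steps
            return path;
        }
        List<String> sub = null;
        if (obj instanceof Collection) {
            int i = 0;
            for (Object elem : (Collection<?>) obj) {
                sub = search(elem, "[" + i++ + "]", visited);
                if (sub != null) break;
            }
        } else if (obj instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) obj).entrySet()) {
                sub = search(e.getKey(), "key", visited);
                if (sub == null) sub = search(e.getValue(), "value", visited);
                if (sub != null) break;
            }
        } else if (obj instanceof Object[]) {
            Object[] arr = (Object[]) obj;
            for (int i = 0; i < arr.length && sub == null; i++) {
                sub = search(arr[i], "[" + i + "]", visited);
            }
        } else if (!obj.getClass().isArray()) { // primitive arrays hold no references
            sub = searchFields(obj, visited);
        }
        if (sub != null) sub.add(0, step);
        return sub;
    }

    // Follows instance fields declared by non-JDK classes in the hierarchy.
    private static List<String> searchFields(Object obj, Map<Object, Boolean> visited) {
        for (Class<?> c = obj.getClass(); c != null && !isJdkClass(c); c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                int mods = f.getModifiers();
                // Static and transient fields are not written by default serialization.
                if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || f.getType().isPrimitive()) {
                    continue;
                }
                Object value;
                try {
                    f.setAccessible(true);
                    value = f.get(obj);
                } catch (ReflectiveOperationException | RuntimeException e) {
                    continue; // unreadable field (e.g. module encapsulation): skip it
                }
                List<String> sub = search(value, f.getName(), visited);
                if (sub != null) return sub;
            }
        }
        return null;
    }

    private static boolean isJdkClass(Class<?> c) {
        String n = c.getName();
        return n.startsWith("java.") || n.startsWith("javax.") || n.startsWith("jdk.") || n.startsWith("sun.");
    }

    // Hypothetical stand-in for a driver-only object such as SparkContext.
    public static class FakeContext { }

    // Hypothetical driver class whose inner class implicitly captures the enclosing instance.
    public static class Driver implements Serializable {
        FakeContext context = new FakeContext();
        int offset = 1; // non-final, so reading it needs the outer instance at runtime

        public class AddOffset implements Serializable {
            public int apply(int x) { return x + offset; }
        }
    }

    public static void main(String[] args) {
        Driver.AddOffset f = new Driver().new AddOffset();
        for (String step : findPath(f)) {
            System.out.println(step);
        }
    }
}
```

`AddOffset` never mentions `context`, yet the reported path runs through the compiler-generated outer-instance field (javac names these `this$N`) before reaching `FakeContext`. That is roughly analogous to a Scala closure capturing its enclosing object, the kind of implicit reference that is hard to spot from the RDD alone.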
Do you mind running some examples in the `spark-shell` and pasting the
output generated by this patch? This would help me and other reviewers to
assess whether this patch's current approach is useful.
There are also many code style issues here, but I don't want to spend too
much time commenting on them before we make sure that the high-level approach
is okay.
Other reviewers: please take a look at the JIRA and chime in here. Do you
think that this patch's current functionality is useful, or should we block /
wait in favor of a more full-featured solution? I think that we have plenty of
time before 1.3.0, so I'm in favor of taking more time to implement a more
full-featured approach since I don't think we're in a huge rush for this
feature.