Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/7192#issuecomment-118458771
@yhuai @andrewor14
Actually there is no problem in the current code. What I was trying to do is avoid unnecessary serialization: when we call an RDD API we pass in a function, and that function is serialized together with everything referenced by its closure and then shipped to the executor side.
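As a concrete illustration of that step (a minimal sketch, assuming an existing `SparkContext` named `sc`; the values are made up):

```scala
val threshold = 5
val rdd = sc.parallelize(1 to 10)
// The predicate is a closure; Spark serializes it together with `threshold`,
// which it references, and ships it to the executors that run the filter.
val bigOnes = rdd.filter(x => x > threshold)
```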
We build the SQL logical plan tree and the physical plan (`SparkPlan`) tree on the driver side, and execute the physical plan tree to get the result. Execution is just running some functions on RDDs, which means serializing those functions, along with everything they reference, and shipping them to the executor side. Logically speaking, we don't need to serialize and ship the whole physical plan tree to the executors, only what is actually used inside the closures, so we can serialize and ship less data, which I think is a good thing (see the sketch below).
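A minimal sketch of the pattern, not the actual PR diff; `FakePlan` and its fields are hypothetical stand-ins for a `SparkPlan` node:

```scala
class FakePlan(val threshold: Int, val child: Option[FakePlan]) extends Serializable {

  // Referencing the field directly inside the closure captures `this`,
  // so the whole plan tree (reachable via `child`) gets serialized.
  def executeNaive(rdd: org.apache.spark.rdd.RDD[Int]): org.apache.spark.rdd.RDD[Int] =
    rdd.filter(x => x > threshold)

  // Copying the field into a local val first means the closure captures only
  // that Int, so the rest of the plan tree never leaves the driver.
  def executeLean(rdd: org.apache.spark.rdd.RDD[Int]): org.apache.spark.rdd.RDD[Int] = {
    val localThreshold = threshold
    rdd.filter(x => x > localThreshold)
  }
}
```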
I'm not sure how much we can gain from this change; if it's not worth it, I'll close this PR.
cc @marmbrus