Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/7192#issuecomment-118458771
@yhuai @andrewor14
Actually there is no problem in the current code. What I was trying to do is avoid unnecessary serialization: when we call an RDD API we pass in a function, and that function is serialized together with everything referenced by its closure and then shipped to the executor side.
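As a concrete illustration of that step (a minimal sketch, assuming an existing `SparkContext` named `sc`; the values are made up):

```scala
val threshold = 5
val rdd = sc.parallelize(1 to 10)
// The predicate is a closure; Spark serializes it together with `threshold`,
// which it references, and ships it to the executors that run the filter.
val bigOnes = rdd.filter(x => x > threshold)
```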
We build the SQL logical plan tree and the physical plan (`SparkPlan`) tree on the driver side, and execute the physical plan tree to get the result. Execution is just running some functions on RDDs, which means serializing those functions, along with everything they reference, and shipping them to the executor side. Logically speaking, we don't need to serialize and ship the whole physical plan tree to the executors, only what is actually used inside the closures, so we can serialize and ship less data, which I think is a good thing (see the sketch below).
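A minimal sketch of the pattern, not the actual PR diff; `FakePlan` and its fields are hypothetical stand-ins for a `SparkPlan` node:

```scala
class FakePlan(val threshold: Int, val child: Option[FakePlan]) extends Serializable {

  // Referencing the field directly inside the closure captures `this`,
  // so the whole plan tree (reachable via `child`) gets serialized.
  def executeNaive(rdd: org.apache.spark.rdd.RDD[Int]): org.apache.spark.rdd.RDD[Int] =
    rdd.filter(x => x > threshold)

  // Copying the field into a local val first means the closure captures only
  // that Int, so the rest of the plan tree never leaves the driver.
  def executeLean(rdd: org.apache.spark.rdd.RDD[Int]): org.apache.spark.rdd.RDD[Int] = {
    val localThreshold = threshold
    rdd.filter(x => x > localThreshold)
  }
}
```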
I'm not sure how much we can gain from this change; if it's not worth it, I'll close this PR.
cc @marmbrus