Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/5247#issuecomment-90322411
  
    Thanks for working on this!  I have one issue with the current
    implementation.  In particular, it is essentially doing query planning
    inside of [[DataFrame]].  This is very close to the logic that already
    lives here:
    https://github.com/apache/spark/blob/a95043b1780bfde556db2dcc01511e40a12498dd/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L263
    
    I'd rather not spread this logic out over several different places.
    
    Really, it seems to me like `ExecutedCommand` is probably redundant with
    `LocalRelation`.  I haven't spent a lot of time looking at this, but it
    seems like things would be cleaner if we did the following:
    
     - A high-level trait called `Command`.  Anything that should be eagerly
    evaluated mixes in this trait.  We can then simplify the matching in
    `DataFrame` to only check whether something is a `Command` and, if so,
    call `queryExecution.executedPlan` to trigger eager execution.
     - Remove `ExecutedCommand` and do the conversion to `LocalRelation` in
    the query planner (rough sketch below).
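
    Concretely, something along these lines.  This is only a standalone
    sketch with simplified stand-in types; the real `LogicalPlan` / planner
    plumbing is omitted, and the `run()` hook is hypothetical:

```scala
// Standalone sketch only -- LogicalPlan, LocalRelation, Command, and run()
// below are simplified stand-ins, not the actual Catalyst classes.

trait LogicalPlan { def output: Seq[String] }

// 1. High-level marker trait: anything mixing this in is evaluated eagerly.
trait Command extends LogicalPlan {
  // Hypothetical hook: running the command yields its result rows directly.
  def run(): Seq[Seq[Any]]
}

// Stand-in for LocalRelation: an already-materialized local result.
case class LocalRelation(output: Seq[String], data: Seq[Seq[Any]])
  extends LogicalPlan

// 2. The query planner turns a Command straight into a LocalRelation, so no
//    separate ExecutedCommand wrapper is needed.
object CommandStrategy {
  def apply(plan: LogicalPlan): Option[LogicalPlan] = plan match {
    case c: Command => Some(LocalRelation(c.output, c.run()))
    case _          => None
  }
}

// 3. DataFrame only checks for the marker trait to trigger eager execution;
//    all planning details stay in the planner.
class DataFrame(plan: LogicalPlan) {
  val analyzed: LogicalPlan = plan match {
    case c: Command => CommandStrategy(c).get  // eagerly run commands
    case other      => other                   // everything else stays lazy
  }
}
```

    With the planner owning the conversion, the eager-execution check in
    `DataFrame` stays a one-line pattern match instead of duplicating
    planning logic.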
    
    What do you think?

