Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7116#discussion_r33608957
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -1434,7 +1434,9 @@ class DataFrame private[sql](
         }
       }
     
    -  private[sql] def internalRowRdd = queryExecution.executedPlan.execute()
    +  // Note that the RDD is memoized. Once called, it won't change even if 
you change any
    +  // query planning related Spark SQL configurations.
    +  private[sql] def internalRowRdd = queryExecution.toRdd
    --- End diff --
    
    I guess I'm saying that I don't like this change.  Its just giving a new 
name to a concept that developers are already familiar with.  And now its 
possible that people will still use `toRdd` on the `queryExecution` (since its 
still there).  So, more concepts to juggle and changes to long lived concepts 
does not seem worth saving 6 characters to me.  Lazy vals from lazy vals also 
seems like unnecessary complexity to me.
    
    IMHO, I think we should get rid of this new method instead.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to