Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/4434#issuecomment-103632877
I think that Spark SQL is perhaps somewhat misleadingly named (as I
discussed [at the last Spark
Summit](http://www.slideshare.net/databricks/spark-sqlsse2015public)). You can
always call `.rdd` on any DataFrame to get the underlying RDD if you don't want
to use the higher-level DataFrame/SQL operations.
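For example, in `spark-shell` (a minimal sketch assuming the Spark 1.4+ `DataFrameReader` API; `people.json` is a hypothetical newline-delimited JSON file with a `name` field):

```scala
// In spark-shell, sqlContext is predefined.
val df = sqlContext.read.json("people.json")

// .rdd exposes the DataFrame's contents as an RDD[Row] for lower-level work.
val names = df.rdd.map(_.getAs[String]("name"))
names.collect().foreach(println)
```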
The Data Sources API is the preferred way to read data in various
formats: it is more concise, it can perform optimizations like column pruning
automatically, and it works the same in Scala/Java/Python/R, obviating the need
for specific examples for every format/language combination.
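A hedged sketch of what that looks like, again assuming Spark 1.4+ in `spark-shell` and a hypothetical `users.parquet` file:

```scala
// format(...) selects the data source; the same call shape works for json,
// jdbc, or third-party sources, and identically from Java/Python/R.
val users = sqlContext.read.format("parquet").load("users.parquet")

// Selecting one column lets the planner prune the rest at read time
// (Parquet is columnar, so unneeded columns are never deserialized).
users.select("name").show()
```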
So, while I appreciate the work you have done here, I don't think it's worth
the maintenance burden to add this specific example. It would probably be
better as a gist or a blog post somewhere.