Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/146#issuecomment-37869149
  
    Thanks for all the feedback everyone!
    
    A few responses:
    * Regarding the `NotSerializableException`: you do need to mark the 
SQLContext as `@transient` to use it in the console.  I'm not sure what the 
best approach here is.  We could state this requirement explicitly in the 
docs, add the annotation automatically when initializing the REPL (though 
that won't fix the same problem for Hive), or try to make the context itself 
serializable.
    * The SQL parser is case-sensitive with respect to keywords, and AFAICT 
this is the case for token parsers built with parser combinators.  We could do 
something gross like preprocessing the query to uppercase keywords before 
handing it off to the combinators.  Really, long term, we should not be 
writing our own SQL parser, as this is not the only problem with the included 
one.  Using Optiq's parser might be a better option.
    * I changed `loadFile` to just be `parquetFile`, similar to `textFile` in 
the standard `SparkContext`.
    * I also added functions to `SQLContext` that allow you to use all of the 
functionality without requiring implicits (intended for use in Java / Python).  
I did not, however, remove the implicit on RDDs as I think this is a nice part 
of the Scala API.
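
    On the first point, the console workaround currently looks like this (a 
sketch only; it assumes the standard `sc` provided by the Spark shell):

```scala
// In the Spark REPL: mark the context @transient so closures that
// capture the enclosing REPL line objects don't try to serialize it.
@transient val sqlContext = new org.apache.spark.sql.SQLContext(sc)
```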
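
    On the parser point, one alternative to uppercasing the whole query is to 
match each keyword case-insensitively at the combinator level.  This is a 
sketch of that idea, not the code in this PR; `KeywordParser` and `keyword` 
are hypothetical names:

```scala
import scala.util.parsing.combinator.RegexParsers

// Sketch: build each keyword parser from a case-insensitive regex,
// normalizing the matched text to uppercase.
object KeywordParser extends RegexParsers {
  def keyword(k: String): Parser[String] = ("(?i)" + k).r ^^ (_.toUpperCase)
  def select: Parser[String] = keyword("SELECT")
}

// KeywordParser.parseAll(KeywordParser.select, "select") then accepts
// "select", "SELECT", "Select", etc.
```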
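
    For reference, the renamed loading API reads like `textFile` does on 
`SparkContext` (the path here is hypothetical):

```scala
// Load a Parquet file into a SchemaRDD via the renamed method.
val parquetData = sqlContext.parquetFile("path/to/data.parquet")
```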

