Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/146#issuecomment-38093365
Hey Michael, the new approach looks quite good to me. I noticed a few more
packaging changes that we should make, but maybe it's okay to push some of
these after merging the initial PR:
# There seem to be some examples in the core package (e.g.
http://people.apache.org/%7Epwendell/catalyst-docs-03-18/api/sql/core/#org.apache.spark.sql.examples.SchemaRddExample$)
-- these should go in `examples`
# The docs still say loadFile and writeToFile instead of parquetFile and
saveAsParquetFile, and don't show the new way of creating schema RDDs
# Some filenames don't match the class inside, e.g. SparkSQLContext. Some
are also lowercase, e.g. generators.scala -- if a file holds one class plus many
small subclasses, you can call it Generator.scala and keep the subclasses there.
Or move them to different files; it's not a big deal.
# The POMs say `<url>http://spark-project.org/</url>` instead of
`spark.apache.org` -- maybe this was copied from an old POM that is also wrong
# One important code style comment: we don't use relative package names in
Spark (e.g. `import org.apache.spark` followed by `import catalyst`). This
pattern appears in many files.
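To illustrate the style rule in that last point, here is a small sketch using a made-up `demo` package tree (none of these names come from the PR):

```scala
// Sketch of relative vs. fully qualified imports in a made-up package tree.
package demo {
  package util {
    object Helper { val answer = 42 }
  }
  package bad {
    // Relative import: `util` resolves against the enclosing `demo` package.
    // It compiles, but readers can't tell at a glance where Helper lives.
    import util.Helper
    object RelativeStyle { def value: Int = Helper.answer }
  }
  package good {
    // Fully qualified import, as Spark style requires. `_root_` pins the
    // path to the top-level package, ruling out any relative resolution.
    import _root_.demo.util.Helper
    object AbsoluteStyle { def value: Int = Helper.answer }
  }
}
```

Both forms behave the same; the absolute form just makes the origin of every imported name obvious when reading a file in isolation.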
Regarding the case-sensitive keywords, apparently you can use a regex
instead of a string literal to match them case-insensitively:
http://stackoverflow.com/questions/6080437/case-insensitive-scala-parser-combinator.
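A minimal sketch of that trick, assuming the `scala.util.parsing.combinator` library is available (the parser and names here are illustrative, not the actual Catalyst parser):

```scala
import scala.util.parsing.combinator.RegexParsers

// A "(?i)" prefix makes the pattern case-insensitive, so SELECT, select,
// and Select all match a single keyword production.
object KeywordParser extends RegexParsers {
  def SELECT: Parser[String] = "(?i)select".r ^^ (_.toUpperCase)

  // Returns true if the input parses as the SELECT keyword.
  def accepts(input: String): Boolean = parse(SELECT, input).successful
}
```

With a plain string literal (`"select"`) only that exact spelling would match; the regex handles every casing in one production without enumerating variants.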
Regarding the transient SQLContext in spark-shell, do you know what's
bringing it in? If it doesn't get used in the actual computation, maybe we can
just make it Serializable. I'm surprised this happens because SparkContext, for
example, is not Serializable and does not get pulled in.
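For reference, the other common fix is marking the field `@transient` so Java serialization skips it. A sketch with stand-in classes (`Context` plays the role of SQLContext; this is not the actual spark-shell code):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream,
  ObjectInputStream, ObjectOutputStream}

// Context stands in for SQLContext: it is not Serializable.
class Context

// Driver stands in for an object captured by a shipped closure.
class Driver extends Serializable {
  @transient val ctx = new Context // skipped by Java serialization
  val data = Seq(1, 2, 3)          // the part actually used in computation
}

// Round-trip through Java serialization, as happens when closures are shipped.
def roundTrip(d: Driver): Driver = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  out.writeObject(d)
  out.close()
  val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
  in.readObject().asInstanceOf[Driver]
}
```

Without `@transient`, `writeObject` would throw NotSerializableException because of `ctx`; with it, serialization succeeds and `ctx` simply comes back null on the other side, which is fine as long as it isn't used in the computation.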