Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/146#issuecomment-38127613
  
    Okay, I think I've addressed the most recent round of comments.  Thanks 
again for all the detailed feedback!
    
    A few notes:
     - The out-of-date URL was copied from GraphX; it's fixed, and I fixed all the other places it shows up in the repository.
     - I changed the use of relative imports in all the files that I edited, and @liancheng has already offered to update the rest of the code at some point.
     - I did a bunch of minor updates to the docs.  More things are private, and the examples should now copy/paste cleanly into the spark-shell and execute.
     - Regarding SQLContext serialization errors, I'm not sure why they happen, and they don't even seem to happen reliably.  I'd guess it has something to do with implicits.  Regardless, you are like the third person to have problems with this, so I just went ahead and made SQLContext serializable, marking all of its fields `@transient`.  I checked that this works in the spark-shell, even when the context is explicitly included in a closure.
     - For the SQLParser case sensitivity, that trick only works for RegexParsers, not TokenParsers.  It would be a lot of work (if even possible) to convert to a RegexParser.  I'd rather switch to a real parser down the road.
     - For the file names, I fixed the mentioned instances where the name did not match the class.  Regarding the use of lowercase filenames, I think this is a case where we unnecessarily deviate from the Scala style guide.  From a discussion with Patrick on this subject:
    
    > From the scala style guide:

    > > All multi-unit files should be given camelCase names with a lower-case first letter. This is a very important convention. It differentiates multi- from single-unit files, greatly easing the process of finding declarations. These filenames may be based upon a significant type which they contain (e.g. option.scala for the example above), or may be descriptive of the logical property shared by all units within (e.g. ast.scala).
    
    > I think when there are just support classes and one main parent class the Spark convention seems reasonable.  But due to all the tree objects in catalyst we often have groups of small, related classes in a single file, with no clear parent class.  So I think where we use the camel case convention it is materially different than in Spark.
    >
    > If you really think this is going to cause problems though I can change it :)
    
    
     - I also added a suite of tests that exercise the non-Hive SQL pathway.  This exposed an issue with self-joins of RDDs that is also now resolved.
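    For reference, the serialization approach described above — making the class `Serializable` while marking its fields `@transient` so they are skipped during serialization — can be sketched roughly like this (the class and field names here are illustrative, not the actual SQLContext internals):

```scala
import java.io._

// Sketch of the @transient-fields pattern: the object itself serializes,
// while the transient lazy val is skipped and rebuilt on first access
// after deserialization.
class ExampleContext extends Serializable {
  @transient lazy val parser: AnyRef = new AnyRef
}

object TransientDemo {
  // Serialize an object to bytes and read it back.
  def roundTrip[T <: Serializable](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    in.readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    val copy = roundTrip(new ExampleContext)
    // The transient lazy val is recomputed here rather than restored.
    println(copy.parser != null)
  }
}
```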
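    And for the case-sensitivity point: the RegexParsers-only trick mentioned above is the `(?i)` inline regex flag, which makes keyword matching case-insensitive at the lexical level.  A minimal sketch (illustrative names, not the Spark SQLParser; on newer Scala versions this needs the scala-parser-combinators module):

```scala
import scala.util.parsing.combinator.RegexParsers

object CaseInsensitiveKeywords extends RegexParsers {
  // Build a case-insensitive keyword parser: (?i) makes the regex
  // ignore case, and we normalize the matched text to upper case.
  def keyword(k: String): Parser[String] = ("(?i)" + k).r ^^ (_.toUpperCase)

  def select: Parser[String] = keyword("SELECT")
}
```

    A token-based parser (e.g. one built on StandardTokenParsers) registers keywords with the lexer up front, so this regex-level trick does not carry over directly — which is why converting would be substantial work.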
