Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/482#issuecomment-41753498
I think the problem with the previous implementation is that the change is much
larger and therefore harder to understand. (It took me a while to review it
and find bugs; there were quite a few, and I'm sure I didn't find them all.)
A lot of concepts _could_ be added to the expression hierarchy, but the
goal of Catalyst is instead to allow self-contained rules to perform
optimizations. By keeping these optimization rules self-contained, it is much
easier to look in one place and reason about their correctness. Making the
rule simple while pushing all of the complex logic into many different places in
the code is not the goal here. (A sketch of what such a rule looks like is below.)
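For concreteness, here is a minimal sketch of a self-contained null-propagation rule in the spirit of Catalyst's `NullPropagation`; the package paths follow the Spark 1.x layout (an assumption), and the rule body is deliberately simplified, covering far fewer cases than the real optimizer does:

```scala
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.catalyst.types.BooleanType

// A self-contained rule: all of the null-folding logic lives in one place,
// rather than being spread across the expression hierarchy.
object SimpleNullPropagation extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    // A null literal is always null, so IsNull folds to true.
    case IsNull(Literal(null, _)) => Literal(true, BooleanType)
    // Adding null to anything yields null; fold to a typed null literal.
    case e @ Add(Literal(null, _), _) => Literal(null, e.dataType)
    case e @ Add(_, Literal(null, _)) => Literal(null, e.dataType)
  }
}
```

Because the whole rule fits in one pattern match, reviewing it for correctness means reading this one object, not auditing every expression class for how it handles nulls.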
The only reason nullability is modeled in the expression hierarchy at all
is that it is a fairly fundamental concept: it is part of the schema of
the data.
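For instance, nullability surfaces directly on each field of a relation's schema (again using the Spark 1.x-era package path as an assumption):

```scala
import org.apache.spark.sql.catalyst.types._

// Nullability is carried per field in the schema itself, which is why the
// expression hierarchy has to model it at all.
val schema = StructType(Seq(
  StructField("id",   IntegerType, nullable = false),
  StructField("name", StringType,  nullable = true)))
```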
Regarding HiveGenericUdf, I'm not sure how the three-state solution allows
us to do any extra folding. The UDF is still a black box, and we have no idea
how it's going to respond to null inputs.
Regarding your last point about code generation, we don't need three
states. With the current two-state implementation, we already know whether we
need to track a nullable bit when doing code generation. If it is possible to
statically determine a value to be null, that will have already been done by
the optimization rules, and we will just have a null literal. All we need is a
special rule that generates the AST for `Literal(null, _)`. So, actually, I
think the code generation logic is simpler with this implementation.
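To illustrate, here is a hedged sketch of what that single special case might look like in a hand-rolled code generator; `GeneratedExpr` and `exprToCode` are hypothetical names for illustration only, not Spark's actual codegen API:

```scala
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}

// Hypothetical representation of generated code for one expression:
// a fragment computing the null bit, and a fragment computing the value.
case class GeneratedExpr(isNullCode: String, valueCode: String)

def exprToCode(e: Expression): GeneratedExpr = e match {
  // The one special case discussed above: a statically-null literal is
  // known null at compile time, so no runtime nullable bit is needed.
  case Literal(null, _) => GeneratedExpr(isNullCode = "true", valueCode = "null")
  case Literal(v, _)    => GeneratedExpr(isNullCode = "false", valueCode = v.toString)
  // ... all other expression kinds would be handled here ...
  case other => sys.error(s"Codegen not implemented for: $other")
}
```

Every other case can assume its children were already folded by the optimizer, which is why two states suffice.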