[GitHub] spark pull request: [Spark-1461] Deferred Expression Evaluation (s...

marmbrus Fri, 18 Apr 2014 16:53:25 -0700

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/446#issuecomment-40854164
  
    A few high-level comments:
     - I'm not sure stateful UDFs are something we actually want to support.
       Their semantics are not well defined in partitioned systems, especially
       where the optimizer decides the partitioning. If you want things like a
       row ID, there are already ways to do this, such as mapPartitionsWithIndex.
     - The deferred evaluation class seems like a complicated way to get
       short-circuit evaluation. In a lot of cases, can't we just change the
       order in which we call the existing eval method? Adding a new interface
       complicates things, and in some simple benchmarks that I ran this code
       is actually slower than what was there before (probably because of the
       extra object allocations).
     - There are also a lot of unrelated changes here. While fixing a minor
       spelling error or something is okay, making a whole bunch of unrelated
       changes makes reviewing the PR more difficult for us. For example, maybe
       you can do the data type additions for Hive UDFs in their own PR.
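
To illustrate the point about reordering eval calls, here is a minimal sketch of short-circuit evaluation without any new interface. Everything in it (`ShortCircuitSketch`, `Row` as a `Map`, `Literal`, `Counting`, and this `And`) is a hypothetical, simplified stand-in and not the actual Catalyst code; it only assumes an expression tree whose nodes expose an `eval(input)` method, and that `null` stands for SQL's unknown value.

```scala
// Hypothetical, simplified expression tree; NOT Spark's actual Catalyst classes.
object ShortCircuitSketch {
  // Assumption: a row is modelled as a plain map for illustration only.
  type Row = Map[String, Any]

  trait Expression {
    def eval(input: Row): Any
  }

  case class Literal(value: Any) extends Expression {
    def eval(input: Row): Any = value
  }

  // Wraps another expression and counts how often it is evaluated,
  // so we can observe whether short-circuiting actually happened.
  class Counting(child: Expression) extends Expression {
    var calls = 0
    def eval(input: Row): Any = { calls += 1; child.eval(input) }
  }

  // Short-circuit AND: the right child is evaluated only when the left
  // child did not already decide the result. null means "unknown".
  case class And(left: Expression, right: Expression) extends Expression {
    def eval(input: Row): Any = {
      val l = left.eval(input)
      if (l == false) {
        false // right.eval is never called on this path
      } else {
        val r = right.eval(input)
        if (r == false) false
        else if (l != null && r != null) true
        else null
      }
    }
  }
}
```

In this sketch the ordering alone provides the short circuit: no wrapper object is allocated per child, which is the allocation cost the comment above suspects in the deferred approach.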



