[GitHub] spark pull request: [SPARK-2393][SQL] Cost estimation optimization...

concretevitamin Thu, 10 Jul 2014 11:33:07 -0700

Github user concretevitamin commented on the pull request:

    https://github.com/apache/spark/pull/1238#issuecomment-48645064
  
    To handle potential overflow (one last TODO), I think there are a couple of alternatives:
    
    - A: Throw exceptions on overflowing operations, similar to [1].
    - B: Use [1], but replace overflowing results with a Top and a Bottom value that absorb/saturate subsequent operations accordingly. Similar concepts appear in [2].
    - C: Use [3] (or reimplement parts of it), which simply carries out an overflowing operation on its lifted BigInt counterpart.
    
    I think Approach A is bad: when dealing with big data we would almost certainly run into overflow at some point. Approach B is reasonable, since whenever we see a Top/Bottom we could just disable or special-case the cost estimation. Approach C looks okay too, but may be too heavy.
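A minimal sketch of how Approach B's special-casing might look, assuming a hypothetical `SizeEstimate` type (not Spark's actual statistics API). `Top` absorbs any further arithmetic, and the hypothetical `cheaperSide` hook returns `None` (that is, disables the cost-based choice) as soon as it sees `Top`. Because row counts and byte sizes are non-negative, only `Top` is modelled here and `Bottom` is omitted.

```scala
object SizeEstimates {
  sealed trait SizeEstimate {
    // Both operators absorb Top: once a size has saturated, anything derived
    // from it is treated as saturated too.
    def +(that: SizeEstimate): SizeEstimate = (this, that) match {
      case (Exact(a), Exact(b)) =>
        val sum = a + b
        // Both operands are non-negative, so a negative sum means the Long wrapped around.
        if (sum < 0) Top else Exact(sum)
      case _ => Top
    }

    def *(that: SizeEstimate): SizeEstimate = (this, that) match {
      case (Exact(a), Exact(b)) =>
        try Exact(Math.multiplyExact(a, b))
        catch { case _: ArithmeticException => Top }
      case _ => Top
    }
  }

  // A known size (e.g. a row count), assumed non-negative.
  final case class Exact(value: Long) extends SizeEstimate {
    require(value >= 0, "sizes must be non-negative")
  }

  // Saturated: the true size does not fit in a Long.
  case object Top extends SizeEstimate

  // Hypothetical planner hook: pick the cheaper of two alternatives by size,
  // or return None (skip the cost-based choice) if either side has saturated.
  def cheaperSide(left: SizeEstimate, right: SizeEstimate): Option[String] =
    (left, right) match {
      case (Exact(l), Exact(r)) => Some(if (l <= r) "left" else "right")
      case _                    => None
    }
}
```

Returning `Top` for `Top * Exact(0)` is a conservative choice in this sketch; a variant could return `Exact(0)` instead.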
    
    Let me know what you think should go into this PR.
    
    [1] https://github.com/twitter/util/blob/master/util-core/src/main/scala/com/twitter/util/LongOverflowArith.scala
    [2] https://github.com/twitter/util/blob/master/util-core/src/main/scala/com/twitter/util/Duration.scala
    [3] https://github.com/non/spire/blob/master/core/src/main/scala/spire/math/SafeLong.scala


