[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

datumbox Mon, 27 Feb 2017 07:44:14 -0800

Github user datumbox commented on the issue:

    https://github.com/apache/spark/pull/17059
  
    @srowen The following snippet handles Longs explicitly. It could be rewritten 
to remove the duplicated code by introducing boolean flags for overflow detection, 
but I don't think it is worth it. In theory you could also match other types 
explicitly, such as Byte and Short, but I think that's overkill.
    
    As far as I can tell, all SQL numeric types inherit from Number, so comparing 
their doubleValue with their intValue is enough to check that they hold a 
whole number within Int range.
    
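    ```scala
        import org.apache.spark.sql.functions.udf

        val u = udf { (n: Any) =>
          n match {
            // Already an Int: return it without any cast.
            case v: Int => v
            // Long: accept only if the value survives narrowing to Int.
            case v: Long =>
              val intV = v.intValue
              if (v == intV) {
                intV
              } else {
                throw new IllegalArgumentException("out of range")
              }
            //case v: Byte => v.toInt 
            //case v: Short => v.toInt
            // Any other Number (Double, Float, BigDecimal, ...): reject values
            // that are fractional or outside the Int range.
            case v: Number =>
              val intV = v.intValue
              if (v.doubleValue == intV) {
                intV
              } else {
                throw new IllegalArgumentException("out of range")
              }
            case _ => throw new IllegalArgumentException("invalid type")
          }
        }
    ```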
    
    Personally, I would remove the explicit Long case, as it introduces 
duplicate code and does not help much. The remaining snippet avoids any 
casting if the ID is an Int (which should be the majority of cases and yields 
the biggest memory/speed gains) or non-numeric, and handles all corner cases 
(all Scala/Java numeric types plus SQL numerics). Agree?



