GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/17684

    [SPARK-20341][SQL] Support BigInt's value that does not fit in long value range

    ## What changes were proposed in this pull request?
    
    This PR avoids an exception when `scala.math.BigInt` holds a value that does not fit into the long value range (e.g. `Long.MAX_VALUE + 1`). Running the following code with the current Spark throws the exception shown below.

    This PR keeps the value as a `BigDecimal` when such an overflow is detected by catching `ArithmeticException`.
    
    
    Sample program:
    ```
    case class BigIntWrapper(value: scala.math.BigInt)
    spark.createDataset(BigIntWrapper(scala.math.BigInt("10000000000000000002")) :: Nil).show
    ```
    Exception:
    ```
    Error while encoding: java.lang.ArithmeticException: BigInteger out of long range
    staticinvoke(class org.apache.spark.sql.types.Decimal$, DecimalType(38,0), apply, assertnotnull(assertnotnull(input[0, org.apache.spark.sql.BigIntWrapper, true])).value, true) AS value#0
    java.lang.RuntimeException: Error while encoding: java.lang.ArithmeticException: BigInteger out of long range
    staticinvoke(class org.apache.spark.sql.types.Decimal$, DecimalType(38,0), apply, assertnotnull(assertnotnull(input[0, org.apache.spark.sql.BigIntWrapper, true])).value, true) AS value#0
        at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:290)
        at org.apache.spark.sql.SparkSession$$anonfun$2.apply(SparkSession.scala:454)
        at org.apache.spark.sql.SparkSession$$anonfun$2.apply(SparkSession.scala:454)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.immutable.List.map(List.scala:285)
        at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:454)
        at org.apache.spark.sql.Agg$$anonfun$18.apply$mcV$sp(MySuite.scala:192)
        at org.apache.spark.sql.Agg$$anonfun$18.apply(MySuite.scala:192)
        at org.apache.spark.sql.Agg$$anonfun$18.apply(MySuite.scala:192)
        at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
        at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
        at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
        at org.scalatest.Transformer.apply(Transformer.scala:22)
        at org.scalatest.Transformer.apply(Transformer.scala:20)
        at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
        at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
        at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
        at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
        at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
        at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
        at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
    ...
    Caused by: java.lang.ArithmeticException: BigInteger out of long range
        at java.math.BigInteger.longValueExact(BigInteger.java:4531)
        at org.apache.spark.sql.types.Decimal.set(Decimal.scala:140)
        at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:434)
        at org.apache.spark.sql.types.Decimal.apply(Decimal.scala)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
        at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:287)
        ... 59 more
    ```
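    The fallback described in this PR, trying the fast long path first and keeping the value with full precision only when the conversion overflows, can be sketched in plain Java, since `longValueExact` is a `java.math.BigInteger` method (the helper name `toDecimalValue` is illustrative, not Spark's actual API):

    ```java
    import java.math.BigDecimal;
    import java.math.BigInteger;

    public class BigIntFallback {
        // Try the exact long conversion first; on overflow, fall back to
        // BigDecimal so no precision is lost (mirrors the pattern above).
        static Object toDecimalValue(BigInteger b) {
            try {
                return b.longValueExact(); // throws ArithmeticException if out of long range
            } catch (ArithmeticException e) {
                return new BigDecimal(b);
            }
        }

        public static void main(String[] args) {
            BigInteger fits = BigInteger.valueOf(Long.MAX_VALUE);
            BigInteger overflows = fits.add(BigInteger.ONE); // Long.MAX_VALUE + 1
            System.out.println(toDecimalValue(fits).getClass().getSimpleName());      // Long
            System.out.println(toDecimalValue(overflows).getClass().getSimpleName()); // BigDecimal
        }
    }
    ```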
    
    ## How was this patch tested?
    
    Added new test cases to `DecimalSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-20341

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17684.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17684
    
----
commit 9a15ba217ce442371db885e166f1e464d8eb9cfc
Author: Kazuaki Ishizaki <[email protected]>
Date:   2017-04-19T12:53:01Z

    initial commit

----

