dilipbiswal commented on a change in pull request #25331: [SPARK-27768][SQL]
Infinity, -Infinity, NaN should be recognized in a case insensitive manner.
URL: https://github.com/apache/spark/pull/25331#discussion_r311945691
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
##########
@@ -562,8 +593,12 @@ case class Cast(child: Expression, dataType: DataType,
timeZoneId: Option[String
// FloatConverter
private[this] def castToFloat(from: DataType): Any => Any = from match {
case StringType =>
- buildCast[UTF8String](_, s => try s.toString.toFloat catch {
- case _: NumberFormatException => null
+ buildCast[UTF8String](_, s => {
+ val floatStr = s.toString
+ try floatStr.toFloat catch {
+ case _: NumberFormatException =>
Review comment:
@maropu I ran the comparison. Our current implementation, which handles
these special values in the catch block, seems to perform better than handling
them in the main code path. The results are below. I didn't change the codegen
path, so let's ignore the second line (wholestage on) of each run. The first
set of numbers is for the run with the current code path. The second set is for
the run with these special values handled in the main code path.
```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.6
[info] Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
[info] cast from string to double:                 Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] cast from string to double wholestage off           5076          5394        450        0.0      50759.0      1.0X
[info] cast from string to double wholestage on            4709          4876        194        0.0      47093.1      1.1X
[info]
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.6
[info] Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
[info] cast from string to double:                 Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] cast from string to double wholestage off           5002          5040         53        0.0      50023.6      1.0X
[info] cast from string to double wholestage on            4926          4976         45        0.0      49258.8      1.0X
```
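For reference, here is a rough sketch of the two placements being compared. It
is not the code in the PR: `SpecialFloatCast`, `castInCatch`, `castInMainPath`
and `special` are hypothetical names, it returns `Option[Float]` instead of
going through `buildCast` with `UTF8String` and `null`, and the set of accepted
spellings (only `infinity`, `-infinity`, `nan`, case-insensitively) is an
assumption for illustration.

```scala
import java.util.Locale

// Hypothetical standalone sketch; not the actual Cast.scala implementation.
object SpecialFloatCast {

  // Case-insensitive lookup of the special values (assumed set of spellings).
  private def special(s: String): Option[Float] =
    s.trim.toLowerCase(Locale.ROOT) match {
      case "infinity"  => Some(Float.PositiveInfinity)
      case "-infinity" => Some(Float.NegativeInfinity)
      case "nan"       => Some(Float.NaN)
      case _           => None
    }

  // Placement 1: try the normal parse first and consult the special values
  // only when toFloat throws, so ordinary numbers never hit the extra check.
  def castInCatch(s: String): Option[Float] =
    try Some(s.toFloat) catch {
      case _: NumberFormatException => special(s)
    }

  // Placement 2: check the special values in the main path for every input,
  // and fall back to the normal parse only when none of them matches.
  def castInMainPath(s: String): Option[Float] =
    special(s).orElse {
      try Some(s.toFloat) catch {
        case _: NumberFormatException => None
      }
    }
}
```

Both variants should return the same values for any input. The difference is
only where the extra string comparison happens, which is what the two benchmark
runs above are trying to measure.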
Please let me know.