Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20024#discussion_r159589126
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -203,9 +203,26 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
   // UDFToString
   private[this] def castToString(from: DataType): Any => Any = from match {
     case BinaryType => buildCast[Array[Byte]](_, UTF8String.fromBytes)
+    case StringType => buildCast[UTF8String](_, identity)
     case DateType => buildCast[Int](_, d => UTF8String.fromString(DateTimeUtils.dateToString(d)))
--- End diff --
We may convert a string to `UTF8String` and then convert it back, which is
inefficient. I think we should create a special `StringBuilder` for
`UTF8String`, e.g.
```
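class UTF8StringBuilder {
  // Appends the UTF-8 bytes of str directly, without going through java.lang.String.
  public void append(UTF8String str) { /* copy str's bytes into the buffer */ }
}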
```
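To make the idea concrete, here is a minimal sketch of such a builder. It is only an illustration, not the proposed implementation: the class name `Utf8BytesBuilder` is hypothetical, and a plain `byte[]` stands in for `UTF8String` so the example compiles without Spark on the classpath. The point is that already-encoded bytes are copied as-is, so nothing is decoded to `java.lang.String` and re-encoded.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: a builder that accumulates UTF-8 bytes directly.
// A plain byte[] stands in for Spark's UTF8String (an assumption made
// only to keep this example self-contained).
class Utf8BytesBuilder {
  private byte[] buffer = new byte[16];
  private int length = 0;

  // Grow the backing array (at least doubling) until `extra` more bytes fit.
  private void ensureCapacity(int extra) {
    int needed = length + extra;
    if (needed > buffer.length) {
      buffer = Arrays.copyOf(buffer, Math.max(needed, buffer.length * 2));
    }
  }

  // Append already-encoded UTF-8 bytes: a raw copy, no decode/re-encode.
  public void append(byte[] utf8) {
    ensureCapacity(utf8.length);
    System.arraycopy(utf8, 0, buffer, length, utf8.length);
    length += utf8.length;
  }

  // Append a java.lang.String (e.g. a separator); it is encoded once on the way in.
  public void append(String s) {
    append(s.getBytes(StandardCharsets.UTF_8));
  }

  // Return a copy of the accumulated bytes, trimmed to the used length.
  public byte[] build() {
    return Arrays.copyOf(buffer, length);
  }
}
```

With something like this, a cast that concatenates many string elements would append each element's bytes directly and only materialize the final result once.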
---