MaxGekk commented on code in PR #44463:
URL: https://github.com/apache/spark/pull/44463#discussion_r1436048800
##########
sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala:
##########
@@ -619,6 +616,29 @@ trait SparkDateTimeUtils {
case NonFatal(_) => None
}
}
+
+ /**
+ * This method retrieves the start and end indices of a byte array after trimming
+ * any whitespace or ISO control characters.
+ * This way we can avoid allocating a new string with the trimAll method
+ * and just operate between the trimmed indices.
+ *
+ * @param bytes The byte array to be trimmed.
+ * @return A tuple of two integers: the start index (inclusive) and the end index (exclusive) of the trimmed range.
+ */
+ private def getTrimmedStartEnd(bytes: Array[Byte]): (Int, Int) = {
+ var (start, end) = (0, bytes.length - 1)
+
+ while (start < bytes.length && UTF8String.isWhitespaceOrISOControl(bytes(start))) {
+ start += 1
+ }
+
+ while (end > start && UTF8String.isWhitespaceOrISOControl(bytes(end))) {
+ end -= 1
+ }
+
+ (start, end + 1)
Review Comment:
Doesn't this create a `Tuple` instance? Is it possible to avoid it? For
example, define two separate `inline` functions:
- `getTrimmedStart(bytes: Array[Byte]): Int`
- `getTrimmedEnd(bytes: Array[Byte], start: Int): Int`
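A minimal sketch of what the two suggested functions might look like, assuming Scala 2, where `inline` is spelled as the `@inline` annotation and is only a hint to the optimizer. To keep it compilable without Spark, `isWhitespaceOrISOControl` below is a hypothetical local stand-in for `UTF8String.isWhitespaceOrISOControl`, and the object name `TrimIndicesSketch` is made up for illustration. Each function returns a single `Int`, so the caller obtains both indices without a `Tuple2` being built.

```scala
import java.nio.charset.StandardCharsets

object TrimIndicesSketch {

  // Hypothetical stand-in for UTF8String.isWhitespaceOrISOControl, so this
  // sketch compiles without Spark on the classpath. Negative bytes
  // (non-ASCII UTF-8 bytes) are treated as non-whitespace.
  @inline private def isWhitespaceOrISOControl(b: Byte): Boolean =
    Character.isWhitespace(b.toInt) || Character.isISOControl(b.toInt)

  // Index of the first byte that is neither whitespace nor an ISO control
  // character; equals bytes.length if every byte is trimmable.
  @inline def getTrimmedStart(bytes: Array[Byte]): Int = {
    var start = 0
    while (start < bytes.length && isWhitespaceOrISOControl(bytes(start))) {
      start += 1
    }
    start
  }

  // Exclusive end index of the trimmed range. The loop never moves past
  // `start`, so the result is always >= start (empty range when equal).
  @inline def getTrimmedEnd(bytes: Array[Byte], start: Int): Int = {
    var end = bytes.length - 1
    while (end > start && isWhitespaceOrISOControl(bytes(end))) {
      end -= 1
    }
    end + 1
  }

  def main(args: Array[String]): Unit = {
    val bytes = "  2023-12-25\t".getBytes(StandardCharsets.UTF_8)
    val start = getTrimmedStart(bytes)
    val end = getTrimmedEnd(bytes, start)
    // The caller then parses bytes(start) until bytes(end - 1) in place.
    println(s"$start $end") // prints "2 12"
  }
}
```

Passing `start` into `getTrimmedEnd` preserves the behavior of the original loop pair: an all-whitespace or empty input yields `start == end`. Whether the tuple allocation matters in practice (the JIT may eliminate it through escape analysis) would be worth confirming with a benchmark.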
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.