[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-05-12 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r871723562


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala:
##
@@ -168,3 +168,159 @@ case class TryToNumber(left: Expression, right: Expression)
       newRight: Expression): TryToNumber =
     copy(left = newLeft, right = newRight)
 }
+
+/**
+ * A function that converts decimal values to strings, returning NULL if the decimal value fails to
+ * match the format string.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(numberExpr, formatExpr) - Convert `numberExpr` to a string based on the `formatExpr`.
+      Throws an exception if the conversion fails. The format can consist of the following
+      characters, case insensitive:
+        '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format
+          string matches a sequence of digits in the input value, generating a result string of the
+          same length as the corresponding sequence in the format string. The result string is
+          left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of
+          the decimal value, starts with 0, and is before the decimal point. Otherwise, it is
+          padded with spaces.
+        '.' or 'D': Specifies the position of the decimal point (optional, only allowed once).
+        ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be
+          a 0 or 9 to the left and right of each grouping separator.
+        '$': Specifies the location of the $ currency sign. This character may only be specified
+          once.
+        'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at
+          the beginning or end of the format string). Note that 'S' prints '+' for positive values
+          but 'MI' prints a space.
+        'PR': Only allowed at the end of the format string; specifies that the result string will be
+          wrapped by angle brackets if the input value is negative.
+          ('<1>').
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(454, '999');
+       454
+      > SELECT _FUNC_(454.00, '000D00');
+       454.00
+      > SELECT _FUNC_(12454, '99G999');
+       12,454
+      > SELECT _FUNC_(78.12, '$99.99');
+       $78.12
+      > SELECT _FUNC_(-12454.8, '99G999D9S');
+       12,454.8-
+  """,
+  since = "3.3.0",

Review Comment:
   Done.
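
   For readers following the usage text quoted above, the sign tokens ('S', 'MI', 'PR') can be illustrated with a minimal sketch. This is not the code in the PR: `SignToken`, `SignTokenSketch`, and `render` are hypothetical names, and the sketch assumes the digits have already been formatted and that the sign token trails them.

```scala
// Hypothetical model of the sign tokens from the usage text; not Spark's implementation.
sealed trait SignToken
case object S extends SignToken  // prints '+' for non-negative input, '-' for negative input
case object MI extends SignToken // prints ' ' for non-negative input, '-' for negative input
case object PR extends SignToken // wraps negative input in angle brackets

object SignTokenSketch {
  // `digits` is the already formatted magnitude, e.g. "12,454.8".
  def render(digits: String, negative: Boolean, token: SignToken): String =
    token match {
      case S  => digits + (if (negative) "-" else "+")
      case MI => digits + (if (negative) "-" else " ")
      // Any padding applied to non-negative input is omitted in this sketch.
      case PR => if (negative) "<" + digits + ">" else digits
    }
}
```

   With the last SQL example from the usage text, `SignTokenSketch.render("12,454.8", negative = true, S)` gives `12,454.8-`.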



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-05-12 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r871723101


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ToNumberParser.scala:
##
@@ -599,4 +614,250 @@ class ToNumberParser(numberFormat: String, errorOnFail: Boolean) extends Seriali
       Decimal(javaDecimal, precision, scale)
     }
   }
+
+  /**
+   * Converts a decimal value to a string based on the given number format.
+   *
+   * Iterates through the [[formatTokens]] obtained from processing the format string, while also
+   * inspecting the input decimal value.
+   *
+   * @param input the decimal value that needs to be converted
+   * @return the result String value obtained from string formatting
+   */
+  def format(input: Decimal): UTF8String = {
+    val result = new StringBuilder()
+    // These are string representations of the input Decimal value.
+    val (inputBeforeDecimalPoint: String,
+      inputAfterDecimalPoint: String) =
+      formatSplitInputBeforeAndAfterDecimalPoint(input)
+    // These are indexes into the characters of the input string before and after the decimal point.
+    formattingBeforeDecimalPointIndex = 0
+    formattingAfterDecimalPointIndex = 0
+    var reachedDecimalPoint = false
+
+    // Iterate through the tokens representing the provided format string, in order.
+    for (formatToken: InputToken <- formatTokens) {
+      formatToken match {
+        case groups: DigitGroups =>
+          formatDigitGroups(
+            groups, inputBeforeDecimalPoint, inputAfterDecimalPoint, reachedDecimalPoint, result)
+        case DecimalPoint() =>
+          // If the last character so far is a space, change it to a zero. This means the input
+          // decimal does not have an integer part.
+          if (result.nonEmpty && result.last == SPACE) {
+            result(result.length - 1) = ZERO_DIGIT
+          }
+          result.append(POINT_SIGN)
+          reachedDecimalPoint = true
+        case DollarSign() =>
+          result.append(DOLLAR_SIGN)
+        case _: OptionalPlusOrMinusSign =>
+          stripTrailingLoneDecimalPoint(result)
+          if (input < Decimal.ZERO) {
+            addCharacterCheckingTrailingSpaces(result, MINUS_SIGN)
+          } else {
+            addCharacterCheckingTrailingSpaces(result, PLUS_SIGN)
+          }
+        case _: OptionalMinusSign =>
+          if (input < Decimal.ZERO) {
+            stripTrailingLoneDecimalPoint(result)
+            addCharacterCheckingTrailingSpaces(result, MINUS_SIGN)
+            // Add a second space to account for the "MI" sequence comprising two characters in the
+            // format string.
+            result.append(SPACE)
+          } else {
+            result.append(SPACE)
+            result.append(SPACE)
+          }
+        case OpeningAngleBracket() =>
+          if (input < Decimal.ZERO) {
+            result.append(ANGLE_BRACKET_OPEN)
+          }
+        case ClosingAngleBracket() =>
+          stripTrailingLoneDecimalPoint(result)
+          if (input < Decimal.ZERO) {
+            addCharacterCheckingTrailingSpaces(result, ANGLE_BRACKET_CLOSE)
+          } else {
+            result.append(SPACE)
+            result.append(SPACE)
+          }
+      }
+    }
+
+    if (formattingBeforeDecimalPointIndex < inputBeforeDecimalPoint.length ||
+      formattingAfterDecimalPointIndex < inputAfterDecimalPoint.length) {
+      // Remaining digits before or after the decimal point exist in the decimal value but not in
+      // the format string.
+      formatMatchFailure(input, numberFormat)
+    } else {
+      stripTrailingLoneDecimalPoint(result)
+      val str = result.toString
+      if (result.isEmpty || str == "+" || str == "-") {
+        UTF8String.fromString("0")
+      } else {
+        UTF8String.fromString(str)
+      }
+    }
+  }
+
+  /**
+   * Splits the provided Decimal value's string representation by the decimal point, if any.
+   * @param input the Decimal value to consume
+   * @return two strings representing the contents before and after the decimal point (if any)
+   */
+  private def formatSplitInputBeforeAndAfterDecimalPoint(input: Decimal): (String, String) = {
+    // Convert the input Decimal value to a string (without exponent notation).
+    val inputString = input.toJavaBigDecimal.toPlainString
+    // Split the digits before and after the decimal point.
+    val tokens: Array[String] = inputString.split(POINT_SIGN)
+    var beforeDecimalPoint: String = tokens(0)
+    var afterDecimalPoint: String = if (tokens.length > 1) tokens(1) else ""
+    // Strip any leading minus sign to consider the digits only.
+    // Strip leading and trailing zeros to match cases when the format string begins with a decimal
+    // point.
+    beforeDecimalPoint = beforeDecimalPoint.dropWhile(c => c == MINUS_SIGN || c == ZERO_DIGIT)
+    afterDecimalPoint = afterDecimalPoint.reverse.dropWhile(_ == ZERO_DIGIT).reverse
+
+   
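
The splitting step quoted above can be tried outside Spark with a minimal sketch. It assumes `java.math.BigDecimal` in place of Spark's `Decimal`, and literal characters in place of the `POINT_SIGN`, `MINUS_SIGN`, and `ZERO_DIGIT` constants (assumed here to be '.', '-', and '0'); `SplitSketch` and `split` are hypothetical names, not part of the PR.

```scala
import java.math.BigDecimal

object SplitSketch {
  // Splits the plain-string form of a decimal into the digits before and after the point.
  def split(input: BigDecimal): (String, String) = {
    // toPlainString avoids exponent notation such as "1E+3".
    val tokens: Array[String] = input.toPlainString.split('.')
    // Drop a leading minus sign and leading zeros from the integer part.
    val before = tokens(0).dropWhile(c => c == '-' || c == '0')
    // Drop trailing zeros from the fractional part, if there is one.
    val after =
      if (tokens.length > 1) tokens(1).reverse.dropWhile(_ == '0').reverse else ""
    (before, after)
  }
}
```

For example, `SplitSketch.split(new BigDecimal("-0.4542"))` returns an empty integer part and `"4542"` as the fractional part, which is the case the quoted comment describes for format strings that begin with a decimal point.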

[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-05-12 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r871722008


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ToNumberParser.scala:
##
@@ -599,4 +614,250 @@ class ToNumberParser(numberFormat: String, errorOnFail: Boolean) extends Seriali

[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-05-11 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r870575731


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala:
##
@@ -1108,6 +1125,366 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
 }
   }
 
+  test("ToCharacter: positive tests") {
+// Test '0' and '9'
+Seq(
+  (Decimal(454),
+"") ->
+" 454",
+  (Decimal(454),
+"9") ->
+"  454",
+  (Decimal(4),
+"0") ->
+"4",
+  (Decimal(45),
+"00") ->
+"45",
+  (Decimal(454),
+"000") ->
+"454",
+  (Decimal(454),
+"") ->
+"0454",
+  (Decimal(454),
+"0") ->
+"00454"
+).foreach { case ((decimal, format), expected) =>
+  var expr: Expression = ToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+}
+
+// Test '.' and 'D'
+Seq(
+  (Decimal(0.4542),
+".0") ->
+".4542 ",
+  (Decimal(454.2),
+"000.0") ->
+"454.2",
+  (Decimal(454),
+"000.0") ->
+"454  ",
+  (Decimal(454.2),
+"000.00") ->
+"454.2 ",
+  (Decimal(454),
+"000.00") ->
+"454   ",
+  (Decimal(0.4542),
+".") ->
+".4542",
+  (Decimal(4542),
+".") ->
+"4542 "
+).foreach { case ((decimal, format), expected) =>
+  val format2 = format.replace('.', 'D')
+  var expr: Expression = ToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = ToCharacter(Literal(decimal), Literal(format2))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format2))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+}
+
+Seq(
+  (Decimal(454.2),
+".00") ->
+"0454.2 ",
+  (Decimal(454),
+".00") ->
+"0454   ",
+  (Decimal(4542),
+"0.") ->
+"04542 ",
+  (Decimal(454.2),
+".99") ->
+" 454.2 ",
+  (Decimal(454),
+".99") ->
+" 454   ",
+  // There are no digits after the decimal point.
+  (Decimal(4542),
+"9.") ->
+" 4542 "
+).foreach { case ((decimal, format), expected) =>
+  var expr: Expression = ToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+}
+
+// Test ',' and 'G'
+Seq(
+  (Decimal(12454),
+"0,") ->
+"1,2454",
+  (Decimal(12454),
+"00,000") ->
+"12,454",
+  (Decimal(124543),
+"000,000") ->
+"124,543",
+  (Decimal(12),
+"000,000") ->
+"000,012",
+  (Decimal(1245436),
+"0,000,000") ->
+"1,245,436",
+  (Decimal(12454367),
+"00,000,000") ->
+"12,454,367"
+).foreach { case ((decimal, format), expected) =>
+  val format2 = format.replace(',', 'G')
+  var expr: Expression = ToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = ToCharacter(Literal(decimal), Literal(format2))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format2))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+}
+
+Seq(
+  (Decimal(12454),
+"000,000") ->
+"012,454",
+  (Decimal(12454),
+"00,") ->
+"01,2454",
+  (Decimal(12454),
+"000,") ->
+

[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-05-11 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r870553995


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala:
##
@@ -1108,6 +1125,366 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {

[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-05-11 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r870547232


##
sql/core/src/test/resources/sql-functions/sql-expression-schema.md:
##
@@ -1,4 +1,8 @@
 
+## Summary

Review Comment:
   Nice, thanks for fixing that! It will make future function updates easier.






[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-05-11 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r870545641


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala:
##
@@ -1108,6 +1125,366 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {

[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-05-10 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r869451006


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ToNumberParser.scala:
##
@@ -599,4 +614,254 @@ class ToNumberParser(numberFormat: String, errorOnFail: Boolean) extends Seriali
       Decimal(javaDecimal, precision, scale)
     }
   }
+
+  /**
+   * Converts a decimal value to a string based on the given number format.
+   *
+   * Iterates through the [[formatTokens]] obtained from processing the format string, while also
+   * inspecting the input decimal value.
+   *
+   * @param input the decimal value that needs to be converted
+   * @return the result String value obtained from string formatting
+   */
+  def format(input: Decimal): UTF8String = {
+    val result = new StringBuilder()
+    // These are string representations of the input Decimal value.
+    val (inputBeforeDecimalPoint: String,
+      inputAfterDecimalPoint: String) =
+      formatSplitInputBeforeAndAfterDecimalPoint(input)
+    // These are indexes into the characters of the input string before and after the decimal point.
+    formattingBeforeDecimalPointIndex = 0
+    formattingAfterDecimalPointIndex = 0
+    var reachedDecimalPoint = false
+
+    // Iterate through the tokens representing the provided format string, in order.
+    for (formatToken: InputToken <- formatTokens) {
+      formatToken match {
+        case groups: DigitGroups =>
+          formatDigitGroups(
+            groups, inputBeforeDecimalPoint, inputAfterDecimalPoint, reachedDecimalPoint, result)
+        case DecimalPoint() =>
+          // If the last character so far is a space, change it to a zero. This means the input
+          // decimal does not have an integer part.
+          if (result.nonEmpty && result.last == SPACE) {
+            result(result.length - 1) = ZERO_DIGIT
+          }
+          result.append(POINT_SIGN)
+          reachedDecimalPoint = true
+        case DollarSign() =>
+          result.append(DOLLAR_SIGN)
+        case _: OptionalPlusOrMinusSign =>
+          stripTrailingLoneDecimalPoint(result)
+          if (input < Decimal.ZERO) {
+            addCharacterCheckingTrailingSpaces(result, MINUS_SIGN)
+          } else {
+            addCharacterCheckingTrailingSpaces(result, PLUS_SIGN)
+          }
+        case _: OptionalMinusSign =>
+          if (input < Decimal.ZERO) {
+            stripTrailingLoneDecimalPoint(result)
+            addCharacterCheckingTrailingSpaces(result, MINUS_SIGN)
+            // Add a second space to account for the "MI" sequence comprising two characters in the
+            // format string.
+            result.append(SPACE)
+          } else {
+            result.append(SPACE)
+            result.append(SPACE)
+          }
+        case OpeningAngleBracket() =>
+          if (input < Decimal.ZERO) {
+            result.append(ANGLE_BRACKET_OPEN)
+          }
+        case ClosingAngleBracket() =>
+          stripTrailingLoneDecimalPoint(result)
+          if (input < Decimal.ZERO) {
+            addCharacterCheckingTrailingSpaces(result, ANGLE_BRACKET_CLOSE)
+          } else {
+            result.append(SPACE)
+            result.append(SPACE)
+          }
+      }
+    }
+
+    if (formattingBeforeDecimalPointIndex < inputBeforeDecimalPoint.length ||
+      formattingAfterDecimalPointIndex < inputAfterDecimalPoint.length) {
+      // Remaining digits before or after the decimal point exist in the decimal value but not in
+      // the format string.
+      formatMatchFailure(input, numberFormat)
+    } else {
+      stripTrailingLoneDecimalPoint(result)
+      if (result.isEmpty || result.toString == "+" || result.toString == "-") {
+        result.clear()
+        result.append('0')
+      }
+      UTF8String.fromString(result.toString())
+    }
+  }
+
+  /**
+   * Splits the provided Decimal value's string representation by the decimal 
point, if any.
+   * @param input the Decimal value to consume
+   * @return two strings representing the contents before and after the 
decimal point (if any)
+   */
+  private def formatSplitInputBeforeAndAfterDecimalPoint(input: Decimal): 
(String, String) = {
+// Convert the input Decimal value to a string (without exponent notation).
+val inputString = input.toJavaBigDecimal.toPlainString
+// Split the digits before and after the decimal point.
+val tokens: Array[String] = inputString.split(POINT_SIGN)
+var beforeDecimalPoint: String = tokens(0)
+var afterDecimalPoint: String = if (tokens.length > 1) tokens(1) else ""
+// Strip any leading minus sign to consider the digits only.
+// Strip leading and trailing zeros to match cases when the format string 
begins with a decimal
+// point.
+beforeDecimalPoint = beforeDecimalPoint.dropWhile(c => c == MINUS_SIGN || 
c == ZERO_DIGIT)
+afterDecimalPoint = afterDecimalPoint.reverse.dropWhile(_ == 
ZERO_DIGIT).reverse
+

[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-05-10 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r869446412


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ToNumberParser.scala:
##
@@ -599,4 +614,254 @@ class ToNumberParser(numberFormat: String, errorOnFail: Boolean) extends Seriali
   Decimal(javaDecimal, precision, scale)
 }
   }
+
+  /**
+   * Converts a decimal value to a string based on the given number format.
+   *
+   * Iterates through the [[formatTokens]] obtained from processing the format 
string, while also
+   * inspecting the input decimal value.
+   *
+   * @param input the decimal value that needs to be converted
+   * @return the result String value obtained from string formatting
+   */
+  def format(input: Decimal): UTF8String = {
+val result = new StringBuilder()
+// These are string representations of the input Decimal value.
+val (inputBeforeDecimalPoint: String,
+  inputAfterDecimalPoint: String) =
+  formatSplitInputBeforeAndAfterDecimalPoint(input)
+// These are indexes into the characters of the input string before and 
after the decimal point.
+formattingBeforeDecimalPointIndex = 0
+formattingAfterDecimalPointIndex = 0
+var reachedDecimalPoint = false
+
+// Iterate through the tokens representing the provided format string, in 
order.
+for (formatToken: InputToken <- formatTokens) {
+  formatToken match {
+case groups: DigitGroups =>
+  formatDigitGroups(
+groups, inputBeforeDecimalPoint, inputAfterDecimalPoint, 
reachedDecimalPoint, result)
+case DecimalPoint() =>
+  // If the last character so far is a space, change it to a zero. 
This means the input
+  // decimal does not have an integer part.
+  if (result.nonEmpty && result.last == SPACE) {
+result(result.length - 1) = ZERO_DIGIT
+  }
+  result.append(POINT_SIGN)
+  reachedDecimalPoint = true
+case DollarSign() =>
+  result.append(DOLLAR_SIGN)
+case _: OptionalPlusOrMinusSign =>
+  stripTrailingLoneDecimalPoint(result)
+  if (input < Decimal.ZERO) {
+addCharacterCheckingTrailingSpaces(result, MINUS_SIGN)
+  } else {
+addCharacterCheckingTrailingSpaces(result, PLUS_SIGN)
+  }
+case _: OptionalMinusSign =>
+  if (input < Decimal.ZERO) {
+stripTrailingLoneDecimalPoint(result)
+addCharacterCheckingTrailingSpaces(result, MINUS_SIGN)
+// Add a second space to account for the "MI" sequence comprising 
two characters in the
+// format string.
+result.append(SPACE)
+  } else {
+result.append(SPACE)
+result.append(SPACE)
+  }
+case OpeningAngleBracket() =>
+  if (input < Decimal.ZERO) {
+result.append(ANGLE_BRACKET_OPEN)
+  }
+case ClosingAngleBracket() =>
+  stripTrailingLoneDecimalPoint(result)
+  if (input < Decimal.ZERO) {
+addCharacterCheckingTrailingSpaces(result, ANGLE_BRACKET_CLOSE)
+  } else {
+result.append(SPACE)
+result.append(SPACE)
+  }
+  }
+}
+
+if (formattingBeforeDecimalPointIndex < inputBeforeDecimalPoint.length ||
+  formattingAfterDecimalPointIndex < inputAfterDecimalPoint.length) {
+  // Remaining digits before or after the decimal point exist in the 
decimal value but not in
+  // the format string.
+  formatMatchFailure(input, numberFormat)
+} else {
+  stripTrailingLoneDecimalPoint(result)
+  if (result.isEmpty || result.toString == "+" || result.toString == "-") {

Review Comment:
   Good idea, done. We can save two string copies that way.
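
   The thread does not quote the reviewer's suggestion, so the following is only a guess at its shape: a minimal sketch, assuming the idea was to inspect the `StringBuilder` directly instead of calling `result.toString` twice. The object and method names are hypothetical and do not come from the PR.

```scala
// Hypothetical sketch; not the code that was merged in the PR.
object LoneSignSketch {
  /**
   * Returns true if the builder is empty or holds only a single '+' or '-'.
   * It reads the builder's length and first character in place, so no
   * intermediate String copies are created.
   */
  def isEmptyOrLoneSign(result: StringBuilder): Boolean =
    result.isEmpty ||
      (result.length == 1 && (result.charAt(0) == '+' || result.charAt(0) == '-'))
}
```

   With such a helper, the condition `result.isEmpty || result.toString == "+" || result.toString == "-"` would reduce to one call that allocates nothing.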



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-05-10 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r869445021


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ToNumberParser.scala:
##
@@ -599,4 +614,254 @@ class ToNumberParser(numberFormat: String, errorOnFail: Boolean) extends Seriali
   Decimal(javaDecimal, precision, scale)
 }
   }
+
+  /**
+   * Converts a decimal value to a string based on the given number format.
+   *
+   * Iterates through the [[formatTokens]] obtained from processing the format 
string, while also
+   * inspecting the input decimal value.
+   *
+   * @param input the decimal value that needs to be converted
+   * @return the result String value obtained from string formatting
+   */
+  def format(input: Decimal): UTF8String = {
+val result = new StringBuilder()
+// These are string representations of the input Decimal value.
+val (inputBeforeDecimalPoint: String,
+  inputAfterDecimalPoint: String) =
+  formatSplitInputBeforeAndAfterDecimalPoint(input)
+// These are indexes into the characters of the input string before and 
after the decimal point.
+formattingBeforeDecimalPointIndex = 0
+formattingAfterDecimalPointIndex = 0
+var reachedDecimalPoint = false
+
+// Iterate through the tokens representing the provided format string, in 
order.
+for (formatToken: InputToken <- formatTokens) {
+  formatToken match {
+case groups: DigitGroups =>
+  formatDigitGroups(
+groups, inputBeforeDecimalPoint, inputAfterDecimalPoint, 
reachedDecimalPoint, result)
+case DecimalPoint() =>
+  // If the last character so far is a space, change it to a zero. 
This means the input
+  // decimal does not have an integer part.
+  if (result.nonEmpty && result.last == SPACE) {
+result(result.length - 1) = ZERO_DIGIT
+  }
+  result.append(POINT_SIGN)
+  reachedDecimalPoint = true
+case DollarSign() =>
+  result.append(DOLLAR_SIGN)

Review Comment:
   Currency characters (`$`) must appear before the decimal point in the format 
string; this is enforced by the `validateFormatString` method.
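
   To illustrate the rule stated above, here is a minimal sketch of such a check. It is not the actual `validateFormatString` implementation, which is not shown in this thread; the object name, method name, and error text are all hypothetical, and only the single rule about `$` placement is modeled.

```scala
// Hypothetical sketch; the real validateFormatString is not shown in this thread.
object DollarPlacementSketch {
  /**
   * Returns an error message if a '$' appears after the decimal point
   * ('.' or 'D', case insensitive) in the format string, else None.
   */
  def checkDollarBeforePoint(format: String): Option[String] = {
    val upper = format.toUpperCase
    // Position of the decimal point token, or -1 if the format has none.
    val pointIndex = upper.indexWhere(c => c == '.' || c == 'D')
    // Position of the first currency sign, or -1 if the format has none.
    val dollarIndex = upper.indexOf('$')
    if (pointIndex >= 0 && dollarIndex > pointIndex) {
      Some(s"'$$' must appear before the decimal point in format '$format'")
    } else {
      None
    }
  }
}
```

   Under this assumption, `checkDollarBeforePoint("$999.99")` returns `None` and `checkDollarBeforePoint("999.99$")` returns an error, so the `DollarSign()` case in `format` can append the sign unconditionally.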






[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-05-10 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r869442125


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ToNumberParser.scala:
##
@@ -599,4 +614,254 @@ class ToNumberParser(numberFormat: String, errorOnFail: Boolean) extends Seriali
   Decimal(javaDecimal, precision, scale)
 }
   }
+
+  /**
+   * Converts a decimal value to a string based on the given number format.
+   *
+   * Iterates through the [[formatTokens]] obtained from processing the format 
string, while also
+   * inspecting the input decimal value.
+   *
+   * @param input the decimal value that needs to be converted
+   * @return the result String value obtained from string formatting
+   */
+  def format(input: Decimal): UTF8String = {
+val result = new StringBuilder()
+// These are string representations of the input Decimal value.
+val (inputBeforeDecimalPoint: String,
+  inputAfterDecimalPoint: String) =
+  formatSplitInputBeforeAndAfterDecimalPoint(input)
+// These are indexes into the characters of the input string before and 
after the decimal point.
+formattingBeforeDecimalPointIndex = 0
+formattingAfterDecimalPointIndex = 0
+var reachedDecimalPoint = false
+
+// Iterate through the tokens representing the provided format string, in 
order.
+for (formatToken: InputToken <- formatTokens) {
+  formatToken match {
+case groups: DigitGroups =>
+  formatDigitGroups(
+groups, inputBeforeDecimalPoint, inputAfterDecimalPoint, 
reachedDecimalPoint, result)
+case DecimalPoint() =>
+  // If the last character so far is a space, change it to a zero. 
This means the input
+  // decimal does not have an integer part.
+  if (result.nonEmpty && result.last == SPACE) {
+result(result.length - 1) = ZERO_DIGIT
+  }
+  result.append(POINT_SIGN)
+  reachedDecimalPoint = true
+case DollarSign() =>
+  result.append(DOLLAR_SIGN)
+case _: OptionalPlusOrMinusSign =>
+  stripTrailingLoneDecimalPoint(result)
+  if (input < Decimal.ZERO) {
+addCharacterCheckingTrailingSpaces(result, MINUS_SIGN)
+  } else {
+addCharacterCheckingTrailingSpaces(result, PLUS_SIGN)
+  }
+case _: OptionalMinusSign =>
+  if (input < Decimal.ZERO) {
+stripTrailingLoneDecimalPoint(result)
+addCharacterCheckingTrailingSpaces(result, MINUS_SIGN)
+// Add a second space to account for the "MI" sequence comprising 
two characters in the
+// format string.
+result.append(SPACE)
+  } else {
+result.append(SPACE)
+result.append(SPACE)
+  }
+case OpeningAngleBracket() =>
+  if (input < Decimal.ZERO) {
+result.append(ANGLE_BRACKET_OPEN)
+  }
+case ClosingAngleBracket() =>
+  stripTrailingLoneDecimalPoint(result)
+  if (input < Decimal.ZERO) {
+addCharacterCheckingTrailingSpaces(result, ANGLE_BRACKET_CLOSE)
+  } else {
+result.append(SPACE)
+result.append(SPACE)
+  }
+  }
+}
+
+if (formattingBeforeDecimalPointIndex < inputBeforeDecimalPoint.length ||
+  formattingAfterDecimalPointIndex < inputAfterDecimalPoint.length) {
+  // Remaining digits before or after the decimal point exist in the 
decimal value but not in
+  // the format string.
+  formatMatchFailure(input, numberFormat)
+} else {
+  stripTrailingLoneDecimalPoint(result)
+  if (result.isEmpty || result.toString == "+" || result.toString == "-") {
+result.clear()
+result.append('0')
+  }
+  UTF8String.fromString(result.toString())
+}
+  }
+
+  /**
+   * Splits the provided Decimal value's string representation by the decimal 
point, if any.
+   * @param input the Decimal value to consume
+   * @return two strings representing the contents before and after the 
decimal point (if any)
+   */
+  private def formatSplitInputBeforeAndAfterDecimalPoint(input: Decimal): 
(String, String) = {
+// Convert the input Decimal value to a string (without exponent notation).
+val inputString = input.toJavaBigDecimal.toPlainString
+// Split the digits before and after the decimal point.
+val tokens: Array[String] = inputString.split(POINT_SIGN)
+var beforeDecimalPoint: String = tokens(0)
+var afterDecimalPoint: String = if (tokens.length > 1) tokens(1) else ""
+// Strip any leading minus sign to consider the digits only.
+// Strip leading and trailing zeros to match cases when the format string 
begins with a decimal
+// point.
+beforeDecimalPoint = beforeDecimalPoint.dropWhile(c => c == MINUS_SIGN || 
c == ZERO_DIGIT)
+afterDecimalPoint = afterDecimalPoint.reverse.dropWhile(_ == 
ZERO_DIGIT).reverse
+

[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-05-10 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r869441379


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ToNumberParser.scala:
##
@@ -599,4 +614,254 @@ class ToNumberParser(numberFormat: String, errorOnFail: Boolean) extends Seriali
   Decimal(javaDecimal, precision, scale)
 }
   }
+
+  /**
+   * Converts a decimal value to a string based on the given number format.
+   *
+   * Iterates through the [[formatTokens]] obtained from processing the format 
string, while also
+   * inspecting the input decimal value.
+   *
+   * @param input the decimal value that needs to be converted
+   * @return the result String value obtained from string formatting
+   */
+  def format(input: Decimal): UTF8String = {
+val result = new StringBuilder()
+// These are string representations of the input Decimal value.
+val (inputBeforeDecimalPoint: String,
+  inputAfterDecimalPoint: String) =
+  formatSplitInputBeforeAndAfterDecimalPoint(input)
+// These are indexes into the characters of the input string before and 
after the decimal point.
+formattingBeforeDecimalPointIndex = 0
+formattingAfterDecimalPointIndex = 0
+var reachedDecimalPoint = false
+
+// Iterate through the tokens representing the provided format string, in 
order.
+for (formatToken: InputToken <- formatTokens) {
+  formatToken match {
+case groups: DigitGroups =>
+  formatDigitGroups(
+groups, inputBeforeDecimalPoint, inputAfterDecimalPoint, 
reachedDecimalPoint, result)
+case DecimalPoint() =>
+  // If the last character so far is a space, change it to a zero. 
This means the input
+  // decimal does not have an integer part.
+  if (result.nonEmpty && result.last == SPACE) {
+result(result.length - 1) = ZERO_DIGIT
+  }
+  result.append(POINT_SIGN)
+  reachedDecimalPoint = true
+case DollarSign() =>
+  result.append(DOLLAR_SIGN)
+case _: OptionalPlusOrMinusSign =>
+  stripTrailingLoneDecimalPoint(result)
+  if (input < Decimal.ZERO) {
+addCharacterCheckingTrailingSpaces(result, MINUS_SIGN)
+  } else {
+addCharacterCheckingTrailingSpaces(result, PLUS_SIGN)
+  }
+case _: OptionalMinusSign =>
+  if (input < Decimal.ZERO) {
+stripTrailingLoneDecimalPoint(result)
+addCharacterCheckingTrailingSpaces(result, MINUS_SIGN)
+// Add a second space to account for the "MI" sequence comprising 
two characters in the
+// format string.
+result.append(SPACE)
+  } else {
+result.append(SPACE)
+result.append(SPACE)
+  }
+case OpeningAngleBracket() =>
+  if (input < Decimal.ZERO) {
+result.append(ANGLE_BRACKET_OPEN)
+  }
+case ClosingAngleBracket() =>
+  stripTrailingLoneDecimalPoint(result)
+  if (input < Decimal.ZERO) {
+addCharacterCheckingTrailingSpaces(result, ANGLE_BRACKET_CLOSE)
+  } else {
+result.append(SPACE)
+result.append(SPACE)
+  }
+  }
+}
+
+if (formattingBeforeDecimalPointIndex < inputBeforeDecimalPoint.length ||
+  formattingAfterDecimalPointIndex < inputAfterDecimalPoint.length) {
+  // Remaining digits before or after the decimal point exist in the 
decimal value but not in
+  // the format string.
+  formatMatchFailure(input, numberFormat)
+} else {
+  stripTrailingLoneDecimalPoint(result)
+  if (result.isEmpty || result.toString == "+" || result.toString == "-") {
+result.clear()
+result.append('0')
+  }
+  UTF8String.fromString(result.toString())
+}
+  }
+
+  /**
+   * Splits the provided Decimal value's string representation by the decimal 
point, if any.
+   * @param input the Decimal value to consume
+   * @return two strings representing the contents before and after the 
decimal point (if any)
+   */
+  private def formatSplitInputBeforeAndAfterDecimalPoint(input: Decimal): 
(String, String) = {
+// Convert the input Decimal value to a string (without exponent notation).
+val inputString = input.toJavaBigDecimal.toPlainString
+// Split the digits before and after the decimal point.
+val tokens: Array[String] = inputString.split(POINT_SIGN)
+var beforeDecimalPoint: String = tokens(0)
+var afterDecimalPoint: String = if (tokens.length > 1) tokens(1) else ""
+// Strip any leading minus sign to consider the digits only.
+// Strip leading and trailing zeros to match cases when the format string 
begins with a decimal
+// point.
+beforeDecimalPoint = beforeDecimalPoint.dropWhile(c => c == MINUS_SIGN || 
c == ZERO_DIGIT)
+afterDecimalPoint = afterDecimalPoint.reverse.dropWhile(_ == 
ZERO_DIGIT).reverse
+

[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-05-09 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r868397971


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala:
##
@@ -168,3 +168,157 @@ case class TryToNumber(left: Expression, right: Expression)
   newRight: Expression): TryToNumber =
 copy(left = newLeft, right = newRight)
 }
+
+/**
+ * A function that converts decimal values to strings, returning NULL if the decimal value fails to
+ * match the format string.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(numberExpr, formatExpr) - Convert `numberExpr` to a string based on the `formatExpr`.
+  Throws an exception if the conversion fails. The format can consist of the following
+  characters, case insensitive:
+'0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format
+  string matches a sequence of digits in the input value, generating a result string of the
+  same length as the corresponding sequence in the format string. The result string is
+  left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of
+  the decimal value, starts with 0, and is before the decimal point.
+'.' or 'D': Specifies the position of the decimal point (optional, only allowed once).
+',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be
+  a 0 or 9 to the left and right of each grouping separator.
+'$': Specifies the location of the $ currency sign. This character may only be specified
+  once.
+'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at
+  the beginning or end of the format string). Note that 'S' allows '-' but 'MI' does not.
+'PR': Only allowed at the end of the format string; specifies that 'expr' indicates a

Review Comment:
   Done.






[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-05-09 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r868397446


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala:
##
@@ -168,3 +168,157 @@ case class TryToNumber(left: Expression, right: Expression)
   newRight: Expression): TryToNumber =
 copy(left = newLeft, right = newRight)
 }
+
+/**
+ * A function that converts decimal values to strings, returning NULL if the decimal value fails to
+ * match the format string.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(numberExpr, formatExpr) - Convert `numberExpr` to a string based on the `formatExpr`.
+  Throws an exception if the conversion fails. The format can consist of the following
+  characters, case insensitive:
+'0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format
+  string matches a sequence of digits in the input value, generating a result string of the
+  same length as the corresponding sequence in the format string. The result string is
+  left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of
+  the decimal value, starts with 0, and is before the decimal point.
+'.' or 'D': Specifies the position of the decimal point (optional, only allowed once).
+',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be
+  a 0 or 9 to the left and right of each grouping separator.
+'$': Specifies the location of the $ currency sign. This character may only be specified
+  once.
+'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at
+  the beginning or end of the format string). Note that 'S' allows '-' but 'MI' does not.

Review Comment:
   Done.






[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-05-09 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r868396911


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala:
##
@@ -168,3 +168,157 @@ case class TryToNumber(left: Expression, right: Expression)
   newRight: Expression): TryToNumber =
 copy(left = newLeft, right = newRight)
 }
+
+/**
+ * A function that converts decimal values to strings, returning NULL if the decimal value fails to
+ * match the format string.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(numberExpr, formatExpr) - Convert `numberExpr` to a string based on the `formatExpr`.
+  Throws an exception if the conversion fails. The format can consist of the following
+  characters, case insensitive:
+'0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format
+  string matches a sequence of digits in the input value, generating a result string of the
+  same length as the corresponding sequence in the format string. The result string is
+  left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of
+  the decimal value, starts with 0, and is before the decimal point.

Review Comment:
   SG, done.
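
   The padding rule in the quoted doc text can be sketched as follows. This is a simplified illustration, assuming a single run of `0`/`9` characters before the decimal point and an input that is already a plain digit string; the helper name is hypothetical, and the sketch ignores grouping separators, signs, and the fractional part that the real formatter handles.

```scala
// Hypothetical sketch of the padding rule only; not the PR's formatDigitGroups.
object DigitPaddingSketch {
  /**
   * Left-pads `digits` to the width of `formatDigits` (a run of '0'/'9' characters).
   * Pads with zeros if the run starts with '0', otherwise with spaces.
   * Returns None if the value has more digits than the format provides.
   */
  def padIntegerDigits(digits: String, formatDigits: String): Option[String] = {
    if (digits.length > formatDigits.length) {
      None // more digits in the value than in the format: no match
    } else {
      val padChar = if (formatDigits.startsWith("0")) '0' else ' '
      Some(padChar.toString * (formatDigits.length - digits.length) + digits)
    }
  }
}
```

   For example, `padIntegerDigits("454", "0000")` yields `Some("0454")`, while `padIntegerDigits("454", "9999")` yields `Some(" 454")`.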






[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-04-28 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r861220757


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala:
##
@@ -1108,6 +1125,238 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
 }
   }
 
+  test("ToCharacter: positive tests") {
+// Test '0' and '9'
+Seq(
+  (Decimal(454), "") -> " 454",
+  (Decimal(454), "9") -> "  454",
+  (Decimal(4), "0") -> "4",
+  (Decimal(45), "00") -> "45",
+  (Decimal(454), "000") -> "454",
+  (Decimal(454), "") -> "0454",
+  (Decimal(454), "0") -> "00454"
+).foreach { case ((decimal, format), expected) =>
+  var expr: Expression = ToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+}
+
+// Test '.' and 'D'
+Seq(
+  (Decimal(0.4542), ".0") -> ".4542 ",
+  (Decimal(454.2), "000.0") -> "454.2",
+  (Decimal(454), "000.0") -> "454  ",
+  (Decimal(454.2), "000.00") -> "454.2 ",
+  (Decimal(454), "000.00") -> "454   ",
+  (Decimal(0.4542), ".") -> ".4542",
+  (Decimal(4542), ".") -> "4542 "
+).foreach { case ((decimal, format), expected) =>
+  val format2 = format.replace('.', 'D')
+  var expr: Expression = ToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = ToCharacter(Literal(decimal), Literal(format2))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format2))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+}
+
+Seq(
+  (Decimal(454.2), ".00") -> "0454.2 ",
+  (Decimal(454), ".00") -> "0454   ",
+  (Decimal(4542), "0.") -> "04542 ",
+  (Decimal(454.2), ".99") -> " 454.2 ",
+  (Decimal(454), ".99") -> " 454   ",
+  (Decimal(4542), "9.") -> " 4542 "
+).foreach { case ((decimal, format), expected) =>
+  var expr: Expression = ToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+}
+
+// Test ',' and 'G'
+Seq(
+  (Decimal(12454), "0,") -> "1,2454",
+  (Decimal(12454), "00,000") -> "12,454",
+  (Decimal(124543), "000,000") -> "124,543",
+  (Decimal(1245436), "0,000,000") -> "1,245,436",
+  (Decimal(12454367), "00,000,000") -> "12,454,367",
+  (Decimal(12454), "0,") -> "1,2454",
+  (Decimal(12454), "00,000") -> "12,454",
+  (Decimal(454367), "000,000") -> "454,367"
+).foreach { case ((decimal, format), expected) =>
+  val format2 = format.replace('0', '9')
+  val format3 = format.replace(',', 'G')
+  var expr: Expression = ToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = ToCharacter(Literal(decimal), Literal(format2))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = ToCharacter(Literal(decimal), Literal(format3))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format2))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format3))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+}
+
+Seq(
+  (Decimal(12454), "000,000") -> "012,454",
+  (Decimal(12454), "00,") -> "01,2454",
+  (Decimal(12454), "000,") -> "001,2454",
+  (Decimal(12454), ",") -> "0001,2454",
+   

[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-04-28 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r861219525


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala:
##
@@ -1108,6 +1125,238 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
 }
   }
 
+  test("ToCharacter: positive tests") {
+// Test '0' and '9'
+Seq(
+  (Decimal(454), "") -> " 454",
+  (Decimal(454), "9") -> "  454",
+  (Decimal(4), "0") -> "4",
+  (Decimal(45), "00") -> "45",
+  (Decimal(454), "000") -> "454",
+  (Decimal(454), "") -> "0454",
+  (Decimal(454), "0") -> "00454"
+).foreach { case ((decimal, format), expected) =>
+  var expr: Expression = ToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+}
+
+// Test '.' and 'D'
+Seq(
+  (Decimal(0.4542), ".0") -> ".4542 ",
+  (Decimal(454.2), "000.0") -> "454.2",
+  (Decimal(454), "000.0") -> "454  ",
+  (Decimal(454.2), "000.00") -> "454.2 ",
+  (Decimal(454), "000.00") -> "454   ",
+  (Decimal(0.4542), ".") -> ".4542",
+  (Decimal(4542), ".") -> "4542 "
+).foreach { case ((decimal, format), expected) =>
+  val format2 = format.replace('.', 'D')
+  var expr: Expression = ToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = ToCharacter(Literal(decimal), Literal(format2))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format2))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+}
+
+Seq(
+  (Decimal(454.2), ".00") -> "0454.2 ",
+  (Decimal(454), ".00") -> "0454   ",
+  (Decimal(4542), "0.") -> "04542 ",
+  (Decimal(454.2), ".99") -> " 454.2 ",
+  (Decimal(454), ".99") -> " 454   ",
+  (Decimal(4542), "9.") -> " 4542 "
+).foreach { case ((decimal, format), expected) =>
+  var expr: Expression = ToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+}
+
+// Test ',' and 'G'
+Seq(
+  (Decimal(12454), "0,") -> "1,2454",
+  (Decimal(12454), "00,000") -> "12,454",
+  (Decimal(124543), "000,000") -> "124,543",
+  (Decimal(1245436), "0,000,000") -> "1,245,436",
+  (Decimal(12454367), "00,000,000") -> "12,454,367",
+  (Decimal(12454), "0,") -> "1,2454",
+  (Decimal(12454), "00,000") -> "12,454",
+  (Decimal(454367), "000,000") -> "454,367"
+).foreach { case ((decimal, format), expected) =>
+  val format2 = format.replace('0', '9')
+  val format3 = format.replace(',', 'G')
+  var expr: Expression = ToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = ToCharacter(Literal(decimal), Literal(format2))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = ToCharacter(Literal(decimal), Literal(format3))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format2))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format3))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+}
+
+Seq(
+  (Decimal(12454), "000,000") -> "012,454",
+  (Decimal(12454), "00,") -> "01,2454",
+  (Decimal(12454), "000,") -> "001,2454",
+  (Decimal(12454), ",") -> "0001,2454",
+   

[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-04-28 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r861218799


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala:
##
@@ -1108,6 +1125,238 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
 }
   }
 
+  test("ToCharacter: positive tests") {
+// Test '0' and '9'
+Seq(
+  (Decimal(454), "") -> " 454",
+  (Decimal(454), "9") -> "  454",
+  (Decimal(4), "0") -> "4",
+  (Decimal(45), "00") -> "45",
+  (Decimal(454), "000") -> "454",
+  (Decimal(454), "") -> "0454",
+  (Decimal(454), "0") -> "00454"
+).foreach { case ((decimal, format), expected) =>
+  var expr: Expression = ToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+}
+
+// Test '.' and 'D'
+Seq(
+  (Decimal(0.4542), ".0") -> ".4542 ",
+  (Decimal(454.2), "000.0") -> "454.2",
+  (Decimal(454), "000.0") -> "454  ",
+  (Decimal(454.2), "000.00") -> "454.2 ",
+  (Decimal(454), "000.00") -> "454   ",
+  (Decimal(0.4542), ".") -> ".4542",
+  (Decimal(4542), ".") -> "4542 "
+).foreach { case ((decimal, format), expected) =>
+  val format2 = format.replace('.', 'D')
+  var expr: Expression = ToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = ToCharacter(Literal(decimal), Literal(format2))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format2))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+}
+
+Seq(
+  (Decimal(454.2), ".00") -> "0454.2 ",
+  (Decimal(454), ".00") -> "0454   ",
+  (Decimal(4542), "0.") -> "04542 ",
+  (Decimal(454.2), ".99") -> " 454.2 ",
+  (Decimal(454), ".99") -> " 454   ",
+  (Decimal(4542), "9.") -> " 4542 "
+).foreach { case ((decimal, format), expected) =>
+  var expr: Expression = ToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+}
+
+// Test ',' and 'G'
+Seq(
+  (Decimal(12454), "0,") -> "1,2454",
+  (Decimal(12454), "00,000") -> "12,454",
+  (Decimal(124543), "000,000") -> "124,543",
+  (Decimal(1245436), "0,000,000") -> "1,245,436",
+  (Decimal(12454367), "00,000,000") -> "12,454,367",
+  (Decimal(12454), "0,") -> "1,2454",
+  (Decimal(12454), "00,000") -> "12,454",

Review Comment:
   You're right. Removed the redundant test cases.
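
One way to spot repeated entries like these in a table-driven test is to group the table by its full entry. The sketch below is only an illustration: `duplicateCases` is a hypothetical helper, not something in the PR, and plain strings stand in for the `Decimal` inputs so that the snippet runs without Spark.

```scala
// Hypothetical helper (not part of the PR): returns every entry that occurs
// more than once in a table-driven test sequence.
def duplicateCases[A](cases: Seq[A]): Set[A] =
  cases
    .groupBy(identity)
    .collect { case (entry, occurrences) if occurrences.size > 1 => entry }
    .toSet

// Strings stand in for Decimal values to keep the example self-contained.
val groupingCases = Seq(
  ("12454", "00,000") -> "12,454",
  ("124543", "000,000") -> "124,543",
  ("12454", "00,000") -> "12,454"
)

val repeated = duplicateCases(groupingCases)
```

Here `repeated` contains only the `("12454", "00,000") -> "12,454"` entry, which is the one listed twice.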



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-04-28 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r861217041


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala:
##
@@ -1108,6 +1125,238 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
 }
   }
 
+  test("ToCharacter: positive tests") {
+// Test '0' and '9'
+Seq(
+  (Decimal(454), "") -> " 454",
+  (Decimal(454), "9") -> "  454",
+  (Decimal(4), "0") -> "4",
+  (Decimal(45), "00") -> "45",
+  (Decimal(454), "000") -> "454",
+  (Decimal(454), "") -> "0454",
+  (Decimal(454), "0") -> "00454"
+).foreach { case ((decimal, format), expected) =>
+  var expr: Expression = ToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+}
+
+// Test '.' and 'D'
+Seq(
+  (Decimal(0.4542), ".0") -> ".4542 ",
+  (Decimal(454.2), "000.0") -> "454.2",
+  (Decimal(454), "000.0") -> "454  ",
+  (Decimal(454.2), "000.00") -> "454.2 ",
+  (Decimal(454), "000.00") -> "454   ",
+  (Decimal(0.4542), ".") -> ".4542",
+  (Decimal(4542), ".") -> "4542 "
+).foreach { case ((decimal, format), expected) =>
+  val format2 = format.replace('.', 'D')
+  var expr: Expression = ToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = ToCharacter(Literal(decimal), Literal(format2))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format2))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+}
+
+Seq(
+  (Decimal(454.2), ".00") -> "0454.2 ",
+  (Decimal(454), ".00") -> "0454   ",
+  (Decimal(4542), "0.") -> "04542 ",
+  (Decimal(454.2), ".99") -> " 454.2 ",
+  (Decimal(454), ".99") -> " 454   ",
+  (Decimal(4542), "9.") -> " 4542 "

Review Comment:
   There are no digits (0/9) after the decimal point.
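
To make the observation concrete, here is a small sketch that counts the digit placeholders on each side of the decimal point in a format string. `countFormatDigits` is a hypothetical helper written for illustration; it is not a function from `ToNumberParser`, and it assumes only the `0`, `9`, `.` and `D` characters described in the function's usage text.

```scala
// Hypothetical helper (not part of the PR): counts the '0'/'9' placeholders
// before and after the decimal point ('.' or 'D', case insensitive).
def countFormatDigits(format: String): (Int, Int) = {
  val upper = format.toUpperCase
  val pointIndex = upper.indexWhere(c => c == '.' || c == 'D')
  // With no decimal point, every placeholder counts as "before".
  val (before, after) =
    if (pointIndex < 0) (upper, "")
    else (upper.take(pointIndex), upper.drop(pointIndex + 1))
  def digits(s: String): Int = s.count(c => c == '0' || c == '9')
  (digits(before), digits(after))
}
```

For a format such as `"000."` the second component is zero, which is the situation described above: no `0`/`9` follows the decimal point.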






[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-04-28 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r861215433


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ToNumberParser.scala:
##
@@ -599,4 +617,240 @@ class ToNumberParser(numberFormat: String, errorOnFail: Boolean) extends Seriali
   Decimal(javaDecimal, precision, scale)
 }
   }
+
+  /**
+   * Converts a decimal value to a string based on the given number format.
+   *
+   * Iterates through the [[formatTokens]] obtained from processing the format string, while also
+   * inspecting the input decimal value.
+   *
+   * @param input the decimal value that needs to be converted
+   * @return the result String value obtained from string formatting
+   */
+  def format(input: Decimal): UTF8String = {
+val result = new StringBuilder()
+// These are string representations of the input Decimal value.
+val (inputBeforeDecimalPoint: String,
+  inputAfterDecimalPoint: String) =
+  formatSplitInputBeforeAndAfterDecimalPoint(input).getOrElse(
+return formatMatchFailure(input, numberFormat))
+// These are indexes into the characters of the input string before and after the decimal point.
+formattingBeforeDecimalPointIndex = 0
+formattingAfterDecimalPointIndex = 0
+var reachedDecimalPoint = false
+
+// Iterate through the tokens representing the provided format string, in order.
+for (formatToken: InputToken <- formatTokens) {
+  formatToken match {
+case groups: DigitGroups =>
+  formatDigitGroups(
+groups, inputBeforeDecimalPoint, inputAfterDecimalPoint, reachedDecimalPoint, result)
+case DecimalPoint() =>
+  // If the last character so far is a space, change it to a zero.
+  if (result.nonEmpty && result.last == SPACE) {
+result(result.length - 1) = ZERO_DIGIT
+  }
+  result.append(POINT_SIGN)
+  reachedDecimalPoint = true
+case DollarSign() =>
+  result.append(DOLLAR_SIGN)
+case _: OptionalPlusOrMinusSign | _: OptionalMinusSign =>

Review Comment:
   SG, done.
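
For reference, the usage text for this function states that 'S' prints '+' for positive values whereas 'MI' prints a space. A minimal sketch of that distinction follows. The `SignToken` types and `signCharacter` are hypothetical stand-ins for illustration, not the PR's `OptionalPlusOrMinusSign` and `OptionalMinusSign` classes, and treating zero like a positive value is an assumption.

```scala
// Hypothetical stand-ins for the two sign tokens (not the PR's classes).
sealed trait SignToken
case object PlusOrMinus extends SignToken // stands in for 'S'
case object MinusOnly extends SignToken   // stands in for 'MI'

// Picks the character to emit for a sign token.
// Assumption: any non-negative value is handled like a positive one.
def signCharacter(token: SignToken, isNegative: Boolean): Char =
  (token, isNegative) match {
    case (_, true)            => '-'
    case (PlusOrMinus, false) => '+'
    case (MinusOnly, false)   => ' '
  }
```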






[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-04-28 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r861186496


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala:
##
@@ -1108,6 +1125,238 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
 }
   }
 
+  test("ToCharacter: positive tests") {
+// Test '0' and '9'
+Seq(
+  (Decimal(454), "") -> " 454",
+  (Decimal(454), "9") -> "  454",
+  (Decimal(4), "0") -> "4",
+  (Decimal(45), "00") -> "45",
+  (Decimal(454), "000") -> "454",
+  (Decimal(454), "") -> "0454",
+  (Decimal(454), "0") -> "00454"
+).foreach { case ((decimal, format), expected) =>
+  var expr: Expression = ToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+
+  expr = TryToCharacter(Literal(decimal), Literal(format))
+  assert(expr.checkInputDataTypes() == TypeCheckResult.TypeCheckSuccess)
+  checkEvaluation(expr, expected)
+}
+
+// Test '.' and 'D'
+Seq(
+  (Decimal(0.4542), ".0") -> ".4542 ",
+  (Decimal(454.2), "000.0") -> "454.2",
+  (Decimal(454), "000.0") -> "454  ",

Review Comment:
   Per my reply elsewhere, I edited the string formatting to skip these extra spaces.






[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-04-28 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r861186164


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ToNumberParser.scala:
##
@@ -599,4 +617,240 @@ class ToNumberParser(numberFormat: String, errorOnFail: Boolean) extends Seriali
   Decimal(javaDecimal, precision, scale)
 }
   }
+
+  /**
+   * Converts a decimal value to a string based on the given number format.
+   *
+   * Iterates through the [[formatTokens]] obtained from processing the format string, while also
+   * inspecting the input decimal value.
+   *
+   * @param input the decimal value that needs to be converted
+   * @return the result String value obtained from string formatting
+   */
+  def format(input: Decimal): UTF8String = {
+val result = new StringBuilder()
+// These are string representations of the input Decimal value.
+val (inputBeforeDecimalPoint: String,
+  inputAfterDecimalPoint: String) =
+  formatSplitInputBeforeAndAfterDecimalPoint(input).getOrElse(
+return formatMatchFailure(input, numberFormat))
+// These are indexes into the characters of the input string before and after the decimal point.
+formattingBeforeDecimalPointIndex = 0
+formattingAfterDecimalPointIndex = 0
+var reachedDecimalPoint = false
+
+// Iterate through the tokens representing the provided format string, in order.
+for (formatToken: InputToken <- formatTokens) {
+  formatToken match {
+case groups: DigitGroups =>
+  formatDigitGroups(
+groups, inputBeforeDecimalPoint, inputAfterDecimalPoint, reachedDecimalPoint, result)
+case DecimalPoint() =>
+  // If the last character so far is a space, change it to a zero.

Review Comment:
   Sounds good, done.
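
The step under discussion, as it appears in the quoted hunk, can be isolated into a small standalone sketch. `appendDecimalPoint` is a hypothetical name, and literal characters replace the `SPACE`, `ZERO_DIGIT` and `POINT_SIGN` constants; this mirrors the version of the code shown in the diff, which may differ from what was merged.

```scala
// Hypothetical standalone version of the DecimalPoint() branch in the hunk:
// a space directly before the decimal point becomes a zero, then '.' is appended.
def appendDecimalPoint(result: StringBuilder): Unit = {
  if (result.nonEmpty && result.last == ' ') {
    result(result.length - 1) = '0'
  }
  result.append('.')
}

val padded = new StringBuilder("  ")
appendDecimalPoint(padded)

val digits = new StringBuilder("45")
appendDecimalPoint(digits)
```

After these calls `padded` holds a space followed by `0.`, while `digits` holds `45.` because its last character was not a space.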






[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-04-28 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r861181335


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ToNumberParser.scala:
##
@@ -599,4 +617,240 @@ class ToNumberParser(numberFormat: String, errorOnFail: Boolean) extends Seriali
   Decimal(javaDecimal, precision, scale)
 }
   }
+
+  /**
+   * Converts a decimal value to a string based on the given number format.
+   *
+   * Iterates through the [[formatTokens]] obtained from processing the format string, while also
+   * inspecting the input decimal value.
+   *
+   * @param input the decimal value that needs to be converted
+   * @return the result String value obtained from string formatting
+   */
+  def format(input: Decimal): UTF8String = {
+val result = new StringBuilder()
+// These are string representations of the input Decimal value.
+val (inputBeforeDecimalPoint: String,
+  inputAfterDecimalPoint: String) =
+  formatSplitInputBeforeAndAfterDecimalPoint(input).getOrElse(
+return formatMatchFailure(input, numberFormat))
+// These are indexes into the characters of the input string before and after the decimal point.
+formattingBeforeDecimalPointIndex = 0
+formattingAfterDecimalPointIndex = 0
+var reachedDecimalPoint = false
+
+// Iterate through the tokens representing the provided format string, in order.
+for (formatToken: InputToken <- formatTokens) {
+  formatToken match {
+case groups: DigitGroups =>
+  formatDigitGroups(
+groups, inputBeforeDecimalPoint, inputAfterDecimalPoint, reachedDecimalPoint, result)
+case DecimalPoint() =>
+  // If the last character so far is a space, change it to a zero.
+  if (result.nonEmpty && result.last == SPACE) {
+result(result.length - 1) = ZERO_DIGIT
+  }
+  result.append(POINT_SIGN)
+  reachedDecimalPoint = true
+case DollarSign() =>
+  result.append(DOLLAR_SIGN)
+case _: OptionalPlusOrMinusSign | _: OptionalMinusSign =>
+  if (input < Decimal.ZERO) {
+stripTrailingLoneDecimalPoint(result)
+addCharacterCheckingTrailingSpaces(result, MINUS_SIGN)
+  } else {
+result.append(SPACE)
+  }
+case OpeningAngleBracket() =>
+  if (input < Decimal.ZERO) {
+result.append(ANGLE_BRACKET_OPEN)
+  } else {
+result.append(SPACE)
+  }
+case ClosingAngleBracket() =>
+  if (input < Decimal.ZERO) {
+stripTrailingLoneDecimalPoint(result)
+addCharacterCheckingTrailingSpaces(result, ANGLE_BRACKET_CLOSE)
+  } else {
+result.append(SPACE)
+  }
+  }
+}
+
+if (formattingBeforeDecimalPointIndex < inputBeforeDecimalPoint.length ||
+  formattingAfterDecimalPointIndex < inputAfterDecimalPoint.length) {
+  // Remaining digits before or after the decimal point exist in the decimal value but not in
+  // the format string.
+  formatMatchFailure(input, numberFormat)
+} else {
+  stripTrailingLoneDecimalPoint(result)
+  UTF8String.fromString(result.toString())
+}
+  }
+
+  /**
+   * Splits the provided Decimal value's string representation by the decimal point, if any.
+   * @param input the Decimal value to consume
+   * @return two strings representing the contents before and after the decimal point (if any),
+   * respectively, or None if the input string did not match the format string.
+   */
+  private def formatSplitInputBeforeAndAfterDecimalPoint(
+  input: Decimal): Option[(String, String)] = {
+// Convert the input Decimal value to a string (without exponent notation).
+val inputString = input.toJavaBigDecimal.toPlainString
+// Split the digits before and after the decimal point.
+val tokens = inputString.split(POINT_SIGN)
+var beforeDecimalPoint = tokens(0)
+var afterDecimalPoint = if (tokens.length > 1) tokens(1) else ""
+// Strip any leading minus sign to consider the digits only.
+// Strip leading and trailing zeros to match cases when the format string begins with a decimal
+// point.
+beforeDecimalPoint = beforeDecimalPoint.dropWhile(c => c == MINUS_SIGN || c == ZERO_DIGIT)
+afterDecimalPoint = afterDecimalPoint.reverse.dropWhile(_ == ZERO_DIGIT).reverse
+
+// If the format string specifies more digits than the 'beforeDecimalPoint', prepend leading
+// spaces to make them the same length. Likewise, if the format string specifies more digits
+// than the 'afterDecimalPoint', append trailing spaces to make them the same length. This step
+// simplifies logic consuming the format tokens later.
+var reachedDecimalPoint = false
+var numFormatDigitsBeforeDecimalPoint = 0
+var numFormatDigitsAfterDecimalPoint = 0
+formatTokens.foreach {
+  

[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-04-28 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r861170782


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ToNumberParser.scala:
##
@@ -599,4 +617,240 @@ class ToNumberParser(numberFormat: String, errorOnFail: Boolean) extends Seriali
   Decimal(javaDecimal, precision, scale)
 }
   }
+
+  /**
+   * Converts a decimal value to a string based on the given number format.
+   *
+   * Iterates through the [[formatTokens]] obtained from processing the format string, while also
+   * inspecting the input decimal value.
+   *
+   * @param input the decimal value that needs to be converted
+   * @return the result String value obtained from string formatting
+   */
+  def format(input: Decimal): UTF8String = {
+val result = new StringBuilder()
+// These are string representations of the input Decimal value.
+val (inputBeforeDecimalPoint: String,
+  inputAfterDecimalPoint: String) =
+  formatSplitInputBeforeAndAfterDecimalPoint(input).getOrElse(
+return formatMatchFailure(input, numberFormat))
+// These are indexes into the characters of the input string before and after the decimal point.
+formattingBeforeDecimalPointIndex = 0
+formattingAfterDecimalPointIndex = 0
+var reachedDecimalPoint = false
+
+// Iterate through the tokens representing the provided format string, in order.
+for (formatToken: InputToken <- formatTokens) {
+  formatToken match {
+case groups: DigitGroups =>
+  formatDigitGroups(
+groups, inputBeforeDecimalPoint, inputAfterDecimalPoint, reachedDecimalPoint, result)
+case DecimalPoint() =>
+  // If the last character so far is a space, change it to a zero.
+  if (result.nonEmpty && result.last == SPACE) {
+result(result.length - 1) = ZERO_DIGIT
+  }
+  result.append(POINT_SIGN)
+  reachedDecimalPoint = true
+case DollarSign() =>
+  result.append(DOLLAR_SIGN)
+case _: OptionalPlusOrMinusSign | _: OptionalMinusSign =>
+  if (input < Decimal.ZERO) {
+stripTrailingLoneDecimalPoint(result)
+addCharacterCheckingTrailingSpaces(result, MINUS_SIGN)
+  } else {
+result.append(SPACE)
+  }
+case OpeningAngleBracket() =>
+  if (input < Decimal.ZERO) {
+result.append(ANGLE_BRACKET_OPEN)
+  } else {
+result.append(SPACE)
+  }
+case ClosingAngleBracket() =>
+  if (input < Decimal.ZERO) {
+stripTrailingLoneDecimalPoint(result)
+addCharacterCheckingTrailingSpaces(result, ANGLE_BRACKET_CLOSE)
+  } else {
+result.append(SPACE)
+  }
+  }
+}
+
+if (formattingBeforeDecimalPointIndex < inputBeforeDecimalPoint.length ||
+  formattingAfterDecimalPointIndex < inputAfterDecimalPoint.length) {
+  // Remaining digits before or after the decimal point exist in the decimal value but not in
+  // the format string.
+  formatMatchFailure(input, numberFormat)
+} else {
+  stripTrailingLoneDecimalPoint(result)
+  UTF8String.fromString(result.toString())
+}
+  }
+
+  /**
+   * Splits the provided Decimal value's string representation by the decimal point, if any.
+   * @param input the Decimal value to consume
+   * @return two strings representing the contents before and after the decimal point (if any),
+   * respectively, or None if the input string did not match the format string.
+   */
+  private def formatSplitInputBeforeAndAfterDecimalPoint(
+  input: Decimal): Option[(String, String)] = {
+// Convert the input Decimal value to a string (without exponent notation).
+val inputString = input.toJavaBigDecimal.toPlainString
+// Split the digits before and after the decimal point.
+val tokens = inputString.split(POINT_SIGN)
+var beforeDecimalPoint = tokens(0)
+var afterDecimalPoint = if (tokens.length > 1) tokens(1) else ""
+// Strip any leading minus sign to consider the digits only.
+// Strip leading and trailing zeros to match cases when the format string begins with a decimal
+// point.
+beforeDecimalPoint = beforeDecimalPoint.dropWhile(c => c == MINUS_SIGN || c == ZERO_DIGIT)
+afterDecimalPoint = afterDecimalPoint.reverse.dropWhile(_ == ZERO_DIGIT).reverse
+
+// If the format string specifies more digits than the 'beforeDecimalPoint', prepend leading
+// spaces to make them the same length. Likewise, if the format string specifies more digits
+// than the 'afterDecimalPoint', append trailing spaces to make them the same length. This step
+// simplifies logic consuming the format tokens later.
+var reachedDecimalPoint = false
+var numFormatDigitsBeforeDecimalPoint = 0
+var numFormatDigitsAfterDecimalPoint = 0
+formatTokens.foreach {
+  

[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-04-28 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r861164996


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ToNumberParser.scala:
##
@@ -165,6 +166,10 @@ class ToNumberParser(numberFormat: String, errorOnFail: Boolean) extends Seriali
   tokens.prepend(OpeningAngleBracket())
   tokens.append(ClosingAngleBracket())
   i += 2
+case SPACE =>

Review Comment:
   Postgres allows this, but I asked Serge and he feels there is no value in 
it and that we should block it. I updated the code accordingly.
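
   To make "block it" concrete, here is a minimal sketch of rejecting spaces 
in a format string up front. It is illustrative only: the object and method 
names (`FormatSpaceCheckSketch`, `checkNoSpaces`) and the error text are 
hypothetical and are not taken from the PR.

```scala
object FormatSpaceCheckSketch {
  // Hypothetical validation step, not the PR's actual code: reject any format
  // string that contains a space instead of silently accepting it.
  def checkNoSpaces(numberFormat: String): Either[String, String] = {
    val index = numberFormat.indexOf(' ')
    if (index >= 0) {
      Left(s"Unexpected space at position $index in number format: '$numberFormat'")
    } else {
      Right(numberFormat)
    }
  }
}
```

   Returning an `Either` here is just one way to surface the failure; the 
parser in the PR may report invalid formats differently.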



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-04-28 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r861164996


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ToNumberParser.scala:
##
@@ -165,6 +166,10 @@ class ToNumberParser(numberFormat: String, errorOnFail: Boolean) extends Seriali
   tokens.prepend(OpeningAngleBracket())
   tokens.append(ClosingAngleBracket())
   i += 2
+case SPACE =>

Review Comment:
   Yeah, I added this small change because the Spark `.sql` tests specify this, 
and it is compatible with Postgres. I asked Serge whether he has an opinion on 
it.
   
   https://user-images.githubusercontent.com/99207096/165815005-431ed4c6-cb4b-4813-bd84-91d690c0a2d2.png







[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-04-28 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r861163608


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala:
##
@@ -168,3 +168,142 @@ case class TryToNumber(left: Expression, right: Expression)
   newRight: Expression): TryToNumber =
 copy(left = newLeft, right = newRight)
 }
+
+/**
+ * A function that converts decimal values to strings, returning NULL if the decimal value fails to
+ * match the format string.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(numberExpr, formatExpr) - Convert `numberExpr` to a string based on the `formatExpr`.
+  Throws an exception if the conversion fails. The format follows the same semantics as the
+  to_number function.

Review Comment:
   Good idea, done. I edited the description so that it helps the reader 
understand how each part of the result string is generated (as opposed to 
consumed).
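
   As a rough illustration of "generated" output, the sketch below follows one 
reading of the padding rule in the revised usage text of this PR: a 0/9 
sequence before the decimal point is left-padded with zeros when it starts 
with 0, and with spaces otherwise. The names (`DigitPaddingSketch`, 
`padIntegerDigits`) are hypothetical, and the real formatter also handles 
grouping separators, signs, and fractional digits.

```scala
object DigitPaddingSketch {
  // Hypothetical helper, not the PR's actual code.
  // `digits` holds the input's integer digits, `formatDigits` the matching
  // run of '0'/'9' characters from the format string.
  def padIntegerDigits(digits: String, formatDigits: String): Option[String] = {
    if (digits.length > formatDigits.length) {
      // The value has more digits than the format provides: no match.
      None
    } else {
      // Zero-pad only when the format sequence starts with '0'.
      val padChar = if (formatDigits.startsWith("0")) '0' else ' '
      Some(padChar.toString * (formatDigits.length - digits.length) + digits)
    }
  }
}
```

   Under these assumptions the result always has the same length as the 
format sequence, which is the property the usage text describes.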






[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-04-27 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r860053078


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ToNumberParser.scala:
##
@@ -599,4 +612,227 @@ class ToNumberParser(numberFormat: String, errorOnFail: Boolean) extends Seriali
   Decimal(javaDecimal, precision, scale)
 }
   }
+
+  /**
+   * Converts a decimal value to a string based on the given number format.
+   *
+   * Iterates through the [[formatTokens]] obtained from processing the format string, while also
+   * inspecting the input decimal value.
+   *
+   * @param input the decimal value that needs to be converted
+   * @return the result String value obtained from string formatting
+   */
+  def format(input: Decimal): UTF8String = {
+val result = new StringBuilder()
+// These are string representations of the input Decimal value.
+val (inputBeforeDecimalPoint: String,
+  inputAfterDecimalPoint: String) =
+  formatSplitInputBeforeAndAfterDecimalPoint(input).getOrElse(
+return formatMatchFailure(input, numberFormat))
+// These are indexes into the characters of the input string before and after the decimal point.
+formattingBeforeDecimalPointIndex = 0
+formattingAfterDecimalPointIndex = 0
+var reachedDecimalPoint = false
+
+// Iterate through the tokens representing the provided format string, in order.
+for (formatToken: InputToken <- formatTokens) {
+  formatToken match {
+case groups: DigitGroups =>
+  formatDigitGroups(
+groups, inputBeforeDecimalPoint, inputAfterDecimalPoint, reachedDecimalPoint, result)
+case DecimalPoint() =>
+  result.append('.')
+  reachedDecimalPoint = true
+case DollarSign() =>
+  result.append('$')
+case _: OptionalPlusOrMinusSign | _: OptionalMinusSign =>
+  if (input < Decimal.ZERO) {
+result.append('-')
+  } else {
+result.append(' ')
+  }
+case OpeningAngleBracket() =>
+  if (input < Decimal.ZERO) {
+result.append('<')
+  } else {
+result.append(' ')
+  }
+case ClosingAngleBracket() =>
+  if (input < Decimal.ZERO) {
+if (result.nonEmpty && result.last == ' ') {
+  result.setCharAt(result.length - 1, '>')
+  result.append(' ')
+} else {
+  result.append('>')
+}
+  } else {
+result.append(' ')
+  }
+  }
+}
+
+if (formattingBeforeDecimalPointIndex < inputBeforeDecimalPoint.length ||
+  formattingAfterDecimalPointIndex < inputAfterDecimalPoint.length) {
+  // Remaining digits before or after the decimal point exist in the decimal value but not in
+  // the format string.
+  formatMatchFailure(input, numberFormat)
+} else {
+  UTF8String.fromString(result.toString())
+}
+  }
+
+  /**
+   * Splits the provided Decimal value's string representation by the decimal point, if any.
+   * @param input the Decimal value to consume
+   * @return two strings representing the contents before and after the decimal point (if any),
+   * respectively, or None if the input string did not match the format string.
+   */
+  private def formatSplitInputBeforeAndAfterDecimalPoint(
+  input: Decimal): Option[(String, String)] = {
+val beforeDecimalPointBuilder = new StringBuilder()
+val afterDecimalPointBuilder = new StringBuilder()
+var numInputDigitsBeforeDecimalPoint: Int = 0
+var numInputDigitsAfterDecimalPoint: Int = 0
+var reachedDecimalPoint = false
+var negateResult = false
+// Convert the input Decimal value to a string (without exponent notation). Strip leading zeros
+// in order to match cases when the format string begins with a decimal point.
+val inputString = input.toJavaBigDecimal.toPlainString.dropWhile(_ == '0')
+for (c: Char <- inputString) {
+  c match {
+case _ if c >= ZERO_DIGIT && c <= NINE_DIGIT =>
+  if (reachedDecimalPoint) {
+afterDecimalPointBuilder.append(c)
+numInputDigitsAfterDecimalPoint += 1
+  } else {
+beforeDecimalPointBuilder.append(c)
+numInputDigitsBeforeDecimalPoint += 1
+  }
+case POINT_SIGN =>
+  reachedDecimalPoint = true
+case MINUS_SIGN =>
+  negateResult = true
+  }
+}
+// If the format string specifies more digits than the 'beforeDecimalPointBuilder', prepend
+// leading spaces to make them the same length. Likewise, if the format string specifies more
+// digits than the 'afterDecimalPointBuilder', append trailing spaces to make them the same
+// length. This step simplifies logic consuming the format tokens later.
+reachedDecimalPoint = false
+var numFormatDigitsBeforeDecimalPoint: Int = 0
+  

[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-04-27 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r860009328


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ToNumberParser.scala:
##
@@ -599,4 +612,227 @@ class ToNumberParser(numberFormat: String, errorOnFail: Boolean) extends Seriali
   Decimal(javaDecimal, precision, scale)
 }
   }
+
+  /**
+   * Converts a decimal value to a string based on the given number format.
+   *
+   * Iterates through the [[formatTokens]] obtained from processing the format string, while also
+   * inspecting the input decimal value.
+   *
+   * @param input the decimal value that needs to be converted
+   * @return the result String value obtained from string formatting
+   */
+  def format(input: Decimal): UTF8String = {
+val result = new StringBuilder()
+// These are string representations of the input Decimal value.
+val (inputBeforeDecimalPoint: String,
+  inputAfterDecimalPoint: String) =
+  formatSplitInputBeforeAndAfterDecimalPoint(input).getOrElse(
+return formatMatchFailure(input, numberFormat))
+// These are indexes into the characters of the input string before and after the decimal point.
+formattingBeforeDecimalPointIndex = 0
+formattingAfterDecimalPointIndex = 0
+var reachedDecimalPoint = false
+
+// Iterate through the tokens representing the provided format string, in order.
+for (formatToken: InputToken <- formatTokens) {
+  formatToken match {
+case groups: DigitGroups =>
+  formatDigitGroups(
+groups, inputBeforeDecimalPoint, inputAfterDecimalPoint, reachedDecimalPoint, result)
+case DecimalPoint() =>
+  result.append('.')
+  reachedDecimalPoint = true
+case DollarSign() =>
+  result.append('$')
+case _: OptionalPlusOrMinusSign | _: OptionalMinusSign =>
+  if (input < Decimal.ZERO) {
+result.append('-')
+  } else {
+result.append(' ')
+  }
+case OpeningAngleBracket() =>
+  if (input < Decimal.ZERO) {
+result.append('<')
+  } else {
+result.append(' ')
+  }
+case ClosingAngleBracket() =>
+  if (input < Decimal.ZERO) {
+if (result.nonEmpty && result.last == ' ') {

Review Comment:
   Good point, I've updated the logic to loop for this reason and added a test 
case.
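
   The reviewer's remark is not quoted in this message, so the following is 
only a guess at what "loop" means here: instead of checking a single trailing 
space, walk back over every trailing space so that '>' lands directly after 
the last non-space character. The names (`ClosingBracketSketch`, 
`appendClosingBracket`) are hypothetical and this is not the PR's code.

```scala
object ClosingBracketSketch {
  // Hypothetical helper: place '>' right after the last non-space character,
  // then restore the trailing spaces that were skipped over.
  def appendClosingBracket(result: StringBuilder): Unit = {
    var trailingSpaces = 0
    while (result.nonEmpty && result.last == ' ') {
      result.setLength(result.length - 1)
      trailingSpaces += 1
    }
    result.append('>')
    result.append(" " * trailingSpaces)
  }
}
```

   With exactly one trailing space this behaves like the quoted 
`setCharAt` / `append(' ')` pair; with none it simply appends '>'.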






[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-04-27 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r859992812


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ToNumberParser.scala:
##
@@ -599,4 +612,227 @@ class ToNumberParser(numberFormat: String, errorOnFail: Boolean) extends Seriali
   Decimal(javaDecimal, precision, scale)
 }
   }
+
+  /**
+   * Converts a decimal value to a string based on the given number format.
+   *
+   * Iterates through the [[formatTokens]] obtained from processing the format string, while also
+   * inspecting the input decimal value.
+   *
+   * @param input the decimal value that needs to be converted
+   * @return the result String value obtained from string formatting
+   */
+  def format(input: Decimal): UTF8String = {
+val result = new StringBuilder()
+// These are string representations of the input Decimal value.
+val (inputBeforeDecimalPoint: String,
+  inputAfterDecimalPoint: String) =
+  formatSplitInputBeforeAndAfterDecimalPoint(input).getOrElse(

Review Comment:
   Good idea, I was able to use this to simplify the code in 
`formatSplitInputBeforeAndAfterDecimalPoint`.
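
   The suggestion itself is not quoted in this message. A later revision of 
the diff in this thread splits the plain string form of the value on the 
decimal point, and the stand-alone sketch below mirrors that idea using 
`java.math.BigDecimal` directly. The names (`SplitSketch`, `splitDigits`) are 
hypothetical.

```scala
import java.math.BigDecimal

object SplitSketch {
  // Hypothetical stand-alone version of the simplified splitting logic.
  // Returns the digits before and after the decimal point, without the sign,
  // leading zeros, or trailing zeros.
  def splitDigits(value: BigDecimal): (String, String) = {
    // toPlainString never uses exponent notation.
    val tokens = value.toPlainString.split('.')
    val before = tokens(0).dropWhile(c => c == '-' || c == '0')
    val afterRaw = if (tokens.length > 1) tokens(1) else ""
    val after = afterRaw.reverse.dropWhile(_ == '0').reverse
    (before, after)
  }
}
```

   Compared with the character-by-character loop in the earlier diff, this 
needs no builders or digit counters, which is presumably the simplification 
meant above.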






[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-04-27 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r859978460


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala:
##
@@ -168,3 +168,141 @@ case class TryToNumber(left: Expression, right: Expression)
   newRight: Expression): TryToNumber =
 copy(left = newLeft, right = newRight)
 }
+
+/**
+ * A function that converts decimal values to strings, returning NULL if the decimal value fails to
+ * match the format string.
+ */
+@ExpressionDescription(
+  usage = """
+ _FUNC_(numberExpr, formatExpr) - Convert `numberExpr` to a string based on the `formatExpr`.
+   Throws an exception if the conversion fails. The format follows the same semantics as the
+   to_number function.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(454, '999');
+   454
+  > SELECT _FUNC_(454.00, '000D00');
+   454.00
+  > SELECT _FUNC_(12454, '99G999');
+   12,454
+  > SELECT _FUNC_(78.12, '$99.99');
+   $78.12
+  > SELECT _FUNC_(-12454.8, '99G999D9S');
+   12,454.8-
+  """,
+  since = "3.3.0",
+  group = "string_funcs")
+case class ToCharacter(left: Expression, right: Expression)
+  extends BinaryExpression with ImplicitCastInputTypes with NullIntolerant {
+  private lazy val numberFormat = right.eval().toString.toUpperCase(Locale.ROOT)
+  private lazy val numberFormatter = new ToNumberParser(numberFormat, true)
+
+  override def dataType: DataType = StringType
+  override def inputTypes: Seq[AbstractDataType] = Seq(DecimalType, StringType)
+  override def checkInputDataTypes(): TypeCheckResult = {
+val inputTypeCheck = super.checkInputDataTypes()
+if (inputTypeCheck.isSuccess) {
+  if (right.foldable) {
+numberFormatter.check()
+  } else {
+TypeCheckResult.TypeCheckFailure(s"Format expression must be foldable, but got $right")
+  }
+} else {
+  inputTypeCheck
+}
+  }
+  override def prettyName: String = "to_char"
+  override def nullSafeEval(decimal: Any, format: Any): Any = {
+val input = decimal.asInstanceOf[Decimal]
+numberFormatter.format(input)
+  }
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+val builder =
+  ctx.addReferenceObj("builder", numberFormatter, classOf[ToNumberParser].getName)
+val eval = left.genCode(ctx)
+val result =
+  code"""
+|${eval.code}

Review Comment:
   Done.




[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-04-27 Thread GitBox


dtenedor commented on code in PR #36365:
URL: https://github.com/apache/spark/pull/36365#discussion_r859976824


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala:
##
@@ -168,3 +168,141 @@ case class TryToNumber(left: Expression, right: Expression)
   newRight: Expression): TryToNumber =
 copy(left = newLeft, right = newRight)
 }
+
+/**
+ * A function that converts decimal values to strings, returning NULL if the decimal value fails to
+ * match the format string.
+ */
+@ExpressionDescription(
+  usage = """
+ _FUNC_(numberExpr, formatExpr) - Convert `numberExpr` to a string based on the `formatExpr`.

Review Comment:
   Done.


