Re: [PR] feat: Support string decimal cast [datafusion-comet]
andygrove commented on PR #2925: URL: https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3712099081 > Late comment, but did rust_decimal not work for us? I didn't know this at the time, but there are security issues with rust_decimal (this is currently blocking PR builds in the datafusion core project). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support string decimal cast [datafusion-comet]
coderfender commented on PR #2925: URL: https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3712088019 Yeah . I had a chat with @andygrove about this and preferred to keep the cast logic to ourselves to match with Spark rather than relying on a new library -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support string decimal cast [datafusion-comet]
parthchandra commented on PR #2925: URL: https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3712073922 Late comment, but did rust_decimal not work for us? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support string decimal cast [datafusion-comet]
coderfender commented on PR #2925: URL: https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3684114952 Thank you @andygrove -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support string decimal cast [datafusion-comet]
andygrove merged PR #2925: URL: https://github.com/apache/datafusion-comet/pull/2925 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support string decimal cast [datafusion-comet]
coderfender commented on PR #2925: URL: https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3684062501 @andygrove the docs are already updated AFAIK -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support string decimal cast [datafusion-comet]
andygrove commented on PR #2925: URL: https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3684033931 Thanks @coderfender. This LGTM but could you also update the compatibility guide. It currently states `Does not support inputs ending with 'd' or 'f'. Does not support 'inf'. Does not support ANSI mode. Returns 0.0 instead of null if input contains no digits` and this is not entirely correct now. With the docs update, I'll be happy to approve. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support string decimal cast [datafusion-comet]
coderfender commented on code in PR #2925:
URL: https://github.com/apache/datafusion-comet/pull/2925#discussion_r2641058678
##
spark/src/test/scala/org/apache/comet/CometCastSuite.scala:
##
@@ -684,21 +683,88 @@ class CometCastSuite extends CometTestBase with
AdaptiveSparkPlanHelper {
}
}
+// This is to pass the first `all cast combinations are covered`
ignore("cast StringType to DecimalType(10,2)") {
Review Comment:
@andygrove this test cant be enabled given that we are still marking the
casts as incompatible . However I added the same tests (with a different name
to make sure they pass) . Enabling this test will fail 'all cast combinations
are covered'
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] feat: Support string decimal cast [datafusion-comet]
andygrove commented on code in PR #2925:
URL: https://github.com/apache/datafusion-comet/pull/2925#discussion_r2641053036
##
spark/src/test/scala/org/apache/comet/CometCastSuite.scala:
##
@@ -684,21 +683,88 @@ class CometCastSuite extends CometTestBase with
AdaptiveSparkPlanHelper {
}
}
+// This is to pass the first `all cast combinations are covered`
ignore("cast StringType to DecimalType(10,2)") {
Review Comment:
can this test also be enabled now?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] feat: Support string decimal cast [datafusion-comet]
coderfender commented on PR #2925: URL: https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3683891186 @andygrove please take a look whenever you get a chance. Thank you -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support string decimal cast [datafusion-comet]
codecov-commenter commented on PR #2925: URL: https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3682949388 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2925?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report :x: Patch coverage is `0%` with `1 line` in your changes missing coverage. Please review. :white_check_mark: Project coverage is 9.99%. Comparing base ([`f09f8af`](https://app.codecov.io/gh/apache/datafusion-comet/commit/f09f8af64c6599255e116a376f4f008f2fd63b43?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)) to head ([`6a6c80d`](https://app.codecov.io/gh/apache/datafusion-comet/commit/6a6c80d0c66e93c8f08ba4fd7d72096049c04ae0?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)). :warning: Report is 785 commits behind head on main. | [Files with missing lines](https://app.codecov.io/gh/apache/datafusion-comet/pull/2925?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Patch % | Lines | |---|---|---| | [...scala/org/apache/comet/expressions/CometCast.scala](https://app.codecov.io/gh/apache/datafusion-comet/pull/2925?src=pr&el=tree&filepath=spark%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fcomet%2Fexpressions%2FCometCast.scala&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9jb21ldC9leHByZXNzaW9ucy9Db21ldENhc3Quc2NhbGE=) | 0.00% | [1 Missing :warning: ](https://app.codecov.io/gh/apache/datafusion-comet/pull/2925?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #2925 +/- ## - Coverage 56.12% 9.99% -46.14% + Complexity 976 234 -742 Files 119 167 +48 Lines 11743 15478 +3735 Branches 22512542 +291 - Hits 65911547 -5044 - Misses 4012 13754 +9742 + Partials 1140 177 -963 ``` [:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/datafusion-comet/pull/2925?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache). :loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache). :rocket: New features to boost your workflow: - :snowflake: [Test Analytics](https://docs.codecov.com/docs/test-analytics): Detect flaky tests, report on failures, and find test suite problems. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support string decimal cast [datafusion-comet]
coderfender commented on PR #2925: URL: https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3682866288 Benchmarks after adding further optimizations ``` Running benchmark DecimalType(2,2) OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0 Apple M2 Max Cast function to : DecimalType(2,2) , ansi mode enabled : false: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative --- SQL Parquet - Spark Cast expr from STRING to : DECIMAL(2,2) , ansi mode enabled : false106130 14 9.4 106.3 1.0X SQL Parquet - Comet Cast expr from STRING to : DECIMAL(2,2) , ansi mode enabled : false 94123 36 10.7 93.5 1.1X Running benchmark DecimalType(2,2) OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0 Apple M2 Max Cast function to : DecimalType(2,2) , ansi mode enabled : true: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -- SQL Parquet - Spark Cast expr from STRING to : DECIMAL(2,2) , ansi mode enabled : true 88109 18 11.3 88.3 1.0X SQL Parquet - Comet Cast expr from STRING to : DECIMAL(2,2) , ansi mode enabled : true 78 85 9 12.8 78.2 1.1X Running benchmark DecimalType(10,2) OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0 Apple M2 Max Cast function to : DecimalType(10,2) , ansi mode enabled : false: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative SQL Parquet - Spark Cast expr from STRING to : DECIMAL(10,2) , ansi mode enabled : false102132 22 9.8 101.7 1.0X SQL Parquet - Comet Cast expr from STRING to : DECIMAL(10,2) , ansi mode enabled : false 86104 18 11.7 85.8 1.2X Running benchmark DecimalType(10,2) OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0 Apple M2 Max Cast function to : DecimalType(10,2) , ansi mode enabled : true: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative --- SQL Parquet - Spark Cast expr from STRING to : DECIMAL(10,2) , ansi mode enabled : true118150 34 8.5 118.1 1.0X SQL Parquet - Comet Cast expr from STRING to : DECIMAL(10,2) , ansi mode enabled : true 86138 36 11.6 86.1 1.4X Running benchmark DecimalType(20,2) OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0 Apple M2 Max Cast function to : DecimalType(20,2) , ansi mode enabled : false: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative SQL Parquet - Spark Cast expr from STRING to : DECIMAL(20,2) , ansi mode
Re: [PR] feat: Support string decimal cast [datafusion-comet]
coderfender commented on PR #2925: URL: https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3680143962 Addressed double iteration and we now validate and parse in a single iteration -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support string decimal cast [datafusion-comet]
coderfender commented on PR #2925: URL: https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3678046392 Pushed commit to not have expensive 'trim' and split by '.' operations. Working on the rather difficult part of merging decimal validation and parsing in a single pass -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support string decimal cast [datafusion-comet]
coderfender commented on PR #2925: URL: https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3663403499 Great catch @andygrove . Let me try and see if I can resolve the issue or move forward with these changes and mark it compatible -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support string decimal cast [datafusion-comet]
andygrove commented on PR #2925:
URL:
https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3663382328
@coderfender, the functionality looks good overall. There are likely some
edge cases we do not support that we need to handle or document. I have only
found a few so far, and here is a suggested test to add.
```scala
test("cast StringType to DecimalType non-ASCII characters") {
// TODO fix for Spark 4.0.0
assume(!isSpark40Plus)
val values = Seq(
"\uff11\uff12\uff13", // fullwidth digits 123
"123\u", // null byte at end
"12\u3", // null byte in middle
"\u123", // null byte at start
null).toDF("a")
Seq(true, false).foreach(k =>
castTest(values, DataTypes.createDecimalType(10, 2), testAnsi = k))
}
```
```
!== Correct Answer - 5 == == Spark Answer - 5 ==
struct struct
[null,null][null,null]
![ 123,123.00] [ 123,null]
[12 3,null][12 3,null]
![123 ,123.00] [123 ,null]
![123,123.00] [123,null]
```
I think it is fine if we do not support everything, but we should mark the
cast as `Incompatible` and add documentation explaining the current limitations.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] feat: Support string decimal cast [datafusion-comet]
coderfender commented on PR #2925: URL: https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3663339660 Sure @andygrove . Let me run some benchmarks on the cast op and update this thread -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support string decimal cast [datafusion-comet]
andygrove commented on PR #2925: URL: https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3663326197 Thanks for moving this to a separate PR @coderfender. I left some initial comments. Could you run some benchmarks? I'd like to get a feel for performance. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support string decimal cast [datafusion-comet]
andygrove commented on code in PR #2925:
URL: https://github.com/apache/datafusion-comet/pull/2925#discussion_r2625347126
##
native/spark-expr/src/conversion_funcs/cast.rs:
##
@@ -1976,6 +1975,363 @@ fn do_cast_string_to_int<
Ok(Some(result))
}
+fn cast_string_to_decimal(
+array: &ArrayRef,
+to_type: &DataType,
+precision: &u8,
+scale: &i8,
+eval_mode: EvalMode,
+) -> SparkResult {
+match to_type {
+DataType::Decimal128(_, _) => {
+cast_string_to_decimal128_impl(array, eval_mode, *precision,
*scale)
+}
+DataType::Decimal256(_, _) => {
+cast_string_to_decimal256_impl(array, eval_mode, *precision,
*scale)
+}
+_ => Err(SparkError::Internal(format!(
+"Unexpected type in cast_string_to_decimal: {:?}",
+to_type
+))),
+}
+}
+
+fn cast_string_to_decimal128_impl(
+array: &ArrayRef,
+eval_mode: EvalMode,
+precision: u8,
+scale: i8,
+) -> SparkResult {
+let string_array = array
+.as_any()
+.downcast_ref::()
+.ok_or_else(|| SparkError::Internal("Expected string
array".to_string()))?;
+
+let mut decimal_builder =
Decimal128Builder::with_capacity(string_array.len());
+
+for i in 0..string_array.len() {
+if string_array.is_null(i) {
+decimal_builder.append_null();
+} else {
+let str_value = string_array.value(i).trim();
+match parse_string_to_decimal(str_value, precision, scale) {
+Ok(Some(decimal_value)) => {
+decimal_builder.append_value(decimal_value);
+}
+Ok(None) => {
+if eval_mode == EvalMode::Ansi {
+return Err(invalid_value(
+string_array.value(i),
+"STRING",
+&format!("DECIMAL({},{})", precision, scale),
+));
+}
+decimal_builder.append_null();
+}
+Err(e) => {
+if eval_mode == EvalMode::Ansi {
+return Err(e);
+}
+decimal_builder.append_null();
+}
+}
+}
+}
+
+Ok(Arc::new(
+decimal_builder
+.with_precision_and_scale(precision, scale)?
+.finish(),
+))
+}
+
+fn cast_string_to_decimal256_impl(
+array: &ArrayRef,
+eval_mode: EvalMode,
+precision: u8,
+scale: i8,
+) -> SparkResult {
+let string_array = array
+.as_any()
+.downcast_ref::()
+.ok_or_else(|| SparkError::Internal("Expected string
array".to_string()))?;
+
+let mut decimal_builder =
PrimitiveBuilderwith_capacity(string_array.len());
+
+for i in 0..string_array.len() {
+if string_array.is_null(i) {
+decimal_builder.append_null();
+} else {
+let str_value = string_array.value(i).trim();
+match parse_string_to_decimal(str_value, precision, scale) {
+Ok(Some(decimal_value)) => {
+// Convert i128 to i256
+let i256_value = i256::from_i128(decimal_value);
+decimal_builder.append_value(i256_value);
+}
+Ok(None) => {
+if eval_mode == EvalMode::Ansi {
+return Err(invalid_value(
+str_value,
+"STRING",
+&format!("DECIMAL({},{})", precision, scale),
+));
+}
+decimal_builder.append_null();
+}
+Err(e) => {
+if eval_mode == EvalMode::Ansi {
+return Err(e);
+}
+decimal_builder.append_null();
+}
+}
+}
+}
+
+Ok(Arc::new(
+decimal_builder
+.with_precision_and_scale(precision, scale)?
+.finish(),
+))
+}
+
+/// Validates if a string is a valid decimal similar to BigDecimal
+fn is_valid_decimal_format(s: &str) -> bool {
+if s.is_empty() {
+return false;
+}
+
+let bytes = s.as_bytes();
+let mut idx = 0;
+let len = bytes.len();
+
+// Skip leading +/- signs
+if bytes[idx] == b'+' || bytes[idx] == b'-' {
+idx += 1;
+if idx >= len {
+// Sign only. Fail early
+return false;
+}
+}
+
+// Check invalid cases like "++", "+-"
+if bytes[idx] == b'+' || bytes[idx] == b'-' {
+return false;
+}
+
+// Now we need at least one digit either before or after a decimal point
+let mut has_digit = false;
+let mut is_decimal_point_seen = false;
+
+while idx < len {
+let ch = bytes[i
Re: [PR] feat: Support string decimal cast [datafusion-comet]
andygrove commented on code in PR #2925:
URL: https://github.com/apache/datafusion-comet/pull/2925#discussion_r2625348185
##
spark/src/test/scala/org/apache/comet/CometCastSuite.scala:
##
@@ -684,22 +684,73 @@ class CometCastSuite extends CometTestBase with
AdaptiveSparkPlanHelper {
}
}
- ignore("cast StringType to DecimalType(10,2)") {
-// https://github.com/apache/datafusion-comet/issues/325
-val values = gen.generateStrings(dataSize, numericPattern, 8).toDF("a")
-castTest(values, DataTypes.createDecimalType(10, 2))
+ test("cast StringType to DecimalType(10,2)") {
+// TODO fix for Spark 4.0.0
+assume(!isSpark40Plus)
Review Comment:
What is the issue with Spark 4? Do we need to file a new issue and reference
it here?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] feat: Support string decimal cast [datafusion-comet]
andygrove commented on code in PR #2925:
URL: https://github.com/apache/datafusion-comet/pull/2925#discussion_r2625345509
##
native/spark-expr/src/conversion_funcs/cast.rs:
##
@@ -1976,6 +1975,363 @@ fn do_cast_string_to_int<
Ok(Some(result))
}
+fn cast_string_to_decimal(
+array: &ArrayRef,
+to_type: &DataType,
+precision: &u8,
+scale: &i8,
+eval_mode: EvalMode,
+) -> SparkResult {
+match to_type {
+DataType::Decimal128(_, _) => {
+cast_string_to_decimal128_impl(array, eval_mode, *precision,
*scale)
+}
+DataType::Decimal256(_, _) => {
+cast_string_to_decimal256_impl(array, eval_mode, *precision,
*scale)
+}
+_ => Err(SparkError::Internal(format!(
+"Unexpected type in cast_string_to_decimal: {:?}",
+to_type
+))),
+}
+}
+
+fn cast_string_to_decimal128_impl(
+array: &ArrayRef,
+eval_mode: EvalMode,
+precision: u8,
+scale: i8,
+) -> SparkResult {
+let string_array = array
+.as_any()
+.downcast_ref::()
+.ok_or_else(|| SparkError::Internal("Expected string
array".to_string()))?;
+
+let mut decimal_builder =
Decimal128Builder::with_capacity(string_array.len());
+
+for i in 0..string_array.len() {
+if string_array.is_null(i) {
+decimal_builder.append_null();
+} else {
+let str_value = string_array.value(i).trim();
+match parse_string_to_decimal(str_value, precision, scale) {
+Ok(Some(decimal_value)) => {
+decimal_builder.append_value(decimal_value);
+}
+Ok(None) => {
+if eval_mode == EvalMode::Ansi {
+return Err(invalid_value(
+string_array.value(i),
+"STRING",
+&format!("DECIMAL({},{})", precision, scale),
+));
+}
+decimal_builder.append_null();
+}
+Err(e) => {
+if eval_mode == EvalMode::Ansi {
+return Err(e);
+}
+decimal_builder.append_null();
+}
+}
+}
+}
+
+Ok(Arc::new(
+decimal_builder
+.with_precision_and_scale(precision, scale)?
+.finish(),
+))
+}
+
+fn cast_string_to_decimal256_impl(
+array: &ArrayRef,
+eval_mode: EvalMode,
+precision: u8,
+scale: i8,
+) -> SparkResult {
+let string_array = array
+.as_any()
+.downcast_ref::()
+.ok_or_else(|| SparkError::Internal("Expected string
array".to_string()))?;
+
+let mut decimal_builder =
PrimitiveBuilderwith_capacity(string_array.len());
+
+for i in 0..string_array.len() {
+if string_array.is_null(i) {
+decimal_builder.append_null();
+} else {
+let str_value = string_array.value(i).trim();
+match parse_string_to_decimal(str_value, precision, scale) {
+Ok(Some(decimal_value)) => {
+// Convert i128 to i256
+let i256_value = i256::from_i128(decimal_value);
+decimal_builder.append_value(i256_value);
+}
+Ok(None) => {
+if eval_mode == EvalMode::Ansi {
+return Err(invalid_value(
+str_value,
+"STRING",
+&format!("DECIMAL({},{})", precision, scale),
+));
+}
+decimal_builder.append_null();
+}
+Err(e) => {
+if eval_mode == EvalMode::Ansi {
+return Err(e);
+}
+decimal_builder.append_null();
+}
+}
+}
+}
+
+Ok(Arc::new(
+decimal_builder
+.with_precision_and_scale(precision, scale)?
+.finish(),
+))
+}
+
+/// Validates if a string is a valid decimal similar to BigDecimal
+fn is_valid_decimal_format(s: &str) -> bool {
+if s.is_empty() {
+return false;
+}
+
+let bytes = s.as_bytes();
+let mut idx = 0;
+let len = bytes.len();
+
+// Skip leading +/- signs
+if bytes[idx] == b'+' || bytes[idx] == b'-' {
+idx += 1;
+if idx >= len {
+// Sign only. Fail early
+return false;
+}
+}
+
+// Check invalid cases like "++", "+-"
+if bytes[idx] == b'+' || bytes[idx] == b'-' {
+return false;
+}
+
+// Now we need at least one digit either before or after a decimal point
+let mut has_digit = false;
+let mut is_decimal_point_seen = false;
+
+while idx < len {
+let ch = bytes[i
Re: [PR] feat: Support string decimal cast [datafusion-comet]
andygrove commented on code in PR #2925:
URL: https://github.com/apache/datafusion-comet/pull/2925#discussion_r2625343838
##
native/spark-expr/src/conversion_funcs/cast.rs:
##
@@ -1976,6 +1975,363 @@ fn do_cast_string_to_int<
Ok(Some(result))
}
+fn cast_string_to_decimal(
+array: &ArrayRef,
+to_type: &DataType,
+precision: &u8,
+scale: &i8,
+eval_mode: EvalMode,
+) -> SparkResult {
+match to_type {
+DataType::Decimal128(_, _) => {
+cast_string_to_decimal128_impl(array, eval_mode, *precision,
*scale)
+}
+DataType::Decimal256(_, _) => {
+cast_string_to_decimal256_impl(array, eval_mode, *precision,
*scale)
+}
+_ => Err(SparkError::Internal(format!(
+"Unexpected type in cast_string_to_decimal: {:?}",
+to_type
+))),
+}
+}
+
+fn cast_string_to_decimal128_impl(
+array: &ArrayRef,
+eval_mode: EvalMode,
+precision: u8,
+scale: i8,
+) -> SparkResult {
+let string_array = array
+.as_any()
+.downcast_ref::()
+.ok_or_else(|| SparkError::Internal("Expected string
array".to_string()))?;
+
+let mut decimal_builder =
Decimal128Builder::with_capacity(string_array.len());
+
+for i in 0..string_array.len() {
+if string_array.is_null(i) {
+decimal_builder.append_null();
+} else {
+let str_value = string_array.value(i).trim();
Review Comment:
It would be better to avoid `trim` since this could be expensive (creating a
new string). It would be more efficient for the parser to ignore whitespace at
the end of this string.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
