Re: [PR] feat: Support string decimal cast [datafusion-comet]

2026-01-05 Thread via GitHub


andygrove commented on PR #2925:
URL: 
https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3712099081

   > Late comment, but did rust_decimal not work for us?
   
   I didn't know this at the time, but there are security issues with 
rust_decimal (this is currently blocking PR builds in the datafusion core 
project).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support string decimal cast [datafusion-comet]

2026-01-05 Thread via GitHub


coderfender commented on PR #2925:
URL: 
https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3712088019

   Yeah . I had a chat with @andygrove about this and preferred to keep the 
cast logic to ourselves to match with Spark rather than relying on a new 
library 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support string decimal cast [datafusion-comet]

2026-01-05 Thread via GitHub


parthchandra commented on PR #2925:
URL: 
https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3712073922

   Late comment, but did rust_decimal not work for us?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support string decimal cast [datafusion-comet]

2025-12-22 Thread via GitHub


coderfender commented on PR #2925:
URL: 
https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3684114952

   Thank you @andygrove 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support string decimal cast [datafusion-comet]

2025-12-22 Thread via GitHub


andygrove merged PR #2925:
URL: https://github.com/apache/datafusion-comet/pull/2925


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support string decimal cast [datafusion-comet]

2025-12-22 Thread via GitHub


coderfender commented on PR #2925:
URL: 
https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3684062501

   @andygrove the docs are already updated AFAIK 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support string decimal cast [datafusion-comet]

2025-12-22 Thread via GitHub


andygrove commented on PR #2925:
URL: 
https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3684033931

   Thanks @coderfender. This LGTM but could you also update the compatibility 
guide. It currently states `Does not support inputs ending with 'd' or 'f'. 
Does not support 'inf'. Does not support ANSI mode. Returns 0.0 instead of null 
if input contains no digits` and this is not entirely correct now.
   
   With the docs update, I'll be happy to approve.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support string decimal cast [datafusion-comet]

2025-12-22 Thread via GitHub


coderfender commented on code in PR #2925:
URL: https://github.com/apache/datafusion-comet/pull/2925#discussion_r2641058678


##
spark/src/test/scala/org/apache/comet/CometCastSuite.scala:
##
@@ -684,21 +683,88 @@ class CometCastSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
+//  This is to pass the first `all cast combinations are covered`
   ignore("cast StringType to DecimalType(10,2)") {

Review Comment:
   @andygrove this test cant be enabled given that we are still marking the 
casts as incompatible . However I added the same tests (with a different name 
to make sure they pass) . Enabling this test will fail 'all cast combinations 
are covered' 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support string decimal cast [datafusion-comet]

2025-12-22 Thread via GitHub


andygrove commented on code in PR #2925:
URL: https://github.com/apache/datafusion-comet/pull/2925#discussion_r2641053036


##
spark/src/test/scala/org/apache/comet/CometCastSuite.scala:
##
@@ -684,21 +683,88 @@ class CometCastSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
+//  This is to pass the first `all cast combinations are covered`
   ignore("cast StringType to DecimalType(10,2)") {

Review Comment:
   can this test also be enabled now?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support string decimal cast [datafusion-comet]

2025-12-22 Thread via GitHub


coderfender commented on PR #2925:
URL: 
https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3683891186

   @andygrove please take a look whenever you get a chance. Thank you 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support string decimal cast [datafusion-comet]

2025-12-22 Thread via GitHub


codecov-commenter commented on PR #2925:
URL: 
https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3682949388

   ## 
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2925?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 Report
   :x: Patch coverage is `0%` with `1 line` in your changes missing coverage. 
Please review.
   :white_check_mark: Project coverage is 9.99%. Comparing base 
([`f09f8af`](https://app.codecov.io/gh/apache/datafusion-comet/commit/f09f8af64c6599255e116a376f4f008f2fd63b43?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache))
 to head 
([`6a6c80d`](https://app.codecov.io/gh/apache/datafusion-comet/commit/6a6c80d0c66e93c8f08ba4fd7d72096049c04ae0?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)).
   :warning: Report is 785 commits behind head on main.
   
   | [Files with missing 
lines](https://app.codecov.io/gh/apache/datafusion-comet/pull/2925?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | Patch % | Lines |
   |---|---|---|
   | 
[...scala/org/apache/comet/expressions/CometCast.scala](https://app.codecov.io/gh/apache/datafusion-comet/pull/2925?src=pr&el=tree&filepath=spark%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fcomet%2Fexpressions%2FCometCast.scala&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9jb21ldC9leHByZXNzaW9ucy9Db21ldENhc3Quc2NhbGE=)
 | 0.00% | [1 Missing :warning: 
](https://app.codecov.io/gh/apache/datafusion-comet/pull/2925?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 |
   
   Additional details and impacted files
   
   
   
   ```diff
   @@ Coverage Diff  @@
   ##   main   #2925   +/-   ##
   
   - Coverage 56.12%   9.99%   -46.14% 
   + Complexity  976 234  -742 
   
 Files   119 167   +48 
 Lines 11743   15478 +3735 
 Branches   22512542  +291 
   
   - Hits   65911547 -5044 
   - Misses 4012   13754 +9742 
   + Partials   1140 177  -963 
   ```
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/datafusion-comet/pull/2925?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
:rocket: New features to boost your workflow: 
   
   - :snowflake: [Test 
Analytics](https://docs.codecov.com/docs/test-analytics): Detect flaky tests, 
report on failures, and find test suite problems.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support string decimal cast [datafusion-comet]

2025-12-22 Thread via GitHub


coderfender commented on PR #2925:
URL: 
https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3682866288

   Benchmarks after adding further optimizations 
   
   ```
   

   Running benchmark DecimalType(2,2)
   

   
   OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
   Apple M2 Max
   Cast function to : DecimalType(2,2) , ansi mode enabled : false: 
Best Time(ms)   Avg Time(ms)   Stdev(ms)Rate(M/s)   Per Row(ns) 
  Relative
   
---
   SQL Parquet - Spark Cast expr from STRING to : DECIMAL(2,2) , ansi mode 
enabled : false106130  14  9.4 
106.3   1.0X
   SQL Parquet - Comet Cast expr from STRING to : DECIMAL(2,2) , ansi mode 
enabled : false 94123  36 10.7  
93.5   1.1X
   
   
   

   Running benchmark DecimalType(2,2)
   

   
   OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
   Apple M2 Max
   Cast function to : DecimalType(2,2) , ansi mode enabled : true:  
   Best Time(ms)   Avg Time(ms)   Stdev(ms)Rate(M/s)   Per Row(ns)  
 Relative
   
--
   SQL Parquet - Spark Cast expr from STRING to : DECIMAL(2,2) , ansi mode 
enabled : true 88109  18 11.3  
88.3   1.0X
   SQL Parquet - Comet Cast expr from STRING to : DECIMAL(2,2) , ansi mode 
enabled : true 78 85   9 12.8  
78.2   1.1X
   
   
   

   Running benchmark DecimalType(10,2)
   

   
   OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
   Apple M2 Max
   Cast function to : DecimalType(10,2) , ansi mode enabled : false:
 Best Time(ms)   Avg Time(ms)   Stdev(ms)Rate(M/s)   Per 
Row(ns)   Relative
   

   SQL Parquet - Spark Cast expr from STRING to : DECIMAL(10,2) , ansi mode 
enabled : false102132  22  9.8 
101.7   1.0X
   SQL Parquet - Comet Cast expr from STRING to : DECIMAL(10,2) , ansi mode 
enabled : false 86104  18 11.7  
85.8   1.2X
   
   
   

   Running benchmark DecimalType(10,2)
   

   
   OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
   Apple M2 Max
   Cast function to : DecimalType(10,2) , ansi mode enabled : true: 
Best Time(ms)   Avg Time(ms)   Stdev(ms)Rate(M/s)   Per Row(ns) 
  Relative
   
---
   SQL Parquet - Spark Cast expr from STRING to : DECIMAL(10,2) , ansi mode 
enabled : true118150  34  8.5 
118.1   1.0X
   SQL Parquet - Comet Cast expr from STRING to : DECIMAL(10,2) , ansi mode 
enabled : true 86138  36 11.6  
86.1   1.4X
   
   
   

   Running benchmark DecimalType(20,2)
   

   
   OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
   Apple M2 Max
   Cast function to : DecimalType(20,2) , ansi mode enabled : false:
 Best Time(ms)   Avg Time(ms)   Stdev(ms)Rate(M/s)   Per 
Row(ns)   Relative
   

   SQL Parquet - Spark Cast expr from STRING to : DECIMAL(20,2) , ansi mode 

Re: [PR] feat: Support string decimal cast [datafusion-comet]

2025-12-21 Thread via GitHub


coderfender commented on PR #2925:
URL: 
https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3680143962

   Addressed double iteration and we now validate and parse in a single 
iteration 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support string decimal cast [datafusion-comet]

2025-12-20 Thread via GitHub


coderfender commented on PR #2925:
URL: 
https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3678046392

   Pushed commit to not have expensive 'trim' and split by '.' operations. 
Working on the rather difficult part of merging decimal validation and parsing 
in a single pass 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support string decimal cast [datafusion-comet]

2025-12-16 Thread via GitHub


coderfender commented on PR #2925:
URL: 
https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3663403499

   Great catch @andygrove . Let me try and see if I can resolve the issue or 
move forward with these changes and mark it compatible 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support string decimal cast [datafusion-comet]

2025-12-16 Thread via GitHub


andygrove commented on PR #2925:
URL: 
https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3663382328

   @coderfender, the functionality looks good overall. There are likely some 
edge cases we do not support that we need to handle or document. I have only 
found a few so far, and here is a suggested test to add.
   
   ```scala
 test("cast StringType to DecimalType non-ASCII characters") {
   // TODO fix for Spark 4.0.0
   assume(!isSpark40Plus)
   val values = Seq(
 "\uff11\uff12\uff13", // fullwidth digits 123
 "123\u", // null byte at end
 "12\u3", // null byte in middle
 "\u123", // null byte at start
 null).toDF("a")
   Seq(true, false).foreach(k =>
 castTest(values, DataTypes.createDecimalType(10, 2), testAnsi = k))
 }
   ```
   
   ```
   !== Correct Answer - 5 ==   == Spark Answer - 5 ==
struct   struct
[null,null][null,null]
   ![ 123,123.00]  [ 123,null]
[12 3,null][12 3,null]
   ![123 ,123.00]  [123 ,null]
   ![123,123.00]   [123,null]
   ```
   
   I think it is fine if we do not support everything, but we should mark the 
cast as `Incompatible` and add documentation explaining the current limitations.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support string decimal cast [datafusion-comet]

2025-12-16 Thread via GitHub


coderfender commented on PR #2925:
URL: 
https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3663339660

   Sure @andygrove . Let me run some benchmarks on the cast op and update this 
thread 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support string decimal cast [datafusion-comet]

2025-12-16 Thread via GitHub


andygrove commented on PR #2925:
URL: 
https://github.com/apache/datafusion-comet/pull/2925#issuecomment-3663326197

   Thanks for moving this to a separate PR @coderfender. I left some initial 
comments. Could you run some benchmarks? I'd like to get a feel for performance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support string decimal cast [datafusion-comet]

2025-12-16 Thread via GitHub


andygrove commented on code in PR #2925:
URL: https://github.com/apache/datafusion-comet/pull/2925#discussion_r2625347126


##
native/spark-expr/src/conversion_funcs/cast.rs:
##
@@ -1976,6 +1975,363 @@ fn do_cast_string_to_int<
 Ok(Some(result))
 }
 
+fn cast_string_to_decimal(
+array: &ArrayRef,
+to_type: &DataType,
+precision: &u8,
+scale: &i8,
+eval_mode: EvalMode,
+) -> SparkResult {
+match to_type {
+DataType::Decimal128(_, _) => {
+cast_string_to_decimal128_impl(array, eval_mode, *precision, 
*scale)
+}
+DataType::Decimal256(_, _) => {
+cast_string_to_decimal256_impl(array, eval_mode, *precision, 
*scale)
+}
+_ => Err(SparkError::Internal(format!(
+"Unexpected type in cast_string_to_decimal: {:?}",
+to_type
+))),
+}
+}
+
+fn cast_string_to_decimal128_impl(
+array: &ArrayRef,
+eval_mode: EvalMode,
+precision: u8,
+scale: i8,
+) -> SparkResult {
+let string_array = array
+.as_any()
+.downcast_ref::()
+.ok_or_else(|| SparkError::Internal("Expected string 
array".to_string()))?;
+
+let mut decimal_builder = 
Decimal128Builder::with_capacity(string_array.len());
+
+for i in 0..string_array.len() {
+if string_array.is_null(i) {
+decimal_builder.append_null();
+} else {
+let str_value = string_array.value(i).trim();
+match parse_string_to_decimal(str_value, precision, scale) {
+Ok(Some(decimal_value)) => {
+decimal_builder.append_value(decimal_value);
+}
+Ok(None) => {
+if eval_mode == EvalMode::Ansi {
+return Err(invalid_value(
+string_array.value(i),
+"STRING",
+&format!("DECIMAL({},{})", precision, scale),
+));
+}
+decimal_builder.append_null();
+}
+Err(e) => {
+if eval_mode == EvalMode::Ansi {
+return Err(e);
+}
+decimal_builder.append_null();
+}
+}
+}
+}
+
+Ok(Arc::new(
+decimal_builder
+.with_precision_and_scale(precision, scale)?
+.finish(),
+))
+}
+
+fn cast_string_to_decimal256_impl(
+array: &ArrayRef,
+eval_mode: EvalMode,
+precision: u8,
+scale: i8,
+) -> SparkResult {
+let string_array = array
+.as_any()
+.downcast_ref::()
+.ok_or_else(|| SparkError::Internal("Expected string 
array".to_string()))?;
+
+let mut decimal_builder = 
PrimitiveBuilderwith_capacity(string_array.len());
+
+for i in 0..string_array.len() {
+if string_array.is_null(i) {
+decimal_builder.append_null();
+} else {
+let str_value = string_array.value(i).trim();
+match parse_string_to_decimal(str_value, precision, scale) {
+Ok(Some(decimal_value)) => {
+// Convert i128 to i256
+let i256_value = i256::from_i128(decimal_value);
+decimal_builder.append_value(i256_value);
+}
+Ok(None) => {
+if eval_mode == EvalMode::Ansi {
+return Err(invalid_value(
+str_value,
+"STRING",
+&format!("DECIMAL({},{})", precision, scale),
+));
+}
+decimal_builder.append_null();
+}
+Err(e) => {
+if eval_mode == EvalMode::Ansi {
+return Err(e);
+}
+decimal_builder.append_null();
+}
+}
+}
+}
+
+Ok(Arc::new(
+decimal_builder
+.with_precision_and_scale(precision, scale)?
+.finish(),
+))
+}
+
+/// Validates if a string is a valid decimal similar to BigDecimal
+fn is_valid_decimal_format(s: &str) -> bool {
+if s.is_empty() {
+return false;
+}
+
+let bytes = s.as_bytes();
+let mut idx = 0;
+let len = bytes.len();
+
+// Skip leading +/- signs
+if bytes[idx] == b'+' || bytes[idx] == b'-' {
+idx += 1;
+if idx >= len {
+// Sign only. Fail early
+return false;
+}
+}
+
+// Check invalid cases like "++", "+-"
+if bytes[idx] == b'+' || bytes[idx] == b'-' {
+return false;
+}
+
+// Now we need at least one digit either before or after a decimal point
+let mut has_digit = false;
+let mut is_decimal_point_seen = false;
+
+while idx < len {
+let ch = bytes[i

Re: [PR] feat: Support string decimal cast [datafusion-comet]

2025-12-16 Thread via GitHub


andygrove commented on code in PR #2925:
URL: https://github.com/apache/datafusion-comet/pull/2925#discussion_r2625348185


##
spark/src/test/scala/org/apache/comet/CometCastSuite.scala:
##
@@ -684,22 +684,73 @@ class CometCastSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
-  ignore("cast StringType to DecimalType(10,2)") {
-// https://github.com/apache/datafusion-comet/issues/325
-val values = gen.generateStrings(dataSize, numericPattern, 8).toDF("a")
-castTest(values, DataTypes.createDecimalType(10, 2))
+  test("cast StringType to DecimalType(10,2)") {
+// TODO fix for Spark 4.0.0
+assume(!isSpark40Plus)

Review Comment:
   What is the issue with Spark 4? Do we need to file a new issue and reference 
it here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support string decimal cast [datafusion-comet]

2025-12-16 Thread via GitHub


andygrove commented on code in PR #2925:
URL: https://github.com/apache/datafusion-comet/pull/2925#discussion_r2625345509


##
native/spark-expr/src/conversion_funcs/cast.rs:
##
@@ -1976,6 +1975,363 @@ fn do_cast_string_to_int<
 Ok(Some(result))
 }
 
+fn cast_string_to_decimal(
+array: &ArrayRef,
+to_type: &DataType,
+precision: &u8,
+scale: &i8,
+eval_mode: EvalMode,
+) -> SparkResult {
+match to_type {
+DataType::Decimal128(_, _) => {
+cast_string_to_decimal128_impl(array, eval_mode, *precision, 
*scale)
+}
+DataType::Decimal256(_, _) => {
+cast_string_to_decimal256_impl(array, eval_mode, *precision, 
*scale)
+}
+_ => Err(SparkError::Internal(format!(
+"Unexpected type in cast_string_to_decimal: {:?}",
+to_type
+))),
+}
+}
+
+fn cast_string_to_decimal128_impl(
+array: &ArrayRef,
+eval_mode: EvalMode,
+precision: u8,
+scale: i8,
+) -> SparkResult {
+let string_array = array
+.as_any()
+.downcast_ref::()
+.ok_or_else(|| SparkError::Internal("Expected string 
array".to_string()))?;
+
+let mut decimal_builder = 
Decimal128Builder::with_capacity(string_array.len());
+
+for i in 0..string_array.len() {
+if string_array.is_null(i) {
+decimal_builder.append_null();
+} else {
+let str_value = string_array.value(i).trim();
+match parse_string_to_decimal(str_value, precision, scale) {
+Ok(Some(decimal_value)) => {
+decimal_builder.append_value(decimal_value);
+}
+Ok(None) => {
+if eval_mode == EvalMode::Ansi {
+return Err(invalid_value(
+string_array.value(i),
+"STRING",
+&format!("DECIMAL({},{})", precision, scale),
+));
+}
+decimal_builder.append_null();
+}
+Err(e) => {
+if eval_mode == EvalMode::Ansi {
+return Err(e);
+}
+decimal_builder.append_null();
+}
+}
+}
+}
+
+Ok(Arc::new(
+decimal_builder
+.with_precision_and_scale(precision, scale)?
+.finish(),
+))
+}
+
+fn cast_string_to_decimal256_impl(
+array: &ArrayRef,
+eval_mode: EvalMode,
+precision: u8,
+scale: i8,
+) -> SparkResult {
+let string_array = array
+.as_any()
+.downcast_ref::()
+.ok_or_else(|| SparkError::Internal("Expected string 
array".to_string()))?;
+
+let mut decimal_builder = 
PrimitiveBuilderwith_capacity(string_array.len());
+
+for i in 0..string_array.len() {
+if string_array.is_null(i) {
+decimal_builder.append_null();
+} else {
+let str_value = string_array.value(i).trim();
+match parse_string_to_decimal(str_value, precision, scale) {
+Ok(Some(decimal_value)) => {
+// Convert i128 to i256
+let i256_value = i256::from_i128(decimal_value);
+decimal_builder.append_value(i256_value);
+}
+Ok(None) => {
+if eval_mode == EvalMode::Ansi {
+return Err(invalid_value(
+str_value,
+"STRING",
+&format!("DECIMAL({},{})", precision, scale),
+));
+}
+decimal_builder.append_null();
+}
+Err(e) => {
+if eval_mode == EvalMode::Ansi {
+return Err(e);
+}
+decimal_builder.append_null();
+}
+}
+}
+}
+
+Ok(Arc::new(
+decimal_builder
+.with_precision_and_scale(precision, scale)?
+.finish(),
+))
+}
+
+/// Validates if a string is a valid decimal similar to BigDecimal
+fn is_valid_decimal_format(s: &str) -> bool {
+if s.is_empty() {
+return false;
+}
+
+let bytes = s.as_bytes();
+let mut idx = 0;
+let len = bytes.len();
+
+// Skip leading +/- signs
+if bytes[idx] == b'+' || bytes[idx] == b'-' {
+idx += 1;
+if idx >= len {
+// Sign only. Fail early
+return false;
+}
+}
+
+// Check invalid cases like "++", "+-"
+if bytes[idx] == b'+' || bytes[idx] == b'-' {
+return false;
+}
+
+// Now we need at least one digit either before or after a decimal point
+let mut has_digit = false;
+let mut is_decimal_point_seen = false;
+
+while idx < len {
+let ch = bytes[i

Re: [PR] feat: Support string decimal cast [datafusion-comet]

2025-12-16 Thread via GitHub


andygrove commented on code in PR #2925:
URL: https://github.com/apache/datafusion-comet/pull/2925#discussion_r2625343838


##
native/spark-expr/src/conversion_funcs/cast.rs:
##
@@ -1976,6 +1975,363 @@ fn do_cast_string_to_int<
 Ok(Some(result))
 }
 
+fn cast_string_to_decimal(
+array: &ArrayRef,
+to_type: &DataType,
+precision: &u8,
+scale: &i8,
+eval_mode: EvalMode,
+) -> SparkResult {
+match to_type {
+DataType::Decimal128(_, _) => {
+cast_string_to_decimal128_impl(array, eval_mode, *precision, 
*scale)
+}
+DataType::Decimal256(_, _) => {
+cast_string_to_decimal256_impl(array, eval_mode, *precision, 
*scale)
+}
+_ => Err(SparkError::Internal(format!(
+"Unexpected type in cast_string_to_decimal: {:?}",
+to_type
+))),
+}
+}
+
+fn cast_string_to_decimal128_impl(
+array: &ArrayRef,
+eval_mode: EvalMode,
+precision: u8,
+scale: i8,
+) -> SparkResult {
+let string_array = array
+.as_any()
+.downcast_ref::()
+.ok_or_else(|| SparkError::Internal("Expected string 
array".to_string()))?;
+
+let mut decimal_builder = 
Decimal128Builder::with_capacity(string_array.len());
+
+for i in 0..string_array.len() {
+if string_array.is_null(i) {
+decimal_builder.append_null();
+} else {
+let str_value = string_array.value(i).trim();

Review Comment:
   It would be better to avoid `trim` since this could be expensive (creating a 
new string). It would be more efficient for the parser to ignore whitespace at 
the end of this string.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]