[GitHub] [spark] amaliujia commented on a change in pull request #35352: [SPARK-38063][SQL] Support split_part Function

GitBox Tue, 15 Mar 2022 10:38:00 -0700


amaliujia commented on a change in pull request #35352:
URL: https://github.com/apache/spark/pull/35352#discussion_r827245478




##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
##########
@@ -999,13 +1000,22 @@ public static UTF8String concatWs(UTF8String separator, UTF8String... inputs) {
   }
 
   public UTF8String[] split(UTF8String pattern, int limit) {
+    return split(pattern, limit, false);
+  }
+
+  public UTF8String[] split(UTF8String pattern, int limit, boolean ifQuote) {

Review comment:
       Sure. This is an important decision that will change the implementation 
details, so we should make a call before continuing to review the rest of the 
code.
   
   The current split treats `pattern` as a regex, while split_part treats 
`pattern` as a fixed string (or a quoted regex pattern). Since split_part can 
be expressed as ElementAt(split()), I introduced branches into the existing 
code path to re-use as much code as possible while preserving the minor 
differences between the two function specs.
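
   To make the difference concrete, here is a minimal sketch in plain Java rather than against `UTF8String`. The class and method names are hypothetical and only illustrate the two semantics; this is not the code in the PR. It uses `String.split` and `Pattern.quote` from the JDK.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of Spark.
final class SplitSemantics {
    // Regex semantics: the delimiter is interpreted as a regular expression.
    static String[] splitAsRegex(String input, String delimiter) {
        return input.split(delimiter, -1);
    }

    // Fixed-string semantics: the delimiter is quoted so that regex
    // metacharacters such as '.' are matched literally.
    static String[] splitAsLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        // "." as a regex matches every character, so all pieces are empty.
        System.out.println(splitAsRegex("a.b.c", ".").length);               // 6
        // "." as a literal splits on the dots only.
        System.out.println(Arrays.toString(splitAsLiteral("a.b.c", ".")));   // [a, b, c]
    }
}
```

   The same input and delimiter give different results under the two interpretations, which is why the choice affects the function spec and not just the implementation.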
   
   There are two principles:
   1. Implement a function that is aligned with most of the vendors.
   2. Re-use code as much as possible but keep internal consistency.
   
   This leads to two options:
   Option 1: implement split_part separately, without re-using element_at and 
split. This makes the behavior compatible with other vendors but might not 
produce the minimal code addition.
   Option 2: change split_part to follow split. This gives very nice code 
re-use, but our behavior will be pretty unique among vendors.
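
   For reference, Option 1 could look roughly like the sketch below, again in plain Java with hypothetical names. The edge-case behavior here (1-based index, empty string when the part does not exist, rejecting indexes below 1) is an assumption made for illustration and is not something the spec discussion has settled.

```java
// Hypothetical standalone helper; not the PR's implementation.
final class SplitPartSketch {
    // Returns the partNum-th (1-based) piece of input, splitting on a
    // literal delimiter. Assumed behavior: "" when the part is missing.
    static String splitPart(String input, String delimiter, int partNum) {
        if (partNum < 1) {
            throw new IllegalArgumentException("partNum must be >= 1");
        }
        if (delimiter.isEmpty()) {
            // Assumption: an empty delimiter never splits the input.
            return partNum == 1 ? input : "";
        }
        int start = 0;
        // Skip over the first (partNum - 1) delimiters.
        for (int i = 1; i < partNum; i++) {
            int next = input.indexOf(delimiter, start);
            if (next < 0) {
                return "";
            }
            start = next + delimiter.length();
        }
        int end = input.indexOf(delimiter, start);
        return end < 0 ? input.substring(start) : input.substring(start, end);
    }
}
```

   This avoids the regex engine entirely, at the cost of a second splitting code path to maintain.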
   
   What do you think @cloud-fan @srielau?
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


