beliefer opened a new pull request #29891:
URL: https://github.com/apache/spark/pull/29891
### What changes were proposed in this pull request?
`REGEXP_REPLACE` could replace all substrings of string that match regexp
with replacement string.
But `REGEXP_REPLACE` lost some flexibility. such as: converts camel case
strings to a string containing lower case words separated by an underscore:
AddressLine1 -> address_line_1
If we support the parameter position, we can do like this(e.g. Oracle):
```
CopyWITH strings as (
SELECT 'AddressLine1' s FROM dual union all
SELECT 'ZipCode' s FROM dual union all
SELECT 'Country' s FROM dual
)
SELECT s "STRING",
lower(regexp_replace(s, '([A-Z0-9])', '_\1', 2)) "MODIFIED_STRING"
FROM strings;
STRING MODIFIED_STRING
-------------------- --------------------
AddressLine1 address_line_1
ZipCode zip_code
Country country
```
There are some mainstream database support the syntax.
**Oracle**
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGEXP_REPLACE.html#GUID-EA80A33C-441A-4692-A959-273B5A224490
**Vertica**
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/RegularExpressions/REGEXP_REPLACE.htm?zoom_highlight=regexp_replace
**Redshift**
https://docs.aws.amazon.com/redshift/latest/dg/REGEXP_REPLACE.html
### Why are the changes needed?
The parameter position for `REGEXP_REPLACE` is very useful.
### Does this PR introduce _any_ user-facing change?
'Yes'.
### How was this patch tested?
Jenkins test.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]