pchintar opened a new pull request, #56240:
URL: https://github.com/apache/spark/pull/56240
### What changes were proposed in this pull request?
Spark SQL already supports the 4-argument form of `regexp_replace`:
```sql
regexp_replace(str, regexp, replacement, position)
```
However, the corresponding Scala, PySpark, and Spark Connect APIs currently
expose only the 3-argument variants.
This PR exposes the existing 4-argument form through the following APIs and
adds corresponding Scala, PySpark, and Connect test coverage:
* Scala API (`functions.regexp_replace`)
* PySpark API (`functions.regexp_replace`)
* Spark Connect API (`functions.regexp_replace`)
### Why are the changes needed?
The `position` argument already works in SQL expressions (for example via
`expr` or `selectExpr`), but it cannot be passed through the public Scala,
PySpark, and Spark Connect functions. This leaves SQL and the programmatic
interfaces inconsistent. Exposing the optional `position` argument brings the
public APIs in line with the existing SQL functionality.
### Does this PR introduce *any* user-facing change?
Yes.
Users can now specify the optional `position` argument through the Scala,
PySpark, and Spark Connect APIs.
Before:
```scala
regexp_replace(col("s"), "(\\d+)", "--")
```
After:
```scala
regexp_replace(col("s"), "(\\d+)", "--", 5)
```
Similarly, PySpark users can now call:
```python
from pyspark.sql import functions as F
F.regexp_replace("s", r"(\d+)", "--", 5)
```
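To make the effect of `position` concrete, here is a minimal pure-Python sketch (no Spark required) that approximates the semantics documented for the SQL function: `position` is 1-based, characters before it are left untouched, every match from that point on is replaced, and a `position` greater than the length of `str` returns `str` unchanged. The helper name `regexp_replace_from` is hypothetical and not part of any Spark API. The sketch also treats the replacement as a literal string, whereas Spark additionally accepts Java-style `$1` group references.

```python
import re


def regexp_replace_from(s: str, pattern: str, replacement: str, position: int = 1) -> str:
    """Approximate model of Spark SQL's regexp_replace(str, regexp, rep, position).

    Hypothetical helper for illustration only; not part of any Spark API.
    """
    if position < 1:
        # Spark expects a positive position; mirror that with an error here.
        raise ValueError("position must be a positive integer")
    if position > len(s):
        # Documented rule: a position past the end leaves the input unchanged.
        return s
    start = position - 1  # convert the 1-based position to a 0-based index
    prefix, rest = s[:start], s[start:]
    # Replace every match in the remainder only; the lambda keeps the
    # replacement literal (no backreference expansion).
    return prefix + re.sub(pattern, lambda _m: replacement, rest)


# Digits before position 5 ("12") survive; later ones ("345") are replaced.
print(regexp_replace_from("ab12cd345", r"(\d+)", "--", 5))  # ab12cd--
print(regexp_replace_from("ab12cd345", r"(\d+)", "--"))     # ab--cd--
```

Slicing the string at the start index means anchors such as `^` and lookbehinds only see the searched suffix, which is roughly how a regex search restricted to a region of the input behaves.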
### How was this patch tested?
Added test coverage in:
* `StringFunctionsSuite`
* `FunctionsTests.test_regexp_replace`
* `SparkConnectFunctionTests.test_string_functions_multi_args`
Verified with:
```bash
./build/sbt "sql-api/compile"
./build/sbt "sql/Test/compile"
./build/sbt \
  "sql/testOnly org.apache.spark.sql.StringFunctionsSuite -- -z \"regex_replace / regex_extract\""
python3.11 -m pytest -v \
  python/pyspark/sql/tests/test_functions.py::FunctionsTests::test_regexp_replace
python3.11 -m pytest -v \
  python/pyspark/sql/tests/connect/test_connect_function.py::SparkConnectFunctionTests::test_string_functions_multi_args
```
### Was this patch authored or co-authored using generative AI tooling?
Yes, co-written with GPT-5.