rdtr opened a new pull request, #56184:
URL: https://github.com/apache/spark/pull/56184
### What changes were proposed in this pull request?
`SQLQueryTestHelper.getSparkSettings` splits `--SET` directive values on
every comma:
```scala
val settingLines = comments.filter(_.startsWith("--SET
")).map(_.substring(6))
settingLines.flatMap(_.split(",").map { kv =>
val (conf, value) = kv.span(_ != '=')
conf.trim -> value.substring(1).trim
})
```
This breaks for any setting whose value itself contains a comma (e.g.
`spark.sql.optimizer.excludedRules`, which is documented as a comma-separated
list of rule names). The parser splits inside the value, the next segment has
no `=`, and `value.substring(1)` then throws `StringIndexOutOfBoundsException`.
This PR changes the split to only occur at commas that are immediately
followed by what looks like a new `key=`:
```scala
settingLines.flatMap(_.split(",(?=[\\w.]+=)").map { ... })
```
The positive lookahead `(?=[\w.]+=)` peeks at the following characters and
only splits when they form a config key followed by `=`, without consuming
those characters. This preserves the documented multi-setting form `--SET
k1=v1,k2=v2` while now also correctly handling values that themselves contain
commas.
### Why are the changes needed?
Several Spark configs take comma-separated lists as values —
`spark.sql.optimizer.excludedRules` being the most common one. SQL test files
currently cannot express such a value in a `--SET` directive at all; the parser
fails before the test even runs, forcing authors to scope `--SET` down to one
excluded rule per directive.
This was surfaced while writing a per-file workaround in Apache Gluten
([apache/incubator-gluten#12165](https://github.com/apache/incubator-gluten/pull/12165)),
which reuses this test helper and needs to exclude multiple rules at once.
### Does this PR introduce _any_ user-facing change?
No. Test-framework-only change. All existing `--SET` directives in the SQL
test corpus continue to parse identically.
### How was this patch tested?
Added `SQLQueryTestHelperSuite` (new file) with focused unit tests covering:
- single `key=value`
- documented multi-setting form `k1=v1,k2=v2` in a single `--SET`
- multiple `--SET` lines
- a value containing commas (e.g. `excludedRules=Rule1,Rule2`) — fails on
`master`, passes after the fix
- mixed: multi-setting where one of the values itself contains commas
- non-`--SET` comments are ignored
All 6 tests pass with the fix applied.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]