wankunde opened a new pull request #35550:
URL: https://github.com/apache/spark/pull/35550
### What changes were proposed in this pull request?
This PR optimizes join queries whose join condition is a string-contains
check. Such queries can run for a very long time.
For example:
```sql
SELECT a.text, b.pattern
FROM fact_table a
JOIN patterns b
ON a.text like concat('%', b.pattern, '%');
```
Or
```sql
SELECT a.text, b.pattern
FROM fact_table a
JOIN patterns b
ON position(b.pattern, a.text) > 0;
```
The time complexity of the query drops from **O(M * N * m * n)** to **O(M * m * max(n))**, where:
- M = number of records in the fact table
- N = number of records in the patterns table
- m = row length of the fact table
- n = row length of the patterns table
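
This message does not show how the PR reaches that bound. As a rough illustration only, one common way to get O(M * m * max(n)) matching is to build a character trie over all patterns once, then walk the trie from every start position of each text row, going at most max(n) characters deep. The sketch below assumes that approach. The names `TrieNode`, `build_trie`, and `contains_join` are hypothetical and are not taken from the Spark code base.

```python
from typing import Dict, Iterable, List, Tuple


class TrieNode:
    """One node of a character trie built over the patterns (hypothetical helper)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Indices of the pattern rows that end exactly at this node.
        self.pattern_ids: List[int] = []


def build_trie(patterns: List[str]) -> TrieNode:
    """Insert every pattern once; cost is the total length of all patterns."""
    root = TrieNode()
    for idx, pattern in enumerate(patterns):
        node = root
        for ch in pattern:
            node = node.children.setdefault(ch, TrieNode())
        node.pattern_ids.append(idx)
    return root


def contains_join(texts: Iterable[str], patterns: List[str]) -> List[Tuple[str, str]]:
    """Return (text, pattern) pairs where the pattern occurs inside the text.

    For each text, every start position walks the trie at most max(n)
    steps, so no per-pattern scan of the text is needed.
    """
    root = build_trie(patterns)
    result: List[Tuple[str, str]] = []
    for text in texts:
        # An empty pattern ends at the root and is contained in every text.
        matched = set(root.pattern_ids)
        for start in range(len(text)):
            node = root
            i = start
            while i < len(text):
                node = node.children.get(text[i])
                if node is None:
                    break  # no pattern continues with this character
                matched.update(node.pattern_ids)
                i += 1
        # One output row per matching pattern row, like an inner join.
        for idx in sorted(matched):
            result.append((text, patterns[idx]))
    return result


if __name__ == "__main__":
    rows = contains_join(["spark sql", "hello"], ["ark", "sql", "lo", "xyz"])
    for text, pattern in rows:
        print(text, "|", pattern)
```

Matching here is a plain substring check, which corresponds to the `position(b.pattern, a.text) > 0` form; wildcard characters inside a pattern would need extra handling for the `like` form. The trie is built once over the patterns table, which is the step that removes the factor N from the per-row cost.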
### Why are the changes needed?
Before this change, a query that matches many patterns against each row of
the fact table could run for a very long time.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Added unit tests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]