MaxGekk commented on code in PR #56498:
URL: https://github.com/apache/spark/pull/56498#discussion_r3494103805
##########
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##########
@@ -1246,6 +1246,114 @@ public int indexOf(UTF8String v, int start) {
return -1;
}
+ /**
+ * Finds the {@code occurrence}-th occurrence of {@code pattern} in this string,
+ * starting the search at the specified position.
+ * When {@code start} is positive, the search proceeds forward from the
+ * {@code start}-th character (1-based). When {@code start} is negative, the
+ * search proceeds backward: {@code start} specifies the first character to
+ * compare, counting from the end of the string. For example,
+ * {@code start = -3} points at the 3rd character from the end, and the first
+ * candidate substring is the one that begins at that character.
+ * Overlapping matches are supported (e.g. "aa" in "aaa" returns 0, 1, 2 for
+ * occurrence 1, 2, 3 respectively).
Review Comment:
The overlapping example overstates the count: `"aa"` occurs only **twice**
in `"aaa"` (0-based positions 0 and 1). There is no third occurrence, and the
method correctly returns -1 for occurrence 3 (verified: `instr('aaa','aa',1,3)`
returns 0, the SQL-level "not found" result). Suggest fixing the example, or
switching to `"aaaa"`, which does have three overlapping `"aa"` at 0, 1, 2:
```suggestion
* Overlapping matches are supported (e.g. "aa" in "aaa" returns 0, 1 for
* occurrence 1, 2 respectively).
```
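For reference, the counts above can be checked with a small standalone sketch. It is not the PR's implementation: `OverlapDemo` and `nthIndexOf` are hypothetical names, and it uses plain `java.lang.String` rather than `UTF8String`, assuming only that overlapping matches are found by advancing the search one position past each hit.

```java
final class OverlapDemo {
    private OverlapDemo() {}

    /**
     * Hypothetical helper: returns the 0-based index of the n-th (1-based)
     * occurrence of pattern in s, counting overlapping matches, or -1.
     */
    static int nthIndexOf(String s, String pattern, int occurrence) {
        if (occurrence < 1 || pattern.isEmpty()) {
            return -1;
        }
        int from = 0;
        while (true) {
            int idx = s.indexOf(pattern, from);
            if (idx < 0) {
                return -1; // fewer than `occurrence` matches exist
            }
            if (--occurrence == 0) {
                return idx;
            }
            // Advance by one position (not by pattern length) so that
            // overlapping matches are counted.
            from = idx + 1;
        }
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 3; n++) {
            System.out.println("aaa  occurrence " + n + " -> " + nthIndexOf("aaa", "aa", n));
        }
        for (int n = 1; n <= 3; n++) {
            System.out.println("aaaa occurrence " + n + " -> " + nthIndexOf("aaaa", "aa", n));
        }
    }
}
```

Under these assumptions, `"aaa"` yields 0, 1, -1 for occurrences 1, 2, 3, while `"aaaa"` yields 0, 1, 2, which matches the two alternatives proposed for the Javadoc.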
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]