Or use the extended grapheme cluster boundary "\\b{g}" instead of "\\b".
This correctly finds emoji sequences such as 👨👩👧👧, whereas
"\\b", even with the Unicode option, won't.
HTH,
Naoto
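A minimal sketch of Naoto's suggestion, assuming JDK 9 or later (where "\\b{g}" is supported). The class and method names are hypothetical. The family emoji is written with escapes as MAN, WOMAN, GIRL, GIRL joined by ZERO WIDTH JOINER (U+200D), and the same word is searched for with "\\b{g}" and with "\\b" plus Pattern.UNICODE_CHARACTER_CLASS.

```java
import java.util.regex.Pattern;

public class GraphemeBoundaryDemo {
    // Family emoji: MAN, WOMAN, GIRL, GIRL joined by ZERO WIDTH JOINER (U+200D),
    // written as UTF-16 escapes so the source file stays ASCII.
    static final String FAMILY =
            "\uD83D\uDC68\u200D\uD83D\uDC69\u200D\uD83D\uDC67\u200D\uD83D\uDC67";

    // Returns true if 'word' occurs in 'text' with the given boundary construct
    // on both sides. Pattern.quote keeps the word from being read as regex syntax.
    static boolean findDelimited(String text, String word, String boundary, int flags) {
        Pattern p = Pattern.compile(boundary + Pattern.quote(word) + boundary, flags);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "my " + FAMILY + " photo";

        // \b{g}: there is a grapheme cluster boundary between the space and the emoji.
        System.out.println("\\b{g}: " + findDelimited(text, FAMILY, "\\b{g}", 0));

        // \b: emoji are not word characters even in Unicode mode, so the space
        // and the first emoji are both non-word and no word boundary exists there.
        System.out.println("\\b:    " + findDelimited(text, FAMILY, "\\b",
                Pattern.UNICODE_CHARACTER_CLASS));
    }
}
```

Run on a recent JDK, the first line is expected to print true and the second false.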
On 12/15/23 11:29 AM, Stefan Norberg wrote:
Thanks Raffaello,
Ah, thanks! Found https://bugs.openjdk.org/browse/JDK-8264160 in the
release notes for 19 just now.
Have a great weekend!
/Stefan
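JDK-8264160 made "\\b" consistent with "\\w", and "\\w" matches only ASCII word characters unless Unicode character classes are enabled. The minimal sketch below (the class name is hypothetical) compares "\\w" with default flags and with Pattern.UNICODE_CHARACTER_CLASS for a few sample characters.

```java
import java.util.regex.Pattern;

public class PredefinedClassDemo {
    // True if the single-character string c matches \w under the given flags.
    static boolean isWordChar(String c, int flags) {
        return Pattern.compile("\\w", flags).matcher(c).matches();
    }

    public static void main(String[] args) {
        // ASCII letter, digit, underscore, then o-umlaut and Cyrillic ZHE.
        String[] samples = {"a", "7", "_", "\u00F6", "\u0416"};
        for (String c : samples) {
            // Print the code point rather than the character to avoid console encoding issues.
            System.out.printf("U+%04X default=%b unicode=%b%n",
                    c.codePointAt(0),
                    isWordChar(c, 0),
                    isWordChar(c, Pattern.UNICODE_CHARACTER_CLASS));
        }
    }
}
```

The three ASCII samples are word characters in both modes; the last two are expected to count as word characters only in Unicode mode.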
On Fri, Dec 15, 2023 at 8:24 PM Raffaello Giulietti <raffaello.giulie...@oracle.com> wrote:
By default, a word boundary only considers ASCII letters and digits. See
"Predefined character classes" in the documentation.
To add Unicode support, you have a choice between adding a flag as a second
argument to the compile() method:
Pattern p = Pattern.compile("(\\b" + word + "\\b)",
Pattern.
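Raffaello's snippet is cut off after "Pattern.". As a hedged sketch, assuming the flag meant is Pattern.UNICODE_CHARACTER_CLASS (whose embedded form is "(?U)"), the class below searches for a whole word both ways. UnicodeWordSearch and its methods are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

public class UnicodeWordSearch {
    // Option 1: pass UNICODE_CHARACTER_CLASS as the second argument to compile().
    static boolean containsWordFlag(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.UNICODE_CHARACTER_CLASS);
        return p.matcher(text).find();
    }

    // Option 2: embed the equivalent flag (?U) at the start of the regex itself.
    static boolean containsWordEmbedded(String text, String word) {
        Pattern p = Pattern.compile("(?U)\\b" + Pattern.quote(word) + "\\b");
        return p.matcher(text).find();
    }

    // Default flags, for comparison.
    static boolean containsWordDefault(String text, String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "\u00D6l und Wasser"; // word starts with O-umlaut
        String word = "\u00D6l";
        System.out.println("flag:    " + containsWordFlag(text, word));
        System.out.println("(?U):    " + containsWordEmbedded(text, word));
        // Expected to differ by JDK version: JDK 17 treats any letter as a word
        // character for \b, while JDK 19+ (JDK-8264160) aligns \b with ASCII-only \w.
        System.out.println("default: " + containsWordDefault(text, word));
    }
}
```

Both Unicode-aware variants should find the word regardless of JDK version; only the default-flags result is expected to change between 17 and 19+.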
The following test passes on JDK 17 but fails on the last assertion on
19.0.2 and 21.0.1. Bug or feature?
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* Test passes in JDK