Re: Regexp with word-boundary followed by unicode character doesn't work in 19, 21

2023-12-15 Thread Naoto Sato
Or use the extended grapheme cluster boundary "\\b{g}" instead of "\\b". This will correctly search emoji sequences such as 👨‍👩‍👧‍👧, while "\\b" with the Unicode option won't.
HTH, Naoto
On 12/15/23 11:29 AM, Stefan Norberg wrote:
> Thanks Raffaello, Ah, thanks! Found https://bugs.openjdk.org/browse/JD
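A minimal sketch of the "\\b{g}" suggestion, assuming Java 9 or later (where java.util.regex accepts the \b{g} construct). The class name GraphemeBoundarySearch and the helper findsAsWholeGrapheme are hypothetical names chosen for illustration; they do not come from the thread. The idea is that a search term wrapped in \b{g} only matches when it starts and ends on grapheme cluster boundaries, so a lone "man" emoji is not reported inside the family emoji sequence.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original thread.
final class GraphemeBoundarySearch {

    // U+1F468 MAN as a surrogate pair.
    static final String MAN = "\uD83D\uDC68";

    // Family sequence: MAN, ZWJ, WOMAN, ZWJ, GIRL, ZWJ, GIRL (ZWJ is U+200D).
    static final String FAMILY =
            "\uD83D\uDC68\u200D\uD83D\uDC69\u200D\uD83D\uDC67\u200D\uD83D\uDC67";

    // True if needle occurs in text with a grapheme cluster boundary on both sides.
    static boolean findsAsWholeGrapheme(String needle, String text) {
        Pattern p = Pattern.compile("\\b{g}" + Pattern.quote(needle) + "\\b{g}");
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        // A standalone emoji surrounded by spaces sits on cluster boundaries.
        System.out.println(findsAsWholeGrapheme(MAN, "hi " + MAN + " there"));
        // Inside the family sequence, MAN is followed by ZWJ, so there is
        // no cluster boundary after it and the search does not match.
        System.out.println(findsAsWholeGrapheme(MAN, FAMILY));
        // The whole sequence is found as one unit.
        System.out.println(findsAsWholeGrapheme(FAMILY, "x" + FAMILY + "y"));
        // For contrast, a plain substring search does see MAN inside FAMILY.
        System.out.println(FAMILY.contains(MAN));
    }
}
```

Pattern.quote is used here so that the search term is always treated as a literal, which the original snippet does not necessarily do.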

Re: Regexp with word-boundary followed by unicode character doesn't work in 19, 21

2023-12-15 Thread Stefan Norberg
Thanks Raffaello,
Ah, thanks! Found https://bugs.openjdk.org/browse/JDK-8264160 in the release notes for 19 just now.
Have a great weekend!
/Stefan
On Fri, Dec 15, 2023 at 8:24 PM Raffaello Giulietti <raffaello.giulie...@oracle.com> wrote:
> By default, a word boundary only considers ASCII lett

Re: Regexp with word-boundary followed by unicode character doesn't work in 19, 21

2023-12-15 Thread Raffaello Giulietti
By default, a word boundary only considers ASCII letters and digits. See "Predefined character classes" in the documentation. To add Unicode support, you have a choice between adding a flag as a 2nd argument to the compile() method:
Pattern p = Pattern.compile("(\\b" + word + "\\b)", Pattern.
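A short sketch of the two ways to make \b Unicode-aware. The snippet above is cut off after "Pattern.", so the flag used here, Pattern.UNICODE_CHARACTER_CLASS, and its embedded form (?U) are an assumption about what the message goes on to name; both are documented java.util.regex options. The class and method names are hypothetical, and Pattern.quote is added so the word is matched literally.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original thread.
final class UnicodeWordBoundary {

    // Default \b. On JDK 19 and later only ASCII word characters count,
    // so the result for non-ASCII neighbours differs from JDK 17.
    static boolean findsDefault(String word, String text) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b")
                .matcher(text).find();
    }

    // Option 1 (assumed): pass the flag as the second argument to compile().
    static boolean findsWithFlag(String word, String text) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.UNICODE_CHARACTER_CLASS).matcher(text).find();
    }

    // Option 2 (assumed): enable the same behaviour inside the pattern with (?U).
    static boolean findsWithEmbeddedFlag(String word, String text) {
        return Pattern.compile("(?U)\\b" + Pattern.quote(word) + "\\b")
                .matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "foo\u00e4 bar"; // "foo" directly followed by U+00E4, then " bar"

        // Result depends on the JDK version, as described in the thread.
        System.out.println("default, foo:  " + findsDefault("foo", text));

        // With Unicode word boundaries, U+00E4 is a word character, so there is
        // no boundary after "foo" and the whole word "foo" + U+00E4 is found instead.
        System.out.println("flag, foo:     " + findsWithFlag("foo", text));
        System.out.println("flag, foo\u00e4:    " + findsWithFlag("foo\u00e4", text));
        System.out.println("embedded, foo: " + findsWithEmbeddedFlag("foo", text));
    }
}
```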

Regexp with word-boundary followed by unicode character doesn't work in 19, 21

2023-12-15 Thread Stefan Norberg
The following test works in 17 but fails in 19.0.2 and 21.0.1 on the last assertion. Bug or feature?
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
 * Tests passes in JDK