The command-line Racket REPL doesn't handle Unicode input well. If you try
(regexp-match? #px"^[a-zA-Z]+$" "héllo") in DrRacket, or put it in a file
and run it as a program, you will find that it evaluates to #f.
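
For comparison, here is a small Python sketch of the same check, since the question started from isalpha(). It is only an illustration: the helper names is_ascii_alpha and utf8_hex are made up for this example, and it says nothing about how Racket's reader or REPL works internally. It shows that an [a-zA-Z] class is ASCII-only, that isalpha() is Unicode-aware, and that é is the two bytes C3 A9 in UTF-8, which are the same low bytes that appear in the garbled REPL input quoted below.

```python
import re

# Same character class as in the thread; fullmatch plays the role of ^...$.
ASCII_ALPHA = re.compile(r"[a-zA-Z]+")


def is_ascii_alpha(s: str) -> bool:
    """Hypothetical helper: True only if s is one or more ASCII letters."""
    return ASCII_ALPHA.fullmatch(s) is not None


def utf8_hex(s: str) -> str:
    """Hypothetical helper: hex dump of the UTF-8 encoding of s."""
    return s.encode("utf-8").hex()


# "\u00e9" is the precomposed character é, written as an escape so the
# example does not depend on how the source file or terminal is encoded.
word = "h\u00e9llo"

print(is_ascii_alpha("hello"))  # ASCII letters only
print(is_ascii_alpha(word))     # é is outside [a-zA-Z]
print(word.isalpha())           # str.isalpha() is Unicode-aware
print(utf8_hex("\u00e9"))       # UTF-8 bytes of é
```

So the ASCII class rejecting "héllo" is the expected result, and a match at the REPL suggests the string that reached the regexp was not the one that was typed.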

On Thu, Jul 9, 2020 at 7:19 AM Peter W A Wood <peterwaw...@gmail.com> wrote:

> I was experimenting with regular expressions to try to emulate the Python
> isalpha() String method. Using a simple [a-zA-Z] character class worked for
> the English alphabet (ASCII characters):
>
> > (regexp-match? #px"^[a-zA-Z]+$" "hello")
> #t
> > (regexp-match? #px"^[a-zA-Z]+$" "h1llo")
> #f
>
> It then dawned on me that the Python isalpha() method is Unicode-aware:
>
> >>> "é".isalpha()
> True
>
> I started scratching my head as to how to achieve the equivalent using a
> regular expression in Python. I tried the same regular expression with a
> non-English character in the string. To my surprise, the regular expression
> matched the non-ASCII characters.
>
> > (regexp-match? #px"^[a-zA-Z]+$" "h\U+FFC3\U+FFA9llo")
> #t
>
> Are Racket regular expression character classes Unicode aware or is there
> some other explanation why this regular expression matches?
>
> Peter
>
> --
> You received this message because you are subscribed to the Google Groups
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to racket-users+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/racket-users/2197C34F-165D-4D97-97AD-F158153316F5%40gmail.com
> .
>

