On Fri, May 20, 2022 at 03:54:36PM -0500, Bryan K. Walton wrote:

> We are trying to do some header checks that block on both the From and
> Return Path header, but that also block some addresses with
> international characters in them. Characters like: 

The "Return-Path" header is added during final message *delivery*, after
the message enters the queue, and is almost universally absent at the
SMTP stage.  Any header checks on "Return-Path" are pointless.

Instead, use "check_sender_access", since the content of the
Return-Path header added during final delivery is the envelope
sender address.

> ù, ǔ, ɫ, ɇ, etc.

If you haven't enabled SMTPUTF8 support, the "From" header should not
have such characters present; they're instead encoded as quoted-printable
or base64 via RFC 2047.
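
To illustrate why a literal non-ASCII character in a pattern may never
match such a header, here is a minimal Python sketch.  The header value
and address are made up for the example, and Python's "email.header"
module merely stands in for whatever decoder the recipient's mail client
uses; as far as I know, header_checks sees only the raw, undecoded line.

```python
from email.header import decode_header, make_header

# Hypothetical From header as it might appear without SMTPUTF8:
# the display name is an RFC 2047 encoded-word, so the line is pure ASCII.
raw_from = "=?UTF-8?Q?J=C3=BCrgen?= <user@example.com>"

# Decode the encoded-word back to Unicode text, for comparison only.
decoded_from = str(make_header(decode_header(raw_from)))

print(raw_from.isascii())    # is the raw header ASCII-only?
print("ü" in raw_from)       # is a literal "ü" present in the raw line?
print("ü" in decoded_from)   # does it appear once the header is decoded?
```

A pattern written against the decoded text would therefore have nothing
to match in the raw header line.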

Also, regular expressions are a rather poor tool for parsing email
addresses.  You're turning screws with a hammer.  You should probably
rethink your goals.

> I've read this page:
> https://www.postfix.org/SMTPUTF8_README.html and I understand that
> header checks are not UTF-8 enabled.  My understanding of that page is
> that I must add *UTF8 to the beginning of the PCRE pattern.  I'm a
> little unclear about what the pattern would look like.

No.  The correct interpretation is that expecting valid UTF8 syntax is
not realistic, and that you'd end up rejecting messages you'd want to
accept if you did that.  You should therefore NOT add that prefix.

> If I want to block a slightly different domain in the header check, for
> example: 1105iĕĕ.com, using the pattern shown above, can somebody please
> tell me specifically what needs to be added to the pattern to make it
> work?

Are you sure that's actually the domain in the From header?  It could
well be in A-label form: xn--1105i-yzaa.com
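
If you want to check the A-label form yourself, a small sketch using
Python's built-in "idna" codec follows.  Note that this codec implements
the older IDNA 2003 rules, so treat it as a quick sanity check rather
than an authoritative conversion.

```python
# The U-label domain from the question (typed as UTF-8 text).
u_label = "1105iĕĕ.com"

# Python's built-in "idna" codec converts each non-ASCII label
# to its ASCII-compatible "xn--" form.
a_label = u_label.encode("idna").decode("ascii")

print(a_label)
```

If both forms can show up in your mail, a rule would have to cover the
A-label as well as the raw UTF-8 spelling.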

You could also (without enabling UTF8 RE syntax) check for the
underlying raw octets of the UTF-8 encoding of "ĕ".  All you
need to do for that is edit the regexp/pcre table with a UTF-8
enabled editor, and type a literal "ĕ" into the pattern.

    $ echo ĕĕ | (LANG=C LC_CTYPE=C LC_ALL=C egrep ĕĕ)
    ĕĕ

UTF-8 encoded patterns match UTF-8 encoded input.  The only
reason to use explicit UTF-8 in regular expressions is to use
fancy Unicode features (character classes) in the pattern.
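
The same point can be sketched with Python's "re" module in bytes mode,
which I'm using here only as a rough analogue of a regexp/pcre table
without UTF-8 syntax enabled.  The header line is hypothetical.

```python
import re

# Hypothetical raw header line, as UTF-8 encoded bytes.
header = "From: Sales <sales@1105iĕĕ.com>".encode("utf-8")

# Pattern typed with a literal "ĕ", then stored as UTF-8 octets.
literal = re.compile(re.escape("1105iĕĕ.com".encode("utf-8")))

# Byte-mode wildcards count octets, not characters: "ĕĕ" is 4 octets.
two_octets = re.compile(rb"1105i.{2}\.com")
four_octets = re.compile(rb"1105i.{4}\.com")

print(bool(literal.search(header)))      # literal octets vs. octets
print(bool(two_octets.search(header)))   # "." treated as one octet each
print(bool(four_octets.search(header)))  # four octets cover "ĕĕ"
```

The literal pattern needs no Unicode awareness at all; it is only
constructs like "." or character classes that behave differently when
the engine counts octets instead of characters.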

-- 
    Viktor.
