I'm not saying that what is sent to the server has to be those bytes; I'm saying that if we use the convention that punctuation, whitespace, etc. get escaped, it would allow us to recognize the boundaries of the local part in plain text.
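As a rough illustration of that convention (a sketch, not anything specified by EAI): the code below keeps characters that Python treats as XID_Continue, plus "." and "-" as assumed exceptions, and percent-escapes the UTF-8 bytes of everything else. The function names and the `ADDRESS` pattern are hypothetical, and the domain pattern is deliberately simplistic.

```python
import re

def is_xid_continue(ch: str) -> bool:
    # Python identifier rules are based on XID_Start / XID_Continue, so
    # "a" + ch is an identifier when ch may continue one (an approximation).
    return ("a" + ch).isidentifier()

def escape_local_part(local: str) -> str:
    # Keep XID_Continue characters plus "." and "-" (assumed exceptions);
    # percent-escape the UTF-8 bytes of everything else, including "%".
    out = []
    for ch in local:
        if is_xid_continue(ch) or ch in ".-":
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)

# Hypothetical pattern: an escaped local part, "@", then a simple dotted domain.
ADDRESS = re.compile(r"(?:[\w.\-]|%[0-9A-Fa-f]{2})+@[\w\-]+(?:\.[\w\-]+)+")

def find_addresses(text: str) -> list:
    # With whitespace and punctuation escaped, the local part cannot run
    # into the surrounding prose, so a simple scan finds its boundaries.
    return ADDRESS.findall(text)

print(escape_local_part("john smith"))
print(find_addresses("Mail john%20smith@foo.com, or a%2Cb@foo.com."))
```

Because a literal "%" is itself escaped (as %25), the escaped form can be decoded back to the original local part without ambiguity.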
I think what you mention is part of a more general problem. Let's suppose that I have an email address where the bytes that the server recognizes for the local part are <61 B3>@foo.com. I convert that using Latin-14 to aġ@foo.com. I send it in an email to you, and you receive it as UTF-8. You see aġ@foo.com, but under the covers it is the bytes <61 C4 A1>. But then you send <61 C4 A1>@foo.com to the server, and it fails. Or, worse yet, it reaches someone whose email is aÄ¡@foo.com. (OK, I could have poked around and found a more compelling example, but you see the point.)

If I really wanted to be absolutely certain that my email wouldn't be munged by a conversion, I'd never convert from bytes: we'd never see "[email protected]", we'd always see the equivalent of %6d%61%[email protected].

Mark <https://google.com/+MarkDavis>

— Il meglio è l’inimico del bene —

On Fri, Nov 1, 2013 at 1:36 PM, Philippe Verdy <[email protected]> wrote:

> 2013/11/1 Mark Davis ☕ <[email protected]>
>
>> These are two well-known serious flaws in EAI and URLs; there is no
>> useful syntactic limit on what is in the query part of a URL or on the
>> local part of an email address that would allow their boundaries to be
>> detected in plaintext.
>>
>> No use complaining about them, because people are concerned with
>> backwards compatibility, and wouldn't change the underlying specs.
>>
>> That being true, I wish that industry could come to consensus about
>> requiring everything outside of a well-defined, backwards-compatible set of
>> characters to be expressed as UTF-8 percent-escaped characters in these
>> fields when they are expressed as plaintext. (Something like XID_Continue ±
>> exceptions.) That would allow for unambiguous parsing in plaintext.
>
> Why "UTF-8" only? There already exist email accounts created with
> various ISO8859-* or Windows code pages, or KOI-8R (or U). And none of these
> addresses are aliased with a UTF-8 encoded account name reaching the same
> mailbox (creating these aliases would help these users having such accounts
> to protect their privacy, however there may exist rare cases where these
> aliases would conflict with distinct mail accounts
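The byte round trip described in the reply can be reproduced with a short sketch. It assumes that "Latin-14" means ISO 8859-14 (Python's `iso8859_14` codec) and that the final misreading is a Latin-1 decode; the variable names are illustrative only.

```python
# Bytes the server recognizes for the local part: <61 B3>.
raw = bytes([0x61, 0xB3])

# Sender's view, assuming "Latin-14" means ISO 8859-14.
shown = raw.decode("iso8859_14")

# The recipient's mail client stores the same text as UTF-8: <61 C4 A1>.
resent = shown.encode("utf-8")

# A server that reads those bytes as Latin-1 sees a different local part.
misread = resent.decode("latin-1")

# Never converting from bytes: show every byte percent-escaped instead.
escaped = "".join(f"%{b:02X}" for b in raw)

print(shown, resent.hex(" "), misread, escaped)
```

The percent-escaped form survives any charset conversion of the surrounding text, because it contains only ASCII characters.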

