https://bugzilla.wikimedia.org/show_bug.cgi?id=11547
Philippe Verdy <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #3 from Philippe Verdy <[email protected]> 2009-11-14 17:43:32 UTC ---

URL encoding is definitely NOT the correct way to make the "user@" part of email addresses valid. Read the RFCs: URL encoding applies only to the hierarchical page name within a domain space (under a hierarchical protocol like "http(s):" and "ftp(s):"), as well as to query parameters (when those protocols support them).

Valid user names in email addresses also use a "safe" alphabet different from that for domain names (which also DO NOT use URL encoding, but the encodings supported in IDNA if they are internationalized, and the DNS specifications otherwise).

For example, the underscore character "_" (which is part of my own email address and cannot be substituted with a "+" or "-", and not even with "%7E") or the exclamation mark "!" is perfectly safe (and standard) in the "user@" part. That part is in fact not really described as a user name, but as an identity specifier whose internal syntax may contain a user name and some other authorization data that cannot be safely stripped out or separated (some sites will use the colon ":" instead of the exclamation mark).

Mapping arbitrary Unicode characters, in UTF-8 or other representations, into a valid "user@" part of an email address is completely unspecified: there is absolutely no reliable algorithm to do this, as the mapping is completely domain-dependent and may even differ from the mapping used for encoding usernames in URI schemes other than "mailto:".

All that can be done is to check that the "user@" part provided uses the valid ASCII subset which is specific to the "mailto:" URI scheme (and distinct from the ASCII subsets used either in the DNS protocol for domain names, or in the server-local address part of HTTP/FTP URLs).
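The check described above (accept the "user@" part only if it already uses a valid ASCII subset, and never try to re-encode it) could look roughly like the sketch below. It is an illustration only: it assumes the unquoted "dot-atom" form of RFC 5322 as the allowed alphabet, does not handle quoted local parts, and the name `is_plain_local_part` is hypothetical, not anything from MediaWiki.

```python
import re

# Assumption: the allowed alphabet is RFC 5322 "atext" (unquoted dot-atom form).
# Quoted local parts ("...") are deliberately not handled in this sketch.
_ATEXT = r"A-Za-z0-9!#$%&'*+/=?^_`{|}~-"
_DOT_ATOM = re.compile(rf"[{_ATEXT}]+(?:\.[{_ATEXT}]+)*")


def is_plain_local_part(local: str) -> bool:
    """Return True if `local` is an ASCII dot-atom.

    The string is only checked, never decoded, URL-encoded or rewritten.
    """
    return _DOT_ATOM.fullmatch(local) is not None


if __name__ == "__main__":
    for candidate in ["first_last", "relay!user", "a..b", "h\u00e9llo"]:
        print(candidate, is_plain_local_part(candidate))
```

Note that "_" and "!" pass unchanged, while a non-ASCII name is simply rejected instead of being mapped to some guessed encoding.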
Note also that "user@" parts in email addresses are normally CASE-SIGNIFICANT (even if most target SMTP servers will accept emails using any case, and even if some RFCs require that users provide an email address containing a user name that can be used as a valid label in a DNS subdomain, in order to activate some functionality). SMTP relay agents (as well as senders) MUST NOT change the letter case in a pseudo-canonicalization, because they cannot reliably know whether the recipient server makes the case distinction: this could simply break the authorization data which is part of the "user@" part (for example, it could contain Base64-encoded binary data, in addition to representing the user identity on the target server, where the mail will be delivered to the target POP3/IMAP/WebMail user's mailbox).
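To illustrate the case-sensitivity point made in the comment, here is a minimal sketch of a normalization step that lowercases only the domain and leaves the "user@" part byte-for-byte intact. The function name `normalize_address` is hypothetical, and splitting on the last "@" is an assumption made so that a local part containing "@" inside quotes is not cut in the wrong place.

```python
def normalize_address(address: str) -> str:
    """Lowercase the domain only; the local part is returned untouched."""
    # Split on the LAST "@": everything before it is the local part.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a user@domain address: {address!r}")
    # Domain names compare case-insensitively; the local part may not.
    return f"{local}@{domain.lower()}"


if __name__ == "__main__":
    print(normalize_address("John_Doe@Example.ORG"))
```

With this approach "John_Doe@Example.ORG" and "john_doe@example.org" stay distinct after normalization, which is the conservative choice when the receiving server's behaviour is unknown.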
