Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes
On 12/10/19 2:51 am, Jeff King wrote: On Sat, Oct 12, 2019 at 02:42:49AM +1100, Daniel Axtens wrote: where a possible solution was to get senders to use in-body From headers even when sending their own patches. [...] I'm not sure this solution is correct. If I take a patch from Andrew, backport it, and send to the list, Andrew will be listed in the in-body From. However, he shouldn't be the sender from the Patchwork point of view: he shouldn't get the patch status notification emails - I should. We don't want to spam an original author if their patch is backported to several different releases, or picked up and resent in someone else's series, etc etc. So unless I've misunderstood something, we can't rely on the in-body from matching Patchwork's understanding of the sender. Yeah, it may be that patchwork and git have two different priorities here. From my perspective, the problem is getting the patch into a git repo with the right author name. But patchwork may want to make the distinction between author and sender. Yes, I was referring to the git am case, not the Patchwork case. -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes
Christian Schoenebeck writes: > 4. MTA's should also address this DKIM issue more accurately. I agree that Exim should be changed as you suggest. > > By taking these things into account, emails of domains with strict DMARC > policies are no longer munged on gnu lists. Additional info: Migration of many gnu/nongnu.gnu.org lists is still in progress for another week or so, then that will be true for most of them. For a minority of lists, the list administrators have set weird settings like making all messages have from: rewritten as from this list, and we are leaving them as is since the list administrators opted in to that at some point. But if the list deals with patches and not modifying the headers is useful to the people on the list, I think a request to change the list settings is likely to be accepted by the list admin. -- Ian Kelling | Senior Systems Administrator, Free Software Foundation GPG Key: B125 F60B 7B28 7FF6 A2B7 DF8F 170A F0E2 9542 95DF https://fsf.org | https://gnu.org ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes
On Sat, Oct 12, 2019 at 02:42:49AM +1100, Daniel Axtens wrote: > >> where a possible solution was to get senders to use in-body From > >> headers even when sending their own patches. > [...] > I'm not sure this solution is correct. > > If I take a patch from Andrew, backport it, and send to the list, Andrew > will be listed in the in-body From. However, he shouldn't be the sender > from the Patchwork point of view: he shouldn't get the patch status > notification emails - I should. We don't want to spam an original author > if their patch is backported to several different releases, or picked up > and resent in someone else's series, etc etc. So unless I've > misunderstood something, we can't rely on the in-body from matching > Patchwork's understanding of the sender. Yeah, it may be that patchwork and git have two different priorities here. From my perspective, the problem is getting the patch into a git repo with the right author name. But patchwork may want to make the distinction between author and sender. -Peff ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes
On Freitag, 11. Oktober 2019 06:50:14 CEST Andrew Donnellan wrote: > On 11/10/19 3:36 pm, Andrew Donnellan wrote: > > It would be nice if Mailman could adopt X-Original-Sender too. As it is, > > (which I have gone ahead and reported as > https://gitlab.com/mailman/mailman/issues/641) Not stopping you from doing that, since I still think that it'd be helpful if mailman added some kind X-Original-Sender header in case the email has to be munged for some reason. Just some notes about status & consensus we had: 1. On GNU lists the default mailman settings are now to prevent munging in first place (if possible): https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg00416.html 2. If any list member has the "nodup" mailman option turned on, mailman would still munge emails due to that. Ian (on CC) worked on a patch to override that individual user setting automatically if necessary: https://bugs.launchpad.net/mailman/+bug/1845751 3. On git side it was suggested to add some kind of "always_use_in_body_from" option: https://public-inbox.org/git/20190923222415.ga22...@sigill.intra.peff.net/ Unless that git option exists, this little trick proofed as usable workaround for git patch submitters suffering from munging: https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg00932.html 4. MTA's should also address this DKIM issue more accurately. For instance Exim is currently by default filling the "dkim h=..." header with "all header names listed in RFC4871 will be used, whether or not each header is present in the message": https://www.exim.org/exim-html-current/doc/html/spec_html/ch-dkim_and_spf.html That "h=" tag in email's dkim header lists all email headers which were included by MTA for signing the message. However IMO MTA's should not list any "List-*" header name in "dkim h=..." (at least not if not present in message), otherwise mailman is forced to munge any of such messages when adding its required List-* headers. BTW section 5.5. (page 38) of that RFC4871 actually sais these headers "SHOULD be included in the signature, if they are present in the message being signed". For now you can override this setting, e.g. by using Exim's "dkim_sign_headers" setting and providing your own list of header names, but from security point of view that's suboptimal, since admins probably leave that untouched for years and new security relevant headers might not be included for signing at some point in future. So IMO it would make sense to add more fine graded MTA DKIM config options like: "include these headers for dkim signing only if present in message" and/or "use default header names except of these". By taking these things into account, emails of domains with strict DMARC policies are no longer munged on gnu lists. Best regards, Christian Schoenebeck ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes
Hi, >> Neat. There was discussion on a similar issue recently in: >> >> >> https://public-inbox.org/git/305577c2-709a-b632-4056-658277117...@redhat.com/ >> >> where a possible solution was to get senders to use in-body From >> headers even when sending their own patches. > > I think that's a good idea. > >> >> This might provide an alternate solution (or vice versa). I kind of like >> this one better in that it doesn't require the sender to do anything >> differently (but it may be less robust, as it assumes the receiver >> reliably de-mangling). > > Yep, it's less robust - but OTOH there's always a long tail of users > stuck on old versions of git for whatever reason and having some logic > to detect DMARC munging may thus still be useful. I'm not sure this solution is correct. If I take a patch from Andrew, backport it, and send to the list, Andrew will be listed in the in-body From. However, he shouldn't be the sender from the Patchwork point of view: he shouldn't get the patch status notification emails - I should. We don't want to spam an original author if their patch is backported to several different releases, or picked up and resent in someone else's series, etc etc. So unless I've misunderstood something, we can't rely on the in-body from matching Patchwork's understanding of the sender. Regards, Daniel > > -- > Andrew Donnellan OzLabs, ADL Canberra > a...@linux.ibm.com IBM Australia Limited > > ___ > Patchwork mailing list > Patchwork@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/patchwork ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes
Jeff King writes: > This might provide an alternate solution (or vice versa). I kind of like > this one better in that it doesn't require the sender to do anything > differently (but it may be less robust, as it assumes the receiver > reliably de-mangling). I share the assessment. I also feel that relying on Reply-To: would make the result a lot less reliable (I do not have much problem with the use of X-Original-Sender, though). ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes
On Thu, Oct 10, 2019 at 12:41:32PM -0700, Jonathan Nieder wrote: > > Add support for using the X-Original-Sender or Reply-To headers, as used by > > Google Groups and Mailman respectively, to unmangle the From header when > > necessary. > [...] > Interesting! I'm cc-ing the Git mailing list in case "git am" might > wnat to learn the same support. Neat. There was discussion on a similar issue recently in: https://public-inbox.org/git/305577c2-709a-b632-4056-658277117...@redhat.com/ where a possible solution was to get senders to use in-body From headers even when sending their own patches. This might provide an alternate solution (or vice versa). I kind of like this one better in that it doesn't require the sender to do anything differently (but it may be less robust, as it assumes the receiver reliably de-mangling). -Peff ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes
On 11/10/19 3:36 pm, Andrew Donnellan wrote: It would be nice if Mailman could adopt X-Original-Sender too. As it is, (which I have gone ahead and reported as https://gitlab.com/mailman/mailman/issues/641) -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes
On 11/10/19 3:29 pm, Junio C Hamano wrote: Jeff King writes: This might provide an alternate solution (or vice versa). I kind of like this one better in that it doesn't require the sender to do anything differently (but it may be less robust, as it assumes the receiver reliably de-mangling). I share the assessment. I also feel that relying on Reply-To: would make the result a lot less reliable (I do not have much problem with the use of X-Original-Sender, though). It would be nice if Mailman could adopt X-Original-Sender too. As it is, it adds the original sender to Reply-To, but in some cases (where the list is set as reply-to-list, or has a custom reply-to setting) it adds to Cc instead. (In the patch that started this thread, I match the name from the munged From field against the name in Reply-To/Cc for the case where there's multiple Reply-Tos/Ccs.) For the Patchwork use case, I'm quite okay with accepting the risk of using Reply-To, as the alternative is worse, the corner cases are rare, and ultimately a maintainer can still fix the odd stuff-up before applying the patch. -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes
On Fri, Oct 11, 2019 at 10:01:23AM +1100, Andrew Donnellan wrote: > > This might provide an alternate solution (or vice versa). I kind of like > > this one better in that it doesn't require the sender to do anything > > differently (but it may be less robust, as it assumes the receiver > > reliably de-mangling). > > Yep, it's less robust - but OTOH there's always a long tail of users stuck > on old versions of git for whatever reason and having some logic to detect > DMARC munging may thus still be useful. I think the two features would work together nicely out of the box: if somebody has an in-body from, we'd respect that before looking at email headers anyway. So senders who do the extra work will be covered, and checking other email headers would just improve the fallback case. -Peff ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes
On Fri, 11 Oct 2019 10:16:40 +1100 Daniel Axtens wrote: > > Andrew Donnellan writes: > > > On 11/10/19 6:41 am, Jonathan Nieder wrote: > >> Interesting! I'm cc-ing the Git mailing list in case "git am" might > >> wnat to learn the same support. > > Argh, that reminds me... this patch only rewrites the name and email > > that is recorded as the Patchwork submitter, it doesn't actually rewrite > > the From: header when you fetch the mbox off Patchwork. > > > > Part of me would really like to keep Patchwork mboxes as close as > > possible to the mbox we ingested, but on the other hand it means the > > mangled address is still going to land in the git repo at the end... so > > I should probably just change it? > > Yes, I think change it. If you're worried, stash the original one in the > headers. Or add a From: line to the start of the body (if one is not already there). -- Cheers, Stephen Rothwell pgpATtDfG0Tsi.pgp Description: OpenPGP digital signature ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes
Andrew Donnellan writes: > On 11/10/19 6:41 am, Jonathan Nieder wrote: >> Interesting! I'm cc-ing the Git mailing list in case "git am" might >> wnat to learn the same support. > Argh, that reminds me... this patch only rewrites the name and email > that is recorded as the Patchwork submitter, it doesn't actually rewrite > the From: header when you fetch the mbox off Patchwork. > > Part of me would really like to keep Patchwork mboxes as close as > possible to the mbox we ingested, but on the other hand it means the > mangled address is still going to land in the git repo at the end... so > I should probably just change it? Yes, I think change it. If you're worried, stash the original one in the headers. > > -- > Andrew Donnellan OzLabs, ADL Canberra > a...@linux.ibm.com IBM Australia Limited > > ___ > Patchwork mailing list > Patchwork@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/patchwork ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes
On 11/10/19 9:54 am, Jeff King wrote: Neat. There was discussion on a similar issue recently in: https://public-inbox.org/git/305577c2-709a-b632-4056-658277117...@redhat.com/ where a possible solution was to get senders to use in-body From headers even when sending their own patches. I think that's a good idea. This might provide an alternate solution (or vice versa). I kind of like this one better in that it doesn't require the sender to do anything differently (but it may be less robust, as it assumes the receiver reliably de-mangling). Yep, it's less robust - but OTOH there's always a long tail of users stuck on old versions of git for whatever reason and having some logic to detect DMARC munging may thus still be useful. -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes
On 11/10/19 6:41 am, Jonathan Nieder wrote: Interesting! I'm cc-ing the Git mailing list in case "git am" might wnat to learn the same support. Argh, that reminds me... this patch only rewrites the name and email that is recorded as the Patchwork submitter, it doesn't actually rewrite the From: header when you fetch the mbox off Patchwork. Part of me would really like to keep Patchwork mboxes as close as possible to the mbox we ingested, but on the other hand it means the mangled address is still going to land in the git repo at the end... so I should probably just change it? -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes
Hi, Andrew Donnellan wrote: > To avoid triggering spam filters due to failed signature validation, many > mailing lists mangle the From header to change the From address to be the > address of the list, typically where the sender's domain has a strict DMARC > policy enabled. > > In this case, we should try to unmangle the From header. > > Add support for using the X-Original-Sender or Reply-To headers, as used by > Google Groups and Mailman respectively, to unmangle the From header when > necessary. > > Closes: #64 ("Incorrect submitter when using googlegroups") > Reported-by: Alexandre Belloni > Reported-by: Stephen Rothwell > Signed-off-by: Andrew Donnellan > --- > patchwork/parser.py| 75 ++ > patchwork/tests/test_parser.py | 68 -- > 2 files changed, 130 insertions(+), 13 deletions(-) Interesting! I'm cc-ing the Git mailing list in case "git am" might wnat to learn the same support. Thanks, Jonathan (patch left unsnipped for reference) > diff --git a/patchwork/parser.py b/patchwork/parser.py > index 7dc66bc05a5b..79beac5617b1 100644 > --- a/patchwork/parser.py > +++ b/patchwork/parser.py > @@ -321,12 +321,7 @@ def find_series(project, mail, author): > return _find_series_by_markers(project, mail, author) > > > -def get_or_create_author(mail): > -from_header = clean_header(mail.get('From')) > - > -if not from_header: > -raise ValueError("Invalid 'From' header") > - > +def split_from_header(from_header): > name, email = (None, None) > > # tuple of (regex, fn) > @@ -355,6 +350,65 @@ def get_or_create_author(mail): > (name, email) = fn(match.groups()) > break > > +return (name, email) > + > + > +# Unmangle From addresses that have been mangled for DMARC purposes. > +# > +# To avoid triggering spam filters due to failed signature validation, many > +# mailing lists mangle the From header to change the From address to be the > +# address of the list, typically where the sender's domain has a strict > +# DMARC policy enabled. > +# > +# Unfortunately, there's no standardised way of preserving the original > +# From address. > +# > +# Google Groups adds an X-Original-Sender header. If present, we use that. > +# > +# Mailman preserves the original address by adding a Reply-To, except in the > +# case where the list is set to either reply to list, or reply to a specific > +# address, in which case the original From is added to Cc instead. These > corner > +# cases are dumb, but we try and handle things as sensibly as possible by > +# looking for a name in Reply-To/Cc that matches From. It's inexact but > should > +# be good enough for our purposes. > +def get_original_sender(mail, name, email): > +if name and ' via ' in name: > +# Mailman uses the format " via " > +# Google Groups uses "'' via " > +stripped_name = name[:name.rfind(' via ')].strip().strip("'") > + > +original_sender = clean_header(mail.get('X-Original-Sender', '')) > +if original_sender: > +new_email = split_from_header(original_sender)[1].strip()[:255] > +return (stripped_name, new_email) > + > +addrs = [] > +reply_to_headers = mail.get_all('Reply-To') or [] > +cc_headers = mail.get_all('Cc') or [] > +for header in reply_to_headers + cc_headers: > +header = clean_header(header) > +addrs = header.split(",") > +for addr in addrs: > +new_name, new_email = split_from_header(addr) > +if new_name: > +new_name = new_name.strip()[:255] > +if new_email: > +new_email = new_email.strip()[:255] > +if new_name == stripped_name: > +return (stripped_name, new_email) > + > +# If we can't figure out the original sender, just keep it as is > +return (name, email) > + > + > +def get_or_create_author(mail, project=None): > +from_header = clean_header(mail.get('From')) > + > +if not from_header: > +raise ValueError("Invalid 'From' header") > + > +name, email = split_from_header(from_header) > + > if not email: > raise ValueError("Invalid 'From' header") > > @@ -362,6 +416,9 @@ def get_or_create_author(mail): > if name is not None: > name = name.strip()[:255] > > +if project and email.lower() == project.listemail.lower(): > +name, email = get_original_sender(mail, name, email) > + > # this correctly handles the case where we lose the race to create > # the person and another process beats us to it. (If the record > # does not exist, g_o_c invokes _create_object_from_params which > @@ -1004,7 +1061,7 @@ def parse_mail(mail, list_id=None): > > if not is_comment and (diff or pull_url): # patches or pull requests > # we delay the saving until we know we have a patch. > -author = get_or_create_author(mail) > +author =
[PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes
To avoid triggering spam filters due to failed signature validation, many mailing lists mangle the From header to change the From address to be the address of the list, typically where the sender's domain has a strict DMARC policy enabled. In this case, we should try to unmangle the From header. Add support for using the X-Original-Sender or Reply-To headers, as used by Google Groups and Mailman respectively, to unmangle the From header when necessary. Closes: #64 ("Incorrect submitter when using googlegroups") Reported-by: Alexandre Belloni Reported-by: Stephen Rothwell Signed-off-by: Andrew Donnellan --- patchwork/parser.py| 75 ++ patchwork/tests/test_parser.py | 68 -- 2 files changed, 130 insertions(+), 13 deletions(-) diff --git a/patchwork/parser.py b/patchwork/parser.py index 7dc66bc05a5b..79beac5617b1 100644 --- a/patchwork/parser.py +++ b/patchwork/parser.py @@ -321,12 +321,7 @@ def find_series(project, mail, author): return _find_series_by_markers(project, mail, author) -def get_or_create_author(mail): -from_header = clean_header(mail.get('From')) - -if not from_header: -raise ValueError("Invalid 'From' header") - +def split_from_header(from_header): name, email = (None, None) # tuple of (regex, fn) @@ -355,6 +350,65 @@ def get_or_create_author(mail): (name, email) = fn(match.groups()) break +return (name, email) + + +# Unmangle From addresses that have been mangled for DMARC purposes. +# +# To avoid triggering spam filters due to failed signature validation, many +# mailing lists mangle the From header to change the From address to be the +# address of the list, typically where the sender's domain has a strict +# DMARC policy enabled. +# +# Unfortunately, there's no standardised way of preserving the original +# From address. +# +# Google Groups adds an X-Original-Sender header. If present, we use that. +# +# Mailman preserves the original address by adding a Reply-To, except in the +# case where the list is set to either reply to list, or reply to a specific +# address, in which case the original From is added to Cc instead. These corner +# cases are dumb, but we try and handle things as sensibly as possible by +# looking for a name in Reply-To/Cc that matches From. It's inexact but should +# be good enough for our purposes. +def get_original_sender(mail, name, email): +if name and ' via ' in name: +# Mailman uses the format " via " +# Google Groups uses "'' via " +stripped_name = name[:name.rfind(' via ')].strip().strip("'") + +original_sender = clean_header(mail.get('X-Original-Sender', '')) +if original_sender: +new_email = split_from_header(original_sender)[1].strip()[:255] +return (stripped_name, new_email) + +addrs = [] +reply_to_headers = mail.get_all('Reply-To') or [] +cc_headers = mail.get_all('Cc') or [] +for header in reply_to_headers + cc_headers: +header = clean_header(header) +addrs = header.split(",") +for addr in addrs: +new_name, new_email = split_from_header(addr) +if new_name: +new_name = new_name.strip()[:255] +if new_email: +new_email = new_email.strip()[:255] +if new_name == stripped_name: +return (stripped_name, new_email) + +# If we can't figure out the original sender, just keep it as is +return (name, email) + + +def get_or_create_author(mail, project=None): +from_header = clean_header(mail.get('From')) + +if not from_header: +raise ValueError("Invalid 'From' header") + +name, email = split_from_header(from_header) + if not email: raise ValueError("Invalid 'From' header") @@ -362,6 +416,9 @@ def get_or_create_author(mail): if name is not None: name = name.strip()[:255] +if project and email.lower() == project.listemail.lower(): +name, email = get_original_sender(mail, name, email) + # this correctly handles the case where we lose the race to create # the person and another process beats us to it. (If the record # does not exist, g_o_c invokes _create_object_from_params which @@ -1004,7 +1061,7 @@ def parse_mail(mail, list_id=None): if not is_comment and (diff or pull_url): # patches or pull requests # we delay the saving until we know we have a patch. -author = get_or_create_author(mail) +author = get_or_create_author(mail, project) delegate = find_delegate_by_header(mail) if not delegate and diff: @@ -1099,7 +1156,7 @@ def parse_mail(mail, list_id=None): is_cover_letter = True if is_cover_letter: -author = get_or_create_author(mail) +author = get_or_create_author(mail, project) # we don't use 'find_series' here as a cover letter