Hi, Thanks for your contribution Raxel!
>> When contributors using Gnus reply to a patch, the In-Reply-To & >> References header fields include comments in the form described in >> remove_rfc2822_comments spec. These comments can lead to threading >> issues where users' replies don't get associated to the patch they >> reply to because the added comment will be part of the message-id when >> parsed. Following RFC2822, a comment is not supposed to affect threading >> in any way, so we remove it. To use regex, we assume that comments are >> not nested; further changes to more thoroughly match the RFC2822 grammar >> remains for anyone interested. > > Hm, does the nesting level of the comments really matter? Or is the > issue that they may be multiline? > > That is - it's pretty trivial to write a regex to match > (foo(bar)baz((quot)meh)), as long as you don't actually care about the > semantics of the nesting. https://datatracker.ietf.org/doc/html/rfc2822#section-3.2.3 suggests that comments may nest and, if I am reading the spec correctly, that they may be multi-line: comment can contain folding white-space and folding white-space can contain CRLF. However, given that this is the first time comments have mattered at all, I'd be OK to ignore multi-line comments. I would like nested comments to work though, unless it makes a real mess of things. >> >> Signed-off-by: Raxel Gutierrez <ra...@google.com> >> Closes: #399 >> --- >> patchwork/parser.py | 25 +++++++++++++++++-- >> .../notes/issue-399-584c5be5b71dcf63.yaml | 7 ++++++ >> 2 files changed, 30 insertions(+), 2 deletions(-) >> create mode 100644 releasenotes/notes/issue-399-584c5be5b71dcf63.yaml >> >> diff --git a/patchwork/parser.py b/patchwork/parser.py >> index 61a8124..683ff55 100644 >> --- a/patchwork/parser.py >> +++ b/patchwork/parser.py >> @@ -70,6 +70,27 @@ def normalise_space(value): >> return whitespace_re.sub(' ', value).strip() >> >> >> +def remove_rfc2822_comments(header_contents): >> + """Removes RFC2822 comments from header fields. >> + >> + Gnus create reply emails with commments like In-Reply-To/References: >> + <msg-id> (User's message of Sun, 01 Jan 2012 12:34:56 +0700) [comment]. >> + Patchwork parses the values of the In-Reply-To & References header >> fields >> + with the comment included as part of their value. A side effect of the >> + comment not being removed is that message-ids are mismatched. These >> + comments do not provide useful information for processing patches >> + because they are ignored for threading and not rendered by mail readers. >> + """ >> + >> + # Captures comments in header fields. > > Firstly, I'd like to point out for other reviewers that Raxel commented > the expression this way because I told him to - if you hate it, blame > me, not him ;) If `tox -e flake8` is happy, I am happy :) >> + comment_pattern = re.compile(r""" >> + \( # The opening parenthesis of comment >> + [^()]* # The contents of the comment > I *think* this is the bit that's making it not support nesting. > "Match anything besides another open- or close-paren". > > https://docs.python.org/3/library/re.html tells me that Python treats > '*' as greedy by default, so wouldn't "\(.*\)" handle nested comments? > Or is there an issue that you can have more that one, e.g. > > In-Reply-To: (danica's mail) abcd1-40...@mail.google.com (from gnus) > > in which case greedy-matching would also obliterate the actual > message-id? > > This actually brings to mind that I'd like to see an example of one such > problematic line in the commit message, if you've got one handy. I've asked on the issue (https://github.com/getpatchwork/patchwork/issues/399) to see if we can get some examples. Ostensibly emacs generates them, but I use emacs+notmuch and I don't see them so I think it might be gnus specific. Kind Regards, Daniel _______________________________________________ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork