Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes

2019-10-10 Thread Junio C Hamano
Jeff King  writes:

> This might provide an alternate solution (or vice versa). I kind of like
> this one better in that it doesn't require the sender to do anything
> differently (but it may be less robust, as it assumes the receiver
> reliably de-mangling).

I share the assessment.  I also feel that relying on Reply-To: would
make the result a lot less reliable (I do not have much problem with
the use of X-Original-Sender, though).
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes

2019-10-10 Thread Jeff King
On Thu, Oct 10, 2019 at 12:41:32PM -0700, Jonathan Nieder wrote:

> > Add support for using the X-Original-Sender or Reply-To headers, as used by
> > Google Groups and Mailman respectively, to unmangle the From header when
> > necessary.
> [...]
> Interesting!  I'm cc-ing the Git mailing list in case "git am" might
> wnat to learn the same support.

Neat. There was discussion on a similar issue recently in:

  https://public-inbox.org/git/305577c2-709a-b632-4056-658277117...@redhat.com/

where a possible solution was to get senders to use in-body From
headers even when sending their own patches.

This might provide an alternate solution (or vice versa). I kind of like
this one better in that it doesn't require the sender to do anything
differently (but it may be less robust, as it assumes the receiver
reliably de-mangling).

-Peff
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes

2019-10-10 Thread Andrew Donnellan

On 11/10/19 3:36 pm, Andrew Donnellan wrote:
It would be nice if Mailman could adopt X-Original-Sender too. As it is, 


(which I have gone ahead and reported as 
https://gitlab.com/mailman/mailman/issues/641)


--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes

2019-10-10 Thread Andrew Donnellan

On 11/10/19 3:29 pm, Junio C Hamano wrote:

Jeff King  writes:


This might provide an alternate solution (or vice versa). I kind of like
this one better in that it doesn't require the sender to do anything
differently (but it may be less robust, as it assumes the receiver
reliably de-mangling).


I share the assessment.  I also feel that relying on Reply-To: would
make the result a lot less reliable (I do not have much problem with
the use of X-Original-Sender, though).



It would be nice if Mailman could adopt X-Original-Sender too. As it is, 
it adds the original sender to Reply-To, but in some cases (where the 
list is set as reply-to-list, or has a custom reply-to setting) it adds 
to Cc instead. (In the patch that started this thread, I match the name 
from the munged From field against the name in Reply-To/Cc for the case 
where there's multiple Reply-Tos/Ccs.)


For the Patchwork use case, I'm quite okay with accepting the risk of 
using Reply-To, as the alternative is worse, the corner cases are rare, 
and ultimately a maintainer can still fix the odd stuff-up before 
applying the patch.


--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes

2019-10-10 Thread Jeff King
On Fri, Oct 11, 2019 at 10:01:23AM +1100, Andrew Donnellan wrote:

> > This might provide an alternate solution (or vice versa). I kind of like
> > this one better in that it doesn't require the sender to do anything
> > differently (but it may be less robust, as it assumes the receiver
> > reliably de-mangling).
> 
> Yep, it's less robust - but OTOH there's always a long tail of users stuck
> on old versions of git for whatever reason and having some logic to detect
> DMARC munging may thus still be useful.

I think the two features would work together nicely out of the box: if
somebody has an in-body from, we'd respect that before looking at email
headers anyway. So senders who do the extra work will be covered, and
checking other email headers would just improve the fallback case.

-Peff
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes

2019-10-10 Thread Stephen Rothwell
On Fri, 11 Oct 2019 10:16:40 +1100 Daniel Axtens  wrote:
>
> Andrew Donnellan  writes:
> 
> > On 11/10/19 6:41 am, Jonathan Nieder wrote:  
> >> Interesting!  I'm cc-ing the Git mailing list in case "git am" might
> >> wnat to learn the same support.  
> > Argh, that reminds me... this patch only rewrites the name and email 
> > that is recorded as the Patchwork submitter, it doesn't actually rewrite 
> > the From: header when you fetch the mbox off Patchwork.
> >
> > Part of me would really like to keep Patchwork mboxes as close as 
> > possible to the mbox we ingested, but on the other hand it means the 
> > mangled address is still going to land in the git repo at the end... so 
> > I should probably just change it?  
> 
> Yes, I think change it. If you're worried, stash the original one in the
> headers.

Or add a From: line to the start of the body (if one is not already there).

-- 
Cheers,
Stephen Rothwell


pgpATtDfG0Tsi.pgp
Description: OpenPGP digital signature
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes

2019-10-10 Thread Daniel Axtens
Andrew Donnellan  writes:

> On 11/10/19 6:41 am, Jonathan Nieder wrote:
>> Interesting!  I'm cc-ing the Git mailing list in case "git am" might
>> wnat to learn the same support.
> Argh, that reminds me... this patch only rewrites the name and email 
> that is recorded as the Patchwork submitter, it doesn't actually rewrite 
> the From: header when you fetch the mbox off Patchwork.
>
> Part of me would really like to keep Patchwork mboxes as close as 
> possible to the mbox we ingested, but on the other hand it means the 
> mangled address is still going to land in the git repo at the end... so 
> I should probably just change it?

Yes, I think change it. If you're worried, stash the original one in the
headers.

>
> -- 
> Andrew Donnellan  OzLabs, ADL Canberra
> a...@linux.ibm.com IBM Australia Limited
>
> ___
> Patchwork mailing list
> Patchwork@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/patchwork
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes

2019-10-10 Thread Andrew Donnellan

On 11/10/19 9:54 am, Jeff King wrote:

Neat. There was discussion on a similar issue recently in:

   https://public-inbox.org/git/305577c2-709a-b632-4056-658277117...@redhat.com/

where a possible solution was to get senders to use in-body From
headers even when sending their own patches.


I think that's a good idea.



This might provide an alternate solution (or vice versa). I kind of like
this one better in that it doesn't require the sender to do anything
differently (but it may be less robust, as it assumes the receiver
reliably de-mangling).


Yep, it's less robust - but OTOH there's always a long tail of users 
stuck on old versions of git for whatever reason and having some logic 
to detect DMARC munging may thus still be useful.


--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: RFE: use patchwork to submit a patch

2019-10-10 Thread Eric Wong
Jonathan Nieder  wrote:
> Eric Wong wrote:
> > Konstantin Ryabitsev  wrote:
> 
> >> This is actually really fast if you already have a local copy of the
> >> repository with most objects. Try this yourself if you have
> >> torvalds/linux.git locally:
> >>
> >> git clone --bare -s torvalds/linux.git test
> >
> > Yep, -s (--shared) makes cloning really cheap.  One of my goals is to get
> >
> > git clone -s https://example.com/torvalds/linux.git
> >
> > and
> >
> > git clone -s https://example.com/torvalds/linux.git/clone.bundle
> >
> > working.  That would make it easier for new contributors to
> > setup lightweight clones and pull in history on an as-needed
> > basis w/o hacks like shallow cloning.
> 
> Does "git clone --filter=blob:none" do what you're looking for?

Oops, haven't seen that new feature :x  And haven't tried, pu @
8d9027fa59b943db96a8a9090ec31d7f0f935596 is broken due to
conflicts with hashmap (probably won't have time for a bit to
look at it).

What I'm hoping to do with "git clone -s" would be client-only
and compatible with existing HTTP servers that run
"git update-server-info".
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: RFE: use patchwork to submit a patch

2019-10-10 Thread Eric Wong
Konstantin Ryabitsev  wrote:
> This is actually really fast if you already have a local copy of the
> repository with most objects. Try this yourself if you have
> torvalds/linux.git locally:
> 
> git clone --bare -s torvalds/linux.git test

Yep, -s (--shared) makes cloning really cheap.  One of my goals is to get

git clone -s https://example.com/torvalds/linux.git

and

git clone -s https://example.com/torvalds/linux.git/clone.bundle

working.  That would make it easier for new contributors to
setup lightweight clones and pull in history on an as-needed
basis w/o hacks like shallow cloning.
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: RFE: use patchwork to submit a patch

2019-10-10 Thread Steven Rostedt
On Thu, 10 Oct 2019 15:07:29 -0300
Mauro Carvalho Chehab  wrote:

> Em Thu, 10 Oct 2019 10:41:50 -0400
> Konstantin Ryabitsev  escreveu:
> 
> > Hi, all:
> > 
> > I would like to propose a new (large) feature to patchwork with the goal 
> > to make the process of submitting a patch easier for newbies and people 
> > generally less familiar with patch-based development. This was discussed 
> > previously on the workflows list: 
> > https://lore.kernel.org/workflows/20190930202451.GA14403@pure.paranoia.local/
> > 
> > How I envision this would work:
> > 
> > - user creates an account (which requires a mail confirmation)
> > - they choose a "submit patch" option from the menu
> > - the patch submission screen has a succession of screens:
> > 
> >   1. a screen with a single field allowing a user to paste a URL to 
> >  their fork of the git repository.   
> 
> This will raise the bar, as it will force all developers to have a
> public site to host the tree. I guess only a fraction of the 4k kernel
> devs have it... In special, the ones that just want to send us a patch
> fixing a bug may have serious troubles implementing that.

A lot of people are going the way of gitlab/github. I don't think this
raises the bar very high. Anyone can trivially get a public git
repository somewhere.

We can still have email coming in for those die hards, as long as we
have someone to proxy it in. Note, I'm closer to one of the die hards,
but I see where the world is heading.

> 
> > Once submitted, patchwork does a 
> >  "git ls-remote" to attempt to get a list of refs and to verify that 
> >  this is indeed a valid git repository
> > 
> >   2. next screen asks the user to select the ref to work from using the 
> >  list obtained from the remote. Once submitted, patchwork performs a 
> >  `git clone --reference` to clone the repository locally using a 
> >  local fork of the same repo to minimize object transfer. This part 
> >  requires that:
> >a. patchwork project is configured with a path to a local fork, 
> >   if this feature is enabled for a project
> >b. that fork is kept current via some mechanism outside of 
> >   patchwork (e.g. with grokmirror)
> >c. there is some sanity-checking during the clone process to 
> >   avoid abuse (e.g. a sane timeout, a tmpdir with limited size, 
> >   etc -- other suggestions welcome)  
> 
> That would require a high bandwidth at the machine with as patchwork.

Which most solutions may not be able to avoid this.

> 
> Also, doesn't sound a good idea to me, as the server may end by having
> tons of open sections, most waiting for tens of minutes, in order to
> complete git clone.
> 
> > 
> >   3. next screen asks the user to pick a starting commit from the log.  
> >  Once submitted, patchwork generates the patch from the commit 
> >  provided to the tip of the branch selected by the user earlier,
> >  using git format-patch.
> > 
> >   4. next screen asks the user to review the patch to make sure this is 
> >  what they want to submit. Once confirmed, patchwork performs two 
> >  admin-defined optional hooks:
> > 
> >a. a hook to generate a list of cc's (e.g. get_maintainer.pl)
> >b. a sanity check hook (e.g. checkpatch.pl)
> > 
> >   5. if sanity checking is defined, next screen shows the output of the 
> >  sanity check hook, asking confirmation to proceed.
> > 
> >   6. next screen shows the user three fields:
> > 
> >a. title of the patch/series
> >b. cover letter for the patch/series
> >c. message-id of the previous patch revision (can be picked from 
> >   the list of previously submitted series by this user -- 
> >   patchwork should have them already)
> >d. number of the revision (can be auto-filled if previous field 
> >   is provided) and other tags to include in []
> > 
> >   7. next screen shows final review of what would be sent out to the 
> >  list (and cc's, if the hook to get cc's is defined and returned any 
> >  results). Once submitted, patchwork sends the patch/series using 
> >  patchwork's envelope-from and the user's own email in the From: 
> >  header.
> > 
> >   8. once sent successfully, cleanups are performed (also needs to be 
> >  done as part of the regular cron job, for any aborted attempts)
> > 
> > I know this is a pretty big RFE, and I would like to hear your thoughts 
> > about this. If there is general agreement that this is doable/good idea, 
> > I may be able to come up with funding for this development as part of 
> > the overall tooling improvement proposal.  
> 
> The procedure itself is not bad, but, if implemented, IMHO, it should,
> instead, be something that will run at the machine of the one submitting
> the patch. For instance, this could be a perl or python script inside 
> Kernel's ./script directory that would handle everything locally, 

Re: RFE: use patchwork to submit a patch

2019-10-10 Thread Mauro Carvalho Chehab
Em Thu, 10 Oct 2019 10:41:50 -0400
Konstantin Ryabitsev  escreveu:

> Hi, all:
> 
> I would like to propose a new (large) feature to patchwork with the goal 
> to make the process of submitting a patch easier for newbies and people 
> generally less familiar with patch-based development. This was discussed 
> previously on the workflows list: 
> https://lore.kernel.org/workflows/20190930202451.GA14403@pure.paranoia.local/
> 
> How I envision this would work:
> 
> - user creates an account (which requires a mail confirmation)
> - they choose a "submit patch" option from the menu
> - the patch submission screen has a succession of screens:
> 
>   1. a screen with a single field allowing a user to paste a URL to 
>  their fork of the git repository. 

This will raise the bar, as it will force all developers to have a
public site to host the tree. I guess only a fraction of the 4k kernel
devs have it... In special, the ones that just want to send us a patch
fixing a bug may have serious troubles implementing that.

> Once submitted, patchwork does a 
>  "git ls-remote" to attempt to get a list of refs and to verify that 
>  this is indeed a valid git repository
> 
>   2. next screen asks the user to select the ref to work from using the 
>  list obtained from the remote. Once submitted, patchwork performs a 
>  `git clone --reference` to clone the repository locally using a 
>  local fork of the same repo to minimize object transfer. This part 
>  requires that:
>a. patchwork project is configured with a path to a local fork, 
>   if this feature is enabled for a project
>b. that fork is kept current via some mechanism outside of 
>   patchwork (e.g. with grokmirror)
>c. there is some sanity-checking during the clone process to 
>   avoid abuse (e.g. a sane timeout, a tmpdir with limited size, 
>   etc -- other suggestions welcome)

That would require a high bandwidth at the machine with as patchwork.

Also, doesn't sound a good idea to me, as the server may end by having
tons of open sections, most waiting for tens of minutes, in order to
complete git clone.

> 
>   3. next screen asks the user to pick a starting commit from the log.  
>  Once submitted, patchwork generates the patch from the commit 
>  provided to the tip of the branch selected by the user earlier,
>  using git format-patch.
> 
>   4. next screen asks the user to review the patch to make sure this is 
>  what they want to submit. Once confirmed, patchwork performs two 
>  admin-defined optional hooks:
> 
>a. a hook to generate a list of cc's (e.g. get_maintainer.pl)
>b. a sanity check hook (e.g. checkpatch.pl)
> 
>   5. if sanity checking is defined, next screen shows the output of the 
>  sanity check hook, asking confirmation to proceed.
> 
>   6. next screen shows the user three fields:
> 
>a. title of the patch/series
>b. cover letter for the patch/series
>c. message-id of the previous patch revision (can be picked from 
>   the list of previously submitted series by this user -- 
>   patchwork should have them already)
>d. number of the revision (can be auto-filled if previous field 
>   is provided) and other tags to include in []
> 
>   7. next screen shows final review of what would be sent out to the 
>  list (and cc's, if the hook to get cc's is defined and returned any 
>  results). Once submitted, patchwork sends the patch/series using 
>  patchwork's envelope-from and the user's own email in the From: 
>  header.
> 
>   8. once sent successfully, cleanups are performed (also needs to be 
>  done as part of the regular cron job, for any aborted attempts)
> 
> I know this is a pretty big RFE, and I would like to hear your thoughts 
> about this. If there is general agreement that this is doable/good idea, 
> I may be able to come up with funding for this development as part of 
> the overall tooling improvement proposal.

The procedure itself is not bad, but, if implemented, IMHO, it should,
instead, be something that will run at the machine of the one submitting
the patch. For instance, this could be a perl or python script inside 
Kernel's ./script directory that would handle everything locally, and
then submit the patch via patchwork's REST API.

By using the REST API, it would avoid the need of having to do special
e-mail setups for the casual developers.

Thanks,
Mauro
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: RFE: use patchwork to submit a patch

2019-10-10 Thread Konstantin Ryabitsev

On Fri, Oct 11, 2019 at 08:38:49AM +1100, Daniel Axtens wrote:

Hi Konstantin,

tl;dr: I think a git-to-email bridge is a good step. I'm not sure why
patchwork would be the thing to build it on top of, and I'm worried that
it would slow us both down. I'm very open to being convinced though.


In very broad terms, I chose patchwork because:

1. in people's minds patchwork.kernel.org already deals with patches and 
  mailing lists, so it seems to be a logical place for such tool to 
  live
2. it's already built on top of a powerful web system (django), and we 
  already have it up and running, so this wouldn't require setting up 
  Yet Another Web Framework Service



I will readily admit that both of these assertions are pretty tenuous.


If I understand correctly, you're using Patchwork as:


- user creates an account (which requires a mail confirmation)


a) an identity provider, and


Well, rather as a tool that already has account management which 
includes email confirmation at some point.



b) a way to integrate with existing concepts of a project and keep
   metadata about them in 1 place


Yes.


c) a handy tool for getting previous series by a given user.


Yes, it's convenient to already have that user's previously submitted 
series readily available.



d) a 'trusted' source of email.


Well, this part isn't really that important. Rather, it's a tool where, 
to have an account, one must confirm email delivery.



Is that right? I just ask because this idea seems a long way from what
Patchwork traditionally does. That's not necessarily bad, I just want to
make sure I understand, and that if you get funding you're not tying
yourself to a platform that doesn't suit your needs.


Well, in my mind patchwork:

1. already deals with patches and series, including knowing how to do 
  diff highlights and all the fancy stuff
2. will hopefully gain ability to do interdiffs in the future, so if 
  someone submits a series revision, they can see what actually changed 
  before their previous submission and their new attempt

3. already has a lot of knowledge around git, mboxes, formats, etc.

This, of course, is not to say that patchwork is where this *must* 
happen, but I think it would be *nice* if this is where this happens. :)



I'm particularly curious about Patchwork as (a) an identity
provider. You wrote:


- user creates an account (which requires a mail confirmation)


This seems like "optional centralising" on Patchwork - it becomes a
central identity provider but it's optional in that you can just send
email directly if you prefer.


Right, this is a tool to help people allergic to CLI (or who do this 
infrequently enough that they can never remember all the steps, and oh 
my god, why isn't there a web tool to hold my hand?).



If you're going to do 'lightweight' centralisation like that, why not do
it on a platform that already understands git? It's really easy to
extract the information you describe in (c) just by querying the
patchwork API. You don't need to actually integrate into patchwork, or
be logged in, to do that. You lose the ability to load any git remote,
but if you have a git remote that isn't github or gitlab, you probably
already have a good email flow (e.g. if you repo is on kernel.org).

If you really want to use Patchwork as an identity provider, rather than
a forge, could we just teach Patchwork how to be an identity broker, and
then build things separately, authenticating through Patchwork to
confirm a user's identity? That means you could build in whatever
language you like and, critically, run on whatever deployment schedule
you want. You could also get much better isolation that way, which would
be good - I don't want an RCE in the git library to allow someone to
wipe out all patchwork data, for example.


Well, ve hawe vays of prewenting that (e.g. by transitioning git calls 
into their own selinux domain which cannot talk to databases). 


I know this is a pretty big RFE, and I would like to hear your thoughts
about this. If there is general agreement that this is doable/good idea,
I may be able to come up with funding for this development as part of
the overall tooling improvement proposal.


As I said up top, I'm not opposed to this per se. I think a git-to-email
bridge is a good step. I'm just confused as to why patchwork would be
the thing to build it on top of, and I'm worried about how you'd deploy
and update this extended Patchwork. I'm very open to being convinced
though.


Generally, I think patchwork, as a web application that already deals 
with patches and series is a convenient place for this tool to live, 
that's basically the extent of my thinking. For sure, it can exist as a 
separate tool, but then I'd have to set up and maintain that separate 
tool in addition to patchwork, as opposed to just patchwork.


Best,
-K
___
Patchwork mailing list
Patchwork@lists.ozlabs.org

Re: RFE: use patchwork to submit a patch

2019-10-10 Thread Daniel Axtens
Hi Konstantin,

tl;dr: I think a git-to-email bridge is a good step. I'm not sure why
patchwork would be the thing to build it on top of, and I'm worried that
it would slow us both down. I'm very open to being convinced though.

> I would like to propose a new (large) feature to patchwork with the goal 
> to make the process of submitting a patch easier for newbies and people 
> generally less familiar with patch-based development. This was discussed 
> previously on the workflows list: 
> https://lore.kernel.org/workflows/20190930202451.GA14403@pure.paranoia.local/
>
> How I envision this would work:
>
If I understand correctly, you're using Patchwork as:

> - user creates an account (which requires a mail confirmation)

 a) an identity provider, and

> - they choose a "submit patch" option from the menu
> - the patch submission screen has a succession of screens:
>
>   1. a screen with a single field allowing a user to paste a URL to 
>  their fork of the git repository. Once submitted, patchwork does a 
>  "git ls-remote" to attempt to get a list of refs and to verify that 
>  this is indeed a valid git repository
>
>   2. next screen asks the user to select the ref to work from using the 
>  list obtained from the remote. Once submitted, patchwork performs a 
>  `git clone --reference` to clone the repository locally using a 
>  local fork of the same repo to minimize object transfer. This part 
>  requires that:
>a. patchwork project is configured with a path to a local fork, 
>   if this feature is enabled for a project

 b) a way to integrate with existing concepts of a project and keep
metadata about them in 1 place

>b. that fork is kept current via some mechanism outside of 
>   patchwork (e.g. with grokmirror)
>c. there is some sanity-checking during the clone process to 
>   avoid abuse (e.g. a sane timeout, a tmpdir with limited size, 
>   etc -- other suggestions welcome)
>
>   3. next screen asks the user to pick a starting commit from the log.  
>  Once submitted, patchwork generates the patch from the commit 
>  provided to the tip of the branch selected by the user earlier,
>  using git format-patch.
>
>   4. next screen asks the user to review the patch to make sure this is 
>  what they want to submit. Once confirmed, patchwork performs two 
>  admin-defined optional hooks:
>
>a. a hook to generate a list of cc's (e.g. get_maintainer.pl)
>b. a sanity check hook (e.g. checkpatch.pl)
>
>   5. if sanity checking is defined, next screen shows the output of the 
>  sanity check hook, asking confirmation to proceed.
>
>   6. next screen shows the user three fields:
>
>a. title of the patch/series
>b. cover letter for the patch/series
>c. message-id of the previous patch revision (can be picked from 
>   the list of previously submitted series by this user -- 
>   patchwork should have them already)

 c) a handy tool for getting previous series by a given user.

>d. number of the revision (can be auto-filled if previous field 
>   is provided) and other tags to include in []
>
>   7. next screen shows final review of what would be sent out to the 
>  list (and cc's, if the hook to get cc's is defined and returned any 
>  results). Once submitted, patchwork sends the patch/series using 
>  patchwork's envelope-from and the user's own email in the From: 
>  header.

 d) a 'trusted' source of email.

Is that right? I just ask because this idea seems a long way from what
Patchwork traditionally does. That's not necessarily bad, I just want to
make sure I understand, and that if you get funding you're not tying
yourself to a platform that doesn't suit your needs.

I'm particularly curious about Patchwork as (a) an identity
provider. You wrote:

> - user creates an account (which requires a mail confirmation)

This seems like "optional centralising" on Patchwork - it becomes a
central identity provider but it's optional in that you can just send
email directly if you prefer.

If you're going to do 'lightweight' centralisation like that, why not do
it on a platform that already understands git? It's really easy to
extract the information you describe in (c) just by querying the
patchwork API. You don't need to actually integrate into patchwork, or
be logged in, to do that. You lose the ability to load any git remote,
but if you have a git remote that isn't github or gitlab, you probably
already have a good email flow (e.g. if you repo is on kernel.org).

If you really want to use Patchwork as an identity provider, rather than
a forge, could we just teach Patchwork how to be an identity broker, and
then build things separately, authenticating through Patchwork to
confirm a user's identity? That means you could build in whatever
language you like and, critically, run on whatever deployment schedule
you 

Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes

2019-10-10 Thread Andrew Donnellan

On 11/10/19 6:41 am, Jonathan Nieder wrote:

Interesting!  I'm cc-ing the Git mailing list in case "git am" might
wnat to learn the same support.
Argh, that reminds me... this patch only rewrites the name and email 
that is recorded as the Patchwork submitter, it doesn't actually rewrite 
the From: header when you fetch the mbox off Patchwork.


Part of me would really like to keep Patchwork mboxes as close as 
possible to the mbox we ingested, but on the other hand it means the 
mangled address is still going to land in the git repo at the end... so 
I should probably just change it?


--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: RFE: use patchwork to submit a patch

2019-10-10 Thread Jonathan Nieder
Eric Wong wrote:
> Konstantin Ryabitsev  wrote:

>> This is actually really fast if you already have a local copy of the
>> repository with most objects. Try this yourself if you have
>> torvalds/linux.git locally:
>>
>> git clone --bare -s torvalds/linux.git test
>
> Yep, -s (--shared) makes cloning really cheap.  One of my goals is to get
>
>   git clone -s https://example.com/torvalds/linux.git
>
> and
>
>   git clone -s https://example.com/torvalds/linux.git/clone.bundle
>
> working.  That would make it easier for new contributors to
> setup lightweight clones and pull in history on an as-needed
> basis w/o hacks like shallow cloning.

Does "git clone --filter=blob:none" do what you're looking for?

Thanks,
Jonathan
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: RFE: use patchwork to submit a patch

2019-10-10 Thread Jonathan Nieder
(+cc: git)
Hi,

Konstantin Ryabitsev wrote[1]:

> How I envision this would work:
>
> - user creates an account (which requires a mail confirmation)
> - they choose a "submit patch" option from the menu
> - the patch submission screen has a succession of screens:

Interesting!  This reminds me a bit of https://gitgitgadget.github.io
(except using patchwork instead of github pull requests as substrate).

That leads me to wonder: should these kinds of tools share some code?
Are there any subtleties that web-to-mail patch submission interfaces
have in common or can learn from each other?

Thanks,
Jonathan

[1] 
https://lore.kernel.org/workflows/20191010144150.hqiosvwolm3lmzp5@chatter.i7.local/
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: RFE: use patchwork to submit a patch

2019-10-10 Thread Konstantin Ryabitsev

On Thu, Oct 10, 2019 at 03:07:29PM -0300, Mauro Carvalho Chehab wrote:

- the patch submission screen has a succession of screens:

  1. a screen with a single field allowing a user to paste a URL to
 their fork of the git repository.


This will raise the bar, as it will force all developers to have a
public site to host the tree. I guess only a fraction of the 4k kernel
devs have it... In special, the ones that just want to send us a patch
fixing a bug may have serious troubles implementing that.


I don't think this will raise the bar, as Github/Gitlab allow for very 
easy forking of https://github.com/torvalds/linux. This is also not at 
all aimed at "all developers" -- only those that don't want to use the 
current CLI workflow and are more comfortable with web tools like 
Github.


  2. next screen asks the user to select the ref to work from using 
 the

 list obtained from the remote. Once submitted, patchwork performs a
 `git clone --reference` to clone the repository locally using a
 local fork of the same repo to minimize object transfer. This part
 requires that:
   a. patchwork project is configured with a path to a local fork,
  if this feature is enabled for a project
   b. that fork is kept current via some mechanism outside of
  patchwork (e.g. with grokmirror)
   c. there is some sanity-checking during the clone process to
  avoid abuse (e.g. a sane timeout, a tmpdir with limited size,
  etc -- other suggestions welcome)


That would require a high bandwidth at the machine with as patchwork.
Also, doesn't sound a good idea to me, as the server may end by having
tons of open sections, most waiting for tens of minutes, in order to
complete git clone.


This is actually really fast if you already have a local copy of the 
repository with most objects. Try this yourself if you have 
torvalds/linux.git locally:


git clone --bare -s torvalds/linux.git test
cd test
git remote add arm-soc 
https://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git fetch arm-soc for-next

The whole process takes a second or so and the resulting repo is 328K in 
size.


Of course, this assumes that the remote repository isn't trying to do 
something nasty, which is why I suggest anti-abuse precautions.


I know this is a pretty big RFE, and I would like to hear your 
thoughts

about this. If there is general agreement that this is doable/good idea,
I may be able to come up with funding for this development as part of
the overall tooling improvement proposal.


The procedure itself is not bad, but, if implemented, IMHO, it should,
instead, be something that will run at the machine of the one submitting
the patch. For instance, this could be a perl or python script inside
Kernel's ./script directory that would handle everything locally, and
then submit the patch via patchwork's REST API.


I think I didn't make clear that this isn't supposed to go straight to 
patchwork. Patchwork is merely a convenient tool where this happens -- 
the resulting patch gets mailed out to the mailing list just as the user 
would have done.


-K
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes

2019-10-10 Thread Jonathan Nieder
Hi,

Andrew Donnellan wrote:

> To avoid triggering spam filters due to failed signature validation, many
> mailing lists mangle the From header to change the From address to be the
> address of the list, typically where the sender's domain has a strict DMARC
> policy enabled.
>
> In this case, we should try to unmangle the From header.
>
> Add support for using the X-Original-Sender or Reply-To headers, as used by
> Google Groups and Mailman respectively, to unmangle the From header when
> necessary.
>
> Closes: #64 ("Incorrect submitter when using googlegroups")
> Reported-by: Alexandre Belloni 
> Reported-by: Stephen Rothwell 
> Signed-off-by: Andrew Donnellan 
> ---
>  patchwork/parser.py| 75 ++
>  patchwork/tests/test_parser.py | 68 --
>  2 files changed, 130 insertions(+), 13 deletions(-)

Interesting!  I'm cc-ing the Git mailing list in case "git am" might
wnat to learn the same support.

Thanks,
Jonathan

(patch left unsnipped for reference)
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index 7dc66bc05a5b..79beac5617b1 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -321,12 +321,7 @@ def find_series(project, mail, author):
>  return _find_series_by_markers(project, mail, author)
>  
>  
> -def get_or_create_author(mail):
> -from_header = clean_header(mail.get('From'))
> -
> -if not from_header:
> -raise ValueError("Invalid 'From' header")
> -
> +def split_from_header(from_header):
>  name, email = (None, None)
>  
>  # tuple of (regex, fn)
> @@ -355,6 +350,65 @@ def get_or_create_author(mail):
>  (name, email) = fn(match.groups())
>  break
>  
> +return (name, email)
> +
> +
> +# Unmangle From addresses that have been mangled for DMARC purposes.
> +#
> +# To avoid triggering spam filters due to failed signature validation, many
> +# mailing lists mangle the From header to change the From address to be the
> +# address of the list, typically where the sender's domain has a strict
> +# DMARC policy enabled.
> +#
> +# Unfortunately, there's no standardised way of preserving the original
> +# From address.
> +#
> +# Google Groups adds an X-Original-Sender header. If present, we use that.
> +#
> +# Mailman preserves the original address by adding a Reply-To, except in the
> +# case where the list is set to either reply to list, or reply to a specific
> +# address, in which case the original From is added to Cc instead. These 
> corner
> +# cases are dumb, but we try and handle things as sensibly as possible by
> +# looking for a name in Reply-To/Cc that matches From. It's inexact but 
> should
> +# be good enough for our purposes.
> +def get_original_sender(mail, name, email):
> +if name and ' via ' in name:
> +# Mailman uses the format " via "
> +# Google Groups uses "'' via "
> +stripped_name = name[:name.rfind(' via ')].strip().strip("'")
> +
> +original_sender = clean_header(mail.get('X-Original-Sender', ''))
> +if original_sender:
> +new_email = split_from_header(original_sender)[1].strip()[:255]
> +return (stripped_name, new_email)
> +
> +addrs = []
> +reply_to_headers = mail.get_all('Reply-To') or []
> +cc_headers = mail.get_all('Cc') or []
> +for header in reply_to_headers + cc_headers:
> +header = clean_header(header)
> +addrs = header.split(",")
> +for addr in addrs:
> +new_name, new_email = split_from_header(addr)
> +if new_name:
> +new_name = new_name.strip()[:255]
> +if new_email:
> +new_email = new_email.strip()[:255]
> +if new_name == stripped_name:
> +return (stripped_name, new_email)
> +
> +# If we can't figure out the original sender, just keep it as is
> +return (name, email)
> +
> +
> +def get_or_create_author(mail, project=None):
> +from_header = clean_header(mail.get('From'))
> +
> +if not from_header:
> +raise ValueError("Invalid 'From' header")
> +
> +name, email = split_from_header(from_header)
> +
>  if not email:
>  raise ValueError("Invalid 'From' header")
>  
> @@ -362,6 +416,9 @@ def get_or_create_author(mail):
>  if name is not None:
>  name = name.strip()[:255]
>  
> +if project and email.lower() == project.listemail.lower():
> +name, email = get_original_sender(mail, name, email)
> +
>  # this correctly handles the case where we lose the race to create
>  # the person and another process beats us to it. (If the record
>  # does not exist, g_o_c invokes _create_object_from_params which
> @@ -1004,7 +1061,7 @@ def parse_mail(mail, list_id=None):
>  
>  if not is_comment and (diff or pull_url):  # patches or pull requests
>  # we delay the saving until we know we have a patch.
> -author = get_or_create_author(mail)
> +author = 

Re: [PATCH v2 3/3] /api/events: Add 'actor' field to generated JSON

2019-10-10 Thread Stephen Finucane
On Wed, 2019-10-09 at 18:44 +0200, Johan Herland wrote:
> Before I post v3, I'll wait for some more responses to the discussion
> on patch 2/3 about the best way (or rather: least horrible way) to
> determine the active user and forward it into the event creation.

I'll try do just that tomorrow. If I don't, it'll be next week (I'm at
PyCon IE this weekend). Just FYI.

Stephen

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


RFE: use patchwork to submit a patch

2019-10-10 Thread Konstantin Ryabitsev

Hi, all:

I would like to propose a new (large) feature to patchwork with the goal 
to make the process of submitting a patch easier for newbies and people 
generally less familiar with patch-based development. This was discussed 
previously on the workflows list: 
https://lore.kernel.org/workflows/20190930202451.GA14403@pure.paranoia.local/


How I envision this would work:

- user creates an account (which requires a mail confirmation)
- they choose a "submit patch" option from the menu
- the patch submission screen has a succession of screens:

 1. a screen with a single field allowing a user to paste a URL to 
their fork of the git repository. Once submitted, patchwork does a 
"git ls-remote" to attempt to get a list of refs and to verify that 
this is indeed a valid git repository


 2. next screen asks the user to select the ref to work from using the 
list obtained from the remote. Once submitted, patchwork performs a 
`git clone --reference` to clone the repository locally using a 
local fork of the same repo to minimize object transfer. This part 
requires that:
  a. patchwork project is configured with a path to a local fork, 
 if this feature is enabled for a project
  b. that fork is kept current via some mechanism outside of 
 patchwork (e.g. with grokmirror)
  c. there is some sanity-checking during the clone process to 
 avoid abuse (e.g. a sane timeout, a tmpdir with limited size, 
 etc -- other suggestions welcome)


 3. next screen asks the user to pick a starting commit from the log.  
Once submitted, patchwork generates the patch from the commit 
provided to the tip of the branch selected by the user earlier,

using git format-patch.

 4. next screen asks the user to review the patch to make sure this is 
what they want to submit. Once confirmed, patchwork performs two 
admin-defined optional hooks:


  a. a hook to generate a list of cc's (e.g. get_maintainer.pl)
  b. a sanity check hook (e.g. checkpatch.pl)

 5. if sanity checking is defined, next screen shows the output of the 
sanity check hook, asking confirmation to proceed.


 6. next screen shows the user three fields:

  a. title of the patch/series
  b. cover letter for the patch/series
  c. message-id of the previous patch revision (can be picked from 
 the list of previously submitted series by this user -- 
 patchwork should have them already)
  d. number of the revision (can be auto-filled if previous field 
 is provided) and other tags to include in []


 7. next screen shows final review of what would be sent out to the 
list (and cc's, if the hook to get cc's is defined and returned any 
results). Once submitted, patchwork sends the patch/series using 
patchwork's envelope-from and the user's own email in the From: 
header.


 8. once sent successfully, cleanups are performed (also needs to be 
done as part of the regular cron job, for any aborted attempts)


I know this is a pretty big RFE, and I would like to hear your thoughts 
about this. If there is general agreement that this is doable/good idea, 
I may be able to come up with funding for this development as part of 
the overall tooling improvement proposal.


Best regards,
-K
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


[PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes

2019-10-10 Thread Andrew Donnellan
To avoid triggering spam filters due to failed signature validation, many
mailing lists mangle the From header to change the From address to be the
address of the list, typically where the sender's domain has a strict DMARC
policy enabled.

In this case, we should try to unmangle the From header.

Add support for using the X-Original-Sender or Reply-To headers, as used by
Google Groups and Mailman respectively, to unmangle the From header when
necessary.

Closes: #64 ("Incorrect submitter when using googlegroups")
Reported-by: Alexandre Belloni 
Reported-by: Stephen Rothwell 
Signed-off-by: Andrew Donnellan 
---
 patchwork/parser.py| 75 ++
 patchwork/tests/test_parser.py | 68 --
 2 files changed, 130 insertions(+), 13 deletions(-)

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 7dc66bc05a5b..79beac5617b1 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -321,12 +321,7 @@ def find_series(project, mail, author):
 return _find_series_by_markers(project, mail, author)
 
 
-def get_or_create_author(mail):
-from_header = clean_header(mail.get('From'))
-
-if not from_header:
-raise ValueError("Invalid 'From' header")
-
+def split_from_header(from_header):
 name, email = (None, None)
 
 # tuple of (regex, fn)
@@ -355,6 +350,65 @@ def get_or_create_author(mail):
 (name, email) = fn(match.groups())
 break
 
+return (name, email)
+
+
+# Unmangle From addresses that have been mangled for DMARC purposes.
+#
+# To avoid triggering spam filters due to failed signature validation, many
+# mailing lists mangle the From header to change the From address to be the
+# address of the list, typically where the sender's domain has a strict
+# DMARC policy enabled.
+#
+# Unfortunately, there's no standardised way of preserving the original
+# From address.
+#
+# Google Groups adds an X-Original-Sender header. If present, we use that.
+#
+# Mailman preserves the original address by adding a Reply-To, except in the
+# case where the list is set to either reply to list, or reply to a specific
+# address, in which case the original From is added to Cc instead. These corner
+# cases are dumb, but we try and handle things as sensibly as possible by
+# looking for a name in Reply-To/Cc that matches From. It's inexact but should
+# be good enough for our purposes.
+def get_original_sender(mail, name, email):
+if name and ' via ' in name:
+# Mailman uses the format " via "
+# Google Groups uses "'' via "
+stripped_name = name[:name.rfind(' via ')].strip().strip("'")
+
+original_sender = clean_header(mail.get('X-Original-Sender', ''))
+if original_sender:
+new_email = split_from_header(original_sender)[1].strip()[:255]
+return (stripped_name, new_email)
+
+addrs = []
+reply_to_headers = mail.get_all('Reply-To') or []
+cc_headers = mail.get_all('Cc') or []
+for header in reply_to_headers + cc_headers:
+header = clean_header(header)
+addrs = header.split(",")
+for addr in addrs:
+new_name, new_email = split_from_header(addr)
+if new_name:
+new_name = new_name.strip()[:255]
+if new_email:
+new_email = new_email.strip()[:255]
+if new_name == stripped_name:
+return (stripped_name, new_email)
+
+# If we can't figure out the original sender, just keep it as is
+return (name, email)
+
+
+def get_or_create_author(mail, project=None):
+from_header = clean_header(mail.get('From'))
+
+if not from_header:
+raise ValueError("Invalid 'From' header")
+
+name, email = split_from_header(from_header)
+
 if not email:
 raise ValueError("Invalid 'From' header")
 
@@ -362,6 +416,9 @@ def get_or_create_author(mail):
 if name is not None:
 name = name.strip()[:255]
 
+if project and email.lower() == project.listemail.lower():
+name, email = get_original_sender(mail, name, email)
+
 # this correctly handles the case where we lose the race to create
 # the person and another process beats us to it. (If the record
 # does not exist, g_o_c invokes _create_object_from_params which
@@ -1004,7 +1061,7 @@ def parse_mail(mail, list_id=None):
 
 if not is_comment and (diff or pull_url):  # patches or pull requests
 # we delay the saving until we know we have a patch.
-author = get_or_create_author(mail)
+author = get_or_create_author(mail, project)
 
 delegate = find_delegate_by_header(mail)
 if not delegate and diff:
@@ -1099,7 +1156,7 @@ def parse_mail(mail, list_id=None):
 is_cover_letter = True
 
 if is_cover_letter:
-author = get_or_create_author(mail)
+author = get_or_create_author(mail, project)
 
 # we don't use 'find_series' here as a cover letter