[Mailman-Developers] Re: Code formatting tools for Mailman project

2022-10-31 Thread Stephen J. Turnbull
Abhilash Raj writes:
 > I don't see the tools actually supporting formatting only-diffs.

I don't know if git diff supports everything GNU diff does but
--ignore-all-space would likely help quite a bit.

It shouldn't be that hard to write a blue-diff command for git:

1.  create a temporary branches at the relevant commits
2.  for each temporary branch
2.1 checkout the branch
2.2 run blue on the relevant .py files and commit
3.  diff the temparary branches and save the output
4.  delete the temporary branches

I don't know if this would help in practice, but the theory seems
good.

Scripting that should be easy, and I can test it on the Postorius
code.  I'm not going to get to it before December, though, so if you
want to do it first :-D

 > It _can_ be done by piping all the updated files to `blue` command,
 > but then reverting style changes in parts of file where there
 > wasn't any real change made would be more work than just fixing the
 > flake8 errors manually.

Is that necessary?  Since the desired result is reformatted files
without impairing the ability to diff, why would we revert the style
changes?

 > I am trying to think if we can make that process better, after making a 
 > one-time code formatting change, through tooling or alternative commands 
 > to grok the diffs.

Additional tools to parse the diffs might be useful.  Another
possibility would be to diff the AST. :-O

 > Are you usually looking for just anything that looks odd or typically 
 > any kind of difference that looks suspicious or just trying to learn the 
 > differences?

Both of those are common.  Typically it's something where I can't come
up with a bisectable test, eg, a GUI regression.  Then it's wait for
rebuild, fire up the app, mess with the GUI, repeat ... so I don't
actually bisect, once I have a bracket I just take that diff.

 > Can just comparing the commits/commit messages?

git diff --stat often helps isolate to a file.

 > or Merge Requests merged since the feature branch would help since
 > we mandate all changes go through MR workflow?

I don't see how that helps if you're looking at a dozen commits
touching that file on main.

 > I don't know if I can speak much about comparing raw diffs, because I 
 > very rarely end up doing that.

I try to avoid it, but about even in the large code bases I know best
(XEmacs, Lisp and C) half the time I end up looking at diffs.

 > I am usually looking at commit level data and looking for MRs that
 > last changed the point I am interested in with `git-blame` for some
 > contextual data around the MR. Merge commits have the MR no to
 > easily track where the change came from.

I rarely can pinpoint the relevant code without a diff, though.  Most
of the time the bad data is several frames up the stack, and was
generated far from the visible error.

Also, you and I are likely to be working on code we just wrote or
reviewed.  We need to consider the drive-by contributor who is not
going to know our code well.

Steve
___
Mailman-Developers mailing list -- mailman-developers@python.org
To unsubscribe send an email to mailman-developers-le...@python.org
https://mail.python.org/mailman3/lists/mailman-developers.python.org/
Mailman FAQ: https://wiki.list.org/x/AgA3

Security Policy: https://wiki.list.org/x/QIA9


[Mailman-Developers] Re: Code formatting tools for Mailman project

2022-10-31 Thread Abhilash Raj

On 10/28/22 00:04, Stephen J. Turnbull wrote:

Abhilash Raj writes:
  > I am not sure I understand this part. Do you mean to say we'd be getting
  > too may PRs with "formatting-only" changes?

No, I mean the diffs we do to localize errors.  The day after you
merge the reformatting PR, anybody who wants to identify a regression
since Postorius 1.3.7 is going to get as many lines of " -> ' as they
do of code changes.  Half the time what appears to be a regression
from future 1.3.8 to HEAD is going to turn out to be older than the
reformat merge, same problem just less frequent.  It gets less
frequent over time, but it will take quite a while before the reformat
merge consistently contributes less than 10% of changes, I expect.


Ah, I get your point now. And, I agree with you that it would make it 
difficult to compare raw diffs.



  > Currently, my thought is to do a one-time format for the entire codebase
  > and then add "blue --check" as a part of the "qa" CI gate. Then, for
  > each subsequent MR that follows, people would need to run "tox -e
  > format" before they commit and create MR. The hope there is also

Yes, this is a great idea for new files, or files you're refactoring
to death anyway.  But it has a potentially large downside of turning
diffs that should be one-line changes into rafts of fixquote changes.


I don't see the tools actually supporting formatting only-diffs. It 
_can_ be done by piping all the updated files to `blue` command, but 
then reverting style changes in parts of file where there wasn't any 
real change made would be more work than just fixing the flake8 errors 
manually.


It sounds all-or-nothing kind of scenario when it comes to formatting 
tools, blue/black at least from what I can see.



  > My motivation here is to not have to push "oops, make flake8 gods happy"
  > commits on top of my PRs and not have to worry about it when writing code.
  >
  > While it isn't too huge of an issue, it is still manual effort that I
  > feel can be automated to reduce some work and save time.

Sure, but doing it globally also makes some work.  The question is
does it make more work (and annoyance) for everybody who does diffs
than is experienced by the developer who needs to make a flake8
commit?  I do a lot more diffs on Mailman than I do commits.


I am trying to think if we can make that process better, after making a 
one-time code formatting change, through tooling or alternative commands 
to grok the diffs.


Are you usually looking for just anything that looks odd or typically 
any kind of difference that looks suspicious or just trying to learn the 
differences? Can just comparing the commits/commit messages? or Merge 
Requests merged since the feature branch would help since we mandate all 
changes go through MR workflow?


I don't know if I can speak much about comparing raw diffs, because I 
very rarely end up doing that. I am usually looking at commit level data 
and looking for MRs that last changed the point I am interested in with 
`git-blame` for some contextual data around the MR. Merge commits have 
the MR no to easily track where the change came from.


  > > What is the purpose of this?  Anybody who runs it on a code base
  > > before the blue'd code gets merged is going to generate hard to read
  > > crufty PRs, no?
  >
  > How so? If the existing code base is already auto-formatted, then the
  > PRs would be just regular diffs, formatted with the same tool.

Maybe tox -e format isn't a problem because you would add that tox
target in the same commit (or later) you do the reformat, but if this
is going to be policy, some people will aggressively reformat their
code to that standard before the package is reformatted.  Every once
in a while you still see MRs in Python where somebody PEP8s a whole
stdlib module where the contribution is a 2-line doc improvement.
Those get rejected of course.  We should do that too.


My intent with `tox -e format` was to aid with the formatting of the 
changes folks would contribute, with the assumption that the entire 
codebase is already formatted.


But yeah, we shouldn't really entertain format-only changes.


  > I've tried to solve at least the "git-blame" issue by adding the rev of
  > the commit with the "mass-refactor" into a file in the main repo that
  > you can feed to "git blame --ignore-rev-file .git-blame-ignore-revs".
  >
  > For editors, there is a way to set git config for the blame-ignore-file
  > so it works across everything that uses git.

That's very helpful for some operations.  How to make it discoverable
is something of a headache though -- I didn't know there was a
blame-ignore-file!


Yeah, I agree. I just added to Postorius with the formatting changes, 
currently it doesn't exist for other repos since we haven't really done 
any of those formatting changes.



--
thanks,
Abhilash Raj (maxking)



OpenPGP_signature
Description: OpenPGP digital signature
___

[Mailman-Developers] Re: Code formatting tools for Mailman project

2022-10-27 Thread Stephen J. Turnbull
Abhilash Raj writes:

 > > I (speaking only for myself) have no objection going forward for new
 > > files and major refactorings.  But if you do this all at once, I fear
 > > that most interesting diffs for the next year or so will be full of
 > > reformatting cruft, 
 > 
 > I am not sure I understand this part. Do you mean to say we'd be getting 
 > too may PRs with "formatting-only" changes?

No, I mean the diffs we do to localize errors.  The day after you
merge the reformatting PR, anybody who wants to identify a regression
since Postorius 1.3.7 is going to get as many lines of " -> ' as they
do of code changes.  Half the time what appears to be a regression
from future 1.3.8 to HEAD is going to turn out to be older than the
reformat merge, same problem just less frequent.  It gets less
frequent over time, but it will take quite a while before the reformat
merge consistently contributes less than 10% of changes, I expect.

 > Currently, my thought is to do a one-time format for the entire codebase 
 > and then add "blue --check" as a part of the "qa" CI gate. Then, for 
 > each subsequent MR that follows, people would need to run "tox -e 
 > format" before they commit and create MR. The hope there is also

Yes, this is a great idea for new files, or files you're refactoring
to death anyway.  But it has a potentially large downside of turning
diffs that should be one-line changes into rafts of fixquote changes.

 > My motivation here is to not have to push "oops, make flake8 gods happy" 
 > commits on top of my PRs and not have to worry about it when writing code.
 > 
 > While it isn't too huge of an issue, it is still manual effort that I 
 > feel can be automated to reduce some work and save time.

Sure, but doing it globally also makes some work.  The question is
does it make more work (and annoyance) for everybody who does diffs
than is experienced by the developer who needs to make a flake8
commit?  I do a lot more diffs on Mailman than I do commits.

 > > What is the purpose of this?  Anybody who runs it on a code base
 > > before the blue'd code gets merged is going to generate hard to read
 > > crufty PRs, no?  
 > 
 > How so? If the existing code base is already auto-formatted, then the 
 > PRs would be just regular diffs, formatted with the same tool.

Maybe tox -e format isn't a problem because you would add that tox
target in the same commit (or later) you do the reformat, but if this
is going to be policy, some people will aggressively reformat their
code to that standard before the package is reformatted.  Every once
in a while you still see MRs in Python where somebody PEP8s a whole
stdlib module where the contribution is a 2-line doc improvement.
Those get rejected of course.  We should do that too.

 > I've tried to solve at least the "git-blame" issue by adding the rev of 
 > the commit with the "mass-refactor" into a file in the main repo that 
 > you can feed to "git blame --ignore-rev-file .git-blame-ignore-revs".
 > 
 > For editors, there is a way to set git config for the blame-ignore-file 
 > so it works across everything that uses git.

That's very helpful for some operations.  How to make it discoverable
is something of a headache though -- I didn't know there was a
blame-ignore-file!

Steve
___
Mailman-Developers mailing list -- mailman-developers@python.org
To unsubscribe send an email to mailman-developers-le...@python.org
https://mail.python.org/mailman3/lists/mailman-developers.python.org/
Mailman FAQ: https://wiki.list.org/x/AgA3

Security Policy: https://wiki.list.org/x/QIA9


[Mailman-Developers] Re: Code formatting tools for Mailman project

2022-10-27 Thread Abhilash Raj

On 10/27/22 22:10, Stephen J. Turnbull wrote:

Abhilash Raj writes:

  > I am trying out some of the code formatting tools on Mailman repos to
  > help reduce some efforts in development. The tools I am trying out is
  > blue[1]

Please don't hurry to merge this to main/master.  Remember that Python
has been rejecting proposals to mass-PEP8 the stdlib for about 2
decades now.


So far, I did get it merged in Postorius, but that's the only one so 
far. Obviously git gives us powers to undo that, so we should still 
discuss this.




I (speaking only for myself) have no objection going forward for new
files and major refactorings.  But if you do this all at once, I fear
that most interesting diffs for the next year or so will be full of
reformatting cruft, 


I am not sure I understand this part. Do you mean to say we'd be getting 
too may PRs with "formatting-only" changes?


Currently, my thought is to do a one-time format for the entire codebase 
and then add "blue --check" as a part of the "qa" CI gate. Then, for 
each subsequent MR that follows, people would need to run "tox -e 
format" before they commit and create MR. The hope there is also



On the other hand, I don't
find any Mailman code hard or annoying to read because of stylistic
differences.


That's true, the current state of code base isn't the motivation really 
for me personally :-)


My motivation here is to not have to push "oops, make flake8 gods happy" 
commits on top of my PRs and not have to worry about it when writing code.


While it isn't too huge of an issue, it is still manual effort that I 
feel can be automated to reduce some work and save time. My editor 
sometimes helps me with annoying red and yellow underlines, but I don't 
still see a need to have to resolve any formatting only error manually 
during development process.


From a contributor's perspective also, even though it is not a massive 
help, it reduces one less CI check (qa -- partially) you don't need to 
manually fix for.



I'll take a look at the PRs but it will probably be a while (weeks)
before I have a good sense for just how much disruption is involved.

  > and isort[2]

I'm cautiously in favor of this.  Standardizing import format is a big
enough improvement when reading code to justify the relatively small
amount of churn.

  > I am also adding a new tox command `tox -e format` that should run the
  > appropriate commands locally for users to run.

What is the purpose of this?  Anybody who runs it on a code base
before the blue'd code gets merged is going to generate hard to read
crufty PRs, no?  


How so? If the existing code base is already auto-formatted, then the 
PRs would be just regular diffs, formatted with the same tool.


I am probably missing something here.


I guess people with long-lasting feature branches may
find it convenient in rebasing their branches (but I think that's
going to be a massive PITA regardless[1]).  I think the documentation
should be *very* opinionated about getting input from core devs before
blue'ing a file.


> Footnotes:
> [1]  Does anybody have experience in rebasing branches across a
> massive reformatting by upstream?  Are there best practices here?


Yeah, rebasing is going to be an issue, but I agree with you that it is 
an issue regardless.


I've tried to solve at least the "git-blame" issue by adding the rev of 
the commit with the "mass-refactor" into a file in the main repo that 
you can feed to "git blame --ignore-rev-file .git-blame-ignore-revs".


For editors, there is a way to set git config for the blame-ignore-file 
so it works across everything that uses git.



--
thanks,
Abhilash Raj (maxking)



OpenPGP_signature
Description: OpenPGP digital signature
___
Mailman-Developers mailing list -- mailman-developers@python.org
To unsubscribe send an email to mailman-developers-le...@python.org
https://mail.python.org/mailman3/lists/mailman-developers.python.org/
Mailman FAQ: https://wiki.list.org/x/AgA3

Security Policy: https://wiki.list.org/x/QIA9