Re: Proposal for the transition timetable for the move to GIT

2020-01-13 Thread Jeff Law
On Thu, 2020-01-09 at 12:30 +, Joseph Myers wrote:
> On Wed, 8 Jan 2020, Jeff Law wrote:
> 
> > Is there any chance we could get one more trunk snapshot before the
> > conversion starts -- even if that means firing up the snapshot process
> > Friday?  It'd be quite useful for the ongoing Fedora build testing.
> 
> I could run a snapshot manually.  I was planning to run at least one 
> snapshot (for some branch) manually *after* the conversion to test the 
> conversion of the gcc_release script to use git (in snapshot mode that 
> doesn't make any commits so could be done while the git repository is 
> still read-only for checking).
Thanks.  It was greatly appreciated.

Jeff



Re: Proposal for the transition timetable for the move to GIT

2020-01-11 Thread Segher Boessenkool
On Fri, Jan 10, 2020 at 12:38:10PM +0100, Richard Biener wrote:
> Just to chime in I also just want to get it done (well, I can handle
> SVN as well :P).

I will never have to learn it!  I'm so happy!

> I trust Joseph, too, but then from my POV anything not worse than the current
> mirror works for me.  Thanks to Maxim anyway for all the work - without that
> we'd not switch in 10 other years...

Absolutely agreed!  Thank you, Maxim.


Segher


Re: Proposal for the transition timetable for the move to GIT

2020-01-11 Thread Segher Boessenkool
On Fri, Jan 10, 2020 at 09:49:41AM +, Richard Earnshaw (lists) wrote:
> On 10/01/2020 07:33, Maxim Kuvyrkov wrote:
> >>On Jan 9, 2020, at 5:38 AM, Segher Boessenkool 
> >> wrote:
> >>Where and when and by who was it decided to use this conversion?
> >
> >Joseph, please point to message on gcc@ mailing list that expresses 
> >consensus of GCC community to use reposurgeon conversion.  Otherwise, it 
> >is not appropriate to substitute one's opinion for community consensus.
> 
> I've gone back through this thread (if I've missed, or misrepresented, 
> anybody who's expressed an opinion I apologize now).
> 
> Segher Boessenkool 
> "If Joseph and Richard agree a candidate is good, then I will agree as
> well.  All that can be left is nit-picking, and that is not worth it
> anyway:"

That is not saying I agree the reposurgeon conversion is best if you two
agree.  It says that if you think that is a good conversion, then I agree.
However I do still think it is the worst of the three options, in some
regards.

> So I don't see any clear dissent and most folks just want to get this 
> done.

Yes.  After the GCC community took over five years to decide to switch
to git, and then we were delayed by another almost five years because it
just *had* to be done using reposurgeon, we just want it *done*, and even
the reposurgeon option is acceptable, in my book.

I don't look at old commit messages *at all* (*).  Mangled patch authors
can be harder, but I do have old trees as well, worst case.  We'll
survive, the info in the changelogs is still there.  And hopefully new
patches will eventually have good author info and commit messages.

To a gitty future, onwards and upwards, etc.,


Segher


(*) That's a lie: I look at it a lot, but only to extract the SVN
revision number from it!


Re: Proposal for the transition timetable for the move to GIT

2020-01-11 Thread Segher Boessenkool
On Thu, Jan 09, 2020 at 12:12:49PM +, Richard Earnshaw (lists) wrote:
> On 09/01/2020 02:38, Segher Boessenkool wrote:
> >Where and when and by who was it decided to use this conversion?
> >
> >Will it at least be *tested* first?
> 
> Tested for what?

Acceptance test, of course, the only test that matters.  I.e. the GCC
community gets to decide if this conversion is acceptable to them, instead
of being confronted with it as a fait accompli.

> I want to also take this opportunity to thank Maxim for the work he has 
> done.  Having that fallback option has meant that we could press harder 
> for a timely solution and has also driven several significant 
> improvements to the overall result.  I do not think we would have 
> achieved as good a result overall if he hadn't developed his scripts.

And my thanks go to you and everyone else who tried to make this result
in a git conversion that is the most useful for us, the GCC developers
(and other consumers of our repo)!


Segher


Re: Proposal for the transition timetable for the move to GIT

2020-01-10 Thread Gerald Pfeifer
On Thu, 9 Jan 2020, Joseph Myers wrote:
>> Is there any chance we could get one more trunk snapshot before the
>> conversion starts -- even if that means firing up the snapshot process
>> Friday?  It'd be quite useful for the ongoing Fedora build testing.
> I could run a snapshot manually.  I was planning to run at least one 
> snapshot (for some branch) manually *after* the conversion to test the 
> conversion of the gcc_release script to use git (in snapshot mode that 
> doesn't make any commits so could be done while the git repository is 
> still read-only for checking).

GCC 9 snapshots run on Saturdays and GCC 10 snapshots on Sundays, so with
a GCC 10 snapshot out yesterday, perhaps run a GCC 9 snapshot today
or tomorrow and then fall back to the regular cadence?

Gerald


Re: Proposal for the transition timetable for the move to GIT

2020-01-10 Thread Gerald Pfeifer
On Fri, 10 Jan 2020, Maxim Kuvyrkov wrote:
> I was wrong re. r182541, I didn't notice that it is the first commit on 
> branch.  This renders the analysis in favor of reposurgeon conversion, 
> not svn-git.

Kudos for that statement, Maxim.

And thanks a bunch for all the work you have been doing, even if
the other conversion was picked in the end.  Like others have said,
without that we would not be where we are now.

Thank you,
Gerald


Re: Proposal for the transition timetable for the move to GIT

2020-01-10 Thread Maxim Kuvyrkov
> On Jan 10, 2020, at 6:15 PM, Joseph Myers  wrote:
> 
> On Fri, 10 Jan 2020, Maxim Kuvyrkov wrote:
> 
>> To me this looks like cherry-picks of r182541 and r182547 from 
>> redhat/gcc-4_7-branch into redhat/gcc-4_8-branch.
> 
> r182541 is the first commit on /branches/redhat/gcc-4_7-branch after it 
> was created as a copy of trunk.  I.e., merging and cherry-picking it are 
> indistinguishable, and it's entirely correct for reposurgeon to consider a 
> commit merging it as a merge from r182541 (together with a cherry-pick of 
> r182547).

I was wrong re. r182541, I didn't notice that it is the first commit on branch.
This renders the analysis in favor of reposurgeon conversion, not svn-git.

--
Maxim Kuvyrkov
https://www.linaro.org



Re: Proposal for the transition timetable for the move to GIT

2020-01-10 Thread Eric S. Raymond
Bernd Schmidt :
> I was on the fence for a long time, since I felt that the rewritten
> reposurgeon was still somewhat unproven.

And that was a fair criticism for a short while, until the first compare-all
verification on the GCC history came back clean.

The most difficult point in the whole process for me was in late
November.  That was when I faced up to the fact that, while I had a
Subversion dump reader that was 95% good, (1) that 5% could
disqualify it for this complex a history, and (2) I wasn't going to
be able to solve that last 5% without tearing down most of the reader
and rebuilding it.

The problem was that I'd been patching the dump reader to fix edge
cases for too long, and the code had rigidified. Too many auxiliary
data structures with partially overlapping semantics - I couldn't
change anything without breaking everything. Which is the universe's
way of telling you it's time for a rewrite.

Of course the risk was that I wouldn't get that rewrite done in time
for deadline. But I had two assets that mitigated the risk. One was
a couple of very sharp collaborators, Julien Rivaud and Daniel Brooks
(and later another, Edward Cree). The other was having a really good
test suite, and a well-established procedure for integrating new
tests that jsm and rearnshaw were able to use.

It was (as the Duke of Wellington famously said) a damned near-run
thing. With all those advantages, if I had waited even a week longer
to make the crucial scrap-and-rebuild decision, the new reader might
have landed too late.

There's a lesson in here somewhere. When I figure out what it is, I'll
put it in my next book.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: Proposal for the transition timetable for the move to GIT

2020-01-10 Thread Joseph Myers
On Thu, 9 Jan 2020, Joseph Myers wrote:

> On Wed, 8 Jan 2020, Jeff Law wrote:
> 
> > Is there any chance we could get one more trunk snapshot before the
> > conversion starts -- even if that means firing up the snapshot process
> > Friday?  It'd be quite useful for the ongoing Fedora build testing.
> 
> I could run a snapshot manually.  I was planning to run at least one 
> snapshot (for some branch) manually *after* the conversion to test the 
> conversion of the gcc_release script to use git (in snapshot mode that 
> doesn't make any commits so could be done while the git repository is 
> still read-only for checking).

This gcc-10-20200110 snapshot is now available.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Proposal for the transition timetable for the move to GIT

2020-01-10 Thread Joseph Myers
On Fri, 10 Jan 2020, Maxim Kuvyrkov wrote:

> To me this looks like cherry-picks of r182541 and r182547 from 
> redhat/gcc-4_7-branch into redhat/gcc-4_8-branch.

r182541 is the first commit on /branches/redhat/gcc-4_7-branch after it 
was created as a copy of trunk.  I.e., merging and cherry-picking it are 
indistinguishable, and it's entirely correct for reposurgeon to consider a 
commit merging it as a merge from r182541 (together with a cherry-pick of 
r182547).
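
The point about the first commit on a branch can be illustrated with a toy ancestry graph. This is only a sketch: the graph is a simplified assumption (names such as "trunk-base" and "4_8-tip" are made up), it ignores any mergeinfo carried by r182541 itself, and it is not how either conversion tool works internally.

```python
# Toy ancestry model; a simplified assumption, not the real GCC history.
PARENTS = {
    "trunk-base": [],               # trunk revision the 4_7 branch was copied from
    "r182541": ["trunk-base"],      # first commit on redhat/gcc-4_7-branch
    "r182547": ["r182541"],         # a later commit on the same branch
    "trunk-later": ["trunk-base"],  # trunk moved on before 4_8 was branched
    "4_8-tip": ["trunk-later"],     # hypothetical 4_8 branch tip before r194477
}


def ancestors(commit):
    """Return the commit together with everything reachable from it."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def brought_in_by_merge(target_tip, merged):
    """Commits that become reachable from the target only through the merge."""
    return ancestors(merged) - ancestors(target_tip)


# Merging the first branch commit adds exactly that one commit, which is
# the same set of changes a cherry-pick of it would apply.
merge_first = brought_in_by_merge("4_8-tip", "r182541")

# Merging a later commit drags in the earlier branch commit as well.
merge_second = brought_in_by_merge("4_8-tip", "r182547")
```

In this model the merge of r182541 contributes a single new commit, so nothing distinguishes it from a cherry-pick; only from the second branch commit onwards do the two operations differ.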

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Proposal for the transition timetable for the move to GIT

2020-01-10 Thread Maxim Kuvyrkov


> On Jan 10, 2020, at 10:33 AM, Maxim Kuvyrkov  
> wrote:
> 
>> On Jan 9, 2020, at 5:38 AM, Segher Boessenkool  
>> wrote:
>> 
>> On Wed, Jan 08, 2020 at 11:34:32PM +, Joseph Myers wrote:
>>> As noted on overseers, once Saturday's DATESTAMP update has run at 00:16 
>>> UTC on Saturday, I intend to add a README.MOVED_TO_GIT file on SVN trunk 
>>> and change the SVN hooks to make SVN readonly, then disable gccadmin's 
>>> cron jobs that build snapshots and update online documentation until they 
>>> are ready to run with the git repository.  Once the existing git mirror 
>>> has picked up the last changes I'll make that read-only and disable that 
>>> cron job as well, and start the conversion process with a view to having 
>>> the converted repository in place this weekend (it could either be made 
>>> writable as soon as I think it's ready, or left read-only until people 
>>> have had time to do any final checks on Monday).  Before then, I'll work 
>>> on hooks, documentation and maintainer-scripts updates.
>> 
>> Where and when and by who was it decided to use this conversion?
> 
> Joseph, please point to message on gcc@ mailing list that expresses consensus 
> of GCC community to use reposurgeon conversion.  Otherwise, it is not 
> appropriate to substitute one's opinion for community consensus.
> 
> I want GCC community to get the best possible conversion, being it mine or 
> reposurgeon's.  To this end I'm comparing the two conversions and will post 
> my results later today.
> 
> Unfortunately, the comparison is complicated by the fact that you uploaded 
> only "b" version of gcc-reposurgeon-8 repository, which uses modified branch 
> layout (or confirm that there are no substantial differences between "7" and 
> "8" reposurgeon conversions).

There are plenty of differences between the reposurgeon and svn-git conversions;
today I've ignored subjective differences like author and committer entries and
focused on comparing histories of branches.

Redhat's branches are among the most complicated, and the analysis below is
difficult to follow.  It took me most of today to untangle it.  Let's look at
redhat/gcc-9-branch.

TL;DR:
1. Reposurgeon conversion has extra history (more commits than intended) of 
redhat/gcc-4_7-branch@182541 merged into redhat/gcc-4_8-branch, which is then 
propagated into all following branches including redhat/gcc-9-branch.
2. Svn-git conversion has redhat/gcc-4_8-branch with history corresponding to 
SVN history, with no less and no more commits.
3. Other branches are likely to have similar issues, I didn't check.
4. I consider history of reposurgeon conversion to be incorrect.
5. The only history artifact (extra merges in reparented branches/tags) of 
svn-git conversion has been fixed.
6. I can appreciate that GCC community is tired of this discussion and wants it 
to go away.

Analysis:
Commit histories for redhat/gcc-9-branch match up to history inherited from 
redhat/gcc-4_8-branch (yes, redhat's branch history goes into ancient 
branches).  So now we are looking at redhat/gcc-4_8-branch, and the two 
conversions have different commit histories for redhat/gcc-4_8-branch.  This is 
relevant because it shows up in current development branch.  The histories 
diverge at r194477:

r194477 | jakub | 2012-12-13 13:34:44 + (Thu, 13 Dec 2012) | 3 lines

svn merge -r182540:182541 
svn+ssh://gcc.gnu.org/svn/gcc/branches/redhat/gcc-4_7-branch
svn merge -r182546:182547 
svn+ssh://gcc.gnu.org/svn/gcc/branches/redhat/gcc-4_7-branch

Added: svn:mergeinfo
## -0,0 +0,4 ##
   Merged /branches/redhat/gcc-4_4-branch:r143377,143388,144574,144578,155228
   Merged /branches/redhat/gcc-4_5-branch:r161595
   Merged /branches/redhat/gcc-4_6-branch:r168425
   Merged /branches/redhat/gcc-4_7-branch:r182541,182547


To me this looks like cherry-picks of r182541 and r182547 from 
redhat/gcc-4_7-branch into redhat/gcc-4_8-branch.
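
For anyone who wants to repeat this kind of reading across many revisions, the "Merged" lines in the log output above can be parsed mechanically. A minimal sketch, assuming the lines look exactly like the ones quoted here; the function name parse_mergeinfo is made up for illustration, and revision ranges (which Subversion can also print) are deliberately not handled.

```python
import re


def parse_mergeinfo(lines):
    """Map each source branch to the list of merged revision numbers."""
    merged = {}
    for line in lines:
        # Expected shape: "   Merged /branches/foo:r123,456"
        match = re.match(r"\s*Merged (\S+):r([\d,]+)\s*$", line)
        if match:
            revisions = [int(rev) for rev in match.group(2).split(",")]
            merged[match.group(1)] = revisions
    return merged


log_lines = [
    "   Merged /branches/redhat/gcc-4_4-branch:r143377,143388,144574,144578,155228",
    "   Merged /branches/redhat/gcc-4_5-branch:r161595",
    "   Merged /branches/redhat/gcc-4_6-branch:r168425",
    "   Merged /branches/redhat/gcc-4_7-branch:r182541,182547",
]
info = parse_mergeinfo(log_lines)
```

The result only says which revisions Subversion recorded as merged; whether a converter should turn them into a git merge or a cherry-pick is the judgment call discussed in this thread.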

[1] Note that commit r182541 is itself a merge of redhat/gcc-4_6-branch@168425 
into redhat/gcc-4_7-branch and cherry-picks from the other branches.  It is an 
actual merge (not cherry-pick) from redhat/gcc-4_6-branch@168425 because 
r168425 is the only commit to redhat/gcc-4_6-branch@168425 not present on 
trunk.  The other branches had more commits to their histories, so they can't 
be represented as git merges.

Reposurgeon commit for r194477 (e601ffdd860b0deed6d7ce78da61e8964c287b0b) 
merges in commit for r182541 from redhat/gcc-4_7-branch bringing *full* history 
of redhat/gcc-4_7-branch into redhat/gcc-4_8-branch.

Svn-git commit for r194477 (98d65ca0b53332e7c9cb62dfe85936a0e71d077e) 
cherry-picks commits r182541 and r182547 from redhat/gcc-4_7-branch.  As part 
of cherry-picking commit r182541 svn-git conversion merges 
redhat/gcc-4_6-branch@168425 into 

Re: Proposal for the transition timetable for the move to GIT

2020-01-10 Thread Martin Liška

On 1/10/20 1:53 PM, Nathan Sidwell wrote:
> On 1/10/20 6:38 AM, Richard Biener wrote:
>>> So I don't see any clear dissent and most folks just want to get this
>>> done.
>>
>> Just to chime in I also just want to get it done (well, I can handle
>> SVN as well :P).
>> I trust Joseph, too, but then from my POV anything not worse than the current
>> mirror works for me.  Thanks to Maxim anyway for all the work - without that
>> we'd not switch in 10 other years...
>
> Joseph's conversion please

+ 1

I would like to thank all the people involved in the GIT conversion!

Martin

>
> nathan





Re: Proposal for the transition timetable for the move to GIT

2020-01-10 Thread Bernd Schmidt

On 1/10/20 8:33 AM, Maxim Kuvyrkov wrote:
> Joseph, please point to message on gcc@ mailing list that expresses consensus
> of GCC community to use reposurgeon conversion.  Otherwise, it is not
> appropriate to substitute one's opinion for community consensus.


I was on the fence for a long time, since I felt that the rewritten 
reposurgeon was still somewhat unproven. However, I think the 
reposurgeon conversion is probably the best choice at this point, given 
that an actual problem caused by the use of git-svn was demonstrated by 
Richard E, indicating that the scripts do not have an inherent 
reliability advantage. I also think Joseph has a very thorough pair of 
eyeballs.


I have no opinion one way or another which method should be used to 
identify author names, since I have not looked into that.



Bernd


Re: Proposal for the transition timetable for the move to GIT

2020-01-10 Thread Joseph Myers
On Fri, 10 Jan 2020, Iain Sandoe wrote:

> One minor nit (and accepted that this might be too late).

Hooks can always be changed after the conversion is live.

> mail commit messages like this:
> [gcc-reposurgeon-8(refs/users/jsm28/heads/test-branch)] Test git hooks
> interaction with Bugzilla.
> 
> seem to have a title stretched by redundant information;
> at least "users/jsm28/test-branch" would seem to contain all the necessary
> information

I guess this is something to consider for any cleaner upstream support in 
the hooks for custom branch namespaces.
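
As a sketch of the shortening being asked for, the function below maps a full ref name to the compact form suggested above. It is an illustration only: short_ref_name is a made-up helper, not part of the hooks, and the handling of anything other than refs/heads/ and refs/users/<name>/heads/ is an assumption.

```python
def short_ref_name(ref):
    """Drop the redundant 'refs/' and 'heads/' components from a ref name."""
    parts = ref.split("/")
    if parts[0] == "refs":
        parts = parts[1:]
    # refs/heads/<branch> -> <branch>
    if parts and parts[0] == "heads":
        parts = parts[1:]
    # refs/users/<name>/heads/<branch> -> users/<name>/<branch>
    if len(parts) > 3 and parts[0] == "users" and parts[2] == "heads":
        parts = parts[:2] + parts[3:]
    return "/".join(parts)


subject_ref = short_ref_name("refs/users/jsm28/heads/test-branch")
```

With this mapping the example subject would carry "users/jsm28/test-branch" instead of the full ref.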

> will commits in the user namespace appear on the mailing list in the end?

Right now the configuration is for all commits to appear on the mailing 
list, just as all SVN commits do.  I think user namespace commits are 
interesting to see, but we can set hooks.no-emails to refs/users/ in the 
hook configuration if we don't want them to appear on the list - that 
configuration option already exists.  https://github.com/AdaCore/git-hooks 
documents the available configuration options.
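
To make the effect of such a setting concrete, here is a small sketch of the kind of check it implies. It assumes the configured values are treated as patterns matched against the start of the ref name; the real matching rules are whatever the git-hooks documentation says, and should_email is a hypothetical name, not a function from that project.

```python
import re


def should_email(ref, no_email_patterns):
    """Return False when the ref matches any configured suppression pattern."""
    # re.match anchors at the start of the string, so "refs/users/" acts
    # as a prefix test here (an assumption about the intended semantics).
    return not any(re.match(pattern, ref) for pattern in no_email_patterns)


patterns = ["refs/users/"]
user_branch = should_email("refs/users/jsm28/heads/test-branch", patterns)
main_branch = should_email("refs/heads/master", patterns)
```

Under that assumption, commits to user branches would stay off the list while commits to ordinary branches would still be mailed.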

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: Proposal for the transition timetable for the move to GIT

2020-01-10 Thread Nathan Sidwell

On 1/10/20 6:38 AM, Richard Biener wrote:
>> So I don't see any clear dissent and most folks just want to get this
>> done.
>
> Just to chime in I also just want to get it done (well, I can handle
> SVN as well :P).
> I trust Joseph, too, but then from my POV anything not worse than the current
> mirror works for me.  Thanks to Maxim anyway for all the work - without that
> we'd not switch in 10 other years...


Joseph's conversion please

nathan

--
Nathan Sidwell


Re: Proposal for the transition timetable for the move to GIT

2020-01-10 Thread Iain Sandoe

Richard Biener  wrote:

> On Fri, Jan 10, 2020 at 10:49 AM Richard Earnshaw (lists)
>  wrote:
>> On 10/01/2020 07:33, Maxim Kuvyrkov wrote:
>>>> On Jan 9, 2020, at 5:38 AM, Segher Boessenkool
>>>>  wrote:
>>>>
>>>> On Wed, Jan 08, 2020 at 11:34:32PM +, Joseph Myers wrote:
>>>>> As noted on overseers, once Saturday's DATESTAMP update has run at 00:16
>>>>> UTC on Saturday, I intend to add a README.MOVED_TO_GIT file on SVN trunk
>>>>> and change the SVN hooks to make SVN readonly, then disable gccadmin's
>>>>> cron jobs that build snapshots and update online documentation until they
>>>>> are ready to run with the git repository.  Once the existing git mirror
>>>>> has picked up the last changes I'll make that read-only and disable that
>>>>> cron job as well, and start the conversion process with a view to having
>>>>> the converted repository in place this weekend (it could either be made
>>>>> writable as soon as I think it's ready, or left read-only until people
>>>>> have had time to do any final checks on Monday).  Before then, I'll work
>>>>> on hooks, documentation and maintainer-scripts updates.
>>>>
>>>> Where and when and by who was it decided to use this conversion?
>>>
>>> Joseph, please point to message on gcc@ mailing list that expresses
>>> consensus of GCC community to use reposurgeon conversion.  Otherwise, it is
>>> not appropriate to substitute one's opinion for community consensus.
>>
>> I've gone back through this thread (if I've missed, or misrepresented,
>> anybody who's expressed an opinion I apologize now).
>>
>> Segher Boessenkool
>> "If Joseph and Richard agree a candidate is good, then I will agree as
>> well.  All that can be left is nit-picking, and that is not worth it
>> anyway:"
>>
>> Jeff Law
>> "When Richard and I spoke we generally agreed that we felt a reposurgeon
>> conversion, if it could be made to work was the preferred solution,
>> followed by Maxim's approach and lastly the existing git-svn mirror."
>>
>> Richard Earnshaw (lists)
>> FWIW, I now support using reposurgeon for the final conversion.
>>
>> And, of course, I'm taking Joseph's opinion as read :-)
>>
>> So I don't see any clear dissent and most folks just want to get this
>> done.
>
> Just to chime in I also just want to get it done (well, I can handle
> SVN as well :P).
> I trust Joseph, too, but then from my POV anything not worse than the current
> mirror works for me.  Thanks to Maxim anyway for all the work - without that
> we'd not switch in 10 other years...
>
> Btw, "consensus" among the quiet doesn't usually work and "consensus" among
> the most vocal isn't really "consensus".  I think GCC (and FOSS) works by
> giving power to those who actually do the work.  Doesn't make it easier when
> there are two, of course ;)

Thanks to all those who’ve put (a lot of) effort into doing this work and
those who’ve challenged and tested the conversions, for my part, I am also
happy to take Joseph’s recommendation.

One minor nit (and accepted that this might be too late).

mail commit messages like this:
[gcc-reposurgeon-8(refs/users/jsm28/heads/test-branch)] Test git hooks
interaction with Bugzilla.

seem to have a title stretched by redundant information;
at least "users/jsm28/test-branch" would seem to contain all the necessary
information

will commits in the user namespace appear on the mailing list in the end?

thanks again
Iain



Re: Proposal for the transition timetable for the move to GIT

2020-01-10 Thread Richard Biener
On Fri, Jan 10, 2020 at 10:49 AM Richard Earnshaw (lists)
 wrote:
>
> On 10/01/2020 07:33, Maxim Kuvyrkov wrote:
> >> On Jan 9, 2020, at 5:38 AM, Segher Boessenkool 
> >>  wrote:
> >>
> >> On Wed, Jan 08, 2020 at 11:34:32PM +, Joseph Myers wrote:
> >>> As noted on overseers, once Saturday's DATESTAMP update has run at 00:16
> >>> UTC on Saturday, I intend to add a README.MOVED_TO_GIT file on SVN trunk
> >>> and change the SVN hooks to make SVN readonly, then disable gccadmin's
> >>> cron jobs that build snapshots and update online documentation until they
> >>> are ready to run with the git repository.  Once the existing git mirror
> >>> has picked up the last changes I'll make that read-only and disable that
> >>> cron job as well, and start the conversion process with a view to having
> >>> the converted repository in place this weekend (it could either be made
> >>> writable as soon as I think it's ready, or left read-only until people
> >>> have had time to do any final checks on Monday).  Before then, I'll work
> >>> on hooks, documentation and maintainer-scripts updates.
> >>
> >> Where and when and by who was it decided to use this conversion?
> >
> > Joseph, please point to message on gcc@ mailing list that expresses 
> > consensus of GCC community to use reposurgeon conversion.  Otherwise, it is 
> > not appropriate to substitute one's opinion for community consensus.
> >
>
> I've gone back through this thread (if I've missed, or misrepresented,
> anybody who's expressed an opinion I apologize now).
>
> Segher Boessenkool 
> "If Joseph and Richard agree a candidate is good, then I will agree as
> well.  All that can be left is nit-picking, and that is not worth it
> anyway:"
>
> Jeff Law 
> "When Richard and I spoke we generally agreed that we felt a reposurgeon
> conversion, if it could be made to work was the preferred solution,
> followed by Maxim's approach and lastly the existing git-svn mirror."
>
> Richard Earnshaw (lists) 
> FWIW, I now support using reposurgeon for the final conversion.
>
> And, of course, I'm taking Joseph's opinion as read :-)
>
> So I don't see any clear dissent and most folks just want to get this
> done.

Just to chime in I also just want to get it done (well, I can handle
SVN as well :P).
I trust Joseph, too, but then from my POV anything not worse than the current
mirror works for me.  Thanks to Maxim anyway for all the work - without that
we'd not switch in 10 other years...

Btw, "consensus" among the quiet doesn't usually work and "consensus" among
the most vocal isn't really "consensus".  I think GCC (and FOSS) works by
giving power to those who actually do the work.  Doesn't make it easier when
there are two, of course ;)

Richard.


Re: Proposal for the transition timetable for the move to GIT

2020-01-10 Thread Richard Earnshaw (lists)

On 10/01/2020 07:33, Maxim Kuvyrkov wrote:
>> On Jan 9, 2020, at 5:38 AM, Segher Boessenkool
>> wrote:
>>
>> On Wed, Jan 08, 2020 at 11:34:32PM +, Joseph Myers wrote:
>>> As noted on overseers, once Saturday's DATESTAMP update has run at 00:16
>>> UTC on Saturday, I intend to add a README.MOVED_TO_GIT file on SVN trunk
>>> and change the SVN hooks to make SVN readonly, then disable gccadmin's
>>> cron jobs that build snapshots and update online documentation until they
>>> are ready to run with the git repository.  Once the existing git mirror
>>> has picked up the last changes I'll make that read-only and disable that
>>> cron job as well, and start the conversion process with a view to having
>>> the converted repository in place this weekend (it could either be made
>>> writable as soon as I think it's ready, or left read-only until people
>>> have had time to do any final checks on Monday).  Before then, I'll work
>>> on hooks, documentation and maintainer-scripts updates.
>>
>> Where and when and by who was it decided to use this conversion?
>
> Joseph, please point to message on gcc@ mailing list that expresses consensus
> of GCC community to use reposurgeon conversion.  Otherwise, it is not
> appropriate to substitute one's opinion for community consensus.



I've gone back through this thread (if I've missed, or misrepresented, 
anybody who's expressed an opinion I apologize now).


Segher Boessenkool 
"If Joseph and Richard agree a candidate is good, then I will agree as
well.  All that can be left is nit-picking, and that is not worth it
anyway:"

Jeff Law 
"When Richard and I spoke we generally agreed that we felt a reposurgeon
conversion, if it could be made to work was the preferred solution,
followed by Maxim's approach and lastly the existing git-svn mirror."

Richard Earnshaw (lists) 
FWIW, I now support using reposurgeon for the final conversion.

And, of course, I'm taking Joseph's opinion as read :-)

So I don't see any clear dissent and most folks just want to get this 
done.



> I want GCC community to get the best possible conversion, being it mine or
> reposurgeon's.  To this end I'm comparing the two conversions and will post my
> results later today.
>
> Unfortunately, the comparison is complicated by the fact that you uploaded only
> "b" version of gcc-reposurgeon-8 repository, which uses modified branch layout
> (or confirm that there are no substantial differences between "7" and "8"
> reposurgeon conversions).


The main differences are

a) more revisions added due to commits upstream
b) release tags from SVN era now on the main release branch rather than 
in sidings
c) more author fixups from Joseph's cross validation against your 
repository and reposurgeon's own reports of suspect attributions



R.


> --
> Maxim Kuvyrkov
> https://www.linaro.org






Re: Proposal for the transition timetable for the move to GIT

2020-01-09 Thread Maxim Kuvyrkov
> On Jan 9, 2020, at 5:38 AM, Segher Boessenkool  
> wrote:
> 
> On Wed, Jan 08, 2020 at 11:34:32PM +, Joseph Myers wrote:
>> As noted on overseers, once Saturday's DATESTAMP update has run at 00:16 
>> UTC on Saturday, I intend to add a README.MOVED_TO_GIT file on SVN trunk 
>> and change the SVN hooks to make SVN readonly, then disable gccadmin's 
>> cron jobs that build snapshots and update online documentation until they 
>> are ready to run with the git repository.  Once the existing git mirror 
>> has picked up the last changes I'll make that read-only and disable that 
>> cron job as well, and start the conversion process with a view to having 
>> the converted repository in place this weekend (it could either be made 
>> writable as soon as I think it's ready, or left read-only until people 
>> have had time to do any final checks on Monday).  Before then, I'll work 
>> on hooks, documentation and maintainer-scripts updates.
> 
> Where and when and by who was it decided to use this conversion?

Joseph, please point to message on gcc@ mailing list that expresses consensus 
of GCC community to use reposurgeon conversion.  Otherwise, it is not 
appropriate to substitute one's opinion for community consensus.

I want GCC community to get the best possible conversion, being it mine or 
reposurgeon's.  To this end I'm comparing the two conversions and will post my 
results later today.

Unfortunately, the comparison is complicated by the fact that you uploaded only 
"b" version of gcc-reposurgeon-8 repository, which uses modified branch layout 
(or confirm that there are no substantial differences between "7" and "8" 
reposurgeon conversions).

--
Maxim Kuvyrkov
https://www.linaro.org



Re: Proposal for the transition timetable for the move to GIT

2020-01-09 Thread Eric S. Raymond
Richard Earnshaw (lists) :
> I want to also take this opportunity to thank Maxim for the work he has
> done.  Having that fallback option has meant that we could press harder for
> a timely solution and has also driven several significant improvements to
> the overall result.  I do not think we would have achieved as good a result
> overall if he hadn't developed his scripts.

Yes. Reposurgeon's ChangeLog processing, in particular, was significantly
improved using lessons learned from maxim's scripts.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: Proposal for the transition timetable for the move to GIT

2020-01-09 Thread Joseph Myers
On Wed, 8 Jan 2020, Jeff Law wrote:

> Is there any chance we could get one more trunk snapshot before the
> conversion starts -- even if that means firing up the snapshot process
> Friday?  It'd be quite useful for the ongoing Fedora build testing.

I could run a snapshot manually.  I was planning to run at least one 
snapshot (for some branch) manually *after* the conversion to test the 
conversion of the gcc_release script to use git (in snapshot mode that 
doesn't make any commits so could be done while the git repository is 
still read-only for checking).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Proposal for the transition timetable for the move to GIT

2020-01-09 Thread Richard Earnshaw (lists)

On 09/01/2020 02:38, Segher Boessenkool wrote:
> On Wed, Jan 08, 2020 at 11:34:32PM +, Joseph Myers wrote:
>> As noted on overseers, once Saturday's DATESTAMP update has run at 00:16
>> UTC on Saturday, I intend to add a README.MOVED_TO_GIT file on SVN trunk
>> and change the SVN hooks to make SVN readonly, then disable gccadmin's
>> cron jobs that build snapshots and update online documentation until they
>> are ready to run with the git repository.  Once the existing git mirror
>> has picked up the last changes I'll make that read-only and disable that
>> cron job as well, and start the conversion process with a view to having
>> the converted repository in place this weekend (it could either be made
>> writable as soon as I think it's ready, or left read-only until people
>> have had time to do any final checks on Monday).  Before then, I'll work
>> on hooks, documentation and maintainer-scripts updates.
>
> Where and when and by who was it decided to use this conversion?
>
> Will it at least be *tested* first?
>
>
> Segher



Tested for what?  We run many tests on the conversion, for example to 
check that the branch tips are all sane, and many other things as well.
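
For readers wondering what a branch-tip sanity check might look like, one possible form is to compare the file contents of an SVN checkout and a git checkout of the same branch. The sketch below is only an illustration of that idea, not the checks that were actually run; the function names are made up, and it assumes plain files (a broken symlink would raise an error).

```python
import hashlib
import os

SKIP_DIRS = {".svn", ".git"}  # version-control metadata, not branch content


def tree_digests(root):
    """Map each file path (relative to root) to the SHA-256 of its bytes."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune metadata directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def tips_match(svn_checkout, git_checkout):
    """True when both checkouts contain the same files with the same bytes."""
    return tree_digests(svn_checkout) == tree_digests(git_checkout)
```

A check like this only covers content at the tips; it says nothing about whether the history leading to them was converted faithfully.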


Additionally, Joseph has made many trial conversions available for 
public examination as we've been iterating towards the final result.


FWIW, I now support using reposurgeon for the final conversion.

I want to also take this opportunity to thank Maxim for the work he has 
done.  Having that fallback option has meant that we could press harder 
for a timely solution and has also driven several significant 
improvements to the overall result.  I do not think we would have 
achieved as good a result overall if he hadn't developed his scripts.


R.


Re: Proposal for the transition timetable for the move to GIT

2020-01-08 Thread Jeff Law
On Wed, 2020-01-08 at 23:34 +, Joseph Myers wrote:
> 
> As noted on overseers, once Saturday's DATESTAMP update has run at 00:16 
> UTC on Saturday, I intend to add a README.MOVED_TO_GIT file on SVN trunk 
> and change the SVN hooks to make SVN readonly, then disable gccadmin's 
> cron jobs that build snapshots and update online documentation until they 
> are ready to run with the git repository.  Once the existing git mirror 
> has picked up the last changes I'll make that read-only and disable that 
> cron job as well, and start the conversion process with a view to having 
> the converted repository in place this weekend (it could either be made 
> writable as soon as I think it's ready, or left read-only until people 
> have had time to do any final checks on Monday).  Before then, I'll work 
> on hooks, documentation and maintainer-scripts updates.
Is there any chance we could get one more trunk snapshot before the
conversion starts -- even if that means firing up the snapshot process
Friday?  It'd be quite useful for the ongoing Fedora build testing.

If it's a significant hassle, then don't worry, I'll create one
manually.

Jeff
> 



Re: Proposal for the transition timetable for the move to GIT

2020-01-08 Thread Segher Boessenkool
On Wed, Jan 08, 2020 at 11:34:32PM +, Joseph Myers wrote:
> As noted on overseers, once Saturday's DATESTAMP update has run at 00:16 
> UTC on Saturday, I intend to add a README.MOVED_TO_GIT file on SVN trunk 
> and change the SVN hooks to make SVN readonly, then disable gccadmin's 
> cron jobs that build snapshots and update online documentation until they 
> are ready to run with the git repository.  Once the existing git mirror 
> has picked up the last changes I'll make that read-only and disable that 
> cron job as well, and start the conversion process with a view to having 
> the converted repository in place this weekend (it could either be made 
> writable as soon as I think it's ready, or left read-only until people 
> have had time to do any final checks on Monday).  Before then, I'll work 
> on hooks, documentation and maintainer-scripts updates.

Where and when and by who was it decided to use this conversion?

Will it at least be *tested* first?


Segher


Re: Proposal for the transition timetable for the move to GIT

2020-01-08 Thread Joseph Myers
On Wed, 8 Jan 2020, Eric S. Raymond wrote:

> They use your feedback to find places where their comment-processing
> scripts could be improved; we've used it to learn what additional
> oddities in ChangeLogs we need to be able to handle automatically.

I've used comparisons of authors in the two conversions - in cases where 
they get different human identities for the author, not just different 
email addresses or name variants - to identify cases for manual review, 
since ChangeLog parsing is the most subjective part of doing a conversion 
and cases where different heuristics produce different results indicate 
those worthy of manual review.
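
The comparison described above can be sketched roughly as follows. This is a minimal illustration only, not the scripts actually used for the GCC conversion: it assumes each conversion has already been reduced to a mapping from SVN revision number to an author string of the form "Name <email>", and the helper names (`name_tokens`, `same_person`, `revisions_to_review`) are hypothetical. Two authors are treated as the same person when one set of name words contains the other, so differing email addresses and middle initials are ignored.

```python
import re


def name_tokens(author):
    """Return the set of name words in a 'Name <email>' string, ignoring the email."""
    name = author.split("<", 1)[0]
    words = re.findall(r"[^\W\d_]+", name.casefold())
    # Drop single letters so that middle initials do not create a mismatch.
    return frozenset(w for w in words if len(w) > 1)


def same_person(author_a, author_b):
    """Heuristic: same identity if one set of name words contains the other."""
    a, b = name_tokens(author_a), name_tokens(author_b)
    return bool(a) and bool(b) and (a <= b or b <= a)


def revisions_to_review(conv_a, conv_b):
    """List revisions present in both conversions whose authors look like different people."""
    shared = conv_a.keys() & conv_b.keys()
    return sorted(rev for rev in shared if not same_person(conv_a[rev], conv_b[rev]))
```

A name-word heuristic like this will both over- and under-flag (transliterated names, shared surnames), which is acceptable here because its only job is to produce a shortlist for a human to look at.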

Apart from about 1600 with no changes to ChangeLog files but a ChangeLog 
entry in the commit message, which I reviewed mostly automatically to make 
sure I agreed with Maxim's author extraction with only limited manual 
checks on those that looked like suspect cases, that involved reviewing 
around 3000 commits manually; I've now completed that review.  Some of 
those are also subjective cases even after review (for example, where the 
commit involved one person backporting another person's patch).

In the set of around 1200 commits with both ChangeLog and non-ChangeLog 
files being changed, which did not look like backports, for example, I 
arrived at around 400 author improvements from this review (not all of 
them the same authors as in Maxim's conversion), while for around 800 
commits I concluded the reposurgeon author was preferable.  (The typical 
case where reposurgeon does better is where successive commits add new 
ChangeLog entries under an existing ChangeLog header.  The typical case 
where I added fixes was where a commit made nonsubstantive changes under 
an existing header, as well as adding new entries, which is hard to 
distinguish automatically from a multi-author commit so reposurgeon 
conservatively treats as a multi-author commit.)
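
To make the ambiguity concrete, here is a small sketch of header-based attribution. It is not reposurgeon's implementation; it assumes ChangeLog headers of the form "YYYY-MM-DD  Name  <email>", and the function names and the `added` parameter (the 0-based indices of lines a commit added to the file) are hypothetical. Each added entry line is attributed to the nearest header above it, and a commit that touches entries under more than one header is flagged, since a typo fix under an old header and a genuine multi-author commit look the same at this level.

```python
import re

# Assumed header shape: ISO date, author name, email in angle brackets.
HEADER_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.+?)\s+<([^>]+)>\s*$")


def changelog_authors(lines, added):
    """Return (name, email) pairs for headers that own at least one added entry line."""
    authors = []
    current = None
    for index, line in enumerate(lines):
        match = HEADER_RE.match(line)
        if match:
            current = (match.group(2), match.group(3))
            continue  # the header line itself is not an entry
        if index in added and line.strip() and current is not None:
            if current not in authors:
                authors.append(current)
    return authors


def needs_manual_review(lines, added):
    """True when added lines fall under more than one header."""
    return len(changelog_authors(lines, added)) > 1
```

With this scheme, a commit that only adds entries under one existing header gets a single author, while a commit that also touches a line under an older header comes back with two and is left for manual review.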

In the case of ChangeLog-only commits, where reposurgeon assumes they are 
likely to be fixing typos or similar and so does not extract an 
attribution from ChangeLog files in such commits, manual review identified 
many cases (especially in the earlier parts of the history) where the 
ChangeLog was committed separately from the substantive parts of the patch 
and so a better attribution could be assigned to those substantive 
commits.

I consider the reposurgeon-based conversion machinery to be in essentially 
its final state now; I don't have any further authors to review, Richard 
doesn't have any further Bugzilla-based commit summaries to review and we 
don't know of any relevant reposurgeon bugs or missing features.  I'm 
running a conversion now to verify both the current state of the fixups 
and the Makefile integration of the conversion and subsequent automated 
validation, and will make that converted repository available for final 
checks if this succeeds.  Compared to the previous converted repository, 
this one has many author fixups, a fix for a bug in the author fixups 
where they broke commit dates, and reposurgeon improvements to avoid 
producing unidiomatic empty git commits in the converted repository for 
things such as branch and tag creation.

This converted repository uses the ref rearrangements along the lines 
proposed by Richard (so dead branches and vendor branches are available 
but not fetched by default); the objects from the existing git mirror will 
also be included in the repository (so existing gitweb links to such 
objects in list archives continue to work, for example, as long as they 
aren't links to objects that were made unreachable at some point in the 
mirror's history), but again under ref names that are not fetched by 
default.

As noted on overseers, once Saturday's DATESTAMP update has run at 00:16 
UTC on Saturday, I intend to add a README.MOVED_TO_GIT file on SVN trunk 
and change the SVN hooks to make SVN readonly, then disable gccadmin's 
cron jobs that build snapshots and update online documentation until they 
are ready to run with the git repository.  Once the existing git mirror 
has picked up the last changes I'll make that read-only and disable that 
cron job as well, and start the conversion process with a view to having 
the converted repository in place this weekend (it could either be made 
writable as soon as I think it's ready, or left read-only until people 
have had time to do any final checks on Monday).  Before then, I'll work 
on hooks, documentation and maintainer-scripts updates.

As well as having objects from the existing git mirror available under 
refs that are not fetched by default, that mirror will remain available 
read-only at git://gcc.gnu.org/git/gcc-old.git (which already exists, 
currently a symlink to the mirror).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Proposal for the transition timetable for the move to GIT

2020-01-08 Thread Eric S. Raymond
Maxim Kuvyrkov :
> Once gcc-reparent conversion is regenerated, I'll do another round of 
> comparisons between it and whatever the latest reposurgeon version is.

Thanks, Maxim. Those comparisons have been very helpful to Joseph and
Richard and to the reposurgeon devteam as well.

They use your feedback to find places where their comment-processing
scripts could be improved; we've used it to learn what additional
oddities in ChangeLogs we need to be able to handle automatically.
-- 
Eric S. Raymond (http://www.catb.org/~esr/)




Re: Proposal for the transition timetable for the move to GIT

2020-01-08 Thread Maxim Kuvyrkov
> On Dec 30, 2019, at 7:08 PM, Richard Earnshaw (lists) 
>  wrote:
> 
> On 30/12/2019 15:49, Maxim Kuvyrkov wrote:
>>> On Dec 30, 2019, at 6:31 PM, Richard Earnshaw (lists) 
>>>  wrote:
>>> 
>>> On 30/12/2019 13:00, Maxim Kuvyrkov wrote:
>>>>> On Dec 30, 2019, at 1:24 AM, Richard Earnshaw (lists) 
>>>>>  wrote:
>>>>> 
>>>>> On 29/12/2019 18:30, Maxim Kuvyrkov wrote:
>>>>>> Below are several more issues I found in reposurgeon-6a conversion 
>>>>>> comparing it against gcc-reparent conversion.
>>>>>> 
>>>>>> I am sure, these and whatever other problems I may find in the 
>>>>>> reposurgeon conversion can be fixed in time.  However, I don't see why 
>>>>>> should bother.  My conversion has been available since summer 2019, I 
>>>>>> made it ready in time for GCC Cauldron 2019, and it didn't change in any 
>>>>>> significant way since then.
>>>>>> 
>>>>>> With the "Missed merges" problem (see below) I don't see how reposurgeon 
>>>>>> conversion can be considered "ready".  Also, I expected a diligent 
>>>>>> developer to compare new conversion (aka reposurgeon's) against existing 
>>>>>> conversion (aka gcc-pretty / gcc-reparent) before declaring the new 
>>>>>> conversion "better" or even "ready".  The data I'm seeing in differences 
>>>>>> between my and reposurgeon conversions shows that gcc-reparent 
>>>>>> conversion is /better/.
>>>>>> 
>>>>>> I suggest that GCC community adopts either gcc-pretty or gcc-reparent 
>>>>>> conversion.  I welcome Richard E. to modify his summary scripts to work 
>>>>>> with svn-git scripts, which should be straightforward, and I'm ready to 
>>>>>> help.
>>>>>> 
>>>>> 
>>>>> I don't think either of these conversions are any more ready to use than
>>>>> the reposurgeon one, possibly less so.  In fact, there are still some
>>>>> major issues to resolve first before they can be considered.
>>>>> 
>>>>> gcc-pretty has completely wrong parent information for the gcc-3 era
>>>>> release tags, showing the tags as being made directly from trunk with
>>>>> massive deltas representing the roll-up of all the commits that were
>>>>> made on the gcc-3 release branch.
>>>> 
>>>> I will clarify the above statement, and please correct me where you think 
>>>> I'm wrong.  Gcc-pretty conversion has the exact right parent information 
>>>> for the gcc-3 era
>>>> release tags as recorded in SVN version history.  Gcc-pretty conversion 
>>>> aims to produce an exact copy of SVN history in git.  IMO, it manages to 
>>>> do so just fine.
>>>> 
>>>> It is a different thing that SVN history has a screwed up record of gcc-3 
>>>> era tags.
>>> 
>>> It's not screwed up in svn.  Svn shows the correct history information for 
>>> the gcc-3 era release tags, but the git-svn conversion in gcc-pretty does 
>>> not.
>>> 
>>> For example, looking at gcc_3_0_release in expr.c with git blame and svn 
>>> blame shows
>> 
>> In SVN history tags/gcc_3_0_release has been copied off /trunk:39596 and in 
>> the same commit bunch of files were replaced from /branches/gcc-3_0-branch/ 
>> (and from different revisions of this branch!).
>> 
>> $ svn log -qv --stop-on-copy file://$(pwd)/tags/gcc_3_0_release | grep 
>> "/tags/gcc_3_0_release \|/tags/gcc_3_0_release/gcc/expr.c 
>> \|/tags/gcc_3_0_release/gcc/reload.c "
>>   A /tags/gcc_3_0_release (from /trunk:39596)
>>   R /tags/gcc_3_0_release/gcc/expr.c (from 
>> /branches/gcc-3_0-branch/gcc/expr.c:43255)
>>   R /tags/gcc_3_0_release/gcc/reload.c (from 
>> /branches/gcc-3_0-branch/gcc/reload.c:42007)
>> 
> 
> Right, (and wrong).  You have to understand how the release branches and
> tags are represented in CVS to understand why the SVN conversion is done
> this way.  When a branch was created in CVS a tag was added to each
> commit which would then be used in any future revisions along that
> branch.  But until a commit is made on that branch, the release branch
> is just a placeholder.
> 
> When a CVS release tag is created, the tag labels the relevant commit
> that is to be used.  If that commit is unchanged from the trunk revision
> (no commit on the branch), then that is what gets labelled, and it
> *appears* to still come from trunk - but that does not matter, since it
> is the same as the version on trunk.
> 
> The svn copy operations are formed from this set of information by
> copying the SVN revision of trunk that applied at the point the branch
> was made, and then overriding the copy information for each file that
> was then modified on the branch with information about that copy.  This
> is sufficient for svn to fully understand the history information for
> each and every file in the tag.
> 
> Unfortunately, git-svn mis-interprets this when building its graph of
> what happened and while it copies the right *content* into the release
> branch, it does not copy the right *history*.  The SVN R operation
> copies the history from named revision, not just the content.  That's
> the significant difference between the two.
> 
> R
>> IMO, from such history (absent 

Re: Proposal for the transition timetable for the move to GIT

2020-01-02 Thread Richard Earnshaw (lists)
On 02/01/2020 02:58, Alexandre Oliva wrote:
> On Dec 30, 2019, "Richard Earnshaw (lists)"  wrote:
> 
>> Right, (and wrong).  You have to understand how the release branches and
>> tags are represented in CVS to understand why the SVN conversion is done
>> this way.
> 
> I'm curious and ignorant, is the convoluted representation that Maxim
> described what SVN normally uses for tree copies, that any conversion
> tool from SVN to GIT thus ought to be able to figure out, or is it just
> an unusual artifact of the conversion from CVS to SVN, that we'd like to
> fix in the conversion from SVN to GIT with some specialized recovery for
> such errors in repos poorly converted from CVS?
> 
> Thanks in advance for cluing me in,
> 

I think it mostly comes from cvs2svn.  You probably could manufacture
something similar directly in SVN, but you'd have to try very hard to
create such brain damage.  Something like

svn cp ^/trunk ^/branches/foo
svn rm -f ^/branches/foo/fred.c
svn cp ^/branches/bar/fred.c ^/branches/foo/fred.c
...
svn ci

Which would create a copy of trunk in foo with a copy of fred.c from the
bar branch etc.

Normal SVN copies to a branch use a simple node copy of the top-level
directory, which is why branching in SVN is cheap (essentially O(1) in
time).
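
As a rough model of the mixed copy described above (a sketch only, not how Subversion or any conversion tool is implemented), the history source of a path inside such a tag can be resolved by checking per-file replacements first and falling back to the directory-level copy. The revision numbers and paths below are taken from the `svn log` output quoted elsewhere in this thread; the function name, the data layout, and the example path `gcc/unchanged.c` are assumptions made for illustration.

```python
def history_source(path, dir_copy, file_overrides):
    """Return (source_path, source_rev) from which `path` inherits its history."""
    # A per-file "R ... (from ...)" record wins over the directory copy.
    if path in file_overrides:
        return file_overrides[path]
    dest_root, src_root, src_rev = dir_copy
    if path == dest_root or path.startswith(dest_root + "/"):
        return (src_root + path[len(dest_root):], src_rev)
    raise ValueError(f"{path} is not inside {dest_root}")


TAG = "/tags/gcc_3_0_release"

# "A /tags/gcc_3_0_release (from /trunk:39596)"
DIR_COPY = (TAG, "/trunk", 39596)

# "R <path> (from /branches/gcc-3_0-branch/...:<rev>)"
FILE_OVERRIDES = {
    TAG + "/gcc/expr.c": ("/branches/gcc-3_0-branch/gcc/expr.c", 43255),
    TAG + "/gcc/reload.c": ("/branches/gcc-3_0-branch/gcc/reload.c", 42007),
}
```

Here `history_source(TAG + "/gcc/expr.c", DIR_COPY, FILE_OVERRIDES)` resolves to the release branch at r43255, while a file with no override, such as the hypothetical `gcc/unchanged.c`, resolves to trunk at r39596. A tool that looks only at the directory-level record sees a tag copied from trunk.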

R.


Re: Proposal for the transition timetable for the move to GIT

2020-01-01 Thread Alexandre Oliva
On Dec 30, 2019, "Richard Earnshaw (lists)"  wrote:

> Right, (and wrong).  You have to understand how the release branches and
> tags are represented in CVS to understand why the SVN conversion is done
> this way.

I'm curious and ignorant, is the convoluted representation that Maxim
described what SVN normally uses for tree copies, that any conversion
tool from SVN to GIT thus ought to be able to figure out, or is it just
an unusual artifact of the conversion from CVS to SVN, that we'd like to
fix in the conversion from SVN to GIT with some specialized recovery for
such errors in repos poorly converted from CVS?

Thanks in advance for cluing me in,

-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist   Stallman was right, but he's left :(
GNU Toolchain Engineer   FSMatrix: It was he who freed the first of us
FSF & FSFLA board member   The Savior shall return (true);


Re: Proposal for the transition timetable for the move to GIT

2019-12-31 Thread Segher Boessenkool
On Tue, Dec 31, 2019 at 01:42:55PM +, Joseph Myers wrote:
> As the remaining changes being made to the reposurgeon conversion are of 
> the form "tidy things up where reposurgeon is already making a reasonable 
> conservative choice but minor improvements are still possible", I think 
> it's very clearly ready and propose doing the actual conversion with 
> reposurgeon over the weekend of 11/12 January [*], with whatever 
> improvements to commit messages and authors are ready by then.

I propose following the original plan, instead, and choosing the best
conversion that *exists* today, or at whatever later date we choose.

Or we can use what we had ready over half a year ago, which was a Fine
conversion already.  Or what we had *over ten years ago*, a repo made
as a plain git-svn mirror, which is perfectly serviceable as well.  (We
*know* it is, many of us have used it daily for that long).

Switching to a conversion that is different, in some ways better, sure,
but in some ways worse as well, is not a good idea imo.  *Especially*
since we asked many times for an evaluation where it is worse or better,
but nothing is forthcoming, we are just asked to accept on blind faith
that it is better, all evidence to the contrary notwithstanding.


Segher


Re: Proposal for the transition timetable for the move to GIT

2019-12-31 Thread Richard Earnshaw (lists)
On 31/12/2019 13:42, Joseph Myers wrote:
> On Mon, 16 Dec 2019, Jeff Law wrote:
> 
>>> Joseph Myers has made his choice.  He has said repeatedly that he
>>> wants to follow through with the reposurgeon conversion, and he's
>>> putting his effort behind that by writing tests and even contributing
>>> code to reposurgeon.
>>>
>>> We'll get this done faster if nobody is joggling his elbow. Or mine.
>> And just to be clear, my preference is for reposurgeon, if it's ready. 
>> But if it isn't, then I'm absolutely comfortable dropping back to
>> Maxim's conversion or even the existing mirror.
> 
> As the remaining changes being made to the reposurgeon conversion are of 
> the form "tidy things up where reposurgeon is already making a reasonable 
> conservative choice but minor improvements are still possible", I think 
> it's very clearly ready and propose doing the actual conversion with 
> reposurgeon over the weekend of 11/12 January [*], with whatever 
> improvements to commit messages and authors are ready by then.  (People 
> would then be free to note issues found afterwards, with the potential to 
> address them if some other version control system takes over from git in 
> 20 years' time, just as some issues from cvs2svn are being addressed in 
> this conversion.)
> 
> This is explicitly not aiming for perfection, but saying that having some 
> improved commit message summaries and authors, based on a combination of 
> sufficiently safe heuristics and manual review of cases heuristics suggest 
> may be questionable and that can be reviewed in time, without trying to 
> have the best possible commit message or author in every case, is better 
> than falling back to only the original commit messages and only using the 
> committer as the author.
> 
> [*] The time taken by a reposurgeon conversion is actually dominated by 
> time spent in git (git-fast-import takes about four hours to import the 
> repository, git gc --aggressive takes over an hour to repack it 
> afterwards) and in validation against SVN, not in reposurgeon itself, so 
> doesn't take a whole weekend, but there will be other things such as hook 
> setup and testing and documenting usage of the repository.
> 

We can develop and test the hook setup on one of the trial repositories.
In fact, we could probably open one of them up to allow commits from
the community on the understanding that all such commits are for testing
purposes only and will be lost during the final conversion.

That will give folk an opportunity to test their own local setups so
that when the switch does occur they are well prepared.

R.


Re: Proposal for the transition timetable for the move to GIT

2019-12-31 Thread Joseph Myers
On Mon, 16 Dec 2019, Jeff Law wrote:

> > Joseph Myers has made his choice.  He has said repeatedly that he
> > wants to follow through with the reposurgeon conversion, and he's
> > putting his effort behind that by writing tests and even contributing
> > code to reposurgeon.
> > 
> > We'll get this done faster if nobody is joggling his elbow. Or mine.
> And just to be clear, my preference is for reposurgeon, if it's ready. 
> But if it isn't, then I'm absolutely comfortable dropping back to
> Maxim's conversion or even the existing mirror.

As the remaining changes being made to the reposurgeon conversion are of 
the form "tidy things up where reposurgeon is already making a reasonable 
conservative choice but minor improvements are still possible", I think 
it's very clearly ready and propose doing the actual conversion with 
reposurgeon over the weekend of 11/12 January [*], with whatever 
improvements to commit messages and authors are ready by then.  (People 
would then be free to note issues found afterwards, with the potential to 
address them if some other version control system takes over from git in 
20 years' time, just as some issues from cvs2svn are being addressed in 
this conversion.)

This is explicitly not aiming for perfection, but saying that having some 
improved commit message summaries and authors, based on a combination of 
sufficiently safe heuristics and manual review of cases heuristics suggest 
may be questionable and that can be reviewed in time, without trying to 
have the best possible commit message or author in every case, is better 
than falling back to only the original commit messages and only using the 
committer as the author.

[*] The time taken by a reposurgeon conversion is actually dominated by 
time spent in git (git-fast-import takes about four hours to import the 
repository, git gc --aggressive takes over an hour to repack it 
afterwards) and in validation against SVN, not in reposurgeon itself, so 
doesn't take a whole weekend, but there will be other things such as hook 
setup and testing and documenting usage of the repository.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Proposal for the transition timetable for the move to GIT

2019-12-31 Thread Segher Boessenkool
On Mon, Dec 30, 2019 at 06:23:16PM -0600, Segher Boessenkool wrote:
> > To me, that indicates that using a conversion tool that is conservative in 
> > its heuristics, and then selectively applying improvements to the extent 
> > they can be done safely with manual review in a reasonable time, is better 
> > than applying a conversion tool with more aggressive heuristics.
> 
> Then you need to just completely drop this, and always use
> , because a large percentage will get that anyway
> then.  Which is fine with me, fwiw: it's correct, and it's a little
> inconvenient perhaps, but it doesn't really make the result less usable
> at all.
> 
> Precisely like weird merges on svn tags that aren't even on a branch.
> Perfect is the enemy of ever getting a conversion done.

Oh, and let me add:

$ date -d "aug 20 2015 + 1600 days"

That is how long this reposurgeon obstinence has delayed us so far.

Happy turn of the year everyone,


Segher


Re: Proposal for the transition timetable for the move to GIT

2019-12-30 Thread Eric S. Raymond
Joseph Myers :
> To me, that indicates that using a conversion tool that is conservative in 
> its heuristics, and then selectively applying improvements to the extent 
> they can be done safely with manual review in a reasonable time, is better 
> than applying a conversion tool with more aggressive heuristics.

There's a more general point here, which I'm developing in my
book-in-progress.

Clean data-conversion problems can be done algorithmically without a
human in the loop.  Messy data-conversion problems need judgment
amplifiers.

Maxim's scripts try to treat a messy conversion problem as though it
were a clean one. Maxim is pretty sharp, so this almost works. Almost.
But the failure mode is predictable - overinterpreting badly-formed
input leads to plausible garbage on output.  

When this happens, it's the Goddess Eris's way of telling you that
there needs to be human judgment in the loop.  Instead of trying to
automate it out, you should be building tools that partition the process 
into things a computer does well, driven by choices a human makes well.

This is a point that needs making because programmers thrown at messy
conversion problems tend to be more fixated on achieving full
automation than they perhaps ought to be.

Elsewhere I have written of Zeno tarpits:
http://esr.ibiblio.org/?p=6772 Subversion dump streams are not quite a
Zeno tarpit - they actually obey something that has the effect of a
formal specification - but ChangeLog parsing is.

> The issues with the reposurgeon conversion listed in Maxim's last comments 
> were of the form "reposurgeon is being conservative in how it generates 
> metadata from SVN information".  I think that's a very good basis for 
> adding on a limited set of safe improvements to authors and commit 
> messages that can be done reasonably soon and then doing the final 
> conversion with reposurgeon.

The flip side of this is that Joseph has been making intelligent and
realistic suggestions for how to improve reposurgeon.  That is
*invaluable* - it captures knowledge that will make future comparisons
easier and better.

Software engineers (outside of a few AI specialists) don't ordinarily
think of themselves as being in the knowledge-capture business. But
it's a useful perspective to cultivate.
-- 
Eric S. Raymond (http://www.catb.org/~esr/)




Re: Proposal for the transition timetable for the move to GIT

2019-12-30 Thread Segher Boessenkool
On Mon, Dec 30, 2019 at 10:58:05PM +, Joseph Myers wrote:
> > If you guys want to ever finish, you'll need to drop the quest for
> > perfection, because this leads to a) much more work, and b) worse quality
> > in the end.
> 
> To me, that indicates that using a conversion tool that is conservative in 
> its heuristics, and then selectively applying improvements to the extent 
> they can be done safely with manual review in a reasonable time, is better 
> than applying a conversion tool with more aggressive heuristics.

Then you need to just completely drop this, and always use
, because a large percentage will get that anyway
then.  Which is fine with me, fwiw: it's correct, and it's a little
inconvenient perhaps, but it doesn't really make the result less usable
at all.

Precisely like weird merges on svn tags that aren't even on a branch.
Perfect is the enemy of ever getting a conversion done.

> The issues with the reposurgeon conversion listed in Maxim's last comments 
> were of the form "reposurgeon is being conservative in how it generates 
> metadata from SVN information".  I think that's a very good basis for 
> adding on a limited set of safe improvements to authors and commit 
> messages that can be done reasonably soon and then doing the final 
> conversion with reposurgeon.

No, we want to *see* why it would be better than the alternatives, what
the differences are.


Segher


Re: Proposal for the transition timetable for the move to GIT

2019-12-30 Thread Joseph Myers
On Mon, 30 Dec 2019, Segher Boessenkool wrote:

> To make it not be super much work, I'd do the second option: better
> heuristics.  Those in Maxim's conversion have been great since over half
> a year, you could borrow some, or peek for inspiration?

Actually, comparing authors between the two conversions shows plenty of 
places where the more aggressive ChangeLog extraction in Maxim's 
conversion has produced less good attributions than reposurgeon (e.g. 
attributing merges to some random author from a ChangeLog modified in the 
merge, rather than to the committer of the merge, or attributing fixes in 
a ChangeLog to the author of a random entry that got fixed), as well as 
places where it's simply failed to extract an author from a ChangeLog that 
reposurgeon has extracted.  So for "great", read "have some good ideas to 
learn from, but plenty of places with problems as well".

I'm working on more detailed comparison of authors with some more 
heuristics to help identify the most interesting cases for manual 
inspection (those where it's more likely Maxim's heuristics are finding 
valid authors reposurgeon didn't) and separate those from cases where 
different subjective choices were made (e.g. of how to assign an author 
when one person backports another's patch, or multi-author commits where 
one conversion chose one author as the main one and the other conversion 
chose the other author).

> If you guys want to ever finish, you'll need to drop the quest for
> perfection, because this leads to a) much more work, and b) worse quality
> in the end.

To me, that indicates that using a conversion tool that is conservative in 
its heuristics, and then selectively applying improvements to the extent 
they can be done safely with manual review in a reasonable time, is better 
than applying a conversion tool with more aggressive heuristics.

The issues with the reposurgeon conversion listed in Maxim's last comments 
were of the form "reposurgeon is being conservative in how it generates 
metadata from SVN information".  I think that's a very good basis for 
adding on a limited set of safe improvements to authors and commit 
messages that can be done reasonably soon and then doing the final 
conversion with reposurgeon.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Proposal for the transition timetable for the move to GIT

2019-12-30 Thread Segher Boessenkool
On Mon, Dec 30, 2019 at 03:36:42PM +, Richard Earnshaw (lists) wrote:
> On 29/12/2019 23:13, Segher Boessenkool wrote:
> > On Sun, Dec 29, 2019 at 11:00:08PM +, Joseph Myers wrote:
> >> fixups in bugdb.py - and that way benefit both from reposurgeon making 
> >> choices that are as conservatively safe as possible, which seems a 
> >> desirable property for problem cases that haven't been manually reviewed, 
> > 
> > Problem cases that haven't been manually reviewed should *be* manually
> > reviewed, or the heuristics improved so there are fewer problem cases.
> > 
> 
> Thank you for offering to help with the checking.
> 
> ;-)

I am telling you what you (imo) need to do at a minimum to make your
candidate conversion acceptable, if it has the problems you say it has.

To make it not be super much work, I'd do the second option: better
heuristics.  Those in Maxim's conversion have been great since over half
a year, you could borrow some, or peek for inspiration?

I have no interest in improving another candidate conversion, as I'm sure
you realise.  And I'm supposed to have time off now ;-)

If you guys want to ever finish, you'll need to drop the quest for
perfection, because this leads to a) much more work, and b) worse quality
in the end.  And before you protest, please look at the evidence again.
*Your own* evidence.

HTH, this is supposed to be constructive, not a flame,

Best wishes,


Segher


Re: Proposal for the transition timetable for the move to GIT

2019-12-30 Thread Richard Earnshaw (lists)
On 30/12/2019 15:49, Maxim Kuvyrkov wrote:
>> On Dec 30, 2019, at 6:31 PM, Richard Earnshaw (lists) 
>>  wrote:
>>
>> On 30/12/2019 13:00, Maxim Kuvyrkov wrote:
>>>> On Dec 30, 2019, at 1:24 AM, Richard Earnshaw (lists) 
>>>>  wrote:
>>>>
>>>> On 29/12/2019 18:30, Maxim Kuvyrkov wrote:
>>>>> Below are several more issues I found in reposurgeon-6a conversion 
>>>>> comparing it against gcc-reparent conversion.
>>>>>
>>>>> I am sure, these and whatever other problems I may find in the 
>>>>> reposurgeon conversion can be fixed in time.  However, I don't see why 
>>>>> should bother.  My conversion has been available since summer 2019, I 
>>>>> made it ready in time for GCC Cauldron 2019, and it didn't change in any 
>>>>> significant way since then.
>>>>>
>>>>> With the "Missed merges" problem (see below) I don't see how reposurgeon 
>>>>> conversion can be considered "ready".  Also, I expected a diligent 
>>>>> developer to compare new conversion (aka reposurgeon's) against existing 
>>>>> conversion (aka gcc-pretty / gcc-reparent) before declaring the new 
>>>>> conversion "better" or even "ready".  The data I'm seeing in differences 
>>>>> between my and reposurgeon conversions shows that gcc-reparent conversion 
>>>>> is /better/.
>>>>>
>>>>> I suggest that GCC community adopts either gcc-pretty or gcc-reparent 
>>>>> conversion.  I welcome Richard E. to modify his summary scripts to work 
>>>>> with svn-git scripts, which should be straightforward, and I'm ready to 
>>>>> help.
>>>>>
>>>>
>>>> I don't think either of these conversions are any more ready to use than
>>>> the reposurgeon one, possibly less so.  In fact, there are still some
>>>> major issues to resolve first before they can be considered.
>>>>
>>>> gcc-pretty has completely wrong parent information for the gcc-3 era
>>>> release tags, showing the tags as being made directly from trunk with
>>>> massive deltas representing the roll-up of all the commits that were
>>>> made on the gcc-3 release branch.
>>>
>>> I will clarify the above statement, and please correct me where you think 
>>> I'm wrong.  Gcc-pretty conversion has the exact right parent information 
>>> for the gcc-3 era
>>> release tags as recorded in SVN version history.  Gcc-pretty conversion 
>>> aims to produce an exact copy of SVN history in git.  IMO, it manages to do 
>>> so just fine.
>>>
>>> It is a different thing that SVN history has a screwed up record of gcc-3 
>>> era tags.
>>
>> It's not screwed up in svn.  Svn shows the correct history information for 
>> the gcc-3 era release tags, but the git-svn conversion in gcc-pretty does 
>> not.
>>
>> For example, looking at gcc_3_0_release in expr.c with git blame and svn 
>> blame shows
> 
> In SVN history tags/gcc_3_0_release has been copied off /trunk:39596 and in 
> the same commit bunch of files were replaced from /branches/gcc-3_0-branch/ 
> (and from different revisions of this branch!).
> 
> $ svn log -qv --stop-on-copy file://$(pwd)/tags/gcc_3_0_release | grep 
> "/tags/gcc_3_0_release \|/tags/gcc_3_0_release/gcc/expr.c 
> \|/tags/gcc_3_0_release/gcc/reload.c "
>A /tags/gcc_3_0_release (from /trunk:39596)
>R /tags/gcc_3_0_release/gcc/expr.c (from 
> /branches/gcc-3_0-branch/gcc/expr.c:43255)
>R /tags/gcc_3_0_release/gcc/reload.c (from 
> /branches/gcc-3_0-branch/gcc/reload.c:42007)
> 

Right, (and wrong).  You have to understand how the release branches and
tags are represented in CVS to understand why the SVN conversion is done
this way.  When a branch was created in CVS a tag was added to each
commit which would then be used in any future revisions along that
branch.  But until a commit is made on that branch, the release branch
is just a placeholder.

When a CVS release tag is created, the tag labels the relevant commit
that is to be used.  If that commit is unchanged from the trunk revision
(no commit on the branch), then that is what gets labelled, and it
*appears* to still come from trunk - but that does not matter, since it
is the same as the version on trunk.

The svn copy operations are formed from this set of information by
copying the SVN revision of trunk that applied at the point the branch
was made, and then overriding the copy information for each file that
was then modified on the branch with information about that copy.  This
is sufficient for svn to fully understand the history information for
each and every file in the tag.

Unfortunately, git-svn misinterprets this when building its graph of
what happened and while it copies the right *content* into the release
branch, it does not copy the right *history*.  The SVN R operation
copies the history from the named revision, not just the content.  That's
the significant difference between the two.
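
To make the per-file copy information concrete, here is a minimal sketch
of how a file inside the tag resolves to its history source.  The data
layout and the function name are assumptions made for illustration, not
how svn or cvs2svn store this internally; the paths and revisions are
the ones from the svn log output quoted above, and "unchanged.c" is a
hypothetical file that was never touched on the branch.

```python
# Toy model: where does the history of a file in the tag come from?
TAG_ROOT = "/tags/gcc_3_0_release"

# Directory-level copy: the whole tag starts as a copy of trunk.
DIR_COPY = ("/trunk", 39596)

# File-level replacements: files modified on the release branch carry
# their own copy source, which overrides the directory-level copy.
FILE_OVERRIDES = {
    TAG_ROOT + "/gcc/expr.c": ("/branches/gcc-3_0-branch/gcc/expr.c", 43255),
    TAG_ROOT + "/gcc/reload.c": ("/branches/gcc-3_0-branch/gcc/reload.c", 42007),
}


def history_source(path):
    """Return (source_path, source_revision) for a file inside the tag."""
    if path in FILE_OVERRIDES:
        # The R record wins: history continues on the release branch.
        return FILE_OVERRIDES[path]
    # Otherwise the file inherits the copy made for the tag directory.
    src_root, src_rev = DIR_COPY
    return (src_root + path[len(TAG_ROOT):], src_rev)


print(history_source(TAG_ROOT + "/gcc/expr.c"))
print(history_source(TAG_ROOT + "/gcc/unchanged.c"))
```

A conversion that only looks at DIR_COPY gets the content right but
attributes every branch change to the tag commit, which is the effect
described above.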

R
> IMO, from such history (absent external knowledge about better reparenting 
> options) the best choice for parent branch is /trunk@39596, not 
> /branches/gcc-3_0-branch at a random revision from the replaced files.
> 
> Still, I see your 

Re: Proposal for the transition timetable for the move to GIT

2019-12-30 Thread Maxim Kuvyrkov
> On Dec 30, 2019, at 6:31 PM, Richard Earnshaw (lists) 
>  wrote:
> 
> On 30/12/2019 13:00, Maxim Kuvyrkov wrote:
>>> On Dec 30, 2019, at 1:24 AM, Richard Earnshaw (lists) 
>>>  wrote:
>>> 
>>> On 29/12/2019 18:30, Maxim Kuvyrkov wrote:
>>>> Below are several more issues I found in reposurgeon-6a conversion 
>>>> comparing it against gcc-reparent conversion.
>>>> 
>>>> I am sure, these and whatever other problems I may find in the reposurgeon 
>>>> conversion can be fixed in time.  However, I don't see why should bother.  
>>>> My conversion has been available since summer 2019, I made it ready in 
>>>> time for GCC Cauldron 2019, and it didn't change in any significant way 
>>>> since then.
>>>> 
>>>> With the "Missed merges" problem (see below) I don't see how reposurgeon 
>>>> conversion can be considered "ready".  Also, I expected a diligent 
>>>> developer to compare new conversion (aka reposurgeon's) against existing 
>>>> conversion (aka gcc-pretty / gcc-reparent) before declaring the new 
>>>> conversion "better" or even "ready".  The data I'm seeing in differences 
>>>> between my and reposurgeon conversions shows that gcc-reparent conversion 
>>>> is /better/.
>>>> 
>>>> I suggest that GCC community adopts either gcc-pretty or gcc-reparent 
>>>> conversion.  I welcome Richard E. to modify his summary scripts to work 
>>>> with svn-git scripts, which should be straightforward, and I'm ready to 
>>>> help.
>>>> 
>>> 
>>> I don't think either of these conversions are any more ready to use than
>>> the reposurgeon one, possibly less so.  In fact, there are still some
>>> major issues to resolve first before they can be considered.
>>> 
>>> gcc-pretty has completely wrong parent information for the gcc-3 era
>>> release tags, showing the tags as being made directly from trunk with
>>> massive deltas representing the roll-up of all the commits that were
>>> made on the gcc-3 release branch.
>> 
>> I will clarify the above statement, and please correct me where you think 
>> I'm wrong.  Gcc-pretty conversion has the exact right parent information for 
>> the gcc-3 era
>> release tags as recorded in SVN version history.  Gcc-pretty conversion aims 
>> to produce an exact copy of SVN history in git.  IMO, it manages to do so 
>> just fine.
>> 
>> It is a different thing that SVN history has a screwed up record of gcc-3 
>> era tags.
> 
> It's not screwed up in svn.  Svn shows the correct history information for 
> the gcc-3 era release tags, but the git-svn conversion in gcc-pretty does not.
> 
> For example, looking at gcc_3_0_release in expr.c with git blame and svn 
> blame shows

In SVN history tags/gcc_3_0_release has been copied off /trunk:39596 and in the 
same commit a bunch of files were replaced from /branches/gcc-3_0-branch/ (and 
from different revisions of this branch!).

$ svn log -qv --stop-on-copy file://$(pwd)/tags/gcc_3_0_release | grep "/tags/gcc_3_0_release \|/tags/gcc_3_0_release/gcc/expr.c \|/tags/gcc_3_0_release/gcc/reload.c "
   A /tags/gcc_3_0_release (from /trunk:39596)
   R /tags/gcc_3_0_release/gcc/expr.c (from /branches/gcc-3_0-branch/gcc/expr.c:43255)
   R /tags/gcc_3_0_release/gcc/reload.c (from /branches/gcc-3_0-branch/gcc/reload.c:42007)

IMO, from such history (absent external knowledge about better reparenting 
options) the best choice for parent branch is /trunk@39596, not 
/branches/gcc-3_0-branch at a random revision from the replaced files.

Still, I see your point, and I will fix reparenting support.  Whether the GCC 
community opts to reparent or not is a different topic.
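
For anyone who wants to look at these records mechanically, here is a
minimal sketch that parses the changed-path lines of "svn log -v" output
(one record per line, once unwrapped) and separates the directory-level
copy from the per-file replacements.  The regular expression and the
function names are my own assumptions for illustration; paths containing
spaces are not handled.

```python
import re

# Matches changed-path lines such as
#   "   R /tags/x/gcc/expr.c (from /branches/b/gcc/expr.c:43255)"
CHANGED_PATH = re.compile(
    r"^\s*(?P<action>[AMDR])\s+(?P<path>\S+)"
    r"(?:\s+\(from\s+(?P<src>\S+):(?P<rev>\d+)\))?\s*$"
)


def parse_changed_paths(lines):
    """Return a list of (action, path, copy_source, copy_revision)."""
    records = []
    for line in lines:
        match = CHANGED_PATH.match(line)
        if not match:
            continue  # separators, headers, log messages
        rev = int(match.group("rev")) if match.group("rev") else None
        records.append(
            (match.group("action"), match.group("path"), match.group("src"), rev)
        )
    return records


def split_copies(records, tag_root):
    """Separate the copy of the tag directory from per-file copies."""
    dir_copy = None
    file_copies = []
    for action, path, src, rev in records:
        if src is None:
            continue  # plain add/modify/delete without copy source
        if path == tag_root:
            dir_copy = (src, rev)
        else:
            file_copies.append((path, src, rev))
    return dir_copy, file_copies
```

Feeding it the three lines above gives /trunk at 39596 as the directory
copy and two file copies from /branches/gcc-3_0-branch/ at different
revisions, which is the ambiguity under discussion.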

--
Maxim Kuvyrkov
https://www.linaro.org


> git blame expr.c:
> 
> ba0a9cb85431 (Richard Kenner 1992-03-03 23:34:57 +   396) 
> return temp;
> ba0a9cb85431 (Richard Kenner 1992-03-03 23:34:57 +   397)   }
> 5fbf0b0d5828 (no-author  2001-06-17 19:44:25 +   398) /* 
> Copy the address into a pseudo, so that the returned value
> 5fbf0b0d5828 (no-author  2001-06-17 19:44:25 +   399)
> remains correct across calls to emit_queue.  */
> 5fbf0b0d5828 (no-author  2001-06-17 19:44:25 +   400) 
> XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
> 59f26b7caad9 (Richard Kenner 1994-01-11 00:23:47 +   401) 
> return new;
> 
> git log 5fbf0b0d5828
> commit 5fbf0b0d5828687914c1c18a83ff12c8627d5a70 (HEAD, tag: gcc_3_0_release)
> Author: no-author 
> Date:   Sun Jun 17 19:44:25 2001 +
> 
>This commit was manufactured by cvs2svn to create tag
>'gcc_3_0_release'.
> 
> while svn blame expr.c correctly shows:
> 
>   386 kenner return temp;
>   386 kenner   }
> 42209 bernds /* Copy the address into a pseudo, so that the 
> returned value
> 42209 berndsremains correct across calls to emit_queue.  */
> 42209 bernds XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
>  6375 kenner return new;
> 
> svn log -r42209 ^/
> 

Re: Proposal for the transition timetable for the move to GIT

2019-12-30 Thread Richard Earnshaw (lists)
On 29/12/2019 23:13, Segher Boessenkool wrote:
> On Sun, Dec 29, 2019 at 11:00:08PM +, Joseph Myers wrote:
>> fixups in bugdb.py - and that way benefit both from reposurgeon making 
>> choices that are as conservatively safe as possible, which seems a 
>> desirable property for problem cases that haven't been manually reviewed, 
> 
> Problem cases that haven't been manually reviewed should *be* manually
> reviewed, or the heuristics improved so there are fewer problem cases.
> 

Thank you for offering to help with the checking.

;-)

R.

> As I've said many many times now, we only have *one* repository to
> convert here.  Taking shortcuts is *good*, making problems for ourselves
> by pretending we do things more generically is *bad*.
> 
> 
> Segher
> 



Re: Proposal for the transition timetable for the move to GIT

2019-12-30 Thread Richard Earnshaw (lists)
On 30/12/2019 13:00, Maxim Kuvyrkov wrote:
>> On Dec 30, 2019, at 1:24 AM, Richard Earnshaw (lists) 
>>  wrote:
>>
>> On 29/12/2019 18:30, Maxim Kuvyrkov wrote:
>>> Below are several more issues I found in reposurgeon-6a conversion 
>>> comparing it against gcc-reparent conversion.
>>>
>>> I am sure, these and whatever other problems I may find in the reposurgeon 
>>> conversion can be fixed in time.  However, I don't see why should bother.  
>>> My conversion has been available since summer 2019, I made it ready in time 
>>> for GCC Cauldron 2019, and it didn't change in any significant way since 
>>> then.
>>>
>>> With the "Missed merges" problem (see below) I don't see how reposurgeon 
>>> conversion can be considered "ready".  Also, I expected a diligent 
>>> developer to compare new conversion (aka reposurgeon's) against existing 
>>> conversion (aka gcc-pretty / gcc-reparent) before declaring the new 
>>> conversion "better" or even "ready".  The data I'm seeing in differences 
>>> between my and reposurgeon conversions shows that gcc-reparent conversion 
>>> is /better/.
>>>
>>> I suggest that GCC community adopts either gcc-pretty or gcc-reparent 
>>> conversion.  I welcome Richard E. to modify his summary scripts to work 
>>> with svn-git scripts, which should be straightforward, and I'm ready to 
>>> help.
>>>
>>
>> I don't think either of these conversions are any more ready to use than
>> the reposurgeon one, possibly less so.  In fact, there are still some
>> major issues to resolve first before they can be considered.
>>
>> gcc-pretty has completely wrong parent information for the gcc-3 era
>> release tags, showing the tags as being made directly from trunk with
>> massive deltas representing the roll-up of all the commits that were
>> made on the gcc-3 release branch.
> 
> I will clarify the above statement, and please correct me where you think I'm 
> wrong.  Gcc-pretty conversion has the exact right parent information for the 
> gcc-3 era
> release tags as recorded in SVN version history.  Gcc-pretty conversion aims 
> to produce an exact copy of SVN history in git.  IMO, it manages to do so 
> just fine.
> 
> It is a different thing that SVN history has a screwed up record of gcc-3 era 
> tags.

It's not screwed up in svn.  Svn shows the correct history information for the 
gcc-3 era release tags, but the git-svn conversion in gcc-pretty does not.

For example, looking at gcc_3_0_release in expr.c with git blame and svn blame 
shows

git blame expr.c:

ba0a9cb85431 (Richard Kenner 1992-03-03 23:34:57 +0000 396) return temp;
ba0a9cb85431 (Richard Kenner 1992-03-03 23:34:57 +0000 397)   }
5fbf0b0d5828 (no-author      2001-06-17 19:44:25 +0000 398) /* Copy the address into a pseudo, so that the returned value
5fbf0b0d5828 (no-author      2001-06-17 19:44:25 +0000 399)    remains correct across calls to emit_queue.  */
5fbf0b0d5828 (no-author      2001-06-17 19:44:25 +0000 400) XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
59f26b7caad9 (Richard Kenner 1994-01-11 00:23:47 +0000 401) return new;

git log 5fbf0b0d5828
commit 5fbf0b0d5828687914c1c18a83ff12c8627d5a70 (HEAD, tag: gcc_3_0_release)
Author: no-author 
Date:   Sun Jun 17 19:44:25 2001 +0000

    This commit was manufactured by cvs2svn to create tag
    'gcc_3_0_release'.

while svn blame expr.c correctly shows:

   386 kenner return temp;
   386 kenner   }
 42209 bernds /* Copy the address into a pseudo, so that the returned value
 42209 bernds    remains correct across calls to emit_queue.  */
 42209 bernds XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
  6375 kenner return new;

svn log -r42209 ^/

r42209 | bernds | 2001-05-17 18:07:08 +0100 (Thu, 17 May 2001) | 2 lines

Fix queueing-related bugs

In other words, svn can correctly track the files that were modified on the 
release branch, while the git conversion loses that information, rolling up 
all the diffs on the release branch into a single unattributed commit.

As I said, gcc-reparent is better in this regard, but there are still artefacts 
from conversion, such as incorrect merge records, that show up.

R.

> 
>>
>> gcc-reparent is better, but many (most?) of the release tags are shown
>> as merge commits with a fake parent back to the gcc-3 branch point,
>> which is certainly not what happened when the tagging was done at that
>> time.
> 
> I agree with you here.
> 
>>
>> Both of these factually misrepresent the history at the time of the
>> release tag being made.
> 
> Yes and no.  Gcc-pretty repository mirrors SVN history.  And regarding the 
> need for reparenting -- we lived with current history for gcc-3 release tags 
> for a long time.  I would argue their continued brokenness is not a 
> show-stopper.
> 
> Looking at this from a 

Re: Proposal for the transition timetable for the move to GIT

2019-12-30 Thread Maxim Kuvyrkov
> On Dec 30, 2019, at 1:24 AM, Richard Earnshaw (lists) 
>  wrote:
> 
> On 29/12/2019 18:30, Maxim Kuvyrkov wrote:
>> Below are several more issues I found in reposurgeon-6a conversion comparing 
>> it against gcc-reparent conversion.
>> 
>> I am sure, these and whatever other problems I may find in the reposurgeon 
>> conversion can be fixed in time.  However, I don't see why should bother.  
>> My conversion has been available since summer 2019, I made it ready in time 
>> for GCC Cauldron 2019, and it didn't change in any significant way since 
>> then.
>> 
>> With the "Missed merges" problem (see below) I don't see how reposurgeon 
>> conversion can be considered "ready".  Also, I expected a diligent developer 
>> to compare new conversion (aka reposurgeon's) against existing conversion 
>> (aka gcc-pretty / gcc-reparent) before declaring the new conversion "better" 
>> or even "ready".  The data I'm seeing in differences between my and 
>> reposurgeon conversions shows that gcc-reparent conversion is /better/.
>> 
>> I suggest that GCC community adopts either gcc-pretty or gcc-reparent 
>> conversion.  I welcome Richard E. to modify his summary scripts to work with 
>> svn-git scripts, which should be straightforward, and I'm ready to help.
>> 
> 
> I don't think either of these conversions are any more ready to use than
> the reposurgeon one, possibly less so.  In fact, there are still some
> major issues to resolve first before they can be considered.
> 
> gcc-pretty has completely wrong parent information for the gcc-3 era
> release tags, showing the tags as being made directly from trunk with
> massive deltas representing the roll-up of all the commits that were
> made on the gcc-3 release branch.

I will clarify the above statement, and please correct me where you think I'm 
wrong.  Gcc-pretty conversion has the exact right parent information for the 
gcc-3 era
release tags as recorded in SVN version history.  Gcc-pretty conversion aims to 
produce an exact copy of SVN history in git.  IMO, it manages to do so just 
fine.

It is a different thing that SVN history has a screwed up record of gcc-3 era 
tags.

> 
> gcc-reparent is better, but many (most?) of the release tags are shown
> as merge commits with a fake parent back to the gcc-3 branch point,
> which is certainly not what happened when the tagging was done at that
> time.

I agree with you here.

> 
> Both of these factually misrepresent the history at the time of the
> release tag being made.

Yes and no.  Gcc-pretty repository mirrors SVN history.  And regarding the need 
for reparenting -- we lived with current history for gcc-3 release tags for a 
long time.  I would argue their continued brokenness is not a show-stopper.

Looking at this from a different perspective, when I posted the initial svn-git 
scripts back in the summer, the community roughly agreed on a plan to
1. Convert the entire SVN history to git.
2. Use the stock git history rewrite tools (git filter-branch) to fix up what we 
want, e.g., reparent tags and branches or set better author/committer entries.

Gcc-pretty does (1) in its entirety.

For reparenting, I tried a 15min fix to my scripts to enable reparenting, which 
worked, but with artifacts like the merge commit from old and new parents.  I 
will drop this and instead use tried-and-true "git filter-branch" to reparent 
those tags and branches, thus producing gcc-reparent from gcc-pretty.
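
As a rough illustration of why reparenting is a history rewrite rather
than an in-place edit: a git commit id covers the ids of its parents, so
changing the parent of a tag commit changes the id of that commit and of
everything built on top of it, while the rest of the history is left
alone.  The sketch below is a toy model of that property only; the hash
scheme, the commit names and the functions are assumptions for
illustration and have nothing to do with how git filter-branch is
implemented.

```python
import hashlib


def commit_id(message, parent_ids):
    """Toy stand-in for a commit hash: depends on message and parents."""
    payload = "\n".join(list(parent_ids) + [message]).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()[:12]


def build_ids(history):
    """history: list of (name, message, parent_names), parents listed first."""
    ids = {}
    for name, message, parent_names in history:
        ids[name] = commit_id(message, [ids[p] for p in parent_names])
    return ids


def reparent(history, name, new_parents):
    """Return a copy of history with the parents of one commit replaced."""
    return [
        (n, msg, list(new_parents) if n == name else list(parents))
        for n, msg, parents in history
    ]


history = [
    ("trunk-1", "trunk work", []),
    ("branch-1", "fix on release branch", ["trunk-1"]),
    ("tag", "create release tag", ["trunk-1"]),       # parent as converted
    ("after-tag", "hypothetical descendant", ["tag"]),
]

before = build_ids(history)
after = build_ids(reparent(history, "tag", ["branch-1"]))

for name in before:
    status = "unchanged" if before[name] == after[name] else "rewritten"
    print(name, status)
```

In this model only "tag" and "after-tag" get new ids, which is why the
reparenting has to happen before the repository is published.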

> 
> As for converting my script to work with your tools, I'm afraid I don't
> have time to work on that right now.  I'm still bogged down validating
> the incorrect bug ids that the script has identified for some commits.
> I'm making good progress (we're down to 160 unreviewed commits now), but
> it is still going to take what time I have over the next week to
> complete that task.
> 
> Furthermore, there is no documentation on how your conversion scripts
> work, so it is not possible for me to test any work I might do in order
> to validate such changes.  Not being able to run the script locally to
> test change would be a non-starter.
> 
> You are welcome, of course, to clone the script I have and attempt to
> modify it yourself, it's reasonably well documented.  The sources can be
> found in esr's gcc-conversion repository here:
> https://gitlab.com/esr/gcc-conversion.git

--
Maxim Kuvyrkov
https://www.linaro.org

> 
> 
>> Meanwhile, I'm going to add additional root commits to my gcc-reparent 
>> conversion to bring in "missing" branches (the ones, which don't share 
>> history with trunk@1) and restart daily updates of gcc-reparent conversion.
>> 
>> Finally, with the comparison data I have, I consider statements about 
>> git-svn's poor quality to be very misleading.  Git-svn may have had serious 
>> bugs years ago when Eric R. evaluated it and started his work on 
>> reposurgeon.  But a lot of development has happened and many problems have 
>> been fixed since them.  At the moment it is reposurgeon that is producing 
>> 

Re: Proposal for the transition timetable for the move to GIT

2019-12-30 Thread Maxim Kuvyrkov
> On Dec 30, 2019, at 3:18 AM, Joseph Myers  wrote:
> 
> On Sun, 29 Dec 2019, Richard Earnshaw (lists) wrote:
> 
>> gcc-reparent is better, but many (most?) of the release tags are shown
>> as merge commits with a fake parent back to the gcc-3 branch point,
>> which is certainly not what happened when the tagging was done at that
>> time.
> 
> And looking at the history of gcc-reparent as part of preparing to compare 
> authors to identify commits needing manual attention to author 
> identification, I see other oddities.
> 
> Do "git log egcs_1_1_2_prerelease_2" in gcc-reparent, for example.  The 
> history ends up containing two different versions of SVN r5 and of many 
> other commits.  One of them looks normal:
> 
> commit c01d37f1690de9ea83b341780fad458f506b80c7
> Author: Charles Hannum 
> Date:   Mon Nov 27 21:22:14 1989 +
> 
>entered into RCS
> 
> 
>git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@5 
> 138bc75d-0d04-0410-961f-82ee72b054a4
> 
> The other looks strange:
> 
> commit 09c5a0fa5ed76e58cc67f3d72bf397277fdd
> Author: Charles Hannum 
> Date:   Mon Nov 27 21:22:14 1989 +
> 
>entered into RCS
> 
> 
>git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@5 
> 138bc75d-0d04-0410-961f-82ee72b054a4
>Updated tag 'egcs_1_1_2_prerelease_2@279090' (was bc80be265a0)
>Updated tag 'egcs_1_1_2_prerelease_2@279154' (was f7cee65b219)
>Updated tag 'egcs_1_1_2_prerelease_2@279213' (was 74dcba9b414)
>Updated tag 'egcs_1_1_2_prerelease_2@279270' (was 7e63c9b344d)
>Updated tag 'egcs_1_1_2_prerelease_2@279336' (was 47894371e3c)
>Updated tag 'egcs_1_1_2_prerelease_2@279392' (was 3c3f6932316)
>Updated tag 'egcs_1_1_2_prerelease_2@279402' (was 29d9998f523b)
> 
> (and in fact it seems there are *four* commits corresponding to SVN r5 and 
> reachable from refs in the gcc-reparent repository).  So we don't just 
> have stray merge commits, they actually end up leading back to strange 
> alternative versions of history (which I think is clearly worse than 
> conservatively not having a merge commit in some case where a commit might 
> or might not be unambiguously a merge - if a merge was missed on an active 
> branch, the branch maintainer can easily correct that afterwards with "git 
> merge -s ours" to avoid problems with future merges).
> 
> My expectation is that there are only multiple git commits corresponding 
> to an SVN commit when the SVN commit touched more than one SVN branch or 
> tag and so has to be split to represent it in git (there are about 1500 
> such SVN commits, most of them automatic datestamp updates in the CVS era 
> that cvs2svn turned into mixed-branch commits).

Thanks for catching this.  This is fallout from incremental rebuilds (rather 
than fresh builds) of the gcc-reparent repository.  Incremental builds take 
about 1h and full rebuilds take about 30h.  I'll switch to doing full rebuilds.

--
Maxim Kuvyrkov
https://www.linaro.org



Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Julien "FrnchFrgg" Rivaud

On 30/12/2019 at 01:18, Joseph Myers wrote:

> Do "git log egcs_1_1_2_prerelease_2" in gcc-reparent, for example.  The
> history ends up containing two different versions of SVN r5 and of many
> other commits.


When trying to migrate Blender from svn to git, we actually tried 
git-svn first, and it produced that kind of strangeness. Sometimes, 
something it didn't like in a commit made it duplicate, or even replicate 
several times over, the whole history predating that commit, with slight 
differences (which explain the differing SHA-1s and thus the multiple 
versions).

That's actually the reason I got involved with reposurgeon in the first 
place, trying to make the then-Python version able to cope with the 
Blender repository with less than 64GB of RAM.

I thought that working around git-svn to only feed it linear commits 
would sidestep that bug, but it looks like it can still be triggered.

(At the time the bug was so common that we ended up with maybe 20 or 30 
copies of the first 1500 commits in the repository, and of course with the 
speed of git-svn, doing the same work 30 times is horrendous.)


Julien



Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Julien "FrnchFrgg" Rivaud

On 29/12/2019 at 18:30, Ian Lance Taylor via gcc wrote:

> On Sun, Dec 29, 2019 at 5:49 AM Julien '_FrnchFrgg_' RIVAUD wrote:
>
>> Which brings me to something I find strange in your policy: to me,
>> merges from trunk to branches should be rare if not nonexistent. And you
>> are deciding to banish merges the other way around.
>
> Out of curiosity, why do you say that merges from trunk to branches
> should be rare?  It seems to me that any long-lived development branch
> will require merges from trunk to the branch.  Are you saying that
> those kinds of branches are rare?


Because in most cases, the development branch should be periodically 
rebased on top of master, not use a merge from master to the branch.


Maybe that's easier to do while developing, but in the end a real 
rebase should be made (dropping the merge commits), because what you 
will send to the ML for review should be a logical stream of changes and 
"update" merge commits are not that.


Thankfully, if you have git rerere enabled, most conflict resolutions 
you did while merging will be reused when rebasing so it should not be 
too painful.




> In GCC we have historically had a pattern in which people use
> long-lived parallel branches that maintain specific patches on top of
> GCC trunk.  These branches provide a simple way to get a variant of
> GCC with specific patches of interest to some people.  These branches
> too require regular merges from trunk.


In that case, sure. But I expect these branches to never be merged in 
trunk. So the real rule would be « branches that merge from trunk should 
not be merged into trunk » (rather than « forbid merges into trunk » or 
even « pretend nobody ever merged anything into trunk, these aren't the 
droids you are looking for »)




> Ian






Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Joseph Myers
On Sun, 29 Dec 2019, Richard Earnshaw (lists) wrote:

> gcc-reparent is better, but many (most?) of the release tags are shown
> as merge commits with a fake parent back to the gcc-3 branch point,
> which is certainly not what happened when the tagging was done at that
> time.

And looking at the history of gcc-reparent as part of preparing to compare 
authors to identify commits needing manual attention to author 
identification, I see other oddities.

Do "git log egcs_1_1_2_prerelease_2" in gcc-reparent, for example.  The 
history ends up containing two different versions of SVN r5 and of many 
other commits.  One of them looks normal:

commit c01d37f1690de9ea83b341780fad458f506b80c7
Author: Charles Hannum 
Date:   Mon Nov 27 21:22:14 1989 +0000

    entered into RCS


    git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@5 138bc75d-0d04-0410-961f-82ee72b054a4

The other looks strange:

commit 09c5a0fa5ed76e58cc67f3d72bf397277fdd
Author: Charles Hannum 
Date:   Mon Nov 27 21:22:14 1989 +0000

    entered into RCS


    git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@5 138bc75d-0d04-0410-961f-82ee72b054a4
    Updated tag 'egcs_1_1_2_prerelease_2@279090' (was bc80be265a0)
    Updated tag 'egcs_1_1_2_prerelease_2@279154' (was f7cee65b219)
    Updated tag 'egcs_1_1_2_prerelease_2@279213' (was 74dcba9b414)
    Updated tag 'egcs_1_1_2_prerelease_2@279270' (was 7e63c9b344d)
    Updated tag 'egcs_1_1_2_prerelease_2@279336' (was 47894371e3c)
    Updated tag 'egcs_1_1_2_prerelease_2@279392' (was 3c3f6932316)
    Updated tag 'egcs_1_1_2_prerelease_2@279402' (was 29d9998f523b)

(and in fact it seems there are *four* commits corresponding to SVN r5 and 
reachable from refs in the gcc-reparent repository).  So we don't just 
have stray merge commits, they actually end up leading back to strange 
alternative versions of history (which I think is clearly worse than 
conservatively not having a merge commit in some case where a commit might 
or might not be unambiguously a merge - if a merge was missed on an active 
branch, the branch maintainer can easily correct that afterwards with "git 
merge -s ours" to avoid problems with future merges).

My expectation is that there are only multiple git commits corresponding 
to an SVN commit when the SVN commit touched more than one SVN branch or 
tag and so has to be split to represent it in git (there are about 1500 
such SVN commits, most of them automatic datestamp updates in the CVS era 
that cvs2svn turned into mixed-branch commits).
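
A check along those lines could be sketched as below: group commits by
the branch URL and revision in their git-svn-id trailer and report any
pair that appears more than once.  This is only an illustration under
assumptions: the function names are made up, and the (sha, message)
pairs would have to be extracted from the repository first, for instance
from "git log --all" output.

```python
import re
from collections import defaultdict

# git-svn appends a trailer of the form
#   git-svn-id: <url>@<revision> <repository-uuid>
SVN_ID = re.compile(r"^\s*git-svn-id:\s+(\S+)@(\d+)\s+(\S+)\s*$", re.MULTILINE)


def commits_per_svn_revision(commits):
    """commits: iterable of (sha, message); returns {(url, rev): [sha, ...]}."""
    by_rev = defaultdict(list)
    for sha, message in commits:
        match = SVN_ID.search(message)
        if match:
            by_rev[(match.group(1), int(match.group(2)))].append(sha)
    return by_rev


def duplicated(by_rev):
    """Keep only the (url, rev) pairs that map to more than one commit."""
    return {key: shas for key, shas in by_rev.items() if len(shas) > 1}
```

Keying on the URL as well as the revision means that an SVN commit which
was legitimately split across several branches or tags is not reported;
only repeated conversions of the same revision on the same path are.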

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Jeff Law
On Sun, 2019-12-29 at 22:30 +0100, Thomas Koenig wrote:
> Am 29.12.19 um 14:26 schrieb Segher Boessenkool:
> > We cannot waste a year on a social experiment.  We can slowly and carefully
> > adopt new procedures, certainly.  But anything drastic isn't advisable imo.
> > 
> > Also, many GCC developers aren't familiar with Git at all.  It takes time
> > to learn it, and to learn new ways of working.  Small steps are needed.
> 
> Amen to that.
> 
> My uses of git have can be counted in a single digit (in decimal).  I am
> just hoping you guys know what you are doing, and I am a bit
> apprehensive about the change and my continued ability to contribute.
> 
> Talk of a radical new development model does not raise my confidence.
I was fairly anti-git for a while, but there are simple workflows you
can use which will be close enough to SVN that you're really just
changing the commands you're using, not your entire workflow.

You can add in "git specific" workflows later once you've become familiar
with the basics.  That's what I did, and boy once you wrap your head
around git rebase for dealing with work in progress it's a game
changer.

jeff



Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Segher Boessenkool
On Sun, Dec 29, 2019 at 11:00:08PM +0000, Joseph Myers wrote:
> fixups in bugdb.py - and that way benefit both from reposurgeon making 
> choices that are as conservatively safe as possible, which seems a 
> desirable property for problem cases that haven't been manually reviewed, 

Problem cases that haven't been manually reviewed should *be* manually
reviewed, or the heuristics improved so there are fewer problem cases.

As I've said many many times now, we only have *one* repository to
convert here.  Taking shortcuts is *good*, making problems for ourselves
by pretending we do things more generically is *bad*.


Segher


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Joseph Myers
On Sun, 29 Dec 2019, Eric S. Raymond wrote:

> Joseph Myers :
> > The case you mention is one where there was a merge to a branch not from 
> > its immediate parent but from an indirect parent.  I don't think it would 
> > be hard to support detecting such merges in reposurgeon.
> 
> We're working on it.

And the other example branch mentioned (redhat/gcc-9-branch) is a 
different case: if the merge from gcc-9-branch to redhat/gcc-9-branch had 
been done in the idiomatic way with modern SVN (i.e. naming the branch to 
merge from and letting SVN deal with identifying the revisions involved), 
I think reposurgeon would have handled it just fine.  But the commit 
messages indicate the merge was done in an old-fashioned way (naming 
individual ranges of revisions to merge manually), which resulted in merge 
properties very slightly different from what SVN creates automatically.  
Now I understand what the difference is I expect we'll be able to fix that 
case as well.

> As Joseph says, one of reposurgeon's design principles is "First, do no harm."
> 
> And yes, changelogs are full of malformations and junk like this. I
> saw and dealt with a lifetime's worth while converting the Emacs
> history from bzr to git.
> 
> If you try to interpret any random garbage in, you will assuredly
> get garbage out when you least expect it. Often the cost of this 
> sort of mistake is not fully realized until it is far too late
> for correction.  This is *why* reposurgeon is conservative.
> 
> The correct thing for reposurgeon to do is flag unparseable entry
> headers for human intervention, and as of today it does that.

Furthermore, we can compare authors in the different conversions to 
identify cases where, based on a manual review, Maxim's heuristics produce 
better results for a particular commit, and add those to the list of 
fixups in bugdb.py - and that way benefit both from reposurgeon making 
choices that are as conservatively safe as possible, which seems a 
desirable property for problem cases that haven't been manually reviewed, 
and from different heuristics helping suggest improvements in particular 
cases.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Eric S. Raymond
Joseph Myers :
> The case you mention is one where there was a merge to a branch not from 
> its immediate parent but from an indirect parent.  I don't think it would 
> be hard to support detecting such merges in reposurgeon.

We're working on it.

> This is an example where the originally added ChangeLog entry was 
> malformed (had the date in the form "2004-0630"), so a conservatively safe 
> approach was taken of using the committer rather than trying to guess what 
> a malformed ChangeLog entry means and risk extracting nonsense.
> 
> I expect other cases are being similarly careful in cases where there was 
> a malformed ChangeLog entry or a commit edited ChangeLog entries by other 
> authors so leaving its single-author nature ambiguous.  Parsing 
> ChangeLogs, especially where malformed entries are involved, is inherently 
> a heuristic matter.

As Joseph says, one of reposurgeon's design principles is "First, do no harm."

And yes, changelogs are full of malformations and junk like this. I
saw and dealt with a lifetime's worth while converting the Emacs
history from bzr to git.

If you try to interpret any random garbage in, you will assuredly
get garbage out when you least expect it. Often the cost of this 
sort of mistake is not fully realized until it is far too late
for correction.  This is *why* reposurgeon is conservative.

The correct thing for reposurgeon to do is flag unparseable entry
headers for human intervention, and as of today it does that.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Joseph Myers
On Sun, 29 Dec 2019, Mark Wielaard wrote:

> Maybe we should have a separate historical git repo which contains
> everything that we were able to salvage and that people could git
> remote add if they are really, really interested.

I'm not convinced that's very different from having one repo with 
everything but some pieces in refs that aren't fetched by default.  Maybe 
separate repos make fetching a bit more efficient if it allows packs to be 
reused on the server, but they also mean extra administrative overhead 
ensuring the correct configuration for each repo (for public access, not 
allowing pushes to the historical repo, etc.).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Richard Earnshaw (lists)
On 29/12/2019 18:30, Maxim Kuvyrkov wrote:
> Below are several more issues I found in reposurgeon-6a conversion comparing 
> it against gcc-reparent conversion.
> 
> I am sure these and whatever other problems I may find in the reposurgeon 
> conversion can be fixed in time.  However, I don't see why we should bother.  My 
> conversion has been available since summer 2019, I made it ready in time for 
> GCC Cauldron 2019, and it didn't change in any significant way since then.
> 
> With the "Missed merges" problem (see below) I don't see how reposurgeon 
> conversion can be considered "ready".  Also, I expected a diligent developer 
> to compare new conversion (aka reposurgeon's) against existing conversion 
> (aka gcc-pretty / gcc-reparent) before declaring the new conversion "better" 
> or even "ready".  The data I'm seeing in differences between my and 
> reposurgeon conversions shows that gcc-reparent conversion is /better/.
> 
> I suggest that GCC community adopts either gcc-pretty or gcc-reparent 
> conversion.  I welcome Richard E. to modify his summary scripts to work with 
> svn-git scripts, which should be straightforward, and I'm ready to help.
> 

I don't think either of these conversions is any more ready to use than
the reposurgeon one, possibly less so.  In fact, there are still some
major issues to resolve first before they can be considered.

gcc-pretty has completely wrong parent information for the gcc-3 era
release tags, showing the tags as being made directly from trunk with
massive deltas representing the roll-up of all the commits that were
made on the gcc-3 release branch.

gcc-reparent is better, but many (most?) of the release tags are shown
as merge commits with a fake parent back to the gcc-3 branch point,
which is certainly not what happened when the tagging was done at that
time.

Both of these factually misrepresent the history at the time of the
release tag being made.

As for converting my script to work with your tools, I'm afraid I don't
have time to work on that right now.  I'm still bogged down validating
the incorrect bug ids that the script has identified for some commits.
I'm making good progress (we're down to 160 unreviewed commits now), but
it is still going to take what time I have over the next week to
complete that task.

Furthermore, there is no documentation on how your conversion scripts
work, so it is not possible for me to test any work I might do in order
to validate such changes.  Not being able to run the script locally to
test changes would be a non-starter.

You are welcome, of course, to clone the script I have and attempt to
modify it yourself, it's reasonably well documented.  The sources can be
found in esr's gcc-conversion repository here:
https://gitlab.com/esr/gcc-conversion.git


> Meanwhile, I'm going to add additional root commits to my gcc-reparent 
> conversion to bring in "missing" branches (the ones, which don't share 
> history with trunk@1) and restart daily updates of gcc-reparent conversion.
> 
> Finally, with the comparison data I have, I consider statements about 
> git-svn's poor quality to be very misleading.  Git-svn may have had serious 
> bugs years ago when Eric R. evaluated it and started his work on reposurgeon. 
>  But a lot of development has happened and many problems have been fixed 
> since then.  At the moment it is reposurgeon that is producing conversions 
> with obscure mistakes in repository metadata.
> 
> 
> === Missed merges ===
> 
> Reposurgeon misses merges from trunk on 130+ branches.  I've spot-checked 
> ARM/hard_vfp_branch and redhat/gcc-9-branch and, indeed, rather mundane 
> merges were omitted.  Below is analysis for ARM/hard_vfp_branch.
> 
> $ git log --stat refs/remotes/gcc-reposurgeon-6a/ARM/hard_vfp_branch~4
> 
> commit ef92c24b042965dfef982349cd5994a2e0ff5fde
> Author: Richard Earnshaw 
> Date:   Mon Jul 20 08:15:51 2009 +
> 
> Merge trunk through to r149768
> 
> Legacy-ID: 149804
> 
>  COPYING.RUNTIME |73 +
>  ChangeLog   |   270 +-
>  MAINTAINERS |19 +-
> 
> 
> 
> at the same time for svn-git scripts we have:
> 
> $ git log --stat refs/remotes/gcc-reparent/ARM/hard_vfp_branch~4
> 
> commit ce7d5c8df673a7a561c29f095869f20567a7c598
> Merge: 4970119c20da 3a69b1e566a7
> Author: Richard Earnshaw 
> Date:   Mon Jul 20 08:15:51 2009 +
> 
> Merge trunk through to r149768
> 
> git-svn-id: 
> https://gcc.gnu.org/svn/gcc/branches/ARM/hard_vfp_branch@149804 
> 138bc75d-0d04-0410-961f-82ee72b054a4
> 
> 
> ... which agrees with
> $ svn propget svn:mergeinfo 
> file:///home/maxim.kuvyrkov/tmpfs-stuff/svnrepo/branches/ARM/hard_vfp_branch@149804
> /trunk:142588-149768
> 
> === Bad author entries ===
> 
> Reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and 
> "2005-03-18 Kazu Hirata".  It is rather obvious that person's name is 

Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Thomas Koenig

On 29.12.19 at 14:26, Segher Boessenkool wrote:

We cannot waste a year on a social experiment.  We can slowly and carefully
adopt new procedures, certainly.  But anything drastic isn't advisable imo.

Also, many GCC developers aren't familiar with Git at all.  It takes time
to learn it, and to learn new ways of working.  Small steps are needed.


Amen to that.

My uses of git can be counted with a single digit (in decimal).  I am
just hoping you guys know what you are doing, and I am a bit
apprehensive about the change and my continued ability to contribute.

Talk of a radical new development model does not raise my confidence.


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Joseph Myers
On Sun, 29 Dec 2019, Maxim Kuvyrkov wrote:

> With the "Missed merges" problem (see below) I don't see how reposurgeon 
> conversion can be considered "ready".

It aims to be conservatively safe regarding merges, erring on the side of 
not adding incorrect merges if in doubt.  Because of the difficulty in 
matching SVN and git merge semantics, it's inherently hard to define 
unambiguously exactly which merges are correct and which are cherry-picks 
or erroneous.  I think extra merges are something nice-to-have rather than 
critical.

The case you mention is one where there was a merge to a branch not from 
its immediate parent but from an indirect parent.  I don't think it would 
be hard to support detecting such merges in reposurgeon.

> Reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and 
> "2005-03-18 Kazu Hirata".  It is rather obvious that person's name is 
> unlikely to start with a digit.

These are already fixed in bugdb.py since that conversion, as part of the 
general review of authors to fix typos and make them more uniform.

> Reposurgeon-6a conversion misses many authors, below is a list of people 
> with names starting with "A".
> 
> Akos Kiss

This is an example where the originally added ChangeLog entry was 
malformed (had the date in the form "2004-0630"), so a conservatively safe 
approach was taken of using the committer rather than trying to guess what 
a malformed ChangeLog entry means and risk extracting nonsense.

I expect the other cases are being similarly careful: either there was a 
malformed ChangeLog entry, or a commit edited ChangeLog entries by other 
authors, which leaves its single-author nature ambiguous.  Parsing 
ChangeLogs, especially where malformed entries are involved, is inherently 
a heuristic matter.
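
A minimal sketch of the conservative approach described above, assuming a
GNU-style "YYYY-MM-DD  Name  <email>" header.  The function names and the
example address are hypothetical illustrations, not reposurgeon's actual
code: a header is accepted only if it parses strictly, and anything else
falls back to the committer and is flagged for review.

```python
import re
from datetime import date

# Strict GNU-style ChangeLog header: "YYYY-MM-DD  Name  <email>".
# (Assumed format; real ChangeLogs contain many older variants.)
HEADER_RE = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})\s+(\S.*?)\s+<([^<>\s]+@[^<>\s]+)>\s*$"
)

def parse_changelog_header(line):
    """Return (date, name, email), or None if the header is malformed."""
    m = HEADER_RE.match(line)
    if m is None:
        return None
    year, month, day, name, email = m.groups()
    try:
        when = date(int(year), int(month), int(day))
    except ValueError:  # syntactically fine but not a real date
        return None
    return when, name, email

def choose_author(header_line, committer):
    """Prefer the ChangeLog author; otherwise keep the committer and
    report (second element True) that a human should look at it."""
    parsed = parse_changelog_header(header_line)
    if parsed is None:
        return committer, True
    _, name, email = parsed
    return f"{name} <{email}>", False
```

With this sketch, a header dated "2004-0630" does not match the pattern, so
the committer is kept and the entry is reported rather than guessed at.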

-- 
Joseph S. Myers
j...@polyomino.org.uk


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Maxim Kuvyrkov
Below are several more issues I found in reposurgeon-6a conversion comparing it 
against gcc-reparent conversion.

I am sure these and whatever other problems I may find in the reposurgeon 
conversion can be fixed in time.  However, I don't see why we should bother.  My 
conversion has been available since summer 2019, I made it ready in time for 
GCC Cauldron 2019, and it didn't change in any significant way since then.

With the "Missed merges" problem (see below) I don't see how reposurgeon 
conversion can be considered "ready".  Also, I expected a diligent developer to 
compare new conversion (aka reposurgeon's) against existing conversion (aka 
gcc-pretty / gcc-reparent) before declaring the new conversion "better" or even 
"ready".  The data I'm seeing in differences between my and reposurgeon 
conversions shows that gcc-reparent conversion is /better/.

I suggest that GCC community adopts either gcc-pretty or gcc-reparent 
conversion.  I welcome Richard E. to modify his summary scripts to work with 
svn-git scripts, which should be straightforward, and I'm ready to help.

Meanwhile, I'm going to add additional root commits to my gcc-reparent 
conversion to bring in "missing" branches (the ones, which don't share history 
with trunk@1) and restart daily updates of gcc-reparent conversion.

Finally, with the comparison data I have, I consider statements about git-svn's 
poor quality to be very misleading.  Git-svn may have had serious bugs years 
ago when Eric R. evaluated it and started his work on reposurgeon.  But a lot 
of development has happened and many problems have been fixed since then.  At 
the moment it is reposurgeon that is producing conversions with obscure 
mistakes in repository metadata.


=== Missed merges ===

Reposurgeon misses merges from trunk on 130+ branches.  I've spot-checked 
ARM/hard_vfp_branch and redhat/gcc-9-branch and, indeed, rather mundane merges 
were omitted.  Below is analysis for ARM/hard_vfp_branch.

$ git log --stat refs/remotes/gcc-reposurgeon-6a/ARM/hard_vfp_branch~4

commit ef92c24b042965dfef982349cd5994a2e0ff5fde
Author: Richard Earnshaw 
Date:   Mon Jul 20 08:15:51 2009 +

Merge trunk through to r149768

Legacy-ID: 149804

 COPYING.RUNTIME |73 +
 ChangeLog   |   270 +-
 MAINTAINERS |19 +-



at the same time for svn-git scripts we have:

$ git log --stat refs/remotes/gcc-reparent/ARM/hard_vfp_branch~4

commit ce7d5c8df673a7a561c29f095869f20567a7c598
Merge: 4970119c20da 3a69b1e566a7
Author: Richard Earnshaw 
Date:   Mon Jul 20 08:15:51 2009 +

Merge trunk through to r149768

git-svn-id: https://gcc.gnu.org/svn/gcc/branches/ARM/hard_vfp_branch@149804 
138bc75d-0d04-0410-961f-82ee72b054a4


... which agrees with
$ svn propget svn:mergeinfo 
file:///home/maxim.kuvyrkov/tmpfs-stuff/svnrepo/branches/ARM/hard_vfp_branch@149804
/trunk:142588-149768
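
For illustration, a small sketch of how an svn:mergeinfo value like the one
above can be read mechanically.  The helper names are hypothetical and this
is not what either conversion tool does internally; it only assumes the
documented property layout of one "path:revision-list" entry per line, with
comma-separated revisions or ranges and an optional trailing "*" on
non-inheritable ranges.

```python
def parse_mergeinfo(prop_value):
    """Parse an svn:mergeinfo value into {source_path: [(first, last), ...]}."""
    merged = {}
    for line in prop_value.splitlines():
        line = line.strip()
        if not line:
            continue
        # The path may itself contain ':', so split on the last one.
        path, _, revlist = line.rpartition(":")
        ranges = []
        for item in revlist.split(","):
            item = item.rstrip("*")  # '*' marks a non-inheritable range
            first, _, last = item.partition("-")
            ranges.append((int(first), int(last or first)))
        merged[path] = ranges
    return merged

def merged_through(prop_value, source):
    """Highest revision of `source` recorded as merged, or None."""
    ranges = parse_mergeinfo(prop_value).get(source)
    return max(last for _, last in ranges) if ranges else None
```

Applied to the value shown above, the highest merged trunk revision is
149768, matching the "Merge trunk through to r149768" commit message.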

=== Bad author entries ===

Reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and 
"2005-03-18 Kazu Hirata".  It is rather obvious that a person's name is 
unlikely to start with a digit.
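
A check of this kind is easy to automate.  The following is only a sketch
(the function name and the clean-up rule are assumptions, not part of
either conversion): it lists author strings that start with a digit and
proposes a candidate name with the leading date/time debris removed, for a
human to confirm.

```python
import re

# An author field starting with a digit is probably a mis-parsed
# ChangeLog header rather than a real name.
LEADING_DIGIT = re.compile(r"^\s*\d")
# Leading run of digits, colons, dashes and whitespace (date/time debris).
LEADING_JUNK = re.compile(r"^[\d:\s-]+")

def suspicious_authors(authors):
    """Return (original, suggested_name) pairs for manual review."""
    flagged = []
    for author in authors:
        if LEADING_DIGIT.match(author):
            flagged.append((author, LEADING_JUNK.sub("", author)))
    return flagged
```

The suggestion is deliberately not applied automatically; it only shortens
the manual review.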

=== Missed authors ===

Reposurgeon-6a conversion misses many authors; below is a list of people with 
names starting with "A".

Akos Kiss
Anders Bertelrud
Andrew Pochinsky
Anton Hartl
Arthur Norman
Aymeric Vincent

=== Conservative author entries ===

Reposurgeon-6a conversion uses default "@gcc.gnu.org" emails for many commits 
where svn-git conversion manages to extract valid email from commit data.  This 
happens for hundreds of author entries.

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org


> On Dec 26, 2019, at 7:11 PM, Maxim Kuvyrkov  wrote:
> 
> 
>> On Dec 26, 2019, at 2:16 PM, Jakub Jelinek  wrote:
>> 
>> On Thu, Dec 26, 2019 at 11:04:29AM +, Joseph Myers wrote:
>> Is there some easy way (e.g. file in the conversion scripts) to correct
>> spelling and other mistakes in the commit authors?
>> E.g. there are misspelled surnames, etc. (e.g. looking at my name, I see
>> Jakub Jakub Jelinek (1):
>> Jakub Jeilnek (1):
>> Jelinek (1):
>> entries next to the expected one with most of the commits.
>> For the misspellings, wonder if e.g. we couldn't compute edit distances from
>> other names and if we have one with many commits and then one with very few
>> with small edit distance from those, flag it for human review.
> 
> This is close to what svn-git-author.sh script is doing in gcc-pretty and 
> gcc-reparent conversions.  It ignores 1-3 character differences in 
> author/committer names and email addresses.  I've audited results for all 
> branches and didn't spot any mistakes.
> 
> In other news, I'm working on comparison of gcc-pretty, gcc-reparent and 
> gcc-reposurgeon-5a repos among themselves.  Below are current notes for 
> comparison of gcc-pretty/trunk and gcc-reposurgeon-5a/trunk.
> 
> == Merges on trunk ==
> 
> Reposurgeon 

Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Ian Lance Taylor via gcc
On Sun, Dec 29, 2019 at 5:49 AM Julien '_FrnchFrgg_' RIVAUD
 wrote:
>
> Which brings me to something I find strange in your policy: to me,
> merges from trunk to branches should be rare if not nonexistent. And you
> are deciding to banish merges the other way around.

Out of curiosity, why do you say that merges from trunk to branches
should be rare?  It seems to me that any long-lived development branch
will require merges from trunk to the branch.  Are you saying that
those kinds of branches are rare?

In GCC we have historically had a pattern in which people use
long-lived parallel branches that maintain specific patches on top of
GCC trunk.  These branches provide a simple way to get a variant of
GCC with specific patches of interest to some people.  These branches
too require regular merges from trunk.

Ian


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Mark Wielaard
Hi,

On Wed, 2019-12-25 at 06:10 -0600, Segher Boessenkool wrote:
> git-svn did not miss any branches.  Finding branches is not done by
> git-svn at all, for this.  These branches were skipped because they
> have nothing to do with GCC, have no history in common (they are not
> descendants of revision 1).  They can easily be added -- Maxim might
> already have done that, not sure, imo it's better to just drop the
> garbage, it's in svn if anyone cares.

I just looked at one of these "missed" branches, CLASSPATH.
That was created when GNU Classpath and gcc/libgcj were both in
cvs. The idea was that it was a kind of cvs vendor branch of the
upstream GNU Classpath releases (and some random checkouts) which would
make merging imports of new code into the main trunk easier. libgcj was
merged and then based on GNU Classpath in the past/when it was
officially imported into gcc. The CLASSPATH branch only contains files
under libjava/classpath.

Some of the commits look a little odd, probably because it was
converted from cvs2svn and then again to git. GNU Classpath moved to
git a long time ago and never was in subversion. And of course these
days gcj and libgcj aren't part of the main gcc trunk anymore.

There is also a classpath-generics branch, which has a couple of
snapshots of the GNU Classpath generics branch (some pre-releases of
classpath before 0.95 which had generics separately).

There are also some other branches containing classpath:
gcj/classpath-095-import-branch
gcj/classpath-095-merge-branch
gcj/classpath-0961-import-branch
gcj/classpath-098-merge-branch
gcj/classpath-20070727-import-branch

These branches contain all of gcc, not just the files under
libjava/classpath.
I am not sure why these were separate from the CLASSPATH vendor branch.

Even though I have an (historical) interest in the gcj frontend and GNU
Classpath class library I am not sure these branches would really help
me. Also I think the branches aren't very interesting without the actual
GNU Classpath (git) tree history from which they were cherry-picked.
The classpath git tree does contain tags for each import already, so
you can get the real history there.

Seeing how big the git tree/conversion already is I would suggest
leaving these out of the main git repo if at all possible.

Maybe we should have a separate historical git repo which contains
everything that we were able to salvage and that people could git
remote add if they are really, really interested.

Cheers,

Mark


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Julien '_FrnchFrgg_' RIVAUD

On 29/12/2019 at 17:32, Richard Earnshaw wrote:

We agreed that for changes in our current workflow practices we'd defer
that until *after* we'd switched to git; so this is getting off topic.

On the other hand, we do need to sort out what we do with existing merge
history, as that forms part of the conversion.  Can we stick to what's
relevant, please, at least in this thread?


I never wanted to make the GCC project choose new rules now. What I 
advise (and you are more than able to choose to follow or not) is only 
to avoid taking decisions right now, as part of the migration, that 
would impair establishing better rules later, especially if those 
decisions come from (bad?) habits that were taken during the SVN era, 
due to the idiosyncrasies of SVN itself.


Julien




Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Richard Earnshaw
On 29/12/2019 12:15, Segher Boessenkool wrote:
> On Sun, Dec 29, 2019 at 12:02:51PM +0100, Richard Biener wrote:
>> For bisecting trunk a merge would be a single commit, right? So I could see 
>> value in preserving a patch series where individual steps might introduce 
>> temporary issues as a branch merge (after rebasing) so the series is visible 
>> but not when bisecting (by default). It would also make the series 
>> relatedness obvious and avoids splitting it with a commit race (if that is 
>> possible with git). 
> 
> "git bisect" actually goes all the way down the rabbit hole, it tries to
> find the first bad commit in the range you marked as starting as "good",
> ending as "bad".
> 
> It is pretty confusing to do if there are many merges, especially if many
> commits end up not building at all.  But you can always "git bisect skip"
> stuff (it just eats time, and it hampers automated bisecting).
> 
> The really nasty cases are when the code does build, but fails for
> unrelated reasons.
> 
> We require every commit to be individually tested, and if we *do* allow
> merges, that should still be done imo.  Which again makes merging less
> useful: if you are going to rebase your branch anyway (to fix simple
> stuff), why not rebase it onto trunk!
> 
>> IMHO exact workflow for merging a patch series as opposed to a single patch 
>> should be documented. 
> 
> Yes.  It isn't actually documented in so many words for what we do now,
> either, but it would be good to have.
> 
> 
> Segher
> 

We agreed that for changes in our current workflow practices we'd defer
that until *after* we'd switched to git; so this is getting off topic.

On the other hand, we do need to sort out what we do with existing merge
history, as that forms part of the conversion.  Can we stick to what's
relevant, please, at least in this thread?

R.


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Segher Boessenkool
On Sun, Dec 29, 2019 at 02:48:31PM +0100, Julien '_FrnchFrgg_' RIVAUD wrote:
> >Merges aren't scary.  Merges are inconvenient.
> 
> No they are not. You are unaccustomed to them, which is different. 

Lol.  Okay, end of discussion.  You are assuming all the wrong things.


Segher


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Julien '_FrnchFrgg_' RIVAUD

On 29/12/2019 at 14:31, Segher Boessenkool wrote:

On Sun, Dec 29, 2019 at 12:46:50PM +0100, Julien '_FrnchFrgg_' RIVAUD wrote:

At worst, no commit is testable in the
branch except the last, and git will say that the bug was introduced in
the branch, which is not worse than what you'd get without a merge commit.

We normally require every commit to be tested, so it is a lot worse, yes.


That's very good, and should not change. I test every commit of every 
merge request I submit, even on projects that use real merges. It is 
easy to create CI/CD configurations and/or hooks that enforce that when 
trying to push a patch set, with or without a merge commit.


Merge commits have the great effect of separating the history into 
related chunks. Without them, you don't really know if a single bugfix 
is logically part of a set (because it fixes something important to pave 
the way) or not, and you have to think harder to detect the end of a set 
and the start of another (with maybe single commits in between).





Segher





Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Julien '_FrnchFrgg_' RIVAUD

On 29/12/2019 at 14:26, Segher Boessenkool wrote:

Hi!

On Sun, Dec 29, 2019 at 12:42:07PM +0100, Julien '_FrnchFrgg_' RIVAUD wrote:

I'm not arguing that you should go that route, it seems a bit extreme to
me. But outright refusing merges on the basis they are painful is (if
you can accept the strong word) ludicrous.

They are painful for everyone working with the history later.


I don't think merges make looking at history more or less painful, 
unless you consider projects like git where there is an inordinate 
number of merges. And even then, I think they have solutions.



Something that we do in GCC more often than in most other projects.

I would have expected most if not all projects to look at their history 
often, at least projects of significant complexity.


Which is almost *never* the case for GCC, in my opinion.  Almost all
commits are smallish improvements / bugfixes.

Which are independent, clearly.

Every patch should normally be posted to the mailing lists for review.
Such patches should be against trunk.  And *that* patch will be approved,
so *that* is the one you will commit and push upstream eventually.


Indeed, the rebased series would be what is reviewed and pushed 
upstream. Which can be done with a merge commit anyway. I think you 
really should look at the workflow of the git project (and they have 
their share of interdependent strange things that happen too; of course 
less than GCC due to the complexity of the project, but the techniques 
to ensure you don't get bitten by that are the same).


They use merges extensively, and have a very very good track record of 
non-broken master (or at least had last time I looked).




We cannot waste a year on a social experiment.  We can slowly and carefully
adopt new procedures, certainly.  But anything drastic isn't advisable imo.

Also, many GCC developers aren't familiar with Git at all.  It takes time
to learn it, and to learn new ways of working.  Small steps are needed.


Of course! I am not suggesting you change everything. But setting in 
stone hard rules that force the SVN mindset is harmful too.




Merges aren't scary.  Merges are inconvenient.


No they are not. You are unaccustomed to them, which is different. 
People that only ever used DVCS feel merges are much more natural and 
even productivity increasing. Some even do "bad merges", like "sync from 
trunk" every other commit, which I very much frown upon.


Which brings me to something I find strange in your policy: to me, 
merges from trunk to branches should be rare if not nonexistent. And you 
are deciding to banish merges the other way around.


Julien


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Segher Boessenkool
On Sun, Dec 29, 2019 at 12:46:50PM +0100, Julien '_FrnchFrgg_' RIVAUD wrote:
> At worst, no commit is testable in the 
> branch except the last, and git will say that the bug was introduced in 
> the branch, which is not worse than what you'd get without a merge commit.

We normally require every commit to be tested, so it is a lot worse, yes.


Segher


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Segher Boessenkool
Hi!

On Sun, Dec 29, 2019 at 12:42:07PM +0100, Julien '_FrnchFrgg_' RIVAUD wrote:
> I'm not arguing that you should go that route, it seems a bit extreme to 
> me. But outright refusing merges on the basis they are painful is (if 
> you can accept the strong word) ludicrous.

They are painful for everyone working with the history later.  Something
that we do in GCC more often than in most other projects.

> >Merging is appropriate if there is parallel development of (mostly) 
> >independent things.
> 
> Which is almost always the case.

Which is almost *never* the case for GCC, in my opinion.  Almost all
commits are smallish improvements / bugfixes.  And most bigger things are
not independent enough -- we require the resulting thing to be (regression)
tested before pushing it upstream, and that is because often that *does*
find problems!

> >Features aren't that, usually: they can be rebased easily, and they should 
> >be posted
> >for review anyway.
> How often successive features checked into GCC are dependent on each 
> other ?

Almost always, one way or the other.  It's not just the GCC code itself
you have to consider here, there things are easily independent enough,
but looking at the code generated by GCC often shows unexpected
interactions.

> The fact that they can be rebased either way and easily is 
> almost a testimony of that. And the fact that they need review has 
> nothing to do with anything.

Every patch should normally be posted to the mailing lists for review.
Such patches should be against trunk.  And *that* patch will be approved,
so *that* is the one you will commit and push upstream eventually.

Those are the procedures we currently have, and it is necessary to keep
the tree even somewhat working most of the time.  Too often the tree is
broken for days on end :-(

> >It is very easy to use merges more often than is useful, and it hurts.
> 
> And it is very easy to use SVN-like workflows, and it hurts far more. 
> SVN, due to its centrality and inherent impossibility to encode logical 
> relationships between changes (as opposed to time-based evolution), 
> slowly impaired most developers mind openness about what can be done in 
> a worthwhile VCS. Moving to git is an opportunity to at last free 
> yourselves, not continue that narrow treading on SVN paths.

We cannot waste a year on a social experiment.  We can slowly and carefully
adopt new procedures, certainly.  But anything drastic isn't advisable imo.

Also, many GCC developers aren't familiar with Git at all.  It takes time
to learn it, and to learn new ways of working.  Small steps are needed.

> SVN was like an almanac listing successive events without any analysis. 
> That's not History (as in the field of study). Git at least can let you 
> express and use to your common benefit logical links between 
> modifications. Don't miss that train.

I think you seriously overestimate how much information content is in a
merge (esp. as applied to the GCC context).  Let's start with using good
commit messages (or actual commit messages *at all*), that has a much
better pain/gain ratio.

> Merges are not scary when the tools are good. Even the logs are totally 
> usable with a lot of merges, with suitable tools. The tool has to adapt, 
> not you.

Merges aren't scary.  Merges are inconvenient.  And yes, there is no way
that all of us will change on a non-geological time scale.


Segher


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Segher Boessenkool
On Sun, Dec 29, 2019 at 12:02:51PM +0100, Richard Biener wrote:
> For bisecting trunk a merge would be a single commit, right? So I could see 
> value in preserving a patch series where individual steps might introduce 
> temporary issues as a branch merge (after rebasing) so the series is visible 
> but not when bisecting (by default). It would also make the series 
> relatedness obvious and avoids splitting it with a commit race (if that is 
> possible with git). 

"git bisect" actually goes all the way down the rabbit hole, it tries to
find the first bad commit in the range you marked as starting as "good",
ending as "bad".

It is pretty confusing to do if there are many merges, especially if many
commits end up not building at all.  But you can always "git bisect skip"
stuff (it just eats time, and it hampers automated bisecting).
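
To make the effect of skipping concrete, here is a toy model of bisection
over a linear history.  It is only a sketch under simplifying assumptions
(linear history, and every commit after the first bad one is also bad); it
is not how git itself is implemented, and the function name and verdict
strings are made up for illustration.  Commits that cannot be tested
are skipped, and the result is the set of commits that could be the first
bad one.

```python
def bisect(commits, test):
    """commits: oldest first; commits[0] is known good, commits[-1] known bad.
    test(commit) returns "good", "bad" or "skip".
    Returns the list of commits that may be the first bad one."""
    lo, hi = 0, len(commits) - 1  # lo: newest known good, hi: oldest known bad
    skipped = set()
    while hi - lo > 1:
        candidates = [i for i in range(lo + 1, hi) if i not in skipped]
        if not candidates:
            break  # everything left in between is untestable
        mid = (lo + hi) // 2
        # Test the untested commit closest to the midpoint.
        idx = min(candidates, key=lambda i: abs(i - mid))
        verdict = test(commits[idx])
        if verdict == "good":
            lo = idx
        elif verdict == "bad":
            hi = idx
        else:
            skipped.add(idx)
    return commits[lo + 1 : hi + 1]
```

When nothing is skipped the result is a single commit.  When the commits
around the breakage do not build, the result is a range, and each skipped
commit costs one extra test run, which is the time cost mentioned above.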

The really nasty cases are when the code does build, but fails for
unrelated reasons.

We require every commit to be individually tested, and if we *do* allow
merges, that should still be done imo.  Which again makes merging less
useful: if you are going to rebase your branch anyway (to fix simple
stuff), why not rebase it onto trunk!

> IMHO exact workflow for merging a patch series as opposed to a single patch 
> should be documented. 

Yes.  It isn't actually documented in so many words for what we do now,
either, but it would be good to have.


Segher


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Julien '_FrnchFrgg_' RIVAUD

On 29/12/2019 at 12:02, Richard Biener wrote:

On December 29, 2019 11:41:00 AM GMT+01:00, Segher Boessenkool 
 wrote:

On Sun, Dec 29, 2019 at 02:40:45AM +0100, Julien FrnchFrgg Rivaud
wrote:

Oh, I'm not talking about historical merges.  I'm saying we

shouldn't do

future merges, where we can help that.  It disagrees with our

documented

"submitting patches" protocol.

I don't see how that can be correct. Linux is heavily "submitting
patches" based, with stringent reviews on LKML, yet heavily uses

merges.

Linux has most development done in separate trees, one for each
maintainer.
That is not how GCC works.

I was talking about https://gcc.gnu.org/contribute.html , see heading
"submitting patches" :-)


Nothing should ever be flattened to a single commit.  But before

patches

hit trunk, the patch series can be made nicer than it was at the

start

of its development.

I quite agree with that, and it resonates with my TL;DR chunk of text

above.

Yup.  Rebasing is superior to merging in many ways.  Merging is
appropriate
if there is parallel development of (mostly) independent things.
Features
aren't that, usually: they can be rebased easily, and they should be
posted
for review anyway.

It is very easy to use merges more often than is useful, and it hurts.

For bisecting trunk a merge would be a single commit, right?
Not exactly. It will if the bug was not introduced by the merge; but if it
was, "git bisect" will start looking at the individual commits in the
branch, which is IMHO very good. It is far easier to have a bug pinned
to a single change (or, say, 5-6 commits, if some of them were not buildable
or testable) than to a whole branch. At worst, no commit in the branch is
testable except the last, and git will say that the bug was introduced in
the branch, which is no worse than what you'd get without a merge commit.
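
To make the point concrete, here is a rough sketch in Python of how a bisection can descend into a merged branch. It is only a toy model of the idea, not git's actual implementation: the history, the commit names (t1, b2, M, ...) and the helper names are all made up for illustration, and it assumes a single culprit commit whose descendants are all bad.

```python
def ancestors(parents, commit):
    """Return `commit` plus every commit reachable from it via parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def bisect(parents, good, bad, is_bad):
    """Return the first bad commit among ancestors(bad) minus ancestors(good).

    Assumes a single culprit: a commit is bad iff the culprit is in its ancestry.
    """
    candidates = ancestors(parents, bad) - ancestors(parents, good)
    while len(candidates) > 1:
        def weight(c):
            # How evenly would testing `c` split the remaining candidates?
            n = len(ancestors(parents, c) & candidates)
            return min(n, len(candidates) - n)

        probe = max(sorted(candidates), key=weight)
        reach = ancestors(parents, probe) & candidates
        if is_bad(probe):
            candidates = reach               # culprit is probe or an ancestor of it
        else:
            candidates = candidates - reach  # culprit is not in probe's ancestry
    return next(iter(candidates))


# Hypothetical history: trunk t1 - t2 - M - t3, with branch b1 - b2 - b3
# forked from t1 and merged at M (first parent t2, second parent b3).
PARENTS = {
    "t1": [], "t2": ["t1"],
    "b1": ["t1"], "b2": ["b1"], "b3": ["b2"],
    "M": ["t2", "b3"], "t3": ["M"],
}


def culprit_is(name):
    """Build an is_bad predicate for a history whose bug entered at `name`."""
    return lambda commit: name in ancestors(PARENTS, commit)
```

With t1 marked good and t3 marked bad, a bug that entered at b2 is pinned to b2 inside the branch, while a bug that only appears in the merge result is pinned to M itself.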


Julien



Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Julien '_FrnchFrgg_' RIVAUD

On 29/12/2019 at 11:41, Segher Boessenkool wrote:
> On Sun, Dec 29, 2019 at 02:40:45AM +0100, Julien FrnchFrgg Rivaud wrote:
>>> Oh, I'm not talking about historical merges.  I'm saying we shouldn't do
>>> future merges, where we can help that.  It disagrees with our documented
>>> "submitting patches" protocol.
>>
>> I don't see how that can be correct. Linux is heavily "submitting
>> patches" based, with stringent reviews on LKML, yet heavily uses merges.
>
> Linux has most development done in separate trees, one for each maintainer.
> That is not how GCC works.


I mentioned the git development for a reason. They use merges for 
*everything*, including patchsets by people who never contributed before 
and might never contribute afterwards. The very *concept* of a DVCS is 
that each developer has a separate tree, not each maintainer.


I'm not arguing that you should go that route, it seems a bit extreme to 
me. But outright refusing merges on the basis they are painful is (if 
you can accept the strong word) ludicrous.



>>> Nothing should ever be flattened to a single commit.  But before patches
>>> hit trunk, the patch series can be made nicer than it was at the start
>>> of its development.
>>
>> I quite agree with that, and it resonates with my TL;DR chunk of text above.
>
> Yup.  Rebasing is superior to merging in many ways.


That's not what I agreed with. I agreed with « the patch series can be
made nicer », which I took to be the opposite of « append patches at the
end ». Rebasing is *one* of the ways to do that, especially interactive
rebasing to shuffle patches around, check that each step compiles and
passes the full test suite (updating it if needed and correct), reword
messages, and think long and hard about the best progression. But I
never set rebasing against merging. In particular, I clearly wrote that
*even if you rebased*, there are very strong arguments out there for
refusing fast-forward merges, that is, for *always* generating a real
merge commit, with a cover-letter message roughly corresponding to the
mail people send to the ML to convince others that their patch series is
worth including in GCC.


That leaves individual commit messages to explain the local rationale
behind each discrete change (not the how, as it is readily apparent from
the code, unless the code is very clever, in which case an in-code
comment is warranted).




> Merging is appropriate if there is parallel development of (mostly)
> independent things.


Which is almost always the case.


> Features aren't that, usually: they can be rebased easily, and they should
> be posted for review anyway.

How often are successive features checked into GCC dependent on each
other? The fact that they can be rebased either way, and easily, is
almost a testimony to that. And the fact that they need review has
nothing to do with anything.

> It is very easy to use merges more often than is useful, and it hurts.


And it is very easy to use SVN-like workflows, and it hurts far more.
SVN, due to its centralized nature and its inherent inability to encode
logical relationships between changes (as opposed to time-based
evolution), has slowly narrowed most developers' sense of what can be
done in a worthwhile VCS. Moving to git is an opportunity to free
yourselves at last, not to keep treading narrowly along SVN paths.


SVN was like an almanac listing successive events without any analysis.
That's not History (as in the field of study). Git at least lets you
express logical links between modifications, and use them to your
common benefit. Don't miss that train.


Merges are not scary when the tools are good. Even the logs are totally 
usable with a lot of merges, with suitable tools. The tool has to adapt, 
not you.


Julien


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Richard Biener
On December 29, 2019 11:41:00 AM GMT+01:00, Segher Boessenkool wrote:
>On Sun, Dec 29, 2019 at 02:40:45AM +0100, Julien FrnchFrgg Rivaud wrote:
>> >Oh, I'm not talking about historical merges.  I'm saying we shouldn't do
>> >future merges, where we can help that.  It disagrees with our documented
>> >"submitting patches" protocol.
>>
>> I don't see how that can be correct. Linux is heavily "submitting
>> patches" based, with stringent reviews on LKML, yet heavily uses merges.
>
>Linux has most development done in separate trees, one for each maintainer.
>That is not how GCC works.
>
>I was talking about https://gcc.gnu.org/contribute.html , see heading
>"submitting patches" :-)
>
>> >Nothing should ever be flattened to a single commit.  But before patches
>> >hit trunk, the patch series can be made nicer than it was at the start
>> >of its development.
>>
>> I quite agree with that, and it resonates with my TL;DR chunk of text above.
>
>Yup.  Rebasing is superior to merging in many ways.  Merging is appropriate
>if there is parallel development of (mostly) independent things.  Features
>aren't that, usually: they can be rebased easily, and they should be posted
>for review anyway.
>
>It is very easy to use merges more often than is useful, and it hurts.

For bisecting trunk a merge would be a single commit, right? So I could see
value in preserving a patch series where individual steps might introduce
temporary issues as a branch merge (after rebasing), so that the series is
visible but not when bisecting (by default). It would also make the
relatedness of the series obvious and avoid splitting it with a commit race
(if that is possible with git).

IMHO the exact workflow for merging a patch series, as opposed to a single
patch, should be documented.

Richard. 

>
>Segher



Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Segher Boessenkool
On Sun, Dec 29, 2019 at 02:40:45AM +0100, Julien FrnchFrgg Rivaud wrote:
> >Oh, I'm not talking about historical merges.  I'm saying we shouldn't do
> >future merges, where we can help that.  It disagrees with our documented
> >"submitting patches" protocol.
> 
> I don't see how that can be correct. Linux is heavily "submitting 
> patches" based, with stringent reviews on LKML, yet heavily uses merges. 

Linux has most development done in separate trees, one for each maintainer.
That is not how GCC works.

I was talking about https://gcc.gnu.org/contribute.html , see heading
"submitting patches" :-)

> >Nothing should ever be flattened to a single commit.  But before patches
> >hit trunk, the patch series can be made nicer than it was at the start
> >of its development.
> 
> I quite agree with that, and it resonates with my TL;DR chunk of text above.

Yup.  Rebasing is superior to merging in many ways.  Merging is appropriate
if there is parallel development of (mostly) independent things.  Features
aren't that, usually: they can be rebased easily, and they should be posted
for review anyway.

It is very easy to use merges more often than is useful, and it hurts.


Segher


Re: Proposal for the transition timetable for the move to GIT

2019-12-28 Thread Julien "FrnchFrgg" Rivaud

On 28/12/2019 at 21:28, Segher Boessenkool wrote:
> On Sat, Dec 28, 2019 at 05:11:47PM +, Richard Earnshaw (lists) wrote:
>> I disagree.  The review comments will show up as additional commits on
>> the branch and can be tracked back to such events.  Once history gets
>> flattened into a major single commit it's significantly more effort to
>> drill down into the history and find out why if we've lost the merge
>> information.


Review comments should *not* correspond to any *new* commit on any 
branch. At least not in the vast majority of cases. They should trigger 
modifications of the existing commits.


Commits are units of meaning, and their sequence is not a timeline, but
a logical relationship (in a single patch set at least). When submitting
a feature, the different commits *should not* correspond to passing time
in the development but to incremental building of the feature. And when
a patch set is finished, in the sense that its end point is what you
want, you should rewrite it so that the changes are coherent units of
change from a "small provable steps" point of view.


Just as in maths it is customary to have proofs where you first prove a
weaker result that you know will never be used once you get to proving
the stronger version, if that's the cleanest, most coherent, most
beautiful or most convincing way to write the proof, I quite often
introduce a method or structure early in a patch set *that I know will
not survive the whole patch set and will never be in the final result*,
just because that makes the transition easier, and easier to prove
right, be it with suitable passing tests, concise code changes, and
extensive commit messages that explain the reasoning (all of those are
required together IMHO).


The goal behind a patch set is to look like you wrote it linearly, with
god-like insight throughout, which enabled you to modify some code in
small, trivial steps and smoothly lead the reader towards an end goal
that is unrecognizable from the starting point. I work very hard to
attain that, because I do not have a god-like brain.


History littered with « That didn't work, now try that » or « Fix that 
to make the reviewer happy » is just that, history. A VCS is not to 
track history, but to track meaningful changes (IMHO again).




> Oh, I'm not talking about historical merges.  I'm saying we shouldn't do
> future merges, where we can help that.  It disagrees with our documented
> "submitting patches" protocol.


I don't see how that can be correct. Linux is heavily "submitting
patches" based, with stringent reviews on LKML, yet heavily uses merges.
The git project itself uses that workflow, and reading the ML is quite
enlightening (even enjoyable). Patch sets have to be rebased on the last
*release*, and every one of them then gets merged by Junio C Hamano.

That means the "first parent" history of git is only merge commits.


> Nothing should ever be flattened to a single commit.  But before patches
> hit trunk, the patch series can be made nicer than it was at the start
> of its development.


I quite agree with that, and it resonates with my TL;DR chunk of text above.


> All merges lose information.  All of them.  You take two branches, and
> cut and paste between the two, but you never show which part is from
> what, or how conflicts were resolved, etc.  All this can be reconstructed
> of course -- you know the inputs, and you have the output -- but the info
> isn't there directly, and there is no why or what.  If you're lucky there
> is a mail about it, or the merge commit itself goes into it a bit.


In fact, that's one of the reasons I argue for the « always use merge
commits » rule, *even if you rebase beforehand*: the individual commits
are the logical steps, their commit messages explain the local why of
the incremental changes, and the merge commit is the cover letter. It
explains the rationale of the complete set, describes the feature, etc.


Again, have a look at the git ML to see what I consider as near-perfect 
application of these principles. And that can be done even without 
hugely branchy repositories (look at the git repo in gitk and cringe).


I would add that of course, no merge commit should be non-trivial. If 
you had to introduce code changes *in* the merge commit itself, 
something is wrong. Rebase before merging (but keep the merge commit, 
using the --no-ff flag).
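
As a small illustration of the "merge commit as cover letter" idea, here is a Python sketch of a first-parent walk, roughly what "git log --first-parent" shows. The commit graph, ids and subjects below are invented for the example; the only assumption is that every series was rebased and then merged with "git merge --no-ff".

```python
def first_parent_log(commits, tip):
    """Follow only the first parent of each commit, newest first."""
    log, current = [], tip
    while current is not None:
        parents, subject = commits[current]
        log.append((current, subject))
        current = parents[0] if parents else None
    return log


# Hypothetical mainline where every series is rebased, then merged with a
# real merge commit (no fast-forward).  Each entry is id -> (parents, subject).
COMMITS = {
    "base": ([], "Release point"),
    "a1": (["base"], "Add helper for feature A"),
    "a2": (["a1"], "Use helper in feature A"),
    "mA": (["base", "a2"], "Merge feature A: cover letter text"),
    "b1": (["mA"], "Refactor for feature B"),
    "mB": (["mA", "b1"], "Merge feature B: cover letter text"),
}

for commit_id, subject in first_parent_log(COMMITS, "mB"):
    print(commit_id, subject)
```

The walk visits only mB, mA and base, so the mainline reads as one entry per feature, while a1, a2 and b1 remain reachable through the second parents for anyone who wants the individual steps.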


Julien "_FrnchFrgg_" Rivaud



Re: Proposal for the transition timetable for the move to GIT

2019-12-28 Thread Segher Boessenkool
On Sat, Dec 28, 2019 at 05:11:47PM +, Richard Earnshaw (lists) wrote:
> On 28/12/2019 12:19, Segher Boessenkool wrote:
> > Branch merges do not mesh well with our commit policies, fwiw:
> > everything should normally be posted for public review on the mailing
> > lists.  This does not really work for commits that have been set in
> > stone months before.
> 
> I disagree.  The review comments will show up as additional commits on
> the branch and can be tracked back to such events.  Once history gets
> flattened into a major single commit it's significantly more effort to
> drill down into the history and find out why if we've lost the merge
> information.

Oh, I'm not talking about historical merges.  I'm saying we shouldn't do
future merges, where we can help that.  It disagrees with our documented
"submitting patches" protocol.

Nothing should ever be flattened to a single commit.  But before patches
hit trunk, the patch series can be made nicer than it was at the start
of its development.

All merges lose information.  All of them.  You take two branches, and
cut and paste between the two, but you never show which part is from
what, or how conflicts were resolved, etc.  All this can be reconstructed
of course -- you know the inputs, and you have the output -- but the info
isn't there directly, and there is no why or what.  If you're lucky there
is a mail about it, or the merge commit itself goes into it a bit.


Segher


Re: Proposal for the transition timetable for the move to GIT

2019-12-28 Thread Richard Earnshaw (lists)
On 28/12/2019 12:19, Segher Boessenkool wrote:
> Branch merges do not mesh well with our commit policies, fwiw:
> everything should normally be posted for public review on the mailing
> lists.  This does not really work for commits that have been set in
> stone months before.
> 

I disagree.  The review comments will show up as additional commits on
the branch and can be tracked back to such events.  Once history gets
flattened into a major single commit it's significantly more effort to
drill down into the history and find out why if we've lost the merge
information.

R.


Re: Proposal for the transition timetable for the move to GIT

2019-12-28 Thread Segher Boessenkool
On Fri, Dec 27, 2019 at 11:35:21AM +, Joseph Myers wrote:
> On Fri, 27 Dec 2019, Richard Earnshaw (lists) wrote:
> 
> > I'm not really sure I understand why we don't want merge commits into
> > trunk, especially for large changes.  Performing archaeology on a change
> > is just so much easier if the development history is just there.
> 
> To some extent it fits with the principle of separating changes to 
> workflow from the actual move to git (as the existing state is that we 
> have a linear history on trunk and the few merge properties that were 
> there were later deleted).  So after the conversion we could consider if 
> for future merges we wish to use merge commits.

SVN mergeinfo is not representable in Git.  It records which changesets
have been copied over from one branch to another.  Git doesn't do
changesets *at all*: it just stores tree contents, and it records one or
multiple parents for every commit.  That isn't actually derivable from
the SVN info.  You can guess, and you can guess wrong.
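
To show why this is a guess, here is a deliberately simplified Python sketch. It is not what reposurgeon or any other converter does; the function name and the data model (a flat list of branch revisions and a set of merged ones, instead of real per-path svn:mergeinfo ranges) are assumptions made up for illustration. It only accepts the easy case where "everything on the branch up to some point" was merged, and gives up on cherry-picks.

```python
def merge_parent_from_mergeinfo(branch_revs, merged_revs):
    """Guess a Git merge parent from SVN-style merge tracking data.

    branch_revs: revisions committed on the source branch, oldest first.
    merged_revs: set of those revisions recorded as merged into the target.
    Returns the branch revision to use as second parent, or None when the
    merged set is not "everything up to some point" (i.e. a cherry-pick).
    """
    if not merged_revs:
        return None
    prefix_len = len(merged_revs)
    if set(branch_revs[:prefix_len]) == set(merged_revs):
        # The whole branch history up to this revision was merged, so a
        # parent link to that revision's commit describes it faithfully.
        return branch_revs[prefix_len - 1]
    # Gaps in the merged set: no single parent commit can express this.
    return None


branch = [101, 105, 110]  # hypothetical revision numbers
print(merge_parent_from_mergeinfo(branch, {101, 105}))  # whole prefix merged
print(merge_parent_from_mergeinfo(branch, {105}))       # cherry-pick only
```

Even in the accepted case the result is an inference from changeset bookkeeping, not something SVN recorded as an ancestry link.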

Branch merges do not mesh well with our commit policies, fwiw:
everything should normally be posted for public review on the mailing
lists.  This does not really work for commits that have been set in
stone months before.


Segher


Re: Proposal for the transition timetable for the move to GIT

2019-12-28 Thread Joseph Myers
On Fri, 27 Dec 2019, Eric S. Raymond wrote:

> > Merge info is not one of those cases.
> 
> Sometimes. Some Subversion mergeinfo operations map to Git's
> branch-centric merging.  Many do not, corresponding to cherry-picks
> that cannot be expressed in a Git history.

And in the case of merge commits on master: *deletion* of SVN merge 
properties (which is what was done some time ago on trunk when it was 
decided we didn't want them there) cannot be expressed in a git history.  
But using "unmerge" for the three commits in question (so they don't 
appear as merge commits in the git conversion) is one reasonable choice 
for how to represent it.

-- 
Joseph S. Myers
j...@polyomino.org.uk


Re: Proposal for the transition timetable for the move to GIT

2019-12-27 Thread Eric S. Raymond
Joseph Myers :
> reposurgeon results are fully reproducible (by design, the same inputs to 
> the same version of reposurgeon should produce the same output as a 
> git-fast-import stream,

Designer confirms, and adds that we have a *very* stringent test suite
to verify this.

Much of it consists of bizarre malformations collected during past
conversions. GCC has added its share.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: Proposal for the transition timetable for the move to GIT

2019-12-27 Thread Eric S. Raymond
Richard Earnshaw (lists) :
> Well, personally, I'd rather we didn't throw away data we have in our
> current SVN repo unless it's unrepresentable in the final conversion.

I agree with this philosophy. You will have noticed by now, I hope,
that reposurgeon preserves as much as it can, leaving deletions to be
a matter of user policy.

In the normal case, reposurgeon could save its users a significant
amount of work by being more aggressive about automatically deleting
remnant bits that are merely *very unlikely* to be useful. I deliberately
refused to go that route.

> Merge info is not one of those cases.

Sometimes. Some Subversion mergeinfo operations map to Git's
branch-centric merging.  Many do not, corresponding to cherry-picks
that cannot be expressed in a Git history.

Reposurgeon does a correct but not complete job of translating
mergeinfos that compose into branch merges.  It handles the simple,
common cases and punts the tricky ones.

More coverage would theoretically be possible, but I don't
have the faintest clue what a general resolution rule would
look like.  Except I'm pretty sure the problem is bitchy-hard
and the solution really easy to get subtly wrong.

Frankly, I don't want to touch this mess with insulated
tongs. Somebody would have to offer me serious money to
compensate for the expected level of pain.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: Proposal for the transition timetable for the move to GIT

2019-12-27 Thread Eric S. Raymond
Maxim Kuvyrkov :
> Removing auto-generated .gitignore files from reposurgeon conversion
> would allow comparison of git trees vs gcc-pretty and gcc-reparent
> beyond r195087.  So, while we are evaluating the conversion
> candidates, it is best to disable conversion features that cause
> hard-to-workaround differences.

I was going to write that feature yesterday, then Julien nipped in and
did it while my back was turned.  It's a read option,
--no-automatic-ignores.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: Proposal for the transition timetable for the move to GIT

2019-12-27 Thread Segher Boessenkool
On Fri, Dec 27, 2019 at 03:32:57AM -0800, Andrew Pinski wrote:
> The one branch merge which would have helped me track down why a
> testcase was added is the tree-ssa branch merge.  If we had the commit
> for the merge to have the merge info, it would have been easier for me
> to track down that.  Note this testcase failed with a new patch I am
> working on and I decided in the end, the testcase is bogus and not
> even testing what it was testing for anyways.  There is a few other
> instances like that which would have been helpful.

It sounds like it would have helped you if the testcase had stated what
it is for, what it is testing, in the testcase file itself.  As all tests
should, imnsho.

In the more general case you need to find the discussion in the mailing
list archives.  Which is a difficult problem in itself.


Segher


Re: Proposal for the transition timetable for the move to GIT

2019-12-27 Thread Segher Boessenkool
On Fri, Dec 27, 2019 at 11:21:41AM +, Richard Earnshaw (lists) wrote:
> On 26/12/2019 18:59, Joseph Myers wrote:
> > On Thu, 26 Dec 2019, Jakub Jelinek wrote:
> >> Yes, I'd prefer the trunk to have no merge commits (in svn I've removed the
> >> svn:mergeinfo property on the trunk when it appeared too).
> > 
> > I've added the unmerge commands for the three commits in question to 
> > gcc.lift.
> 
> I'm not really sure I understand why we don't want merge commits into
> trunk, especially for large changes.  Performing archaeology on a change
> is just so much easier if the development history is just there.
> 
> Without the merge information, if you're tracking down the reason for a
> bug, you get to the merge, and then have to go find the branch where the
> development was done and start the process all over again.  With merge
> information, tools like git blame will show which commit during
> development touched the relevant line last and a major step in analysis
> is vastly simplified.

Archaeology is much simpler still if people do not do merges at all, but
use a rebase (or rebase-like, e.g. quilt) workflow.  That way, there are
no bad changes that have to be undone later, etc.  Ideally everything
comes in as small, well thought out patches.


Segher


Re: Proposal for the transition timetable for the move to GIT

2019-12-27 Thread Richard Earnshaw (lists)
On 27/12/2019 11:35, Joseph Myers wrote:
> On Fri, 27 Dec 2019, Richard Earnshaw (lists) wrote:
> 
>> I'm not really sure I understand why we don't want merge commits into
>> trunk, especially for large changes.  Performing archaeology on a change
>> is just so much easier if the development history is just there.
> 
> To some extent it fits with the principle of separating changes to 
> workflow from the actual move to git (as the existing state is that we 
> have a linear history on trunk and the few merge properties that were 
> there were later deleted).  So after the conversion we could consider if 
> for future merges we wish to use merge commits.
> 

Well, personally, I'd rather we didn't throw away data we have in our
current SVN repo unless it's unrepresentable in the final conversion.
Merge info is not one of those cases.

R.


Re: Proposal for the transition timetable for the move to GIT

2019-12-27 Thread Joseph Myers
On Fri, 27 Dec 2019, Alexandre Oliva wrote:

> That depends on the used tool.  A reproducible one, or at least one that
> aimed at stability across multiple conversions, could make this easier,
> but I guess reposurgeon is not such a tool.  Which suggests to me we
> have to be even more reassured of the correctness of its moving-target
> output before we adopt it, unlike other conversion tools that have long
> had a certain stability of output built into their design.

reposurgeon results are fully reproducible (by design, the same inputs to 
the same version of reposurgeon should produce the same output as a 
git-fast-import stream, and git should then produce the same objects given 
the same fast-import stream) - note that "same inputs" here includes the 
same bug data used for adding commit summary lines (and of course past
commit messages: when people fix commit messages in SVN, that consequently
changes the hashes for all descendant commits in any git conversion).  It
is, however, a tool that works with the *global* commit history.
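
The point about descendant hashes can be sketched in a few lines of Python. This is a simplified model, not git's real object format: the commit_id and convert helpers and the sample messages are invented for illustration, and the only property borrowed from git is that a commit's id is a hash over its content, which includes its parents' ids.

```python
import hashlib


def commit_id(parent_ids, message):
    """Hash a commit from its parents' ids and its message (simplified model)."""
    payload = "\n".join(list(parent_ids) + ["", message]).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def convert(messages):
    """Turn a linear list of commit messages into a list of chained ids."""
    ids, parent = [], []
    for message in messages:
        new_id = commit_id(parent, message)
        ids.append(new_id)
        parent = [new_id]
    return ids


history = ["Initial import", "Fix PR 123", "Add test"]  # hypothetical messages
first_run = convert(history)
second_run = convert(history)
edited_run = convert(["Initial import (typo fixed)", "Fix PR 123", "Add test"])

print(first_run == second_run)                               # same input, same ids
print([a == b for a, b in zip(first_run, edited_run)])       # every id changes
```

Identical input gives identical ids on every run, while editing the oldest message changes the id of that commit and of everything built on top of it.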

Most of reposurgeon's own tests verify an output fast-import stream is as 
expected and thus would be liable to fail if the output were not 
reproducible as a function of the input.

Even with two completely separate conversions done with different tools, 
adding a new branch into a git repository should not be more complicated 
than a rebase operation, possibly with some fixups to merge commit 
parents.

-- 
Joseph S. Myers
j...@polyomino.org.uk


Re: Proposal for the transition timetable for the move to GIT

2019-12-27 Thread Alexandre Oliva
On Dec 26, 2019, Joseph Myers  wrote:

> We should ensure we don't have missing branches in the first place (for 
> whatever definition of what branches we should have).

*nod*

> Adding a branch after the fact is a fundamentally different kind of
> operation

That depends on the used tool.  A reproducible one, or at least one that
aimed at stability across multiple conversions, could make this easier,
but I guess reposurgeon is not such a tool.  Which suggests to me we
have to be even more reassured of the correctness of its moving-target
output before we adopt it, unlike other conversion tools that have long
had a certain stability of output built into their design.


I understand you're on it, and I thank you for undertaking much of that
validation and verification work.  Your well-known attention to detail
is very valuable.

-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist           Stallman was right, but he's left :(
GNU Toolchain Engineer    FSMatrix: It was he who freed the first of us
FSF & FSFLA board member                The Savior shall return (true);


Re: Proposal for the transition timetable for the move to GIT

2019-12-27 Thread Joseph Myers
On Fri, 27 Dec 2019, Richard Earnshaw (lists) wrote:

> I'm not really sure I understand why we don't want merge commits into
> trunk, especially for large changes.  Performing archaeology on a change
> is just so much easier if the development history is just there.

To some extent it fits with the principle of separating changes to 
workflow from the actual move to git (as the existing state is that we 
have a linear history on trunk and the few merge properties that were 
there were later deleted).  So after the conversion we could consider if 
for future merges we wish to use merge commits.

-- 
Joseph S. Myers
j...@polyomino.org.uk


Re: Proposal for the transition timetable for the move to GIT

2019-12-27 Thread Andrew Pinski
On Fri, Dec 27, 2019 at 3:22 AM Richard Earnshaw (lists)
 wrote:
>
> On 26/12/2019 18:59, Joseph Myers wrote:
> > On Thu, 26 Dec 2019, Jakub Jelinek wrote:
> >
> >> On Thu, Dec 26, 2019 at 04:58:22PM +, Joseph Myers wrote:
> >>> If we don't want merge commits on git master for the cases where people
> >>> put merge properties on trunk in the past, we can use a reposurgeon
> >>> "unmerge" command in gcc.lift to stop the few commits in question from
> >>> being merge commits (while keeping all other merges as-is).  (The merges
> >>> of trunk into other branches that copied merge properties from trunk into
> >>> those branches will still be handled correctly, with exactly two parents
> >>> rather than regaining the extra parents corresponding to the merges into
> >>> trunk that Bernd noted in an earlier version of the conversion, because
> >>> the processing that avoids redundant merge parents takes place well before
> >>> any unmerge commands are executed - so at the time of that processing,
> >>> reposurgeon knows that those other branches are in fact in the ancestry of
> >>> trunk, even if we remove that information in the final git repository.)
> >>
> >> Yes, I'd prefer the trunk to have no merge commits (in svn I've removed the
> >> svn:mergeinfo property on the trunk when it appeared too).
> >
> > I've added the unmerge commands for the three commits in question to
> > gcc.lift.
> >
>
> I'm not really sure I understand why we don't want merge commits into
> trunk, especially for large changes.  Performing archaeology on a change
> is just so much easier if the development history is just there.
>
> Without the merge information, if you're tracking down the reason for a
> bug, you get to the merge, and then have to go find the branch where the
> development was done and start the process all over again.  With merge
> information, tools like git blame will show which commit during
> development touched the relevant line last and a major step in analysis
> is vastly simplified.

The one branch merge which would have helped me track down why a
testcase was added is the tree-ssa branch merge.  If the commit for the
merge had the merge info, it would have been easier for me to track
that down.  Note this testcase failed with a new patch I am working on
and I decided in the end that the testcase is bogus and not even testing
what it was meant to test anyway.  There are a few other instances like
that where it would have been helpful.

Thanks,
Andrew Pinski

>
> R.


Re: Proposal for the transition timetable for the move to GIT

2019-12-27 Thread Richard Earnshaw (lists)
On 26/12/2019 18:59, Joseph Myers wrote:
> On Thu, 26 Dec 2019, Jakub Jelinek wrote:
> 
>> On Thu, Dec 26, 2019 at 04:58:22PM +, Joseph Myers wrote:
>>> If we don't want merge commits on git master for the cases where people 
>>> put merge properties on trunk in the past, we can use a reposurgeon 
>>> "unmerge" command in gcc.lift to stop the few commits in question from 
>>> being merge commits (while keeping all other merges as-is).  (The merges 
>>> of trunk into other branches that copied merge properties from trunk into 
>>> those branches will still be handled correctly, with exactly two parents 
>>> rather than regaining the extra parents corresponding to the merges into 
>>> trunk that Bernd noted in an earlier version of the conversion, because 
>>> the processing that avoids redundant merge parents takes place well before 
>>> any unmerge commands are executed - so at the time of that processing, 
>>> reposurgeon knows that those other branches are in fact in the ancestry of 
>>> trunk, even if we remove that information in the final git repository.)
>>
>> Yes, I'd prefer the trunk to have no merge commits (in svn I've removed the
>> svn:mergeinfo property on the trunk when it appeared too).
> 
> I've added the unmerge commands for the three commits in question to 
> gcc.lift.
> 

I'm not really sure I understand why we don't want merge commits into
trunk, especially for large changes.  Performing archaeology on a change
is just so much easier if the development history is just there.

Without the merge information, if you're tracking down the reason for a
bug, you get to the merge, and then have to go find the branch where the
development was done and start the process all over again.  With merge
information, tools like git blame will show which commit during
development touched the relevant line last and a major step in analysis
is vastly simplified.

R.


Re: Proposal for the transition timetable for the move to GIT

2019-12-27 Thread Maxim Kuvyrkov
> On Dec 27, 2019, at 4:32 AM, Joseph Myers  wrote:
> 
> On Thu, 26 Dec 2019, Joseph Myers wrote:
> 
>>> It appears that .gitignore has been added in r1 by reposurgeon and then 
>>> deleted at r130805.  In SVN repository .gitignore was added in r195087.  
>>> I speculate that addition of .gitignore at r1 is expected, but its 
>>> deletion at r130805 is highly suspicious.
>> 
>> I suspect this is one of the known issues related to reposurgeon-generated 
>> .gitignore files.  Since such files are not really part of the GCC 
>> history, and the .gitignore files checked into SVN are properly preserved 
>> as far as I can see, I don't think it's a particularly important issue for 
>> the GCC conversion (since auto-generated .gitignore files are only 
>> nice-to-have, not required).  I've filed 
>> https://gitlab.com/esr/reposurgeon/issues/219 anyway with a reduced test 
>> for this oddity.
> 
> This has now been fixed, so future conversion runs with reposurgeon should 
> have the automatically-generated .gitignore present until replaced by the 
> one checked into SVN.  (If people don't want automatically-generated 
> .gitignore files at all, we could always add an option to reposurgeon not 
> to generate them.)

Removing auto-generated .gitignore files from reposurgeon conversion would 
allow comparison of git trees vs gcc-pretty and gcc-reparent beyond r195087.  
So, while we are evaluating the conversion candidates, it is best to disable 
conversion features that cause hard-to-workaround differences.

> 
> I'll do another GCC conversion run to pick up all the accumulated fixes 
> and improvements (including many more PR whitelist entries / fixes in 
> Richard's script), once another ChangeLog-related fix is in.


--
Maxim Kuvyrkov
https://www.linaro.org



Re: Proposal for the transition timetable for the move to GIT

2019-12-26 Thread Joseph Myers
On Thu, 26 Dec 2019, Joseph Myers wrote:

> > It appears that .gitignore has been added in r1 by reposurgeon and then 
> > deleted at r130805.  In SVN repository .gitignore was added in r195087.  
> > I speculate that addition of .gitignore at r1 is expected, but its 
> > deletion at r130805 is highly suspicious.
> 
> I suspect this is one of the known issues related to reposurgeon-generated 
> .gitignore files.  Since such files are not really part of the GCC 
> history, and the .gitignore files checked into SVN are properly preserved 
> as far as I can see, I don't think it's a particularly important issue for 
> the GCC conversion (since auto-generated .gitignore files are only 
> nice-to-have, not required).  I've filed 
> https://gitlab.com/esr/reposurgeon/issues/219 anyway with a reduced test 
> for this oddity.

This has now been fixed, so future conversion runs with reposurgeon should 
have the automatically-generated .gitignore present until replaced by the 
one checked into SVN.  (If people don't want automatically-generated 
.gitignore files at all, we could always add an option to reposurgeon not 
to generate them.)

I'll do another GCC conversion run to pick up all the accumulated fixes 
and improvements (including many more PR whitelist entries / fixes in 
Richard's script), once another ChangeLog-related fix is in.

-- 
Joseph S. Myers
j...@polyomino.org.uk


Re: Proposal for the transition timetable for the move to GIT

2019-12-26 Thread Joseph Myers
On Thu, 26 Dec 2019, Jakub Jelinek wrote:

> Is there some easy way (e.g. file in the conversion scripts) to correct
> spelling and other mistakes in the commit authors?

I've added author fixups to bugdb.py, so you can add any number of fixes 
(e.g. based on authors that look suspicious in "git shortlog -s -e --all" 
output) to the author_fixups array (and send a merge-request for the 
gcc-conversion project, or a patch).

The case of multiple consecutive spaces in an attribution is now 
normalized to a single space in reposurgeon, so no fixes are needed for 
that (and fixups should be given in the form with a single space).  In 
addition to that array of fixes, bugdb.py does the following so they don't 
need listing in the array of fixups: converts ISO-8859-1 NBSP to space 
(and trims such spaces at left or right or where the result is multiple 
consecutive spaces); converts ISO-8859-1 author names (coming from 
ChangeLog files) to UTF-8 (there are manual fixups for cases where the 
author in the ChangeLog file didn't seem to be ISO-8859-1 but wasn't valid 
UTF-8 either); fixes up the cases you found where certain forms of 
timestamp from the ChangeLog header, or header specifying multiple 
authors, were used but handled badly in conversion to authors.  I've found 
and reported another case where a form of ChangeLog header used in the 
past isn't handled at all, and Eric is looking at it.
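
For readers who have not looked at bugdb.py, the normalization steps described
above can be sketched roughly as follows.  This is only an illustration and
not the actual bugdb.py code: the names AUTHOR_FIXUPS, decode_author and
normalize_author are hypothetical, and the real author_fixups array may be
structured differently.

```python
import re

# Hypothetical fixup table mapping a bad attribution to the intended one.
# The real author_fixups array in bugdb.py may look quite different.
AUTHOR_FIXUPS = {
    "Jakub Jeilnek": "Jakub Jelinek",
    "Jakub Jakub Jelinek": "Jakub Jelinek",
}


def decode_author(raw: bytes) -> str:
    """Decode as UTF-8 when valid, otherwise assume ISO-8859-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


def normalize_author(raw: bytes) -> str:
    """Apply the automatic cleanups, then the manual fixup table."""
    name = decode_author(raw)
    # A no-break space (U+00A0) becomes an ordinary space.
    name = name.replace("\u00a0", " ")
    # Collapse runs of spaces and trim both ends.
    name = re.sub(r" +", " ", name).strip()
    return AUTHOR_FIXUPS.get(name, name)
```

Doing the automatic cleanups first means the manual table only needs entries
in the single-space form, as noted above.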

-- 
Joseph S. Myers
j...@polyomino.org.uk


Re: Proposal for the transition timetable for the move to GIT

2019-12-26 Thread Eric S. Raymond
Alexandre Oliva :
> I don't see that it does (help).  Incremental conversion of a missed
> branch should include the very same parent links that the conversion of
> the entire repo would, just linking to the proper commits in the adopted
> conversion.  git-svn can do that incrementally, after the fact; I'm not
> sure whether either conversion tool we're contemplating does, but being
> able to undertake such recovery seems like a desirable feature to me.

It's all in what you have in the lift script.  Reposurgeon can do any kind
of branch surgery you want, and that can be added to the conversion pipeline
and replicated every time.

> From what I read, he's doing verifications against SVN.  What I'm
> suggesting, at this final stage, is for us to verify one git
> converted repo against the other.

There are no tools for that, and probably won't be unless somebody
revives repodiffer. There isn't a lot of time left in the schedule for
that, and I have my hands full fixing other glitches.  (Minor issues
about parsing ChangeLogs and generated .gitignores; the serious
problems are well behind us at this point.)

> Maxim appears to be doing so and finding (easy-to-fix) problems in the
> reposurgeon conversion; it would be nice for reposurgeon folks to
> reciprocate and maybe even point out problems in the gcc-pretty
> conversion, if they can find any, otherwise the allegations of
> unsuitability of the tools would have to be taken on blind faith.

Joseph has already made the call to go with a reposurgeon-based
conversion for reasons he explained in detail on this list. Given
that, it really doesn't make any sense for me to do any of what
you're proposing with time I could use working on Joseph's RFEs
instead.

If you're concerned about the quality of reposurgeon's conversion,
you'd be a good person to work on a comparison tool. Should I email you
a copy of the repodiffer code as it last existed in my repository?
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: Proposal for the transition timetable for the move to GIT

2019-12-26 Thread Richard Biener
On December 26, 2019 5:58:22 PM GMT+01:00, Joseph Myers wrote:
>On Thu, 26 Dec 2019, Maxim Kuvyrkov wrote:
>
>> Reposurgeon creates merge entries on trunk when changes from a branch
>> are merged into trunk.  This brings entire development history from the
>> branch to trunk, which is both good and bad.  The good part is that we
>> get more visibility into how the code evolved.  The bad part is that we
>> get many "noisy" commits from merged branch (e.g., "Merge in trunk"
>> every few revisions) and that our SVN branches are work-in-progress
>> quality, not ready for review/commit quality.  It's common for files to
>> be re-written in large chunks on branches.
>
>Seeing "noisy" or possibly confusing commits in "git log" output for
>master is simply a consequence of the possibly confusing defaults for how
>git log behaves (showing all commits in the ancestry in reverse committer
>date order).  I often find "git log --first-parent" output less confusing
>when dealing with any git repository making heavy use of branches (but
>there are other options as well to control how it shows such histories).
>
>If we don't want merge commits on git master for the cases where people
>put merge properties on trunk in the past, we can use a reposurgeon

We've never wanted merge properties on trunk, and have even deleted them from
time to time.  For this reason I don't think we want any merge commits to
appear in git (non-official branches might be fine).

Richard.

>"unmerge" command in gcc.lift to stop the few commits in question from
>being merge commits (while keeping all other merges as-is).  (The merges
>of trunk into other branches that copied merge properties from trunk into
>those branches will still be handled correctly, with exactly two parents
>rather than regaining the extra parents corresponding to the merges into
>trunk that Bernd noted in an earlier version of the conversion, because
>the processing that avoids redundant merge parents takes place well before
>any unmerge commands are executed - so at the time of that processing,
>reposurgeon knows that those other branches are in fact in the ancestry of
>trunk, even if we remove that information in the final git repository.)
>
>> Also, reposurgeon's commit logs don't have information on SVN path from
>> which the change came, so there is no easy way to determine that a given
>> commit is from a merged branch, not an original trunk commit.  Git-svn,
>
>I think it's idiomatic in git for a branch commit not to say "this is a
>commit on X branch", i.e. this is a general property of branchy git
>histories (and unmerge is the solution if we don't want a branchy history
>of master, or use of smarter git tools for viewing the history that people
>may well make more use of when dealing with repositories with that kind of
>history).
>
>> It appears that .gitignore has been added in r1 by reposurgeon and then
>> deleted at r130805.  In SVN repository .gitignore was added in r195087.
>> I speculate that addition of .gitignore at r1 is expected, but its
>> deletion at r130805 is highly suspicious.
>
>I suspect this is one of the known issues related to reposurgeon-generated
>.gitignore files.  Since such files are not really part of the GCC
>history, and the .gitignore files checked into SVN are properly preserved
>as far as I can see, I don't think it's a particularly important issue for
>the GCC conversion (since auto-generated .gitignore files are only
>nice-to-have, not required).  I've filed
>https://gitlab.com/esr/reposurgeon/issues/219 anyway with a reduced test
>for this oddity.
>
>> Reposurgeon uses $u...@gcc.gnu.org for committer email addresses even
>> when it correctly detects author name from ChangeLog.
>
>I think that's logically accurate (and certainly harmless) as a
>description of commits made to a central repository on gcc.gnu.org,
>although using committer = author would also be OK.
>
>> == Bad summary line ==
>>
>> While looking around r138087, below caught my eye.  Is the contents of
>> summary line as expected?
>>
>> commit cc2726884d56995c514d8171cc4a03657851657e
>> Author: Chris Fairles
>> Date:   Wed Jul 23 14:49:00 2008 +
>>
>> acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>
>Yes.  This seems to be Richard's script working exactly as intended, by
>extracting the first bit of the ChangeLog entry *after* the date/author
>header as a better description than "2008-07-23 Chris Fairles" (i.e. it
>certainly gives more distinctive information about the commit and is more
>useful than having a date/author line as the summary line).  I don't think
>it's a bad summary line (but Richard's script supports hardcoding new
>summary lines for individual commits where desired).



Re: Proposal for the transition timetable for the move to GIT

2019-12-26 Thread Joseph Myers
On Thu, 26 Dec 2019, Alexandre Oliva wrote:

> I don't see that it does (help).  Incremental conversion of a missed
> branch should include the very same parent links that the conversion of
> the entire repo would, just linking to the proper commits in the adopted
> conversion.  git-svn can do that incrementally, after the fact; I'm not
> sure whether either conversion tool we're contemplating does, but being
> able to undertake such recovery seems like a desirable feature to me.

We should ensure we don't have missing branches in the first place (for 
whatever definition of what branches we should have).  Adding a branch 
after the fact is a fundamentally different kind of operation from 
including one in the conversion, because it comes with an extra constraint 
of not changing any existing commit hashes (even if the missing branch 
were e.g. merged into some existing branch and maybe logically an ideal 
conversion would thus have had different hashes for existing commits).
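
To illustrate the hash constraint: a git commit ID is the SHA-1 of the commit
object, and that object lists its parents, so giving an existing commit an
extra parent changes its ID and the ID of everything descended from it.  The
sketch below is not part of any conversion tool; the helper names and the
author/date values are made up for illustration.

```python
import hashlib

# Well-known ID of the empty tree in git.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object the way git does: SHA-1 over '<kind> <size>\\0<body>'."""
    header = f"{kind} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree: str, parents: list, message: str) -> str:
    """Build a commit object with illustrative author data and hash it."""
    who = "A U Thor <author@example.org> 1577360000 +0000"
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message, ""]
    return git_object_id("commit", "\n".join(lines).encode("utf-8"))


if __name__ == "__main__":
    root = commit_id(EMPTY_TREE, [], "root")
    # Same tree and message, but one version records a second parent.
    plain = commit_id(EMPTY_TREE, [root], "merge point")
    merged = commit_id(EMPTY_TREE, [root, "0" * 40], "merge point")
    print(plain)
    print(merged)
```

The two printed IDs differ even though the tree and message are identical,
which is why a branch added later cannot be recorded as a merge parent of a
commit that has already been published.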

> Maxim appears to be doing so and finding (easy-to-fix) problems in the
> reposurgeon conversion; it would be nice for reposurgeon folks to
> reciprocate and maybe even point out problems in the gcc-pretty
> conversion, if they can find any, otherwise the allegations of

That's exactly where information on missing branches, tags in 
branches/st/tags appearing as branches, reparented commits appearing as 
merges came from - I examined properties of those conversions by 
comparison to reposurgeon conversions.

-- 
Joseph S. Myers
j...@polyomino.org.uk


Re: Proposal for the transition timetable for the move to GIT

2019-12-26 Thread Alexandre Oliva
On Dec 26, 2019, "Eric S. Raymond"  wrote:

> Alexandre Oliva :
>> On Dec 25, 2019, "Eric S. Raymond"  wrote:
>> 
>> > Reposurgeon has a reparent command.  If you have determined that a
>> > branch is detached or has an incorrect attachment point, patching the
>> > metadata of the root node to fix that is very easy.
>> 
>> Thanks, I see how that can enable a missed branch to be converted and
>> added incrementally to a converted repo even after it went live, at
>> least as long as there aren't subsequent merges from a converted branch
>> to the missed one.  I don't quite see how this helps if there are,
>> though.

> There's also a command for cutting parent links, if that helps.

I don't see that it does (help).  Incremental conversion of a missed
branch should include the very same parent links that the conversion of
the entire repo would, just linking to the proper commits in the adopted
conversion.  git-svn can do that incrementally, after the fact; I'm not
sure whether either conversion tool we're contemplating does, but being
able to undertake such recovery seems like a desirable feature to me.

> repotool compare does that, and there's a production in the conversion
> makefile that applies it.

> As Joseph says in another reply, he's already doing a lot of the 
> verifications you are suggesting.

From what I read, he's doing verifications against SVN.  What I'm
suggesting, at this final stage, is for us to verify one git
converted repo against the other.

Since both claim to be nearing readiness for adoption, I gather it's the
time for both to be comparing with each other (which should be far more
efficient than comparing with SVN) and attempting to narrow down on
differences and converge, so that the community can choose one repo or
another on the actual merits of the converted repositories (e.g. slight
policy differences in metadata conversion), rather than on allegations
by developers of either conversion tool about the reliability of the
tool used by the other.

Maxim appears to be doing so and finding (easy-to-fix) problems in the
reposurgeon conversion; it would be nice for reposurgeon folks to
reciprocate and maybe even point out problems in the gcc-pretty
conversion, if they can find any, otherwise the allegations of
unsuitability of the tools would have to be taken on blind faith.

I wouldn't like the community to have to decide based on blind faith,
rather than hard data.  I'd much rather we had two great, maybe even
equivalent repos to choose from, possibly with a coin toss if they're
close enough, than pick one over the other on unsubstantiated faith.  It
appears to me that this final stage of collaboration and coopetition,
namely comparing the converted repos proposed for adoption and aiming at
convergence, is in the best interest of our community, even if seemingly
at odds with the promotion of either conversion tool.  I hope we can set
aside these slight conflicts of interest, and do what's best for the
community.

-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist   Stallman was right, but he's left :(
GNU Toolchain EngineerFSMatrix: It was he who freed the first of us
FSF & FSFLA board memberThe Savior shall return (true);


Re: Proposal for the transition timetable for the move to GIT

2019-12-26 Thread Eric S. Raymond
Alexandre Oliva :
> On Dec 25, 2019, "Eric S. Raymond"  wrote:
> 
> > Reposurgeon has a reparent command.  If you have determined that a
> > branch is detached or has an incorrect attachment point, patching the
> > metadata of the root node to fix that is very easy.
> 
> Thanks, I see how that can enable a missed branch to be converted and
> added incrementally to a converted repo even after it went live, at
> least as long as there aren't subsequent merges from a converted branch
> to the missed one.  I don't quite see how this helps if there are,
> though.

There's also a command for cutting parent links, if that helps.

> Could we make it a requirement that at least the commits associated with
> head branches and published tags compare equal in both conversions, or
> that differences are known, understood and accepted, before we switch
> over to either one?  Going over all corresponding commits might be too
> much, but at least a representative random sample would be desirable to
> check IMHO.

repotool compare does that, and there's a production in the conversion
makefile that applies it.

As Joseph says in another reply, he's already doing a lot of the 
verifications you are suggesting.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: Proposal for the transition timetable for the move to GIT

2019-12-26 Thread Joseph Myers
On Thu, 26 Dec 2019, Jakub Jelinek wrote:

> On Thu, Dec 26, 2019 at 04:58:22PM +, Joseph Myers wrote:
> > If we don't want merge commits on git master for the cases where people 
> > put merge properties on trunk in the past, we can use a reposurgeon 
> > "unmerge" command in gcc.lift to stop the few commits in question from 
> > being merge commits (while keeping all other merges as-is).  (The merges 
> > of trunk into other branches that copied merge properties from trunk into 
> > those branches will still be handled correctly, with exactly two parents 
> > rather than regaining the extra parents corresponding to the merges into 
> > trunk that Bernd noted in an earlier version of the conversion, because 
> > the processing that avoids redundant merge parents takes place well before 
> > any unmerge commands are executed - so at the time of that processing, 
> > reposurgeon knows that those other branches are in fact in the ancestry of 
> > trunk, even if we remove that information in the final git repository.)
> 
> Yes, I'd prefer the trunk to have no merge commits (in svn I've removed the
> svn:mergeinfo property on the trunk when it appeared too).

I've added the unmerge commands for the three commits in question to 
gcc.lift.

-- 
Joseph S. Myers
j...@polyomino.org.uk


Re: Proposal for the transition timetable for the move to GIT

2019-12-26 Thread Jakub Jelinek
On Thu, Dec 26, 2019 at 04:58:22PM +, Joseph Myers wrote:
> If we don't want merge commits on git master for the cases where people 
> put merge properties on trunk in the past, we can use a reposurgeon 
> "unmerge" command in gcc.lift to stop the few commits in question from 
> being merge commits (while keeping all other merges as-is).  (The merges 
> of trunk into other branches that copied merge properties from trunk into 
> those branches will still be handled correctly, with exactly two parents 
> rather than regaining the extra parents corresponding to the merges into 
> trunk that Bernd noted in an earlier version of the conversion, because 
> the processing that avoids redundant merge parents takes place well before 
> any unmerge commands are executed - so at the time of that processing, 
> reposurgeon knows that those other branches are in fact in the ancestry of 
> trunk, even if we remove that information in the final git repository.)

Yes, I'd prefer the trunk to have no merge commits (in svn I've removed the
svn:mergeinfo property on the trunk when it appeared too).

Jakub



Re: Proposal for the transition timetable for the move to GIT

2019-12-26 Thread Joseph Myers
On Thu, 26 Dec 2019, Maxim Kuvyrkov wrote:

> Reposurgeon creates merge entries on trunk when changes from a branch 
> are merged into trunk.  This brings entire development history from the 
> branch to trunk, which is both good and bad.  The good part is that we 
> get more visibility into how the code evolved.  The bad part is that we 
> get many "noisy" commits from merged branch (e.g., "Merge in trunk" 
> every few revisions) and that our SVN branches are work-in-progress 
> quality, not ready for review/commit quality.  It's common for files to 
> be re-written in large chunks on branches.

Seeing "noisy" or possibly confusing commits in "git log" output for 
master is simply a consequence of the possibly confusing defaults for how 
git log behaves (showing all commits in the ancestry in reverse committer 
date order).  I often find "git log --first-parent" output less confusing 
when dealing with any git repository making heavy use of branches (but 
there are other options as well to control how it shows such histories).

If we don't want merge commits on git master for the cases where people 
put merge properties on trunk in the past, we can use a reposurgeon 
"unmerge" command in gcc.lift to stop the few commits in question from 
being merge commits (while keeping all other merges as-is).  (The merges 
of trunk into other branches that copied merge properties from trunk into 
those branches will still be handled correctly, with exactly two parents 
rather than regaining the extra parents corresponding to the merges into 
trunk that Bernd noted in an earlier version of the conversion, because 
the processing that avoids redundant merge parents takes place well before 
any unmerge commands are executed - so at the time of that processing, 
reposurgeon knows that those other branches are in fact in the ancestry of 
trunk, even if we remove that information in the final git repository.)

> Also, reposurgeon's commit logs don't have information on SVN path from 
> which the change came, so there is no easy way to determine that a given 
> commit is from a merged branch, not an original trunk commit.  Git-svn, 

I think it's idiomatic in git for a branch commit not to say "this is a 
commit on X branch", i.e. this is a general property of branchy git 
histories (and unmerge is the solution if we don't want a branchy history 
of master, or use of smarter git tools for viewing the history that people 
may well make more use of when dealing with repositories with that kind of 
history).

> It appears that .gitignore has been added in r1 by reposurgeon and then 
> deleted at r130805.  In SVN repository .gitignore was added in r195087.  
> I speculate that addition of .gitignore at r1 is expected, but its 
> deletion at r130805 is highly suspicious.

I suspect this is one of the known issues related to reposurgeon-generated 
.gitignore files.  Since such files are not really part of the GCC 
history, and the .gitignore files checked into SVN are properly preserved 
as far as I can see, I don't think it's a particularly important issue for 
the GCC conversion (since auto-generated .gitignore files are only 
nice-to-have, not required).  I've filed 
https://gitlab.com/esr/reposurgeon/issues/219 anyway with a reduced test 
for this oddity.

> Reposurgeon uses $u...@gcc.gnu.org for committer email addresses even 
> when it correctly detects author name from ChangeLog.

I think that's logically accurate (and certainly harmless) as a 
description of commits made to a central repository on gcc.gnu.org, 
although using committer = author would also be OK.

> == Bad summary line ==
> 
> While looking around r138087, below caught my eye.  Is the contents of 
> summary line as expected?
> 
> commit cc2726884d56995c514d8171cc4a03657851657e
> Author: Chris Fairles 
> Date:   Wed Jul 23 14:49:00 2008 +
> 
> acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.

Yes.  This seems to be Richard's script working exactly as intended, by 
extracting the first bit of the ChangeLog entry *after* the date/author 
header as a better description than "2008-07-23 Chris Fairles" (i.e. it 
certainly gives more distinctive information about the commit and is more 
useful than having a date/author line as the summary line).  I don't think 
it's a bad summary line (but Richard's script supports hardcoding new 
summary lines for individual commits where desired).
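
As a rough illustration of the idea (this is not Richard's script, just a
simplified sketch under the assumption that the message starts with a
"YYYY-MM-DD  Name" header; the function name is hypothetical), picking the
first ChangeLog item after the header could look like this:

```python
import re

# A ChangeLog-style header line: an ISO date followed by a name.
HEADER = re.compile(r"^\d{4}-\d{2}-\d{2}\s+\S")


def summary_from_changelog(message: str) -> str:
    """Return the first '* ...' item after a date/author header.

    Messages that do not start with such a header keep their first line.
    """
    lines = message.splitlines()
    first = lines[0] if lines else ""
    if not HEADER.match(first):
        return first
    for line in lines[1:]:
        text = line.strip()
        if text.startswith("* "):
            return text[2:]
    # Header but no items: fall back to the header itself.
    return first
```

Applied to the r138087 message quoted above, this yields the
"acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS." line
rather than the date/author header.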

-- 
Joseph S. Myers
j...@polyomino.org.uk


Re: Proposal for the transition timetable for the move to GIT

2019-12-26 Thread Maxim Kuvyrkov


> On Dec 26, 2019, at 2:16 PM, Jakub Jelinek  wrote:
> 
> On Thu, Dec 26, 2019 at 11:04:29AM +, Joseph Myers wrote:
> Is there some easy way (e.g. file in the conversion scripts) to correct
> spelling and other mistakes in the commit authors?
> E.g. there are misspelled surnames, etc. (e.g. looking at my name, I see
> Jakub Jakub Jelinek (1):
> Jakub Jeilnek (1):
> Jelinek (1):
> entries next to the expected one with most of the commits.
> For the misspellings, wonder if e.g. we couldn't compute edit distances from
> other names and if we have one with many commits and then one with very few
> with small edit distance from those, flag it for human review.

This is close to what svn-git-author.sh script is doing in gcc-pretty and 
gcc-reparent conversions.  It ignores 1-3 character differences in 
author/committer names and email addresses.  I've audited results for all 
branches and didn't spot any mistakes.

In other news, I'm working on comparison of gcc-pretty, gcc-reparent and 
gcc-reposurgeon-5a repos among themselves.  Below are current notes for 
comparison of gcc-pretty/trunk and gcc-reposurgeon-5a/trunk.

== Merges on trunk ==

Reposurgeon creates merge entries on trunk when changes from a branch are 
merged into trunk.  This brings entire development history from the branch to 
trunk, which is both good and bad.  The good part is that we get more 
visibility into how the code evolved.  The bad part is that we get many "noisy" 
commits from merged branch (e.g., "Merge in trunk" every few revisions) and 
that our SVN branches are work-in-progress quality, not ready for review/commit 
quality.  It's common for files to be re-written in large chunks on branches.

Also, reposurgeon's commit logs don't have information on SVN path from which 
the change came, so there is no easy way to determine that a given commit is 
from a merged branch, not an original trunk commit.  Git-svn, on the other 
hand, provides "git-svn-id: <URL>@<revision>" tags in its commit logs.

My conversion follows current GCC development policy that trunk history should 
be linear.  Branch merges to trunk are squashed.  Merges between non-trunk 
branches are handled as specified by svn:mergeinfo SVN properties.

== Differences in trees ==

Git trees (aka filesystem content) match between pretty/trunk and 
reposurgeon-5a/trunk from the current tip up to SVN's r130805.
Here is SVN log of that revision (restoration of deleted trunk):

r130805 | dberlin | 2007-12-13 01:53:37 + (Thu, 13 Dec 2007)
Changed paths:
   A /trunk (from /trunk:130802)


Reposurgeon conversion has:
-
commit 7e6f2a96e89d96c2418482788f94155d87791f0a
Author: Daniel Berlin 
Date:   Thu Dec 13 01:53:37 2007 +

Readd trunk

Legacy-ID: 130805

 .gitignore | 17 -
 1 file changed, 17 deletions(-)
-
and my conversion has:
-
commit fb128f3970789ce094c798945b4fa20eceb84cc7
Author: Daniel Berlin 
Date:   Thu Dec 13 01:53:37 2007 +

Readd trunk


git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@130805 138bc75d-0d04-0410-961f-82ee72b054a4
-

It appears that .gitignore has been added in r1 by reposurgeon and then deleted 
at r130805.  In SVN repository .gitignore was added in r195087.  I speculate 
that addition of .gitignore at r1 is expected, but its deletion at r130805 is 
highly suspicious.

== Committer entries ==

Reposurgeon uses $u...@gcc.gnu.org for committer email addresses even when it 
correctly detects author name from ChangeLog.

reposurgeon-5a:
r278995 Martin Liska  Martin Liska
r278994 Jozef Lawrynowicz  Jozef Lawrynowicz
r278993 Frederik Harwath  Frederik Harwath
r278992 Georg-Johann Lay  Georg-Johann Lay
r278991 Richard Biener  Richard Biener

pretty:
r278995 Martin Liska  Martin Liska
r278994 Jozef Lawrynowicz  Jozef Lawrynowicz
r278993 Frederik Harwath  Frederik Harwath
r278992 Georg-Johann Lay  Georg-Johann Lay
r278991 Richard Biener  Richard Biener

== Bad summary line ==

While looking around r138087, the commit below caught my eye.  Are the contents 
of the summary line as expected?

commit cc2726884d56995c514d8171cc4a03657851657e
Author: Chris Fairles 
Date:   Wed Jul 23 14:49:00 2008 +

acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.

2008-07-23  Chris Fairles 

* acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
Holds the lib that defines clock_gettime (-lrt or -lposix4).
* src/Makefile.am: Use it.
* configure: Regenerate.
* configure.in: Likewise.
* Makefile.in: Likewise.
* src/Makefile.in: Likewise.
* libsup++/Makefile.in: Likewise.
* po/Makefile.in: Likewise.
* doc/Makefile.in: Likewise.

Legacy-ID: 138087


--
Maxim Kuvyrkov

Re: Proposal for the transition timetable for the move to GIT

2019-12-26 Thread Joseph Myers
On Thu, 26 Dec 2019, Jakub Jelinek wrote:

> Is there some easy way (e.g. file in the conversion scripts) to correct
> spelling and other mistakes in the commit authors?

These can be corrected via reposurgeon commands in gcc.lift (see the 
existing "// attribution =A set 
jwakely@gmail.com" command), or the msgout/msgin mechanism used in 
Richard's script for commit message improvements could also make changes 
to authors (don't know the exact syntax offhand, but I believe authors are 
among the things that mechanism allows to be changed in commit metadata, 
so the script could gain a table of author corrections to apply).

> Or I see in git shortlog parts of date being parsed as name, e.g.
> (basically anything in git shortlog after the "..." wrapped names and before
> Aaron Conole (2): in alphabetical sorting, or after Zuxy Meng (4):.
> 00:27 -0700  Zack Weinberg (1):

> lsd.ic.unicamp.br),  Jakub Jelinek (1):

Filed https://gitlab.com/esr/reposurgeon/issues/218 for these kinds of 
ChangeLog entries - some changes to regular expressions should be able to 
make the code handle them better (possibly by reverting to committer 
identities in some more cases where the ChangeLog header line looks odd in 
some way).

> Eric Botcazou (1):

I didn't include anything for this in my reduced test.  I'd noted some of 
the invalid attribution warnings from reposurgeon also involving bytes 
0xA0 (= ISO-8859-1 NBSP).  If anything is appropriate there, it might be 
something like "change any 0xA0 that's preceded by an ASCII byte to ASCII 
space before processing further" ("preceded by an ASCII byte" being needed 
to avoid the case of 0xA0 in the middle of a UTF-8 character).
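
A minimal sketch of that rule, working on raw bytes before any decoding (the
function name is hypothetical and this is not reposurgeon code):

```python
def nbsp_to_space(raw: bytes) -> bytes:
    """Replace 0xA0 with an ASCII space only when the previous byte is ASCII.

    A 0xA0 that follows a non-ASCII byte may be the tail of a UTF-8
    sequence (for example 0xC2 0xA0 or 0xC3 0xA0), so it is left alone.
    A 0xA0 with no preceding byte is also left alone, since the rule as
    stated only covers bytes preceded by an ASCII byte.
    """
    out = bytearray()
    prev = None
    for byte in raw:
        if byte == 0xA0 and prev is not None and prev < 0x80:
            out.append(0x20)
        else:
            out.append(byte)
        prev = byte  # compare against the original byte, not the replacement
    return bytes(out)
```

With this guard an ISO-8859-1 "Eric\xa0Botcazou" gains a plain space, while
valid UTF-8 input containing U+00A0 or "à" passes through unchanged.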

-- 
Joseph S. Myers
j...@polyomino.org.uk


Re: Proposal for the transition timetable for the move to GIT

2019-12-26 Thread Jakub Jelinek
On Thu, Dec 26, 2019 at 11:04:29AM +, Joseph Myers wrote:
Is there some easy way (e.g. file in the conversion scripts) to correct
spelling and other mistakes in the commit authors?
E.g. there are misspelled surnames, etc. (e.g. looking at my name, I see
Jakub Jakub Jelinek (1):
Jakub Jeilnek (1):
Jelinek (1):
entries next to the expected one with most of the commits.
For the misspellings, I wonder whether we could compute edit distances between
names: if one name has many commits and another has very few and is within a
small edit distance of it, flag the latter for human review.
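
Something along these lines is what I have in mind; it is only a sketch, the
function names are made up, and the thresholds (max_distance, rare, frequent)
are arbitrary assumptions that would need tuning against the real shortlog:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def suspicious_authors(counts: dict, max_distance: int = 3,
                       rare: int = 2, frequent: int = 50) -> list:
    """Pair each rarely seen name with a frequent name it nearly matches.

    counts maps author name to number of commits, e.g. as parsed from
    "git shortlog -s" output.
    """
    common = [name for name, n in counts.items() if n >= frequent]
    flagged = []
    for name, n in counts.items():
        if n > rare:
            continue
        for other in common:
            if 0 < edit_distance(name, other) <= max_distance:
                flagged.append((name, other))
                break
    return sorted(flagged)
```

This would catch "Jakub Jeilnek" but not the bare "Jelinek" entry, which is
too far from the full name and would still need a manual fixup.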

I also see parts of dates being parsed as names in git shortlog (basically
anything in git shortlog after the "..." wrapped names and before
Aaron Conole (2): in alphabetical sorting, or after Zuxy Meng (4):), e.g.
00:27 -0700  Zack Weinberg (1):
  c-typeck.c (c_expand_start_case): Return immediately if exp is an 
ERROR_MARK.

01:17 -0500  Zack Weinberg (1):
  cpplib.h (struct cpp_buffer): Replace dir and dlen members with a struct 
file_name_list pointer.

02:50  Ulrich Drepper (1):
  Handle __set_errno correctly.

04:08  Ulrich Drepper (1):
  Fix all problems reported by the test suite.

07:51 -0500  Zack Weinberg (1):
  gcc.c: Split out Objective-C specs to...
...
Or e.g.
linux.org.pl) & Denis Chertykov (1):
  avr.c (avr_case_values_threshold): New.

lsd.ic.unicamp.br),  Jakub Jelinek (1):
  configure.in: When target is sparc* and tm_file contains 64, test for 
64bit support in assembler.

lsd.ic.unicamp.br), Richard Henderson (1):
  resource.c (mark_referenced_resources): Mark a set strict_low_part as 
used.

m17n.org), Kaz Kojima (1):
  lib1funcs.asm (GLOBAL): Define.

redhat.com), Alexandre Oliva (1):
  * g++.dg/init/pm1.C: New test.

redhat.com), Bernd Schmidt (1):
  reload.c (find_reloads_address_1): Generate reloads for auto_inc pseudos 
that refer to the original pseudos...

redhat.com), DJ Delorie (1):
  configure.in (FLAGS_FOR_TARGET): Use -nostdinc even for Canadian 
crosses...

redhat.com), J"orn Rennecke (1):
  reload1.c (move2add_note_store): Treat all registers about which no 
information is known as potential bases...

redhat.com), Jakub Jelinek (1):
  re PR debug/54693 (VTA guality issues with loops)

redhat.com), Jan Hubicka (1):
  tree-ssa-live.c (remove_unused_scope_block_p): Drop declarations and 
blocks only after inlining.

redhat.com), Jeff Sturm (1):
  Makefile.in (AS_FOR_TARGET, [...]): If gcc/xgcc is built, use 
-print-prog-name to find out the program name to use.

redhat.com), Kazu Hirata (1):
  h8300.md: Remove the memory alternative and correct the insn lengths in 
the templates for...

redhat.com), NIIBE Yutaka (1):
  sh-protos.h (symbol_ref_operand): Declare.

Eric Botcazou (1):
  config.gcc (sparc64-*-solaris2*, [...]): Add tm-dwarf2.h to tm_file.

Jakub



Re: Proposal for the transition timetable for the move to GIT

2019-12-26 Thread Joseph Myers
On Thu, 26 Dec 2019, Alexandre Oliva wrote:

> Could make it a requirement that at least the commits associated with
> head branches and published tags compare equal in both conversions, or
> that differences are known, understood and accepted, before we switch
> over to either one?  Going over all corresponding commits might be too

The checks I run on every conversion with reposurgeon include checking the 
tree contents at the tip of every (non-deleted) branch and tag agree with 
SVN (this check now includes checking that execute permissions match).  
Empty directories are removed from SVN checkouts before that comparison; 
.gitignore files are ignored because of those generated automatically by 
reposurgeon from svn:ignore properties (but the conversion is set to 
prefer .gitignore files checked into SVN where they exist, and empirically 
that works as expected, so differences relating to automatically-generated 
.gitignore files are only relevant to older branches).  Two branches 
(c++-modules and melt-branch) have some files with SVN keyword expansion 
enabled, which causes expected differences in such comparisons.
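
For illustration, a minimal sketch of this kind of tip comparison. It is
not the script from the gcc-conversion repository; the function names are
hypothetical, and it assumes one SVN checkout and one git checkout of the
same branch tip already exist on disk.

```python
import hashlib
import os
import stat

SKIP_DIRS = {".svn", ".git"}  # version-control metadata, not tree content


def tree_manifest(root):
    """Map relative path -> (sha256 of contents, owner-execute bit).

    Only files are recorded, so empty directories never show up, and
    .gitignore files are skipped.  Symlinks are recorded by their target.
    """
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if os.path.islink(path):
                manifest[rel] = ("link", os.readlink(path))
            elif name in filenames and name != ".gitignore":
                with open(path, "rb") as handle:
                    digest = hashlib.sha256(handle.read()).hexdigest()
                mode = os.stat(path).st_mode
                manifest[rel] = (digest, bool(mode & stat.S_IXUSR))
    return manifest


def tree_differences(svn_root, git_root):
    """Sorted relative paths that are missing or differ between the trees."""
    svn = tree_manifest(svn_root)
    git = tree_manifest(git_root)
    return sorted(p for p in svn.keys() | git.keys() if svn.get(p) != git.get(p))
```

An empty result means the two trees agree under these rules. Files with
SVN keyword expansion enabled would still be reported as differing.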

The scripts used for those checks are checked into the gcc-conversion 
repository; the input needed is a mapping from SVN branch / tag paths to 
git refs (along with the SVN revision number the branch tips should 
match).  The main thing that consumes time in the checks is switching SVN 
checkouts to a different branch.

-- 
Joseph S. Myers
j...@polyomino.org.uk

