Re: Repo conversion troubles.

2018-07-20 Thread Eric S. Raymond
Joseph Myers :
> On Mon, 9 Jul 2018, Alexandre Oliva wrote:
> 
> > On Jul  9, 2018, Jeff Law  wrote:
> > 
> > > On 07/09/2018 01:57 PM, Eric S. Raymond wrote:
> > >> Jeff Law :
> > >>> I'm not aware of any such merges, but any that occurred most likely
> > >>> happened after mid-April when the trunk was re-opened for development.
> > 
> > >> I'm pretty certain things were still good at r256000.  I've started that
> > >> check running.  Not expecting results in less than twelve hours.
> > 
> > > r256000 would be roughly Christmas 2017.
> > 
> > When was the RAID/LVM disk corruption incident?  Could it possibly have
> > left any of our svn repo metadata in a corrupted way that confuses
> > reposurgeon, and that leads to such huge differences?
> 
> That was 14/15 Aug 2017, and all the SVN revision data up to r251080 were 
> restored from backup within 24 hours or so.  I found no signs of damage to 
> revisions from the 24 hours or so between r251080 and the time of the 
> corruption when I examined diffs for all those revisions by hand at that 
> time.

Agreed. I don't think that incident is at the root of the problems.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>

My work is funded by the Internet Civil Engineering Institute: https://icei.org
Please visit their site and donate: the civilization you save might be your own.


Re: Repo conversion troubles.

2018-07-20 Thread Eric S. Raymond
Joseph Myers :
> On Mon, 9 Jul 2018, Eric S. Raymond wrote:
> 
> > Richard Biener :
> > > 12 hours from remote I guess? The subversion repository is available 
> > > through rsync so you can create a local mirror to work from (we've been 
> > > doing that at suse for years) 
> > 
> > I'm saying I see rsync plus local checkout take 10-12 hours.  I asked Jason
> > about this and his response was basically "Well...we don't do that often."
> 
> Isn't that a local checkout *of the top level of the repository*, i.e. 
> checking out all branches and tags?  Which is indeed something developers 
> would never normally do - they'd just check out the particular branches 
> they're working on.

It is.  I have to check out all tags and branches to validate the conversion.
-- 
Eric S. Raymond


Re: Repo conversion troubles.

2018-07-20 Thread Joseph Myers
On Tue, 10 Jul 2018, Jonathan Wakely wrote:

> > Large-scale, I'm afraid.  The context diff is about a GLOC.
> 
> I don't see how that's possible. Most of those files are tiny, or
> change very rarely, so I don't see how that large a diff can happen.

Concretely, the *complete GCC source tree* (trunk, that is) is under 1 GB.  
A complete diff generating the whole source tree from nothing would only 
be about 15 MLOC.
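Taking these figures at face value, with 1 GLOC read as $10^9$ lines, the reported context diff would be roughly 67 times the size of a diff that builds all of trunk from nothing:

```latex
\frac{\text{reported context diff}}{\text{diff creating all of trunk}}
  \approx \frac{10^{9}\ \text{lines}}{1.5 \times 10^{7}\ \text{lines}}
  \approx 67
```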

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Repo conversion troubles.

2018-07-20 Thread Joseph Myers
On Mon, 9 Jul 2018, Alexandre Oliva wrote:

> On Jul  9, 2018, Jeff Law  wrote:
> 
> > On 07/09/2018 01:57 PM, Eric S. Raymond wrote:
> >> Jeff Law :
> >>> I'm not aware of any such merges, but any that occurred most likely
> >>> happened after mid-April when the trunk was re-opened for development.
> 
> >> I'm pretty certain things were still good at r256000.  I've started that
> >> check running.  Not expecting results in less than twelve hours.
> 
> > r256000 would be roughly Christmas 2017.
> 
> When was the RAID/LVM disk corruption incident?  Could it possibly have
> left any of our svn repo metadata in a corrupted way that confuses
> reposurgeon, and that leads to such huge differences?

That was 14/15 Aug 2017, and all the SVN revision data up to r251080 were 
restored from backup within 24 hours or so.  I found no signs of damage to 
revisions from the 24 hours or so between r251080 and the time of the 
corruption when I examined diffs for all those revisions by hand at that 
time.

(If anyone rsynced corrupted old revisions from the repository during the 
window of corruption, those corrupted old revisions might remain in their 
rsynced repository copy because the restoration preserved file times and 
size, just fixing corrupted contents.)
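The reason a corrupted copy could survive is rsync's default "quick check": a file whose size and modification time already match on both sides is skipped without its contents being compared, and the restoration kept both unchanged. Running rsync with `--checksum` (`-c`) makes it compare contents instead. The Python sketch below models only that skip decision; `FileInfo` and `needs_transfer` are hypothetical illustrations rather than rsync internals, and SHA-256 is just a stand-in for rsync's own checksum.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical view of one file as rsync would see it on either side."""
    size: int
    mtime: int
    content: bytes


def needs_transfer(src: FileInfo, dst: FileInfo, checksum: bool = False) -> bool:
    """Model of rsync's decision for a file that exists on both sides."""
    if checksum:
        # Like --checksum: compare content digests, ignoring timestamps.
        return hashlib.sha256(src.content).digest() != hashlib.sha256(dst.content).digest()
    # Default quick check: only a size or mtime difference triggers a transfer.
    return src.size != dst.size or src.mtime != dst.mtime


# Restored file on the server vs. a corrupted copy in a mirror:
# same size, same (arbitrary) timestamp, different bytes.
restored = FileInfo(size=4, mtime=1502755200, content=b"good")
mirrored = FileInfo(size=4, mtime=1502755200, content=b"bad!")

print(needs_transfer(restored, mirrored))                 # False: stays corrupted
print(needs_transfer(restored, mirrored, checksum=True))  # True: gets repaired
```

A mirror that synced during the corruption window could therefore be checked with something like `rsync -a --checksum <source> <dest>`, where both arguments are placeholders.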

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Repo conversion troubles.

2018-07-20 Thread Joseph Myers
On Mon, 9 Jul 2018, Eric S. Raymond wrote:

> Richard Biener :
> > 12 hours from remote I guess? The subversion repository is available 
> > through rsync so you can create a local mirror to work from (we've been 
> > doing that at suse for years) 
> 
> I'm saying I see rsync plus local checkout take 10-12 hours.  I asked Jason
> about this and his response was basically "Well...we don't do that often."

Isn't that a local checkout *of the top level of the repository*, i.e. 
checking out all branches and tags?  Which is indeed something developers 
would never normally do - they'd just check out the particular branches 
they're working on.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Repo conversion troubles.

2018-07-10 Thread Philip Martin
"Eric S. Raymond"  writes:

> I'm saying I see rsync plus local checkout take 10-12 hours.

The rsync is a one-off cost.  Once you have the repository locally you
can check out any individual revision much more quickly.  I have a local
copy of the gcc repository and a checkout of gcc trunk from localhost
takes about 40 seconds.  I'm not using fancy hardware.  I can even check
it out across my very average WiFi in just over 60 seconds.
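The workflow described here, mirroring the repository once and then checking out from the local copy, comes down to two commands. This Python sketch only builds the command lines. The rsync source is a placeholder because the thread does not give the exact URL, the local paths are hypothetical, and it assumes the mirror directory is the repository root with trunk at `trunk`.

```python
from __future__ import annotations

import shlex
from pathlib import PurePosixPath

# Placeholders: the thread gives no rsync URL or local paths.
RSYNC_SOURCE = "RSYNC-SOURCE-FOR-GCC-SVN/"      # hypothetical
MIRROR = PurePosixPath("/srv/gcc-svn")          # hypothetical repository copy
WORKING_COPY = PurePosixPath("/srv/gcc-trunk")  # hypothetical checkout dir


def mirror_command(source: str, mirror: PurePosixPath) -> list[str]:
    """Copy the repository itself; later runs only transfer what changed."""
    # -a keeps permissions and timestamps; --delete removes local files
    # that no longer exist in the source.
    return ["rsync", "-a", "--delete", source, f"{mirror}/"]


def checkout_command(mirror: PurePosixPath, path: str, wc: PurePosixPath,
                     revision: int | None = None) -> list[str]:
    """Check out one path from the local mirror through a file:// URL."""
    cmd = ["svn", "checkout"]
    if revision is not None:
        cmd += ["-r", str(revision)]
    cmd += [(mirror / path).as_uri(), str(wc)]
    return cmd


print(shlex.join(mirror_command(RSYNC_SOURCE, MIRROR)))
print(shlex.join(checkout_command(MIRROR, "trunk", WORKING_COPY, revision=256000)))
```

Only the rsync step pays the network cost; every later checkout reads from local disk.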

-- 
Philip


Re: Repo conversion troubles.

2018-07-10 Thread Eric S. Raymond
Jonathan Wakely :
> On Tue, 10 Jul 2018 at 09:19, Jonathan Wakely  wrote:
> >
> > On Mon, 9 Jul 2018 at 21:00, Eric S. Raymond  wrote:
> > >
> > > Bernd Schmidt :
> > > > On 07/09/2018 09:19 PM, Eric S. Raymond wrote:
> > > > > Last time I did a comparison between SVN head and the git conversion
> > > > > tip they matched exactly.  This time I have mismatches in the following
> > > > > files.
> > > >
> > > > So what are the diffs? Are we talking about small differences (like one
> > > > change missing) or large-scale mismatches?
> > >
> > > Large-scale, I'm afraid.  The context diff is about a GLOC.
> >
> > I don't see how that's possible. Most of those files are tiny, or
> > change very rarely, so I don't see how that large a diff can happen.
> >
> > Take zlib/configure.ac and zlib/configure, there's only been one
> > change in the past 18 months: https://gcc.gnu.org/r261739
> > That change didn't touch the other files in the list.
> >
> > libtool.m4 has one change in the past 2 years (just a few days ago):
> > https://gcc.gnu.org/r262451
> > That was also tiny, and didn't touch the other files.
> >
> > maintainer-scripts/crontab only has one change in the past 6 months:
> > https://gcc.gnu.org/r259637
> > That was a tiny change, and didn't touch any other files.
> >
> > None of those were merges from any other branch.
> 
> libtool.m4
> ltmain.sh
> 
> Changed by https://gcc.gnu.org/r262451
> 
> libvtv/ChangeLog
> libvtv/configure
> libvtv/testsuite/lib/libvtv.exp
> 
> Changed by https://gcc.gnu.org/r257809 https://gcc.gnu.org/r259462
> https://gcc.gnu.org/r259487 https://gcc.gnu.org/r259837
> https://gcc.gnu.org/r259838 (but mostly one-line changes).
> 
> lto-plugin/ChangeLog
> lto-plugin/configure
> lto-plugin/lto-plugin.c
> 
> Changed by https://gcc.gnu.org/r259462 and https://gcc.gnu.org/r260960
> 
> MAINTAINERS
> 
> This file sees a fair bit of churn, but all one-line changes.
> https://gcc.gnu.org/viewcvs/gcc/trunk/MAINTAINERS?view=log
> 
> maintainer-scripts/ChangeLog
> maintainer-scripts/crontab
> maintainer-scripts/gcc_release
> 
> Changed by https://gcc.gnu.org/r257045 and https://gcc.gnu.org/r259637
> and https://gcc.gnu.org/r259881
> 
> Makefile.def
> Makefile.in
> Makefile.tpl
> 
> Changed by https://gcc.gnu.org/r261717 (which didn't touch any other
> files) but also by some large changes, which might have been merges:
> https://gcc.gnu.org/r255195 (large removal of feature)
> https://gcc.gnu.org/r259669 https://gcc.gnu.org/r259755
> https://gcc.gnu.org/r261304 (another large feature removal)
> https://gcc.gnu.org/r262267
> 
> zlib/configure
> zlib/configure.ac
> 
> Changed by https://gcc.gnu.org/r261739
> 
> There's no single change that touched all of them. Not even two or
> three changes that seem to have anything in common, except for
> autoconf regeneration, which happens frequently throughout GCC's
> history.

I don't know what's going on either, yet.  I'm trying to identify the
earliest point of content mismatch now.

Thanks for all this data.  It may help a lot.
-- 
Eric S. Raymond


Re: Repo conversion troubles.

2018-07-10 Thread Jonathan Wakely
On Tue, 10 Jul 2018 at 09:19, Jonathan Wakely  wrote:
>
> On Mon, 9 Jul 2018 at 21:00, Eric S. Raymond  wrote:
> >
> > Bernd Schmidt :
> > > On 07/09/2018 09:19 PM, Eric S. Raymond wrote:
> > > > Last time I did a comparison between SVN head and the git conversion
> > > > tip they matched exactly.  This time I have mismatches in the following
> > > > files.
> > >
> > > So what are the diffs? Are we talking about small differences (like one
> > > change missing) or large-scale mismatches?
> >
> > Large-scale, I'm afraid.  The context diff is about a GLOC.
>
> I don't see how that's possible. Most of those files are tiny, or
> change very rarely, so I don't see how that large a diff can happen.
>
> Take zlib/configure.ac and zlib/configure, there's only been one
> change in the past 18 months: https://gcc.gnu.org/r261739
> That change didn't touch the other files in the list.
>
> libtool.m4 has one change in the past 2 years (just a few days ago):
> https://gcc.gnu.org/r262451
> That was also tiny, and didn't touch the other files.
>
> maintainer-scripts/crontab only has one change in the past 6 months:
> https://gcc.gnu.org/r259637
> That was a tiny change, and didn't touch any other files.
>
> None of those were merges from any other branch.

libtool.m4
ltmain.sh

Changed by https://gcc.gnu.org/r262451

libvtv/ChangeLog
libvtv/configure
libvtv/testsuite/lib/libvtv.exp

Changed by https://gcc.gnu.org/r257809 https://gcc.gnu.org/r259462
https://gcc.gnu.org/r259487 https://gcc.gnu.org/r259837
https://gcc.gnu.org/r259838 (but mostly one-line changes).

lto-plugin/ChangeLog
lto-plugin/configure
lto-plugin/lto-plugin.c

Changed by https://gcc.gnu.org/r259462 and https://gcc.gnu.org/r260960

MAINTAINERS

This file sees a fair bit of churn, but all one-line changes.
https://gcc.gnu.org/viewcvs/gcc/trunk/MAINTAINERS?view=log

maintainer-scripts/ChangeLog
maintainer-scripts/crontab
maintainer-scripts/gcc_release

Changed by https://gcc.gnu.org/r257045 and https://gcc.gnu.org/r259637
and https://gcc.gnu.org/r259881

Makefile.def
Makefile.in
Makefile.tpl

Changed by https://gcc.gnu.org/r261717 (which didn't touch any other
files) but also by some large changes, which might have been merges:
https://gcc.gnu.org/r255195 (large removal of feature)
https://gcc.gnu.org/r259669 https://gcc.gnu.org/r259755
https://gcc.gnu.org/r261304 (another large feature removal)
https://gcc.gnu.org/r262267

zlib/configure
zlib/configure.ac

Changed by https://gcc.gnu.org/r261739

There's no single change that touched all of them. Not even two or
three changes that seem to have anything in common, except for
autoconf regeneration, which happens frequently throughout GCC's
history.
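This check can be mechanized: record which of the mismatched file groups each cited revision touched, then ask whether any revision, or any small set of revisions, accounts for all of them. The mapping below is transcribed from the message above at the granularity it gives (per group of files, not per file). MAINTAINERS is left out because only its log is cited, not specific revisions, so this is a sketch over the listed revisions only.

```python
from itertools import combinations

# Revisions cited in the message above, mapped to the file groups they changed.
CHANGED_BY = {
    "libtool.m4, ltmain.sh": {262451},
    "libvtv/*": {257809, 259462, 259487, 259837, 259838},
    "lto-plugin/*": {259462, 260960},
    "maintainer-scripts/*": {257045, 259637, 259881},
    "Makefile.def/.in/.tpl": {261717, 255195, 259669, 259755, 261304, 262267},
    "zlib/*": {261739},
}
ALL_REVS = sorted(set().union(*CHANGED_BY.values()))


def groups_touched(rev):
    """File groups that a given revision changed, per the table above."""
    return {group for group, revs in CHANGED_BY.items() if rev in revs}


def smallest_cover(max_size=3):
    """Smallest set of at most max_size revisions touching every group, or None."""
    for size in range(1, max_size + 1):
        for combo in combinations(ALL_REVS, size):
            covered = set().union(*(groups_touched(r) for r in combo))
            if covered == set(CHANGED_BY):
                return combo
    return None


# r259462 is the only cited revision that touches two groups; no set of
# three (or even four) cited revisions touches all six.
best = max(ALL_REVS, key=lambda r: len(groups_touched(r)))
print(best, sorted(groups_touched(best)))
print(smallest_cover())
```

At this granularity at least five of the cited revisions are needed to touch every group, which is consistent with the conclusion that no single change, nor two or three, ties the files together.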


Re: Repo conversion troubles.

2018-07-10 Thread Jonathan Wakely
On Mon, 9 Jul 2018 at 21:00, Eric S. Raymond  wrote:
>
> Bernd Schmidt :
> > On 07/09/2018 09:19 PM, Eric S. Raymond wrote:
> > > Last time I did a comparison between SVN head and the git conversion
> > > tip they matched exactly.  This time I have mismatches in the following
> > > files.
> >
> > So what are the diffs? Are we talking about small differences (like one
> > change missing) or large-scale mismatches?
>
> Large-scale, I'm afraid.  The context diff is about a GLOC.

I don't see how that's possible. Most of those files are tiny, or
change very rarely, so I don't see how that large a diff can happen.

Take zlib/configure.ac and zlib/configure, there's only been one
change in the past 18 months: https://gcc.gnu.org/r261739
That change didn't touch the other files in the list.

libtool.m4 has one change in the past 2 years (just a few days ago):
https://gcc.gnu.org/r262451
That was also tiny, and didn't touch the other files.

maintainer-scripts/crontab only has one change in the past 6 months:
https://gcc.gnu.org/r259637
That was a tiny change, and didn't touch any other files.

None of those were merges from any other branch.


Re: Repo conversion troubles.

2018-07-09 Thread Richard Biener
On July 9, 2018 10:20:39 PM GMT+02:00, "Eric S. Raymond"  wrote:
>Richard Biener :
>> 12 hours from remote I guess? The subversion repository is available
>> through rsync so you can create a local mirror to work from (we've been
>> doing that at suse for years)
>
>I'm saying I see rsync plus local checkout take 10-12 hours.

For a fresh rsync I can guess that's true. But it works incrementally just
fine and quick for me...

>I asked Jason about this and his response was basically "Well...we
>don't do that often."
>
>You probably never see this case.  Update from a remote is much
>faster.
>
>I'm trying to do a manual correctness check via update to commit 256000
>now.


Re: Repo conversion troubles.

2018-07-09 Thread Alexandre Oliva
On Jul  9, 2018, Jeff Law  wrote:

> On 07/09/2018 01:57 PM, Eric S. Raymond wrote:
>> Jeff Law :
>>> I'm not aware of any such merges, but any that occurred most likely
>>> happened after mid-April when the trunk was re-opened for development.

>> I'm pretty certain things were still good at r256000.  I've started that
>> check running.  Not expecting results in less than twelve hours.

> r256000 would be roughly Christmas 2017.

When was the RAID/LVM disk corruption incident?  Could it possibly have
left any of our svn repo metadata in a corrupted way that confuses
reposurgeon, and that leads to such huge differences?

On Jul  9, 2018, "Eric S. Raymond"  wrote:

> Bernd Schmidt :
>> So what are the diffs? Are we talking about small differences (like one
>> change missing) or large-scale mismatches?

> Large-scale, I'm afraid.  The context diff is about a GLOC.

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain Engineer    Free Software Evangelist


Re: Repo conversion troubles.

2018-07-09 Thread Eric S. Raymond
Richard Biener :
> 12 hours from remote I guess? The subversion repository is available through 
> rsync so you can create a local mirror to work from (we've been doing that at 
> suse for years) 

I'm saying I see rsync plus local checkout take 10-12 hours.  I asked Jason
about this and his response was basically "Well...we don't do that often."

You probably never see this case.  Update from a remote is much faster.

I'm trying to do a manual correctness check via update to commit 256000 now.
-- 
Eric S. Raymond


Re: Repo conversion troubles.

2018-07-09 Thread Eric S. Raymond
Jeff Law :
> > I'm pretty certain things were still good at r256000.  I've started that
> > check running.  Not expecting results in less than twelve hours.

> r256000 would be roughly Christmas 2017.  I'd be very surprised if any
> merges to the trunk happened between that point and early April.  We're
> essentially in regression bugfixes only during that timeframe.  Not a
> time for branch->trunk merging :-)

Thanks, that's useful to know.  That means if the r256000 check passes
I can jump forward to 1 Apr reasonably expecting that one to pass too.
-- 
Eric S. Raymond


Re: Repo conversion troubles.

2018-07-09 Thread Richard Biener
On July 9, 2018 9:19:11 PM GMT+02:00, e...@thyrsus.com wrote:
>Last time I did a comparison between SVN head and the git conversion
>tip they matched exactly.  This time I have mismatches in the following
>files.
>
>libtool.m4
>libvtv/ChangeLog
>libvtv/configure
>libvtv/testsuite/lib/libvtv.exp
>ltmain.sh
>lto-plugin/ChangeLog
>lto-plugin/configure
>lto-plugin/lto-plugin.c
>MAINTAINERS
>maintainer-scripts/ChangeLog
>maintainer-scripts/crontab
>maintainer-scripts/gcc_release
>Makefile.def
>Makefile.in
>Makefile.tpl
>zlib/configure
>zlib/configure.ac
>
>Now I'll explain what this means and why it's a serious problem.
>
>Reposurgeon is never confused by linear history, branching, or
>tagging; I have lots of regression tests for those cases.  When it
>screws up it is invariably around branch copy operations, because
>there are cases near those where the data model of Subversion stream
>files is underspecified. That model was in fact entirely undocumented
>before I reverse-engineered it and wrote the description that now
>lives in the Subversion source tree.  But that description is not
>complete; nobody, not even Subversion's designers, knows how to fill
>in all the corner cases.
>
>Thus, a content mismatch like this means there was some recent branch
>merge to trunk in the gcc history that reposurgeon is not interpreting
>as intended, or more likely an operator error such as a non-Subversion
>directory copy followed by a commit - my analyzer can recover from
>most such cases but not all.
>
>There are brute-force ways to pin down such malformations, but none of
>them are practical at the huge scale of this repository.  The main
>problem here wouldn't be reposurgeon itself but the fact that Subversion
>checkouts on a repo this large are very slow. I've seen a single one
>take 12 hours; an attempt at a whole bisection run to pin down the
>divergence point on trunk would therefore probably cost log2 of the
>number of commits times that, or about 18 days.

12 hours from remote I guess? The subversion repository is available through 
rsync so you can create a local mirror to work from (we've been doing that at 
suse for years) 

Richard. 

>
>So...does that list of changed files look familiar to anyone?  If we can
>identify the revision number of the bad commit, the odds of being able
>to unscramble this mess go way up.  They still aren't good, not when
>merely loading the repository for examination takes over four hours,
>but they would be way better than if I were starting from zero.
>
>This is serious. I have produced demonstrably correct history
>conversions of the gcc repo in the past.  We may now be in a situation
>where I will never again be able to do that.



Re: Repo conversion troubles.

2018-07-09 Thread Jeff Law
On 07/09/2018 01:57 PM, Eric S. Raymond wrote:
> Jeff Law :
>>> There are brute-force ways to pin down such malformations, but none of
>>> them are practical at the huge scale of this repository.  The main
>>> problem here wouldn't be reposurgeon itself but the fact that Subversion
>>> checkouts on a repo this large are very slow. I've seen a single one
>>> take 12 hours; an attempt at a whole bisection run to pin down the
>>> divergence point on trunk would therefore probably cost log2 of the
>>> number of commits times that, or about 18 days.
>>
>> I'm not aware of any such merges, but any that occurred most likely
>> happened after mid-April when the trunk was re-opened for development.
> 
> I agree it can't have been earlier than that, or I'd have hit this rock
> sooner.  I'd bet on the problem having arisen within the last six weeks.
> 
>> I'm assuming that it's only work that merges onto the trunk that's
>> potentially problematical here.
> 
> Yes.  It is possible there are also content mismatches on branches - I
> haven't run that check yet (it takes an absurd amount of time to complete),
> but there's not much point in worrying about that if we can't get trunk right.
> 
> I'm pretty certain things were still good at r256000.  I've started that
> check running.  Not expecting results in less than twelve hours.

r256000 would be roughly Christmas 2017.  I'd be very surprised if any
merges to the trunk happened between that point and early April.  We're
essentially in regression bugfixes only during that timeframe.  Not a
time for branch->trunk merging :-)

jeff


Re: Repo conversion troubles.

2018-07-09 Thread Eric S. Raymond
Bernd Schmidt :
> On 07/09/2018 09:19 PM, Eric S. Raymond wrote:
> > Last time I did a comparison between SVN head and the git conversion
> > tip they matched exactly.  This time I have mismatches in the following
> > files.
> 
> So what are the diffs? Are we talking about small differences (like one
> change missing) or large-scale mismatches?

Large-scale, I'm afraid.  The context diff is about a GLOC.
-- 
Eric S. Raymond


Re: Repo conversion troubles.

2018-07-09 Thread Eric S. Raymond
Jeff Law :
> > There are brute-force ways to pin down such malformations, but none of
> > them are practical at the huge scale of this repository.  The main
> > problem here wouldn't be reposurgeon itself but the fact that Subversion
> > checkouts on a repo this large are very slow. I've seen a single one
> > take 12 hours; an attempt at a whole bisection run to pin down the
> > divergence point on trunk would therefore probably cost log2 of the
> > number of commits times that, or about 18 days.
>
> I'm not aware of any such merges, but any that occurred most likely
> happened after mid-April when the trunk was re-opened for development.

I agree it can't have been earlier than that, or I'd have hit this rock
sooner.  I'd bet on the problem having arisen within the last six weeks.

> I'm assuming that it's only work that merges onto the trunk that's
> potentially problematical here.

Yes.  It is possible there are also content mismatches on branches - I
haven't run that check yet (it takes an absurd amount of time to complete),
but there's not much point in worrying about that if we can't get trunk right.

I'm pretty certain things were still good at r256000.  I've started that
check running.  Not expecting results in less than twelve hours.
-- 
Eric S. Raymond


Re: Repo conversion troubles.

2018-07-09 Thread Bernd Schmidt
On 07/09/2018 09:19 PM, Eric S. Raymond wrote:
> Last time I did a comparison between SVN head and the git conversion
> tip they matched exactly.  This time I have mismatches in the following
> files.

So what are the diffs? Are we talking about small differences (like one
change missing) or large-scale mismatches?


Bernd


Re: Repo conversion troubles.

2018-07-09 Thread Jeff Law
On 07/09/2018 01:19 PM, Eric S. Raymond wrote:
> Last time I did a comparison between SVN head and the git conversion
> tip they matched exactly.  This time I have mismatches in the following
> files.
> 
> libtool.m4
> libvtv/ChangeLog
> libvtv/configure
> libvtv/testsuite/lib/libvtv.exp
> ltmain.sh
> lto-plugin/ChangeLog
> lto-plugin/configure
> lto-plugin/lto-plugin.c
> MAINTAINERS
> maintainer-scripts/ChangeLog
> maintainer-scripts/crontab
> maintainer-scripts/gcc_release
> Makefile.def
> Makefile.in
> Makefile.tpl
> zlib/configure
> zlib/configure.ac
> 
> Now I'll explain what this means and why it's a serious problem.
[ ... ]
That's weird -- let's take maintainer-scripts/crontab as our victim.
That file (according to the git mirror) has only changed on the trunk 3
times in the last year.  They're all changes from Jakub and none look
unusual at all.  Just trivial looking updates.

libvtv.exp is another interesting file.  It changed twice in early May
of this year.  Prior to that it hadn't changed since 2015.


[ ... ]

> 
> There are brute-force ways to pin down such malformations, but none of
> them are practical at the huge scale of this repository.  The main
> problem here wouldn't be reposurgeon itself but the fact that Subversion
> checkouts on a repo this large are very slow. I've seen a single one
> take 12 hours; an attempt at a whole bisection run to pin down the
> divergence point on trunk would therefore probably cost log2 of the
> number of commits times that, or about 18 days.

I'm not aware of any such merges, but any that occurred most likely
happened after mid-April when the trunk was re-opened for development.

I'm assuming that it's only work that merges onto the trunk that's
potentially problematical here.

> 
> So...does that list of changed files look familiar to anyone?  If we can
> identify the revision number of the bad commit, the odds of being able
> to unscramble this mess go way up.  They still aren't good, not when
> merely loading the repository for examination takes over four hours,
> but they would be way better than if I were starting from zero.

They're familiar only in the sense that I know what those files are :-)

Jeff


Repo conversion troubles.

2018-07-09 Thread Eric S. Raymond
Last time I did a comparison between SVN head and the git conversion
tip they matched exactly.  This time I have mismatches in the following
files.

libtool.m4
libvtv/ChangeLog
libvtv/configure
libvtv/testsuite/lib/libvtv.exp
ltmain.sh
lto-plugin/ChangeLog
lto-plugin/configure
lto-plugin/lto-plugin.c
MAINTAINERS
maintainer-scripts/ChangeLog
maintainer-scripts/crontab
maintainer-scripts/gcc_release
Makefile.def
Makefile.in
Makefile.tpl
zlib/configure
zlib/configure.ac
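A list like this comes from comparing a Subversion checkout of trunk against a checkout of the git conversion's tip. The thread does not show how reposurgeon performs that comparison; what follows is a minimal Python sketch of the idea, assuming both trees are already checked out locally, skipping `.svn` and `.git` metadata directories, and comparing file contents byte for byte.

```python
import filecmp
import os
import tempfile

METADATA_DIRS = {".svn", ".git"}


def tree_files(root):
    """Relative paths of all files under root, skipping VCS metadata dirs."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune metadata directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in METADATA_DIRS]
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def content_mismatches(svn_root, git_root):
    """Paths present on only one side, or whose bytes differ."""
    svn_files, git_files = tree_files(svn_root), tree_files(git_root)
    mismatched = svn_files ^ git_files
    for rel in svn_files & git_files:
        if not filecmp.cmp(os.path.join(svn_root, rel),
                           os.path.join(git_root, rel), shallow=False):
            mismatched.add(rel)
    return sorted(mismatched)


def _write(root, rel, text):
    path = os.path.join(root, rel)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


# Tiny demonstration with two throwaway trees.
with tempfile.TemporaryDirectory() as svn_wc, tempfile.TemporaryDirectory() as git_wc:
    _write(svn_wc, "MAINTAINERS", "a\n")
    _write(git_wc, "MAINTAINERS", "a\n")
    _write(svn_wc, os.path.join("zlib", "configure.ac"), "new\n")
    _write(git_wc, os.path.join("zlib", "configure.ac"), "old\n")
    _write(git_wc, os.path.join(".git", "HEAD"), "ref\n")  # ignored metadata
    demo = content_mismatches(svn_wc, git_wc)
    print(demo)
```

A real check would also have to decide how to treat Subversion keyword expansion and end-of-line properties, which this sketch ignores.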

Now I'll explain what this means and why it's a serious problem.

Reposurgeon is never confused by linear history, branching, or
tagging; I have lots of regression tests for those cases.  When it
screws up it is invariably around branch copy operations, because
there are cases near those where the data model of Subversion stream
files is underspecified. That model was in fact entirely undocumented
before I reverse-engineered it and wrote the description that now
lives in the Subversion source tree.  But that description is not
complete; nobody, not even Subversion's designers, knows how to fill
in all the corner cases.

Thus, a content mismatch like this means there was some recent branch
merge to trunk in the gcc history that reposurgeon is not interpreting
as intended, or more likely an operator error such as a non-Subversion
directory copy followed by a commit - my analyzer can recover from
most such cases but not all.
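The branch copy operations described here appear in a Subversion dump stream as node records carrying `Node-copyfrom-path` and `Node-copyfrom-rev` headers. As a hedged starting point for locating candidate revisions (not reposurgeon's actual analysis), a Python sketch can scan a dump for copies whose destination lies under `trunk`, reading each record's headers and skipping its body by `Content-length`. It assumes a well-formed dump and does not interpret property or text contents.

```python
import io


def iter_records(stream):
    """Yield the headers of each record in an SVN dump stream.

    A record is a block of 'Key: value' lines ended by a blank line,
    followed by Content-length bytes of properties/text, which are skipped.
    """
    while True:
        line = stream.readline()
        if not line:
            return                      # end of stream
        if not line.strip():
            continue                    # blank lines separate records
        headers = {}
        while line and line.strip():
            key, _, value = line.decode("utf-8").rstrip("\r\n").partition(": ")
            headers[key] = value
            line = stream.readline()
        stream.read(int(headers.get("Content-length", "0")))
        yield headers


def copies_into(stream, prefix="trunk"):
    """Return (revision, path, copyfrom_path, copyfrom_rev) for copies into prefix."""
    hits, revision = [], None
    for headers in iter_records(stream):
        if "Revision-number" in headers:
            revision = int(headers["Revision-number"])
            continue
        path = headers.get("Node-path", "")
        inside = path == prefix or path.startswith(prefix + "/")
        if inside and "Node-copyfrom-path" in headers:
            hits.append((revision, path, headers["Node-copyfrom-path"],
                         int(headers["Node-copyfrom-rev"])))
    return hits


# Tiny hand-written dump: r1 adds a file on a branch, r2 copies it to trunk.
SAMPLE = (
    b"SVN-fs-dump-format-version: 2\n\n"
    b"Revision-number: 1\nProp-content-length: 10\nContent-length: 10\n\n"
    b"PROPS-END\n\n"
    b"Node-path: branches/foo/x.c\nNode-kind: file\nNode-action: add\n"
    b"Text-content-length: 4\nContent-length: 4\n\nabc\n\n"
    b"Revision-number: 2\nProp-content-length: 10\nContent-length: 10\n\n"
    b"PROPS-END\n\n"
    b"Node-path: trunk/x.c\nNode-kind: file\nNode-action: add\n"
    b"Node-copyfrom-rev: 1\nNode-copyfrom-path: branches/foo/x.c\n\n"
)
print(copies_into(io.BytesIO(SAMPLE)))
```

On a real repository the stream could come from `svnadmin dump` limited with `-r` to a suspect range (with `--incremental` when the range does not start at revision 0). A directory copied outside Subversion and then committed carries no copyfrom headers, so it would show up as plain adds that this scan does not flag.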

There are brute-force ways to pin down such malformations, but none of
them are practical at the huge scale of this repository.  The main
problem here wouldn't be reposurgeon itself but the fact that Subversion
checkouts on a repo this large are very slow. I've seen a single one
take 12 hours; an attempt at a whole bisection run to pin down the
divergence point on trunk would therefore probably cost log2 of the
number of commits times that, or about 18 days.
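The cost estimate rests on binary search: each check halves the range of revisions that could hold the first divergence, so about ceil(log2(N)) checks find it among N candidates. In this Python sketch, `is_good` is a hypothetical stand-in for "check out this revision and compare it with the conversion", and the head revision is assumed to be r262451, the newest revision cited in the thread.

```python
def worst_case_checks(good, bad):
    """Checks needed, at most, to find the first bad revision in (good, bad]."""
    # ceil(log2(bad - good)), computed exactly with integers.
    return (bad - good - 1).bit_length()


def first_bad(good, bad, is_good):
    """Binary search for the first revision at which is_good() becomes False.

    Assumes is_good(good) is True, is_good(bad) is False, and that the
    history flips from good to bad exactly once in between.
    Returns (first_bad_revision, checks_performed).
    """
    checks = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        checks += 1
        if is_good(mid):
            good = mid
        else:
            bad = mid
    return bad, checks


HEAD = 262451         # assumption: newest cited revision stands in for trunk head
HOURS_PER_CHECK = 12  # the single-checkout time quoted in the message

for label, good in (("whole history", 0), ("r256000 known good", 256000)):
    n = worst_case_checks(good, HEAD)
    print(f"{label}: {n} checks, about {n * HOURS_PER_CHECK / 24:.1f} days")

# Simulated run; 260000 is an invented first-bad revision, not a real finding.
print(first_bad(256000, HEAD, lambda rev: rev < 260000))
```

Under these assumptions the whole-history search needs 19 checks (about 9.5 days at 12 hours each), while starting from a known-good r256000 needs 13 (about 6.5 days); the 18-day figure above works out to roughly a day per step.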

So...does that list of changed files look familiar to anyone?  If we can
identify the revision number of the bad commit, the odds of being able
to unscramble this mess go way up.  They still aren't good, not when
merely loading the repository for examination takes over four hours,
but they would be way better than if I were starting from zero.

This is serious. I have produced demonstrably correct history
conversions of the gcc repo in the past.  We may now be in a situation
where I will never again be able to do that.
-- 
Eric S. Raymond

The real point of audits is to instill fear, not to extract revenue;
the IRS aims at winning through intimidation and (thereby) getting
maximum voluntary compliance
-- Paul Strassel, former IRS Headquarters Agent Wall St. Journal 1980