Re: Duplicate commits in git clone of src

2020-06-02 Thread J. Lewis Muir
On May 31, 2020, at 11:41 AM, Joerg Sonnenberger wrote:
> 
> On Sun, May 31, 2020 at 02:07:23PM +0200, Rhialto wrote:
>> I was looking at the git clone of the src repo
>> (https://github.com/netbsd/src) and I noticed that there are lots of
>> duplicate commits in there; some commits are even present 3 or 4 times.
>> At first I thought this occurs only with very old commits, but it is the
>> case for relatively recent ones as well.
> 
> There is no way to force a GC on GitHub short of tearing down the repo
> completely AFAIK.

Based on

  https://help.github.com/en/github/authenticating-to-github/removing-sensitive-data-from-a-repository

It looks like you might be able to request a GC from GitHub Support.  (I have
no actual experience doing this, though; it’s just what I read.  Also, I guess
it would be pointless if more unreachable commits will be added right away as
part of the normal conversion.  But if the unreachable commits are an artifact
of an old conversion, then maybe it would be worth asking for a GC.)

Lewis



Re: Duplicate commits in git clone of src

2020-05-31 Thread Joerg Sonnenberger
On Sun, May 31, 2020 at 03:06:42PM -0500, J. Lewis Muir wrote:
> On May 31, 2020, at 11:41 AM, Joerg Sonnenberger wrote:
> > 
> > On Sun, May 31, 2020 at 02:07:23PM +0200, Rhialto wrote:
> >> I was looking at the git clone of the src repo
> >> (https://github.com/netbsd/src) and I noticed that there are lots of
> >> duplicate commits in there; some commits are even present 3 or 4 times.
> >> At first I thought this occurs only with very old commits, but it is the
> >> case for relatively recent ones as well.
> > 
> > There is no way to force a GC on GitHub short of tearing down the repo
> > completely AFAIK.
> 
> Based on
> 
>   https://help.github.com/en/github/authenticating-to-github/removing-sensitive-data-from-a-repository
> 
> It looks like you might be able to request a GC from GitHub Support. 
> (I have no actual experience doing this, though; it’s just what I read.
> Also, I guess it would be pointless if more unreachable commits will
> be added right away as part of the normal conversion.  But if the
> unreachable commits are an artifact of an old conversion, then maybe
> it would be worth asking for a GC.)

Yeah, that's not worth doing before a final round.

Joerg


Re: Duplicate commits in git clone of src

2020-05-31 Thread Joerg Sonnenberger
On Sun, May 31, 2020 at 02:07:23PM +0200, Rhialto wrote:
> I was looking at the git clone of the src repo
> (https://github.com/netbsd/src) and I noticed that there are lots of
> duplicate commits in there; some commits are even present 3 or 4 times.
> At first I thought this occurs only with very old commits, but it is the
> case for relatively recent ones as well.

There is no way to force a GC on GitHub short of tearing down the repo
completely AFAIK.

Joerg


Duplicate commits in git clone of src

2020-05-31 Thread Rhialto
I was looking at the git clone of the src repo
(https://github.com/netbsd/src) and I noticed that there are lots of
duplicate commits in there; some commits are even present 3 or 4 times.
At first I thought this occurs only with very old commits, but it is the
case for relatively recent ones as well.

Normally this isn't so easy to see, but with gitk and these settings it
is fairly obvious: choose menu View -> New View; under References,
select All refs, All (local) branches, All tags, and All
remote-tracking branches. Lower down, select Strictly sort by date.

If you then scroll back just a few years of commits, you can find a bunch
of them below the time "2017-04-10 23:53:37".

Taking some random commits from 2017-03-22 23:37:41:

c75b502dcf23b51c8d2504be7a9b5dd7823e4a09 
Author: sevan   2017-03-22 23:37:41
Committer: sevan   2017-03-22 23:37:41
Parent: 20d6933e4ccdf0811b2b11f64dd019c016cea33e (On second through, it may be possible to have a NULL kfs_v in read and write)
Child:  fa4a1a6573dcb68fb2675cb80653b446a3231bb9 (KDTRACE_HOOKS is enabled by default in GENERIC.common, remove references in)
Branch: remotes/origin/jdolecek_ncq

d595117d197582e247e9d5d89ea2c3327feb9e3c
Author: sevan   2017-03-22 23:37:41
Committer: sevan   2017-03-22 23:37:41
Parent: 058026589ba723ce74452748b5e78aa0a7cd15bc (On second through, it may be possible to have a NULL kfs_v in read and write)
Child:  b13c9c92f5f3fb3b6e010d31acd1b2a6bd1b1c22 (KDTRACE_HOOKS is enabled by default in GENERIC.common, remove references in)
Branches: netbsd-9, remotes/origin/ad-namecache,
    remotes/origin/bouyer-xenpvh, remotes/origin/is-mlppp,
    remotes/origin/isaki-audio2, remotes/origin/jdolecek-ncq,
    remotes/origin/jdolecek-ncqfixes, remotes/origin/matt-nb8-mediatek,
    remotes/origin/netbsd-8, remotes/origin/netbsd-9,
    remotes/origin/perseant-stdc-iso10646, remotes/origin/pgoyette-compat,
    remotes/origin/phil-wifi, remotes/origin/prg-localcount2,
    remotes/origin/trunk, trunk

Looking at the differences between these, I notice a different
conversion of the author/committer name. Also, the first one is only on
branch "jdolecek_ncq".

The second one has an improved author/committer and is on several
branches, one of which is "jdolecek-ncq", with a dash rather than an
underscore.

With some other commits I saw, the branch names are "ROY" vs "roy".
Around 1999-12-05 you can see triple commits (but there are too many
branches and gitk doesn't show them, so analyzing that is more
difficult).

My guess here is that there was an incremental conversion, with
improvements in author and branch name conversion along the way. But
commits and branches from earlier processing stayed in the result, and
hence the duplicates.
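
A rough way to check this guess without gitk is to group all commits by
author timestamp and subject line. The following is only a minimal
sketch: the function names are made up for illustration, the input is
assumed to come from "git log --all --format=%H%x09%at%x09%s", and the
author name is deliberately left out of the key because it appears to
differ between conversions. A shared key only suggests that two commits
are conversions of the same change; it does not prove it.

```python
import subprocess
from collections import defaultdict


def read_log(repo_path):
    """Return one 'hash<TAB>author-timestamp<TAB>subject' line per commit on any ref."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "--format=%H%x09%at%x09%s"],
        check=True,
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # old commit messages may not be valid UTF-8
    )
    return result.stdout.splitlines()


def find_duplicates(log_lines):
    """Group commit hashes by (author timestamp, subject); keep groups of two or more."""
    groups = defaultdict(list)
    for line in log_lines:
        if not line.strip():
            continue
        # Split at most twice so tabs inside the subject are preserved.
        commit, timestamp, subject = line.split("\t", 2)
        groups[(int(timestamp), subject)].append(commit)
    return {key: hashes for key, hashes in groups.items() if len(hashes) > 1}
```

Calling find_duplicates(read_log("src")) on a local clone (the path
"src" is a placeholder) would return a dictionary whose values are the
lists of hashes that share a timestamp and subject.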

Maybe it just needs a fresh conversion from the start to get rid of
these duplicates. Or if that is not feasible, removal of the outdated
branches from the origin repo would probably help a lot.
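
To find candidate branches for such a cleanup, one could look for
branch names that become identical once case and dash/underscore
differences are ignored. This is a minimal sketch under that assumed
rule, which is based only on the two examples above ("jdolecek_ncq" vs
"jdolecek-ncq", "ROY" vs "roy"); the function names are hypothetical,
and the result does not say which name of a pair is the outdated one.

```python
from collections import defaultdict


def normalize(branch):
    """Fold case and treat '_' like '-' (an assumed rule, not the converter's)."""
    return branch.lower().replace("_", "-")


def colliding_branches(branches):
    """Return groups of distinct branch names that normalize to the same key."""
    groups = defaultdict(set)
    for name in branches:
        groups[normalize(name)].add(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}
```

The input could be the list of remote-tracking branch names, for
example as printed by "git branch -r".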

But it is cool to be able to look back all the way to 1992 to the first
commit!

-Olaf.
-- 
Olaf 'Rhialto' Seibert -- rhialto at falu dot nl
___  Anyone who is capable of getting themselves made President should on
\X/  no account be allowed to do the job.   --Douglas Adams, "THGTTG"

