I was looking at the git clone of the src repo
(https://github.com/netbsd/src) and I noticed that there are lots of
duplicate commits in there; some commits are even present 3 or 4 times.
At first I thought this occurs only with very old commits, but it is the
case for relatively recent ones as well.
Normally this isn't so easy to see, but with gitk and these settings it
is fairly obvious: choose menu View -> New View, select under
References: All refs, All (local) branches, All tags, All
remote-tracking branches. Lower down, select Strictly sort by date.
If you then scroll back just a few years of commits, you can find a bunch
of duplicates below the time "2017-04-10 23:53:37".
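The same thing can probably be checked without gitk. Below is a rough
sketch in Python, assuming that two commits with the same author timestamp
and the same subject line are duplicates of each other. That is only a
heuristic and may flag unrelated commits. The function names
(group_duplicates, read_log) are made up for this example; the git options
used are --all and the format placeholders %H, %at, %s and %x09.

```python
import subprocess
from collections import defaultdict


def group_duplicates(log_lines):
    """Group commit hashes by (author timestamp, subject).

    Each input line is expected to look like "<sha>\t<unix time>\t<subject>".
    Only groups with more than one commit are returned.
    """
    groups = defaultdict(list)
    for line in log_lines:
        if not line.strip():
            continue
        sha, timestamp, subject = line.split("\t", 2)
        groups[(int(timestamp), subject)].append(sha)
    return {key: shas for key, shas in groups.items() if len(shas) > 1}


def read_log(repo="."):
    """Return one line per commit reachable from any ref in the repo."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--all", "--format=%H%x09%at%x09%s"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # old commit messages may not be valid UTF-8
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    duplicates = group_duplicates(read_log())
    for (timestamp, subject), shas in sorted(duplicates.items()):
        print(timestamp, subject)
        for sha in shas:
            print("    " + sha)
```

Run from inside the clone, this prints each suspicious timestamp and
subject followed by the hashes that share them.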
Taking some random commits from 2017-03-22 23:37:41:
c75b502dcf23b51c8d2504be7a9b5dd7823e4a09
Author: sevan 2017-03-22 23:37:41
Committer: sevan 2017-03-22 23:37:41
Parent: 20d6933e4ccdf0811b2b11f64dd019c016cea33e (On second through, it may
be possible to have a NULL kfs_v in read and write)
Child: fa4a1a6573dcb68fb2675cb80653b446a3231bb9 (KDTRACE_HOOKS is enabled
by default in GENERIC.common, remove references in)
Branch: remotes/origin/jdolecek_ncq
d595117d197582e247e9d5d89ea2c3327feb9e3c
Author: sevan 2017-03-22 23:37:41
Committer: sevan 2017-03-22 23:37:41
Parent: 058026589ba723ce74452748b5e78aa0a7cd15bc (On second through, it may
be possible to have a NULL kfs_v in read and write)
Child: b13c9c92f5f3fb3b6e010d31acd1b2a6bd1b1c22 (KDTRACE_HOOKS is enabled
by default in GENERIC.common, remove references in)
Branches: netbsd-9, remotes/origin/ad-namecache,
remotes/origin/bouyer-xenpvh, remotes/origin/is-mlppp,
remotes/origin/isaki-audio2, remotes/origin/jdolecek-ncq,
remotes/origin/jdolecek-ncqfixes, remotes/origin/matt-nb8-mediatek,
remotes/origin/netbsd-8, remotes/origin/netbsd-9,
remotes/origin/perseant-stdc-iso10646, remotes/origin/pgoyette-compat,
remotes/origin/phil-wifi, remotes/origin/prg-localcount2, remotes/origin/trunk,
trunk
Looking at the differences between these, I notice a different
conversion of the author/committer name. Also, the first one is only on
branch "jdolecek_ncq".
The second one has an improved author/committer and is on several
branches, one of which is "jdolecek-ncq", with a dash rather than an
underscore.
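To see exactly which header fields differ between two such commits, the
raw commit objects can be compared. This is a minimal sketch, assuming the
usual raw commit layout (header lines, a blank line, then the message) as
printed by "git cat-file commit <sha>". The helper names are invented
here, and multi-line headers such as gpgsig are not handled.

```python
import subprocess


def parse_commit(raw):
    """Split a raw commit object into a dict of header fields plus message."""
    header_text, _, message = raw.partition("\n\n")
    fields = {"parent": []}
    for line in header_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parent"].append(value)  # merges have several parents
        else:
            fields[key] = value
    fields["message"] = message
    return fields


def differing_fields(raw_a, raw_b):
    """Return the sorted names of the fields that differ between two commits."""
    a, b = parse_commit(raw_a), parse_commit(raw_b)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


def read_commit(sha, repo="."):
    """Fetch the raw commit object for sha from the repository."""
    result = subprocess.run(
        ["git", "-C", repo, "cat-file", "commit", sha],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    first = read_commit("c75b502dcf23b51c8d2504be7a9b5dd7823e4a09")
    second = read_commit("d595117d197582e247e9d5d89ea2c3327feb9e3c")
    print(differing_fields(first, second))
```

If "tree" is not in the printed list, both commits record the same source
tree and only the metadata around it differs.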
With some other commits I saw, the branch names are "ROY" vs "roy".
Around 1999-12-05 you can see triple commits (but there are too many
branches and gitk doesn't show them, so analyzing that is more
difficult).
My guess here is that there was an incremental conversion, with
improvements in author and branch name conversion along the way. But
commits and branches from earlier processing stayed in the result, and
hence the duplicates.
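One way to test this guess would be to look for branch names that differ
only in case or in dash versus underscore, since those are the variations
I saw. A small sketch, assuming those two are the only renamings that
happened (there may well be others). The function names are made up for
illustration.

```python
from collections import defaultdict


def short_name(ref):
    """Drop a leading "remotes/<remote>/" so local and remote names compare equal."""
    parts = ref.split("/")
    if parts[0] == "remotes" and len(parts) > 2:
        return "/".join(parts[2:])
    return ref


def normalize(name):
    """Fold the spelling variations seen so far: case and '_' versus '-'."""
    return name.lower().replace("_", "-")


def colliding_refs(refs):
    """Return groups of distinct branch names that normalize to the same key."""
    groups = defaultdict(set)
    for ref in refs:
        name = short_name(ref)
        groups[normalize(name)].add(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # Names taken from the gitk output above, plus the ROY/roy pair.
    refs = [
        "remotes/origin/jdolecek_ncq",
        "remotes/origin/jdolecek-ncq",
        "remotes/origin/netbsd-9",
        "netbsd-9",
        "ROY",
        "roy",
        "trunk",
    ]
    for key, names in sorted(colliding_refs(refs).items()):
        print(key, "->", ", ".join(names))
```

Branches that show up in such a group would be the first candidates to
check as leftovers from an earlier conversion run.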
Maybe it just needs a fresh conversion from the start to get rid of
these duplicates. Or if that is not feasible, removal of the outdated
branches from the origin repo would probably help a lot.
But it is cool to be able to look back all the way to 1992 to the first
commit!
-Olaf.
--
Olaf 'Rhialto' Seibert -- rhialto at falu dot nl
___ Anyone who is capable of getting themselves made President should on
\X/ no account be allowed to do the job. --Douglas Adams, "THGTTG"