[gentoo-dev] Re: gentoo git workflow
C. Bergström posted on Mon, 15 Sep 2014 02:49:48 +0700 as excerpted: > Pretty please do NOT allow "merge" commits.. they are the bane of evil > for the long term ability to have any sane work-flow. Try browsing a > commit history after a big merge commit.. or following the parent.. You just called the inventor of git and the workflow it was designed to manage insane. If that's the case, arguably quite a bit more insanity would be a good thing! =:^) Try git log --color --stat --graph on the mainline kernel in a terminal and read only the main merge-commit logs, unless a merge is something of special interest you want more info on. It actually makes following a full kernel cycle, including the commit window, drilling down to sub-merges and individual commits only on 2-3 areas of interest while keeping a general awareness of developments in the rest of the kernel, not only practical once again but relatively easy. Without seeing merge-commits it was a LOT harder. I know, as I've done it both ways, and while I can get around in git to some extent, my git skills are definitely nothing special! =:^) -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
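Duncan's merge-first reading style can be reproduced on any repository. A minimal sketch on a throwaway repo (repo and file names are illustrative; assumes git is installed):

```shell
# Build a toy repo with one side branch and one merge commit.
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo" && git config user.email "demo@example.org"
echo base > file && git add file && git commit -qm "base"
main=$(git symbolic-ref --short HEAD)      # usually master or main
git checkout -qb topic
echo change > file && git commit -qam "topic: change file"
git checkout -q "$main"
git merge -q --no-ff -m "Merge branch 'topic'" topic
# Top-level view: merge commits only, following the first-parent line.
git log --merges --first-parent --oneline
# Drill down into one merge's side branch only when it looks interesting:
git log --oneline HEAD^1..HEAD^2
```

`--first-parent` keeps the log on the mainline so each merge reads as one summary entry, and the range `HEAD^1..HEAD^2` expands a single merge into its individual commits on demand.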
[gentoo-dev] Re: My masterplan for git migration (+ looking for infra to test it)
hasufell posted on Sun, 14 Sep 2014 13:50:32 + as excerpted: > Jauhien Piatlicki: >> >> Again, how will user check the integrity and authenticity if Manifests >> are unsigned? > There is no regression if this isn't solved. > People who really care use emerge-webrsync. > If we use the proposed solution, then there is an additional method via > the User syncing repo, so it's a win. Absolutely. emerge-webrsync is the current "I care enough to worry about it" method, and this already adds the user-sync git repo as an "I care" option. Leaving standard rsync users where they already are isn't a regression and shouldn't be a blocker. Don't let the perfect be the enemy of the imperfect but better! =:^) -- Duncan - List replies preferred. No HTML msgs.
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On 2014-09-14 21:57, Kent Fredric wrote: > I generate metadata for the perl-experimental overlay periodically as a > snapshotted variation of the same, and the performance isn't so bad. Overlays with few eclasses are much different from the main tree. Anyway, egencache isn't bad; it's just significantly slower than the alternatives, so it could be sped up quite a lot if necessary. > However, what I suspect you *could* do with a push hook is regen metadata > for only things that were modified in that commit, because I believe > there's a way to regen metadata for only specific files now. > ie: > modifications to cat/PN *would* trigger a metadata update, but only for > that cat/PN > modifications to eclass/* would *NOT* trigger a metadata update as part of > the push. > And doing tree-wide "an eclass was changed" updates could be done with > lower priority in an asynchronous cron job or something so as not to block > workflow for several minutes/hours/whatever while some muppet sits there > watching "git push" do nothing. If we need to do piecewise regen, it seems we would be better off just sticking with the current scheduled cron job approach. Otherwise it sounds like one could pull updates without having the correct metadata for a significant portion of the tree. Tim
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On 15 September 2014 13:30, Tim Harder wrote: > I haven't run portage metadata regen on a beefy machine lately, but I > don't think it could keep up in all cases. Perhaps someone can prove me > wrong. > > Anyway, things could definitely be sped up if portage merges a few speed > tweaks used in pkgcore. Specifically, I think using some of the weakref > and perhaps jitted attrs support along with the eclass caching hacks > would give a 2-4x metadata regen speedup. Otherwise pkgcore could > potentially be used to regen metadata as well or some other tuned regen > tool. > I generate metadata for the perl-experimental overlay periodically as a snapshotted variation of the same, and the performance isn't so bad. However, what I suspect you *could* do with a push hook is regen metadata for only things that were modified in that commit, because I believe there's a way to regen metadata for only specific files now. ie: modifications to cat/PN *would* trigger a metadata update, but only for that cat/PN modifications to eclass/* would *NOT* trigger a metadata update as part of the push. And doing tree-wide "an eclass was changed" updates could be done with lower priority in an asynchronous cron job or something so as not to block workflow for several minutes/hours/whatever while some muppet sits there watching "git push" do nothing. -- Kent *KENTNL* - https://metacpan.org/author/KENTNL
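Kent's per-commit filtering idea could be sketched as a push-hook fragment along these lines. Everything here is a hypothetical illustration — the tree layout, file names, and the "regen" actions are stand-ins, not Gentoo's actual infrastructure:

```shell
# Toy tree with one package directory and one eclass.
tmp=$(mktemp -d) && cd "$tmp" && git init -q tree && cd tree
git config user.name "Demo" && git config user.email "demo@example.org"
mkdir -p app-misc/hello eclass
echo ebuild > app-misc/hello/hello-1.ebuild
echo eclass > eclass/demo.eclass
git add . && git commit -qm "initial"
# A pushed commit touching both a package and an eclass:
echo bump > app-misc/hello/hello-2.ebuild
echo tweak >> eclass/demo.eclass
git add . && git commit -qm "bump hello; tweak demo.eclass"
# Hook logic: inspect what the commit changed ...
changed=$(git diff --name-only HEAD^ HEAD)
# ... regen immediately only for the touched cat/PN directories ...
pkgs=$(echo "$changed" | grep -E '^[^/]+/[^/]+/' | cut -d/ -f1-2 | sort -u)
echo "regen now: $pkgs"
# ... and queue the expensive tree-wide regen asynchronously for eclasses.
echo "$changed" | grep -q '^eclass/' && echo "queue full regen (async)"
```

In a real hook the two `echo` lines would kick off the fast per-package regen and enqueue the slow tree-wide job, matching the split Kent describes.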
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On 2014-09-14 10:46, Michał Górny wrote: > On 2014-09-14, at 15:40:06, Davide Pesavento wrote: > > How long does the md5-cache regeneration process take? Are you sure it > > will be able to keep up with the rate of pushes to the repo during > > "peak hours"? If not, maybe we could use a time-based thing similar to > > the current cvs->rsync synchronization. > > This strongly depends on how much data there is to update. A few > ebuilds are quite fast, an eclass change isn't ;). I was thinking of > something along the lines of, in pseudo-code speaking: > > systemctl restart cache-regen > > That is, we start the regen on every update. If it finishes in time, it > commits the new metadata. If another update occurs during regen, we > just restart it to let it catch the new data. > > Of course, if we can't spare the resources to do intermediate updates, > we may as well switch to a cron-based update method. I don't see per-push metadata regen working entirely well in this case if this is the only way we're generating the metadata cache for users to sync. It's easy to imagine a plausible situation where a widely used eclass change is made, followed by commits less than a minute apart (or shorter than however long it would take for metadata regen to occur) for at least 30 minutes (the rsync refresh period for most user-facing mirrors) during a time of high activity. I haven't run portage metadata regen on a beefy machine lately, but I don't think it could keep up in all cases. Perhaps someone can prove me wrong. Anyway, things could definitely be sped up if portage merges a few speed tweaks used in pkgcore. Specifically, I think using some of the weakref and perhaps jitted attrs support along with the eclass caching hacks would give a 2-4x metadata regen speedup. Otherwise pkgcore could potentially be used to regen metadata as well, or some other tuned regen tool. Tim
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On 15 September 2014 13:06, Peter Stuge wrote: > even after > the commits. > I've even made branches in "detached head" state ( that is, without a branch ) and given them branches after the fact. After all, branches aren't really "things", they're just pointers to SHA1s, that get repointed to new sha1's as part of "git commit". Tags are also simply pointers, they just don't get updated by default. -- Kent *KENTNL* - https://metacpan.org/author/KENTNL
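Kent's detached-HEAD example is easy to reproduce, and it demonstrates the "branches are just pointers" point directly (toy repo, illustrative names; assumes git is installed):

```shell
tmp=$(mktemp -d) && cd "$tmp" && git init -q repo && cd repo
git config user.name "Demo" && git config user.email "demo@example.org"
echo one > file && git add file && git commit -qm "first"
git checkout -q --detach          # no branch now: just a floating HEAD
echo two > file && git commit -qam "experiment"
# A branch is only a movable pointer to a SHA1, so it can be
# created after the commits already exist:
git branch experiment
git rev-parse experiment          # same SHA1 as the detached commit
```

Nothing is copied or rewritten by `git branch` here — it just writes the current commit's SHA1 under `refs/heads/experiment`.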
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
Rich Freeman wrote: > If you just want to do 15 standalone commits before you push you can > do those sequentially easily enough. A branch would be more > appropriate for some kind of mini-project. .. > That is the beauty of git - branches are really cheap. > So are repositories And commits. Not only are branches cheap, they are also very easy to create, and maybe most importantly they can be created at any time, even after the commits. It's quick and painless to create a bunch of commits which aren't really closely related in sequence, and only later clean the whole series of commits up while creating different branches for commits which should actually be grouped rather than mixed all together. //Peter
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
Patrick Lauer wrote: > > > That'd mean I need half a dozen checkouts just to emulate cvs, which > > > somehow doesn't make much sense to me ... > > > > Unlike CVS, git doesn't force you to work in "Keep millions of files in > > uncommitted states" mode just to work on a codebase, due to the commit <-> > > replicate separation. > > But that's the feature! You can have millions of uncommitted files with git too. The person who creates a commit always decides what changes in which files should be included in that commit. (You don't even have to commit all the changes within one file at the same time.) There are some shortcuts for committing all uncommitted changes at once, but you don't have to use them. I frequently commit only little bits of my currently uncommitted changes. > I can work on bumping postgresql (takes about 1h walltime to compile and test > all versions) *and* work on a few tiny python packages while doing that. > Without breaking either process. Without multiple checkouts. Same with git. //Peter
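Peter's selective-commit point in concrete terms: the index lets you commit one change while leaving other uncommitted work untouched in the same checkout (toy repo; the postgresql/python file names just echo the thread's example):

```shell
tmp=$(mktemp -d) && cd "$tmp" && git init -q repo && cd repo
git config user.name "Demo" && git config user.email "demo@example.org"
echo a > postgresql.txt && echo b > python.txt
git add . && git commit -qm "initial"
echo "bump in progress" >> postgresql.txt   # long-running work, not ready
echo "small fix"        >> python.txt       # quick unrelated fix
git add python.txt                # stage only the quick fix ...
git commit -qm "python: small fix"  # ... and commit it alone
git status --porcelain            # postgresql.txt is still dirty, untouched
```

(`git add -p` goes further and stages individual hunks, which is the "not even all changes within one file" case.)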
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On Sun, Sep 14, 2014 at 7:21 PM, Patrick Lauer wrote: > > iow, git doesn't allow people to work on more than one item at a time? > > That'd mean I need half a dozen checkouts just to emulate cvs, which somehow > doesn't make much sense to me ... > Well, you can work on as many things as you like in git, but it doesn't keep track of what changes have to do with what things if you don't commit in-between. So, you'll have a big list of changes in your index, and you'll have to pick-and-choose what you commit at any one time. If you really want to work on many things "at once", the better way to do it is to do a temporary branch per thing; when you switch between things you switch between branches, and then move things into master as they are done. I assume you mean working on things that will take a while to complete. If you just want to do 15 standalone commits before you push you can do those sequentially easily enough. A branch would be more appropriate for some kind of mini-project. You can work on branches without pushing those to the master repo. Or, if appropriate, a project team might choose to push their branch to master, or to some other repo (like an overlay). This would allow collaborative work on a large commit, with a quick final merge into the main tree. That is the beauty of git - branches are really cheap. So are repositories - if somebody wants to do all their work in github and then push to the main tree, they can do that. -- Rich
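Rich's branch-per-thing workflow, sketched on a toy repo (branch names are illustrative; `git checkout -b` matches the git of the era, newer git also has `git switch -c`):

```shell
tmp=$(mktemp -d) && cd "$tmp" && git init -q repo && cd repo
git config user.name "Demo" && git config user.email "demo@example.org"
echo base > tree.txt && git add tree.txt && git commit -qm "base"
main=$(git symbolic-ref --short HEAD)
git checkout -qb bump-postgresql        # task 1: will take a while
echo pg >> tree.txt && git commit -qam "wip: postgresql bump"
git checkout -qb fix-python "$main"     # task 2: quick, independent
echo py >> tree.txt && git commit -qam "python: version bump"
git checkout -q "$main"
git merge -q fix-python                 # the quick task lands first;
git log --oneline                       # the slow one stays on its branch
```

Each in-progress task lives on its own cheap branch, so switching tasks is a branch switch, not a second checkout of the whole tree.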
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
Patrick Lauer: > On Monday 15 September 2014 11:27:34 Kent Fredric wrote: >> On 15 September 2014 11:21, Patrick Lauer wrote: >>> iow, git doesn't allow people to work on more than one item at a time? >>> >>> That'd mean I need half a dozen checkouts just to emulate cvs, which >>> somehow >>> doesn't make much sense to me ... >> >> Use the Stash. Or just commit items, then swap branches, and then discard >> the commits sometime later before pushing. >> >> Unlike CVS, git doesn't force you to work in "Keep millions of files in >> uncommitted states" mode just to work on a codebase, due to the commit <-> >> replicate separation. > But that's the feature! > > I can work on bumping postgresql (takes about 1h walltime to compile and test > all versions) *and* work on a few tiny python packages while doing that. > Without breaking either process. Without multiple checkouts. > > I doubt stash would allow things to progress ... but it's a cute idea. > Please read up on git branches. I don't see anything particularly broken. People use git to work on 10+ different features at a time. It works. Also, let's not derail this thread into git vs CVS, thanks.
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On 15 September 2014 11:25, hasufell wrote: > Robin said > > The Git commit-signing design explicitly signs the entire commit, > including blob contents, to avoid this security problem. > > Is this correct or not? > I can verify a commit by hand with only the commit object and gpg, but without any of the trees or parents. https://gist.github.com/kentfredric/8448fe55ffab7d314ecb -- Kent *KENTNL* - https://metacpan.org/author/KENTNL
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
Patrick Lauer: > On Sunday 14 September 2014 15:42:15 hasufell wrote: >> Patrick Lauer: Are we going to disallow merge commits and ask devs to rebase local changes in order to keep the history "clean"? >>> >>> Is that going to be sane with our commit frequency? >> >> You have to merge or rebase anyway in case of a push conflict, so the >> only difference is the method and the effect on the history. >> >> Currently... CVS allows you to run repoman on an outdated tree and push >> broken ebuilds with repoman being happy. Git will not allow this. > > iow, git doesn't allow people to work on more than one item at a time? > Completely the opposite. You can work on 400 packages, accumulate the changes, commit them and push them in one go, instead of writing fragile scripts or Makefiles that do >400 pushes, fail at some point in the middle because of a conflict, and then trying to figure out what you already pushed and what you didn't. > That'd mean I need half a dozen checkouts just to emulate cvs, which somehow > doesn't make much sense to me ... > checkouts? You probably mean that you have to rebase your changes in case someone pushed before you. That makes perfect sense, because the ebuild you just wrote might be broken by now, because someone changed profiles/. We are talking about a one-liner in the shell that will work in the majority of cases. If it doesn't work (as in: a merge conflict), then something is REALLY wrong and two people are working uncoordinated on the same file at the same time.
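The "one-liner in the shell" hasufell mentions is essentially pull-rebase-push. A toy reproduction with a local bare repo standing in for the central server (all names illustrative; assumes git is installed):

```shell
tmp=$(mktemp -d) && cd "$tmp"
git init -q --bare origin.git           # stand-in for the central repo
git clone -q origin.git alice 2>/dev/null
git clone -q origin.git bob   2>/dev/null
cd alice
git config user.name Alice && git config user.email alice@example.org
br=$(git symbolic-ref --short HEAD)     # default branch name
echo base > f && git add f && git commit -qm "base"
git push -q origin "$br"
cd ../bob                               # meanwhile, someone else pushes:
git config user.name Bob && git config user.email bob@example.org
git pull -q origin "$br"
echo g > g && git add g && git commit -qm "bob: add g"
git push -q origin "$br"
cd ../alice                             # our plain push would be rejected,
echo h > h && git add h && git commit -qm "alice: add h"
git pull -q --rebase origin "$br"       # so: replay our commit on top ...
git push -q origin "$br"                # ... and push; no merge commit
git log --oneline
```

The rebase replays Alice's commit on top of what Bob pushed; only a real content conflict (two people editing the same lines) would require manual intervention.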
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On Monday 15 September 2014 11:27:34 Kent Fredric wrote: > On 15 September 2014 11:21, Patrick Lauer wrote: > > iow, git doesn't allow people to work on more than one item at a time? > > > > That'd mean I need half a dozen checkouts just to emulate cvs, which > > somehow > > doesn't make much sense to me ... > > Use the Stash. Or just commit items, then swap branches, and then discard > the commits sometime later before pushing. > > Unlike CVS, git doesn't force you to work in "Keep millions of files in > uncommitted states" mode just to work on a codebase, due to the commit <-> > replicate separation. But that's the feature! I can work on bumping postgresql (takes about 1h walltime to compile and test all versions) *and* work on a few tiny python packages while doing that. Without breaking either process. Without multiple checkouts. I doubt stash would allow things to progress ... but it's a cute idea.
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On Sun, Sep 14, 2014 at 11:25:33PM +, hasufell wrote: > So can we get this clear now. > > Robin said > > > The Git commit-signing design explicitly signs the entire commit, > > including blob contents, to avoid this security problem. > > Is this correct or not? That is false. The commit signature explicitly signs the commit, which includes the root tree hash. That is the only connection between the signature and the tree contents. Cheers, Trevor -- This email may be signed or encrypted with GnuPG (http://www.gnupg.org). For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On 15 September 2014 11:21, Patrick Lauer wrote: > iow, git doesn't allow people to work on more than one item at a time? > > That'd mean I need half a dozen checkouts just to emulate cvs, which > somehow > doesn't make much sense to me ... > Use the Stash. Or just commit items, then swap branches, and then discard the commits sometime later before pushing. Unlike CVS, git doesn't force you to work in "Keep millions of files in uncommitted states" mode just to work on a codebase, due to the commit <-> replicate separation. -- Kent *KENTNL* - https://metacpan.org/author/KENTNL
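Kent's stash suggestion in practice: park half-finished work, do something unrelated in the same checkout, then resume exactly where you left off (toy repo; file names are illustrative):

```shell
tmp=$(mktemp -d) && cd "$tmp" && git init -q repo && cd repo
git config user.name "Demo" && git config user.email "demo@example.org"
echo base > pkg && git add pkg && git commit -qm "base"
echo "half-finished bump" >> pkg   # work in progress, not ready to commit
git stash                          # park it; the worktree is clean again
echo fix > other && git add other && git commit -qm "unrelated quick fix"
git stash pop                      # resume: the in-progress change is back
git status --porcelain
```

The stash is just a stack of temporary commits, so this composes with the branch-per-task approach: stash, switch branch, work, switch back, pop.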
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
Rich Freeman: > On Sun, Sep 14, 2014 at 6:56 PM, hasufell wrote: >> According to Robin, it's not about rebasing, it's about signing all >> commits so that messing with the blob (even if it has the same sha-1) >> will cause signature verification failure. >> > > The only thing that gets signed is the commit message, and the only > thing that ties the commit message to the code is the sha1 of the > top-level tree. If you can attack sha1 either at any tree level or at > the blob level you can defeat the signature. > So can we get this clear now. Robin said > The Git commit-signing design explicitly signs the entire commit, including > blob contents, to avoid this security problem. Is this correct or not?
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On Sun, Sep 14, 2014 at 07:13:21PM -0400, Rich Freeman wrote: > The only thing that gets signed is the commit message, and the only > thing that ties the commit message to the code is the sha1 of the > top-level tree. If you can attack sha1 either at any tree level or at > the blob level you can defeat the signature. > > That is way better than nothing though - I think it is worth pursuing > until somebody comes up with a way to upgrade git to more secure > hashes. Most projects don't gpg sign their trees at all, including > linux. I'm not worried about the attack (as I explained earlier in this thread). I'm just arguing for signing first-parent commits to master, and not worrying about signatures on any side-branch commits. So long as the merge gets signed, you've got all the security you're going to get. Leaving the side-branch commits unchanged allows you to preserve any non-dev commit hashes, which makes it easier for contributors to verify that their changes have landed (the same way that GitHub is checking to know when to automatically close pull requests). Cheers, Trevor
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On 15 September 2014 11:15, W. Trevor King wrote: > All cherry-pick and am do is apply one commit's diff to a different > parent. Changing the parent hash (which is stored in the commit body > [1]), so old signatures won't apply to the new commit. If there have > been other tree changes between the initial parent and the new parent, > the tree hash will also change, which would also break old signatures. > None of that has anything to do with a malicious blob being pushed > into the tree disguised as a same-hashed good blob. Such a blob will > *not* break any signatures, since GnuPG is *never hashing the blob > contents* when signing commits [1,2]. You're only signing the commit > object, not the tree and blob objects referenced by that commit. > > Cheers, > Trevor > And given that the "security" against attacks is established by a chain of custody from a signed commit through multiple unsigned child SHA1 objects, a parent being an unsigned commit is no *less* secure than a tree or file blob being unsigned. So it doesn't make sense to me that "all" commits would have to be signed ( because doing so doesn't give the security benefit we think it does ). Thus, an "I signed this commit, establishing a chain of trust relying on SHA1 integrity back to the previous signed commit" is all that seems truly necessary. Anything else is decreased utility with no increase in security. -- Kent *KENTNL* - https://metacpan.org/author/KENTNL
[gentoo-dev] Last rites: net-misc/netcomics-cvs
Masked for removal in 30 days. See bug #515028. Ancient and unmaintained. -- Dion Moult
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On Sunday 14 September 2014 15:42:15 hasufell wrote: > Patrick Lauer: > >> Are we going to disallow merge commits and ask devs to rebase local > >> changes in order to keep the history "clean"? > > > > Is that going to be sane with our commit frequency? > > You have to merge or rebase anyway in case of a push conflict, so the > only difference is the method and the effect on the history. > > Currently... CVS allows you to run repoman on an outdated tree and push > broken ebuilds with repoman being happy. Git will not allow this. iow, git doesn't allow people to work on more than one item at a time? That'd mean I need half a dozen checkouts just to emulate cvs, which somehow doesn't make much sense to me ...
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On 15 September 2014 10:56, hasufell wrote: > According to Robin, it's not about rebasing, it's about signing all > commits so that messing with the blob (even if it has the same sha-1) > will cause signature verification failure. > Correct me if I'm wrong, but wouldn't a SHA1 attack on the tree object or file blobs be completely invisible to the commit SHA1, as the signature only signs the content of the commit object, not any of the nodes it refers to? Granted, producing a colliding tree/file object might be interesting. -- Kent *KENTNL* - https://metacpan.org/author/KENTNL
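Kent's premise is easy to check by inspecting a raw commit object: it contains only a tree SHA1, parent SHA1s, author/committer lines and the message, so that text is all a commit signature can directly cover (toy repo; assumes git is installed):

```shell
tmp=$(mktemp -d) && cd "$tmp" && git init -q repo && cd repo
git config user.name "Demo" && git config user.email "demo@example.org"
echo data > blob.txt && git add blob.txt && git commit -qm "add blob"
# Print the raw commit object - the payload a signature would cover:
git cat-file commit HEAD
# Note the output: a "tree <sha1>" line, author/committer lines and the
# message. The file contents ("data") appear only indirectly, via the
# tree's SHA1.
```

With `git commit -S`, the signature is computed over exactly this text (plus headers), which is why blob contents are only reachable through the chain of SHA1s.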
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On Sun, Sep 14, 2014 at 10:56:33PM +, hasufell wrote: > W. Trevor King: > > On Sun, Sep 14, 2014 at 10:38:41PM +, hasufell wrote: > >> So we'd basically end up using either "git cherry-pick" or "git > >> am" for "pulling" user stuff, so that we also sign the blobs. > > > > Rebasing the original commits doesn't protect you from the > > birthday attack either, because the vulnerable hash is likely > > going to still be in the rebased commit's tree. All rebasing does > > is swap the committer and drop the initial signature. > > According to Robin, it's not about rebasing, it's about signing all > commits so that messing with the blob (even if it has the same > sha-1) will cause signature verification failure. All cherry-pick and am do is apply one commit's diff to a different parent, changing the parent hash (which is stored in the commit body [1]), so old signatures won't apply to the new commit. If there have been other tree changes between the initial parent and the new parent, the tree hash will also change, which would also break old signatures. None of that has anything to do with a malicious blob being pushed into the tree disguised as a same-hashed good blob. Such a blob will *not* break any signatures, since GnuPG is *never hashing the blob contents* when signing commits [1,2]. You're only signing the commit object, not the tree and blob objects referenced by that commit. Cheers, Trevor [1]: http://article.gmane.org/gmane.linux.gentoo.devel/77537 [2]: http://git.kernel.org/cgit/git/git.git/tree/commit.c?id=v2.1.0#n1076
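Trevor's point — blob contents are never hashed by GnuPG, only chained through SHA1 — rests on how git computes object IDs: SHA1 over `"<type> <size>\0<payload>"`. That is easy to verify by hand (assumes git and coreutils `sha1sum` are available):

```shell
# The blob ID for a file containing just "hello\n", as git computes it:
echo hello | git hash-object --stdin
# The same ID reproduced straight from the definition,
# SHA1 over "blob 6\0hello\n":
printf 'blob 6\0hello\n' | sha1sum | cut -d' ' -f1
# Both print: ce013625030ba8dba906f756967f9e9ca394464a
```

A forged blob with the same SHA1 would slot in under the same ID, leaving every tree, commit, and signature above it byte-for-byte unchanged — which is exactly why the signature alone cannot detect such a swap.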
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On Sun, Sep 14, 2014 at 6:56 PM, hasufell wrote: > According to Robin, it's not about rebasing, it's about signing all > commits so that messing with the blob (even if it has the same sha-1) > will cause signature verification failure. > The only thing that gets signed is the commit message, and the only thing that ties the commit message to the code is the sha1 of the top-level tree. If you can attack sha1 either at any tree level or at the blob level you can defeat the signature. That is way better than nothing though - I think it is worth pursuing until somebody comes up with a way to upgrade git to more secure hashes. Most projects don't gpg sign their trees at all, including linux. -- Rich
[gentoo-dev] Last rites: app-text/pastebin
Masked for removal in 30 days. Please see bug #434366. It has had no support for the new API since 2012. A good replacement for this package, app-text/pastebinit, is already stabilized. -- Dion Moult
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
W. Trevor King: > On Sun, Sep 14, 2014 at 10:38:41PM +, hasufell wrote: >> Yes, there is a possible attack vector mentioned in this comment >> https://bugs.gentoo.org/show_bug.cgi?id=502060#c16 > > From that comment, the point 1.2 is highly unlikely [1]: > > 1. Attacker constructs an init.d script, regular part at the start, > malicious part at the end > 1.1. This would be fairly simple, just construct two start() > functions, one of which is mundane, the other is malicious. > 1.2. Both variants of the script have the same SHA1... > >> So we'd basically end up using either "git cherry-pick" or "git am" >> for "pulling" user stuff, so that we also sign the blobs. > > Rebasing the original commits doesn't protect you from the birthday > attack either, because the vulnerable hash is likely going to still be > in the rebased commit's tree. All rebasing does is swap the committer > and drop the initial signature. > According to Robin, it's not about rebasing, it's about signing all commits so that messing with the blob (even if it has the same sha-1) will cause signature verification failure.
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On Sun, Sep 14, 2014 at 10:38:41PM +, hasufell wrote: > Yes, there is a possible attack vector mentioned in this comment > https://bugs.gentoo.org/show_bug.cgi?id=502060#c16 From that comment, the point 1.2 is highly unlikely [1]: 1. Attacker constructs an init.d script, regular part at the start, malicious part at the end 1.1. This would be fairly simple, just construct two start() functions, one of which is mundane, the other is malicious. 1.2. Both variants of the script have the same SHA1... > So we'd basically end up using either "git cherry-pick" or "git am" > for "pulling" user stuff, so that we also sign the blobs. Rebasing the original commits doesn't protect you from the birthday attack either, because the vulnerable hash is likely going to still be in the rebased commit's tree. All rebasing does is swap the committer and drop the initial signature. Cheers, Trevor [1]: http://article.gmane.org/gmane.comp.version-control.git/210622
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
W. Trevor King: > On Sun, Sep 14, 2014 at 05:40:30PM +0200, Michał Górny wrote: >> Dnia 2014-09-15, o godz. 03:15:14 Kent Fredric napisał(a): >>> Only downside there is the way github pull reqs work is if the >>> final SHA1's that hit tree don't match, the pull req doesn't >>> close. >>> >>> Solutions: >>> >>> - A) Have somebody tasked with reaping old pull reqs with >>> permissions granted. ( Uck ) >>> - B) Always use a merge of some kind to mark the pull req as dead >>> ( for instance, an "ours" merge to mark the branch as deprecated ) >>> >>> Both of those options are kinda ugly. >> >> If you merge a pull request, I suggest doing a proper 'git merge -S' >> anyway to get a developer signature on top of all the changes. > > Some previous package-tree-in-Git efforts suggested that only > Gentoo-dev signatures were acceptable, and that those signatures would > be required on every commit (not just the first-parent line) [1,2]. I > don't see the point of that, so long as Gentoo devs are signing the > first-parent line, but if folks still want Gentoo-dev signatures on > every commit the ‘git merge -S’ approach will not work for closing > PRs. > > Cheers, > Trevor > > [1]: http://article.gmane.org/gmane.linux.gentoo.devel/77572 > id:cagfcs_manfikevtj3cmcq1of-uqavebe2r1okykygwc5vom...@mail.gmail.com > [2]: https://bugs.gentoo.org/show_bug.cgi?id=502060#c0 > Yes, there is a possible attack vector mentioned in this comment https://bugs.gentoo.org/show_bug.cgi?id=502060#c16 So we'd basically end up using either "git cherry-pick" or "git am" for "pulling" user stuff, so that we also sign the blobs. Regular merges would still be possible for developer pull requests, but that's probably not the primary use case anyway.
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On Sun, Sep 14, 2014 at 05:40:30PM +0200, Michał Górny wrote: > On 2014-09-15, at 03:15:14, Kent Fredric wrote: > > Only downside there is the way github pull reqs work is if the > > final SHA1's that hit tree don't match, the pull req doesn't > > close. > > > > Solutions: > > > > - A) Have somebody tasked with reaping old pull reqs with > > permissions granted. ( Uck ) > > - B) Always use a merge of some kind to mark the pull req as dead > > ( for instance, an "ours" merge to mark the branch as deprecated ) > > > > Both of those options are kinda ugly. > > If you merge a pull request, I suggest doing a proper 'git merge -S' > anyway to get a developer signature on top of all the changes. Some previous package-tree-in-Git efforts suggested that only Gentoo-dev signatures were acceptable, and that those signatures would be required on every commit (not just the first-parent line) [1,2]. I don't see the point of that, so long as Gentoo devs are signing the first-parent line, but if folks still want Gentoo-dev signatures on every commit the ‘git merge -S’ approach will not work for closing PRs. Cheers, Trevor [1]: http://article.gmane.org/gmane.linux.gentoo.devel/77572 id:cagfcs_manfikevtj3cmcq1of-uqavebe2r1okykygwc5vom...@mail.gmail.com [2]: https://bugs.gentoo.org/show_bug.cgi?id=502060#c0
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
Michał Górny wrote: > What I need others to do is provide the hosting for git repos. I'm happy to set up repos on my git server with custom hooks and accounts as needed. It's probably not what we want long-term, but it might be useful as proof of concept, so that infra only needs to do the setup once. I even have some virtual hosting working: point an A record at the right IP and it looks like only the desired repos are hosted there. Gitweb, git-daemon and git over http, and CAcert https with pretty URLs. //Peter
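The core of what Peter is offering — a bare repo plus custom hooks behind some transport — is small. A toy sketch (repo name and hook policy are purely illustrative, nothing like real infra hooks):

```shell
tmp=$(mktemp -d) && cd "$tmp"
git init -q --bare hosted.git            # the server-side repo
# Hooks are plain executables inside the bare repo. A trivial example
# policy: log every push (hooks run with cwd = the bare repo).
cat > hosted.git/hooks/post-receive <<'EOF'
#!/bin/sh
echo "push received" >> push.log
EOF
chmod +x hosted.git/hooks/post-receive
# A local-path clone behaves like any other remote:
git clone -q hosted.git work 2>/dev/null
cd work
git config user.name "Demo" && git config user.email "demo@example.org"
echo x > f && git add f && git commit -qm "first"
git push -q origin HEAD                  # triggers the post-receive hook
cat ../hosted.git/push.log
```

Real hosting would add accounts and a transport (ssh, git-daemon, http) on top, but the hook mechanism — where per-push policy like metadata regen or signature checks would live — is exactly this.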
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
I think the better option is to block rsync and force emerge-webrsync. (Sent from a phone.) On 14/09/2014 14:03, Michał Górny wrote: > The rsync tree > -- > > We'd also propagate things to rsync. We'd have to populate it with old > ChangeLogs, new ChangeLog entries (autogenerated from git) and thick > Manifests. So users won't notice much of a change. > If this changes all the ChangeLogs, the first rsync from users will generate a lot of traffic; the rsync network needs to be prepared.
Re: [gentoo-dev] gentoo git workflow
> > However, rebasing changes *on* master, before they are pushed, is a good > thing, because that kills non-fast-forward merges. > Nontrivial rebases *on* master can be problematic because you're changing history. Imagine you pull some nice commits from a user. Then at some point you will have to rebase them before you push them. If this fails and requires manual interaction, the original version of the commits is lost (including signatures) and errors are not traceable. With a merge instead any manual intervention is clearly located in the merge commit and the authors of each change are uniquely identifiable. -- Andreas K. Huettel Gentoo Linux developer (council, kde) dilfri...@gentoo.org http://www.akhuettel.de/
Re: [gentoo-dev] gentoo git workflow
"C. Bergström": > Pretty please do NOT allow "merge" commits.. they are the bane of evil > for the long term ability to have any sane work-flow. It works pretty well for the linux kernel. Ofc it's a matter of actually handling it. If people are unable to properly handle tools/methods, everything could become the bane of evil, no matter what tool you use. > There's a big debate between merge vs rebase I think most of those debates are nonsense. Both methods have their use cases. But it matters when to use them. A lot of people don't even know how to actually rebase, so they end up causing merge commits for everything which will lead to a _very_ confusing history. Simply banning that method is not a solution in my opinion. The solution is to make a clear policy or recommendations when to use one of them.
Re: [gentoo-dev] gentoo git workflow
On 09/15/14 02:34 AM, hasufell wrote:
> William Hubbs:
>> On Sun, Sep 14, 2014 at 08:04:12PM +0200, Andreas K. Huettel wrote:
>>>> Deciding on a _commit policy_ should be fairly straightforward and we
>>>> already have one point
>>>> * gpg sign every commit (unless it's a merged branch, then we only care
>>>> about the merge commit)
>>> +1
>> Merge commits only happen if we allow non-fast-forward merges. I would
>> personally be against allowing merge commits on the master branch.
> Allowing fast-forward merges will break signature verification if you
> fetched from a user repo. If we don't allow merge commits, then _every_
> commit has to be signed by a gentoo dev (e.g. by using git-am). I don't
> see much sense in this. It will rather complicate the workflow. The
> currently proposed verification script skips branch 'B', so what matters
> is the signature of the merge commit which says "yes, I have reviewed
> the user's branch(es) and it's fine". Merging from branches holds useful
> information. A linear history isn't necessarily easier to understand, so
> from me linear history gets a -1. It just isn't really "git" to me. But
> it also requires people to know when to avoid merge commits.
>> Rebases involving commits that are already pushed to master probably
>> shouldn't be allowed.
> Of course, yes. That has to be documented in a gentoo developer git guide.

Pretty please do NOT allow "merge" commits.. they are the bane of evil for the long term ability to have any sane work-flow. Try browsing a commit history after a big merge commit.. or following the parent.. Lastly, the "merge" commit itself could be very confusing to some people when viewed in github. (At least personally I find them frequently unreadable.) After 5 years of git where I work, they are now banned (policy), and I wish github would allow them to be banned (non-fast-forward) to avoid mistakes. There's a big debate between merge vs rebase.. I'm not trying to go down the benefits of one workflow vs the other.. However, if rebase fails, you can allow merge commits in the future.. The opposite isn't easily accomplished without squashing history and losing stuff..
Re: [gentoo-dev] gentoo git workflow
William Hubbs: > On Sun, Sep 14, 2014 at 08:04:12PM +0200, Andreas K. Huettel wrote: >> >>> Deciding on a _commit policy_ should be fairly straightforward and we >>> already have one point >>> * gpg sign every commit (unless it's a merged branch, then we only care >>> about the merge commit) >> >> +1 > > Merge commits only happen if we allow non-fast-forward merges. I would > personally be against allowing merge commits on the master branch. > Allowing fast-forward merges will break signature verification if you fetched from a user repo. If we don't allow merge commits, then _every_ commit has to be signed by a gentoo dev (e.g. by using git-am). I don't see much sense in this. It will rather complicate the workflow. The currently proposed verification script skips branch 'B', so what matters is the signature of the merge commit which says "yes, I have reviewed the user's branch(es) and it's fine". Merging from branches holds useful information. A linear history isn't necessarily easier to understand, so from me linear history gets a -1. It just isn't really "git" to me. But it also requires people to know when to avoid merge commits. > > Rebases involving commits that are already pushed to master probably > shouldn't be allowed. > Of course, yes. That has to be documented in a gentoo developer git guide.
Re: [gentoo-dev] gentoo git workflow
On Sun, Sep 14, 2014 at 08:04:12PM +0200, Andreas K. Huettel wrote: > > > Deciding on a _commit policy_ should be fairly straightforward and we > > already have one point > > * gpg sign every commit (unless it's a merged branch, then we only care > > about the merge commit) > > +1 Merge commits only happen if we allow non-fast-forward merges. I would personally be against allowing merge commits on the master branch. > > > More things to consider for commit policy are: > > * commit message format (line length, maybe prepend category/PN?) > > this could be done in part by repoman... having a meaningful shortlog would > be > nice. I don't see how repoman could do anything about this, but here is a good description of how to write git commit messages [1]. > > * do we expect repoman to run successfully for every commit (I'd say no)? > > commit no, push yes? +1, every time we push that should indicate a successful repoman run. > > * additional information that must be provided I'm not sure what additional information is being referred to. > > * when to force/avoid merge commits I would be against merge commits on the master branch; everything should be a fast-forward merge. > my take- disallow (by policy) nontrivial rebases by third parties, encourage > trivial rebases Rebases involving commits that are already pushed to master probably shouldn't be allowed. However, rebasing changes *on* master, before they are pushed, is a good thing, because that kills non-fast-forward merges. William
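The "commit no, push yes" policy could be enforced client-side with a pre-push hook. The sketch below is hypothetical: the QA command is stubbed with an invented `./qa-check` script (on a real tree it would be something like `repoman full`), and all names are made up:

```shell
# Toy demo: a pre-push hook that refuses the push while QA fails,
# and lets it through once QA passes.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q --bare central.git
git clone -q central.git work 2>/dev/null && cd work
git config user.email dev@example.org && git config user.name Dev
cat > .git/hooks/pre-push <<'EOF'
#!/bin/sh
# abort the push when the QA check fails (stand-in for 'repoman full')
exec ./qa-check
EOF
chmod +x .git/hooks/pre-push
printf '#!/bin/sh\nexit 1\n' > qa-check && chmod +x qa-check  # failing QA
echo x > f && git add f && git commit -qm 'cat/pkg: version bump'
git push -q origin HEAD 2>/dev/null || echo 'push blocked by QA hook'
printf '#!/bin/sh\nexit 0\n' > qa-check                       # QA now passes
git push -q origin HEAD 2>/dev/null && echo 'pushed'
```

The commit itself is never blocked, only the push, which matches the policy sketched in the mail.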
Re: [gentoo-dev] gentoo git workflow
> Deciding on a _commit policy_ should be fairly straightforward and we > already have one point > * gpg sign every commit (unless it's a merged branch, then we only care > about the merge commit) +1 > More things to consider for commit policy are: > * commit message format (line length, maybe prepend category/PN?) this could be done in part by repoman... having a meaningful shortlog would be nice. > * do we expect repoman to run successfully for every commit (I'd say no)? commit no, push yes? > * additional information that must be provided > * when to force/avoid merge commits my take- disallow (by policy) nontrivial rebases by third parties, encourage trivial rebases -- Andreas K. Huettel Gentoo Linux developer (council, kde) dilfri...@gentoo.org http://www.akhuettel.de/
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
> "MG" == Michał Górny writes: MG> This means we don't have to wait till someone figures out the perfect MG> way of converting the old CVS repository. You don't need that history MG> most of the time, and you can play with CVS to get it if you really do. MG> In any case, we would likely strip the history anyway to get a small MG> repo to work with. +1 on that. The cvs repo can be converted to an historical git repo on a slower timeframe, and remain available as cvs until then. That old-vs-fresh concept worked fine for other projects (including Linux). -JimC -- James Cloos OpenPGP: 0x997A9F17ED7DAEA6
Re: [gentoo-dev] Re: My masterplan for git migration (+ looking for infra to test it)
On Sun, Sep 14, 2014 at 6:10 PM, hasufell wrote: > Let's try it with push access for every developer. +1. I'm pretty strongly opposed to leaving the history behind. I'd tend to agree with Rich when he says that history conversion is pretty much a solved problem, anyway. Cheers, Dirkjan
Re: [gentoo-dev] Re: My masterplan for git migration (+ looking for infra to test it)
Rich Freeman: > On Sun, Sep 14, 2014 at 10:56 AM, Michał Górny wrote: >> Dnia 2014-09-14, o godz. 10:33:03 >> >> With git, we can finally do stuff like preparing everything and pushing >> in one go. Rebasing or merging will be much easier then, since >> the effective push rate will be smaller than current commit rate. > > While I agree that the ability to consolidate commits will definitely > help with the commit rate, I'm not sure it will make a big difference. > It will turn a kde stablereq from 300 commits into 1, and do the same > for things like package moves and such. However, I suspect that the > vast majority of our commits are things like bumps on individual > packages that will always be individual commits. Maybe insofar as one > person does a bunch of them they can be pushed at the same time, > but... > > Looking at https://github.com/rich0/gentoo-gitmig-2014-02-21 it seems > like we get about 150 commits/day on busy days. I suspect that isn't > evenly distributed, but you may be right that it will just work out. > If the push frequency becomes so high that people barely get stuff pushed because of conflicts, then we simply have to say goodbye to the central repository workflow and have to establish a hierarchy where only a handful of people have direct push access and the rest is worked out through pull requests to project leads or dedicated reviewers. So the merging and rebasing work would then be done by fewer people instead of every single developer. But given that currently project leads may or may not be active I'm not sure that I'd vote for such a workflow. And I don't think we need that yet (although enforced review workflow is ofc superior in many ways). Let's try it with push access for every developer.
Re: [gentoo-dev] gentoo git workflow
On Sun, Sep 14, 2014 at 11:11 AM, hasufell wrote: > > The only hard part is that people have to know the differences between > merging/rebasing, fast-forward merges, non-fast-forward merges etc. and > when and when not to do them. > > 'git rebase' is a powerful thing, but also pretty good to mess up your > local history if used wrong. > > I think we can write up a gentoo-specific guide in 2-3 weeks. > Sounds good. I think one thing we need to get over with the whole git migration is the fact that it isn't going to be perfect. We probably will find minor errors in the migration itself, little glitches in the back-end stuff, problems in the proposed workflow, and so on. We're just going to have to adapt. We've been using cvs for eons and have learned to ignore its shortcomings and have well-polished workflows. It isn't like there are 500 devs doing commits every day. We're a reasonably tight community and we're just going to have to work together to get over the inevitable bumps. It may make sense to just start out with guidelines in the beginning, and then we can turn them into rules when problems actually come up. Once upon a time there wasn't a hard rule about changelog entries for removals/etc, and the world didn't end, but we decided that having the rule made more sense than not having it. With git we should expect more of the same - we won't get it 100% right out of the gate. -- Rich
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On Sun, Sep 14, 2014 at 11:42 AM, hasufell wrote: > Patrick Lauer: >>> Are we going to disallow merge commits and ask devs to rebase local >>> changes in order to keep the history "clean"? >> >> Is that going to be sane with our commit frequency? >> > > You have to merge or rebase anyway in case of a push conflict, so the > only difference is the method and the effect on the history. > > Currently... CVS allows you to run repoman on an outdated tree and push > broken ebuilds with repoman being happy. Git will not allow this. > Repoman is going to be a challenge here. With cvs every package is its own private repository with its own private history, and cvs only cares if there is a collision within the scope of a single file. With git your commit is against the whole tree. So, even though it is trivial to merge, independent commits against two different packages do collide and need to be rebased or merged. Repoman can run against a single package fairly quickly, so assuming we still allow that, we could do a pull/rebase/repoman/push workflow even if people are doing commits every few minutes. On the other hand, if you're doing a package move or eclass change or some other change that affects 300 packages, just doing the rebase might cost you a few minutes (due to actual collisions), and running repoman against the whole thing before doing a push isn't going to be practical. Somebody doing a tree-wide commit would almost certainly have to run repoman before the final rebase/merge, push that out, and then maybe do another repoman after-the-fact and clean up any issues. For all intents and purposes that is what we're doing today anyway, since repoman+cvs doesn't offer any kind of tree-wide consistency guarantees unless you're checking out based on a timestamp or something like that. -- Rich
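The collision Rich describes, where commits to *different* packages still force a rebase, is easy to demonstrate. A hedged toy (all paths, package names, and identities are invented):

```shell
# Two devs touch unrelated packages; the second push is still rejected
# as non-fast-forward and needs a (here trivial) rebase before pushing.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q --bare central.git
git clone -q central.git alice 2>/dev/null
git -C alice config user.email alice@example.org
git -C alice config user.name Alice
( cd alice && echo base > README && git add README \
  && git commit -qm 'initial tree' && git push -q origin HEAD )
git clone -q central.git bob
git -C bob config user.email bob@example.org
git -C bob config user.name Bob
mkdir -p alice/cat/pkg-a bob/cat/pkg-b
touch alice/cat/pkg-a/pkg-a-1.ebuild bob/cat/pkg-b/pkg-b-1.ebuild
git -C alice add cat && git -C alice commit -qm 'cat/pkg-a: version bump'
git -C bob add cat && git -C bob commit -qm 'cat/pkg-b: version bump'
git -C alice push -q origin HEAD
# No file-level conflict exists, yet Bob's push is rejected:
git -C bob push -q origin HEAD 2>/dev/null || echo 'bob: push rejected (non-fast-forward)'
# A trivial rebase resolves it without touching either change:
git -C bob pull -q --rebase && git -C bob push -q origin HEAD
```

This is the cvs-vs-git difference from the mail in miniature: cvs would have accepted both commits independently, while git serializes pushes against the whole tree.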
Re: [gentoo-dev] Re: My masterplan for git migration (+ looking for infra to test it)
On Sun, Sep 14, 2014 at 10:56 AM, Michał Górny wrote: > Dnia 2014-09-14, o godz. 10:33:03 > > With git, we can finally do stuff like preparing everything and pushing > in one go. Rebasing or merging will be much easier then, since > the effective push rate will be smaller than current commit rate. While I agree that the ability to consolidate commits will definitely help with the commit rate, I'm not sure it will make a big difference. It will turn a kde stablereq from 300 commits into 1, and do the same for things like package moves and such. However, I suspect that the vast majority of our commits are things like bumps on individual packages that will always be individual commits. Maybe insofar as one person does a bunch of them they can be pushed at the same time, but... Looking at https://github.com/rich0/gentoo-gitmig-2014-02-21 it seems like we get about 150 commits/day on busy days. I suspect that isn't evenly distributed, but you may be right that it will just work out. >> >> Actually doing the conversion is basically a solved problem. If this >> were actually the blocker I'd be all for just sticking the history in >> a different repo and starting from scratch with a new one. > > Was the resulting tree actually verified? How long does the conversion > take? Can it be incremental, i.e. convert most of it, lock CVS, convert > the remaining new commits? The tree has been verified. The verification approaches so far are neither 100% thorough nor realtime in operation. However, I think we have a working migration process and I don't really see the need to do a double-check at the time of the actual migration. ferringb was able to do conversions in about 20min with a decent SSD and a 32-core system. His migration scripts can migrate categories in parallel. I haven't personally tried to run them myself, but I believe robbat2 and patrick have experimented with them. 
If there is revived interest I can see if I can set them up to run in a chroot with some documentation so that anybody can run it and satisfy themselves that it works, assuming somebody else doesn't have such a chroot ready to go. If finding a host to run it on is a problem I'm sure we could get the Trustees to spring for some time on EC2 or whatever. There is no reason that this couldn't be as simple as extracting a tarball, bind-mounting a cvs repo inside, and firing off the scripts. I do not believe it can be made to be incremental. But, the runtime should be in keeping with your hour-or-two of downtime suggestion. I suspect a fair bit of the downtime will be taken just to transfer the copy of the cvsroot to the migration server, transfer the resulting git tree to wherever it needs to go, and get all the back-end scripts running/etc. > > Are you willing to champion that, then? :) > Well, I'm in, for what it matters. I don't have root on any infra boxes if that is what you're looking for. :) -- Rich
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On 2014-09-15, at 03:15:14, Kent Fredric wrote: > On 15 September 2014 02:40, Michał Górny wrote: > > > However, I'm wondering if it would be possible to restrict people from > > accidentally committing straight into github (e.g. merging pull > > requests there instead of to our main server). > > => Github is just a read only mirror, any pull reqs submitted there will be > fielded and pushed to gentoo directly. > > Only downside there is the way github pull reqs work is if the final SHA1's > that hit tree don't match, the pull req doesn't close. > > Solutions: > > - A) Have somebody tasked with reaping old pull reqs with permissions > granted. ( Uck ) > - B) Always use a merge of some kind to mark the pull req as dead ( for > instance, an "ours" merge to mark the branch as deprecated ) > > Both of those options are kinda ugly. If you merge a pull request, I suggest doing a proper 'git merge -S' anyway to get a developer signature on top of all the changes. -- Best regards, Michał Górny
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
Patrick Lauer: >> Are we going to disallow merge commits and ask devs to rebase local >> changes in order to keep the history "clean"? > > Is that going to be sane with our commit frequency? > You have to merge or rebase anyway in case of a push conflict, so the only difference is the method and the effect on the history. Currently... CVS allows you to run repoman on an outdated tree and push broken ebuilds with repoman being happy. Git will not allow this.
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On Sunday 14 September 2014 15:40:06 Davide Pesavento wrote: > On Sun, Sep 14, 2014 at 2:03 PM, Michał Górny wrote: > > We have main developer repo where developers work & commit and are > > relatively happy. For every push into developer repo, automated magic > > thingie merges stuff into user sync repo and updates the metadata cache > > there. > > How long does the md5-cache regeneration process take? Are you sure it > will be able to keep up with the rate of pushes to the repo during > "peak hours"? If not, maybe we could use a time-based thing similar to > the current cvs->rsync synchronization. Best case, only one package is affected - a few seconds. Worst case, someone touches an eclass like eutils; then it expands to something on the order of one or two CPU-hours. > Are we going to disallow merge commits and ask devs to rebase local > changes in order to keep the history "clean"? Is that going to be sane with our commit frequency?
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On 15 September 2014 02:40, Michał Górny wrote: > However, I'm wondering if it would be possible to restrict people from > accidentally committing straight into github (e.g. merging pull > requests there instead of to our main server). > Easy. Put the Gentoo repo in its own group. Don't give anyone any kinds of permissions on it. Have only one approved account for the purpose of pushing commits. Have a post-push hook that replicates to github as that approved account => Github is just a read only mirror, any pull reqs submitted there will be fielded and pushed to gentoo directly. Only downside there is the way github pull reqs work is if the final SHA1's that hit tree don't match, the pull req doesn't close. Solutions: - A) Have somebody tasked with reaping old pull reqs with permissions granted. ( Uck ) - B) Always use a merge of some kind to mark the pull req as dead ( for instance, an "ours" merge to mark the branch as deprecated ) Both of those options are kinda ugly. -- Kent *KENTNL* - https://metacpan.org/author/KENTNL
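Kent's single-purpose replication account could be wired up with a post-receive hook on the main server that pushes every accepted update to the read-only mirror. A hedged sketch: a local bare repo stands in for GitHub (in production the target would be an authenticated github.com URL under the one approved bot account), and all names are invented:

```shell
# Toy demo of push-through replication to a read-only mirror.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q --bare gentoo.git
git init -q --bare github-mirror.git      # stand-in for the GitHub remote
cat > gentoo.git/hooks/post-receive <<'EOF'
#!/bin/sh
# replicate all refs to the mirror after each accepted push
exec git push --mirror ../github-mirror.git
EOF
chmod +x gentoo.git/hooks/post-receive
git clone -q gentoo.git work 2>/dev/null
git -C work config user.email dev@example.org
git -C work config user.name Dev
( cd work && echo x > f && git add f && git commit -qm 'first commit' \
  && git push -q origin HEAD 2>/dev/null )
# every ref pushed to gentoo.git is now also in github-mirror.git
```

Because nobody holds write permissions on the mirror except the replication account, merges done on GitHub itself simply cannot land, which is the restriction Michał was asking about.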
[gentoo-dev] gentoo git workflow
Rich Freeman: > > This is one of the blockers. We haven't actually decided how we want > to use git. > There are IMO 3 main things to consider for a git workflow: * commit policy * branching model * remote model (and history format somewhere implicitly) Deciding on a _commit policy_ should be fairly straightforward and we already have one point * gpg sign every commit (unless it's a merged branch, then we only care about the merge commit) More things to consider for commit policy are: * commit message format (line length, maybe prepend category/PN?) * do we expect repoman to run successfully for every commit (I'd say no)? * additional information that must be provided * when to force/avoid merge commits Deciding on _branching model_ should be pretty easy here too. We are mainly working on master and there may be developer-specific branches etc. History does not need to be linear. Creating additional branches is up to developers and there are no specific rules about that. The _remote model_ is: use a central repository with every developer having push access. I think this is pretty reasonable for our use case, although I'd love to see a linux-like workflow with enforced reviews that propagate through project members/leads. But I'm not sure we need that much overhead, except for non-trivial stuff like eclasses where we already require reviews (well, more or less). The only hard part is that people have to know the differences between merging/rebasing, fast-forward merges, non-fast-forward merges etc. and when and when not to do them. 'git rebase' is a powerful thing, but also pretty good to mess up your local history if used wrong. I think we can write up a gentoo-specific guide in 2-3 weeks.
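The fast-forward vs. non-fast-forward distinction that the guide would need to explain can be shown in a few commands. A hedged toy (all names invented):

```shell
# --ff-only moves the branch pointer with no merge commit;
# --no-ff always records a merge commit with two parents.
set -e
tmp=$(mktemp -d) && cd "$tmp" && git init -q repo && cd repo
git config user.email dev@example.org && git config user.name Dev
base=$(git symbolic-ref --short HEAD)
echo 1 > f && git add f && git commit -qm one
git checkout -qb topic && echo 2 >> f && git commit -qam two
git checkout -q "$base"
git merge -q --ff-only topic                   # fast-forward: no merge commit
git checkout -qb topic2 && echo 3 >> f && git commit -qam three
git checkout -q "$base"
git merge -q --no-ff -m 'merge topic2' topic2  # always creates a merge commit
git rev-parse -q --verify HEAD^2 >/dev/null    # HEAD now has a second parent
```

After the fast-forward merge the history is still linear; after the `--no-ff` merge, `HEAD^2` exists, which is exactly the commit a policy like "sign the merge commit" would target.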
Re: [gentoo-dev] Re: My masterplan for git migration (+ looking for infra to test it)
On 2014-09-14, at 10:33:03, Rich Freeman wrote: > > Of course, that assumes infra is > > going to cooperate quickly or someone else is willing to provide the > > infra for it. > > The infra components to a git infrastructure are one of the main > blockers at this point. I don't really see cooperation as the issue - > just lack of manpower or interest. By 'cooperating' I simply meant offering the necessary resources in a reasonable time. > > > > 1. send announcement to devs to explain how to use git, > > This is one of the blockers. We haven't actually decided how we want > to use git. > > Sure, everybody knows how to use git. The problem is that there are a > dozen different ways we COULD use git, and nobody has picked the ONE > way we WILL use it. > > This isn't as trivial as you might think. We have a fairly high > commit rate and with a single repository that means that in-between a > pull-merge/rebase-push there is a decent chance of another commit that > will make the resulting push a non-fast-forward. > > People love to point out linux and its insane commit rate. The thing > is, the mainline git repo with all those commits has exactly one > committer - Linus himself. They don't have one big repo with one > master branch that everybody pushes to. At least, that is my > understanding (and there are certainly others here who are more > involved with kernel development). It's hard to talk about commit rate when we combine crippled CVS with awfully stupid two-part repoman committing. This forces us to commit everything immediately, and makes some of us not commit anything at all anymore... With git, we can finally do stuff like preparing everything and pushing in one go. Rebasing or merging will be much easier then, since the effective push rate will be smaller than the current commit rate. > > On top of user sync repo rsync is propagated. 
The rsync tree is populated > > with all old ChangeLogs copied from CVS (stored in 30M git repo), new > > ChangeLogs are generated from git logs and Manifests are expanded. > > So, I don't really have a problem with your design. I still question > whether we still need to be generating changelogs - they seem > incredibly redundant. But, if people really want a redundant copy of > the git log, whatever... I don't want them either. However, I'm pretty sure people will bikeshed this to death if we kill them... Especially since rsync has no git log. Not that many users make real use of ChangeLogs, esp. considering how useless the messages there often are... > > Main developer repo > > --- > > > > I was able to create a start git repository that takes around 66M > > as a git pack (this is how much you will have to fetch to start working > > with it). The repository is stripped clean of history and ChangeLogs, > > and has thin Manifests only. > > > > This means we don't have to wait till someone figures out the perfect > > way of converting the old CVS repository. You don't need that history > > most of the time, and you can play with CVS to get it if you really do. > > In any case, we would likely strip the history anyway to get a small > > repo to work with. > > We already have a migration process that converts the old CVS > repository, generating both a shallow repository that lacks history > and a full repository that contains all of history. Additionally, > these two are consistent - that is, the last branch of the full > repository has the same commit ID as the base of the shallow > repository. Basically we generate the full history and then trim out > 99% of it so that the commit in the shallow repository points to a > parent that isn't in the packed repository. > > Actually doing the conversion is basically a solved problem. If this > were actually the blocker I'd be all for just sticking the history in > a different repo and starting from scratch with a new one. 
Was the resulting tree actually verified? How long does the conversion take? Can it be incremental, i.e. convert most of it, lock CVS, convert the remaining new commits? > > I think we should also merge gentoo-news & glsa & herds.xml into > > the repository. They all reference Gentoo packages at a particular > > state in time, and it would be much nicer to have them synced properly. > > > > I can see the pros/cons here, but I don't personally have an issue > with merging them. As has been brought up elsewhere herds.xml may > just go away. > > If somebody can come up with a set of hooks/scripts that will create > the various trees and the only thing that is left is to get infra to > host them, I think we can make real progress. I don't think this is > something that needs to take a long time. The pieces are mostly there > - they just have to be assembled. Are you willing to champion that, then? :) -- Best regards, Michał Górny
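Autogenerating ChangeLogs from git logs, as the plan describes, is essentially a filtered `git log` per package directory. A hedged sketch (the output format and package names are assumptions, not the real generator):

```shell
# Rebuild a per-package ChangeLog from the commits touching its directory.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email dev@example.org && git config user.name Dev
mkdir -p app-misc/hello
touch app-misc/hello/hello-1.ebuild
git add . && git commit -qm 'app-misc/hello: initial import'
touch app-misc/hello/hello-2.ebuild
git add . && git commit -qm 'app-misc/hello: version bump'
# path-limited log, newest first, formatted roughly like a ChangeLog entry
git log --date=short --pretty='%ad %an <%ae>:%n  %s%n' \
    -- app-misc/hello > app-misc/hello/ChangeLog
cat app-misc/hello/ChangeLog
```

This also illustrates Rich's redundancy point: the generated file contains nothing that `git log -- <package>` doesn't already provide; it only matters for rsync users who have no git history.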
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On 2014-09-14, at 15:23:24, Jauhien Piatlicki wrote: > Another question: will it be possible to maintain a copy of tree on github to > make contributions for users simpler (similarly to e.g. science overlay)? > (Can it somehow be combined with proposed signing mechanism?) Yes. I'm planning to have a mirror on github and bitbucket, and auto-push to both. However, I'm wondering if it would be possible to restrict people from accidentally committing straight into github (e.g. merging pull requests there instead of to our main server). In fact, I would start my experiments straight on github were it not for the fact that they don't allow us to set our own update hooks. -- Best regards, Michał Górny
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On 2014-09-14, at 15:40:06, Davide Pesavento wrote: > On Sun, Sep 14, 2014 at 2:03 PM, Michał Górny wrote: > > We have main developer repo where developers work & commit and are > > relatively happy. For every push into developer repo, automated magic > > thingie merges stuff into user sync repo and updates the metadata cache > > there. > > How long does the md5-cache regeneration process take? Are you sure it > will be able to keep up with the rate of pushes to the repo during > "peak hours"? If not, maybe we could use a time-based thing similar to > the current cvs->rsync synchronization. This strongly depends on how much data there is to update. A few ebuilds are quite fast; an eclass change isn't ;). I was thinking of something along the lines of, in pseudo-code: systemctl restart cache-regen That is, we start the regen on every update. If it finishes in time, it commits the new metadata. If another update occurs during regen, we just restart it to let it catch the new data. Of course, if we can't spare the resources to do intermediate updates, we may as well switch to a cron-based update method. > [...] > > In any case, we would likely strip the history anyway to get a small > > repo to work with. > > > > I have prepared a basic git update hook that keeps master clean > > and attached it to the bug [1]. It enforces basic policies, prevents > > forced updates and checks GPG signatures on left-most history line. It > > can also be extended to do more extensive tree checks. > > Are we going to disallow merge commits and ask devs to rebase local > changes in order to keep the history "clean"? I don't think we should cripple git. Just to be clear, 'accidental' merges won't happen because the automatic merges are unsigned and the 'update' hook will refuse them. The developers will have to either rebase and re-sign the commits, or use a signed merge commit, whichever makes more sense in the particular context. Signed merge commits will also allow merging user-submitted changes while preserving the original history. -- Best regards, Michał Górny
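The restart-on-every-push regen that the `systemctl restart cache-regen` pseudo-code alludes to could look something like this hypothetical unit. Everything here is an assumption (unit name, repo name, paths, job count); only `egencache` itself is the real Portage metadata-cache generator:

```ini
# cache-regen.service -- hypothetical sketch of the restart-on-push regen
[Unit]
Description=Regenerate md5-cache for the user sync repo

[Service]
Type=oneshot
# egencache is Portage's cache generator; the flags shown are illustrative
ExecStart=/usr/bin/egencache --update --repo gentoo --jobs 4
```

Because `systemctl restart` on a running oneshot unit stops it and starts it again, every push would effectively abort a stale regen and begin a fresh one, which is the behavior described in the mail.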
Re: [gentoo-dev] Re: My masterplan for git migration (+ looking for infra to test it)
On 2014-09-14, at 15:17:41, Ulrich Mueller wrote: > > On Sun, 14 Sep 2014, Michał Górny wrote: > > > I think we should also merge gentoo-news & glsa & herds.xml into the > > repository. They all reference Gentoo packages at a particular state > > in time, and it would be much nicer to have them synced properly. > > Not a good idea, because we may want to grant commit access to these > repos for people who are not necessarily ebuild devs. We may want to add metadata.xml access to those people too. If you really are that distrustful of our contributors, I believe we can do per-path filtering in the 'update' hook, or use a pull-request or intermediate-repository based workflow. -- Best regards, Michał Górny
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
Dnia 2014-09-14, o godz. 15:09:25 Jauhien Piatlicki napisał(a): > 14.09.14 14:03, Michał Górny написав(ла): > > Hi, > > > > I'm quite tired of promises and all that perfectionist non-sense which > > locks us up with CVS for next 10 years of bikeshed. Therefore, I have > > prepared a plan how to do git migration, and I believe it's doable in > > less than 2 weeks (plus the testing). Of course, that assumes infra is > > going to cooperate quickly or someone else is willing to provide the > > infra for it. > > > > as always, nice effort, but I foresee lots of bikeshedding in this thread. ) Yes. I'm planning to ignore most of bikeshed and take only serious answers into consideration. Otherwise, we will be stuck with CVS. > > This means we don't have to wait till someone figures out the perfect > > way of converting the old CVS repository. You don't need that history > > most of the time, and you can play with CVS to get it if you really do. > > In any case, we would likely strip the history anyway to get a small > > repo to work with. > > Is it so difficult to convert CVS history? It may be difficult to convert it properly, especially considering the splitting of ebuild+Manifest commit. Then we need to somehow check if it was converted properly. I don't even want to waste my time on this. IMO the history doesn't have such a great value. > > The rsync tree > > -- > > > > We'd also propagate things to rsync. We'd have to populate it with old > > ChangeLogs, new ChangeLog entries (autogenerated from git) and thick > > Manifests. So users won't notice much of a change. > > > > How will user check the ebuild integrity with thick manifests using rsync? The same way he currently does :). > > The remaining issue is signing of stuff. We could supposedly sign > > Manifests but IMO it's a waste of resources considered how poor > > the signing system is for non-git repos. > > Again, how will user check the integrity and authenticity if Manifests are > unsigned? 
As far as I'm concerned, the user can use the user git tree to get proper signatures, or any other method that already has proper signing support. If someone wants proper GPG support in rsync, he can work on that.

--
Best regards,
Michał Górny
[gentoo-dev] Re: My masterplan for git migration (+ looking for infra to test it)
On Sun, Sep 14, 2014 at 8:03 AM, Michał Górny wrote:
> I'm quite tired of promises and all that perfectionist nonsense which
> locks us up with CVS for the next 10 years of bikeshedding.

While I tend to agree with the sentiment, I don't think you're actually targeting the problems that aren't already solved here.

> Of course, that assumes infra is
> going to cooperate quickly or someone else is willing to provide the
> infra for it.

The infra components of a git infrastructure are one of the main blockers at this point. I don't really see cooperation as the issue - just lack of manpower or interest.

> I can provide some testing repos once someone is willing to provide
> the hardware.

We already have plenty of testing repos (well, minus all the back-end stuff).

> 1. send announcement to devs to explain how to use git,

This is one of the blockers. We haven't actually decided how we want to use git. Sure, everybody knows how to use git. The problem is that there are a dozen different ways we COULD use git, and nobody has picked the ONE way we WILL use it.

This isn't as trivial as you might think. We have a fairly high commit rate, and with a single repository that means that in between a pull, merge/rebase, and push there is a decent chance of another commit that will make the resulting push a non-fast-forward.

People love to point out linux and its insane commit rate. The thing is, the mainline git repo with all those commits has exactly one committer - Linus himself. They don't have one big repo with one master branch that everybody pushes to. At least, that is my understanding (and there are certainly others here who are more involved with kernel development).

> 2. lock CVS out to read-only,
> 3. create all the git repos, get hooks rolling,
> 4. enable R/W access to the repos.
> With some luck, no more than 2 hours downtime.

I agree that the actual conversion should be able to be done quickly.

> On top of user sync repo rsync is propagated. The rsync tree is populated
> with all old ChangeLogs copied from CVS (stored in 30M git repo), new
> ChangeLogs are generated from git logs and Manifests are expanded.

So, I don't really have a problem with your design. I still question whether we still need to be generating ChangeLogs - they seem incredibly redundant. But, if people really want a redundant copy of the git log, whatever...

> Main developer repo
> -------------------
>
> I was able to create a start git repository that takes around 66M
> as a git pack (this is how much you will have to fetch to start working
> with it). The repository is stripped clean of history and ChangeLogs,
> and has thin Manifests only.
>
> This means we don't have to wait till someone figures out the perfect
> way of converting the old CVS repository. You don't need that history
> most of the time, and you can play with CVS to get it if you really do.
> In any case, we would likely strip the history anyway to get a small
> repo to work with.

We already have a migration process that converts the old CVS repository, generating both a shallow repository that lacks history and a full repository that contains all of history. Additionally, these two are consistent - that is, the last branch of the full repository has the same commit ID as the base of the shallow repository. Basically we generate the full history and then trim out 99% of it, so that the commit in the shallow repository points to a parent that isn't in the packed repository.

Actually doing the conversion is basically a solved problem. If this were actually the blocker I'd be all for just sticking the history in a different repo and starting from scratch with a new one.

> I think we should also merge gentoo-news & glsa & herds.xml into
> the repository. They all reference Gentoo packages at a particular
> state in time, and it would be much nicer to have them synced properly.

I can see the pros/cons here, but I don't personally have an issue with merging them. As has been brought up elsewhere, herds.xml may just go away.

If somebody can come up with a set of hooks/scripts that will create the various trees and the only thing that is left is to get infra to host them, I think we can make real progress. I don't think this is something that needs to take a long time. The pieces are mostly there - they just have to be assembled.

--
Rich
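The non-fast-forward race Rich describes is easy to reproduce: two developers pull the same tip, both commit, and the slower push gets rejected. One common way tooling copes with it is a rebase-and-retry loop. This is only a sketch of the mechanics, not any agreed Gentoo workflow; all repo names and identities below are invented:

```shell
set -e
work=$(mktemp -d); cd "$work"

# a bare "central" repo plus two developer clones
git init -q --bare origin.git
git clone -q origin.git alice
git -C alice config user.email alice@example.org
git -C alice config user.name alice

# seed the repo so both clones share a base commit
( cd alice && echo base > f && git add f && git commit -qm base \
  && git push -q origin HEAD )
MAIN=$(git -C alice symbolic-ref --short HEAD)

git clone -q origin.git bob
git -C bob config user.email bob@example.org
git -C bob config user.name bob

# alice wins the race to push first...
( cd alice && echo a >> f && git commit -qam "alice: change" \
  && git push -q origin "$MAIN" )

# ...so bob's push is rejected as non-fast-forward; rebase and retry
cd bob
echo b > g && git add g && git commit -qm "bob: change"
until git push -q origin "$MAIN" 2>/dev/null; do
    git pull -q --rebase origin "$MAIN"
done
```

Whether rebase-before-push should be the policy is exactly the undecided "ONE way we WILL use it" question; the loop only shows why a high commit rate makes plain `git push` fail routinely.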
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On 15 September 2014 00:03, Michał Górny wrote:
> This means we don't have to wait till someone figures out the perfect
> way of converting the old CVS repository. You don't need that history
> most of the time, and you can play with CVS to get it if you really do.

Once somebody works this out, you can also simply make it available as a "replacement" ref. See 'git replace'.

This would mean, essentially, you could push a ref called 'refs/replace/oldcvs' of value "firstsha1 oldcvssha1", and anyone who wanted it could manually fetch it. Anyone who did fetch it would get the full history in all of its glory, and then git would transparently pretend that history was always there anyway.

No rebasing required, and available on a need-to-know basis :)

--
Kent

*KENTNL* - https://metacpan.org/author/KENTNL
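Kent's graft-via-replace idea can be tried end-to-end in a throwaway repo. Everything below (branch names, commit messages) is invented for illustration; `git replace --graft` is a convenience that creates the same kind of refs/replace/* ref he describes without hand-editing a commit object:

```shell
set -e
d=$(mktemp -d); cd "$d"
git init -q repo; cd repo
git config user.email dev@example.org
git config user.name dev

# stand-in for the full CVS-converted history
echo 1 > f; git add f; git commit -qm "cvs: first"
echo 2 > f; git commit -qam "cvs: last"
CVSTIP=$(git rev-parse HEAD)

# stand-in for the fresh, history-less developer repo (an orphan branch here)
git checkout -q --orphan fresh
echo 3 > f; git add f; git commit -qm "fresh start"
FIRST=$(git rev-parse HEAD)
echo 4 > f; git commit -qam "more work"

# graft the CVS tip underneath the fresh root commit; this only writes
# a refs/replace/* ref - no history is rewritten
git replace --graft "$FIRST" "$CVSTIP"

# publishing/consuming is plain ref transport, e.g.:
#   git push origin 'refs/replace/*'
#   git fetch origin '+refs/replace/*:refs/replace/*'

git rev-list --count HEAD    # the joined history: 4 commits instead of 2
```

Anyone who skips the fetch keeps the small 2-commit view; anyone who fetches the replace ref sees the grafted history transparently, exactly the need-to-know behavior described above.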
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On Sun, Sep 14, 2014 at 3:55 PM, hasufell wrote:
> Davide Pesavento:
>>> In any case, we would likely strip the history anyway to get a small
>>> repo to work with.
>>>
>>> I have prepared a basic git update hook that keeps master clean
>>> and attached it to the bug [1]. It enforces basic policies, prevents
>>> forced updates and checks GPG signatures on left-most history line. It
>>> can also be extended to do more extensive tree checks.
>>
>> Are we going to disallow merge commits and ask devs to rebase local
>> changes in order to keep the history "clean"?
>
> I'd say it doesn't make sense to create merge commits for conflicts that
> arise by someone having pushed earlier than you.
>
> Merge commits should only be there if they give useful information.

I totally agree. But is there a way to automatically enforce this?

> Also... if you merge from a _user_ who is untrusted and allow a
> fast-forward merge, then the signature verification fails. That means
> for such pull requests you either have to use "git am" or "git merge
> --no-ff".

Right. In that case you can either sign the merge commit or amend the user's commit and sign it yourself (re-signing could be needed anyway if you have to rebase).

Thanks,
Davide
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
Jauhien Piatlicki:
> Or well, have our own pull requests review tool.

Also only a secondary problem. Mirroring on github/bitbucket or whatever should be fairly straightforward to allow user contributions. In addition, the usual git workflow via e-mail/ML would become more popular (either via git-style patches or plain pull request information with branch/commit/repository). So I'd suggest focusing on the git migration first.
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
Davide Pesavento:
>> Main developer repo
>> -------------------
>>
>> I was able to create a start git repository that takes around 66M
>> as a git pack (this is how much you will have to fetch to start working
>> with it). The repository is stripped clean of history and ChangeLogs,
>> and has thin Manifests only.
>>
>> This means we don't have to wait till someone figures out the perfect
>> way of converting the old CVS repository. You don't need that history
>> most of the time, and you can play with CVS to get it if you really do.
>
> +1

+1

>> In any case, we would likely strip the history anyway to get a small
>> repo to work with.
>>
>> I have prepared a basic git update hook that keeps master clean
>> and attached it to the bug [1]. It enforces basic policies, prevents
>> forced updates and checks GPG signatures on left-most history line. It
>> can also be extended to do more extensive tree checks.
>
> Are we going to disallow merge commits and ask devs to rebase local
> changes in order to keep the history "clean"?

I'd say it doesn't make sense to create merge commits for conflicts that arise by someone having pushed earlier than you.

Merge commits should only be there if they give useful information.

Also... if you merge from a _user_ who is untrusted and allow a fast-forward merge, then the signature verification fails. That means for such pull requests you either have to use "git am" or "git merge --no-ff".
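The two options named here, "git am" and "git merge --no-ff", differ in who ends up owning the resulting history, which is what matters for signature checking on the first-parent line. A sketch (identities, branch names and the patch directory are invented; the `-S` signing flags are mentioned in comments but omitted, since no GPG key is configured here):

```shell
set -e
d=$(mktemp -d); cd "$d"
git init -q repo; cd repo
git config user.email dev@example.org
git config user.name dev
echo base > f; git add f; git commit -qm base
MAIN=$(git symbolic-ref --short HEAD)
BASE=$(git rev-parse HEAD)

# an untrusted user's commit on a topic branch (different identity, unsigned)
git checkout -qb user-work
echo fix > f
git -c user.email=user@example.org -c user.name=user commit -qam "user: fix"
git format-patch -1 -o ../patches >/dev/null

# Option 1: `git am` re-commits the patch with the developer as committer;
# the author stays the user, and the dev can then sign the new commit
# (`git commit --amend -S`)
git checkout -q "$MAIN"
git am -q ../patches/*.patch

# Option 2: keep the user's commits verbatim, but force a merge commit the
# developer owns - the one they'd sign with `git merge -S --no-ff`
git checkout -qb merge-path "$BASE"
git merge -q --no-ff -m "Merge user-work" user-work
```

With option 1 the tip commit is the developer's own; with option 2 the tip is a two-parent merge commit the developer created, so a hook verifying only the left-most history line is satisfied either way.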
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
Jauhien Piatlicki:
> Again, how will user check the integrity and authenticity if Manifests are
> unsigned?

While this is an issue to be solved, it shouldn't be a blocker for the git migration. There is no regression if this isn't solved.

There is no sane automated method for verifying signed Manifests yet (that should be on the PM level), and signing them isn't even enforced throughout the tree. Moreover, I highly doubt that there is any user who runs around ebuild directories and checks Manifest signatures by hand.

People who really care use emerge-webrsync. If we use the proposed solution, then there is an additional method via the user syncing repo, so it's a win.

We can put more effort into solving this for rsync mirrors later, but I'd rather focus on the git migration.
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On Sun, Sep 14, 2014 at 2:03 PM, Michał Górny wrote:
> We have main developer repo where developers work & commit and are
> relatively happy. For every push into developer repo, automated magic
> thingie merges stuff into user sync repo and updates the metadata cache
> there.

How long does the md5-cache regeneration process take? Are you sure it will be able to keep up with the rate of pushes to the repo during "peak hours"? If not, maybe we could use a time-based thing similar to the current cvs->rsync synchronization.

[...]

> Main developer repo
> -------------------
>
> I was able to create a start git repository that takes around 66M
> as a git pack (this is how much you will have to fetch to start working
> with it). The repository is stripped clean of history and ChangeLogs,
> and has thin Manifests only.
>
> This means we don't have to wait till someone figures out the perfect
> way of converting the old CVS repository. You don't need that history
> most of the time, and you can play with CVS to get it if you really do.

+1

> In any case, we would likely strip the history anyway to get a small
> repo to work with.
>
> I have prepared a basic git update hook that keeps master clean
> and attached it to the bug [1]. It enforces basic policies, prevents
> forced updates and checks GPG signatures on left-most history line. It
> can also be extended to do more extensive tree checks.

Are we going to disallow merge commits and ask devs to rebase local changes in order to keep the history "clean"?

Thanks a lot,
Davide
Re: [gentoo-dev] Re: My masterplan for git migration (+ looking for infra to test it)
> On Sun, 14 Sep 2014, Johannes Huber wrote:
> On Sunday, 14 September 2014 15:17:41, Ulrich Mueller wrote:
>> > On Sun, 14 Sep 2014, Michał Górny wrote:
>> > I think we should also merge gentoo-news & glsa & herds.xml into the
>> > repository. They all reference Gentoo packages at a particular state
>> > in time, and it would be much nicer to have them synced properly.
>>
>> Not a good idea, because we may want to grant commit access to
>> these repos for people who are not necessarily ebuild devs.
> This could be solved by a pull requests review tool (gerrit,
> reviewboard, gitlab etc).

A second argument is that gentoo-x86 is large enough as it is, and we shouldn't make it even larger by merging in things that are not strictly necessary. Especially glsa has a non-negligible size.

Ulrich
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On 14.09.14 15:25, "C. Bergström" wrote:
> On 09/14/14 08:24 PM, Jauhien Piatlicki wrote:
>> 14.09.14 15:23, Jauhien Piatlicki wrote:
>>> Another question: will it be possible to maintain a copy of the tree on github
>>> to make contributions for users simpler (similarly to e.g. the science
>>> overlay)? (Can it somehow be combined with the proposed signing mechanism?)
>>
>> Or well, have our own pull requests review tool.
> NIH? What would be the benefit of that.. before going down this path.. I
> think there's some good tools around which may at least serve as a base to
> (fork) from before starting a ground-up project.
>
> Sorry to jump in the middle of the conversation, but I know 1st hand how much
> is involved here.

I was not precise. By "our own" I mean hosted by us, not by github. )
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On 09/14/14 08:24 PM, Jauhien Piatlicki wrote:
> 14.09.14 15:23, Jauhien Piatlicki wrote:
>> Another question: will it be possible to maintain a copy of the tree on github
>> to make contributions for users simpler (similarly to e.g. the science
>> overlay)? (Can it somehow be combined with the proposed signing mechanism?)
>
> Or well, have our own pull requests review tool.

NIH? What would be the benefit of that.. before going down this path.. I think there's some good tools around which may at least serve as a base to (fork) from before starting a ground-up project.

Sorry to jump in the middle of the conversation, but I know 1st hand how much is involved here.
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
On 14.09.14 15:23, Jauhien Piatlicki wrote:
> Another question: will it be possible to maintain a copy of the tree on github to
> make contributions for users simpler (similarly to e.g. the science overlay)?
> (Can it somehow be combined with the proposed signing mechanism?)

Or well, have our own pull requests review tool.
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
Another question: will it be possible to maintain a copy of the tree on github to make contributions for users simpler (similarly to e.g. the science overlay)? (Can it somehow be combined with the proposed signing mechanism?)
Re: [gentoo-dev] Re: My masterplan for git migration (+ looking for infra to test it)
On Sunday, 14 September 2014 15:17:41, Ulrich Mueller wrote:
> > On Sun, 14 Sep 2014, Michał Górny wrote:
> > I think we should also merge gentoo-news & glsa & herds.xml into the
> > repository. They all reference Gentoo packages at a particular state
> > in time, and it would be much nicer to have them synced properly.
>
> Not a good idea, because we may want to grant commit access to these
> repos for people who are not necessarily ebuild devs.
>
> Ulrich

This could be solved by a pull requests review tool (gerrit, reviewboard, gitlab etc).

--
Johannes Huber (johu)
Gentoo Linux Developer / KDE Team
GPG Key ID F3CFD2BD
[gentoo-dev] Re: My masterplan for git migration (+ looking for infra to test it)
> On Sun, 14 Sep 2014, Michał Górny wrote:
> I think we should also merge gentoo-news & glsa & herds.xml into the
> repository. They all reference Gentoo packages at a particular state
> in time, and it would be much nicer to have them synced properly.

Not a good idea, because we may want to grant commit access to these repos for people who are not necessarily ebuild devs.

Ulrich
Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
Hi,

On 14.09.14 14:03, Michał Górny wrote:
> Hi,
>
> I'm quite tired of promises and all that perfectionist nonsense which
> locks us up with CVS for the next 10 years of bikeshedding. Therefore, I have
> prepared a plan for the git migration, and I believe it's doable in
> less than 2 weeks (plus the testing). Of course, that assumes infra is
> going to cooperate quickly or someone else is willing to provide the
> infra for it.

As always, nice effort, but I foresee lots of bikeshedding in this thread. )

> This means we don't have to wait till someone figures out the perfect
> way of converting the old CVS repository. You don't need that history
> most of the time, and you can play with CVS to get it if you really do.
> In any case, we would likely strip the history anyway to get a small
> repo to work with.

Is it so difficult to convert CVS history?

> The rsync tree
> --------------
>
> We'd also propagate things to rsync. We'd have to populate it with old
> ChangeLogs, new ChangeLog entries (autogenerated from git) and thick
> Manifests. So users won't notice much of a change.

How will the user check the ebuild integrity with thick Manifests using rsync?

> The remaining issue is signing of stuff. We could supposedly sign
> Manifests but IMO it's a waste of resources considering how poor
> the signing system is for non-git repos.

Again, how will the user check the integrity and authenticity if Manifests are unsigned?

Also, it would be a good idea to add automatic signature checking to portage for overlays that use signing (or is it already done?).

--
Jauhien
[gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
Hi,

I'm quite tired of promises and all that perfectionist nonsense which locks us up with CVS for the next 10 years of bikeshedding. Therefore, I have prepared a plan for the git migration, and I believe it's doable in less than 2 weeks (plus the testing). Of course, that assumes infra is going to cooperate quickly or someone else is willing to provide the infra for it.

I can provide some testing repos once someone is willing to provide the hardware.

What needs to be done
---------------------

I can do most of the scripting. What I need others to do is provide the hosting for git repos. We can't use public services like github since they don't allow us to set our own update hook, so we can't enforce signing policies etc.

Once basic infra is ready, I think the following is the best way to switch:

1. send announcement to devs to explain how to use git,
2. lock CVS out to read-only,
3. create all the git repos, get hooks rolling,
4. enable R/W access to the repos.

With some luck, no more than 2 hours of downtime.

The infra
---------

The general idea is based on a 3-level structure that's an extension of how Funtoo works. The following ultimately pretty picture explains that:

  +----------------+
  | developer repo | - - - - - - - - - - - ,
  +----------------+                       v
          |          +------------------------------+
          |          | cache, DTDs and other extras |
          v          +------------------------------+
  +----------------+                       |
  | user sync repo | <---------------------'
  +----------------+ - - - - - - - - - - - ,
          |          +-----------------------------+
          |          | ChangeLogs, thick Manifests |
          v          +-----------------------------+
  +----------------+                       |
  |     rsync      | <---------------------'
  +----------------+

Text version: We have the main developer repo where developers work & commit and are relatively happy. For every push into the developer repo, an automated magic thingie merges stuff into the user sync repo and updates the metadata cache there.

The user sync repo is for power users that want to fetch via git. It's quite fast and efficient for frequent updates, and also saves space by being free of ChangeLogs.

On top of the user sync repo, rsync is propagated. The rsync tree is populated with all old ChangeLogs copied from CVS (stored in a 30M git repo), new ChangeLogs are generated from git logs, and Manifests are expanded.

Main developer repo
-------------------

I was able to create a starting git repository that takes around 66M as a git pack (this is how much you will have to fetch to start working with it). The repository is stripped clean of history and ChangeLogs, and has thin Manifests only.

This means we don't have to wait till someone figures out the perfect way of converting the old CVS repository. You don't need that history most of the time, and you can play with CVS to get it if you really do. In any case, we would likely strip the history anyway to get a small repo to work with.

I have prepared a basic git update hook that keeps master clean and attached it to the bug [1]. It enforces basic policies, prevents forced updates and checks GPG signatures on the left-most history line. It can also be extended to do more extensive tree checks.

For GPG signing, I relied upon gpg to do the right thing. That is, git checks the signatures and we accept only trusted signatures. So an external tool (gentoo-keys) needs to play with gpg to import, trust and revoke developer keys.

I think we should also merge gentoo-news & glsa & herds.xml into the repository. They all reference Gentoo packages at a particular state in time, and it would be much nicer to have them synced properly.

[1]: https://bugs.gentoo.org/show_bug.cgi?id=502060

User syncing repo
-----------------

IMO this will be the most useful syncing method. The user syncing repo is updated automatically on developer repo commits, and afterwards the md5-cache is regenerated and committed. Also other repositories (like DTDs, glsas and others, if you dislike the previous idea) are merged into it.

This repo is still free of ChangeLogs (since git logs are more efficient) and has thin Manifests. It's the space-efficient Gentoo variant. And commits are signed so users can verify the trust.

The rsync tree
--------------

We'd also propagate things to rsync. We'd have to populate it with old ChangeLogs, new ChangeLog entries (autogenerated from git) and thick Manifests. So users won't notice much of a change.

The remaining issue is signing of stuff. We could supposedly sign Manifests but IMO it's a waste of resources considering how poor the signing system is for non-git repos.

--
Best regards,
Michał Górny
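The forced-update and first-parent signature checks described for the update hook can be sketched in a few lines of shell. This is a hypothetical reconstruction for illustration, not the actual hook attached to the bug; the demo repo, identities and branch are invented, and `%G?` prints `G` only for a good signature from a trusted key:

```shell
set -e
d=$(mktemp -d); cd "$d"
git init -q .
git config user.email dev@example.org
git config user.name dev
echo a > f; git add f; git commit -qm one
OLD=$(git rev-parse HEAD)
echo b > f; git commit -qam two
NEW=$(git rev-parse HEAD)
MAIN=$(git symbolic-ref --short HEAD)

# the sketch itself; git invokes an update hook as: update <ref> <old> <new>
cat > update-hook <<EOF
#!/bin/sh
refname=\$1 oldrev=\$2 newrev=\$3
zero=0000000000000000000000000000000000000000

# only police the main branch
[ "\$refname" = "refs/heads/$MAIN" ] || exit 0

# reject forced updates: the old tip must stay reachable from the new one
if [ "\$oldrev" != "\$zero" ] && \
   ! git merge-base --is-ancestor "\$oldrev" "\$newrev"; then
    echo "refusing forced update of \$refname" >&2
    exit 1
fi

# require a good, trusted signature on every first-parent commit pushed
if [ "\$oldrev" = "\$zero" ]; then range=\$newrev; else range=\$oldrev..\$newrev; fi
for c in \$(git rev-list --first-parent \$range); do
    if [ "\$(git log -1 --format='%G?' "\$c")" != G ]; then
        echo "commit \$c lacks a good, trusted GPG signature" >&2
        exit 1
    fi
done
exit 0
EOF
chmod +x update-hook
```

Installed as `hooks/update` on the server side, this rejects both force-pushes and pushes whose first-parent commits aren't signed by a key gpg trusts, which is why a tool like gentoo-keys has to keep the server's keyring current.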