[scm-migration-dev] the CTRL-C dilemma

David Marker Sun, 03 Aug 2008 11:48:34 -0600

OK, Mark Nelson's testing yesterday showed that we can't trap CTRL-C on 
the server.


Basically a CTRL-C while doing a push to the gate has some good and some 
surprising effects:

    (1) If done in a pretxnchangegroup hook, you will be rolled back.
        (a) The rollback takes a while. If you push again in the
            meantime, hg reports that there is nothing to push.
            -- this means the push lock isn't held while rolling back!
        (b) Anybody else will be rejected during this period due to
            multiple heads.
            -- fortunately we check for that, given (a).

    (2) If done in a changegroup hook, you won't be rolled back, but no
        further changegroup hooks will run.


This has some interesting implications, depending upon how we arrange 
the hooks.

The clone push (and also the hg.os.o push) can't be a pretxnchangegroup
hook.

If it is, you could end up rolling back the gate while your changes are
already in the clone. Not a good situation. The clone can't be "ahead"
of the gate, ever.

So we make the clone push a changegroup hook (same for hg.os.o),
probably the last hook.

But that has another interesting side effect. If you CTRL-C before your
change is pushed to the clone, then *nobody* can merge up. That means
that anybody following you will get rejected by the sanity test for
having multiple heads (as they should be).
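
For illustration only, a multiple-heads check of this kind could look
roughly like the sketch below. It is not the gate's actual sanity test;
the function name and message are made up, and it assumes an in-process
Python hook where `repo.heads()` lists the heads and a true return value
from a "pre" hook makes Mercurial abort and roll back.

```python
def reject_multiple_heads(ui, repo, hooktype=None, node=None, **kwargs):
    """Hypothetical pretxnchangegroup hook: refuse a push that leaves >1 head.

    Assumption: a true return value from an in-process "pre" hook makes
    Mercurial abort the transaction, so the incoming changegroup is
    rolled back.
    """
    heads = repo.heads()
    if len(heads) > 1:
        # Bytes are used so the message works on both old and new Mercurial.
        ui.warn(b"push rejected: gate would have %d heads; "
                b"pull from the clone and merge first\n" % len(heads))
        return True   # true means "fail" for a pre hook
    return False      # false means "allow the push"
```

With a check like this in place, a cset stuck in the gate but missing
from the clone makes every later pusher show up with a second head,
which is exactly the rejection described above.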

So now the gate is pretty much in Denial of Service mode until somebody
writes to gk and says, "Hey, I keep pulling from the clone and I'm up to
date. But I can't seem to push to the gate!" At which point a gatekeeper
can go look and see that somebody interrupted their push.

This gives the gatekeeper a couple of options: roll back the push that
didn't succeed, or manually push it on over to the clone.

Unless of course the same user (the only one in sync with the gate now,
and who won't be rejected for having multiple heads) pushes again. That
would clear it up automatically.
(Well, unless they hit CTRL-C again.)

We can't just have some random cron job check whether the gate and the
clone are in sync. The whole point of a push-only gate and a pull-only
clone is that it protects everybody from getting a cset that is doomed
to rejection (no RTI, or didn't pass sanity tests, etc.).

So depending upon when we send out notification, pretxnchangegroup or
changegroup, we end up with either the "stealth push" or the "phantom
push".

We get a "phantom push" if it's in pretxnchangegroup and they hit CTRL-C
(so their change is rolled back).

The "stealth push" (if clone update is run last in changegroup) will be
discovered eventually, because at some point we will get a complaint.
And the hooks are coded so that gk can run them manually: just do an
`hg outgoing` from the gate to determine the csets and then run any
hooks that haven't happened already (including the push to the clone).
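
As a rough sketch of that manual recovery (not the real gatekeeper
tooling), something like the following could collect the csets and
replay the missed hooks. The function names, the `(name, callable)`
hook list, and the `already_ran` set are all hypothetical; the only
Mercurial behavior assumed is that `hg outgoing` accepts `--template`
and exits 1 when there is nothing to send.

```python
import re
import subprocess

NODE_RE = re.compile(r"^[0-9a-f]{40}$")

def parse_nodes(output):
    """Keep only lines that look like full 40-hex changeset hashes."""
    lines = (line.strip() for line in output.splitlines())
    return [line for line in lines if NODE_RE.match(line)]

def outgoing_nodes(gate_path, clone_path):
    """List csets present in the gate but missing from the clone."""
    proc = subprocess.run(
        ["hg", "-R", gate_path, "outgoing",
         "--template", "{node}\n", clone_path],
        capture_output=True, text=True,
    )
    # hg outgoing exits 0 if there are outgoing csets, 1 if there are none.
    if proc.returncode not in (0, 1):
        raise RuntimeError(proc.stderr)
    return parse_nodes(proc.stdout)

def run_missed_hooks(nodes, hooks, already_ran):
    """Run each (name, callable) hook not in already_ran; return names run."""
    if not nodes:
        return []          # gate and clone agree, nothing to replay
    ran = []
    for name, hook in hooks:
        if name in already_ran:
            continue
        hook(nodes)
        ran.append(name)
    return ran
```

Putting the clone push last in `hooks` keeps the rule that the clone
never gets ahead of the gate's other bookkeeping.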

Given that we will have a chance to fix things if the clone update is
run as a changegroup hook, that is preferable. It does mean there can be
Denial of Service on the gate.

About the only way out of this mess I can think of is this:
    The last pretxnchangegroup hook sends email to some user on the gate
    with the previous tip node hash and the new tip node hash.
    That user, or something in /etc/mail.aliases, tests to see if the
    new tip reported made it into the gate (after some acceptable delay
    to make sure it isn't in the process of being rolled back). If yes,
    it runs all the hooks I had planned on running in changegroup hooks.
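
The receiving end of that scheme might be sketched like this. Everything
here is an assumption for illustration: the function names, the 60
second default delay, and the use of `hg log -r NODE` failing as the
test for "the node is not in the gate". Parsing the two hashes out of
the mail is left out.

```python
import subprocess
import time

def hg_node_exists(repo_path, node):
    """True if `hg log -r node` succeeds, i.e. the cset is in the repo."""
    proc = subprocess.run(
        ["hg", "-R", repo_path, "log", "-r", node, "--template", "{node}"],
        capture_output=True, text=True,
    )
    return proc.returncode == 0

def settle_and_run(prev_tip, new_tip, node_exists, deferred_hooks,
                   delay=60.0, sleep=time.sleep):
    """Wait, then run the deferred hooks only if new_tip survived.

    node_exists is a callable taking a node hash; deferred_hooks are
    callables taking (prev_tip, new_tip). Returns True if hooks ran.
    """
    sleep(delay)                  # give any in-flight rollback time to finish
    if not node_exists(new_tip):
        return False              # push was rolled back: nothing to announce
    for hook in deferred_hooks:
        hook(prev_tip, new_tip)   # e.g. notification, then the clone push
    return True
```

A mail alias could pipe each message to a small script that extracts the
two hashes and calls
`settle_and_run(prev, new, lambda n: hg_node_exists(gate, n), hooks)`,
where `gate` and `hooks` are whatever the gate actually uses.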

Any other ideas?
Because either we run with what we've got and accept this flaw, or I've
got some re-tooling to do on things like webrev.py, notify.py, the clone
update, etc.

I was looking forward to not running off of email, but I don't see an
alternative.
(Yeah it doesn't have to be email, but why write a network daemon to do what
email can already take care of?)
-dvd
