Re: [PATCH 0/3] fix unparsed object access in upload-pack

2013-03-17 Thread Junio C Hamano
Jeff King p...@peff.net writes:

[3/3]: upload-pack: load non-tip want objects from disk
 
  While investigating the bug, I found some weirdness around the
  stateless-rpc check_non_tip code. As far as I can tell, that code
  never actually gets triggered. It's not too surprising that we
  wouldn't have noticed, because it is about falling back due to a
  race condition. But please sanity check my explanation and patch.
 
 Thanks. That fall-back is Shawn's doing and I suspect that nobody is
 exercising the codepath (he isn't).

 I almost wonder if we should cut it out entirely. It is definitely a
 possible race condition, but I wonder if anybody actually hits it in
 practice (and if they do, the consequence is that the fetch fails and
 needs to be retried). As far as I can tell, the code path has never
 actually been followed, and I do not recall ever seeing a bug report or
 complaint about it (though perhaps it happened once, which spurred the
 initial development?).

If you run multiple servers serving the same repository at the same
URL with a small mirroring lag, one may observe a set of refs from
one server, that are a tad older than the other server you actually
fetch from.  k.org may have such an arrangement, but does GitHub
serve the same repository on multiple machines without tying the
same client to the same backend?

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] fix unparsed object access in upload-pack

2013-03-17 Thread Jeff King
On Sat, Mar 16, 2013 at 11:17:18PM -0700, Junio C Hamano wrote:

  I almost wonder if we should cut it out entirely. It is definitely a
  possible race condition, but I wonder if anybody actually hits it in
  practice (and if they do, the consequence is that the fetch fails and
  needs to be retried). As far as I can tell, the code path has never
  actually been followed, and I do not recall ever seeing a bug report or
  complaint about it (though perhaps it happened once, which spurred the
  initial development?).
 
 If you run multiple servers serving the same repository at the same
 URL with a small mirroring lag, one may observe a set of refs from
 one server, that are a tad older than the other server you actually
 fetch from.  k.org may have such an arrangement, but does GitHub
 serve the same repository on multiple machines without tying the
 same client to the same backend?

Each repository is a on a single backend host. They're redundant
internally (each host is actually multiple hosts), but pure-git requests
go to a single master for each host (though for some read-only
operations I think we spread the load across the redundant spares). You
might get a separate machine during a failover event, but they share
block devices via DRBD, so in theory an fsync() should hit both
machines, and there is no lag (and you are likely to get an intermittent
failure in such a case, anyway, since the machine serving your git
request probably died mid-packet).

I thought this change was to prevent against the common race:

  1. Client request stateless ref advertisement.

  2. Somebody updates ref.

  3. Client requests want objects based on old advertisement.

and I think it does solve that (assuming step 2 is not a rewind). The
important thing is that time always moves forward.

But if you are talking about mirror lag, time can move in either
direction. Imagine you have two machines, A and B, and A is missing an
update to B. If you hit A first, then B, it is the same as the update
sequence above. The patch helps. But if you hit B first, then A, you
will ask it for objects it has not yet received, and we must fail.

So I think any such mirroring setup would want to try very hard to make
sure you hit the same machine both times anyway, regardless of this
patch.

I'm fine to leave it. I was just questioning its utility since AFAICT,
it has never worked and nobody has cared. It's not too much code,
though, and it is only run when we hit the race, so I don't think it is
hurting anything.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] fix unparsed object access in upload-pack

2013-03-17 Thread René Scharfe

Am 17.03.2013 06:40, schrieb Jeff King:

We do have the capability to roll out to one or a few of our servers
(the granularity is not 0.2%, but it is still small). I'm going to try
to keep us more in sync with upstream git, but I don't know if I will
get to the point of ever deploying master or next, even for a small
portion of the population. We are accumulating more hacks[1] on top of
git, so it is not just run master for an hour on this server; I have
to actually merge our fork.


Did you perhaps intend to list these hacks in a footnote or link to a 
repository containing them?  (I can't find the counterpart of that [1].)


René

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] fix unparsed object access in upload-pack

2013-03-16 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 This series fixes the issue I mentioned recently with upload-pack, where
 we might feed unparsed objects to the revision parser. The bug is in
 435c833 (the tip of the jk/peel-ref topic), which is in v1.8.1 and up.

Good to see follow-up from a responsible contributor ;-)

 ... (I had several bug reports
 within a few hours of deploying v1.8.1.5 on github.com)

Nice to have a pro at the widely used site ;-)  I often wish it had
a mechanism to deploy the tip of 'master' or 'maint', or even 'next'
for 0.2% of its users' repositories to give us an early feedback.

   [3/3]: upload-pack: load non-tip want objects from disk

 While investigating the bug, I found some weirdness around the
 stateless-rpc check_non_tip code. As far as I can tell, that code
 never actually gets triggered. It's not too surprising that we
 wouldn't have noticed, because it is about falling back due to a
 race condition. But please sanity check my explanation and patch.

Thanks. That fall-back is Shawn's doing and I suspect that nobody is
exercising the codepath (he isn't).
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] fix unparsed object access in upload-pack

2013-03-16 Thread Jeff King
On Sat, Mar 16, 2013 at 10:16:40PM -0700, Junio C Hamano wrote:

  ... (I had several bug reports
  within a few hours of deploying v1.8.1.5 on github.com)
 
 Nice to have a pro at the widely used site ;-)  I often wish it had
 a mechanism to deploy the tip of 'master' or 'maint', or even 'next'
 for 0.2% of its users' repositories to give us an early feedback.

We are quite slow and conservative at deploying new git, preferring
instead to let the rest of the world act as our testbed. :)

As seen with this bug, though, we really do get a lot more coverage of
weird cases due to our size. In the cases I looked at, the trigger
seemed to be clients doing the equivalent of of git clone --depth=X
--no-single-branch. Almost nobody would do that intentionally, but
prior to v1.7.10, we did not have --single-branch; older clients using
--depth on a repository with multiple branches started failing clones
almost immediately.

We do have the capability to roll out to one or a few of our servers
(the granularity is not 0.2%, but it is still small). I'm going to try
to keep us more in sync with upstream git, but I don't know if I will
get to the point of ever deploying master or next, even for a small
portion of the population. We are accumulating more hacks[1] on top of
git, so it is not just run master for an hour on this server; I have
to actually merge our fork.

I had been handling our hacks as patch series to be rebased on top of
upstream git releases, but that was getting increasingly unwieldy
(especially as people besides me work on it). Going forward, I'm going
to treat upstream git as a vendor branch and merge in occasionally to
get fixes.

One thing I might try is to keep a local next-upstream branch, that is
continually merging what is on upstream master with our local production
version. Then I could graduate it to production just like any other
topic branch when it comes time (instead of doing a gigantic painful
merge when we decide to upgrade upstream git).

That would mean I could test-deploy the next-upstream branch to a few
servers on any given day without doing a lot of work. So it might make
sense to do it at key times, like when we are in -rc here, to help shake
out any existing bugs before you make a release.

[3/3]: upload-pack: load non-tip want objects from disk
 
  While investigating the bug, I found some weirdness around the
  stateless-rpc check_non_tip code. As far as I can tell, that code
  never actually gets triggered. It's not too surprising that we
  wouldn't have noticed, because it is about falling back due to a
  race condition. But please sanity check my explanation and patch.
 
 Thanks. That fall-back is Shawn's doing and I suspect that nobody is
 exercising the codepath (he isn't).

I almost wonder if we should cut it out entirely. It is definitely a
possible race condition, but I wonder if anybody actually hits it in
practice (and if they do, the consequence is that the fetch fails and
needs to be retried). As far as I can tell, the code path has never
actually been followed, and I do not recall ever seeing a bug report or
complaint about it (though perhaps it happened once, which spurred the
initial development?).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html