Re: propagating repo corruption across clone

2013-03-28 Thread Jeff Mitchell
On Wed, Mar 27, 2013 at 2:51 PM, Rich Fromm  wrote:
> Nevertheless, I will try to contact Jeff and point him at this.  My initial
> reading of his blog posts definitely gave me the impression that this was a
> --mirror vs. not issue, but it really sounds like his main problem was using
> --local.

Actually, I wasn't using --local; I just wasn't using --no-local, as I
had mixed it up with --no-hardlinks (which lies somewhere in between, but
is still not the same as using file://).

It's entirely possible that you read the posts before I updated them.

Also, keep in mind, if you're evaluating what you're doing, that the
system we have was not set up to be a backup system. It was set up to
be a mirror system. With proper sanity checking, it would have acted
in a decent capacity as a backup system, but that wasn't why it was
set up, nor how it was set up.


Re: propagating repo corruption across clone

2013-03-28 Thread Jeff Mitchell
On Tue, Mar 26, 2013 at 11:47 PM, Junio C Hamano  wrote:
> The difference between --mirror and no --mirror is a red herring.
> You may want to ask Jeff Mitchell to remove the mention of it; it
> only adds to the confusion without helping users.  If you made
> byte-for-byte copy of corrupt repository, it wouldn't make any
> difference if the first "checkout" notices it.

Hi,

Several days ago I had actually already updated the post to indicate
that my testing methodology was incorrect as a result of mixing up
--no-hardlinks and --no-local, and pointed to this thread.

I will say that we did see corrupted repos on the downstream mirrors.
I don't have an explanation for it, but have not been able to
reproduce it either. My only potential guess (untested) is that
perhaps when the corruption was detected the clone aborted but left
the objects already transferred locally. Again, untested -- I mention
it only because it's my only potential explanation  :-)

> To be paranoid, you may want to set transfer.fsckObjects to true,
> perhaps in your ~/.gitconfig.

Interesting; I'd known about receive.fsckObjects but not
transfer/fetch. Thanks for the pointer.


Re: propagating repo corruption across clone

2013-03-27 Thread Jeff King
On Wed, Mar 27, 2013 at 03:49:38PM -0400, Jeff King wrote:

> On Wed, Mar 27, 2013 at 11:23:15AM -0700, Rich Fromm wrote:
> 
> > But I'm still somewhat confused about what is and is not checked under what
> > conditions.  Consider the three statements:
> > 
> > # 1
> > git clone --mirror myuser@myhost:my_repo
> > 
> > # 2
> > git clone --mirror --config transfer.fsckObjects=true myuser@myhost:my_repo
> > 
> > # 3
> > git clone --mirror myuser@myhost:my_repo && cd my_repo.git && git-fsck
> > 
> > Are 2 and 3 equivalent?  Or is there an increasing level of checking that
> > occurs from 1 to 2, and from 2 to 3?  My guess is the latter, but perhaps
> > this could be clearer in the man pages.
> 
> 2 and 3 are not exactly equivalent, in that they are implemented
> slightly differently, but I do not know offhand of any case that would
> pass 2 but not 3. We do check reachability with transfer.fsckObjects.

Oh, and in the case of #1, I think we would already find corruption, in
that index-pack will expand and check the sha1 of each object we
receive. The transfer.fsckObjects check adds some semantic checks as
well (e.g., making sure author identities are well-formed).

Clone will not currently detect missing objects and reachability
without transfer.fsckObjects set, but that is IMHO a bug; fetch will
notice it, and clone should behave the same way.

-Peff


Re: propagating repo corruption across clone

2013-03-27 Thread Jeff King
On Wed, Mar 27, 2013 at 11:23:15AM -0700, Rich Fromm wrote:

> But I'm still somewhat confused about what is and is not checked under what
> conditions.  Consider the three statements:
> 
> # 1
> git clone --mirror myuser@myhost:my_repo
> 
> # 2
> git clone --mirror --config transfer.fsckObjects=true myuser@myhost:my_repo
> 
> # 3
> git clone --mirror myuser@myhost:my_repo && cd my_repo.git && git-fsck
> 
> Are 2 and 3 equivalent?  Or is there an increasing level of checking that
> occurs from 1 to 2, and from 2 to 3?  My guess is the latter, but perhaps
> this could be clearer in the man pages.

2 and 3 are not exactly equivalent, in that they are implemented
slightly differently, but I do not know offhand of any case that would
pass 2 but not 3. We do check reachability with transfer.fsckObjects.
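
If you want the fuller git-fsck pass anyway, a belt-and-suspenders version
of #2 plus #3 would be something like this (assuming the mirror clone lands
in my_repo.git):

  git clone --mirror --config transfer.fsckObjects=true myuser@myhost:my_repo &&
      git --git-dir=my_repo.git fsck --full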

-Peff


Re: propagating repo corruption across clone

2013-03-27 Thread Junio C Hamano
Rich Fromm  writes:

> Apologies if my questions are considered slightly off topic -- I'm not
> positive if this is supposed to be a list for developers, and not users.

The list is both for users and developers.

> However, I think there may be room for some additional clarity in the docs. 
> The --local option in git-clone(1) says "When the repository to clone from
> is on a local machine, this flag bypasses the normal "git aware" transport
> mechanism".  But there's no mention of the consequences of this transport
> bypass.

Yeah, I would not mind a patch to update the documentation for
"clone --local" and rsync transport to say something about
byte-for-byte copying of broken repository.

Thanks.



Re: propagating repo corruption across clone

2013-03-27 Thread Rich Fromm
Junio C Hamano wrote
> If you use --local, that is equivalent to "cp -R".  Your corruption
> in the source will faithfully be byte-for-byte copied to the
> destination.  If you do not
> ...
> transport layer will notice
> object corruption.
> ...
> The difference between --mirror and no --mirror is a red herring.
> You may want to ask Jeff Mitchell to remove the mention of it; it
> only adds to the confusion without helping users.

Just to clarify, I don't know Jeff Mitchell personally, and I'm not
affiliated with the KDE project.  I happened to have recently implemented a
backup strategy for a different codebase that relies on `git clone
--mirror` to take the actual snapshots of the live repos.  I read about
Jeff's experiences, and that's why I started following this discussion.
Apologies if my questions are considered slightly off topic -- I'm not
positive if this is supposed to be a list for developers, and not users.

Nevertheless, I will try to contact Jeff and point him at this.  My initial
reading of his blog posts definitely gave me the impression that this was a
--mirror vs. not issue, but it really sounds like his main problem was using
--local.

However, I think there may be room for some additional clarity in the docs. 
The --local option in git-clone(1) says "When the repository to clone from
is on a local machine, this flag bypasses the normal "git aware" transport
mechanism".  But there's no mention of the consequences of this transport
bypass.  There's also no mention of this in the "GIT URLS" section that
discusses transport protocols, and I also don't see anything noting it in
either of these sections of the git book:

http://git-scm.com/book/en/Git-on-the-Server-The-Protocols
http://git-scm.com/book/en/Git-Internals-Transfer-Protocols






Re: propagating repo corruption across clone

2013-03-27 Thread Rich Fromm
Jonathan Nieder wrote
> Is the "[transfer] fsckObjects" configuration on the host executing the
> clone set to true?

I hadn't been setting it at all, and according to git-config(1) it defaults
to false, so the answer is no.  It looks like setting it might be a good
idea.

But I'm still somewhat confused about what is and is not checked under what
conditions.  Consider the three statements:

# 1
git clone --mirror myuser@myhost:my_repo

# 2
git clone --mirror --config transfer.fsckObjects=true myuser@myhost:my_repo

# 3
git clone --mirror myuser@myhost:my_repo && cd my_repo.git && git-fsck

Are 2 and 3 equivalent?  Or is there an increasing level of checking that
occurs from 1 to 2, and from 2 to 3?  My guess is the latter, but perhaps
this could be clearer in the man pages.

git-config(1) says that transfer.fsckObjects essentially means (if
fetch.fsckObjects and receive.fsckObjects are not explicitly set) that
"git-fetch-pack will check all fetched objects" and "git-receive-pack will
check all received objects."  Meanwhile, git-fsck(1) says "git-fsck tests
SHA1 and general object sanity, and it does full tracking of the resulting
reachability and everything else."  The latter sounds like a stronger
statement to me.  But if that's true, perhaps the relevant section(s) of
git-config(1) should explicitly note that this is not equivalent to a full
git-fsck?





Re: propagating repo corruption across clone

2013-03-27 Thread Sitaram Chamarty
On Wed, Mar 27, 2013 at 8:33 PM, Junio C Hamano  wrote:
> Sitaram Chamarty  writes:
>
>> On Wed, Mar 27, 2013 at 9:17 AM, Junio C Hamano  wrote:
>>
>>> To be paranoid, you may want to set transfer.fsckObjects to true,
>>> perhaps in your ~/.gitconfig.
>>
>> do we have any numbers on the overhead of this?
>>
>> Even a "guesstimate" will do...
>
> On a reasonably slow machine:
>
> $ cd /project/git/git.git && git repack -a -d
> $ ls -hl .git/objects/pack/*.pack
> -r--r--r-- 1 junio src 44M Mar 26 13:18 .git/objects/pack/pack-c40635e5ee2b7094eb0e2c416e921a2b129bd8d2.pack
>
> $ cd .. && git --bare init junk && cd junk
> $ time git index-pack --strict --stdin <../git.git/.git/objects/pack/*.pack
> real    0m13.873s
> user    0m21.345s
> sys     0m2.248s
>
> That's about 3.2 MB/s?
>
> Compare that with the speed your other side feeds you (or your line
> speed could be the limiting factor) and decide how much you value
> your data.

Thanks.  I was also interested in overhead on the server just as a %-age.

I have no idea why but when I did some tests a long time ago I got
upwards of 40% or so, but now when I try these tests for git.git

cd 
git init --bare
# git config transfer.fsckobjects true
git fetch file:///full/path/to/git.git refs/*:refs/*

then the difference in elapsed time is 18s -> 22s, so about 22%, and CPU
time is 31s -> 37s, so about 20%.  I didn't measure the increase in disk
access, but I guess 20% is not too bad.

Is it likely to be linear in the size of the repo, by and large?


Re: propagating repo corruption across clone

2013-03-27 Thread Junio C Hamano
Sitaram Chamarty  writes:

> On Wed, Mar 27, 2013 at 9:17 AM, Junio C Hamano  wrote:
>
>> To be paranoid, you may want to set transfer.fsckObjects to true,
>> perhaps in your ~/.gitconfig.
>
> do we have any numbers on the overhead of this?
>
> Even a "guesstimate" will do...

On a reasonably slow machine:

$ cd /project/git/git.git && git repack -a -d
$ ls -hl .git/objects/pack/*.pack
-r--r--r-- 1 junio src 44M Mar 26 13:18 .git/objects/pack/pack-c40635e5ee2b7094eb0e2c416e921a2b129bd8d2.pack

$ cd .. && git --bare init junk && cd junk
$ time git index-pack --strict --stdin <../git.git/.git/objects/pack/*.pack
real    0m13.873s
user    0m21.345s
sys     0m2.248s

That's about 3.2 MB/s?

Compare that with the speed your other side feeds you (or your line
speed could be the limiting factor) and decide how much you value
your data.


Re: propagating repo corruption across clone

2013-03-26 Thread Sitaram Chamarty
On Wed, Mar 27, 2013 at 9:17 AM, Junio C Hamano  wrote:

> To be paranoid, you may want to set transfer.fsckObjects to true,
> perhaps in your ~/.gitconfig.

do we have any numbers on the overhead of this?

Even a "guesstimate" will do...


Re: propagating repo corruption across clone

2013-03-26 Thread Junio C Hamano
Rich Fromm  writes:

> Jeff King wrote
>> Fundamentally the problem is
>> that the --local transport is not safe from propagating corruption, and
>> should not be used if that's a requirement.
>
> I've read Jeff Mitchell's blog post, his update, relevant parts of the
> git-clone(1) man page, and a decent chunk of this thread, and I'm still not
> clear on one thing.  Is the danger of `git clone --mirror` propagating
> corruption only true when using the --local option ?

If you use --local, that is equivalent to "cp -R".  Your corruption
in the source will faithfully be byte-for-byte copied to the
destination.  If you do not (and in your case you have two different
machines), unless you are using the long deprecated rsync transport
(which again is the same as "cp -R"), transport layer will notice
object corruption.  See Jeff's analysis earlier in the thread.

If you are lucky (or unlucky, depending on how you look at it), the
corruption you have in your object store may affect objects that are
needed to check out the version at the tip of the history, and "git
checkout" that happens as the last step of cloning may loudly
complain, but that just means you can immediately notice the
breakage in that case.  You may be unlucky and the corruption may
not affect objects that are needed to check out the tip. The initial
checkout will succeed as if nothing is wrong, but the corruption in
your object store is still there nevertheless.  "git log -p --all"
or "git fsck" will certainly be unhappy.

The difference between --mirror and no --mirror is a red herring.
You may want to ask Jeff Mitchell to remove the mention of it; it
only adds to the confusion without helping users.  If you made
byte-for-byte copy of corrupt repository, it wouldn't make any
difference if the first "checkout" notices it.

To be paranoid, you may want to set transfer.fsckObjects to true,
perhaps in your ~/.gitconfig.
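
For example, a one-liner that sets it globally:

  git config --global transfer.fsckObjects true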


Re: propagating repo corruption across clone

2013-03-26 Thread Jonathan Nieder
Hi,

Rich Fromm wrote:

> The host executing the clone
> command is different than the host on which the remote repository lives,
> and I am using ssh as a transport protocol.  If there is corruption, can I
> or can I not expect the clone operation to fail and return a non-zero exit
> value?  If I can not expect this, is the workaround to run `git fsck` on the
> resulting clone?

Is the "[transfer] fsckObjects" configuration on the host executing the
clone set to true?


Re: propagating repo corruption across clone

2013-03-26 Thread Rich Fromm
Jeff King wrote
> Fundamentally the problem is
> that the --local transport is not safe from propagating corruption, and
> should not be used if that's a requirement.

I've read Jeff Mitchell's blog post, his update, relevant parts of the
git-clone(1) man page, and a decent chunk of this thread, and I'm still not
clear on one thing.  Is the danger of `git clone --mirror` propagating
corruption only true when using the --local option ?

Specifically, in my case, I'm using `git clone --mirror`, but I'm *not*
using --local, nor am I using --no-hardlinks.  The host executing the clone
command is different than the host on which the remote repository lives,
and I am using ssh as a transport protocol.  If there is corruption, can I
or can I not expect the clone operation to fail and return a non-zero exit
value?  If I can not expect this, is the workaround to run `git fsck` on the
resulting clone?






Re: propagating repo corruption across clone

2013-03-26 Thread Jeff King
On Tue, Mar 26, 2013 at 09:59:42PM -, Philip Oakley wrote:

> How does `git bundle create file.bundle --all` perform after the changes,
> both for the 'transport' checking and for being reliable during updates?

Bundles are treated at a fairly low level the same as a remote that
provides us a particular set of refs and a packfile. So we should get
the same protections via index-pack, and still run
check_everything_connected on it, just as we would with a fetch over the
git protocol.

I didn't test it, though.
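
For reference, the round trip being discussed is roughly the following (the
bundle file name is arbitrary):

  git bundle create repo.bundle --all   # on the source: snapshot all refs and their objects
  git clone repo.bundle restored.git    # later: clone from the archived bundle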

-Peff


Re: propagating repo corruption across clone

2013-03-26 Thread Philip Oakley

From: "Jeff King" 
Sent: Tuesday, March 26, 2013 4:55 PM

On Tue, Mar 26, 2013 at 09:43:01AM -0400, Jeff Mitchell wrote:

On Mon, Mar 25, 2013 at 4:07 PM, Jeff King  wrote:
> On Mon, Mar 25, 2013 at 12:32:50PM -0400, Jeff Mitchell wrote:
>> For commit corruptions, the --no-hardlinks, non --mirror case refused
>> to create the new repository and exited with an error code of 128. The
>> --no-hardlinks, --mirror case spewed errors to the console, yet
>> *still* created the new clone *and* returned an error code of zero.
>
> I wasn't able to reproduce this; can you post a succinct test case?

[...link to tar.gz...]
Once you extract that, you should be able to run a clone using paths
(not file://) with --no-hardlinks --mirror and replicate the behavior
I saw. FYI, I'm on Git 1.8.2.

Thanks for providing an example.

The difference is the same "--mirror implies --bare" issue; the non-bare
case dies during the checkout (even before my patches, as the corruption
is not in a blob, but rather in the HEAD commit object itself). You can
replace --mirror with --bare and see the same behavior.

The troubling part is that we see errors in the bare case, but do not
die. Those errors all come from upload-pack, the "sending" side of a
clone/fetch. Even though we do not transfer the objects via the git
protocol, we still invoke upload-pack to get the ref list (and then copy
the objects themselves out-of-band).

What happens is that upload-pack sees the errors while trying to see if
the object is a tag that can be peeled (the server advertises both tags
and the objects they point to). It does not distinguish between "errors
did not let me peel this object" and "this object is not a tag, and
therefore there is nothing to peel".

We could change that, but I'm not sure whether it is a good idea. I
think the intent is that upload-pack's ref advertisement would remain
resilient to corruption in the repository (e.g., even if that commit is
corrupt, you can still fetch the rest of the data). We should not worry
about advertising broken objects, because we will encounter the same
error when we actually do try to send the objects. Dying at the
advertisement phase would be premature, since we do not yet know what
the client will request.

The problem, of course, is that the --local optimization _skips_ the
part where we actually ask upload-pack for data, and instead blindly
copies it. So this is the same issue as usual, which is that the local
transport is not thorough enough to catch corruption. It seems like a
failing in this case, because upload-pack does notice the problem, but
that is only luck; if the corruption were in a non-tip object, it would
not notice it at all. So trying to die on errors in the ref
advertisement would just be a band-aid. Fundamentally the problem is
that the --local transport is not safe from propagating corruption, and
should not be used if that's a requirement.

-Peff


How does `git bundle create file.bundle --all` perform after the changes,
both for the 'transport' checking and for being reliable during updates?

Is it an option for creating an archivable file that can be used for a
later `clone`?

I wasn't sure if the bundle capability had been considered.

Philip




Re: propagating repo corruption across clone

2013-03-26 Thread Jeff King
On Tue, Mar 26, 2013 at 09:43:01AM -0400, Jeff Mitchell wrote:

> On Mon, Mar 25, 2013 at 4:07 PM, Jeff King  wrote:
> > On Mon, Mar 25, 2013 at 12:32:50PM -0400, Jeff Mitchell wrote:
> >> For commit corruptions, the --no-hardlinks, non --mirror case refused
> >> to create the new repository and exited with an error code of 128. The
> >> --no-hardlinks, --mirror case spewed errors to the console, yet
> >> *still* created the new clone *and* returned an error code of zero.
> >
> > I wasn't able to reproduce this; can you post a succinct test case?
>
> [...link to tar.gz...]
> Once you extract that, you should be able to run a clone using paths
> (not file://) with --no-hardlinks --mirror and replicate the behavior
> I saw. FYI, I'm on Git 1.8.2.

Thanks for providing an example.

The difference is the same "--mirror implies --bare" issue; the non-bare
case dies during the checkout (even before my patches, as the corruption
is not in a blob, but rather in the HEAD commit object itself). You can
replace --mirror with --bare and see the same behavior.

The troubling part is that we see errors in the bare case, but do not
die. Those errors all come from upload-pack, the "sending" side of a
clone/fetch. Even though we do not transfer the objects via the git
protocol, we still invoke upload-pack to get the ref list (and then copy
the objects themselves out-of-band).

What happens is that upload-pack sees the errors while trying to see if
the object is a tag that can be peeled (the server advertises both tags
and the objects they point to). It does not distinguish between "errors
did not let me peel this object" and "this object is not a tag, and
therefore there is nothing to peel".

We could change that, but I'm not sure whether it is a good idea. I
think the intent is that upload-pack's ref advertisement would remain
resilient to corruption in the repository (e.g., even if that commit is
corrupt, you can still fetch the rest of the data). We should not worry
about advertising broken objects, because we will encounter the same
error when we actually do try to send the objects. Dying at the
advertisement phase would be premature, since we do not yet know what
the client will request.

The problem, of course, is that the --local optimization _skips_ the
part where we actually ask upload-pack for data, and instead blindly
copies it. So this is the same issue as usual, which is that the local
transport is not thorough enough to catch corruption. It seems like a
failing in this case, because upload-pack does notice the problem, but
that is only luck; if the corruption were in a non-tip object, it would
not notice it at all. So trying to die on errors in the ref
advertisement would just be a band-aid. Fundamentally the problem is
that the --local transport is not safe from propagating corruption, and
should not be used if that's a requirement.

-Peff


Re: propagating repo corruption across clone

2013-03-26 Thread Jeff Mitchell
On Mon, Mar 25, 2013 at 4:07 PM, Jeff King  wrote:
> On Mon, Mar 25, 2013 at 12:32:50PM -0400, Jeff Mitchell wrote:
>> For commit corruptions, the --no-hardlinks, non --mirror case refused
>> to create the new repository and exited with an error code of 128. The
>> --no-hardlinks, --mirror case spewed errors to the console, yet
>> *still* created the new clone *and* returned an error code of zero.
>
> I wasn't able to reproduce this; can you post a succinct test case?

This actually seems hard to reproduce. I found this during testing
with an existing repository on-disk, but when I tried creating a new
repository with some commit objects, and modifying one of the commit
objects the same way I modified an object in the previous
repository, I was unable to reproduce it.

I do have the original repository though, so I'll tar.gz it up so that
you can have exactly the same content as I do. It's about 40MB and you
can grab it here:
https://www.dropbox.com/s/e8dhedmpd1a1axs/tomahawk-corrupt.tar.gz (MD5
sum: cde8a43233db5d649932407891f8366b).

Once you extract that, you should be able to run a clone using paths
(not file://) with --no-hardlinks --mirror and replicate the behavior
I saw. FYI, I'm on Git 1.8.2.
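
Concretely, something like this (the extracted directory name here is a
guess):

  tar xzf tomahawk-corrupt.tar.gz
  git clone --no-hardlinks --mirror tomahawk-corrupt corrupt-mirror.git
  echo "exit code: $?"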

Thanks,
Jeff


Re: propagating repo corruption across clone

2013-03-25 Thread Duy Nguyen
On Mon, Mar 25, 2013 at 10:56 PM, Jeff King  wrote:
> On Mon, Mar 25, 2013 at 10:31:04PM +0700, Nguyen Thai Ngoc Duy wrote:
>
>> On Mon, Mar 25, 2013 at 9:56 PM, Jeff King  wrote:
>> > There are basically three levels of transport that can be used on a
>> > local machine:
>> >
>> >   1. Hard-linking (very fast, no redundancy).
>> >
>> >   2. Byte-for-byte copy (medium speed, makes a separate copy of the
>> >  data, but does not check the integrity of the original).
>> >
>> >   3. Regular git transport, creating a pack (slowest, but should include
>> >  redundancy checks).
>> >
>> > Using --no-hardlinks turns off (1), but leaves (2) as an option.  I
>> > think the documentation in "git clone" could use some improvement in
>> > that area.
>>
>> Not only git-clone. How git-fetch and git-push verify the new pack
>> should also be documented. I don't think many people outside the
>> contributor circle know what is done (and maybe how) when data is
>> received from outside.
>
> I think it's less of a documentation issue there, though, because they
> _only_ do (3). There is no option to do anything else, so there is
> nothing to warn the user about in terms of tradeoffs.
>
> I agree that in general git's handling of corruption could be documented
> somewhere, but I'm not sure where.

I think either a section in git-fsck.txt or git.txt. Probably the
former as people who read it are probably more concerned about
corruption.
-- 
Duy


Re: propagating repo corruption across clone

2013-03-25 Thread Jeff King
On Mon, Mar 25, 2013 at 12:32:50PM -0400, Jeff Mitchell wrote:

> I think what was conflating the issue in my testing is that --mirror
> implies --bare, so there is no working-tree checkout during which the
> objects would get checked; hence --mirror won't show the error a normal
> clone will -- it's not a transport question, it's just a matter of the
> normal clone doing more and so having more data run through checks.

Exactly.

> However, there are still problems. For blob corruptions, even in this
> --no-hardlinks, non --mirror case where an error was found, the exit
> code from the clone was 0. I can see this tripping up all sorts of
> automated scripts or repository GUIs that ignore the output and only
> check the error code, which is not an unreasonable thing to do.

Yes, this is a bug. I'll post a series in a minute which fixes it.

> For commit corruptions, the --no-hardlinks, non --mirror case refused
> to create the new repository and exited with an error code of 128. The
> --no-hardlinks, --mirror case spewed errors to the console, yet
> *still* created the new clone *and* returned an error code of zero.

I wasn't able to reproduce this; can you post a succinct test case?

> It seems that when there is an "error" as opposed to a "fatal" it
> doesn't affect the status code on a clone; I'd argue that it ought to.

Agreed completely. The current behavior is buggy.

-Peff


Re: propagating repo corruption across clone

2013-03-25 Thread Jeff King
On Mon, Mar 25, 2013 at 01:01:59PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > We _do_ see a problem during the checkout phase, but we don't propagate
> > a checkout failure to the exit code from clone.  That is bad in general,
> > and should probably be fixed. Though it would never find corruption of
> > older objects in the history, anyway, so checkout should not be relied
> > on for robustness.
> 
> It is obvious that we should exit with non-zero status when we see a
> failure from the checkout, but do we want to nuke the resulting
> repository as in the case of normal transport failure?  A checkout
> failure might be due to being under quota for object store but
> running out of quota upon populating the working tree, in which case
> we probably do not want to.

I'm just running through my final tests on a large-ish patch series
which deals with this (among other issues). I had the same thought,
though we do already die on a variety of checkout errors. I left it as a
die() for now, but I think we should potentially address it with a
further patch.

> >   $ git init non-local && cd non-local && git fetch ..
> >   remote: Counting objects: 3, done.
> >   remote: Total 3 (delta 0), reused 3 (delta 0)
> >   Unpacking objects: 100% (3/3), done.
> >   fatal: missing blob object 'd95f3ad14dee633a758d2e331151e950dd13e4ed'
> >   error: .. did not send all necessary objects
> >
> > we do notice.
> 
> Yes, it is OK to add connectedness check to "git clone".

That's in my series, too. Unfortunately, in the local clone case, it
slows down the clone considerably (since we otherwise would not have to
traverse the objects at all).

-Peff


Re: propagating repo corruption across clone

2013-03-25 Thread Junio C Hamano
Jeff King  writes:

> We _do_ see a problem during the checkout phase, but we don't propagate
> a checkout failure to the exit code from clone.  That is bad in general,
> and should probably be fixed. Though it would never find corruption of
> older objects in the history, anyway, so checkout should not be relied
> on for robustness.

It is obvious that we should exit with non-zero status when we see a
failure from the checkout, but do we want to nuke the resulting
repository as in the case of normal transport failure?  A checkout
failure might be due to being under quota for object store but
running out of quota upon populating the working tree, in which case
we probably do not want to.

> We do not notice the sha1 mis-match on the sending side (which we could,
> if we checked the sha1 as we were sending). We do not notice the broken
> object graph during the receive process either. I would have expected
> check_everything_connected to handle this, but we don't actually call it
> during clone! If you do this:
>
>   $ git init non-local && cd non-local && git fetch ..
>   remote: Counting objects: 3, done.
>   remote: Total 3 (delta 0), reused 3 (delta 0)
>   Unpacking objects: 100% (3/3), done.
>   fatal: missing blob object 'd95f3ad14dee633a758d2e331151e950dd13e4ed'
>   error: .. did not send all necessary objects
>
> we do notice.

Yes, it is OK to add connectedness check to "git clone".

> And one final check:
>
>   $ git -c transfer.fsckobjects=1 clone --no-local . fsck
>   Cloning into 'fsck'...
>   remote: Counting objects: 3, done.
>   remote: Total 3 (delta 0), reused 3 (delta 0)
>   Receiving objects: 100% (3/3), done.
>   error: unable to find d95f3ad14dee633a758d2e331151e950dd13e4ed
>   fatal: object of unexpected type
>   fatal: index-pack failed
>
> Fscking the incoming objects does work, but of course it comes at a cost
> in the normal case (for linux-2.6, I measured an increase in CPU time
> with "index-pack --strict" from ~2.5 minutes to ~4 minutes). And I think
> it is probably overkill for finding corruption; index-pack already
> recognizes bit corruption inside an object, and
> check_everything_connected can detect object graph problems much more
> cheaply.

> One thing I didn't check is bit corruption inside a packed object that
> still correctly zlib inflates. check_everything_connected will end up
> reading all of the commits and trees (to walk them), but not the blobs.

Correct.

> So I think at the very least we should:
>
>   1. Make sure clone propagates errors from checkout to the final exit
>  code.
>
>   2. Teach clone to run check_everything_connected.

I agree with both but with a slight reservation on the former one
(see above).

Thanks.


Re: propagating repo corruption across clone

2013-03-25 Thread Jeff Mitchell
On Mon, Mar 25, 2013 at 11:56 AM, Jeff King  wrote:
> On Mon, Mar 25, 2013 at 10:31:04PM +0700, Nguyen Thai Ngoc Duy wrote:
>
>> On Mon, Mar 25, 2013 at 9:56 PM, Jeff King  wrote:
>> > There are basically three levels of transport that can be used on a
>> > local machine:
>> >
>> >   1. Hard-linking (very fast, no redundancy).
>> >
>> >   2. Byte-for-byte copy (medium speed, makes a separate copy of the
>> >  data, but does not check the integrity of the original).
>> >
>> >   3. Regular git transport, creating a pack (slowest, but should include
>> >  redundancy checks).
>> >
>> > Using --no-hardlinks turns off (1), but leaves (2) as an option.  I
>> > think the documentation in "git clone" could use some improvement in
>> > that area.
>>
>> Not only git-clone. How git-fetch and git-push verify the new pack
>> should also be documented. I don't think many people outside the
>> contributor circle know what is done (and maybe how) when data is
>> received from outside.
>
> I think it's less of a documentation issue there, though, because they
> _only_ do (3). There is no option to do anything else, so there is
> nothing to warn the user about in terms of tradeoffs.
>
> I agree that in general git's handling of corruption could be documented
> somewhere, but I'm not sure where.

Hi there,

First of all, thanks for the analysis, it's much appreciated. It's
good to know that we weren't totally off-base in thinking that a naive
copy may be out of sync, as small as the chances are (certainly we
wouldn't have known the right ordering).

I think what was conflating the issue in my testing is that --mirror
implies --bare, so there is no working-tree checkout during which the
objects would get checked; hence --mirror won't show the error a normal
clone will -- it's not a transport question, it's just a matter of the
normal clone doing more and so having more data run through checks.

However, there are still problems. For blob corruptions, even in this
--no-hardlinks, non --mirror case where an error was found, the exit
code from the clone was 0. I can see this tripping up all sorts of
automated scripts or repository GUIs that ignore the output and only
check the error code, which is not an unreasonable thing to do.

For commit corruptions, the --no-hardlinks, non --mirror case refused
to create the new repository and exited with an error code of 128. The
--no-hardlinks, --mirror case spewed errors to the console, yet
*still* created the new clone *and* returned an error code of zero.

It seems that when there is an "error" as opposed to a "fatal" it
doesn't affect the status code on a clone; I'd argue that it ought to.
If Git knows that the source repository has problems, it ought to be
reflected in the status code so that scripts performing clones have a
normal way to detect this and alert a user/sysadmin/whoever. Even if a
particular cloning method doesn't perform all sanity checks, if it
finds something in the sanity checks it *does* perform, this should be
trumpeted, loudly, regardless of transport mechanism and regardless of
whether a user is watching the process or a script is.

Thanks,
Jeff


Re: propagating repo corruption across clone

2013-03-25 Thread Jeff King
On Mon, Mar 25, 2013 at 10:31:04PM +0700, Nguyen Thai Ngoc Duy wrote:

> On Mon, Mar 25, 2013 at 9:56 PM, Jeff King  wrote:
> > There are basically three levels of transport that can be used on a
> > local machine:
> >
> >   1. Hard-linking (very fast, no redundancy).
> >
> >   2. Byte-for-byte copy (medium speed, makes a separate copy of the
> >  data, but does not check the integrity of the original).
> >
> >   3. Regular git transport, creating a pack (slowest, but should include
> >  redundancy checks).
> >
> > Using --no-hardlinks turns off (1), but leaves (2) as an option.  I
> > think the documentation in "git clone" could use some improvement in
> > that area.
> 
> Not only git-clone. How git-fetch and git-push verify the new pack
> should also be documented. I don't think many people outside the
> contributor circle know what is done (and maybe how) when data is
> received from outside.

I think it's less of a documentation issue there, though, because they
_only_ do (3). There is no option to do anything else, so there is
nothing to warn the user about in terms of tradeoffs.

I agree that in general git's handling of corruption could be documented
somewhere, but I'm not sure where.

-Peff


Re: propagating repo corruption across clone

2013-03-25 Thread Duy Nguyen
On Mon, Mar 25, 2013 at 9:56 PM, Jeff King  wrote:
> There are basically three levels of transport that can be used on a
> local machine:
>
>   1. Hard-linking (very fast, no redundancy).
>
>   2. Byte-for-byte copy (medium speed, makes a separate copy of the
>  data, but does not check the integrity of the original).
>
>   3. Regular git transport, creating a pack (slowest, but should include
>  redundancy checks).
>
> Using --no-hardlinks turns off (1), but leaves (2) as an option.  I
> think the documentation in "git clone" could use some improvement in
> that area.

Not only git-clone. How git-fetch and git-push verify the new pack
should also be documented. I don't think many people outside the
contributor circle know what is done (and maybe how) when data is
received from outside.
-- 
Duy


Re: propagating repo corruption across clone

2013-03-25 Thread Jeff King
On Mon, Mar 25, 2013 at 09:43:23AM -0400, Jeff Mitchell wrote:

> > But I haven't seen exactly what the corruption is, nor exactly what
> > commands they used to clone. I've invited the blog author to give more
> > details in this thread.
> 
> The syncing was performed via a clone with git clone --mirror (and a
> git:// URL) and updates with git remote update.

OK. That should be resilient to corruption, then[1].

> So I should mention that my experiments after the fact were using
> local paths, but with --no-hardlinks.

Yeah, we will do a direct copy in that case, and there is nothing to
prevent corruption propagating.

> If you're saying that the transport is where corruption is supposed to
> be caught, then it's possible that we shouldn't see corruption
> propagate on an initial mirror clone across git://, and that something
> else was responsible for the trouble we saw with the repositories that
> got cloned after-the-fact.

Right. Either it was something else, or there is a bug in git's
protections (but I haven't been able to reproduce anything likely).

> But then I'd argue that this is non-obvious. In particular, when using
> --no-hardlinks, I wouldn't expect that behavior to be different with a
> straight path and with file://.

There are basically three levels of transport that can be used on a
local machine:

  1. Hard-linking (very fast, no redundancy).

  2. Byte-for-byte copy (medium speed, makes a separate copy of the
 data, but does not check the integrity of the original).

  3. Regular git transport, creating a pack (slowest, but should include
 redundancy checks).

Using --no-hardlinks turns off (1), but leaves (2) as an option.  I
think the documentation in "git clone" could use some improvement in
that area.
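
Roughly, for a local source (paths here are placeholders):

  git clone /src/repo.git dst.git                  # (1) hardlinks objects when possible
  git clone --no-hardlinks /src/repo.git dst.git   # (2) byte-for-byte copy, no integrity check
  git clone --no-local /src/repo.git dst.git       # (3) regular git transport, with its checks
  git clone file:///src/repo.git dst.git           # also (3); file:// forces the real transport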

> Something else: apparently one of my statements prompted joeyh to
> think about potential issues with backing up live git repos
> (http://joeyh.name/blog/entry/difficulties_in_backing_up_live_git_repositories/).
> Looking at that post made me realize that, when we were doing our
> initial thinking about the system three years ago, we made an
> assumption that, in fact, taking a .tar.gz of a repo as it's in the
> process of being written to or garbage collected or repacked could be
> problematic. This isn't a totally baseless assumption, as I once had a
> git repository that suffered corruption when a sudden power outage hit
> while I was in the middle of updating it. (It could totally have
> been the filesystem, of course, although it was a journaled file
> system.)

Yes, if you take a snapshot of a repository with rsync or tar, it may be
in an inconsistent state. Using the git protocol should always be
robust, but if you want to do it with other tools, you need to follow a
particular order:

  1. copy the refs (refs/ and packed-refs) first

  2. copy everything else (including objects/)

That covers the case where somebody is pushing an update simultaneously
(you _may_ get extra objects in step 2 that they have not yet
referenced, but you will never end up with a case where you are
referencing objects that you did not yet transfer).
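
As a sketch with rsync (paths are placeholders; this assumes packed-refs
exists):

  mkdir -p backup.git
  # 1. refs first
  rsync -a src.git/packed-refs src.git/refs backup.git/
  # 2. then everything else (objects/ and friends), leaving the copied refs alone
  rsync -a --exclude=/packed-refs --exclude=/refs src.git/ backup.git/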

If it's possible that the repository might be repacked during your
transfer, I think the issue a bit trickier, as there's a moment where
the new packfile is renamed into place, and then the old ones are
deleted. Depending on the timing and how your readdir() implementation
behaves with respect to new and deleted entries, it might be possible to
miss both the new one appearing and the old ones disappearing. This is
quite a tight race to catch, I suspect, but if you were to rsync
objects/pack twice in a row, that would be sufficient.

For pruning, I think you could run into the opposite situation: you grab
the refs, somebody updates them with a history rewind (or branch
deletion), then somebody prunes and objects go away. Again, the timing
on this race is quite tight and it's unlikely in practice. I'm not sure
of a simple way to eliminate it completely.

Yet another option is to simply rsync the whole thing and then "git
fsck" the result. If it's not 100% good, just re-run the rsync. This is
simple and should be robust, but is more CPU intensive (you'll end up
re-checking all of the data on each update).
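
Something along these lines (again just a sketch; paths are placeholders):

  rsync -a --delete src.git/ backup.git/
  until git --git-dir=backup.git fsck --full; do
      sleep 10                               # let any in-flight repack/prune settle
      rsync -a --delete src.git/ backup.git/
  done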

> So, we decided to use Git's built-in capabilities of consistency
> checking to our advantage (with, as it turns out, a flaw in our
> implementation). But the question remains: are we wrong about thinking
> that rsyncing or tar.gz live repositories in the middle of being
> pushed to/gc'd/repacked could result in a bogus backup?

No, I think you are right. If you do the refs-then-objects ordering,
that saves you from most of it, but I do think there are still some
races that exist during repacking or pruning.

-Peff

[1] I mentioned that clone-over-git:// is resilient to corruption. I
think that is true for bit corruption, but my tests did show that we
are not as careful about checking graph connectivity during clone as
we are with fetch.

Re: propagating repo corruption across clone

2013-03-25 Thread Jeff Mitchell
On Sun, Mar 24, 2013 at 3:23 PM, Jeff King  wrote:
> On Sun, Mar 24, 2013 at 08:01:33PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> On Sun, Mar 24, 2013 at 7:31 PM, Jeff King  wrote:
>> >
>> > I don't have details on the KDE corruption, or why it wasn't detected
>> > (if it was one of the cases I mentioned above, or a more subtle issue).
>>
>> One thing worth mentioning is this part of the article:
>>
>> "Originally, mirrored clones were in fact not used, but non-mirrored
>> clones on the anongits come with their own set of issues, and are more
>> prone to getting stopped up by legitimate, authenticated force pushes,
>> ref deletions, and so on – and if we set the refspec such that those
>> are allowed through silently, we don’t gain much. "
>>
>> So the only reason they were even using --mirror was because they were
>> running into those problems with fetching.

With a normal fetch. We actually *wanted* things like force updates
and ref deletions to propagate, because we have not just Gitolite's
checks but our own checks on the servers, and wanted that to be
considered the authenticated source. Besides just daily use and
preventing cruft, we wanted to ensure that such actions propagated so
that if a branch was removed because it contained personal
information, accidental commits, or a security issue (for instance),
the branch was removed on the anongits too, in a timely
fashion.

> I think the --mirror thing is a red herring. It should not be changing
> the transport used, and that is the part of git that is expected to
> catch such corruption.
>
> But I haven't seen exactly what the corruption is, nor exactly what
> commands they used to clone. I've invited the blog author to give more
> details in this thread.

The syncing was performed via a clone with git clone --mirror (and a
git:// URL) and updates with git remote update.
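
That is, roughly (URL and path are placeholders):

  git clone --mirror git://anongit.example.org/project.git project.git   # initial sync
  cd project.git && git remote update                                     # periodic updates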

So I should mention that my experiments after the fact were using
local paths, but with --no-hardlinks. If you're saying that the
transport is where corruption is supposed to be caught, then it's
possible that we shouldn't see corruption propagate on an initial
mirror clone across git://, and that something else was responsible
for the trouble we saw with the repositories that got cloned
after-the-fact. But then I'd argue that this is non-obvious. In
particular, when using --no-hardlinks, I wouldn't expect that behavior
to be different with a straight path and with file://.

Something else: apparently one of my statements prompted joeyh to
think about potential issues with backing up live git repos
(http://joeyh.name/blog/entry/difficulties_in_backing_up_live_git_repositories/).
Looking at that post made me realize that, when we were doing our
initial thinking about the system three years ago, we made an
assumption that, in fact, taking a .tar.gz of a repo as it's in the
process of being written to or garbage collected or repacked could be
problematic. This isn't a totally baseless assumption, as I once had a
git repository that suffered corruption when a sudden power outage hit
while I was in the middle of updating it. (It could totally have
been the filesystem, of course, although it was a journaled file
system.)

So, we decided to use Git's built-in capabilities of consistency
checking to our advantage (with, as it turns out, a flaw in our
implementation). But the question remains: are we wrong about thinking
that rsyncing or tar.gz live repositories in the middle of being
pushed to/gc'd/repacked could result in a bogus backup?

Thanks,
Jeff


Re: propagating repo corruption across clone

2013-03-24 Thread Ilari Liusvaara
On Sun, Mar 24, 2013 at 02:31:33PM -0400, Jeff King wrote:
> 
> Fscking the incoming objects does work, but of course it comes at a cost
> in the normal case (for linux-2.6, I measured an increase in CPU time
> with "index-pack --strict" from ~2.5 minutes to ~4 minutes). And I think
> it is probably overkill for finding corruption; index-pack already
> recognizes bit corruption inside an object, and
> check_everything_connected can detect object graph problems much more
> cheaply.

AFAIK, standard checks index-pack has to do + checking that the object
graph has no broken links (and every ref points to something valid) will
catch everything except:

- SHA-1 collisions between corrupt objects and clean ones.
- Corrupted refs (that still point to something valid).
- "Born-corrupted" objects.

> One thing I didn't check is bit corruption inside a packed object that
> still correctly zlib inflates. check_everything_connected will end up
> reading all of the commits and trees (to walk them), but not the blobs.

Checking that everything is connected will (modulo SHA-1 collisions) save
you there, at least if packv3 is used as transport stream.

-Ilari


Re: propagating repo corruption across clone

2013-03-24 Thread Jeff King
On Sun, Mar 24, 2013 at 08:01:33PM +0100, Ævar Arnfjörð Bjarmason wrote:

> On Sun, Mar 24, 2013 at 7:31 PM, Jeff King  wrote:
> >
> > I don't have details on the KDE corruption, or why it wasn't detected
> > (if it was one of the cases I mentioned above, or a more subtle issue).
> 
> One thing worth mentioning is this part of the article:
> 
> "Originally, mirrored clones were in fact not used, but non-mirrored
> clones on the anongits come with their own set of issues, and are more
> prone to getting stopped up by legitimate, authenticated force pushes,
> ref deletions, and so on – and if we set the refspec such that those
> are allowed through silently, we don’t gain much. "
> 
> So the only reason they were even using --mirror was because they were
> running into those problems with fetching.

I think the --mirror thing is a red herring. It should not be changing
the transport used, and that is the part of git that is expected to
catch such corruption.

But I haven't seen exactly what the corruption is, nor exactly what
commands they used to clone. I've invited the blog author to give more
details in this thread.

> So aside from the problems with --mirror I think we should have
> something that updates your local refs to be exactly like they are on
> the other end, i.e. deletes some, non-fast-forwards others etc.
> (obviously behind several --force options and so on). But such an
> option *wouldn't* accept corrupted objects.

That _should_ be how "git fetch --prune +refs/*:refs/*" behaves (and
that refspec is set up when you use "--mirror"; we should probably have
it turn on --prune, too, but I do not think you can do so via a config
option currently).
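
Spelled out against an existing clone, that is something like (remote name
assumed to be origin):

  git fetch --prune origin '+refs/*:refs/*'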

-Peff


Re: propagating repo corruption across clone

2013-03-24 Thread Ævar Arnfjörð Bjarmason
On Sun, Mar 24, 2013 at 7:31 PM, Jeff King  wrote:
>
> I don't have details on the KDE corruption, or why it wasn't detected
> (if it was one of the cases I mentioned above, or a more subtle issue).

One thing worth mentioning is this part of the article:

"Originally, mirrored clones were in fact not used, but non-mirrored
clones on the anongits come with their own set of issues, and are more
prone to getting stopped up by legitimate, authenticated force pushes,
ref deletions, and so on – and if we set the refspec such that those
are allowed through silently, we don’t gain much. "

So the only reason they were even using --mirror was because they were
running into those problems with fetching.

So aside from the problems with --mirror I think we should have
something that updates your local refs to be exactly like they are on
the other end, i.e. deletes some, non-fast-forwards others etc.
(obviously behind several --force options and so on). But such an
option *wouldn't* accept corrupted objects.

That would give KDE and other parties a safe way to do exact repo
mirroring like this; it wouldn't protect them from someone maliciously
deleting all the refs in all the repos, but it would prevent FS
corruption from propagating.