Re: does a successful 'git gc' imply 'git fsck'

2012-12-03 Thread Junio C Hamano
Matthieu Moy  writes:

> Junio C Hamano  writes:
>
>> But a "gc" does not necessarily run "repack -a" when it does not see
>> too many pack files, so it can end up scanning only the surface of
>> the history to collect the recently created loose objects into a
>> pack, and stop its traversal without going into existing packfiles.
>
> Isn't that the behavior of "git gc --auto", not plain "git gc" ?

True; I missed that Sitaram was running "gc" manually.
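
The distinction discussed above can be sketched as a small decision function. This is a simplified model, not git's actual code: the function name and return labels are hypothetical, and the defaults (6700 loose objects, 50 packs) are assumed to match the documented defaults of gc.auto and gc.autoPackLimit. Real git also estimates the loose-object count rather than counting exactly.

```python
# Assumed defaults for gc.auto and gc.autoPackLimit.
DEFAULT_GC_AUTO = 6700
DEFAULT_AUTO_PACK_LIMIT = 50


def auto_gc_plan(loose_objects, pack_files,
                 gc_auto=DEFAULT_GC_AUTO,
                 auto_pack_limit=DEFAULT_AUTO_PACK_LIMIT):
    """Return the kind of repack a simplified 'gc --auto' would pick.

    "full"        -> every pack is rewritten, so all history is traversed
    "incremental" -> only loose objects are packed; old packs are untouched
    "nothing"     -> neither threshold was exceeded
    """
    if gc_auto <= 0:
        return "nothing"  # a non-positive gc.auto disables auto gc
    if auto_pack_limit > 0 and pack_files > auto_pack_limit:
        return "full"
    if loose_objects > gc_auto:
        return "incremental"
    return "nothing"


# Many loose objects but few packs: existing packs are never read.
plan_a = auto_gc_plan(loose_objects=7000, pack_files=3)   # "incremental"
# Too many packs: everything is consolidated.
plan_b = auto_gc_plan(loose_objects=100, pack_files=60)   # "full"
```

Under this model, only the "full" case gives the whole-history scan described for "repack -a"; a manual "git gc" is outside the model because it does not consult these thresholds.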
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: does a successful 'git gc' imply 'git fsck'

2012-12-03 Thread Matthieu Moy
Junio C Hamano  writes:

> But a "gc" does not necessarily run "repack -a" when it does not see
> too many pack files, so it can end up scanning only the surface of
> the history to collect the recently created loose objects into a
> pack, and stop its traversal without going into existing packfiles.

Isn't that the behavior of "git gc --auto", not plain "git gc" ?

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


Re: does a successful 'git gc' imply 'git fsck'

2012-12-03 Thread Sitaram Chamarty
On Sun, Dec 2, 2012 at 3:01 PM, Junio C Hamano  wrote:
> Sitaram Chamarty  writes:
>
>> If I could assume that a successful 'git gc' means an fsck is not
>> needed, I'd save a lot of time.  Hence my question.
>
> When it does "repack -a", it at least scans the whole history so you
> would be sure that all the commits and trees are readable for the
> purpose of enumerating the objects referred by them (and a bit flip
> in them will likely be noticed by zlib inflation).
>
> But a "gc" does not necessarily run "repack -a" when it does not see
> too many pack files, so it can end up scanning only the surface of
> the history to collect the recently created loose objects into a
> pack, and stop its traversal without going into existing packfiles.

Thanks; I'd missed this nuance as well...

-- 
Sitaram


Re: does a successful 'git gc' imply 'git fsck'

2012-12-02 Thread Junio C Hamano
Sitaram Chamarty  writes:

> If I could assume that a successful 'git gc' means an fsck is not
> needed, I'd save a lot of time.  Hence my question.

When it does "repack -a", it at least scans the whole history so you
would be sure that all the commits and trees are readable for the
purpose of enumerating the objects referred by them (and a bit flip
in them will likely be noticed by zlib inflation).

But a "gc" does not necessarily run "repack -a" when it does not see
too many pack files, so it can end up scanning only the surface of
the history to collect the recently created loose objects into a
pack, and stop its traversal without going into existing packfiles.


Re: does a successful 'git gc' imply 'git fsck'

2012-12-02 Thread Sitaram Chamarty
On Sun, Dec 2, 2012 at 9:58 AM, Shawn Pearce  wrote:
> On Sat, Dec 1, 2012 at 6:31 PM, Sitaram Chamarty  wrote:
>> Background: I have a situation where I have to fix up a few hundred
>> repos in terms of 'git gc' (the auto gc seems to have failed in many
>> cases; they have far more than 6700 loose objects).  I also found some
>> corrupted objects in some cases that prevent the gc from completing.
>>
>> I am running "git gc" followed by "git fsck".  The majority of the
>> repos I have worked through so far appear to be fine, but in the
>> larger repos (upwards of 2-3 GB) the git fsck is taking almost 5 times
>> longer than the 'gc'.
>>
>> If I could assume that a successful 'git gc' means an fsck is not
>> needed, I'd save a lot of time.  Hence my question.
>
> Not really. For example, fsck verifies that every blob, when
> decompressed and fully inflated, matches its SHA-1. gc only checks

OK that makes sense.  After I posted I happened to check using strace
and kinda guessed this from what I saw, but it's nice to have
confirmation.

> connectivity of the commit and tree graph by making sure every object
> was accounted for. But when creating the output pack, it only verifies
> that a CRC-32 was correct when copying the bits from the source to the
> destination; it does not verify that the data decompresses and matches
> the SHA-1 it should match.
>
> So it depends on what level of check you need to feel safe.

Yup; thanks.

All the repos my internal client manages are mirrored in multiple
places, and they set (or were at least told to set, heh!)
receive.fsckObjects so the lesser check is fine in most cases.
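
The workflow described in this thread (run "git gc" on each repo, and "git fsck" only where the stronger check is wanted) could be scripted roughly as follows. This is a minimal sketch: the function names and the `full_check` flag are hypothetical, and the `run` parameter exists only so the logic can be exercised without touching a real repository.

```python
import subprocess


def maintenance_commands(full_check):
    """Commands to run in one repo: gc always, fsck only on request."""
    commands = [["git", "gc"]]
    if full_check:
        commands.append(["git", "fsck"])
    return commands


def maintain(repo, full_check, run=subprocess.run):
    """Run the commands inside `repo`; stop at the first failure.

    Returns True only if every command exited with status 0.
    """
    for argv in maintenance_commands(full_check):
        result = run(argv, cwd=str(repo))
        if result.returncode != 0:
            return False  # e.g. gc aborted on a corrupt object
    return True
```

Calling `maintain(path, full_check=False)` corresponds to relying on the lesser check alone; passing `full_check=True` adds the slower fsck pass.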

-- 
Sitaram


Re: does a successful 'git gc' imply 'git fsck'

2012-12-01 Thread Shawn Pearce
On Sat, Dec 1, 2012 at 6:31 PM, Sitaram Chamarty  wrote:
> Background: I have a situation where I have to fix up a few hundred
> repos in terms of 'git gc' (the auto gc seems to have failed in many
> cases; they have far more than 6700 loose objects).  I also found some
> corrupted objects in some cases that prevent the gc from completing.
>
> I am running "git gc" followed by "git fsck".  The majority of the
> repos I have worked through so far appear to be fine, but in the
> larger repos (upwards of 2-3 GB) the git fsck is taking almost 5 times
> longer than the 'gc'.
>
> If I could assume that a successful 'git gc' means an fsck is not
> needed, I'd save a lot of time.  Hence my question.

Not really. For example, fsck verifies that every blob, when
decompressed and fully inflated, matches its SHA-1. gc only checks
connectivity of the commit and tree graph by making sure every object
was accounted for. But when creating the output pack, it only verifies
that a CRC-32 was correct when copying the bits from the source to the
destination; it does not verify that the data decompresses and matches
the SHA-1 it should match.

So it depends on what level of check you need to feel safe.
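
To illustrate the difference between the two levels of check, here is a minimal sketch that works on a loose-object-style byte string (zlib-compressed "blob <size>\0<content>"). It is not how git implements fsck or pack copying; the helper names are hypothetical, and the CRC here is simply computed over the compressed bytes to stand in for the CRC-32 mentioned above.

```python
import hashlib
import zlib


def git_blob_id(content):
    """SHA-1 that git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def sha1_check(compressed, expected_id):
    """fsck-style check: inflate the data and compare its SHA-1."""
    try:
        raw = zlib.decompress(compressed)
    except zlib.error:
        return False  # the stream does not even inflate
    return hashlib.sha1(raw).hexdigest() == expected_id


def crc_check(compressed, recorded_crc):
    """Copy-style check: compressed bytes match a recorded CRC-32."""
    return (zlib.crc32(compressed) & 0xFFFFFFFF) == recorded_crc


good_id = git_blob_id(b"hello\n")

# Suppose the stored bytes were already wrong when the CRC was recorded.
stored = zlib.compress(b"blob 6\x00hellO\n")
recorded_crc = zlib.crc32(stored) & 0xFFFFFFFF

crc_ok = crc_check(stored, recorded_crc)   # True: bytes copied faithfully
sha_ok = sha1_check(stored, good_id)       # False: content is not the blob
```

The CRC comparison only shows that the bytes are the same ones that were recorded; the SHA-1 comparison is what ties the inflated data back to the object name.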