Re: [PATCH v3 4/4] clone: open a shortcut for connectivity check

2013-05-03 Thread Duy Nguyen
On Fri, May 3, 2013 at 11:15 PM, Junio C Hamano  wrote:
> Nguyễn Thái Ngọc Duy   writes:
>
>> In order to make sure the cloned repository is good, we run "rev-list
>> --objects --not --all $new_refs" on the repository. This is expensive
>> on large repositories. This patch attempts to mitigate the impact in
>> this special case.
>>
>> In the "good" clone case, we only have one pack.
>
> If "On large repositories" is the focus, we need to take into
> account the fact that pack.packSizeLimit can split and store the
> incoming packstream to multiple packs, so "only have one pack" is
> misleading.

I only had a quick look. But I don't think index-pack respects
packSizeLimit. pack-objects does but only when --stdout is not used,
which is not the case for pack transfer.

> I think you can still do the same trick even when we split the pack
> as index-pack will keep track of the objects it saw in the same
> incoming pack stream (but I am writing this from memory without
> looking at the original code you are touching, so please double
> check).

Yeah. As long we have only one incoming stream, we can still do the
same verification.

>> "index-pack + new checks" is still faster than the current "index-pack
>> + rev-list", which is the whole point of this patch. If any of the
>
> Does the same check apply if we end up on the unpack-objects
> codepath?

No. unpack-objects does not do this and check_everything_connected
should invoke rev-list like before.
--
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 4/4] clone: open a shortcut for connectivity check

2013-05-03 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> In order to make sure the cloned repository is good, we run "rev-list
> --objects --not --all $new_refs" on the repository. This is expensive
> on large repositories. This patch attempts to mitigate the impact in
> this special case.
>
> In the "good" clone case, we only have one pack.

If "On large repositories" is the focus, we need to take into
account the fact that pack.packSizeLimit can split and store the
incoming packstream to multiple packs, so "only have one pack" is
misleading.

I think you can still do the same trick even when we split the pack
as index-pack will keep track of the objects it saw in the same
incoming pack stream (but I am writing this from memory without
looking at the original code you are touching, so please double
check).

> If all of the
> following are met, we can be sure that all objects reachable from the
> new refs exist, which is the intention of running "rev-list ...":
>
>  - all refs point to an object in the pack
>  - there are no dangling pointers in any object in the pack
>  - no objects in the pack point to objects outside the pack
>
> The second and third checks can be done with the help of index-pack as
> a slight variation of --strict check (which introduces a new condition
> for the shortcut: pack transfer must be used and the number of objects
> large enough to call index-pack). The first is checked in
> check_everything_connected after we get an "ok" from index-pack.
>
> "index-pack + new checks" is still faster than the current "index-pack
> + rev-list", which is the whole point of this patch. If any of the

Does the same check apply if we end up on the unpack-objects
codepath?

> This shortcut is not applied to shallow clones, partly because shallow
> clones should have no more objects than a usual fetch and the cost of
> rev-list is acceptable, partly to avoid dealing with corner cases when
> grafting is involved.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 4/4] clone: open a shortcut for connectivity check

2013-05-03 Thread Eric Sunshine
On Fri, May 3, 2013 at 8:35 AM, Nguyễn Thái Ngọc Duy  wrote:
> In order to make sure the cloned repository is good, we run "rev-list
> --objects --not --all $new_refs" on the repository. This is expensive
> on large repositories. This patch attempts to mitigate the impact in
> this special case.
>
> In the "good" clone case, we only have one pack. If all of the
> following are met, we can be sure that all objects reachable from the
> new refs exist, which is the intention of running "rev-list ...":
>
>  - all refs point to an object in the pack
>  - there are no dangling pointers in any object in the pack
>  - no objects in the pack point to objects outside the pack
>
> The second and third checks can be done with the help of index-pack as
> a slight variation of --strict check (which introduces a new condition
> for the shortcut: pack transfer must be used and the number of objects
> large enough to call index-pack). The first is checked in
> check_everything_connected after we get an "ok" from index-pack.
>
> "index-pack + new checks" is still faster than the current "index-pack
> + rev-list", which is the whole point of this patch. If any of the
> conditions fails, we fall back to the good old but expensive "rev-list

s/fails/fail/

> ..". In that case it's even more expensive because we have to pay for
> the new checks in index-pack. But that should only happen when the
> other side is either buggy or malicious.
>
> Cloning linux-2.6 over file://
>
> before after
> real3m25.693s  2m53.050s
> user5m2.037s   4m42.396s
> sys 0m13.750s  0m16.574s
>
> A more realistic test with ssh:// over wireless
>
> before after
> real11m26.629s 10m4.213s
> user5m43.196s  5m19.444s
> sys 0m35.812s  0m37.630s
>
> This shortcut is not applied to shallow clones, partly because shallow
> clones should have no more objects than a usual fetch and the cost of
> rev-list is acceptable, partly to avoid dealing with corner cases when
> grafting is involved.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html