Re: A local shared clone is now much slower

2013-07-08 Thread Jeff King
On Mon, Jul 08, 2013 at 08:00:09AM -0700, Junio C Hamano wrote:

> I think this deserves to be backported to 'maint' track for
> 1.8.3.x.  Here is an attempt to do so.

Agreed. As it makes certain local-clone workflows really painful, I
think my original can be considered a performance regression for those
cases.

Your back-port looks good to me. Thanks.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: A local shared clone is now much slower

2013-07-08 Thread Junio C Hamano
Duy Nguyen  writes:

> On Mon, Jul 8, 2013 at 2:30 PM, Jeff King  wrote:
>> Subject: [PATCH] clone: drop connectivity check for local clones
>>
>> Commit 0433ad1 (clone: run check_everything_connected,
>> 2013-03-25) added the same connectivity check to clone that
>> we use for fetching. The intent was to provide enough safety
>> checks that "git clone git://..." could be counted on to
>> detect bit errors and other repo corruption, and not
>> silently propagate them to the clone.
>>
>> For local clones, this turns out to be a bad idea, for two
>> reasons:
>>
>>   1. Local clones use hard linking (or even shared object
>>  stores), and so complete far more quickly. The time
>>  spent on the connectivity check is therefore
>>  proportionally much more painful.
>
> There's also byte-to-byte copy when system does not support hardlinks
> (or the user does not want it) but I guess it's safe to trust the OS
> to copy correctly in most cases.

While that may be true, I do not think it matters that much.  The
check during transport is meant to guard against not just corruption
during the object transfer, but also against a corrupt source
repository, and your trust in "cp -R" only covers the "transfer"
part.  And that makes 2. below very relevant.

>>   2. Local clones do not actually meet our safety guarantee
>>  anyway.
>>  ...
>
> Faster clones make everybody happy :-)

Yup.

I think this deserves to be backported to 'maint' track for
1.8.3.x.  Here is an attempt to do so.

 builtin/clone.c   | 11 +++
 t/t5710-info-alternate.sh |  8 +++-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 035ab64..38a0a64 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -542,12 +542,15 @@ static void update_remote_refs(const struct ref *refs,
   const struct ref *mapped_refs,
   const struct ref *remote_head_points_at,
   const char *branch_top,
-  const char *msg)
+  const char *msg,
+  int check_connectivity)
 {
const struct ref *rm = mapped_refs;
 
-   if (check_everything_connected(iterate_ref_map, 0, &rm))
-   die(_("remote did not send all necessary objects"));
+   if (check_connectivity) {
+   if (check_everything_connected(iterate_ref_map, 0, &rm))
+   die(_("remote did not send all necessary objects"));
+   }
 
if (refs) {
write_remote_refs(mapped_refs);
@@ -950,7 +953,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
transport_fetch_refs(transport, mapped_refs);
 
update_remote_refs(refs, mapped_refs, remote_head_points_at,
-  branch_top.buf, reflog_msg.buf);
+  branch_top.buf, reflog_msg.buf, !is_local);
 
update_head(our_head_points_at, remote_head, reflog_msg.buf);
 
diff --git a/t/t5710-info-alternate.sh b/t/t5710-info-alternate.sh
index 8956c21..5a6e49d 100755
--- a/t/t5710-info-alternate.sh
+++ b/t/t5710-info-alternate.sh
@@ -58,7 +58,13 @@ test_expect_success 'creating too deep nesting' \
 git clone -l -s D E &&
 git clone -l -s E F &&
 git clone -l -s F G &&
-test_must_fail git clone --bare -l -s G H'
+git clone --bare -l -s G H'
+
+test_expect_success 'invalidity of deepest repository' \
+'cd H && {
+   test_valid_repo
+   test $? -ne 0
+}'
 
 cd "$base_dir"
 


Re: A local shared clone is now much slower

2013-07-08 Thread Junio C Hamano
Jeff King  writes:

> On Mon, Jul 08, 2013 at 01:03:55PM +1000, Stephen Rothwell wrote:
>
>> So commit 0433ad128c59 ("clone: run check_everything_connected") (which
>> turned up with v1.8.3) added a large traversal to clone which (as the
>> comment said) makes a clone much slower.  It is especially noticeable on
>> "git clone -s -l -n" which I use every day and used to be almost
>> instant.  Is there any thought to making it fast again, please?
>> 
>> The above clone is very useful for working with different branches in one
>> tree without touching every file in the main branch you are working
>> with (and consequent issues with rebuilding at least).  As linux-next
>> maintainer, you can imagine that I do this a bit.
>
> Yeah, I have noticed it is somewhat annoying, as well, because the
> proportion of time taken for the check is so much larger compared to the
> relatively instant time taken for the local shared clone.
>
> The point of that commit is to add the same safety checks to clone that
> we do for fetching. But in the local shared-repo case, I really feel
> like all safety bets are off anyway. You are not creating a verified
> redundant copy at all, and there are still corruptions that can sneak
> through (e.g., bit corruptions of blob objects).

Yeah, I was thinking the same when I saw that report, so obviously I
think the approach makes sense ;-)

Thanks.

>
> So maybe this:
>
> -- >8 --
> Subject: [PATCH] clone: drop connectivity check for local clones
>
> Commit 0433ad1 (clone: run check_everything_connected,
> 2013-03-25) added the same connectivity check to clone that
> we use for fetching. The intent was to provide enough safety
> checks that "git clone git://..." could be counted on to
> detect bit errors and other repo corruption, and not
> silently propagate them to the clone.
>
> For local clones, this turns out to be a bad idea, for two
> reasons:
>
>   1. Local clones use hard linking (or even shared object
>  stores), and so complete far more quickly. The time
>  spent on the connectivity check is therefore
>  proportionally much more painful.
>
>   2. Local clones do not actually meet our safety guarantee
>  anyway. The connectivity check makes sure we have all
>  of the objects we claim to, but it does not check for
>  bit errors. We will notice bit errors in commits and
>  trees, but we do not load blob objects at all. Whereas
>  over the pack transport, we actually recompute the sha1
>  of each object in the incoming packfile; bit errors
>  change the sha1 of the object, which is then caught by
>  the connectivity check.
>
> This patch drops the connectivity check in the local case.
> Note that we have to revert the changes from 0433ad1 to
> t5710, as we no longer notice the corruption during clone.
>
> We could go a step further and provide a "verify even local
> clones" option, but it is probably not worthwhile. You can
> already spell that as "cd foo.git && git fsck && git clone ."
> or as "git clone --no-local foo.git".
>
> Signed-off-by: Jeff King 
> ---
>  builtin/clone.c   | 22 +-
>  t/t5710-info-alternate.sh |  8 +++-
>  2 files changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 14b1323..dafb6b5 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -545,17 +545,20 @@ static void update_remote_refs(const struct ref *refs,
>  const struct ref *remote_head_points_at,
>  const char *branch_top,
>  const char *msg,
> -struct transport *transport)
> +struct transport *transport,
> +int check_connectivity)
>  {
>   const struct ref *rm = mapped_refs;
>  
> - if (0 <= option_verbosity)
> - printf(_("Checking connectivity... "));
> - if (check_everything_connected_with_transport(iterate_ref_map,
> -   0, &rm, transport))
> - die(_("remote did not send all necessary objects"));
> - if (0 <= option_verbosity)
> - printf(_("done\n"));
> + if (check_connectivity) {
> + if (0 <= option_verbosity)
> + printf(_("Checking connectivity... "));
> + if (check_everything_connected_with_transport(iterate_ref_map,
> +   0, &rm, transport))
> + die(_("remote did not send all necessary objects"));
> + if (0 <= option_verbosity)
> + printf(_("done\n"));
> + }
>  
>   if (refs) {
>   write_remote_refs(mapped_refs);
> @@ -963,7 +966,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>   transport_fetch_refs(transport, mapped_refs);
>  
>   update_remote_refs(refs, mapped_refs, remote_head_points

Re: A local shared clone is now much slower

2013-07-08 Thread Duy Nguyen
On Mon, Jul 8, 2013 at 2:30 PM, Jeff King  wrote:
> Subject: [PATCH] clone: drop connectivity check for local clones
>
> Commit 0433ad1 (clone: run check_everything_connected,
> 2013-03-25) added the same connectivity check to clone that
> we use for fetching. The intent was to provide enough safety
> checks that "git clone git://..." could be counted on to
> detect bit errors and other repo corruption, and not
> silently propagate them to the clone.
>
> For local clones, this turns out to be a bad idea, for two
> reasons:
>
>   1. Local clones use hard linking (or even shared object
>  stores), and so complete far more quickly. The time
>  spent on the connectivity check is therefore
>  proportionally much more painful.

There's also byte-to-byte copy when system does not support hardlinks
(or the user does not want it) but I guess it's safe to trust the OS
to copy correctly in most cases.

>   2. Local clones do not actually meet our safety guarantee
>  anyway. The connectivity check makes sure we have all
>  of the objects we claim to, but it does not check for
>  bit errors. We will notice bit errors in commits and
>  trees, but we do not load blob objects at all. Whereas
>  over the pack transport, we actually recompute the sha1
>  of each object in the incoming packfile; bit errors
>  change the sha1 of the object, which is then caught by
>  the connectivity check.

We used to, before d21c463 (fetch/receive: remove over-pessimistic
connectivity check - 2012-03-15). But back then we did not even do
connectivity check in clone.

> This patch drops the connectivity check in the local case.
> Note that we have to revert the changes from 0433ad1 to
> t5710, as we no longer notice the corruption during clone.
>
> We could go a step further and provide a "verify even local
> clones" option, but it is probably not worthwhile. You can
> already spell that as "cd foo.git && git fsck && git clone ."
> or as "git clone --no-local foo.git".

Faster clones make everybody happy :-)
--
Duy


Re: A local shared clone is now much slower

2013-07-08 Thread Jeff King
On Mon, Jul 08, 2013 at 01:03:55PM +1000, Stephen Rothwell wrote:

> So commit 0433ad128c59 ("clone: run check_everything_connected") (which
> turned up with v1.8.3) added a large traversal to clone which (as the
> comment said) makes a clone much slower.  It is especially noticeable on
> "git clone -s -l -n" which I use every day and used to be almost
> instant.  Is there any thought to making it fast again, please?
> 
> The above clone is very useful for working with different branches in one
> tree without touching every file in the main branch you are working
> with (and consequent issues with rebuilding at least).  As linux-next
> maintainer, you can imagine that I do this a bit.

Yeah, I have noticed it is somewhat annoying, as well, because the
proportion of time taken for the check is so much larger compared to the
relatively instant time taken for the local shared clone.

The point of that commit is to add the same safety checks to clone that
we do for fetching. But in the local shared-repo case, I really feel
like all safety bets are off anyway. You are not creating a verified
redundant copy at all, and there are still corruptions that can sneak
through (e.g., bit corruptions of blob objects).

So maybe this:

-- >8 --
Subject: [PATCH] clone: drop connectivity check for local clones

Commit 0433ad1 (clone: run check_everything_connected,
2013-03-25) added the same connectivity check to clone that
we use for fetching. The intent was to provide enough safety
checks that "git clone git://..." could be counted on to
detect bit errors and other repo corruption, and not
silently propagate them to the clone.

For local clones, this turns out to be a bad idea, for two
reasons:

  1. Local clones use hard linking (or even shared object
 stores), and so complete far more quickly. The time
 spent on the connectivity check is therefore
 proportionally much more painful.

  2. Local clones do not actually meet our safety guarantee
 anyway. The connectivity check makes sure we have all
 of the objects we claim to, but it does not check for
 bit errors. We will notice bit errors in commits and
 trees, but we do not load blob objects at all. Whereas
 over the pack transport, we actually recompute the sha1
 of each object in the incoming packfile; bit errors
 change the sha1 of the object, which is then caught by
 the connectivity check.

This patch drops the connectivity check in the local case.
Note that we have to revert the changes from 0433ad1 to
t5710, as we no longer notice the corruption during clone.

We could go a step further and provide a "verify even local
clones" option, but it is probably not worthwhile. You can
already spell that as "cd foo.git && git fsck && git clone ."
or as "git clone --no-local foo.git".

Signed-off-by: Jeff King 
---
 builtin/clone.c   | 22 +-
 t/t5710-info-alternate.sh |  8 +++-
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 14b1323..dafb6b5 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -545,17 +545,20 @@ static void update_remote_refs(const struct ref *refs,
   const struct ref *remote_head_points_at,
   const char *branch_top,
   const char *msg,
-  struct transport *transport)
+  struct transport *transport,
+  int check_connectivity)
 {
const struct ref *rm = mapped_refs;
 
-   if (0 <= option_verbosity)
-   printf(_("Checking connectivity... "));
-   if (check_everything_connected_with_transport(iterate_ref_map,
- 0, &rm, transport))
-   die(_("remote did not send all necessary objects"));
-   if (0 <= option_verbosity)
-   printf(_("done\n"));
+   if (check_connectivity) {
+   if (0 <= option_verbosity)
+   printf(_("Checking connectivity... "));
+   if (check_everything_connected_with_transport(iterate_ref_map,
+                                                     0, &rm, transport))
+   die(_("remote did not send all necessary objects"));
+   if (0 <= option_verbosity)
+   printf(_("done\n"));
+   }
 
if (refs) {
write_remote_refs(mapped_refs);
@@ -963,7 +966,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
transport_fetch_refs(transport, mapped_refs);
 
update_remote_refs(refs, mapped_refs, remote_head_points_at,
-  branch_top.buf, reflog_msg.buf, transport);
+  branch_top.buf, reflog_msg.buf, transport,
+  !is_local);
 
update_head(our_head_points_at, remote_head, reflog_msg.buf);
 
diff --git a/t
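
Point 2 of the commit message above rests on how Git names objects: an
object ID is the SHA-1 of a short header plus the content, so re-hashing
the content exposes a corrupted byte. What follows is only a minimal
sketch of that idea for a blob, not the code path the pack transport
actually runs. The helper name blob_id and the temporary files are made
up for illustration, and the sketch assumes sha1sum is installed.

```shell
#!/bin/sh
# Sketch only: recompute a blob's object ID by hand to show why a bit
# error is detectable once the content is actually re-hashed.
# blob_id is a hypothetical helper, not a git command.
blob_id() {
    # A blob's ID is sha1("blob <size-in-bytes>\0" followed by the content).
    size=$(wc -c < "$1" | tr -d ' ')
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/good"
printf 'hellp\n' > "$tmp/bad"    # same size, one character changed

good_id=$(blob_id "$tmp/good")
bad_id=$(blob_id "$tmp/bad")

echo "original:  $good_id"
echo "corrupted: $bad_id"
if [ "$good_id" != "$bad_id" ]; then
    echo "content no longer matches the recorded object ID"
fi
```

With git installed, "git hash-object" on the same file should print the
same ID as blob_id. A connectivity check that never loads blobs cannot
notice this kind of mismatch, which is the gap the commit message
describes for local clones.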

Re: A local shared clone is now much slower

2013-07-07 Thread Stephen Rothwell
Hi Duy,

On Mon, 8 Jul 2013 10:20:22 +0700 Duy Nguyen  wrote:
>
> On Mon, Jul 8, 2013 at 10:03 AM, Stephen Rothwell wrote:
> >
> > So commit 0433ad128c59 ("clone: run check_everything_connected") (which
> > turned up with v1.8.3) added a large traversal to clone which (as the
> > comment said) makes a clone much slower.  It is especially noticeable on
> > "git clone -s -l -n" which I use every day and used to be almost
> > instant.  Is there any thought to making it fast again, please?
> 
> It's done that way as a security measure against repo corruption.
> Although I wonder if we could do connectivity check in background
> instead (reports are stored in .git and picked up by git-status). The
> same mechanism could be used for "git gc --auto". If the repo turns
> out corrupted, the user may lose the last ~10 minutes of work, not
> really bad for the speed trade off. This mode is not the default, of
> course. The user has to be aware of the risk when choosing this route.

Thanks for the explanation.  Now, is there some way I can turn it off
just for the local shared case?  In my case, I check my repo regularly,
so I don't need or want this going on while I am working ...

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au


pgpK5FEJKmn52.pgp
Description: PGP signature
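
For anyone who does want verification around a local clone, the patch
message further up the thread names two existing spellings: run
"git fsck" in the source first, or pass "--no-local" so the clone goes
through the regular pack transport. A hedged sketch of both follows; the
repository and file names are placeholders created only for the
demonstration, and the identity passed with -c is a dummy value.

```shell
#!/bin/sh
# Sketch only: two ways to get verification around a local clone.
# All paths below are throwaway placeholders.
set -e
work=$(mktemp -d)

# Build a tiny source repository to clone from.
git init -q "$work/foo"
echo hello > "$work/foo/hello.txt"
git -C "$work/foo" add hello.txt
git -C "$work/foo" -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m "initial commit"

# Spelling 1: check the source explicitly, then take the fast shared clone.
git -C "$work/foo" fsck
git clone -q -s -l -n "$work/foo" "$work/shared"

# Spelling 2: force the pack transport even though the source is local.
git clone -q --no-local "$work/foo" "$work/copied"
```

The shared clone borrows objects from the source through
objects/info/alternates, while the --no-local clone receives its own
copy of the objects through the transport.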


Re: A local shared clone is now much slower

2013-07-07 Thread Duy Nguyen
On Mon, Jul 8, 2013 at 10:03 AM, Stephen Rothwell  wrote:
> Hi guys,
>
> So commit 0433ad128c59 ("clone: run check_everything_connected") (which
> turned up with v1.8.3) added a large traversal to clone which (as the
> comment said) makes a clone much slower.  It is especially noticeable on
> "git clone -s -l -n" which I use every day and used to be almost
> instant.  Is there any thought to making it fast again, please?

It's done that way as a security measure against repo corruption.
Although I wonder if we could do connectivity check in background
instead (reports are stored in .git and picked up by git-status). The
same mechanism could be used for "git gc --auto". If the repo turns
out corrupted, the user may lose the last ~10 minutes of work, not
really bad for the speed trade off. This mode is not the default, of
course. The user has to be aware of the risk when choosing this route.
--
Duy


A local shared clone is now much slower

2013-07-07 Thread Stephen Rothwell
Hi guys,

So commit 0433ad128c59 ("clone: run check_everything_connected") (which
turned up with v1.8.3) added a large traversal to clone which (as the
comment said) makes a clone much slower.  It is especially noticeable on
"git clone -s -l -n" which I use every day and used to be almost
instant.  Is there any thought to making it fast again, please?

The above clone is very useful for working with different branches in one
tree without touching every file in the main branch you are working
with (and consequent issues with rebuilding at least).  As linux-next
maintainer, you can imagine that I do this a bit.

I am sure one of Linus' points about branches was that being able to make
a fast local clone of a tree to use more than one branch was a feature.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au


pgpA3zkp3VAx8.pgp
Description: PGP signature
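
The workflow described in this report can be sketched roughly as below:
a shared, no-checkout clone of the main tree, followed by a checkout of
a different branch in the new tree, so the files in the main tree are
never touched. This is an illustration under assumptions, not the
reporter's actual setup; the directory names, the branch name "topic"
and the dummy identity are all invented for the example.

```shell
#!/bin/sh
# Sketch only: use a shared local clone to work on a second branch
# without disturbing the files checked out in the main tree.
set -e
work=$(mktemp -d)

# A stand-in for the main tree, with one extra branch to work on.
git init -q "$work/main"
echo base > "$work/main/file.txt"
git -C "$work/main" add file.txt
git -C "$work/main" -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m "base"
git -C "$work/main" branch topic

# -s: borrow objects from the source via alternates
# -l: treat the source as a local repository
# -n: do not check out any files yet
git clone -q -s -l -n "$work/main" "$work/topic-tree"

# Check out the other branch in the new tree only.
git -C "$work/topic-tree" checkout -q topic
```

Because no objects are copied, the clone step itself does very little
work, which is why an added full-history traversal stands out so
sharply in this case.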