Re: [PATCH 11/11] Resumable clone: implement primer logic in git-clone

2016-09-27 Thread Kevin Wern
On Fri, Sep 16, 2016 at 04:32:05PM -0700, Junio C Hamano wrote:
> >  ---
> > @@ -172,6 +173,12 @@ objects from the source repository into a pack in the 
> > cloned repository.
> > via ssh, this specifies a non-default path for the command
> > run on the other end.
> >  
> > +--prime-clone ::
> > +-p ::
> 
> Not many other options have single letter shorthand.  Is it expected
> that it is worth to let this option squat on a short-and-sweet "-p",
> perhaps because it is so frequently used?

I based my decision more on precedent than value--the "--upload-pack" option
had the shorthand "-u".

> > @@ -40,17 +42,20 @@ static const char * const builtin_clone_usage[] = {
> >  
> >  static int option_no_checkout, option_bare, option_mirror, 
> > option_single_branch = -1;
> >  static int option_local = -1, option_no_hardlinks, option_shared, 
> > option_recursive;
> > +static int option_resume;
> >  static char *option_template, *option_depth;
> > -static char *option_origin = NULL;
> > +static const char *option_origin = NULL;
> 
> Is this change related to anything you are doing here?
> 
> If you are fixing things while at it, please don't ;-) If you really
> want to, please also remove " = NULL", from this line and also from
> the next line.  Also do not add " = NULL" at the end of alt_res.
> 

Yes, it was because remote_config uses all const char* strings, and .name gets
assigned to option_origin when it pulls the previous info from config. I looked
to see if option_origin's underlying string is ever modified in a way that
makes it ineligible to be const, and it wasn't.

So, I should remove the NULLs here? And then initalize alt_res to NULL in
cmd_clone?

> >  static char *option_branch = NULL;
> >  ...
> > +static const struct alt_resource *alt_res = NULL;
> 
> > +static char *get_filename(const char *dir)
> > +{
> > +   char *dir_copy = xstrdup(dir);
> > +   strip_trailing_slashes(dir_copy);
> > +   char *filename, *final = NULL;
> > +
> > +   filename = find_last_dir_sep(dir);
> > +
> > +   if (filename && *(++filename))
> > +   final = xstrdup(filename);
> > +
> > +   free(dir_copy);
> > +   return final;
> > +}
> 
> Hmph, don't we have our own basename(3) lookalike that knows about
> dir-sep already?

Whoops, did not catch that. Thanks!

> > @@ -562,7 +614,7 @@ static void write_remote_refs(const struct ref 
> > *local_refs)
> > die("%s", err.buf);
> >  
> > for (r = local_refs; r; r = r->next) {
> > -   if (!r->peer_ref)
> > +   if (!r->peer_ref || ref_exists(r->peer_ref->name))
> > continue;
> > if (ref_transaction_create(t, r->peer_ref->name, 
> > r->old_oid.hash,
> >0, NULL, ))
> 
> What is this change about?

Because resumable clone is supposed to handle halting, I included the
possibility of halting after the final fetch happened and remote refs were
updated (basically anything after resumable download happens). This change
basically says "If the remote ref was not previously written, then write it."
Because the write is all-or-nothing, this means the logic should either write
all of the intended references, or none of them (because they were written
in the previous invocation).

> > +static const char *setup_and_index_pack(const char *filename)
> > +{
> > +   const char *primer_idx_path = NULL, *primer_bndl_path = NULL;
> > +   primer_idx_path = replace_extension(filename, ".pack", ".idx");
> > +   primer_bndl_path = replace_extension(filename, ".pack", ".bndl");
> > +
> > +   if (!(primer_idx_path && primer_bndl_path)) {
> > +   warning("invalid pack filename '%s', falling back to full "
> > +   "clone", filename);
> > +   return NULL;
> > +   }
> > +
> > +   if (!file_exists(primer_bndl_path)) {
> > +   if (do_index_pack(filename, primer_idx_path)) {
> > +   warning("could not index primer pack, falling back to "
> > +   "full clone");
> > +   return NULL;
> > +   }
> > +   }
> 
> Can it be another (undetected) failure mode that .bndl somehow
> already existed, but not .idx, leaving the resulting object store in
> an incosistent state?  Can do_index_pack() fail and leave .bndl
> behind to get you into such a state?

I don't think so, based on looking at builtin/index-pack.c. write_bundle_file()
happens at the very end of final(), which is long after the process created the
resulting .idx, or has died in the event one is not created. There also is no
automatic cleanup of .idx after that if the process dies somehow after .bndl
is written.

> > +static int write_bundle_refs(const char *bundle_filename)
> > +{
> > +   struct ref_transaction *t;
> > +   struct bundle_header history_tips;
> > +   const char *temp_ref_base = "resume";
> > +   struct strbuf err = STRBUF_INIT;
> > +   int i;
> > +
> > +   init_bundle_header(_tips, bundle_filename);
> > +   

Re: [PATCH 11/11] Resumable clone: implement primer logic in git-clone

2016-09-27 Thread Kevin Wern
On Mon, Sep 19, 2016 at 09:04:40PM +0700, Duy Nguyen wrote:
> On Fri, Sep 16, 2016 at 7:12 AM, Kevin Wern  wrote:
> >  builtin/clone.c | 590 
> > +---
> 
> Argh.. this is too big for my brain at this hour. It might be easier
> to follow if you separate out some code move (I think I've seen some,
> not sure). I'll try to have another look when I find time. But it's
> great to hear from you again, the pleasant surprise in my inbox today,
> as I thought we lost you ;-) There's hope for resumable clone maybe
> before 2018 again.

Sorry, I didn't mean to hurt anyone haha. I probably should have broken it down
into the bare process (priming from static resource, with no options), and the
resume option.

Thanks so much for your feedback! :)


Re: [PATCH 11/11] Resumable clone: implement primer logic in git-clone

2016-09-19 Thread Junio C Hamano
Duy Nguyen  writes:

> On Fri, Sep 16, 2016 at 7:12 AM, Kevin Wern  wrote:
>>  builtin/clone.c | 590 
>> +---
>
> Argh.. this is too big for my brain at this hour. It might be easier
> to follow if you separate out some code move (I think I've seen some,
> not sure). I'll try to have another look when I find time. But it's
> great to hear from you again, the pleasant surprise in my inbox today,
> as I thought we lost you ;-) There's hope for resumable clone maybe
> before 2018 again.

I had a similar thought.

What "git clone" (WITHOUT Kevin's update) should have been was to be
as close to

* Parse command line arguments;

* Create a new repository and go into it; this step would
  require us to have parsed the command line for --template,
  , --separate-git-dir, etc.

* Talk to the remote and do get_remote_heads() aka ls-remote
  output;

* Decide what fetch refspec to use, which alternate object store
  to borrow from; this step would require us to have parsed the
  command line for --reference, --mirror, --origin, etc;

* Issue "git fetch" with the refspec determined above; this step
  would require us to have parsed the command line for --depth, etc.

* Run "git checkout -b" to create an initial checkout; this step
  would require us to have parsed the command line for --branch,
  etc.

and the current code Kevin is basing his work on is not quite in
that shape.  This round of the patches may be RFC so it may be OK
but the ready-for-review work may need to do the refactoring to get
it close to the above shape as a preparatory step before doing
anything else.  Once that is done, Kevin's series (other than the
part that acutally does the resumable static file download) will
become a very easily understood "Ah, we know where to prime the well
from, so let's do that first" step inserted immediately before the
"Issue 'git fetch'" step.



Re: [PATCH 11/11] Resumable clone: implement primer logic in git-clone

2016-09-19 Thread Duy Nguyen
On Fri, Sep 16, 2016 at 7:12 AM, Kevin Wern  wrote:
>  builtin/clone.c | 590 
> +---

Argh.. this is too big for my brain at this hour. It might be easier
to follow if you separate out some code move (I think I've seen some,
not sure). I'll try to have another look when I find time. But it's
great to hear from you again, the pleasant surprise in my inbox today,
as I thought we lost you ;-) There's hope for resumable clone maybe
before 2018 again.
-- 
Duy


Re: [PATCH 11/11] Resumable clone: implement primer logic in git-clone

2016-09-16 Thread Junio C Hamano
Kevin Wern  writes:

> Use transport_download_primer and transport_prime_clone in git clone.
> This only supports a fully connected packfile.
>
> transport_prime_clone and transport_download_primer are executed
> completely independent of transport_(get|fetch)_remote_refs, et al.
> transport_download_primer is executed based on the existence of an
> alt_resource. The idea is that the "prime clone" execution should be
> able to attempt retrieving an alternate resource without dying, as
> opposed to depending on the result of upload pack's "capabilities" to
> indicate whether or not the client can attempt it.
>
> If a resumable resource is available, execute a codepath with the
> following modular components:
> - downloading resource to a specific directory
> - using the resource (for pack, indexing and generating the bundle
>   file)
> - cleaning up the resource (if the download or use fails)
> - cleaning up the resource (if the download or use succeeds)
>
> If resume is interrupted on the client side, the alternate resource
> info is written to the RESUMABLE file in the git directory.
>
> On resume, the required info is extracted by parsing the created
> config file, and that info is used to determine the work and git
> directories. If these cannot be determined, the program exits.
> The writing of the refspec and determination of the initial git
> directories are skipped, along with transport_prime_clone.
>
> The main purpose of this series of patches is to flesh out a codepath
> for automatic resuming, manual resuming, and leaving a resumable
> directory on exit--the logic for when to do these still needs more
> work.
>
> Signed-off-by: Kevin Wern 
> ---
>  Documentation/git-clone.txt |  16 ++
>  builtin/clone.c | 590 
> +---
>  t/t9904-git-prime-clone.sh  | 181 ++
>  3 files changed, 698 insertions(+), 89 deletions(-)
>  create mode 100755 t/t9904-git-prime-clone.sh
>
> diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
> index b7c467a..5934bb6 100644
> --- a/Documentation/git-clone.txt
> +++ b/Documentation/git-clone.txt
> @@ -16,6 +16,7 @@ SYNOPSIS
> [--depth ] [--[no-]single-branch]
> [--recursive | --recurse-submodules] [--] 
> []
> +'git clone --resume '
>  
>  DESCRIPTION
>  ---
> @@ -172,6 +173,12 @@ objects from the source repository into a pack in the 
> cloned repository.
>   via ssh, this specifies a non-default path for the command
>   run on the other end.
>  
> +--prime-clone ::
> +-p ::

Not many other options have single letter shorthand.  Is it expected
that it is worth to let this option squat on a short-and-sweet "-p",
perhaps because it is so frequently used?

> +--resume::
> + Resume a partially cloned repo in a "resumable" state. This
> + can only be specified with a single local directory ( + dir>). This is incompatible with all other options.
> +
> +::
> + The directory of the partial clone. This could be either the
> + work directory or the git directory.

I think these should be described this way:

--resume ::
description if what resume option does and how resumable_dir
is used by the option.

in a single bullet point.

> diff --git a/builtin/clone.c b/builtin/clone.c
> index 9ac6c01..d9a13dc 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -8,7 +8,9 @@
>   * Clone a repository into a different directory that does not yet exist.
>   */
>  
> +#include "cache.h"
>  #include "builtin.h"

I do not think you need to include cache.h if you are including
builtin.h; Documentation/CodingGuidelines says:

 - The first #include in C files, except in platform specific compat/
   implementations, must be either "git-compat-util.h", "cache.h" or
   "builtin.h".  You do not have to include more than one of these.

> @@ -40,17 +42,20 @@ static const char * const builtin_clone_usage[] = {
>  
>  static int option_no_checkout, option_bare, option_mirror, 
> option_single_branch = -1;
>  static int option_local = -1, option_no_hardlinks, option_shared, 
> option_recursive;
> +static int option_resume;
>  static char *option_template, *option_depth;
> -static char *option_origin = NULL;
> +static const char *option_origin = NULL;

Is this change related to anything you are doing here?

If you are fixing things while at it, please don't ;-) If you really
want to, please also remove " = NULL", from this line and also from
the next line.  Also do not add " = NULL" at the end of alt_res.

>  static char *option_branch = NULL;
>  ...
> +static const struct alt_resource *alt_res = NULL;

> +static char *get_filename(const char *dir)
> +{
> + char *dir_copy = xstrdup(dir);
> + strip_trailing_slashes(dir_copy);
> + char *filename, *final = NULL;
> +
> + filename = find_last_dir_sep(dir);
> +
> + if (filename && *(++filename))
> + final = 

[PATCH 11/11] Resumable clone: implement primer logic in git-clone

2016-09-15 Thread Kevin Wern
Use transport_download_primer and transport_prime_clone in git clone.
This only supports a fully connected packfile.

transport_prime_clone and transport_download_primer are executed
completely independent of transport_(get|fetch)_remote_refs, et al.
transport_download_primer is executed based on the existence of an
alt_resource. The idea is that the "prime clone" execution should be
able to attempt retrieving an alternate resource without dying, as
opposed to depending on the result of upload pack's "capabilities" to
indicate whether or not the client can attempt it.

If a resumable resource is available, execute a codepath with the
following modular components:
- downloading resource to a specific directory
- using the resource (for pack, indexing and generating the bundle
  file)
- cleaning up the resource (if the download or use fails)
- cleaning up the resource (if the download or use succeeds)

If resume is interrupted on the client side, the alternate resource
info is written to the RESUMABLE file in the git directory.

On resume, the required info is extracted by parsing the created
config file, and that info is used to determine the work and git
directories. If these cannot be determined, the program exits.
The writing of the refspec and determination of the initial git
directories are skipped, along with transport_prime_clone.

The main purpose of this series of patches is to flesh out a codepath
for automatic resuming, manual resuming, and leaving a resumable
directory on exit--the logic for when to do these still needs more
work.

Signed-off-by: Kevin Wern 
---
 Documentation/git-clone.txt |  16 ++
 builtin/clone.c | 590 +---
 t/t9904-git-prime-clone.sh  | 181 ++
 3 files changed, 698 insertions(+), 89 deletions(-)
 create mode 100755 t/t9904-git-prime-clone.sh

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index b7c467a..5934bb6 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -16,6 +16,7 @@ SYNOPSIS
  [--depth ] [--[no-]single-branch]
  [--recursive | --recurse-submodules] [--] 
  []
+'git clone --resume '
 
 DESCRIPTION
 ---
@@ -172,6 +173,12 @@ objects from the source repository into a pack in the 
cloned repository.
via ssh, this specifies a non-default path for the command
run on the other end.
 
+--prime-clone ::
+-p ::
+   When given and the repository to clone from is accessed
+   via ssh, this specifies a non-default path for the command
+   run on the other end.
+
 --template=::
Specify the directory from which templates will be used;
(See the "TEMPLATE DIRECTORY" section of linkgit:git-init[1].)
@@ -232,6 +239,15 @@ objects from the source repository into a pack in the 
cloned repository.
for `host.xz:foo/.git`).  Cloning into an existing directory
is only allowed if the directory is empty.
 
+--resume::
+   Resume a partially cloned repo in a "resumable" state. This
+   can only be specified with a single local directory (). This is incompatible with all other options.
+
+::
+   The directory of the partial clone. This could be either the
+   work directory or the git directory.
+
 :git-clone: 1
 include::urls.txt[]
 
diff --git a/builtin/clone.c b/builtin/clone.c
index 9ac6c01..d9a13dc 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -8,7 +8,9 @@
  * Clone a repository into a different directory that does not yet exist.
  */
 
+#include "cache.h"
 #include "builtin.h"
+#include "bundle.h"
 #include "lockfile.h"
 #include "parse-options.h"
 #include "fetch-pack.h"
@@ -40,17 +42,20 @@ static const char * const builtin_clone_usage[] = {
 
 static int option_no_checkout, option_bare, option_mirror, 
option_single_branch = -1;
 static int option_local = -1, option_no_hardlinks, option_shared, 
option_recursive;
+static int option_resume;
 static char *option_template, *option_depth;
-static char *option_origin = NULL;
+static const char *option_origin = NULL;
 static char *option_branch = NULL;
 static const char *real_git_dir;
 static char *option_upload_pack = "git-upload-pack";
+static char *option_prime_clone = "git-prime-clone";
 static int option_verbosity;
 static int option_progress = -1;
 static enum transport_family family;
 static struct string_list option_config;
 static struct string_list option_reference;
 static int option_dissociate;
+static const struct alt_resource *alt_res = NULL;
 
 static struct option builtin_clone_options[] = {
OPT__VERBOSITY(_verbosity),
@@ -85,10 +90,14 @@ static struct option builtin_clone_options[] = {
   N_("checkout  instead of the remote's HEAD")),
OPT_STRING('u', "upload-pack", _upload_pack, N_("path"),
   N_("path to git-upload-pack on the remote")),
+   OPT_STRING('p', "prime-clone", _prime_clone, N_("path"),
+