date:20170731

Re: [PATCH] hash: Allow building with the external sha1dc library

2017-07-31 Thread Takashi Iwai

On Fri, 28 Jul 2017 18:04:18 +0200,
Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Jul 28, 2017 at 5:58 PM, Ævar Arnfjörð Bjarmason
>  wrote:
> > On Tue, Jul 25, 2017 at 7:57 AM, Takashi Iwai  wrote:
> >> Some distros provide SHA1 collision detect code as a shared library.
> >> It's the very same code as we have in git tree, and git can link with
> >> it as well; at least, it may make maintenance easier, according to our
> >> security guys.
> >>
> >> This patch allows user to build git linking with the external sha1dc
> >> library instead of the built-in sha1dc code.  User needs to define
> >> DC_SHA1_EXTERNAL explicitly.  As default, the built-in sha1dc code is
> >> used like before.
> >
> > This whole thing sounds sensible. I reviewed this (but like Junio
> > haven't tested it with a lib) and I think it would be worth noting the
> > following in the commit message / Makefile documentation:
> >
> > * The "sha1detectcoll" *.so name for the "sha1collisiondetection"
> > library is not something you or suse presumably) made up, it's a name
> > the sha1collisiondetection.git itself creates for its library. I think
> > the Makefile docs you've added here are a bit confusing, you talk
> > about the "external sha1collisiondetection library" but then link
> > against sha1detectcoll". It's worth calling out this difference in the
> > docs IMO. I.e. not talk about the sha1detectcoll.so library form of
> > sha1collisiondetection, not the sha1collisiondetection project name as
> > a library.
> >
> > * It might be worth noting that this is *not* linking against the same
> > code we ship ourselves due to the difference in defining
> > SHA1DC_INIT_SAFE_HASH_DEFAULT for the git project's needs in the one
> > we build, hence your need to have a git_SHA1DCInit() wrapper whereas
> > we call SHA1DCInit() directly. It might be interesting to note that
> > the library version will always be *slightly* slower (although the
> > difference will be trivial).
> >
> > * Nothing in your commit message or docs explains why DC_SHA1_LINK is
> > needed. We don't have these sorts of variables for other external
> > libraries we link to, why the difference?
> >
> > Some other things I observed:
> >
> > * We now have much of the same header code copy/pasted between
> > sha1dc_git.h and sha1dc_git_ext.h, did you consider just always
> > including the former but making what it's doing conditional on
> > DC_SHA1_EXTERNAL? I don't know if it would be worth it from a cursory
> > glance, but again your commit message doesn't list that among options
> > considered & discarded.
> >
> > * I think it makes sense to spew out a "not both!" error in the
> > Makefile if you set DC_SHA1_EXTERNAL=Y and DC_SHA1_SUBMODULE=Y. See my
> > 94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) for an
> > example of how to do this.
> >
> > * The whole business of "#include " looks very fragile, are
> > there really no other packages in e.g. suse that ship a sha1.h? Debian
> > has libmd-dev that ships /usr/include/sha1.h that conflicts with this:
> > https://packages.debian.org/search?searchon=contents=sha1.h=exactfilename=unstable=any
> >
> > Shipping a sha1.h as opposed to a sha1collisiondetection.h or
> > sha1detectcoll.h or whatever seems like a *really* bad decision by
> > upstream that should be the subject of at least seeing if they'll take
> > a pull request to fix it before you package it or before we include
> > something that'll probably need to be fixed / worked around anyway in
> > Git.
> 
> I sent this last bit a tad too soon in a checkout of 
> sha1collisiondetection.git:
> 
> $ make PREFIX=/tmp/local install >/dev/null 2>&1 && find /tmp/local/ 
> -type f
> /tmp/local/include/sha1dc/sha1.h
> /tmp/local/bin/sha1dcsum
> /tmp/local/bin/sha1dcsum_partialcoll
> /tmp/local/lib/libsha1detectcoll.a
> /tmp/local/lib/libsha1detectcoll.so.1.0.0
> /tmp/local/lib/libsha1detectcoll.la
> 
> So the upstream library expects you (and it's documented in their README) to 
> do:
> 
> #include 
> 
> But your patch is just doing:
> 
> #include 
>
> At best this seems like a trivial bug and at worst us encoding some
> Suse-specific packaging convention in git, since other distros would
> presumably want to package this in /usr/include/sha1dc/sha1.h as
> upstream suggests. I.e. using the ambiguous sha1.h name is not
> something upstream's doing by default, it's something you're doing in
> your package.

Actually it seems to be a wrong usage of $INCLUDEDIR in the upstream
Makefile, and SUSE package blindly override $INCLUDEDIR.

But sha1dc/sha1.h looks like the correct path, as README.md mentions,
indeed, so maybe we need to work around in SUSE package side.

Andreas, could you work on it please?


thanks,

Takashi

Re: [PATCH] hash: Allow building with the external sha1dc library

2017-07-31 Thread Takashi Iwai

On Fri, 28 Jul 2017 17:58:14 +0200,
Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Jul 25, 2017 at 7:57 AM, Takashi Iwai  wrote:
> > Some distros provide SHA1 collision detect code as a shared library.
> > It's the very same code as we have in git tree, and git can link with
> > it as well; at least, it may make maintenance easier, according to our
> > security guys.
> >
> > This patch allows user to build git linking with the external sha1dc
> > library instead of the built-in sha1dc code.  User needs to define
> > DC_SHA1_EXTERNAL explicitly.  As default, the built-in sha1dc code is
> > used like before.
> 
> This whole thing sounds sensible. I reviewed this (but like Junio
> haven't tested it with a lib) and I think it would be worth noting the
> following in the commit message / Makefile documentation:

Hi,

sorry for the late reply, and thanks for the review.

> * The "sha1detectcoll" *.so name for the "sha1collisiondetection"
> library is not something you or suse presumably) made up, it's a name
> the sha1collisiondetection.git itself creates for its library. I think
> the Makefile docs you've added here are a bit confusing, you talk
> about the "external sha1collisiondetection library" but then link
> against sha1detectcoll". It's worth calling out this difference in the
> docs IMO. I.e. not talk about the sha1detectcoll.so library form of
> sha1collisiondetection, not the sha1collisiondetection project name as
> a library.

Heh, I was confused and annoyed by this inconsistency, too :)
IIRC, I used "sha1collisiondetection" since the text for
DC_SHA1_SUBMODULE refers to this term already.

I'll keep using the more readable term (e.g. SHA1 collision-detection
or sha1dc as abbreviated form) instead of the project name.

> * It might be worth noting that this is *not* linking against the same
> code we ship ourselves due to the difference in defining
> SHA1DC_INIT_SAFE_HASH_DEFAULT for the git project's needs in the one
> we build, hence your need to have a git_SHA1DCInit() wrapper whereas
> we call SHA1DCInit() directly. It might be interesting to note that
> the library version will always be *slightly* slower (although the
> difference will be trivial).

Well, it's just a matter of definition of "code".  If you think of the
source code, it is the same code.  The difference is merely the build
option and how it's compiled.  The performance difference is also not
worth to note.  If the call pattern differs due to shlib, it does
change no matter whether it's with the same option or not.  Using
shlib always implies that, you must know it.

In anyway, I'm going to rephrase "code" with "source code" to avoid
confusion.

> * Nothing in your commit message or docs explains why DC_SHA1_LINK is
> needed. We don't have these sorts of variables for other external
> libraries we link to, why the difference?

IMO, it's other way round.  If we didn't provide any such option for
using external library, we were too lazy.

Actually, I was lazy and didn't provide DC_SHA1_CFLAGS, and you've
proven this to be bad -- the library header might be installed in a
different location than the standard path.  The DC_SHA1_CFLAGS can be
set as default to -Isha1dc so that it covers that path.

I'll add this to the revised patch.

> Some other things I observed:
> 
> * We now have much of the same header code copy/pasted between
> sha1dc_git.h and sha1dc_git_ext.h, did you consider just always
> including the former but making what it's doing conditional on
> DC_SHA1_EXTERNAL? I don't know if it would be worth it from a cursory
> glance, but again your commit message doesn't list that among options
> considered & discarded.

I don't mind either way, there is no perfect solution in this case.
As you know, many people think the ifdef ugly no matter how.

I leave the decision to maintainer.  Just let me know which option is
preferred.

> * I think it makes sense to spew out a "not both!" error in the
> Makefile if you set DC_SHA1_EXTERNAL=Y and DC_SHA1_SUBMODULE=Y. See my
> 94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) for an
> example of how to do this.

Yeah, that's a good point.  Will add the check.

> * The whole business of "#include " looks very fragile, are
> there really no other packages in e.g. suse that ship a sha1.h? Debian
> has libmd-dev that ships /usr/include/sha1.h that conflicts with this:
> https://packages.debian.org/search?searchon=contents=sha1.h=exactfilename=unstable=any
> 
> Shipping a sha1.h as opposed to a sha1collisiondetection.h or
> sha1detectcoll.h or whatever seems like a *really* bad decision by
> upstream that should be the subject of at least seeing if they'll take
> a pull request to fix it before you package it or before we include
> something that'll probably need to be fixed / worked around anyway in
> Git.

Yeah, a unique header name would be better indeed.

thanks,

Takashi

[RFC] Moving main git-subtree development. to GitHub

2017-07-31 Thread David A. Greene

Hello all,

I have decided that moving git-subtree development off of the main git
mailing list is the best way to address the needs of git-subtree users
while providing the flexibility necessary to get it in shape for
eventual "official" status in the git project.

Over the last year and a half I have been working on some new features
for git-subtree, motivated by day-job requirements.  Much of that effort
has been spent hardening the new code in a real work environment, driven
by real-world needs.  I believe it is now ready for public consumption.
However, because it is a large change, it will need lots of public
exposure before it can be considered safe for general use.  Essentially,
I would like to do a "beta" release of the new code while not impacting
existing users of git-subtree in contrib.

During this time and due to work and life commitments, I have not been
able to keep up with the git mailing list as I would like.  Questions
and patches related to git-subtree have languished and I don't want to
lose that good work by our users.  Therefore, I would like to transfer
the main development activity over to GitHub.  GitHub's patch tracking,
review and feedback infrastructure works better for me that a large
mailing list with patches sent via e-mail.  It is easy to lose things in
a sea of conversations.  It's completely personal preference but I think
a switch to GitHub will also make tracking git-subtree's progress easier
for users.  Moving the main development to GitHub will also allow
git-subtree users to be more visible, ask questions and help each other
out.

Going forward, I would like to do the main feature and bug fix work on
GitHub and periodically subtree-merge to git's main repository under
contrib when the code has stabilized and we are reasonably confident
interfaces are stable.  This will allow us to experiment with new ideas
while keeping a stable codebase for end users.  I expect a lot of
re-engineering of the core bits of git-subtree to bring it into
compliance with git's coding standards, support new features and provide
a better user experience.

I believe keeping a stable git-subtree in contrib is valuable.
git-subtree and git-submodule provide alternative solutions to similar
problems, as well as each solving problems the other does not.
Anecdotally, I noticed an uptick in git-subtree user activity after the
move into contrib.  I would like to maintain that visibilty.

Does this mode of operation work for the larger git community?  Are
there suggestions of how to make this work as smoothly as possible?

Thank you for your feedback and support of git-subtree!

  -David

Re: [PATCH 0/1] add git-splice subcommand for non-interactive branch splicing

2017-07-31 Thread Adam Spiers

On 31 July 2017 at 23:18, Junio C Hamano  wrote:
> Adam Spiers  writes:
>
> > Therefore there is a risk that each new UI for higher-level workflows
> > will end up re-implementing these mid-level operations.  This
> > undesirable situation could be avoided if git itself provided those
> > mid-level operations.
>
> Let me make sure if I get your general idea right, first.
>
> Is your aim is to give a single unified mid-layer that these other
> tools can build on instead of rolling their own "cherry-pick these
> ranges, then squash that in, and then merge the other one in, ..."
> sequencing machinery?

Pretty much, yes.  The original itch I wanted to scratch was
implementing git-explode, which aims to automatically explode a large
topic branch into a set of smaller, independent topic branches, by
harnessing my git-deps for automatically detecting inter-dependencies
between commits in the large source branch and using that dependency
tree to construct the smaller topic branches.  (Before anyone protests
at this point, yes, I am fully aware that it is not possible to
automate 100% accurate detection of these dependencies, and no, that
does not completely invalidate the approach.[0])

My initial thought was that in order to be able to automatically
decompose a branch into smaller branches, I would need a mid-layer
operation "git-transplant" somewhat analogous to mv(1), which would
let me easily move commits out of the source branch into a new target
branch.  And then I realised that, in the same way that
(simplistically speaking) mv(1) could be reimplemented as cp(1)
followed by rm(1), implementing "git-transplant" in turn would require
more primitive operations for copying commits between branches, and
removing commits from branches.  At this point I saw value in
generalising those operations; hence the idea for git-splice was born.

Consequently I implemented prototypes for splice and transplant, which
didn't take too long.  (The real work was writing comprehensive test
suites and polishing the tools until they were reliable enough to pass
100%.)

Ironically, soon after I started to implement git-explode, I realised
that the order in which I needed to walk the dependency tree
discovered by git-deps actually meant that I couldn't use
git-transplant for this particular use case, so in the end I
implemented it with pygit2.  (I still need to polish it up a bit more
before releasing.)

However, even though splice and transplant are not useful for this
particular use case, I still believe that they (or similar tools) have
the potential to serve as a useful foundation for other workflows.

> If so, I think that is a very good goal.

Glad to hear it :-)

> > # Remove commits A..B (i.e. excluding A) from the current branch.
> > git splice A..B
> > # Remove commit A from the current branch.
> > git splice A^!
> > # Remove commits A..B from the current branch, and cherry-pick
> > # commits C..D at the same point.
> > git splice A..B C..D
>
> We need to make sure that the mid-layer tool offers a good set of
> primitive operations that serve all of these other tools' needs.  I
> do not know offhand if what you implemented that are illustrated by
> these examples is or isn't that "good set".

Agreed.  That's why I sent the RFC to this list last year: in the hope
that these details could be hashed out and guide my development in the
right direction.  Unfortunately I didn't get much response at the
time, which was probably my fault for not explaining my "mission
goals" clearly enough.  Although in fairness to myself, I think I
needed a year anyway to let the ideas in my head mature to the point
where I understood them well enough myself to communicate them clearly
to others :-)

> Assuming that there is such a "good set of primitives" surfaced at
> the UI level so that these other tools can express what they want to
> perform with, I'd personally prefer to see a solution that extends
> and uses the common "sequencer" machinery we have been using to
> drive cherry-picks, reverts and interactive rebases that work on
> multiple commits.  IOW, it would be nice to see that the only thing
> "git splice A..B" does is to prepare a series of instructions in a
> file, e.g. .git/sequencer/todo, just like "git cherry-pick A..B"
> would, and let the sequencer machinery to handle the sequencing.
>
> E.g. In a history like
>
> ---o---A---o---B---X---Y---Z   HEAD
>
> "git splice A..B" command would write something like this:
>
> reset to A
> pick X
> pick Y
> pick Z
>
> to the todo file and drive the sequencer.

That sounds great to me!  At this point sadly I'm currently a bit
ignorant of the intricacies of the sequencer, otherwise I might have
adopted this approach from day 0.  But I'm pleased to be able to say
that under the hood, the way I implemented splice and transplant isn't
too dissimilar to this: they both write "todo" files, under
.git/splice

RE:

2017-07-31 Thread TD CREDIT

DO YOU NEED ANY KIND OF LOAN CREDIT ASSISTANCE? IF YES,EMAIL US FOR MORE INFO.

---
This email is free from viruses and malware because avast! Antivirus protection 
is active.
http://www.avast.com

Re: [PATCH 2/2] t6500: mark tests as SHA1 reliant

2017-07-31 Thread brian m. carlson

On Mon, Jul 31, 2017 at 01:26:40PM -0700, Stefan Beller wrote:
> On Sun, Jul 30, 2017 at 4:24 PM, brian m. carlson
>  wrote:
> > I realize this was worded poorly.  So for my example, in this case, we'd
> > do:
> >
> > test-helper-hash-string "263 410"
> >
> > For SHA-1, we'd get "263 410".  For SHA-256, we'd get "313 481" (which,
> > as SHA-256 blobs, both start with "17" in their hex representation).
> > Presumably we'd read some environment variable to determine the proper
> > value.
> 
> This is what Junio proposed in the first message, except that we defer that
> to a shell script as for each test we may need different things, so a helper 
> may
> be of little value?

I think a shell script may end up being a fine helper for our needs.  In
any case, I think we're in violent agreement.  My proposal was just an
idea I had considered when thinking about how to do this, and any
suitable solution is fine with me.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature

Re: reftable [v4]: new ref storage format

2017-07-31 Thread Shawn Pearce

On Mon, Jul 31, 2017 at 12:42 PM, Junio C Hamano  wrote:
> Shawn Pearce  writes:
>
>> ### Peeling
>>
>> References in a reftable are always peeled.
>
> This hopefully means "a record for an annotated (or signed) tag
> records both the tag object and the object it refers to", and does
> not include peeling a commit down to its tree.

Yes, thanks for the clarification. I've added that to this section.


>> ### Reference name encoding
>>
>> Reference names should be encoded with UTF-8.
>
> If we limited ourselves to say that a refname is an uninterpreted
> sequence of bytes that must pass check_refname_format() test, mthen
> we won't open us to confusion such as "are two refs with the same
> Unicode name encoded with different normalization form considered
> the same?"

Ack. I'll reword this as bytes, and point to git-check-ref-format.

> In what way does this "should be in UTF-8" help describing the file
> format, I wonder.

In JGit we have a String object for a reference name. Dropping that
down to a sequence of bytes requires converting it with a character
encoding, such as US-ASCII or UTF-8.


>> ### First ref block
>>
>> The first ref block shares the same block as the file header, and is
>> 24 bytes smaller than all other blocks in the file.  The first block
>> immediately begins after the file header, at offset 24.
>>
>> If the first block is a log block (a log only table), its block header
>> begins immediately at offset 24.
>
> A minor nit: You called such a file "a log-only file"; let's be consistent.

Fixed. I found another that also was inconsistent. Thanks for the
attention to detail.


>> ### Ref block format
>>
>> A ref block is written as:
>>
>> 'r'
>> uint24( block_len )
>> ref_record+
>> uint32( restart_offset )+
>> uint16( restart_count )
>> padding?
>>
>> Blocks begin with `block_type = 'r'` and a 3-byte `block_len` which
>> encodes the number of bytes in the block up to, but not including the
>> optional `padding`.  This is almost always shorter than the file's
>> `block_size`.  In the first ref block, `block_len` includes 24 bytes
>> for the file header.
>
> As a block cannot be longer than 16MB, allocating uint32 to a
> restart offset may be a bit overkill.  I do not know if it is worth
> attempting to pack 1/3 more restart entries in a block by using
> uint24, though.

I don't think its worth the annoyance in code that has to deal with
this field. An example 87k ref repository with a 4k block size has ~9
restarts per block and ~19 bytes of padding. Using uint24 saves us 9
bytes, yet the average ref here costs 27 bytes. We aren't likely to
fill the block with another ref, or another restart point.


>>  ref record
[...]
>> The `value` follows.  Its format is determined by `value_type`, one of
>> the following:
>>
>> - `0x0`: deletion; no value data (see transactions, below)
>> - `0x1`: one 20-byte object id; value of the ref
>> - `0x2`: two 20-byte object ids; value of the ref, peeled target
>> - `0x3`: symref and text: `varint( text_len ) text`
>> - `0x4`: index record (see below)
>> - `0x5`: log record (see below)
[...]
>> Types `0x6..0x7` are reserved for future use.
>
> I wondered if we regret the apparent limited extensibility later,
> but giving 4 bits to value-type would limit suffix length that can
> be represented by a single varint() only to 7, while the format
> described would give us up to 15 bytes.  We can say type 0x7 would
> be followed by another varint() to record the extended type, or
> something, to extend it, so probably what you did here strikes a
> good balance.

Thanks. I think we actually have 0x4-0x7 available in value_type. I
used 0x4 and 0x5 as above only as a suggestion from Stefan to help
someone debug a reader. The block type differs where 0x0-0x3 and 0x4
and 0x5 are used, so there is already sufficient information available
to disambiguate the value_type such that we don't actually have to use
0x4 and 0x5 as I have here.

So I agree with myself, and with you, 3 bits for value_type and 4 bits
for suffix_len is the right balance.


>> ### Ref index
>> ...
>> An index block should only be written if there are at least 4 blocks
>> in the file, as cold reads using the index requires 2 disk reads (read
>> index, read block), and binary searching <= 4 blocks also requires <=
>> 2 reads.  Omitting the index block from smaller files saves space.
>
> I think the last "<= 4" should be "< 4" here.  That is consistent
> with an earlier part of the paragraph that requires at least 4
> ref-blocks in the file, because a reftable with only 3 ref-blocks
> still can be accessed with 2 reads (a reftable with 4 ref-blocks
> without index may need 3 reads as there is no "middle" for binary
> search).
>
> The first sentence should be "if there are at least 4 ref-blocks", I
> guess.

Thanks, clarified.


>> ### Obj block format
>>
>> Object blocks use unique, abbreviated 2-20 byte SHA-1 keys, mapping
>> to ref

Re: [PATCH v3 07/10] submodule: check for unstaged .gitmodules outside of config parsing

2017-07-31 Thread Stefan Beller

On Tue, Jul 18, 2017 at 12:05 PM, Brandon Williams  wrote:
> Teach 'is_staging_gitmodules_ok()' to be able to determine in the
> '.gitmodules' file has unstaged changes based on the passed in index
> instead of relying on a global varible which is set during the

variable

> submodule-config parsing.
>
> Signed-off-by: Brandon Williams 
> ---
>  builtin/mv.c |  2 +-
>  builtin/rm.c |  2 +-
>  submodule.c  | 32 +---
>  submodule.h  |  2 +-
>  4 files changed, 20 insertions(+), 18 deletions(-)
>
> diff --git a/builtin/mv.c b/builtin/mv.c
> index dcf6736b5..94fbaaa5d 100644
> --- a/builtin/mv.c
> +++ b/builtin/mv.c
> @@ -81,7 +81,7 @@ static void prepare_move_submodule(const char *src, int 
> first,
> struct strbuf submodule_dotgit = STRBUF_INIT;
> if (!S_ISGITLINK(active_cache[first]->ce_mode))
> die(_("Directory %s is in index and no submodule?"), src);
> -   if (!is_staging_gitmodules_ok())
> +   if (!is_staging_gitmodules_ok(_index))
> die(_("Please stage your changes to .gitmodules or stash them 
> to proceed"));
> strbuf_addf(_dotgit, "%s/.git", src);
> *submodule_gitfile = read_gitfile(submodule_dotgit.buf);
> diff --git a/builtin/rm.c b/builtin/rm.c
> index 52826d137..4057e73fa 100644
> --- a/builtin/rm.c
> +++ b/builtin/rm.c
> @@ -286,7 +286,7 @@ int cmd_rm(int argc, const char **argv, const char 
> *prefix)
> list.entry[list.nr].name = xstrdup(ce->name);
> list.entry[list.nr].is_submodule = S_ISGITLINK(ce->ce_mode);
> if (list.entry[list.nr++].is_submodule &&
> -   !is_staging_gitmodules_ok())
> +   !is_staging_gitmodules_ok(_index))
> die (_("Please stage your changes to .gitmodules or 
> stash them to proceed"));
> }
>
> diff --git a/submodule.c b/submodule.c
> index b1965290f..46ec04d7c 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -37,18 +37,25 @@ static struct oid_array ref_tips_after_fetch;
>  static int gitmodules_is_unmerged;
>
>  /*
> - * This flag is set if the .gitmodules file had unstaged modifications on
> - * startup. This must be checked before allowing modifications to the
> - * .gitmodules file with the intention to stage them later, because when
> - * continuing we would stage the modifications the user didn't stage herself
> - * too. That might change in a future version when we learn to stage the
> - * changes we do ourselves without staging any previous modifications.
> + * Check if the .gitmodules file has unstaged modifications.  This must be
> + * checked before allowing modifications to the .gitmodules file with the
> + * intention to stage them later, because when continuing we would stage the
> + * modifications the user didn't stage herself too. That might change in a
> + * future version when we learn to stage the changes we do ourselves without
> + * staging any previous modifications.
>   */
> -static int gitmodules_is_modified;
> -
> -int is_staging_gitmodules_ok(void)
> +int is_staging_gitmodules_ok(const struct index_state *istate)
>  {
> -   return !gitmodules_is_modified;
> +   int pos = index_name_pos(istate, GITMODULES_FILE, 
> strlen(GITMODULES_FILE));
> +
> +   if ((pos >= 0) && (pos < istate->cache_nr)) {

Why do we need the second check (pos < istate->cache_nr) ?

I would have assumed the first one suffices,
it might read better if turned around:


if (pos < 0)
return 1;

return (lstat(GITMODULES_FILE, ) == 0 &&
ce_match_stat(istate->cache[pos], , 0) & DATA_CHANGED);
  }

> @@ -231,11 +238,6 @@ void gitmodules_config(void)
> !memcmp(ce->name, ".gitmodules", 11))
> gitmodules_is_unmerged = 1;
> }
> -   } else if (pos < active_nr) {
> -   struct stat st;
> -   if (lstat(".gitmodules", ) == 0 &&
> -   ce_match_stat(active_cache[pos], , 0) & 
> DATA_CHANGED)
> -   gitmodules_is_modified = 1;
> }

So this is where the check "pos < active_nr" is coming from,
introduced in 5fee995244 (submodule.c: add .gitmodules staging
helper functions, 2013-07-30) as well as d4e98b581b (Submodules:
Don't parse .gitmodules when it contains, merge conflicts, 2011-05-14).

If I am reading the docs for cache_name_pos correctly, we would
not need to check for the index exceeding active_cache,
but checking for the index not being out of bounds seems
to be wide spread.

Re: What's cooking in git.git (Jul 2017, #09; Mon, 31)

2017-07-31 Thread Ramsay Jones

On 31/07/17 23:30, Junio C Hamano wrote:
[snip]
> 
> * sd/branch-copy (2017-06-18) 3 commits
>   (merged to 'next' on 2017-07-18 at 5e3b9357ea)
>  + branch: add a --copy (-c) option to go with --move (-m)
>  + branch: add test for -m renaming multiple config sections
>  + config: create a function to format section headers
> 
>  "git branch" learned "-c/-C" to create and switch to a new branch
>  by copying an existing one.
> 
>  Will cook in 'next'.
> 
>  I personally do not think "branch --copy master backup" while on
>  "master" that switches to "backup" is a good UI, and I *will* say
>  "I told you so" when users complain after we merge this down to
>  'master'.

I wouldn't normally comment on an issue like this because I am
not very good at specifying, designing and evaluating UIs (so
who in their right mind would listen to me). ;-)

FWIW, I suspect that I would not like using this interface either
and would, therefore, not use it. However, I guess the worst that
would happen, is that it would gain another 'wort' (--option) to
turn off the "switches to backup" branch. :-D

I didn't want you to think that the lack of comments on this was
because everybody agreed that it was a good idea.

ATB,
Ramsay Jones

Re: [GSoC][PATCH 08/13] submodule: port submodule subcommand 'summary' from shell to C

2017-07-31 Thread Christian Couder

On Mon, Jul 31, 2017 at 10:56 PM, Prathamesh Chavan  wrote:

> * variable head was no longer used in module_summary() and instead the strbuf
>   was utilized.

Good but there might be a few problems in the way it is used. See below.

> +static int compute_summary_module_list(char *head, struct summary_cb *info)
> +{
> +   struct argv_array diff_args = ARGV_ARRAY_INIT;
> +   struct rev_info rev;
> +   struct module_cb_list list = MODULE_CB_LIST_INIT;
> +
> +   argv_array_push(_args, info->diff_cmd);
> +   if (info->cached)
> +   argv_array_push(_args, "--cached");
> +   argv_array_pushl(_args, "--ignore-submodules=dirty", "--raw",
> +NULL);
> +   if (head)
> +   argv_array_push(_args, head);
> +   argv_array_push(_args, "--");
> +   if (info->argc)
> +   argv_array_pushv(_args, info->argv);
> +
> +   git_config(git_diff_basic_config, NULL);
> +   init_revisions(, info->prefix);
> +   gitmodules_config();
> +   rev.abbrev = 0;
> +   precompose_argv(diff_args.argc, diff_args.argv);
> +
> +   diff_args.argc = setup_revisions(diff_args.argc, diff_args.argv,
> +, NULL);
> +   rev.diffopt.output_format = DIFF_FORMAT_NO_OUTPUT | 
> DIFF_FORMAT_CALLBACK;
> +   rev.diffopt.format_callback = submodule_summary_callback;
> +   rev.diffopt.format_callback_data = 
> +
> +   if (!info->cached) {
> +   if (!strcmp(info->diff_cmd, "diff-index"))
> +   setup_work_tree();
> +   if (read_cache_preload() < 0) {
> +   perror("read_cache_preload");
> +   return -1;
> +   }
> +   } else if (read_cache() < 0) {
> +   perror("read_cache");
> +   return -1;
> +   }
> +
> +   if (!strcmp(info->diff_cmd, "diff-index"))
> +   run_diff_index(, info->cached);
> +   else
> +   run_diff_files(, 0);
> +   prepare_submodule_summary(info, );
> +
> +   free(head);

It is a bit strange to have this function free() an argument it is passed.

> +   return 0;
> +

Spurious new line.

> +}
> +
> +static int module_summary(int argc, const char **argv, const char *prefix)
> +{
> +   struct summary_cb info = SUMMARY_CB_INIT;
> +   int cached = 0;
> +   char *diff_cmd = "diff-index";
> +   int for_status = 0;
> +   int quiet = 0;
> +   int files = 0;
> +   int summary_limits = -1;
> +   struct child_process cp_rev = CHILD_PROCESS_INIT;
> +   struct strbuf sb = STRBUF_INIT;

[...]

> +   if (!capture_command(_rev, , 0)) {
> +   strbuf_strip_suffix(, "\n");
> +   if (argc) {
> +   argv++;
> +   argc--;
> +   }
> +   } else if (!argc || !strcmp(argv[0], "HEAD")) {
> +   /* before the first commit: compare with an empty tree */
> +   struct stat st;
> +   struct object_id oid;
> +   if (fstat(0, ) < 0 || index_fd(oid.hash, 0, , 2, 
> prefix, 3))
> +   die("Unable to add %s to database", oid.hash);
> +   strbuf_addstr(, oid_to_hex());
> +   if (argc) {
> +   argv++;
> +   argc--;
> +   }
> +   } else {
> +   strbuf_addstr(, "HEAD");
> +   }
> +
> +   if (files) {
> +   if (cached)
> +   die(_("The --cached option cannot be used with the 
> --files option"));
> +   diff_cmd = "diff-files";
> +
> +   strbuf_reset();

strbuf_reset() does not free the memory allocated to sb.buf...

> +   sb.buf = NULL;

...then this makes sure that the memory previously allocated to sb.buf
will not be free()d.

Maybe instead of the above 2 lines just: strbuf_release()
Or maybe just don't set sb.buf to NULL.

> +   }
> +
> +   info.argc = argc;
> +   info.argv = argv;
> +   info.prefix = prefix;
> +   info.cached = !!cached;
> +   info.for_status = !!for_status;
> +   info.quiet = quiet;
> +   info.files = files;
> +   info.summary_limits = summary_limits;
> +   info.diff_cmd = diff_cmd;
> +
> +   return compute_summary_module_list(strbuf_detach(, NULL), );

Maybe you could pass: "sb.len > 0 ? strbuf_detach(, NULL) : NULL"
This way if sb has previously been released or reset, NULL will be passed.

I would suggest though to just pass sb.buf and to strbuf_release()
after calling compute_summary_module_list() and before returning from
module_summary() instead of having compute_summary_module_list() free
its first argument.
If you do that then compute_summary_module_list() should be changed so
that when it is passed "" it will behave in the same way as when it is
passed NULL.

> +}

[PATCH] convert any hard coded .gitmodules file string to the MACRO

2017-07-31 Thread Stefan Beller

I used these commands:
  $ cat sem.cocci
  @@
  @@
  - ".gitmodules"
  + GITMODULES_FILE

  $ spatch --in-place --sp-file sem.cocci builtin/*.c *.c *.h

Feel free to regenerate or squash it in or have it as a separate commit.

Signed-off-by: Stefan Beller 
---
 submodule.c| 18 +-
 unpack-trees.c |  2 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/submodule.c b/submodule.c
index 37f4a92872..b75d02ba7b 100644
--- a/submodule.c
+++ b/submodule.c
@@ -63,7 +63,7 @@ int update_path_in_gitmodules(const char *oldpath, const char 
*newpath)
struct strbuf entry = STRBUF_INIT;
const struct submodule *submodule;
 
-   if (!file_exists(".gitmodules")) /* Do nothing without .gitmodules */
+   if (!file_exists(GITMODULES_FILE)) /* Do nothing without .gitmodules */
return -1;
 
if (gitmodules_is_unmerged)
@@ -77,7 +77,7 @@ int update_path_in_gitmodules(const char *oldpath, const char 
*newpath)
strbuf_addstr(, "submodule.");
strbuf_addstr(, submodule->name);
strbuf_addstr(, ".path");
-   if (git_config_set_in_file_gently(".gitmodules", entry.buf, newpath) < 
0) {
+   if (git_config_set_in_file_gently(GITMODULES_FILE, entry.buf, newpath) 
< 0) {
/* Maybe the user already did that, don't error out here */
warning(_("Could not update .gitmodules entry %s"), entry.buf);
strbuf_release();
@@ -97,7 +97,7 @@ int remove_path_from_gitmodules(const char *path)
struct strbuf sect = STRBUF_INIT;
const struct submodule *submodule;
 
-   if (!file_exists(".gitmodules")) /* Do nothing without .gitmodules */
+   if (!file_exists(GITMODULES_FILE)) /* Do nothing without .gitmodules */
return -1;
 
if (gitmodules_is_unmerged)
@@ -110,7 +110,7 @@ int remove_path_from_gitmodules(const char *path)
}
strbuf_addstr(, "submodule.");
strbuf_addstr(, submodule->name);
-   if (git_config_rename_section_in_file(".gitmodules", sect.buf, NULL) < 
0) {
+   if (git_config_rename_section_in_file(GITMODULES_FILE, sect.buf, NULL) 
< 0) {
/* Maybe the user already did that, don't error out here */
warning(_("Could not remove .gitmodules entry for %s"), path);
strbuf_release();
@@ -122,7 +122,7 @@ int remove_path_from_gitmodules(const char *path)
 
 void stage_updated_gitmodules(void)
 {
-   if (add_file_to_cache(".gitmodules", 0))
+   if (add_file_to_cache(GITMODULES_FILE, 0))
die(_("staging updated .gitmodules failed"));
 }
 
@@ -233,18 +233,18 @@ void gitmodules_config(void)
strbuf_addstr(_path, "/.gitmodules");
if (read_cache() < 0)
die("index file corrupt");
-   pos = cache_name_pos(".gitmodules", 11);
+   pos = cache_name_pos(GITMODULES_FILE, 11);
if (pos < 0) { /* .gitmodules not found or isn't merged */
pos = -1 - pos;
if (active_nr > pos) {  /* there is a .gitmodules */
const struct cache_entry *ce = 
active_cache[pos];
if (ce_namelen(ce) == 11 &&
-   !memcmp(ce->name, ".gitmodules", 11))
+   !memcmp(ce->name, GITMODULES_FILE, 11))
gitmodules_is_unmerged = 1;
}
} else if (pos < active_nr) {
struct stat st;
-   if (lstat(".gitmodules", ) == 0 &&
+   if (lstat(GITMODULES_FILE, ) == 0 &&
ce_match_stat(active_cache[pos], , 0) & 
DATA_CHANGED)
gitmodules_is_modified = 1;
}
@@ -264,7 +264,7 @@ static int gitmodules_cb(const char *var, const char 
*value, void *data)
 
 void repo_read_gitmodules(struct repository *repo)
 {
-   char *gitmodules_path = repo_worktree_path(repo, ".gitmodules");
+   char *gitmodules_path = repo_worktree_path(repo, GITMODULES_FILE);
 
git_config_from_file(gitmodules_cb, gitmodules_path, repo);
free(gitmodules_path);
diff --git a/unpack-trees.c b/unpack-trees.c
index dd535bc849..05335fe5bf 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -286,7 +286,7 @@ static void reload_gitmodules_file(struct index_state 
*index,
for (i = 0; i < index->cache_nr; i++) {
struct cache_entry *ce = index->cache[i];
if (ce->ce_flags & CE_UPDATE) {
-   int r = strcmp(ce->name, ".gitmodules");
+   int r = strcmp(ce->name, GITMODULES_FILE);
if (r < 0)
continue;
else if (r == 0) {
-- 
2.14.0.rc0.3.g6c2e499285

Re: reftable [v4]: new ref storage format

2017-07-31 Thread Shawn Pearce

On Mon, Jul 31, 2017 at 12:01 PM, Stefan Beller  wrote:
> On Sun, Jul 30, 2017 at 8:51 PM, Shawn Pearce  wrote:
>> 4th iteration of the reftable storage format.
>>
>> You can read a rendered version of this here:
>> https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md
>>
>
>> uint16( restart_count )
>
> When looking for 16/32/64 bit hard coded ints I came across
> this one once again. How much more expensive is reading
> a varint? As the block_len points at the restart_count, we cannot
> replace it with a varint easily, but we could use a byte-reversed
> varint instead. If we do this step, all restart offsets could also be
> (reverse) varints?

Its not the expense of decoding a varint, its making the file
structure predictable so its simple to binary search within restart
points. If these were varints, it becomes much more difficult to
divide the remaining range in half and update the boundary conditions
to locate the next mid-point.

Re: [PATCH v2 0/5] Fsck for lazy objects, and (now) actual invocation of loader

2017-07-31 Thread Jonathan Tan

On Mon, 31 Jul 2017 14:21:56 -0700
Junio C Hamano  wrote:

> Jonathan Tan  writes:
> 
> > Besides review changes, this patch set now includes my rewritten
> > lazy-loading sha1_file patch, so you can now do this (excerpted from one
> > of the tests):
> >
> > test_create_repo server
> > test_commit -C server 1 1.t abcdefgh
> > HASH=$(git hash-object server/1.t)
> > 
> > test_create_repo client
> > test_must_fail git -C client cat-file -p "$HASH"
> > git -C client config core.repositoryformatversion 1
> > git -C client config extensions.lazyobject \
> > "\"$TEST_DIRECTORY/t0410/lazy-object\" \"$(pwd)/server/.git\""
> > git -C client cat-file -p "$HASH"
> >
> > with fsck still working. Also, there is no need for a list of promised
> > blobs, and the long-running process protocol is being used.
> 
> I do not think I read your response to my last comment on this
> series, so I could be missing something large, but I am afraid that
> the resulting fsck is only half as useful as the normal fsck.  I do
> not see it any better than a hypothetical castrated version of fsck
> that only checks the integrity of objects that appear in the local
> object store, without doing any connectivity checks.

Sorry, I haven't replied to your last response [1]. That does sound like
a good idea, though, and probably can be extended to trees and blobs in
that we need to make sure that any object referenced from local-only
commits (calculated as you describe in [1]) can be obtained through an
object walk from a remote-tracking branch.

I haven't fully thought of the implications of things like building
commits on top of an arbitrary upstream commit (so since our upstream
commit is not a tip, the object walk from all remote-tracking branches
might not reach our upstream commit).

To try to solve that, we could use an alternate object store to store
remote objects in order to be able to find remote objects quickly
without doing a traversal, but that does not fully solve the problem,
because some information about remote object possession lies only in
their parents (for example, if we don't have a remote blob, sometimes
the only way to know that the remote has it is by having a tree
containing that blob).

In addition, this also couples the lazy object loading with either a
remote ref (or all remote refs, if we decide to consider objects from
all remote refs as potentially loadable).

I'll think about this further.

[1] https://public-inbox.org/git/xmqq379fkz4x@gitster.mtv.corp.google.com/

> Don't get me wrong.  The integrity check on local objects you still
> do is important---that is what allows us to make sure that the local
> "cache" does not prevent us from going to the real source of the
> remote object store by having a corrupt copy.  
> 
> But not being able to tell if a missing object is OK to be missing
> (because we can get them if/as needed from elsewhere) or we lost the
> sole copy of an object that we created and have not pushed out
> (hence we are in deep yogurt) makes it pretty much pointless to run
> "fsck", doesn't it?  It does not give us any guarantee that our
> repository plus perfect network connectivity gives us an environment
> to build further work on.
> 
> Or am I missing something fundamental?

Well, the fsck can still detect issues like corrupt objects (as you
mention above) and dangling heads, which might be real issues. But it is
true that it does not give you the guarantee you describe.

From a user standpoint, this might be able to be worked around by
providing a network-requiring object connectivity checking tool or by
just having the user running a build to ensure that all necessary files
are present.

Having said that, this feature will be very nice to have.

What's cooking in git.git (Jul 2017, #09; Mon, 31)

2017-07-31 Thread Junio C Hamano

Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

Even though -rc2 was scheduled for today, I'll wait for a few days
so that we can tag with updated l10n; there isn't much change since
the first release candidate.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* js/blame-lib (2017-07-24) 1 commit
  (merged to 'next' on 2017-07-26 at db0d081012)
 + blame: fix memory corruption scrambling revision name in error message

 A hotfix to a topic already in 'master'.

--
[New Topics]

* as/diff-options-grammofix (2017-07-31) 1 commit
 - diff-options doc: grammar fix

 A grammofix.

--
[Stalled]

* mg/status-in-progress-info (2017-05-10) 2 commits
 - status --short --inprogress: spell it as --in-progress
 - status: show in-progress info for short status

 "git status" learns an option to report various operations
 (e.g. "merging") that the user is in the middle of.

 cf. 


* nd/worktree-move (2017-04-20) 6 commits
 - worktree remove: new command
 - worktree move: refuse to move worktrees with submodules
 - worktree move: accept destination as directory
 - worktree move: new command
 - worktree.c: add update_worktree_location()
 - worktree.c: add validate_worktree()

 "git worktree" learned move and remove subcommands.

 Expecting a reroll.
 cf. <20170420101024.7593-1-pclo...@gmail.com>
 cf. <20170421145916.mknekgqzhxffu...@sigill.intra.peff.net>
 cf. 


* sg/clone-refspec-from-command-line-config (2017-06-16) 2 commits
 - Documentation/clone: document ignored configuration variables
 - clone: respect additional configured fetch refspecs during initial fetch
 (this branch is used by sg/remote-no-string-refspecs.)

 "git clone -c var=val" is a way to set configuration variables in
 the resulting repository, but it is more useful to also make these
 variables take effect while the initial clone is happening,
 e.g. these configuration variables could be fetch refspecs.

 Waiting for a response.
 cf. <20170617112228.vugswym4o4owf...@sigill.intra.peff.net>
 cf. 


* js/rebase-i-final (2017-07-27) 10 commits
 - rebase -i: rearrange fixup/squash lines using the rebase--helper
 - t3415: test fixup with wrapped oneline
 - rebase -i: skip unnecessary picks using the rebase--helper
 - rebase -i: check for missing commits in the rebase--helper
 - t3404: relax rebase.missingCommitsCheck tests
 - rebase -i: also expand/collapse the SHA-1s via the rebase--helper
 - rebase -i: do not invent onelines when expanding/collapsing SHA-1s
 - rebase -i: remove useless indentation
 - rebase -i: generate the script via rebase--helper
 - t3415: verify that an empty instructionFormat is handled as before

 The final batch to "git rebase -i" updates to move more code from
 the shell script to C.

 Expecting a reroll.


* bp/fsmonitor (2017-06-12) 6 commits
 - fsmonitor: add a sample query-fsmonitor hook script for Watchman
 - fsmonitor: add documentation for the fsmonitor extension.
 - fsmonitor: add test cases for fsmonitor extension
 - fsmonitor: teach git to optionally utilize a file system monitor to speed up 
detecting new or changed files.
 - dir: make lookup_untracked() available outside of dir.c
 - bswap: add 64 bit endianness helper get_be64

 We learned to talk to watchman to speed up "git status".

 Expecting a reroll.
 cf. 

--
[Cooking]

* ah/doc-wserrorhighlight (2017-07-25) 1 commit
  (merged to 'next' on 2017-07-27 at cd1bb28d95)
 + doc: add missing values "none" and "default" for diff.wsErrorHighlight

 Doc update.

 Will cook in 'next'.


* dc/fmt-merge-msg-microcleanup (2017-07-25) 1 commit
  (merged to 'next' on 2017-07-27 at 6df06eb788)
 + fmt-merge-msg: fix coding style

 Code cleanup.

 Will cook in 'next'.


* js/git-gui-msgfmt-on-windows (2017-07-25) 7 commits
 - Merge branch 'js/msgfmt-on-windows' of ../git-gui into 
js/git-gui-msgfmt-on-windows
 - git-gui (MinGW): make use of MSys2's msgfmt
 - Merge remote-tracking branch 'philoakley/dup-gui'
 - git gui: allow for a long recentrepo list
 - git gui: de-dup selected repo from recentrepo history
 - git gui: cope with duplicates in _get_recentrepo
 - git-gui: remove duplicate entries from .gitconfig's gui.recentrepo

 Because recent Git for Windows do come with a real msgfmt, the
 build procedure for git-gui has been updated to use

Re: [PATCH] hash: Allow building with the external sha1dc library

2017-07-31 Thread Junio C Hamano

Ævar Arnfjörð Bjarmason  writes:

> So the upstream library expects you (and it's documented in their README) to 
> do:
>
> #include 
>
> But your patch is just doing:
>
> #include 
>
> At best this seems like a trivial bug and at worst us encoding some
> Suse-specific packaging convention in git, since other distros would
> presumably want to package this in /usr/include/sha1dc/sha1.h as
> upstream suggests. I.e. using the ambiguous sha1.h name is not
> something upstream's doing by default, it's something you're doing in
> your package.

It seems there still needs a bit more work on this patch.  Thanks
for reviewing and pointing out what needs to be addressed.

Re: [GSoC][PATCH 13/13] submodule: port submodule subcommand 'foreach' from shell to C

2017-07-31 Thread Stefan Beller

On Mon, Jul 31, 2017 at 1:56 PM, Prathamesh Chavan  wrote:
> This aims to make git-submodule foreach a builtin. This is the very
> first step taken in this direction. Hence, 'foreach' is ported to
> submodule--helper, and submodule--helper is called from git-submodule.sh.
> The code is split up to have one function to obtain all the list of
> submodules. This function acts as the front-end of git-submodule foreach
> subcommand. It calls the function for_each_submodule_list, which basically
> loops through the list and calls function fn, which in this case is
> runcommand_in_submodule. This third function is a calling function that
> takes care of running the command in that submodule, and recursively
> perform the same when --recursive is flagged.
>
> The first function module_foreach first parses the options present in
> argv, and then with the help of module_list_compute, generates the list of
> submodules present in the current working tree.
>
> The second function for_each_submodule_list traverses through the
> list, and calls function fn (which in case of submodule subcommand
> foreach is runcommand_in_submodule) is called for each entry.
>
> The third function runcommand_in_submodule, generates a submodule struct sub
> for $name, value and then later prepends name=sub->name; and other
> value assignment to the env argv_array structure of a child_process.
> Also the  of submodule-foreach is push to args argv_array
> structure and finally, using run_command the commands are executed
> using a shell.
>
> The third function also takes care of the recursive flag, by creating
> a separate child_process structure and prepending "--super-prefix 
> displaypath",
> to the args argv_array structure. Other required arguments and the
> input  of submodule-foreach is also appended to this argv_array.
>
> Helped-by: Brandon Williams 
> Mentored-by: Christian Couder 
> Mentored-by: Stefan Beller 
> Signed-off-by: Prathamesh Chavan 
> ---
> In this new version, the following changes have been made:
> * Comment style is improved in the function runcommand_in_submodule()
>
> * Comment in added about why the variable "path" was exposed via args
>   argv_array instead of exposing it via the env_array.
>
> * This patch exposes the various variables when argc == 1 only, just
>   for maintaining a faithful porting. You can also find discussion about
>   the same at [1].
>
> [1]: 
> https://public-inbox.org/git/came+mvusgafbn5j-_hv7qpas57hq4wgh+yz7xjmpuyqn1ga...@mail.gmail.com/

Ah right, maybe add a NEEDSWORK or have the commit message explain
this behavior.

Re: [PATCH 0/1] add git-splice subcommand for non-interactive branch splicing

2017-07-31 Thread Junio C Hamano

Adam Spiers  writes:

> Therefore there is a risk that each new UI for higher-level workflows
> will end up re-implementing these mid-level operations.  This
> undesirable situation could be avoided if git itself provided those
> mid-level operations.

Let me make sure if I get your general idea right, first.

Is your aim is to give a single unified mid-layer that these other
tools can build on instead of rolling their own "cherry-pick these
ranges, then squash that in, and then merge the other one in, ..."
sequencing machinery?

If so, I think that is a very good goal.

> # Remove commits A..B (i.e. excluding A) from the current branch.
> git splice A..B
> # Remove commit A from the current branch.
> git splice A^!
> # Remove commits A..B from the current branch, and cherry-pick
> # commits C..D at the same point.
> git splice A..B C..D

We need to make sure that the mid-layer tool offers a good set of
primitive operations that serve all of these other tools' needs.  I
do not know offhand if what you implemented that are illustrated by
these examples is or isn't that "good set".

Assuming that there is such a "good set of primitives" surfaced at
the UI level so that these other tools can express what they want to
perform with, I'd personally prefer to see a solution that extends
and uses the common "sequencer" machinery we have been using to
drive cherry-picks, reverts and interactive rebases that work on
multiple commits.  IOW, it would be nice to see that the only thing
"git splice A..B" does is to prepare a series of instructions in a
file, e.g. .git/sequencer/todo, just like "git cherry-pick A..B"
would, and let the sequencer machinery to handle the sequencing.

E.g. In a history like

---o---A---o---B---X---Y---Z   HEAD

"git splice A..B" command would write something like this:

reset to A
pick X
pick Y
pick Z

to the todo file and drive the sequencer.  As you notice, you would
need to extend the vocabulary of the sequencer a bit to allow
various things that the current users of the sequencer machinery do
not need, like resetting the HEAD to a specific commit, merging a
side branch, remembering the result of an operation, and referring
to such a commit in later operation.  For example, if you tell "git
splice" to expunge A from this sample history (I am not sure how you
express that operation in your UI):

 B---C---D
/ \
---o---A---E---F---G   HEAD

it might create a "todo" list like this to rebuild the history:

reset to A^
pick B
pick C
pick D
mark :1
reset to A^
pick E
merge :1 using F's log message and conflict resolution as reference
pick G

to result in:

 B---C---D
/ \
---o---E---F---G   HEAD

Do not pay too much attention to how the hypothetical "extended todo
instruction set" is spelled in the above illustration (e.g. I am not
advocating for multi-word command like "reset to"); these are only
to illustrate what kind of features would be needed for the job.  In
the final shape of the system, "merge" in the illustration above may
be a more succinct "merge F :1", for example (i.e. the first
parameter would name an existing merge to use as reference, the
remainder is a list of commits to be merged to the current HEAD),
just like "pick X" is a succinct way to say "cherry-pick the change
introduced by existing commit X to HEAD, reusing X's log message
and author information".

Something like that may have a place in the git-core, I would think. 

I am not sure if a bash script that calls rebase/cherry-pick/commit
manually can serve as a good "universal mid-layer" or just adding
another random command to the set of existing third-party commands
for "higher-level workflows".

Re: [GSoC][PATCH 08/13] submodule: port submodule subcommand 'summary' from shell to C

2017-07-31 Thread Stefan Beller

On Mon, Jul 31, 2017 at 1:56 PM, Prathamesh Chavan  wrote:
> The submodule subcommand 'summary' is ported in the process of
> making git-submodule a builtin. The function cmd_summary() from
> git-submodule.sh is ported to functions module_summary(),
> compute_summary_module_list(), prepare_submodule_summary() and
> print_submodule_summary().
>
> The first function module_summary() parses the options of submodule
> subcommand and also acts as the front-end of this subcommand.
> After parsing them, it calls the compute_summary_module_list()
>
> The functions compute_summary_module_list() runs the diff_cmd,
> and generates the modules list, as required by the subcommand.
> The generation of this module list is done by the using the
> callback function submodule_summary_callback(), and stored in the
> structure module_cb.
>
> Once the module list is generated, prepare_submodule_summary()
> further goes through the list and filters the list, for
> eventually calling the print_submodule_summary() function.
>
> Finally, the print_submodule_summary() takes care of generating
> and printing the summary for each submodule.
>
> Mentored-by: Christian Couder 
> Mentored-by: Stefan Beller 
> Signed-off-by: Prathamesh Chavan 
> ---
> In this new version, the following changes have been made:
>
> * Firstly, about the function compute_summary_module_list().
>   This function is created to generate the list of modules, for which
>   we will generate the summary further. Since the list is actually
>   generated using the git-diff-files or git-diff-index command, but for
>   porting this, we required to create a function similar to the builtin
>   functions of the above commands. But we can't directly call cmd_diff_files()
>   and cmd_diff_index() since we don't have to display the output and instead
>   need to store it. Hence, this function is introduced.
>
> * Also, the module_cb_list *list is not freed since it is a non-heap object.
>   Hence, free() can't be using on the non-heap objects.
>
> * In the function prepare_submodule_summary(), as suggested
>   'git_config_get_string_const' was used instead of instead of '_value'
>
> * Some variables which weren't modified throughout the function-call were
>   passed as const.
>
> * The '!!' trick, which wasn't used in the last patch, is now used in this
>   new version .
>
> * the variables sha1_dst and sha1_src are removed from the function
>   print_submodule_summary(), and instead the p->oid_src and p->oid_dst are
>   used.
>
> * The variable sm_git_dir is freed at the end of the function.
>
> * variable head was no longer used in module_summary() and instead the strbuf
>   was utilized.
>
>  builtin/submodule--helper.c | 425 
> 
>  git-submodule.sh| 182 +--
>  2 files changed, 426 insertions(+), 181 deletions(-)
>
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index f642f9889..94438d6ce 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -13,6 +13,9 @@
>  #include "remote.h"
>  #include "refs.h"
>  #include "connect.h"
> +#include "revision.h"
> +#include "diffcore.h"
> +#include "diff.h"
>
>  typedef void (*submodule_list_func_t)(const struct cache_entry *list_item,
>   void *cb_data);
> @@ -766,6 +769,427 @@ static int module_name(int argc, const char **argv, 
> const char *prefix)
> return 0;
>  }
>
> +struct module_cb {
> +   unsigned int mod_src;
> +   unsigned int mod_dst;
> +   struct object_id oid_src;
> +   struct object_id oid_dst;
> +   char status;
> +   const char *sm_path;
> +};
> +#define MODULE_CB_INIT { 0, 0, NULL, NULL, '\0', NULL }
> +
> +struct module_cb_list {
> +   struct module_cb **entries;
> +   int alloc, nr;
> +};
> +#define MODULE_CB_LIST_INIT { NULL, 0, 0 }
> +
> +struct summary_cb {
> +   int argc;
> +   const char **argv;
> +   const char *prefix;
> +   char *diff_cmd;
> +   unsigned int cached: 1;
> +   unsigned int for_status: 1;
> +   unsigned int quiet: 1;
> +   unsigned int files: 1;
> +   int summary_limits;
> +};
> +#define SUMMARY_CB_INIT { 0, NULL, NULL, NULL, 0, 0, 0, 0, 0 }
> +
> +static int verify_submodule_object_name(const char *sm_path, const char 
> *sha1)
> +{
> +   struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
> +
> +   cp_rev_parse.git_cmd = 1;
> +   cp_rev_parse.no_stdout = 1;
> +   cp_rev_parse.dir = sm_path;
> +   prepare_submodule_repo_env(_rev_parse.env_array);
> +
> +   argv_array_pushl(_rev_parse.args, "rev-parse", "-q",
> +"--verify", NULL);
> +   argv_array_pushf(_rev_parse.args, "%s^0", sha1);
> +
> +   if (run_command(_rev_parse))
> +   return 1;
> +
> +   return 0;
> +}
> +
> +static void

Re: [GSoC][PATCH 06/13] submodule: port submodule subcommand 'deinit' from shell to C

2017-07-31 Thread Stefan Beller

On Mon, Jul 31, 2017 at 1:56 PM, Prathamesh Chavan  wrote:
> The same mechanism is used even for porting this submodule
> subcommand, as used in the ported subcommands till now.
> The function cmd_deinit in split up after porting into three
> functions: module_deinit(), for_each_submodule_list() and
> deinit_submodule().
>
> Mentored-by: Christian Couder 
> Mentored-by: Stefan Beller 
> Signed-off-by: Prathamesh Chavan 
> ---
> In this new version, the following changes have been made:
> * In the function deinit_submodule, since the test is_git_directory()
>   adds an additional condition, instead is_directory() is used to check
>   if "sm_path/.git" is a directory.

Thanks for writing these patches.
I wonder if (some of) these notes are best put into the code
as a comment such as

/* NEEDSWORK: convert to is_submodule_active */

such that people reading this code later realize that checking
for a directory may not be the "correct" thing, but a thing which
was easy to express using shell.

> +struct deinit_cb {
> +   const char *prefix;
> +   unsigned int quiet: 1;
> +   unsigned int force: 1;
> +   unsigned int all: 1;

The value 'all' seems to be unused, i.e. we assign it but never read it?

> +};
> +#define DEINIT_CB_INIT { NULL, 0, 0, 0 }
> +
> +static void deinit_submodule(const struct cache_entry *list_item,
> +void *cb_data)
> +{
> +   struct deinit_cb *info = cb_data;
> +   const struct submodule *sub;
> +   char *displaypath = NULL;
> +   struct child_process cp_config = CHILD_PROCESS_INIT;
> +   struct strbuf sb_config = STRBUF_INIT;
> +   char *sub_git_dir = xstrfmt("%s/.git", list_item->name);
> +   mode_t mode = 0777;
> +
> +   sub = submodule_from_path(null_sha1, list_item->name);
> +
> +   if (!sub || !sub->name)
> +   goto cleanup;
> +
> +   displaypath = get_submodule_displaypath(list_item->name, 
> info->prefix);
> +
> +   /* remove the submodule work tree (unless the user already did it) */
> +   if (is_directory(list_item->name)) {
> +   struct stat st;
> +   /* protect submodules containing a .git directory */

Here may a good place to put:
  /* NEEDSWORK: automatically call absorbgitdirs before warning/die. */
(It was not in the shell version, so feel free to ignore)

> +   if (!info->force) {
> +   struct child_process cp_rm = CHILD_PROCESS_INIT;
> +   cp_rm.git_cmd = 1;
> +   argv_array_pushl(_rm.args, "rm", "-qn",
> +list_item->name, NULL);

A bug that exists in the shell version as well as here:
What if the submodule has the name '--cached', which happens
to be a valid argument for git-rm?

The call to git-rm would die claiming that the  is missing,
as the file name was miss-interpreted as another flag.

To solve this problem we would insert a '--' after the options,
before the file name to state that the last argument is a .

Not sure if we want to fix the bug while we're here or if we rather
want to add

/* NEEDSWORK: add '--' to confirm  argument */

> +   if (mkdir(list_item->name, mode))
> +   die(_("could not create empty submodule directory %s"),
> + displaypath);

In the shell version we used the shell mkdir, which on failure
would print the error to stderr on its own. The added "|| say"
is just to clarify the bigger picture.

mkdir in C doesn't print the actual error (e.g.
no diskspace, permissions) so we have to do it.
Use 'die_errno' instead as then the reason will be given as well.

> +   cp_config.git_cmd = 1;
> +   argv_array_pushl(_config.args, "config", "--get-regexp", NULL);
> +   argv_array_pushf(_config.args, "submodule.%s\\.", sub->name);

It would be nice if the commit message could give hints as why a literal
translation was chosen here and not e.g. one of the builtin config
calls, as that would omit spawning a process.

Re: [PATCH v2 5/5] sha1_file: support loading lazy objects

2017-07-31 Thread Junio C Hamano

Jonathan Tan  writes:

> Teach sha1_file to invoke the command configured in
> extensions.lazyObject whenever an object is requested and unavailable.
>
> The usage of the hook can be suppressed through a flag when invoking
> has_object_file_with_flags() and other similar functions.
>
> This is meant as a temporary measure to ensure that all Git commands
> work in such a situation. Future patches will update some commands to
> either tolerate missing objects (without invoking the command) or be
> more efficient in invoking this command.
>
> In order to determine the code changes in sha1_file.c necessary, I
> investigated the following:
>  (1) functions in sha1_file that take in a hash, without the user
>  regarding how the object is stored (loose or packed)
>  (2) functions in sha1_file that operate on packed objects (because I
>  need to check callers that know about the loose/packed distinction
>  and operate on both differently, and ensure that they can handle
>  the concept of objects that are neither loose nor packed)
>
> (1) is handled by the modification to sha1_object_info_extended().
>
> For (2), I looked at for_each_packed_object and at the packed-related
> functions that take in a hash. For for_each_packed_object, the callers
> either already work or are fixed in this patch:
>  - reachable - only to find recent objects
>  - builtin/fsck - already knows about missing objects
>  - builtin/cat-file - warning message added in this commit
>
> Callers of the other functions do not need to be changed:
>  - parse_pack_index
>- http - indirectly from http_get_info_packs
>  - find_pack_entry_one
>- this searches a single pack that is provided as an argument; the
>  caller already knows (through other means) that the sought object
>  is in a specific pack
>  - find_sha1_pack
>- fast-import - appears to be an optimization to not store a
>  file if it is already in a pack
>- http-walker - to search through a struct alt_base
>- http-push - to search through remote packs
>  - has_sha1_pack
>- builtin/fsck - already knows about promised objects
>- builtin/count-objects - informational purposes only (check if loose
>  object is also packed)
>- builtin/prune-packed - check if object to be pruned is packed (if
>  not, don't prune it)
>- revision - used to exclude packed objects if requested by user
>- diff - just for optimization
>
> An alternative design that I considered but rejected:
>
>  - Adding a hook whenever a packed object is requested, not on any
>object.  That is, whenever we attempt to search the packfiles for an
>object, if it is missing (from the packfiles and from the loose
>object storage), to invoke the hook (which must then store it as a
>packfile), open the packfile the hook generated, and report that the
>object is found in that new packfile. This reduces the amount of
>analysis needed (in that we only need to look at how packed objects
>are handled), but requires that the hook generate packfiles (or for
>sha1_file to pack whatever loose objects are generated), creating one
>packfile for each missing object and potentially very many packfiles
>that must be linearly searched. This may be tolerable now for repos
>that only have a few missing objects (for example, repos that only
>want to exclude large blobs), and might be tolerable in the future if
>we have batching support for the most commonly used commands, but is
>not tolerable now for repos that exclude a large amount of objects.
>
> Helped-by: Ben Peart 
> Signed-off-by: Jonathan Tan 
> ---

Even though I said a hugely negative thing about the "missing
objects are always OK" butchering of fsck, I do like what this patch
does.  The interface is reasonably well isolated, and moving of the
long-running-process documentation to a standalone file is very
sensible.

[PATCH 0/1] add git-splice subcommand for non-interactive branch splicing

2017-07-31 Thread Adam Spiers

This patch adds a new subcommand called git-splice, which facilitates
higher-level workflow operations in the area of branch management, for
example moving commits from one branch into another, or decomposing
large branches into smaller, independent branches, or vice-versa.

Motivation
--

By now git is very mature, and excels at low-level plumbing and
porcelain operations.  However, developer workflows have become
increasingly sophisticated, partially as a result of CI / review
systems supporting development and testing of large numbers of
concurrent patches to the same repository.  These systems can track
dependencies / conflicts between those patches (gerrit) or group
series of related commmits into a single topic (GitLab merge requests;
GitHub pull requests).

This trend towards higher-level workflows has also driven innovations
in branch management tools (e.g. topgit, gitflow, gitwork,
git-series).  I expect this trend to continue, and git UIs to evolve
which make re-organising commits between branches almost as easy as
moving files around a filesystem with a file manager tool.

Of course, any git UI which aids the user in manipulating branches can
already automate the tasks using existing porcelain and plumbing.  But
this will typically require mid-level operations, such as:

  1) removing commits from a branch
  2) porting (copying) commits from one branch into another
  3) moving commits from one branch into another
  4) predicting when any of the above will cause conflicts
  5) decomposing large branches into smaller, independent branches
 in order to reduce conflicts

Performing these operations using existing porcelain is cumbersome,
generally involving manual (i.e. interactive) use of git rebase -i
etc.  However, automated higher-level workflows will need these
operations to be as *non*-interactive as possible.

Therefore there is a risk that each new UI for higher-level workflows
will end up re-implementing these mid-level operations.  This
undesirable situation could be avoided if git itself provided those
mid-level operations.

This is where git-splice comes in.  It handles operations 1) and 2),
and lays the foundation for git-transplant, another git subcommand I
have written which implements 3).  (See below for more info on this,
and on other tools related to 4) and 5)).

Description
---

git-splice(1) non-interactively splices the current branch by removing
a range of commits from within it and/or cherry-picking a range of
commits into it.

It's essentially a convenience wrapper around cherry-pick and
interactive rebase, but the workflow state is persisted to disk, and
thereby supports standard --abort and --continue semantics just like
git's other extended workflow commands.  It also handles more complex
cases, as briefly demonstrated by the examples below.

(See git-splice.txt in the patch for full detail.)

Example usage
-

# Remove commits A..B (i.e. excluding A) from the current branch.
git splice A..B

# Remove commit A from the current branch.
git splice A^!

# Remove commits A..B from the current branch, and cherry-pick
# commits C..D at the same point.
git splice A..B C..D

# Cherry-pick commits C..D, splicing them in just after commit A.
git splice A C..D

# Remove all commits since 11am this morning mentioning "foo".
git splice --since=11am --grep="foo" --

# Remove commit A and all its ancestors (including the root commit)
# from the current branch.
git splice --root A

# Abort a splice which failed during cherry-pick or rebase.
git splice --abort

# Resume a splice after manually fixing conflicts caused by
# cherry-pick or rebase.
git splice --continue

Remaining work
--

The code does not yet conform 100% to Documentation/CodingGuidelines.
The only known areas of non-conformance are:

1. It relies on bash arrays, which is a non-POSIX feature.

2. It does not support i18n.

I would be more than happy to fix these if there is a chance of
git-splice being accepted for inclusion within the git distribution.

I appreciate that adding a new subcommand to git automatically brings
concerns about the increase in maintenance burden.  ICBW but I would
expect git-splice to add significantly less burden than the last
subcommand I wrote (check-ignore), because it only relies on very
mainstream porcelain commands and one plumbing command (update-ref).
OTOH, the test suite makes very heavy use of git's test framework, so
separating it out into a separate tree would presumably be
non-trivial.

Previous and recent work


I first announced git-splice just over a year ago:

https://www.spinics.net/lists/git/msg277346.html

https://public-inbox.org/git/20160527140811.GB11256@pacific.linksys.moosehall/

I have just fixed all known remaining issues and further beefed up the
test suite, so I think it's now ready for serious consideration.

[PATCH 1/1] add git-splice command for non-interactive branch splicing

2017-07-31 Thread Adam Spiers

Add a new subcommand git-splice(1) which non-interactively splices the
current branch by removing a range of commits from within it and/or
cherry-picking a range of commits into it.

It's essentially a convenience wrapper around cherry-pick and
interactive rebase, but the workflow state is persisted to disk, and
thereby supports standard --abort and --continue semantics just like
git's other extended workflow commands.  It also handles more complex
cases, as described in the manual page.

Signed-off-by: Adam Spiers 
---
 .gitignore   |   1 +-
 Documentation/git-splice.txt | 125 ++-
 Makefile |   1 +-
 git-splice.sh| 737 -
 t/t7900-splice.sh| 630 +++-
 5 files changed, 1494 insertions(+)
 create mode 100644 Documentation/git-splice.txt
 create mode 100755 git-splice.sh
 create mode 100755 t/t7900-splice.sh

diff --git a/.gitignore b/.gitignore
index 833ef3b..4062009 100644
--- a/.gitignore
+++ b/.gitignore
@@ -150,6 +150,7 @@
 /git-show-branch
 /git-show-index
 /git-show-ref
+/git-splice
 /git-stage
 /git-stash
 /git-status
diff --git a/Documentation/git-splice.txt b/Documentation/git-splice.txt
new file mode 100644
index 000..29f3ac8
--- /dev/null
+++ b/Documentation/git-splice.txt
@@ -0,0 +1,125 @@
+git-splice(1)
+=
+
+NAME
+
+git-splice - Splice commits into/out of current branch
+
+SYNOPSIS
+
+[verse]
+'git splice'  
+'git splice'  \-- 
+'git splice' [-r|--root]  []
+'git splice' [-r|--root]  \-- []
+'git splice' (--abort | --continue | --in-progress)
+
+DESCRIPTION
+---
+Non-interactively splice branch by removing a range of commits from
+within the current branch, and/or cherry-picking a range of commits
+into the current branch.
+
+ specifies the range of commits to remove from the
+current branch, and  specifies the range to insert
+at the point where  previously existed, or just after
+.
+
+ is a commit-ish in the standard format accepted
+by linkgit:git-rev-parse[1].
+
+ and  are single shell words
+specifying commit ranges in the standard format accepted by
+linkgit:git-rev-list[1], e.g.
+
+A..B
+A...B
+A^!   (just commit A)
+
+It is possible to pass multi-word specifications for both the removal
+and insertion ranges, in which case they are passed to
+linkgit:git-rev-list[1] to calculate the commits to remove or
+cherry-pick.  For this you need to terminate  with
+`--` to indicate that the multi-word form of parameters is being used.
+
+When the `--root` option is present, a removal range can be specified
+as a commit-ish in the standard format accepted by
+linkgit:git-rev-parse[1], in which case the commit-ish is treated as a
+range.  This makes it possible to remove or replace root
+(i.e. parentless) commits.
+
+Currently git-splice assumes that all commits being operated on have a
+single parent; removal and insertion of merge commits is not supported.
+
+N.B. Obviously this command rewrites history!  As with
+linkgit:git-rebase[1], you should be aware of all the implications of
+history rewriting before using it.  (And actually this command is just
+a glorified wrapper around linkgit:git-cherry-pick[1] and
+linkgit:git-rebase[1] in interactive mode.)
+
+OPTIONS
+---
+
+-r::
+--root::
+   Treat (each) removal range argument as a commit-ish, and
+   remove all its ancestors.
+
+--abort::
+   Abort an in-progress splice.
+
+--continue::
+   Resume an in-progress splice.
+
+--in-progress::
+   Exit 0 if and only if a splice is in progress.
+
+EXAMPLES
+
+
+`git splice A..B`::
+
+   Remove commits A..B (i.e. excluding A) from the current branch.
+
+`git splice A^!`::
+
+   Remove commit A from the current branch.
+
+`git splice --root A`::
+
+   Remove commit A and all its ancestors (including the root commit)
+   from the current branch.
+
+`git splice A..B C..D`::
+
+   Remove commits A..B from the current branch, and cherry-pick
+   commits C..D at the same point.
+
+`git splice A C..D`::
+
+   Cherry-pick commits C..D, splicing them in just after commit A.
+
+`git splice --since=11am --grep="foo" --`::
+
+   Remove all commits since 11am this morning mentioning "foo".
+
+`git splice --abort`::
+
+   Abort a splice which failed during cherry-pick or rebase.
+
+`git splice --continue`::
+
+   Resume a splice after manually fixing conflicts caused by
+   cherry-pick or rebase.
+
+`git splice --in-progress && git splice --abort`::
+
+   Abort if there is a splice in progress.
+
+SEE ALSO
+
+linkgit:git-rebase[1], linkgit:git-cherry-pick[1]
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index 461c845..eeaabc2 100644
--- a/Makefile
+++ b/Makefile
@@ -547,6 +547,7 @@ SCRIPT_SH += git-quiltimport.sh
 SCRIPT_SH += git-rebase.sh
 SCRIPT_SH += git-remote-testgit.sh
 SCRIPT_SH +=

Re: [PATCH v2 0/5] Fsck for lazy objects, and (now) actual invocation of loader

2017-07-31 Thread Junio C Hamano

Jonathan Tan  writes:

> Besides review changes, this patch set now includes my rewritten
> lazy-loading sha1_file patch, so you can now do this (excerpted from one
> of the tests):
>
> test_create_repo server
> test_commit -C server 1 1.t abcdefgh
> HASH=$(git hash-object server/1.t)
> 
> test_create_repo client
> test_must_fail git -C client cat-file -p "$HASH"
> git -C client config core.repositoryformatversion 1
> git -C client config extensions.lazyobject \
> "\"$TEST_DIRECTORY/t0410/lazy-object\" \"$(pwd)/server/.git\""
> git -C client cat-file -p "$HASH"
>
> with fsck still working. Also, there is no need for a list of promised
> blobs, and the long-running process protocol is being used.

I do not think I read your response to my last comment on this
series, so I could be missing something large, but I am afraid that
the resulting fsck is only half as useful as the normal fsck.  I do
not see it any better than a hypothetical castrated version of fsck
that only checks the integrity of objects that appear in the local
object store, without doing any connectivity checks.

Don't get me wrong.  The integrity check on local objects you still
do is important---that is what allows us to make sure that the local
"cache" does not prevent us from going to the real source of the
remote object store by having a corrupt copy.  

But not being able to tell if a missing object is OK to be missing
(because we can get them if/as needed from elsewhere) or we lost the
sole copy of an object that we created and have not pushed out
(hence we are in deep yogurt) makes it pretty much pointless to run
"fsck", doesn't it?  It does not give us any guarantee that our
repository plus perfect network connectivity gives us an environment
to build further work on.

Or am I missing something fundamental?

Re: [GSoC][PATCH 05/13] submodule: port submodule subcommand 'sync' from shell to C

2017-07-31 Thread Stefan Beller

On Mon, Jul 31, 2017 at 1:56 PM, Prathamesh Chavan  wrote:
> In this new version, the following changes have been made:
> * There was no good reason for using puts in the function 
> print_default_remote()
>   Hence, in this patch, we instead use printf to do the same, as it is what
>   is generally used throughout the codebase.

Hah! I refrained on suggesting the opposite at the last patch.
In older code there are lots of "puts", c.f. "git grep puts", so I assumed
it to be a micro optimization, hence I would have suggested it.
Either way is fine with me.

Re: [GSoC][PATCH 04/13] submodule: port submodule subcommand 'status' from shell to C

2017-07-31 Thread Stefan Beller

On Mon, Jul 31, 2017 at 1:56 PM, Prathamesh Chavan  wrote:
> This aims to make git-submodule 'status' a built-in. Hence, the function
> cmd_status() is ported from shell to C. This is done by introducing
> three functions: module_status(), submodule_status() and print_status().
>
> The function module_status() acts as the front-end of the subcommand.
> It parses subcommand's options and then calls the function
> module_list_compute() for computing the list of submodules. Then
> this functions calls for_each_submodule_list() looping through the
> list obtained.
>
> Then for_each_submodule_list() calls submodule_status() for each of the
> submodule in its list. The function submodule_status() is responsible
> for generating the status each submodule it is called for, and
> then calls print_status().
>
> Finally, the function print_status() handles the printing of submodule's
> status.
>
> Mentored-by: Christian Couder 
> Mentored-by: Stefan Beller 
> Signed-off-by: Prathamesh Chavan 
> ---
> In this new version, the following changes have been made:
> * parameters passed to the function print_status() have been changed.
>   Instead of passing char *sub_sha1, instead the object_id is being passed.
>
> * Also, since the passed parameter displaypath's value isn't changed
>   by the function, it is passed to the funcition as const char *displaypath
>   instead of char *displaypath.
>
> * the output type of the function handle_submodule_head_ref() is changed
>   from strbuf to object_id, as we will use the object_id instead of the
>   hex of sha1 being stored in a struct strbuf.
>
> * diff_files_args is cleared after using it by passing it as args in the
>   function cmd_diff_files.
>
> * In the function status_submodule(), for checking if a submodule has merge
>   conflicts, the patch currently checks if the value of any of the ce_flags
>   is non-zero. Currently, I think the we aren't interested in a partiular 
> flag,
>   but I'm not sure on this.
>
> * Debugging leftovers and suprious new-lines are removed.
>
> * The confusion with displaypath being passed as te super-prefix in many
>   of the ported subcommands may be a result of the fact that the
>   function generating the displaypath: get_submodule_displaypath()
>   uses the super-prefix as simply a path concatenated with the current
>   submodule name to denote our current location.
>   The function get_super_prefix() is declared in cache.h and defined in
>   environment.c, but is majorly used in the builtin/submodule--helper.c
>   and also in unpack-trees.c
>   Also, for generating any submodule's displaypath, it would be important to
>   have ".." passed to the submodule, and currently it is possible only via the
>   super-prefix.
>   This is also other instaces where the super-prefix contained ".." as well.
>   One of such instance is Test 4 from t7406-submodule-update.sh
>   Hence, maybe documenting the value of displaypath might a solution
>   for the above problem.
>   I'm just stating my views and would like to recieve your opinion on this
>   matter.

Yes, I agree that the display path is not quite easily understood as it can be
ambiguous.  I am confused by this paragraph:
* does test 4 from 7406 fail here, or was it just the starting point
  of the discussion and it all works fine?

I have reviewed the patches up to (and including this) patch and
so far they look good to me.

Thanks,
Stefan

[PATCH v2 5/5] sha1_file: support loading lazy objects

2017-07-31 Thread Jonathan Tan

Teach sha1_file to invoke the command configured in
extensions.lazyObject whenever an object is requested and unavailable.

The usage of the hook can be suppressed through a flag when invoking
has_object_file_with_flags() and other similar functions.

This is meant as a temporary measure to ensure that all Git commands
work in such a situation. Future patches will update some commands to
either tolerate missing objects (without invoking the command) or be
more efficient in invoking this command.

In order to determine the code changes in sha1_file.c necessary, I
investigated the following:
 (1) functions in sha1_file that take in a hash, without the user
 regarding how the object is stored (loose or packed)
 (2) functions in sha1_file that operate on packed objects (because I
 need to check callers that know about the loose/packed distinction
 and operate on both differently, and ensure that they can handle
 the concept of objects that are neither loose nor packed)

(1) is handled by the modification to sha1_object_info_extended().

For (2), I looked at for_each_packed_object and at the packed-related
functions that take in a hash. For for_each_packed_object, the callers
either already work or are fixed in this patch:
 - reachable - only to find recent objects
 - builtin/fsck - already knows about missing objects
 - builtin/cat-file - warning message added in this commit

Callers of the other functions do not need to be changed:
 - parse_pack_index
   - http - indirectly from http_get_info_packs
 - find_pack_entry_one
   - this searches a single pack that is provided as an argument; the
 caller already knows (through other means) that the sought object
 is in a specific pack
 - find_sha1_pack
   - fast-import - appears to be an optimization to not store a
 file if it is already in a pack
   - http-walker - to search through a struct alt_base
   - http-push - to search through remote packs
 - has_sha1_pack
   - builtin/fsck - already knows about promised objects
   - builtin/count-objects - informational purposes only (check if loose
 object is also packed)
   - builtin/prune-packed - check if object to be pruned is packed (if
 not, don't prune it)
   - revision - used to exclude packed objects if requested by user
   - diff - just for optimization

An alternative design that I considered but rejected:

 - Adding a hook whenever a packed object is requested, not on any
   object.  That is, whenever we attempt to search the packfiles for an
   object, if it is missing (from the packfiles and from the loose
   object storage), to invoke the hook (which must then store it as a
   packfile), open the packfile the hook generated, and report that the
   object is found in that new packfile. This reduces the amount of
   analysis needed (in that we only need to look at how packed objects
   are handled), but requires that the hook generate packfiles (or for
   sha1_file to pack whatever loose objects are generated), creating one
   packfile for each missing object and potentially very many packfiles
   that must be linearly searched. This may be tolerable now for repos
   that only have a few missing objects (for example, repos that only
   want to exclude large blobs), and might be tolerable in the future if
   we have batching support for the most commonly used commands, but is
   not tolerable now for repos that exclude a large amount of objects.

Helped-by: Ben Peart 
Signed-off-by: Jonathan Tan 
---
 Documentation/Makefile |   1 +
 Documentation/gitattributes.txt|  54 ++-
 Documentation/gitrepository-layout.txt |   3 +
 .../technical/long-running-process-protocol.txt|  50 ++
 Documentation/technical/repository-version.txt |  16 
 Makefile   |   1 +
 builtin/cat-file.c |   2 +
 builtin/fsck.c |   2 +-
 cache.h|   2 +
 lazy-object.c  |  80 
 lazy-object.h  |  12 +++
 object.c   |   7 ++
 object.h   |  13 +++
 sha1_file.c|  44 ++---
 t/t0410-lazy-object.sh |  18 
 t/t0410/lazy-object| 102 +
 16 files changed, 345 insertions(+), 62 deletions(-)
 create mode 100644 Documentation/technical/long-running-process-protocol.txt
 create mode 100644 lazy-object.c
 create mode 100644 lazy-object.h
 create mode 100755 t/t0410/lazy-object

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 2415e0d65..5657d49c2 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -69,6 +69,7 @@

[PATCH v2 4/5] fsck: support lazy objects as CLI argument

2017-07-31 Thread Jonathan Tan

Teach fsck to not treat missing objects provided on the CLI as an error
when extensions.lazyobject is set.

Signed-off-by: Jonathan Tan 
---
 builtin/fsck.c |  2 ++
 t/t0410-lazy-object.sh | 16 
 2 files changed, 18 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 19681c5b3..20415902f 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -754,6 +754,8 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
struct object *obj = lookup_object(oid.hash);
 
if (!obj || !(obj->flags & HAS_OBJ)) {
+   if (repository_format_lazy_object)
+   continue;
error("%s: object missing", oid_to_hex());
errors_found |= ERROR_OBJECT;
continue;
diff --git a/t/t0410-lazy-object.sh b/t/t0410-lazy-object.sh
index 45f665a15..3ac61c1c5 100755
--- a/t/t0410-lazy-object.sh
+++ b/t/t0410-lazy-object.sh
@@ -76,4 +76,20 @@ test_expect_success '...but succeeds if lazyobject is set' '
git -C repo fsck
 '
 
+test_expect_success 'fsck fails on lazy object directly given in command-line' 
'
+   rm -rf repo &&
+   test_create_repo repo &&
+   test_commit -C repo 1 &&
+   HASH=$(git hash-object repo/1.t) &&
+   delete_object repo "$HASH" &&
+
+   test_must_fail git -C repo fsck "$HASH"
+'
+
+test_expect_success '...but succeeds if lazyobject is set' '
+   git -C repo config core.repositoryformatversion 1 &&
+   git -C repo config extensions.lazyobject "arbitrary string" &&
+   git -C repo fsck "$HASH"
+'
+
 test_done
-- 
2.14.0.rc1.383.gd1ce394fe2-goog

[PATCH v2 2/5] fsck: support refs pointing to lazy objects

2017-07-31 Thread Jonathan Tan

Teach fsck to not treat refs with missing targets as an error when
extensions.lazyobject is set.

For the purposes of warning about no default refs, such refs are still
treated as legitimate refs.

Signed-off-by: Jonathan Tan 
---
 builtin/fsck.c |  8 
 t/t0410-lazy-object.sh | 20 
 2 files changed, 28 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index b7e245654..38ed630d8 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -438,6 +438,14 @@ static int fsck_handle_ref(const char *refname, const 
struct object_id *oid,
 
obj = parse_object(oid);
if (!obj) {
+   if (repository_format_lazy_object) {
+   /*
+* Increment default_refs anyway, because this is a
+* valid ref.
+*/
+   default_refs++;
+   return 0;
+   }
error("%s: invalid sha1 pointer %s", refname, oid_to_hex(oid));
errors_found |= ERROR_REACHABLE;
/* We'll continue with the rest despite the error.. */
diff --git a/t/t0410-lazy-object.sh b/t/t0410-lazy-object.sh
index 36442531f..00e1b4a88 100755
--- a/t/t0410-lazy-object.sh
+++ b/t/t0410-lazy-object.sh
@@ -29,4 +29,24 @@ test_expect_success '...but succeeds if lazyobject is set' '
git -C repo fsck
 '
 
+test_expect_success 'fsck fails on lazy object pointed to by ref' '
+   rm -rf repo &&
+   test_create_repo repo &&
+   test_commit -C repo 1 &&
+
+   A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+
+   # Reference $A only from ref, and delete it
+   git -C repo branch mybranch "$A" &&
+   delete_object repo "$A" &&
+
+   test_must_fail git -C repo fsck
+'
+
+test_expect_success '...but succeeds if lazyobject is set' '
+   git -C repo config core.repositoryformatversion 1 &&
+   git -C repo config extensions.lazyobject "arbitrary string" &&
+   git -C repo fsck
+'
+
 test_done
-- 
2.14.0.rc1.383.gd1ce394fe2-goog

[PATCH v2 1/5] environment, fsck: introduce lazyobject extension

2017-07-31 Thread Jonathan Tan

Currently, Git does not support repos with very large numbers of objects
or repos that wish to minimize manipulation of certain blobs (for
example, because they are very large) very well, even if the user
operates mostly on part of the repo, because Git is designed on the
assumption that every referenced object is available somewhere in the
repo storage.

Introduce a new repository extension option `extensions.lazyObject`.
When it is set, Git does not treat missing objects as errors. The value
of `extensions.lazyObject` must be a string, naming the command used to
make a missing object appear whenever it is needed.

Teach fsck about the new state of affairs. In this commit, teach fsck
that missing objects referenced from the reflog are not an error case;
in future commits, fsck will be taught about other cases.

Signed-off-by: Jonathan Tan 
---
 Documentation/technical/repository-version.txt |  7 ++
 builtin/fsck.c |  2 +-
 cache.h|  2 ++
 environment.c  |  1 +
 setup.c|  7 +-
 t/t0410-lazy-object.sh | 32 ++
 6 files changed, 49 insertions(+), 2 deletions(-)
 create mode 100755 t/t0410-lazy-object.sh

diff --git a/Documentation/technical/repository-version.txt 
b/Documentation/technical/repository-version.txt
index 00ad37986..f0482cae4 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -86,3 +86,10 @@ for testing format-1 compatibility.
 When the config key `extensions.preciousObjects` is set to `true`,
 objects in the repository MUST NOT be deleted (e.g., by `git-prune` or
 `git repack -d`).
+
+`lazyObject`
+~
+
+When the config key `extensions.lazyObject` is set, Git does not treat missing
+objects as errors. The value of `extensions.lazyObject` must be a string,
+naming the command used to make a missing object appear whenever it is needed.
diff --git a/builtin/fsck.c b/builtin/fsck.c
index a92f44818..b7e245654 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -402,7 +402,7 @@ static void fsck_handle_reflog_oid(const char *refname, 
struct object_id *oid,
xstrfmt("%s@{%"PRItime"}", refname, 
timestamp));
obj->flags |= USED;
mark_object_reachable(obj);
-   } else {
+   } else if (!repository_format_lazy_object) {
error("%s: invalid reflog entry %s", refname, 
oid_to_hex(oid));
errors_found |= ERROR_REACHABLE;
}
diff --git a/cache.h b/cache.h
index 6c8242340..9e6bc0a21 100644
--- a/cache.h
+++ b/cache.h
@@ -853,10 +853,12 @@ extern int grafts_replace_parents;
 #define GIT_REPO_VERSION 0
 #define GIT_REPO_VERSION_READ 1
 extern int repository_format_precious_objects;
+extern char *repository_format_lazy_object;
 
 struct repository_format {
int version;
int precious_objects;
+   char *lazy_object;
int is_bare;
char *work_tree;
struct string_list unknown_extensions;
diff --git a/environment.c b/environment.c
index 3fd4b1084..cd8ef2897 100644
--- a/environment.c
+++ b/environment.c
@@ -27,6 +27,7 @@ int warn_ambiguous_refs = 1;
 int warn_on_object_refname_ambiguity = 1;
 int ref_paranoia = -1;
 int repository_format_precious_objects;
+char *repository_format_lazy_object;
 const char *git_commit_encoding;
 const char *git_log_output_encoding;
 const char *apply_default_whitespace;
diff --git a/setup.c b/setup.c
index 860507e1f..94cfde3cc 100644
--- a/setup.c
+++ b/setup.c
@@ -425,7 +425,11 @@ static int check_repo_format(const char *var, const char 
*value, void *vdata)
;
else if (!strcmp(ext, "preciousobjects"))
data->precious_objects = git_config_bool(var, value);
-   else
+   else if (!strcmp(ext, "lazyobject")) {
+   if (!value)
+   return config_error_nonbool(var);
+   data->lazy_object = xstrdup(value);
+   } else
string_list_append(>unknown_extensions, ext);
} else if (strcmp(var, "core.bare") == 0) {
data->is_bare = git_config_bool(var, value);
@@ -468,6 +472,7 @@ static int check_repository_format_gently(const char 
*gitdir, int *nongit_ok)
}
 
repository_format_precious_objects = candidate.precious_objects;
+   repository_format_lazy_object = candidate.lazy_object;
string_list_clear(_extensions, 0);
if (!has_common) {
if (candidate.is_bare != -1) {
diff --git a/t/t0410-lazy-object.sh b/t/t0410-lazy-object.sh
new file mode 100755
index 0..36442531f
--- /dev/null
+++ b/t/t0410-lazy-object.sh
@@ -0,0 +1,32 @@

[PATCH v2 3/5] fsck: support referenced lazy objects

2017-07-31 Thread Jonathan Tan

Teach fsck to not treat missing objects indirectly pointed to by refs as
an error when extensions.lazyobject is set.

Signed-off-by: Jonathan Tan 
---
 builtin/fsck.c | 11 +++
 t/t0410-lazy-object.sh | 27 +++
 2 files changed, 38 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 38ed630d8..19681c5b3 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -149,6 +149,15 @@ static int mark_object(struct object *obj, int type, void 
*data, struct fsck_opt
return 0;
obj->flags |= REACHABLE;
if (!(obj->flags & HAS_OBJ)) {
+   if (repository_format_lazy_object)
+   /*
+* Return immediately; this is not an error, and further
+* recursion does not need to be performed on this
+* object since it is missing (so it does not need to be
+* added to "pending").
+*/
+   return 0;
+
if (parent && !has_object_file(>oid)) {
printf("broken link from %7s %s\n",
 printable_type(parent), 
describe_object(parent));
@@ -212,6 +221,8 @@ static void check_reachable_object(struct object *obj)
 * do a full fsck
 */
if (!(obj->flags & HAS_OBJ)) {
+   if (repository_format_lazy_object)
+   return;
if (has_sha1_pack(obj->oid.hash))
return; /* it is in pack - forget about it */
printf("missing %s %s\n", printable_type(obj),
diff --git a/t/t0410-lazy-object.sh b/t/t0410-lazy-object.sh
index 00e1b4a88..45f665a15 100755
--- a/t/t0410-lazy-object.sh
+++ b/t/t0410-lazy-object.sh
@@ -49,4 +49,31 @@ test_expect_success '...but succeeds if lazyobject is set' '
git -C repo fsck
 '
 
+test_expect_success 'fsck fails on lazy object indirectly pointed to by ref' '
+   rm -rf repo &&
+   test_create_repo repo &&
+   test_commit -C repo 1 &&
+   test_commit -C repo 2 &&
+   test_commit -C repo 3 &&
+   git -C repo tag -a annotated_tag -m "annotated tag" &&
+
+   C=$(git -C repo rev-parse 1) &&
+   T=$(git -C repo rev-parse 2^{tree}) &&
+   B=$(git hash-object repo/3.t) &&
+   AT=$(git -C repo rev-parse annotated_tag) &&
+
+   # missing commit, tree, blob, and tag
+   delete_object repo "$C" &&
+   delete_object repo "$T" &&
+   delete_object repo "$B" &&
+   delete_object repo "$AT" &&
+   test_must_fail git -C repo fsck
+'
+
+test_expect_success '...but succeeds if lazyobject is set' '
+   git -C repo config core.repositoryformatversion 1 &&
+   git -C repo config extensions.lazyobject "arbitrary string" &&
+   git -C repo fsck
+'
+
 test_done
-- 
2.14.0.rc1.383.gd1ce394fe2-goog

[PATCH v2 0/5] Fsck for lazy objects, and (now) actual invocation of loader

2017-07-31 Thread Jonathan Tan

Besides review changes, this patch set now includes my rewritten
lazy-loading sha1_file patch, so you can now do this (excerpted from one
of the tests):

test_create_repo server
test_commit -C server 1 1.t abcdefgh
HASH=$(git hash-object server/1.t)

test_create_repo client
test_must_fail git -C client cat-file -p "$HASH"
git -C client config core.repositoryformatversion 1
git -C client config extensions.lazyobject \
"\"$TEST_DIRECTORY/t0410/lazy-object\" \"$(pwd)/server/.git\""
git -C client cat-file -p "$HASH"

with fsck still working. Also, there is no need for a list of promised
blobs, and the long-running process protocol is being used.

Changes from v1:
 - added last patch that supports lazy loading
 - clarified documentation in "introduce lazyobject extension" patch
   (following Junio's comments [1])

As listed in the changes above, I have rewritten my lazy-loading
sha1_file patch to no longer use the list of promises. Also, I have
added documentation about the protocol used to (hopefully) the
appropriate places.

This is a minimal implementation, hopefully enough of a foundation to be
built upon. In particular, I haven't added the environment variable to
suppress lazy loading, and the lazy loading protocol only supports one
object at a time.

Other work
--

This differs slightly from Ben Peart's patch [2] in that the
lazy-loading functionality is provided through a configured shell
command instead of a hook shell script. I envision commands like "git
clone", in the future, needing to pre-configure lazy loading, and I
think that it will be less surprising to the user if "git clone" wrote a
default configuration instead of a default hook.

This also differs from Christian Couder's patch set [3] that implement a
larger-scale object database, in that (i) my patch set does not support
putting objects into external databases, and (ii) my patch set requires
the lazy loader to make the objects available in the local repo, instead
of allowing the objects to only be stored in the external database.

[1] https://public-inbox.org/git/xmqqzibpn1zh@gitster.mtv.corp.google.com/
[2] https://public-inbox.org/git/20170714132651.170708-2-benpe...@microsoft.com/
[3] https://public-inbox.org/git/20170620075523.26961-1-chrisc...@tuxfamily.org/

Jonathan Tan (5):
  environment, fsck: introduce lazyobject extension
  fsck: support refs pointing to lazy objects
  fsck: support referenced lazy objects
  fsck: support lazy objects as CLI argument
  sha1_file: support loading lazy objects

 Documentation/Makefile |   1 +
 Documentation/gitattributes.txt|  54 ++
 Documentation/gitrepository-layout.txt |   3 +
 .../technical/long-running-process-protocol.txt|  50 +
 Documentation/technical/repository-version.txt |  23 +
 Makefile   |   1 +
 builtin/cat-file.c |   2 +
 builtin/fsck.c |  25 -
 cache.h|   4 +
 environment.c  |   1 +
 lazy-object.c  |  80 +++
 lazy-object.h  |  12 +++
 object.c   |   7 ++
 object.h   |  13 +++
 setup.c|   7 +-
 sha1_file.c|  44 +---
 t/t0410-lazy-object.sh | 113 +
 t/t0410/lazy-object| 102 +++
 18 files changed, 478 insertions(+), 64 deletions(-)
 create mode 100644 Documentation/technical/long-running-process-protocol.txt
 create mode 100644 lazy-object.c
 create mode 100644 lazy-object.h
 create mode 100755 t/t0410-lazy-object.sh
 create mode 100755 t/t0410/lazy-object

-- 
2.14.0.rc1.383.gd1ce394fe2-goog

[GSoC][PATCH 08/13] submodule: port submodule subcommand 'summary' from shell to C

2017-07-31 Thread Prathamesh Chavan

The submodule subcommand 'summary' is ported in the process of
making git-submodule a builtin. The function cmd_summary() from
git-submodule.sh is ported to functions module_summary(),
compute_summary_module_list(), prepare_submodule_summary() and
print_submodule_summary().

The first function module_summary() parses the options of submodule
subcommand and also acts as the front-end of this subcommand.
After parsing them, it calls the compute_summary_module_list()

The functions compute_summary_module_list() runs the diff_cmd,
and generates the modules list, as required by the subcommand.
The generation of this module list is done by the using the
callback function submodule_summary_callback(), and stored in the
structure module_cb.

Once the module list is generated, prepare_submodule_summary()
further goes through the list and filters the list, for
eventually calling the print_submodule_summary() function.

Finally, the print_submodule_summary() takes care of generating
and printing the summary for each submodule.

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
In this new version, the following changes have been made:

* Firstly, about the function compute_summary_module_list().
  This function is created to generate the list of modules, for which
  we will generate the summary further. Since the list is actually
  generated using the git-diff-files or git-diff-index command, but for
  porting this, we required to create a function similar to the builtin
  functions of the above commands. But we can't directly call cmd_diff_files() 
  and cmd_diff_index() since we don't have to display the output and instead
  need to store it. Hence, this function is introduced.
 
* Also, the module_cb_list *list is not freed since it is a non-heap object.
  Hence, free() can't be using on the non-heap objects.

* In the function prepare_submodule_summary(), as suggested
  'git_config_get_string_const' was used instead of instead of '_value'

* Some variables which weren't modified throughout the function-call were
  passed as const.

* The '!!' trick, which wasn't used in the last patch, is now used in this
  new version .

* the variables sha1_dst and sha1_src are removed from the function
  print_submodule_summary(), and instead the p->oid_src and p->oid_dst are
  used.

* The variable sm_git_dir is freed at the end of the function.

* variable head was no longer used in module_summary() and instead the strbuf
  was utilized.

 builtin/submodule--helper.c | 425 
 git-submodule.sh| 182 +--
 2 files changed, 426 insertions(+), 181 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index f642f9889..94438d6ce 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -13,6 +13,9 @@
 #include "remote.h"
 #include "refs.h"
 #include "connect.h"
+#include "revision.h"
+#include "diffcore.h"
+#include "diff.h"
 
 typedef void (*submodule_list_func_t)(const struct cache_entry *list_item,
  void *cb_data);
@@ -766,6 +769,427 @@ static int module_name(int argc, const char **argv, const 
char *prefix)
return 0;
 }
 
+struct module_cb {
+   unsigned int mod_src;
+   unsigned int mod_dst;
+   struct object_id oid_src;
+   struct object_id oid_dst;
+   char status;
+   const char *sm_path;
+};
+#define MODULE_CB_INIT { 0, 0, NULL, NULL, '\0', NULL }
+
+struct module_cb_list {
+   struct module_cb **entries;
+   int alloc, nr;
+};
+#define MODULE_CB_LIST_INIT { NULL, 0, 0 }
+
+struct summary_cb {
+   int argc;
+   const char **argv;
+   const char *prefix;
+   char *diff_cmd;
+   unsigned int cached: 1;
+   unsigned int for_status: 1;
+   unsigned int quiet: 1;
+   unsigned int files: 1;
+   int summary_limits;
+};
+#define SUMMARY_CB_INIT { 0, NULL, NULL, NULL, 0, 0, 0, 0, 0 }
+
+static int verify_submodule_object_name(const char *sm_path, const char *sha1)
+{
+   struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
+
+   cp_rev_parse.git_cmd = 1;
+   cp_rev_parse.no_stdout = 1;
+   cp_rev_parse.dir = sm_path;
+   prepare_submodule_repo_env(_rev_parse.env_array);
+
+   argv_array_pushl(_rev_parse.args, "rev-parse", "-q",
+"--verify", NULL);
+   argv_array_pushf(_rev_parse.args, "%s^0", sha1);
+
+   if (run_command(_rev_parse))
+   return 1;
+
+   return 0;
+}
+
+static void print_submodule_summary(struct summary_cb *info,
+   struct module_cb *p)
+{
+   int missing_src = 0;
+   int missing_dst = 0;
+   char *displaypath;
+   const char *sha1_abbr_src;
+   const char *sha1_abbr_dst;
+   int errmsg = 0;
+   int total_commits = -1;
+   char

[GSoC][PATCH 11/13] submodule foreach: clarify the '$toplevel' variable documentation

2017-07-31 Thread Prathamesh Chavan

It does not contain the topmost superproject as the author assumed,
but the direct superproject, such that $toplevel/$sm_path is the
actual absolute path of the submodule.

Discussed-with: Ramsay Jones 
Signed-off-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
 Documentation/git-submodule.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index a23baef62..8e7930ebc 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -188,7 +188,8 @@ foreach [--recursive] ::
$name is the name of the relevant submodule section in `.gitmodules`,
$sm_path is the path of the submodule as recorded in the superproject,
$sha1 is the commit as recorded in the superproject, and
-   $toplevel is the absolute path to the top-level of the superproject.
+   $toplevel is the absolute path to its superproject, such that
+   $toplevel/$sm_path is the absolute path of the submodule.
Note that to avoid conflicts with '$PATH' on Windows, the '$path'
variable is now a deprecated synonym of '$sm_path' variable.
Any submodules defined in the superproject but not checked out are
-- 
2.13.0

[GSoC][PATCH 05/13] submodule: port submodule subcommand 'sync' from shell to C

2017-07-31 Thread Prathamesh Chavan

Port the submodule subcommand 'sync' from shell to C using the same
mechanism as that used for porting submodule subcommand 'status'.
Hence, here the function cmd_sync() is ported from shell to C.
This is done by introducing three functions: module_sync(),
sync_submodule() and print_default_remote().

The function print_default_remote() is introduced for getting
the default remote as stdout.

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
In this new version, the following changes have been made:
* There was no good reason for using puts in the function print_default_remote()
  Hence, in this patch, we instead use printf to do the same, as it is what
  is generally used throughout the codebase.

* As suggested, this patch ensures a more efficient use of variables, and
  removes most of the variables by reusing 'strbuf sb' at places required.

 builtin/submodule--helper.c | 182 
 git-submodule.sh|  56 +-
 2 files changed, 183 insertions(+), 55 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index a6e6a48cc..91945337f 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -44,6 +44,20 @@ static char *get_default_remote(void)
return ret;
 }
 
+static int print_default_remote(int argc, const char **argv, const char 
*prefix)
+{
+   const char *remote;
+
+   if (argc != 1)
+   die(_("submodule--helper print-default-remote takes no 
arguments"));
+
+   remote = get_default_remote();
+   if (remote)
+   printf("%s\n", remote);
+
+   return 0;
+}
+
 static int starts_with_dot_slash(const char *str)
 {
return str[0] == '.' && is_dir_sep(str[1]);
@@ -380,6 +394,25 @@ static void module_list_active(struct module_list *list)
*list = active_modules;
 }
 
+static char *get_up_path(const char *path)
+{
+   int i;
+   struct strbuf sb = STRBUF_INIT;
+
+   for (i = count_slashes(path); i; i--)
+   strbuf_addstr(, "../");
+
+   /*
+* Check if 'path' ends with slash or not
+* for having the same output for dir/sub_dir
+* and dir/sub_dir/
+*/
+   if (!is_dir_sep(path[strlen(path) - 1]))
+   strbuf_addstr(, "../");
+
+   return strbuf_detach(, NULL);
+}
+
 static int module_list(int argc, const char **argv, const char *prefix)
 {
int i;
@@ -733,6 +766,153 @@ static int module_name(int argc, const char **argv, const 
char *prefix)
return 0;
 }
 
+struct sync_cb {
+   const char *prefix;
+   unsigned int quiet: 1;
+   unsigned int recursive: 1;
+};
+#define SYNC_CB_INIT { NULL, 0, 0 }
+
+static void sync_submodule(const struct cache_entry *list_item, void *cb_data)
+{
+   struct sync_cb *info = cb_data;
+   const struct submodule *sub;
+   char *remote_key;
+   char *sub_origin_url, *super_config_url, *displaypath;
+   struct strbuf sb = STRBUF_INIT;
+   struct child_process cp = CHILD_PROCESS_INIT;
+   char *sub_config_path = NULL;
+
+   if (!is_submodule_active(the_repository, list_item->name))
+   return;
+
+   sub = submodule_from_path(null_sha1, list_item->name);
+
+   if (sub && sub->url) {
+   if (starts_with_dot_dot_slash(sub->url) || 
starts_with_dot_slash(sub->url)) {
+   char *remote_url, *up_path;
+   char *remote = get_default_remote();
+   strbuf_addf(, "remote.%s.url", remote);
+
+   if (git_config_get_string(sb.buf, _url))
+   remote_url = xgetcwd();
+
+   up_path = get_up_path(list_item->name);
+   sub_origin_url = relative_url(remote_url, sub->url, 
up_path);
+   super_config_url = relative_url(remote_url, sub->url, 
NULL);
+
+   free(remote);
+   free(up_path);
+   free(remote_url);
+   } else {
+   sub_origin_url = xstrdup(sub->url);
+   super_config_url = xstrdup(sub->url);
+   }
+   } else {
+   sub_origin_url = "";
+   super_config_url = "";
+   }
+
+   displaypath = get_submodule_displaypath(list_item->name, info->prefix);
+
+   if (!info->quiet)
+   printf(_("Synchronizing submodule url for '%s'\n"),
+displaypath);
+
+   strbuf_reset();
+   strbuf_addf(, "submodule.%s.url", sub->name);
+   if (git_config_set_gently(sb.buf, super_config_url))
+   die(_("failed to register url for submodule path '%s'"),
+ displaypath);
+
+   if (!is_submodule_populated_gently(list_item->name, NULL))
+   goto cleanup;
+
+

[GSoC][PATCH 07/13] diff: change scope of the function count_lines()

2017-07-31 Thread Prathamesh Chavan

Change the scope of function count_lines for allowing the function
to be reused in other parts of the code as well.

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
 diff.c | 2 +-
 diff.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 85e714f6c..03ed64f93 100644
--- a/diff.c
+++ b/diff.c
@@ -425,7 +425,7 @@ struct emit_callback {
struct strbuf *header;
 };
 
-static int count_lines(const char *data, int size)
+int count_lines(const char *data, int size)
 {
int count, ch, completely_empty = 1, nl_just_seen = 0;
count = 0;
diff --git a/diff.h b/diff.h
index 2d442e296..8522514e9 100644
--- a/diff.h
+++ b/diff.h
@@ -273,6 +273,7 @@ extern struct diff_filepair *diff_unmerge(struct 
diff_options *, const char *pat
 extern int parse_long_opt(const char *opt, const char **argv,
 const char **optarg);
 
+extern int count_lines(const char *data, int size);
 extern int git_diff_basic_config(const char *var, const char *value, void *cb);
 extern int git_diff_heuristic_config(const char *var, const char *value, void 
*cb);
 extern void init_diff_ui_defaults(void);
-- 
2.13.0

[GSoC][PATCH 06/13] submodule: port submodule subcommand 'deinit' from shell to C

2017-07-31 Thread Prathamesh Chavan

The same mechanism is used even for porting this submodule
subcommand, as used in the ported subcommands till now.
The function cmd_deinit in split up after porting into three
functions: module_deinit(), for_each_submodule_list() and
deinit_submodule().

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
In this new version, the following changes have been made:
* In the function deinit_submodule, since the test is_git_directory()
  adds an additional condition, instead is_directory() is used to check
  if "sm_path/.git" is a directory.

* since it was possible in the previous path that the value st.st_mode passed
  to the function mkdir contained a garbage value, instead we intrduce a
  mode_t variable mode, initially containing a default mode value '0777'.
  This is what the default of mode is set in case, that the value is
  not set after the lstat call. 

 builtin/submodule--helper.c | 144 
 git-submodule.sh|  55 +
 2 files changed, 145 insertions(+), 54 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 91945337f..f642f9889 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -913,6 +913,149 @@ static int module_sync(int argc, const char **argv, const 
char *prefix)
return 0;
 }
 
+struct deinit_cb {
+   const char *prefix;
+   unsigned int quiet: 1;
+   unsigned int force: 1;
+   unsigned int all: 1;
+};
+#define DEINIT_CB_INIT { NULL, 0, 0, 0 }
+
+static void deinit_submodule(const struct cache_entry *list_item,
+void *cb_data)
+{
+   struct deinit_cb *info = cb_data;
+   const struct submodule *sub;
+   char *displaypath = NULL;
+   struct child_process cp_config = CHILD_PROCESS_INIT;
+   struct strbuf sb_config = STRBUF_INIT;
+   char *sub_git_dir = xstrfmt("%s/.git", list_item->name);
+   mode_t mode = 0777;
+
+   sub = submodule_from_path(null_sha1, list_item->name);
+
+   if (!sub || !sub->name)
+   goto cleanup;
+
+   displaypath = get_submodule_displaypath(list_item->name, info->prefix);
+
+   /* remove the submodule work tree (unless the user already did it) */
+   if (is_directory(list_item->name)) {
+   struct stat st;
+   /* protect submodules containing a .git directory */
+   if (is_directory(sub_git_dir))
+   die(_("Submodule work tree '%s' contains a .git "
+ "directory use 'rm -rf' if you really want "
+ "to remove it including all of its history"),
+ displaypath);
+
+   if (!info->force) {
+   struct child_process cp_rm = CHILD_PROCESS_INIT;
+   cp_rm.git_cmd = 1;
+   argv_array_pushl(_rm.args, "rm", "-qn",
+list_item->name, NULL);
+
+   if (run_command(_rm))
+   die(_("Submodule work tree '%s' contains local "
+ "modifications; use '-f' to discard 
them"),
+ displaypath);
+   }
+
+   if (!lstat(list_item->name, )) {
+   struct strbuf sb_rm = STRBUF_INIT;
+   const char *format;
+
+   strbuf_addstr(_rm, list_item->name);
+
+   if (!remove_dir_recursively(_rm, 0))
+   format = _("Cleared directory '%s'\n");
+   else
+   format = _("Could not remove submodule work 
tree '%s'\n");
+
+   if (!info->quiet)
+   printf(format, displaypath);
+
+   mode = st.st_mode;
+
+   strbuf_release(_rm);
+   }
+   }
+
+   if (mkdir(list_item->name, mode))
+   die(_("could not create empty submodule directory %s"),
+ displaypath);
+
+   cp_config.git_cmd = 1;
+   argv_array_pushl(_config.args, "config", "--get-regexp", NULL);
+   argv_array_pushf(_config.args, "submodule.%s\\.", sub->name);
+
+   /* remove the .git/config entries (unless the user already did it) */
+   if (!capture_command(_config, _config, 0) && sb_config.len) {
+   char *sub_key = xstrfmt("submodule.%s", sub->name);
+   /*
+* remove the whole section so we have a clean state when
+* the user later decides to init this submodule again
+*/
+   git_config_rename_section_in_file(NULL, sub_key, NULL);
+   if (!info->quiet)
+   printf(_("Submodule '%s' (%s) unregistered

[GSoC][PATCH 10/13] submodule foreach: document '$sm_path' instead of '$path'

2017-07-31 Thread Prathamesh Chavan

As using a variable '$path' may be harmful to users due to
capitalization issues, see 64394e3ae9 (git-submodule.sh: Don't
use $path variable in eval_gettext string, 2012-04-17). Adjust
the documentation to advocate for using $sm_path,  which contains
the same value. We still make the 'path' variable available and
document it as a deprecated synonym of 'sm_path'.

Discussed-with: Ramsay Jones 
Signed-off-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
This patch is same as its previous version.
Although here I'll like to add a point that we aim to slowly drop the support
of the variable 'path'.

 Documentation/git-submodule.txt | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index ff612001d..a23baef62 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -183,12 +183,14 @@ information too.
 
 foreach [--recursive] ::
Evaluates an arbitrary shell command in each checked out submodule.
-   The command has access to the variables $name, $path, $sha1 and
+   The command has access to the variables $name, $sm_path, $sha1 and
$toplevel:
$name is the name of the relevant submodule section in `.gitmodules`,
-   $path is the name of the submodule directory relative to the
-   superproject, $sha1 is the commit as recorded in the superproject,
-   and $toplevel is the absolute path to the top-level of the superproject.
+   $sm_path is the path of the submodule as recorded in the superproject,
+   $sha1 is the commit as recorded in the superproject, and
+   $toplevel is the absolute path to the top-level of the superproject.
+   Note that to avoid conflicts with '$PATH' on Windows, the '$path'
+   variable is now a deprecated synonym of '$sm_path' variable.
Any submodules defined in the superproject but not checked out are
ignored by this command. Unless given `--quiet`, foreach prints the name
of each submodule before evaluating the command.
-- 
2.13.0

[GSoC][PATCH 12/13] submodule foreach: document variable '$displaypath'

2017-07-31 Thread Prathamesh Chavan

It was observed that the variable '$displaypath' was accessible but
undocumented. Hence, document it.

Discussed-with: Ramsay Jones 
Signed-off-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
In this new version, the following changes have been made:
* Spelling mistake in the commit message was corrected.

 Documentation/git-submodule.txt |  6 --
 t/t7407-submodule-foreach.sh| 22 +++---
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 8e7930ebc..0cca702cb 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -183,10 +183,12 @@ information too.
 
 foreach [--recursive] ::
Evaluates an arbitrary shell command in each checked out submodule.
-   The command has access to the variables $name, $sm_path, $sha1 and
-   $toplevel:
+   The command has access to the variables $name, $sm_path, $displaypath,
+   $sha1 and $toplevel:
$name is the name of the relevant submodule section in `.gitmodules`,
$sm_path is the path of the submodule as recorded in the superproject,
+   $displaypath contains the relative path from the current working
+   directory to the submodules root directory,
$sha1 is the commit as recorded in the superproject, and
$toplevel is the absolute path to its superproject, such that
$toplevel/$sm_path is the absolute path of the submodule.
diff --git a/t/t7407-submodule-foreach.sh b/t/t7407-submodule-foreach.sh
index 0663622a4..6ad57e061 100755
--- a/t/t7407-submodule-foreach.sh
+++ b/t/t7407-submodule-foreach.sh
@@ -82,16 +82,16 @@ test_expect_success 'test basic "submodule foreach" usage' '
 
 cat >expect <../../actual
+   git submodule foreach "echo 
\$toplevel-\$name-\$sm_path-\$displaypath-\$sha1" >../../actual
) &&
test_i18ncmp expect actual
 '
@@ -206,25 +206,25 @@ submodulesha1=$(cd 
clone2/nested1/nested2/nested3/submodule && git rev-parse HEA
 
 cat >expect <../../actual
+   git submodule foreach --recursive "echo 
\$toplevel-\$name-\$sm_path-\$displaypath-\$sha1" >../../actual
) &&
test_i18ncmp expect actual
 '
-- 
2.13.0

[GSoC][PATCH 13/13] submodule: port submodule subcommand 'foreach' from shell to C

2017-07-31 Thread Prathamesh Chavan

This aims to make git-submodule foreach a builtin. This is the very
first step taken in this direction. Hence, 'foreach' is ported to
submodule--helper, and submodule--helper is called from git-submodule.sh.
The code is split up to have one function to obtain all the list of
submodules. This function acts as the front-end of git-submodule foreach
subcommand. It calls the function for_each_submodule_list, which basically
loops through the list and calls function fn, which in this case is
runcommand_in_submodule. This third function is a calling function that
takes care of running the command in that submodule, and recursively
perform the same when --recursive is flagged.

The first function module_foreach first parses the options present in
argv, and then with the help of module_list_compute, generates the list of
submodules present in the current working tree.

The second function for_each_submodule_list traverses through the
list, and calls function fn (which in case of submodule subcommand
foreach is runcommand_in_submodule) is called for each entry.

The third function runcommand_in_submodule, generates a submodule struct sub
for $name, value and then later prepends name=sub->name; and other
value assignment to the env argv_array structure of a child_process.
Also the  of submodule-foreach is push to args argv_array
structure and finally, using run_command the commands are executed
using a shell.

The third function also takes care of the recursive flag, by creating
a separate child_process structure and prepending "--super-prefix displaypath",
to the args argv_array structure. Other required arguments and the
input  of submodule-foreach is also appended to this argv_array.

Helped-by: Brandon Williams 
Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
In this new version, the following changes have been made:
* Comment style is improved in the function runcommand_in_submodule()

* Comment in added about why the variable "path" was exposed via args
  argv_array instead of exposing it via the env_array.

* This patch exposes the various variables when argc == 1 only, just
  for maintaining a faithful porting. You can also find discussion about
  the same at [1].

[1]: 
https://public-inbox.org/git/came+mvusgafbn5j-_hv7qpas57hq4wgh+yz7xjmpuyqn1ga...@mail.gmail.com/

 builtin/submodule--helper.c | 136 
 git-submodule.sh|  39 +
 2 files changed, 137 insertions(+), 38 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 94438d6ce..935f56510 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -769,6 +769,141 @@ static int module_name(int argc, const char **argv, const 
char *prefix)
return 0;
 }
 
+struct cb_foreach {
+   int argc;
+   const char **argv;
+   const char *prefix;
+   unsigned int quiet: 1;
+   unsigned int recursive: 1;
+};
+#define CB_FOREACH_INIT { 0, NULL, NULL, 0, 0 }
+
+static void runcommand_in_submodule(const struct cache_entry *list_item,
+   void *cb_data)
+{
+   struct cb_foreach *info = cb_data;
+   const struct submodule *sub;
+   struct child_process cp = CHILD_PROCESS_INIT;
+   char *displaypath;
+
+   displaypath = get_submodule_displaypath(list_item->name, info->prefix);
+
+   sub = submodule_from_path(null_sha1, list_item->name);
+
+   if (!sub)
+   die(_("No url found for submodule path '%s' in .gitmodules"),
+ displaypath);
+
+   if (!is_submodule_populated_gently(list_item->name, NULL))
+   goto cleanup;
+
+   prepare_submodule_repo_env(_array);
+
+   /*
+* For the purpose of executing  in the submodule,
+* separate shell is used for the purpose of running the
+* child process.
+*/
+   cp.use_shell = 1;
+   cp.dir = list_item->name;
+
+   if (info->argc == 1) {
+   char *toplevel = xgetcwd();
+
+   argv_array_pushf(_array, "name=%s", sub->name);
+   argv_array_pushf(_array, "sm_path=%s", list_item->name);
+   argv_array_pushf(_array, "displaypath=%s", displaypath);
+   argv_array_pushf(_array, "sha1=%s",
+oid_to_hex(_item->oid));
+   argv_array_pushf(_array, "toplevel=%s", toplevel);
+
+   /*
+* Since the path variable was accessible from the script
+* before porting, it is also made available after porting.
+* The environment variable "PATH" has a very special purpose
+* on windows. And since environment variables are
+* case-insensitive in windows, it interferes with the
+* existing PATH variable. Hence, to avoid that,

[GSoC][PATCH 09/13] submodule foreach: correct '$path' in nested submodules from a subdirectory

2017-07-31 Thread Prathamesh Chavan

When running 'git submodule foreach' from a subdirectory of your
repository, nested submodules get a bogus value for $sm_path:
For a submodule 'sub' that contains a nested submodule 'nested',
running 'git -C dir submodule foreach echo $path' would report
path='../nested' for the nested submodule. The first part '../' is
derived from the logic computing the relative path from $pwd to the
root of the superproject. The second part is the submodule path inside
the submodule. This value is of little use and is hard to document.

There are two different possible solutions that have more value:
(a) The path value is documented as the path from the toplevel of the
superproject to the mount point of the submodule.
In this case we would want to have path='sub/nested'.

(b) As Ramsay noticed the documented value is wrong. For the non-nested
case the path is equal to the relative path from $pwd to the
submodules working directory. When following this model,
the expected value would be path='../sub/nested'.

The behavior for (b) was introduced in 091a6eb0fe (submodule: drop the
top-level requirement, 2013-06-16) the intent for $path seemed to be
relative to $cwd to the submodule worktree, but that did not work for
nested submodules, as the intermittent submodules were not included in
the path.

If we were to fix the meaning of the $path using (a) such that "path"
is "the path from the toplevel of the superproject to the mount point
of the submodule", we would break any existing submodule user that runs
foreach from non-root of the superproject as the non-nested submodule
'../sub' would change its path to 'sub'.

If we would fix the meaning of the $path using (b), such that "path"
is "the relative path from $pwd to the submodule", then we would break
any user that uses nested submodules (even from the root directory) as
the 'nested' would become 'sub/nested'.

Both groups can be found in the wild.  The author has no data if one group
outweighs the other by large margin, and offending each one seems equally
bad at first.  However in the authors imagination it is better to go with
(a) as running from a sub directory sounds like it is carried out
by a human rather than by some automation task.  With a human on
the keyboard the feedback loop is short and the changed behavior can be
adapted to quickly unlike some automation that can break silently.

Discussed-with: Ramsay Jones 
Signed-off-by: Prathamesh Chavan 
Signed-off-by: Stefan Beller 
---
 git-submodule.sh |  1 -
 t/t7407-submodule-foreach.sh | 36 ++--
 2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index a427ddafd..493a64372 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -320,7 +320,6 @@ cmd_foreach()
prefix="$prefix$sm_path/"
sanitize_submodule_env
cd "$sm_path" &&
-   sm_path=$(git submodule--helper relative-path 
"$sm_path" "$wt_prefix") &&
# we make $path available to scripts ...
path=$sm_path &&
if test $# -eq 1
diff --git a/t/t7407-submodule-foreach.sh b/t/t7407-submodule-foreach.sh
index 6ba5daf42..0663622a4 100755
--- a/t/t7407-submodule-foreach.sh
+++ b/t/t7407-submodule-foreach.sh
@@ -82,9 +82,9 @@ test_expect_success 'test basic "submodule foreach" usage' '
 
 cat >expect  expect <

[GSoC][PATCH 04/13] submodule: port submodule subcommand 'status' from shell to C

2017-07-31 Thread Prathamesh Chavan

This aims to make git-submodule 'status' a built-in. Hence, the function
cmd_status() is ported from shell to C. This is done by introducing
three functions: module_status(), submodule_status() and print_status().

The function module_status() acts as the front-end of the subcommand.
It parses subcommand's options and then calls the function
module_list_compute() for computing the list of submodules. Then
this functions calls for_each_submodule_list() looping through the
list obtained.

Then for_each_submodule_list() calls submodule_status() for each of the
submodule in its list. The function submodule_status() is responsible
for generating the status each submodule it is called for, and
then calls print_status().

Finally, the function print_status() handles the printing of submodule's
status.

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
In this new version, the following changes have been made:
* parameters passed to the function print_status() have been changed.
  Instead of passing char *sub_sha1, instead the object_id is being passed.
 
* Also, since the passed parameter displaypath's value isn't changed
  by the function, it is passed to the funcition as const char *displaypath
  instead of char *displaypath.

* the output type of the function handle_submodule_head_ref() is changed
  from strbuf to object_id, as we will use the object_id instead of the
  hex of sha1 being stored in a struct strbuf.

* diff_files_args is cleared after using it by passing it as args in the
  function cmd_diff_files.

* In the function status_submodule(), for checking if a submodule has merge
  conflicts, the patch currently checks if the value of any of the ce_flags
  is non-zero. Currently, I think the we aren't interested in a partiular flag,
  but I'm not sure on this.

* Debugging leftovers and suprious new-lines are removed.

* The confusion with displaypath being passed as te super-prefix in many
  of the ported subcommands may be a result of the fact that the
  function generating the displaypath: get_submodule_displaypath()
  uses the super-prefix as simply a path concatenated with the current
  submodule name to denote our current location.
  The function get_super_prefix() is declared in cache.h and defined in
  environment.c, but is majorly used in the builtin/submodule--helper.c
  and also in unpack-trees.c
  Also, for generating any submodule's displaypath, it would be important to
  have ".." passed to the submodule, and currently it is possible only via the
  super-prefix.
  This is also other instaces where the super-prefix contained ".." as well.
  One of such instance is Test 4 from t7406-submodule-update.sh
  Hence, maybe documenting the value of displaypath might a solution
  for the above problem.
  I'm just stating my views and would like to recieve your opinion on this
  matter.

 builtin/submodule--helper.c | 154 
 git-submodule.sh|  49 +-
 2 files changed, 155 insertions(+), 48 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 2cb72d68e..a6e6a48cc 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -561,6 +561,159 @@ static int module_init(int argc, const char **argv, const 
char *prefix)
return 0;
 }
 
+struct status_cb {
+   const char *prefix;
+   unsigned int quiet: 1;
+   unsigned int recursive: 1;
+   unsigned int cached: 1;
+};
+#define STATUS_CB_INIT { NULL, 0, 0, 0 }
+
+static void print_status(struct status_cb *info, char state, const char *path,
+const struct object_id *oid, const char *displaypath)
+{
+   if (info->quiet)
+   return;
+
+   printf("%c%s %s", state, oid_to_hex(oid), displaypath);
+
+   if (state == ' ' || state == '+') {
+   struct argv_array name_rev_args = ARGV_ARRAY_INIT;
+
+   argv_array_pushl(_rev_args, "print-name-rev",
+path, oid_to_hex(oid), NULL);
+   print_name_rev(name_rev_args.argc, name_rev_args.argv,
+  info->prefix);
+   } else {
+   printf("\n");
+   }
+}
+
+static int handle_submodule_head_ref(const char *refname,
+const struct object_id *oid, int flags,
+void *cb_data)
+{
+   struct object_id *output = cb_data;
+   if (oid)
+   oidcpy(output, oid);
+
+   return 0;
+}
+
+static void status_submodule(const struct cache_entry *list_item, void 
*cb_data)
+{
+   struct status_cb *info = cb_data;
+   char *displaypath;
+   struct argv_array diff_files_args = ARGV_ARRAY_INIT;
+
+   if (!submodule_from_path(null_sha1, list_item->name))
+   die(_("no submodule mapping found in .gitmodules for path

[GSoC][PATCH 03/13] submodule: port set_name_rev() from shell to C

2017-07-31 Thread Prathamesh Chavan

Function set_name_rev() is ported from git-submodule to the
submodule--helper builtin. The function get_name_rev() generates the
value of the revision name as required, and the function
print_name_rev() handles the formating and printing of the obtained
revision name.

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
In this new version, the following changes have been made:
* The variable namerev from print_name_rev is now freed at the end of the 
  function.

 builtin/submodule--helper.c | 64 +
 git-submodule.sh| 16 ++--
 2 files changed, 66 insertions(+), 14 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index e41572f7a..2cb72d68e 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -244,6 +244,69 @@ static char *get_submodule_displaypath(const char *path, 
const char *prefix)
}
 }
 
+static char *get_name_rev(const char *sub_path, const char* object_id)
+{
+   struct strbuf sb = STRBUF_INIT;
+   const char ***d;
+
+   static const char *describe_bare[] = {
+   NULL
+   };
+
+   static const char *describe_tags[] = {
+   "--tags", NULL
+   };
+
+   static const char *describe_contains[] = {
+   "--contains", NULL
+   };
+
+   static const char *describe_all_always[] = {
+   "--all", "--always", NULL
+   };
+
+   static const char **describe_argv[] = {
+   describe_bare, describe_tags, describe_contains,
+   describe_all_always, NULL
+   };
+
+   for (d = describe_argv; *d; d++) {
+   struct child_process cp = CHILD_PROCESS_INIT;
+   prepare_submodule_repo_env(_array);
+   cp.dir = sub_path;
+   cp.git_cmd = 1;
+   cp.no_stderr = 1;
+
+   argv_array_push(, "describe");
+   argv_array_pushv(, *d);
+   argv_array_push(, object_id);
+
+   if (!capture_command(, , 0) && sb.len) {
+   strbuf_strip_suffix(, "\n");
+   return strbuf_detach(, NULL);
+   }
+
+   }
+
+   strbuf_release();
+   return NULL;
+}
+
+static int print_name_rev(int argc, const char **argv, const char *prefix)
+{
+   char *namerev;
+   if (argc != 3)
+   die("print-name-rev only accepts two arguments:  ");
+
+   namerev = get_name_rev(argv[1], argv[2]);
+   if (namerev && namerev[0])
+   printf(" (%s)", namerev);
+   printf("\n");
+
+   free(namerev);
+   return 0;
+}
+
 struct module_list {
const struct cache_entry **entries;
int alloc, nr;
@@ -1242,6 +1305,7 @@ static struct cmd_struct commands[] = {
{"relative-path", resolve_relative_path, 0},
{"resolve-relative-url", resolve_relative_url, 0},
{"resolve-relative-url-test", resolve_relative_url_test, 0},
+   {"print-name-rev", print_name_rev, 0},
{"init", module_init, SUPPORT_SUPER_PREFIX},
{"remote-branch", resolve_remote_submodule_branch, 0},
{"push-check", push_check, 0},
diff --git a/git-submodule.sh b/git-submodule.sh
index e131760ee..e988167e0 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -759,18 +759,6 @@ cmd_update()
}
 }
 
-set_name_rev () {
-   revname=$( (
-   sanitize_submodule_env
-   cd "$1" && {
-   git describe "$2" 2>/dev/null ||
-   git describe --tags "$2" 2>/dev/null ||
-   git describe --contains "$2" 2>/dev/null ||
-   git describe --all --always "$2"
-   }
-   ) )
-   test -z "$revname" || revname=" ($revname)"
-}
 #
 # Show commit summary for submodules in index or working tree
 #
@@ -1042,14 +1030,14 @@ cmd_status()
fi
if git diff-files --ignore-submodules=dirty --quiet -- 
"$sm_path"
then
-   set_name_rev "$sm_path" "$sha1"
+   revname=$(git submodule--helper print-name-rev 
"$sm_path" "$sha1")
say " $sha1 $displaypath$revname"
else
if test -z "$cached"
then
sha1=$(sanitize_submodule_env; cd "$sm_path" && 
git rev-parse --verify HEAD)
fi
-   set_name_rev "$sm_path" "$sha1"
+   revname=$(git submodule--helper print-name-rev 
"$sm_path" "$sha1")
say "+$sha1 $displaypath$revname"
fi
 
-- 
2.13.0

[GSoC][PATCH 02/13] submodule--helper: introduce for_each_submodule_list()

2017-07-31 Thread Prathamesh Chavan

Introduce function for_each_submodule_list() and
replace a loop in module_init() with a call to it.

The new function will also be used in other parts of the
system in later patches.

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
 builtin/submodule--helper.c | 39 +--
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 7af4de09b..e41572f7a 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -14,6 +14,9 @@
 #include "refs.h"
 #include "connect.h"
 
+typedef void (*submodule_list_func_t)(const struct cache_entry *list_item,
+ void *cb_data);
+
 static char *get_default_remote(void)
 {
char *dest = NULL, *ret;
@@ -352,17 +355,30 @@ static int module_list(int argc, const char **argv, const 
char *prefix)
return 0;
 }
 
-static void init_submodule(const char *path, const char *prefix, int quiet)
+static void for_each_submodule_list(const struct module_list list,
+   submodule_list_func_t fn, void *cb_data)
 {
+   int i;
+   for (i = 0; i < list.nr; i++)
+   fn(list.entries[i], cb_data);
+}
+
+struct init_cb {
+   const char *prefix;
+   unsigned int quiet: 1;
+};
+#define INIT_CB_INIT { NULL, 0 }
+
+static void init_submodule(const struct cache_entry *list_item, void *cb_data)
+{
+   struct init_cb *info = cb_data;
const struct submodule *sub;
struct strbuf sb = STRBUF_INIT;
char *upd = NULL, *url = NULL, *displaypath;
 
-   /* Only loads from .gitmodules, no overlay with .git/config */
-   gitmodules_config();
-   displaypath = get_submodule_displaypath(path, prefix);
+   displaypath = get_submodule_displaypath(list_item->name, info->prefix);
 
-   sub = submodule_from_path(null_sha1, path);
+   sub = submodule_from_path(null_sha1, list_item->name);
 
if (!sub)
die(_("No url found for submodule path '%s' in .gitmodules"),
@@ -374,7 +390,7 @@ static void init_submodule(const char *path, const char 
*prefix, int quiet)
 *
 * Set active flag for the submodule being initialized
 */
-   if (!is_submodule_active(the_repository, path)) {
+   if (!is_submodule_active(the_repository, list_item->name)) {
strbuf_addf(, "submodule.%s.active", sub->name);
git_config_set_gently(sb.buf, "true");
}
@@ -416,7 +432,7 @@ static void init_submodule(const char *path, const char 
*prefix, int quiet)
if (git_config_set_gently(sb.buf, url))
die(_("Failed to register url for submodule path '%s'"),
displaypath);
-   if (!quiet)
+   if (!info->quiet)
fprintf(stderr,
_("Submodule '%s' (%s) registered for path 
'%s'\n"),
sub->name, url, displaypath);
@@ -445,10 +461,10 @@ static void init_submodule(const char *path, const char 
*prefix, int quiet)
 
 static int module_init(int argc, const char **argv, const char *prefix)
 {
+   struct init_cb info = INIT_CB_INIT;
struct pathspec pathspec;
struct module_list list = MODULE_LIST_INIT;
int quiet = 0;
-   int i;
 
struct option module_init_options[] = {
OPT__QUIET(, N_("Suppress output for initializing a 
submodule")),
@@ -473,8 +489,11 @@ static int module_init(int argc, const char **argv, const 
char *prefix)
if (!argc && git_config_get_value_multi("submodule.active"))
module_list_active();
 
-   for (i = 0; i < list.nr; i++)
-   init_submodule(list.entries[i]->name, prefix, quiet);
+   info.prefix = prefix;
+   info.quiet = !!quiet;
+
+   gitmodules_config();
+   for_each_submodule_list(list, init_submodule, );
 
return 0;
 }
-- 
2.13.0

[GSoC][PATCH 01/13] submodule--helper: introduce get_submodule_displaypath()

2017-07-31 Thread Prathamesh Chavan

Introduce function get_submodule_displaypath() to replace the code
occurring in submodule_init() for generating displaypath of the
submodule with a call to it.

This new function will also be used in other parts of the system
in later patches.

Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
---
 builtin/submodule--helper.c | 33 ++---
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 6abdad329..7af4de09b 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -220,6 +220,27 @@ static int resolve_relative_url_test(int argc, const char 
**argv, const char *pr
return 0;
 }
 
+static char *get_submodule_displaypath(const char *path, const char *prefix)
+{
+   const char *super_prefix = get_super_prefix();
+
+   if (prefix && super_prefix) {
+   BUG("cannot have prefix '%s' and superprefix '%s'",
+   prefix, super_prefix);
+   } else if (prefix) {
+   struct strbuf sb = STRBUF_INIT;
+   char *displaypath = xstrdup(relative_path(path, prefix, ));
+   strbuf_release();
+   return displaypath;
+   } else if (super_prefix) {
+   int len = strlen(super_prefix);
+   const char *format = is_dir_sep(super_prefix[len - 1]) ? "%s%s" 
: "%s/%s";
+   return xstrfmt(format, super_prefix, path);
+   } else {
+   return xstrdup(path);
+   }
+}
+
 struct module_list {
const struct cache_entry **entries;
int alloc, nr;
@@ -339,16 +360,7 @@ static void init_submodule(const char *path, const char 
*prefix, int quiet)
 
/* Only loads from .gitmodules, no overlay with .git/config */
gitmodules_config();
-
-   if (prefix && get_super_prefix())
-   die("BUG: cannot have prefix and superprefix");
-   else if (prefix)
-   displaypath = xstrdup(relative_path(path, prefix, ));
-   else if (get_super_prefix()) {
-   strbuf_addf(, "%s%s", get_super_prefix(), path);
-   displaypath = strbuf_detach(, NULL);
-   } else
-   displaypath = xstrdup(path);
+   displaypath = get_submodule_displaypath(path, prefix);
 
sub = submodule_from_path(null_sha1, path);
 
@@ -363,7 +375,6 @@ static void init_submodule(const char *path, const char 
*prefix, int quiet)
 * Set active flag for the submodule being initialized
 */
if (!is_submodule_active(the_repository, path)) {
-   strbuf_reset();
strbuf_addf(, "submodule.%s.active", sub->name);
git_config_set_gently(sb.buf, "true");
}
-- 
2.13.0

[GSoC][PATCH 00/13] Update: Week-11

2017-07-31 Thread Prathamesh Chavan

SUMMARY OF MY PROJECT:

Git submodule subcommands are currently implemented by using shell script
'git-submodule.sh'. There are several reasons why we'll prefer not to
use the shell script. My project intends to convert the subcommands into
C code, thus making them builtins. This will increase Git's portability
and hence the efficiency of working with the git-submodule commands.
Link to the complete proposal: [1]

Mentors:
Stefan Beller 
Christian Couder 

UPDATES:

Following are the updates about my ongoing project:

* Following patches were updated after the previous reviews:
  - set_name_rev()
  submodule subcommands:
  - status
  - sync
  - deinit
  - summary
  - foreach

* Reviews from both Christian Couder  and
  Brandon Williams  helped in improvising these patches
  and their suggestions were implemented.

* Porting of submodule subcommand add is still underway. Its progess can be
  viewed at [2].

PLAN FOR WEEK-12 (1 August 2017 to 7 August 2017):

* summary: One of the problems pointed out by Brandon this week was that
  the function print_submodule_summary() was too big to keep track of
  all the things that are happening. Hence, I will be splitting the
  function into smaller functions.

* displaypath: There is some confusion produced with the way the
  value of displaypath is being generated, using super-prefix. [3]
  Via having discussion on this, I'll try to resolve the issues
  regarding it. In the patches following the update, I have addressed
  this issue as well.

* add: Porting of this subcommand is still underway and will be working
  on to completely port this subcommand.

A complete build report of these series of patches is available at: [4].
Build #145
Branch: week-11

The work is push on github and is available at: [5].

[1]: 
https://docs.google.com/document/d/1krxVLooWl--75Pot3dazhfygR3wCUUWZWzTXtK1
L-xU/
[2]: https://github.com/pratham-pc/git/commits/sub-add
[3]: https://public-inbox.org/git/20170724213028.gb92...@google.com/
[4]: https://travis-ci.org/pratham-pc/git/builds/
[5]: https://github.com/pratham-pc/git/commits/week-11

  submodule--helper: introduce get_submodule_displaypath()
  submodule--helper: introduce for_each_submodule_list()
  submodule: port set_name_rev() from shell to C
  submodule: port submodule subcommand 'status' from shell to C
  submodule: port submodule subcommand 'sync' from shell to C
  submodule: port submodule subcommand 'deinit' from shell to C
  diff: change scope of the function count_lines()
  submodule: port submodule subcommand 'summary' from shell to C
  submodule foreach: correct '$path' in nested submodules from a
subdirectory
  submodule foreach: document '$sm_path' instead of '$path'
  submodule foreach: clarify the '$toplevel' variable documentation
  submodule foreach: document variable '$displaypath'
  submodule: port submodule subcommand 'foreach' from shell to C

 Documentation/git-submodule.txt |   15 +-
 builtin/submodule--helper.c | 1175 ++-
 diff.c  |2 +-
 diff.h  |1 +
 git-submodule.sh|  394 +
 t/t7407-submodule-foreach.sh|   38 +-
 6 files changed, 1207 insertions(+), 418 deletions(-)

-- 
2.13.0

Re: [PATCH 03/15] add, reset: ensure submodules can be added or reset

2017-07-31 Thread Brandon Williams

On 07/26, Junio C Hamano wrote:
> Stefan Beller  writes:
> 
> > On Tue, Jul 25, 2017 at 2:39 PM, Brandon Williams  wrote:
> >> Commit aee9c7d65 (Submodules: Add the new "ignore" config option for
> >> diff and status) ...
> >
> > introduced in 2010, so quite widely spread.
> >
> >> ...  introduced the ignore configuration option for
> >> submodules so that configured submodules could be omitted from the
> >> status and diff commands.  Because this flag is respected in the diff
> >> machinery it has the unintended consequence of potentially prohibiting
> >> users from adding or resetting a submodule, even when a path to the
> >> submodule is explicitly given.
> >>
> >> Ensure that submodules can be added or set, even if they are configured
> >> to be ignored, by setting the `DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG` diff
> >> flag.
> >>
> >> Signed-off-by: Brandon Williams 
> >> ---
> >>  builtin/add.c   | 1 +
> >>  builtin/reset.c | 1 +
> >>  2 files changed, 2 insertions(+)
> >>
> >> diff --git a/builtin/add.c b/builtin/add.c
> >> index e888fb8c5..6f271512f 100644
> >> --- a/builtin/add.c
> >> +++ b/builtin/add.c
> >> @@ -116,6 +116,7 @@ int add_files_to_cache(const char *prefix,
> >> rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
> >> rev.diffopt.format_callback = update_callback;
> >> rev.diffopt.format_callback_data = 
> >> +   rev.diffopt.flags |= DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG;
> >
> >
> > This flag occurs once in the code base, with the comment:
> > /*
> >  * Unless the user did explicitly request a submodule
> >  * ignore mode by passing a command line option we do
> >  * not ignore any changed submodule SHA-1s when
> >  * comparing index and parent, no matter what is
> >  * configured. Otherwise we won't commit any
> >  * submodules which were manually staged, which would
> >  * be really confusing.
> >  */
> > int diff_flags = DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG;
> >
> > in prepare_commit, so commit ignores the .gitmodules file.
> >
> > This allows git-add to add ignored submodules, currently ignored submodules
> > would have to be added using the plumbing
> > git update-index --add --cacheinfo 16,$SHA1,
> 
> Let me play devil's advocate (as I have this suspicion that .ignore
> thing specific for submodule is probably misdesigned and certainly
> its implementation is backwards).  Is the primary use case for this
> .ignore thing to be able to do
> 
>   git add .
> 
> without having to worry about adding the submodule marked as such?  
> And if so, wouldn't it surprise these users who do use .ignore if
> "git add" suddenly started adding them?
> 
> I think the right tool to use these days for excluding some paths
> when adding all others is the negative pathspec; perhaps back when
> the .ignore thing was added, it didn't exist or not widely known?  
> 
> I suspect that it may result in a better system overall if we can
> deprecate and remove the submodule-specific .ignore thing.  At
> least, I think the DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG is backwards
> in that .ignore causes a submodule to be excluded from the diff by
> default and forces paths that care about differences to opt into the
> "override" thing, which is wrong---the specific UI thing that wants
> not to show them should instead opt into ignoring, while keeping the
> default not to special case such a flag that can only be set to a
> submodule path.

It looks like .ignore was added with aee9c7d65 (Submodules: Add the new
"ignore" config option for diff and status, 2010-08-06) in order to
ignore particular submodules with 'status' and 'diff' commands.  I don't
think it was intended to ignore submodules with commands like add and
reset.  Either way I agree that some of the things with most of the
submodules config seem a bit backwards and we may want to migrate away
from them completely as we begin to add more support for submodules into
the builtin commands.

> 
> > This makes sense, though a test demonstrating the change in behavior
> > would be nice, but git-add doesn't seem to change as it doesn't even load
> > the git modules config?
> >
> >> rev.max_count = 0; /* do not compare unmerged paths with stage #2 
> >> */
> >> run_diff_files(, DIFF_RACY_IS_MODIFIED);
> >> return !!data.add_errors;
> >> diff --git a/builtin/reset.c b/builtin/reset.c
> >> index 046403ed6..772d078b8 100644
> >> --- a/builtin/reset.c
> >> +++ b/builtin/reset.c
> >> @@ -156,6 +156,7 @@ static int read_from_tree(const struct pathspec 
> >> *pathspec,
> >> opt.output_format = DIFF_FORMAT_CALLBACK;
> >> opt.format_callback = update_index_from_diff;
> >> opt.format_callback_data = _to_add;
> >> +   opt.flags |= DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG;
> >
> > same here? Also as this is not failing any test, it may be worth adding one
> > to document the behavior of the

Re: [PATCH 02/15] submodule: don't use submodule_from_name

2017-07-31 Thread Stefan Beller

On Sun, Jul 30, 2017 at 6:43 AM, Jens Lehmann  wrote:
> Am 26.07.2017 um 23:06 schrieb Junio C Hamano:
>>
>> Stefan Beller  writes:
>>
>>> Rereading the archives, there was quite some discussion on the design
>>> of these patches, but these lines of code did not get any attention
>>>
>>>  https://public-inbox.org/git/4cdb3063.5010...@web.de/
>>>
>>> I cc'd Jens in the hope of him having a good memory why he
>>> wrote the code that way. :)
>>
>>
>> Thanks for digging.  I wouldn't be surprised if this were a fallback
>> to help a broken entry in .gitmodules that lack .path variable, but
>> we shouldn't be sweeping the problem under the rug like that.
>
>
> Sorry to disappoint you ;-) I added this in 7dce19d374 because
> submodule by path lookup back then only parsed the checked out
> .gitmodules file.

This is still the case AFAICT, as we never ask for a specific .gitmodules
file identified by sha1 of the commit.

> So looking for it by name was a good guess to
> fetch a new submodule that wasn't present in the current HEAD's
> .gitmodules, as the path is used as the default name in "git
> submodule add".

3 things:
a) I think it is not as much a feature ('fallback to still make it work'),
   but rather a bug as when there is no (or wrong) entry in the .gitmodules
   file, reporting it is better than trying something.
b) in the case of moved submodules (2 submodules swapped their path)
   this may be harmful as we'd get a wrong submodule potentially.

c) I wonder if we want to use a different default for submodule names
   as I have seen people get confused by path and name being the same,
   e.g. to move a submodule they would have not just adapted the path,
   but any occurrence of the string that reads like the path.
   (i.e. also change the name, defeating the purpose of name/path
   separation).

   For a new name default, I would wager for some non-legible gibberish
   such as "hash( path/time )", as that sends a clear message to not mess
   with the value of the name.

>
> The refactoring in 851e18c385 could and should have removed that
> because since then we use the .gitmodules path to name mapping
> of the fetched commit.
>
>> I wonder if we should barf loudly if there shouldn't be a submodule
>> at that path, i.e.
>>
>> if (!submodule)
>> die("there is no submodule defined for path '%s'"...);
>>
>> though.
>
>
> Not sure if you want to die() or just issue a warning(), but yes.

Either die() or "warning && return 0" is fine with me.

Re: [PATCH 1/2] doc: fix small issues in SubmittingPatches

2017-07-31 Thread Junio C Hamano

Stefan Beller  writes:

>> Further, remove duplicated space character.
>
> https://en.wikipedia.org/wiki/Sentence_spacing
> seems like a globally controversial thing. (I assumed
> it was some sort of local dialect before researching
> it properly)
>
> I personally do not mind one way or another regarding
> (double) spaces after a period, but I would think we'd
> strive for consistency throughout the project.

I am not sure if that is something we want to encourage newbies to
be doing.  Especially a patch like this (notice the double-space
before "Go back to..." in the pre-context) makes me feel it is
distracting without adding much "consistency" value.

And no, I am not suggesting a tree-wide sweep to make everything
consistent.

>>   spend their time to improve your patch.  Go back to step (2).
>>
>>   (4) The list forms consensus that the last round of your patch is
>> - good.  Send it to the maintainer and cc the list.
>> + good. Send it to the maintainer and cc the list.
>>
>>   (5) A topic branch is created with the patch and is merged to 'next',
>>   and cooked further and eventually graduates to 'master'.
>> --
>> 2.14.0.rc1.434.g6eded367a
>>

Re: [PATCH 2/2] t6500: mark tests as SHA1 reliant

2017-07-31 Thread Stefan Beller

On Mon, Jul 31, 2017 at 1:17 PM, Ævar Arnfjörð Bjarmason
 wrote:
> On Mon, Jul 31, 2017 at 1:24 AM, brian m. carlson
>  wrote:
>> On Sun, Jul 30, 2017 at 11:00:19PM +, brian m. carlson wrote:
>>> Yes, basically, but a bit more generally.  There will always be cases in
>>> which we need to specify an object ID or an arbitrary string and the
>>> behavior will need to vary based on the hash.  That can be something
>>> like, in this case, the two blob contents that would have the similar
>>> prefix.
>>>
>>> So in this case, we pass the helper the string "263 410" and get back a
>>> value for either the hacked SHA-1 hash or the SHA-256 or whatever we're
>>> using.
>>
>> I realize this was worded poorly.  So for my example, in this case, we'd
>> do:
>>
>> test-helper-hash-string "263 410"
>>
>> For SHA-1, we'd get "263 410".  For SHA-256, we'd get "313 481" (which,
>> as SHA-256 blobs, both start with "17" in their hex representation).
>> Presumably we'd read some environment variable to determine the proper
>> value.
>
> I've been mostly out of the loop on this hash transition plan, but
> don't we expect to be compiling a git that knows about both SHA-1 and
> whatever the $newhash is?

That is my understanding as well.

> If so it seems better to just test all N
> hashes we have:
>
> test_expect_success_hash $desc_description '
> hash_value=$(test-helper-hash-string $CURRENT_HASH)
> 
> '
>
> Then test_expect_success_hash would run N times for the N hashes we have.

I think that is just adding more workload without furthering the stated goal
which is usually reached with just one hash function. The tests we're talking
about here are not trying to test correctness of hashes but some other
functionality
(correct abbreviation length, collisions in prefix, etc.) that would not change
depending on the hash function used, I imagine.

For t we want to have multiple versions, one for each hash.

> This would obviously be slightly more hassle to write & convert, but I
> think it would be worth it, particularly with something like Travis
> where we can test all hashes, instead of being in some mode where we
> fragment on all of hashes/gettext poison and whatever other
> compilation option we have that really requires compiling a new git
> version...

Re: [GSoC][PATCH v2 00/13] Update: Week 10

2017-07-31 Thread Brandon Williams

On 07/30, Prathamesh Chavan wrote:
> Thank you Brandon Williams  for reviewing the previous
> patch series.
> Also, I'm sorry for repling late to your reviews. The main reason was
> to give sufficient time to prepare the next version of each patch as
> suggested.

No worries, things take time.  That and I was busy most of last week
anyway :)

> The changes made in each patch are enlisted in the patch itself.
> 
> Complete build report of this work is available at: [1]
> Branch: week-10
> Build #142
> 
> Also, I have push the work on github as well and can be checked out at: [2]
> 
> [1]: https://travis-ci.org/pratham-pc/git/builds
> [2]: https://github.com/pratham-pc/git/commits/week-10
> 
> Prathamesh Chavan (13):
>   submodule--helper: introduce get_submodule_displaypath()
>   submodule--helper: introduce for_each_submodule_list()
>   submodule: port set_name_rev() from shell to C
>   submodule: port submodule subcommand 'status' from shell to C
>   submodule: port submodule subcommand 'sync' from shell to C
>   submodule: port submodule subcommand 'deinit' from shell to C
>   diff: change scope of the function count_lines()
>   submodule: port submodule subcommand 'summary' from shell to C
>   submodule foreach: correct '$path' in nested submodules from a
> subdirectory
>   submodule foreach: document '$sm_path' instead of '$path'
>   submodule foreach: clarify the '$toplevel' variable documentation
>   submodule foreach: document variable '$displaypath'
>   submodule: port submodule subcommand 'foreach' from shell to C
> 
>  Documentation/git-submodule.txt |   15 +-
>  builtin/submodule--helper.c | 1186 
> ++-
>  diff.c  |2 +-
>  diff.h  |1 +
>  git-submodule.sh|  394 +
>  t/t7407-submodule-foreach.sh|   38 +-
>  6 files changed, 1218 insertions(+), 418 deletions(-)
> 
> -- 
> 2.13.0
> 

-- 
Brandon Williams

Re: [PATCH 2/2] t6500: mark tests as SHA1 reliant

2017-07-31 Thread Stefan Beller

On Sun, Jul 30, 2017 at 4:24 PM, brian m. carlson
 wrote:
> On Sun, Jul 30, 2017 at 11:00:19PM +, brian m. carlson wrote:
>> Yes, basically, but a bit more generally.  There will always be cases in
>> which we need to specify an object ID or an arbitrary string and the
>> behavior will need to vary based on the hash.  That can be something
>> like, in this case, the two blob contents that would have the similar
>> prefix.
>>
>> So in this case, we pass the helper the string "263 410" and get back a
>> value for either the hacked SHA-1 hash or the SHA-256 or whatever we're
>> using.
>
> I realize this was worded poorly.  So for my example, in this case, we'd
> do:
>
> test-helper-hash-string "263 410"
>
> For SHA-1, we'd get "263 410".  For SHA-256, we'd get "313 481" (which,
> as SHA-256 blobs, both start with "17" in their hex representation).
> Presumably we'd read some environment variable to determine the proper
> value.

This is what Junio proposed in the first message, except that we defer that
to a shell script as for each test we may need different things, so a helper may
be of little value?

Re: [PATCH 1/2] doc: fix small issues in SubmittingPatches

2017-07-31 Thread Stefan Beller

On Sun, Jul 30, 2017 at 9:18 AM, Kaartic Sivaraam
 wrote:
> Replace the dashed version of a command with undashed
> version and quote it.

I like it, but similar as below, we'd want to go for
consistency.

>
> Further, remove duplicated space character.

https://en.wikipedia.org/wiki/Sentence_spacing
seems like a globally controversial thing. (I assumed
it was some sort of local dialect before researching
it properly)

I personally do not mind one way or another regarding
(double) spaces after a period, but I would think we'd
strive for consistency throughout the project.

>
> Signed-off-by: Kaartic Sivaraam 
> ---
>  Documentation/SubmittingPatches | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
> index 558d465b6..9d0dab08d 100644
> --- a/Documentation/SubmittingPatches
> +++ b/Documentation/SubmittingPatches
> @@ -293,7 +293,7 @@ then you just add a line saying
>
>  Signed-off-by: Random J Developer 
>
> -This line can be automatically added by Git if you run the git-commit
> +This line can be automatically added by Git if you run the 'git commit'
>  command with the -s option.
>
>  Notice that you can place your own Signed-off-by: line when
> @@ -366,7 +366,7 @@ suggests to the contributors:
>   spend their time to improve your patch.  Go back to step (2).
>
>   (4) The list forms consensus that the last round of your patch is
> - good.  Send it to the maintainer and cc the list.
> + good. Send it to the maintainer and cc the list.
>
>   (5) A topic branch is created with the patch and is merged to 'next',
>   and cooked further and eventually graduates to 'master'.
> --
> 2.14.0.rc1.434.g6eded367a
>

Re: [PATCH 2/2] t6500: mark tests as SHA1 reliant

2017-07-31 Thread Ævar Arnfjörð Bjarmason

On Mon, Jul 31, 2017 at 1:24 AM, brian m. carlson
 wrote:
> On Sun, Jul 30, 2017 at 11:00:19PM +, brian m. carlson wrote:
>> Yes, basically, but a bit more generally.  There will always be cases in
>> which we need to specify an object ID or an arbitrary string and the
>> behavior will need to vary based on the hash.  That can be something
>> like, in this case, the two blob contents that would have the similar
>> prefix.
>>
>> So in this case, we pass the helper the string "263 410" and get back a
>> value for either the hacked SHA-1 hash or the SHA-256 or whatever we're
>> using.
>
> I realize this was worded poorly.  So for my example, in this case, we'd
> do:
>
> test-helper-hash-string "263 410"
>
> For SHA-1, we'd get "263 410".  For SHA-256, we'd get "313 481" (which,
> as SHA-256 blobs, both start with "17" in their hex representation).
> Presumably we'd read some environment variable to determine the proper
> value.

I've been mostly out of the loop on this hash transition plan, but
don't we expect to be compiling a git that knows about both SHA-1 and
whatever the $newhash is? If so it seems better to just test all N
hashes we have:

test_expect_success_hash $desc_description '
hash_value=$(test-helper-hash-string $CURRENT_HASH)

'

Then test_expect_success_hash would run N times for the N hashes we have.

This would obviously be slightly more hassle to write & convert, but I
think it would be worth it, particularly with something like Travis
where we can test all hashes, instead of being in some mode where we
fragment on all of hashes/gettext poison and whatever other
compilation option we have that really requires compiling a new git
version...

Re: [PATCH v2 00/10] tag: only respect `pager.tag` in list-mode

2017-07-31 Thread Junio C Hamano

Jeff King  writes:

> On Mon, Jul 31, 2017 at 07:44:27PM +0200, Martin Ågren wrote:
>
>> On 31 July 2017 at 05:46, Jeff King  wrote:
>> > On Sun, Jul 23, 2017 at 08:17:42PM +0200, Martin Ågren wrote:
>> >
>> >> On 21 July 2017 at 00:27, Junio C Hamano  wrote:
>> >> > I tend to agree with you that 1-3/10 may be better off being a
>> >> > single patch (or 3/10 dropped, as Brandon is working on losing it
>> >> > nearby).  I would have expected 7-8/10 to be a single patch, as by
>> >> > the time a reader reaches 07/10, because of the groundwork laid by
>> >> > 04-06/10, it is obvious that the general direction is to allow the
>> >> > caller, i.e. cmd_tag(), to make a call to setup_auto_pager() only in
>> >> > some but not all circumstances, and 07/10 being faithful to the
>> >> > original behaviour (only to be updated in 08/10) is somewhat counter
>> >> > intuitive.  It is not wrong per-se; it was just unexpected.
>> >>
>> >> Thanks for your comments. I will be away for a few days, but once I
>> >> get back, I'll try to produce a v3 based on this and any further
>> >> feedback.
>> >
>> > Overall it looks good to me. I left a few minor comments.
>> >
>> > I actually like the split. I found it pretty easy to follow (though
>> > squashing as Junio suggested would be fine with me, too).
>> 
>> I assume that by "the split" you mean patches 7-8, not 1-3.  Anyway,
>> I'll squash since you're fine with it and Junio prefers it.
>
> I actually meant both, but as I said, I'm OK with it either way.

I am OK with it either way, too ;-)

Re: [L10N] Kickoff of translation for Git 2.14.0 round 2

2017-07-31 Thread Junio C Hamano

Jiang Xin  writes:

> In the last round of l10n, some l10n messages from "date.c" are
> disappeared because of the l10n unfriendly PRItime macro.  This issue
> has been fixed by commit fc0fd5b23b (Makefile: help gettext tools to
> cope with our custom PRItime format), so let's start new round of l10n
> based on the new generated "po/git.pot" file.
>
> This time there are 9 updated messages need to be translated since last
> update:
>
> l10n: git.pot: v2.14.0 round 2 (9 new, 2 removed)
>
> Generate po/git.pot from v2.14.0-rc0-40-g5eada8987e for git
> v2.14.0 l10n round 2.
>
> Signed-off-by: Jiang Xin 
>
> You can get it from the usual place:
>
> https://github.com/git-l10n/git-po/
>
> As how to update your XX.po and help to translate Git, please see
> "Updating a XX.po file" and other sections in “po/README" file.

As there is only one patch 9e7d8a9b ("blame: fix memory corruption
scrambling revision name in error message", 2017-07-24) that needs
to go to 'master', and there is nothing in that patch that requires
further translation tweaks, I am postponing -rc2 which was scheduled
to happen today.  Hopefully we can include the i18n/l10n updates
that way in the release candidate.

Please throw me a pull request when you are ready in the next few
days.  Thanks.

Re: reftable [v4]: new ref storage format

2017-07-31 Thread Junio C Hamano

Shawn Pearce  writes:

> ### Peeling
>
> References in a reftable are always peeled.

This hopefully means "a record for an annotated (or signed) tag
records both the tag object and the object it refers to", and does
not include peeling a commit down to its tree.

> ### Reference name encoding
>
> Reference names should be encoded with UTF-8.

If we limited ourselves to say that a refname is an uninterpreted
sequence of bytes that must pass check_refname_format() test, then
we won't open us to confusion such as "are two refs with the same
Unicode name encoded with different normalization form considered
the same?"

In what way does this "should be in UTF-8" help describing the file
format, I wonder.

> ### Directory/file conflicts
>
> The reftable format accepts both `refs/heads/foo` and
> `refs/heads/foo/bar` as distinct references.
>
> This property is useful for retaining log records in reftable, but may
> confuse versions of Git using `$GIT_DIR/refs` directory tree to
> maintain references.  Users of reftable may choose to continue to
> reject `foo` and `foo/bar` type conflicts to prevent problems for
> peers.

This is an interesting one.  I do agree that preserving reflog for
removed refs is a nice propertly to have.

> ### First ref block
>
> The first ref block shares the same block as the file header, and is
> 24 bytes smaller than all other blocks in the file.  The first block
> immediately begins after the file header, at offset 24.
>
> If the first block is a log block (a log only table), its block header
> begins immediately at offset 24.

A minor nit: You called such a file "a log-only file"; let's be consistent.

>
> ### Ref block format
>
> A ref block is written as:
>
> 'r'
> uint24( block_len )
> ref_record+
> uint32( restart_offset )+
> uint16( restart_count )
> padding?
>
> Blocks begin with `block_type = 'r'` and a 3-byte `block_len` which
> encodes the number of bytes in the block up to, but not including the
> optional `padding`.  This is almost always shorter than the file's
> `block_size`.  In the first ref block, `block_len` includes 24 bytes
> for the file header.

As a block cannot be longer than 16MB, allocating uint32 to a
restart offset may be a bit overkill.  I do not know if it is worth
attempting to pack 1/3 more restart entries in a block by using
uint24, though.

> The end of the block may be filled with `padding` NUL bytes to fill
> out the block to the common `block_size` as specified in the file
> header.  Padding may be necessary to ensure the following block starts
> at a block alignment, and does not spill into the tail of this block.
> Padding may be omitted if the block is the last block of the file, and
> there is no index block.  This allows reftable to efficiently scale
> down to a small number of refs.

We may want to phrase it in a way that it is more clear that
padding, if exists, must be filled with NUL bytes, not arbitrary
garbage.  Your version may be clear enough already.  I dunno.

>  ref record
>
> A `ref_record` describes a single reference, storing both the name and
> its value(s). Records are formatted as:
>
> varint( prefix_length )

Just like we saw that "uintX are network byte order" upfront, it may
be easier to give the definition, or at least an outline, of varint()
near there.

> varint( (suffix_length << 3) | value_type )
> suffix
> value?
>
> The `prefix_length` field specifies how many leading bytes of the
> prior reference record's name should be copied to obtain this
> reference's name.  This must be 0 for the first reference in any
> block, and also must be 0 for any `ref_record` whose offset is listed
> in the `restart_offset` table at the end of the block.
>
> Recovering a reference name from any `ref_record` is a simple concat:
>
> this_name = prior_name[0..prefix_length] + suffix
>
> The `suffix_length` value provides the number of bytes to copy from
> `suffix` to complete the reference name.
>
> The `value` follows.  Its format is determined by `value_type`, one of
> the following:
>
> - `0x0`: deletion; no value data (see transactions, below)
> - `0x1`: one 20-byte object id; value of the ref
> - `0x2`: two 20-byte object ids; value of the ref, peeled target
> - `0x3`: symref and text: `varint( text_len ) text`
> - `0x4`: index record (see below)
> - `0x5`: log record (see below)
>
> Symbolic references use `0x3` with a `text` string starting with `"ref: "`,
> followed by the complete name of the reference target.  No
> compression is applied to the target name.  Other types of contents
> that are also reference like, such as `FETCH_HEAD` and `MERGE_HEAD`,
> may also be stored using type `0x3`.
>
> Types `0x6..0x7` are reserved for future use.

I wondered if we regret the apparent limited extensibility later,
but giving 4 bits to value-type would limit suffix length that can
be represented by a single varint() only to 7, while the format
described would give us

Re: reftable [v4]: new ref storage format

2017-07-31 Thread Stefan Beller

On Sun, Jul 30, 2017 at 8:51 PM, Shawn Pearce  wrote:
> 4th iteration of the reftable storage format.
>
> You can read a rendered version of this here:
> https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md
>
> Significant changes from v3:
> - Incorporated Michael Haggerty's update_index concept for reflog.
> - Explicitly documented log-only tables.

I have read through v4 and I was missing the rationale
to why this is a good idea, digging up the discussion for
v3 seems to indicate that reflogs and refs themselves have
different usage patterns such that different compaction patterns
are desired, hence we need to have different files for them.

> ### Ref block format
>
> A ref block is written as:
>
> 'r'
> uint24( block_len )
> ref_record+
> uint32( restart_offset )+

As the block_size is encoded in uint24, (and so is block_len),
the restart offsets could be made uint24 as well, though that may
have alignment issues, such that reading may be slower.
But as the ref_records may produce unaligned 32 bit ints
already, I am not worried about that.

> uint16( restart_count )

When looking for 16/32/64 bit hard coded ints I came across
this one once again. How much more expensive is reading
a varint? As the block_len points at the restart_count, we cannot
replace it with a varint easily, but we could use a byte-reversed
varint instead. If we do this step, all restart offsets could also be
(reverse) varints?

Re: [PATCH v2 06/10] t7006: add tests for how git tag paginates

2017-07-31 Thread Martin Ågren

On 31 July 2017 at 18:37, Junio C Hamano  wrote:
> Jeff King  writes:
>
>> But here...
>>
>>> +test_expect_success TTY 'git tag -a respects pager.tag' '
>>> +test_when_finished "git tag -d newtag" &&
>>> +rm -f paginated.out &&
>>> +test_terminal git -c pager.tag tag -am message newtag &&
>>> +test -e paginated.out
>>> +'
>>
>> I think this behavior is just buggy, and it might be better introduced
>> as a test_expect_failure on "git tag -a does not respect pager.tag".
>>
>> Kind of a minor nit, as the series should end up in the right place
>> either way, but it can be helpful if you end up digging back in history
>> to the introduction of the test.
>
> Yes, I think that is essentially the same reaction I had to patches
> 7 and 8, where it carries the "buggy" behaviour forward and then
> fixes it.  The way the series lays groundwork to introduce a
> mechanism that can be used to address this behaviour in its earlier
> patches strongly suggests to the users that this is considered a bug
> by the author of the series to the user from early on, so adding
> this as "expect failure" and then flip it to "expect success" when
> the bug is fixed would be a more natural sequence of changes.

Thanks both for very helpful comments. I admit I viewed it less as
"fix buggy behavior" and more like "redefine wanted behavior". So I
wanted to postpone the redefinition of the behavior until all the
restructuring was done. Looking at this as a bug-fix does make
carefully moving the bug forward look rather silly.

I haven't responded to each of your suggestions individually where the
answers would have been a mere "thanks, will do". They're still much
appreciated and will help make v3 much better. Thanks.

Martin

Re: [PATCH v2 10/10] git.c: ignore pager.* when launching builtin as dashed external

2017-07-31 Thread Martin Ågren

On 31 July 2017 at 05:45, Jeff King  wrote:
> On Mon, Jul 17, 2017 at 10:10:52PM +0200, Martin Ågren wrote:
>
>> One could address this in run_argv(), by making the second call to
>> execv_dashed_external() conditional on "!is_builtin()" whereas a builtin
>> would be started as "git foo". (Possibly after unrolling and cleaning up
>> the "while (1)"-loop.) That seems like the wrong fix for this particular
>> issue, but might be a wanted change on its own -- or maybe not --, since
>> it would mean one could relay, e.g., "-c baz" to "git -c baz foo" (but
>> only for builtins...).
>
> We shouldn't need to relay them. They get added to the environment by
> the initial "git" invocation, and then are available everywhere (in
> fact, it would be wrong to relay them for multi-valued config).

Thanks for explaining. I did some very sloppy reading of the comment
in git.c that we "cannot take flags in between the 'git' and the
''" which I somehow misunderstood completely as "we cannot pass
that sort of information to git-". Silly. Thanks for taking the
time to explain what I should have found out myself...

So yeah, I meant the above and not this:

> Or did
> you mean that we could potentially allow:
>
>   [alias]
>   foo = "-c baz some-builtin"
>
> That's interesting, but I think the fact that it only works with
> builtins makes it a bad idea. And you can always do:
>
>   [alias]
>   foo = "!git -c baz some-builtin"
>
> which is equivalent.

Re: [PATCH v2 00/10] tag: only respect `pager.tag` in list-mode

2017-07-31 Thread Martin Ågren

On 31 July 2017 at 05:46, Jeff King  wrote:
> On Sun, Jul 23, 2017 at 08:17:42PM +0200, Martin Ågren wrote:
>
>> On 21 July 2017 at 00:27, Junio C Hamano  wrote:
>> > I tend to agree with you that 1-3/10 may be better off being a
>> > single patch (or 3/10 dropped, as Brandon is working on losing it
>> > nearby).  I would have expected 7-8/10 to be a single patch, as by
>> > the time a reader reaches 07/10, because of the groundwork laid by
>> > 04-06/10, it is obvious that the general direction is to allow the
>> > caller, i.e. cmd_tag(), to make a call to setup_auto_pager() only in
>> > some but not all circumstances, and 07/10 being faithful to the
>> > original behaviour (only to be updated in 08/10) is somewhat counter
>> > intuitive.  It is not wrong per-se; it was just unexpected.
>>
>> Thanks for your comments. I will be away for a few days, but once I
>> get back, I'll try to produce a v3 based on this and any further
>> feedback.
>
> Overall it looks good to me. I left a few minor comments.
>
> I actually like the split. I found it pretty easy to follow (though
> squashing as Junio suggested would be fine with me, too).

I assume that by "the split" you mean patches 7-8, not 1-3.  Anyway,
I'll squash since you're fine with it and Junio prefers it.

Martin

Re: [PATCH v2 00/10] tag: only respect `pager.tag` in list-mode

2017-07-31 Thread Jeff King

On Mon, Jul 31, 2017 at 07:44:27PM +0200, Martin Ågren wrote:

> On 31 July 2017 at 05:46, Jeff King  wrote:
> > On Sun, Jul 23, 2017 at 08:17:42PM +0200, Martin Ågren wrote:
> >
> >> On 21 July 2017 at 00:27, Junio C Hamano  wrote:
> >> > I tend to agree with you that 1-3/10 may be better off being a
> >> > single patch (or 3/10 dropped, as Brandon is working on losing it
> >> > nearby).  I would have expected 7-8/10 to be a single patch, as by
> >> > the time a reader reaches 07/10, because of the groundwork laid by
> >> > 04-06/10, it is obvious that the general direction is to allow the
> >> > caller, i.e. cmd_tag(), to make a call to setup_auto_pager() only in
> >> > some but not all circumstances, and 07/10 being faithful to the
> >> > original behaviour (only to be updated in 08/10) is somewhat counter
> >> > intuitive.  It is not wrong per-se; it was just unexpected.
> >>
> >> Thanks for your comments. I will be away for a few days, but once I
> >> get back, I'll try to produce a v3 based on this and any further
> >> feedback.
> >
> > Overall it looks good to me. I left a few minor comments.
> >
> > I actually like the split. I found it pretty easy to follow (though
> > squashing as Junio suggested would be fine with me, too).
> 
> I assume that by "the split" you mean patches 7-8, not 1-3.  Anyway,
> I'll squash since you're fine with it and Junio prefers it.

I actually meant both, but as I said, I'm OK with it either way.

-Peff

Re: reftable [v4]: new ref storage format

2017-07-31 Thread Dave Borowitz

On Sun, Jul 30, 2017 at 11:51 PM, Shawn Pearce  wrote:
> - Near constant time verification a SHA-1 is referred to by at least
>   one reference (for allow-tip-sha1-in-want).

I think I understated the importance of this when I originally brought
up allow-tip-sha1-in-want. This is an important optimization for *any*
HTTP server, even without allow-tip-sha1-in-want, in order to validate
the SHA-1s sent in the upload-pack request, which doesn't share memory
state with the /info/refs request processing.

Re: [PATCH] Fix minor typo in git-diff docs.

2017-07-31 Thread Anthony Sottile

Thanks!

I'll keep this in mind next time I send a patch.

Anthony

On Mon, Jul 31, 2017 at 9:59 AM, Junio C Hamano  wrote:
> Anthony Sottile  writes:
>
>> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
>> index 89cc0f4..43d18a4 100644
>> --- a/Documentation/diff-options.txt
>> +++ b/Documentation/diff-options.txt
>> @@ -392,7 +392,7 @@ endif::git-log[]
>> the diff between the preimage and `/dev/null`. The resulting patch
>> is not meant to be applied with `patch` or `git apply`; this is
>> solely for people who want to just concentrate on reviewing the
>> -   text after the change. In addition, the output obviously lack
>> +   text after the change. In addition, the output obviously lacks
>> enough information to apply such a patch in reverse, even manually,
>> hence the name of the option.
>>  +
>
> Another thing that is more severe.  You seem to have replaced all
> leading tabs with whitespaces, which makes the patch unusable.  For
> this single character patch, I can pretend as if I applied your
> patch while making the fix myself in my editor, so there is no need
> to resend, but please make sure your e-mail client does not do that
> the next time.
>
> Thanks.  Queued.

git checkout-index --all --force does not restore all files when `core.autocrlf` is set to `input`

2017-07-31 Thread Anthony Sottile

I'm not sure if this is a bug or the intended behaviour.

Here's my minimal reproduction (using python3 to write files so I can
control line endings)

```
#!/bin/bash
set -ex

rm -rf repo
git init repo
cd repo

git config --local core.autocrlf input

python3 -c 'open("foo", "wb").write(b"1\r\n2\r\n")'
git add foo
python3 -c 'open("foo", "wb").write(b"3\r\n4\r\n")'

git checkout-index --all --force
echo 'I expect this `git status` to have no modifications'
git status
```

Here's the output:

```
+ rm -rf repo
+ git init repo
Initialized empty Git repository in /tmp/foo/repo/.git/
+ cd repo
+ git config --local core.autocrlf input
+ python3 -c 'open("foo", "wb").write(b"1\r\n2\r\n")'
+ git add foo
warning: CRLF will be replaced by LF in foo.
The file will have its original line endings in your working directory.
+ python3 -c 'open("foo", "wb").write(b"3\r\n4\r\n")'
+ git checkout-index --all --force
+ echo 'I expect this `git status` to have no modifications'
I expect this `git status` to have no modifications
+ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached ..." to unstage)

new file:   foo

Changes not staged for commit:
  (use "git add ..." to update what will be committed)
  (use "git checkout -- ..." to discard changes in working directory)

modified:   foo

```

In this state, `git diff` and `git diff-index` disagree as well:

```
$ git diff-index --exit-code $(git write-tree) --patch; echo $?
1
$ git diff --exit-code; echo $?
0
```

I expect the plumbing command `checkout-index -af` to exactly restore
the disk state to the index such that `git status`, and `git
diff-index` both indicate there are no changes.

Interestingly, `git checkout -- .` does exactly this, but it is a
porcelain command and not suitable for scripting.  Alternatively, I'm
looking for an equivalent to `git checkout -- .` which uses only
plumbing commands.

Thanks,

Anthony

Re: [PATCH] Fix minor typo in git-diff docs.

2017-07-31 Thread Junio C Hamano

Anthony Sottile  writes:

> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index 89cc0f4..43d18a4 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -392,7 +392,7 @@ endif::git-log[]
> the diff between the preimage and `/dev/null`. The resulting patch
> is not meant to be applied with `patch` or `git apply`; this is
> solely for people who want to just concentrate on reviewing the
> -   text after the change. In addition, the output obviously lack
> +   text after the change. In addition, the output obviously lacks
> enough information to apply such a patch in reverse, even manually,
> hence the name of the option.
>  +

Another thing that is more severe.  You seem to have replaced all
leading tabs with whitespaces, which makes the patch unusable.  For
this single character patch, I can pretend as if I applied your
patch while making the fix myself in my editor, so there is no need
to resend, but please make sure your e-mail client does not do that
the next time.

Thanks.  Queued.

Re: [PATCH] Fix minor typo in git-diff docs.

2017-07-31 Thread Junio C Hamano

Anthony Sottile  writes:

> To be honest, I'm a bit overwhelmed by the documentation for submitting a 
> patch!
>
> I tried to follow as best I could, here's my attempt (please advise).

OK ;-)  Thanks for a patch.  Let's nitpick.

Our titles (your "Subject:" line in the e-mail header) state the
area, colon and then a one line summary and ends without the final
full stop, e.g.

Subject: [PATCH] diff-options: grammar fix

>
> From e88ad689a7587c11f270a10f191a3b6bc52a90d4 Mon Sep 17 00:00:00 2001
> From: Anthony Sottile 
> Date: Mon, 31 Jul 2017 06:54:14 -0700
> Subject: [PATCH] Fix minor typo in git-diff docs.

Generally these do not belong to the body of the message.  

What object name the commit has locally in your repository is
immaterial.  Whose change it is, when it was shown to the general
public for the first time, and what the title of the change is, are
all in the e-mail header of your message.

One exception is when you cannot convince your e-mail client to put
the name under which you sign-off the patch on the "From: " line in
your e-mail header, or you are relaying somebody else's patch.  You
can then have "From: " line to record the right name, a blank line
and then write body of the message.

>
> Signed-off-by: Anthony Sottile 

The space above "Signed-off-by: " is to describe what the change is
about, when the title is not clear enough, but I think there is no
need for such extra explanation for this particular patch.

Thanks for correcting my grammar.  Will queue with a tweaked title.

> ---
>  Documentation/diff-options.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index 89cc0f4..43d18a4 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -392,7 +392,7 @@ endif::git-log[]
> the diff between the preimage and `/dev/null`. The resulting patch
> is not meant to be applied with `patch` or `git apply`; this is
> solely for people who want to just concentrate on reviewing the
> -   text after the change. In addition, the output obviously lack
> +   text after the change. In addition, the output obviously lacks
> enough information to apply such a patch in reverse, even manually,
> hence the name of the option.
>  +

Re: [PATCH v2 06/10] t7006: add tests for how git tag paginates

2017-07-31 Thread Junio C Hamano

Jeff King  writes:

> But here...
>
>> +test_expect_success TTY 'git tag -a respects pager.tag' '
>> +test_when_finished "git tag -d newtag" &&
>> +rm -f paginated.out &&
>> +test_terminal git -c pager.tag tag -am message newtag &&
>> +test -e paginated.out
>> +'
>
> I think this behavior is just buggy, and it might be better introduced
> as a test_expect_failure on "git tag -a does not respect pager.tag".
>
> Kind of a minor nit, as the series should end up in the right place
> either way, but it can be helpful if you end up digging back in history
> to the introduction of the test.

Yes, I think that is essentially the same reaction I had to patches
7 and 8, where it carries the "buggy" behaviour forward and then
fixes it.  The way the series lays groundwork to introduce a
mechanism that can be used to address this behaviour in its earlier
patches strongly suggests to the users that this is considered a bug
by the author of the series to the user from early on, so adding
this as "expect failure" and then flip it to "expect success" when
the bug is fixed would be a more natural sequence of changes.

Thanks.

Re: [PATCH v2 05/10] git.c: provide setup_auto_pager()

2017-07-31 Thread Junio C Hamano

Jeff King  writes:

> But thinking on it, the most plausible case is something like:
>
>   setup_auto_pager("foo", -1);
>   ...
>   /* fallback to some historical compatibility name */
>   setup_auto_pager("bar", 0);
>
> And it's important for the "-1" there to be a true punt, and not do
> anything in commit_pager_choice(). So it's probably worth documenting
> the "-1" behavior as you did here as a possible value for "def".

Thanks for reading it over. I agree that the "punt" behaviour is a
sensible one in that use case, and I also agree that it would be
good to explain it.

[PATCH] Fix minor typo in git-diff docs.

2017-07-31 Thread Anthony Sottile

To be honest, I'm a bit overwhelmed by the documentation for submitting a patch!

I tried to follow as best I could, here's my attempt (please advise).

>From e88ad689a7587c11f270a10f191a3b6bc52a90d4 Mon Sep 17 00:00:00 2001
From: Anthony Sottile 
Date: Mon, 31 Jul 2017 06:54:14 -0700
Subject: [PATCH] Fix minor typo in git-diff docs.

Signed-off-by: Anthony Sottile 
---
 Documentation/diff-options.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 89cc0f4..43d18a4 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -392,7 +392,7 @@ endif::git-log[]
the diff between the preimage and `/dev/null`. The resulting patch
is not meant to be applied with `patch` or `git apply`; this is
solely for people who want to just concentrate on reviewing the
-   text after the change. In addition, the output obviously lack
+   text after the change. In addition, the output obviously lacks
enough information to apply such a patch in reverse, even manually,
hence the name of the option.
 +
-- 
2.7.4

Re: Contact with Latinamerica

2017-07-31 Thread Christopher Díaz


> You may also want to look at the https://git-scm.com/book/en/v2 free
> book 
> which can be translated by volunteers, and is possibly one of the
> first 
> ports of call for most users (top or near-top of search engine hits)
> 

Thanks for that info, I've already read that one a couple of weeks ago
when I was trying to make my first PR to my community :) sure I can
help with translations.



> Another very simple step is to read and comment on the commit
> messages sent 
> to the list, particulalrly for ease of readability and ease of 
> translation/comprehension.
> 
> It is very easy to write sentences that are too long. And with too
> many 
> conjunctions.
> 
> We do not notice when we do it, having only been taught to write
> long 
> flowery sentences for essays and novels...
> 
> Proof-reading the commit messages will also allow the reader to do
> directed 
> research on just the particular item (both the use of language and
> the code 
> style)
> 

That is a very good point, I'll keep it on mind when we have more
members and that may help them in having "more" fluent conversations.


> > 
> Welcome to the community! All it takes is one email (a journey of a
> thousand 
> miles starts with the first step) [1]
> 

Thank yoou all very much! I'll present myself then, since the last
presentations don't say a lot about myself hehe.

Well you all know that I am a software development student here. The
institute here teaches us things like web (java, php, javascript) and
mobile development (java, swift). On the other hand I think that in
order to be a good developer (like real hackers) someone needs to
understand how the machine thinks and works, that's why I've been
learning OS concepts and low level languages like Assembly and C by
myself in order to really understand how a computer works. I love open
source given the fact that it helps you to discover how other amazing
developers think and work. I've been using Linux-based distros since a
couple of years, I could even create my own Linux From Scratch once a
couple of months ago, that was a great learning experience :D but I
stay with Gentoo linux since I've found there a lot of great
information and the ability to learn about source code and get the max
performance from my laptop.

I don't like web and mobile development, but I do enjoy breaking mobile
and web apps :D that's why I'm focusing right now on cybersecurity.
Right now I'm and intern in a company called BroderJobs here in Lima,
as security analist (actually I'm the whole TI dptmnt).

Here in Peru, sadly, open source is like a taboo... noone uses it,
neither companies nor educational institutions. That's why I'm founding
this community, in order to give students the ability to connect with
open source communities and work with great developers. Right now we
are 4, but I believe that when the classes begin again ( on August 15th
me and some other universities one week later) I'm going to be able to
invite them and see how this project growths :D
 
That's why right now I'm getting in touch with lots of communities to
see how can we contribute and this could help me to present the
community to universities and other institutions so that they can see
that we actually help big communities.

That's me :) nice to meet you all
Christopher Díaz Riveros

Re: Contact with Latinamerica

2017-07-31 Thread Philip Oakley


From: "Christopher Díaz" 

Hi Christopher,
I've included the Git list to keep the discussion open to all the community. 
(we usually use reply all)



El dom, 30-07-2017 a las 19:01 +0100, Philip Oakley escribió:

Hi Philip,

Thank you very much for such a fast reply.



I can see two simple steps toward your goal that may help.

The first is to ensure that the open source tools do have
localisation
(l10n) of their command line messages, for which I'm sure Git's l10n
team
would be happy to have your communities support. -
https://github.com/git-l10n and
https://public-inbox.org/git/CANYiYbEJ3Gw=JvbhLBeFWBD7xLXxd=_fFdH3UX7
6h97zu_3...@mail.gmail.com/#r



That would be amazing since the second step in order to help open
source communities is knowing how to use git (the first one is teaching
them a programming language in order to be able to understand the code,
we are taking care of that too). This is something that we will need to
teach practicaly all of the members to do if they don't already know.
I'll be in contact with the team.



You may also want to look at the https://git-scm.com/book/en/v2 free book 
which can be translated by volunteers, and is possibly one of the first 
ports of call for most users (top or near-top of search engine hits)



The other idea is to consider how Git's version message, or
something
similar, should report the users current i18n settings, and any links
to the
right (e.g. local) support groups. At present, I don't see any
obvious
command to help users (and those on the help forums and lists) know
what
i18n nationality / language names to use for discussions. Sometimes
it is
worth ensuring these baby steps are in place.



Another very simple step is to read and comment on the commit messages sent 
to the list, particulalrly for ease of readability and ease of 
translation/comprehension.


It is very easy to write sentences that are too long. And with too many 
conjunctions.


We do not notice when we do it, having only been taught to write long 
flowery sentences for essays and novels...


Proof-reading the commit messages will also allow the reader to do directed 
research on just the particular item (both the use of language and the code 
style)




Yes, thanks for taking this in consideration, the reallity here in
Peru, and I could say maybe in a lot of parts in Latinamerica is that
they don't produce technology since it is mucho more easy just to buy
something and the language doesn't help to make the learning curve
easier.

I'll contact the translation team and I hope you won't mind if we could
say here that we contribute to the git project. It would be a lot
easier for me in order to go to invite people to the community in
educational institutions if I could say that the community helps very
big open source projects like git. And if there are some legal
procedures that I need to accomplish in order to be able to say it
publicly, it would be very kind of you if you point me into the right
direction.

Welcome to the community! All it takes is one email (a journey of a thousand 
miles starts with the first step) [1]




Thanks for all the info
Christopher Díaz Riveros


[1] It's easy to feel overwhelmed by other who post..  can find the article 
but...

Loan Offer

2017-07-31 Thread Sec Capital Loans

Loan Offer at 3%, Feel Free to REPLY back to us for more info.

72 matches

Mail list logo