Re: [PATCH] Support long format for log-based submodule diff

2018-04-02 Thread Stefan Beller
On Sun, Apr 1, 2018 at 6:07 PM, Robert Dailey wrote:
> On Tue, Mar 27, 2018 at 5:17 PM, Stefan Beller wrote:
>>> >> $ git diff --submodule=log --submodule-log-detail=(long|short)
>>> >>
>>> >> I'm not sure what makes sense here. I welcome thoughts/discussion and
>>> >> will provide follow-up patches.
>>> >
>>> > The case of merges is usually configured with --[no-]merges, or
>>> > --min-parents=<n>.
>>
>>> But that is a knob that controls an irrelevant aspect of the detail
>>> in the context of this discussion, isn't it?  This code is about "to
>>> what degree the things that happened between two submodule commits
>>> in an adjacent pair of commits in the superproject are summarized?"
>>
>> And I took it a step further and wanted to give a general solution, which
>> allows giving any option that the diff machinery accepts to only apply
>> to the submodule diffing part of the current diff.
>>
>>> The hack Robert illustrates below is to change it to stop favouring
>>> such projects with "clean" histories, and show "log --oneline
>>> --no-merges --left-right".  When presented that way, clean histories
>>> of topic-branch based projects will suffer by losing conciseness,
>>> but clean histories of totally linear projects will still be shown
>>> the same way, and messy history that sometimes merges, sometimes
>>> merges mergy histories, and sometimes directly builds on the trunk
>>> will be shown as an enumeration of individual commits in a flat way
>>> by ignoring merges and not restricting the traversal to the first
>>> parent chains, which would appear more uniform than what the current
>>> code shows.
>>
>> Oh, I realize this is in the *summary* code path, I was thinking about the
>> show_submodule_inline_diff, which would benefit from more diff options.
>>
>>> I do not see a point in introducing --min/max-parents as a knob to
>>> control how the history is summarized.
>>
>> For a summary a flat list of commits may be fine, ignoring
>> (ideally non-evil) merges.
>>
>>> This is a strongly related tangent, but I wonder if we can and/or
>>> want to share more code with the codepath that prepares the log
>>> message for a merge.  It summarizes what happened on the side branch
>>> since it forked from the history it is joining back to (I think it
>>> is merge.c::shortlog() that computes this)
>>
>> I do not find code there. To me it looks like builtin/fmt-merge-msg.c
>> is responsible for coming up with a default merge message?
>> In that file there is a shortlog() function, which walks revisions
>> and puts together the subject lines of commits.
>>
>>> and it is quite similar
>>> to what Robert wants to use for submodules here.  On the other hand,
>>> in a project _without_ submodule, if you are pulling history made by
>>> your lieutenant whose history is full of linear merges of topic
>>> branches to the mainline, it may not be a bad idea to allow
>>> fmt-merge-msg to alternatively show something similar to what "diff
>>> --submodule=log" gives us, i.e. summarize the history of the side
>>> branch being merged by just listing the commits on the first-parent
>>> chain.  So I sense some opportunity for cross pollination here.
>>
>> The cross pollination that I sense is the desire in both cases to freely
>> specify the format as it may depend on the workflow.
>
> First I want to apologize for having taken so long to get back with
> each of you about this. I actually have a lot of work started to
> expand the --submodule option to add a "full-log" option in addition
> to the existing "log". This is a pretty big task for me already,
> mostly because I'm unfamiliar with git and have limited personal time
> to do this at home (this is part of what I am apologizing for).

No worries wrt. time.

> I kind
> of get what Stefan and Junio are saying. There's a lot of opportunity
> for cleanup. More specific to my use case, adding some functionality
> to generate a log message (although I've developed a bash script to do
> this since I wrote my original email. I'll attach it to this email for
> those interested).

The functionality looks very similar to what Gerrit does in its
"superproject subscription mode", which would update the submodules in
the superproject automatically, when you submit on the submodule.
For example [1] is an update of the Gerrit project itself, that has some
submodules. This commit only updates the replication plugin, but
provides a summary of what happened in that plugin.

[1] https://gerrit.googlesource.com/gerrit/+/db20af7123221b0b2f01d1f06e4eaac32a04cef6


I wonder if there is a need for this in upstream git as well, e.g.
"git submodule update --remote" would also want to have a
switch "--commit-with-proposed-commit-message" or if the
standard commit message template would provide a submodule
summary for you. I realize that there is the config option
status.submoduleSummary already, but it is not as clear as either
your script or the Gerrit example.

> Also 

Re: [PATCH] Support long format for log-based submodule diff

2018-04-01 Thread Robert Dailey
On Tue, Mar 27, 2018 at 5:17 PM, Stefan Beller wrote:
>> >> $ git diff --submodule=log --submodule-log-detail=(long|short)
>> >>
>> >> I'm not sure what makes sense here. I welcome thoughts/discussion and
>> >> will provide follow-up patches.
>> >
>> > The case of merges is usually configured with --[no-]merges, or
>> > --min-parents=<n>.
>
>> But that is a knob that controls an irrelevant aspect of the detail
>> in the context of this discussion, isn't it?  This code is about "to
>> what degree the things that happened between two submodule commits
>> in an adjacent pair of commits in the superproject are summarized?"
>
> And I took it a step further and wanted to give a general solution, which
> allows giving any option that the diff machinery accepts to only apply
> to the submodule diffing part of the current diff.
>
>> The hack Robert illustrates below is to change it to stop favouring
>> such projects with "clean" histories, and show "log --oneline
>> --no-merges --left-right".  When presented that way, clean histories
>> of topic-branch based projects will suffer by losing conciseness,
>> but clean histories of totally linear projects will still be shown
>> the same way, and messy history that sometimes merges, sometimes
>> merges mergy histories, and sometimes directly builds on the trunk
>> will be shown as an enumeration of individual commits in a flat way
>> by ignoring merges and not restricting the traversal to the first
>> parent chains, which would appear more uniform than what the current
>> code shows.
>
> Oh, I realize this is in the *summary* code path, I was thinking about the
> show_submodule_inline_diff, which would benefit from more diff options.
>
>> I do not see a point in introducing --min/max-parents as a knob to
>> control how the history is summarized.
>
> For a summary a flat list of commits may be fine, ignoring
> (ideally non-evil) merges.
>
>> This is a strongly related tangent, but I wonder if we can and/or
>> want to share more code with the codepath that prepares the log
>> message for a merge.  It summarizes what happened on the side branch
>> since it forked from the history it is joining back to (I think it
>> is merge.c::shortlog() that computes this)
>
> I do not find code there. To me it looks like builtin/fmt-merge-msg.c
> is responsible for coming up with a default merge message?
> In that file there is a shortlog() function, which walks revisions
> and puts together the subject lines of commits.
>
>> and it is quite similar
>> to what Robert wants to use for submodules here.  On the other hand,
>> in a project _without_ submodule, if you are pulling history made by
>> your lieutenant whose history is full of linear merges of topic
>> branches to the mainline, it may not be a bad idea to allow
>> fmt-merge-msg to alternatively show something similar to what "diff
>> --submodule=log" gives us, i.e. summarize the history of the side
>> branch being merged by just listing the commits on the first-parent
>> chain.  So I sense some opportunity for cross pollination here.
>
> The cross pollination that I sense is the desire in both cases to freely
> specify the format as it may depend on the workflow.

First I want to apologize for having taken so long to get back with
each of you about this. I actually have a lot of work started to
expand the --submodule option to add a "full-log" option in addition
to the existing "log". This is a pretty big task for me already,
mostly because I'm unfamiliar with git and have limited personal time
to do this at home (this is part of what I am apologizing for). I kind
of get what Stefan and Junio are saying. There's a lot of opportunity
for cleanup. More specific to my use case, adding some functionality
to generate a log message (although I've developed a bash script to do
this since I wrote my original email. I'll attach it to this email for
those interested). Also I get that taking this a notch higher and
adding a new option to pass options down to submodules also addresses
my case. Before I waste anyone's time on this, I want to make sure
that my very narrow and specific implementation will be ideal. By all
means I do not want to do things the easy way which ends up adding
"cruft" you'll have to deal with later. If there's a larger effort to
generalize this and other things related to submodules maybe I can
just wait for that to happen instead? What direction would you guys
recommend?

Junio basically hit the nail on the head with the comparisons of
different mainlines. I think some repositories are more disciplined
than others. At my workplace, I deal with a lot of folks that aren't
interested in learning git beyond the required day to day
responsibilities. It's difficult to enforce very specific branching,
rebase, and merge habits. As such, the best I can do to work around
that for building release notes is to exclude merge commits (since
most of the time, people keep the default message which is generally

Re: [PATCH] Support long format for log-based submodule diff

2018-03-27 Thread Stefan Beller
> >> $ git diff --submodule=log --submodule-log-detail=(long|short)
> >>
> >> I'm not sure what makes sense here. I welcome thoughts/discussion and
> >> will provide follow-up patches.
> >
> > The case of merges is usually configured with --[no-]merges, or
> > --min-parents=<n>.

> But that is a knob that controls an irrelevant aspect of the detail
> in the context of this discussion, isn't it?  This code is about "to
> what degree the things that happened between two submodule commits
> in an adjacent pair of commits in the superproject are summarized?"

And I took it a step further and wanted to give a general solution, which
allows giving any option that the diff machinery accepts to only apply
to the submodule diffing part of the current diff.

> The hack Robert illustrates below is to change it to stop favouring
> such projects with "clean" histories, and show "log --oneline
> --no-merges --left-right".  When presented that way, clean histories
> of topic-branch based projects will suffer by losing conciseness,
> but clean histories of totally linear projects will still be shown
> the same way, and messy history that sometimes merges, sometimes
> merges mergy histories, and sometimes directly builds on the trunk
> will be shown as an enumeration of individual commits in a flat way
> by ignoring merges and not restricting the traversal to the first
> parent chains, which would appear more uniform than what the current
> code shows.

Oh, I realize this is in the *summary* code path, I was thinking about the
show_submodule_inline_diff, which would benefit from more diff options.

> I do not see a point in introducing --min/max-parents as a knob to
> control how the history is summarized.

For a summary a flat list of commits may be fine, ignoring
(ideally non-evil) merges.

> This is a strongly related tangent, but I wonder if we can and/or
> want to share more code with the codepath that prepares the log
> message for a merge.  It summarizes what happened on the side branch
> since it forked from the history it is joining back to (I think it
> is merge.c::shortlog() that computes this)

I do not find code there. To me it looks like builtin/fmt-merge-msg.c
is responsible for coming up with a default merge message?
In that file there is a shortlog() function, which walks revisions
and puts together the subject lines of commits.

> and it is quite similar
> to what Robert wants to use for submodules here.  On the other hand,
> in a project _without_ submodule, if you are pulling history made by
> your lieutenant whose history is full of linear merges of topic
> branches to the mainline, it may not be a bad idea to allow
> fmt-merge-msg to alternatively show something similar to what "diff
> --submodule=log" gives us, i.e. summarize the history of the side
> branch being merged by just listing the commits on the first-parent
> chain.  So I sense some opportunity for cross pollination here.

The cross pollination that I sense is the desire in both cases to freely
specify the format as it may depend on the workflow.

Stefan


Re: [PATCH] Support long format for log-based submodule diff

2018-03-09 Thread Junio C Hamano
Stefan Beller writes:

>> $ git diff --submodule=log --submodule-log-detail=(long|short)
>>
>> I'm not sure what makes sense here. I welcome thoughts/discussion and
>> will provide follow-up patches.
>
> The case of merges is usually configured with --[no-]merges, or
> --min-parents=<n>.

But that is a knob that controls an irrelevant aspect of the detail
in the context of this discussion, isn't it?  This code is about "to
what degree the things that happened between two submodule commits
in an adjacent pair of commits in the superproject are summarized?"
and the current one unilaterally decides that something similar to
what you would see in the output from "log --oneline --first-parent
--left-right" is sufficient, which is a position to heavily favour
projects whose histories are very clean by either being:

 (1) totally linear, each individual commit appearing on the
 first-parent chain; or

 (2) totally topic-branch based, everything appearing as merges of
 a topic branch to the trunk

The hack Robert illustrates below is to change it to stop favouring
such projects with "clean" histories, and show "log --oneline
--no-merges --left-right".  When presented that way, clean histories
of topic-branch based projects will suffer by losing conciseness,
but clean histories of totally linear projects will still be shown
the same way, and messy history that sometimes merges, sometimes
merges mergy histories, and sometimes directly builds on the trunk
will be shown as an enumeration of individual commits in a flat way
by ignoring merges and not restricting the traversal to the first
parent chains, which would appear more uniform than what the current
code shows.
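
To make the difference between the two traversals concrete, here is a
minimal sketch in plain C. It is a toy model, not git's revision
walker: the toy_* types and functions are hypothetical, only the two
option names are borrowed from the rev_info fields the patch touches,
and the output order is a simple depth-first order rather than the
commit-date order git would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a commit; this is NOT git's struct commit. */
struct toy_commit {
	const char *subject;
	struct toy_commit *parents[2];
	int nr_parents;
	int uninteresting;	/* reachable from the old submodule commit */
	int seen;
};

/* The two knobs the patch touches, named after the rev_info fields. */
struct toy_walk_opts {
	int first_parent_only;
	int max_parents;	/* -1 means "no limit" */
};

/* Hide everything reachable from the old commit, like "old..new" does. */
static void toy_mark_uninteresting(struct toy_commit *c)
{
	int i;

	if (c->uninteresting)
		return;
	c->uninteresting = 1;
	for (i = 0; i < c->nr_parents; i++)
		toy_mark_uninteresting(c->parents[i]);
}

/* Depth-first walk; real git orders by commit date instead. */
static void toy_walk(struct toy_commit *c, const struct toy_walk_opts *o,
		     char *out, size_t outsz)
{
	int i, nr;

	if (c->seen || c->uninteresting)
		return;
	c->seen = 1;
	/* max_parents only filters what is shown, not what is traversed. */
	if (o->max_parents < 0 || c->nr_parents <= o->max_parents) {
		if (out[0])
			strncat(out, "\n", outsz - strlen(out) - 1);
		strncat(out, c->subject, outsz - strlen(out) - 1);
	}
	nr = (o->first_parent_only && c->nr_parents > 1) ? 1 : c->nr_parents;
	for (i = 0; i < nr; i++)
		toy_walk(c->parents[i], o, out, outsz);
}

/* Summarize a fixed toy history: trunk-1 on base, plus a merged topic. */
static void toy_summarize(const struct toy_walk_opts *o, char *out,
			  size_t outsz)
{
	struct toy_commit base = { "base", { NULL, NULL }, 0, 0, 0 };
	struct toy_commit t1 = { "topic-1", { &base, NULL }, 1, 0, 0 };
	struct toy_commit t2 = { "topic-2", { &t1, NULL }, 1, 0, 0 };
	struct toy_commit c1 = { "trunk-1", { &base, NULL }, 1, 0, 0 };
	struct toy_commit merge = { "Merge topic", { &c1, &t2 }, 2, 0, 0 };

	assert(outsz > 0);
	out[0] = '\0';
	toy_mark_uninteresting(&base);
	toy_walk(&merge, o, out, outsz);
}
```

With first_parent_only = 1 and max_parents = -1 (the current
behaviour) the toy lists "Merge topic" and "trunk-1". With
first_parent_only = 0 and max_parents = 1 (the hack) it lists
"trunk-1", "topic-2" and "topic-1": the merge disappears and the
topic's individual commits show up instead.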

I do not see a point in introducing --min/max-parents as a knob to
control how the history is summarized.

This is a strongly related tangent, but I wonder if we can and/or
want to share more code with the codepath that prepares the log
message for a merge.  It summarizes what happened on the side branch
since it forked from the history it is joining back to (I think it
is merge.c::shortlog() that computes this) and it is quite similar
to what Robert wants to use for submodules here.  On the other hand,
in a project _without_ submodule, if you are pulling history made by
your lieutenant whose history is full of linear merges of topic
branches to the mainline, it may not be a bad idea to allow
fmt-merge-msg to alternatively show something similar to what "diff
--submodule=log" gives us, i.e. summarize the history of the side
branch being merged by just listing the commits on the first-parent
chain.  So I sense some opportunity for cross pollination here.


Re: [PATCH] Support long format for log-based submodule diff

2018-03-09 Thread Stefan Beller
On Wed, Mar 7, 2018 at 1:11 PM, Robert Dailey wrote:
> I am experimenting with a version of submodule diff (using log style)
> that prints the commits brought in from merges, while excluding the
> merge commits themselves. This is useful in cases where a merge commit's
> summary does not fully explain the changes being merged (for example,
> for longer-lived branches).
>
> I could have gone through the effort to make this more configurable, but
> before doing that level of work I wanted to get some discussion going to
> understand first if this is a useful change and second how it should be
> configured. For example, we could allow:
>
> $ git diff --submodule=long-log
>
> Or a supplementary option such as:
>
> $ git diff --submodule=log --submodule-log-detail=(long|short)
>
> I'm not sure what makes sense here. I welcome thoughts/discussion and
> will provide follow-up patches.

The case of merges is usually configured with --[no-]merges, or
--min-parents=<n>.

I would think we would want to have different settings per repository,
i.e. these settings would only apply to the superproject, however
we could keep the same names for submodules, such that we could do

git log --min-parents=0 --submodules=--no-merges

We started an effort to have a repository object handle in most functions
some time ago, but the option parsing for the revision walking doesn't
take a repository yet, otherwise the generic revision parsing for submodules
would be easy to implement.

Thoughts on this generic approach?
Stefan

>
> Signed-off-by: Robert Dailey 
> ---
>  submodule.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/submodule.c b/submodule.c
> index 2967704317..a0a62ad7bd 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -428,7 +428,8 @@ static int prepare_submodule_summary(struct rev_info *rev, const char *path,
>         init_revisions(rev, NULL);
>         setup_revisions(0, NULL, rev, NULL);
>         rev->left_right = 1;
> -       rev->first_parent_only = 1;
> +       rev->max_parents = 1;
> +       rev->first_parent_only = 0;
>         left->object.flags |= SYMMETRIC_LEFT;
>         add_pending_object(rev, &left->object, path);
>         add_pending_object(rev, &right->object, path);
> --
> 2.13.1.windows.2
>


Re: [PATCH] Support long format for log-based submodule diff

2018-03-07 Thread Junio C Hamano
Robert Dailey writes:

> I could have gone through the effort to make this more configurable, but
> before doing that level of work I wanted to get some discussion going to
> understand first if this is a useful change and second how it should be
> configured. For example, we could allow:
>
> $ git diff --submodule=long-log
>
> Or a supplementary option such as:
>
> $ git diff --submodule=log --submodule-log-detail=(long|short)
>
> I'm not sure what makes sense here. I welcome thoughts/discussion and
> will provide follow-up patches.

My quick looking around reveals that prepare_submodule_summary() is
called only by show_submodule_summary(), which in turn is called
only from builtin_diff() in a codepath like this:

    if (o->submodule_format == DIFF_SUBMODULE_LOG &&
        (!one->mode || S_ISGITLINK(one->mode)) &&
        (!two->mode || S_ISGITLINK(two->mode))) {
            show_submodule_summary(o, one->path ? one->path : two->path,
                            &one->oid, &two->oid,
                            two->dirty_submodule);
            return;
    } else if (o->submodule_format == DIFF_SUBMODULE_INLINE_DIFF &&
               (!one->mode || S_ISGITLINK(one->mode)) &&
               (!two->mode || S_ISGITLINK(two->mode))) {
            show_submodule_inline_diff(o, one->path ? one->path : two->path,
                            &one->oid, &two->oid,
                            two->dirty_submodule);
            return;
    }

It looks like introducing a new value to o->submodule_format (enum
diff_submodule_format defined in diff.h) would be one natural way to
extend this codepath, at least to me from a quick glance.

It also looks to me that the above may become far easier to read if
the common "are we dealing with a filepair <one, two> that involves
submodules?" check in the above if/else if cascade is factored out,
perhaps like this as a preliminary clean-up step, before adding a
new value:

    if ((!one->mode || S_ISGITLINK(one->mode)) &&
        (!two->mode || S_ISGITLINK(two->mode))) {
            switch (o->submodule_format) {
            case DIFF_SUBMODULE_LOG:
                    ... do the "log" thing ...
                    return;
            case DIFF_SUBMODULE_INLINE_DIFF:
                    ... do the "inline" thing ...
                    return;
            default:
                    break;
            }
    }

Then the place to add a new format would be trivially obvious,
i.e. just add a new case arm to call a new function to give the
summary.
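
As a self-contained illustration of that shape, the sketch below
compiles outside of git. It is only a sketch under assumptions:
DIFF_SUBMODULE_FULL_LOG is a hypothetical new value (git does not
define it), the toy_* names are stand-ins for the real filespec and
handler functions, and the dispatcher merely reports which handler
would be called instead of producing any output.

```c
#include <assert.h>
#include <stddef.h>

/* Same bit test git uses for gitlinks, under toy names to avoid clashes. */
#define TOY_S_IFMT       0170000
#define TOY_S_IFGITLINK  0160000
#define TOY_S_ISGITLINK(m) (((m) & TOY_S_IFMT) == TOY_S_IFGITLINK)

enum diff_submodule_format {
	DIFF_SUBMODULE_SHORT = 0,
	DIFF_SUBMODULE_LOG,
	DIFF_SUBMODULE_INLINE_DIFF,
	DIFF_SUBMODULE_FULL_LOG	/* hypothetical: not part of git */
};

/* Which handler the dispatcher picked; stands in for the real calls. */
enum toy_handler {
	TOY_REGULAR_DIFF,	/* fall through to the normal diff code */
	TOY_SUMMARY,		/* would call show_submodule_summary() */
	TOY_INLINE_DIFF,	/* would call show_submodule_inline_diff() */
	TOY_FULL_LOG		/* would call the new, hypothetical function */
};

/* Only the field the check needs; not git's struct diff_filespec. */
struct toy_filespec {
	unsigned mode;
};

static enum toy_handler toy_dispatch(enum diff_submodule_format fmt,
				     const struct toy_filespec *one,
				     const struct toy_filespec *two)
{
	assert(one && two);
	/* The submodule check is done once, before looking at the format. */
	if ((!one->mode || TOY_S_ISGITLINK(one->mode)) &&
	    (!two->mode || TOY_S_ISGITLINK(two->mode))) {
		switch (fmt) {
		case DIFF_SUBMODULE_LOG:
			return TOY_SUMMARY;
		case DIFF_SUBMODULE_INLINE_DIFF:
			return TOY_INLINE_DIFF;
		case DIFF_SUBMODULE_FULL_LOG:
			return TOY_FULL_LOG;	/* the only new case arm */
		default:
			break;
		}
	}
	return TOY_REGULAR_DIFF;
}
```

A filepair whose sides are both gitlinks (mode 0160000), or where one
side is absent (mode 0), reaches the switch; anything else, and the
"short" format, falls through to the regular diff path.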