Re: New semantic patches vs. in-flight topics [was: Re: [PATCH 00/19] Bring more repository handles into our code base]

2018-10-25 Thread Stefan Beller
On Wed, Oct 24, 2018 at 6:59 PM SZEDER Gábor  wrote:
>
> On Mon, Oct 22, 2018 at 11:54:06AM -0700, Stefan Beller wrote:
>
> > For the sake of a good history, I would think running 'make coccicheck'
> > and applying the resulting patches would be best as part of the (dirty)
> > merge of any topic that proposes new semantic patches, but that would
> > add load to Junio as it would be an extra step during the merge.
> >
> > One could argue that the step of applying such transformations into
> > the dirty merge is cheaper than resolving merge conflicts that are
> > had when the topic includes the transformation.
>
> Please consider that merge commits' have uglier diffs than regular
> commits, and that merge commits cause additional complications when
> 'git bisect' points the finger at them, both of which are exacerbated
> by additional changes squeezed into evil merges.
>
> > > Consequently, 'make coccicheck' won't run clean and the
> > > static analysis build job will fail until all those topics reach
> > > 'master', and the remaining transformations are applied on top.
> > >
> > > This was (and still is!) an issue with the hasheq()/oideq() series
> > > as well: that series was added on 2018-08-28, and the static
> > > analysis build job is red on 'pu' ever since.  See the follow-up
> > > patch e43d2dcce1 (more oideq/hasheq conversions, 2018-10-02), and
> > > one more follow-up will be necessary after the builtin stash topic
> > > is merged to 'master'.
> >
> > In my understanding this follow up is a feature, as it helps to avoid
> > merge conflicts with other topics in flight.
>
> I don't see how such a follow up patch helps to avoid merge conflicts.

Well, you merge first (the new topic and the cocci patches), and then
do the transformation. But that is putting a lot more work on Junio
as the way to integrate is not just merge a new topic into the pile of
topics (whether it is pu/next/master), but to first perform a merge
of the topic and the coccinelle patches, apply the transformation
and then merge to the pile, assuming the pile is already transformed
(as it happened in "treewide: apply cocci patch" in next/pu).

> > as 'make coccicheck' is an integral part of your review?
>
> Erm, right, "review" was not the right word here.  Anyway, as it is,
> 'make coccicheck' is an integral part of our automated tests, not only
> on Travis CI but on the upcoming Azure thing as well.  I just try to
> pay attention to its results and the results of a bunch of my
> additional builds, and complain or even send a fix when something goes
> reproducibly wrong.  This has certainly became more cumbersome with
> the permanently failing static analysis build job in the last couple
> of weeks.

I seem to not pay as much attention as I should to these,
mostly because they are unreliable  on the aggregate level (a failure
there most likely means another topic than the one I am interested
broke; except in this case, where we discuss the fallout there via
this topic.)

> > I like the approach of having separate classes of semantic patches:
> > (a) the regular "we need to keep checking these" as they address
> > undesirable code patterns, which is what we currently have,
> > and what 'make coccicheck' would complain about.
> > (b) The pending patches as you propose. However I would
> > argue that we'd not want to include the transformation into
> > the same patch as then the patch will have merge conflicts.
>
> Since we have a lot of parallel running topics, merge conflicts are
> basically unavoidable anyway.  If the conflicts from the
> transformation are really that severe, then perhaps the whole series
> should be postponed to a calmer, more suitable time.

Merge conflicts of this kind could be avoided, by running the
transformation on both sides before merging (or not running them
on both sides, but only after merging).

So maybe for these larger 'the_repository.pending.cocci' patches
we could get them into the tree and wait until all (most) of the topics
are including that semantic patch in their base, such that application
of the patch is easy whether before or after writing a series
(as the semantic patch is in its base).

Another short term plan would be renaming the_repository.cocci
such that it would break the 'make coccicheck'

> In the case of 'the_repository.cocci', merging its transformations
> into 'pu' resulted in only four conflicts, and I found all four on the
> easy side to resolve.  I don't think it's worth waiting with the
> transformations in this particular case.

I am not worried about the current conflicts, but about those to come
in new series.

>
> > Ideally we'd have an automated process/bot that would apply
> > all pending semantic patches onto master and then checks for
> > conflicts in HEAD..pu, and only sends off the non-conflicting
> > diffs as a topic.
>
> New semantic patches didn't pop up all that frequently in the past, so
> I'm

Re: New semantic patches vs. in-flight topics [was: Re: [PATCH 00/19] Bring more repository handles into our code base]

2018-10-24 Thread Jeff King
On Mon, Oct 22, 2018 at 07:39:35PM +0200, SZEDER Gábor wrote:

> I don't really like how this or the previous RFC patch series deal
> with semantic patches (or how some past patch series dealt with them,
> for that matter), for various reasons:
> [..]

I am a little late to this thread, but it really seems to me that the
crux of the issue here:

>   - 'make coccicheck' won't run clean (and the static analysis build
> job on Travis CI will fail) for all commits following adding the
> new semantic patches but before applying the resulting
> transformations.
> 
>   - These semantic patches interact badly with 'pu' and 'next',
> because those integration branches can contain topic branches
> adding new code that should be transformed by these semanic
> patches.  Consequently, 'make coccicheck' won't run clean and the
> static analysis build job will fail until all those topics reach
> 'master', and the remaining transformations are applied on top.

Is that we are considering it a failure for "pu" to have pending
coccinelle patches. Are there any coccicheck transformations that aren't
just "we prefer to do it like X instead of Y"?

Changes that actually break things should be caught by the compiler
(either by their nature, or because we take care to rename or change
interfaces when making a semantic change to a function).

I think that's getting at what you're saying here:

>   - Is it really necessary to carry these semantic patches in-tree?
> Let me ellaborate.  There are basically two main use cases for
> semantic patches:
> 
>   - To avoid undesirable code patterns, e.g. we should not use
> [...]
>   - To perform one-off code transformations, e.g. to modify a

I am mostly thinking of your first type here. And I think it makes sense
for people to avoid submitting code that does not pass "make coccicheck"
_as it was at the state of their branch_. But for cocci changes that are
in flight at the same time as their branch, I do not see any need to
strictly adhere to them. The code is "undesirable", not "wrong".

For the second type, I agree that we do not necessarily need to carry
them in-tree. Eventually the transformation happens, and nobody would
use the "old" way because it doesn't work anymore. Problem solved.

I do not mind carrying them for a while as a convenience to people who
will need to fix up their topics in flight (or more likely to the
maintainer, who will need to fixup the merge). But we should make sure
not to forget to remove them when their usefulness has passed.

Likewise, we should not forget to occasionally run "make coccicheck" on
master to make sure people have a clean base to build on. If Junio is
able to do that, then great. But other people can also do so and submit
patches (to go on their respective topics, or to be a new mass-cleanup
topic).

I guess there is some lag there if Junio pushes out a master branch that
fails coccicheck, because contributors may build on it before somebody
gets around to fixing it.

> Having said that, it's certainly easier to double-check the
> resulting transformations when one can apply the semantic
> patches locally, and doing so is easier when the semantic
> patches are in tree than when they must be copy-pasted from a
> commit message.

I've wondered if we could have a script that pulls a coccinelle snippet
from a commit message and runs it. It may be a hassle to find the right
commit, though (you'd start the procedure from "oops, my compile now
seems broken; what was the change that I need to apply to adapt?").

>   - A new semantic patch should be added as "pending", e.g. to the
> file 'the_repository.pending.cocci', together with the resulting
> transformations in the same commit.
> 
> This way neither 'make coccicheck' nor the static analysis build
> job would complain in the topic branch or in the two integration
> branches.  And if they do complain, then we would know right away
> that they complain because of a well-established semantic patch.
> Yet, anyone interested could run 'make coccicheck-pending' to see
> where are we heading.

OK, makes sense.

>   - The author of the "pending" semanting patch should then keep an
> eye on already cooking topics: whether any of them contain new
> code that should be transformed, and how they progress to
> 'master', and sending followup patch(es) with the remaining
> transformations when applicable.
> 
> Futhermore, the author should also pay attention to any new topics
> that branch off after the "pending" semantic patch, and whether
> any of them introduce code to be transformed, warning their
> authors as necessary.

This part seems tricky, though. There's a race condition between
promoting the patch from pending to not-pending and other topics
branching off. And remember that we do not always see other people's
branches, which they may work on in 

Re: New semantic patches vs. in-flight topics [was: Re: [PATCH 00/19] Bring more repository handles into our code base]

2018-10-24 Thread SZEDER Gábor
On Mon, Oct 22, 2018 at 11:54:06AM -0700, Stefan Beller wrote:

> For the sake of a good history, I would think running 'make coccicheck'
> and applying the resulting patches would be best as part of the (dirty)
> merge of any topic that proposes new semantic patches, but that would
> add load to Junio as it would be an extra step during the merge.
> 
> One could argue that the step of applying such transformations into
> the dirty merge is cheaper than resolving merge conflicts that are
> had when the topic includes the transformation.

Please consider that merge commits' have uglier diffs than regular
commits, and that merge commits cause additional complications when
'git bisect' points the finger at them, both of which are exacerbated
by additional changes squeezed into evil merges.

> > Consequently, 'make coccicheck' won't run clean and the
> > static analysis build job will fail until all those topics reach
> > 'master', and the remaining transformations are applied on top.
> >
> > This was (and still is!) an issue with the hasheq()/oideq() series
> > as well: that series was added on 2018-08-28, and the static
> > analysis build job is red on 'pu' ever since.  See the follow-up
> > patch e43d2dcce1 (more oideq/hasheq conversions, 2018-10-02), and
> > one more follow-up will be necessary after the builtin stash topic
> > is merged to 'master'.
> 
> In my understanding this follow up is a feature, as it helps to avoid
> merge conflicts with other topics in flight.

I don't see how such a follow up patch helps to avoid merge conflicts.

There were topics that branched off before the introduction of oideq()
into 'master', therefore they couldn't make use of this new function
until they were merged to 'master' as well, so they added their own
!oidcmp() calls.  That follow up patch was necessary to transform
these new !oidcmp() calls after those topics reached 'master'.  Merge
conflicts had nothing to do with it.

So this follow up patch is not a feature, but rather an inherent
consequence of the project's branching model, with lots of parallel
running topics branching off at different points and progressing at
different speeds.

> > This makes it harder to review other patch series.
> 
> as 'make coccicheck' is an integral part of your review?

Erm, right, "review" was not the right word here.  Anyway, as it is,
'make coccicheck' is an integral part of our automated tests, not only
on Travis CI but on the upcoming Azure thing as well.  I just try to
pay attention to its results and the results of a bunch of my
additional builds, and complain or even send a fix when something goes
reproducibly wrong.  This has certainly became more cumbersome with
the permanently failing static analysis build job in the last couple
of weeks.

> > How about introducing the concept of "pending" semantic patches,
> > stored in 'contrib/coccinelle/.pending.cocci' files, modifying
> > 'make coccicheck' to skip them, and adding the new 'make
> > coccicheck-pending' target to make it convenient to apply them, e.g.
> > something like the simple patch at the end.
> >
> > So the process would go something like this:
> >
> >   - A new semantic patch should be added as "pending", e.g. to the
> > file 'the_repository.pending.cocci', together with the resulting
> > transformations in the same commit.
> >
> > This way neither 'make coccicheck' nor the static analysis build
> > job would complain in the topic branch or in the two integration
> > branches.  And if they do complain, then we would know right away
> > that they complain because of a well-established semantic patch.
> > Yet, anyone interested could run 'make coccicheck-pending' to see
> > where are we heading.
> >
> >   - The author of the "pending" semanting patch should then keep an
> > eye on already cooking topics: whether any of them contain new
> > code that should be transformed, and how they progress to
> > 'master', and sending followup patch(es) with the remaining
> > transformations when applicable.
> >
> > Futhermore, the author should also pay attention to any new topics
> > that branch off after the "pending" semantic patch, and whether
> > any of them introduce code to be transformed, warning their
> > authors as necessary.
> >
> >   - Finally, after all the dust settled, the dev should follow up with
> > a patch to:
> >
> >   - promote the "penging" patch to '.cocci', if its purpose
> > is to avoid undesirable code patterns in the future, or
> >
> >   - remove the semantic patch, if it was used in a one-off
> > transformation.
> >
> > Thoughts?
> 
> I like the approach of having separate classes of semantic patches:
> (a) the regular "we need to keep checking these" as they address
> undesirable code patterns, which is what we currently have,
> and what 'make coccicheck' would complain about.
> (b) The pending patches as 

Re: New semantic patches vs. in-flight topics [was: Re: [PATCH 00/19] Bring more repository handles into our code base]

2018-10-23 Thread Junio C Hamano
Stefan Beller  writes:

>> Anyway, even though "make coccicheck" does not run in subsecond,
>> I've updated my machinery to rebuild the integration branches so
>> that I can optionally queue generated coccicheck patches, and what I
>> pushed out tonight has one at the tip of 'pu' and also another at
>> the tip of 'next'.  The latter seems to be passing all archs and
>> executing Windows run.
>
> That is pretty exciting!
>
> Looking at the commit in next, you also included the suggestion
> from [1] to use a postincrement instead of a preincrement and I got
> excited to see how we express such a thing in coccinelle,
> but it turns out that it slipped in unrelated to the coccinelle patches.

See below, which was sitting in my working tree.

> How would we go from here?
> It is not obvious to me how such changes would be integrated,
> as regenerating them on top of pu will not help getting these changes
> merged down, and applying the semantic patch on next (once
> sb/more-repo-in-api lands in next) would created the merge conflicts for
> all topics that are merged to next after that series.

Conflicts with later topics is indeed worrysome.  That is why I did
it as an experiment.  If it becomes too painful, I'd probably stop
doing it while merging to anything other than 'pu', and then we can
follow the more distributed approach along the lines of what Szeder
suggested, to see how smoothly it goes.

-- >8 --
Subject: [PATCH] cocci: simplify "if (++u > 1)" to "if (u++)"

It is more common to use post-increment than pre-increment when the
side effect is the primary thing we want in our code and in C in
general (unlike C++).

Initializing a variable to 0, incrementing it every time we do
something, and checking if we have already done that thing to guard
the code to do that thing, is easier to understand when written

if (u++)
; /* we've done that! */
else
do_it(); /* just once. */

but if you try to use pre-increment, you end up with a less natural
looking

if (++u > 1)

Signed-off-by: Junio C Hamano 
---
 contrib/coccinelle/preincr.cocci | 5 +
 1 file changed, 5 insertions(+)
 create mode 100644 contrib/coccinelle/preincr.cocci

diff --git a/contrib/coccinelle/preincr.cocci b/contrib/coccinelle/preincr.cocci
new file mode 100644
index 00..7fe1e8d2d9
--- /dev/null
+++ b/contrib/coccinelle/preincr.cocci
@@ -0,0 +1,5 @@
+@ preincrement @
+identifier i;
+@@
+- ++i > 1
++ i++
-- 
2.19.1-542-gc4df23f792



Re: New semantic patches vs. in-flight topics [was: Re: [PATCH 00/19] Bring more repository handles into our code base]

2018-10-23 Thread Stefan Beller
On Tue, Oct 23, 2018 at 2:38 AM Junio C Hamano  wrote:
>
> Junio C Hamano  writes:
>
> > I actually think this round does a far nicer job playing well with
> > other topics than any earlier series.  The pain you are observing I
> > think come primarily from my not making the best use of these
> > patches.
> >
> > Steppng back a bit, I'd imagine in an ideal world where "make
> > coccicheck" can be done instantaneously _and_ the spatch machinery
> > is just as reliable as C compilers
> >
> > What I _could_ do (and what I did do only for one round of pushing
> > out 'pu') is to queue a coccinelle transformation at the tip of
> > integration branches.  If "make coccicheck" ran in subsecond, I
> > could even automate it in the script that is used to rebuild 'pu'
> > every day, ...
>
> Anyway, even though "make coccicheck" does not run in subsecond,
> I've updated my machinery to rebuild the integration branches so
> that I can optionally queue generated coccicheck patches, and what I
> pushed out tonight has one at the tip of 'pu' and also another at
> the tip of 'next'.  The latter seems to be passing all archs and
> executing Windows run.

That is pretty exciting!

Looking at the commit in next, you also included the suggestion
from [1] to use a postincrement instead of a preincrement and I got
excited to see how we express such a thing in coccinelle,
but it turns out that it slipped in unrelated to the coccinelle patches.

How would we go from here?
It is not obvious to me how such changes would be integrated,
as regenerating them on top of pu will not help getting these changes
merged down, and applying the semantic patch on next (once
sb/more-repo-in-api lands in next) would created the merge conflicts for
all topics that are merged to next after that series.

[1] https://public-inbox.org/git/xmqqin1wyxvz@gitster-ct.c.googlers.com/


Re: New semantic patches vs. in-flight topics [was: Re: [PATCH 00/19] Bring more repository handles into our code base]

2018-10-23 Thread Junio C Hamano
Carlo Arenas  writes:

> On Tue, Oct 23, 2018 at 2:40 AM Junio C Hamano  wrote:
>>
>> The tip of 'pu' has trouble with -Wunused on Apple around the
>> delta-islands series.
>
> FWIW the "problem" is actually with -Wunused-function and is AFAIK not
> related to the semantic changes or Apple (AKA macOS)

Oh, don't get me wrong.  By Apple, I meant "the versions of compiler
used on the Apple build at TravisCI site".  I could have sent the
above two lines in a separate topic, as the issue does not have
anything to do with "new semantic patches" discussion, but I thought
it would be obvious to everybody who is reading the thread---it
seems I thought wrong ;-)


Re: New semantic patches vs. in-flight topics [was: Re: [PATCH 00/19] Bring more repository handles into our code base]

2018-10-23 Thread Carlo Arenas
On Tue, Oct 23, 2018 at 2:40 AM Junio C Hamano  wrote:
>
> The tip of 'pu' has trouble with -Wunused on Apple around the
> delta-islands series.

FWIW the "problem" is actually with -Wunused-function and is AFAIK not
related to the semantic changes or Apple (AKA macOS)

Indeed, I saw this issue before also in Linux (Fedora Rawhide) when
using the latest clang and presumed it was just expected because the
topic branch was most likely incomplete.

Carlo


Re: New semantic patches vs. in-flight topics [was: Re: [PATCH 00/19] Bring more repository handles into our code base]

2018-10-23 Thread Junio C Hamano
Junio C Hamano  writes:

> I actually think this round does a far nicer job playing well with
> other topics than any earlier series.  The pain you are observing I
> think come primarily from my not making the best use of these
> patches.
>
> Steppng back a bit, I'd imagine in an ideal world where "make
> coccicheck" can be done instantaneously _and_ the spatch machinery
> is just as reliable as C compilers
>
> What I _could_ do (and what I did do only for one round of pushing
> out 'pu') is to queue a coccinelle transformation at the tip of
> integration branches.  If "make coccicheck" ran in subsecond, I
> could even automate it in the script that is used to rebuild 'pu'
> every day, ...

Anyway, even though "make coccicheck" does not run in subsecond,
I've updated my machinery to rebuild the integration branches so
that I can optionally queue generated coccicheck patches, and what I
pushed out tonight has one at the tip of 'pu' and also another at
the tip of 'next'.  The latter seems to be passing all archs and
executing Windows run.

The tip of 'pu' has trouble with -Wunused on Apple around the
delta-islands series.


next: https://travis-ci.org/git/git/builds/444969406
pu:   https://travis-ci.org/git/git/builds/444969342




Re: New semantic patches vs. in-flight topics [was: Re: [PATCH 00/19] Bring more repository handles into our code base]

2018-10-22 Thread Junio C Hamano
Stefan Beller  writes:

> Am I overestimating or misunderstanding rerere here?

Yes.

> Would it be realistic for next and master branch instead of pu?
>
> I'd be wary for the master branch, as we may not want to rely on
> spatch without review. (It can produce funny white space issues,
> but seems to produce working/correct code)

Yes, that is why I think it is a bit too late to do the "fixup with
spatch" after merging to 'next' and 'master'.


Re: New semantic patches vs. in-flight topics [was: Re: [PATCH 00/19] Bring more repository handles into our code base]

2018-10-22 Thread Stefan Beller
> Stepping back a bit, I'd imagine in an ideal world where "make
> coccicheck" can be done instantaneously _and_ the spatch machinery
> is just as reliable as C compilers.
> [...]
> Now we do not live in that ideal world and [...]
>  such a series will have zero
> chance of landing in 'pu', unless we freeze the world.

I wonder if we could approximate the ideal world with
rerere+spatch a bit more:

1) I resend the series that includes "apply cocci patches"
as the last patch and you queue it as usual

2) The first time such a series is merged, you'd merge
HEAD^ (i.e. excluding the "apply the semantic patch)
to pu instead. I view this as a yet-to-be invented mode
'--theirs-is-stale-use-tree-instead=THEIRS~1^{tree}',
then run spatch to reproduce the last patch into the
dirty merge (which has pu and the last patch as parent)

This step is done to 'pre-heat' the rerere cache.

3) Any further integration (e.g. rebuilding pu) would
benefit from the hot rerere cache and very little work
is actually required as the conflicts are resolved by rerere.

Am I overestimating or misunderstanding rerere here?

> What I _could_ do (and what I did do only for one round of pushing
> out 'pu') is to queue a coccinelle transformation at the tip of
> integration branches.  If "make coccicheck" ran in subsecond, I
> could even automate it in the script that is used to rebuild 'pu'
> every day, so that after merging each and every topic, do the "make
> coccicheck" and apply resulting .cocci.patch files and squash that
> into the merge commit.
>
> But with the current "make coccicheck" performance, that is not
> realistic.

Would it be realistic for next and master branch instead of pu?

I'd be wary for the master branch, as we may not want to rely on
spatch without review. (It can produce funny white space issues,
but seems to produce working/correct code)

> I am wondering if it is feasible to do it at the tip of 'pu' (which
> is rebuilt two to three times a day), 'next' (which is updated once
> or twice a week) and 'master'.

We could even optimize that, by checking if contrib/cocci/ has
changes for the new tip of next/master respectively.

Another thing I wonder is if we care about the distinction between
the (a) pending changes as described by SZEDER, as we introduce
these deliberately, whereas (b) undesirable code patterns
(e.g. free and null instead of FREE_AND_NULL macro) might be
caught and reported in pu/next and then someone learns from it.
Automatic rewriting the (b) cases seems not just as desirable as
(a), where we do it purely to avoid resolving merge conflicts by
hand.

> I find that your "pending" idea may be nicer, as it distributes the
> load.  Whoever wants to change the world order by updating the .cocci
> rules is primarily responsible for making it happen without breaking
> the world during the transition.  That's more scalable.

... and I think SZEDER considers the current world broken as
'make coccicheck' returns non-empty, so it sounds to me as if
the current transition is thought less-than-optimal.


Re: New semantic patches vs. in-flight topics [was: Re: [PATCH 00/19] Bring more repository handles into our code base]

2018-10-22 Thread Junio C Hamano
SZEDER Gábor  writes:

> I don't really like how this or the previous RFC patch series deal
> with semantic patches (or how some past patch series dealt with them,
> for that matter), for various reasons:
> ...
> How about introducing the concept of "pending" semantic patches,
> stored in 'contrib/coccinelle/.pending.cocci' files, modifying
> 'make coccicheck' to skip them, and adding the new 'make
> coccicheck-pending' target to make it convenient to apply them, e.g.
> something like the simple patch at the end.
>
> So the process would go something like this:
>
>   - A new semantic patch should be added as "pending", e.g. to the
> file 'the_repository.pending.cocci', together with the resulting
> transformations in the same commit.
>
> This way neither 'make coccicheck' nor the static analysis build
> job would complain in the topic branch or in the two integration
> branches.  And if they do complain, then we would know right away
> that they complain because of a well-established semantic patch.
> Yet, anyone interested could run 'make coccicheck-pending' to see
> where are we heading.
>
>   - The author of the "pending" semanting patch should then keep an
> eye on already cooking topics: whether any of them contain new
> code that should be transformed, and how they progress to
> 'master', and sending followup patch(es) with the remaining
> transformations when applicable.
> 
> Futhermore, the author should also pay attention to any new topics
> that branch off after the "pending" semantic patch, and whether
> any of them introduce code to be transformed, warning their
> authors as necessary.
>
>   - Finally, after all the dust settled, the dev should follow up with
> a patch to:
> 
>   - promote the "penging" patch to '.cocci', if its purpose
> is to avoid undesirable code patterns in the future, or
> 
>   - remove the semantic patch, if it was used in a one-off
> transformation.
> ...
> Thoughts?

I actually think this round does a far nicer job playing well with
other topics than any earlier series.  The pain you are observing I
think come primarily from my not making the best use of these
patches.

Steppng back a bit, I'd imagine in an ideal world where "make
coccicheck" can be done instantaneously _and_ the spatch machinery
is just as reliable as C compilers.  In such a world, I may rename
all the *.c files in my tree to *.c.in, make the very first step of
the "make all" to run coccicheck and transform *.c.in to *.c before
starting the compilation.  There is no need to have changes to
*.c.in that spatch would fix.  Such an idealized set-up has a few
nice property.

 - Mechanical conflict resolutions between topics in flight and a
   series that modifies or adds new .cocci rules would greatly be
   reduced.

 - Still, *.c files that are "compiled" from *.c.in file used by
   each build will by definition cocci-clean.

 - Bugs like the one that required 6afedba8 ("object_id.cocci: match
   only expressions of type 'struct object_id'", 2018-10-15) are
   still possible, but some of them may be caught by C compilers
   that inspect the result of spatch compilation from *.c.in to *.c

Now we do not live in that ideal world and there is no separate
"turn *.c.in into *.c" step in our build procedure.  In such a
"real" world, if we had a rule like "each individual commit must
pass 'make coccicheck' cleanly", anybody who does such a merge and
resolve huge conflics would essentially be wasting time on something
that the tool could do, and then the result of the merge would
further need to be amended for semantic conflicts (i.e. the other
branch may have introduced new instances of old pattern .cocci
patches in our branch would want to transform)).  By not insisting
on cocci-cleanness at each commit level, we can gain (or at least
sumulate gaining) some benefit that we would reap in the ideal
world, and Stefan's latest series already does so for the former.
If we insisted that these patches must be accompanied with the
result of the cocci transformations, such a series will have zero
chance of landing in 'pu', unless we freeze the world.

What I _could_ do (and what I did do only for one round of pushing
out 'pu') is to queue a coccinelle transformation at the tip of
integration branches.  If "make coccicheck" ran in subsecond, I
could even automate it in the script that is used to rebuild 'pu'
every day, so that after merging each and every topic, do the "make
coccicheck" and apply resulting .cocci.patch files and squash that
into the merge commit.

But with the current "make coccicheck" performance, that is not
realistic.

I am wondering if it is feasible to do it at the tip of 'pu' (which
is rebuilt two to three times a day), 'next' (which is updated once
or twice a week) and 'master'.

I find that your "pending" idea may be nicer, as it distributes the
load.  Whoever wants to change the world order by

Re: New semantic patches vs. in-flight topics [was: Re: [PATCH 00/19] Bring more repository handles into our code base]

2018-10-22 Thread Stefan Beller
On Mon, Oct 22, 2018 at 10:39 AM SZEDER Gábor  wrote:
>
> On Tue, Oct 16, 2018 at 04:35:31PM -0700, Stefan Beller wrote:
> > the last patch (applying the semantic patches) has been omitted as that
> > would produce a lot of merge conflicts. Without that patch, this merges
> > cleanly to next.
> >
> > As for when to apply the semantic patches, I wondered if we would prefer a 
> > dirty merge
> > (created via "make coccicheck && git apply 
> > contrib/coccinelle/the_repository.cocci.patch")
> > of the semantic patches, or if we'd actually trickle in the changes over 
> > some time?
>
> > This series takes another approach as it doesn't change the signature of
> > functions, but introduces new functions that can deal with arbitrary
> > repositories, keeping the old function signature around using a shallow 
> > wrapper.
> >
> > Additionally each patch adds a semantic patch, that would port from the 
> > old to
> > the new function. These semantic patches are all applied in the very 
> > last patch,
> > but we could omit applying the last patch if it causes too many merge 
> > conflicts
> > and trickl in the semantic patches over time when there are no merge 
> > conflicts.
>
> I don't really like how this or the previous RFC patch series deal
> with semantic patches (or how some past patch series dealt with them,
> for that matter), for various reasons:
>
>   - Applying the transformations from several semantic patches in one
> single patch makes it harder to review it, because we won't know
> which change came from which semantic patch.

Good point, so to improve the series sb/more-repo-in-api, I could
send the application of the semantic patch just after each patch
that adds another semantic patching rule? I personally dislike
applying the semantic patch in the same patch as where the
semantic rule was introduced, as then the mechanical conversions
(from the semantic patch) drown out reviewers attention to the
manual changes.

> For comparison, see the patch series adding hasheq()/oideq(),
> merged in 769af0fd9e (Merge branch 'jk/cocci', 2018-09-17), in
> particular the four "convert  to " patches.

The four patches "convert  to  only contain the semantic
patch as well as its effects, the manual changes are separated out
to later patches, which is quite nice.

>   - 'make coccicheck' won't run clean (and the static analysis build
> job on Travis CI will fail) for all commits following adding the
> new semantic patches but before applying the resulting
> transformations.
>
>   - These semantic patches interact badly with 'pu' and 'next',
> because those integration branches can contain topic branches
> adding new code that should be transformed by these semanic
> patches.

And I thought of this as a feature. There is no merge conflict and it
still compiles, which makes Junios work easy.

Of course it put the load elsewhere. :/

For the sake of a good history, I would think running 'make coccicheck'
and applying the resulting patches would be best as part of the (dirty)
merge of any topic that proposes new semantic patches, but that would
add load to Junio as it would be an extra step during the merge.

One could argue that the step of applying such transformations into
the dirty merge is cheaper than resolving merge conflicts that are
had when the topic includes the transformation.

> Consequently, 'make coccicheck' won't run clean and the
> static analysis build job will fail until all those topics reach
> 'master', and the remaining transformations are applied on top.
>
> This was (and still is!) an issue with the hasheq()/oideq() series
> as well: that series was added on 2018-08-28, and the static
> analysis build job is red on 'pu' ever since.  See the follow-up
> patch e43d2dcce1 (more oideq/hasheq conversions, 2018-10-02), and
> one more follow-up will be necessary after the builtin stash topic
> is merged to 'master'.

In my understanding this follow up is a feature, as it helps to avoid
merge conflicts with other topics in flight.

> This makes it harder to review other patch series.

as 'make coccicheck' is an integral part of your review?

>   - Is it really necessary to carry these semantic patches in-tree?
> Let me ellaborate.  There are basically two main use cases for
> semantic patches:
>
>   - To avoid undesirable code patterns, e.g. we should not use
> sha1_to_hex(oid.hash) or strbuf_addf(&sb, "fixed string"), but
> use oid_to_hex(&oid) or strbuf_addstr(&sb, "fixed string")
> instead.  Note that in these cases we don't remove the
> functions sha1_to_hex() or strbuf_addf(), because there are
> good reasons to use them in other scenarios.
>
> Our semantic patches under 'contrib/coccinelle/' fall into
> this category, and we have 'make coccicheck' and the static
> analysis build job on Travis CI to catch thes

New semantic patches vs. in-flight topics [was: Re: [PATCH 00/19] Bring more repository handles into our code base]

2018-10-22 Thread SZEDER Gábor
On Tue, Oct 16, 2018 at 04:35:31PM -0700, Stefan Beller wrote:
> the last patch (applying the semantic patches) has been omitted as that
> would produce a lot of merge conflicts. Without that patch, this merges
> cleanly to next.
> 
> As for when to apply the semantic patches, I wondered if we would prefer a 
> dirty merge
> (created via "make coccicheck && git apply 
> contrib/coccinelle/the_repository.cocci.patch")
> of the semantic patches, or if we'd actually trickle in the changes over some 
> time?

> This series takes another approach as it doesn't change the signature of
> functions, but introduces new functions that can deal with arbitrary 
> repositories, keeping the old function signature around using a shallow 
> wrapper.
> 
> Additionally each patch adds a semantic patch, that would port from the 
> old to
> the new function. These semantic patches are all applied in the very last 
> patch,
> but we could omit applying the last patch if it causes too many merge 
> conflicts
> and trickl in the semantic patches over time when there are no merge 
> conflicts.

I don't really like how this or the previous RFC patch series deal
with semantic patches (or how some past patch series dealt with them,
for that matter), for various reasons:

  - Applying the transformations from several semantic patches in one
single patch makes it harder to review it, because we won't know
which change came from which semantic patch.

For comparison, see the patch series adding hasheq()/oideq(),
merged in 769af0fd9e (Merge branch 'jk/cocci', 2018-09-17), in
particular the four "convert  to " patches.

  - 'make coccicheck' won't run clean (and the static analysis build
job on Travis CI will fail) for all commits following adding the
new semantic patches but before applying the resulting
transformations.

  - These semantic patches interact badly with 'pu' and 'next',
because those integration branches can contain topic branches
adding new code that should be transformed by these semanic
patches.  Consequently, 'make coccicheck' won't run clean and the
static analysis build job will fail until all those topics reach
'master', and the remaining transformations are applied on top.

This was (and still is!) an issue with the hasheq()/oideq() series
as well: that series was added on 2018-08-28, and the static
analysis build job is red on 'pu' ever since.  See the follow-up
patch e43d2dcce1 (more oideq/hasheq conversions, 2018-10-02), and
one more follow-up will be necessary after the builtin stash topic
is merged to 'master'.

This makes it harder to review other patch series.

  - Is it really necessary to carry these semantic patches in-tree?
Let me ellaborate.  There are basically two main use cases for
semantic patches:

  - To avoid undesirable code patterns, e.g. we should not use
sha1_to_hex(oid.hash) or strbuf_addf(&sb, "fixed string"), but
use oid_to_hex(&oid) or strbuf_addstr(&sb, "fixed string")
instead.  Note that in these cases we don't remove the
functions sha1_to_hex() or strbuf_addf(), because there are
good reasons to use them in other scenarios.

Our semantic patches under 'contrib/coccinelle/' fall into
this category, and we have 'make coccicheck' and the static
analysis build job on Travis CI to catch these undesirable
code patterns preferably early, and to prevent them from
entering our codebase.

  - To perform one-off code transformations, e.g. to modify a
function's name and/or signature and convert all its
callsites; see e.g. commits abef9020e3 (sha1_file: convert
sha1_object_info* to object_id, 2018-03-12) and b4f5aca40e
(sha1_file: convert read_sha1_file to struct object_id,
2018-03-12).

As far as I understand this patch series falls into this
category: once the conversion is complete the old functions
will be removed.  After that there will be no use for these
semanic patches.

Having said that, it's certainly easier to double-check the
resulting transformations when one can apply the semantic
patches locally, and doing so is easier when the semantic
patches are in tree than when they must be copy-pasted from a
commit message.

OK, that was already long.  Now, can we do better?

How about introducing the concept of "pending" semantic patches,
stored in 'contrib/coccinelle/.pending.cocci' files, modifying
'make coccicheck' to skip them, and adding the new 'make
coccicheck-pending' target to make it convenient to apply them, e.g.
something like the simple patch at the end.

So the process would go something like this:

  - A new semantic patch should be added as "pending", e.g. to the
file 'the_repository.pending.cocci', together with the resulting
transformations in 

Re: [PATCH 00/19] Bring more repository handles into our code base

2018-10-19 Thread Junio C Hamano
Stefan Beller  writes:

> This rerolls sb/more-repo-in-api.
> It applies on nd/the-index merged with ds/reachable and is available via
> git fetch https://github.com/stefanbeller/git object-store-final-3

Thanks.  Luckily we have both of these prerequisites in 'master'
now, o hopefully this can be applied to 'master' directly and would
play well when merged to 'next' and to 'pu'.

Will queue and play with it a bit before sending comments.


Re: [PATCH 00/19] Bring more repository handles into our code base

2018-10-17 Thread Stefan Beller
On Wed, Oct 17, 2018 at 5:41 AM Derrick Stolee  wrote:
>
> On 10/16/2018 7:35 PM, Stefan Beller wrote:
> >
> >  This series takes another approach as it doesn't change the signature 
> > of
> >  functions, but introduces new functions that can deal with arbitrary
> >  repositories, keeping the old function signature around using a 
> > shallow wrapper.
> I think this is a good direction, and the changes look good to me.
>
> >  Additionally each patch adds a semantic patch, that would port from 
> > the old to
> >  the new function. These semantic patches are all applied in the very 
> > last patch,
> >  but we could omit applying the last patch if it causes too many merge 
> > conflicts
> >  and trickl in the semantic patches over time when there are no merge 
> > conflicts.
>
> The semantic patches are a good idea. At some point in the future, we
> can submit a series that applies the patches to any leftover calls and
> removes the old callers. We can hopefully rely on review (and the
> semantic patches warning that there is work to do) to prevent new
> callers from being introduced in future topics. That doesn't count
> topics that come around while this one isn't merged down.

For those topics still in flight, I added re-defines, e.g.

#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
#define get_merge_bases(r1, r2)
repo_get_merge_bases(the_repository, r1, r2)
#endif

so the base function still keeps working, and we can cleanup
multiple times, until eventually, we can get rid of the base function.

> I had one high-level question: How are we testing that these "arbitrary
> repository" changes are safe?

I did the bare minimum in conversions in this series, such that the
submodule code tests successfully. So if we'd revert some parts,
the submodule tests would break.



> I just remember the issue we had with the
> commit-graph code relying on arbitrary repositories, but then adding a
> reference to the replace objects broke the code (because replace-objects
> wasn't using arbitrary repos correctly, but the commit-graph was tested
> with arbitrary repositories using "test-tool repository"). It would be
> nice to introduce more method calls in t/helper/test-repository.c that
> help us know these are safe conversions.

Or instead we could accelerate the long term plan of removing a
hard coded the_repository and have each cmd builtin take an additional
repository pointer from the init code, such that we'd bring all of Git to
work on arbitrary repositories. Then the standard test suite should be
okay, as there is no special case for the_repository any more.

> Otherwise, we are essentially
> waiting until we try to take submodule things in-process and find out
> what breaks. As we discovered with the refstore, we can't just ensure
> that all references to the_repository are removed.

Yes, that is correct. We had a similar case with partial clone,
as laid out in the cover letter for the RFC.

I'll explore both the test tool approach as well as
repository-fication of the code base.

Thanks,
Stefan


Re: [PATCH 00/19] Bring more repository handles into our code base

2018-10-17 Thread Derrick Stolee

On 10/16/2018 7:35 PM, Stefan Beller wrote:
 
 This series takes another approach as it doesn't change the signature of

 functions, but introduces new functions that can deal with arbitrary
 repositories, keeping the old function signature around using a shallow 
wrapper.

I think this is a good direction, and the changes look good to me.


 Additionally each patch adds a semantic patch, that would port from the 
old to
 the new function. These semantic patches are all applied in the very last 
patch,
 but we could omit applying the last patch if it causes too many merge 
conflicts
 and trickl in the semantic patches over time when there are no merge 
conflicts.


The semantic patches are a good idea. At some point in the future, we 
can submit a series that applies the patches to any leftover calls and 
removes the old callers. We can hopefully rely on review (and the 
semantic patches warning that there is work to do) to prevent new 
callers from being introduced in future topics. That doesn't count 
topics that come around while this one isn't merged down.


I had one high-level question: How are we testing that these "arbitrary 
repository" changes are safe? I just remember the issue we had with the 
commit-graph code relying on arbitrary repositories, but then adding a 
reference to the replace objects broke the code (because replace-objects 
wasn't using arbitrary repos correctly, but the commit-graph was tested 
with arbitrary repositories using "test-tool repository"). It would be 
nice to introduce more method calls in t/helper/test-repository.c that 
help us know these are safe conversions. Otherwise, we are essentially 
waiting until we try to take submodule things in-process and find out 
what breaks. As we discovered with the refstore, we can't just ensure 
that all references to the_repository are removed.


Thanks,
-Stolee


[PATCH 00/19] Bring more repository handles into our code base

2018-10-16 Thread Stefan Beller
This rerolls sb/more-repo-in-api.
It applies on nd/the-index merged with ds/reachable and is available via
git fetch https://github.com/stefanbeller/git object-store-final-3

I addressed all of Jonathans comments, see range-diff below;
the last patch (applying the semantic patches) has been omitted as that
would produce a lot of merge conflicts. Without that patch, this merges
cleanly to next.

As for when to apply the semantic patches, I wondered if we would prefer a 
dirty merge
(created via "make coccicheck && git apply 
contrib/coccinelle/the_repository.cocci.patch")
of the semantic patches, or if we'd actually trickle in the changes over some 
time?

Thanks,
Stefan

An earlier RFC was sent out at
https://public-inbox.org/git/20181011211754.31369-1-sbel...@google.com/ and 
said:
This applies on nd/the-index (b3c7eef9b05) and is the logical continuation 
of
the object store series, which I sent over the last year.

The previous series did take a very slow and pedantic approach,
using a #define trick, see cfc62fc98c for details, but it turns out,
that it doesn't work:
   When changing the signature of widely used functions, it burdens the
   maintainer in resolving the semantic conflicts.
   
   In the orginal approach this was called a feature, as then we can ensure
   that not bugs creep into the code base during the merge window (while 
such
   a refactoring series wanders from pu to master). It turns out this
   was not well received and was just burdensome.
   
   The #define trick doesn't buy us much to begin with when dealing with
   non-merge-conflicts.  For example, see deref_tag at tag.c:68, which got
   the repository argument in 286d258d4f (tag.c: allow deref_tag to handle
   arbitrary repositories, 2018-06-28) but lost its property of working on 
any
   repository while 8c4cc32689 (tag: don't warn if target is missing but
   promised, 2018-07-12) was in flight simultaneously.
   
   Another example of failure of this approach is seen in patch 5, which
   shows that the pedantry was missed.

This series takes another approach as it doesn't change the signature of
functions, but introduces new functions that can deal with arbitrary 
repositories, keeping the old function signature around using a shallow 
wrapper.

Additionally each patch adds a semantic patch, that would port from the old 
to
the new function. These semantic patches are all applied in the very last 
patch,
but we could omit applying the last patch if it causes too many merge 
conflicts
and trickl in the semantic patches over time when there are no merge 
conflicts.


The original goal of all these refactoring series was to remove 
add_submodule_odb 
in submodule.c, which was partially reached with this series. I'll 
investigate the
remaining calls in another series, but it shows we're close to be done with 
these
large refactorings as far as I am concerned.

Thanks,
Stefan


Stefan Beller (19):
  sha1_file: allow read_object to read objects in arbitrary repositories
  packfile: allow has_packed_and_bad to handle arbitrary repositories
  object-store: allow read_object_file_extended to read from arbitrary
repositories
  object-store: prepare read_object_file to deal with arbitrary
repositories
  object-store: prepare has_{sha1, object}_file[_with_flags] to handle
arbitrary repositories
  object: parse_object to honor its repository argument
  commit: allow parse_commit* to handle arbitrary repositories
  commit-reach.c: allow paint_down_to_common to handle arbitrary
repositories
  commit-reach.c: allow merge_bases_many to handle arbitrary
repositories
  commit-reach.c: allow remove_redundant to handle arbitrary
repositories
  commit-reach.c: allow get_merge_bases_many_0 to handle arbitrary
repositories
  commit-reach: prepare get_merge_bases to handle arbitrary repositories
  commit-reach: prepare in_merge_bases[_many] to handle arbitrary
repositories
  commit: prepare get_commit_buffer to handle arbitrary repositories
  commit: prepare repo_unuse_commit_buffer to handle arbitrary
repositories
  commit: prepare logmsg_reencode to handle arbitrary repositories
  pretty: prepare format_commit_message to handle arbitrary repositories
  submodule: use submodule repos for object lookup
  submodule: don't add submodule as odb for push

 commit-reach.c  |  73 +++-
 commit-reach.h  |  38 +--
 commit.c|  32 --
 commit.h|  39 ++-
 contrib/coccinelle/the_repository.cocci | 144 
 object-store.h  |  35 --
 object.c|   6 +-
 packfile.c  |   5 +-
 packfile.h  |   

Re: [RFC PATCH 00/19] Bring more repository handles into our code base

2018-10-12 Thread Stefan Beller
On Fri, Oct 12, 2018 at 11:50 AM Jonathan Nieder  wrote:

>
> It appears that some patches use a the_index-style
> NO_THE_REPOSITORY_COMPATIBILITY_MACROS backward compatibility synonym
> and others don't.  Can you say a little more about this aspect of the
> approach?  Would the compatibility macros go away eventually?

I use the macro only when not doing the whole conversion in the patch
(i.e. there is a coccinelle patch IFF there is the macro and vice versa).

It's quite frankly a judgement call what I would convert as a whole
and what not, it depends on the usage of the functions and if I know
series that are in flight using it. The full conversion is easy to write if
there are less than a hand full of callers, so for the "small case", I just
did it, hoping it won't break other topics in flight.


Re: [RFC PATCH 00/19] Bring more repository handles into our code base

2018-10-12 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> This applies on nd/the-index (b3c7eef9b05) and is the logical continuation of
> the object store series, which I sent over the last year.
>
> The previous series did take a very slow and pedantic approach,
> using a #define trick, see cfc62fc98c for details, but it turns out,
> that it doesn't work:

Thanks for the heads up --- this will remind me to review this new
series more carefully, since it differs from what was reviewed before.

I think this will be easiest to review with --function-context.  I can
generate that diff locally, so no need to resend.

>When changing the signature of widely used functions, it burdens the
>maintainer in resolving the semantic conflicts.
>
>In the orginal approach this was called a feature, as then we can ensure
>that not bugs creep into the code base during the merge window (while such
>a refactoring series wanders from pu to master). It turns out this
>was not well received and was just burdensome.

I don't agree with this characterization.

The question of who resolves conflicts is separate from the question
of whether conflicts appear, which is in turn separate from the
question of whether the build breaks.

I consider making the build break when a caller tries to use a
half-converted function too early to be a very useful feature.  There
is a way to do that in C++ that allows decoupled conversions, but the
C version forced an ordering of the conversions.  It seems that the
pain was caused by the combination of

 1. that coupling, which forced an ordering on the conversions and
prevented us from ordering the patches in an order based on
convenience of integration (unlike e.g. the "struct object_id"
series which was able to proceed by taking a batch covering a
quiet area of the tree at a time)

 2. as you mentioned, removal of old API at the same time of addition
 of new API forced callers across the tree to update at once

 3. the lack of having decided how to handle the anticipated churn

Now most of the conversions are done (thanks much for that) so the
ordering (1) is not the main remaining pain point.  Thanks for
tackling the other two in this series.

I want future API changes to be easier.  That means tackling the
following questions up front:

 i. Where does this fit in Rusty's API rating scheme
?
Does misuse (or misconverted callers) break the build, break
visibly at runtime, or are the effects more subtle?

 ii. Is there good test coverage for the new API?  Are there tests
 that need to be migrated?

 iii. Is there a way to automatically migrate callers, or does this
  require manual, error-prone work (thanks for tackling that in
  this one.)

 iv. How are we planning to handle multiple patches in flight?  Will
 the change produce merge conflicts?  How can others on the list
 help the maintainer with integrating this set of changes?

 iv. Is the ending point cleaner than where we started?

The #define trick you're referring to was a way of addressing (i).

[...]
>  79 files changed, 571 insertions(+), 278 deletions(-)

Most of the increase is in the coccinelle file and in improved
documentation.

It appears that some patches use a the_index-style
NO_THE_REPOSITORY_COMPATIBILITY_MACROS backward compatibility synonym
and others don't.  Can you say a little more about this aspect of the
approach?  Would the compatibility macros go away eventually?

Thanks,
Jonathan


Re: [RFC PATCH 00/19] Bring more repository handles into our code base

2018-10-11 Thread Junio C Hamano
Stefan Beller  writes:

> Additionally each patch adds a semantic patch, that would port from the old to
> the new function. These semantic patches are all applied in the very last 
> patch,
> but we could omit applying the last patch if it causes too many merge 
> conflicts
> and trickl in the semantic patches over time when there are no merge 
> conflicts.

That's an interesting approach ;-)

> The original goal of all these refactoring series was to remove 
> add_submodule_odb 
> in submodule.c, which was partially reached with this series.

Yup, that is a very good goalpost to keep in mind.

> remaining calls in another series, but it shows we're close to be done with 
> these
> large refactorings as far as I am concerned.

Nice.


Re: [RFC PATCH 00/19] Bring more repository handles into our code base

2018-10-11 Thread Jonathan Tan
> This series takes another approach as it doesn't change the signature of
> functions, but introduces new functions that can deal with arbitrary 
> repositories, keeping the old function signature around using a shallow 
> wrapper.
> 
> Additionally each patch adds a semantic patch, that would port from the old to
> the new function. These semantic patches are all applied in the very last 
> patch,
> but we could omit applying the last patch if it causes too many merge 
> conflicts
> and trickl in the semantic patches over time when there are no merge 
> conflicts.

Thanks, this looks like a good plan.

One concern is that if we leave 2 versions of functions around, it will
be difficult to look at a function and see if it's truly
multi-repository-compatible (or making a call to a function that
internally uses the_repository, and is thus wrong). But with the plan
Stefan quoted [1], mentioned in commit e675765235 ("diff.c: remove
implicit dependency on the_index", 2018-09-21):

  The plan is these macros will always be defined for all library files
  and the macros are only accessible in builtin/

(The macros include NO_THE_REPOSITORY_COMPATIBILITY_MACROS, which
disables the single-repository function-like macros.) This mitigates the
concern somewhat.

[1] https://public-inbox.org/git/20181011211754.31369-1-sbel...@google.com/


[RFC PATCH 00/19] Bring more repository handles into our code base

2018-10-11 Thread Stefan Beller
This applies on nd/the-index (b3c7eef9b05) and is the logical continuation of
the object store series, which I sent over the last year.

The previous series did take a very slow and pedantic approach,
using a #define trick, see cfc62fc98c for details, but it turns out,
that it doesn't work:
   When changing the signature of widely used functions, it burdens the
   maintainer in resolving the semantic conflicts.
   
   In the orginal approach this was called a feature, as then we can ensure
   that not bugs creep into the code base during the merge window (while such
   a refactoring series wanders from pu to master). It turns out this
   was not well received and was just burdensome.
   
   The #define trick doesn't buy us much to begin with when dealing with
   non-merge-conflicts.  For example, see deref_tag at tag.c:68, which got
   the repository argument in 286d258d4f (tag.c: allow deref_tag to handle
   arbitrary repositories, 2018-06-28) but lost its property of working on any
   repository while 8c4cc32689 (tag: don't warn if target is missing but
   promised, 2018-07-12) was in flight simultaneously.
   
   Another example of failure of this approach is seen in patch 5, which
   shows that the pedantry was missed.

This series takes another approach as it doesn't change the signature of
functions, but introduces new functions that can deal with arbitrary 
repositories, keeping the old function signature around using a shallow wrapper.

Additionally each patch adds a semantic patch, that would port from the old to
the new function. These semantic patches are all applied in the very last patch,
but we could omit applying the last patch if it causes too many merge conflicts
and trickl in the semantic patches over time when there are no merge conflicts.


The original goal of all these refactoring series was to remove 
add_submodule_odb 
in submodule.c, which was partially reached with this series. I'll investigate 
the
remaining calls in another series, but it shows we're close to be done with 
these
large refactorings as far as I am concerned.

Thanks,
Stefan

Stefan Beller (19):
  sha1_file: allow read_object to read objects in arbitrary repositories
  packfile: allow has_packed_and_bad to handle arbitrary repositories
  object-store: allow read_object_file_extended to read from arbitrary
repositories
  object-store: prepare read_object_file to deal with arbitrary
repositories
  object: parse_object to honor its repository argument
  commit: allow parse_commit* to handle arbitrary repositories
  commit.c: allow paint_down_to_common to handle arbitrary repositories
  commit.c: allow merge_bases_many to handle arbitrary repositories
  commit.c: allow remove_redundant to handle arbitrary repositories
  commit: allow get_merge_bases_many_0 to handle arbitrary repositories
  commit: prepare get_merge_bases to handle arbitrary repositories
  commit: prepare get_commit_buffer to handle arbitrary repositories
  commit: prepare in_merge_bases[_many] to handle arbitrary repositories
  commit: prepare repo_unuse_commit_buffer to handle arbitrary
repositories
  commit: prepare logmsg_reencode to handle arbitrary repositories
  pretty: prepare format_commit_message to handle arbitrary repositories
  submodule: use submodule repos for object lookup
  submodule: don't add submodule as odb for push
  Apply semantic patches from previous patches

 apply.c |   6 +-
 archive.c   |   5 +-
 bisect.c|   5 +-
 blame.c |  15 +--
 builtin/am.c|   2 +-
 builtin/blame.c |   4 +-
 builtin/cat-file.c  |  21 +++--
 builtin/checkout.c  |   4 +-
 builtin/commit.c|  13 ++-
 builtin/describe.c  |   4 +-
 builtin/difftool.c  |   3 +-
 builtin/fast-export.c   |   7 +-
 builtin/fmt-merge-msg.c |   8 +-
 builtin/grep.c  |   2 +-
 builtin/index-pack.c|   8 +-
 builtin/log.c   |   4 +-
 builtin/merge-base.c|   2 +-
 builtin/merge-tree.c|   9 +-
 builtin/mktag.c |   3 +-
 builtin/name-rev.c  |   2 +-
 builtin/notes.c |  12 ++-
 builtin/pack-objects.c  |  22 +++--
 builtin/reflog.c|   5 +-
 builtin/replace.c   |   2 +-
 builtin/shortlog.c  |   5 +-
 builtin/show-branch.c   |   4 +-
 builtin/tag.c   |   4 +-
 builtin/unpack-file.c   |   2 +-
 builtin/unpack-objects.c|   3 +-
 builtin/verify-commit.c |   2 +-
 bundle.c|   2 +-
 combine-diff.c