Re: [PATCH 10/10] fast-export: add --always-show-modify-after-rename

2018-11-13 Thread Jeff King
On Tue, Nov 13, 2018 at 09:10:36AM -0800, Elijah Newren wrote:

> > I am looking at this problem as "how do you answer question X in a
> > repository". And I think you are looking at it as "I am receiving a
> > fast-export stream, and I need to answer question X on the fly".
> >
> > And that would explain why you want to get extra annotations into the
> > fast-export stream. Is that right?
> 
> I'm not trying to get information on the fly during a rewrite or
> anything like that.  This is an optional pre-rewrite step (from a
> separate invocation of the tool) where I have multiple questions I
> want to answer.  I'd like to answer them all relatively quickly, if
> possible, and I think all of them should be answerable with a single
> history traversal (plus a cat-file --batch-all-objects call to get
> object sizes, since I don't know of another way to get those).  I'd be
> fine with switching from fast-export to log or something else if it
> met the needs better.

Ah, OK. Yes, if we're just trying to query, then I think you should be
able to do what you want with the existing traversal and diff tools. And
if not, we should think about a new feature there, and not try to
shoe-horn it into fast-export.

> As far as I can tell, you're trying to split each question apart and
> do a history traversal for each, and I don't see why that's better.
> Simpler, perhaps, but it seems worse for performance.  Am I missing
> something?

I was only trying to address each possible query individually. I agree
that if you are querying both things, you should be able to do it in a
single traversal (and that is strictly better). It may require a little
more parsing of the output (e.g., `--find-object` is easy to implement
yourself by looking at --raw output).
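To illustrate the point about doing `--find-object` yourself, here is a minimal sketch that scans raw diff output for a set of blob ids in one pass. The helper name `find_blob_paths` is hypothetical. It assumes `--raw --no-abbrev` output, a commit header whose first word is the full commit id (e.g. `--format='%H %ct'`), and paths without special characters (git quotes those unless `-z` is used).

```sh
# find_blob_paths ID...: read "--raw --no-abbrev" output whose commit headers
# start with the full commit id and print "<commit> <status> <path>" for
# every change whose pre- or post-image is one of the given blob ids.
find_blob_paths() {
	awk -F '\t' -v want="$*" '
	BEGIN {
		n = split(want, w, " ")
		for (i = 1; i <= n; i++) target[w[i]] = 1
	}
	/^:/ {
		split($1, f, " ")                # f[3] old id, f[4] new id, f[5] status
		if ((f[3] in target) || (f[4] in target)) {
			path = (NF >= 3) ? $3 : $2   # R/C lines list old and new names
			print commit, substr(f[5], 1, 1), path
		}
		next
	}
	NF { split($0, h, " "); commit = h[1] }  # remember the current commit
	'
}
```

Usage, e.g. `git rev-list --all | git diff-tree --stdin --root -r -M --raw --no-abbrev --format='%H %ct' | find_blob_paths <blob-id>...`. Because every id is checked against one table, looking up many blobs costs a single traversal rather than one per blob.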

> Ah, I didn't know renames were on by default; I somehow missed that.
> Also, the rev-list to diff-tree pipe is nice, but I also need parent
> and commit timestamp information.

diff-tree will format the commit info as well (before git-log was a C
builtin, it was just a rev-list/diff-tree pipeline in a shell script).
So you can do:

  git rev-list ... |
  git diff-tree --stdin --format='%h %ct %p' --raw -r -M

and get a dump very similar to what fast-export would give you.
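Unlike a bare `R oldname newname` in the fast-export stream, a raw rename line carries the blob id on both sides (an exact rename shows `R100` with identical ids), so a blob-to-name map falls out of this dump directly. A minimal sketch, assuming full ids and unquoted paths; `blob_names` is a hypothetical helper name:

```sh
# blob_names: read "git diff-tree --stdin -r -M --raw" output (any
# commit-header format) on stdin and print each distinct
# "<blob-id><TAB><path>" pair seen as a post-image, sorted.
blob_names() {
	awk -F '\t' '
	/^:/ {
		split($1, f, " ")          # f[4] = post-image id, f[5] = status
		s = substr(f[5], 1, 1)
		if (s == "D") next         # deletions have no post-image
		path = (NF >= 3) ? $3 : $2 # R/C lines carry old and new names
		seen[f[4] "\t" path] = 1
	}
	END { for (k in seen) print k }
	' | LC_ALL=C sort -u
}
```

Two caveats when feeding it from `git rev-list ... | git diff-tree --stdin`: pass `--root` so the files in root commits appear as additions, and note that diff-tree shows nothing for merge commits unless asked (e.g. with `-m`).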

> > >   git log -M --diff-filter=RAMD --no-abbrev --raw
> >
> > What is there besides RAMD? :)
> 
> Well, as you pointed out above, log detects renames by default,
> whereas it didn't use to.
> So, if someone had written some similar-ish history walking/parsing
> tool years ago that didn't need renames and was based on log
> output, there's a good chance their tool might start failing when
> rename detection was turned on by default, because instead of getting
> both a 'D' and an 'M' change, they'd get an unexpected 'R'.

Mostly I just meant: your diff-filter includes basically everything, so
why bother filtering? You're going to have to parse the result anyway,
and you can throw away uninteresting bits there.
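"Parse everything and drop what you don't need" can also address the worry below about future change types: a parser that knows a fixed set of status letters can refuse anything else instead of misreading it. A minimal sketch; the name `normalize_raw` and the choice of which letters to accept are assumptions for illustration:

```sh
# normalize_raw: turn raw diff lines into "<status> <path> [<new-path>]"
# (tab-separated), dropping the similarity score (R100 -> R). Statuses this
# consumer does not handle are reported on stderr and make it fail, so a
# change in defaults cannot be silently misparsed.
normalize_raw() {
	awk -F '\t' '
	BEGIN { OFS = "\t" }
	/^:/ {
		split($1, f, " ")
		s = substr(f[5], 1, 1)                 # R100 -> R, M -> M, ...
		if (s == "A" || s == "M" || s == "D" || s == "T") print s, $2
		else if (s == "R") print s, $2, $3
		else {
			printf "unhandled status %s for %s\n", f[5], $2 > "/dev/stderr"
			bad = 1
		}
	}
	END { exit bad }
	'
}
```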

> For my case, do I have to worry about similar future changes?  Will
> copy detection ('C') or break detection ('B') become the default in
> the future?  Do I have to worry about typechanges ('T')?  Will new
> change types be added?  I mean, the fast-export output could maybe
> change too, but it seems much less likely than with log.

If you use diff-tree, then it won't ever enable copy or break detection
without you explicitly asking for it.

> Let me try to put it as briefly as I can.  With as few traversals as
> possible, I want to:
>   * Get all blob sizes
>   * Map blob shas to filename(s) they appeared under in the history
>   * Find when files and directories were deleted (and whether they
> were later reinstated, since that means they aren't actually gone)
>   * Find sets of filenames referring to the same logical 'file'. (e.g.
> foo->bar in commit A and bar->baz in commit B mean that {foo,bar,baz}
> refer to the same 'file' so that a user has an easy report to look at
> to find out that if they just want to "keep baz and its history" then
> they need foo & bar & baz.  I need to know about things like another
> foo or bar being introduced after the rename though, since that breaks
> the connection between filenames)
>   * Do a few aggregations on the above data as well (e.g. all copies
> of postgres.exe add up to 20M -- why were those checked in anyway?,
> *.webm files in aggregate are .5G, your long-deleted src/video-server/
> directory from that aborted experimental project years ago takes up 2G
> of your history, etc.)
> 
> Right now, my best solution for this combination of questions is
> 'cat-file --batch-all-objects' plus fast-export, if I get patch 10/10
> in place.  I'm totally open to better solutions, including ones that
> don't use fast-export.
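The "same logical file" question (foo->bar, then bar->baz) is a connected-components problem over rename pairs, which a small union-find handles in one pass. A minimal sketch; `rename_groups` is a hypothetical name, and it deliberately ignores the caveat above about an old name being re-created later, which would need per-commit state:

```sh
# rename_groups: read "old<TAB>new" rename pairs on stdin and print each set
# of names linked by a chain of renames, one set per line, names sorted.
rename_groups() {
	awk -F '\t' '
	function find(x) {                    # follow parent links to the root
		while (parent[x] != x) x = parent[x]
		return x
	}
	NF == 2 {
		if (!($1 in parent)) parent[$1] = $1
		if (!($2 in parent)) parent[$2] = $2
		a = find($1); b = find($2)
		if (a != b) parent[a] = b         # merge the two groups
	}
	END { for (x in parent) print find(x) "\t" x }
	' | LC_ALL=C sort | awk -F '\t' '
	$1 != prev { if (NR > 1) print line; line = $2; prev = $1; next }
	{ line = line " " $2 }
	END { if (NR > 0) print line }
	'
}
```

The pairs can be pulled out of raw output with something like `awk -F '\t' '/^:/ && $1 ~ / R[0-9]*$/ { print $2 "\t" $3 }'`.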

OK, I think I understand your problem better now. I don't think there's
anything fast-export can show that log/diff-tree could not, aside from
actual blob contents. But I don't think you want them (and if you did,
you can use 

Re: [PATCH 10/10] fast-export: add --always-show-modify-after-rename

2018-11-13 Thread Elijah Newren
On Tue, Nov 13, 2018 at 6:45 AM Jeff King  wrote:
> It is an expensive log command, but it's the same expense as running
> fast-export, no? And I think maybe that is the disconnect.

I would expect an expensive log command to generally be the same
expense as running fast-export, yes.  But I would expect two expensive
log commands to be twice the expense of a single fast-export (and you
suggested two log commands: both the --find-object= one and the
--diff-filter one).

> I am looking at this problem as "how do you answer question X in a
> repository". And I think you are looking at it as "I am receiving a
> fast-export stream, and I need to answer question X on the fly".
>
> And that would explain why you want to get extra annotations into the
> fast-export stream. Is that right?

I'm not trying to get information on the fly during a rewrite or
anything like that.  This is an optional pre-rewrite step (from a
separate invocation of the tool) where I have multiple questions I
want to answer.  I'd like to answer them all relatively quickly, if
possible, and I think all of them should be answerable with a single
history traversal (plus a cat-file --batch-all-objects call to get
object sizes, since I don't know of another way to get those).  I'd be
fine with switching from fast-export to log or something else if it
met the needs better.

As far as I can tell, you're trying to split each question apart and
do a history traversal for each, and I don't see why that's better.
Simpler, perhaps, but it seems worse for performance.  Am I missing
something?

> > > There I think you'd want to assemble the list with something like "git
> > > log --follow --name-only paths-of-interest" except that --follow sucks
> > > too much to handle more than one path at a time.
> > >
> > > But if you wanted to do it manually, then:
> > >
> > >   git log --diff-filter=R --name-only
> > >
> > > would be enough to let you track it down, wouldn't it?
> >
> > Without a -M you'd only catch 100% renames, right?  Those aren't the
> > only ones I'd want to catch, so I'd need to add -M.  You are right
> > that we could get basic renames this way, but it doesn't cover
> > everything I need.  Let's use this as a starting point, though, and
> > build up to what I need...
>
> No, renames are on by default these days, and that includes inexact
> renames. That said, if you're scripting you probably ought to be doing:
>
>   git rev-list HEAD | git diff-tree --stdin
>
> and there yes, you'd have to enable "-M" yourself (you touched on
> scripting and formatting below; diff-tree can accept the format options
> you'd want).

Ah, I didn't know renames were on by default; I somehow missed that.
Also, the rev-list to diff-tree pipe is nice, but I also need parent
and commit timestamp information.


> Yeah, I think "-t" would help your tree deletion problem.

Absolutely, thanks for the hint.  Much appreciated.  :-)

> > At this point, let's remember that we had another full git-log
> > invocation for mapping object sizes to filenames.  We might as well
> > coalesce the two log commands into one, by extending this latest one
> > to:
> >
> >   git log -M --diff-filter=RAMD --no-abbrev --raw
>
> What is there besides RAMD? :)

Well, as you pointed out above, log detects renames by default,
whereas it didn't use to.
So, if someone had written some similar-ish history walking/parsing
tool years ago that didn't need renames and was based on log
output, there's a good chance their tool might start failing when
rename detection was turned on by default, because instead of getting
both a 'D' and an 'M' change, they'd get an unexpected 'R'.

For my case, do I have to worry about similar future changes?  Will
copy detection ('C') or break detection ('B') become the default in
the future?  Do I have to worry about typechanges ('T')?  Will new
change types be added?  I mean, the fast-export output could maybe
change too, but it seems much less likely than with log.

> > I could potentially switch to using this and drop patch 10/10.
>
> So I'm still not _entirely_ clear on what you're trying to do with
> 10/10. I think maybe the "disconnect" part I wrote above explains it. If
> that's correct, then I think framing it in terms of the operations that
> you'd be able to perform _without running a separate traverse_ would
> make it more obvious.

Let me try to put it as briefly as I can.  With as few traversals as
possible, I want to:
  * Get all blob sizes
  * Map blob shas to filename(s) they appeared under in the history
  * Find when files and directories were deleted (and whether they
were later reinstated, since that means they aren't actually gone)
  * Find sets of filenames referring to the same logical 'file'. (e.g.
foo->bar in commit A and bar->baz in commit B mean that {foo,bar,baz}
refer to the same 'file' so that a user has an easy report to look at
to find out that if they just want to "keep baz and its history" then
they need foo & bar & baz.  I 

Re: [PATCH 10/10] fast-export: add --always-show-modify-after-rename

2018-11-13 Thread Jeff King
On Mon, Nov 12, 2018 at 10:08:10AM -0800, Elijah Newren wrote:

> > I would do:
> >
> >   git log --raw $(
> >     git cat-file --batch-check='%(objectsize:disk) %(objectname)' --batch-all-objects |
> >     sort -rn | head -3 |
> >     awk '{print "--find-object=" $2 }'
> >   )
> >
> > I'm not sure how renames enter into it at all.
> 
> How did I miss objectsize:disk??  Especially since it is right next to
> objectsize in the manpage to boot?  That's awesome, thanks for that
> pointer.
> 
> I do have a separate cat-file --batch-check --batch-all-objects
> process already, since I can't get sizes out of either log or
> fast-export.  However, I wouldn't use your 'head -3' since I'm not
> looking for the N biggest, but reporting on _all_ objects (in reverse
> size order) and letting the user look over the report and decide
> where to stop reading.  So, this is a big and expensive log command.
> Granted, we will need a big and expensive log command, but let's keep
> in mind that we have this one.

It is an expensive log command, but it's the same expense as running
fast-export, no? And I think maybe that is the disconnect.

I am looking at this problem as "how do you answer question X in a
repository". And I think you are looking at it as "I am receiving a
fast-export stream, and I need to answer question X on the fly".

And that would explain why you want to get extra annotations into the
fast-export stream. Is that right?

> > There I think you'd want to assemble the list with something like "git
> > log --follow --name-only paths-of-interest" except that --follow sucks
> > too much to handle more than one path at a time.
> >
> > But if you wanted to do it manually, then:
> >
> >   git log --diff-filter=R --name-only
> >
> > would be enough to let you track it down, wouldn't it?
> 
> Without a -M you'd only catch 100% renames, right?  Those aren't the
> only ones I'd want to catch, so I'd need to add -M.  You are right
> that we could get basic renames this way, but it doesn't cover
> everything I need.  Let's use this as a starting point, though, and
> build up to what I need...

No, renames are on by default these days, and that includes inexact
renames. That said, if you're scripting you probably ought to be doing:

  git rev-list HEAD | git diff-tree --stdin

and there yes, you'd have to enable "-M" yourself (you touched on
scripting and formatting below; diff-tree can accept the format options
you'd want).

> I also want to know when files were deleted.  I've generally found
> that people are more okay with purging parts of history [corresponding
> to large objects] that were deleted longer ago than more recent stuff,
> for a variety of reasons.  So we could either run yet another log, or
> modify the command to:
> 
>   git log -M --diff-filter=RD --name-status
> 
> However, I don't just want to know when files were deleted, I'd like
> to know when directories are deleted.  I only knew how to derive that
> from knowing what files existed within those directories, so that
> would take me to:
> 
>   git log -M --diff-filter=RAD --name-status
> 
> [Edit: I just saw your other email and for the first time learned
> about the -t rev-list option which might simplify this a little,
> although "need to worry about deleted files being reinstated" below
> might require the 'A' anyway.]

Yeah, I think "-t" would help your tree deletion problem.
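With `-t`, tree entries are shown in the raw output alongside blobs, so a deleted directory appears as its own `D` line whose old mode is 040000, and no reconstruction from file lists is needed. A minimal sketch under that assumption; `deleted_dirs` is a hypothetical helper:

```sh
# deleted_dirs: from "git diff-tree --stdin -r -t --raw --format='%H %ct'"
# output, print "<commit-header> <dir>" for each tree entry a commit deletes.
deleted_dirs() {
	awk -F '\t' '
	/^:/ {
		split($1, f, " ")          # f[1] is ":<old mode>", f[5] the status
		if (f[1] == ":040000" && substr(f[5], 1, 1) == "D")
			print hdr, $2
		next
	}
	NF { hdr = $0 }              # remember the commit header line
	'
}
```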

> At this point, let's remember that we had another full git-log
> invocation for mapping object sizes to filenames.  We might as well
> coalesce the two log commands into one, by extending this latest one
> to:
> 
>   git log -M --diff-filter=RAMD --no-abbrev --raw

What is there besides RAMD? :)

> I could potentially switch to using this and drop patch 10/10.

So I'm still not _entirely_ clear on what you're trying to do with
10/10. I think maybe the "disconnect" part I wrote above explains it. If
that's correct, then I think framing it in terms of the operations that
you'd be able to perform _without running a separate traverse_ would
make it more obvious.

> Anyway, I hope it makes a little more sense why I created this patch.
> Does it, or have I just made things even more confusing?

Some of both, I think.

> ...and if you've read this far, I'm impressed.  Thanks for reading.

I'll admit I skimmed near the end. ;)

-Peff


Re: [PATCH 10/10] fast-export: add --always-show-modify-after-rename

2018-11-12 Thread Elijah Newren
On Mon, Nov 12, 2018 at 4:58 AM Jeff King  wrote:
> On Sun, Nov 11, 2018 at 12:42:58AM -0800, Elijah Newren wrote:
>
> Maybe I don't understand what you're trying to accomplish. I was
> thinking specifically of your "cat-file can tell you the large objects,
> but you don't know their names/commits" from above.

Fair enough.  And just to be clear, the first 9 patches were fixes and
features around trying to rewrite history; patch 10 is orthogonal and
was used for a separate run to just gather data.  It is entirely
possible I could gather that data other ways.

> I would do:
>
>   git log --raw $(
>     git cat-file --batch-check='%(objectsize:disk) %(objectname)' --batch-all-objects |
>     sort -rn | head -3 |
>     awk '{print "--find-object=" $2 }'
>   )
>
> I'm not sure how renames enter into it at all.

How did I miss objectsize:disk??  Especially since it is right next to
objectsize in the manpage to boot?  That's awesome, thanks for that
pointer.

I do have a separate cat-file --batch-check --batch-all-objects
process already, since I can't get sizes out of either log or
fast-export.  However, I wouldn't use your 'head -3' since I'm not
looking for the N biggest, but reporting on _all_ objects (in reverse
size order) and letting the user look over the report and decide
where to stop reading.  So, this is a big and expensive log command.
Granted, we will need a big and expensive log command, but let's keep
in mind that we have this one.
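If the traversal already yields a blob-to-path map, the "all objects, biggest first" report is a join against the cat-file output rather than a second log run. A minimal sketch, assuming a NAMES file of `<id><TAB><path>` lines produced by some earlier pass and a SIZES file from `git cat-file --batch-check='%(objecttype) %(objectsize:disk) %(objectname)' --batch-all-objects`; `size_report` is a hypothetical name:

```sh
# size_report SIZES NAMES: print every blob as
# "<disk-size> <id> <path>[,<path>...]", largest first; "-" marks blobs
# that the traversal never reported under any name.
size_report() {
	awk '
	FILENAME == ARGV[1] {                 # NAMES: "<id>\t<path>"
		split($0, p, "\t")
		if (p[1] in names) names[p[1]] = names[p[1]] "," p[2]
		else names[p[1]] = p[2]
		next
	}
	$1 == "blob" {                        # SIZES: "<type> <size> <id>"
		print $2, $3, (($3 in names) ? names[$3] : "-")
	}
	' "$2" "$1" | LC_ALL=C sort -k1,1nr
}
```

Using `FILENAME == ARGV[1]` rather than the common `FNR == NR` idiom keeps the join correct even when the NAMES file is empty.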

> > One of the problems with filter-branch that people often run into is
> > they know what they want at a high-level (e.g. extract the history of
> > this directory for a new repository, or rewrite the history of this
> > repo to appear at a subdirectory so it can be merged into a bigger
> > repo and people passing filenames to log will still get the history of
> > those files, or I want to remove some of the big stuff in my history),
> > but often times that's not quite enough.  They need help finding big
> > objects, or may be unaware that the subset of files they want used to
> > be known by alternative names.
> >
> > I want a simple --analyze mode that can report on all files that have
> > been renamed (so users don't just say "all I care about is these N
> > files, give me a rewritten history just including those" -- we can
> > point out to them whether those N files used to be known by other
> > names), as well as reporting on all big files and if they've been
> > deleted, and aggregations of the "big files" information across
> > directories and file extensions.
>
> So this seems like a separate problem from what the commit message talks
> about.
>
> There I think you'd want to assemble the list with something like "git
> log --follow --name-only paths-of-interest" except that --follow sucks
> too much to handle more than one path at a time.
>
> But if you wanted to do it manually, then:
>
>   git log --diff-filter=R --name-only
>
> would be enough to let you track it down, wouldn't it?

Without a -M you'd only catch 100% renames, right?  Those aren't the
only ones I'd want to catch, so I'd need to add -M.  You are right
that we could get basic renames this way, but it doesn't cover
everything I need.  Let's use this as a starting point, though, and
build up to what I need...

I also want to know when files were deleted.  I've generally found
that people are more okay with purging parts of history [corresponding
to large objects] that were deleted longer ago than more recent stuff,
for a variety of reasons.  So we could either run yet another log, or
modify the command to:

  git log -M --diff-filter=RD --name-status

However, I don't just want to know when files were deleted, I'd like
to know when directories are deleted.  I only knew how to derive that
from knowing what files existed within those directories, so that
would take me to:

  git log -M --diff-filter=RAD --name-status

[Edit: I just saw your other email and for the first time learned
about the -t rev-list option which might simplify this a little,
although "need to worry about deleted files being reinstated" below
might require the 'A' anyway.]
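The "deleted, but was it later reinstated?" question can be answered in the same pass by remembering each path's last removal and forgetting it when the path reappears. A minimal sketch, assuming commits arrive oldest first (e.g. `git rev-list --reverse --topo-order`) and treating history as if it were linear, so a path deleted on one branch but alive on another is not distinguished; `still_deleted` is a hypothetical name:

```sh
# still_deleted: print "<path> <header-of-deleting-commit>" for every path
# whose most recent change removed it (deleted or renamed away) and that
# never came back afterwards.
still_deleted() {
	awk -F '\t' '
	/^:/ {
		split($1, f, " ")
		s = substr(f[5], 1, 1)
		if (s == "D") gone[$2] = hdr                        # deleted here
		else if (s == "R") { gone[$2] = hdr; delete gone[$3] }  # renamed away
		else if (s == "C") delete gone[$3]                  # copy target exists
		else delete gone[$2]                                # A/M/T: path is back
		next
	}
	NF { hdr = $0 }
	END { for (p in gone) print p, gone[p] }
	' | LC_ALL=C sort
}
```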

At this point, let's remember that we had another full git-log
invocation for mapping object sizes to filenames.  We might as well
coalesce the two log commands into one, by extending this latest one
to:

  git log -M --diff-filter=RAMD --no-abbrev --raw

Also, I wanted commit date rather than author date, so we need to
extend the headers a bit.  Also, for reasons I won't bother detailing,
I think I want to traverse commits in reverse topological order.  So
our command is:

  git log --pretty=fuller --topo-order --reverse -M --diff-filter=RAMD \
    --no-abbrev --raw

But that still leaves us with four problems, three of which we can
solve with further extensions to this command:

1) There are some weird edge cases with deletions and renames.  Lots
of them in fact.  At a simple level, branching and merging and
multiple refs means that 

Re: [PATCH 10/10] fast-export: add --always-show-modify-after-rename

2018-11-12 Thread Jeff King
On Sun, Nov 11, 2018 at 12:42:58AM -0800, Elijah Newren wrote:

> > > fast-export output is traditionally used as an input to a fast-import
> > > program, but it is also useful to help gather statistics about the
> > > history of a repository (particularly when --no-data is also passed).
> > > For example, two of the types of information we may want to collect
> > > could include:
> > >   1) general information about renames that have occurred
> > >   2) what the biggest objects in a repository are and what names
> > >  they appear under.
> > >
> > > The first bit of information can be gathered by just passing -M to
> > > fast-export.  The second piece of information can partially be gotten
> > > from running
> > > git cat-file --batch-check --batch-all-objects
> > > However, that only shows what the biggest objects in the repository are
> > > and their sizes, not what names those objects appear as or what commits
> > > they were introduced in.  We can get that information from fast-export,
> > > but when we only see
> > > R oldname newname
> > > instead of
> > > R oldname newname
> > > M 100644 $SHA1 newname
> > > then it makes the job more difficult.  Add an option which allows us to
> > > force the latter output even when commits have exact renames of files.
> >
> > fast-export seems like a funny tool to look up paths. What about "git
> > log --find-object=$SHA1" ?
> 
> Eek, and give me O(N*M) behavior, where N is the number of commits in
> the repository and M is the number of renames that occur in its
> history?  Also, that's the inverse of the lookup I need anyway (I have
> the commit and filename, but am missing the SHA).

Maybe I don't understand what you're trying to accomplish. I was
thinking specifically of your "cat-file can tell you the large objects,
but you don't know their names/commits" from above.

I would do:

   git log --raw $(
     git cat-file --batch-check='%(objectsize:disk) %(objectname)' --batch-all-objects |
     sort -rn | head -3 |
     awk '{print "--find-object=" $2 }'
   )

I'm not sure how renames enter into it at all.

> One of the problems with filter-branch that people often run into is
> they know what they want at a high-level (e.g. extract the history of
> this directory for a new repository, or rewrite the history of this
> repo to appear at a subdirectory so it can be merged into a bigger
> repo and people passing filenames to log will still get the history of
> those files, or I want to remove some of the big stuff in my history),
> but often times that's not quite enough.  They need help finding big
> objects, or may be unaware that the subset of files they want used to
> be known by alternative names.
> 
> I want a simple --analyze mode that can report on all files that have
> been renamed (so users don't just say "all I care about is these N
> files, give me a rewritten history just including those" -- we can
> point out to them whether those N files used to be known by other
> names), as well as reporting on all big files and if they've been
> deleted, and aggregations of the "big files" information across
> directories and file extensions.

So this seems like a separate problem from what the commit message talks
about.

There I think you'd want to assemble the list with something like "git
log --follow --name-only paths-of-interest" except that --follow sucks
too much to handle more than one path at a time.

But if you wanted to do it manually, then:

  git log --diff-filter=R --name-only

would be enough to let you track it down, wouldn't it?

-Peff


Re: [PATCH 10/10] fast-export: add --always-show-modify-after-rename

2018-11-11 Thread Elijah Newren
On Sat, Nov 10, 2018 at 11:23 PM Jeff King  wrote:
>
> On Sat, Nov 10, 2018 at 10:23:12PM -0800, Elijah Newren wrote:
>
> > fast-export output is traditionally used as an input to a fast-import
> > program, but it is also useful to help gather statistics about the
> > history of a repository (particularly when --no-data is also passed).
> > For example, two of the types of information we may want to collect
> > could include:
> >   1) general information about renames that have occurred
> >   2) what the biggest objects in a repository are and what names
> >  they appear under.
> >
> > The first bit of information can be gathered by just passing -M to
> > fast-export.  The second piece of information can partially be gotten
> > from running
> > git cat-file --batch-check --batch-all-objects
> > However, that only shows what the biggest objects in the repository are
> > and their sizes, not what names those objects appear as or what commits
> > they were introduced in.  We can get that information from fast-export,
> > but when we only see
> > R oldname newname
> > instead of
> > R oldname newname
> > M 100644 $SHA1 newname
> > then it makes the job more difficult.  Add an option which allows us to
> > force the latter output even when commits have exact renames of files.
>
> fast-export seems like a funny tool to look up paths. What about "git
> log --find-object=$SHA1" ?

Eek, and give me O(N*M) behavior, where N is the number of commits in
the repository and M is the number of renames that occur in its
history?  Also, that's the inverse of the lookup I need anyway (I have
the commit and filename, but am missing the SHA).
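For comparison, without the proposed option a stream consumer can recover the missing id itself by keeping a path-to-blob map and copying the entry across on a bare `R`. A minimal sketch for `git fast-export --no-data -M` output; `rename_blobs` is a hypothetical name, and it assumes unquoted paths without spaces, commit messages that end in a newline, and history linear enough that one global map is accurate:

```sh
# rename_blobs: print "<blob-id> <path>" for every path that ends a commit
# with content, including paths reached only by an exact rename.
rename_blobs() {
	LC_ALL=C awk '
	function flush(p) {                    # print final blob of touched paths
		for (p in touched) if (p in blob) print blob[p], p
		split("", touched)
	}
	/^data [0-9]+$/ {                      # skip message bodies by byte count
		n = substr($0, 6) + 0
		while (n > 0 && (getline line) > 0) n -= length(line) + 1
		next
	}
	/^commit / { flush(); next }
	$1 == "M" { blob[$4] = $3; touched[$4] = 1; next }
	$1 == "D" { delete blob[$2]; next }
	$1 == "R" || $1 == "C" {
		if ($2 in blob) { blob[$3] = blob[$2]; touched[$3] = 1 }
		if ($1 == "R") delete blob[$2]
		next
	}
	$1 == "deleteall" { split("", blob) }
	END { flush() }
	' | LC_ALL=C sort -u
}
```

Printing only at the end of each commit means an inexact rename (an `R` followed by an `M` for the new name) reports the modified blob, not the pre-rename one.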

One of the problems with filter-branch that people often run into is
they know what they want at a high-level (e.g. extract the history of
this directory for a new repository, or rewrite the history of this
repo to appear at a subdirectory so it can be merged into a bigger
repo and people passing filenames to log will still get the history of
those files, or I want to remove some of the big stuff in my history),
but often times that's not quite enough.  They need help finding big
objects, or may be unaware that the subset of files they want used to
be known by alternative names.

I want a simple --analyze mode that can report on all files that have
been renamed (so users don't just say "all I care about is these N
files, give me a rewritten history just including those" -- we can
point out to them whether those N files used to be known by other
names), as well as reporting on all big files and if they've been
deleted, and aggregations of the "big files" information across
directories and file extensions.


Re: [PATCH 10/10] fast-export: add --always-show-modify-after-rename

2018-11-10 Thread Jeff King
On Sat, Nov 10, 2018 at 10:23:12PM -0800, Elijah Newren wrote:

> fast-export output is traditionally used as an input to a fast-import
> program, but it is also useful to help gather statistics about the
> history of a repository (particularly when --no-data is also passed).
> For example, two of the types of information we may want to collect
> could include:
>   1) general information about renames that have occurred
>   2) what the biggest objects in a repository are and what names
>  they appear under.
> 
> The first bit of information can be gathered by just passing -M to
> fast-export.  The second piece of information can partially be gotten
> from running
> git cat-file --batch-check --batch-all-objects
> However, that only shows what the biggest objects in the repository are
> and their sizes, not what names those objects appear as or what commits
> they were introduced in.  We can get that information from fast-export,
> but when we only see
> R oldname newname
> instead of
> R oldname newname
> M 100644 $SHA1 newname
> then it makes the job more difficult.  Add an option which allows us to
> force the latter output even when commits have exact renames of files.

fast-export seems like a funny tool to look up paths. What about "git
log --find-object=$SHA1" ?

-Peff


[PATCH 10/10] fast-export: add --always-show-modify-after-rename

2018-11-10 Thread Elijah Newren
fast-export output is traditionally used as an input to a fast-import
program, but it is also useful to help gather statistics about the
history of a repository (particularly when --no-data is also passed).
For example, two of the types of information we may want to collect
could include:
  1) general information about renames that have occurred
  2) what the biggest objects in a repository are and what names
 they appear under.

The first bit of information can be gathered by just passing -M to
fast-export.  The second piece of information can partially be gotten
from running
git cat-file --batch-check --batch-all-objects
However, that only shows what the biggest objects in the repository are
and their sizes, not what names those objects appear as or what commits
they were introduced in.  We can get that information from fast-export,
but when we only see
R oldname newname
instead of
R oldname newname
M 100644 $SHA1 newname
then it makes the job more difficult.  Add an option which allows us to
force the latter output even when commits have exact renames of files.

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-export.txt | 11 ++
 builtin/fast-export.c |  7 +-
 t/t9350-fast-export.sh| 36 +++
 3 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index 4e40f0b99a..946a5aee1f 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -128,6 +128,17 @@ marks the same across runs.
for intermediary filters (e.g. for rewriting commit messages
which refer to older commits, or for stripping blobs by id).
 
+--always-show-modify-after-rename::
+   When a rename is detected, fast-export normally issues both a
+   'R' (rename) and a 'M' (modify) directive.  However, if the
+   contents of the old and new filename match exactly, it will
+   only issue the rename directive.  Use this flag to have it
+   always issue the modify directive after the rename, which may
+   be useful for tools which are using the fast-export stream as
+   a mechanism for gathering statistics about a repository.  Note
+   that this option only has effect when rename detection is
+   active (see the -M option).
+
 --refspec::
Apply the specified refspec to each ref exported. Multiple of them can
be specified.
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index cc01dcc90c..db606d1fd0 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -38,6 +38,7 @@ static int use_done_feature;
 static int no_data;
 static int full_tree;
 static int reference_excluded_commits;
+static int always_show_modify_after_rename;
 static int show_original_ids;
 static struct string_list extra_refs = STRING_LIST_INIT_NODUP;
 static struct string_list tag_refs = STRING_LIST_INIT_NODUP;
@@ -407,7 +408,8 @@ static void show_filemodify(struct diff_queue_struct *q,
putchar('\n');
 
if (oideq(&ospec->oid, &spec->oid) &&
-   ospec->mode == spec->mode)
+   ospec->mode == spec->mode &&
+   !always_show_modify_after_rename)
break;
}
/* fallthrough */
@@ -1099,6 +1101,9 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 &reference_excluded_commits, N_("Reference parents which are not in fast-export stream by sha1sum")),
OPT_BOOL(0, "show-original-ids", &show_original_ids,
N_("Show original sha1sums of blobs/commits")),
+   OPT_BOOL(0, "always-show-modify-after-rename",
+   &always_show_modify_after_rename,
+N_("Always provide 'M' directive after 'R'")),
 
OPT_END()
};
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 5ad6669910..d0c30672ac 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -638,4 +638,40 @@ test_expect_success 'merge commit gets exported with --import-marks' '
)
 '
 
+test_expect_success 'rename detection and --always-show-modify-after-rename' '
+   test_create_repo renames &&
+   (
+   cd renames &&
+   test_seq 0  9  >single_digit &&
+   test_seq 10 98 >double_digit &&
+   git add . &&
+   git commit -m initial &&
+
+   echo 99 >>double_digit &&
+   git mv single_digit single-digit &&
+   git mv double_digit double-digit &&
+   git add double-digit &&
+   git commit -m renames &&
+
+   # First, check normal fast-export -M output
+   git fast-export -M --no-data master >out &&
+
+