Re: Unfortunate interaction between git diff-index and fakeroot

2017-12-08 Thread Elazar Leibovich


On 12/07/2017 05:31 PM, Junio C Hamano wrote:

Correct.  fakeroot would report that the files that are actually
owned by the user who is running fakeroot are owned by root; the
cached stat information in the index would be "corrected" to say
that they are owned by root.  So once the index is refreshed like
so, things will become consistent.

I still fail to understand, what's the benefit of storing the owner ID in
the index, where no one seems to use this information.
I mean, one could store many other meta information in the index, such as
ACL lists,  number of blocks allocated, etc, but the code seems to
ignore unused information, such as commit 
2cb45e95438c113871fbbea5b4f629f9463034e7

which ignores st_dev, because it doesn't actually matter, or
commit e44794706eeb57f2ee38ed1604821aa38b8ad9d2 which ignores
mode changes not relevant to the owner.

In which situation does anyone care about the owner UID?
What's the difference between that, and, e.g., st_dev?

If you are using "diff-index" for the "is the tree dirty?" check
without running "update-index --refresh", then you are not using
the command correctly.

Just want to clarify. It seems like the linux kernel is using diff-index
to do just that, in scripts/setlocalversion

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/setlocalversion#n77

Do I understand correctly that linux should update the index
first, or better, use porcelain "git diff --name-only" instead?

It seems to be the case previously, but in commit 
cdf2bc632ebc9ef512345fe8e6015edfd367e256

git update-index was removed, to prevent error messages on
read-only linux tree.

Do I understand correctly that the more correct solution
for modern git, is to use git diff --name-only instead? It should
work on read-only as well as on read-write trees, and
would not break if anyone uses git under fakeroot.

Did I understand you correctly?


Unfortunate interaction between git diff-index and fakeroot

2017-12-07 Thread Elazar Leibovich

Hi,

We noticed some unexpected behavior of git, when running git commands under
fakeroot, and then running another command without fakeroot.

When running, e.g., git status, or git describe --dirty, git can
update the index file.

This can happen inadvertently by, e.g., running the make install process of
a package under fakeroot, and running "git describe --dirty", or "git
status" to generate package name.

The unexpected result is:

"fakeroot git status" updates the index, and the index now says all files
are owned by uid:0.

"git diff-index --name-only HEAD" is used to test if the git tree is dirty
without fakeroot, concluding all files have changed, since their owner UID
is changed.

You can easily recreate it:

$ mkdir gitexample;cd gitexample;git init
Initialized empty Git repository in /home/elazar/gitexample/.git/
$ touch x; git add x; git commit -m init
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 x
$ git diff-index HEAD # nothing is printed, since working dir clean
$ fakeroot git diff # index is updated to contain uid:0
$ git ls-files -s --debug
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   x
  ctime: 1512636533:915178157
  mtime: 1512636533:915178157
  dev: 64515ino: 8523907
  uid: 0gid: 0
  size: 0   flags: 0
$ git diff-index --name-only HEAD # x is considered different now
x

Make sure you do not have a shell prompt that calls git status in
between, like oh-my-zsh, or bash-it.

This is problematic because:

1. File owner is not really tracked by git porcelain, and I'm not sure
   what's the benefit of tracking ownership of files in git diff-index. 
Users
   who expect git diff to return "no changes" after ownership changed, 
expect

   git diff-index to return "no changes" as well.

  This is supported by the manual page who say "Compares the content 
and mode
  of the blobs found in a tree object with the corresponding tracked 
files in

  the working tree", and doesn't mention file ownership.

2. Linux tree uses git diff-index to test if the working tree is dirty. It
   can, and has cause subtle issues when building and packaging different
   parts of the tree to different directories concurrently.

Our proposed solutions:

Make match_stat_data ignore ownership changes.

I searched git source for usage of OWNER_CHANGED bit and found that no one
tests for it. I can't think of a scenario where the fact that the owner have
changed would matter.

I searched for usage of git-diff-index in git tree, and found that
git-stash uses it. This is not relevant since stash would ignore ownership
changes. I saw that merge-octupus and filter-branch uses git-diff-index
but they also don't seem to care if ownership has changed.

I searched for the first commit who tracked file ownership in the index,
it was there from the first commit e83c5163316f89bfbde7d9ab23ca2e25604af290
I found no specific reason given for tracking the owner in related commits.

You can see a change in a similar spirit to what I propose in commit
2cb45e95438c113871fbbea5b4f629f9463034e7 where st_dev is ignored by default
due to similar problem in NFS.

What do you think?

Should the behavior remain, and diff-index should say it tracks "mode and
owner changes"?
Would it be OK to change the actual behavior of diff-index to ignore owner
changes?