Re: Unfortunate interaction between git diff-index and fakeroot
Elazar Leibovichwrites: > ignore unused information, such as commit > 2cb45e95438c113871fbbea5b4f629f9463034e7 > which ignores st_dev, because it doesn't actually matter, or I do not think it ignores because "it doesn't matter". st_dev is known not to be stable (e.g. across reboots and reconnects nfs), so it is a poor indicator to use as a quick measure to see if the file contents may have changed since we last checked. On the other hand, in a sane (it is of course open to debate "by whose definition") environment, the fact that the owner of the file has changed _is_ a significant enough sign to suspect that the file contents have changed (i.e. the file is suspect to be dirty; you can actually check and refresh, of course, though). > commit e44794706eeb57f2ee38ed1604821aa38b8ad9d2 which ignores > mode changes not relevant to the owner. And that one is not even about the cached stat information used for the quick "dirty-ness" check. The change is about the mode bits in recorded history. File's mtime is not recorded in the history, either, but we do care and record it in the index, because it can be used as a good (albeit a bit too coarse grained) indicator that the contents of a file may have changed. The owner and group fall into the same category.
Re: Unfortunate interaction between git diff-index and fakeroot
On 12/07/2017 05:31 PM, Junio C Hamano wrote: Correct. fakeroot would report that the files that are actually owned by the user who is running fakeroot are owned by root; the cached stat information in the index would be "corrected" to say that they are owned by root. So once the index is refreshed like so, things will become consistent. I still fail to understand, what's the benefit of storing the owner ID in the index, where no one seems to use this information. I mean, one could store many other meta information in the index, such as ACL lists, number of blocks allocated, etc, but the code seems to ignore unused information, such as commit 2cb45e95438c113871fbbea5b4f629f9463034e7 which ignores st_dev, because it doesn't actually matter, or commit e44794706eeb57f2ee38ed1604821aa38b8ad9d2 which ignores mode changes not relevant to the owner. In which situation does anyone care about the owner UID? What's the difference between that, and, e.g., st_dev? If you are using "diff-index" for the "is the tree dirty?" check without running "update-index --refresh", then you are not using the command correctly. Just want to clarify. It seems like the linux kernel is using diff-index to do just that, in scripts/setlocalversion https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/setlocalversion#n77 Do I understand correctly that linux should update the index first, or better, use porcelain "git diff --name-only" instead? It seems to be the case previously, but in commit cdf2bc632ebc9ef512345fe8e6015edfd367e256 git update-index was removed, to prevent error messages on read-only linux tree. Do I understand correctly that the more correct solution for modern git, is to use git diff --name-only instead? It should work on read-only as well as on read-write trees, and would not break if anyone uses git under fakeroot. Did I understand you correctly?
Re: Unfortunate interaction between git diff-index and fakeroot
Elazar Leibovichwrites: > We noticed some unexpected behavior of git, when running git commands under > fakeroot, and then running another command without fakeroot. > > When running, e.g., git status, or git describe --dirty, git can > update the index file. Correct. fakeroot would report that the files that are actually owned by the user who is running fakeroot are owned by root; the cached stat information in the index would be "corrected" to say that they are owned by root. So once the index is refreshed like so, things will become consistent. > The unexpected result is: > > "fakeroot git status" updates the index, and the index now says all files > are owned by uid:0. You should learn to expect it; that is how fakeroot works. > "git diff-index --name-only HEAD" is used to test if the git tree is dirty > without fakeroot, concluding all files have changed, since their owner UID > is changed. The lower-level plumbing diff-* commands are meant to be used by scripts that refresh the index once upfront. If you are using "diff-index" for the "is the tree dirty?" check without running "update-index --refresh", then you are not using the command correctly.
Unfortunate interaction between git diff-index and fakeroot
Hi, We noticed some unexpected behavior of git, when running git commands under fakeroot, and then running another command without fakeroot. When running, e.g., git status, or git describe --dirty, git can update the index file. This can happen inadvertently by, e.g., running the make install process of a package under fakeroot, and running "git describe --dirty", or "git status" to generate package name. The unexpected result is: "fakeroot git status" updates the index, and the index now says all files are owned by uid:0. "git diff-index --name-only HEAD" is used to test if the git tree is dirty without fakeroot, concluding all files have changed, since their owner UID is changed. You can easily recreate it: $ mkdir gitexample;cd gitexample;git init Initialized empty Git repository in /home/elazar/gitexample/.git/ $ touch x; git add x; git commit -m init 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 x $ git diff-index HEAD # nothing is printed, since working dir clean $ fakeroot git diff # index is updated to contain uid:0 $ git ls-files -s --debug 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 x ctime: 1512636533:915178157 mtime: 1512636533:915178157 dev: 64515ino: 8523907 uid: 0gid: 0 size: 0 flags: 0 $ git diff-index --name-only HEAD # x is considered different now x Make sure you do not have a shell prompt that calls git status in between, like oh-my-zsh, or bash-it. This is problematic because: 1. File owner is not really tracked by git porcelain, and I'm not sure what's the benefit of tracking ownership of files in git diff-index. Users who expect git diff to return "no changes" after ownership changed, expect git diff-index to return "no changes" as well. This is supported by the manual page who say "Compares the content and mode of the blobs found in a tree object with the corresponding tracked files in the working tree", and doesn't mention file ownership. 2. Linux tree uses git diff-index to test if the working tree is dirty. It can, and has cause subtle issues when building and packaging different parts of the tree to different directories concurrently. Our proposed solutions: Make match_stat_data ignore ownership changes. I searched git source for usage of OWNER_CHANGED bit and found that no one tests for it. I can't think of a scenario where the fact that the owner have changed would matter. I searched for usage of git-diff-index in git tree, and found that git-stash uses it. This is not relevant since stash would ignore ownership changes. I saw that merge-octupus and filter-branch uses git-diff-index but they also don't seem to care if ownership has changed. I searched for the first commit who tracked file ownership in the index, it was there from the first commit e83c5163316f89bfbde7d9ab23ca2e25604af290 I found no specific reason given for tracking the owner in related commits. You can see a change in a similar spirit to what I propose in commit 2cb45e95438c113871fbbea5b4f629f9463034e7 where st_dev is ignored by default due to similar problem in NFS. What do you think? Should the behavior remain, and diff-index should say it tracks "mode and owner changes"? Would it be OK to change the actual behavior of diff-index to ignore owner changes?