Re: [git-users] files case name changes detection.
Thanks a lot, Konstantin, for the detailed answer. Great help indeed. I was well aware of the similarity index of `git diff` and planned to use it; however, your global view of the issue made it much clearer for me.

I do think I will need to go through all the commits in between: when they are applied during a pull, it is done one by one, so in-between commits with a rename like that can still confuse Windows clients.

Regarding the change+rename case you mentioned: if we declare that we are looking for 100% in the similarity index plus the rename flag from Git, then we are covered, as you said.

On Wednesday, November 13, 2013 2:31:25 PM UTC+2, Konstantin Khomoutov wrote:
[...]

--
You received this message because you are subscribed to the Google Groups "Git for human beings" group. To unsubscribe from this group and stop receiving emails from it, send an email to git-users+unsubscr...@googlegroups.com.
Re: [git-users] files case name changes detection.
On Thu, 14 Nov 2013 02:37:20 -0800 (PST) Gabby Romano omerik...@gmail.com wrote:

[...]
> Regarding a change+rename you mentioned - if you declare you are looking for 100% in the similarity index + the rename flag from Git - so we are covered, as you said.

Really? Not so fast, please. ;-)

Looking for R100 markers or passing -M100% to `git diff`/`git log` means you're looking for files which were just renamed, with their contents unchanged. But what if I change a single letter in a file *and* rename it in the same commit? Its similarity index will be close to 100% but still lower, and so this rename will go undetected.

I meant to underline (but it seems I failed at it) that what constitutes a rename is a rather philosophical question in Git, as merely *using* `git mv` does not *record* a rename using some imaginary metadata (as could be the case in other VCSes).

Look at it this way: while comparing two commits, Git sees two identical blobs with their filesystem names changed -- okay, that's surely a rename. Now it sees two blobs with a similarity index of, say, 50% and different names -- is this a rename? That could be just a file split into two (or three or more) other files, or parts of the code contained in the original file moved to other existing files. Since no explicit renames are recorded, Git does not know by itself what it really is.

I would recommend thoroughly reading this post [1] to really understand Git's approach to this and its implications.

So *to me,* using -M100% may let certain renames go undetected. I would possibly use something like 80%... I don't know. You decide.

1. http://thread.gmane.org/gmane.comp.version-control.git/27/focus=217
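To make the threshold trade-off concrete, here is a minimal sketch that filters an already produced `--name-status` listing by similarity score. The function name `renames_at_least` is made up for illustration, and it assumes the usual tab-separated layout (status such as R097, old path, new path). In practice you would more likely pass the threshold to Git itself with -M; this only shows what a stricter or looser cut-off lets through.

```shell
# Keep only rename entries whose similarity score is at least $1 (0-100).
# Reads `git diff --name-status --find-renames` style lines on stdin:
#   <status>TAB<old path>TAB<new path>
# (hypothetical helper, not part of Git)
renames_at_least() {
    awk -F '\t' -v min="$1" '
        # The status of a rename is "R" followed by the score, e.g. R097.
        $1 ~ /^R[0-9]+$/ && substr($1, 2) + 0 >= min + 0 { print }'
}
```

With a listing containing an R100 entry and an R097 entry, `renames_at_least 100` keeps only the first, while `renames_at_least 80` keeps both, which is the single-letter-change case described above. Something like `git diff --name-status --diff-filter=R --find-renames A B | renames_at_least 80` would apply it to real output. Note that Git applies its own default threshold of 50%, so entries below that never appear in the listing at all.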
Re: [git-users] files case name changes detection.
On Tue, 12 Nov 2013 08:11:40 -0800 (PST) Gabby Romano omerik...@gmail.com wrote:

> I would like to be able to prevent case name changes done on Windows clients and being pushed to our Linux remote repository. When pulled, it confuses the other Windows clients and messes things up. I want to use a hook for that along with the rename detection mechanism of Git, if I can call it this way.
>
> My question is - what would be the best way to approach this in the hook? Detect the rename and check the content is the same (sha1 check?) Am I wrong regarding the approach in general and there is a much better way to do this?

I think it's a viable approach, though it should be used somewhat differently from what you proposed.

Git does not explicitly track file renames, so renaming a file (without changing the file's contents) physically looks like a (new) commit indirectly referencing *the same* blob of data as does one of its parent commits, but in the new commit's tree this blob has a different symbolic name attached to it.

In Git, each commit references exactly one tree object (representing the root directory of the project), and that tree object might reference zero or more other tree objects -- one for each top-level subdirectory, and so on, going deeper down. Tree objects also reference blobs, which contain the data of tracked files. In a tree object, references both to blobs and to other tree objects are decorated with filesystem names for these objects. That's how filesystem hierarchies are mapped to Git objects in its database.

As you can see, renames can only be detected by analyzing a part of the commit graph using special algorithms. Fortunately, Git has this machinery implemented for you in its `git diff` command:

    git diff --name-status --diff-filter=R --find-renames A B

should present you with a list of files (along with the Rnnn marker in front of them) which were renamed in commit B compared to commit A.
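For the case-only renames asked about in the question, that listing can be narrowed further. A minimal sketch, assuming the tab-separated `--name-status` layout (a status field such as R100, then the old path, then the new path); the function name `case_only_renames` is made up for illustration, and the comparison only covers the letters that awk's `tolower` understands:

```shell
# Read `git diff --name-status --find-renames` output on stdin and print
# the renames whose old and new paths differ only in letter case.
# (hypothetical helper, not part of Git)
case_only_renames() {
    awk -F '\t' '
        # Rename entries have a status starting with "R"; compare the two
        # paths case-insensitively and report the ones that match.
        $1 ~ /^R/ && $2 != $3 && tolower($2) == tolower($3) {
            print $2 " -> " $3
        }'
}
```

It would be used as `git diff --name-status --diff-filter=R --find-renames A B | case_only_renames`. Paths containing unusual characters may be quoted by Git in this output, which the sketch does not handle.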
R means Git detected that the file has been renamed, and nnn shows you the percentage of the file's contents which remained unchanged (the similarity index; 100% means the file's content hasn't changed).

Applying this paradigm in a post-receive hook *might* be more involved, since any push operation might update any ref (a branch or a tag) with more than one commit at once -- in the general case, with a graph of commits.

If you're fine with merely making sure the new tip commit does not introduce any rename compared to the old one, ignoring anything which might have happened in between, just use `git diff` as shown above.

If, instead, you want to be sure no commit between A and B introduced a rename, you should employ the fact that the command for walking commit graphs, `git log`, is able to use the `git diff` machinery for analyzing the commits as it walks. Hence, if we have a ref that is currently pointing to a commit A, and it's about to be updated by the push operation to point to a commit B, we could call

    git log --oneline --name-status --diff-filter=R --find-renames A..B

to list all commits sent to update our ref which contain renames. Notice that `git diff` is passed the two revisions as separate arguments, while `git log` receives the range A..B, which means all commits reachable from B excluding all commits reachable from A.

I should stress again that there are no renames in Git -- in this VCS the concept is purely synthetic, and so there are knobs to control the algorithm which detects renames. At the least, you can specify which percentage of the file contents must *not* be changed in a commit for it to count as a rename. The rationale: in a given commit, a file might be both changed and renamed, and since the old and new contents differ, how do you tell if it's still logically the same file or not? You can tell Git your idea about this -- read the `git diff` manual page about the --find-renames (-M) command-line option.
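Putting the `git log` walk into a hook could look roughly like the sketch below. It is written as a pre-receive hook rather than post-receive, on the assumption that the goal is to reject the push: a post-receive hook runs after the refs have already been updated. The function names, the 80% threshold and the handling of newly created refs (`--not --all`) are illustrative choices, not something prescribed by Git.

```shell
#!/bin/sh
# Hypothetical pre-receive hook sketch: report case-only renames in a push.

# Print the `git log` revision arguments for one ref update (<old> <new>).
push_range() {
    case "$2" in
        *[!0]*) ;;                                # new value is a real commit
        *) return 0 ;;                            # all zeros: ref is deleted
    esac
    case "$1" in
        *[!0]*) printf '%s..%s\n' "$1" "$2" ;;    # existing ref: A..B
        *) printf '%s --not --all\n' "$2" ;;      # new ref: commits not yet reachable
    esac
}

# Reduce `git log --format="commit %h" --name-status` output (stdin) to
# "<commit>: <old> -> <new>" lines for renames that only change letter case.
report_case_renames() {
    awk -F '\t' '
        /^commit / { commit = substr($0, 8); next }
        $1 ~ /^R/ && tolower($2) == tolower($3) {
            print commit ": " $2 " -> " $3
        }'
}

# Read "<old> <new> <ref>" lines, as a pre-receive hook gets them on stdin.
check_push() {
    bad=0
    while read -r old new ref; do
        range=$(push_range "$old" "$new")
        [ -n "$range" ] || continue
        # $range is deliberately unquoted: it may hold several arguments.
        hits=$(git log --format='commit %h' --name-status --diff-filter=R \
                   --find-renames=80% $range | report_case_renames)
        if [ -n "$hits" ]; then
            echo "$ref: case-only renames found:" >&2
            echo "$hits" >&2
            bad=1
        fi
    done
    return "$bad"
}

# In an actual hook file the last line would be:  check_push
```

A non-zero exit status from a pre-receive hook makes Git refuse the whole push. Keep in mind that `git log` shows no diff for merge commits by default, so renames that only appear in a merge would need extra options to be caught.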