Re: [git-users] files case name changes detection.

2013-11-14 Thread Gabby Romano
Thanks a lot, Konstantin, for the detailed answer. Great help indeed.

I was well aware of the similarity index of git diff and planned to use it. 
However, your global view of the issue made it much clearer for me.
I do think I will need to go through all the commits in between, since they 
are applied one by one during a pull, so an in-between commit with a 
rename like that can still confuse Windows clients.

Regarding the change+rename case you mentioned: if we look for 100% in the 
similarity index plus the rename flag from Git, then we are covered, as 
you said.

-- 
You received this message because you are subscribed to the Google Groups "Git 
for human beings" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to git-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [git-users] files case name changes detection.

2013-11-14 Thread Konstantin Khomoutov
On Thu, 14 Nov 2013 02:37:20 -0800 (PST)
Gabby Romano omerik...@gmail.com wrote:

[...]
 regarding a change+rename you mentioned - if you declare you are
 looking for 100% in the similarity index + the rename flag from git -
 so we are covered, as you said.

Really?  Not so fast, please. ;-)

Looking for R100 markers or passing -M100% to `git diff`/`git log`
means you're looking for files which were just renamed, with their
contents unchanged.  (Mind the percent sign: without it the number is
read as a decimal fraction, so -M100 would mean 10%.)  But what if I
change a single letter in a file *and* rename it in the same commit?
Its similarity index will be close to 100%, but still lower, and so
this rename will pass undetected.
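
The effect can be reproduced in a throwaway repository.  This is only a
sketch under assumptions: the file names, the 50-line test file and the
`g` wrapper (which merely supplies a committer identity) are made up
for the demonstration, and the exact similarity index Git reports for
the looser threshold depends on the file contents.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)                 # scratch repository, safe to delete afterwards
cd "$repo"
git init -q .
# Hypothetical wrapper: run git with a fixed identity so commits work anywhere.
g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

# A file with 50 similar lines, so a one-letter edit leaves it almost unchanged.
i=1
while [ "$i" -le 50 ]; do
    echo "setting number $i has some value"
    i=$((i + 1))
done > settings.txt
g add settings.txt
g commit -q -m 'add settings.txt'

# Change a single letter *and* rename the file (case only) in one commit.
sed 's/number 25 /Number 25 /' settings.txt > edited.tmp
g rm -q settings.txt
mv edited.tmp Settings.txt
g add Settings.txt
g commit -q -m 'edit and rename in one commit'

# Exact renames only: the edited file is reported as delete + add, not as R.
strict=$(git diff --name-status --diff-filter=R -M100% HEAD~1 HEAD)
# A looser threshold still pairs the two names up as a rename.
loose=$(git diff --name-status --diff-filter=R -M80% HEAD~1 HEAD)

echo "with -M100%: '$strict'"
echo "with -M80%:  '$loose'"
```

With -M100% the list should come out empty, while -M80% should show
one R line with a similarity index somewhat below 100.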

I meant to underline (but it seems I failed at it) that what
constitutes a rename is a rather philosophical question in Git, as
merely *using* `git mv` does not *record* a rename using some imaginary
metadata (as could be the case in other VCSes).

Look at it this way: while comparing two commits Git sees two identical
blobs with their filesystem names changed -- okay, that's surely a
rename.  Now it sees two blobs with similarity index of, say, 50% and
different names -- is this a rename?  That could be just a file split
into two (or three or more) other files or parts of the code contained
in the original file moved to other existing files.  Since no explicit
renames are recorded, Git does not know by itself what it really is.

I would recommend thoroughly reading this post [1] to really
understand Git's approach to this and its implications.

So *to me,* using -M100% may let certain renames pass undetected.
I would possibly use something like 80%...  I don't know.  You decide.

1. http://thread.gmane.org/gmane.comp.version-control.git/27/focus=217



Re: [git-users] files case name changes detection.

2013-11-13 Thread Konstantin Khomoutov
On Tue, 12 Nov 2013 08:11:40 -0800 (PST)
Gabby Romano omerik...@gmail.com wrote:

 I would like to be able to prevent case-only file name changes made on
 Windows clients from being pushed to our Linux remote repository. When
 pulled, they confuse the other Windows clients and mess things up. I
 want to use a hook for that along with the rename detection mechanism
 of Git, if I can call it this way.
 
 My question is: what would be the best way to approach this in the
 hook? Detect the rename and check that the content is the same (SHA-1
 check?)
 
 Am I wrong regarding the approach in general, and is there a much
 better way to do this?

I think it's a viable approach though it should be used somewhat
differently to what you proposed.

Git does not explicitly track file renames so renaming a file (without
changing the file's contents) physically looks like a (new) commit
indirectly referencing *the same* blob of data as does one of its parent
commits, but this blob has a different symbolic name attached to it in
one of the tree objects referenced by both commits.

In Git, each commit references exactly one tree object (representing the
root directory of the project), and that tree object might reference
zero or more other tree objects -- one for each top-level subdirectory,
and so on and so on going deeper down.  Tree objects also reference
blobs which contain the data of tracked files.  In a tree object, both
references to blobs and to other tree objects are decorated with
filesystem names for these objects.  That's how filesystem
hierarchies are mapped to Git objects in its database.
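
To make the layout above concrete, here is a minimal sketch that can be
run in a scratch directory.  The file names and the `g` wrapper (it
only supplies a committer identity) are assumptions made for the
demonstration.  It renames a file without touching its contents and
then prints what the two root trees reference.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)                 # scratch repository, safe to delete afterwards
cd "$repo"
git init -q .
# Hypothetical wrapper: run git with a fixed identity so commits work anywhere.
g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

printf 'hello\n' > readme.txt
g add readme.txt
g commit -q -m 'add readme.txt'

g mv readme.txt README.txt        # rename only, contents untouched
g commit -q -m 'rename readme.txt to README.txt'

# Each commit references one root tree; a tree maps names to object ids.
echo "tree of the first commit:"
git ls-tree HEAD~1
echo "tree of the second commit:"
git ls-tree HEAD

# The blob id is the same on both sides; only the name in the tree differs.
old_blob=$(git rev-parse HEAD~1:readme.txt)
new_blob=$(git rev-parse HEAD:README.txt)
echo "old blob: $old_blob"
echo "new blob: $new_blob"
```

The two `git ls-tree` listings should show the same blob id next to
different names, which is all Git stores about this "rename".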

As you can see, renames can only be detected by analyzing a part
of the commit graph using special algorithms.  Fortunately, Git has
this machinery implemented for you in its `git diff` command:

git diff --name-status --diff-filter=R --find-renames A B

should present you with a list of files (along with the Rnnn marker in
front of them) which were renamed in commit B compared to commit A.
R means Git detected the file has been renamed, and nnn shows you
the percentage of the file's contents which remained unchanged (the
similarity index; 100% means the file's content hasn't changed).
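
Since the original question is about renames that change only the
letter case, the output of that command can be filtered further.  The
following is a sketch, not a finished tool: `case_only_renames` is a
made-up helper name, the repository and file names exist only for the
demonstration, and the parsing assumes paths without tabs, newlines or
other characters that Git would quote in its output.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)                 # scratch repository, safe to delete afterwards
cd "$repo"
git init -q .
# Hypothetical wrapper: run git with a fixed identity so commits work anywhere.
g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

mkdir src
printf 'int main(void) { return 0; }\n' > src/main.c
printf 'some notes\n' > notes.txt
g add .
g commit -q -m 'initial commit'

g mv src/main.c src/Main.c        # case-only rename
g mv notes.txt todo.txt           # ordinary rename
g commit -q -m 'two renames'

# case_only_renames A B: print the renames between commits A and B whose
# old and new paths are equal once both are converted to lower case.
case_only_renames() {
    git diff --name-status --diff-filter=R --find-renames "$1" "$2" |
    while IFS="$(printf '\t')" read -r status old new; do
        lo_old=$(printf '%s' "$old" | tr '[:upper:]' '[:lower:]')
        lo_new=$(printf '%s' "$new" | tr '[:upper:]' '[:lower:]')
        if [ "$lo_old" = "$lo_new" ]; then
            printf '%s %s -> %s\n' "$status" "$old" "$new"
        fi
    done
}

echo "all renames:"
git diff --name-status --diff-filter=R --find-renames HEAD~1 HEAD
echo "case-only renames:"
found=$(case_only_renames HEAD~1 HEAD)
echo "$found"
```

Both renames should appear in the first listing, while the filtered one
should keep only src/main.c -> src/Main.c.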

Applying this paradigm in a server-side hook *might* be more involved
since any push operation might update any ref (a branch or a tag) with
more than one commit at once -- in a general case with a graph of
commits.  (To actually refuse a push, the check has to run in a
pre-receive or update hook; post-receive runs only after the refs have
been updated.)  If you're fine with merely making sure the new tip
commit does not introduce any rename compared to the old one, ignoring
anything which might have happened in between, just use `git diff` as
shown above.  If, instead, you want to be sure no commit between A and B
introduced a rename, you should employ the fact that the command for
walking commit graphs, `git log`, is able to use the `git diff`
machinery for analyzing the commits as it walks.

Hence, if we have a ref that is currently pointing to a commit A, and
it's about to be updated by the push operation to point to a commit B,
we could call

git log --oneline --name-status --diff-filter=R --find-renames A..B

to list all commits sent to update our ref, which contain renames.

Notice that `git diff` is passed the two revisions as separate
arguments, while `git log` receives the range A..B, which means all
commits reachable from B excluding all commits reachable from A.
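
Put together, a check of this kind could look roughly like the sketch
below.  It assumes the pre-receive input format (one "old new refname"
line per updated ref on standard input); `reject_renames` is a made-up
name, and the demonstration repository at the end stands in for a real
push.  The demonstration renames a file and then renames it back, so
comparing only the two tips finds nothing while walking the range does.

```shell
#!/bin/sh
set -e

# reject_renames: hypothetical helper; reads "<old> <new> <ref>" lines and
# returns non-zero if any commit being pushed contains a detected rename.
reject_renames() {
    rc=0
    while read -r old new ref; do
        # An all-zero new id means the ref is being deleted: nothing to check.
        case "$new" in *[!0]*) ;; *) continue ;; esac
        case "$old" in
            *[!0]*) set -- "${old}..${new}" ;;      # existing ref: A..B
            *)      set -- "$new" --not --all ;;    # new ref: commits not yet reachable
        esac
        found=$(git log --oneline --name-status --diff-filter=R --find-renames "$@")
        if [ -n "$found" ]; then
            printf 'renames found in update of %s:\n%s\n' "$ref" "$found" >&2
            rc=1
        fi
    done
    return "$rc"
}

# --- demonstration in a scratch repository (all names are made up) ---
repo=$(mktemp -d)
cd "$repo"
git init -q .
g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

printf 'a\n' > file.txt;  g add file.txt;  g commit -q -m 'base'
base=$(git rev-parse HEAD)
printf 'b\n' > other.txt; g add other.txt; g commit -q -m 'add other.txt'
clean=$(git rev-parse HEAD)
g mv file.txt File.txt;   g commit -q -m 'rename to File.txt'
g mv File.txt file.txt;   g commit -q -m 'rename back to file.txt'
tip=$(git rev-parse HEAD)

# Comparing only the end points misses the rename and its reversal ...
tip_diff=$(git diff --name-status --diff-filter=R --find-renames "$base" "$tip")
echo "tip-to-tip renames: '$tip_diff'"

# ... while walking every commit in base..tip catches both.
if printf '%s %s refs/heads/demo\n' "$base" "$tip" | reject_renames 2>/dev/null
then verdict=accepted
else verdict=rejected
fi
echo "push of base..tip would be $verdict"
```

In an actual hooks/pre-receive script only the function would be kept,
called as the last command so that its exit status decides whether the
push is accepted.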

I should stress again that there are no renames in Git -- in this VCS
this concept is purely synthetic, and so there are knobs to control the
algorithm which detects renames.  At the very least you can specify
which percentage of the file contents must be *not* changed in a commit
to count as a rename.  The rationale: in a given commit, a file might be
both changed and renamed, and since the contents differ, how do you
tell if it's still logically the same file or not?  You might tell Git
your idea about this -- read the `git diff` manual page about the
--find-renames (-M) command-line option.
