Re: FW: Windows. Git, and Dedupe

2013-03-20 Thread René Scharfe

On 19.03.2013 at 22:36, Josh Rowe wrote:

> Yes, Dedup is in fact a Server-only feature.


Is there an easier way to reproduce the issue than registering and 
downloading the Windows Server 2012 evaluation version?  It's not that 
hard, admittedly, but still.



> The reparse point could be decoded as being a non-symlink reparse
> item using; in those cases, treating the file as an ordinary
> file would be appropriate.
>
> For example, see the following. The reparse tag value for symlinks
> is IO_REPARSE_TAG_SYMLINK (0xa000000c) and for deduped files is
> IO_REPARSE_TAG_DEDUP (0x80000013).

That's interesting and invalidates my initial checks with mklink, 
because if I read compat/mingw.c [1] correctly then git handles symlinks 
on Windows in a special way, but should treat dedup reparse points as 
normal files already.
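
A minimal sketch of the rule discussed here, written in Python rather than git's C: only a reparse point whose tag is IO_REPARSE_TAG_SYMLINK is reported as a symlink, and any other reparse point falls through to the ordinary file or directory case. The helper name `file_type_bits` is hypothetical and this is not the code in compat/mingw.c; the constants are the full 32-bit values from the Windows SDK headers.

```python
import stat

# Windows file attribute bits (values from the Windows SDK headers).
FILE_ATTRIBUTE_DIRECTORY = 0x00000010
FILE_ATTRIBUTE_REPARSE_POINT = 0x00000400

# Full 32-bit reparse tags (values from the Windows SDK headers).
IO_REPARSE_TAG_SYMLINK = 0xA000000C
IO_REPARSE_TAG_DEDUP = 0x80000013


def file_type_bits(attributes: int, reparse_tag: int = 0) -> int:
    """Hypothetical helper: map Windows attributes to a POSIX file type.

    Only a reparse point tagged as a symbolic link becomes S_IFLNK.
    Every other reparse point (dedup, for example) is classified by its
    remaining attributes, exactly like a file without a reparse point.
    """
    is_reparse_point = bool(attributes & FILE_ATTRIBUTE_REPARSE_POINT)
    if is_reparse_point and reparse_tag == IO_REPARSE_TAG_SYMLINK:
        return stat.S_IFLNK
    if attributes & FILE_ATTRIBUTE_DIRECTORY:
        return stat.S_IFDIR
    return stat.S_IFREG
```

Under this rule a deduped file (reparse point with tag 0x80000013) ends up as a regular file, which matches the reading of compat/mingw.c above.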


Hrm, but probably st_size is set to zero for them.  Do the deduped files 
appear as empty?  git ls-tree -r HEAD would show them with a hash of 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.  If true then how do we get 
their real content sizes using Win32 API calls?
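
The hash e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 is the ID git assigns to every empty blob, so it works as a marker for files that git read as zero bytes. A small sketch of how it is derived, assuming the SHA-1 object format (the helper name `git_blob_id` is made up for illustration):

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git would assign to a blob."""
    # A blob is hashed as "blob <size>\0" followed by the raw content.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# An empty file always hashes to the same, well-known ID.
EMPTY_BLOB_ID = git_blob_id(b"")
```

Any path that `git ls-tree -r HEAD` lists with this ID was stored with empty content.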


By the way, what does the command git version return for you?

Thanks,
René


[1] https://git.kernel.org/cgit/git/git.git/tree/compat/mingw.c#n427

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: FW: Windows. Git, and Dedupe

2013-03-20 Thread Josh Rowe
If you have Win8 or HyperV 2012, I can ship you a small NTFS .vhd with some 
deduped files.  I'm not sure if that will be readable, but I would hazard a 
guess that it would be.  It definitely will not be readable on Win7.  

I'm using:

PS C:\> git version
git version 1.8.0.msysgit.0

I don't see any changes related to this in the file log since the original code 
was added in 2010.  I do notice that mingw_fstat doesn't do anything special 
with symlinks; I don't know where that is used.  

The file sizes show up as their original size with Windows tools (powershell, 
Win32, cmd, .Net, etc).  git ls-tree -r HEAD does not show that hash code for 
files that are not intentionally empty.  

Jmr



Re: FW: Windows. Git, and Dedupe

2013-03-20 Thread René Scharfe

On 20.03.2013 at 21:43, Josh Rowe wrote:

> If you have Win8 or HyperV 2012, I can ship you a small NTFS .vhd
> with some deduped files.  I'm not sure if that will be readable, but
> I would hazard a guess that it would be.  It definitely will not be
> readable on Win7.


It would be nice if you could upload it to an FTP server or website and 
post a public link so that the real git-on-Windows developers can get it 
as well.  You can also send it to me personally and I'll see if I can 
mount it using Windows 8 and where I get from there.  In any case, 
please make sure there's no sensitive or private data in the VHD file.


How big is it after compression using (preferably) 7-Zip or ZIP?


> I'm using:
>
> PS C:\> git version
> git version 1.8.0.msysgit.0


That means compat/mingw.c is directly relevant to you; more about MinGW, 
MSys and git at http://msysgit.github.com/ and http://mingw.org/.



> The file sizes show up as their original size with Windows tools
> (powershell, Win32, cmd, .Net, etc).  git ls-tree -r HEAD does not
> show that hash code for files that are not intentionally empty.


So we can likely (hopefully) get the sizes of deduped files with the 
same API calls as for regular ones.  Which makes me even more puzzled 
over the question of why git makes a difference between the two kinds.


René


Re: FW: Windows. Git, and Dedupe

2013-03-19 Thread René Scharfe
On 18.03.2013 at 22:20, Josh Rowe wrote:
> On Windows with an NTFS volume with Deduplication enabled, Git
> believes that deduplicated files are symlinks.  It then fails to be
> able to do anything with the file.  This can be repro-ed by creating
> an NTFS volume with dedup, creating some duplicate files, verifying
> that a few files are deduped, and trying to add and commit the files
> via git.

Both Single Instance Storage[1] and Data Deduplication[2] (introduced
with Windows Server 2012) seem to be server-only features.  How about
keeping regular git repositories with checked-out files on client
disks and using the server only for bare repositories (without a
working tree)?

When I tried to add a symbolic link created with mklink on Windows 8,
the mingw version of git refused because readlink(2) is not
supported.  This seems to be sufficient to reproduce the issue.

I couldn't test the Cygwin version, though, because http://cygwin.com
doesn't respond at the moment.

But a working readlink(2) wouldn't help anyway, I guess.  I imagine
that the reparse points used for deduplication point into a magic
block store which performs garbage collection of content that is no
longer referenced -- which probably means that a recreated symlink
may point to blocks that have been deleted in the meantime.

Perhaps you need a way to ask git to always follow symlinks instead
of trying to store their target specification.

René


[1] http://technet.microsoft.com/en-us/library/dd573308%28v=ws.10%29.aspx
[2] http://msdn.microsoft.com/en-us/library/windows/desktop/hh769303%28v=vs.85%29.aspx



RE: FW: Windows. Git, and Dedupe

2013-03-19 Thread Josh Rowe
Yes, Dedup is in fact a Server-only feature.  However, there are lots of people 
using the Server SKU as development workstations (especially here at Microsoft 
<g>).  There are also some sysadmins that I know of who use git and download 
sysadmin scripts via git to Servers.  Finally, I would hazard a guess that it's 
possible to mount an NTFS filesystem containing deduped files from a Server 
machine onto a client SKU and access those files.  (I'm not on the NTFS team, 
and haven't tried it.)  So I think there are good reasons to support reparse 
points on Windows.  

The reparse point could be decoded as being a non-symlink reparse item using; 
in those cases, treating the file as an ordinary file would be appropriate.

For example, see the following.  The reparse tag value for symlinks is 
IO_REPARSE_TAG_SYMLINK (0xa000000c) and for deduped files is 
IO_REPARSE_TAG_DEDUP (0x80000013).  The value can be discovered from the 
information at [1].  

I admit to not having looked at the git code nor being familiar with mingw.  
Are native Win32 calls supported in the git codebase?

Jmr


[1] http://msdn.microsoft.com/en-us/library/windows/desktop/aa365740(v=vs.85).aspx


PS I:\temp> cmd /c mklink x y
symbolic link created for x <<===>> y
PS I:\temp> fsutil reparsepoint query x
Reparse Tag Value : 0xa000000c
Tag value: Microsoft
Tag value: Name Surrogate
Tag value: Symbolic Link

Reparse Data Length: 0x00000010
Reparse Data:
0000:  02 00 02 00 00 00 02 00  01 00 00 00 79 00 79 00  ............y.y.
PS I:\temp> fsutil reparsepoint query x.txt
Reparse Tag Value : 0x80000013
Tag value: Microsoft

Reparse Data Length: 0x0000007c
Reparse Data:
0000:  01 02 7c 00 00 00 00 00  66 9c 1a 01 00 00 00 00  ..|.....f.......
0010:  00 00 01 00 00 00 00 00  cb eb c5 00 6a 97 63 4d  ............j.cM
0020:  97 9c 13 0c 41 8e ed 8b  40 00 40 00 40 00 00 00  ....A...@.@.@...
0030:  d3 b9 a8 d4 e4 c6 cd 01  55 ca 02 00 00 00 05 00  ........U.......
0040:  70 ac 21 04 00 00 05 00  01 00 00 00 88 8d 00 00  p.!.............
0050:  c8 30 00 00 00 00 00 00  c8 44 db 94 6c 88 9a d4  .0.......D..l...
0060:  0a a9 01 3a 1f 80 80 8d  ea 0d 53 d7 36 49 b9 a4  ...:......S.6I..
0070:  82 a2 b9 4e 2a 16 4b a1  2e d9 f3 dd              ...N*.K.....
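
The "Tag value" lines that fsutil prints follow from individual bits of the tag. Below is a rough sketch of that decoding, assuming the full 32-bit tag values from the Windows SDK headers (0xA000000C for symlinks, 0x80000013 for dedup) and the documented flag bits (bit 31 for Microsoft-owned tags, bit 29 for name surrogates). The function names `parse_tag` and `describe_tag` are hypothetical.

```python
import re

# Full 32-bit reparse tags (values from the Windows SDK headers).
IO_REPARSE_TAG_SYMLINK = 0xA000000C
IO_REPARSE_TAG_DEDUP = 0x80000013

# Flag bits encoded in every reparse tag.
MICROSOFT_BIT = 0x80000000       # tag is owned by Microsoft
NAME_SURROGATE_BIT = 0x20000000  # file stands in for another named entity

KNOWN_TAGS = {
    IO_REPARSE_TAG_SYMLINK: "symlink",
    IO_REPARSE_TAG_DEDUP: "dedup",
}


def parse_tag(fsutil_output: str) -> int:
    """Extract the tag from 'fsutil reparsepoint query' output."""
    match = re.search(r"Reparse Tag Value\s*:\s*(0x[0-9a-fA-F]+)", fsutil_output)
    if match is None:
        raise ValueError("no reparse tag found in fsutil output")
    return int(match.group(1), 16)


def describe_tag(tag: int) -> dict:
    """Decode the flag bits and look up the tag among the known ones."""
    return {
        "name": KNOWN_TAGS.get(tag, "unknown"),
        "microsoft": bool(tag & MICROSOFT_BIT),
        "name_surrogate": bool(tag & NAME_SURROGATE_BIT),
    }
```

A symlink tag has both bits set, while the dedup tag has only the Microsoft bit, so the name-surrogate bit alone already separates the two cases shown here.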


FW: Windows. Git, and Dedupe

2013-03-18 Thread Josh Rowe
Windows probably isn’t the most popular platform for Git developers ☺, but here 
goes…

On Windows with an NTFS volume with Deduplication enabled, Git believes that 
deduplicated files are symlinks.  It then fails to be able to do anything with 
the file.  This can be repro-ed by creating an NTFS volume with dedup, creating 
some duplicate files, verifying that a few files are deduped, and trying to add 
and commit the files via git.

Jmr
