Re: FW: Windows. Git, and Dedupe
On 20.03.2013 21:43, Josh Rowe wrote:
> If you have Win8 or Hyper-V 2012, I can ship you a small NTFS .vhd with some deduped files. I'm not sure if that will be readable, but I would hazard a guess that it would be. It definitely will not be readable on Win7.

It would be nice if you could upload it to an FTP server or website and post a public link so that the real git-on-Windows developers can get it as well. You can also send it to me personally and I'll see if I can mount it using Windows 8 and where I get from there.

In any case, please make sure there's no sensitive or private data in the VHD file. How big is it after compression using (preferably) 7-Zip or ZIP?

> I'm using:
> PS C:\> git version
> git version 1.8.0.msysgit.0

That means compat/mingw.c is directly relevant to you; more about MinGW, MSYS and git at http://msysgit.github.com/ and http://mingw.org/.

> The file sizes show up as their original size with Windows tools (PowerShell, Win32, cmd, .NET, etc.). git ls-tree -r HEAD does not show that hash code for files that are not intentionally empty.

So we can likely (hopefully) get the sizes of deduped files with the same API calls as for regular ones, which makes me even more puzzled as to why git treats the two kinds differently.

René
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
RE: FW: Windows. Git, and Dedupe
If you have Win8 or Hyper-V 2012, I can ship you a small NTFS .vhd with some deduped files. I'm not sure if that will be readable, but I would hazard a guess that it would be. It definitely will not be readable on Win7.

I'm using:
PS C:\> git version
git version 1.8.0.msysgit.0

I don't see any changes related to this in the file log since the original code was added in 2010. I do notice that mingw_fstat doesn't do anything special with symlinks; I don't know where that is used.

The file sizes show up as their original size with Windows tools (PowerShell, Win32, cmd, .NET, etc.). git ls-tree -r HEAD does not show that hash code for files that are not intentionally empty.

Jmr

-----Original Message-----
From: René Scharfe [mailto:rene.scha...@lsrfire.ath.cx]
Sent: Wednesday, March 20, 2013 12:55 PM
To: Josh Rowe
Cc: git@vger.kernel.org; msys...@googlegroups.com
Subject: Re: FW: Windows. Git, and Dedupe

On 19.03.2013 22:36, Josh Rowe wrote:
> Yes, Dedup is in fact a Server-only feature.

Is there an easier way to reproduce the issue than registering and downloading the Windows Server 2012 evaluation version? It's not that hard, admittedly, but still.

> The reparse point could be decoded to identify it as a non-symlink reparse item; in those cases, treating the file as an "ordinary" file would be appropriate.
>
> For example, see the following. The reparse tag value for symlinks is IO_REPARSE_TAG_SYMLINK (0xA000000C) and for deduped files is IO_REPARSE_TAG_DEDUP (0x80000013).

That's interesting and invalidates my initial checks with mklink, because if I read compat/mingw.c [1] correctly then git handles symlinks on Windows in a special way, but should treat dedup reparse points as normal files already. Hrm, but probably st_size is set to zero for them.

Do the deduped files appear as empty? "git ls-tree -r HEAD" would show them with a hash of e69de29bb2d1d6434b8b29ae775ad8c2e48c5391. If true, then how do we get their real content sizes using Win32 API calls?

By the way, what does the command "git version" return for you?

Thanks,
René

[1] https://git.kernel.org/cgit/git/git.git/tree/compat/mingw.c#n427
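The hash René mentions, e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, is git's object ID for a blob with no content. A minimal Python sketch of how git derives a blob ID, assuming SHA-1 object IDs as used by git 1.8: it hashes a header of the form "blob <size>" plus a NUL byte, followed by the raw content. `git_blob_id` is a hypothetical helper for illustration, not a git API.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object ID git would assign to a blob holding `data`.

    git hashes the header "blob <size>" plus a NUL byte, then the raw
    content, so a file whose size was read as 0 always gets the same ID.
    """
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


# The ID of zero bytes of content: what a deduped file would show if git
# had read it as empty.
EMPTY_BLOB_ID = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
print(git_blob_id(b"") == EMPTY_BLOB_ID)  # True
```

Running `git hash-object /dev/null` should print the same empty-blob ID, so seeing any other hash in `git ls-tree -r HEAD` means git read non-empty content for that file.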
Re: FW: Windows. Git, and Dedupe
On 19.03.2013 22:36, Josh Rowe wrote:
> Yes, Dedup is in fact a Server-only feature.

Is there an easier way to reproduce the issue than registering and downloading the Windows Server 2012 evaluation version? It's not that hard, admittedly, but still.

> The reparse point could be decoded to identify it as a non-symlink reparse item; in those cases, treating the file as an "ordinary" file would be appropriate.
>
> For example, see the following. The reparse tag value for symlinks is IO_REPARSE_TAG_SYMLINK (0xA000000C) and for deduped files is IO_REPARSE_TAG_DEDUP (0x80000013).

That's interesting and invalidates my initial checks with mklink, because if I read compat/mingw.c [1] correctly then git handles symlinks on Windows in a special way, but should treat dedup reparse points as normal files already. Hrm, but probably st_size is set to zero for them.

Do the deduped files appear as empty? "git ls-tree -r HEAD" would show them with a hash of e69de29bb2d1d6434b8b29ae775ad8c2e48c5391. If true, then how do we get their real content sizes using Win32 API calls?

By the way, what does the command "git version" return for you?

Thanks,
René

[1] https://git.kernel.org/cgit/git/git.git/tree/compat/mingw.c#n427
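The distinction Josh and René describe (only symlink reparse points are links; a dedup reparse point is an ordinary file) can be sketched in Python. The two tag constants and the bit tests mirror the winnt.h definitions of IO_REPARSE_TAG_SYMLINK, IO_REPARSE_TAG_DEDUP, IsReparseTagMicrosoft() and IsReparseTagNameSurrogate(); `lstat_kind` is a hypothetical policy function, not the code in compat/mingw.c.

```python
# Reparse tag constants as defined in the Windows SDK header winnt.h.
IO_REPARSE_TAG_SYMLINK = 0xA000000C
IO_REPARSE_TAG_DEDUP = 0x80000013


def is_reparse_tag_microsoft(tag: int) -> bool:
    """Python equivalent of the IsReparseTagMicrosoft() macro (bit 31)."""
    return bool(tag & 0x80000000)


def is_reparse_tag_name_surrogate(tag: int) -> bool:
    """Python equivalent of the IsReparseTagNameSurrogate() macro (bit 29)."""
    return bool(tag & 0x20000000)


def lstat_kind(tag: int) -> str:
    """Hypothetical policy: report only real symlinks as links.

    Any other reparse point, such as a deduplicated file, is treated as an
    ordinary file, so its content and real size are read normally.
    """
    return "symlink" if tag == IO_REPARSE_TAG_SYMLINK else "regular file"


for name, tag in [("x (mklink)", IO_REPARSE_TAG_SYMLINK),
                  ("x.txt (deduped)", IO_REPARSE_TAG_DEDUP)]:
    print(f"{name}: tag=0x{tag:08x} "
          f"microsoft={is_reparse_tag_microsoft(tag)} "
          f"name_surrogate={is_reparse_tag_name_surrogate(tag)} "
          f"-> {lstat_kind(tag)}")
# x (mklink): tag=0xa000000c microsoft=True name_surrogate=True -> symlink
# x.txt (deduped): tag=0x80000013 microsoft=True name_surrogate=False -> regular file
```

The bit tests line up with the fsutil output in Josh's message: the symlink is reported as "Microsoft", "Name Surrogate" and "Symbolic Link", the deduped file only as "Microsoft". A real implementation would also have to decide how to handle directory junctions and other name-surrogate tags, which this sketch lumps in with regular files.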
RE: FW: Windows. Git, and Dedupe
Yes, Dedup is in fact a Server-only feature. However, there are lots of people using the Server SKU as development workstations (especially here at Microsoft). There are also some sysadmins I know of who use git to download sysadmin scripts to servers. Finally, I would hazard a guess that it's possible to mount an NTFS filesystem containing deduped files from a Server machine onto a client SKU and access those files. (I'm not on the NTFS team, and haven't tried it.) So I think there are good reasons to support reparse points on Windows.

The reparse point could be decoded to identify it as a non-symlink reparse item; in those cases, treating the file as an "ordinary" file would be appropriate.

For example, see the following. The reparse tag value for symlinks is IO_REPARSE_TAG_SYMLINK (0xA000000C) and for deduped files is IO_REPARSE_TAG_DEDUP (0x80000013). The value can be discovered from the information at [1].

I admit to not having looked at the git code, nor to being familiar with mingw. Are native Win32 calls supported in the git codebase?

Jmr

[1] http://msdn.microsoft.com/en-us/library/windows/desktop/aa365740(v=vs.85).aspx

PS I:\temp> cmd /c mklink x y
symbolic link created for x <<===>> y
PS I:\temp> fsutil reparsepoint query x
Reparse Tag Value : 0xa000000c
Tag value: Microsoft
Tag value: Name Surrogate
Tag value: Symbolic Link

Reparse Data Length: 0x0010
Reparse Data:
0000: 02 00 02 00 00 00 02 00 01 00 00 00 79 00 79 00 y.y.

PS I:\temp> fsutil reparsepoint query x.txt
Reparse Tag Value : 0x80000013
Tag value: Microsoft

Reparse Data Length: 0x007c
Reparse Data:
0000: 01 02 7c 00 00 00 00 00 66 9c 1a 01 00 00 00 00 ..|.f...
0010: 00 00 01 00 00 00 00 00 cb eb c5 00 6a 97 63 4d j.cM
0020: 97 9c 13 0c 41 8e ed 8b 40 00 40 00 40 00 00 00 A...@.@.@...
0030: d3 b9 a8 d4 e4 c6 cd 01 55 ca 02 00 00 00 05 00 U...
0040: 70 ac 21 04 00 00 05 00 01 00 00 00 88 8d 00 00 p.!.
0050: c8 30 00 00 00 00 00 00 c8 44 db 94 6c 88 9a d4 .0...D..l...
0060: 0a a9 01 3a 1f 80 80 8d ea 0d 53 d7 36 49 b9 a4 ...:..S.6I..
0070: 82 a2 b9 4e 2a 16 4b a1 2e d9 f3 dd ...N*.K.

-----Original Message-----
From: René Scharfe [mailto:rene.scha...@lsrfire.ath.cx]
Sent: Tuesday, March 19, 2013 2:08 PM
To: Josh Rowe
Cc: git@vger.kernel.org; msys...@googlegroups.com
Subject: Re: FW: Windows. Git, and Dedupe

On 18.03.2013 22:20, Josh Rowe wrote:
> On Windows with an NTFS volume with Deduplication enabled, Git believes that deduplicated files are symlinks. It then fails to be able to do anything with the file. This can be repro-ed by creating an NTFS volume with dedup, creating some duplicate files, verifying that a few files are deduped, and trying to add and commit the files via git.

Both Single Instance Storage[1] and Data Deduplication[2] (introduced with Windows Server 2012) seem to be server-only features. How about keeping regular git repositories with checked-out files on client disks and using the server only for bare repositories (without a working tree)?

When I tried to add a symbolic link created with mklink on Windows 8, the mingw version of git refused because readlink(2) is not supported. This seems to be sufficient to reproduce the issue. I couldn't test the Cygwin version, though, because http://cygwin.com doesn't respond at the moment.

But a working readlink(2) wouldn't help anyway, I guess. I imagine that the reparse points used for deduplication point into a magic block store which performs garbage collection of content that is no longer referenced, which probably means that a recreated "symlink" may point to blocks that have been deleted in the meantime.

Perhaps you need a way to ask git to always follow symlinks instead of trying to store their target specification.

René

[1] http://technet.microsoft.com/en-us/library/dd573308%28v=ws.10%29.aspx
[2] http://msdn.microsoft.com/en-us/library/windows/desktop/hh769303%28v=vs.85%29.aspx
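The 16 bytes fsutil printed as "Reparse Data" for the symlink x -> y can be decoded by hand. A minimal Python sketch, assuming the payload follows the SymbolicLinkReparseBuffer layout of the documented REPARSE_DATA_BUFFER structure (four little-endian USHORTs for the substitute/print name offsets and lengths, a ULONG of flags, then a UTF-16LE PathBuffer, with offsets counted from the start of PathBuffer) and that fsutil shows only the bytes after the 8-byte tag/length/reserved header. `decode_symlink_reparse_data` is a hypothetical helper; the sketch does not attempt to decode the dedup payload, which is exactly the data git should not try to interpret.

```python
import struct

SYMLINK_FLAG_RELATIVE = 0x1  # flag value for relative symlink targets


def decode_symlink_reparse_data(data: bytes) -> dict:
    """Decode the SymbolicLinkReparseBuffer part of a symlink reparse point.

    `data` is the payload fsutil prints as "Reparse Data", i.e. the bytes
    after the 8-byte header (tag, data length, reserved).
    """
    fixed = struct.calcsize("<HHHHI")  # 12 bytes before PathBuffer
    if len(data) < fixed:
        raise ValueError("reparse data too short for a symlink buffer")
    sub_off, sub_len, print_off, print_len, flags = struct.unpack_from(
        "<HHHHI", data, 0)
    path_buffer = data[fixed:]  # WCHAR PathBuffer[], UTF-16LE
    substitute = path_buffer[sub_off:sub_off + sub_len].decode("utf-16-le")
    printable = path_buffer[print_off:print_off + print_len].decode("utf-16-le")
    return {
        "substitute_name": substitute,
        "print_name": printable,
        "relative": bool(flags & SYMLINK_FLAG_RELATIVE),
    }


# The bytes from "fsutil reparsepoint query x" in Josh's message.
raw = bytes.fromhex("02 00 02 00 00 00 02 00 01 00 00 00 79 00 79 00")
print(decode_symlink_reparse_data(raw))
# {'substitute_name': 'y', 'print_name': 'y', 'relative': True}
```

Both names decode to "y" and the relative flag is set, which matches `mklink x y`; the 0x0010 data length is the 12-byte fixed part plus two 2-byte UTF-16 characters.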
Re: FW: Windows. Git, and Dedupe
On 18.03.2013 22:20, Josh Rowe wrote:
> On Windows with an NTFS volume with Deduplication enabled, Git believes that deduplicated files are symlinks. It then fails to be able to do anything with the file. This can be repro-ed by creating an NTFS volume with dedup, creating some duplicate files, verifying that a few files are deduped, and trying to add and commit the files via git.

Both Single Instance Storage[1] and Data Deduplication[2] (introduced with Windows Server 2012) seem to be server-only features. How about keeping regular git repositories with checked-out files on client disks and using the server only for bare repositories (without a working tree)?

When I tried to add a symbolic link created with mklink on Windows 8, the mingw version of git refused because readlink(2) is not supported. This seems to be sufficient to reproduce the issue. I couldn't test the Cygwin version, though, because http://cygwin.com doesn't respond at the moment.

But a working readlink(2) wouldn't help anyway, I guess. I imagine that the reparse points used for deduplication point into a magic block store which performs garbage collection of content that is no longer referenced, which probably means that a recreated "symlink" may point to blocks that have been deleted in the meantime.

Perhaps you need a way to ask git to always follow symlinks instead of trying to store their target specification.

René

[1] http://technet.microsoft.com/en-us/library/dd573308%28v=ws.10%29.aspx
[2] http://msdn.microsoft.com/en-us/library/windows/desktop/hh769303%28v=vs.85%29.aspx
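René's suggestion, following symlinks instead of storing their target specification, can be illustrated with a short Python sketch. It assumes POSIX-style `os.symlink`/`os.readlink` (creating symlinks on Windows may require extra privileges), and `index_entry` is a hypothetical helper, not git code. When not following links it records mode 120000 and a blob containing the link target text, which is how git stores symlinks; when following, it hashes the target's content as a regular file. For simplicity it always uses mode 100644 and ignores the executable bit.

```python
import hashlib
import os
import stat
import tempfile


def git_blob_id(data: bytes) -> str:
    """SHA-1 object ID git assigns to a blob ("blob <size>" + NUL + data)."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


def index_entry(path: str, follow_symlinks: bool = False) -> tuple:
    """Hypothetical helper: (mode, blob ID) git would record for `path`.

    With follow_symlinks=False this behaves like lstat(); with True it
    reads through the link, as René suggests.
    """
    st = os.stat(path, follow_symlinks=follow_symlinks)
    if stat.S_ISLNK(st.st_mode):
        # git stores a symlink as a blob holding the link target text.
        target = os.readlink(path)
        return "120000", git_blob_id(os.fsencode(target))
    with open(path, "rb") as f:
        return "100644", git_blob_id(f.read())


# Recreate Josh's "mklink x y" case in a temporary directory.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "y"), "wb") as f:
        f.write(b"hello\n")
    link = os.path.join(d, "x")
    os.symlink("y", link)
    print(index_entry(link) == ("120000", git_blob_id(b"y")))  # True
    print(index_entry(link, follow_symlinks=True)
          == ("100644", git_blob_id(b"hello\n")))  # True
```

Following the link means git stores the bytes the user sees rather than a pointer, which sidesteps René's concern about a recreated "symlink" pointing into a garbage-collected block store.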
FW: Windows. Git, and Dedupe
Windows probably isn’t the most popular platform for Git developers ☺, but here goes…

On Windows with an NTFS volume with Deduplication enabled, Git believes that deduplicated files are symlinks. It then fails to be able to do anything with the file. This can be repro-ed by creating an NTFS volume with dedup, creating some duplicate files, verifying that a few files are deduped, and trying to add and commit the files via git.

Jmr