Re: Git on Windows maps creation time onto changed time

2018-02-02 Thread Johannes Schindelin
Hi Keith,

On Fri, 2 Feb 2018, Keith Goldfarb wrote:

> > On Feb 2, 2018, at 1:18 PM, Johannes Schindelin 
> >  wrote:
> > 
> > You cannot assume that the inode numbers are identical between file
> > systems/operating systems. That's simply not going to work.
> 
> Yes, I agree with you, I cannot assume this, so I checked. In my case, the 
> inode numbers are indeed identical.
> 
> > I know you really want *so hard* for the same working directory to be
> > accessible from both Windows and Linux. I have a lot of sympathy for that
> > sentiment. Though I do not see much chance for success on that front.
> 
> I’m certainly willing to accept that there are going to be limitations by 
> using a filesystem from two different operating systems. But regardless of 
> the problems caused by that pattern, would you agree that the Windows code 
> should be using the actual inode number (well, 32-bits of it) instead of zero?

In all likelihood, you have enabled the FS Cache feature. And for a good
reason: it makes many Git operations much, much faster.

Still not as fast as on Linux because Git is quite tuned to assume all the
Linux semantics. So instead of using tell-tales for file modifications
that are portable, Git uses tell-tales that are readily available on
Linux. And instead of using access patterns that are portable, Git assumes
POSIX semantics where you simply use opendir()/readdir()/closedir() and in
that inner loop, you call lstat() to get all the stat data.

This is obviously not portable beyond POSIX. On Windows, for example,
there are really fast FindFirstFile()/FindNextFile() APIs that even allow
you to specify patterns to match (which Linux has to do manually, by
inspecting the file name obtained via readdir()). You even get some stat
data back at the same time, really fast, without additional API calls,
unlike on Linux.

But Git's access patterns really are tuned to the *dir() way, and we have
to emulate it in Git for Windows. This is very, very slow.

The FS Cache feature tries to gain back some speed by using
FindFirstFile()/FindNextFile() pre-emptively and caching the data, looking
it up upon the next time stat data is queried.

The stat data obtained that way does not, however, include Change Time nor
inode number.

And that, dear children, was the story for the day: this is why Git for
Windows uses the CreationTime instead of the Change Time and why it marks
inode as 0; Because to do it accurately would be a lot slower, and we
cannot afford even slower because we are already super slow because we
have to work around Git's assumptions as to what APIs are available.

Ciao,
Johannes

Re: Git on Windows maps creation time onto changed time

2018-02-02 Thread Keith Goldfarb
> On Feb 2, 2018, at 1:18 PM, Johannes Schindelin  
> wrote:
> 
> You cannot assume that the inode numbers are identical between file
> systems/operating systems. That's simply not going to work.

Yes, I agree with you, I cannot assume this, so I checked. In my case, the 
inode numbers are indeed identical.

> I know you really want *so hard* for the same working directory to be
> accessible from both Windows and Linux. I have a lot of sympathy for that
> sentiment. Though I do not see much chance for success on that front.

I’m certainly willing to accept that there are going to be limitations by using 
a filesystem from two different operating systems. But regardless of the 
problems caused by that pattern, would you agree that the Windows code should 
be using the actual inode number (well, 32-bits of it) instead of zero?

K.

Re: Git on Windows maps creation time onto changed time

2018-02-02 Thread Johannes Schindelin
Hi Keith,

On Fri, 2 Feb 2018, Keith Goldfarb wrote:

> > The purpose of these values is to allow to notice a change on the file
> > system without going through the actual file data. Duplicating
> > st_mtime would be pointless.
> 
> Well, I agree with the first statement. I disagree with the 2nd. It’s
> not pointless to fix a problem, and in theory the creation date of a
> file can never change.
> 
> > Don't do that then. Use core.trustctime.
> 
> I am. Unfortunately, my problem isn’t solved by that alone. Perhaps this
> deserves its own thread, but on Windows the st_ino field is set to zero.
> This can also trigger a false positive, causing the whole cache to be
> rebuilt unnecessarily.

You cannot assume that the inode numbers are identical between file
systems/operating systems. That's simply not going to work.

I know you really want *so hard* for the same working directory to be
accessible from both Windows and Linux. I have a lot of sympathy for that
sentiment. Though I do not see much chance for success on that front.

Ciao,
Johannes

Re: Git on Windows maps creation time onto changed time

2018-02-02 Thread Keith Goldfarb
> The purpose of these values is to allow to notice a change on the file system 
> without going through the actual file data. Duplicating st_mtime would be 
> pointless.

Well, I agree with the first statement. I disagree with the 2nd. It’s not 
pointless to fix a problem, and in theory the creation date of a file can never 
change.

> Don't do that then. Use core.trustctime.

I am. Unfortunately, my problem isn’t solved by that alone. Perhaps this 
deserves its own thread, but on Windows the st_ino field is set to zero. This 
can also trigger a false positive, causing the whole cache to be rebuilt 
unnecessarily.

K.

Re: Git on Windows maps creation time onto changed time

2018-02-02 Thread Johannes Sixt

Am 02.02.2018 um 00:10 schrieb Keith Goldfarb:

Dear git,

While tracking down a problem with a filesystem shared by Windows and Ubuntu, I 
came across the following code in compat/mingw.c (ming_fstat(), also in 
do_lstat()):

if (GetFileInformationByHandle(fh, )) {
buf->st_ino = 0;
buf->st_gid = 0;
buf->st_uid = 0;
buf->st_nlink = 1;
buf->st_mode = file_attr_to_st_mode(fdata.dwFileAttributes);
buf->st_size = fdata.nFileSizeLow |
(((off_t)fdata.nFileSizeHigh)<<32);
buf->st_dev = buf->st_rdev = 0; /* not used by Git */
buf->st_atime = filetime_to_time_t(&(fdata.ftLastAccessTime));
buf->st_mtime = filetime_to_time_t(&(fdata.ftLastWriteTime));
buf->st_ctime = filetime_to_time_t(&(fdata.ftCreationTime));
return 0;
}

The assignment of buf->st_ctime doesn’t seem right to me. I
understand there’s no good choice here, but I think a better choice
would be to  duplicate the definition used for st_mtime.


The purpose of these values is to allow to notice a change on the file 
system without going through the actual file data. Duplicating st_mtime 
would be pointless.



Background: When I do a git status on Windows and then later on
Ubuntu (or the other order), it is extremely slow, as the entire tree
is being traversed. I tracked it down to this difference in
definition of c_time. Yes, I know about the core.trustctime variable,
but my problem aside  this seems like an unwise choice.


Don't do that then. Use core.trustctime.

-- Hannes


Re: Git on Windows maps creation time onto changed time

2018-02-01 Thread Ævar Arnfjörð Bjarmason
On Fri, Feb 2, 2018 at 12:10 AM, Keith Goldfarb
 wrote:
> Dear git,
>
> While tracking down a problem with a filesystem shared by Windows and Ubuntu, 
> I came across the following code in compat/mingw.c (ming_fstat(), also in 
> do_lstat()):
>
> if (GetFileInformationByHandle(fh, )) {
> buf->st_ino = 0;
> buf->st_gid = 0;
> buf->st_uid = 0;
> buf->st_nlink = 1;
> buf->st_mode = file_attr_to_st_mode(fdata.dwFileAttributes);
> buf->st_size = fdata.nFileSizeLow |
> (((off_t)fdata.nFileSizeHigh)<<32);
> buf->st_dev = buf->st_rdev = 0; /* not used by Git */
> buf->st_atime = filetime_to_time_t(&(fdata.ftLastAccessTime));
> buf->st_mtime = filetime_to_time_t(&(fdata.ftLastWriteTime));
> buf->st_ctime = filetime_to_time_t(&(fdata.ftCreationTime));
> return 0;
> }
>
> The assignment of buf->st_ctime doesn’t seem right to me. I understand 
> there’s no good choice here, but I think a better choice would be to 
> duplicate the definition used for st_mtime.
>
> Background: When I do a git status on Windows and then later on Ubuntu (or 
> the other order), it is extremely slow, as the entire tree is being 
> traversed. I tracked it down to this difference in definition of c_time. Yes, 
> I know about the core.trustctime variable, but my problem aside this seems 
> like an unwise choice.
>
> Thanks for listening,

Let's CC Marius Storm-Olsen who added this behavior (CC'd).