Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
Just a small heads-up for people using Emacs. 24.4 has inotify support, and magit-inotify.el [1] has already started using it. From initial impressions, I'm quite impressed with it. [1]: https://github.com/magit/magit/blob/master/contrib/magit-inotify.el -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
On Mon, Mar 25, 2013 at 5:44 PM, Ramkumar Ramachandra artag...@gmail.com wrote:
> Just a small heads-up for people using Emacs. 24.4 has inotify support, and magit-inotify.el [1] has already started using it. From initial impressions, I'm quite impressed with it.

Have you tried it? From a quick look, it seems to watch all directories. I wonder how it performs on webkit (at least 5k dirs).

> [1]: https://github.com/magit/magit/blob/master/contrib/magit-inotify.el

-- Duy
Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
Ramkumar Ramachandra artag...@gmail.com writes:

> Junio C Hamano wrote:
>> Yes, and you would need one inotify per directory but you do not have an infinite supply of outstanding inotify watch (wasn't the limit like 8k per a single uid or something?), so the daemon must be prepared to say I'll watch this, that and that directories, but the consumers should check other directories themselves.
>>
>> FWIW, I share your suspicion that an effort in the direction this thread suggests may end up duplicating what the caching vfs layer already does, and doing so poorly.
>
> Thomas Rast wrote:
>> $ cat /proc/sys/fs/inotify/max_user_watches
>> 65536
>> $ cat /proc/sys/fs/inotify/max_user_instances
>> 128
>
> From Junio's and Thomas' observations, I'm inclined to think that inotify is ill-suited for the problem we are trying to solve. It is designed as a per-directory watch, because VFS can quickly supply the inodes for a directory entry. As such, I think the ideal usecase for inotify is to execute something immediately when a change takes place in a directory: it's well-suited for solutions like Dropbox (which I think is poorly designed to begin with, but that's offtopic). It doesn't substitute or augment VFS caching. I suspect the VFS cache works by caching the inodes in a frequently used directory entry, thus optimizing calls like lstat() on them.

I have three objections to changing the kernel to fit us, as opposed to just using inotify:

* inotify works. I can watch most of my $HOME with the hack I linked earlier[1]. Yes, it's a lot of coding around the problem that it is nonrecursive, but we already have a lot of code around the problem that we can't ask the VFS for diffs between points in time (namely, the whole business with an index and lstat() loops).

* inotify is here today. Even if you got a hypothetical notifier into the kernel today, you'd have to wait months/years until it is available in distros, and years until everyone has it.

* I'll bet you a beer that the kernel folks already had the same discussion when they made inotify. There has to be a reason why it's better than providing for recursive watches.

[1] https://github.com/trast/watch

--
Thomas Rast
trast@{inf,student}.ethz.ch
Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
Junio C Hamano gits...@pobox.com writes:

> Karsten Blees karsten.bl...@gmail.com writes:
>> However, AFAIK inotify doesn't work recursively, so the daemon would at least have to track the directory structure to be able to register / unregister inotify handlers as directories come and go.
>
> Yes, and you would need one inotify per directory but you do not have an infinite supply of outstanding inotify watch (wasn't the limit like 8k per a single uid or something?), so the daemon must be prepared to say I'll watch this, that and that directories, but the consumers should check other directories themselves.

Those are tunable limits though. For example I run this silly hack

  https://github.com/trast/watch

with the shell snippets to be able to quickly cd a shell to where something recently happened. I am able to watch most of my working set even under default limits, which here (opensuse tumbleweed, kernel 3.8.x, x86_64) are

  $ cat /proc/sys/fs/inotify/max_user_watches
  65536
  $ cat /proc/sys/fs/inotify/max_user_instances
  128

I'm not sure if other distros impose tighter limits by default, but as it stands you're not very likely to hit the 65k watches limit in any given repo.

It seems more likely that you might hit the 128 instances limit if we go with a design that uses one daemon per repo, if you run a script that accesses many repos. For example, in an android tree I have lying around,

  $ repo list | wc -l
  297

That alone might indicate it would be a good idea to have one global git-agent that starts on demand, rather than a per-repo daemon. Otherwise we'd have to find a way to discover old daemons and tell them to quit when we hit max_user_instances.

--
Thomas Rast
trast@{inf,student}.ethz.ch
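[Editor's note] The two limits Thomas quotes can be read programmatically before deciding how many directories or daemons are affordable. The following is only a minimal sketch under assumptions: the helper names `read_inotify_limit` and `fits_watch_budget` are made up for illustration and do not come from the thread; only the two /proc file names do.

```python
from pathlib import Path

# Directory holding the inotify limits quoted in the message above (Linux only).
INOTIFY_PROC = Path("/proc/sys/fs/inotify")


def read_inotify_limit(name, proc_dir=INOTIFY_PROC):
    """Return the integer stored in proc_dir/name, or None if it cannot be read."""
    try:
        return int((Path(proc_dir) / name).read_text().strip())
    except (OSError, ValueError):
        return None


def fits_watch_budget(count, limit):
    """True if `count` watches (or instances) fit under `limit`.

    An unknown limit (None) is treated as "does not fit" so that callers
    fall back to the slow path instead of assuming success.
    """
    return limit is not None and count <= limit


if __name__ == "__main__":
    watches = read_inotify_limit("max_user_watches")
    instances = read_inotify_limit("max_user_instances")
    print("max_user_watches:", watches)
    print("max_user_instances:", instances)
    # 297 is the number of repos in the android tree mentioned above.
    print("297 per-repo daemons fit:", fits_watch_budget(297, instances))
```

With the defaults quoted above (65536 watches, 128 instances), a 6k-directory worktree fits the watch budget, while 297 per-repo daemons do not fit the instance budget, which is the argument for a single global agent.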
Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
gits...@pobox.com wrote on Wed, 13 Mar 2013 12:38 -0700:
> Karsten Blees karsten.bl...@gmail.com writes:
>> However, AFAIK inotify doesn't work recursively, so the daemon would at least have to track the directory structure to be able to register / unregister inotify handlers as directories come and go.
>
> Yes, and you would need one inotify per directory but you do not have an infinite supply of outstanding inotify watch (wasn't the limit like 8k per a single uid or something?), so the daemon must be prepared to say I'll watch this, that and that directories, but the consumers should check other directories themselves.

fanotify is an option here too; it can watch an entire file system.

-- Pete
Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
On Thu, Mar 14, 2013 at 2:38 AM, Junio C Hamano gits...@pobox.com wrote:
> Karsten Blees karsten.bl...@gmail.com writes:
>> However, AFAIK inotify doesn't work recursively, so the daemon would at least have to track the directory structure to be able to register / unregister inotify handlers as directories come and go.
>
> Yes, and you would need one inotify per directory but you do not have an infinite supply of outstanding inotify watch (wasn't the limit like 8k per a single uid or something?), so the daemon must be prepared to say I'll watch this, that and that directories, but the consumers should check other directories themselves.

Hey, I did not know that. Webkit has about 6k leaf dirs and 182k files. Watching the top N biggest directories would cover M% of cached files:

     N     M%
    10   8.60
    20  13.28
    30  17.52
    40  20.52
    50  23.55
   200  49.70
   676  75.00
   863  80.00
  1486  90.00

So it's a trade-off. We can cut some syscall cost off but we probably need to pay some for inotify. And we definitely can't watch the full worktree. I don't know how costly it may be to watch many directories. If it's not so costly, watching 256 or 512 dirs might be enough.

What about Windows? Does the equivalent mechanism have similar limits?

> FWIW, I share your suspicion that an effort in the direction this thread suggests may end up duplicating what the caching vfs layer already does, and doing so poorly.

I'm still curious how it works out. Maybe it's not up to the original expectation, but hopefully it will speed things up a bit.

-- Duy
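[Editor's note] The N / M% table above can be derived from a per-directory file count. The sketch below shows one way to compute it; the function names and the four-directory sample data are invented for illustration, and the thread does not say how Duy actually produced his numbers.

```python
def coverage_of_top_dirs(files_per_dir, n):
    """Percentage of all files that live in the n directories holding the most files."""
    total = sum(files_per_dir.values())
    if total == 0:
        return 0.0
    top = sorted(files_per_dir.values(), reverse=True)[:n]
    return 100.0 * sum(top) / total


def dirs_needed_for(files_per_dir, target_percent):
    """Smallest number of directories whose files reach target_percent coverage."""
    total = sum(files_per_dir.values())
    if total == 0:
        return 0
    covered = 0
    sizes = sorted(files_per_dir.values(), reverse=True)
    for count, size in enumerate(sizes, start=1):
        covered += size
        if 100.0 * covered / total >= target_percent:
            return count
    return len(sizes)


if __name__ == "__main__":
    # Made-up sample: directory name -> number of tracked files in it.
    sample = {"Source": 50, "Tools": 30, "LayoutTests": 15, "Docs": 5}
    for n in (1, 2, 3):
        print(n, coverage_of_top_dirs(sample, n))
    print("dirs for 90%:", dirs_needed_for(sample, 90))
```

Read against the table, `dirs_needed_for` answers the question "how many watches buy 75%, 80% or 90% coverage", which is the quantity that has to be weighed against the inotify watch limit.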
Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
On 13.03.2013 02:03, Duy Nguyen wrote:
> On Wed, Mar 13, 2013 at 6:21 AM, Karsten Blees karsten.bl...@gmail.com wrote:
>> Hmmm...I don't see how filesystem changes since last invocation can solve the problem, or am I missing something? I think what you mean to say is that the daemon should keep track of the filesystem *state* of the working copy, or alternatively the deltas/changes to some known state (such as .git/index)?
>
> I think git process can keep track of filesystem state (and save it down if necessary). [...]

Ah, saving the state was the missing bit, thanks.

However, AFAIK inotify doesn't work recursively, so the daemon would at least have to track the directory structure to be able to register / unregister inotify handlers as directories come and go.

>> Consider 'git status; make; make clean; git status'...that's a *lot* of changes to process for nothing (potentially slowing down make).
>
> Yeah. In my opinion, the daemon should realize that at some point accumulated changes are too much that it's not worth collecting anymore, and drop them all. Git will do it the normal/slow way. After that the daemon picks up again. We only optimize for the case when little changes are made in filesystem.

That sounds reasonable...

>> Then there's the issue of stale data in the cache. Modifying porcelain commands that use 'git status --porcelain' to compile their changesets will want 100% exact data. I'm not saying it's not doable, but adding another platform specific, caching daemon to the tool chain doesn't exactly simplify things...
>>
>> But perhaps I'm too pessimistic (or just stigmatized by inherently slow and out-of-date TGitCache/TSvnCache on Windows :-)
>
> Thanks. I didn't know about TGitCache. Will dig it up. Maybe we can learn something from it (or realize the daemon approach is futile after all).

TGitCache/TSvnCache are the background processes in TortoiseGit/TortoiseSvn that keep track of filesystem state to display icon overlays in Windows Explorer.
Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
Karsten Blees karsten.bl...@gmail.com writes:

> However, AFAIK inotify doesn't work recursively, so the daemon would at least have to track the directory structure to be able to register / unregister inotify handlers as directories come and go.

Yes, and you would need one inotify per directory but you do not have an infinite supply of outstanding inotify watch (wasn't the limit like 8k per a single uid or something?), so the daemon must be prepared to say "I'll watch this, that and that directories, but the consumers should check other directories themselves."

FWIW, I share your suspicion that an effort in the direction this thread suggests may end up duplicating what the caching vfs layer already does, and doing so poorly.
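[Editor's note] Because inotify watches are per directory, a daemon has to enumerate the tree itself and decide which directories get a watch and which are left for consumers to check. The sketch below illustrates only that bookkeeping; it does not call inotify, and the names `list_directories` and `split_by_budget` as well as the "first come, first watched" policy are assumptions, not something proposed in the thread.

```python
import os


def list_directories(root, skip=(".git",)):
    """Walk root top-down and return every directory path, pruning names in `skip`."""
    found = []
    for dirpath, dirnames, _filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into skipped
        # directories; sorting makes the traversal order deterministic.
        dirnames[:] = sorted(d for d in dirnames if d not in skip)
        found.append(dirpath)
    return found


def split_by_budget(directories, budget):
    """Return (watched, unwatched): the first `budget` directories get a watch.

    Consumers would have to check the unwatched directories themselves.
    """
    budget = max(budget, 0)
    return directories[:budget], directories[budget:]


if __name__ == "__main__":
    dirs = list_directories(".")
    watched, unwatched = split_by_budget(dirs, 512)
    print(len(watched), "watched,", len(unwatched), "left to the consumer")
```

A real daemon would also have to repeat this bookkeeping as directories come and go, which is the extra tracking work Karsten points out.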
Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
Heiko Voigt wrote:
> While talking about platform independence: how about Windows? AFAIK there are no file based sockets. How about using shared memory, that's available, instead? It would greatly reduce the needed porting effort.

What about the git credential helper: it uses UNIX sockets, no? How does git-credential-winstore [1] work?

[1]: https://github.com/anurse/git-credential-winstore
Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
On Tue, Mar 12, 2013 at 10:43 AM, Ramkumar Ramachandra artag...@gmail.com wrote:
> Heiko Voigt wrote:
>> While talking about platform independence: how about Windows? AFAIK there are no file based sockets. How about using shared memory, that's available, instead? It would greatly reduce the needed porting effort.
>
> What about the git credential helper: it uses UNIX sockets, no? How does git-credential-winstore [1] work?
>
> [1]: https://github.com/anurse/git-credential-winstore

First, we have a proper credential helper for Windows in contrib/credential/wincred these days. I wrote that one: it communicates using stdin/stdout. The credential helper doesn't maintain state itself; the Windows Credential Manager does. I suspect git-credential-winstore works the same way.

As for Windows support, AFAIK there is no support for Unix domain sockets in Windows. But there is support for named pipes, which is almost the same thing. What we have support for in compat/mingw.[ch] is a different matter, but we can extend that if needed.
Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
On Tue, Mar 12, 2013 at 03:13:39PM +0530, Ramkumar Ramachandra wrote:
> Heiko Voigt wrote:
>> While talking about platform independence: how about Windows? AFAIK there are no file based sockets. How about using shared memory, that's available, instead? It would greatly reduce the needed porting effort.
>
> What about the git credential helper: it uses UNIX sockets, no? How does git-credential-winstore [1] work?

No, the main credential protocol happens over pipes to a child process's stdin/stdout. The credential-cache helper does use unix sockets (since it needs to contact a long-running daemon that caches the credentials), and AFAIK is not available under Windows (but that's OK, because Windows-specific helpers that use secure storage are better anyway).

When I introduced credential-cache, I recall somebody mentioned that there is some Windows-equivalent IPC that can be used to emulate unix domain sockets. The calls aren't the same, but as long as your requirements are basically "get messages to/from the daemon", you can probably abstract away the details on a per-platform basis. Unfortunately I can't seem to find the original message or any details in the archive (and I know next to nothing about Windows IPC).

-Peff
Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
On 10.03.2013 21:17, Ramkumar Ramachandra wrote:
> git operations are slow on repositories with lots of files, and lots of tiny filesystem calls like lstat(), getdents(), open() are responsible for this. On the linux-2.6 repository, for instance, the numbers for git status look like this:
>
>   top syscalls sorted         top syscalls sorted
>   by acc. time                by number
>   ----------------------------------------------
>   0.401906 40950 lstat        0.401906 40950 lstat
>   0.190484  5343 getdents     0.150055  5374 open
>   0.150055  5374 open         0.190484  5343 getdents
>   0.074843  2806 close        0.074843  2806 close
>   0.003216   157 read         0.003216   157 read
>
> To solve this problem, we propose to build a daemon which will watch the filesystem using inotify and report batched up events over a UNIX socket.

[...]

> +
> +The fswatch C API is meant to be called by Git code which needs
> +information about filesystem changes. It is centered around an
> +object representing the changes to the filesystem since the last
> +invocation.
> +

Hmmm...I don't see how filesystem changes since last invocation can solve the problem, or am I missing something? I think what you mean to say is that the daemon should keep track of the filesystem *state* of the working copy, or alternatively the deltas/changes to some known state (such as .git/index)?

I'm also still skeptical whether a daemon will improve overall performance. In my understanding it's essentially a filesystem cache in user-mode. The difference to using the OS filesystem cache directly (via lstat/readdir) is that we replace ~50k sys-calls with a single IPC call (i.e. the git <-> fswatch daemon communication is less 'chatty'). However, the 'chattiness' is still there between the fswatch daemon and the OS / inotify. Consider 'git status; make; make clean; git status'...that's a *lot* of changes to process for nothing (potentially slowing down make).

Then there's the issue of stale data in the cache. Modifying porcelain commands that use 'git status --porcelain' to compile their changesets will want 100% exact data. I'm not saying it's not doable, but adding another platform specific, caching daemon to the tool chain doesn't exactly simplify things...

But perhaps I'm too pessimistic (or just stigmatized by inherently slow and out-of-date TGitCache/TSvnCache on Windows :-)
Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
On Wed, Mar 13, 2013 at 6:21 AM, Karsten Blees karsten.bl...@gmail.com wrote:
> Hmmm...I don't see how filesystem changes since last invocation can solve the problem, or am I missing something? I think what you mean to say is that the daemon should keep track of the filesystem *state* of the working copy, or alternatively the deltas/changes to some known state (such as .git/index)?

I think the git process can keep track of filesystem state (and save it down if necessary). But when the git process is not running, the filesystem state changes and it cannot know about that. The daemon helps fill this gap (and basically keeps git running, in a light form, throughout a development session). For example, if we know only 5 files have changed since the last refresh, we only need to re-stat those 5. The same goes for untracked/ignored file checking.

> I'm also still skeptical whether a daemon will improve overall performance. In my understanding it's essentially a filesystem cache in user-mode. The difference to using the OS filesystem cache directly (via lstat/readdir) is that we replace ~50k sys-calls with a single IPC call (i.e. the git <-> fswatch daemon communication is less 'chatty'). However, the 'chattiness' is still there between the fswatch daemon and the OS / inotify.

I think it attempts to reduce unnecessary system calls, not eliminate them all. In the 5 changed files example above, a few IPC calls are done to retrieve the file list, then 5 lstat calls will be issued (by git, not the daemon) instead of thousands of them.

> Consider 'git status; make; make clean; git status'...that's a *lot* of changes to process for nothing (potentially slowing down make).

Yeah. In my opinion, the daemon should realize at some point that the accumulated changes are too many to be worth collecting anymore, and drop them all. Git will do it the normal/slow way. After that the daemon picks up again. We only optimize for the case when few changes are made in the filesystem.

> Then there's the issue of stale data in the cache. Modifying porcelain commands that use 'git status --porcelain' to compile their changesets will want 100% exact data. I'm not saying it's not doable, but adding another platform specific, caching daemon to the tool chain doesn't exactly simplify things...
>
> But perhaps I'm too pessimistic (or just stigmatized by inherently slow and out-of-date TGitCache/TSvnCache on Windows :-)

Thanks. I didn't know about TGitCache. Will dig it up. Maybe we can learn something from it (or realize the daemon approach is futile after all).

-- Duy
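[Editor's note] Duy's policy (collect changes while they are few, give up when they pile up, let git rescan, then start collecting again) can be sketched as a small accumulator. This is an illustration under assumptions only: the class name `ChangeLog`, the `max_changes` threshold and the "None means rescan everything" convention are invented here, not taken from the thread or from git.

```python
class ChangeLog:
    """Accumulates changed paths and gives up once too many have piled up."""

    def __init__(self, max_changes):
        self.max_changes = max_changes
        self.paths = set()
        self.overflowed = False

    def record(self, path):
        """Note one changed path; past the threshold, drop everything collected."""
        if self.overflowed:
            return  # already given up; ignore events until the next drain()
        self.paths.add(path)
        if len(self.paths) > self.max_changes:
            self.paths.clear()
            self.overflowed = True

    def drain(self):
        """Return the sorted changed paths, or None meaning 'rescan everything'.

        Draining resets the log, so collection picks up again afterwards.
        """
        result = None if self.overflowed else sorted(self.paths)
        self.paths = set()
        self.overflowed = False
        return result


def paths_to_stat(log, all_tracked):
    """Paths the consumer must lstat(): the changed ones, or all of them on overflow."""
    changed = log.drain()
    return sorted(all_tracked) if changed is None else changed


if __name__ == "__main__":
    tracked = ["Makefile", "a.c", "b.c", "c.c", "d.c"]
    log = ChangeLog(max_changes=3)
    log.record("a.c")
    log.record("b.c")
    print(paths_to_stat(log, tracked))  # only the two changed files
    for name in ("a.o", "b.o", "c.o", "d.o"):  # e.g. a build touching many files
        log.record(name)
    print(paths_to_stat(log, tracked))  # overflow: everything is re-checked
```

The 'git status; make; make clean; git status' case from the discussion maps onto the second half of the usage: the build overflows the log, and the next status falls back to the normal full scan.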
Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
On Mon, Mar 11, 2013 at 01:47:03AM +0530, Ramkumar Ramachandra wrote:
> git operations are slow on repositories with lots of files, and lots of tiny filesystem calls like lstat(), getdents(), open() are responsible for this. On the linux-2.6 repository, for instance, the numbers for git status look like this:
>
>   top syscalls sorted         top syscalls sorted
>   by acc. time                by number
>   ----------------------------------------------
>   0.401906 40950 lstat        0.401906 40950 lstat
>   0.190484  5343 getdents     0.150055  5374 open
>   0.150055  5374 open         0.190484  5343 getdents
>   0.074843  2806 close        0.074843  2806 close
>   0.003216   157 read         0.003216   157 read
>
> To solve this problem, we propose to build a daemon which will watch the filesystem using inotify and report batched up events over a UNIX socket. Since inotify is Linux-only, we have to leave open the possibility of writing similar daemons for other platforms. Everything will continue to work as before if there is no helper present.

While talking about platform independence: how about Windows? AFAIK there are no file based sockets. How about using shared memory, that's available, instead? It would greatly reduce the needed porting effort.

Since operations on a lot of files are especially expensive on Windows, it is one of the platforms that would profit the most from such a daemon.

Cheers Heiko
[RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline
git operations are slow on repositories with lots of files, and lots of tiny filesystem calls like lstat(), getdents(), open() are responsible for this. On the linux-2.6 repository, for instance, the numbers for git status look like this:

  top syscalls sorted         top syscalls sorted
  by acc. time                by number
  ----------------------------------------------
  0.401906 40950 lstat        0.401906 40950 lstat
  0.190484  5343 getdents     0.150055  5374 open
  0.150055  5374 open         0.190484  5343 getdents
  0.074843  2806 close        0.074843  2806 close
  0.003216   157 read         0.003216   157 read

To solve this problem, we propose to build a daemon which will watch the filesystem using inotify and report batched up events over a UNIX socket. Since inotify is Linux-only, we have to leave open the possibility of writing similar daemons for other platforms. Everything will continue to work as before if there is no helper present.

The fswatch API introduces a generic way for git.git to request filesystem changes. Different helpers (like the inotify daemon on Linux) will be plugged into this API on different platforms. It falls back to using the filesystem calls.

The daemon will start up with the very first operation done on the git repository, and will die after a specified period of repository inactivity. It is going to be a per-repo daemon and will write to a socket in the repository: access control is managed by filesystem permissions.

This design is inspired by the credential helper design.

Signed-off-by: Ramkumar Ramachandra artag...@gmail.com
---
 Documentation/technical/api-fswatch.txt | 62 ++++++++++++++++++++++
 1 file changed, 62 insertions(+)
 create mode 100644 Documentation/technical/api-fswatch.txt

diff --git a/Documentation/technical/api-fswatch.txt b/Documentation/technical/api-fswatch.txt
new file mode 100644
index 0000000..9c6826a
--- /dev/null
+++ b/Documentation/technical/api-fswatch.txt
@@ -0,0 +1,62 @@
+fswatch API
+===========
+
+The fswatch API provides an abstracted way of collecting information
+about filesystem changes. A helper is typically a daemon which
+uses inotify to watch the filesystem, and this information is used by
+git instead of making expensive system calls like lstat(), open().
+
+Typical setup
+-------------
+
+
++-------------------+
+| Git code (C)      |--- requires information about fs changes
+|...................|
+| C fswatch API     |--- system calls --- filesystem
++-------------------+
+    ^             |
+    | UNIX socket |
+    |             v
++-------------------+
+| Git fswatch helper|--- daemon inotify-watching --- filesystem
++-------------------+
+
+
+The Git code will call the C API to obtain changes in filesystem
+information. The API will itself call a configured helper (e.g. git
+fswatch-notify) which may return filesystem changes, if the
+helper daemon was started in a previous invocation. If the daemon is
+not already running, it is started, and the C API will fall back to
+making expensive system calls.
+
+C API
+-----
+
+The fswatch C API is meant to be called by Git code which needs
+information about filesystem changes. It is centered around an
+object representing the changes to the filesystem since the last
+invocation.
+
+Data Structures
+~~~~~~~~~~~~~~~
+
+`struct fschanges`::
+
+	TODO
+
+
+Functions
+~~~~~~~~~
+
+TODO
+
+Example
+~~~~~~~
+
+TODO
+
+fswatch Helpers
+---------------
+
+TODO
--
1.8.1.5
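[Editor's note] The fallback described in the outline (ask the helper over a UNIX socket, otherwise make the system calls directly) can be sketched as below. Everything about the wire format is an assumption made for this example: the patch leaves the API as TODO, so the newline-separated reply, the socket path argument and the function names `query_helper` and `refresh` are hypothetical.

```python
import os
import socket


def query_helper(socket_path, timeout=1.0):
    """Ask a hypothetical fswatch helper for the list of changed paths.

    Returns a list of paths, or None if no helper is reachable. The reply
    format (newline-separated paths, then EOF) is an assumption of this sketch.
    """
    if not hasattr(socket, "AF_UNIX"):
        return None  # platform without UNIX domain sockets
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect(socket_path)
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
    except OSError:
        return None  # no daemon, stale socket, timeout, ...
    reply = b"".join(chunks).decode("utf-8", "replace")
    return [line for line in reply.split("\n") if line]


def refresh(tracked_paths, socket_path):
    """Return {path: lstat result or None} for the paths that need re-checking.

    With a helper, only the reported paths are stat'ed; without one, every
    tracked path is, which is the expensive fallback.
    """
    changed = query_helper(socket_path)
    to_check = tracked_paths if changed is None else changed
    result = {}
    for path in to_check:
        try:
            result[path] = os.lstat(path)
        except OSError:
            result[path] = None  # deleted or inaccessible
    return result
```

The point of the design is visible in `refresh`: when no helper answers, behaviour is the same as without the feature, only slower.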