Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline

2013-03-25 Thread Ramkumar Ramachandra
Just a small heads-up for people using Emacs.  24.4 has inotify
support, and magit-inotify.el [1] has already started using it.  From
initial impressions, I'm quite impressed with it.

[1]: https://github.com/magit/magit/blob/master/contrib/magit-inotify.el
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline

2013-03-25 Thread Duy Nguyen
On Mon, Mar 25, 2013 at 5:44 PM, Ramkumar Ramachandra
artag...@gmail.com wrote:
 Just a small heads-up for people using Emacs.  24.4 has inotify
 support, and magit-inotify.el [1] has already started using it.  From
 initial impressions, I'm quite impressed with it.

Have you tried it? From a quick look, it seems to watch all
directories. I wonder how it performs on webkit (at least 5k dirs).

 [1]: https://github.com/magit/magit/blob/master/contrib/magit-inotify.el
-- 
Duy


Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline

2013-03-18 Thread Thomas Rast
Ramkumar Ramachandra artag...@gmail.com writes:

 Junio C Hamano wrote:
 Yes, and you would need one inotify per directory but you do not
 have an infinite supply of outstanding inotify watch (wasn't the
 limit like 8k per a single uid or something?), so the daemon must be
 prepared to say "I'll watch this, that and that directories, but the
 consumers should check other directories themselves."

 FWIW, I share your suspicion that an effort in the direction this
 thread suggests may end up duplicating what the caching vfs layer
 already does, and doing so poorly.

 Thomas Rast wrote:
   $ cat /proc/sys/fs/inotify/max_user_watches
   65536
   $ cat /proc/sys/fs/inotify/max_user_instances
   128

 From Junio's and Thomas' observations, I'm inclined to think that
 inotify is ill-suited for the problem we are trying to solve.  It is
 designed as a per-directory watch, because VFS can quickly supply the
 inodes for a directory entry.  As such, I think the ideal usecase for
 inotify is to execute something immediately when a change takes place
 in a directory: it's well-suited for solutions like Dropbox (which I
 think is poorly designed to begin with, but that's offtopic).  It
 doesn't substitute or augment VFS caching.  I suspect the VFS cache
 works by caching the inodes in a frequently used directory entry, thus
 optimizing calls like lstat() on them.

I have three objections to changing the kernel to fit us, as opposed to
just using inotify:

* inotify works.  I can watch most of my $HOME with the hack I linked
  earlier[1].  Yes, it's a lot of coding around the problem that it is
  nonrecursive, but we already have a lot of code around the problem
  that we can't ask the VFS for diffs between points in time (namely,
  the whole business with an index and lstat() loops).

* inotify is here today.  Even if you got a hypothetical notifier into
  the kernel today, you'd have to wait months/years until it is
  available in distros, and years until everyone has it.

* I'll bet you a beer that the kernel folks already had the same
  discussion when they made inotify.  There has to be a reason why it's
  better than providing for recursive watches.


[1]  https://github.com/trast/watch

-- 
Thomas Rast
trast@{inf,student}.ethz.ch


Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline

2013-03-16 Thread Thomas Rast
Junio C Hamano gits...@pobox.com writes:

 Karsten Blees karsten.bl...@gmail.com writes:

 However, AFAIK inotify doesn't work recursively, so the daemon
 would at least have to track the directory structure to be able to
 register / unregister inotify handlers as directories come and go.

 Yes, and you would need one inotify per directory but you do not
 have an infinite supply of outstanding inotify watch (wasn't the
 limit like 8k per a single uid or something?), so the daemon must be
 prepared to say "I'll watch this, that and that directories, but the
 consumers should check other directories themselves."

Those are tunable limits though.  For example I run this silly hack

  https://github.com/trast/watch

with the shell snippets to be able to quickly cd a shell to where
something recently happened.  I am able to watch most of my working
set even under default limits, which here (opensuse tumbleweed, kernel
3.8.x, x86_64) are

  $ cat /proc/sys/fs/inotify/max_user_watches 
  65536
  $ cat /proc/sys/fs/inotify/max_user_instances 
  128

I'm not sure if other distros impose tighter limits by default, but as
it stands you're not very likely to hit the 65k watches limit in any
given repo.  It seems more likely that you might hit the 128 instances
limit if we go with a design that uses one daemon per repo, if you run a
script that accesses many repos.  For example, in an android tree I have
lying around,

  $ repo list | wc -l
  297

That alone might indicate it would be a good idea to have one global
git-agent that starts on demand, rather than a per-repo daemon.
Otherwise we'd have to find a way to discover old daemons and tell
them to quit when we hit max_user_instances.
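
To make the two limits concrete, here is a rough Python sketch (not code from this thread; `read_inotify_limit`, `count_directories` and `fits_watch_budget` are made-up names). It assumes one inotify watch per directory, as discussed above, and checks whether a tree would fit under a given watch limit:

```python
import os


def read_inotify_limit(name, proc_dir="/proc/sys/fs/inotify"):
    """Return the integer stored in proc_dir/name, or None if unreadable."""
    try:
        with open(os.path.join(proc_dir, name)) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def count_directories(root):
    """Count root plus every directory below it (symlinks are not followed)."""
    return sum(1 for _ in os.walk(root))


def fits_watch_budget(root, limit):
    """True if one watch per directory under root stays within limit."""
    return count_directories(root) <= limit
```

On Linux one would call `read_inotify_limit("max_user_watches")` and pass the result to `fits_watch_budget()`. The max_user_instances limit is a separate matter: it counts inotify instances (one per daemon in a per-repo design), not directories.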

-- 
Thomas Rast
trast@{inf,student}.ethz.ch


Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline

2013-03-15 Thread Pete Wyckoff
gits...@pobox.com wrote on Wed, 13 Mar 2013 12:38 -0700:
 Karsten Blees karsten.bl...@gmail.com writes:
 
  However, AFAIK inotify doesn't work recursively, so the daemon
  would at least have to track the directory structure to be able to
  register / unregister inotify handlers as directories come and go.
 
 Yes, and you would need one inotify per directory but you do not
 have an infinite supply of outstanding inotify watch (wasn't the
 limit like 8k per a single uid or something?), so the daemon must be
 prepared to say "I'll watch this, that and that directories, but the
 consumers should check other directories themselves."

fanotify is an option here too; it can watch an entire file
system.

-- Pete


Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline

2013-03-14 Thread Duy Nguyen
On Thu, Mar 14, 2013 at 2:38 AM, Junio C Hamano gits...@pobox.com wrote:
 Karsten Blees karsten.bl...@gmail.com writes:

 However, AFAIK inotify doesn't work recursively, so the daemon
 would at least have to track the directory structure to be able to
 register / unregister inotify handlers as directories come and go.

 Yes, and you would need one inotify per directory but you do not
 have an infinite supply of outstanding inotify watch (wasn't the
 limit like 8k per a single uid or something?), so the daemon must be
 prepared to say "I'll watch this, that and that directories, but the
 consumers should check other directories themselves."

Hey I did not know that. Webkit has about 6k leaf dirs and 182k files.
Watching the top N biggest directories would cover M% of cached files:

   N     M%
  10   8.60
  20  13.28
  30  17.52
  40  20.52
  50  23.55
 200  49.70
 676  75.00
 863  80.00
1486  90.00

So it's a trade-off. We can cut some syscall cost, but we probably
need to pay some for inotify. And we definitely can't watch the full
worktree. I don't know how costly it is to watch many directories.
If it's not so costly, watching 256 or 512 dirs might be enough.
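
For anyone who wants to reproduce a table like the one above on their own tree, a minimal Python sketch follows. It is not the script used for the webkit numbers; `coverage_by_top_dirs` and its input format (a mapping from directory to number of files directly inside it) are assumptions made for illustration:

```python
def coverage_by_top_dirs(files_per_dir, budgets):
    """For each N in budgets, return the percentage of files that live
    in the N directories holding the most files."""
    counts = sorted(files_per_dir.values(), reverse=True)
    total = sum(counts)
    result = {}
    for n in budgets:
        covered = sum(counts[:n])
        result[n] = round(100.0 * covered / total, 2) if total else 0.0
    return result
```

With a toy input of four directories holding 50, 30, 15 and 5 files, watching the top two directories covers 80% of the files.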

What about Windows? Does the equivalent mechanism have similar limits?

 FWIW, I share your suspicion that an effort in the direction this
 thread suggests may end up duplicating what the caching vfs layer
 already does, and doing so poorly.

I'm still curious how it works out. Maybe it's not up to the original
expectation, but hopefully it will speed things up a bit.
-- 
Duy


Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline

2013-03-13 Thread Karsten Blees
On 13.03.2013 02:03, Duy Nguyen wrote:
 On Wed, Mar 13, 2013 at 6:21 AM, Karsten Blees karsten.bl...@gmail.com 
 wrote:
 Hmmm...I don't see how filesystem changes since last invocation can solve 
 the problem, or am I missing something? I think what you mean to say is that 
 the daemon should keep track of the filesystem *state* of the working copy, 
 or alternatively the deltas/changes to some known state (such as .git/index)?
 
 I think git process can keep track of filesystem state (and save it
 down if necessary).
[...]
Ah, saving the state was the missing bit, thanks.

However, AFAIK inotify doesn't work recursively, so the daemon would at least 
have to track the directory structure to be able to register / unregister 
inotify handlers as directories come and go.

 Consider 'git status; make; make clean; git status'...that's a *lot* of 
 changes to process for nothing (potentially slowing down make).
 
 Yeah. In my opinion, the daemon should realize at some point that
 the accumulated changes are too many to be worth collecting
 anymore, and drop them all. Git will then do it the normal/slow way.
 After that the daemon picks up again. We only optimize for the case
 when few changes are made in the filesystem.
 

That sounds reasonable...

 Then there's the issue of stale data in the cache. Modifying porcelain 
 commands that use 'git status --porcelain' to compile their changesets will 
 want 100% exact data. I'm not saying it's not doable, but adding another 
 platform specific, caching daemon to the tool chain doesn't exactly simplify 
 things...

 But perhaps I'm too pessimistic (or just stigmatized by inherently slow and 
 out-of-date TGitCache/TSvnCache on Windows :-)
 
 Thanks. I didn't know about TGitCache. Will dig it up. Maybe we can
 learn something from it (or realize the daemon approach is futile
 after all).
 

TGitCache/TSvnCache are the background processes in TortoiseGit/TortoiseSvn 
that keep track of filesystem state to display icon overlays in Windows 
Explorer.



Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline

2013-03-13 Thread Junio C Hamano
Karsten Blees karsten.bl...@gmail.com writes:

 However, AFAIK inotify doesn't work recursively, so the daemon
 would at least have to track the directory structure to be able to
 register / unregister inotify handlers as directories come and go.

Yes, and you would need one inotify per directory but you do not
have an infinite supply of outstanding inotify watch (wasn't the
limit like 8k per a single uid or something?), so the daemon must be
prepared to say "I'll watch this, that and that directories, but the
consumers should check other directories themselves."

FWIW, I share your suspicion that an effort in the direction this
thread suggests may end up duplicating what the caching vfs layer
already does, and doing so poorly.


Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline

2013-03-12 Thread Ramkumar Ramachandra
Heiko Voigt wrote:
 While talking about platform independence. How about Windows? AFAIK
 there are no file-based sockets. How about using shared memory, that's
 available, instead? It would greatly reduce the needed porting effort.

What about the git credential helper: it uses UNIX sockets, no?  How
does git-credential-winstore [1] work?

[1]: https://github.com/anurse/git-credential-winstore


Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline

2013-03-12 Thread Erik Faye-Lund
On Tue, Mar 12, 2013 at 10:43 AM, Ramkumar Ramachandra
artag...@gmail.com wrote:
 Heiko Voigt wrote:
 While talking about platform independence. How about Windows? AFAIK
 there are no file-based sockets. How about using shared memory, that's
 available, instead? It would greatly reduce the needed porting effort.

 What about the git credential helper: it uses UNIX sockets, no?  How
 does git-credential-winstore [1] work?

 [1]: https://github.com/anurse/git-credential-winstore

First, we have a proper credential helper for Windows in
contrib/credential/wincred these days. I wrote that one, and it
communicates using stdin/stdout. The credential-helper doesn't maintain
state in itself, the Windows Credential Manager does. I suspect
git-credential-winstore works the same way.

As for Windows support, AFAIK there is no support for Unix domain
sockets in Windows. But there is support for named pipes, which is
almost the same thing. What we have support for in compat/mingw.[ch]
is a different matter, but we can extend that if needed.


Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline

2013-03-12 Thread Jeff King
On Tue, Mar 12, 2013 at 03:13:39PM +0530, Ramkumar Ramachandra wrote:

 Heiko Voigt wrote:
  While talking about platform independence. How about Windows? AFAIK
  there are no file-based sockets. How about using shared memory, that's
  available, instead? It would greatly reduce the needed porting effort.
 
 What about the git credential helper: it uses UNIX sockets, no?  How
 does git-credential-winstore [1] work?

No, the main credential protocol happens over pipes to a child process's
stdin/stdout. The credential-cache helper does use unix sockets (since
it needs to contact a long-running daemon that caches the credentials),
and AFAIK is not available under Windows (but that's OK, because
Windows-specific helpers that use secure storage are better anyway).

When I introduced credential-cache, I recall somebody mentioned that
there is some Windows-equivalent IPC that can be used to emulate unix
domain sockets. The calls aren't the same, but as long as your
requirements are basically "get messages to/from the daemon", you can
probably abstract away the details on a per-platform basis.
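
As an illustration of that kind of per-platform abstraction (in Python rather than git's C, and not something proposed in this thread), the standard `multiprocessing.connection` module hides Unix domain sockets and Windows named pipes behind a single message-passing interface. The names `ipc_family`, `serve_once` and `query_changes`, and the `"changes"` request string, are hypothetical:

```python
import sys
import threading
from multiprocessing.connection import Client, Listener


def ipc_family(platform=sys.platform):
    """Pick the transport: named pipes on Windows, Unix sockets elsewhere."""
    return "AF_PIPE" if platform == "win32" else "AF_UNIX"


def serve_once(listener, changed_paths):
    """Daemon side: answer a single request with the list of changed paths."""
    with listener.accept() as conn:
        request = conn.recv()
        conn.send(changed_paths if request == "changes" else [])


def query_changes(address, family):
    """Client side: ask the daemon which paths changed."""
    with Client(address, family=family) as conn:
        conn.send("changes")
        return conn.recv()
```

The caller only deals with "send a message, receive a message"; which kernel facility carries it is decided in one place.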

Unfortunately I can't seem to find the original message or any details
in the archive (and I know next to nothing about Windows IPC).

-Peff


Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline

2013-03-12 Thread Karsten Blees
On 10.03.2013 21:17, Ramkumar Ramachandra wrote:
 git operations are slow on repositories with lots of files, and lots
 of tiny filesystem calls like lstat(), getdents(), open() are
 responsible for this.  On the linux-2.6 repository, for instance, the
 numbers for git status look like this:
 
   top syscalls sorted       top syscalls sorted
   by acc. time              by number
   ----------------------------------------------
   0.401906 40950 lstat      0.401906 40950 lstat
   0.190484 5343 getdents    0.150055 5374 open
   0.150055 5374 open        0.190484 5343 getdents
   0.074843 2806 close       0.074843 2806 close
   0.003216 157 read         0.003216 157 read
 
 To solve this problem, we propose to build a daemon which will watch
 the filesystem using inotify and report batched up events over a UNIX
 socket.

[...]

 +
 +The fswatch C API is meant to be called by Git code which needs
 +information about filesystem changes.  It is centered around an
 +object representing the changes to the filesystem since the last
 +invocation.
 +

Hmmm...I don't see how filesystem changes since last invocation can solve the 
problem, or am I missing something? I think what you mean to say is that the 
daemon should keep track of the filesystem *state* of the working copy, or 
alternatively the deltas/changes to some known state (such as .git/index)?

I'm also still skeptical whether a daemon will improve overall performance. In 
my understanding it's essentially a filesystem cache in user mode. The 
difference to using the OS filesystem cache directly (via lstat/readdir) is 
that we replace ~50k syscalls with a single IPC call (i.e. the git <-> 
fswatch daemon communication is less 'chatty'). However, the 'chattiness' is 
still there between the fswatch daemon and the OS / inotify. Consider 'git 
status; make; make clean; git status'...that's a *lot* of changes to process 
for nothing (potentially slowing down make).

Then there's the issue of stale data in the cache. Modifying porcelain commands 
that use 'git status --porcelain' to compile their changesets will want 100% 
exact data. I'm not saying it's not doable, but adding another platform 
specific, caching daemon to the tool chain doesn't exactly simplify things...

But perhaps I'm too pessimistic (or just stigmatized by inherently slow and 
out-of-date TGitCache/TSvnCache on Windows :-)


Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline

2013-03-12 Thread Duy Nguyen
On Wed, Mar 13, 2013 at 6:21 AM, Karsten Blees karsten.bl...@gmail.com wrote:
 Hmmm...I don't see how filesystem changes since last invocation can solve the 
 problem, or am I missing something? I think what you mean to say is that the 
 daemon should keep track of the filesystem *state* of the working copy, or 
 alternatively the deltas/changes to some known state (such as .git/index)?

I think the git process can keep track of filesystem state (and save it
down if necessary). But while no git process is running, the filesystem
state changes and git cannot know about it. The daemon helps fill this
gap (and basically keeps git running (in a light form) throughout a
development session). For example, if we know only 5 files have changed
since the last refresh, we only need to re-stat those 5. The same goes
for untracked/ignored file checking.

 I'm also still skeptical whether a daemon will improve overall performance. 
 In my understanding it's essentially a filesystem cache in user mode. The 
 difference to using the OS filesystem cache directly (via lstat/readdir) is 
 that we replace ~50k syscalls with a single IPC call (i.e. the git <-> 
 fswatch daemon communication is less 'chatty'). However, the 'chattiness' is 
 still there between the fswatch daemon and the OS / inotify.

I think it attempts to reduce unnecessary system calls, not eliminate
them all. In the 5-changed-files example above, a few IPC calls are made
to retrieve the file list, then 5 lstat() calls are issued (by git, not
the daemon) instead of thousands of them.
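
The git side of that exchange could look roughly like the Python sketch below. It is only an illustration: `refresh_entries` is a hypothetical name, and the cache is a plain dict mapping a repository-relative path to an (mtime, size) pair rather than git's real index format. Only the paths the daemon reported are passed to lstat(); every other cache entry is left alone:

```python
import os


def refresh_entries(root, cache, changed):
    """Re-lstat only the paths in `changed`, updating `cache` in place.

    Entries for files that no longer exist are dropped.  Returns the
    number of lstat() calls that were issued.
    """
    calls = 0
    for rel in changed:
        calls += 1
        try:
            st = os.lstat(os.path.join(root, rel))
        except FileNotFoundError:
            cache.pop(rel, None)
            continue
        cache[rel] = (st.st_mtime_ns, st.st_size)
    return calls
```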

 Consider 'git status; make; make clean; git status'...that's a *lot* of 
 changes to process for nothing (potentially slowing down make).

Yeah. In my opinion, the daemon should realize at some point that
the accumulated changes are too many to be worth collecting
anymore, and drop them all. Git will then do it the normal/slow way.
After that the daemon picks up again. We only optimize for the case
when few changes are made in the filesystem.
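
A minimal Python sketch of that "give up when there is too much" behaviour follows. The class name `ChangeAccumulator`, the `limit` parameter and the convention that `None` means "do a full scan" are all assumptions for illustration, not part of any proposed design:

```python
class ChangeAccumulator:
    """Collect changed paths, but give up once there are too many."""

    def __init__(self, limit):
        self.limit = limit
        self.paths = set()
        self.overflowed = False

    def record(self, path):
        """Note one changed path; past the limit, drop everything."""
        if self.overflowed:
            return
        self.paths.add(path)
        if len(self.paths) > self.limit:
            self.paths.clear()
            self.overflowed = True

    def drain(self):
        """Return the changed paths, or None if git must rescan everything.

        Either way the accumulator starts collecting afresh afterwards.
        """
        result = None if self.overflowed else set(self.paths)
        self.paths.clear()
        self.overflowed = False
        return result
```

After a 'make; make clean' storm, `drain()` would return `None` once, git would do its usual lstat() loop, and the next `drain()` would again report only what changed since then.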

 Then there's the issue of stale data in the cache. Modifying porcelain 
 commands that use 'git status --porcelain' to compile their changesets will 
 want 100% exact data. I'm not saying it's not doable, but adding another 
 platform specific, caching daemon to the tool chain doesn't exactly simplify 
 things...

 But perhaps I'm too pessimistic (or just stigmatized by inherently slow and 
 out-of-date TGitCache/TSvnCache on Windows :-)

Thanks. I didn't know about TGitCache. Will dig it up. Maybe we can
learn something from it (or realize the daemon approach is futile
after all).
-- 
Duy


Re: [RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline

2013-03-11 Thread Heiko Voigt
On Mon, Mar 11, 2013 at 01:47:03AM +0530, Ramkumar Ramachandra wrote:
 git operations are slow on repositories with lots of files, and lots
 of tiny filesystem calls like lstat(), getdents(), open() are
 responsible for this.  On the linux-2.6 repository, for instance, the
 numbers for git status look like this:
 
   top syscalls sorted       top syscalls sorted
   by acc. time              by number
   ----------------------------------------------
   0.401906 40950 lstat      0.401906 40950 lstat
   0.190484 5343 getdents    0.150055 5374 open
   0.150055 5374 open        0.190484 5343 getdents
   0.074843 2806 close       0.074843 2806 close
   0.003216 157 read         0.003216 157 read
 
 To solve this problem, we propose to build a daemon which will watch
 the filesystem using inotify and report batched up events over a UNIX
 socket.  Since inotify is Linux-only, we have to leave open the
 possibility of writing similar daemons for other platforms.
 Everything will continue to work as before if there is no helper
 present.

While talking about platform independence. How about Windows? AFAIK
there are no file-based sockets. How about using shared memory, that's
available, instead? It would greatly reduce the needed porting effort.

Since operations on a lot of files are especially expensive on Windows, it
is one of the platforms that would profit the most from such a daemon.

Cheers Heiko


[RFC/PATCH] Documentation/technical/api-fswatch.txt: start with outline

2013-03-10 Thread Ramkumar Ramachandra
git operations are slow on repositories with lots of files, and lots
of tiny filesystem calls like lstat(), getdents(), open() are
responsible for this.  On the linux-2.6 repository, for instance, the
numbers for git status look like this:

  top syscalls sorted       top syscalls sorted
  by acc. time              by number
  ----------------------------------------------
  0.401906 40950 lstat      0.401906 40950 lstat
  0.190484 5343 getdents    0.150055 5374 open
  0.150055 5374 open        0.190484 5343 getdents
  0.074843 2806 close       0.074843 2806 close
  0.003216 157 read         0.003216 157 read

To solve this problem, we propose to build a daemon which will watch
the filesystem using inotify and report batched up events over a UNIX
socket.  Since inotify is Linux-only, we have to leave open the
possibility of writing similar daemons for other platforms.
Everything will continue to work as before if there is no helper
present.

The fswatch API introduces a generic way for git.git to request
filesystem changes.  Different helpers (like the inotify daemon on
Linux) will be plugged into this API on different platforms.  When no
helper is available, it falls back to using the filesystem calls.
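
The fallback behaviour can be sketched as follows (Python for brevity; the real API would be C, and none of these names exist in git). The sketch assumes a helper is a callable that returns a list of changed paths, or `None` when it has nothing to offer, and that the saved state maps a relative path to an (mtime, size) pair:

```python
import os


def scan_with_lstat(root):
    """Fallback: lstat() every file under root and record (mtime, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            state[os.path.relpath(path, root)] = (st.st_mtime_ns, st.st_size)
    return state


def changed_paths(root, previous, helper=None):
    """Ask the helper first; otherwise diff a full scan against `previous`."""
    if helper is not None:
        reported = helper(root)
        if reported is not None:
            return sorted(reported)
    current = scan_with_lstat(root)
    return sorted(
        path
        for path in current.keys() | previous.keys()
        if current.get(path) != previous.get(path)
    )
```

Callers get the same answer shape either way, which is what lets everything keep working when no helper is present.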

The daemon will start up with the very first operation done on the git
repository, and will die after a specified period of repository
inactivity.  It is going to be a per-repo daemon and will write to a
socket in the repository: access control is managed by filesystem
permissions.

This design is inspired by the credential helper design.

Signed-off-by: Ramkumar Ramachandra artag...@gmail.com
---
 Documentation/technical/api-fswatch.txt | 62 +
 1 file changed, 62 insertions(+)
 create mode 100644 Documentation/technical/api-fswatch.txt

diff --git a/Documentation/technical/api-fswatch.txt 
b/Documentation/technical/api-fswatch.txt
new file mode 100644
index 000..9c6826a
--- /dev/null
+++ b/Documentation/technical/api-fswatch.txt
@@ -0,0 +1,62 @@
+fswatch API
+===========
+
+The fswatch API provides an abstracted way of collecting information
+about filesystem changes.  An fswatch helper is typically a daemon which
+uses inotify to watch the filesystem, and this information is used by
+git instead of making expensive system calls like lstat() and open().
+
+Typical setup
+-------------
+
+
++-------------------+
+| Git code (C)      |--- requires information about fs changes
+|...................|
+| C fswatch API     |--- system calls --- filesystem
++-------------------+
+        ^             |
+        | UNIX socket |
+        |             v
++-------------------+
+| Git fswatch helper|--- daemon inotify-watching --- filesystem
++-------------------+
+
+
+The Git code will call the C API to obtain changes in filesystem
+information.  The API will itself call a configured helper (e.g. git
+fswatch-notify) which may return filesystem changes, if the
+helper daemon was started in a previous invocation.  If the daemon is
+not already running, it is started, and the C API will fall back to
+making expensive system calls.
+
+C API
+-----
+
+The fswatch C API is meant to be called by Git code which needs
+information about filesystem changes.  It is centered around an
+object representing the changes to the filesystem since the last
+invocation.
+
+Data Structures
+~~~~~~~~~~~~~~~
+
+`struct fschanges`::
+
+   TODO
+
+
+Functions
+~~~~~~~~~
+
+TODO
+
+Example
+~~~~~~~
+
+TODO
+
+fswatch Helpers
+---------------
+
+TODO
-- 
1.8.1.5
