Re: [PATCH v14 00/21] index-helper/watchman

2016-07-14 Thread Ben Peart
Duy Nguyen  gmail.com> writes:

> 
> On Wed, Jul 13, 2016 at 11:59 PM, David Turner  novalis.org> 
wrote:
> > On 07/12/2016 02:24 PM, Duy Nguyen wrote:
> >>
> >> Just thinking out loud. I've been thinking about this more about this.
> >> After the move from signal-based to unix socket for communication, we
> >> probably are better off with a simpler design than the shm-alike one
> >> we have now.
> >>
> >> What if we send everything over a socket or a pipe? Sending 500MB over
> >> a unix socket takes 253ms, that's insignificant when operations on an
> >> index that size usually take seconds. If we send everything over
> >> socket/pipe, we can trust data integrity and don't have to verify,
> >> even the trailing SHA-1 in shm file.
> >
> >
> > I think it would be good to make index operations not take seconds.
> >
> > In general, we should not need to verify the trailing SHA-1 for shm data.
> > So the index-helper verifies it when it loads it, but the git (e.g.) status
> > should not need to verify.
> >
> > Also, if we have two git commands running at the same time, the index-helper
> > can only serve one at a time; with shm, both can run at full speed.
> 
> We still have an option to send a (shm, possibly) path to git to pick
> up and skip verification. If we can exchange capabilities then sending
> the index some way else is always possible.
> 
> >> So, what I have in mind is this, at read index time, instead of open a
> >> socket, we run a separate program and communicate via pipes. We can
> >> exchange capabilities if needed, then the program sends the entire
> >> current index, the list of updated files back (and/or the list of dirs
> >> to invalidate). The design looks very much like a smudge/clean filter.
> >
> >
> > This seems very complicated.  Now git status talks to the separate program,
> > which talks to the index-helper, which talks to watchman.  That is a lot of
> > steps!
> 
> I was suggesting this because I think it would simplify things, not
> complicate stuff further. Yes the separate program plays the role of
> our unix client, if we keep the index-helper. But we don't have to.
> 
> Do you remember Junio once suggested to put the index on tmpfs? That's
> what I imagine in common, medium scale setups. We don't need an extra
> daemon:
> 
> 1) when git needs the index, the script looks at its tmpfs mount, if
> found, pass the path back
> 2) when git announces the index has been updated, the script reads the
> index and saves it in tmpfs
> 3) when git refreshes and asks for watchman support, the script simply
> runs "watchman" command, post processes the output a bit and send the
> file list to git
> 
> Because there is no separate daemon in this case, we don't need
> --kill, we don't need --autorun. We still need WAMA extension but it
> can contain just an arbitrary clock string, this is completely opaque
> to git. If we can get rid of the index-helper (with an example script
> probably landed in contrib folder), that's a lot of less headache down
> the road.
> 
> For giant-scale repos, you probably want something more efficient than
> a script like this. And the good thing is you have freedom to do
> whatever you want. You can run one daemon per repo, you can run one
> daemon per system... In some previous mail exchange with Dscho, it was
> mentioned that something other than watchman may be desired. This
> opens up that door without much headache from outside.
> 
> > I think the daemon also has the advantage that it can reload the index as
> > soon as it changes.  This is not quite implemented, but it would be pretty
> > easy to do.  That would save a lot of time in the typical workflow.
> 
> A script has the same advantage, that is if git notifies it (like we
> do now). You can also do it using watchman trigger, which does not
> need any special support from git.


Taking a step back, git needs (at least) 3 things to update the index quickly, 
the old index, the list of potentially modified files and the list of 
directories with changes.  When looked at this way, the question becomes how do 
we provide this data to git in the fastest, simplest, most compatible way.

It makes sense to separate the code that git can call to abstract away some of 
the potentially platform dependent aspects of how we store and retrieve this 
data quickly.  For example, we would like to use something other than Watchman 
to track changes as it isn’t well supported on Windows.

It would be nice to simplify this for git as much as possible by eliminating 
the 
need for it to manage a daemon.  No need for --autorun, --kill, poking sockets, 
etc - just run the program and get back the data.

It also makes sense to allow git to negotiate for the data it needs.  It may 
only need the index, it may need the list of updated files and if 
untrackedcache 
is turned on, it would need the list of directories to invalidate.  The 
separate 
code should also be able to say which parts it can provide quickly and have git 

Re: [PATCH v14 00/21] index-helper/watchman

2016-07-14 Thread Duy Nguyen
Big typo..

On Thu, Jul 14, 2016 at 5:56 PM, Duy Nguyen  wrote:
> For giant-scale repos, you probably want something more efficient than
> a script like this. And the good thing is you have freedom to do
> whatever you want. You can run one daemon per repo, you can run one
> daemon per system... In some previous mail exchange with Dscho, it was
> mentioned that something other than watchman may be desired. This
> opens up that door without much headache from outside.

s/outside/our side/
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v14 00/21] index-helper/watchman

2016-07-14 Thread Duy Nguyen
On Wed, Jul 13, 2016 at 11:59 PM, David Turner  wrote:
> On 07/12/2016 02:24 PM, Duy Nguyen wrote:
>>
>> Just thinking out loud. I've been thinking about this more about this.
>> After the move from signal-based to unix socket for communication, we
>> probably are better off with a simpler design than the shm-alike one
>> we have now.
>>
>> What if we send everything over a socket or a pipe? Sending 500MB over
>> a unix socket takes 253ms, that's insignificant when operations on an
>> index that size usually take seconds. If we send everything over
>> socket/pipe, we can trust data integrity and don't have to verify,
>> even the trailing SHA-1 in shm file.
>
>
> I think it would be good to make index operations not take seconds.
>
> In general, we should not need to verify the trailing SHA-1 for shm data.
> So the index-helper verifies it when it loads it, but the git (e.g.) status
> should not need to verify.
>
> Also, if we have two git commands running at the same time, the index-helper
> can only serve one at a time; with shm, both can run at full speed.

We still have an option to send a (shm, possibly) path to git to pick
up and skip verification. If we can exchange capabilities then sending
the index some way else is always possible.

>> So, what I have in mind is this, at read index time, instead of open a
>> socket, we run a separate program and communicate via pipes. We can
>> exchange capabilities if needed, then the program sends the entire
>> current index, the list of updated files back (and/or the list of dirs
>> to invalidate). The design looks very much like a smudge/clean filter.
>
>
> This seems very complicated.  Now git status talks to the separate program,
> which talks to the index-helper, which talks to watchman.  That is a lot of
> steps!

I was suggesting this because I think it would simplify things, not
complicate stuff further. Yes the separate program plays the role of
our unix client, if we keep the index-helper. But we don't have to.

Do you remember Junio once suggested to put the index on tmpfs? That's
what I imagine in common, medium scale setups. We don't need an extra
daemon:

1) when git needs the index, the script looks at its tmpfs mount, if
found, pass the path back
2) when git announces the index has been updated, the script reads the
index and saves it in tmpfs
3) when git refreshes and asks for watchman support, the script simply
runs "watchman" command, post processes the output a bit and send the
file list to git

Because there is no separate daemon in this case, we don't need
--kill, we don't need --autorun. We still need WAMA extension but it
can contain just an arbitrary clock string, this is completely opaque
to git. If we can get rid of the index-helper (with an example script
probably landed in contrib folder), that's a lot of less headache down
the road.

For giant-scale repos, you probably want something more efficient than
a script like this. And the good thing is you have freedom to do
whatever you want. You can run one daemon per repo, you can run one
daemon per system... In some previous mail exchange with Dscho, it was
mentioned that something other than watchman may be desired. This
opens up that door without much headache from outside.

> I think the daemon also has the advantage that it can reload the index as
> soon as it changes.  This is not quite implemented, but it would be pretty
> easy to do.  That would save a lot of time in the typical workflow.

A script has the same advantage, that is if git notifies it (like we
do now). You can also do it using watchman trigger, which does not
need any special support from git.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v14 00/21] index-helper/watchman

2016-07-13 Thread David Turner

On 07/12/2016 02:24 PM, Duy Nguyen wrote:

Just thinking out loud. I've been thinking about this more about this.
After the move from signal-based to unix socket for communication, we
probably are better off with a simpler design than the shm-alike one
we have now.

What if we send everything over a socket or a pipe? Sending 500MB over
a unix socket takes 253ms, that's insignificant when operations on an
index that size usually take seconds. If we send everything over
socket/pipe, we can trust data integrity and don't have to verify,
even the trailing SHA-1 in shm file.


I think it would be good to make index operations not take seconds.

In general, we should not need to verify the trailing SHA-1 for shm 
data.  So the index-helper verifies it when it loads it, but the git 
(e.g.) status should not need to verify.


Also, if we have two git commands running at the same time, the 
index-helper can only serve one at a time; with shm, both can run at 
full speed.



So, what I have in mind is this, at read index time, instead of open a
socket, we run a separate program and communicate via pipes. We can
exchange capabilities if needed, then the program sends the entire
current index, the list of updated files back (and/or the list of dirs
to invalidate). The design looks very much like a smudge/clean filter.


This seems very complicated.  Now git status talks to the separate 
program, which talks to the index-helper, which talks to watchman.  That 
is a lot of steps!


I think we should wait until we heard from the Windows folks what the 
problems with the current solution are, and see what design they come up 
with.



For people who don't want extra daemon, they can write a short script
that saves indexes somewhere in tmpfs, and talk to watchman or
something else. I haven't written this script, but I don't think it
takes long to write one. Windows folks have total freedom to implement
a daemon, a service or whatever and use this program as front end. How
the service talks to this program is totally up to them. For people
who want to centralize everything, they can have just one daemon and
have the script to talk to this daemon.

I can see that getting rid of file-based stuff simplifies some
patches. We can still provide a daemon to do more advanced stuff (or
to make it work out of the box). But it's not a hard requirement and
we probably don't need to include one right now. And I think it makes
it easier to test as well because we can just go with some fake file
monitor service instead of real watchman.


I think the daemon also has the advantage that it can reload the index 
as soon as it changes.  This is not quite implemented, but it would be 
pretty easy to do.  That would save a lot of time in the typical workflow.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v14 00/21] index-helper/watchman

2016-07-12 Thread Duy Nguyen
Just thinking out loud. I've been thinking about this more about this.
After the move from signal-based to unix socket for communication, we
probably are better off with a simpler design than the shm-alike one
we have now.

What if we send everything over a socket or a pipe? Sending 500MB over
a unix socket takes 253ms, that's insignificant when operations on an
index that size usually take seconds. If we send everything over
socket/pipe, we can trust data integrity and don't have to verify,
even the trailing SHA-1 in shm file.

So, what I have in mind is this, at read index time, instead of open a
socket, we run a separate program and communicate via pipes. We can
exchange capabilities if needed, then the program sends the entire
current index, the list of updated files back (and/or the list of dirs
to invalidate). The design looks very much like a smudge/clean filter.

For people who don't want extra daemon, they can write a short script
that saves indexes somewhere in tmpfs, and talk to watchman or
something else. I haven't written this script, but I don't think it
takes long to write one. Windows folks have total freedom to implement
a daemon, a service or whatever and use this program as front end. How
the service talks to this program is totally up to them. For people
who want to centralize everything, they can have just one daemon and
have the script to talk to this daemon.

I can see that getting rid of file-based stuff simplifies some
patches. We can still provide a daemon to do more advanced stuff (or
to make it work out of the box). But it's not a hard requirement and
we probably don't need to include one right now. And I think it makes
it easier to test as well because we can just go with some fake file
monitor service instead of real watchman.
--
Duy

On Sun, Jul 3, 2016 at 9:57 AM, David Turner  wrote:
> This addresses comments on v13:
> removed unnecessary no_mmap ifdef
> add an ifdef in unix-socket
> OS X fix for select()
> test improvement
>
> Thanks to all for suggestions.
>
> David Turner (10):
>   pkt-line: add gentle version of packet_write
>   index-helper: log warnings
>   unpack-trees: preserve index extensions
>   watchman: add a config option to enable the extension
>   index-helper: kill mode
>   index-helper: don't run if already running
>   index-helper: autorun mode
>   index-helper: optionally automatically run
>   index-helper: indexhelper.exitAfter config
>   mailmap: use main email address for dturner
>
> Nguyễn Thái Ngọc Duy (11):
>   read-cache: allow to keep mmap'd memory after reading
>   unix-socket.c: add stub implementation when unix sockets are not
> supported
>   index-helper: new daemon for caching index and related stuff
>   index-helper: add --strict
>   daemonize(): set a flag before exiting the main process
>   index-helper: add --detach
>   read-cache: add watchman 'WAMA' extension
>   watchman: support watchman to reduce index refresh cost
>   index-helper: use watchman to avoid refreshing index with lstat()
>   update-index: enable/disable watchman support
>   trace: measure where the time is spent in the index-heavy operations
>
>  .gitignore   |   2 +
>  .mailmap |   1 +
>  Documentation/config.txt |  12 +
>  Documentation/git-index-helper.txt   |  86 +
>  Documentation/git-update-index.txt   |   6 +
>  Documentation/technical/index-format.txt |  22 ++
>  Makefile |  22 ++
>  builtin/gc.c |   2 +-
>  builtin/update-index.c   |  15 +
>  cache.h  |  25 +-
>  command-list.txt |   1 +
>  config.c |   5 +
>  configure.ac |   8 +
>  contrib/completion/git-completion.bash   |   1 +
>  daemon.c |   2 +-
>  diff-lib.c   |   4 +
>  dir.c|  25 +-
>  dir.h|   6 +
>  environment.c|   2 +
>  git-compat-util.h|   1 +
>  index-helper.c   | 469 +++
>  name-hash.c  |   2 +
>  pkt-line.c   |  18 ++
>  pkt-line.h   |   2 +
>  preload-index.c  |   2 +
>  read-cache.c | 531 
> ++-
>  refs/files-backend.c |   2 +
>  setup.c  |   4 +-
>  t/t1701-watchman-extension.sh|  37 +++
>  t/t7063-status-untracked-cache.sh|  22 ++
>  t/t7900-index-helper.sh  |  79 +
>  t/test-lib-functions.sh  |   4 +
>  test-dump-watchman.c |  16 +
>  unix-socket.h   

Re: [PATCH v14 00/21] index-helper/watchman

2016-07-06 Thread Junio C Hamano
David Turner  writes:

> This addresses comments on v13:
> removed unnecessary no_mmap ifdef
> add an ifdef in unix-socket
> OS X fix for select()
> test improvement

Thanks.  It seems that what used to be 809fd05a (read-cache.c: fix
constness of verify_hdr(), 2016-06-26) is no longer included in the
series, even though the tightening of the constness seems to still
be OK (i.e. verify_hdr() function takes the parameter hdr and only
uses it read-only).  Do you want me to keep it, or (more likely) did
I miss some discussion that reached the concensus that we do not
want it?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v14 00/21] index-helper/watchman

2016-07-04 Thread Johannes Schindelin
Hi Dave,

On Sun, 3 Jul 2016, Johannes Schindelin wrote:

> On Sun, 3 Jul 2016, David Turner wrote:
> 
> > This addresses comments on v13:
> > removed unnecessary no_mmap ifdef
> > add an ifdef in unix-socket
> > OS X fix for select()
> > test improvement
> 
> Thanks.
> 
> Would you mind re-sending 20 & 21, they seem to have gotten lost. Or is
> there a public repository where I can simply fetch the branch? (I am not
> really a fan of applying patches from my mailbox...)

The patches arrived, thanks.

For interested parties: I rebased the patches (adjusting for t/helper) and
pushed the result to refs/tags/index-helper-v14-rebased at
https://github.com/dscho/git.

Thanks,
Dscho
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v14 00/21] index-helper/watchman

2016-07-03 Thread Johannes Schindelin
Hi Dave,

On Sun, 3 Jul 2016, David Turner wrote:

> This addresses comments on v13:
> removed unnecessary no_mmap ifdef
> add an ifdef in unix-socket
> OS X fix for select()
> test improvement

Thanks.

Would you mind re-sending 20 & 21, they seem to have gotten lost. Or is
there a public repository where I can simply fetch the branch? (I am not
really a fan of applying patches from my mailbox...)

Thanks,
Dscho
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html