Re: [PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes
Torsten Bögershausenwrites: >> >> diff --git a/sha1_file.c b/sha1_file.c >> index d5e1121..759991e 100644 >> --- a/sha1_file.c >> +++ b/sha1_file.c >> @@ -1485,7 +1485,7 @@ int check_sha1_signature(const unsigned char *sha1, >> void *map, >> >> int git_open_noatime(const char *name) > > Hm, should the function then be renamed into > > git_open_noatime_cloexec() > >> { >> -static int sha1_file_open_flag = O_NOATIME; >> +static int sha1_file_open_flag = O_NOATIME | O_CLOEXEC; Perhaps. In any case, this is probably something that can and should be done outside this series. I am tempted to suggest that the patch 13/13 under discussion may also want to be done outside the scope of, and before, this series. Even though with the current system an inherited file descriptor to v1 filter processes would cause issues, there is no good reason to expose this file desciptor to them. Thanks.
Re: [PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes
> > diff --git a/sha1_file.c b/sha1_file.c > index d5e1121..759991e 100644 > --- a/sha1_file.c > +++ b/sha1_file.c > @@ -1485,7 +1485,7 @@ int check_sha1_signature(const unsigned char *sha1, > void *map, > > int git_open_noatime(const char *name) Hm, should the function then be renamed into git_open_noatime_cloexec() > { > - static int sha1_file_open_flag = O_NOATIME; > + static int sha1_file_open_flag = O_NOATIME | O_CLOEXEC; > > for (;;) { > int fd; > > >
Re: [PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes
> On 29 Aug 2016, at 21:45, Junio C Hamanowrote: > > Lars Schneider writes: > >> I see. Thanks for the explanation. > > I expect the updated log message to explain the issue correctly > then. Sure! >>> The parent is >>> very likely to have pack windows open into .pack files and they need >>> to be closed on the child side after fork(2) starts the child >>> process but before execve(2) runs the helper, if we want to avoid >>> file descriptor leaks. >> >> I think I understand what you are saying. However, during my tests >> .pack file fd's were never a problem. > > I do not expect during the lifetime of your long-running helper > anybody would try to unlink an existing packfile, so it is unlikely > that "cannot unlink an open file on Windows" issue to come up. And > the cross-platform problem I pointed out is a fd leak; leaks would > not show up until you run out of the resource, just like you > wouldn't notice small memory leak here and there UNLESS you actively > look for them. I would be surprised if your "tests" found anything. > >> How would I find the open .pack file fd's? Should I go through >> /proc/PID/fd? Why is this no problem for other longer running >> commands such as the git-credential-cache--daemon or git-daemon? > > Nobody said this is "no problem for others". While discussing the > patch that started this thread, we noticed that any open file > descriptor the main process has when the long-running clean/smudge > helper is spawned _is_ a problem. Other helpers may share the same > problem, and if they do, that is all the more reason that fixing the > file descriptor leak is a good thing. > > The primary thing I was wondering was if we should open the window > into packfiles with CLOEXEC, just like we recently started opening > the tempfiles and lockfiles with the flag. The reason why I asked > if the site that spawns (i.e. run_command API) has an easy access to > the list of file descriptors that we opened ONLY for ourselves is > because that would make it possible to consider an alternative > approach close them before execve(2) in run_commands API. That > would save us from having to sprinkle existing calls to open(2) with > CLOEXEC. But if that is not the case, opening them with CLOEXEC is > a much better solution than going outside the program to "find" them > in non-portable ways. The pack files are opened before the filter process is forked. Therefore, I think CLOEXEC makes sense for them. Should this change be part of this series? If yes, would it look like below? Should I adjust the function name? Thanks, Lars diff --git a/sha1_file.c b/sha1_file.c index d5e1121..759991e 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1485,7 +1485,7 @@ int check_sha1_signature(const unsigned char *sha1, void *map, int git_open_noatime(const char *name) { - static int sha1_file_open_flag = O_NOATIME; + static int sha1_file_open_flag = O_NOATIME | O_CLOEXEC; for (;;) { int fd;
Re: [PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes
Lars Schneiderwrites: > I see. Thanks for the explanation. I expect the updated log message to explain the issue correctly then. >> And even on POSIX systems, if you are doing a long-running helper >> any open file descriptor in the parent process when the long-running >> helper is spawned will become leaked fd. CLOEXEC is a possible >> solution (but not necessarily the only or the best one) to the fd >> leak in this case. >> >> How much does the code that spawns these long-running helpers know >> about the file descriptors that happen to be open? > > Nothing really. OK. >> The parent is >> very likely to have pack windows open into .pack files and they need >> to be closed on the child side after fork(2) starts the child >> process but before execve(2) runs the helper, if we want to avoid >> file descriptor leaks. > > I think I understand what you are saying. However, during my tests > .pack file fd's were never a problem. I do not expect during the lifetime of your long-running helper anybody would try to unlink an existing packfile, so it is unlikely that "cannot unlink an open file on Windows" issue to come up. And the cross-platform problem I pointed out is a fd leak; leaks would not show up until you run out of the resource, just like you wouldn't notice small memory leak here and there UNLESS you actively look for them. I would be surprised if your "tests" found anything. > How would I find the open .pack file fd's? Should I go through > /proc/PID/fd? Why is this no problem for other longer running > commands such as the git-credential-cache--daemon or git-daemon? Nobody said this is "no problem for others". While discussing the patch that started this thread, we noticed that any open file descriptor the main process has when the long-running clean/smudge helper is spawned _is_ a problem. Other helpers may share the same problem, and if they do, that is all the more reason that fixing the file descriptor leak is a good thing. The primary thing I was wondering was if we should open the window into packfiles with CLOEXEC, just like we recently started opening the tempfiles and lockfiles with the flag. The reason why I asked if the site that spawns (i.e. run_command API) has an easy access to the list of file descriptors that we opened ONLY for ourselves is because that would make it possible to consider an alternative approach close them before execve(2) in run_commands API. That would save us from having to sprinkle existing calls to open(2) with CLOEXEC. But if that is not the case, opening them with CLOEXEC is a much better solution than going outside the program to "find" them in non-portable ways. Thanks.
Re: [PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes
> On 29 Aug 2016, at 20:05, Junio C Hamanowrote: > > larsxschnei...@gmail.com writes: > >> From: Lars Schneider >> >> Consider the case of a file that requires filtering and is present in >> branch A but not in branch B. If A is the current HEAD and we checkout B >> then the following happens: >> >> 1. ce_compare_data() opens the file >> 2. index_fd() detects that the file requires to run a clean filter and >> calls index_stream_convert_blob() >> 4. index_stream_convert_blob() calls convert_to_git_filter_fd() >> 5. convert_to_git_filter_fd() calls apply_filter() which creates a >> new long running filter process (in case it is the first file >> of this kind to be filtered) >> 6. The new filter process inherits all file handles. This is the >> default on Linux/OSX and is explicitly defined in the >> `CreateProcessW` call in `mingw.c` on Windows. >> 7. ce_compare_data() closes the file >> 8. Git unlinks the file as it is not present in B >> >> The unlink operation does not work on Windows because the filter process >> has still an open handle to the file. Apparently that is no problem on >> Linux/OSX. Probably because "[...] the two file descriptors share open >> file status flags" (see fork(2)). > > Wait, a, minute. "that is no problem" may be true as long as "that" > is "unlinking the now-gone file in the filesystem", but the reason > does not have anything to do with the "open-file status flags"; > unlike Windows, you _can_ unlink file that has an open file > descriptor on it. I see. Thanks for the explanation. > > And even on POSIX systems, if you are doing a long-running helper > any open file descriptor in the parent process when the long-running > helper is spawned will become leaked fd. CLOEXEC is a possible > solution (but not necessarily the only or the best one) to the fd > leak in this case. > > How much does the code that spawns these long-running helpers know > about the file descriptors that happen to be open? Nothing really. > The parent is > very likely to have pack windows open into .pack files and they need > to be closed on the child side after fork(2) starts the child > process but before execve(2) runs the helper, if we want to avoid > file descriptor leaks. I think I understand what you are saying. However, during my tests .pack file fd's were never a problem. I use start_command() [1] which wraps the fork() and exec calls [2]. How would I find the open .pack file fd's? Should I go through /proc/PID/fd? Why is this no problem for other longer running commands such as the git-credential-cache--daemon or git-daemon? Thanks, Lars [1] https://github.com/larsxschneider/git/blob/protocol-filter/v6/convert.c#L566 [2] https://github.com/larsxschneider/git/blob/protocol-filter/v6/run-command.c#L345-L412
Re: [PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes
larsxschnei...@gmail.com writes: > From: Lars Schneider> > Consider the case of a file that requires filtering and is present in > branch A but not in branch B. If A is the current HEAD and we checkout B > then the following happens: > > 1. ce_compare_data() opens the file > 2. index_fd() detects that the file requires to run a clean filter and > calls index_stream_convert_blob() > 4. index_stream_convert_blob() calls convert_to_git_filter_fd() > 5. convert_to_git_filter_fd() calls apply_filter() which creates a > new long running filter process (in case it is the first file > of this kind to be filtered) > 6. The new filter process inherits all file handles. This is the > default on Linux/OSX and is explicitly defined in the > `CreateProcessW` call in `mingw.c` on Windows. > 7. ce_compare_data() closes the file > 8. Git unlinks the file as it is not present in B > > The unlink operation does not work on Windows because the filter process > has still an open handle to the file. Apparently that is no problem on > Linux/OSX. Probably because "[...] the two file descriptors share open > file status flags" (see fork(2)). Wait, a, minute. "that is no problem" may be true as long as "that" is "unlinking the now-gone file in the filesystem", but the reason does not have anything to do with the "open-file status flags"; unlike Windows, you _can_ unlink file that has an open file descriptor on it. And even on POSIX systems, if you are doing a long-running helper any open file descriptor in the parent process when the long-running helper is spawned will become leaked fd. CLOEXEC is a possible solution (but not necessarily the only or the best one) to the fd leak in this case. How much does the code that spawns these long-running helpers know about the file descriptors that happen to be open? The parent is very likely to have pack windows open into .pack files and they need to be closed on the child side after fork(2) starts the child process but before execve(2) runs the helper, if we want to avoid file descriptor leaks. > Fix this problem by opening files in read-cache with the `O_CLOEXEC` > flag to ensure that the file descriptor does not remain open in a newly > spawned process. `O_CLOEXEC` is defined as `O_NOINHERIT` on Windows. A > similar fix for temporary file handles was applied on Git for Windows > already: > https://github.com/git-for-windows/git/commit/667b8b51ec850c3e1c7d75dee69dc13c29d1f162 After a few iterations, the final version of the same commit is now in our tree as d5cb9cbd ("Git 2.10-rc2", 2016-08-26).
[PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes
From: Lars SchneiderConsider the case of a file that requires filtering and is present in branch A but not in branch B. If A is the current HEAD and we checkout B then the following happens: 1. ce_compare_data() opens the file 2. index_fd() detects that the file requires to run a clean filter and calls index_stream_convert_blob() 4. index_stream_convert_blob() calls convert_to_git_filter_fd() 5. convert_to_git_filter_fd() calls apply_filter() which creates a new long running filter process (in case it is the first file of this kind to be filtered) 6. The new filter process inherits all file handles. This is the default on Linux/OSX and is explicitly defined in the `CreateProcessW` call in `mingw.c` on Windows. 7. ce_compare_data() closes the file 8. Git unlinks the file as it is not present in B The unlink operation does not work on Windows because the filter process has still an open handle to the file. Apparently that is no problem on Linux/OSX. Probably because "[...] the two file descriptors share open file status flags" (see fork(2)). Fix this problem by opening files in read-cache with the `O_CLOEXEC` flag to ensure that the file descriptor does not remain open in a newly spawned process. `O_CLOEXEC` is defined as `O_NOINHERIT` on Windows. A similar fix for temporary file handles was applied on Git for Windows already: https://github.com/git-for-windows/git/commit/667b8b51ec850c3e1c7d75dee69dc13c29d1f162 Signed-off-by: Lars Schneider --- read-cache.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/read-cache.c b/read-cache.c index db27766..f481dee 100644 --- a/read-cache.c +++ b/read-cache.c @@ -159,7 +159,7 @@ void fill_stat_cache_info(struct cache_entry *ce, struct stat *st) static int ce_compare_data(const struct cache_entry *ce, struct stat *st) { int match = -1; - int fd = open(ce->name, O_RDONLY); + int fd = open(ce->name, O_RDONLY | O_CLOEXEC); if (fd >= 0) { unsigned char sha1[20]; -- 2.9.2 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html