Re: [PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes

2016-09-01 Thread Junio C Hamano
Torsten Bögershausen  writes:

>> 
>> diff --git a/sha1_file.c b/sha1_file.c
>> index d5e1121..759991e 100644
>> --- a/sha1_file.c
>> +++ b/sha1_file.c
>> @@ -1485,7 +1485,7 @@ int check_sha1_signature(const unsigned char *sha1, void *map,
>>  
>>  int git_open_noatime(const char *name)
>
> Hm, should the function then be renamed to
>
> git_open_noatime_cloexec()
>
>>  {
>> -static int sha1_file_open_flag = O_NOATIME;
>> +static int sha1_file_open_flag = O_NOATIME | O_CLOEXEC;

Perhaps.

In any case, this is probably something that can and should be done
outside this series.

I am tempted to suggest that patch 13/13 under discussion may also
want to be done outside the scope of, and before, this series.  Even
though with the current system an inherited file descriptor may not
cause visible issues for v1 filter processes, there is no good reason
to expose this file descriptor to them.
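For illustration, an already-open descriptor can be taken out of the
inheritable set after the fact with fcntl(2).  A minimal POSIX sketch,
not code from this series:

#include <fcntl.h>

/*
 * Mark an already-open descriptor close-on-exec so it is not
 * handed down to any helper we spawn later.  Returns 0 on success.
 */
static int set_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);
	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}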

Thanks.



Re: [PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes

2016-08-30 Thread Torsten Bögershausen

> 
> diff --git a/sha1_file.c b/sha1_file.c
> index d5e1121..759991e 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1485,7 +1485,7 @@ int check_sha1_signature(const unsigned char *sha1, void *map,
>  
>  int git_open_noatime(const char *name)

Hm, should the function then be renamed to

git_open_noatime_cloexec()

>  {
> - static int sha1_file_open_flag = O_NOATIME;
> + static int sha1_file_open_flag = O_NOATIME | O_CLOEXEC;
>  
>   for (;;) {
>   int fd;
> 
> 
> 


Re: [PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes

2016-08-30 Thread Lars Schneider

> On 29 Aug 2016, at 21:45, Junio C Hamano  wrote:
> 
> Lars Schneider  writes:
> 
>> I see. Thanks for the explanation.
> 
> I expect the updated log message to explain the issue correctly
> then.

Sure!


>>> The parent is
>>> very likely to have pack windows open into .pack files and they need
>>> to be closed on the child side after fork(2) starts the child
>>> process but before execve(2) runs the helper, if we want to avoid
>>> file descriptor leaks.
>> 
>> I think I understand what you are saying. However, during my tests
>> .pack file fd's were never a problem.
> 
> I do not expect that, during the lifetime of your long-running helper,
> anybody would try to unlink an existing packfile, so it is unlikely
> that the "cannot unlink an open file on Windows" issue would come up.  And
> the cross-platform problem I pointed out is an fd leak; leaks would
> not show up until you run out of the resource, just like you
> wouldn't notice a small memory leak here and there UNLESS you actively
> look for them.  I would be surprised if your "tests" found anything.
> 
>> How would I find the open .pack file fd's?  Should I go through
>> /proc/PID/fd? Why is this no problem for other longer running
>> commands such as the git-credential-cache--daemon or git-daemon?
> 
> Nobody said this is "no problem for others".  While discussing the
> patch that started this thread, we noticed that any open file
> descriptor the main process has when the long-running clean/smudge
> helper is spawned _is_ a problem.  Other helpers may share the same
> problem, and if they do, that is all the more reason that fixing the
> file descriptor leak is a good thing.
> 
> The primary thing I was wondering was whether we should open the window
> into packfiles with CLOEXEC, just like we recently started opening
> the tempfiles and lockfiles with the flag.  The reason why I asked
> if the site that spawns (i.e. the run_command API) has easy access to
> the list of file descriptors that we opened ONLY for ourselves is
> because that would make it possible to consider an alternative
> approach: close them before execve(2) in the run_command API.  That
> would save us from having to sprinkle existing calls to open(2) with
> CLOEXEC.  But if that is not the case, opening them with CLOEXEC is
> a much better solution than going outside the program to "find" them
> in non-portable ways.

The pack files are opened before the filter process is forked. Therefore,
I think CLOEXEC makes sense for them. Should this change be part of this
series? If yes, would it look like the diff below? Should I adjust the
function name?

Thanks,
Lars

diff --git a/sha1_file.c b/sha1_file.c
index d5e1121..759991e 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1485,7 +1485,7 @@ int check_sha1_signature(const unsigned char *sha1, void *map,
 
 int git_open_noatime(const char *name)
 {
-   static int sha1_file_open_flag = O_NOATIME;
+   static int sha1_file_open_flag = O_NOATIME | O_CLOEXEC;
 
for (;;) {
int fd;
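
For context, the enclosing loop already retries without the optional
flags when a platform rejects them, which is why they live in a static
variable.  A simplified sketch of that pattern (hypothetical function
name, not the exact sha1_file.c code):

#define _GNU_SOURCE		/* for O_NOATIME on Linux */
#include <errno.h>
#include <fcntl.h>

static int open_with_optional_flags(const char *name)
{
	/* optional flags; dropped for good if the platform rejects them */
	static int optional_flags = O_NOATIME | O_CLOEXEC;

	for (;;) {
		int fd = open(name, O_RDONLY | optional_flags);
		if (fd >= 0)
			return fd;
		/* EINVAL/EPERM may just mean the optional flags are unusable here */
		if ((errno == EINVAL || errno == EPERM) && optional_flags) {
			optional_flags = 0;
			continue;
		}
		return -1;
	}
}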





Re: [PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes

2016-08-29 Thread Junio C Hamano
Lars Schneider  writes:

> I see. Thanks for the explanation.

I expect the updated log message to explain the issue correctly
then.

>> And even on POSIX systems, if you are doing a long-running helper,
>> any open file descriptor in the parent process when the long-running
>> helper is spawned will become a leaked fd.  CLOEXEC is a possible
>> solution (but not necessarily the only or the best one) to the fd
>> leak in this case.
>> 
>> How much does the code that spawns these long-running helpers know
>> about the file descriptors that happen to be open?
>
> Nothing really.

OK.

>> The parent is
>> very likely to have pack windows open into .pack files and they need
>> to be closed on the child side after fork(2) starts the child
>> process but before execve(2) runs the helper, if we want to avoid
>> file descriptor leaks.
>
> I think I understand what you are saying. However, during my tests
> .pack file fd's were never a problem.

I do not expect that, during the lifetime of your long-running helper,
anybody would try to unlink an existing packfile, so it is unlikely
that the "cannot unlink an open file on Windows" issue would come up.  And
the cross-platform problem I pointed out is an fd leak; leaks would
not show up until you run out of the resource, just like you
wouldn't notice a small memory leak here and there UNLESS you actively
look for them.  I would be surprised if your "tests" found anything.

> How would I find the open .pack file fd's?  Should I go through
> /proc/PID/fd? Why is this no problem for other longer running
> commands such as the git-credential-cache--daemon or git-daemon?

Nobody said this is "no problem for others".  While discussing the
patch that started this thread, we noticed that any open file
descriptor the main process has when the long-running clean/smudge
helper is spawned _is_ a problem.  Other helpers may share the same
problem, and if they do, that is all the more reason that fixing the
file descriptor leak is a good thing.

The primary thing I was wondering was whether we should open the window
into packfiles with CLOEXEC, just like we recently started opening
the tempfiles and lockfiles with the flag.  The reason why I asked
if the site that spawns (i.e. the run_command API) has easy access to
the list of file descriptors that we opened ONLY for ourselves is
because that would make it possible to consider an alternative
approach: close them before execve(2) in the run_command API.  That
would save us from having to sprinkle existing calls to open(2) with
CLOEXEC.  But if that is not the case, opening them with CLOEXEC is
a much better solution than going outside the program to "find" them
in non-portable ways.
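
A rough sketch of that alternative, assuming a hypothetical table of
descriptors we track for our own use (this is not the actual
run-command code):

#include <sys/types.h>
#include <unistd.h>

/* hypothetical bookkeeping: descriptors opened only for our own use */
extern int tracked_fds[];
extern int nr_tracked_fds;

static pid_t spawn_helper(const char *path, char *const argv[])
{
	pid_t pid = fork();

	if (pid == 0) {
		/* child: close descriptors the helper has no business inheriting */
		int i;
		for (i = 0; i < nr_tracked_fds; i++)
			close(tracked_fds[i]);
		execv(path, argv);
		_exit(127);		/* reached only if exec failed */
	}
	return pid;	/* parent; pid < 0 means fork() failed */
}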

Thanks.


Re: [PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes

2016-08-29 Thread Lars Schneider

> On 29 Aug 2016, at 20:05, Junio C Hamano  wrote:
> 
> larsxschnei...@gmail.com writes:
> 
>> From: Lars Schneider 
>> 
>> Consider the case of a file that requires filtering and is present in
>> branch A but not in branch B. If A is the current HEAD and we check out B,
>> then the following happens:
>>
>> 1. ce_compare_data() opens the file
>> 2. index_fd() detects that the file requires a clean filter to be run
>>    and calls index_stream_convert_blob()
>> 3. index_stream_convert_blob() calls convert_to_git_filter_fd()
>> 4. convert_to_git_filter_fd() calls apply_filter() which creates a
>>    new long-running filter process (in case it is the first file
>>    of this kind to be filtered)
>> 5. The new filter process inherits all file handles. This is the
>>    default on Linux/OSX and is explicitly defined in the
>>    `CreateProcessW` call in `mingw.c` on Windows.
>> 6. ce_compare_data() closes the file
>> 7. Git unlinks the file as it is not present in B
>>
>> The unlink operation does not work on Windows because the filter process
>> still has an open handle to the file. Apparently that is no problem on
>> Linux/OSX, probably because "[...] the two file descriptors share open
>> file status flags" (see fork(2)).
> 
> Wait a minute.  "that is no problem" may be true as long as "that"
> is "unlinking the now-gone file in the filesystem", but the reason
> does not have anything to do with the "open-file status flags";
> unlike Windows, you _can_ unlink a file that has an open file
> descriptor on it.

I see. Thanks for the explanation.

> 
> And even on POSIX systems, if you are doing a long-running helper,
> any open file descriptor in the parent process when the long-running
> helper is spawned will become a leaked fd.  CLOEXEC is a possible
> solution (but not necessarily the only or the best one) to the fd
> leak in this case.
> 
> How much does the code that spawns these long-running helpers know
> about the file descriptors that happen to be open?

Nothing really.

>  The parent is
> very likely to have pack windows open into .pack files and they need
> to be closed on the child side after fork(2) starts the child
> process but before execve(2) runs the helper, if we want to avoid
> file descriptor leaks.

I think I understand what you are saying. However, during my tests
.pack file fd's were never a problem. I use start_command() [1]
which wraps the fork() and exec calls [2].

How would I find the open .pack file fd's? Should I go through 
/proc/PID/fd? Why is this no problem for other longer running 
commands such as the git-credential-cache--daemon or git-daemon?

Thanks,
Lars


[1] https://github.com/larsxschneider/git/blob/protocol-filter/v6/convert.c#L566
[2] https://github.com/larsxschneider/git/blob/protocol-filter/v6/run-command.c#L345-L412
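
For reference, [1] boils down to roughly the following use of the
run-command API (a simplified sketch with a made-up wrapper name,
written against my understanding of the v2.9-era fields; not the exact
convert.c code):

#include "run-command.h"

static struct child_process filter_process = CHILD_PROCESS_INIT;

static int start_filter(const char *cmd)
{
	argv_array_push(&filter_process.args, cmd);
	filter_process.use_shell = 1;	/* run the configured command through the shell */
	filter_process.in = -1;		/* ask start_command() for a pipe to the filter's stdin */
	filter_process.out = -1;	/* ...and one from its stdout */
	return start_command(&filter_process);	/* fork()s, then exec()s the helper */
}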




Re: [PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes

2016-08-29 Thread Junio C Hamano
larsxschnei...@gmail.com writes:

> From: Lars Schneider 
>
> Consider the case of a file that requires filtering and is present in
> branch A but not in branch B. If A is the current HEAD and we check out B,
> then the following happens:
>
> 1. ce_compare_data() opens the file
> 2. index_fd() detects that the file requires a clean filter to be run
>    and calls index_stream_convert_blob()
> 3. index_stream_convert_blob() calls convert_to_git_filter_fd()
> 4. convert_to_git_filter_fd() calls apply_filter() which creates a
>    new long-running filter process (in case it is the first file
>    of this kind to be filtered)
> 5. The new filter process inherits all file handles. This is the
>    default on Linux/OSX and is explicitly defined in the
>    `CreateProcessW` call in `mingw.c` on Windows.
> 6. ce_compare_data() closes the file
> 7. Git unlinks the file as it is not present in B
>
> The unlink operation does not work on Windows because the filter process
> still has an open handle to the file. Apparently that is no problem on
> Linux/OSX, probably because "[...] the two file descriptors share open
> file status flags" (see fork(2)).

Wait a minute.  "that is no problem" may be true as long as "that"
is "unlinking the now-gone file in the filesystem", but the reason
does not have anything to do with the "open-file status flags";
unlike Windows, you _can_ unlink a file that has an open file
descriptor on it.
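
A throwaway demonstration of that on a POSIX system (the file name is
a placeholder):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fd = open("victim.txt", O_CREAT | O_RDWR, 0600);

	unlink("victim.txt");			/* succeeds even though fd is still open */
	if (write(fd, "still usable\n", 13) == 13)
		puts("the unlinked file is still writable through fd");
	close(fd);				/* only now is the inode actually reclaimed */
	return 0;
}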

And even on POSIX systems, if you are doing a long-running helper,
any open file descriptor in the parent process when the long-running
helper is spawned will become a leaked fd.  CLOEXEC is a possible
solution (but not necessarily the only or the best one) to the fd
leak in this case.

How much does the code that spawns these long-running helpers know
about the file descriptors that happen to be open?  The parent is
very likely to have pack windows open into .pack files and they need
to be closed on the child side after fork(2) starts the child
process but before execve(2) runs the helper, if we want to avoid
file descriptor leaks.
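
For background, O_CLOEXEC leaves the descriptor fully usable in the
forked child right up to execve(2), at which point the kernel closes
it.  A minimal sketch (pack and helper names are placeholders, not
code from this series):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	int fd = open("pack-1234.pack", O_RDONLY | O_CLOEXEC);

	if (fork() == 0) {
		/* child: fd is still open here and usable for any pre-exec work */
		/* on a successful exec the kernel closes fd, so the helper never sees it */
		execlp("long-running-helper", "long-running-helper", (char *)NULL);
		_exit(127);	/* reached only if exec failed */
	}
	/* parent: keeps using fd as usual */
	close(fd);
	return 0;
}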

> Fix this problem by opening files in read-cache with the `O_CLOEXEC`
> flag to ensure that the file descriptor does not remain open in a newly
> spawned process. `O_CLOEXEC` is defined as `O_NOINHERIT` on Windows. A
> similar fix for temporary file handles was already applied to Git for
> Windows:
> https://github.com/git-for-windows/git/commit/667b8b51ec850c3e1c7d75dee69dc13c29d1f162

After a few iterations, the final version of the same commit is now
in our tree as d5cb9cbd ("Git 2.10-rc2", 2016-08-26).


[PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes

2016-08-25 Thread larsxschneider
From: Lars Schneider 

Consider the case of a file that requires filtering and is present in
branch A but not in branch B. If A is the current HEAD and we check out B,
then the following happens:

1. ce_compare_data() opens the file
2. index_fd() detects that the file requires a clean filter to be run
   and calls index_stream_convert_blob()
3. index_stream_convert_blob() calls convert_to_git_filter_fd()
4. convert_to_git_filter_fd() calls apply_filter() which creates a
   new long-running filter process (in case it is the first file
   of this kind to be filtered)
5. The new filter process inherits all file handles. This is the
   default on Linux/OSX and is explicitly defined in the
   `CreateProcessW` call in `mingw.c` on Windows.
6. ce_compare_data() closes the file
7. Git unlinks the file as it is not present in B

The unlink operation does not work on Windows because the filter process
still has an open handle to the file. Apparently that is no problem on
Linux/OSX, probably because "[...] the two file descriptors share open
file status flags" (see fork(2)).

Fix this problem by opening files in read-cache with the `O_CLOEXEC`
flag to ensure that the file descriptor does not remain open in a newly
spawned process. `O_CLOEXEC` is defined as `O_NOINHERIT` on Windows. A
similar fix for temporary file handles was already applied to Git for
Windows:
https://github.com/git-for-windows/git/commit/667b8b51ec850c3e1c7d75dee69dc13c29d1f162

Signed-off-by: Lars Schneider 
---
 read-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index db27766..f481dee 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -159,7 +159,7 @@ void fill_stat_cache_info(struct cache_entry *ce, struct stat *st)
 static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
 {
int match = -1;
-   int fd = open(ce->name, O_RDONLY);
+   int fd = open(ce->name, O_RDONLY | O_CLOEXEC);
 
if (fd >= 0) {
unsigned char sha1[20];
-- 
2.9.2
