Re: [PATCH v4 08/13] commit-graph: implement --delete-expired
Derrick Stolee writes: > My thought was that using a fixed name > (e.g. .git/objects/info/commit-graph) would block making the graph > incremental. Upon thinking again, this is not the case. That feature > could be designed with a fixed name for the small, frequently-updated > file and use .../info/graph-.graph names for the "base" graph > files. I guess that is in line with the way how split-index folks did their thing ;-)
Re: [PATCH v4 08/13] commit-graph: implement --delete-expired
On 2/23/2018 2:33 PM, Junio C Hamano wrote: Derrick Stolee writes: The (unlikely, but possible) race condition involves two processes (P1 and P2): 1. P1 reads from graph-latest to see commit graph file F1. 2. P2 updates graph-latest to point to F2 and deletes F1. 3. P1 tries to read F1 and fails. I could explicitly mention this condition in the message, or we can just let P2 fail by deleting all files other than the one referenced by 'graph-latest'. Thoughts? The established way we do this is not to have -latest pointer, I would think, and instead, make -latest be the actual thing. That is how $GIT_DIR/index is updated, for example, by first writing into a temporary file and then moving it to the final destination. Is there a reason why the same pattern cannot be used? My thought was that using a fixed name (e.g. .git/objects/info/commit-graph) would block making the graph incremental. Upon thinking again, this is not the case. That feature could be designed with a fixed name for the small, frequently-updated file and use .../info/graph-.graph names for the "base" graph files. I'll spend some time thinking about the ramifications of this fixed-name concept. At the moment, it would remove two commits from this patch series, which is nice. Thanks, -Stolee
Re: [PATCH v4 08/13] commit-graph: implement --delete-expired
Derrick Stolee writes: > The (unlikely, but possible) race condition involves two processes (P1 > and P2): > > 1. P1 reads from graph-latest to see commit graph file F1. > 2. P2 updates graph-latest to point to F2 and deletes F1. > 3. P1 tries to read F1 and fails. > > I could explicitly mention this condition in the message, or we can > just let P2 fail by deleting all files other than the one referenced > by 'graph-latest'. Thoughts? The established way we do this is not to have -latest pointer, I would think, and instead, make -latest be the actual thing. That is how $GIT_DIR/index is updated, for example, by first writing into a temporary file and then moving it to the final destination. Is there a reason why the same pattern cannot be used?
Re: [PATCH v4 08/13] commit-graph: implement --delete-expired
On 2/22/2018 1:48 PM, Junio C Hamano wrote: Derrick Stolee writes: Teach git-commit-graph to delete the .graph files that are siblings of a newly-written graph file, except for the file referenced by 'graph-latest' at the beginning of the process and the newly-written file. If we fail to delete a graph file, only report a warning because another git process may be using that file. In a multi-process environment, we expect the previoius graph file to be used by a concurrent process, so we do not delete it to avoid race conditions. I do not understand the later part of the above. On some operating systems, you actually can remove a file that is open by another process without any ill effect. There are systems that do not allow removing a file that is in use, and an attempt to unlink it may fail. The need to handle such a failure gracefully is not limited to the case of removing a commit graph file---we need to deal with it when removing file of _any_ type. My thought is that we should _warn_ when we fail to delete a .graph file that we think should be safe to delete. However, if we are warning for a file that is currently being accessed (as is the case on Windows, at least), then we will add a lot of noise. This is especially true when using IDEs that run 'status' or 'fetch' in the background, frequently. Especially the last sentence "we do not delete it to avoid race conditions" I find problematic. If a system does not allow removing a file in use and we detect a failure after an attempt to do so, it is not "we do not delete it" --- even if you do, you won't succeed anyway, so there is no point saying that. And on systems that do allow safe removal of a file in use (i.e. they allow an open file to be used by processes that have open filehandles to it after its removal), there is no point refraining to delete it "to avoid race conditions", either---in fact it is unlikely that you would even know somebody else had it open and was using it. The (unlikely, but possible) race condition involves two processes (P1 and P2): 1. P1 reads from graph-latest to see commit graph file F1. 2. P2 updates graph-latest to point to F2 and deletes F1. 3. P1 tries to read F1 and fails. I could explicitly mention this condition in the message, or we can just let P2 fail by deleting all files other than the one referenced by 'graph-latest'. Thoughts? In any case, I do not think '--delete-expired' option that can be given only when you are writing makes much sense as an API. An 'expire' command, just like 'set-latest' command, that is a separate command from 'write', may make sense, though. In another message, I proposed dropping the argument and assuming expires happen on every write. Thanks, -Stolee
Re: [PATCH v4 08/13] commit-graph: implement --delete-expired
On 2/21/2018 4:34 PM, Stefan Beller wrote: On Mon, Feb 19, 2018 at 10:53 AM, Derrick Stolee wrote: graph_name = write_commit_graph(opts.obj_dir); if (graph_name) { if (opts.set_latest) set_latest_file(opts.obj_dir, graph_name); + if (opts.delete_expired) + do_delete_expired(opts.obj_dir, + old_graph_name, + graph_name); + So this only allows to delete expired things and setting the latest when writing a new graph. Would we ever envision a user to produce a new graph (e.g. via obtaining a graph that they got from a server) and then manually rerouting the latest to that new graph file without writing that graph file in the same process? The same for expired. I guess these operations are just available via editing the latest or deleting files manually, which slightly contradicts e.g. "git update-ref", which in olden times was just a fancy way of rewriting the refs file manually. (though it claims to be less prone to errors as it takes lock files) I imagine these alternatives for placing a new, latest commit graph file would want Git to handle rewriting the "graph-latest" file. Given such a use case, we could consider extending the 'commit-graph' interface, but I don't want to plan for it now. extern char *get_graph_latest_filename(const char *obj_dir); +extern char *get_graph_latest_contents(const char *obj_dir); Did https://public-inbox.org/git/20180208213806.ga6...@sigill.intra.peff.net/ ever make it into tree? (It is sort of new, but I feel we'd want to strive for consistency in the code base, eventually.) Thank you for the reminder. I've removed the externs from 'commit-graph.h'. Should I also remove the externs from other methods I introduce even though their surrounding definitions include 'extern'? Thanks, -Stolee
Re: [PATCH v4 08/13] commit-graph: implement --delete-expired
Derrick Stolee writes: > Teach git-commit-graph to delete the .graph files that are siblings of a > newly-written graph file, except for the file referenced by 'graph-latest' > at the beginning of the process and the newly-written file. If we fail to > delete a graph file, only report a warning because another git process may > be using that file. In a multi-process environment, we expect the previoius > graph file to be used by a concurrent process, so we do not delete it to > avoid race conditions. I do not understand the later part of the above. On some operating systems, you actually can remove a file that is open by another process without any ill effect. There are systems that do not allow removing a file that is in use, and an attempt to unlink it may fail. The need to handle such a failure gracefully is not limited to the case of removing a commit graph file---we need to deal with it when removing file of _any_ type. Especially the last sentence "we do not delete it to avoid race conditions" I find problematic. If a system does not allow removing a file in use and we detect a failure after an attempt to do so, it is not "we do not delete it" --- even if you do, you won't succeed anyway, so there is no point saying that. And on systems that do allow safe removal of a file in use (i.e. they allow an open file to be used by processes that have open filehandles to it after its removal), there is no point refraining to delete it "to avoid race conditions", either---in fact it is unlikely that you would even know somebody else had it open and was using it. In any case, I do not think '--delete-expired' option that can be given only when you are writing makes much sense as an API. An 'expire' command, just like 'set-latest' command, that is a separate command from 'write', may make sense, though.
Re: [PATCH v4 08/13] commit-graph: implement --delete-expired
On Mon, Feb 19, 2018 at 10:53 AM, Derrick Stolee wrote: > graph_name = write_commit_graph(opts.obj_dir); > > if (graph_name) { > if (opts.set_latest) > set_latest_file(opts.obj_dir, graph_name); > > + if (opts.delete_expired) > + do_delete_expired(opts.obj_dir, > + old_graph_name, > + graph_name); > + So this only allows to delete expired things and setting the latest when writing a new graph. Would we ever envision a user to produce a new graph (e.g. via obtaining a graph that they got from a server) and then manually rerouting the latest to that new graph file without writing that graph file in the same process? The same for expired. I guess these operations are just available via editing the latest or deleting files manually, which slightly contradicts e.g. "git update-ref", which in olden times was just a fancy way of rewriting the refs file manually. (though it claims to be less prone to errors as it takes lock files) > > extern char *get_graph_latest_filename(const char *obj_dir); > +extern char *get_graph_latest_contents(const char *obj_dir); Did https://public-inbox.org/git/20180208213806.ga6...@sigill.intra.peff.net/ ever make it into tree? (It is sort of new, but I feel we'd want to strive for consistency in the code base, eventually.)