Re: [PATCH v4 0/3] Make update refs more atomic
On Tue, Apr 15, 2014 at 1:32 PM, Michael Haggerty mhag...@alum.mit.edu wrote: On 04/15/2014 06:33 PM, Ronnie Sahlberg wrote: On Mon, Apr 14, 2014 at 11:36 PM, Michael Haggerty mhag...@alum.mit.edu wrote: [...] I wonder, however, whether your approach of changing callers from lock = lock_ref_sha1_basic() (or varient of) write_ref_sha1(lock) to lock = lock_ref_sha1_basic() (or varient of) write_ref_sha1(lock) unlock_ref(lock) | commit_ref_lock(lock) is not doing work that we will soon need to rework. Would it be jumping the gun to change the callers to transaction = ref_transaction_begin(); ref_transaction_{update,delete,etc}(transaction, ...); ref_transaction_{commit,rollback}(transaction, ...); instead? Then we could bury the details of calling write_ref_sha1() and commit_lock_ref() inside ref_transaction_commit() rather than having to expose them in the public API. I think you are right. Lets put this patch series on the backburner for now and start by making all callers use transactions and remove write_ref_sha1() from the public API thar refs.c exports. Once everything is switched over to transactions I can rework this patchseries for ref_transaction_commit() and resubmit to the mailing list. Sounds good. Rewriting callers to use transactions would be a great next step. Please especially keep track of what new features the transactions API still needs. More flexible error handling? The ability to have steps in the transaction that are best-effort (i.e., don't abort the transaction if they fail)? Different reflog messages for different updates within the same transaction rather than one reflog message for all updates? Etc. And some callers who currently change multiple references one at a time might be able to be rewritten to update the references in a single transaction. As an experiment I rewrite most of the callers to use transactions yesterday. Most callers would translate just fine, but some callers, such as walker_fetch() does not yet fit well with the current transaction code. For example that code does want to first take out locks on all refs before it does a lot of processing, with the locks held, before it writes and updates the refs. Some of my thoughts after going over the callers : Currently any locking of refs in a transaction only happens during the commit phase. I think it would be useful to have a mechanism where you could optionally take out locks for the involved refs early during the transaction. So that simple callers could continue using ref_transaction_begin() ref_transaction_create|update|delete()* ref_transaction_commit() but, if a caller such as walker_fetch() could opt to do ref_transaction_begin() ref_transaction_lock_ref()* ...do stuff... ref_transaction_create|update|delete()* ref_transaction_commit() In this second case ref_transaction_commit() would only take out any locks that are missing during the 'lock the refs loop. Suggestion 1: Add a ref_transaction_lock_ref() to allow locking a ref early during a transaction. A second idea is to change the signatures for ref_transaction_create|update|delete() slightly and allow them to return errors early. We can check for things like add_update() failing, check that the ref-name looks sane, check some of the flags, like if has_old==true then old sha1 should not be NULL or 0{40}, etc. Additionally for robustness, if any of these functions detect an error we can flag this in the transaction structure and take action during ref_transaction_commit(). I.e. if a ref_transaction_update had a hard failure, do not allow ref_transaction_commit() to succeed. Suggestion 2: Change ref_transaction_create|delete|update() to return an int. All callers that use these functions should check the function for error. Suggestion 3: remove the qsort and check for duplicates in ref_transaction_commit() Since we are already taking out a lock for each ref we are updating during the transaction any duplicate refs will fail the second attempt to lock the same ref which will implicitly make sure that a transaction will not change the same ref twice. There are only two caveats I see with this third suggestion. 1, The second lock attempt will always cause a die() since we eventually would end up in lock_ref_sha1_basic() and this function will always unconditionally die() if the lock failed. But your locking changes are addressing this, right? (all callers to lock_ref_sha1() or lock_any_ref_for_update() do check for and handle if the lock failed, so that change to not die() should be safe) 2, We would need to take care when a lock fails here to print the proper error message so that we still show Multiple updates for ref '%s' not allowed. if the lock failed because the transaction had already locked this file. Lets start preparing patches to change all external callers to use transactions instead. I am happy to help preparing patches for this. How do
Re: [PATCH v4 0/3] Make update refs more atomic
Ronnie Sahlberg sahlb...@google.com writes: Currently any locking of refs in a transaction only happens during the commit phase. I think it would be useful to have a mechanism where you could optionally take out locks for the involved refs early during the transaction. So that simple callers could continue using ref_transaction_begin() ref_transaction_create|update|delete()* ref_transaction_commit() but, if a caller such as walker_fetch() could opt to do ref_transaction_begin() ref_transaction_lock_ref()* ...do stuff... ref_transaction_create|update|delete()* ref_transaction_commit() In this second case ref_transaction_commit() would only take out any locks that are missing during the 'lock the refs loop. Suggestion 1: Add a ref_transaction_lock_ref() to allow locking a ref early during a transaction. Hmph. I am not sure if that is the right way to go, or instead change all create/update/delete to take locks without adding a new primitive. A second idea is to change the signatures for ref_transaction_create|update|delete() slightly and allow them to return errors early. We can check for things like add_update() failing, check that the ref-name looks sane, check some of the flags, like if has_old==true then old sha1 should not be NULL or 0{40}, etc. Additionally for robustness, if any of these functions detect an error we can flag this in the transaction structure and take action during ref_transaction_commit(). I.e. if a ref_transaction_update had a hard failure, do not allow ref_transaction_commit() to succeed. Suggestion 2: Change ref_transaction_create|delete|update() to return an int. All callers that use these functions should check the function for error. I think that is a very sensible thing to do. The details of determining this cannot possibly succeed may change (for example, if we have them take locks at the point of create/delete/update, a failure to lock may count as an early error). Is there any reason why this should be conditional (i.e. you said allow them to, implying that the early failure is optional)? Suggestion 3: remove the qsort and check for duplicates in ref_transaction_commit() Since we are already taking out a lock for each ref we are updating during the transaction any duplicate refs will fail the second attempt to lock the same ref which will implicitly make sure that a transaction will not change the same ref twice. I do not know if I care about the implementation detail of do we have a unique set of update requests?. While I do not see a strong need for one transaction to touch the same ref twice (e.g. create to point at commit A and update it to point at commit B), I do not see why we should forbid such a use in the future. Just some of my knee-jerk reactions. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 0/3] Make update refs more atomic
On Wed, Apr 16, 2014 at 12:31 PM, Junio C Hamano gits...@pobox.com wrote: Ronnie Sahlberg sahlb...@google.com writes: Currently any locking of refs in a transaction only happens during the commit phase. I think it would be useful to have a mechanism where you could optionally take out locks for the involved refs early during the transaction. So that simple callers could continue using ref_transaction_begin() ref_transaction_create|update|delete()* ref_transaction_commit() but, if a caller such as walker_fetch() could opt to do ref_transaction_begin() ref_transaction_lock_ref()* ...do stuff... ref_transaction_create|update|delete()* ref_transaction_commit() In this second case ref_transaction_commit() would only take out any locks that are missing during the 'lock the refs loop. Suggestion 1: Add a ref_transaction_lock_ref() to allow locking a ref early during a transaction. Hmph. I am not sure if that is the right way to go, or instead change all create/update/delete to take locks without adding a new primitive. ack. A second idea is to change the signatures for ref_transaction_create|update|delete() slightly and allow them to return errors early. We can check for things like add_update() failing, check that the ref-name looks sane, check some of the flags, like if has_old==true then old sha1 should not be NULL or 0{40}, etc. Additionally for robustness, if any of these functions detect an error we can flag this in the transaction structure and take action during ref_transaction_commit(). I.e. if a ref_transaction_update had a hard failure, do not allow ref_transaction_commit() to succeed. Suggestion 2: Change ref_transaction_create|delete|update() to return an int. All callers that use these functions should check the function for error. I think that is a very sensible thing to do. The details of determining this cannot possibly succeed may change (for example, if we have them take locks at the point of create/delete/update, a failure to lock may count as an early error). Is there any reason why this should be conditional (i.e. you said allow them to, implying that the early failure is optional)? It was poor wording on my side. Checking for the ref_transaction_*() return for error should be mandatory (modulo bugs). But a caller could be buggy and fail to check properly. It would be very cheap to detect this condition in ref_transaction_commit() which could then do die(transaction commit called for errored transaction); which would make it easy to spot this kind of bugs. Suggestion 3: remove the qsort and check for duplicates in ref_transaction_commit() Since we are already taking out a lock for each ref we are updating during the transaction any duplicate refs will fail the second attempt to lock the same ref which will implicitly make sure that a transaction will not change the same ref twice. I do not know if I care about the implementation detail of do we have a unique set of update requests?. While I do not see a strong need for one transaction to touch the same ref twice (e.g. create to point at commit A and update it to point at commit B), I do not see why we should forbid such a use in the future. ack. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 0/3] Make update refs more atomic
Ronnie Sahlberg sahlb...@google.com writes: I am not sure if that is the right way to go, or instead change all create/update/delete to take locks without adding a new primitive. ack. Hmph. When I say I am not sure, I dunno, etc., I do mean it. Did you mean by Ack I do not know, either, or I think it is better to take locks early everywhere? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 0/3] Make update refs more atomic
On 04/16/2014 09:31 PM, Junio C Hamano wrote: Ronnie Sahlberg sahlb...@google.com writes: Currently any locking of refs in a transaction only happens during the commit phase. I think it would be useful to have a mechanism where you could optionally take out locks for the involved refs early during the transaction. So that simple callers could continue using ref_transaction_begin() ref_transaction_create|update|delete()* ref_transaction_commit() but, if a caller such as walker_fetch() could opt to do ref_transaction_begin() ref_transaction_lock_ref()* ...do stuff... ref_transaction_create|update|delete()* ref_transaction_commit() In this second case ref_transaction_commit() would only take out any locks that are missing during the 'lock the refs loop. Suggestion 1: Add a ref_transaction_lock_ref() to allow locking a ref early during a transaction. Hmph. I am not sure if that is the right way to go, or instead change all create/update/delete to take locks without adding a new primitive. Junio's suggestion seems like a good idea to me. Obviously, as soon as we take out a lock we could also do any applicable old_sha1 check and possibly fail fast. Does a verify operation require holding a lock? If not, when is the verification done--early, or during the commit, or both? (I realize that we don't yet *have* a verify operation at the API level, but we should!) We also need to think about for what period of time we have to hold the packed-refs lock. Finally, we shouldn't forget that currently the reflog files are locked by holding the lock on the corresponding loose reference file. Do we need to integrate these files into our system any more than they currently are? [By the way, I noticed the other day that the command git reflog expire --stale-fix --expire-unreachable=now --all can hold loose reference locks for a *long* time (like 10s of minutes), especially in weird cases like the repository having 9000 packfiles for some reason or another :-) The command execution time grows strongly with the length of the reference's log, or maybe even the square of the length assuming the history is roughly linear and reachability is computed separately for each SHA-1. This is just an empirical observation so far; I haven't looked into the code yet.] A second idea is to change the signatures for ref_transaction_create|update|delete() slightly and allow them to return errors early. We can check for things like add_update() failing, check that the ref-name looks sane, check some of the flags, like if has_old==true then old sha1 should not be NULL or 0{40}, etc. Additionally for robustness, if any of these functions detect an error we can flag this in the transaction structure and take action during ref_transaction_commit(). I.e. if a ref_transaction_update had a hard failure, do not allow ref_transaction_commit() to succeed. Suggestion 2: Change ref_transaction_create|delete|update() to return an int. All callers that use these functions should check the function for error. I think that is a very sensible thing to do. The details of determining this cannot possibly succeed may change (for example, if we have them take locks at the point of create/delete/update, a failure to lock may count as an early error). Is there any reason why this should be conditional (i.e. you said allow them to, implying that the early failure is optional)? Also a good suggestion. We should make it clear in the documentation that the create/delete/update functions are not *obliged* to return an error (for example that the current value of the reference does not agree with old_sha1) because future alternate ref backends might possibly not be able to make such checks until the commit phase. For example, checking old_sha1 might involve a round-trip to a database or remote repository, in which case it might be preferable to check all values in a single round-trip upon commit. So, callers might be informed early of problems, or they might only learn about problems when they try to commit the transaction. They have to be able to handle either type of error reporting. So then the question arises (and maybe this is what Ronnie was getting at by suggesting that the checks might be conditional): are callers *obliged* to check the return values from create/delete/update, or are they allowed to just keep throwing everything into the transaction, ignoring errors, and only check the result of ref_transaction_commit()? I don't feel strongly one way or the other about this question. It might be nice to be able to write callers sloppily, but it would cost a bit more code in the reference backends. Though maybe it wouldn't even be much extra code, given that we would probably want to put consistency checks in there anyway. Suggestion 3: remove the qsort and check for duplicates in ref_transaction_commit() Since we are already taking out a lock for each ref we are updating
Re: [PATCH v4 0/3] Make update refs more atomic
On 04/14/2014 08:29 PM, Ronnie Sahlberg wrote: refs.c:ref_transaction_commit() intermingles doing updates and checks with actually applying changes to the refs in loops that abort on error. This is done one ref at a time and means that if an error is detected that will fail the operation partway through the list of refs to update we will end up with some changes applied to disk and others not. Without having transaction support from the filesystem, it is hard to make an update that involves multiple refs to guarantee atomicity, but we can do a somewhat better than we currently do. It took me a moment to understand what you were talking about here, because the code for ref_transaction_commit() already seems superficially to do reference modifications in phases. The problem is that write_ref_sha1() internally contains additional checks that can fail in normal circumstances. So the most important part of this patch series is allowing those checks to be done before committing anything. These patches change the update and delete functions to use a three call pattern of 1, lock 2, update, or flag for deletion 3, apply on disk (rename() or unlink()) When a transaction is commited we first do all the locking, preparations and most of the error checking before we actually start applying any changes to the filesystem store. This means that more of the error cases that will fail the commit will trigger before we start doing any changes to the actual files. This should make the changes of refs in refs_transaction_commit slightly more atomic. [...] Yes, this is a good and important goal. I wonder, however, whether your approach of changing callers from lock = lock_ref_sha1_basic() (or varient of) write_ref_sha1(lock) to lock = lock_ref_sha1_basic() (or varient of) write_ref_sha1(lock) unlock_ref(lock) | commit_ref_lock(lock) is not doing work that we will soon need to rework. Would it be jumping the gun to change the callers to transaction = ref_transaction_begin(); ref_transaction_{update,delete,etc}(transaction, ...); ref_transaction_{commit,rollback}(transaction, ...); instead? Then we could bury the details of calling write_ref_sha1() and commit_lock_ref() inside ref_transaction_commit() rather than having to expose them in the public API. I suspect that the answer is no, ref transactions are not yet powerful enough to do everything that the callers need. But then I would suggest that we *make* them powerful enough and *then* make the change at the callers. I'm not saying that we shouldn't accept your change as a first step [1] and do the next step later, but wanted to get your reaction about making the first step a bit more ambitious. Michael [1] Though I still need to review your patch series in detail. -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 0/3] Make update refs more atomic
On Mon, Apr 14, 2014 at 1:24 PM, Junio C Hamano gits...@pobox.com wrote: Thanks; will queue. Junio, Please defer queuing for now. I think we should convert more of the external callers to use transactions first. Once that is done and everything uses transactions I will re-send an updated version of this patch series. regards ronnie sahlberg -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 0/3] Make update refs more atomic
On 04/15/2014 06:33 PM, Ronnie Sahlberg wrote: On Mon, Apr 14, 2014 at 11:36 PM, Michael Haggerty mhag...@alum.mit.edu wrote: [...] I wonder, however, whether your approach of changing callers from lock = lock_ref_sha1_basic() (or varient of) write_ref_sha1(lock) to lock = lock_ref_sha1_basic() (or varient of) write_ref_sha1(lock) unlock_ref(lock) | commit_ref_lock(lock) is not doing work that we will soon need to rework. Would it be jumping the gun to change the callers to transaction = ref_transaction_begin(); ref_transaction_{update,delete,etc}(transaction, ...); ref_transaction_{commit,rollback}(transaction, ...); instead? Then we could bury the details of calling write_ref_sha1() and commit_lock_ref() inside ref_transaction_commit() rather than having to expose them in the public API. I think you are right. Lets put this patch series on the backburner for now and start by making all callers use transactions and remove write_ref_sha1() from the public API thar refs.c exports. Once everything is switched over to transactions I can rework this patchseries for ref_transaction_commit() and resubmit to the mailing list. Sounds good. Rewriting callers to use transactions would be a great next step. Please especially keep track of what new features the transactions API still needs. More flexible error handling? The ability to have steps in the transaction that are best-effort (i.e., don't abort the transaction if they fail)? Different reflog messages for different updates within the same transaction rather than one reflog message for all updates? Etc. And some callers who currently change multiple references one at a time might be able to be rewritten to update the references in a single transaction. Lets start preparing patches to change all external callers to use transactions instead. I am happy to help preparing patches for this. How do we ensure that we do not create duplicate work and work on the same functions? I have a few loose ends to take care of on my lockfile patch series, and there are a few things I would like to tidy up internal to the transactions implementation, so I think if you are working on the caller side then we won't step on each other's toes too much in the near future. I suggest we use IRC (mhagger@freenode) or XMPP (mhag...@jabber.org) for small-scale coordination. I also have a GitHub repo (http://github.com/mhagger/git) to which I often push intermediate results; I will try to push to that more regularly (warning: I often rebase feature branches even after they are pushed to GitHub). I think you are in Pacific Time whereas I am in Berlin, so we will tend to work in serial rather than in parallel; that should help. It would be a good habit to shoot each short status emails at the end of each working day. Of course we should only use one-on-one communication for early work; as soon as something is getting ripe we should make sure our technical discussions take place here on the mailing list. Sound OK? Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 0/3] Make update refs more atomic
Thanks; will queue. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html