Hi Akihiko, On Sun, Aug 03, 2025 at 04:15:16PM +0900, Akihiko Odaki wrote: > On 2025/08/02 2:27, Arun Menon wrote: > > Hi Akihiko, > > Thanks for the review. > > > > On Fri, Aug 01, 2025 at 04:35:34PM +0900, Akihiko Odaki wrote: > > > On 2025/07/31 22:21, Arun Menon wrote: > > > > Currently post_save hook is called without checking its return. If > > > > post_save > > > > fails, we need to set an error and propagate it to the caller. > > > > > > > > Since post_save hook is called regardless of whether there is a > > > > preceeding error, > > > > it is possible that we have 2 distict errors, one from the preceeding > > > > function > > > > call, and the other from the post_save call. > > > > > > > > Return the latest error to the caller. > > > > > > This needs to be explained better. This patch makes two changes on the > > > behavior when there are two errors: > > > > > > 1) Proceeding errors were propagated before, but they are now > > > dismissed. > > > 2) post_error() errors were dismissed before, but they are now > > > propagated. > > > > > > This message doesn't mention 1) at all. It does say 2) is necessary, but > > > does not explain why. > > > > Please correct me if I am wrong, does the following look ok? > > > > Currently post_save() hook is called without checking its return. > > post_save() hooks, generally does the cleanup, and that is the reason why > > we have been dismissing its failure. > > > > We want to consider > > - the saving of the device state (save or subsection save) and > > - running the cleanup after > > as an atomic operation. And that is why, post_save is called regardless > > of whether there is a preceeding error. > > This means that, it is possible that we have 2 distict errors, one from > > the preceeding function and the other from the post_save() function. > > > > This patch makes two changes on the behavior when there are two errors: > > > > 1) Preceeding errors were propagated before, but they are now > > dismissed if there is a new post_save() error. > > 2) post_save() errors were dismissed before, but they are now > > propagated. > > > > We intend to extend the error propagation feature (saving erros in > > Error object and return to the caller) to the post/pre > > save/load hooks. Therefore, it is important for the user to know > > the exact reason of failure in case any of these hooks fail. > > > > ==== > > I do not know much about the operations we do, or intend to do > > using the post_save() call. But my guess is that we are going to > > revert things or cleanup stuff, and if reverting/cleanup fails, > > then the user should be notified about that. > > All what you stated sounds correct to me and explains 2). But there is still > no reasoning for 1). > > Previously I listed three possible design options: > > - propagate whatever error easier to do so > > - somehow combine errors > > (languages that support exceptions often do this) > > - make sure that no error will happen when there is a proceeding error > > by having two callbacks: > > - One callback that is called only when there is no proceeding error > > and can raise an error > > - Another that is always called but cannot raise an error > > https://lore.kernel.org/qemu-devel/1ff92294-5c8c-4b33-89c1-91d37d6ac...@rsg.ci.i.u-tokyo.ac.jp/ > > It will be a good reasoning if we can show propagating post_save() errors is > easier (or more important) than propagating proceeding errors. > > There are alternative options. "Making sure that no error will happen when > there is a proceeding error by having two callbacks" is one of them. You > wrote: > > We want to consider > > - the saving of the device state (save or subsection save) and > > - running the cleanup after > > as an atomic operation. And that is why, post_save is called > > regardless of whether there is a preceeding error. > > This means that, it is possible that we have 2 distict errors, one > > from the preceeding function and the other from the post_save() > > function. > > But it may not be mandatory. In fact, I searched "post_save" in the code > base and found all implementations are for cleanup and always succeeds, > which makes sense; most (if not all) cleanup operations like g_free() always > succeeds in general.
post_save calls do succeed in general. Now, we are introducing the post_save_errp() function which is an alternative to the post_save() call. We are adding the feature to catch errors in Error object, thereby clearly considering the case where post_save_erp can go wrong. So we have already deviated from the old design that all post_save operations are infalliable cleaup routines. The only catch is, these alternate functions are introduced a couple of commits later. If that is the case, then we need to handle both the errors, and propagate only the most significant (which in my opinion is the post_save) one. In other words, the reasoning for 1, 1) Proceeding errors were propagated before, but they are now dismissed. is that, a failure during the preceeding operation means that we failed in saving the state but a failure during the post-save call, means that we failed in cleanup or restoring back the system to a stable state, which is a more serious one, because it may lead to the system being in an inconsistent state. Can I write something on these lines, to justify the change? > > So perhaps "making sure that no error will happen when there is a proceeding > error" may be feasible, and it may not even require two callbacks because > post_save() always succeeds. I am sorry, I could not completely understand the part, where you mentioned that we need to have 2 callbacks; I am assuming you are mentioning something like try..catch, finally blocks from the other languages. post_save (that can not raise an error) being called in the goto block. So if the main operation fails, we say goto cleanup, and then execute post_save which always succeeds. If the main operation succeeds then we call post_save that can raise an error. Please correct me if I am wrong. > > Regards, > Akihiko Odaki > Regards, Arun