vlc | branch: master | Romain Vimont <[email protected]> | Wed Sep 9 12:55:52 2020 +0200| [0b938e35bfa444be5abad09abeaccb90291d6803] | committer: Alexandre Janniaux
executor: add design documentation Signed-off-by: Alexandre Janniaux <[email protected]> > http://git.videolan.org/gitweb.cgi/vlc.git/?a=commit;h=0b938e35bfa444be5abad09abeaccb90291d6803 --- doc/executor.md | 256 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 256 insertions(+) diff --git a/doc/executor.md b/doc/executor.md new file mode 100644 index 0000000000..7305e61c86 --- /dev/null +++ b/doc/executor.md @@ -0,0 +1,256 @@ +# Executor + +## Overview + +The executor API allows to submit _runnables_ to be executed from background +threads. + +The runnable instances are allocated and freed by the caller. It is the +responsibility of the caller to guarantee that the runnable is valid until +the task is canceled or complete. + +The caller is expected to (but not forced to) embed the runnable into its +custom task structure: + +```c +struct my_task +{ + /* custom data */ + int num; + char *str; + + /* embedded runnable */ + struct vlc_runnable runnable; +}; + +void Run(void *userdata) +{ + // ... +} +``` + +and submit it as follow: + +```c + struct my_task *task = ...; + + task->runnable.run = Run; + task->runnable.userdata = task; /* userdata passed to Run() */ + + vlc_executor_Submit(executor, &task->runnable); +``` + +Since the task is allocated by the caller, vlc_executor_Submit() may not fail +(it returns void). + +The cancelation of a submitted runnable may be requested. It succeeds only if +the runnable is not started yet. + +```c +VLC_API void +vlc_executor_Submit(vlc_executor_t *executor, struct vlc_runnable *runnable); + +VLC_API bool +vlc_executor_Cancel(vlc_executor_t *executor, struct vlc_runnable *runnable); +``` + + +## Design discussions + +The design of this API is the result of many discussions, especially on the +mailing-list: + - <https://mailman.videolan.org/pipermail/vlc-devel/2020-August/136696.html> + - <https://mailman.videolan.org/pipermail/vlc-devel/2020-September/136944.html> + - <https://mailman.videolan.org/pipermail/vlc-devel/2020-September/137188.html> + +This documentation aims to explain the rationale for each decision. + + +### Runnable allocation + +The `struct vlc_runnable` must be provided by the client, and the executor uses +only this instance to represent its task (it does not allocate per-task private +data). + +Firstly, note that this allows `vlc_executor_Submit()` to return `void`, so no +error handling is necessary. It's minor, but convenient. + +An attractive alternative would be to return a new allocated runnable on +submission, which could be used for cancelation: + +```c +struct vlc_runnable * +vlc_executor_Submit(vlc_executor_t *, void (*run)(void *), void *userdata); + +void vlc_executor_Cancel(vlc_executor_t *, struct vlc_runnable *); +``` + +But this is inherently racy: a naive implementation could return a pointer to a +`vlc_runnable` already freed. Indeed, `vlc_executor_Submit()` must post the +runnable to some pending queue, and in theory it may be executed by a background +thread even before the function returns. + +For example, if the executor frees the runnable as soon as its execution is +complete, this simple code is incorrect: + +```c +struct my_task +{ + char *some_custom_data; + struct vlc_runnable *runnable; /* keep a pointer to cancel it */ +}; + +void Run(void *userdata) +{ + struct my_task *task = userdata; + /* (task->runnable may be uninitialized here) */ +} +``` + +```c + struct my_task *task = ...; + task->runnable = vlc_executor_Submit(executor, Run, task); + /* task->runnable may be already freed here */ + vlc_executor_Cancel(executor, task->runnable); /* boom, use-after-free */ +``` + +To avoid the use-after-free, the runnable ownership must be shared. This makes +the API more complex: + +```c +struct vlc_runnable_t *runnable = vlc_executor_Submit(executor, Run, task); +vlc_runnable_Release(runnable); /* don't need the runnable anymore */ +``` + +However, this would still be insufficient: the task resources must typically be +released at the end of the `run()` callback, but at this point +`vlc_executor_Submit()` may not have returned yet, so the runnable instance +would not be known by the user. + +Since in practice, the client needs its own task structure (to store the +parameters, state and result of the execution), so it must allocate it and pass +it to the executor (as userdata) anyway. Therefore, it's simpler if the task +already embed the runnable. + +Note that it is not mandatory to embed the runnable into the client task +structure (it's just convenient). The runnable could be anywhere, for example on +the stack (provided that it lives long enough). + +This design also has disadvantages. In particular, it forces the runnable queue +to be implemented as an intrusive list. As a consequence, the public `struct +vlc_runnable` must include an executor-private field (`struct vlc_list node`), +and the same runnable may not be queued twice (it is not possible to put the +same item twice in the same intrusive list). + +Therefore, a client must always explicitly create a new runnable for each +execution. In practice, it should not be a problem though, since it must often +create a new custom task for the execution state anyway. + +In addition, further extensions of the executor API are constrained by the fact +that it could not allocate and store per-task data (other than the intrusive +list node). + + +### Cancelation + +It is important to be able to cancel and interrupt a running task. + +However, in C, the interruption mechanism is very specific to the concrete task +implementation, so it could not be provided by the executor: it has to be +provided by the user. And since the user is also at the origin of the +cancelation request (which could lead to the interruption), the executor just +let the user handle interruption manually. + +Since it does not handle interruption, it could not provide a generic timeout +mechanism either (see below). The user has to handle timeout manually. + +It just provides a way to cancel a queued task not started yet: + +```c +VLC_API bool +vlc_executor_Cancel(vlc_executor_t *executor, struct vlc_runnable *runnable); +``` + +The runnable instance is created and destroyed by the user, so there is no +possible use-after-free race condition. + +Since the runnable is queued via an intrusive list, it can be removed in O(1). + +The tricky part is that it must indicate to the user if the runnable has +actually been dequeued or not (i.e. if it has already been taken by a thread to +be run). Indeed, the user must be able to know if the `run()` callback will be +executed or not, in order to release the task resources correctly in all cases, +without race conditions. This adds some complexity for the user. + +For the implementation, the problem is that even if we can remove an item in an +intrusive list in O(1), there is a priori no way to know immediately if the item +was in a specific list or not. To circumvent this problem, when an item is +dequeued, the executor resets the list nodes to `NULL`. This allows to store 1 +bit of information indicating whether the item has been dequeued or not. + +Note that the use-after-free race condition on cancel could alternatively be +solved by passing an optional user-provided `void *id`: + +```c +struct vlc_runnable +{ + void (*run)(void *userdata) + void *userdata; + struct vlc_list node; + + /* add a user-provided id (may be NULL) used for cancelation */ + void *id; +} + +void vlc_executor_Submit(vlc_executor_t *executor, struct vlc_runnable *); + +void vlc_executor_Cancel(vlc_executor_t *executor, void *id); +``` + +This would solve the problem because the user owns the `id`, so it is guaranteed +to exist on cancel. This is also practical, because the user could just pass a +pointer to its custom task structure. + +An additional advantage is that a higher-level API (like the preparser) could +directly benefit from the cancelation API, without tracking its submitted tasks +to find the matching runnable. + +But it would only make sense if the user was not already forced to track its +submitted tasks, to be able to cancel them on deletion. Indeed, since the +executor must report for each task if it has been dequeued or not, it is not +possible to provide a function to cancel all tasks at once. Therefore, the user +has to keep track of submitted tasks, which is inconvenient. + + +### Timeout + +A timeout implementation is often very specific to the concrete task +implementation (for example, calling `vlc_cond_timedwait()` instead of +`vlc_cond_wait()`), so it makes sense to let the user implement it. + +However, we could see the timeout as a cancelation where the deadline is known +in advance. Since the user must already implement the interruption on its own, +it could have been used to provide a timeout mechanism "for free" (for the +user). + +We decided against it, for several reasons: + - in theory, the user may want to do something different on timeout and + cancelation (it may for example consider that one is an error and the other + is not); + - this would require more callbacks and would complexify the executor API; + - if it's necessary, the user could just use a separate component (timer) to + trigger the interruption. + + +## Conclusion + +The resulting executor is a "minimal" API, simple and general: it just handles +execution. + +As a drawback, it does not help for interruption, cancelation and timeout, so +the user has more work to do on its own. For example, it must always track its +submitted tasks, and implement boilerplate to correctly handle cancelation and +interruption on deletion. + +But I can't think of an executor API design in C with all the desirable +properties. Any design I considered sucked one way or another. :/ _______________________________________________ vlc-commits mailing list [email protected] https://mailman.videolan.org/listinfo/vlc-commits
