On 01.11.2018 21:29, Ben Pfaff wrote:
> On Thu, Nov 01, 2018 at 04:58:40PM +0300, Ilya Maximets wrote:
>> revalidator_purge() iterates and modifies umap->cmap. This should
>> not happen in quiescent state, because cmap implementation based
>> on rcu protected variables. Let's move the thread to active state
>> before purging to avoid possible wrong memory accesses.
>>
>> CC: Joe Stringer <[email protected]>
>> Fixes: 9fce0584a643 ("revalidator: Use 'cmap' for storing ukeys.")
>> Signed-off-by: Ilya Maximets <[email protected]>
>
> Thank you for catching the problem, and for the fix. This is quite
> subtle.
>
> Looking at the code here, and the history, it's not at all clear to me
> why there needs to be such a wide window of RCU quiescing. Usually, RCU
> quiescing is to avoid delaying the RCU freeing thread across an
> operation that can block for a considerable amount of time. Across
> udpif_stop_threads() and udpif_start_threads(), I only see a few
> operations that can do that, in particular the xpthread_join() calls.
> dpif_disable_upcall() could also arguably be slow for dpif-netdev since
> it's capturing a rwlock for writing.
>
> So, I'd be inclined to fix this by narrowing the quiescent window,
> something like the following. Do you see flaws in my analysis?
I agree with you. One thing is that pthread_create() also could
consume valuable amount of time on some systems. Up to few hundreds
of milliseconds. This could be a problem in case of big number of
revalidators. So, I'd like to have following incremental:
---
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 57869cb32..e44339338 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -565,6 +565,8 @@ udpif_start_threads(struct udpif *udpif, size_t n_handlers_,
size_t n_revalidators_)
{
if (udpif && n_handlers_ && n_revalidators_) {
+ ovsrcu_quiesce_start();
+
udpif->n_handlers = n_handlers_;
udpif->n_revalidators = n_revalidators_;
@@ -595,6 +597,7 @@ udpif_start_threads(struct udpif *udpif, size_t n_handlers_,
revalidator->thread = ovs_thread_create(
"revalidator", udpif_revalidator, revalidator);
}
+ ovsrcu_quiesce_end();
}
}
---
What do you think?
Will you format a patch?
Few comments inline.
Best regards, Ilya Maximets.
>
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index e36cfa0eecca..22beaa569719 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -504,34 +504,34 @@ udpif_destroy(struct udpif *udpif)
> free(udpif);
> }
>
> -/* Stops the handler and revalidator threads, must be enclosed in
> - * ovsrcu quiescent state unless when destroying udpif. */
> +/* Stops the handler and revalidator threads. */
> static void
> udpif_stop_threads(struct udpif *udpif)
> {
> if (udpif && (udpif->n_handlers != 0 || udpif->n_revalidators != 0)) {
> size_t i;
>
> + /* Tell the threads to exit. */
> latch_set(&udpif->exit_latch);
>
> + /* Wait for the threads to exit. This can take a long time so
> quiesce
> + * so as not to block RCU freeing. */
Something wrong with above comment. Could you fix or re-word?
> + ovsrcu_quiesce_start();
> for (i = 0; i < udpif->n_handlers; i++) {
> struct handler *handler = &udpif->handlers[i];
>
> xpthread_join(handler->thread, NULL);
As you're dropping the local variable for purging loop below, maybe
it'll be good to drop the local variable here too. This will make all
three loops look uniform.
> }
> -
> for (i = 0; i < udpif->n_revalidators; i++) {
> xpthread_join(udpif->revalidators[i].thread, NULL);
> }
> -
> dpif_disable_upcall(udpif->dpif);
> + ovsrcu_quiesce_end();
>
> + /* Delete ukeys, and delete all flows from the datapath to prevent
> + * double-counting stats. */
> for (i = 0; i < udpif->n_revalidators; i++) {
> - struct revalidator *revalidator = &udpif->revalidators[i];
> -
> - /* Delete ukeys, and delete all flows from the datapath to
> prevent
> - * double-counting stats. */
> - revalidator_purge(revalidator);
> + revalidator_purge(&udpif->revalidators[i]);
> }
>
> latch_poll(&udpif->exit_latch);
> @@ -549,8 +549,7 @@ udpif_stop_threads(struct udpif *udpif)
> }
> }
>
> -/* Starts the handler and revalidator threads, must be enclosed in
> - * ovsrcu quiescent state. */
> +/* Starts the handler and revalidator threads. */
> static void
> udpif_start_threads(struct udpif *udpif, size_t n_handlers_,
> size_t n_revalidators_)
> @@ -623,7 +622,6 @@ udpif_set_threads(struct udpif *udpif, size_t n_handlers_,
> ovs_assert(udpif);
> ovs_assert(n_handlers_ && n_revalidators_);
>
> - ovsrcu_quiesce_start();
> if (udpif->n_handlers != n_handlers_
> || udpif->n_revalidators != n_revalidators_) {
> udpif_stop_threads(udpif);
> @@ -641,7 +639,6 @@ udpif_set_threads(struct udpif *udpif, size_t n_handlers_,
>
> udpif_start_threads(udpif, n_handlers_, n_revalidators_);
> }
> - ovsrcu_quiesce_end();
> }
>
> /* Waits for all ongoing upcall translations to complete. This ensures that
> @@ -657,10 +654,8 @@ udpif_synchronize(struct udpif *udpif)
> size_t n_handlers_ = udpif->n_handlers;
> size_t n_revalidators_ = udpif->n_revalidators;
>
> - ovsrcu_quiesce_start();
> udpif_stop_threads(udpif);
> udpif_start_threads(udpif, n_handlers_, n_revalidators_);
> - ovsrcu_quiesce_end();
> }
>
> /* Notifies 'udpif' that something changed which may render previous
> @@ -700,13 +695,9 @@ udpif_flush(struct udpif *udpif)
> size_t n_handlers_ = udpif->n_handlers;
> size_t n_revalidators_ = udpif->n_revalidators;
>
> - ovsrcu_quiesce_start();
> -
> udpif_stop_threads(udpif);
> dpif_flow_flush(udpif->dpif);
> udpif_start_threads(udpif, n_handlers_, n_revalidators_);
> -
> - ovsrcu_quiesce_end();
> }
>
> /* Removes all flows from all datapaths. */
>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev