Re: resend, finish_wait with list_empty_careful maybe fragile or buggy
+ CC Boqun in case you are interested in this topic Best Regards, Trol > Sorry, this is a resend because the previous one was messed > up by my editor and hard to be read. > > void finish_wait(struct wait_queue_head *wq_head, > struct wait_queue_entry *wq_entry) > { > > ->if (!list_empty_careful(&wq_entry->entry)) { > ->spin_lock_irqsave(&wq_head->lock, flags); > ->list_del_init(&wq_entry->entry); > ->spin_unlock_irqrestore(&wq_head->lock, flags); > ->} > } > > After careful look into the stop/wakeup code, I found the above > code fragile or even buggy. This code was introduced at least 14 years > ago and seems fragile or buggy now after years of study on SMP > synchronization by us. > > I understand this code are being used a lot and no bug seems to emerge. > But, as I'll explain, it depends on a lot of unreliable implementation > details. > > Firstly, > > static inline int list_empty_careful(const struct list_head *head) > { > struct list_head *next = head->next; > return (next == head) && (next == head->prev); > } > > note that the read of head->next & head->prev is not protected by > READ_ONCE. So the read is free to be optimized out entirely. > Luckily, this optimization is hard for compilers now, since all other accesses > are out of finish_wait. And still, GCC won't spit aligned accesses into > multiple > instructions so it is atomic so far. > > Secondly, > > if ( ! list_empty_careful(&wq_entry->entry) ) { > [remove entry with spinning-lock] > } > [ends stack frame of the function calling finish_wait] > [overwrites wq_entry with another frame] > > and > > __wake_up_common() --> > [read wq_entry->func] --> > [read wq_entry->flags] --> > autoremove_wake_function() --> > [remove wq_entry->entry from wait queue] --> > > are not properly ordered for SMP so that [read wq_entry->flags] > may be reordered after [remove wq_entry->entry from wait queue] > since no dependency or memory barrier forbids it. This may cause > [overwrites wq_entry with another frame] on one CPU takes place > before [read wq_entry->flags] on another CPU and cause > [read wq_entry->flags] to return bad value. > > This behavior is not reported may thank to: > - few code is using autoremove_wake_function > - CPU pipeline is not as deep as to make this emerge > > Best regards, > Trol
Re: resend, finish_wait with list_empty_careful maybe fragile or buggy
> Sorry, this is a resend because the previous one was messed > up by my editor and hard to be read. > > void finish_wait( > struct wait_queue_head *wq_head, > struct wait_queue_entry *wq_entry) > { > > ->if (!list_empty_careful(&wq_entry->entry)) { > ->spin_lock_irqsave(&wq_head->lock, flags); > ->list_del_init(&wq_entry->entry); > ->spin_unlock_irqrestore(&wq_head->lock, flags); > ->} > } > > After careful look into the stop/wakeup code, I found > the above code fragile or even buggy. This code was > introduced at least 14 years ago and seems fragile or > buggy now after years of study on SMP synchronization > by us. > > I understand this code are being used a lot and no bug > seems to emerge. But, as I'll explain, it depends on a lot > of unreliable implementation details. > > Firstly, > > static inline int list_empty_careful(const struct list_head *head) > { > struct list_head *next = head->next; > return (next == head) && (next == head->prev); > } > > note that the read of head->next & head->prev is not > protected by READ_ONCE. So the read is free to be > optimized out entirely. Luckily, this optimization is hard > for compilers now, since all other accesses are out of > finish_wait. And still, GCC won't spit aligned accesses > into multiple instructions so it is atomic so far. > > Secondly, > > if ( ! list_empty_careful(&wq_entry->entry) ) { > > } > > > > and > > __wake_up_common() --> > func> --> > flags> --> > autoremove_wake_function() --> > entry from wait queue> --> > > are not properly ordered for SMP so that flags> > may be reordered after entry from wait queue> > since no dependency or memory barrier forbids it. This may cause > on one CPU takes place > before flags> on another CPU and cause > flags> to return bad value. > > This behavior is not reported may thank to: > - few code is using autoremove_wake_function > - CPU pipeline is not as deep as to make this emerge > Hi, mingo, peterz Would you please have a look at it if it won't waste you much time. Thanks a lot. CC scheduler maintainers Best regards, Patrick Trol