On 22/06/21 12:39, Vladimir Sementsov-Ogievskiy wrote:
22.06.2021 13:20, Paolo Bonzini wrote:
On 22/06/21 11:36, Vladimir Sementsov-Ogievskiy wrote:
OK, I agree, let's keep it.

You can also have a finished job, but get a stale value for error_is_read or ret.  The issue is not in getting the stale value per se, but in block_copy_call_status's caller not expecting it.

Hmm. So, do you mean that we can read ret and error_is_read ONLY after explicitly doing load_acquire(finished) and checking that it's true?

That means that we must do it not in assertion (to not be compiled out):

bool finished = load_acquire()

assert(finished);

... read reat and error_is_read ...

In reality you must have synchronized in some other way; that outside synchronization outside block-copy.c is what guarantees that the assertion does not fail. The simplest cases are:

- a mutex: "unlock" counts as release, "lock" counts as acquire;

- a bottom half: "schedule" counts as release, running counts as acquire.

Therefore, removing the assertion would work just fine because the synchronization would be like this:

   write ret/error_is_read
   write finished
   trigger bottom half or something -->    bottom half runs
                                           read ret/error_is_read

So there is no need at all to read ->finished, much less to load it outside the assertion. At the same time there are two problems with "assert(qatomic_read(&call_state->finished))". Note that they are not logic issues, they are maintenance issues.

First, if *there is a bug elsewhere* and the above synchronization doesn't happen, it may manifest sometimes as an assertion failure and sometimes as a memory reordering. This is what I was talking about in the previous message, and it is probably the last thing that you want when debugging; since we're adding asserts defensively, we might as well do it well.

Second, if somebody later carelessly changes the function to

  if (qatomic_read(&call_state->finished)) {
      ...
  } else {
      error_setg(...);
  }

that would be broken. Using qatomic_load_acquire makes the code more future-proof against a change like the one above.

Paolo


Reply via email to