[Bug 1765241] Re: virtio_scsi race can corrupt memory, panic kernel

Jay Vosburgh Thu, 19 Apr 2018 11:31:25 -0700

SRU Justification:

Impact:


        This issue can corrupt memory and panic the kernel on systems
using the virtio_scsi driver with the affected Ubuntu kernels.  The issue
manifests irregularly, as it is timing dependent.

Fix:

        The issue is resolved by adding synchronization between the two
code paths that race with one another (virtscsi_target_destroy and
virtscsi_complete_cmd).  The change with the lowest regression risk is to
call synchronize_rcu_expedited, as that is the functionality that blocks
the race in unaffected kernels.

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 03a2aad..c122e68 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -762,6 +762,9 @@ static int virtscsi_target_alloc(struct scsi_target *starget)
 static void virtscsi_target_destroy(struct scsi_target *starget)
 {
        struct virtio_scsi_target_state *tgt = starget->hostdata;
+
+       /* we can race with concurrent virtscsi_complete_cmd */
+       synchronize_rcu_expedited();
        kfree(tgt);
 }
 

        It is also possible to have the code wait for any outstanding
requests to drain prior to freeing the target structure, e.g.,

--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -762,6 +762,10 @@ static int virtscsi_target_alloc(struct scsi_target *starget)
 static void virtscsi_target_destroy(struct scsi_target *starget)
 {
        struct virtio_scsi_target_state *tgt = starget->hostdata;
+
+       /* we can race with concurrent virtscsi_complete_cmd */
+       while (atomic_read(&tgt->reqs))
+               cpu_relax();
        kfree(tgt);
 }

        This completes a bit faster in the usual case, but SCSI target
destroy is not a fast path, so the gain matters little, and the loop
runs the risk of never terminating if tgt->reqs never reaches zero.


Testcase:

This reproduces on Google Cloud, using the current, unmodified
ubuntu-1404-lts image (with the Ubuntu 4.4 kernel). Using the two attached
scripts, run e.g.

  ./create_shutdown_instance.sh 100

to create 100 instances. If an instance runs its startup script
successfully, it'll shut itself down right away. So instances that are
still running after a few minutes likely demonstrate this problem.

The issue reproduces easily with n1-standard-4.
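
To see which instances are still running after a few minutes, something
like the following could be used.  This is a minimal sketch and not one of
the attached scripts: the function name list_stuck_instances is
hypothetical, and it assumes the shutdown-experiment- name prefix used by
create_shutdown_instance.sh.

```shell
#!/bin/bash

# Print the names of test instances that are still in the RUNNING state.
# Assumption: instances were created with the "shutdown-experiment-" prefix.
list_stuck_instances() {
  gcloud compute instances list \
    --filter='name~^shutdown-experiment- AND status=RUNNING' \
    --format='value(name)'
}
```

Calling list_stuck_instances a few minutes after the create script
finishes prints one instance name per line; empty output would suggest
that every instance deleted itself as expected.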

create_shutdown_instance.sh:

#!/bin/bash -e

ZONE=us-central1-a

# $1 is the number of instances to create; all are created in parallel.
for i in $(seq -w "$1"); do
  gcloud compute instances create shutdown-experiment-$i \
    --zone="${ZONE}" \
    --image-family=ubuntu-1404-lts \
    --image-project=ubuntu-os-cloud \
    --machine-type=n1-standard-4 \
    --scopes compute-rw \
    --metadata-from-file startup-script=immediate_shutdown.sh &
done

wait

immediate_shutdown.sh:

#!/bin/bash -x

function get_metadata_value() {
  curl -H 'Metadata-Flavor: Google' \
    "http://metadata.google.internal/computeMetadata/v1/instance/$1";
}

readonly ZONE="$(get_metadata_value zone | awk -F'/' '{print $NF}')"
gcloud compute instances delete "$(hostname)" --zone="${ZONE}" --quiet

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1765241

