[PATCH] x86/cpu/amd: Remove duplicate definition of MSRC001_1029
MSRC001_1029 had duplicate definitions in msr-index.h and amd.c.
Remove the local definition in favor of the more widely used version.

Signed-off-by: Venkatesh Srinivas
---
 arch/x86/kernel/cpu/amd.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 347a956f71ca..31c592e5096e 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -795,15 +795,13 @@ static void init_amd_gh(struct cpuinfo_x86 *c)
 		set_cpu_bug(c, X86_BUG_AMD_TLB_MMATCH);
 }
 
-#define MSR_AMD64_DE_CFG	0xC0011029
-
 static void init_amd_ln(struct cpuinfo_x86 *c)
 {
 	/*
 	 * Apply erratum 665 fix unconditionally so machines without a BIOS
 	 * fix work.
 	 */
-	msr_set_bit(MSR_AMD64_DE_CFG, 31);
+	msr_set_bit(MSR_F10H_DECFG, 31);
 }
 
 static bool rdrand_force;
-- 
2.31.0.291.g576ba9dcdaf-goog
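[Editorial note: for readers unfamiliar with msr_set_bit(), it performs a read-modify-write on the MSR and skips the WRMSR when the bit is already set. A minimal sketch of the operation performed here, assuming the MSR name from the patch; this is illustrative, not the actual arch/x86/lib/msr.c implementation, and error handling is elided:]

/* Roughly what msr_set_bit(MSR_F10H_DECFG, 31) does: read the MSR,
 * set bit 31 only if it is not already set, then write it back. */
static void set_decfg_bit31(void)
{
	u64 val;

	rdmsrl(MSR_F10H_DECFG, val);
	if (!(val & BIT_ULL(31))) {
		val |= BIT_ULL(31);
		wrmsrl(MSR_F10H_DECFG, val);
	}
}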
Re: [PATCH v2] vhost: replace % with & on data path
On Mon, Nov 30, 2015 at 11:15:23AM +0200, Michael S. Tsirkin wrote:
> We know vring num is a power of 2, so use &
> to mask the high bits.
>
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> ---

The generated code switches from DIV -> masking; the source is clearer as well.

Tested-by: Venkatesh Srinivas <venkate...@google.com>

-- vs;
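[Editorial note: for readers unfamiliar with the trick, when num is a power of two, idx % num == idx & (num - 1); the masked form avoids a DIV instruction because the compiler cannot prove on its own that a runtime-loaded vring size is a power of two. A minimal userspace sketch, illustrative only and not from the patch:]

#include <assert.h>
#include <stdint.h>

/* Hypothetical ring-index helper: num must be a power of two. */
static inline uint16_t ring_wrap(uint32_t idx, uint16_t num)
{
	/* Equivalent to idx % num, but compiles to AND, not DIV. */
	return idx & (num - 1);
}

int main(void)
{
	uint16_t num = 256;	/* example vring size */
	uint32_t i;

	for (i = 0; i < 4u * num; i++)
		assert(ring_wrap(i, num) == i % num);
	return 0;
}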
Re: [PATCH] tools/virtio: fix byteswap logic
On Mon, Nov 30, 2015 at 10:33:34AM +0200, Michael S. Tsirkin wrote:
> commit cf561f0d2eb74574ad9985a2feab134267a9d298 ("virtio: introduce
> virtio_is_little_endian() helper") changed byteswap logic to
> skip feature bit checks for LE platforms, but didn't
> update tools/virtio, so vring_bench started failing.
>
> Update the copy under tools/virtio/ (TODO: find a way to avoid this code
> duplication).
>
> Cc: Greg Kurz <gk...@linux.vnet.ibm.com>
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> ---
>  tools/virtio/linux/virtio_config.h | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/tools/virtio/linux/virtio_config.h b/tools/virtio/linux/virtio_config.h
> index 806d683..57a6964 100644
> --- a/tools/virtio/linux/virtio_config.h
> +++ b/tools/virtio/linux/virtio_config.h
> @@ -40,33 +40,39 @@ static inline void __virtio_clear_bit(struct virtio_device *vdev,
>  #define virtio_has_feature(dev, feature) \
>  	(__virtio_test_bit((dev), feature))
>
> +static inline bool virtio_is_little_endian(struct virtio_device *vdev)
> +{
> +	return virtio_has_feature(vdev, VIRTIO_F_VERSION_1) ||
> +		virtio_legacy_is_little_endian();
> +}
> +
> +/* Memory accessors */
>  static inline u16 virtio16_to_cpu(struct virtio_device *vdev, __virtio16 val)
>  {
> -	return __virtio16_to_cpu(virtio_has_feature(vdev, VIRTIO_F_VERSION_1), val);
> +	return __virtio16_to_cpu(virtio_is_little_endian(vdev), val);
>  }
>
>  static inline __virtio16 cpu_to_virtio16(struct virtio_device *vdev, u16 val)
>  {
> -	return __cpu_to_virtio16(virtio_has_feature(vdev, VIRTIO_F_VERSION_1), val);
> +	return __cpu_to_virtio16(virtio_is_little_endian(vdev), val);
>  }
>
>  static inline u32 virtio32_to_cpu(struct virtio_device *vdev, __virtio32 val)
>  {
> -	return __virtio32_to_cpu(virtio_has_feature(vdev, VIRTIO_F_VERSION_1), val);
> +	return __virtio32_to_cpu(virtio_is_little_endian(vdev), val);
>  }
>
>  static inline __virtio32 cpu_to_virtio32(struct virtio_device *vdev, u32 val)
>  {
> -	return __cpu_to_virtio32(virtio_has_feature(vdev, VIRTIO_F_VERSION_1), val);
> +	return __cpu_to_virtio32(virtio_is_little_endian(vdev), val);
>  }
>
>  static inline u64 virtio64_to_cpu(struct virtio_device *vdev, __virtio64 val)
>  {
> -	return __virtio64_to_cpu(virtio_has_feature(vdev, VIRTIO_F_VERSION_1), val);
> +	return __virtio64_to_cpu(virtio_is_little_endian(vdev), val);
>  }
>
>  static inline __virtio64 cpu_to_virtio64(struct virtio_device *vdev, u64 val)
>  {
> -	return __cpu_to_virtio64(virtio_has_feature(vdev, VIRTIO_F_VERSION_1), val);
> +	return __cpu_to_virtio64(virtio_is_little_endian(vdev), val);
>  }

Tested patch with vring-bench. (I did something similar when reviving it.)

Tested-by: Venkatesh Srinivas <venkate...@google.com>

-- vs;
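[Editorial note: the point of the helper is that modern (VERSION_1) devices are always little-endian, while legacy devices follow guest endianness, so on an LE build the whole check collapses to a compile-time constant and the feature-bit test is skipped. A simplified standalone sketch of the same decision, with stand-in types; not the kernel's actual structures:]

#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1	32

struct vdev { uint64_t features; };	/* illustrative stand-in */

static bool has_feature(const struct vdev *v, unsigned bit)
{
	return v->features & (1ULL << bit);
}

static bool legacy_is_little_endian(void)
{
	/* In the kernel this is a compile-time constant: true on
	 * little-endian builds, false on big-endian ones. */
	return true;
}

/* Mirrors virtio_is_little_endian(): VERSION_1 devices are always LE;
 * legacy devices are LE only when the guest itself is LE. */
static bool is_little_endian(const struct vdev *v)
{
	return has_feature(v, VIRTIO_F_VERSION_1) ||
	       legacy_is_little_endian();
}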
Re: [PATCH 1/3] KVM: x86: work around infinite loop in microcode when #AC is delivered
On Tue, Nov 10, 2015 at 01:22:52PM +0100, Paolo Bonzini wrote:
> From: Eric Northup <digitale...@google.com>
>
> It was found that a guest can DoS a host by triggering an infinite
> stream of "alignment check" (#AC) exceptions. This causes the
> microcode to enter an infinite loop where the core never receives
> another interrupt. The host kernel panics pretty quickly due to the
> effects (CVE-2015-5307).
>
> Signed-off-by: Eric Northup <digitale...@google.com>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>

Tested-by: Venkatesh Srinivas <venkate...@google.com>

-- vs;
Re: [PATCH RESEND 2/7] scsi: storvsc: Size the queue depth based on the ringbuffer size
On Mon, Mar 23, 2015 at 2:06 PM, K. Y. Srinivasan <k...@microsoft.com> wrote:
> Size the queue depth based on the ringbuffer size. Also accomodate for the
> fact that we could have multiple channels (ringbuffers) per adaptor.
>
> Signed-off-by: K. Y. Srinivasan <k...@microsoft.com>
> Reviewed-by: Long Li <lon...@microsoft.com>
> ---
>  drivers/scsi/storvsc_drv.c | 27 ++++++++++++++++-----------
>  1 files changed, 16 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index 27fe850..5a12897 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -309,10 +309,15 @@ enum storvsc_request_type {
>   */
>
>  static int storvsc_ringbuffer_size = (256 * PAGE_SIZE);
> +static u32 max_outstanding_req_per_channel;
> +
> +static int storvsc_vcpus_per_sub_channel = 4;
>
>  module_param(storvsc_ringbuffer_size, int, S_IRUGO);
>  MODULE_PARM_DESC(storvsc_ringbuffer_size, "Ring buffer size (bytes)");
>
> +module_param(storvsc_vcpus_per_sub_channel, int, S_IRUGO);
> +MODULE_PARM_DESC(vcpus_per_sub_channel, "Ratio of VCPUs to subchannels");
>  /*
>   * Timeout in seconds for all devices managed by this driver.
>   */
> @@ -320,7 +325,6 @@ static int storvsc_timeout = 180;
>
>  static int msft_blist_flags = BLIST_TRY_VPD_PAGES;
>
> -#define STORVSC_MAX_IO_REQUESTS		200
>
>  static void storvsc_on_channel_callback(void *context);
>
> @@ -1376,7 +1380,6 @@ static int storvsc_do_io(struct hv_device *device,
>
>  static int storvsc_device_configure(struct scsi_device *sdevice)
>  {
> -	scsi_change_queue_depth(sdevice, STORVSC_MAX_IO_REQUESTS);
>
>  	blk_queue_max_segment_size(sdevice->request_queue, PAGE_SIZE);
>
> @@ -1646,7 +1649,6 @@ static struct scsi_host_template scsi_driver = {
>  	.eh_timed_out =		storvsc_eh_timed_out,
>  	.slave_configure =	storvsc_device_configure,
>  	.cmd_per_lun =		255,
> -	.can_queue =		STORVSC_MAX_IO_REQUESTS*STORVSC_MAX_TARGETS,
>  	.this_id =		-1,
>  	/* no use setting to 0 since ll_blk_rw reset it to 1 */
>  	/* currently 32 */
> @@ -1686,6 +1688,7 @@ static int storvsc_probe(struct hv_device *device,
>  			 const struct hv_vmbus_device_id *dev_id)
>  {
>  	int ret;
> +	int num_cpus = num_online_cpus();
>  	struct Scsi_Host *host;
>  	struct hv_host_device *host_dev;
>  	bool dev_is_ide = ((dev_id->driver_data == IDE_GUID) ? true : false);
> @@ -1694,6 +1697,7 @@ static int storvsc_probe(struct hv_device *device,
>  	int max_luns_per_target;
>  	int max_targets;
>  	int max_channels;
> +	int max_sub_channels = 0;
>
>  	/*
>  	 * Based on the windows host we are running on,
> @@ -1719,12 +1723,18 @@ static int storvsc_probe(struct hv_device *device,
>  		max_luns_per_target = STORVSC_MAX_LUNS_PER_TARGET;
>  		max_targets = STORVSC_MAX_TARGETS;
>  		max_channels = STORVSC_MAX_CHANNELS;
> +		/*
> +		 * On Windows8 and above, we support sub-channels for storage.
> +		 * The number of sub-channels offerred is based on the number of
> +		 * VCPUs in the guest.
> +		 */
> +		max_sub_channels = (num_cpus / storvsc_vcpus_per_sub_channel);
>  		break;
>  	}
>
> -	if (dev_id->driver_data == SFC_GUID)
> -		scsi_driver.can_queue = (STORVSC_MAX_IO_REQUESTS *
> -					 STORVSC_FC_MAX_TARGETS);
> +	scsi_driver.can_queue = (max_outstanding_req_per_channel *
> +				 max_sub_channels + 1);
> +

If num_online_cpus() returned 1 - 3, can_queue will be set to 1 I think. Is that desired?

> 	host = scsi_host_alloc(&scsi_driver,
> 			       sizeof(struct hv_host_device));
> 	if (!host)

> @@ -1837,7 +1847,6 @@ static struct hv_driver storvsc_drv = {
>
>  static int __init storvsc_drv_init(void)
>  {
> -	u32 max_outstanding_req_per_channel;
>
>  	/*
>  	 * Divide the ring buffer data size (which is 1 page less
> @@ -1852,10 +1861,6 @@ static int __init storvsc_drv_init(void)
>  					vmscsi_size_delta,
>  					sizeof(u64)));
>
> -	if (max_outstanding_req_per_channel <
> -			STORVSC_MAX_IO_REQUESTS)
> -		return -EINVAL;
> -
>  	return vmbus_driver_register(&storvsc_drv);
>  }

-- vs;
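[Editorial note: to make the reviewer's concern concrete, here is the arithmetic being pointed at, as a standalone sketch using the variable names from the patch; the 768 is an example value for illustration, and the integer division truncates:]

#include <stdio.h>

int main(void)
{
	unsigned int max_outstanding_req_per_channel = 768; /* example */
	int storvsc_vcpus_per_sub_channel = 4;
	int num_cpus;

	for (num_cpus = 1; num_cpus <= 8; num_cpus++) {
		int max_sub_channels = num_cpus / storvsc_vcpus_per_sub_channel;
		int can_queue = max_outstanding_req_per_channel *
				max_sub_channels + 1;

		/* For num_cpus 1..3, max_sub_channels is 0, so can_queue
		 * collapses to 1 -- one outstanding command total. */
		printf("cpus=%d sub_channels=%d can_queue=%d\n",
		       num_cpus, max_sub_channels, can_queue);
	}
	return 0;
}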
Re: [PATCH v4] virtio-scsi: Fix the race condition in virtscsi_handle_event
On Wed, Jan 7, 2015 at 6:57 PM, Fam Zheng <f...@redhat.com> wrote:
>
> There is a race condition in virtscsi_handle_event, when many device
> hotplug/unplug events flush in quickly.
>
> The scsi_remove_device in virtscsi_handle_transport_reset may trigger
> the BUG_ON in scsi_target_reap, because the state is altered behind it,
> probably by scsi_scan_host of another event. I'm able to reproduce it by
> repeatedly plugging and unplugging a scsi disk with the same lun number.
>
> To fix this, a single thread workqueue (local to the module) is added,
> which makes the scan work serialized. With this change, the panic goes
> away.
>
> Signed-off-by: Fam Zheng <f...@redhat.com>

Reviewed-by: Venkatesh Srinivas <venkate...@google.com>

> ---
>
> v4: Addressing MST's comments:
>     Use ordered workqueue, with WQ_FREEZABLE and WQ_MEM_RECLAIM flags.
>     Coding style fixes.
>
> v3: Fix spacing and destroy order. (MST)
> ---
>  drivers/scsi/virtio_scsi.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index c52bb5d..0db63b5 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -120,6 +120,7 @@ struct virtio_scsi {
>
>  static struct kmem_cache *virtscsi_cmd_cache;
>  static mempool_t *virtscsi_cmd_pool;
> +static struct workqueue_struct *virtscsi_scan_wq;
>
>  static inline struct Scsi_Host *virtio_scsi_host(struct virtio_device *vdev)
>  {
> @@ -404,7 +405,7 @@ static void virtscsi_complete_event(struct virtio_scsi *vscsi, void *buf)
>  	struct virtio_scsi_event_node *event_node = buf;
>
>  	if (!vscsi->stop_events)
> -		queue_work(system_freezable_wq, &event_node->work);
> +		queue_work(virtscsi_scan_wq, &event_node->work);
>  }
>
>  static void virtscsi_event_done(struct virtqueue *vq)
> @@ -1119,6 +1120,14 @@ static int __init init(void)
>  		pr_err("mempool_create() for virtscsi_cmd_pool failed\n");
>  		goto error;
>  	}
> +
> +	virtscsi_scan_wq =
> +		alloc_ordered_workqueue("virtscsi-scan", WQ_FREEZABLE | WQ_MEM_RECLAIM);
> +	if (!virtscsi_scan_wq) {
> +		pr_err("create_singlethread_workqueue() for virtscsi_scan_wq failed\n");
> +		goto error;
> +	}
> +
>  	ret = register_virtio_driver(&virtio_scsi_driver);
>  	if (ret < 0)
>  		goto error;
> @@ -1126,6 +1135,8 @@ static int __init init(void)
>  	return 0;
>
>  error:
> +	if (virtscsi_scan_wq)
> +		destroy_workqueue(virtscsi_scan_wq);
>  	if (virtscsi_cmd_pool) {
>  		mempool_destroy(virtscsi_cmd_pool);
>  		virtscsi_cmd_pool = NULL;
> @@ -1140,6 +1151,7 @@ error:
>  static void __exit fini(void)
>  {
>  	unregister_virtio_driver(&virtio_scsi_driver);
> +	destroy_workqueue(virtscsi_scan_wq);
>  	mempool_destroy(virtscsi_cmd_pool);
>  	kmem_cache_destroy(virtscsi_cmd_cache);
>  }
> --
> 1.9.3

-- vs;
Re: [PATCH v2] virtio-scsi: Fix the race condition in virtscsi_handle_event
On Mon, Jan 5, 2015 at 11:33 PM, Fam Zheng <f...@redhat.com> wrote:
> There is a race condition in virtscsi_handle_event, when many device
> hotplug/unplug events flush in quickly.
>
> The scsi_remove_device in virtscsi_handle_transport_reset may trigger
> the BUG_ON in scsi_target_reap, because the state is altered behind it,
> probably by scsi_scan_host of another event. I'm able to reproduce it by
> repeatedly plugging and unplugging a scsi disk with the same lun number.
>
> To make is safe, a single thread workqueue local to the module is added
> which runs the scan work. With this change, the panic goes away.
>
> Signed-off-by: Fam Zheng <f...@redhat.com>
>
> ---
>
> v2: Use a single threaded workqueue instead of mutex to serialize work
>     (Venkatesh)
> ---
>  drivers/scsi/virtio_scsi.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index c52bb5d..71b0091 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -120,6 +120,8 @@ struct virtio_scsi {
>
>  static struct kmem_cache *virtscsi_cmd_cache;
>  static mempool_t *virtscsi_cmd_pool;
> +static struct workqueue_struct *virtscsi_scan_wq;
> +
>
>  static inline struct Scsi_Host *virtio_scsi_host(struct virtio_device *vdev)
>  {
> @@ -404,7 +406,7 @@ static void virtscsi_complete_event(struct virtio_scsi *vscsi, void *buf)
>  	struct virtio_scsi_event_node *event_node = buf;
>
>  	if (!vscsi->stop_events)
> -		queue_work(system_freezable_wq, &event_node->work);
> +		queue_work(virtscsi_scan_wq, &event_node->work);
>  }
>
>  static void virtscsi_event_done(struct virtqueue *vq)
> @@ -1119,6 +1121,12 @@ static int __init init(void)
>  		pr_err("mempool_create() for virtscsi_cmd_pool failed\n");
>  		goto error;
>  	}
> +	virtscsi_scan_wq = create_singlethread_workqueue("virtscsi-scan");
> +	if (!virtscsi_scan_wq) {
> +		pr_err("create_singlethread_workqueue() for virtscsi_scan_wq failed\n");
> +		goto error;
> +	}
> +
>  	ret = register_virtio_driver(&virtio_scsi_driver);
>  	if (ret < 0)
>  		goto error;
> @@ -1126,6 +1134,9 @@ static int __init init(void)
>  	return 0;
>
>  error:
> +	if (virtscsi_scan_wq) {
> +		destroy_workqueue(virtscsi_scan_wq);
> +	}
>  	if (virtscsi_cmd_pool) {
>  		mempool_destroy(virtscsi_cmd_pool);
>  		virtscsi_cmd_pool = NULL;
> @@ -1142,6 +1153,7 @@ static void __exit fini(void)
>  	unregister_virtio_driver(&virtio_scsi_driver);
>  	mempool_destroy(virtscsi_cmd_pool);
>  	kmem_cache_destroy(virtscsi_cmd_cache);
> +	destroy_workqueue(virtscsi_scan_wq);
>  }
>  module_init(init);
>  module_exit(fini);
> --
> 1.9.3

Reviewed-by: Venkatesh Srinivas <venkate...@google.com>

-- vs;
Re: [PATCH] virtio-scsi: Fix the race condition in virtscsi_handle_event
On Sun, Jan 4, 2015 at 10:04 PM, Fam Zheng <f...@redhat.com> wrote:
>
> There is a race condition in virtscsi_handle_event, when many device
> hotplug/unplug events flush in quickly.
>
> The scsi_remove_device in virtscsi_handle_transport_reset may trigger
> the BUG_ON in scsi_target_reap, because the state is altered behind it,
> probably by scsi_scan_host of another event. I'm able to reproduce it by
> repeatedly plugging and unplugging a scsi disk with the same lun number.
>
> To make is safe, the mutex added in struct virtio_scsi is held in
> virtscsi_handle_event, so that all the events are processed in a
> synchronized way. With this lock, the panic goes away.
>
> Signed-off-by: Fam Zheng <f...@redhat.com>
> ---
>  drivers/scsi/virtio_scsi.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index c52bb5d..7f194d4 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -110,6 +110,9 @@ struct virtio_scsi {
>  	/* CPU hotplug notifier */
>  	struct notifier_block nb;
>
> +	/* Protect the hotplug/unplug event handling */
> +	struct mutex scan_lock;
> +
>  	/* Protected by event_vq lock */
>  	bool stop_events;
>
> @@ -377,6 +380,7 @@ static void virtscsi_handle_event(struct work_struct *work)
>  	struct virtio_scsi *vscsi = event_node->vscsi;
>  	struct virtio_scsi_event *event = &event_node->event;
>
> +	mutex_lock(&vscsi->scan_lock);
>  	if (event->event &
>  	    cpu_to_virtio32(vscsi->vdev, VIRTIO_SCSI_T_EVENTS_MISSED)) {
>  		event->event &= ~cpu_to_virtio32(vscsi->vdev,
> @@ -397,6 +401,7 @@ static void virtscsi_handle_event(struct work_struct *work)
>  		pr_err("Unsupport virtio scsi event %x\n", event->event);
>  	}
>  	virtscsi_kick_event(vscsi, event_node);
> +	mutex_unlock(&vscsi->scan_lock);
>  }
>
>  static void virtscsi_complete_event(struct virtio_scsi *vscsi, void *buf)
> @@ -894,6 +899,7 @@ static int virtscsi_init(struct virtio_device *vdev,
>  	const char **names;
>  	struct virtqueue **vqs;
>
> +	mutex_init(&vscsi->scan_lock);
>  	num_vqs = vscsi->num_queues + VIRTIO_SCSI_VQ_BASE;
>  	vqs = kmalloc(num_vqs * sizeof(struct virtqueue *), GFP_KERNEL);
>  	callbacks = kmalloc(num_vqs * sizeof(vq_callback_t *), GFP_KERNEL);
> --
> 1.9.3

Nice find. This fix does have the effect of serializing all event handling
via scan_lock; perhaps you want to instead create a singlethreaded workqueue
in virtio_scsi and queue handle_event there, rather than waiting on scan_lock
on the system workqueue?

Reviewed-by: Venkatesh Srinivas <venkate...@google.com>

-- vs;
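[Editorial note: a minimal sketch of the pattern being suggested here, which the v2/v4 patches above adopt. Names are hypothetical, not from the driver. An ordered workqueue executes at most one work item at a time, in queueing order, so two hotplug/unplug events can never run their scan work concurrently:]

#include <linux/workqueue.h>

static struct workqueue_struct *my_scan_wq;

static int __init my_init(void)
{
	/* Ordered queue: one item at a time, FIFO. WQ_MEM_RECLAIM gives
	 * it a rescuer thread so it makes progress under memory pressure;
	 * WQ_FREEZABLE keeps scan work from racing with suspend. */
	my_scan_wq = alloc_ordered_workqueue("my-scan",
					     WQ_FREEZABLE | WQ_MEM_RECLAIM);
	if (!my_scan_wq)
		return -ENOMEM;
	return 0;
}

static void __exit my_exit(void)
{
	/* destroy_workqueue() drains all pending work first. */
	destroy_workqueue(my_scan_wq);
}

/* Event path then becomes: queue_work(my_scan_wq, &event_node->work); */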
Re: [PATCH] scsi: Fix qemu boot hang problem
Should we wrap the sdev->device_busy read and test in a new
scsi_device_busy(), so as to preserve the old polarity?

Reviewed-by: Venkatesh Srinivas <venkate...@google.com>

On 8/11/14, Jens Axboe <ax...@kernel.dk> wrote:
> On 2014-08-10 06:54, Guenter Roeck wrote:
>> The latest kernel fails to boot qemu arm images when using scsi
>> for disk access. Boot gets stuck after the following messages.
>>
>> brd: module loaded
>> sym53c8xx :00:0c.0: enabling device (0100 -> 0103)
>> sym0: <895a> rev 0x0 at pci :00:0c.0 irq 93
>> sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
>> sym0: SCSI BUS has been reset.
>> scsi host0: sym-2.2.3
>>
>> Bisect points to commit 71e75c97f97a ("scsi: convert device_busy to
>> atomic_t"). Code inspection shows the following suspicious change
>> in scsi_request_fn.
>>
>> out_delay:
>> -	if (sdev->device_busy == 0 && !scsi_device_blocked(sdev))
>> +	if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
>> 		blk_delay_queue(q, SCSI_QUEUE_DELAY);
>> }
>>
>> 'sdev->device_busy == 0' was replaced with 'atomic_read(&sdev->device_busy)',
>> meaning the logic was reversed. Changing this expression to
>> '!atomic_read(&sdev->device_busy)' fixes the problem.
>>
>> Cc: Christoph Hellwig <h...@lst.de>
>> Cc: Hannes Reinecke <h...@suse.de>
>> Cc: Webb Scales <web...@hp.com>
>> Cc: Jens Axboe <ax...@kernel.dk>
>> Signed-off-by: Guenter Roeck <li...@roeck-us.net>
>> ---
>>   drivers/scsi/scsi_lib.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index 9c44392..ce62e87 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -1774,7 +1774,7 @@ static void scsi_request_fn(struct request_queue *q)
>>   		blk_requeue_request(q, req);
>>   		atomic_dec(&sdev->device_busy);
>>   out_delay:
>> -	if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
>> +	if (!atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
>>   		blk_delay_queue(q, SCSI_QUEUE_DELAY);
>>   }
>>
>
> That's a no-brainer obvious fix!
>
> Acked-by: Jens Axboe <ax...@fb.com>
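[Editorial note: a sketch of the wrapper being suggested in the comment above. The helper is hypothetical as written here, though an accessor of this shape did later land upstream; the idea is to hide the atomic_t so call sites keep the old "is the device busy?" polarity and the inversion bug class goes away:]

static inline int scsi_device_busy(struct scsi_device *sdev)
{
	return atomic_read(&sdev->device_busy);
}

/* The call site then reads naturally and matches the pre-atomic logic:
 *
 *	if (!scsi_device_busy(sdev) && !scsi_device_blocked(sdev))
 *		blk_delay_queue(q, SCSI_QUEUE_DELAY);
 */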
Re: [PATCH v3 5/6] virtio-scsi: fix various bad behavior on aborted requests
On 6/4/14, Paolo Bonzini <pbonz...@redhat.com> wrote:
> On 04/06/2014 19:29, Venkatesh Srinivas wrote:
>> Do you really want to poll the request VQs for completions if the TMF
>> was rejected?
>
> I wasn't sure, but bugs in this path are hard enough that I preferred
> the safer patch.

Ok. As long as it was deliberate. I'd slightly prefer only doing so in the
success case, but simplicity is a compelling argument :)

Reviewed-by: Venkatesh Srinivas <venkate...@google.com>

-- vs;
Re: [PATCH v3 5/6] virtio-scsi: fix various bad behavior on aborted requests
On 6/4/14, Paolo Bonzini <pbonz...@redhat.com> wrote:
> Even though the virtio-scsi spec guarantees that all requests related
> to the TMF will have been completed by the time the TMF itself completes,
> the request queue's callback might not have run yet. This causes requests
> to be completed more than once, and as a result triggers a variety of
> BUGs or oopses.
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> ---
>  drivers/scsi/virtio_scsi.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index d66c4ee2c774..fda9fb35 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -235,6 +235,16 @@ static void virtscsi_req_done(struct virtqueue *vq)
>  	virtscsi_vq_done(vscsi, req_vq, virtscsi_complete_cmd);
>  };
>
> +static void virtscsi_poll_requests(struct virtio_scsi *vscsi)
> +{
> +	int i, num_vqs;
> +
> +	num_vqs = vscsi->num_queues;
> +	for (i = 0; i < num_vqs; i++)
> +		virtscsi_vq_done(vscsi, &vscsi->req_vqs[i],
> +				 virtscsi_complete_cmd);
> +}
> +
>  static void virtscsi_complete_free(struct virtio_scsi *vscsi, void *buf)
>  {
>  	struct virtio_scsi_cmd *cmd = buf;
> @@ -579,6 +589,18 @@ static int virtscsi_tmf(struct virtio_scsi *vscsi, struct virtio_scsi_cmd *cmd)
>  	    cmd->resp.tmf.response == VIRTIO_SCSI_S_FUNCTION_SUCCEEDED)
>  		ret = SUCCESS;
>
> +	/*
> +	 * The spec guarantees that all requests related to the TMF have
> +	 * been completed, but the callback might not have run yet if
> +	 * we're using independent interrupts (e.g. MSI). Poll the
> +	 * virtqueues once.
> +	 *
> +	 * In the abort case, sc->scsi_done will do nothing, because
> +	 * the block layer must have detected a timeout and as a result
> +	 * REQ_ATOM_COMPLETE has been set.
> +	 */
> +	virtscsi_poll_requests(vscsi);

Do you really want to poll the request VQs for completions if the TMF
was rejected? TMF ABORT may return FUNCTION REJECTED if the command to
abort completed before the device saw the TMF ABORT message, for example.
In such cases, this would unnecessarily lengthen the EH path.

> +
>  out:
>  	mempool_free(cmd, virtscsi_cmd_pool);
>  	return ret;
> --
> 1.8.3.1

Thanks for looking into this,

-- vs;
Re: [PATCH v2 2/6] virtio-scsi: replace target spinlock with seqcount
On 6/4/14, Paolo Bonzini <pbonz...@redhat.com> wrote:
> From: Ming Lei <ming@canonical.com>
>
> The spinlock of tgt_lock is only for serializing read and write
> req_vq, one lockless seqcount is enough for the purpose.
>
> On one 16core VM with vhost-scsi backend, the patch can improve
> IOPS with 3% on random read test.
>
> Signed-off-by: Ming Lei <ming@canonical.com>
> [Add initialization in virtscsi_target_alloc. - Paolo]
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>

Reviewed-by: Venkatesh Srinivas <venkate...@google.com>

-- vs;
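[Editorial note: for readers who have not used seqcounts: the writer brackets its update with begin/end markers, and lockless readers retry if they observe a concurrent write. A minimal sketch of the pattern with hypothetical field names, not the actual virtio-scsi change; a seqcount does not serialize writers against each other, so a single writer is assumed:]

#include <linux/seqlock.h>

struct my_tgt {
	seqcount_t tgt_seq;		/* seqcount_init() at alloc time */
	struct virtqueue *req_vq;	/* the field being published */
};

/* Writer side. */
static void tgt_set_vq(struct my_tgt *tgt, struct virtqueue *vq)
{
	write_seqcount_begin(&tgt->tgt_seq);
	tgt->req_vq = vq;
	write_seqcount_end(&tgt->tgt_seq);
}

/* Reader side: lockless; loops only if it raced with a writer. */
static struct virtqueue *tgt_get_vq(struct my_tgt *tgt)
{
	struct virtqueue *vq;
	unsigned int seq;

	do {
		seq = read_seqcount_begin(&tgt->tgt_seq);
		vq = tgt->req_vq;
	} while (read_seqcount_retry(&tgt->tgt_seq, seq));
	return vq;
}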
Re: [PATCH] perf/x86: fix RAPL rdmsrl_safe() usage
On Wed, Apr 23, 2014 at 12:00 PM, Borislav Petkov <b...@alien8.de> wrote:
> On Wed, Apr 23, 2014 at 07:04:19PM +0200, Stephane Eranian wrote:
>>
>> This patch fixes a bug introduced by commit 24223657.
>
> Maybe add
>
> Fixes: <commit-id>
>
> tag?
>
>> The rdmsrl_safe() function returns 0 on success.
>> The current code was failing to detect the RAPL PMU
>> on real hardware (missing /sys/devices/power) because
>> the return value of rdmsrl_safe() was misinterpreted.
>>
>> Signed-off-by: Stephane Eranian <eran...@google.com>
>
> Acked-by: Borislav Petkov <b...@suse.de>

Brown-Paper-Bag-Worn-By: Venkatesh Srinivas <venkate...@google.com>

-- vs;
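[Editorial note: the gotcha, concretely. rdmsrl_safe() returns 0 when the RDMSR succeeded and non-zero when it faulted, so a probe must treat a non-zero return as "no PMU here". The inverted form below is the bug this thread fixes, and it is visible in the tip-bot commit that follows:]

u64 msr_rapl_power_unit_bits;

/* Correct: non-zero return means the RDMSR #GP'd -- bail out. */
if (rdmsrl_safe(MSR_RAPL_POWER_UNIT, &msr_rapl_power_unit_bits))
	return -1;

/* Buggy (what was merged): this bails out precisely when the read
 * *succeeded*, so the PMU never initializes on real hardware:
 *
 *	if (!rdmsrl_safe(MSR_RAPL_POWER_UNIT, &msr_rapl_power_unit_bits))
 *		return -1;
 */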
[tip:perf/core] perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU
Commit-ID:  24223657806a0ebd0ae5c9caaf7b021091889cf2
Gitweb:     http://git.kernel.org/tip/24223657806a0ebd0ae5c9caaf7b021091889cf2
Author:     Venkatesh Srinivas <venkate...@google.com>
AuthorDate: Thu, 13 Mar 2014 12:36:26 -0700
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Fri, 18 Apr 2014 12:14:26 +0200

perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU

CPUs which should support the RAPL counters according to
Family/Model/Stepping may still issue #GP when attempting to access
the RAPL MSRs. This may happen when Linux is running under KVM and
we are passing-through host F/M/S data, for example. Use rdmsrl_safe
to first access the RAPL_POWER_UNIT MSR; if this fails, do not
attempt to use this PMU.

Signed-off-by: Venkatesh Srinivas <venkate...@google.com>
Signed-off-by: Peter Zijlstra <pet...@infradead.org>
Link: http://lkml.kernel.org/r/1394739386-22260-1-git-send-email-venkate...@google.com
Cc: zheng.z@intel.com
Cc: eran...@google.com
Cc: a...@linux.intel.com
Cc: linux-kernel@vger.kernel.org
[ The patch also silently fixes another bug: rapl_pmu_init() didn't handle
  the memory alloc failure case previously. ]
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/kernel/cpu/perf_event_intel_rapl.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_rapl.c b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
index 4b9a9e9..7c87424 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -535,6 +535,7 @@ static int rapl_cpu_prepare(int cpu)
 	struct rapl_pmu *pmu = per_cpu(rapl_pmu, cpu);
 	int phys_id = topology_physical_package_id(cpu);
 	u64 ms;
+	u64 msr_rapl_power_unit_bits;
 
 	if (pmu)
 		return 0;
@@ -542,6 +543,9 @@ static int rapl_cpu_prepare(int cpu)
 	if (phys_id < 0)
 		return -1;
 
+	if (!rdmsrl_safe(MSR_RAPL_POWER_UNIT, &msr_rapl_power_unit_bits))
+		return -1;
+
 	pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
 	if (!pmu)
 		return -1;
@@ -555,8 +559,7 @@ static int rapl_cpu_prepare(int cpu)
 	 *
 	 * we cache in local PMU instance
 	 */
-	rdmsrl(MSR_RAPL_POWER_UNIT, pmu->hw_unit);
-	pmu->hw_unit = (pmu->hw_unit >> 8) & 0x1FULL;
+	pmu->hw_unit = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
 	pmu->pmu = &rapl_pmu_class;
 
 	/*
@@ -677,7 +680,9 @@ static int __init rapl_pmu_init(void)
 	cpu_notifier_register_begin();
 
 	for_each_online_cpu(cpu) {
-		rapl_cpu_prepare(cpu);
+		ret = rapl_cpu_prepare(cpu);
+		if (ret)
+			goto out;
 		rapl_cpu_init(cpu);
 	}
 
@@ -700,6 +705,7 @@ static int __init rapl_pmu_init(void)
 		hweight32(rapl_cntr_mask),
 		ktime_to_ms(pmu->timer_interval));
 
+out:
 	cpu_notifier_register_done();
 
 	return 0;
Re: [PATCH] virtio-blk: make the queue depth the max supportable by the hypervisor
On Wed, Mar 19, 2014 at 10:48 AM, Venkatesh Srinivas
<venkate...@google.com> wrote:
>> And I rewrote it substantially, mainly to take
>> VIRTIO_RING_F_INDIRECT_DESC into account.
>>
>> As QEMU sets the vq size for PCI to 128, Venkatash's patch wouldn't
>> have made a change. This version does (since QEMU also offers
>> VIRTIO_RING_F_INDIRECT_DESC.
>
> That divide-by-2 produced the same queue depth as the prior
> computation in QEMU was deliberate -- but raising it to 128 seems
> pretty reasonable.
>
> Signed-off-by: Venkatesh Srinivas <venkate...@google.com>

Soft ping about this patch.

Thanks,
-- vs;
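[Editorial note: a sketch of the computation being discussed. Without indirect descriptors, a block request consumes roughly two ring descriptors (header plus data), so the usable depth is about half the ring; with VIRTIO_RING_F_INDIRECT_DESC each request consumes a single ring slot, so the full ring size can be used. The helper name below is illustrative, not the final virtio-blk code:]

/* Hypothetical sizing helper: derive the block-layer queue depth from
 * the virtqueue size, depending on indirect-descriptor support. */
static unsigned int blk_queue_depth_for_vq(struct virtqueue *vq,
					   bool indirect)
{
	unsigned int ring_size = virtqueue_get_vring_size(vq);

	/* One slot per request with indirect descriptors; otherwise
	 * budget ~2 slots (out hdr + data) per request. */
	return indirect ? ring_size : ring_size / 2;
}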
Re: GPF in intel_pmu_lbr_reset() with qemu -cpu host
On Fri, Mar 21, 2014 at 10:46 AM, Peter Wu wrote:
> cc'ing kvm people and list.
>
> On Friday 21 March 2014 18:42:40 Peter Wu wrote:
>> Hi,
>>
>> While trying to run QEMU with `-enable-kvm -cpu host`, I get a GPF in
>> intel_pmu_lbr_reset():
>>
>> [0.024000] general protection fault: [#1]
>> [0.024000] CPU: 0 PID: 1 Comm: swapper Not tainted 3.14.0-rc7-qemu-00059-g08edb33 #14
>> [0.024000] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>> [0.024000] task: 88003e05 ti: 88003e054000 task.ti: 88003e054000
>> [0.024000] RIP: 0010:[8101148a] [8101148a] intel_pmu_lbr_reset+0x2a/0x80
>> [0.024000] RSP: :88003e055e78 EFLAGS: 0002
>> [0.024000] RAX: RBX: 0286 RCX: 0680
>> [0.024000] RDX: RSI: RDI:
>> [0.024000] RBP: 81622120 R08: 88003ffee0e0 R09: 88003e00bf00
>> [0.024000] R10: R11: 0004 R12:
>> [0.024000] R13: R14: R15:
>> [0.024000] FS: () GS:8161e000() knlGS:
>> [0.024000] CS: 0010 DS: ES: CR0: 80050033
>> [0.024000] CR2: 8800019bb000 CR3: 01611000 CR4: 001407b0
>> [0.024000] Stack:
>> [0.024000] 8101308a 8100e3da 8165ba62
>> [0.024000] 8165b5bd
>> [0.024000] 81655dcd
>> [0.024000] Call Trace:
>> [0.024000] [8101308a] ? intel_pmu_cpu_starting+0xa/0x80
>> [0.024000] [8100e3da] ? x86_pmu_notifier+0x5a/0xc0
>> [0.024000] [8165ba62] ? init_hw_perf_events+0x4a5/0x4dd
>> [0.024000] [8165b5bd] ? check_bugs+0x42/0x42
>> [0.024000] [81655dcd] ? do_one_initcall+0x76/0xf9
>> [0.024000] [81276b70] ? rest_init+0x70/0x70
>> [0.024000] [81655ea7] ? kernel_init_freeable+0x57/0x177
>> [0.024000] [81276b70] ? rest_init+0x70/0x70
>> [0.024000] [81276b75] ? kernel_init+0x5/0xe0
>> [0.024000] [8128067a] ? ret_from_fork+0x7a/0xb0
>> [0.024000] [81276b70] ? rest_init+0x70/0x70
>> [0.024000] Code: 00 8b 15 02 c4 63 00 85 d2 74 69 f6 05 af c3 63 00 3f 75 2d 85 d2 7e 5c 31 f6 31 c0 0f 1f 44 00 00 8b 0d d2 c3 63 00 89 c2 01 f1 <0f> 30 83 c6 01 3b 35 d3 c3 63 00 7c e9 f3 c3 0f 1f 80 00 00 00
>> [0.024000] RIP [8101148a] intel_pmu_lbr_reset+0x2a/0x80
>> [0.024000] RSP 88003e055e78
>> [0.024000] ---[ end trace ecbd794f78441b2c ]---
>> [0.024002] Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
>>
>> It possibly has something to do with the MSR write. Reproducible with:
>>
>> qemu-system-x86_64 -enable-kvm -cpu host -kernel bzImage -m 1G -serial file:ser.txt
>>
>> In the host dmesg, the following is visible when qemu runs:
>>
>> kvm [4939]: vcpu0 unhandled wrmsr: 0x680 data 0
>>
>> The full guest dmesg is shown below. The issue also occurs with
>> v3.13.6, v3.12.14, v3.10.33 (other versions are not tested).
>>
>> QEMU: 1.7.0
>> Host kernel: v3.14-rc5
>> Guest kernel: v3.14-rc7-59-g08edb33 (.config on the bottom)
>>
>> Kind regards,
>> Peter
>>
>> ### dmesg
>> [0.00] Linux version 3.14.0-rc7-qemu-00059-g08edb33 (pc@antartica) (gcc version 4.8.2 (Ubuntu 4.8.2-16ubuntu6) ) #14 Fri Mar 21 17:30:49 CET 2014
>> [0.00] Command line: console=ttyS0 loglevel=8
>> [0.00] KERNEL supported cpus:
>> [0.00]   Intel GenuineIntel
>> [0.00] e820: BIOS-provided physical RAM map:
>> [0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
>> [0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
>> [0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
>> [0.00] BIOS-e820: [mem 0x0010-0x3fffdfff] usable
>> [0.00] BIOS-e820: [mem 0x3fffe000-0x3fff] reserved
>> [0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] reserved
>> [0.00] BIOS-e820: [mem 0xfffc-0x] reserved
>> [0.00] NX (Execute Disable) protection: active
>> [0.00] SMBIOS 2.4 present.
>> [0.00] DMI: Bochs Bochs, BIOS Bochs 01/01/2011
>> [0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
>> [0.00] e820: remove [mem 0x000a-0x000f] usable
>> [0.00] e820: last_pfn = 0x3fffe max_arch_pfn = 0x4
>> [0.00] MTRR default type: write-back
>> [0.00] MTRR fixed ranges enabled:
>> [0.00]   0-9 write-back
>> [0.00]   A-B uncachable
>> [0.00]   C-F write-protect
>> [0.00] MTRR variable ranges enabled:
>> [0.00]   0 base 008000 mask FF8000 uncachable
>> [0.00]   1 disabled
>> [0.00]   2
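For context, the fault comes from unchecked MSR writes in the LBR reset path; MSR 0x680 from the host's "unhandled wrmsr" line is the first last-branch-record MSR. A simplified sketch of what the reset loop does (not the exact kernel code; x86_pmu.lbr_* are the perf driver's internal fields):

/* Simplified sketch: zero each last-branch from/to MSR pair. Under
 * KVM these MSRs are unimplemented, so the unchecked wrmsrl() takes
 * the #GP seen in the trace above. */
static void example_lbr_reset(void)
{
	int i;

	for (i = 0; i < x86_pmu.lbr_nr; i++) {
		wrmsrl(x86_pmu.lbr_from + i, 0);
		wrmsrl(x86_pmu.lbr_to + i, 0);
	}
}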
Re: [PATCH] virtio-blk: make the queue depth the max supportable by the hypervisor
> And I rewrote it substantially, mainly to take
> VIRTIO_RING_F_INDIRECT_DESC into account.
>
> As QEMU sets the vq size for PCI to 128, Venkatesh's patch wouldn't
> have made a change. This version does (since QEMU also offers
> VIRTIO_RING_F_INDIRECT_DESC).

That the divide-by-2 produced the same queue depth as the prior
computation in QEMU was deliberate -- but raising it to 128 seems
pretty reasonable.

Signed-off-by: Venkatesh Srinivas

-- vs;
Re: [PATCH] perf/x86/intel: Use rdmsrl_safe when initializing RAPL PMU.
On Fri, Mar 14, 2014 at 10:57:58AM -0600, David Ahern wrote:
> On 3/14/14, 10:17 AM, Andi Kleen wrote:
>>> The Intel ISR section for RDMSR seems to say: "Specifying a reserved
>>> or unimplemented MSR address in ECX will also cause a general
>>> protection exception". From a guest's perspective, MSR_RAPL_POWER_UNIT
>>> is unimplemented; kvm matches this behavior.
>>
>> MSRs are model specific and defined per model number. If you report a
>> model number you're expected to implement the MSRs defined for that
>> model number.
>>
>> AFAIK Xen just reports 0 for unknown MSRs (and I'm surprised KVM
>> doesn't too). I would suggest to fix KVM.
>
> I believe the ignore_msrs parameter to kvm handles that.
>
> David

Hi,

cc-ing the virtualization mailing list for more detail on the kvm
default for ignore_msrs (it defaults off).

1) Just returning 0 for unsupported MSRs is not workable -- 0 may be a
meaningful value for an MSR. RDMSR/WRMSR already have a mechanism for
out-of-band errors: #GP.

2) #GP has been KVM's default behavior for quite some time. Even if we
believe changing KVM's default is appropriate, Linux w/ the RAPL PMU
code enabled will fail to boot on existing KVM versions. W/ this
change, Linux will boot on prior KVM versions.

Thanks,
-- vs;
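The probe-before-use idiom that follows from point (1) is small. A minimal sketch, assuming the kernel convention that rdmsrl_safe() returns 0 on success and non-zero when the access faulted (the function name here is illustrative):

#include <linux/errno.h>
#include <asm/msr.h>

/* Minimal sketch: rdmsrl_safe() traps the #GP from an unimplemented
 * MSR and reports it as an error code instead of an oops. */
static int example_probe_rapl_unit(u64 *units)
{
	if (rdmsrl_safe(MSR_RAPL_POWER_UNIT, units))
		return -ENODEV;	/* MSR faulted, e.g. running under KVM */

	return 0;
}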
Re: [PATCH] perf/x86/intel: Use rdmsrl_safe when initializing RAPL PMU.
On Fri, Mar 14, 2014 at 6:56 AM, Andi Kleen wrote:
> On Fri, Mar 14, 2014 at 09:44:29AM +0100, Peter Zijlstra wrote:
>> On Thu, Mar 13, 2014 at 12:36:26PM -0700, Venkatesh Srinivas wrote:
>>> CPUs which should support the RAPL counters according to
>>> Family/Model/Stepping may still issue #GP when attempting to access
>>> the RAPL MSRs. This may happen when Linux is running under KVM and
>>> we are passing-through host F/M/S data, for example. Use rdmsrl_safe
>>> to first access the RAPL_POWER_UNIT MSR; if this fails, do not
>>> attempt to use this PMU.
>>
>> This kvm thing is getting tedious :-(
>
> It sounds bogus to me too, VMs are supposed to return 0
> on reading unknown MSRs, no #GP.

The Intel ISR section for RDMSR seems to say: "Specifying a reserved
or unimplemented MSR address in ECX will also cause a general
protection exception".

From a guest's perspective, MSR_RAPL_POWER_UNIT is unimplemented; kvm
matches this behavior.

Thanks,
-- vs;
[PATCH] perf/x86/intel: Use rdmsrl_safe when initializing RAPL PMU.
CPUs which should support the RAPL counters according to
Family/Model/Stepping may still issue #GP when attempting to access
the RAPL MSRs. This may happen when Linux is running under KVM and
we are passing-through host F/M/S data, for example. Use rdmsrl_safe
to first access the RAPL_POWER_UNIT MSR; if this fails, do not
attempt to use this PMU.

Signed-off-by: Venkatesh Srinivas
---
 arch/x86/kernel/cpu/perf_event_intel_rapl.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_rapl.c b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
index 5ad35ad..95700e5 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -511,6 +511,7 @@ static int rapl_cpu_prepare(int cpu)
 	struct rapl_pmu *pmu = per_cpu(rapl_pmu, cpu);
 	int phys_id = topology_physical_package_id(cpu);
 	u64 ms;
+	u64 msr_rapl_power_unit_bits;

 	if (pmu)
 		return 0;
@@ -518,6 +519,9 @@ static int rapl_cpu_prepare(int cpu)
 	if (phys_id < 0)
 		return -1;

+	if (!rdmsrl_safe(MSR_RAPL_POWER_UNIT, &msr_rapl_power_unit_bits))
+		return -1;
+
 	pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
 	if (!pmu)
 		return -1;
@@ -531,8 +535,7 @@ static int rapl_cpu_prepare(int cpu)
 	 *
 	 * we cache in local PMU instance
 	 */
-	rdmsrl(MSR_RAPL_POWER_UNIT, pmu->hw_unit);
-	pmu->hw_unit = (pmu->hw_unit >> 8) & 0x1FULL;
+	pmu->hw_unit = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
 	pmu->pmu = &rapl_pmu_class;

 	/*
@@ -649,7 +652,9 @@ static int __init rapl_pmu_init(void)
 	get_online_cpus();

 	for_each_online_cpu(cpu) {
-		rapl_cpu_prepare(cpu);
+		ret = rapl_cpu_prepare(cpu);
+		if (ret)
+			goto out;
 		rapl_cpu_init(cpu);
 	}

@@ -672,6 +677,7 @@ static int __init rapl_pmu_init(void)
 		hweight32(rapl_cntr_mask),
 		ktime_to_ms(pmu->timer_interval));

+out:
 	put_online_cpus();

 	return 0;
--
1.9.0.279.gdc9e3eb
[PATCH] vhost/scsi: Check LUN structure byte 0 is set to 1, per spec
The virtio spec requires byte 0 of the virtio-scsi LUN structure
to be '1'.

Signed-off-by: Venkatesh Srinivas
---
 drivers/vhost/scsi.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 0a025b8..e48d4a6 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1001,6 +1001,12 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
 			break;
 		}

+		/* virtio-scsi spec requires byte 0 of the lun to be 1 */
+		if (unlikely(v_req.lun[0] != 1)) {
+			vhost_scsi_send_bad_target(vs, vq, head, out);
+			continue;
+		}
+
 		/* Extract the tpgt */
 		target = v_req.lun[1];
 		tpg = ACCESS_ONCE(vs_tpg[target]);
--
1.8.5.3
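For reference, the 8-byte virtio-scsi LUN field encodes a single-level LUN as: byte 0 = 1, byte 1 = target, bytes 2-3 = LUN in SAM format. A hedged sketch of the decode (the helper name is invented; the driver open-codes these steps):

/* Illustrative decode of the single-level virtio-scsi LUN layout. */
static inline bool example_parse_virtio_scsi_lun(const u8 lun[8],
						 u8 *target, u16 *sam_lun)
{
	if (lun[0] != 1)	/* the check this patch enforces */
		return false;

	*target = lun[1];
	*sam_lun = ((lun[2] << 8) | lun[3]) & 0x3fff;
	return true;
}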
Re: [PATCH V6 4/5] virtio-scsi: introduce multiqueue support
On Wed, Mar 20, 2013 at 03:01:23PM +0800, Wanlong Gao wrote:
> From: Paolo Bonzini
>
> This patch adds queue steering to virtio-scsi. When a target is sent
> multiple requests, we always drive them to the same queue so that FIFO
> processing order is kept. However, if a target was idle, we can choose
> a queue arbitrarily. In this case the queue is chosen according to the
> current VCPU, so the driver expects the number of request queues to be
> equal to the number of VCPUs. This makes it easy and fast to select
> the queue, and also lets the driver optimize the IRQ affinity for the
> virtqueues (each virtqueue's affinity is set to the CPU that "owns"
> the queue).
>
> The speedup comes from improving cache locality and giving CPU
> affinity to the virtqueues, which is why this scheme was selected.
> Assuming that the thread that is sending requests to the device is
> I/O-bound, it is likely to be sleeping at the time the ISR is
> executed, and thus executing the ISR on the same processor that sent
> the requests is cheap.
>
> However, the kernel will not execute the ISR on the "best" processor
> unless you explicitly set the affinity. This is because in practice
> you will have many such I/O-bound processes and thus many otherwise
> idle processors. Then the kernel will execute the ISR on a random
> processor, rather than the one that is sending requests to the device.
>
> The alternative to per-CPU virtqueues is per-target virtqueues. To
> achieve the same locality, we could dynamically choose the virtqueue's
> affinity based on the CPU of the last task that sent a request. This
> is less appealing because we do not set the affinity directly---we
> only provide a hint to the irqbalanced running in userspace.
> Dynamically changing the affinity only works if the userspace applies
> the hint fast enough.

Looks good! Tested as V5.

Tested-by: Venkatesh Srinivas
-- vs;
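The steering rule described above is compact; roughly (simplified from the patch, with the per-target lock around the rebinding elided, and field names following the patch):

/* Sketch: keep a busy target on its bound queue to preserve FIFO
 * order; rebind an idle target to the current CPU's queue. */
static struct virtio_scsi_vq *example_pick_vq(struct virtio_scsi *vscsi,
					      struct virtio_scsi_target_state *tgt)
{
	if (atomic_inc_return(&tgt->reqs) > 1)	/* requests still in flight */
		return tgt->req_vq;

	/* Target was idle: bind it to the queue of the submitting CPU. */
	tgt->req_vq = &vscsi->req_vqs[smp_processor_id() % vscsi->num_queues];
	return tgt->req_vq;
}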
Re: [PATCH V5 4/5] virtio-scsi: introduce multiqueue support
This looks pretty good! I rather like the (lack of) locking in I/O
completion (around the req count vs. target/queue binding).

It is unfortunate that you need to hold the per-target lock in
virtscsi_pick_vq() though; have any idea how much that lock hurts?

Just two minor comments:

(in struct virtio_scsi_target_data):
+ /* This spinlock never help at the same time as vq_lock. */
held?

(in struct virtio_scsi):
+ /* Does the affinity hint is set for virtqueues? */
Could you rephrase that, please?

Tested on qemu and w/ Google Compute Engine's virtio-scsi device.

Reviewed-and-tested-by: Venkatesh Srinivas

Thanks,
-- vs;
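On the affinity-hint comments: the hint in question is set with virtqueue_set_affinity(). A rough sketch, assuming exactly one request virtqueue per online CPU as in this series:

/* Rough sketch: give each request vq an IRQ-affinity hint for the CPU
 * that owns it; irqbalance may then move the interrupt accordingly. */
static void example_virtscsi_set_affinity(struct virtio_scsi *vscsi)
{
	int i = 0, cpu;

	for_each_online_cpu(cpu)
		virtqueue_set_affinity(vscsi->req_vqs[i++].vq, cpu);
}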