Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support

2024-03-21 Thread Dongli Zhang
Hi Jonah,

Would you mind helping explain how does VIRTIO_F_IN_ORDER improve the 
performance?

https://lore.kernel.org/all/20240321155717.1392787-1-jonah.pal...@oracle.com/#t

I tried to look for it from prior discussions but could not find why.

https://lore.kernel.org/all/byapr18mb2791df7e6c0f61e2d8698e8fa0...@byapr18mb2791.namprd18.prod.outlook.com/

Thank you very much!

Dongli Zhang

On 3/21/24 08:57, Jonah Palmer wrote:
> The goal of these patches is to add support to a variety of virtio and
> vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
> indicates that all buffers are used by the device in the same order in
> which they were made available by the driver.
> 
> These patches attempt to implement a generalized, non-device-specific
> solution to support this feature.
> 
> The core feature behind this solution is a buffer mechanism in the form
> of GLib's GHashTable. The decision behind using a hash table was to
> leverage their ability for quick lookup, insertion, and removal
> operations. Given that our keys are simply numbers of an ordered
> sequence, a hash table seemed like the best choice for a buffer
> mechanism.
> 
> -
> 
> The strategy behind this implementation is as follows:
> 
> We know that buffers that are popped from the available ring and enqueued
> for further processing will always done in the same order in which they
> were made available by the driver. Given this, we can note their order
> by assigning the resulting VirtQueueElement a key. This key is a number
> in a sequence that represents the order in which they were popped from
> the available ring, relative to the other VirtQueueElements.
> 
> For example, given 3 "elements" that were popped from the available
> ring, we assign a key value to them which represents their order (elem0
> is popped first, then elem1, then lastly elem2):
> 
>  elem2   --  elem1   --  elem0   ---> Enqueue for processing
> (key: 2)(key: 1)(key: 0)
> 
> Then these elements are enqueued for further processing by the host.
> 
> While most devices will return these completed elements in the same
> order in which they were enqueued, some devices may not (e.g.
> virtio-blk). To guarantee that these elements are put on the used ring
> in the same order in which they were enqueued, we can use a buffering
> mechanism that keeps track of the next expected sequence number of an
> element.
> 
> In other words, if the completed element does not have a key value that
> matches the next expected sequence number, then we know this element is
> not in-order and we must stash it away in a hash table until an order
> can be made. The element's key value is used as the key for placing it
> in the hash table.
> 
> If the completed element has a key value that matches the next expected
> sequence number, then we know this element is in-order and we can push
> it on the used ring. Then we increment the next expected sequence number
> and check if the hash table contains an element at this key location.
> 
> If so, we retrieve this element, push it to the used ring, delete the
> key-value pair from the hash table, increment the next expected sequence
> number, and check the hash table again for an element at this new key
> location. This process is repeated until we're unable to find an element
> in the hash table to continue the order.
> 
> So, for example, say the 3 elements we enqueued were completed in the
> following order: elem1, elem2, elem0. The next expected sequence number
> is 0:
> 
> exp-seq-num = 0:
> 
>  elem1   --> elem1.key == exp-seq-num ? --> No, stash it
> (key: 1) |
>  |
>  v
>
>|key: 1 - elem1|
>
> -
> exp-seq-num = 0:
> 
>  elem2   --> elem2.key == exp-seq-num ? --> No, stash it
> (key: 2) |
>  |
>  v
>
>|key: 1 - elem1|
>|--|
>|key: 2 - elem2|
>
> -
>

Re: [PATCH 0/2] scsi: to fix issue on passing host_status to the guest kernel

2022-01-10 Thread Dongli Zhang
ping?

Thank you very much!

Dongli Zhang

On 12/10/21 6:16 AM, Dongli Zhang wrote:
> This patchset fixes the issue on passing 'host_status' to the guest kernel.
> 
> The 1st patch fixes the erroneous usage of req->host_status.
> 
> The 2nd patch is to pass the SCSI_HOST_ERROR to the guest kernel when the
> req->bus->info->fail() is not implemented. I do not add 'Fixes:' because I
> am not sure if to not pass SCSI_HOST_ERROR was on purpose, especially for
> security reason.
> 
> Thank you very much!
> 
> Dongli Zhang
> 
> 
> 



[PATCH 1/2] scsi/scsi_bus: use host_status as parameter for scsi_sense_from_host_status()

2021-12-10 Thread Dongli Zhang
The scsi_sense_from_host_status() always returns GOOD since
req->host_status is 255 (-1) at this time. Change req->host_status to
host_status so that scsi_sense_from_host_status() will be able to return
the expected value.

Fixes: f3126d65b393("scsi: move host_status handling into SCSI drivers")
Cc: Joe Jin 
Signed-off-by: Dongli Zhang 
---
 hw/scsi/scsi-bus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 77325d8cc7..d46650bd8c 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -1465,7 +1465,7 @@ void scsi_req_complete_failed(SCSIRequest *req, int 
host_status)
 assert(req->ops != _unit_attention);
 
 if (!req->bus->info->fail) {
-status = scsi_sense_from_host_status(req->host_status, );
+status = scsi_sense_from_host_status(host_status, );
 if (status == CHECK_CONDITION) {
 scsi_req_build_sense(req, sense);
 }
-- 
2.17.1




[PATCH 2/2] scsi/utils: pass host_status = SCSI_HOST_ERROR to guest kernel

2021-12-10 Thread Dongli Zhang
For scsi_req_complete_failed() and when the req->bus->info->fail() is
implemented, the virtio-scsi passes SCSI_HOST_ERROR to the guest kernel as
VIRTIO_SCSI_S_FAILURE, while the pvscsi passes SCSI_HOST_ERROR to guest
kernel as BTSTAT_HASOFTWARE.

However, the scsi_req_complete_failed()->scsi_sense_from_host_status()
always returns GOOD for SCSI_HOST_ERROR, when the req->bus->info->fail() is
not implemented (e.g., megasas). As a result, the sense is not passed to
the guest kernel.

The SCSI_HOST_ERROR is reproduced on purpose by below QEMU command line:

-device megasas,id=vscsi0,bus=pci.0,addr=0x4 \
-drive file=/dev/sdc,format=raw,if=none,id=drive01 \
-device scsi-block,bus=vscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive01 \

... and when we remove the /dev/sdc from the host with:

"echo 1 > /sys/block/sdc/device/delete"

This patch passes sense_code_IO_ERROR to the guest kernel for
host_status = SCSI_HOST_ERROR.

(This issue is detected by running a testing code from Rui Loura).

Cc: Joe Jin 
Cc: Adnan Misherfi 
Cc: Rui Loura 
Signed-off-by: Dongli Zhang 
---
 scsi/utils.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/scsi/utils.c b/scsi/utils.c
index 357b036671..086a1fea66 100644
--- a/scsi/utils.c
+++ b/scsi/utils.c
@@ -638,6 +638,9 @@ int scsi_sense_from_host_status(uint8_t host_status,
 case SCSI_HOST_ABORTED:
 *sense = SENSE_CODE(COMMAND_ABORTED);
 return CHECK_CONDITION;
+case SCSI_HOST_ERROR:
+*sense = SENSE_CODE(IO_ERROR);
+return CHECK_CONDITION;
 case SCSI_HOST_RESET:
 *sense = SENSE_CODE(RESET);
 return CHECK_CONDITION;
-- 
2.17.1




[PATCH 0/2] scsi: to fix issue on passing host_status to the guest kernel

2021-12-10 Thread Dongli Zhang
This patchset fixes the issue on passing 'host_status' to the guest kernel.

The 1st patch fixes the erroneous usage of req->host_status.

The 2nd patch is to pass the SCSI_HOST_ERROR to the guest kernel when the
req->bus->info->fail() is not implemented. I do not add 'Fixes:' because I
am not sure if to not pass SCSI_HOST_ERROR was on purpose, especially for
security reason.

Thank you very much!

Dongli Zhang





Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-07 Thread Dongli Zhang



On 4/6/21 7:20 PM, Jason Wang wrote:
> 
> 在 2021/4/7 上午7:27, Dongli Zhang 写道:
>>> This will answer your question that "Can it bypass the masking?".
>>>
>>> For vhost-scsi, virtio-blk, virtio-scsi and virtio-net, to write to eventfd 
>>> is
>>> not able to bypass masking because masking is to unregister the eventfd. To
>>> write to eventfd does not take effect.
>>>
>>> However, it is possible to bypass masking for vhost-net because vhost-net
>>> registered a specific masked_notifier eventfd in order to mask irq. To 
>>> write to
>>> original eventfd still takes effect.
>>>
>>> We may leave the user to decide whether to write to 'masked_notifier' or
>>> original 'guest_notifier' for vhost-net.
>> My fault here. To write to masked_notifier will always be masked:(
> 
> 
> Only when there's no bug in the qemu.
> 
> 
>>
>> If it is EventNotifier level, we will not care whether the EventNotifier is
>> masked or not. It just provides an interface to write to EventNotifier.
> 
> 
> Yes.
> 
> 
>>
>> To dump the MSI-x table for both virtio and vfio will help confirm if the 
>> vector
>> is masked.
> 
> 
> That would be helpful as well. It's probably better to extend "info pci" 
> command.
> 
> Thanks

I will try if to add to "info pci" (introduce new arg option to "info pci"), or
to introduce new command.

About the EventNotifier, I will classify them as guest notifier or host notifier
so that it will be much more easier for user to tell if the eventfd is for
injecting IRQ or kicking the doorbell.

Thank you very much for all suggestions!

Dongli Zhang

> 
> 
>>
>> Thank you very much!
>>
>> Dongli Zhang
>>
> 



Re: [PATCH 1/6] qdev: introduce qapi/hmp command for kick/call event

2021-04-07 Thread Dongli Zhang



On 4/7/21 6:40 AM, Eduardo Habkost wrote:
> On Thu, Mar 25, 2021 at 10:44:28PM -0700, Dongli Zhang wrote:
>> The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
>> the loss of doorbell kick, e.g.,
>>
>> https://urldefense.com/v3/__https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html__;!!GqivPVa7Brio!NaqdV_o0gMJkUtVWaHyLRwKDa_8MsiuANAqEcM-Ooy4pYE3R1bwPmLdCTkE0gq6gywY$
>>  
>>
>> ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
>> fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").
>>
>> This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
>> to help narrow down if the issue is due to loss of irq/kick. So far the new
>> interface handles only two events: 'call' and 'kick'. Any device (e.g.,
>> virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
>> or legacy IRQ).
>>
>> The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
>> vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
>> on purpose by admin at QEMU/host side for a specific device.
>>
>> Signed-off-by: Dongli Zhang 
> [...]
>> diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
>> index 605d57287a..c7795d4ba5 100644
>> --- a/include/monitor/hmp.h
>> +++ b/include/monitor/hmp.h
>> @@ -129,5 +129,6 @@ void hmp_info_replay(Monitor *mon, const QDict *qdict);
>>  void hmp_replay_break(Monitor *mon, const QDict *qdict);
>>  void hmp_replay_delete_break(Monitor *mon, const QDict *qdict);
>>  void hmp_replay_seek(Monitor *mon, const QDict *qdict);
>> +void hmp_x_debug_device_event(Monitor *mon, const QDict *qdict);
>>  
>>  #endif
>> diff --git a/qapi/qdev.json b/qapi/qdev.json
>> index b83178220b..711c4a297a 100644
>> --- a/qapi/qdev.json
>> +++ b/qapi/qdev.json
>> @@ -124,3 +124,33 @@
>>  ##
>>  { 'event': 'DEVICE_DELETED',
>>'data': { '*device': 'str', 'path': 'str' } }
>> +
>> +##
>> +# @x-debug-device-event:
>> +#
>> +# Generate device event for a specific device queue
>> +#
>> +# @dev: device path
>> +#
>> +# @event: event (e.g., kick or call) to trigger
> 
> Any specific reason to not use an enum here?
> 
> In addition to making the QAPI schema and documentation more
> descriptive, it would save you the work of manually defining the
> DEVICE_EVENT_* constants and implementing get_device_event().

Thank you very much for the suggestion!

I will use enum in json file.

Dongli Zhang

> 
> 
>> +#
>> +# @queue: queue id
>> +#
>> +# Returns: Nothing on success
>> +#
>> +# Since: 6.1
>> +#
>> +# Notes: This is used to debug VM driver hang issue. The 'kick' event is to
>> +#send notification to QEMU/vhost while the 'call' event is to
>> +#interrupt VM on purpose.
>> +#
>> +# Example:
>> +#
>> +# -> { "execute": "x-debug-device_event",
>> +#  "arguments": { "dev": "/machine/peripheral/vscsi0", "event": "kick",
>> +# "queue": 1 } }
>> +# <- { "return": {} }
>> +#
>> +##
>> +{ 'command': 'x-debug-device-event',
>> +  'data': {'dev': 'str', 'event': 'str', 'queue': 'int'} }
> [...]
> 



Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-06 Thread Dongli Zhang



On 4/6/21 1:43 AM, Dongli Zhang wrote:
> 
> 
> On 4/5/21 6:55 PM, Jason Wang wrote:
>>
>> 在 2021/4/6 上午4:00, Dongli Zhang 写道:
>>>
>>> On 4/1/21 8:47 PM, Jason Wang wrote:
>>>> 在 2021/3/30 下午3:29, Dongli Zhang 写道:
>>>>> On 3/28/21 8:56 PM, Jason Wang wrote:
>>>>>> 在 2021/3/27 上午5:16, Dongli Zhang 写道:
>>>>>>> Hi Jason,
>>>>>>>
>>>>>>> On 3/26/21 12:24 AM, Jason Wang wrote:
>>>>>>>> 在 2021/3/26 下午1:44, Dongli Zhang 写道:
>>>>>>>>> The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due 
>>>>>>>>> to
>>>>>>>>> the loss of doorbell kick, e.g.,
>>>>>>>>>
>>>>>>>>> https://urldefense.com/v3/__https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html__;!!GqivPVa7Brio!KS3pAU2cKjz4wgI4QSlE-YsJPhPG71nkE5_tGhaOf7mi_xvNxbvKkfn03rk5BNDLSEU$
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
>>>>>>>>> fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").
>>>>>>>>>
>>>>>>>>> This patch introduces a new debug interface 'DeviceEvent' to 
>>>>>>>>> DeviceClass
>>>>>>>>> to help narrow down if the issue is due to loss of irq/kick. So far 
>>>>>>>>> the new
>>>>>>>>> interface handles only two events: 'call' and 'kick'. Any device 
>>>>>>>>> (e.g.,
>>>>>>>>> virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, 
>>>>>>>>> MSI-X
>>>>>>>>> or legacy IRQ).
>>>>>>>>>
>>>>>>>>> The 'call' is to inject irq on purpose by admin for a specific device
>>>>>>>>> (e.g.,
>>>>>>>>> vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the 
>>>>>>>>> doorbell
>>>>>>>>> on purpose by admin at QEMU/host side for a specific device.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This device can be used as a workaround if call/kick is lost due to
>>>>>>>>> virtualization software (e.g., kernel or QEMU) issue.
>>>>>>>>>
>>>>>>>>> We may also implement the interface for VFIO PCI, e.g., to write to
>>>>>>>>> VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to 
>>>>>>>>> VM
>>>>>>>>> on purpose. This is considered future work once the virtio part is 
>>>>>>>>> done.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Below is from live crash analysis. Initially, the queue=2 has 
>>>>>>>>> count=15 for
>>>>>>>>> 'kick' eventfd_ctx. Suppose there is data in vring avail while there 
>>>>>>>>> is no
>>>>>>>>> used available. We suspect this is because vhost-scsi was not 
>>>>>>>>> notified by
>>>>>>>>> VM. In order to narrow down and analyze the issue, we use live crash 
>>>>>>>>> to
>>>>>>>>> dump the current counter of eventfd for queue=2.
>>>>>>>>>
>>>>>>>>> crash> eventfd_ctx 8f67f6bbe700
>>>>>>>>> struct eventfd_ctx {
>>>>>>>>>   kref = {
>>>>>>>>>     refcount = {
>>>>>>>>>   refs = {
>>>>>>>>>     counter = 4
>>>>>>>>>   }
>>>>>>>>>     }
>>>>>>>>>   },
>>>>>>>>>   wqh = {
>>>>>>>>>     lock = {
>>>>>>>>>   {
>>>>>>>>>     rlock = {
>>>>>>>>>   raw_lock = {
>>>>>>>>>     val = {
>>>>>>>>>   counter = 0
>>>>>&

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-06 Thread Dongli Zhang



On 4/5/21 6:55 PM, Jason Wang wrote:
> 
> 在 2021/4/6 上午4:00, Dongli Zhang 写道:
>>
>> On 4/1/21 8:47 PM, Jason Wang wrote:
>>> 在 2021/3/30 下午3:29, Dongli Zhang 写道:
>>>> On 3/28/21 8:56 PM, Jason Wang wrote:
>>>>> 在 2021/3/27 上午5:16, Dongli Zhang 写道:
>>>>>> Hi Jason,
>>>>>>
>>>>>> On 3/26/21 12:24 AM, Jason Wang wrote:
>>>>>>> 在 2021/3/26 下午1:44, Dongli Zhang 写道:
>>>>>>>> The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due 
>>>>>>>> to
>>>>>>>> the loss of doorbell kick, e.g.,
>>>>>>>>
>>>>>>>> https://urldefense.com/v3/__https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html__;!!GqivPVa7Brio!KS3pAU2cKjz4wgI4QSlE-YsJPhPG71nkE5_tGhaOf7mi_xvNxbvKkfn03rk5BNDLSEU$
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
>>>>>>>> fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").
>>>>>>>>
>>>>>>>> This patch introduces a new debug interface 'DeviceEvent' to 
>>>>>>>> DeviceClass
>>>>>>>> to help narrow down if the issue is due to loss of irq/kick. So far 
>>>>>>>> the new
>>>>>>>> interface handles only two events: 'call' and 'kick'. Any device (e.g.,
>>>>>>>> virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, 
>>>>>>>> MSI-X
>>>>>>>> or legacy IRQ).
>>>>>>>>
>>>>>>>> The 'call' is to inject irq on purpose by admin for a specific device
>>>>>>>> (e.g.,
>>>>>>>> vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the 
>>>>>>>> doorbell
>>>>>>>> on purpose by admin at QEMU/host side for a specific device.
>>>>>>>>
>>>>>>>>
>>>>>>>> This device can be used as a workaround if call/kick is lost due to
>>>>>>>> virtualization software (e.g., kernel or QEMU) issue.
>>>>>>>>
>>>>>>>> We may also implement the interface for VFIO PCI, e.g., to write to
>>>>>>>> VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to 
>>>>>>>> VM
>>>>>>>> on purpose. This is considered future work once the virtio part is 
>>>>>>>> done.
>>>>>>>>
>>>>>>>>
>>>>>>>> Below is from live crash analysis. Initially, the queue=2 has count=15 
>>>>>>>> for
>>>>>>>> 'kick' eventfd_ctx. Suppose there is data in vring avail while there 
>>>>>>>> is no
>>>>>>>> used available. We suspect this is because vhost-scsi was not notified 
>>>>>>>> by
>>>>>>>> VM. In order to narrow down and analyze the issue, we use live crash to
>>>>>>>> dump the current counter of eventfd for queue=2.
>>>>>>>>
>>>>>>>> crash> eventfd_ctx 8f67f6bbe700
>>>>>>>> struct eventfd_ctx {
>>>>>>>>   kref = {
>>>>>>>>     refcount = {
>>>>>>>>   refs = {
>>>>>>>>     counter = 4
>>>>>>>>   }
>>>>>>>>     }
>>>>>>>>   },
>>>>>>>>   wqh = {
>>>>>>>>     lock = {
>>>>>>>>   {
>>>>>>>>     rlock = {
>>>>>>>>   raw_lock = {
>>>>>>>>     val = {
>>>>>>>>   counter = 0
>>>>>>>>     }
>>>>>>>>   }
>>>>>>>>     }
>>>>>>>>   }
>>>>>>>>     },
>>>>>>>>     head = {
>>>>>>>>   next = 0x8f841dc08e18,
>>>>>>>>   prev = 0x8f841dc08e18
>>>>

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-05 Thread Dongli Zhang



On 4/1/21 8:47 PM, Jason Wang wrote:
> 
> 在 2021/3/30 下午3:29, Dongli Zhang 写道:
>>
>> On 3/28/21 8:56 PM, Jason Wang wrote:
>>> 在 2021/3/27 上午5:16, Dongli Zhang 写道:
>>>> Hi Jason,
>>>>
>>>> On 3/26/21 12:24 AM, Jason Wang wrote:
>>>>> 在 2021/3/26 下午1:44, Dongli Zhang 写道:
>>>>>> The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
>>>>>> the loss of doorbell kick, e.g.,
>>>>>>
>>>>>> https://urldefense.com/v3/__https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html__;!!GqivPVa7Brio!KS3pAU2cKjz4wgI4QSlE-YsJPhPG71nkE5_tGhaOf7mi_xvNxbvKkfn03rk5BNDLSEU$
>>>>>>
>>>>>>
>>>>>>
>>>>>> ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
>>>>>> fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").
>>>>>>
>>>>>> This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
>>>>>> to help narrow down if the issue is due to loss of irq/kick. So far the 
>>>>>> new
>>>>>> interface handles only two events: 'call' and 'kick'. Any device (e.g.,
>>>>>> virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, 
>>>>>> MSI-X
>>>>>> or legacy IRQ).
>>>>>>
>>>>>> The 'call' is to inject irq on purpose by admin for a specific device 
>>>>>> (e.g.,
>>>>>> vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the 
>>>>>> doorbell
>>>>>> on purpose by admin at QEMU/host side for a specific device.
>>>>>>
>>>>>>
>>>>>> This device can be used as a workaround if call/kick is lost due to
>>>>>> virtualization software (e.g., kernel or QEMU) issue.
>>>>>>
>>>>>> We may also implement the interface for VFIO PCI, e.g., to write to
>>>>>> VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
>>>>>> on purpose. This is considered future work once the virtio part is done.
>>>>>>
>>>>>>
>>>>>> Below is from live crash analysis. Initially, the queue=2 has count=15 
>>>>>> for
>>>>>> 'kick' eventfd_ctx. Suppose there is data in vring avail while there is 
>>>>>> no
>>>>>> used available. We suspect this is because vhost-scsi was not notified by
>>>>>> VM. In order to narrow down and analyze the issue, we use live crash to
>>>>>> dump the current counter of eventfd for queue=2.
>>>>>>
>>>>>> crash> eventfd_ctx 8f67f6bbe700
>>>>>> struct eventfd_ctx {
>>>>>>  kref = {
>>>>>>    refcount = {
>>>>>>  refs = {
>>>>>>    counter = 4
>>>>>>  }
>>>>>>    }
>>>>>>  },
>>>>>>  wqh = {
>>>>>>    lock = {
>>>>>>  {
>>>>>>    rlock = {
>>>>>>  raw_lock = {
>>>>>>    val = {
>>>>>>  counter = 0
>>>>>>    }
>>>>>>  }
>>>>>>    }
>>>>>>  }
>>>>>>    },
>>>>>>    head = {
>>>>>>  next = 0x8f841dc08e18,
>>>>>>  prev = 0x8f841dc08e18
>>>>>>    }
>>>>>>  },
>>>>>>  count = 15, ---> eventfd is 15 !!!
>>>>>>  flags = 526336
>>>>>> }
>>>>>>
>>>>>> Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
>>>>>> with this interface.
>>>>>>
>>>>>> { "execute": "x-debug-device-event",
>>>>>>  "arguments": { "dev": "/machine/peripheral/vscsi0",
>>>>>>     "event": "kick", "queue": 2 } }
>>>>>>
>>>>>> The counter is increased to 16. Suppose the hang issue is resolved, it
>>>>>> indicates something bad is in software that the 'kick' is los

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-03-30 Thread Dongli Zhang



On 3/28/21 8:56 PM, Jason Wang wrote:
> 
> 在 2021/3/27 上午5:16, Dongli Zhang 写道:
>> Hi Jason,
>>
>> On 3/26/21 12:24 AM, Jason Wang wrote:
>>> 在 2021/3/26 下午1:44, Dongli Zhang 写道:
>>>> The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
>>>> the loss of doorbell kick, e.g.,
>>>>
>>>> https://urldefense.com/v3/__https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html__;!!GqivPVa7Brio!KS3pAU2cKjz4wgI4QSlE-YsJPhPG71nkE5_tGhaOf7mi_xvNxbvKkfn03rk5BNDLSEU$
>>>>
>>>>
>>>> ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
>>>> fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").
>>>>
>>>> This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
>>>> to help narrow down if the issue is due to loss of irq/kick. So far the new
>>>> interface handles only two events: 'call' and 'kick'. Any device (e.g.,
>>>> virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
>>>> or legacy IRQ).
>>>>
>>>> The 'call' is to inject irq on purpose by admin for a specific device 
>>>> (e.g.,
>>>> vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
>>>> on purpose by admin at QEMU/host side for a specific device.
>>>>
>>>>
>>>> This device can be used as a workaround if call/kick is lost due to
>>>> virtualization software (e.g., kernel or QEMU) issue.
>>>>
>>>> We may also implement the interface for VFIO PCI, e.g., to write to
>>>> VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
>>>> on purpose. This is considered future work once the virtio part is done.
>>>>
>>>>
>>>> Below is from live crash analysis. Initially, the queue=2 has count=15 for
>>>> 'kick' eventfd_ctx. Suppose there is data in vring avail while there is no
>>>> used available. We suspect this is because vhost-scsi was not notified by
>>>> VM. In order to narrow down and analyze the issue, we use live crash to
>>>> dump the current counter of eventfd for queue=2.
>>>>
>>>> crash> eventfd_ctx 8f67f6bbe700
>>>> struct eventfd_ctx {
>>>>     kref = {
>>>>   refcount = {
>>>>     refs = {
>>>>   counter = 4
>>>>     }
>>>>   }
>>>>     },
>>>>     wqh = {
>>>>   lock = {
>>>>     {
>>>>   rlock = {
>>>>     raw_lock = {
>>>>   val = {
>>>>     counter = 0
>>>>   }
>>>>     }
>>>>   }
>>>>     }
>>>>   },
>>>>   head = {
>>>>     next = 0x8f841dc08e18,
>>>>     prev = 0x8f841dc08e18
>>>>   }
>>>>     },
>>>>     count = 15, ---> eventfd is 15 !!!
>>>>     flags = 526336
>>>> }
>>>>
>>>> Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
>>>> with this interface.
>>>>
>>>> { "execute": "x-debug-device-event",
>>>>     "arguments": { "dev": "/machine/peripheral/vscsi0",
>>>>    "event": "kick", "queue": 2 } }
>>>>
>>>> The counter is increased to 16. Suppose the hang issue is resolved, it
>>>> indicates something bad is in software that the 'kick' is lost.
>>> What do you mean by "software" here? And it looks to me you're testing 
>>> whether
>>> event_notifier_set() is called by virtio_queue_notify() here. If so, I'm not
>>> sure how much value could we gain from a dedicated debug interface like this
>>> consider there're a lot of exisinting general purpose debugging method like
>>> tracing or gdb. I'd say the path from virtio_queue_notify() to
>>> event_notifier_set() is only a very small fraction of the process of 
>>> virtqueue
>>> kick which is unlikey to be buggy. Consider usually the ioeventfd will be
>>> offloaded to KVM, it's more a chance that something is wrong in setuping
>>> ioeventfd instead of here. Irq is even more complicated.
>> Thank you very much!
>>
>> I am not testing whether event_notifier_set() is call

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-03-26 Thread Dongli Zhang
Hi Jason,

On 3/26/21 12:24 AM, Jason Wang wrote:
> 
> 在 2021/3/26 下午1:44, Dongli Zhang 写道:
>> The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
>> the loss of doorbell kick, e.g.,
>>
>> https://urldefense.com/v3/__https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html__;!!GqivPVa7Brio!KS3pAU2cKjz4wgI4QSlE-YsJPhPG71nkE5_tGhaOf7mi_xvNxbvKkfn03rk5BNDLSEU$
>>
>> ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
>> fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").
>>
>> This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
>> to help narrow down if the issue is due to loss of irq/kick. So far the new
>> interface handles only two events: 'call' and 'kick'. Any device (e.g.,
>> virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
>> or legacy IRQ).
>>
>> The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
>> vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
>> on purpose by admin at QEMU/host side for a specific device.
>>
>>
>> This device can be used as a workaround if call/kick is lost due to
>> virtualization software (e.g., kernel or QEMU) issue.
>>
>> We may also implement the interface for VFIO PCI, e.g., to write to
>> VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
>> on purpose. This is considered future work once the virtio part is done.
>>
>>
>> Below is from live crash analysis. Initially, the queue=2 has count=15 for
>> 'kick' eventfd_ctx. Suppose there is data in vring avail while there is no
>> used available. We suspect this is because vhost-scsi was not notified by
>> VM. In order to narrow down and analyze the issue, we use live crash to
>> dump the current counter of eventfd for queue=2.
>>
>> crash> eventfd_ctx 8f67f6bbe700
>> struct eventfd_ctx {
>>    kref = {
>>  refcount = {
>>    refs = {
>>  counter = 4
>>    }
>>  }
>>    },
>>    wqh = {
>>  lock = {
>>    {
>>  rlock = {
>>    raw_lock = {
>>  val = {
>>    counter = 0
>>  }
>>    }
>>  }
>>    }
>>  },
>>  head = {
>>    next = 0x8f841dc08e18,
>>    prev = 0x8f841dc08e18
>>  }
>>    },
>>    count = 15, ---> eventfd is 15 !!!
>>    flags = 526336
>> }
>>
>> Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
>> with this interface.
>>
>> { "execute": "x-debug-device-event",
>>    "arguments": { "dev": "/machine/peripheral/vscsi0",
>>   "event": "kick", "queue": 2 } }
>>
>> The counter is increased to 16. Suppose the hang issue is resolved, it
>> indicates something bad is in software that the 'kick' is lost.
> 
> 
> What do you mean by "software" here? And it looks to me you're testing whether
> event_notifier_set() is called by virtio_queue_notify() here. If so, I'm not
> sure how much value could we gain from a dedicated debug interface like this
> consider there're a lot of exisinting general purpose debugging method like
> tracing or gdb. I'd say the path from virtio_queue_notify() to
> event_notifier_set() is only a very small fraction of the process of virtqueue
> kick which is unlikey to be buggy. Consider usually the ioeventfd will be
> offloaded to KVM, it's more a chance that something is wrong in setuping
> ioeventfd instead of here. Irq is even more complicated.

Thank you very much!

I am not testing whether event_notifier_set() is called by 
virtio_queue_notify().

The 'software' indicates the data processing and event notification mechanism
involved with virtio/vhost PV driver frontend. E.g., while VM is waiting for an
extra IRQ, vhost side did not trigger IRQ, suppose vring_need_event()
erroneously returns false due to corrupted ring buffer status.

This was initially proposed for vhost only and I was going to export
ioeventfd/irqfd from vhost to admin via sysfs. Finally, I realized I would
better implement this at QEMU.

The QEMU inits the eventfd (ioeventfd and irqfd), and offloads them to
KVM/vhost. The VM side sends requests to ring buffer and kicks the doorbell (via
ioeventfd), while the backend vhost side sends responses back and calls the IRQ
(via ioeventfd).

Unfortunately, sometimes there is issue with virtio/vhost so that kick/call was
missed/ignored, or even never triggered

[PATCH 1/6] qdev: introduce qapi/hmp command for kick/call event

2021-03-25 Thread Dongli Zhang
The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
the loss of doorbell kick, e.g.,

https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html

... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").

This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
to help narrow down if the issue is due to loss of irq/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
or legacy IRQ).

The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
on purpose by admin at QEMU/host side for a specific device.

Signed-off-by: Dongli Zhang 
---
 hmp-commands.hx| 14 +++
 include/hw/qdev-core.h |  9 +++
 include/monitor/hmp.h  |  1 +
 qapi/qdev.json | 30 ++
 softmmu/qdev-monitor.c | 56 ++
 5 files changed, 110 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 435c591a1c..d74b895fff 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1725,3 +1725,17 @@ ERST
 .flags  = "p",
 },
 
+{
+.name   = "x-debug-device-event",
+.args_type  = "dev:s,event:s,queue:l",
+.params = "dev event queue",
+.help   = "generate device event for a specific device queue",
+.cmd= hmp_x_debug_device_event,
+.flags  = "p",
+},
+
+SRST
+``x-debug-device-event`` *dev* *event* *queue*
+  Generate device event *event* for specific *queue* of *dev*
+ERST
+
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index bafc311bfa..1ea8bf23b9 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -29,9 +29,17 @@ typedef enum DeviceCategory {
 DEVICE_CATEGORY_MAX
 } DeviceCategory;
 
+enum {
+DEVICE_EVENT_CALL,
+DEVICE_EVENT_KICK,
+DEVICE_EVENT_MAX
+};
+
 typedef void (*DeviceRealize)(DeviceState *dev, Error **errp);
 typedef void (*DeviceUnrealize)(DeviceState *dev);
 typedef void (*DeviceReset)(DeviceState *dev);
+typedef void (*DeviceEvent)(DeviceState *dev, int event, int queue,
+Error **errp);
 typedef void (*BusRealize)(BusState *bus, Error **errp);
 typedef void (*BusUnrealize)(BusState *bus);
 
@@ -132,6 +140,7 @@ struct DeviceClass {
 DeviceReset reset;
 DeviceRealize realize;
 DeviceUnrealize unrealize;
+DeviceEvent event;
 
 /* device state */
 const VMStateDescription *vmsd;
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 605d57287a..c7795d4ba5 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -129,5 +129,6 @@ void hmp_info_replay(Monitor *mon, const QDict *qdict);
 void hmp_replay_break(Monitor *mon, const QDict *qdict);
 void hmp_replay_delete_break(Monitor *mon, const QDict *qdict);
 void hmp_replay_seek(Monitor *mon, const QDict *qdict);
+void hmp_x_debug_device_event(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/qapi/qdev.json b/qapi/qdev.json
index b83178220b..711c4a297a 100644
--- a/qapi/qdev.json
+++ b/qapi/qdev.json
@@ -124,3 +124,33 @@
 ##
 { 'event': 'DEVICE_DELETED',
   'data': { '*device': 'str', 'path': 'str' } }
+
+##
+# @x-debug-device-event:
+#
+# Generate device event for a specific device queue
+#
+# @dev: device path
+#
+# @event: event (e.g., kick or call) to trigger
+#
+# @queue: queue id
+#
+# Returns: Nothing on success
+#
+# Since: 6.1
+#
+# Notes: This is used to debug VM driver hang issue. The 'kick' event is to
+#send notification to QEMU/vhost while the 'call' event is to
+#interrupt VM on purpose.
+#
+# Example:
+#
+# -> { "execute": "x-debug-device_event",
+#  "arguments": { "dev": "/machine/peripheral/vscsi0", "event": "kick",
+# "queue": 1 } }
+# <- { "return": {} }
+#
+##
+{ 'command': 'x-debug-device-event',
+  'data': {'dev': 'str', 'event': 'str', 'queue': 'int'} }
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index a9955b97a0..bca53111fb 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -924,6 +924,62 @@ void hmp_device_del(Monitor *mon, const QDict *qdict)
 hmp_handle_error(mon, err);
 }
 
+static const char * const device_events[DEVICE_EVENT_MAX] = {
+[DEVICE_EVENT_KICK] = "kick",
+[DEVICE_EVENT_CALL] = "call"
+};
+
+static int get_device_event(const char *event)
+{
+int evt;
+
+for (evt = 0; evt < ARRAY_SIZE(device_events); evt++) {
+if (!strcmp(device_events[evt], event)) {
+return evt;
+}
+}
+
+return -ENOENT;
+}
+
+void qmp_x_de

[PATCH 3/6] virtio-blk-pci: implement device event interface for kick/call

2021-03-25 Thread Dongli Zhang
This is to implement the device event interface for virtio-blk-pci.

Signed-off-by: Dongli Zhang 
---
 hw/block/virtio-blk.c  |  9 +
 hw/virtio/virtio-blk-pci.c | 10 ++
 include/hw/virtio/virtio-blk.h |  2 ++
 3 files changed, 21 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index d28979efb8..2b3583a913 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1118,6 +1118,15 @@ static int virtio_blk_load_device(VirtIODevice *vdev, 
QEMUFile *f,
 return 0;
 }
 
+void virtio_blk_device_event(DeviceState *dev, int event, int queue,
+ Error **errp)
+{
+VirtIOBlock *s = VIRTIO_BLK(dev);
+bool irqfd = s->dataplane_started && !s->dataplane_disabled;
+
+virtio_device_event(dev, event, queue, irqfd, errp);
+}
+
 static void virtio_resize_cb(void *opaque)
 {
 VirtIODevice *vdev = opaque;
diff --git a/hw/virtio/virtio-blk-pci.c b/hw/virtio/virtio-blk-pci.c
index 9d5795810c..f1fc72e7f1 100644
--- a/hw/virtio/virtio-blk-pci.c
+++ b/hw/virtio/virtio-blk-pci.c
@@ -47,6 +47,15 @@ static Property virtio_blk_pci_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+static void virtio_blk_pci_event(DeviceState *dev, int event, int queue,
+ Error **errp)
+{
+VirtIOBlkPCI *vblk = VIRTIO_BLK_PCI(dev);
+DeviceState *vdev = DEVICE(>vdev);
+
+virtio_blk_device_event(vdev, event, queue, errp);
+}
+
 static void virtio_blk_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 {
 VirtIOBlkPCI *dev = VIRTIO_BLK_PCI(vpci_dev);
@@ -72,6 +81,7 @@ static void virtio_blk_pci_class_init(ObjectClass *klass, 
void *data)
 
 set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
 device_class_set_props(dc, virtio_blk_pci_properties);
+dc->event = virtio_blk_pci_event;
 k->realize = virtio_blk_pci_realize;
 pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
 pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_BLOCK;
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 29655a406d..500be01dff 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -92,5 +92,7 @@ typedef struct MultiReqBuffer {
 
 bool virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq);
 void virtio_blk_process_queued_requests(VirtIOBlock *s, bool is_bh);
+void virtio_blk_device_event(DeviceState *dev, int event, int queue,
+ Error **errp);
 
 #endif
-- 
2.17.1




[PATCH 5/6] vhost-scsi-pci: implement device event interface for kick/call

2021-03-25 Thread Dongli Zhang
This is to implement the device event interface for vhost-scsi-pci.

Signed-off-by: Dongli Zhang 
---
 hw/scsi/vhost-scsi.c   |  6 ++
 hw/virtio/vhost-scsi-pci.c | 10 ++
 include/hw/virtio/vhost-scsi.h |  3 +++
 3 files changed, 19 insertions(+)

diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 4d70fa036b..11dd94ff92 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -163,6 +163,12 @@ static const VMStateDescription vmstate_virtio_vhost_scsi 
= {
 .pre_save = vhost_scsi_pre_save,
 };
 
+void vhost_scsi_device_event(DeviceState *dev, int event, int queue,
+ Error **errp)
+{
+virtio_device_event(dev, event, queue, true, errp);
+}
+
 static void vhost_scsi_realize(DeviceState *dev, Error **errp)
 {
 VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
diff --git a/hw/virtio/vhost-scsi-pci.c b/hw/virtio/vhost-scsi-pci.c
index cb71a294fa..c7a614cb11 100644
--- a/hw/virtio/vhost-scsi-pci.c
+++ b/hw/virtio/vhost-scsi-pci.c
@@ -44,6 +44,15 @@ static Property vhost_scsi_pci_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+static void vhost_scsi_pci_event(DeviceState *dev, int event, int queue,
+ Error **errp)
+{
+VHostSCSIPCI *vscsi = VHOST_SCSI_PCI(dev);
+DeviceState *vdev = DEVICE(>vdev);
+
+vhost_scsi_device_event(vdev, event, queue, errp);
+}
+
 static void vhost_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 {
 VHostSCSIPCI *dev = VHOST_SCSI_PCI(vpci_dev);
@@ -70,6 +79,7 @@ static void vhost_scsi_pci_class_init(ObjectClass *klass, 
void *data)
 k->realize = vhost_scsi_pci_realize;
 set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
 device_class_set_props(dc, vhost_scsi_pci_properties);
+dc->event = vhost_scsi_pci_event;
 pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
 pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_SCSI;
 pcidev_k->revision = 0x00;
diff --git a/include/hw/virtio/vhost-scsi.h b/include/hw/virtio/vhost-scsi.h
index 7dc2bdd69d..b47854a0c6 100644
--- a/include/hw/virtio/vhost-scsi.h
+++ b/include/hw/virtio/vhost-scsi.h
@@ -32,4 +32,7 @@ struct VHostSCSI {
 VHostSCSICommon parent_obj;
 };
 
+void vhost_scsi_device_event(DeviceState *dev, int event, int queue,
+ Error **errp);
+
 #endif
-- 
2.17.1




[PATCH 4/6] virtio-scsi-pci: implement device event interface for kick/call

2021-03-25 Thread Dongli Zhang
This is to implement the device event interface for virtio-scsi-pci.

Signed-off-by: Dongli Zhang 
---
 hw/scsi/virtio-scsi.c   |  9 +
 hw/virtio/virtio-scsi-pci.c | 10 ++
 include/hw/virtio/virtio-scsi.h |  3 +++
 3 files changed, 22 insertions(+)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 6d80730287..f437ed1a81 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -962,6 +962,15 @@ static struct SCSIBusInfo virtio_scsi_scsi_info = {
 .load_request = virtio_scsi_load_request,
 };
 
+void virtio_scsi_device_event(DeviceState *dev, int event, int queue,
+  Error **errp)
+{
+VirtIOSCSI *s = VIRTIO_SCSI(dev);
+bool irqfd = s->dataplane_started && !s->dataplane_fenced;
+
+virtio_device_event(dev, event, queue, irqfd, errp);
+}
+
 void virtio_scsi_common_realize(DeviceState *dev,
 VirtIOHandleOutput ctrl,
 VirtIOHandleOutput evt,
diff --git a/hw/virtio/virtio-scsi-pci.c b/hw/virtio/virtio-scsi-pci.c
index 97fab74236..d5db743692 100644
--- a/hw/virtio/virtio-scsi-pci.c
+++ b/hw/virtio/virtio-scsi-pci.c
@@ -43,6 +43,15 @@ static Property virtio_scsi_pci_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+static void virtio_scsi_pci_event(DeviceState *dev, int event, int queue,
+  Error **errp)
+{
+VirtIOSCSIPCI *vscsi = VIRTIO_SCSI_PCI(dev);
+DeviceState *vdev = DEVICE(>vdev);
+
+virtio_scsi_device_event(vdev, event, queue, errp);
+}
+
 static void virtio_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 {
 VirtIOSCSIPCI *dev = VIRTIO_SCSI_PCI(vpci_dev);
@@ -82,6 +91,7 @@ static void virtio_scsi_pci_class_init(ObjectClass *klass, 
void *data)
 k->realize = virtio_scsi_pci_realize;
 set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
 device_class_set_props(dc, virtio_scsi_pci_properties);
+dc->event = virtio_scsi_pci_event;
 pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
 pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_SCSI;
 pcidev_k->revision = 0x00;
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 543681bc18..d1fff0eeac 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -163,4 +163,7 @@ void virtio_scsi_dataplane_setup(VirtIOSCSI *s, Error 
**errp);
 int virtio_scsi_dataplane_start(VirtIODevice *s);
 void virtio_scsi_dataplane_stop(VirtIODevice *s);
 
+void virtio_scsi_device_event(DeviceState *dev, int event, int queue,
+  Error **errp);
+
 #endif /* QEMU_VIRTIO_SCSI_H */
-- 
2.17.1




[PATCH 2/6] virtio: introduce helper function for kick/call device event

2021-03-25 Thread Dongli Zhang
This is to introduce the helper function for virtio device to kick or
call.

Signed-off-by: Dongli Zhang 
---
 hw/virtio/virtio.c | 64 ++
 include/hw/virtio/virtio.h |  3 ++
 2 files changed, 67 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 07f4e60b30..e081041a75 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -30,6 +30,8 @@
 #include "sysemu/runstate.h"
 #include "standard-headers/linux/virtio_ids.h"
 
+/* #define DEBUG_VIRTIO_EVENT */
+
 /*
  * The alignment to use between consumer and producer parts of vring.
  * x86 pagesize again. This is the default, used by transports like PCI
@@ -2572,6 +2574,68 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
 virtio_irq(vq);
 }
 
+static void virtio_device_event_call(VirtQueue *vq, bool eventfd,
+ Error **errp)
+{
+#ifdef DEBUG_VIRTIO_EVENT
+printf("The 'call' event is triggered for path=%s, queue=%d, irqfd=%d.\n",
+   object_get_canonical_path(OBJECT(vq->vdev)),
+   vq->queue_index, eventfd);
+#endif
+
+if (eventfd) {
+virtio_set_isr(vq->vdev, 0x1);
+event_notifier_set(>guest_notifier);
+} else {
+virtio_irq(vq);
+}
+}
+
+static void virtio_device_event_kick(VirtQueue *vq, bool eventfd,
+ Error **errp)
+{
+#ifdef DEBUG_VIRTIO_EVENT
+printf("The 'kick' event is triggered for path=%s, queue=%d.\n",
+   object_get_canonical_path(OBJECT(vq->vdev)), vq->queue_index);
+#endif
+
+virtio_queue_notify(vq->vdev, virtio_get_queue_index(vq));
+}
+
+typedef void (VirtIOEvent)(VirtQueue *vq, bool eventfd, Error **errp);
+
+static VirtIOEvent *virtio_event_funcs[DEVICE_EVENT_MAX] = {
+[DEVICE_EVENT_CALL] = virtio_device_event_call,
+[DEVICE_EVENT_KICK] = virtio_device_event_kick
+};
+
+void virtio_device_event(DeviceState *dev, int event, int queue,
+ bool eventfd, Error **errp)
+{
+struct VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+int num = virtio_get_num_queues(vdev);
+VirtQueue *vq;
+
+assert(event < DEVICE_EVENT_MAX);
+
+if (vdev->broken) {
+error_setg(errp, "Broken device");
+return;
+}
+
+if (queue < 0 || queue >= num) {
+error_setg(errp, "Invalid queue %d", queue);
+return;
+}
+
+vq = >vq[queue];
+
+if (virtio_event_funcs[event])
+virtio_event_funcs[event](vq, eventfd, errp);
+else
+error_setg(errp, "The event is not supported");
+}
+
 void virtio_notify_config(VirtIODevice *vdev)
 {
 if (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK))
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b7ece7a6a8..21bb13ffa6 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -210,6 +210,9 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int 
*in_bytes,
 void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq);
 void virtio_notify(VirtIODevice *vdev, VirtQueue *vq);
 
+void virtio_device_event(DeviceState *dev, int event, int queue,
+ bool eventfd, Error **errp);
+
 int virtio_save(VirtIODevice *vdev, QEMUFile *f);
 
 extern const VMStateInfo virtio_vmstate_info;
-- 
2.17.1




[PATCH 6/6] virtio-net-pci: implement device event interface for kick/call

2021-03-25 Thread Dongli Zhang
This is to implement the device event interface for virtio-net-pci.

Signed-off-by: Dongli Zhang 
---
 hw/net/virtio-net.c|  9 +
 hw/virtio/virtio-net-pci.c | 10 ++
 include/hw/virtio/virtio-net.h |  3 +++
 3 files changed, 22 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 66b9ff4511..b5c3fa392c 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3195,6 +3195,15 @@ static bool failover_hide_primary_device(DeviceListener 
*listener,
 return qatomic_read(>failover_primary_hidden);
 }
 
+void virtio_net_device_event(DeviceState *dev, int event, int queue,
+ Error **errp)
+{
+VirtIONet *n = VIRTIO_NET(dev);
+bool irqfd = n->vhost_started;
+
+virtio_device_event(dev, event, queue, irqfd, errp);
+}
+
 static void virtio_net_device_realize(DeviceState *dev, Error **errp)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
diff --git a/hw/virtio/virtio-net-pci.c b/hw/virtio/virtio-net-pci.c
index aa0b3caecb..1fa5a6fe5d 100644
--- a/hw/virtio/virtio-net-pci.c
+++ b/hw/virtio/virtio-net-pci.c
@@ -46,6 +46,15 @@ static Property virtio_net_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+static void virtio_net_pci_event(DeviceState *dev, int event, int queue,
+ Error **errp)
+{
+VirtIONetPCI *vnet = VIRTIO_NET_PCI(dev);
+DeviceState *vdev = DEVICE(>vdev);
+
+virtio_net_device_event(vdev, event, queue, errp);
+}
+
 static void virtio_net_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 {
 DeviceState *qdev = DEVICE(vpci_dev);
@@ -77,6 +86,7 @@ static void virtio_net_pci_class_init(ObjectClass *klass, 
void *data)
 k->class_id = PCI_CLASS_NETWORK_ETHERNET;
 set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
 device_class_set_props(dc, virtio_net_properties);
+dc->event = virtio_net_pci_event;
 vpciklass->realize = virtio_net_pci_realize;
 }
 
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 7e96d193aa..d88c9969ea 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -214,4 +214,7 @@ struct VirtIONet {
 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
const char *type);
 
+void virtio_net_device_event(DeviceState *dev, int event, int queue,
+ Error **errp);
+
 #endif
-- 
2.17.1




[PATCH 0/6] Add debug interface to kick/call on purpose

2021-03-25 Thread Dongli Zhang
The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
the loss of doorbell kick, e.g.,

https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html

... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").

This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
to help narrow down if the issue is due to loss of irq/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
or legacy IRQ).

The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
on purpose by admin at QEMU/host side for a specific device.


This device can be used as a workaround if call/kick is lost due to
virtualization software (e.g., kernel or QEMU) issue.

We may also implement the interface for VFIO PCI, e.g., to write to
VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
on purpose. This is considered future work once the virtio part is done.


Below is from live crash analysis. Initially, the queue=2 has count=15 for
'kick' eventfd_ctx. Suppose there is data in vring avail while there is no
used available. We suspect this is because vhost-scsi was not notified by
VM. In order to narrow down and analyze the issue, we use live crash to
dump the current counter of eventfd for queue=2.

crash> eventfd_ctx 8f67f6bbe700
struct eventfd_ctx {
  kref = {
refcount = {
  refs = {
counter = 4
  }
}
  }, 
  wqh = {
lock = {
  {
rlock = {
  raw_lock = {
val = {
  counter = 0
}
  }
}
  }
}, 
head = {
  next = 0x8f841dc08e18, 
  prev = 0x8f841dc08e18
}
  }, 
  count = 15, ---> eventfd is 15 !!!
  flags = 526336
}

Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
with this interface.

{ "execute": "x-debug-device-event",
  "arguments": { "dev": "/machine/peripheral/vscsi0",
 "event": "kick", "queue": 2 } }

The counter is increased to 16. Suppose the hang issue is resolved, it
indicates something bad is in software that the 'kick' is lost.

crash> eventfd_ctx 8f67f6bbe700
struct eventfd_ctx {
  kref = {
refcount = {
  refs = {
counter = 4
  }
}
  },
  wqh = {
lock = {
  {
rlock = {
  raw_lock = {
val = {
  counter = 0
}
  }
}
  }
},
head = {
  next = 0x8f841dc08e18,
  prev = 0x8f841dc08e18
}
  },
  count = 16, ---> eventfd incremented to 16 !!!
  flags = 526336
}


Original RFC link:

https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg03441.html

Changed since RFC:
  - add support for more virtio/vhost pci devices
  - add log (toggled by DEBUG_VIRTIO_EVENT) to virtio.c to say that this 
mischeivous command had been used
  - fix grammer error (s/lost/loss/)
  - change version to 6.1
  - fix incorrect example in qapi/qdev.json
  - manage event types with enum/array, instead of hard coding


Dongli Zhang (6):
   qdev: introduce qapi/hmp command for kick/call event
   virtio: introduce helper function for kick/call device event
   virtio-blk-pci: implement device event interface for kick/call
   virtio-scsi-pci: implement device event interface for kick/call
   vhost-scsi-pci: implement device event interface for kick/call
   virtio-net-pci: implement device event interface for kick/call

 hmp-commands.hx | 14 
 hw/block/virtio-blk.c   |  9 +
 hw/net/virtio-net.c |  9 +
 hw/scsi/vhost-scsi.c|  6 
 hw/scsi/virtio-scsi.c   |  9 +
 hw/virtio/vhost-scsi-pci.c  | 10 ++
 hw/virtio/virtio-blk-pci.c  | 10 ++
 hw/virtio/virtio-net-pci.c  | 10 ++
 hw/virtio/virtio-scsi-pci.c | 10 ++
 hw/virtio/virtio.c  | 64 
 include/hw/qdev-core.h  |  9 +
 include/hw/virtio/vhost-scsi.h  |  3 ++
 include/hw/virtio/virtio-blk.h  |  2 ++
 include/hw/virtio/virtio-net.h  |  3 ++
 include/hw/virtio/virtio-scsi.h |  3 ++
 include/hw/virtio/virtio.h  |  3 ++
 include/monitor/hmp.h   |  1 +
 qapi/qdev.json  | 30 +
 softmmu/qdev-monitor.c  | 56 +++
 19 files changed, 261 insertions(+)


I did tests with below cases.

- virtio-blk-pci (ioeventfd on/off, iothread, live migration)
- virtio-scsi-pci (ioeventfd on/off)
- vhost-scsi-pci
- virtio-net-pci (ioeventfd on/off, vhost)

Thank you very much!

Dongli Zhang





Re: [Qemu-block] [Qemu-devel] megasas: Unexpected response from lun 1 while scanning, scan aborted

2019-03-27 Thread Dongli Zhang



On 3/27/19 7:31 PM, Hannes Reinecke wrote:
> On 3/26/19 5:47 PM, Dongli Zhang wrote:
>> I am reporting an error that the scsi lun cannot initialize successfully 
>> when I
>> am emulating megasas scsi controller with qemu.
>>
>> I am not sure if this is issue in qemu or linux kernel.
>>
>> When 'lun=1' is specified, there is "Unexpected response from lun 1 while
>> scanning, scan aborted".
>>
>> Everything works well if 'lun=0' is specified.
>>
>>
>> Below is the qemu cmdline involved:
>>
>> -device megasas,id=scsi0 \
>> -device scsi-hd,drive=drive0,bus=scsi0.0,lun=1 \
>> -drive file=/home/zhang/img/test.img,if=none,id=drive0,format=raw
>>
>>
>> Below is the syslog related to 'scsi|SCSI'
>>
>> # dmesg | grep SCSI
>> [    0.392494] SCSI subsystem initialized
>> [    0.460666] Block layer SCSI generic (bsg) driver version 0.4 loaded 
>> (major
>> 251)
>> [    0.706788] sd 1:0:0:0: [sda] Attached SCSI disk
>> # dmesg | grep scsi
>> [    0.511643] scsi host0: Avago SAS based MegaRAID driver
>> [    0.523302] scsi 0:2:0:0: Unexpected response from lun 1 while scanning,
>> scan aborted
>> [    0.540364] scsi host1: ata_piix
>> [    0.540780] scsi host2: ata_piix
>> [    0.702396] scsi 1:0:0:0: Direct-Access ATA  QEMU HARDDISK    2.5+
>> PQ: 0 ANSI: 5
>>
>> When 'lun=1' is changed to 'lun=0', there is no issue.
>>
>> Thank you very much!
>>
> That's an artifact from the megasas emulation in quemu.
> Megasas (internally) can't handle LUN numbers (the RAID part only knows about
> 'disks'), so I took the decision to not expose devices with LUN != 0.
> Please use a different SCSI target number, not a non-zero LUN number.

The guest kernel is able to detect the disk if lun is always 0, while target
number is changed:

-device scsi-hd,drive=drive0,bus=scsi0.0,channel=0,scsi-id=0,lun=0
-device scsi-hd,drive=drive1,bus=scsi0.0,channel=0,scsi-id=1,lun=0

# dmesg | grep scsi
[0.935999] scsi host0: ata_piix
[0.936401] scsi host1: ata_piix
[1.100945] scsi 0:0:0:0: Direct-Access ATA  QEMU HARDDISK2.5+
PQ: 0 ANSI: 5
[1.102409] sd 0:0:0:0: Attached scsi generic sg0 type 0
[1.672952] scsi host2: Avago SAS based MegaRAID driver
[1.683886] scsi 2:2:0:0: Direct-Access QEMU QEMU HARDDISK2.5+
PQ: 0 ANSI: 5
[1.684915] scsi 2:2:1:0: Direct-Access QEMU QEMU HARDDISK2.5+
PQ: 0 ANSI: 5
[1.701529] sd 2:2:0:0: Attached scsi generic sg1 type 0
[1.704795] sd 2:2:1:0: Attached scsi generic sg2 type 0
# dmesg | grep SCSI
[0.111015] SCSI subsystem initialized
[0.904712] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 
246)
[1.121174] sd 0:0:0:0: [sda] Attached SCSI disk
[1.703739] sd 2:2:0:0: [sdb] Attached SCSI disk
[1.706964] sd 2:2:1:0: [sdc] Attached SCSI disk



If device with LUN != 0 will not be exposed, why not set max_lun = 0 as what
qemu lsi is doing?

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index a56317e..c966ee0 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -2298,7 +2298,7 @@ static void megasas_scsi_uninit(PCIDevice *d)
 static const struct SCSIBusInfo megasas_scsi_info = {
 .tcq = true,
 .max_target = MFI_MAX_LD,
-.max_lun = 255,
+.max_lun = 0,

 .transfer_data = megasas_xfer_complete,
 .get_sg_list = megasas_get_sg_list,


Thank you very much!

Dongli Zhang



[Qemu-block] megasas: Unexpected response from lun 1 while scanning, scan aborted

2019-03-26 Thread Dongli Zhang
I am reporting an error that the scsi lun cannot initialize successfully when I
am emulating megasas scsi controller with qemu.

I am not sure if this is issue in qemu or linux kernel.

When 'lun=1' is specified, there is "Unexpected response from lun 1 while
scanning, scan aborted".

Everything works well if 'lun=0' is specified.


Below is the qemu cmdline involved:

-device megasas,id=scsi0 \
-device scsi-hd,drive=drive0,bus=scsi0.0,lun=1 \
-drive file=/home/zhang/img/test.img,if=none,id=drive0,format=raw


Below is the syslog related to 'scsi|SCSI'

# dmesg | grep SCSI
[0.392494] SCSI subsystem initialized
[0.460666] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 
251)
[0.706788] sd 1:0:0:0: [sda] Attached SCSI disk
# dmesg | grep scsi
[0.511643] scsi host0: Avago SAS based MegaRAID driver
[0.523302] scsi 0:2:0:0: Unexpected response from lun 1 while scanning, 
scan aborted
[0.540364] scsi host1: ata_piix
[0.540780] scsi host2: ata_piix
[0.702396] scsi 1:0:0:0: Direct-Access ATA  QEMU HARDDISK2.5+ 
PQ: 0 ANSI: 5

When 'lun=1' is changed to 'lun=0', there is no issue.

Thank you very much!

Dongli Zhang



[Qemu-block] When AioHandler->is_external=true?

2018-11-20 Thread Dongli Zhang
Hi,

Would you please help explain in which case AioHandler->is_external is true, and
when it is false?

I read about iothread and mainloop and I am little bit confused about it.

Thank you very much!

Dongli Zhang



Re: [Qemu-block] [Qemu-devel] [PATCH 1/1] virtio-blk: rename iov to out_iov in virtio_blk_handle_request()

2018-11-14 Thread Dongli Zhang
Ping?

Thanks!

Dongli Zhang

On 11/07/2018 12:09 AM, Dongli Zhang wrote:
> In virtio_blk_handle_request(), in_iov is used for input header while iov
> is used for output header. Rename iov to out_iov to pair output header's
> name with in_iov to avoid confusing people when reading source code.
> 
> Signed-off-by: Dongli Zhang 
> ---
>  hw/block/virtio-blk.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 83cf5c0..fb0d74d 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -482,7 +482,7 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, 
> MultiReqBuffer *mrb)
>  {
>  uint32_t type;
>  struct iovec *in_iov = req->elem.in_sg;
> -struct iovec *iov = req->elem.out_sg;
> +struct iovec *out_iov = req->elem.out_sg;
>  unsigned in_num = req->elem.in_num;
>  unsigned out_num = req->elem.out_num;
>  VirtIOBlock *s = req->dev;
> @@ -493,13 +493,13 @@ static int virtio_blk_handle_request(VirtIOBlockReq 
> *req, MultiReqBuffer *mrb)
>  return -1;
>  }
>  
> -if (unlikely(iov_to_buf(iov, out_num, 0, >out,
> +if (unlikely(iov_to_buf(out_iov, out_num, 0, >out,
>  sizeof(req->out)) != sizeof(req->out))) {
>  virtio_error(vdev, "virtio-blk request outhdr too short");
>  return -1;
>  }
>  
> -iov_discard_front(, _num, sizeof(req->out));
> +iov_discard_front(_iov, _num, sizeof(req->out));
>  
>  if (in_iov[in_num - 1].iov_len < sizeof(struct virtio_blk_inhdr)) {
>  virtio_error(vdev, "virtio-blk request inhdr too short");
> @@ -526,7 +526,7 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, 
> MultiReqBuffer *mrb)
> >out.sector);
>  
>  if (is_write) {
> -qemu_iovec_init_external(>qiov, iov, out_num);
> +qemu_iovec_init_external(>qiov, out_iov, out_num);
>  trace_virtio_blk_handle_write(vdev, req, req->sector_num,
>req->qiov.size / BDRV_SECTOR_SIZE);
>  } else {
> 



Re: [Qemu-block] [Qemu-devel] How to emulate block I/O timeout on qemu side?

2018-11-12 Thread Dongli Zhang



On 11/13/2018 06:52 AM, Marc Olson via Qemu-devel wrote:
> On 11/11/18 11:36 PM, Dongli Zhang wrote:
>> On 11/12/2018 03:13 PM, Marc Olson via Qemu-devel wrote:
>>> On 11/3/18 10:24 AM, Dongli Zhang wrote:
>>>> The 'write' latency of sector=40960 is set to a very large value. When the 
>>>> I/O
>>>> is stalled in guest due to that sector=40960 is accessed, I do see below
>>>> messages in guest log:
>>>>
>>>> [   80.807755] nvme nvme0: I/O 11 QID 2 timeout, aborting
>>>> [   80.808095] nvme nvme0: Abort status: 0x4001
>>>>
>>>>
>>>> However, then nothing happens further. nvme I/O hangs in guest. I am not
>>>> able to
>>>> kill the qemu process with Ctrl+C. Both vnc and qemu user net do not work. 
>>>> I
>>>> need to kill qemu with "kill -9"
>>>>
>>>>
>>>> The same result for virtio-scsi and qemu is stuck as well.
>>> While I didn't try virtio-scsi, I wasn't able to reproduce this behavior 
>>> using
>>> nvme on Ubuntu 18.04 (4.15). What image and kernel version are you trying
>>> against?
>> Would you like to reproduce the "aborting" message or the qemu hang?
> I could not reproduce IO hanging in the guest, but I can reproduce qemu 
> hanging.
>> guest image: ubuntu 16.04
>> guest kernel: mainline linux kernel (and default kernel in ubuntu 16.04)
>> qemu: qemu-3.0.0 (with the blkdebug delay patch)
>>
>> Would you be able to see the nvme abort (which is indeed not supported by 
>> qemu)
>> message in guest kernel?
> Yes.
>> Once I see that message, I would not be able to kill the qemu-system-x86_64
>> command line with Ctrl+C.
> 
> I missed this part. I wasn't expecting to handle very long timeouts, but what
> appears to be happening is that the sleep doesn't get interrupted on 
> shutdown. I
> suspect something like this, on top of the series I sent last night, should 
> help:
> 
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index 6b1f2d6..0bfb91b 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c
> @@ -557,8 +557,11 @@ static int rule_check(BlockDriverState *bs, uint64_t
> offset, uint64_t bytes)
>  remove_active_rule(s, delay_rule);
>  }
> 
> -if (latency != 0) {
> -qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, latency);
> +while (latency > 0 && 
> !aio_external_disabled(bdrv_get_aio_context(bs))) {
> +int64_t cur_latency = MIN(latency, 10ULL);
> +
> +qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, cur_latency);
> +latency -= cur_latency;
>  }
>  }
> 
> 
> /marc
> 
> 

I am able to interrupt qemu with above patch to periodically wake up and sleep
again.

Dongli Zhang



Re: [Qemu-block] [Qemu-devel] [PATCH v3 1/3] blkdebug: fix one shot rule processing

2018-11-12 Thread Dongli Zhang
Hi Marc,

When I play with the v3 patch set, the qemu hangs again and I need to kill it
with "kill -9".

I got below from guest:

[  104.828127] nvme nvme0: I/O 52 QID 1 timeout, aborting
[  104.828470] nvme nvme0: Abort status: 0x4001

nvme abort is not supported by qemu and therefore 0x4001 (NVME_INVALID_OPCODE |
NVME_DNR) is returned. qemu does not restart the device.


Below is the environment:

host: most recent qemu (master branch) with 3 patches applied, on top of Ubuntu
16.04.4 with 4.15.0-36-generic

guest: Ubuntu 16.04.4 with 4.13.0-39-generic.


The image to emulate nvme is a 128MB raw image (ext4). I randomly run "dd
if=/dev/zero of=output bs=1M count=30" on the nvme raw image to hit the
sector="40960" with "inject-delay" enabled as shown below:

[inject-delay]
event = "write_aio"
latency = "99"
sector = "40960"


sudo ./x86_64-softmmu/qemu-system-x86_64 \
-drive file=os.img,format=raw,if=none,id=disk0 \
-device
virtio-blk-pci,drive=disk0,id=device0,num-queues=2,iothread=io1,bootindex=0 \
-object iothread,id=io1 \
-smp 2 -m 2000M -enable-kvm  -vnc :0 -monitor stdio \
-device nvme,drive=nvmedrive,serial=deadbeaf1 \
-drive file=blkdebug:blkdebug.config:nvme.img,format=raw,if=none,id=nvmedrive \
-net nic -net user,hostfwd=tcp::5022-:22

I will debug where it hangs.

Dongli Zhang


On 11/12/2018 03:06 PM, Marc Olson via Qemu-devel wrote:
> If 'once' is specified, the rule should execute just once, regardless if
> it is supposed to return an error or not. Take the example where you
> want the first IO to an LBA to succeed, but subsequent IOs to fail. You
> could either use state transitions, or create two rules, one with
> error = 0 and once set to true, and one with a non-zero error.
> 
> Signed-off-by: Marc Olson 
> ---
>  block/blkdebug.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index 0759452..327049b 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c
> @@ -488,7 +488,7 @@ static int rule_check(BlockDriverState *bs, uint64_t 
> offset, uint64_t bytes)
>  }
>  }
>  
> -if (!rule || !rule->options.inject.error) {
> +if (!rule) {
>  return 0;
>  }
>  
> @@ -500,7 +500,7 @@ static int rule_check(BlockDriverState *bs, uint64_t 
> offset, uint64_t bytes)
>  remove_rule(rule);
>  }
>  
> -if (!immediately) {
> +if (error && !immediately) {
>  aio_co_schedule(qemu_get_current_aio_context(), 
> qemu_coroutine_self());
>  qemu_coroutine_yield();
>  }
> 



Re: [Qemu-block] [Qemu-devel] How to emulate block I/O timeout on qemu side?

2018-11-11 Thread Dongli Zhang



On 11/12/2018 03:13 PM, Marc Olson via Qemu-devel wrote:
> On 11/3/18 10:24 AM, Dongli Zhang wrote:
>> Hi all,
>>
>> I tried with the patch at:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00394.html
>>
>> The patch is applied to qemu-3.0.0.
>>
>>
>> Below configuration is used to test the feature for guest VM nvme.
>>
>> # qemu-system-x86_64 \
>> -smp 4 -m 2000M -enable-kvm -vnc :0 -monitor stdio \
>> -net nic -net user,hostfwd=tcp::5022-:22 \
>> -drive file=virtio-disk.img,format=raw,if=none,id=disk0 \
>> -device virtio-blk-pci,drive=disk0,id=disk0-dev,num-queues=2,iothread=io1 \
>> -object iothread,id=io1 \
>> -device nvme,drive=nvme1,serial=deadbeaf1 \
>> -drive file=blkdebug:blkdebug.config:nvme.img,if=none,id=nvme1
>>
>> # cat blkdebug.config
>> [delay]
>> event = "write_aio"
>> latency = "99"
>> sector = "40960"
>>
>>
>> The 'write' latency of sector=40960 is set to a very large value. When the 
>> I/O
>> is stalled in guest due to that sector=40960 is accessed, I do see below
>> messages in guest log:
>>
>> [   80.807755] nvme nvme0: I/O 11 QID 2 timeout, aborting
>> [   80.808095] nvme nvme0: Abort status: 0x4001
>>
>>
>> However, then nothing happens further. nvme I/O hangs in guest. I am not 
>> able to
>> kill the qemu process with Ctrl+C. Both vnc and qemu user net do not work. I
>> need to kill qemu with "kill -9"
>>
>>
>> The same result for virtio-scsi and qemu is stuck as well.
> While I didn't try virtio-scsi, I wasn't able to reproduce this behavior using
> nvme on Ubuntu 18.04 (4.15). What image and kernel version are you trying 
> against?

Would you like to reproduce the "aborting" message or the qemu hang?

guest image: ubuntu 16.04
guest kernel: mainline linux kernel (and default kernel in ubuntu 16.04)
qemu: qemu-3.0.0 (with the blkdebug delay patch)

Would you be able to see the nvme abort (which is indeed not supported by qemu)
message in guest kernel?

Once I see that message, I would not be able to kill the qemu-system-x86_64
command line with Ctrl+C.

Dongli Zhang



Re: [Qemu-block] [Qemu-devel] [PATCH 1/1 V2] Add vhost-pci-blk driver

2018-11-08 Thread Dongli Zhang



On 11/09/2018 12:47 AM, Michael S. Tsirkin wrote:
> On Thu, Nov 08, 2018 at 10:09:00PM +0800, Dongli Zhang wrote:
>> It looks the kernel space vhost-blk can only process raw image.
>>
>> How about to verify that only raw image is used in the drive command line 
>> when
>> vhost-blk-pci is paired with it?
>>
>> Otherwise, vhost-blk-pci might be working with qcow2 image without any 
>> warning
>> on qemu side.
>>
>> Dongli Zhang
> 
> raw being raw can you really verify that?
> 

I meant to verify the property 'format=' of '-drive', e.g., to check if
BlockBackend->root->bs->drv->format_name is raw?

We allow the user to erroneously give a qcow2 file with 'format=raw'. However,
if 'format=qcow2' is set explicitly, vhots-blk-pci would exit with error as only
raw is supported in kernel space.

Dongli Zhang



Re: [Qemu-block] [Qemu-devel] [PATCH 1/1 V2] Add vhost-pci-blk driver

2018-11-08 Thread Dongli Zhang
It looks the kernel space vhost-blk can only process raw image.

How about to verify that only raw image is used in the drive command line when
vhost-blk-pci is paired with it?

Otherwise, vhost-blk-pci might be working with qcow2 image without any warning
on qemu side.

Dongli Zhang

On 11/06/2018 04:56 AM, Vitaly Mayatskikh wrote:
> This driver uses the kernel-mode acceleration for virtio-blk and
> allows to get a near bare metal disk performance inside a VM.
> 
> Signed-off-by: Vitaly Mayatskikh 
> ---
>  configure |  10 +
>  default-configs/virtio.mak|   1 +
>  hw/block/Makefile.objs|   1 +
>  hw/block/vhost-blk.c  | 429 ++
>  hw/virtio/virtio-pci.c|  60 +
>  hw/virtio/virtio-pci.h|  19 ++
>  include/hw/virtio/vhost-blk.h |  43 
>  7 files changed, 563 insertions(+)
>  create mode 100644 hw/block/vhost-blk.c
>  create mode 100644 include/hw/virtio/vhost-blk.h
> 
> diff --git a/configure b/configure
> index 46ae1e8c76..787bc780da 100755
> --- a/configure
> +++ b/configure
> @@ -371,6 +371,7 @@ vhost_crypto="no"
>  vhost_scsi="no"
>  vhost_vsock="no"
>  vhost_user=""
> +vhost_blk=""
>  kvm="no"
>  hax="no"
>  hvf="no"
> @@ -869,6 +870,7 @@ Linux)
>vhost_crypto="yes"
>vhost_scsi="yes"
>vhost_vsock="yes"
> +  vhost_blk="yes"
>QEMU_INCLUDES="-I\$(SRC_PATH)/linux-headers -I$(pwd)/linux-headers 
> $QEMU_INCLUDES"
>supported_os="yes"
>libudev="yes"
> @@ -1263,6 +1265,10 @@ for opt do
>;;
>--enable-vhost-vsock) vhost_vsock="yes"
>;;
> +  --disable-vhost-blk) vhost_blk="no"
> +  ;;
> +  --enable-vhost-blk) vhost_blk="yes"
> +  ;;
>--disable-opengl) opengl="no"
>;;
>--enable-opengl) opengl="yes"
> @@ -6000,6 +6006,7 @@ echo "vhost-crypto support $vhost_crypto"
>  echo "vhost-scsi support $vhost_scsi"
>  echo "vhost-vsock support $vhost_vsock"
>  echo "vhost-user support $vhost_user"
> +echo "vhost-blk support $vhost_blk"
>  echo "Trace backends$trace_backends"
>  if have_backend "simple"; then
>  echo "Trace output file $trace_file-"
> @@ -6461,6 +6468,9 @@ fi
>  if test "$vhost_user" = "yes" ; then
>echo "CONFIG_VHOST_USER=y" >> $config_host_mak
>  fi
> +if test "$vhost_blk" = "yes" ; then
> +  echo "CONFIG_VHOST_BLK=y" >> $config_host_mak
> +fi
>  if test "$blobs" = "yes" ; then
>echo "INSTALL_BLOBS=yes" >> $config_host_mak
>  fi
> diff --git a/default-configs/virtio.mak b/default-configs/virtio.mak
> index 1304849018..765c0a2a04 100644
> --- a/default-configs/virtio.mak
> +++ b/default-configs/virtio.mak
> @@ -1,5 +1,6 @@
>  CONFIG_VHOST_USER_SCSI=$(call land,$(CONFIG_VHOST_USER),$(CONFIG_LINUX))
>  CONFIG_VHOST_USER_BLK=$(call land,$(CONFIG_VHOST_USER),$(CONFIG_LINUX))
> +CONFIG_VHOST_BLK=$(CONFIG_LINUX)
>  CONFIG_VIRTIO=y
>  CONFIG_VIRTIO_9P=y
>  CONFIG_VIRTIO_BALLOON=y
> diff --git a/hw/block/Makefile.objs b/hw/block/Makefile.objs
> index 53ce5751ae..857ce823fc 100644
> --- a/hw/block/Makefile.objs
> +++ b/hw/block/Makefile.objs
> @@ -14,3 +14,4 @@ obj-$(CONFIG_SH4) += tc58128.o
>  obj-$(CONFIG_VIRTIO_BLK) += virtio-blk.o
>  obj-$(CONFIG_VIRTIO_BLK) += dataplane/
>  obj-$(CONFIG_VHOST_USER_BLK) += vhost-user-blk.o
> +obj-$(CONFIG_VHOST_BLK) += vhost-blk.o
> diff --git a/hw/block/vhost-blk.c b/hw/block/vhost-blk.c
> new file mode 100644
> index 00..4ca8040ee7
> --- /dev/null
> +++ b/hw/block/vhost-blk.c
> @@ -0,0 +1,429 @@
> +/*
> + * vhost-blk host device
> + *
> + * Copyright(C) 2018 IBM Corporation
> + *
> + * Authors:
> + *  Vitaly Mayatskikh 
> + *
> + * Largely based on the "vhost-user-blk.c" implemented by:
> + * Changpeng Liu 
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qemu/error-report.h"
> +#include "qemu/cutils.h"
> +#include "qom/object.h"
> +#include "hw/qdev-core.h"
> +#include "hw/virtio/vhost.h"
> +#include "hw/virtio/vhost-blk.h"
> +#include "hw/virtio/virtio.h"
> +#include "hw/virtio/virtio-bus.h"
> +

[Qemu-block] [PATCH 1/1] virtio-blk: rename iov to out_iov in virtio_blk_handle_request()

2018-11-06 Thread Dongli Zhang
In virtio_blk_handle_request(), in_iov is used for input header while iov
is used for output header. Rename iov to out_iov to pair output header's
name with in_iov to avoid confusing people when reading source code.

Signed-off-by: Dongli Zhang 
---
 hw/block/virtio-blk.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 83cf5c0..fb0d74d 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -482,7 +482,7 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, 
MultiReqBuffer *mrb)
 {
 uint32_t type;
 struct iovec *in_iov = req->elem.in_sg;
-struct iovec *iov = req->elem.out_sg;
+struct iovec *out_iov = req->elem.out_sg;
 unsigned in_num = req->elem.in_num;
 unsigned out_num = req->elem.out_num;
 VirtIOBlock *s = req->dev;
@@ -493,13 +493,13 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, 
MultiReqBuffer *mrb)
 return -1;
 }
 
-if (unlikely(iov_to_buf(iov, out_num, 0, >out,
+if (unlikely(iov_to_buf(out_iov, out_num, 0, >out,
 sizeof(req->out)) != sizeof(req->out))) {
 virtio_error(vdev, "virtio-blk request outhdr too short");
 return -1;
 }
 
-iov_discard_front(, _num, sizeof(req->out));
+iov_discard_front(_iov, _num, sizeof(req->out));
 
 if (in_iov[in_num - 1].iov_len < sizeof(struct virtio_blk_inhdr)) {
 virtio_error(vdev, "virtio-blk request inhdr too short");
@@ -526,7 +526,7 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, 
MultiReqBuffer *mrb)
>out.sector);
 
 if (is_write) {
-qemu_iovec_init_external(>qiov, iov, out_num);
+qemu_iovec_init_external(>qiov, out_iov, out_num);
 trace_virtio_blk_handle_write(vdev, req, req->sector_num,
   req->qiov.size / BDRV_SECTOR_SIZE);
 } else {
-- 
2.7.4




[Qemu-block] [PATCH 1/1] virtio-blk: fix comment for virtio_blk_rw_complete as nalloc is initially -1

2018-11-05 Thread Dongli Zhang
The initial value of nalloc is -1, but not 1.

Signed-off-by: Dongli Zhang 
---
This is based on git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git 
tags/for_upstream

 hw/block/virtio-blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 83cf5c0..30999c3 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -96,7 +96,7 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
 trace_virtio_blk_rw_complete(vdev, req, ret);
 
 if (req->qiov.nalloc != -1) {
-/* If nalloc is != 1 req->qiov is a local copy of the original
+/* If nalloc is != -1 req->qiov is a local copy of the original
  * external iovec. It was allocated in submit_requests to be
  * able to merge requests. */
 qemu_iovec_destroy(>qiov);
-- 
2.7.4




Re: [Qemu-block] [Qemu-devel] [PULL 05/33] virtio-blk: fix comment for virtio_blk_rw_complete

2018-11-05 Thread Dongli Zhang



On 11/06/2018 02:15 AM, Michael S. Tsirkin wrote:
> From: Yaowei Bai 
> 
> Here should be submit_requests, there is no submit_merged_requests
> function.
> 
> Signed-off-by: Yaowei Bai 
> Reviewed-by: Michael S. Tsirkin 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  hw/block/virtio-blk.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 225fe44b7a..83cf5c01f9 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -97,8 +97,8 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
>  
>  if (req->qiov.nalloc != -1) {
>  /* If nalloc is != 1 req->qiov is a local copy of the original

Should it be "If nalloc is != -1" in the comment? Seems the initial state is -1.

> - * external iovec. It was allocated in submit_merged_requests
> - * to be able to merge requests. */
> + * external iovec. It was allocated in submit_requests to be
> + * able to merge requests. */
>  qemu_iovec_destroy(>qiov);
>  }
>  
> 

Dongli Zhang



Re: [Qemu-block] [Qemu-devel] How to emulate block I/O timeout on qemu side?

2018-11-03 Thread Dongli Zhang
Hi all,

I tried with the patch at:

https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00394.html

The patch is applied to qemu-3.0.0.


Below configuration is used to test the feature for guest VM nvme.

# qemu-system-x86_64 \
-smp 4 -m 2000M -enable-kvm -vnc :0 -monitor stdio \
-net nic -net user,hostfwd=tcp::5022-:22 \
-drive file=virtio-disk.img,format=raw,if=none,id=disk0 \
-device virtio-blk-pci,drive=disk0,id=disk0-dev,num-queues=2,iothread=io1 \
-object iothread,id=io1 \
-device nvme,drive=nvme1,serial=deadbeaf1 \
-drive file=blkdebug:blkdebug.config:nvme.img,if=none,id=nvme1

# cat blkdebug.config
[delay]
event = "write_aio"
latency = "99"
sector = "40960"


The 'write' latency of sector=40960 is set to a very large value. When the I/O
is stalled in guest due to that sector=40960 is accessed, I do see below
messages in guest log:

[   80.807755] nvme nvme0: I/O 11 QID 2 timeout, aborting
[   80.808095] nvme nvme0: Abort status: 0x4001


However, then nothing happens further. nvme I/O hangs in guest. I am not able to
kill the qemu process with Ctrl+C. Both vnc and qemu user net do not work. I
need to kill qemu with "kill -9"


The same result for virtio-scsi and qemu is stuck as well.


About blkdebug, I can only trigger the error by the config file. Is there a way
to inject error or latency via qemu monior? For instance, I would like to inject
error not for a specific sector or state, but for the entire disk when I input
some command via qemu monitor.

Dongli Zhang


On 11/03/2018 02:17 AM, John Snow wrote:
> 
> 
> On 11/02/2018 01:55 PM, Marc Olson wrote:
>> On 11/2/18 10:49 AM, John Snow wrote:
>>> On 11/02/2018 04:11 AM, Dongli Zhang wrote:
>>>> Hi,
>>>>
>>>> Is there any way to emulate I/O timeout on qemu side (not fault
>>>> injection in VM
>>>> kernel) without modifying qemu source code?
>>>>
>>>> For instance, I would like to observe/study/debug the I/O timeout
>>>> handling of
>>>> nvme, scsi, virtio-blk (not supported) of VM kernel.
>>>>
>>>> Is there a way to trigger this on purpose on qemu side?
>>>>
>>>> Thank you very much!
>>>>
>>>> Dongli Zhang
>>>>
>>> I don't think the blkdebug driver supports arbitrary delays right now.
>>> Maybe we could augment it to do so?
>>>
>>> (I thought someone already had, but maybe it wasn't merged?)
>>>
>>> Aha, here:
>>>
>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05297.html
>>> V2: https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00394.html
>>>
>>> Let's work from there.
>>
>> I've got updates to that patch series that fell on the floor due to
>> other competing things. I'll get some screen time this weekend to work
>> on them and submit v3.
>>
>> /marc
>>
> 
> Great! Please CC the usual maintainers, but also include me.
> 
> In the meantime, Dongli Zhang, why don't you try the v2 patch and see if
> that helps you out for your use case? Report back if it works for you or
> not.
> 
> --js
>