On 2022/10/18 16:55, He Rongguang wrote:
On 2022/10/18 14:54, Jason Wang wrote:
On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo
<[email protected]> wrote:
On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <[email protected]>
wrote:
Adding Stefan.
On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo
<[email protected]> wrote:
Hello everyone,
# Background
Nowadays, there is a common need to accelerate communication between
different VMs and containers, including lightweight virtual-machine-based
containers. One way to achieve this is to colocate them on the same host.
However, the performance of inter-VM communication through the network
stack is not optimal and may also waste extra CPU cycles. This scenario
has been discussed many times, but there is still no generic solution
available [1] [2] [3].
With a PoC based on pci-ivshmem + SMC (Shared Memory
Communications [4]) [5], we found that by changing the communication
channel between VMs from TCP to SMC with shared memory, we can achieve
superior performance for a common socket-based application [5]:
  - latency reduced by about 50%
  - throughput increased by about 300%
  - CPU consumption reduced by about 50%
Since no particularly suitable shared memory management solution
matches the needs of SMC (see "## Comparison with existing
technology"), and virtio is the standard for communication in the
virtualization world, we want to implement a virtio-ism device based on
virtio, which can support on-demand memory sharing across VMs,
containers, or between a VM and a container. To match the needs of SMC,
the virtio-ism device needs to support:

  1. Dynamic provision: shared memory regions are dynamically allocated
     and provisioned.
  2. Multi-region management: the shared memory is divided into regions,
     and a peer may allocate one or more regions from the same shared
     memory device.
  3. Permission control: the permissions of each region can be set
     separately.
Looks like virtio-ROCE
https://lore.kernel.org/all/[email protected]/T/
and virtio-vhost-user can satisfy the requirement?
# Virtio ism device
ISM devices provide the ability to share memory between different guests
on a host. Memory that a guest obtains from an ism device can be shared
with multiple peers at the same time, and these sharing relationships
can be created and released dynamically. The shared memory obtained from
the device is divided into multiple ism regions for sharing. The ISM
device provides a mechanism to notify the other referrers of an ism
region about content update events.
# Usage (SMC as example)
Here is one possible use case:
  1. SMC calls the ism driver interface ism_alloc_region(), which
     returns the location of a memory region in the PCI space and a
     token.
  2. The ism driver mmaps the memory region and returns it to SMC
     together with the token.
  3. SMC passes the token to the connected peer.
  4. The peer calls the ism driver interface ism_attach_region(token)
     to get the location of the shared memory in its PCI space.
# About hot plugging of the ism device
Hot plugging of devices is a heavyweight, failure-prone, time-consuming,
and poorly scalable operation, so we don't plan to support it for now.
# Comparison with existing technology
## ivshmem or ivshmem 2.0 of Qemu
  1. ivshmem 1.0 exposes one large piece of memory that is visible to
     all VMs using the device, so its security is insufficient.
  2. ivshmem 2.0 provides shared memory belonging to one VM that is
     read-only to all other VMs using the same ivshmem 2.0 device,
     which also does not meet our needs in terms of security.
## vhost-pci and virtiovhostuser
They do not support dynamic allocation and are therefore not suitable
for SMC.

I think this is an implementation issue; we could support the VHOST
IOTLB messages, and then regions could be added/removed on demand.
1. After an attacker connects with a victim, if the attacker does not
dereference the memory, the memory stays occupied under
virtiovhostuser. In the case of ism devices, the victim can directly
release the reference, and a maliciously referenced region only
occupies the attacker's resources.
Let's define the security boundary here, e.g. do we trust the device or
not? If yes, then in the case of virtiovhostuser, can't we simply do
VHOST_IOTLB_UNMAP so that we can safely release the memory from the
attacker?
2. The ism device of a VM can be shared with many (1000+) VMs at the
same time, which is a challenge for virtiovhostuser.

Please elaborate more on the challenges; what makes virtiovhostuser
different?
Hi, besides that, I think there's another distinctive difference
between virtio-ism+smc and virtiovhostuser: in virtiovhostuser, one
end is the frontend (a virtio-net device) and the other end is the
vhost backend, so it is a one-frontend-to-one-backend model. In our
business scenario, we need a dynamic network communication model in
which an end that has been running for a long time may connect and
communicate with a just-booted VM. That is, the two ends are equal,
so there are no frontend or vhost backend roles as in vhost, and each
end may appear and disappear dynamically rather than being provisioned
in advance.
Ok, please describe them in the changelog at least.
Note that what I want to say is that virtio-vhost-user could be tweaked
to achieve the same goal. For dynamic provision, it could be something
like having a 0 for the VHOST_IOTLB_UPDATE message (like how mmap()
works). I wonder if we can unify them.
Thanks
3. The sharing relationships of ism are added dynamically, while
virtiovhostuser determines the sharing relationships at startup.

Not necessarily with the IOTLB API?

4. For security reasons, a device under virtiovhostuser may mmap more
memory, while ism only maps one region to other devices.

With VHOST_IOTLB_MAP, the map could be done per region.
Thanks
Thanks.
Thanks
# Design
This is a structure diagram based on ism sharing between two vms.
|------------------------------------------------------------------------------------------|
| |---------------------------------------|      |---------------------------------------| |
| | Guest                                 |      | Guest                                 | |
| |                                       |      |                                       | |
| |  ----------                           |      |  ----------                           | |
| |  | driver |  [M1]  [M2]  [M3]         |      |  | driver |        [M2]  [M3]         | |
| |  ----------    |     |     |          |      |  ----------          |     |          | |
| |   |cq|         |map  |map  |map       |      |   |cq|               |map  |map       | |
| |   |  |         |     |     |          |      |   |  |               |     |          | |
| |   |  |      -----------------         |      |   |  |      -----------------         | |
| |---|--|------| device memory |---------|      |---|--|------| device memory |---------| |
| |   |  |      -----------------         |      |   |  |      -----------------         | |
| |                     |                 |      |                     |                 | |
| |                     |                 |      |                     |                 | |
| | Qemu                |                 |      | Qemu                |                 | |
| |---------------------+-----------------|      |---------------------+-----------------| |
|                       |                                              |                   |
|                       |----------------------+-----------------------|                   |
|                                              |                                           |
|                          --------------------------------                                |
|                          |  M1  |    |  M2  |    |  M3  |                                |
|                          --------------------------------                                |
|                                                                                          |
| HOST                                                                                     |
|------------------------------------------------------------------------------------------|
# POC code
Kernel:
https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
Qemu: https://github.com/fengidri/qemu/commits/ism
If there are any problems, please point them out.
Hope to hear from you, thank you.
[1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
[2] https://dl.acm.org/doi/10.1145/2847562
[3] https://hal.archives-ouvertes.fr/hal-00368622/document
[4] https://lwn.net/Articles/711071/
[5] https://lore.kernel.org/netdev/[email protected]/T/
Xuan Zhuo (2):
Reserve device id for ISM device
virtio-ism: introduce new device virtio-ism
 content.tex    |   3 +
 virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 343 insertions(+)
 create mode 100644 virtio-ism.tex
--
2.32.0.3.g01195cf9f
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]