[Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On 01/11/2010 09:40 AM, Vadim Rozenfeld wrote:

  The following patch allows us to improve Windows virtio block driver
  performance on small size requests. Additionally, it leads to reduced
  cpu usage on write IOs.

Note, this is not an improvement for Windows specifically.

  diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
  index a2f0639..0e3a8d5 100644
  --- a/hw/virtio-blk.c
  +++ b/hw/virtio-blk.c
  @@ -28,6 +28,7 @@ typedef struct VirtIOBlock
       char serial_str[BLOCK_SERIAL_STRLEN + 1];
       QEMUBH *bh;
       size_t config_size;
  +    unsigned int pending;
   } VirtIOBlock;

   static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev)
  @@ -87,6 +88,8 @@ typedef struct VirtIOBlockReq
       struct VirtIOBlockReq *next;
   } VirtIOBlockReq;

  +static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq);
  +
   static void virtio_blk_req_complete(VirtIOBlockReq *req, int status)
   {
       VirtIOBlock *s = req->dev;
  @@ -95,6 +98,11 @@ static void virtio_blk_req_complete(VirtIOBlockReq *req, int status)
       virtqueue_push(s->vq, &req->elem, req->qiov.size + sizeof(*req->in));
       virtio_notify(s->vdev, s->vq);
  +    if(--s->pending == 0) {
  +        virtio_queue_set_notification(s->vq, 1);
  +        virtio_blk_handle_output(s->vdev, s->vq);
  +    }
  +

Coding style: space after if. See the CODING_STYLE file.

  @@ -340,6 +348,9 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
           exit(1);
       }

  +    if(++s->pending == 1)
  +        virtio_queue_set_notification(s->vq, 0);
  +
       req->out = (void *)req->elem.out_sg[0].iov_base;
       req->in = (void *)req->elem.in_sg[req->elem.in_num - 1].iov_base;

Coding style: space after if, braces after if.

Your patch is word wrapped, please send it correctly. Easiest using git
send-email.

The patch has potential to reduce performance on volumes with multiple
spindles. Consider two processes issuing sequential reads into a RAID
array. With this patch, the reads will be executed sequentially rather
than in parallel, so I think a follow-on patch to make the minimum depth
a parameter (set by the guest? the host?) would be helpful.
-- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH 5/8] virtio-serial-bus: Add support for buffering guest output, throttling guests
On (Fri) Jan 08 2010 [13:35:03], Jamie Lokier wrote:

  Since VNC is clearly designed to work over TCP, and is written by
  people who know this, I'm wondering why you think it needs to be
  different for virtio-serial.

For vnc, putting stuff from a guest clipboard into the vnc client
clipboard uses the ServerCutText command: the entire buffer has to be
provided after sending the command and the 'length' values.

In this case, if the data from the guest arrives in multiple packets, we
really don't want to call into the write function multiple times. A
single clipboard entry has to be created in the client with the entire
contents, so a single write operation has to be invoked. For this to
happen, there has to be some indication from the guest as to how much
data was written in one write() operation, which will let us make a
single write operation to the vnc client.

Amit
Re: [Qemu-devel] [PATCH 5/8] virtio-serial-bus: Add support for buffering guest output, throttling guests
On (Fri) Jan 08 2010 [10:26:59], Anthony Liguori wrote:

  On 01/08/2010 07:35 AM, Jamie Lokier wrote:
    Sometimes it looks like TCP is maintaining write boundaries, but it
    is just an artifact of its behaviour on many systems, and is not
    reliable even on those systems where it seems to happen most of the
    time. Even when connecting to localhost, you cannot rely on that. I
    have seen people write code assuming TCP keeps boundaries, and then
    some weeks later they are very confused debugging their code because
    it is not reliable... Since VNC is clearly designed to work over
    TCP, and is written by people who know this, I'm wondering why you
    think it needs to be different for virtio-serial.

  I'm confused about why the buffering is needed in the first place. I
  would think that any buffering should be pushed back to the guest.
  IOW, if there's available data from the char driver but the guest
  doesn't have a buffer, don't select on the char driver until the guest
  has a buffer available. If the guest attempts to write data but the
  char driver isn't ready to receive data, don't complete the request
  until the char driver can accept data.

This is a different thing from what Jamie's talking about.

A guest or a host might be interested in communicating data without
waiting for the other end to come up. The other end can just start
consuming the data (even the data that it missed while it wasn't
connected) once it's up.

(I can remove this option for now and add it later, if you prefer it
that way.)

Amit
Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On 01/11/2010 11:03 AM, Dor Laor wrote:

  On 01/11/2010 10:30 AM, Avi Kivity wrote:
    On 01/11/2010 09:40 AM, Vadim Rozenfeld wrote:
      The following patch allows us to improve Windows virtio block
      driver performance on small size requests. Additionally, it leads
      to reduced cpu usage on write IOs.

    Note, this is not an improvement for Windows specifically.

      diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
      index a2f0639..0e3a8d5 100644
      --- a/hw/virtio-blk.c
      +++ b/hw/virtio-blk.c
      @@ -28,6 +28,7 @@ typedef struct VirtIOBlock
           char serial_str[BLOCK_SERIAL_STRLEN + 1];
           QEMUBH *bh;
           size_t config_size;
      +    unsigned int pending;
       } VirtIOBlock;

       static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev)
      @@ -87,6 +88,8 @@ typedef struct VirtIOBlockReq
           struct VirtIOBlockReq *next;
       } VirtIOBlockReq;

      +static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq);
      +
       static void virtio_blk_req_complete(VirtIOBlockReq *req, int status)
       {
           VirtIOBlock *s = req->dev;
      @@ -95,6 +98,11 @@ static void virtio_blk_req_complete(VirtIOBlockReq *req, int status)
           virtqueue_push(s->vq, &req->elem, req->qiov.size + sizeof(*req->in));
           virtio_notify(s->vdev, s->vq);
      +    if(--s->pending == 0) {
      +        virtio_queue_set_notification(s->vq, 1);
      +        virtio_blk_handle_output(s->vdev, s->vq);

  The above line should be moved out of the 'if'.

  Attached results with rhel5.4 (qemu0.11) for win2k8 32bit guest. Note
  the drastic reduction in cpu consumption.

Attachment did not survive the email server, so you'll have to trust me
saying that cpu consumption went down from 65% to 40% for reads and from
80% to 30% for writes.

      +    }
      +

    Coding style: space after if. See the CODING_STYLE file.

      @@ -340,6 +348,9 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
              exit(1);
          }

      +    if(++s->pending == 1)
      +        virtio_queue_set_notification(s->vq, 0);
      +
          req->out = (void *)req->elem.out_sg[0].iov_base;
          req->in = (void *)req->elem.in_sg[req->elem.in_num - 1].iov_base;

    Coding style: space after if, braces after if.
Your patch is word wrapped, please send it correctly. Easiest using git send-email. The patch has potential to reduce performance on volumes with multiple spindles. Consider two processes issuing sequential reads into a RAID array. With this patch, the reads will be executed sequentially rather than in parallel, so I think a follow-on patch to make the minimum depth a parameter (set by the guest? the host?) would be helpful.
Re: [Qemu-devel] [PATCH 5/8] virtio-serial-bus: Add support for buffering guest output, throttling guests
Amit Shah wrote:

  On (Fri) Jan 08 2010 [13:35:03], Jamie Lokier wrote:
    Since VNC is clearly designed to work over TCP, and is written by
    people who know this, I'm wondering why you think it needs to be
    different for virtio-serial.

  For vnc putting stuff from a guest clipboard into vnc client clipboard
  using the ServerCutText command, the entire buffer has to be provided
  after sending the command and the 'length' values.

Are you talking about a VNC protocol command between qemu's VNC server
and the user's VNC client, or a private protocol between the guest and
qemu's VNC server?

  In this case, if the data from guest arrives in multiple packets, we
  really don't want to call into the write function multiple times. A
  single clipboard entry has to be created in the client with the entire
  contents, so a single write operation has to be invoked.

Same question again: *Why do you think the VNC server (in qemu) needs to
see the entire clipboard in a single write from the guest?*

You have already told it the total length to expect. There is no
ambiguity about where it ends. There is no need to do any more, if the
receiver (in qemu) is implemented correctly with a sane protocol.

That's assuming the guest sends to qemu's VNC server which then sends it
to the user's VNC client.

  For this to happen, there has to be some indication from the guest as
  to how much data was written in one write() operation, which will let
  us make a single write operation to the vnc client.

When it is sent to the user's VNC client, it will be split into multiple
packets by TCP. You *can't* send a single large write over TCP without
getting it split at arbitrary places. It's *impossible*. TCP doesn't
support that. It will split and merge your writes arbitrarily.

So the only interesting part is how it's transmitted from the guest to
qemu's VNC server first. Do you get to design that protocol yourself?

-- Jamie
Re: [Qemu-devel] [PATCH 5/8] virtio-serial-bus: Add support for buffering guest output, throttling guests
On (Mon) Jan 11 2010 [10:45:53], Jamie Lokier wrote:

  Amit Shah wrote:
    On (Fri) Jan 08 2010 [13:35:03], Jamie Lokier wrote:
      Since VNC is clearly designed to work over TCP, and is written by
      people who know this, I'm wondering why you think it needs to be
      different for virtio-serial.

    For vnc putting stuff from a guest clipboard into vnc client
    clipboard using the ServerCutText command, the entire buffer has to
    be provided after sending the command and the 'length' values.

  Are you talking about a VNC protocol command between qemu's VNC server
  and the user's VNC client, or a private protocol between the guest and
  qemu's VNC server?

What happens is:

1. Guest puts something on its clipboard
2. An agent on the guest gets notified of new clipboard contents
3. This agent sends over the entire clipboard contents to qemu via
   virtio-serial
4. virtio-serial sends off this data to the virtio-serial-vnc code
5. ServerCutText message from the vnc backend is sent to the vnc client
6. vnc client's clipboard gets updated
7. You can see guest's clipboard contents in your client's clipboard.

I'm talking about steps 3, 4, 5 here.

    In this case, if the data from guest arrives in multiple packets, we
    really don't want to call into the write function multiple times. A
    single clipboard entry has to be created in the client with the
    entire contents, so a single write operation has to be invoked.

  Same question again: *Why do you think the VNC server (in qemu) needs
  to see the entire clipboard in a single write from the guest?* You
  have already told it the total length to expect. There is no ambiguity
  about where it ends.

Where does the total length come from? It has to come from the guest.
Otherwise, the vnc code will not know if a byte stream contains two
separate clipboard entries or just one huge clipboard entry.

Earlier, I used to send the length of one write as issued by a guest to
qemu. I just changed that to send a START and END flag so that I don't
have to send the length.
If this doesn't explain it, then I think we're not understanding each other here. Amit
[Qemu-devel] Hang up: qemu-system-ppc creates a ppc VM on a x86_64 host with OS Fedora12.
Hi, guys,

When qemu-system-ppc creates an f12 ppc VM on an f12 x86_64 host, qemu
hangs on the command line of openbios, such as:

  Welcome to OpenBIOS v1.0 built on Dec 7 2009 17:11
  0 >

While when qemu creates a RH5.4 ppc VM, the output is such as:

  cd:0,\ppc\chrp\yaboot.conf: Unknown or corrupt filesystem

Has anyone else encountered the same issue, or does anyone know how to
solve it? Any help or comment from you will be appreciated.

Cheers,
Zhiyong Wu
[Qemu-devel] Re: Hang up: qemu-system-ppc creates a ppc VM on a x86_64 host with OS Fedora12.
Am 11.01.2010 um 12:22 schrieb Zhiyong Wu zwu.ker...@gmail.com:

  HI, guys, when qemu-system-ppc creates a f12 ppc VM on a f12 x86_64
  host, qemu hangs on the command line of openbios such as:

    Welcome to OpenBIOS v1.0 built on Dec 7 2009 17:11
    0 >

How do you start qemu? To boot from CD you need to pass -boot d.

Alex
[Qemu-devel] [PATCH] dmg: fix -open failure
Currently the dmg image format driver simply opens the images as raw if
any kind of failure happens. This is contrary to the behaviour of all
other image formats which just return an error and let the block core
deal with it.

Signed-off-by: Christoph Hellwig h...@lst.de

Index: qemu/block/dmg.c
===
--- qemu.orig/block/dmg.c	2010-01-11 14:00:25.945021645 +0100
+++ qemu/block/dmg.c	2010-01-11 14:03:03.006036707 +0100
@@ -90,24 +90,21 @@ static int dmg_open(BlockDriverState *bs

     /* read offset of info blocks */
     if(lseek(s->fd,-0x1d8,SEEK_END)<0) {
-dmg_close:
-	close(s->fd);
-	/* open raw instead */
-	bs->drv=bdrv_find_format("raw");
-	return bs->drv->bdrv_open(bs, filename, flags);
+	goto fail;
     }
+
     info_begin=read_off(s->fd);
     if(info_begin==0)
-	goto dmg_close;
+	goto fail;
     if(lseek(s->fd,info_begin,SEEK_SET)<0)
-	goto dmg_close;
+	goto fail;
     if(read_uint32(s->fd)!=0x100)
-	goto dmg_close;
+	goto fail;
     if((count = read_uint32(s->fd))==0)
-	goto dmg_close;
+	goto fail;
     info_end = info_begin+count;
     if(lseek(s->fd,0xf8,SEEK_CUR)<0)
-	goto dmg_close;
+	goto fail;

     /* read offsets */
     last_in_offset = last_out_offset = 0;
@@ -116,14 +113,14 @@ dmg_close:
 	count = read_uint32(s->fd);
 	if(count==0)
-	    goto dmg_close;
+	    goto fail;
 	type = read_uint32(s->fd);
 	if(type!=0x6d697368 || count<244)
 	    lseek(s->fd,count-4,SEEK_CUR);
 	else {
 	    int new_size, chunk_count;
 	    if(lseek(s->fd,200,SEEK_CUR)<0)
-		goto dmg_close;
+		goto fail;
 	    chunk_count = (count-204)/40;
 	    new_size = sizeof(uint64_t) * (s->n_chunks + chunk_count);
 	    s->types = qemu_realloc(s->types, new_size/2);
@@ -142,7 +139,7 @@ dmg_close:
 		    chunk_count--;
 		    i--;
 		    if(lseek(s->fd,36,SEEK_CUR)<0)
-			goto dmg_close;
+			goto fail;
 		    continue;
 		}
 		read_uint32(s->fd);
@@ -163,11 +160,14 @@ dmg_close:
     s->compressed_chunk = qemu_malloc(max_compressed_size+1);
     s->uncompressed_chunk = qemu_malloc(512*max_sectors_per_chunk);
     if(inflateInit(&s->zstream) != Z_OK)
-	goto dmg_close;
+	goto fail;

     s->current_chunk = s->n_chunks;

     return 0;
+
+fail:
+    close(s->fd);
+    return -1;
 }

 static inline int
 is_sector_in_chunk(BDRVDMGState* s,
Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On Mon, Jan 11, 2010 at 11:19:21AM +0200, Dor Laor wrote:

  Attached results with rhel5.4 (qemu0.11) for win2k8 32bit guest. Note
  the drastic reduction in cpu consumption. Attachment did not survive
  the email server, so you'll have to trust me saying that cpu
  consumption went down from 65% to 40% for reads and from 80% to 30%
  for writes.

For what kind of workload, and using what qemu parameters, and cpu
consumption in the host or the guest? Either way this is an awful lot.
Did you use kernel AIO on the host?

Any chance you could publish the benchmark, guest and host configs so we
have meaningful numbers?

FYI below is the manually applied patch without all the wrapping:

Index: qemu/hw/virtio-blk.c
===
--- qemu.orig/hw/virtio-blk.c	2010-01-11 14:05:09.619254004 +0100
+++ qemu/hw/virtio-blk.c	2010-01-11 14:06:54.385013004 +0100
@@ -28,6 +28,7 @@ typedef struct VirtIOBlock
     char serial_str[BLOCK_SERIAL_STRLEN + 1];
     QEMUBH *bh;
     size_t config_size;
+    unsigned int pending;
 } VirtIOBlock;

 static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev)
@@ -87,6 +88,8 @@ typedef struct VirtIOBlockReq
     struct VirtIOBlockReq *next;
 } VirtIOBlockReq;

+static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq);
+
 static void virtio_blk_req_complete(VirtIOBlockReq *req, int status)
 {
     VirtIOBlock *s = req->dev;
@@ -95,6 +98,12 @@ static void virtio_blk_req_complete(Virt
     virtqueue_push(s->vq, &req->elem, req->qiov.size + sizeof(*req->in));
     virtio_notify(s->vdev, s->vq);

+    if (--s->pending == 0) {
+        virtio_queue_set_notification(s->vq, 1);
+        virtio_blk_handle_output(s->vdev, s->vq);
+    }
+
     qemu_free(req);
 }

@@ -340,6 +349,9 @@ static void virtio_blk_handle_output(Vir
         exit(1);
     }

+    if (++s->pending == 1)
+        virtio_queue_set_notification(s->vq, 0);
+
     req->out = (void *)req->elem.out_sg[0].iov_base;
     req->in = (void *)req->elem.in_sg[req->elem.in_num - 1].iov_base;
Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On 01/11/2010 03:11 PM, Christoph Hellwig wrote:

  FYI below is the manually applied patch without all the wrapping:

   static void virtio_blk_req_complete(VirtIOBlockReq *req, int status)
   {
       VirtIOBlock *s = req->dev;
  @@ -95,6 +98,12 @@ static void virtio_blk_req_complete(Virt
       virtqueue_push(s->vq, &req->elem, req->qiov.size + sizeof(*req->in));
       virtio_notify(s->vdev, s->vq);
  +    if (--s->pending == 0) {
  +        virtio_queue_set_notification(s->vq, 1);
  +        virtio_blk_handle_output(s->vdev, s->vq);
  +    }
  +

As Dor points out, the call to virtio_blk_handle_output() wants to be
before the test for pending, so we scan the ring as early as possible.

-- 
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On Mon, Jan 11, 2010 at 03:13:53PM +0200, Avi Kivity wrote:

  As Dor points out, the call to virtio_blk_handle_output() wants to be
  before the test for pending, so we scan the ring as early as possible.

I just reposted the patch in a way that it applies to share the work I
did when starting to review it.
Re: [Qemu-devel] cpuid problem in upstream qemu with kvm
Avi Kivity a...@redhat.com writes:

  On 01/07/2010 02:33 PM, Anthony Liguori wrote:
    There's another option. Make cpuid information part of live
    migration protocol, and then support something like -cpu Xeon-3550.
    We would remember the exact cpuid mask we present to the guest and
    then we could validate that we can obtain the same mask on the
    destination.

  Currently, our policy is to only migrate dynamic (from the guest's
  point of view) state, and specify static state on the command line
  [1]. I think your suggestion makes a lot of sense, but I'd like to
  expand it to move all guest state, whether dynamic or static. So '-m
  1G' would be migrated as well (but not -mem-path). Similarly, in
  -drive file=...,if=ide,index=1, everything but file=... would be
  migrated.

Becomes a bit clearer with the new way to configure stuff:

  -drive if=none,id=DRIVE-ID,...          <-- host, don't migrate
  -device ide=drive,drive=DRIVE-ID,...    <-- guest, do migrate

[...]
Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On Mon, Jan 11, 2010 at 10:30:53AM +0200, Avi Kivity wrote:

  The patch has potential to reduce performance on volumes with multiple
  spindles. Consider two processes issuing sequential reads into a RAID
  array. With this patch, the reads will be executed sequentially rather
  than in parallel, so I think a follow-on patch to make the minimum
  depth a parameter (set by the guest? the host?) would be helpful.

Let's think about the life cycle of I/O requests a bit.

We have an idle virtqueue (aka one virtio-blk device). The first (read)
request comes in, we get the virtio notify from the guest, which calls
into virtio_blk_handle_output. With the new code we now disable the
notify once we start processing the first request.

If the second request hits the queue before we call into
virtio_blk_get_request the second time we're fine even with the new code
as we keep picking it up. If however it hits after we leave
virtio_blk_handle_output, but before we complete the first request we do
indeed introduce additional latency.

So instead of disabling notify while requests are active we might want
to only disable it while we are inside virtio_blk_handle_output.
Something like the following minimally tested patch:

Index: qemu/hw/virtio-blk.c
===
--- qemu.orig/hw/virtio-blk.c	2010-01-11 14:28:42.896010503 +0100
+++ qemu/hw/virtio-blk.c	2010-01-11 14:40:13.535256353 +0100
@@ -328,7 +328,15 @@ static void virtio_blk_handle_output(Vir
     int num_writes = 0;
     BlockDriverState *old_bs = NULL;

+    /*
+     * While we are processing requests there is no need to get further
+     * notifications from the guest - it'll just burn cpu cycles doing
+     * useless context switches into the host.
+     */
+    virtio_queue_set_notification(s->vq, 0);
+
     while ((req = virtio_blk_get_request(s))) {
+handle_request:
         if (req->elem.out_num < 1 || req->elem.in_num < 1) {
             fprintf(stderr, "virtio-blk missing headers\n");
             exit(1);
@@ -358,6 +366,18 @@ static void virtio_blk_handle_output(Vir
         }
     }

+    /*
+     * Once we're done processing all pending requests re-enable the
+     * queue notification. If there's an entry pending after we enabled
+     * notification again we hit a race and need to process it before
+     * returning.
+     */
+    virtio_queue_set_notification(s->vq, 1);
+    req = virtio_blk_get_request(s);
+    if (req) {
+        goto handle_request;
+    }
+
     if (num_writes > 0) {
         do_multiwrite(old_bs, blkreq, num_writes);
     }
Re: [Qemu-devel] [PATCH] dmg: fix -open failure
Am 11.01.2010 14:06, schrieb Christoph Hellwig:

  Currently the dmg image format driver simply opens the images as raw
  if any kind of failure happens. This is contrary to the behaviour of
  all other image formats which just return an error and let the block
  core deal with it.

  Signed-off-by: Christoph Hellwig h...@lst.de

Acked-by: Kevin Wolf kw...@redhat.com

I mean, looking at the patched code I see lots of things that are wrong,
but they are all unrelated to your change: There are error cases where
memory is leaked, and it should use bdrv_* functions instead of the
native open/read/etc. And obviously coding style is completely off (most
annoying: tabs!)

Kevin
Re: [Qemu-devel] [PATCH] dmg: fix -open failure
On Mon, Jan 11, 2010 at 02:43:59PM +0100, Kevin Wolf wrote:

  Am 11.01.2010 14:06, schrieb Christoph Hellwig:
    Currently the dmg image format driver simply opens the images as raw
    if any kind of failure happens. This is contrary to the behaviour of
    all other image formats which just return an error and let the block
    core deal with it.

    Signed-off-by: Christoph Hellwig h...@lst.de

  Acked-by: Kevin Wolf kw...@redhat.com

  I mean, looking at the patched code I see lots of things that are
  wrong, but they are all unrelated to your change: There are error
  cases where memory is leaked, and it should use bdrv_* functions
  instead of the native open/read/etc. And obviously coding style is
  completely off (most annoying: tabs!)

  Kevin

Yes, the code pretty much is a mess, but I didn't really want to touch
it. I just looked into picking up your search host_ for raw patches and
was looking for all the block image driver functionality in the tree.

---end quoted text---
Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On Mon, Jan 11, 2010 at 03:13:53PM +0200, Avi Kivity wrote:

  As Dor points out, the call to virtio_blk_handle_output() wants to be
  before the test for pending, so we scan the ring as early as possible.

It could cause a race window where we add an entry to the ring after we
run virtio_blk_handle_output, but before re-enabling the notification.
But I think my variant of the patch that I just posted should deal with
this in an even better way.
Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On 01/11/2010 07:42 AM, Christoph Hellwig wrote:

  On Mon, Jan 11, 2010 at 10:30:53AM +0200, Avi Kivity wrote:
    The patch has potential to reduce performance on volumes with
    multiple spindles. Consider two processes issuing sequential reads
    into a RAID array. With this patch, the reads will be executed
    sequentially rather than in parallel, so I think a follow-on patch
    to make the minimum depth a parameter (set by the guest? the host?)
    would be helpful.

  Let's think about the life cycle of I/O requests a bit.

  We have an idle virtqueue (aka one virtio-blk device). The first
  (read) request comes in, we get the virtio notify from the guest,
  which calls into virtio_blk_handle_output. With the new code we now
  disable the notify once we start processing the first request.

  If the second request hits the queue before we call into
  virtio_blk_get_request the second time we're fine even with the new
  code as we keep picking it up. If however it hits after we leave
  virtio_blk_handle_output, but before we complete the first request we
  do indeed introduce additional latency.

  So instead of disabling notify while requests are active we might want
  to only disable it while we are inside virtio_blk_handle_output.
  Something like the following minimally tested patch:

I'd suggest that we get even more aggressive and install an idle bottom
half that checks the queue for newly submitted requests. If we keep
getting requests submitted before a new one completes, we'll never take
an I/O exit.

The same approach is probably a good idea for virtio-net.

Regards,
Anthony Liguori
[Qemu-devel] Re: [ kvm-Bugs-2907597 ] qemu vnc server clips at 2560x1600
On 01/10/2010 10:30 AM, Avi Kivity wrote:

  On 01/10/2010 06:26 PM, SourceForge.net wrote:
    Initial Comment: So I am running using the VESA driver to run an
    Ubuntu 9.10 guest at 2560x1600 (I had to modify the
    xserver-video-vesa package to remove an internal screen limit of
    2048x2048 in the xorg vesa driver) and everything works great except
    that the qemu vnc server appears to clip at this resolution. The
    problem goes away if I run 1900x1200 and it doesn't change if I run
    16bit depth or 24bit depth. I have attached two screenshots, the
    first is vncing directly into qemu (which exhibits the problem) and
    the second is vncing to a vnc server I have running in the guest
    which doesn't have the problem. I poked around in vnc.c and couldn't
    see any limits but I feel like its a buffer limit of some kind. Also
    if you look very closely at the first image you can see that the
    first row is drawn correctly all the way across but subsequent rows
    are not. If you need more information don't hesitate to ask.

  Anthony, can you take a look at this? Seems like a serious issue,
  could find nothing obvious in vnc.c.

VNC_MAX_WIDTH and VNC_MAX_HEIGHT in vnc.h are currently defined to 2048.
We do dirty tracking with a bitmap and that bitmap is currently a fixed
size.

2048 is bigger than any physical screen that I know of so I assume this
is a multiple monitor scenario. Long term, I think exposing multiple
monitors to the guest is a better approach for this kind of
functionality. Since these resolutions for a single screen don't really
exist, this is largely an untested path within the guest.

Regards,
Anthony Liguori
[Qemu-devel] Re: [RFC 0/7]: Add VNC connect/disconnect events
On Fri, Jan 08, 2010 at 07:47:09PM -0200, Luiz Capitulino wrote:

  Hi there,

  This series contains two related changes. First a small cleanup is
  done in the current 'query-vnc' command, then two new QMP asynchronous
  events are introduced: VNC connect and disconnect. That is, every time
  a VNC client connects or disconnects from QEMU, the QMP client will
  get full VNC client and VNC server info.

  There's one problem though and that's why this series is a RFC. The
  connection is a two step procedure if an authentication mechanism is
  enabled. First the client establishes the connection then it
  authenticates. Currently, 'info vnc' and 'query-vnc' will show client
  information as soon as it establishes the connection, even if the
  client didn't authenticate yet.

  This series changes that. Now, if an authentication mechanism is
  enabled, client information will only be available _after_ it has
  authenticated. Also, the connect/disconnect events are only emitted
  after the authentication step.

  There's a way to fix this and add the old behavior back, but we'll
  need one additional event (say CONNECT_AUTH) and the client will have
  to look at the server info to learn that a disconnection happened
  before authentication.

Hmm, I had not considered the timing problems when I discussed this with
you previously. I think it is important to separate the connect event
from the authentication event, because a mgmt application using QEMU may
want to see clients connecting even if they abort/delay before
authenticating. So perhaps we should declare that the lifecycle is

 - CONNECT (provide IP / port details)
 - AUTHENTICATED (provide IP / port details + authenticated ID details,
   eg x509 dname, or SASL username)
 - DISCONNECT (provide IP / port details)

Obviously AUTHENTICATED may be optional if the client goes away
immediately before trying auth.
The AUTHENTICATED event probably also ought to allow for an indication
of success vs failure so the app can see failed login attempts.

Daniel
-- 
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Re: [Qemu-devel] [PATCH] dmg: fix -open failure
Am 11.01.2010 14:46, schrieb Christoph Hellwig:

  On Mon, Jan 11, 2010 at 02:43:59PM +0100, Kevin Wolf wrote:
    Am 11.01.2010 14:06, schrieb Christoph Hellwig:
      Currently the dmg image format driver simply opens the images as
      raw if any kind of failure happens. This is contrary to the
      behaviour of all other image formats which just return an error
      and let the block core deal with it.

      Signed-off-by: Christoph Hellwig h...@lst.de

    Acked-by: Kevin Wolf kw...@redhat.com

    I mean, looking at the patched code I see lots of things that are
    wrong, but they are all unrelated to your change: There are error
    cases where memory is leaked, and it should use bdrv_* functions
    instead of the native open/read/etc. And obviously coding style is
    completely off (most annoying: tabs!)

  Yes, the code pretty much is a mess, but I didn't really want to touch
  it. I just looked into picking up your search host_ for raw patches
  and was looking for all the block image driver functionality in the
  tree.

Are you going to propose a cleaner patch? I have currently some other
bugs to do first, but I was certainly planning to do so. However, I'll
happily leave it to you if you have the time right now.

For dmg, I'm not even sure if it's worth fixing. Does anybody use this
driver? But I guess it's good enough for another lowest priority task on
my todo list. Right after vvfat or so...

Kevin
Re: [Qemu-devel] [PATCH] dmg: fix -open failure
On Mon, Jan 11, 2010 at 02:56:16PM +0100, Kevin Wolf wrote:

  Are you going to propose a cleaner patch? I have currently some other
  bugs to do first, but I was certainly planning to do so. However, I'll
  happily leave it to you if you have the time right now.

I'm looking into doing it in the generic block layer, yes.

  For dmg, I'm not even sure if it's worth fixing. Does anybody use this
  driver? But I guess it's good enough for another lowest priority task
  on my todo list. Right after vvfat or so...

Hehe. All these odd image formats are extremely low on my todo list as
well. I'd be really interested if there are any users around at all.
Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On 01/11/2010 07:47 AM, Christoph Hellwig wrote:

  On Mon, Jan 11, 2010 at 03:13:53PM +0200, Avi Kivity wrote:
    As Dor points out, the call to virtio_blk_handle_output() wants to
    be before the test for pending, so we scan the ring as early as
    possible.

  It could cause a race window where we add an entry to the ring after
  we run virtio_blk_handle_output, but before re-enabling the
  notification. But I think my variant of the patch that I just posted
  should deal with this in an even better way.

When we've disabled notifications, we should use any opportunity we have
in userspace to check the rings to see if anything was added.

Bottom halves are run at the very end of the event loop which means that
they're guaranteed to be the last thing run. Idle bottom halves can be
rescheduled without causing an infinite loop and do not affect the
select timeout (which normal bottom halves do).

Regards,
Anthony Liguori
Re: [Qemu-devel] [PATCH] dmg: fix ->open failure
Am 11.01.2010 15:00, schrieb Christoph Hellwig: On Mon, Jan 11, 2010 at 02:56:16PM +0100, Kevin Wolf wrote: Are you going to propose a cleaner patch? I have currently some other bugs to do first, but I was certainly planning to do so. However, I'll happily leave it to you if you have the time right now. I'm looking into doing it in the generic block layer, yes. More or less the same hack, just in cleaner? Or trying to fundamentally change things? I think you haven't answered yet to what I said in the thread of my original hack. I'm quoting it here for convenience: Ok, if you start talking about layering, we can have a fundamental discussion on this topic and why the layering is broken anyway. Logically, we have image formats like qcow2, VMDK and raw, and they are stored in files, on CD-ROMs or general block devices. From a layering perspective, it is wrong to include the latter in the raw format driver in the first place. Actually, I think the differentiation between raw files and host_* is at the same level as protocols are. Probably they should be implemented very similarly. Do you think it's possible/worth the effort to try putting things straight here? Kevin
Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On 01/11/2010 03:42 PM, Christoph Hellwig wrote: On Mon, Jan 11, 2010 at 10:30:53AM +0200, Avi Kivity wrote: The patch has potential to reduce performance on volumes with multiple spindles. Consider two processes issuing sequential reads into a RAID array. With this patch, the reads will be executed sequentially rather than in parallel, so I think a follow-on patch to make the minimum depth a parameter (set by the guest? the host?) would be helpful.

Let's think about the life cycle of I/O requests a bit. We have an idle virtqueue (aka one virtio-blk device). The first (read) request comes in, we get the virtio notify from the guest, which calls into virtio_blk_handle_output. With the new code we now disable the notify once we start processing the first request. If the second request hits the queue before we call into virtio_blk_get_request the second time we're fine even with the new code as we keep picking it up. If however it hits after we leave virtio_blk_handle_output, but before we complete the first request we do indeed introduce additional latency.

So instead of disabling notify while requests are active we might want to only disable it while we are inside virtio_blk_handle_output. Something like the following minimally tested patch:

Index: qemu/hw/virtio-blk.c
===================================================================
--- qemu.orig/hw/virtio-blk.c	2010-01-11 14:28:42.896010503 +0100
+++ qemu/hw/virtio-blk.c	2010-01-11 14:40:13.535256353 +0100
@@ -328,7 +328,15 @@ static void virtio_blk_handle_output(Vir
     int num_writes = 0;
     BlockDriverState *old_bs = NULL;
 
+    /*
+     * While we are processing requests there is no need to get further
+     * notifications from the guest - it'll just burn cpu cycles doing
+     * useless context switches into the host.
+     */
+    virtio_queue_set_notification(s->vq, 0);
+
     while ((req = virtio_blk_get_request(s))) {
+handle_request:
         if (req->elem.out_num < 1 || req->elem.in_num < 1) {
             fprintf(stderr, "virtio-blk missing headers\n");
             exit(1);
@@ -358,6 +366,18 @@ static void virtio_blk_handle_output(Vir
         }
     }
 
+    /*
+     * Once we're done processing all pending requests re-enable the queue
+     * notification. If there's an entry pending after we enabled
+     * notification again we hit a race and need to process it before
+     * returning.
+     */
+    virtio_queue_set_notification(s->vq, 1);
+    req = virtio_blk_get_request(s);
+    if (req) {
+        goto handle_request;
+    }
+

I don't think this will have much effect. First, the time spent in virtio_blk_handle_output() is a small fraction of total guest time, so the probability of the guest hitting the notifications closed window is low. Second, while we're in that function, the vcpu that kicked us is stalled, and other vcpus are likely to be locked out of the queue by the guest.

-- 
error compiling committee.c: too many arguments to function
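The disable/drain/re-enable pattern in Christoph's patch can be exercised in isolation. The sketch below is a single-threaded model with invented types (the ring stands in for the virtqueue, the ints for VirtIOBlockReq); the essential piece is the re-check after notifications are re-enabled, which closes the race where a request lands between the last pop and the re-enable.

```c
#include <assert.h>

#define RING_SIZE 16

/* Minimal stand-in for a virtqueue: head is the consumer index,
   tail the producer index. */
struct ring {
    int buf[RING_SIZE];
    unsigned head, tail;
    int notify_enabled;
};

static void ring_push(struct ring *r, int req)
{
    r->buf[r->tail++ % RING_SIZE] = req;
}

static int ring_pop(struct ring *r, int *req)
{
    if (r->head == r->tail)
        return 0;
    *req = r->buf[r->head++ % RING_SIZE];
    return 1;
}

/* racing_req >= 0 models a request that arrives during the window
   when notifications are still disabled; returns requests processed. */
static int handle_output(struct ring *r, int racing_req)
{
    int req, done = 0;

    r->notify_enabled = 0;          /* guest kicks would only burn cycles */
    while (ring_pop(r, &req))
        done++;

    if (racing_req >= 0)
        ring_push(r, racing_req);   /* slips in before re-enable */

    r->notify_enabled = 1;          /* re-enable, then re-check the ring */
    while (ring_pop(r, &req))
        done++;                     /* the racing request is not lost */

    return done;
}
```

Without the second drain loop the racing request would sit unprocessed until the next guest kick, which is exactly the latency window the patch's `goto handle_request` avoids.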
Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On 01/11/2010 03:49 PM, Anthony Liguori wrote: So instead of disabling notify while requests are active we might want to only disable it while we are inside virtio_blk_handle_output. Something like the following minimally tested patch: I'd suggest that we get even more aggressive and install an idle bottom half that checks the queue for newly submitted requests. If we keep getting requests submitted before a new one completes, we'll never take an I/O exit. That has the downside of bouncing a cache line on unrelated exits. It probably doesn't matter with qemu as it is now, since it will bounce qemu_mutex, but it will hurt with large guests (especially if they have many rings). IMO we should get things to work well without riding on unrelated exits, especially as we're trying to reduce those exits. The same approach is probably a good idea for virtio-net. With vhost-net you don't see exits. -- error compiling committee.c: too many arguments to function
[Qemu-devel] Re: [ kvm-Bugs-2907597 ] qemu vnc server clips at 2560x1600
On 01/11/2010 03:53 PM, Anthony Liguori wrote: Anthony, can you take a look at this? Seems like a serious issue, could find nothing obvious in vnc.c. VNC_MAX_WIDTH and VNC_MAX_HEIGHT in vnc.h are currently defined to 2048. We do dirty tracking with a bitmap and that bitmap is currently a fixed size. 2048 is bigger than any physical screen that I know of so I assume this is a multiple monitor scenario. Long term, I think exposing multiple monitors to the guest is a better approach for this kind of functionality. Since these resolutions for a single screen don't really exist, this is largely an untested path within the guest. Google suggests such screens do exist, and -vga std (at least in qemu-kvm) supports them. I've asked the reporter to play with VNC_MAX_WIDTH. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On 01/11/2010 08:29 AM, Avi Kivity wrote: On 01/11/2010 03:49 PM, Anthony Liguori wrote: So instead of disabling notify while requests are active we might want to only disable it while we are inside virtio_blk_handle_output. Something like the following minimally tested patch: I'd suggest that we get even more aggressive and install an idle bottom half that checks the queue for newly submitted requests. If we keep getting requests submitted before a new one completes, we'll never take an I/O exit. That has the downside of bouncing a cache line on unrelated exits. The read and write sides of the ring are widely separated in physical memory specifically to avoid cache line bouncing. It probably doesn't matter with qemu as it is now, since it will bounce qemu_mutex, but it will hurt with large guests (especially if they have many rings). IMO we should get things to work well without riding on unrelated exits, especially as we're trying to reduce those exits. A block I/O request can potentially be very, very long lived. By serializing requests like this, there's a high likelihood that it's going to kill performance with anything capable of processing multiple requests. OTOH, if we aggressively poll the ring when we have an opportunity to, there's very little down side to that and it addresses the serialization problem. The same approach is probably a good idea for virtio-net. With vhost-net you don't see exits. The point is, when we've disabled notification, we should poll on the ring for additional requests instead of waiting for one to complete before looking at another one. Even with vhost-net, this logic is still applicable although potentially harder to achieve. Regards, Anthony Liguori
Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On 01/11/2010 04:37 PM, Anthony Liguori wrote: That has the downside of bouncing a cache line on unrelated exits. The read and write sides of the ring are widely separated in physical memory specifically to avoid cache line bouncing. I meant, exits on random vcpus will cause the cacheline containing the notification disable flag to bounce around. As it is, we read it on the vcpu that owns the queue and write it on that vcpu or the I/O thread. It probably doesn't matter with qemu as it is now, since it will bounce qemu_mutex, but it will hurt with large guests (especially if they have many rings). IMO we should get things to work well without riding on unrelated exits, especially as we're trying to reduce those exits. A block I/O request can potentially be very, very long lived. By serializing requests like this, there's a high likelihood that it's going to kill performance with anything capable of processing multiple requests. Right, that's why I suggested having a queue depth at which disabling notification kicks in. The patch hardcodes this depth to 1, unpatched qemu is infinite, a good value is probably spindle count + VAT. OTOH, if we aggressively poll the ring when we have an opportunity to, there's very little down side to that and it addresses the serialization problem. But we can't guarantee that we'll get those opportunities, so it doesn't address the problem in a general way. A guest that doesn't use hpet and only has a single virtio-blk device will not have any reason to exit to qemu. -- error compiling committee.c: too many arguments to function
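Avi's depth-threshold idea can be sketched as a pair of hooks around request submission and completion. Everything here is illustrative (struct name, field names, the threshold value): notifications are only suppressed once the number of in-flight requests reaches a tunable minimum depth, so parallel streams can still keep a multi-spindle array busy.

```c
#include <assert.h>

/* Hedged sketch of depth-gated notification suppression (names invented). */
struct notify_ctl {
    unsigned pending;      /* requests submitted but not yet completed */
    unsigned min_depth;    /* tunable, e.g. roughly the spindle count */
    int notify_enabled;
};

static void request_started(struct notify_ctl *c)
{
    if (++c->pending >= c->min_depth)
        c->notify_enabled = 0;   /* queue deep enough: stop taking kicks */
}

static void request_completed(struct notify_ctl *c)
{
    if (--c->pending < c->min_depth)
        c->notify_enabled = 1;   /* drained below threshold: listen again */
}
```

With `min_depth = 1` this degenerates to the behaviour of the RFC patch (suppress as soon as anything is in flight), and a larger value keeps notifications on until the queue is deep enough to saturate the backing storage.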
[Qemu-devel] [PATCH 1/4] ppc-40x: Get TLB attributes from TLBLO.
The ZSEL was incorrectly being decoded from TLBHI. Decode it from TLBLO instead.

Signed-off-by: Edgar E. Iglesias edgar.igles...@gmail.com
---
 target-ppc/op_helper.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/target-ppc/op_helper.c b/target-ppc/op_helper.c
index cea27f2..3575b29 100644
--- a/target-ppc/op_helper.c
+++ b/target-ppc/op_helper.c
@@ -3990,7 +3990,6 @@ void helper_4xx_tlbwe_hi (target_ulong entry, target_ulong val)
         cpu_abort(env, "Little-endian TLB entries are not supported by now\n");
     }
     tlb->PID = env->spr[SPR_40x_PID]; /* PID */
-    tlb->attr = val & 0xFF;
     LOG_SWTLB("%s: set up TLB %d RPN " TARGET_FMT_plx " EPN " TARGET_FMT_lx
               " size " TARGET_FMT_lx " prot %c%c%c%c PID %d\n", __func__,
               (int)entry, tlb->RPN, tlb->EPN, tlb->size,
@@ -4016,6 +4015,7 @@ void helper_4xx_tlbwe_lo (target_ulong entry, target_ulong val)
               val);
     entry &= 0x3F;
     tlb = &env->tlb[entry].tlbe;
+    tlb->attr = val & 0xFF;
     tlb->RPN = val & 0xFC00;
     tlb->prot = PAGE_READ;
     if (val & 0x200)
-- 
1.6.4.4
[Qemu-devel] [PATCH 4/4] ppc-40x: Correct ESR for zone protection faults.
Raise the zone protection fault in ESR for TLB faults caused by zone protection bits.

Signed-off-by: Edgar E. Iglesias edgar.igles...@gmail.com
---
 target-ppc/helper.c | 23 ++-
 1 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index f9b5589..a4fae31 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -1171,6 +1171,8 @@ static int mmu40x_get_physical_address (CPUState *env, mmu_ctx_t *ctx,
             break;
         case 0x0:
             if (pr != 0) {
+                /* Raise Zone protection fault. */
+                env->spr[SPR_40x_ESR] = 1 << 22;
                 ctx->prot = 0;
                 ret = -2;
                 break;
@@ -1183,6 +1185,8 @@ static int mmu40x_get_physical_address (CPUState *env, mmu_ctx_t *ctx,
             ctx->prot = tlb->prot;
             ctx->prot |= PAGE_EXEC;
             ret = check_prot(ctx->prot, rw, access_type);
+            if (ret == -2)
+                env->spr[SPR_40x_ESR] = 0;
             break;
         }
         if (ret >= 0) {
@@ -1580,11 +1584,20 @@ int cpu_ppc_handle_mmu_fault (CPUState *env, target_ulong address, int rw,
             /* Access rights violation */
             env->exception_index = POWERPC_EXCP_DSI;
             env->error_code = 0;
-            env->spr[SPR_DAR] = address;
-            if (rw == 1)
-                env->spr[SPR_DSISR] = 0x0A00;
-            else
-                env->spr[SPR_DSISR] = 0x0800;
+            if (env->mmu_model == POWERPC_MMU_SOFT_4xx
+                || env->mmu_model == POWERPC_MMU_SOFT_4xx_Z) {
+                env->spr[SPR_40x_DEAR] = address;
+                if (rw) {
+                    env->spr[SPR_40x_ESR] |= 0x0080;
+                }
+            } else {
+                env->spr[SPR_DAR] = address;
+                if (rw == 1) {
+                    env->spr[SPR_DSISR] = 0x0A00;
+                } else {
+                    env->spr[SPR_DSISR] = 0x0800;
+                }
+            }
             break;
         case -4:
             /* Direct store exception */
-- 
1.6.4.4
[Qemu-devel] [PATCH 0/4] PPC 40x MMU fixes.
Hi, I've been trying to boot linux on an emulated PPC 405 (Xilinx virtex 4) board but I ran into trouble with the 40x MMU emulation. Patch 4 is not the nicest but with this set I can now boot into a shell and get a pretty stable system. I'm very noob at ppc so comments are very welcome. Thanks, Edgar Edgar E. Iglesias (4): ppc-40x: Get TLB attributes from TLBLO. ppc-40x: Correct check for Endian swapping TLB entries. ppc-40x: Correct decoding of zone protection bits. ppc-40x: Correct ESR for zone protection faults. target-ppc/helper.c| 25 +++-- target-ppc/op_helper.c | 14 -- 2 files changed, 27 insertions(+), 12 deletions(-)
[Qemu-devel] [PATCH 3/4] ppc-40x: Correct decoding of zone protection bits.
The 40x MMU has 15 zones in the ZPR register.

Signed-off-by: Edgar E. Iglesias edgar.igles...@gmail.com
---
 target-ppc/helper.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index b233d4f..f9b5589 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -1155,7 +1155,7 @@ static int mmu40x_get_physical_address (CPUState *env, mmu_ctx_t *ctx,
                                  env->spr[SPR_40x_PID], 0, i) < 0)
             continue;
         zsel = (tlb->attr >> 4) & 0xF;
-        zpr = (env->spr[SPR_40x_ZPR] >> (28 - (2 * zsel))) & 0x3;
+        zpr = (env->spr[SPR_40x_ZPR] >> (30 - (2 * zsel))) & 0x3;
         LOG_SWTLB("%s: TLB %d zsel %d zpr %d rw %d attr %08x\n",
                   __func__, i, zsel, zpr, rw, tlb->attr);
         /* Check execute enable bit */
-- 
1.6.4.4
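The corrected shift is easy to sanity-check in isolation: each zone's two protection bits sit at position `30 - 2*zsel` of the 32-bit ZPR, with zone 0 in the top bits, so a 4-bit `zsel` covers the whole register; with the old `28 - 2*zsel`, the highest zone would require a negative shift. The helper name below is invented for the illustration.

```c
/* Extract the two ZPR protection bits for zone zsel (0 = top bits). */
static unsigned zpr_bits(unsigned zpr, unsigned zsel)
{
    return (zpr >> (30 - 2 * zsel)) & 0x3;
}
```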
Re: [Qemu-devel] Re: [ kvm-Bugs-2907597 ] qemu vnc server clips at 2560x1600
On Monday, January 11, 2010, 14:53:05, Anthony Liguori wrote: 2048 is bigger than any physical screen that I know of so I assume this is a multiple monitor scenario. 30" TFTs have a resolution of 2560x1600. Speaking of large resolutions, I just tried 2360x1770 (the largest offered by the VMWare VGA driver in Win2003 x64), and in my case, the left side of the screen is clipped, while the right side is mostly fine (only the mouse pointer is leaving trails behind): http://img27.imageshack.us/img27/2760/image1ud.png -- Jernej Simončič http://eternallybored.org/ What the gods get away with, the cows don't. -- The Aquinas Axiom
Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On 01/11/2010 08:46 AM, Avi Kivity wrote: On 01/11/2010 04:37 PM, Anthony Liguori wrote: That has the downside of bouncing a cache line on unrelated exits. The read and write sides of the ring are widely separated in physical memory specifically to avoid cache line bouncing. I meant, exits on random vcpus will cause the cacheline containing the notification disable flag to bounce around. As it is, we read it on the vcpu that owns the queue and write it on that vcpu or the I/O thread. Bottom halves are always run from the IO thread. It probably doesn't matter with qemu as it is now, since it will bounce qemu_mutex, but it will hurt with large guests (especially if they have many rings). IMO we should get things to work well without riding on unrelated exits, especially as we're trying to reduce those exits. A block I/O request can potentially be very, very long lived. By serializing requests like this, there's a high likelihood that it's going to kill performance with anything capable of processing multiple requests. Right, that's why I suggested having a queue depth at which disabling notification kicks in. The patch hardcodes this depth to 1, unpatched qemu is infinite, a good value is probably spindle count + VAT. That means we would need a user visible option which is quite unfortunate. Also, that logic only really makes sense with cache=off. With cache=writethrough, you can get pathological cases where you have an uncached access followed by cached accesses. In fact, with read-ahead, this is probably not an uncommon scenario. OTOH, if we aggressively poll the ring when we have an opportunity to, there's very little down side to that and it addresses the serialization problem. But we can't guarantee that we'll get those opportunities, so it doesn't address the problem in a general way. A guest that doesn't use hpet and only has a single virtio-blk device will not have any reason to exit to qemu. 
We can mitigate this with a timer but honestly, we need to do perf measurements to see. My feeling is that we will need some more aggressive form of polling than just waiting for IO completion. I don't think queue depth is enough because it assumes that all requests are equal. When dealing with cache=off or even just storage with its own cache, that's simply not the case. Regards, Anthony Liguori
Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On 01/11/2010 09:19 AM, Avi Kivity wrote: OTOH, if we aggressively poll the ring when we have an opportunity to, there's very little down side to that and it addresses the serialization problem. But we can't guarantee that we'll get those opportunities, so it doesn't address the problem in a general way. A guest that doesn't use hpet and only has a single virtio-blk device will not have any reason to exit to qemu. We can mitigate this with a timer but honestly, we need to do perf measurements to see. My feeling is that we will need some more aggressive form of polling than just waiting for IO completion. I don't think queue depth is enough because it assumes that all requests are equal. When dealing with cache=off or even just storage with it's own cache, that's simply not the case. Maybe we can adapt behaviour dynamically based on how fast results come in. Based on our experiences with virtio-net, what I'd suggest is to make a lot of tunable options (ring size, various tx mitigation schemes, timeout durations, etc) and then we can do some deep performance studies to see how things interact with each other. I think we should do that before making any changes because I'm deeply concerned that we'll introduce significant performance regressions. Regards, Anthony Liguori
Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On 01/11/2010 05:13 PM, Anthony Liguori wrote: On 01/11/2010 08:46 AM, Avi Kivity wrote: On 01/11/2010 04:37 PM, Anthony Liguori wrote: That has the downside of bouncing a cache line on unrelated exits. The read and write sides of the ring are widely separated in physical memory specifically to avoid cache line bouncing. I meant, exits on random vcpus will cause the cacheline containing the notification disable flag to bounce around. As it is, we read it on the vcpu that owns the queue and write it on that vcpu or the I/O thread. Bottom halves are always run from the IO thread. Okay, so that won't be an issue. It probably doesn't matter with qemu as it is now, since it will bounce qemu_mutex, but it will hurt with large guests (especially if they have many rings). IMO we should get things to work well without riding on unrelated exits, especially as we're trying to reduce those exits. A block I/O request can potentially be very, very long lived. By serializing requests like this, there's a high likelihood that it's going to kill performance with anything capable of processing multiple requests. Right, that's why I suggested having a queue depth at which disabling notification kicks in. The patch hardcodes this depth to 1, unpatched qemu is infinite, a good value is probably spindle count + VAT. That means we would need a user visible option which is quite unfortunate. We could guess it, perhaps. Also, that logic only really makes sense with cache=off. With cache=writethrough, you can get pathological cases whereas you have an uncached access followed by cached accesses. In fact, with read-ahead, this is probably not an uncommon scenario. So you'd increase the disable depths when cache!=none. OTOH, if we aggressively poll the ring when we have an opportunity to, there's very little down side to that and it addresses the serialization problem. But we can't guarantee that we'll get those opportunities, so it doesn't address the problem in a general way. 
A guest that doesn't use hpet and only has a single virtio-blk device will not have any reason to exit to qemu. We can mitigate this with a timer but honestly, we need to do perf measurements to see. My feeling is that we will need some more aggressive form of polling than just waiting for IO completion. I don't think queue depth is enough because it assumes that all requests are equal. When dealing with cache=off or even just storage with it's own cache, that's simply not the case. Maybe we can adapt behaviour dynamically based on how fast results come in. -- error compiling committee.c: too many arguments to function
[Qemu-devel] [PATCH 2/4] ppc-40x: Correct check for Endian swapping TLB entries.
Bail out on 40x TLB entries with endianness swapping only if the entry is valid.

Signed-off-by: Edgar E. Iglesias edgar.igles...@gmail.com
---
 target-ppc/op_helper.c | 12 +++-
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/target-ppc/op_helper.c b/target-ppc/op_helper.c
index 3575b29..f905c64 100644
--- a/target-ppc/op_helper.c
+++ b/target-ppc/op_helper.c
@@ -3981,13 +3981,15 @@ void helper_4xx_tlbwe_hi (target_ulong entry, target_ulong val)
                   tlb->size, TARGET_PAGE_SIZE, (int)((val >> 7) & 0x7));
     }
     tlb->EPN = val & ~(tlb->size - 1);
-    if (val & 0x40)
+    if (val & 0x40) {
         tlb->prot |= PAGE_VALID;
-    else
+        if (val & 0x20) {
+            /* XXX: TO BE FIXED */
+            cpu_abort(env,
+                      "Little-endian TLB entries are not supported by now\n");
+        }
+    } else {
         tlb->prot &= ~PAGE_VALID;
-    if (val & 0x20) {
-        /* XXX: TO BE FIXED */
-        cpu_abort(env, "Little-endian TLB entries are not supported by now\n");
     }
     tlb->PID = env->spr[SPR_40x_PID]; /* PID */
     LOG_SWTLB("%s: set up TLB %d RPN " TARGET_FMT_plx " EPN " TARGET_FMT_lx
-- 
1.6.4.4
Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On 01/11/2010 05:22 PM, Anthony Liguori wrote: Based on our experiences with virtio-net, what I'd suggest is to make a lot of tunable options (ring size, various tx mitigation schemes, timeout durations, etc) and then we can do some deep performance studies to see how things interact with each other. I think we should do that before making any changes because I'm deeply concerned that we'll introduce significant performance regressions. I agree. We can start with this patch, with a tunable depth, defaulting to current behaviour. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On 01/11/2010 09:31 AM, Avi Kivity wrote: On 01/11/2010 05:22 PM, Anthony Liguori wrote: Based on our experiences with virtio-net, what I'd suggest is to make a lot of tunable options (ring size, various tx mitigation schemes, timeout durations, etc) and then we can do some deep performance studies to see how things interact with each other. I think we should do that before making any changes because I'm deeply concerned that we'll introduce significant performance regressions. I agree. We can start with this patch, with a tunable depth, defaulting to current behaviour. I wouldn't be opposed to that provided we made it clear that these options were not supported long term. I don't want management tools (like libvirt) to start relying on them. Regards, Anthony Liguori
Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On 01/11/2010 05:32 PM, Anthony Liguori wrote: On 01/11/2010 09:31 AM, Avi Kivity wrote: On 01/11/2010 05:22 PM, Anthony Liguori wrote: Based on our experiences with virtio-net, what I'd suggest is to make a lot of tunable options (ring size, various tx mitigation schemes, timeout durations, etc) and then we can do some deep performance studies to see how things interact with each other. I think we should do that before making any changes because I'm deeply concerned that we'll introduce significant performance regressions. I agree. We can start with this patch, with a tunable depth, defaulting to current behaviour. I wouldn't be opposed to that provided we made it clear that these options were not supported long term. I don't want management tools (like libvirt) to start relying on them. x-option-name for experimental options? -device disk,if=virtio,x-queue-depth-suppress-notify=4 -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On 01/11/2010 09:35 AM, Avi Kivity wrote: On 01/11/2010 05:32 PM, Anthony Liguori wrote: On 01/11/2010 09:31 AM, Avi Kivity wrote: On 01/11/2010 05:22 PM, Anthony Liguori wrote: Based on our experiences with virtio-net, what I'd suggest is to make a lot of tunable options (ring size, various tx mitigation schemes, timeout durations, etc) and then we can do some deep performance studies to see how things interact with each other. I think we should do that before making any changes because I'm deeply concerned that we'll introduce significant performance regressions. I agree. We can start with this patch, with a tunable depth, defaulting to current behaviour. I wouldn't be opposed to that provided we made it clear that these options were not supported long term. I don't want management tools (like libvirt) to start relying on them. x-option-name for experimental options? -device disk,if=virtio,x-queue-depth-suppress-notify=4 Sounds reasonable to me. Regards, Anthony Liguori
[Qemu-devel] [PATCH] virtio-pci: thinko fix
Since patch ed757e140c0ada220f213036e4497315d24ca8bct, virtio will sometimes clear all status registers on bus master disable, which loses information such as the VIRTIO_CONFIG_S_FAILED bit. This is a result of a patch being misapplied: code uses ! instead of ~ for bit operations as in Yan's original patch. This obviously does not make sense.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/virtio-pci.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index d222ce0..f4f1bc1 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -374,7 +374,7 @@ static void virtio_write_config(PCIDevice *pci_dev, uint32_t address,
 
     if (PCI_COMMAND == address) {
         if (!(val & PCI_COMMAND_MASTER)) {
-            proxy->vdev->status &= !VIRTIO_CONFIG_S_DRIVER_OK;
+            proxy->vdev->status &= ~VIRTIO_CONFIG_S_DRIVER_OK;
         }
     }
-- 
1.6.6.rc1.43.gf55cc
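The one-character difference is worth spelling out: `!x` collapses any non-zero value to 0 (so `status & !FLAG` wipes every bit), while `~x` flips the bits and works as a mask. The sketch below uses the flag values from the virtio specification (DRIVER_OK is 4, FAILED is 0x80); the function names are invented for the illustration.

```c
#define VIRTIO_CONFIG_S_DRIVER_OK 0x04
#define VIRTIO_CONFIG_S_FAILED    0x80

/* The bug: !0x04 evaluates to 0, so the AND clears every status bit. */
static unsigned char clear_driver_ok_buggy(unsigned char status)
{
    return status & !VIRTIO_CONFIG_S_DRIVER_OK;
}

/* The fix: ~0x04 is a mask that clears only the DRIVER_OK bit. */
static unsigned char clear_driver_ok_fixed(unsigned char status)
{
    return status & ~VIRTIO_CONFIG_S_DRIVER_OK;
}
```

With a status of `FAILED | DRIVER_OK` (0x84), the buggy version returns 0 (the FAILED bit is lost, which is exactly the symptom described), while the fixed version returns 0x80.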
Re: [Qemu-devel] [BUGFIX] MCE: Fix bug of IA32_MCG_STATUS after system reset
On 01/05/2010 02:32 AM, Huang Ying wrote: Now, if we inject a fatal MCE into guest OS, for example Linux, Linux will go panic and then reboot. But if we inject another MCE now, system will reset directly instead of go panic firstly, because MCG_STATUS.MCIP is set to 1 and not cleared after reboot. This does not follow the behavior of real hardware. This patch fixes this by setting env->mcg_status to 0 during system reset.

Signed-off-by: Huang Ying ying.hu...@intel.com

Applied. Thanks.

Regards, Anthony Liguori

---
 target-i386/helper.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -617,6 +617,8 @@ void cpu_reset(CPUX86State *env)
     env->dr[7] = DR7_FIXED_1;
     cpu_breakpoint_remove_all(env, BP_CPU);
     cpu_watchpoint_remove_all(env, BP_CPU);
+
+    env->mcg_status = 0;
 }
 
 void cpu_x86_close(CPUX86State *env)
Re: [Qemu-devel] [PATCH] finish VPATH - vpath translation
On 01/04/2010 04:02 AM, Paolo Bonzini wrote: This adds a few more vpath suffixes and points the remaining two paths explicitly to $(SRC_PATH) in order to eliminate the VPATH assignment from config-host.mak.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Applied. Thanks.

Regards, Anthony Liguori

Cc: Juan Quintela quint...@redhat.com
---
 Makefile        | 2 +-
 Makefile.target | 2 +-
 configure       | 3 ---
 rules.mak       | 3 ++-
 4 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index 10c04a8..5c4e256 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ ifneq ($(wildcard config-host.mak),)
 all: build-all
 include config-host.mak
 include $(SRC_PATH)/rules.mak
-config-host.mak: configure
+config-host.mak: $(SRC_PATH)/configure
 	@echo $@ is out-of-date, running configure
 	@sed -n "/.*Configured with/s/[^:]*: //p" $@ | sh
 else

diff --git a/Makefile.target b/Makefile.target
index edcd2e5..5999da7 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -316,7 +316,7 @@ $(QEMU_PROG): $(obj-y) $(obj-$(TARGET_BASE_ARCH)-y) $(ARLIBS)
 	$(call LINK,$(obj-y) $(obj-$(TARGET_BASE_ARCH)-y))
 
-gdbstub-xml.c: $(TARGET_XML_FILES) feature_to_c.sh
+gdbstub-xml.c: $(TARGET_XML_FILES) $(SRC_PATH)/feature_to_c.sh
 	$(call quiet-command,rm -f $@ && $(SHELL) $(SRC_PATH)/feature_to_c.sh $@ $(TARGET_XML_FILES),"  GEN   $(TARGET_DIR)$@")
 
 qemu-options.h: $(SRC_PATH)/qemu-options.hx

diff --git a/configure b/configure
index 18aed43..b7c6519 100755
--- a/configure
+++ b/configure
@@ -2022,9 +2022,6 @@ qemu_version=`head $source_path/VERSION`
 echo "VERSION=$qemu_version" >> $config_host_mak
 echo "PKGVERSION=$pkgversion" >> $config_host_mak
 echo "SRC_PATH=$source_path" >> $config_host_mak
-if [ "$source_path_used" = "yes" ]; then
-  echo "VPATH=$source_path" >> $config_host_mak
-fi
 echo "TARGET_DIRS=$target_list" >> $config_host_mak
 if [ "$docs" = "yes" ] ; then
   echo "BUILD_DOCS=yes" >> $config_host_mak

diff --git a/rules.mak b/rules.mak
index 9cd67f0..82e8e3d 100644
--- a/rules.mak
+++ b/rules.mak
@@ -39,7 +39,8 @@ quiet-command = $(if $(V),$1,$(if $(2),@echo $2 && $1, @$1))
 
 cc-option = $(if $(shell $(CC) $1 $2 -S -o /dev/null -xc /dev/null \
             >/dev/null 2>&1 && echo OK), $2, $3)
 
-set-vpath = $(if $1,$(foreach PATTERN,%.c %.h %.S, $(eval vpath $(PATTERN) $1)))
+VPATH_SUFFIXES = %.c %.h %.S %.m %.mak %.texi
+set-vpath = $(if $1,$(foreach PATTERN,$(VPATH_SUFFIXES), $(eval vpath $(PATTERN) $1)))
 
 # Generate timestamp files for .h include files
Re: [Qemu-devel] [PATCH] Fix CPU topology initialization
On 01/05/2010 10:26 AM, Jiri Denemark wrote: Late initialization of CPU topology in CPUState prevents KVM guests from actually seeing the topology.

Signed-off-by: Jiri Denemark jdene...@redhat.com

Applied. Thanks.

Regards, Anthony Liguori

---
 vl.c | 8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/vl.c b/vl.c
index e881e45..a03d7a6 100644
--- a/vl.c
+++ b/vl.c
@@ -3484,10 +3484,10 @@ void qemu_init_vcpu(void *_env)
 {
     CPUState *env = _env;
 
-    if (kvm_enabled())
-        kvm_init_vcpu(env);
     env->nr_cores = smp_cores;
     env->nr_threads = smp_threads;
+    if (kvm_enabled())
+        kvm_init_vcpu(env);
     return;
 }
 
@@ -3813,12 +3813,12 @@ void qemu_init_vcpu(void *_env)
 {
     CPUState *env = _env;
 
+    env->nr_cores = smp_cores;
+    env->nr_threads = smp_threads;
     if (kvm_enabled())
         kvm_start_vcpu(env);
     else
         tcg_init_vcpu(env);
-    env->nr_cores = smp_cores;
-    env->nr_threads = smp_threads;
 }
 
 void qemu_notify_event(void)
Re: [Qemu-devel] [PATCH resend] vmware_vga: Check cursor dimensions passed from guest to avoid buffer overflow
On 01/05/2010 10:43 PM, Roland Dreier wrote: Check that the cursor dimensions passed from the guest for the DEFINE_CURSOR command don't overflow the available space in the cursor.image[] or cursor.mask[] arrays before copying data from the guest into those arrays.

Signed-off-by: Roland Dreier rola...@cisco.com

Applied. Thanks.

Regards, Anthony Liguori

---
Hi Anthony, as far as I can tell this seems to have slipped through the cracks. I think this is fairly important: it is a guest-triggerable stack smashing attack in the worst case. Thanks, Roland

 hw/vmware_vga.c | 7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/hw/vmware_vga.c b/hw/vmware_vga.c
index 7ab1c79..5e969ae 100644
--- a/hw/vmware_vga.c
+++ b/hw/vmware_vga.c
@@ -562,6 +562,13 @@ static void vmsvga_fifo_run(struct vmsvga_state_s *s)
             cursor.height = y = vmsvga_fifo_read(s);
             vmsvga_fifo_read(s);
             cursor.bpp = vmsvga_fifo_read(s);
+
+            if (SVGA_BITMAP_SIZE(x, y) > sizeof cursor.mask ||
+                SVGA_PIXMAP_SIZE(x, y, cursor.bpp) > sizeof cursor.image) {
+                args = SVGA_BITMAP_SIZE(x, y) + SVGA_PIXMAP_SIZE(x, y, cursor.bpp);
+                goto badcmd;
+            }
+
             for (args = 0; args < SVGA_BITMAP_SIZE(x, y); args ++)
                 cursor.mask[args] = vmsvga_fifo_read_raw(s);
             for (args = 0; args < SVGA_PIXMAP_SIZE(x, y, cursor.bpp); args ++)
Re: [Qemu-devel] [PATCH] remove pending exception on vcpu reset.
On 01/06/2010 08:30 AM, Gleb Natapov wrote: Without this qemu can't even start on kvm modules with events support, since the default value of exception_injected is zero and this is the #DE exception. Signed-off-by: Gleb Natapov <g...@redhat.com> Applied. Thanks. Regards, Anthony Liguori diff --git a/target-i386/kvm.c b/target-i386/kvm.c index de79eb7..4084503 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -227,6 +227,7 @@ int kvm_arch_init_vcpu(CPUState *env) void kvm_arch_reset_vcpu(CPUState *env) { +env->exception_injected = -1; env->interrupt_injected = -1; env->nmi_injected = 0; env->nmi_pending = 0; -- Gleb.
[Qemu-devel] Re: [ kvm-Bugs-2907597 ] qemu vnc server clips at 2560x1600
On 01/11/2010 04:30 PM, Avi Kivity wrote: VNC_MAX_WIDTH and VNC_MAX_HEIGHT in vnc.h are currently defined to 2048. We do dirty tracking with a bitmap and that bitmap is currently a fixed size. 2048 is bigger than any physical screen that I know of so I assume this is a multiple monitor scenario. Long term, I think exposing multiple monitors to the guest is a better approach for this kind of functionality. Since these resolutions for a single screen don't really exist, this is largely an untested path within the guest. Google suggests such screens do exist, and -vga std (at least in qemu-kvm) supports them. I've asked the reporter to play with VNC_MAX_WIDTH. The reporter confirms that increasing V_M_W fixes the problem. -- error compiling committee.c: too many arguments to function
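The clipping is a direct consequence of the compile-time bitmap size: any guest mode wider or taller than the constants simply has no dirty-tracking bits. A small sketch of the check (the 2048 constants match the vnc.h values discussed in the thread; the helper name is illustrative):

```c
/* Sketch of why the VNC server clips at 2048x2048: the dirty-region
 * bitmap is sized at compile time from these constants, so larger guest
 * modes cannot be tracked and get clipped instead. */
#define VNC_MAX_WIDTH  2048
#define VNC_MAX_HEIGHT 2048

static int vnc_mode_fits(int width, int height)
{
    return width <= VNC_MAX_WIDTH && height <= VNC_MAX_HEIGHT;
}
```

Raising the constants (as the reporter confirmed) enlarges the bitmap and makes a 2560x1600 mode representable.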
Re: [Qemu-devel] mulu2 not implemented for tcg target sparc
On 01/10/2010 01:17 PM, Palle Lyckegaard wrote: On Sun, 10 Jan 2010, Blue Swirl wrote: Is it needed somewhere? I was trying to run qemu-system-mips with a NetBSD malta kernel that generates a MIPS mult operation. Tracing the code through tcg points at a missing mulu2 operation for sparc. It will be generated for instance for some muls when the target is i386 (not even x86_64) on a 32-bit host. In that case mulu2 support would be useful for Sparc32 host too. Palle, are you planning to dig into this? Yes I will try even if my tcg and SPARC assembly skills are very limited (so far...) Try this. You're also missing double-word addition and subtraction. r~ diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c index 8f094e5..003f084 100644 --- a/tcg/sparc/tcg-target.c +++ b/tcg/sparc/tcg-target.c @@ -215,6 +215,7 @@ static inline int tcg_target_const_match(tcg_target_long val, #define BA (INSN_OP(0) | INSN_COND(COND_A, 0) | INSN_OP2(0x2)) #define ARITH_ADD (INSN_OP(2) | INSN_OP3(0x00)) +#define ARITH_ADDCC (INSN_OP(2) | INSN_OP3(0x10)) #define ARITH_AND (INSN_OP(2) | INSN_OP3(0x01)) #define ARITH_OR (INSN_OP(2) | INSN_OP3(0x02)) #define ARITH_ORCC (INSN_OP(2) | INSN_OP3(0x12)) @@ -238,6 +239,7 @@ static inline int tcg_target_const_match(tcg_target_long val, #define SHIFT_SRLX (INSN_OP(2) | INSN_OP3(0x26) | (1 << 12)) #define SHIFT_SRAX (INSN_OP(2) | INSN_OP3(0x27) | (1 << 12)) +#define RDY (INSN_OP(2) | INSN_OP3(0x28) | INSN_RS1(0)) #define WRY (INSN_OP(2) | INSN_OP3(0x30)) #define JMPL (INSN_OP(2) | INSN_OP3(0x38)) #define SAVE (INSN_OP(2) | INSN_OP3(0x3c)) @@ -403,6 +405,11 @@ static inline void tcg_out_sety(TCGContext *s, tcg_target_long val) fprintf(stderr, "unimplemented sety %ld\n", (long)val); } +static inline void tcg_out_rdy(TCGContext *s, int rd) +{ +tcg_out32(s, RDY | INSN_RD(rd)); +} + static inline void tcg_out_addi(TCGContext *s, int reg, tcg_target_long val) { if (val != 0) { @@ -1128,6 +1135,38 @@ static inline void tcg_out_op(TCGContext *s, int opc, const
TCGArg *args, args[2], const_args[2], args[3], const_args[3], args[5]); break; +case INDEX_op_add2_i32: +if (const_args[4]) { +tcg_out_arithi(s, args[0], args[2], args[4], ARITH_ADDCC); +} else { +tcg_out_arith(s, args[0], args[2], args[4], ARITH_ADDCC); +} +if (const_args[5]) { +tcg_out_arithi(s, args[1], args[3], args[5], ARITH_ADDX); +} else { +tcg_out_arith(s, args[1], args[3], args[5], ARITH_ADDX); +} +break; +case INDEX_op_sub2_i32: +if (const_args[4]) { +tcg_out_arithi(s, args[0], args[2], args[4], ARITH_SUBCC); +} else { +tcg_out_arith(s, args[0], args[2], args[4], ARITH_SUBCC); +} +if (const_args[5]) { +tcg_out_arithi(s, args[1], args[3], args[5], ARITH_SUBX); +} else { +tcg_out_arith(s, args[1], args[3], args[5], ARITH_SUBX); +} +break; +case INDEX_op_mulu2_i32: +if (const_args[3]) { +tcg_out_arithi(s, args[0], args[2], args[3], ARITH_UMUL); +} else { +tcg_out_arith(s, args[0], args[2], args[3], ARITH_UMUL); +} +tcg_out_rdy(s, args[1]); +break; #endif case INDEX_op_qemu_ld8u:
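The reason the patch pairs UMUL with a new RDY helper: SPARC's UMUL leaves the low 32 bits of the product in the destination register and the high 32 bits in the %y register, which must be read back separately. A minimal C model of what the mulu2 op computes (names here are illustrative):

```c
#include <stdint.h>

/* Model of the mulu2 TCG op: the full 64-bit product of two 32-bit
 * operands.  On SPARC, UMUL produces the low half in the destination
 * register and the high half in %y, fetched with the RDY instruction. */
static void mulu2(uint32_t a, uint32_t b, uint32_t *lo, uint32_t *hi)
{
    uint64_t prod = (uint64_t)a * b;   /* what UMUL computes... */
    *lo = (uint32_t)prod;              /* ...into rd */
    *hi = (uint32_t)(prod >> 32);      /* ...and into %y (via RDY) */
}
```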
[Qemu-devel] Re: New kvm-related qemu patch queue
On 01/10/2010 06:02 AM, Avi Kivity wrote: In order to improve qemu.git kvm integration quality wrt performance, features, and reliability Marcelo and I will begin to maintain a patch queue based on qemu.git containing kvm-related patches. We will review and apply patches to this queue, test them using the same test suite that is used for qemu-kvm.git, and regularly submit them for inclusion in qemu.git, mimicking the relationship between kvm.git and Linus' linux-2.6.git. Thanks for setting this up Avi! I just want to stress that everyone continue CC'ing qemu-devel on all KVM patches. Even if the patch is qemu-kvm specific for the moment, I think it's important for qemu-devel to be exposed to the refactoring work. It might be good to prefix qemu-kvm.git patches in some manner to make it clear which repository they belong to. Regards, Anthony Liguori
[Qemu-devel] [PATCH-RFC 00/13] vhost-net: preview
Here's an untested patchset with vhost support for upstream qemu. Note that you should not expect performance gains from vhost unless in-kernel irqchip is enabled (which is not in upstream qemu now). Since adding vhost involves quite a bit of infrastructure, I thought it makes sense to send an RFC already, so that interested parties can review it. In particular, command line and help text need to be finalized early so that management can start looking at supporting the feature. This patch has all bits besides migration filled in. Also missing is packet socket backend: another team is now working on this. Michael S. Tsirkin (13): virtio: export virtqueue structure kvm: add API to set ioeventfd virtio: add iofd/irqfd support virtio-pci: fill in irqfd/queufd support syborg_virtio: add irqfd/eventfd support s390-virtio: fill in irqfd support virtio: move typedef to qemu-common net/tap: add interface to get device fd tap: add vhost/vhostfd options tap: add API to retrieve vhost net header vhost net support virtio: add status change callback virtio-net: connect to vhost net backend Makefile.target |1 + hw/s390-virtio-bus.c | 19 +++ hw/syborg_virtio.c | 27 hw/vhost.c | 349 ++ hw/vhost.h | 33 + hw/vhost_net.c | 145 + hw/vhost_net.h | 20 +++ hw/virtio-net.c | 40 ++ hw/virtio-pci.c | 32 + hw/virtio.c | 31 ++--- hw/virtio.h | 23 +++- kvm-all.c| 24 kvm.h|3 + net.c|8 + net/tap.c| 43 ++ net/tap.h|5 + qemu-common.h|1 + qemu-options.hx |4 +- 18 files changed, 785 insertions(+), 23 deletions(-) create mode 100644 hw/vhost.c create mode 100644 hw/vhost.h create mode 100644 hw/vhost_net.c create mode 100644 hw/vhost_net.h
[Qemu-devel] [PATCH-RFC 01/13] virtio: export virtqueue structure
vhost needs physical addresses for ring so expose that structure. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/virtio.c | 18 -- hw/virtio.h | 17 + 2 files changed, 17 insertions(+), 18 deletions(-) diff --git a/hw/virtio.c b/hw/virtio.c index 3c609ce..8e3c9ad 100644 --- a/hw/virtio.c +++ b/hw/virtio.c @@ -49,24 +49,6 @@ typedef struct VRingUsed VRingUsedElem ring[0]; } VRingUsed; -typedef struct VRing -{ -unsigned int num; -target_phys_addr_t desc; -target_phys_addr_t avail; -target_phys_addr_t used; -} VRing; - -struct VirtQueue -{ -VRing vring; -target_phys_addr_t pa; -uint16_t last_avail_idx; -int inuse; -uint16_t vector; -void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq); -}; - #define VIRTIO_PCI_QUEUE_MAX16 /* virt queue functions */ diff --git a/hw/virtio.h b/hw/virtio.h index 3994cc9..ca840e1 100644 --- a/hw/virtio.h +++ b/hw/virtio.h @@ -180,5 +180,22 @@ void virtio_net_exit(VirtIODevice *vdev); DEFINE_PROP_BIT(indirect_desc, _state, _field, \ VIRTIO_RING_F_INDIRECT_DESC, true) +typedef struct VRing +{ +unsigned int num; +target_phys_addr_t desc; +target_phys_addr_t avail; +target_phys_addr_t used; +} VRing; + +struct VirtQueue +{ +VRing vring; +target_phys_addr_t pa; +uint16_t last_avail_idx; +int inuse; +uint16_t vector; +void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq); +}; #endif -- 1.6.6.rc1.43.gf55cc
[Qemu-devel] [PATCH-RFC 02/13] kvm: add API to set ioeventfd
Signed-off-by: Michael S. Tsirkin <m...@redhat.com> --- kvm-all.c | 24 ++++++++++++++++++++++++ kvm.h | 3 +++ 2 files changed, 27 insertions(+), 0 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index a312654..aa00119 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1113,3 +1113,27 @@ void kvm_remove_all_breakpoints(CPUState *current_env) { } #endif /* !KVM_CAP_SET_GUEST_DEBUG */ + +#ifdef KVM_IOEVENTFD +int kvm_set_ioeventfd(uint16_t addr, uint16_t data, int fd, bool assigned) +{ +struct kvm_ioeventfd kick = { +.datamatch = data, +.addr = addr, +.len = 2, +.flags = KVM_IOEVENTFD_FLAG_DATAMATCH | KVM_IOEVENTFD_FLAG_PIO, +.fd = fd, +}; +if (!assigned) +kick.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN; +int r = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &kick); +if (r < 0) +return r; +return 0; +} +#else +int kvm_set_ioeventfd(uint16_t addr, uint16_t data, int fd, bool assigned) +{ +return -ENOSYS; +} +#endif diff --git a/kvm.h b/kvm.h index 672d511..2d723b8 100644 --- a/kvm.h +++ b/kvm.h @@ -14,6 +14,7 @@ #ifndef QEMU_KVM_H #define QEMU_KVM_H +#include <stdbool.h> #include "config.h" #include "qemu-queue.h" @@ -131,4 +132,6 @@ static inline void cpu_synchronize_state(CPUState *env) } } +int kvm_set_ioeventfd(uint16_t addr, uint16_t data, int fd, bool assigned); + #endif -- 1.6.6.rc1.43.gf55cc
[Qemu-devel] [PATCH-RFC 03/13] virtio: add iofd/irqfd support
Add binding API to set iofd/irqfd support. Will be used by vhost. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/virtio.c | 13 ++--- hw/virtio.h |4 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/hw/virtio.c b/hw/virtio.c index 8e3c9ad..b9ec863 100644 --- a/hw/virtio.c +++ b/hw/virtio.c @@ -572,6 +572,12 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size, return vdev-vq[i]; } +void virtio_irq(VirtIODevice *vdev, VirtQueue *vq) +{ +vdev-isr |= 0x01; +virtio_notify_vector(vdev, vq-vector); +} + void virtio_notify(VirtIODevice *vdev, VirtQueue *vq) { /* Make sure used ring is updated before we check NO_INTERRUPT. */ @@ -582,8 +588,7 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue *vq) (vq-inuse || vring_avail_idx(vq) != vq-last_avail_idx))) return; -vdev-isr |= 0x01; -virtio_notify_vector(vdev, vq-vector); +virtio_irq(vdev, vq); } void virtio_notify_config(VirtIODevice *vdev) @@ -696,8 +701,10 @@ VirtIODevice *virtio_common_init(const char *name, uint16_t device_id, vdev-queue_sel = 0; vdev-config_vector = VIRTIO_NO_VECTOR; vdev-vq = qemu_mallocz(sizeof(VirtQueue) * VIRTIO_PCI_QUEUE_MAX); -for(i = 0; i VIRTIO_PCI_QUEUE_MAX; i++) +for(i = 0; i VIRTIO_PCI_QUEUE_MAX; i++) { vdev-vq[i].vector = VIRTIO_NO_VECTOR; +vdev-vq[i].vdev = vdev; +} vdev-name = name; vdev-config_len = config_size; diff --git a/hw/virtio.h b/hw/virtio.h index ca840e1..193b3f9 100644 --- a/hw/virtio.h +++ b/hw/virtio.h @@ -88,6 +88,8 @@ typedef struct { int (*load_config)(void * opaque, QEMUFile *f); int (*load_queue)(void * opaque, int n, QEMUFile *f); unsigned (*get_features)(void * opaque); +int (*set_irqfd)(void * opaque, int n, int fd, bool assigned); +int (*set_queuefd)(void * opaque, int n, int fd, bool assigned); } VirtIOBindings; #define VIRTIO_PCI_QUEUE_MAX 16 @@ -196,6 +198,8 @@ struct VirtQueue int inuse; uint16_t vector; void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq); +VirtIODevice *vdev; }; +void virtio_irq(VirtIODevice *vdev, 
VirtQueue *vq); #endif -- 1.6.6.rc1.43.gf55cc
[Qemu-devel] [PATCH-RFC 04/13] virtio-pci: fill in irqfd/queufd support
Support irqfd/queuefd. The last one only with kvm, that's okay because vhost relies on kvm anyway. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/virtio-pci.c | 26 ++ 1 files changed, 26 insertions(+), 0 deletions(-) diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index 6d0f9dd..5a3be6b 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -23,6 +23,7 @@ #include msix.h #include net.h #include loader.h +#include kvm.h /* from Linux's linux/virtio_pci.h */ @@ -388,6 +389,29 @@ static unsigned virtio_pci_get_features(void *opaque) return proxy-host_features; } +static void virtio_pci_irqfd_read(void *opaque) +{ +VirtQueue *vq = opaque; +virtio_irq(vq-vdev, vq); +} + +static int virtio_pci_irqfd(void * opaque, int n, int fd, bool assign) +{ +VirtIOPCIProxy *proxy = opaque; +VirtQueue *vq = proxy-vdev-vq[n]; + +qemu_set_fd_handler(fd, assign ? virtio_pci_irqfd_read : NULL, NULL, vq); +return 0; +} + +static int virtio_pci_queuefd(void * opaque, int n, int fd, bool assign) +{ +VirtIOPCIProxy *proxy = opaque; +return kvm_set_ioeventfd(proxy-addr + VIRTIO_PCI_QUEUE_NOTIFY, + n, fd, assign); +} + + static const VirtIOBindings virtio_pci_bindings = { .notify = virtio_pci_notify, .save_config = virtio_pci_save_config, @@ -395,6 +419,8 @@ static const VirtIOBindings virtio_pci_bindings = { .save_queue = virtio_pci_save_queue, .load_queue = virtio_pci_load_queue, .get_features = virtio_pci_get_features, +.set_irqfd = virtio_pci_irqfd, +.set_queuefd = virtio_pci_queuefd, }; static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev, -- 1.6.6.rc1.43.gf55cc
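Both the irqfd and queuefd paths are built on the Linux eventfd primitive: one side write()s an increment, the other read()s the accumulated count (in qemu the read side is what gets hooked up via qemu_set_fd_handler and raises the guest interrupt). A minimal, Linux-only demonstration of that round trip:

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Demonstrates the eventfd counter semantics the irqfd path relies on:
 * a write "kicks" the event, and the handler's read collects the count
 * and resets it.  Error handling is omitted for brevity. */
static uint64_t eventfd_roundtrip(void)
{
    int fd = eventfd(0, 0);
    uint64_t val = 1, out = 0;

    (void)write(fd, &val, sizeof(val));  /* kick: signal the event */
    (void)read(fd, &out, sizeof(out));   /* handler observes the count */
    close(fd);
    return out;
}
```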
[Qemu-devel] [PATCH-RFC 05/13] syborg_virtio: add irqfd/eventfd support
No idea if it's right .. and syborg does not support vectors so we won't get performance gains out of it either - so quite likely it's best to just keep this patch out of qemu. But just to show what's possible. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/syborg_virtio.c | 25 + 1 files changed, 25 insertions(+), 0 deletions(-) diff --git a/hw/syborg_virtio.c b/hw/syborg_virtio.c index 65239a0..ef6364f 100644 --- a/hw/syborg_virtio.c +++ b/hw/syborg_virtio.c @@ -249,9 +249,34 @@ static unsigned syborg_virtio_get_features(void *opaque) return proxy-host_features; } +static void syborg_virtio_irqfd_read(void *opaque) +{ +VirtQueue *vq = opaque; +virtio_irq(vq-vdev, vq); +} + +static int syborg_virtio_irqfd(void * opaque, int n, int fd, bool assign) +{ +SyborgVirtIOProxy *proxy = opaque; +VirtQueue *vq = proxy-vdev-vq[n]; + +qemu_set_fd_handler(fd, assign ? syborg_virtio_irqfd_read : NULL, NULL, vq); +return 0; +} + +static int syborg_virtio_queuefd(void * opaque, int n, int fd, bool assign) +{ +SyborgVirtIOProxy *proxy = opaque; +return kvm_set_ioeventfd(proxy-busdev.mmio[0].addr + + SYBORG_VIRTIO_QUEUE_NOTIFY 2, + n, fd, assign); +} + static VirtIOBindings syborg_virtio_bindings = { .notify = syborg_virtio_update_irq, .get_features = syborg_virtio_get_features, +.set_irqfd = syborg_virtio_irqfd, +.set_queuefd = syborg_virtio_queuefd, }; static int syborg_virtio_init(SyborgVirtIOProxy *proxy, VirtIODevice *vdev) -- 1.6.6.rc1.43.gf55cc
[Qemu-devel] [PATCH-RFC 07/13] virtio: move typedef to qemu-common
make it possible to use type without header include Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/virtio.h |1 - qemu-common.h |1 + 2 files changed, 1 insertions(+), 1 deletions(-) diff --git a/hw/virtio.h b/hw/virtio.h index 193b3f9..6a7e3ec 100644 --- a/hw/virtio.h +++ b/hw/virtio.h @@ -67,7 +67,6 @@ static inline target_phys_addr_t vring_align(target_phys_addr_t addr, } typedef struct VirtQueue VirtQueue; -typedef struct VirtIODevice VirtIODevice; #define VIRTQUEUE_MAX_SIZE 1024 diff --git a/qemu-common.h b/qemu-common.h index 8630f8c..8bb1f83 100644 --- a/qemu-common.h +++ b/qemu-common.h @@ -217,6 +217,7 @@ typedef struct uWireSlave uWireSlave; typedef struct I2SCodec I2SCodec; typedef struct DeviceState DeviceState; typedef struct SSIBus SSIBus; +typedef struct VirtIODevice VirtIODevice; /* CPU save/load. */ void cpu_save(QEMUFile *f, void *opaque); -- 1.6.6.rc1.43.gf55cc
[Qemu-devel] [PATCH-RFC 08/13] net/tap: add interface to get device fd
Will be used by vhost to attach/detach to backend. Signed-off-by: Michael S. Tsirkin <m...@redhat.com> --- net/tap.c | 7 +++++++ net/tap.h | 2 ++ 2 files changed, 9 insertions(+), 0 deletions(-) diff --git a/net/tap.c b/net/tap.c index d3492de..7e9ca79 100644 --- a/net/tap.c +++ b/net/tap.c @@ -269,6 +269,13 @@ static void tap_poll(VLANClientState *nc, bool enable) tap_write_poll(s, enable); } +int tap_get_fd(VLANClientState *nc) +{ +TAPState *s = DO_UPCAST(TAPState, nc, nc); +assert(nc->info->type == NET_CLIENT_TYPE_TAP); +return s->fd; +} + /* fd support */ static NetClientInfo net_tap_info = { diff --git a/net/tap.h b/net/tap.h index 538a562..a244b28 100644 --- a/net/tap.h +++ b/net/tap.h @@ -48,4 +48,6 @@ int tap_probe_vnet_hdr(int fd); int tap_probe_has_ufo(int fd); void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo); +int tap_get_fd(VLANClientState *vc); + #endif /* QEMU_NET_TAP_H */ -- 1.6.6.rc1.43.gf55cc
[Qemu-devel] [PATCH-RFC 09/13] tap: add vhost/vhostfd options
Looks like order got mixed up: vhost_net header is added by a follow-up patch. Will be fixed in the next revision. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- net.c |8 net/tap.c | 29 + qemu-options.hx |4 +++- 3 files changed, 40 insertions(+), 1 deletions(-) diff --git a/net.c b/net.c index 6ef93e6..b942d03 100644 --- a/net.c +++ b/net.c @@ -976,6 +976,14 @@ static struct { .name = vnet_hdr, .type = QEMU_OPT_BOOL, .help = enable the IFF_VNET_HDR flag on the tap interface +}, { +.name = vhost, +.type = QEMU_OPT_BOOL, +.help = enable vhost-net network accelerator, +}, { +.name = vhostfd, +.type = QEMU_OPT_STRING, +.help = file descriptor of an already opened vhost net device, }, #endif /* _WIN32 */ { /* end of list */ } diff --git a/net/tap.c b/net/tap.c index 7e9ca79..d9f2e41 100644 --- a/net/tap.c +++ b/net/tap.c @@ -41,6 +41,8 @@ #include net/tap-linux.h +#include hw/vhost_net.h + /* Maximum GSO packet size (64k) plus plenty of room for * the ethernet and virtio_net headers */ @@ -57,6 +59,7 @@ typedef struct TAPState { unsigned int has_vnet_hdr : 1; unsigned int using_vnet_hdr : 1; unsigned int has_ufo: 1; +struct vhost_net *vhost_net; } TAPState; static int launch_script(const char *setup_script, const char *ifname, int fd); @@ -252,6 +255,10 @@ static void tap_cleanup(VLANClientState *nc) { TAPState *s = DO_UPCAST(TAPState, nc, nc); +if (s-vhost_net) { +vhost_net_cleanup(s-vhost_net); +} + qemu_purge_queued_packets(nc); if (s-down_script[0]) @@ -307,6 +314,7 @@ static TAPState *net_tap_fd_init(VLANState *vlan, s-has_ufo = tap_probe_has_ufo(s-fd); tap_set_offload(s-nc, 0, 0, 0, 0, 0); tap_read_poll(s, 1); +s-vhost_net = NULL; return s; } @@ -456,6 +464,27 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char *name, VLANState *vlan } } +if (qemu_opt_get_bool(opts, vhost, 0)) { +int vhostfd, r; +if (qemu_opt_get(opts, vhostfd)) { +r = net_handle_fd_param(mon, qemu_opt_get(opts, vhostfd)); +if (r == -1) { +return -1; +} +vhostfd = r; +} else { 
+vhostfd = -1; +} +s-vhost_net = vhost_net_init(s-nc, vhostfd); +if (!s-vhost_net) { +qemu_error(vhost-net requested but could not be initialized\n); +return -1; +} +} else if (qemu_opt_get(opts, vhostfd)) { +qemu_error(vhostfd= is not valid without vhost\n); +return -1; +} + if (vlan) { vlan-nb_host_devs++; } diff --git a/qemu-options.hx b/qemu-options.hx index ecd50eb..ef185dd 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -813,7 +813,7 @@ DEF(net, HAS_ARG, QEMU_OPTION_net, -net tap[,vlan=n][,name=str],ifname=name\n connect the host TAP network interface to VLAN 'n'\n #else --net tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,sndbuf=nbytes][,vnet_hdr=on|off]\n +-net tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h]\n connect the host TAP network interface to VLAN 'n' and use the\n network scripts 'file' (default=%s)\n and 'dfile' (default=%s);\n @@ -823,6 +823,8 @@ DEF(net, HAS_ARG, QEMU_OPTION_net, default of 'sndbuf=1048576' can be disabled using 'sndbuf=0'\n use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap flag; use\n vnet_hdr=on to make the lack of IFF_VNET_HDR support an error condition\n +use vhost=on to enable experimental in kernel accelerator\n +use 'vhostfd=h' to connect to an already opened vhost net device\n #endif -net socket[,vlan=n][,name=str][,fd=h][,listen=[host]:port][,connect=host:port]\n connect the vlan 'n' to another VLAN using a socket connection\n -- 1.6.6.rc1.43.gf55cc
[Qemu-devel] [PATCH-RFC 10/13] tap: add API to retrieve vhost net header
Same comment as patch 09. Signed-off-by: Michael S. Tsirkin <m...@redhat.com> --- net/tap.c | 7 +++++++ net/tap.h | 3 +++ 2 files changed, 10 insertions(+), 0 deletions(-) diff --git a/net/tap.c b/net/tap.c index d9f2e41..166cf05 100644 --- a/net/tap.c +++ b/net/tap.c @@ -491,3 +491,10 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char *name, VLANState *vlan return 0; } + +struct vhost_net *tap_get_vhost_net(VLANClientState *nc) +{ +TAPState *s = DO_UPCAST(TAPState, nc, nc); +assert(nc->info->type == NET_CLIENT_TYPE_TAP); +return s->vhost_net; +} diff --git a/net/tap.h b/net/tap.h index a244b28..b8cec83 100644 --- a/net/tap.h +++ b/net/tap.h @@ -50,4 +50,7 @@ void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo); int tap_get_fd(VLANClientState *vc); +struct vhost_net; +struct vhost_net *tap_get_vhost_net(VLANClientState *vc); + #endif /* QEMU_NET_TAP_H */ -- 1.6.6.rc1.43.gf55cc
[Qemu-devel] [PATCH-RFC 12/13] virtio: add status change callback
vhost net backend needs to be notified when frontend status changes. Add a callback. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/s390-virtio-bus.c |3 +++ hw/syborg_virtio.c |2 ++ hw/virtio-pci.c |6 ++ hw/virtio.h |1 + 4 files changed, 12 insertions(+), 0 deletions(-) diff --git a/hw/s390-virtio-bus.c b/hw/s390-virtio-bus.c index b663972..1f8a04f 100644 --- a/hw/s390-virtio-bus.c +++ b/hw/s390-virtio-bus.c @@ -243,6 +243,9 @@ void s390_virtio_device_update_status(VirtIOS390Device *dev) uint32_t features; vdev-status = ldub_phys(dev-dev_offs + VIRTIO_DEV_OFFS_STATUS); +if (vdev-set_status) { +vdev-set_status(vdev); +} /* Update guest supported feature bitmap */ diff --git a/hw/syborg_virtio.c b/hw/syborg_virtio.c index ef6364f..264c1f7 100644 --- a/hw/syborg_virtio.c +++ b/hw/syborg_virtio.c @@ -152,6 +152,8 @@ static void syborg_virtio_writel(void *opaque, target_phys_addr_t offset, vdev-status = value 0xFF; if (vdev-status == 0) virtio_reset(vdev); +if (vdev-set_status) +vdev-set_status(vdev); break; case SYBORG_VIRTIO_INT_ENABLE: s-int_enable = value; diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index 5a3be6b..fa0a7a0 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -209,6 +209,9 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val) virtio_reset(proxy-vdev); msix_unuse_all_vectors(proxy-pci_dev); } +if (vdev-set_status) { +vdev-set_status(vdev); +} break; case VIRTIO_MSI_CONFIG_VECTOR: msix_vector_unuse(proxy-pci_dev, vdev-config_vector); @@ -376,6 +379,9 @@ static void virtio_write_config(PCIDevice *pci_dev, uint32_t address, if (PCI_COMMAND == address) { if (!(val PCI_COMMAND_MASTER)) { proxy-vdev-status = !VIRTIO_CONFIG_S_DRIVER_OK; +if (proxy-vdev-set_status) { +proxy-vdev-set_status(proxy-vdev); +} } } diff --git a/hw/virtio.h b/hw/virtio.h index 6a7e3ec..76734ac 100644 --- a/hw/virtio.h +++ b/hw/virtio.h @@ -112,6 +112,7 @@ struct VirtIODevice void (*get_config)(VirtIODevice *vdev, uint8_t *config); void 
(*set_config)(VirtIODevice *vdev, const uint8_t *config); void (*reset)(VirtIODevice *vdev); +void (*set_status)(VirtIODevice *vdev); VirtQueue *vq; const VirtIOBindings *binding; void *binding_opaque; -- 1.6.6.rc1.43.gf55cc
[Qemu-devel] [PATCH-RFC 13/13] virtio-net: connect to vhost net backend
start/stop backend on driver start/stop Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/virtio-net.c | 40 1 files changed, 40 insertions(+), 0 deletions(-) diff --git a/hw/virtio-net.c b/hw/virtio-net.c index c2a389f..99169e1 100644 --- a/hw/virtio-net.c +++ b/hw/virtio-net.c @@ -17,6 +17,7 @@ #include net/tap.h #include qemu-timer.h #include virtio-net.h +#include vhost_net.h #define VIRTIO_NET_VM_VERSION11 @@ -47,6 +48,7 @@ typedef struct VirtIONet uint8_t nomulti; uint8_t nouni; uint8_t nobcast; +uint8_t vhost_started; struct { int in_use; int first_multi; @@ -114,6 +116,10 @@ static void virtio_net_reset(VirtIODevice *vdev) n-nomulti = 0; n-nouni = 0; n-nobcast = 0; +if (n-vhost_started) { +vhost_net_stop(tap_get_vhost_net(n-nic-nc.peer), vdev); +n-vhost_started = 0; +} /* Flush any MAC and VLAN filter table state */ n-mac_table.in_use = 0; @@ -820,6 +826,36 @@ static NetClientInfo net_virtio_info = { .link_status_changed = virtio_net_set_link_status, }; +static void virtio_net_set_status(struct VirtIODevice *vdev) +{ +VirtIONet *n = to_virtio_net(vdev); +if (!n-nic-nc.peer) { +return; +} +if (n-nic-nc.peer-info-type != NET_CLIENT_TYPE_TAP) { +return; +} + +if (!tap_get_vhost_net(n-nic-nc.peer)) { +return; +} +if (!!n-vhost_started == !!(vdev-status VIRTIO_CONFIG_S_DRIVER_OK)) { +return; +} +if (vdev-status VIRTIO_CONFIG_S_DRIVER_OK) { +int r = vhost_net_start(tap_get_vhost_net(n-nic-nc.peer), vdev); +if (r 0) { +fprintf(stderr, unable to start vhost net: +falling back on userspace virtio\n); +} else { +n-vhost_started = 1; +} +} else { +vhost_net_stop(tap_get_vhost_net(n-nic-nc.peer), vdev); +n-vhost_started = 0; +} +} + VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf) { VirtIONet *n; @@ -835,6 +871,7 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf) n-vdev.set_features = virtio_net_set_features; n-vdev.bad_features = virtio_net_bad_features; n-vdev.reset = virtio_net_reset; +n-vdev.set_status = virtio_net_set_status; 
n-rx_vq = virtio_add_queue(n-vdev, 256, virtio_net_handle_rx); n-tx_vq = virtio_add_queue(n-vdev, 256, virtio_net_handle_tx); n-ctrl_vq = virtio_add_queue(n-vdev, 64, virtio_net_handle_ctrl); @@ -864,6 +901,9 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf) void virtio_net_exit(VirtIODevice *vdev) { VirtIONet *n = DO_UPCAST(VirtIONet, vdev, vdev); +if (n-vhost_started) { +vhost_net_stop(tap_get_vhost_net(n-nic-nc.peer), vdev); +} qemu_purge_queued_packets(n-nic-nc); -- 1.6.6.rc1.43.gf55cc
[Qemu-devel] [PATCH-RFC 11/13] vhost net support
This adds vhost net support in qemu. Will be tied to tap device and virtio later. Raw backend is currently missing, will be worked on/submitted separately. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Makefile.target |1 + hw/vhost.c | 349 +++ hw/vhost.h | 33 + hw/vhost_net.c | 145 +++ hw/vhost_net.h | 20 +++ 5 files changed, 548 insertions(+), 0 deletions(-) create mode 100644 hw/vhost.c create mode 100644 hw/vhost.h create mode 100644 hw/vhost_net.c create mode 100644 hw/vhost_net.h diff --git a/Makefile.target b/Makefile.target index 7c1f30c..61b7148 100644 --- a/Makefile.target +++ b/Makefile.target @@ -157,6 +157,7 @@ obj-y = vl.o async.o monitor.o pci.o pci_host.o pcie_host.o machine.o gdbstub.o # virtio has to be here due to weird dependency between PCI and virtio-net. # need to fix this properly obj-y += virtio-blk.o virtio-balloon.o virtio-net.o virtio-console.o virtio-pci.o +obj-y += vhost_net.o vhost.o obj-$(CONFIG_KVM) += kvm.o kvm-all.o obj-$(CONFIG_ISA_MMIO) += isa_mmio.o LIBS+=-lz diff --git a/hw/vhost.c b/hw/vhost.c new file mode 100644 index 000..d23d94c --- /dev/null +++ b/hw/vhost.c @@ -0,0 +1,349 @@ +#include linux/vhost.h +#include sys/ioctl.h +#include sys/eventfd.h +#include vhost.h +#include hw/hw.h +/* For range_get_last */ +#include pci.h + +static void vhost_dev_unassign_memory(struct vhost_dev *dev, + struct vhost_memory *mem, + uint64_t start_addr, + uint64_t size) +{ + int from, to; + for (from = 0, to = 0; from dev-mem-nregions; ++from, ++to) { + struct vhost_memory_region *reg = mem-regions + to; + uint64_t reglast; + uint64_t memlast; + uint64_t change; + + /* clone old region */ + memcpy(reg, dev-mem-regions + from, sizeof *reg); + + /* No overlap is simple */ + if (!ranges_overlap(reg-guest_phys_addr, reg-memory_size, + start_addr, size)) { + continue; + } + reglast = range_get_last(reg-guest_phys_addr, reg-memory_size); + memlast = range_get_last(start_addr, size); + + /* Remove whole region */ + if (start_addr = 
reg-guest_phys_addr memlast = reglast) { + --to; + continue; + } + + /* Shrink region */ + if (memlast = reglast) { + reg-memory_size = start_addr - reg-guest_phys_addr; + continue; + } + + /* Shift region */ + if (start_addr = reg-guest_phys_addr) { + change = memlast + 1 - reg-guest_phys_addr; + reg-memory_size -= change; + reg-guest_phys_addr += change; + reg-userspace_addr += change; + continue; + } + + /* Split region: shrink first part, shift second part. */ + memcpy(reg + 1, reg, sizeof *reg); + reg[0].memory_size = start_addr - reg-guest_phys_addr; + change = memlast + 1 - reg-guest_phys_addr; + reg[1].memory_size -= change; + reg[1].guest_phys_addr += change; + reg[1].userspace_addr += change; + ++to; + } + mem-nregions = to; +} + +/* Called after unassign, so no regions overlap the given range. */ +static void vhost_dev_assign_memory(struct vhost_dev *dev, + struct vhost_memory *mem, + uint64_t start_addr, + uint64_t size, + uint64_t uaddr) +{ + int from, to; + struct vhost_memory_region *merged = NULL; + for (from = 0, to = 0; from dev-mem-nregions; ++from, ++to) { + struct vhost_memory_region *reg = mem-regions + to; + uint64_t prlast, urlast; + uint64_t pmlast, umlast; + uint64_t s, e, u; + + /* clone old region */ + memcpy(reg, dev-mem-regions + from, sizeof *reg); + prlast = range_get_last(reg-guest_phys_addr, reg-memory_size); + pmlast = range_get_last(start_addr, size); + urlast = range_get_last(reg-userspace_addr, reg-memory_size); + umlast = range_get_last(uaddr, size); + + /* Not an adjecent region - do not merge. */ + if ((prlast + 1 != start_addr || urlast + 1 != uaddr) + (pmlast + 1 != reg-guest_phys_addr || +umlast + 1 != reg-userspace_addr)) { + continue; +
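The region bookkeeping above leans on qemu's range helpers (the patch includes pci.h explicitly for range_get_last). Illustrative versions with the assumed inclusive-last semantics:

```c
#include <stdint.h>

/* Illustrative versions of the helpers the vhost memory-region code uses:
 * range_get_last() yields the last address covered by [start, start+size),
 * and ranges_overlap() tests two such ranges for intersection. */
static uint64_t range_get_last(uint64_t start, uint64_t size)
{
    return start + size - 1;
}

static int ranges_overlap(uint64_t start1, uint64_t size1,
                          uint64_t start2, uint64_t size2)
{
    /* disjoint iff one range ends before the other begins */
    return !(range_get_last(start1, size1) < start2 ||
             range_get_last(start2, size2) < start1);
}
```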
[Qemu-devel] [PATCH] vnc_refresh: return if vd-timer is NULL
Hi all, calling vnc_update_client in vnc_refresh might have the unlikely side effect of setting vd->timer = NULL, if the last vnc client disconnected. In this case we have to return from vnc_refresh without updating the timer, otherwise we cause a segfault. Signed-off-by: Stefano Stabellini <stefano.stabell...@eu.citrix.com> --- diff --git a/vnc.c b/vnc.c index c54c6e0..58eac73 100644 --- a/vnc.c +++ b/vnc.c @@ -2305,6 +2305,10 @@ static void vnc_refresh(void *opaque) rects += vnc_update_client(vs, has_dirty); vs = vs->next; } +/* vd->timer could be NULL now if the last client disconnected, + * in this case don't update the timer */ +if (vd->timer == NULL) +return; if (has_dirty && rects) { vd->timer_interval /= 2;
Re: [Qemu-devel] [PATCH] dmg: fix -open failure
On Mon, Jan 11, 2010 at 03:11:52PM +0100, Kevin Wolf wrote: More or less the same hack, just in cleaner? Or trying to fundamentally change things? I think you haven't answered yet to what I said in the thread of my original hack. I'm quoting it here for convenience: Well, not dealing with the format list in raw, but rather in block.c Ok, if you start talking about layering, we can have a fundamental discussion on this topic and why the layering is broken anyway. Logically, we have image formats like qcow2, VMDK and raw, and they are stored in files, on CD-ROMs or general block devices. From a layering perspective, it is wrong to include the latter in the raw format driver in the first place. Actually, I think the differentiation between raw files and host_* is at the same level as protocols are. Probably they should be implemented very similarly. Do you think it's possible/worth the effort to try putting things straight here? So what you want is basically: - hdev_* and file as protocols in addition to nbd/ftp/http/.. - a raw image format that can be used ontop of any protocol instead of an image format That would indeed be a much better, not to say actually logical layering. The raw image format would be more or less a no-op just stacking ontop of the protocol. If we can find a way to implement this efficiently it might be the way to go.
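The layering being proposed can be sketched concretely: protocols (file, host device, nbd, ...) provide raw byte access, and the "raw" image format becomes a pure pass-through stacked on top of any of them. All names and structures below are illustrative, not qemu's actual block-driver API:

```c
#include <stddef.h>

/* A "protocol" provides raw byte access to some backing store. */
struct protocol {
    int (*pread)(void *opaque, long offset, void *buf, size_t len);
    void *opaque;
};

/* An "image format" stacks on a protocol; raw adds no translation. */
struct format {
    const struct protocol *proto;
};

static int raw_pread(struct format *fmt, long offset, void *buf, size_t len)
{
    /* guest offset N maps directly to protocol offset N */
    return fmt->proto->pread(fmt->proto->opaque, offset, buf, len);
}

/* tiny memory-backed "protocol" so the pass-through can be exercised */
static char backing[8] = "qemuimg";

static int mem_pread(void *opaque, long offset, void *buf, size_t len)
{
    size_t i;
    (void)opaque;
    for (i = 0; i < len; i++)
        ((char *)buf)[i] = backing[offset + i];
    return (int)len;
}

static const struct protocol mem_proto = { mem_pread, NULL };
```

The point of the design is that raw becomes a near no-op, and formats like qcow2 would use the same protocol interface underneath.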
[Qemu-devel] [PATCH 1/2] block: clean up bdrv_open2 structure a bit
Check the whitelist as early as possible instead of continuing the setup, and move all the error handling code to the end of the function. Signed-off-by: Christoph Hellwig h...@lst.de

Index: qemu/block.c
===
--- qemu.orig/block.c 2010-01-11 17:48:27.811004054 +0100
+++ qemu/block.c 2010-01-11 17:52:15.578006158 +0100
@@ -428,10 +428,16 @@ int bdrv_open2(BlockDriverState *bs, con
             drv = find_image_format(filename);
         }
     }
+
     if (!drv) {
         ret = -ENOENT;
         goto unlink_and_fail;
     }
+    if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv)) {
+        ret = -ENOTSUP;
+        goto unlink_and_fail;
+    }
+
     bs->drv = drv;
     bs->opaque = qemu_mallocz(drv->instance_size);
@@ -452,23 +458,17 @@ int bdrv_open2(BlockDriverState *bs, con
             (flags & (BDRV_O_CACHE_MASK|BDRV_O_NATIVE_AIO));
     else
         open_flags = flags & ~(BDRV_O_FILE | BDRV_O_SNAPSHOT);
-    if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv))
-        ret = -ENOTSUP;
-    else
-        ret = drv->bdrv_open(bs, filename, open_flags);
+
+    ret = drv->bdrv_open(bs, filename, open_flags);
     if ((ret == -EACCES || ret == -EPERM) && !(flags & BDRV_O_FILE)) {
         ret = drv->bdrv_open(bs, filename, open_flags & ~BDRV_O_RDWR);
         bs->read_only = 1;
     }
+
     if (ret < 0) {
-        qemu_free(bs->opaque);
-        bs->opaque = NULL;
-        bs->drv = NULL;
-    unlink_and_fail:
-        if (bs->is_temporary)
-            unlink(filename);
-        return ret;
+        goto free_and_fail;
     }
+
     if (drv->bdrv_getlength) {
         bs->total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
     }
@@ -502,6 +502,15 @@ int bdrv_open2(BlockDriverState *bs, con
         bs->change_cb(bs->change_opaque);
     }
     return 0;
+
+free_and_fail:
+    qemu_free(bs->opaque);
+    bs->opaque = NULL;
+    bs->drv = NULL;
+unlink_and_fail:
+    if (bs->is_temporary)
+        unlink(filename);
+    return ret;
 }

 void bdrv_close(BlockDriverState *bs)
[Qemu-devel] [PATCH 2/2] block: untangle open flag manipulation in bdrv_open2
Untangle the open flag manipulation in bdrv_open2 and document why we are clearing the various flags in the different flag combinations. Signed-off-by: Christoph Hellwig h...@lst.de

Index: qemu/block.c
===
--- qemu.orig/block.c 2010-01-11 17:52:54.273003789 +0100
+++ qemu/block.c 2010-01-11 18:19:22.485006278 +0100
@@ -356,7 +356,7 @@ int bdrv_open(BlockDriverState *bs, cons
 int bdrv_open2(BlockDriverState *bs, const char *filename, int flags,
                BlockDriver *drv)
 {
-    int ret, open_flags, try_rw;
+    int ret, open_flags;
     char tmp_filename[PATH_MAX];
     char backing_filename[PATH_MAX];
@@ -450,19 +450,39 @@ int bdrv_open2(BlockDriverState *bs, con
     if (flags & (BDRV_O_CACHE_WB|BDRV_O_NOCACHE))
         bs->enable_write_cache = 1;
-    /* Note: for compatibility, we open disk image files as RDWR, and
-       RDONLY as fallback */
-    try_rw = !bs->read_only || bs->is_temporary;
-    if (!(flags & BDRV_O_FILE))
-        open_flags = (try_rw ? BDRV_O_RDWR : 0) |
-            (flags & (BDRV_O_CACHE_MASK|BDRV_O_NATIVE_AIO));
-    else
-        open_flags = flags & ~(BDRV_O_FILE | BDRV_O_SNAPSHOT);
+    /*
+     * Always clear these flags as they are not destined for the drivers.
+     */
+    open_flags = flags & ~(BDRV_O_FILE | BDRV_O_SNAPSHOT);
+
+    if (!(flags & BDRV_O_FILE)) {
+        /*
+         * For historical reasons we always try to open drive images as
+         * read-write first and only fall back to read-only later.
+         *
+         * This can be overridden using the readonly suboption for the
+         * drive option.
+         */
+        if (bs->read_only && !bs->is_temporary) {
+            open_flags &= ~BDRV_O_RDWR;
+        } else {
+            open_flags |= BDRV_O_RDWR;
+        }
-    ret = drv->bdrv_open(bs, filename, open_flags);
-    if ((ret == -EACCES || ret == -EPERM) && !(flags & BDRV_O_FILE)) {
-        ret = drv->bdrv_open(bs, filename, open_flags & ~BDRV_O_RDWR);
-        bs->read_only = 1;
+        /*
+         * Currently BDRV_O_CREAT is not supported by any image format,
+         * but I'm not sure that's reason enough to always clear it for
+         * the !BDRV_O_FILE case..
+         */
+        open_flags &= ~(BDRV_O_CREAT);
+
+        ret = drv->bdrv_open(bs, filename, open_flags);
+        if (ret == -EACCES || ret == -EPERM) {
+            ret = drv->bdrv_open(bs, filename, open_flags & ~BDRV_O_RDWR);
+            bs->read_only = 1;
+        }
+    } else {
+        ret = drv->bdrv_open(bs, filename, open_flags);
     }

     if (ret < 0) {
[Qemu-devel] Re: Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On Mon, Jan 11, 2010 at 08:37:10AM -0600, Anthony Liguori wrote: On 01/11/2010 08:29 AM, Avi Kivity wrote: On 01/11/2010 03:49 PM, Anthony Liguori wrote: So instead of disabling notify while requests are active we might want to only disable it while we are inside virtio_blk_handle_output. Something like the following minimally tested patch: I'd suggest that we get even more aggressive and install an idle bottom half that checks the queue for newly submitted requests. If we keep getting requests submitted before a new one completes, we'll never take an I/O exit. That has the downside of bouncing a cache line on unrelated exits. The read and write sides of the ring are widely separated in physical memory specifically to avoid cache line bouncing. It probably doesn't matter with qemu as it is now, since it will bounce qemu_mutex, but it will hurt with large guests (especially if they have many rings). IMO we should get things to work well without riding on unrelated exits, especially as we're trying to reduce those exits. A block I/O request can potentially be very, very long lived. By serializing requests like this, there's a high likelihood that it's going to kill performance with anything capable of processing multiple requests. OTOH, if we aggressively poll the ring when we have an opportunity to, there's very little downside to that and it addresses the serialization problem. The same approach is probably a good idea for virtio-net. With vhost-net you don't see exits. The point is, when we've disabled notification, we should poll on the ring for additional requests instead of waiting for one to complete before looking at another one. Even with vhost-net, this logic is still applicable although potentially harder to achieve. Regards, Anthony Liguori vhost net does this already: it has a mode where it polls when skbs have left the send queue: for tap this is when they have crossed the bridge, for packet socket this is when they have been transmitted. -- MST
[Qemu-devel] Re: Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
On Mon, Jan 11, 2010 at 09:13:23AM -0600, Anthony Liguori wrote: On 01/11/2010 08:46 AM, Avi Kivity wrote: On 01/11/2010 04:37 PM, Anthony Liguori wrote: That has the downside of bouncing a cache line on unrelated exits. The read and write sides of the ring are widely separated in physical memory specifically to avoid cache line bouncing. I meant, exits on random vcpus will cause the cacheline containing the notification disable flag to bounce around. As it is, we read it on the vcpu that owns the queue and write it on that vcpu or the I/O thread. Bottom halves are always run from the IO thread. It probably doesn't matter with qemu as it is now, since it will bounce qemu_mutex, but it will hurt with large guests (especially if they have many rings). IMO we should get things to work well without riding on unrelated exits, especially as we're trying to reduce those exits. A block I/O request can potentially be very, very long lived. By serializing requests like this, there's a high likelihood that it's going to kill performance with anything capable of processing multiple requests. Right, that's why I suggested having a queue depth at which disabling notification kicks in. The patch hardcodes this depth to 1, unpatched qemu is infinite, a good value is probably spindle count + VAT. That means we would need a user-visible option, which is quite unfortunate. Also, that logic only really makes sense with cache=off. With cache=writethrough, you can get pathological cases where you have an uncached access followed by cached accesses. In fact, with read-ahead, this is probably not an uncommon scenario. OTOH, if we aggressively poll the ring when we have an opportunity to, there's very little downside to that and it addresses the serialization problem. But we can't guarantee that we'll get those opportunities, so it doesn't address the problem in a general way. A guest that doesn't use hpet and only has a single virtio-blk device will not have any reason to exit to qemu.
We can mitigate this with a timer but honestly, we need to do perf measurements to see. My feeling is that we will need some more aggressive form of polling than just waiting for IO completion. I don't think queue depth is enough because it assumes that all requests are equal. When dealing with cache=off or even just storage with its own cache, that's simply not the case. Regards, Anthony Liguori BTW this is why vhost net uses queue depth in bytes. -- MST
[Qemu-devel] [PATCH] raw-posix: Detect CDROM via ioctl
Current CDROM detection is hardcoded based on source file name. Make this smarter by attempting a CDROM specific ioctl. This makes '-cdrom /dev/sr0' succeed with no media present. Signed-off-by: Cole Robinson crobi...@redhat.com
---
 block/raw-posix.c | 18 +-
 1 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 5a6a22b..3ec58d0 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1140,9 +1140,25 @@ static int cdrom_open(BlockDriverState *bs, const char *filename, int flags)
 static int cdrom_probe_device(const char *filename)
 {
+    int fd, ret;
+
     if (strstart(filename, "/dev/cd", NULL))
         return 100;
-    return 0;
+
+    fd = open(filename, O_RDONLY | O_NONBLOCK);
+    if (fd < 0) {
+        return 0;
+    }
+
+    /* Attempt to detect via a CDROM specific ioctl */
+    if (ioctl(fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) < 0 && errno == EINVAL) {
+        ret = 0;
+    } else {
+        ret = 100;
+    }
+
+    close(fd);
+    return ret;
 }

 static int cdrom_is_inserted(BlockDriverState *bs)
--
1.6.6
[Qemu-devel] [PATCH] raw-posix: Detect IDE floppy via ioctl
Current IDE floppy detection is hardcoded based on source file name. Make this smarter by attempting a floppy specific ioctl. Signed-off-by: Cole Robinson crobi...@redhat.com
---
 block/raw-posix.c | 19 ++-
 1 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 3ec58d0..5d6b9fb 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1055,9 +1055,26 @@ static int floppy_open(BlockDriverState *bs, const char *filename, int flags)
 static int floppy_probe_device(const char *filename)
 {
+    int fd, ret;
+    struct floppy_struct fdparam;
+
     if (strstart(filename, "/dev/fd", NULL))
         return 100;
-    return 0;
+
+    fd = open(filename, O_RDONLY | O_NONBLOCK);
+    if (fd < 0) {
+        return 0;
+    }
+
+    /* Attempt to detect via a floppy specific ioctl */
+    if (ioctl(fd, FDGETPRM, &fdparam) < 0 && errno == EINVAL) {
+        ret = 0;
+    } else {
+        ret = 100;
+    }
+
+    close(fd);
+    return ret;
 }
--
1.6.6
[Qemu-devel] QMP forward compatibility support
Hi. We (Markus and I) are working on getting QMP forward compatibility support, supported. :) We have a plan for it and I'd like to ask the CC'ed people to review it. Needless to say, the objective here is to add new commands, arguments, async messages and protocol features w/o breaking existing clients.

General Plan

1. QMP should describe itself, i.e. it should dump all accepted commands, their replies and arguments, async messages and protocol features. All in JSON format
2. Protocol features are advertised by the (already existent) capabilities array, but are _disabled_ by default. The exception is async messages, which are _enabled_ by default
3. We should add command(s) to enable/disable protocol features
4. Proper feature negotiation is done in pause mode. That is, clients interested in enabling new protocol features should start QEMU in pause mode and enable the features they are interested in using

Work to do
--
1. Markus is already working on getting QMP to describe itself, we should get a status/proposal soon. This is the most important part, as we may want to change query-commands output, query-events is missing, and hopefully we can generate user docs from this
2. We need to add commands to enable/disable features. Not high priority, as the only existing feature is async messages, which are enabled by default
[Qemu-devel] Re: [RFC] API change for pci_set_word and related functions (was Re: [PATCH] eepro100: Fix initial value for PCI_STATUS)
On Thu, Jan 07, 2010 at 04:07:26PM +0100, Stefan Weil wrote: Michael S. Tsirkin schrieb: On Thu, Jan 07, 2010 at 12:15:25PM +0100, Stefan Weil wrote: ... --- hw/eepro100.c | 4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) diff --git a/hw/eepro100.c b/hw/eepro100.c index 336ca49..a21c984 100644 --- a/hw/eepro100.c +++ b/hw/eepro100.c @@ -420,10 +420,8 @@ static void pci_reset(EEPRO100State * s) /* TODO: this is the default, do not override. */ PCI_CONFIG_16(PCI_COMMAND, 0x); /* PCI Status */ - /* TODO: this seems to make no sense. */ /* TODO: Value at RST# should be 0. */ So this second todo can go too. I've removed it in my tree. - PCI_CONFIG_16(PCI_STATUS, - PCI_STATUS_REC_MASTER_ABORT | PCI_STATUS_SIG_TARGET_ABORT); + PCI_CONFIG_16(PCI_STATUS, PCI_STATUS_DEVSEL_MEDIUM | PCI_STATUS_FAST_BACK); /* PCI Revision ID */ PCI_CONFIG_8(PCI_REVISION_ID, 0x08); BTW if you are not afraid of churn, there's no reason for PCI_CONFIG_8 and friends anymore, because pci.h has much nicer pci_set_byte etc. Hello Michael, I already noticed pci_set_byte, pci_set_word, pci_set_long and the corresponding pci_get_xxx functions and thought about using them. I did not start it because I want to suggest a different API for use in PCI device emulations: instead of pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_CAP_LIST); or pci_set_word(&pci_conf[PCI_STATUS], PCI_STATUS_CAP_LIST); it would be better to call pci_set_word(pci_conf, PCI_STATUS, PCI_STATUS_CAP_LIST); The prototypes would look like this: /* Set PCI config value. */ void pci_set_word(PCIDevice *s, uint8_t offset, uint16_t val); /* Set PCI cmask value. */ void pci_set_cmask_word(PCIDevice *s, uint8_t offset, uint16_t val); /* Set PCI wmask value. */ void pci_set_wmask_word(PCIDevice *s, uint8_t offset, uint16_t val); What are the advantages? * strict type checking (the old API takes any uint8_t *) So IMO it's easier to make mistakes with your proposed API because if you confuse offset and value the compiler does not complain.
More importantly, if you want to pass some other uint8_t pointer to e.g. pci_set_word, you can *and nothing unexpected will happen*: it will set the word to the given value. So the current API is more type safe than what you propose. * many other pci_* functions also have a first parameter of type PCIDevice So what would make sense IMO is a higher-level abstraction, for example similar to what we have with capabilities and msix; I think we could have something like this for e.g. power management. For low-level bit tweaking, the advantage of the current API is that the same thing can be used to set wmask, cmask, config itself, and whatever else we will come up with. * calls look nicer (at least in my opinion) What I value is the fact that it's obvious which data is changed. * strict range checking (offset is limited to 0...255, additional assertions possible - the old API is unsafe because it just takes a pointer) I don't think we want to add return status, so there wouldn't be a benefit to range checking as we can't act on it. Anyway, it's very unusual to use anything but a constant as an offset, so range errors are very uncommon. The functions are inline, so the resulting code won't differ. Instead of _byte, _word and _long I personally prefer something like _8, _16, _32 because _word and _long need interpretation. But this is only a matter of taste - the API change is more important. Regards, Stefan Weil -- MST
[Qemu-devel] Re: [PATCH] eepro100: Update ROM file support
On Thu, Jan 07, 2010 at 05:13:30PM +0100, Stefan Weil wrote: Use new way to associate ROM files to devices. Currently, there is only a ROM file for i82559er included in QEMU, so the patch does not add .romfile for the other devices. When flexible mode is fixed in eepro100, adding more ROM files will be possible. It should be possible to create them from pxe-i82559er.bin, because etherboot uses the same driver for all eepro100 devices (only PCI ids differ). Maybe it is even possible to create a single pxe-i8255x.bin which supports all eepro100 devices (not supported with current etherboot). Signed-off-by: Stefan Weil w...@mail.berlios.de Gerd, could you ack this patch please? Thanks!
---
 hw/eepro100.c | 11 +--
 1 files changed, 1 insertions(+), 10 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index a21c984..b33dbb8 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -40,7 +40,6 @@
 #include <stddef.h>     /* offsetof */
 #include <stdbool.h>
 #include "hw.h"
-#include "loader.h"     /* rom_add_option */
 #include "pci.h"
 #include "net.h"
 #include "eeprom93xx.h"
@@ -1894,15 +1893,6 @@ static int nic_init(PCIDevice *pci_dev, uint32_t device)
     s->vmstate->name = s->nic->nc.model;
     vmstate_register(-1, s->vmstate, s);
-    if (!pci_dev->qdev.hotplugged) {
-        static int loaded = 0;
-        if (!loaded) {
-            char fname[32];
-            snprintf(fname, sizeof(fname), "pxe-%s.bin", s->nic->nc.model);
-            rom_add_option(fname);
-            loaded = 1;
-        }
-    }
     return 0;
 }
@@ -2062,6 +2052,7 @@ static PCIDeviceInfo eepro100_info[] = {
     .qdev.size = sizeof(EEPRO100State),
     .init = pci_i82559er_init,
     .exit = pci_nic_uninit,
+    .romfile = "pxe-i82559er.bin",
     .qdev.props = (Property[]) {
         DEFINE_NIC_PROPERTIES(EEPRO100State, conf),
         DEFINE_PROP_END_OF_LIST(),
--
1.6.5
Re: [Qemu-devel] QMP forward compatibility support
On 01/11/2010 12:34 PM, Luiz Capitulino wrote: Hi. We (Markus and I) are working on getting QMP forward compatibility support, supported. :) We have a plan for it and I'd like to ask the CC'ed people to review it. Needless to say, the objective here is to add new commands, arguments, async messages and protocol features w/o breaking existing clients. General Plan 1. QMP should describe itself, i.e. it should dump all accepted commands, their replies and arguments, async messages and protocol features. All in JSON format 2. Protocol features are advertised by the (already existent) capabilities array, but are _disabled_ by default. The exception is async messages, which are _enabled_ by default Any reason to have an exception like this? 3. We should add command(s) to enable/disable protocol features 4. Proper feature negotiation is done in pause mode. That is, clients interested in enabling new protocol features should start QEMU in pause mode and enable the features they are interested in using Why does this matter? We should be careful to support connecting to a VM long after it's been started, so any requirement like this is likely to cause trouble. Regards, Anthony Liguori
Re: [Qemu-devel] [PATCH] dmg: fix ->open failure
On Mon, 11 Jan 2010, Christoph Hellwig wrote: On Mon, Jan 11, 2010 at 02:56:16PM +0100, Kevin Wolf wrote: Are you going to propose a cleaner patch? I currently have some other bugs to do first, but I was certainly planning to do so. However, I'll happily leave it to you if you have the time right now. I'm looking into doing it in the generic block layer, yes. For dmg, I'm not even sure if it's worth fixing. Does anybody use this driver? But I guess it's good enough for another lowest priority task on my todo list. Right after vvfat or so... Hehe. All these odd image formats are extremely low on my todo list too. I'd be really interested if there are any users around at all. I use vvfat. -- mailto:av1...@comtv.ru
[Qemu-devel] virtio: Add memory statistics reporting to the balloon driver (V6)
After some good discussion, V6 of this patch integrates well with the new QMP support. When the monitor is in QMP mode, the query-balloon command triggers a stats refresh request to the guest. This request is asynchronous. If the guest does not respond then nothing further happens. When stats are updated, a BALLOON monitor event is raised and the data element will contain the memory statistics. For the user monitor, a timer has been added to prevent monitor hangs with unresponsive guests. When the timer fires, the most recently collected stats are returned along with an additional entry 'age' which indicates the number of host_clock milliseconds that have passed since the stats were collected. This method for dealing with asynchronous commands may prove useful for other existing or future commands. In that case, we may want to consider incorporating this into the actual monitor API. Changes since V5: - Asynchronous query-balloon mode for QMP - Add timeout to prevent hanging the user monitor in synchronous mode Changes since V4: - Virtio spec updated: http://ozlabs.org/~rusty/virtio-spec/virtio-spec-0.8.2.pdf - Guest-side Linux implementation applied by Rusty - Start using the QObject infrastructure - All endian conversions done in the host - Report stats that reference a quantity of memory in bytes Changes since V3: - Increase stat field size to 64 bits - Report all sizes in kb (not pages) - Drop anon_pages stat Changes since V2: - Use a virtqueue for communication instead of the device config space Changes since V1: - In the monitor, print all stats on one line with less abbreviated names - Coding style changes When using ballooning to manage overcommitted memory on a host, a system for guests to communicate their memory usage to the host can provide information that will minimize the impact of ballooning on the guests. The current method employs a daemon running in each guest that communicates memory statistics to a host daemon at a specified time interval. 
The host daemon aggregates this information and inflates and/or deflates balloons according to the level of host memory pressure. This approach is effective but overly complex since a daemon must be installed inside each guest and coordinated to communicate with the host. A simpler approach is to collect memory statistics in the virtio balloon driver and communicate them directly to the hypervisor. Signed-off-by: Adam Litke a...@us.ibm.com To: Anthony Liguori aligu...@us.ibm.com Cc: Avi Kivity a...@redhat.com Cc: Luiz Capitulino lcapitul...@redhat.com Cc: Jamie Lokier ja...@shareable.org Cc: qemu-devel@nongnu.org

diff --git a/balloon.h b/balloon.h
index 60b4a5d..7e29028 100644
--- a/balloon.h
+++ b/balloon.h
@@ -16,12 +16,22 @@
 #include "cpu-defs.h"
-typedef ram_addr_t (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
+/* Timeout for synchronous stats requests (in seconds) */
+#define QEMU_BALLOON_SYNC_TIMEOUT 5
+
+typedef enum {
+    QEMU_BALLOON_MODE_NONE = 0,  /* No stats request is active */
+    QEMU_BALLOON_MODE_SYNC = 1,  /* Synchronous stats request */
+    QEMU_BALLOON_MODE_ASYNC = 2, /* Asynchronous stats request */
+} balloon_mode_t;
+
+typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target,
+                                balloon_mode_t mode);

 void qemu_add_balloon_handler(QEMUBalloonEvent *func, void *opaque);
-void qemu_balloon(ram_addr_t target);
+int qemu_balloon(ram_addr_t target);
-ram_addr_t qemu_balloon_status(void);
+int qemu_balloon_status(balloon_mode_t mode);

 #endif
diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
index cfd3b41..bf67f4d 100644
--- a/hw/virtio-balloon.c
+++ b/hw/virtio-balloon.c
@@ -19,6 +19,10 @@
 #include "balloon.h"
 #include "virtio-balloon.h"
 #include "kvm.h"
+#include "monitor.h"
+#include "qlist.h"
+#include "qint.h"
+#include "qstring.h"

 #if defined(__linux__)
 #include <sys/mman.h>
@@ -27,9 +31,15 @@
 typedef struct VirtIOBalloon {
     VirtIODevice vdev;
-    VirtQueue *ivq, *dvq;
+    VirtQueue *ivq, *dvq, *svq;
     uint32_t num_pages;
     uint32_t actual;
+    uint64_t stats[VIRTIO_BALLOON_S_NR];
+    VirtQueueElement stats_vq_elem;
+    size_t stats_vq_offset;
+    balloon_mode_t stats_request_mode;
+    QEMUTimer *stats_timer;
+    uint64_t stats_updated;
 } VirtIOBalloon;

 static VirtIOBalloon *to_virtio_balloon(VirtIODevice *vdev)
@@ -46,6 +56,50 @@ static void balloon_page(void *addr, int deflate)
 #endif
 }

+/*
+ * reset_stats - Mark all items in the stats array as unset
+ *
+ * This function needs to be called at device initialization and before
+ * updating to a set of newly-generated stats. This will ensure that no
+ * stale values stick around in case the guest reports a subset of the
+ * supported statistics.
+ */
+static inline void reset_stats(VirtIOBalloon *dev)
+{
+    int i;
+    for (i = 0; i < VIRTIO_BALLOON_S_NR; dev->stats[i++] = -1);
+    dev->stats_updated = qemu_get_clock(host_clock);
+}
+
+static void
Re: [Qemu-devel] [PATCHv2] Add KVM paravirt cpuid leaf
On 01/07/2010 10:24 AM, Gleb Natapov wrote:

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 4084503..6a841de 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -17,6 +17,7 @@
 #include <sys/mman.h>
 #include <linux/kvm.h>
+#include <linux/kvm_para.h>

This breaks the build on a default F12 install because while kvm.h is present, kvm_para.h is not. This is a hard one to fix. We can default the kvm search path to /lib/modules/$(uname -r)/build, we can fix the glibc headers and live with it, or we can pull in the kvm headers into qemu. Avi/Marcelo/Jan, any thoughts from the qemu-kvm side? Regards, Anthony Liguori
[Qemu-devel] Registering key events android (qemu based) emulator
Hi all, I am a bit new to this stuff. I want to call a method after the user presses a combination of keys, let's say alt+s or whatever. How can I do it, and using which event handler? Best Regards Anwar Ghani +31 647 344 773 --- On Mon, 1/11/10, Anthony Liguori anth...@codemonkey.ws wrote: From: Anthony Liguori anth...@codemonkey.ws Subject: Re: [Qemu-devel] [PATCHv2] Add KVM paravirt cpuid leaf To: Gleb Natapov g...@redhat.com Cc: Marcelo Tosatti mtosa...@redhat.com, Jan Kiszka jan.kis...@web.de, qemu-devel@nongnu.org, kvm-devel k...@vger.kernel.org, Avi Kivity a...@redhat.com Date: Monday, January 11, 2010, 7:18 PM On 01/07/2010 10:24 AM, Gleb Natapov wrote: diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 4084503..6a841de 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -17,6 +17,7 @@ #include <sys/mman.h> #include <linux/kvm.h> +#include <linux/kvm_para.h> This breaks the build on a default F12 install because while kvm.h is present, kvm_para.h is not. This is a hard one to fix. We can default the kvm search path to /lib/modules/$(uname -r)/build, we can fix the glibc headers and live with it, or we can pull in the kvm headers into qemu. Avi/Marcelo/Jan, any thoughts from the qemu-kvm side? Regards, Anthony Liguori
[Qemu-devel] Re: [RFC] API change for pci_set_word and related functions
Michael S. Tsirkin schrieb: On Thu, Jan 07, 2010 at 04:07:26PM +0100, Stefan Weil wrote: Michael S. Tsirkin schrieb: On Thu, Jan 07, 2010 at 12:15:25PM +0100, Stefan Weil wrote: ... --- hw/eepro100.c | 4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) diff --git a/hw/eepro100.c b/hw/eepro100.c index 336ca49..a21c984 100644 --- a/hw/eepro100.c +++ b/hw/eepro100.c @@ -420,10 +420,8 @@ static void pci_reset(EEPRO100State * s) /* TODO: this is the default, do not override. */ PCI_CONFIG_16(PCI_COMMAND, 0x); /* PCI Status */ - /* TODO: this seems to make no sense. */ /* TODO: Value at RST# should be 0. */ So this second todo can go too. I've removed it in my tree. - PCI_CONFIG_16(PCI_STATUS, - PCI_STATUS_REC_MASTER_ABORT | PCI_STATUS_SIG_TARGET_ABORT); + PCI_CONFIG_16(PCI_STATUS, PCI_STATUS_DEVSEL_MEDIUM | PCI_STATUS_FAST_BACK); /* PCI Revision ID */ PCI_CONFIG_8(PCI_REVISION_ID, 0x08); BTW if you are not afraid of churn, there's no reason for PCI_CONFIG_8 and friends anymore, because pci.h has much nicer pci_set_byte etc. Hello Michael, I already noticed pci_set_byte, pci_set_word, pci_set_long and the corresponding pci_get_xxx functions and thought about using them. I did not start it because I want to suggest a different API for use in PCI device emulations: instead of pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_CAP_LIST); or pci_set_word(&pci_conf[PCI_STATUS], PCI_STATUS_CAP_LIST); it would be better to call pci_set_word(pci_conf, PCI_STATUS, PCI_STATUS_CAP_LIST); The prototypes would look like this: /* Set PCI config value. */ void pci_set_word(PCIDevice *s, uint8_t offset, uint16_t val); /* Set PCI cmask value. */ void pci_set_cmask_word(PCIDevice *s, uint8_t offset, uint16_t val); /* Set PCI wmask value. */ void pci_set_wmask_word(PCIDevice *s, uint8_t offset, uint16_t val); What are the advantages?
* strict type checking (the old API takes any uint8_t *) So IMO it's easier to make mistakes with your proposed API because if you confuse offset and value the compiler does not complain. More importantly, if you want to pass some other uint8_t pointer to e.g. pci_set_word, you can *and nothing unexpected will happen*: it will set the word to the given value. So the current API is more type safe than what you propose. No. The current API takes any uint8_t pointer to read or write a value. This is not safe. The proposed API only takes a PCIDevice pointer and reads or writes only configuration (or cmask or wmask) values. Yes, you can mistake offset for value. If you are lucky and the value is an uint16_t or uint32_t, your compiler will complain. And even if your compiler does not complain, it is wrong but still safe, because the code will only access the PCI configuration data. * many other pci_* functions also have a first parameter of type PCIDevice So what would make sense IMO is a higher-level abstraction, for example similar to what we have with capabilities and msix; I think we could have something like this for e.g. power management. For low-level bit tweaking, the advantage of the current API is that the same thing can be used to set wmask, cmask, config itself, and whatever else we will come up with. The low level API can be used where low level is adequate: in pci.c for example. To implement emulated PCI devices, a more robust API would be better. Think of the number of devices which are still missing, think of people who want to write a new PCI device emulation for QEMU without being a QEMU expert. * calls look nicer (at least in my opinion) What I value is the fact that it's obvious which data is changed.
If you really wanted to see which data is changed, you would have to write *(uint16_t *)&pci_conf[PCI_STATUS] = cpu_to_le16(PCI_STATUS_CAP_LIST); * strict range checking (offset is limited to 0...255, additional assertions possible - the old API is unsafe because it just takes a pointer) I don't think we want to add return status, so there wouldn't be a benefit to range checking as we can't act on it. Anyway, it's very unusual to use anything but a constant as an offset, so range errors are very uncommon. There is implicit range checking in the proposed API because the offset is uint8_t, so it cannot exceed the range which is valid for configuration offsets. A more elaborate check could require that configuration byte values are only addressed using pci_set_byte (not pci_set_long) and raise a fatal runtime error otherwise. Runtime checks without return values are well established in QEMU's code, and they are very useful for code writers. The functions are inline, so the resulting code won't differ. Instead of _byte, _word and _long I personally prefer something
Re: [Qemu-devel] [PATCH 2/3] use pkg-config for sdl when cross compiling
On 01/07/2010 08:42 AM, Paolo Bonzini wrote: Together with the previous patch this enables using the prefixed pkg-config, thus picking up the correct flags for SDL. Since pkg-config has an awful lot of dependencies, I still use sdl-config when not cross-compiling since some people may only have the latter. Wouldn't it make more sense to try to use pkg-config always and if it's not available, fall back on sdl-config? That means we don't have a separate code path for cross compiling which means it's less likely to break. Regards, Anthony Liguori
[Qemu-devel] Re: [RFC] API change for pci_set_word and related functions
On Mon, Jan 11, 2010 at 08:38:53PM +0100, Stefan Weil wrote: Michael S. Tsirkin schrieb: On Thu, Jan 07, 2010 at 04:07:26PM +0100, Stefan Weil wrote: Michael S. Tsirkin schrieb: On Thu, Jan 07, 2010 at 12:15:25PM +0100, Stefan Weil wrote: ... --- hw/eepro100.c | 4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) diff --git a/hw/eepro100.c b/hw/eepro100.c index 336ca49..a21c984 100644 --- a/hw/eepro100.c +++ b/hw/eepro100.c @@ -420,10 +420,8 @@ static void pci_reset(EEPRO100State * s) /* TODO: this is the default, do not override. */ PCI_CONFIG_16(PCI_COMMAND, 0x); /* PCI Status */ - /* TODO: this seems to make no sense. */ /* TODO: Value at RST# should be 0. */ So this second todo can go too. I've removed it in my tree. - PCI_CONFIG_16(PCI_STATUS, - PCI_STATUS_REC_MASTER_ABORT | PCI_STATUS_SIG_TARGET_ABORT); + PCI_CONFIG_16(PCI_STATUS, PCI_STATUS_DEVSEL_MEDIUM | PCI_STATUS_FAST_BACK); /* PCI Revision ID */ PCI_CONFIG_8(PCI_REVISION_ID, 0x08); BTW if you are not afraid of churn, there's no reason for PCI_CONFIG_8 and friends anymore, because pci.h has much nicer pci_set_byte etc. Hello Michael, I already noticed pci_set_byte, pci_set_word, pci_set_long and the corresponding pci_get_xxx functions and thought about using them. I did not start it because I want to suggest a different API for use in PCI device emulations: instead of pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_CAP_LIST); or pci_set_word(pci_conf[PCI_STATUS], PCI_STATUS_CAP_LIST); it would be better to call pci_set_word(pci_conf, PCI_STATUS, PCI_STATUS_CAP_LIST); The prototypes would look like this: /* Set PCI config value. */ void pci_set_word(PCIDevice *s, uint8_t offset, uint16_t val); /* Set PCI cmask value. */ void pci_set_cmask_word(PCIDevice *s, uint8_t offset, uint16_t val); /* Set PCI wmask value. */ void pci_set_wmask_word(PCIDevice *s, uint8_t offset, uint16_t val); What are the advantages? 
* strict type checking (the old API takes any uint8_t *) So IMO it's easier to make mistakes with your proposed API because if you confuse offset and value compiler does not complain. More importantly, if you want to pass some other uint8_t pointer to e.g. pci_set_word, you can *and nothing unexpected will happen* it will set the word to the given value. So the current API is more type safe than what you propose. No. The current API takes any uint8_t pointer to read or write a value. This is not safe. Why isn't it? The proposed API only takes a PCIDevice pointer and reads or writes only configuration (or cmask or wmask) values. Yes, you can take offset for value. If you are lucky and value is an uint16_t or uint32_t, your compiler will complain. Such a compiler will also complain over most of qemu code. And even if your compiler does not complain, it is wrong but still safe, because the code will only access the PCI configuration data. Correct and safe beats wrong and safe every time. * many other pci_* functions also have a first parameter of type PCIDevice So what would make sense IMO is higer level abstraction, for example similar to what we have with capabilities and msix, I think we could have something like this for e.g. power management. For low-level bit tweaking, the advantages of current API is that same thing can be used to set wmask, cmask, config itself, and whatever else we will come up with. The low level API can be used where low level is adequate: in pci.c for example. To implement emulated PCI devices, a more robust API would be better. Think of the number of devices which are still missing, think of people who want to write a new PCI device emulation for QEMU without being a QEMU expert. * calls look nicer (at least in my opinion) What I value is the fact that it's obvious which data is changed. 
Here there is no difference between current and proposed API: old: pci_set_word(&pci_conf[PCI_STATUS], PCI_STATUS_CAP_LIST); new: pci_set_word(pci_conf, PCI_STATUS, PCI_STATUS_CAP_LIST); Every function call hides what happens. If you really wanted to see which data is changed, you would have to write *(uint16_t *)&pci_conf[PCI_STATUS] = cpu_to_le16(PCI_STATUS_CAP_LIST); That's what we used to have, and it's not all bad, but very verbose and ugly. * strict range checking (offset is limited to 0...255, additional assertions possible - the old API is unsafe because it just takes a pointer) I don't think we want to add return status, so there wouldn't be a benefit to range checking as we can't act on it. Anyway, it's very unusual to use anything but a constant as an offset, so range errors are very uncommon. There is implicit range checking in the proposed API because the offset is uint8_t, so it cannot exceed the range which is valid for configuration offsets. Oh, btw, this is
Re: [Qemu-devel] [PATCH] virtio-pci: thinko fix
On 01/11/2010 09:57 AM, Michael S. Tsirkin wrote: Since patch ed757e140c0ada220f213036e4497315d24ca8bct, virtio will sometimes clear all status registers on bus master disable, which loses information such as the VIRTIO_CONFIG_S_FAILED bit. This is a result of a patch being misapplied: the code uses ! instead of ~ for bit operations as in Yan's original patch. This obviously does not make sense. Actually, the original patch used '!'. It didn't carry a Signed-off-by, and when it was reposted with a SoB, this was changed to '~'. There was no indication, though, that the contents of the patch were different. Moral of the story is, when you make a change to a patch, add a (v2) and state in the commit message what change was made. Thanks for catching this. Regards, Anthony Liguori
Re: [Qemu-devel] QMP forward compatibility support
On Mon, Jan 11, 2010 at 12:57:15PM -0600, Anthony Liguori wrote: On 01/11/2010 12:34 PM, Luiz Capitulino wrote: Hi. We (Markus and I) are working on getting QMP forward compatibility support, supported. :) We have a plan for it and I'd like to ask the CC'ed people to review it. Needless to say, but the objective here is to add new commands, arguments, async messages and protocol features w/o breaking existing clients. General Plan 1. QMP should describe itself, ie. it should dump all accepted commands, their replies and arguments, async messages and protocol features. All in JSON format 2. Protocol features are advertised by the (already existent) capabilities array, but are _disabled_ by default. The exception is async messages, which are _enabled_ by default Any reason to have an exception like this? Personally I'd be fine with async events being off by default too. There's no point in me receiving events I'm not able to handle, so might as well only give me events I actually request. 3. We should add command(s) to enable/disable protocol features 4. Proper feature negotiation is done in pause mode. That's, clients interested in enabling new protocol features should start QEMU in pause mode and enable the features they are interested in using Why does this matter? We should be careful to support connecting to a VM long after it's been started so any requirement like this is likely to cause trouble. Indeed, this won't work for libvirt. If you stop the libvirtd daemon while a guest is running, and then start the libvirtd daemon again, it will need to re-negotiate features while the guest is running. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
[Qemu-devel] [PATCH 2/5] tcg-sparc: Implement add2, sub2, mulu2.
Add missing 32-bit double-word support opcodes. Signed-off-by: Richard Henderson r...@twiddle.net --- tcg/sparc/tcg-target.c | 27 +++ 1 files changed, 27 insertions(+), 0 deletions(-) diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c index 067e26e..6934580 100644 --- a/tcg/sparc/tcg-target.c +++ b/tcg/sparc/tcg-target.c @@ -215,6 +215,7 @@ static inline int tcg_target_const_match(tcg_target_long val, #define BA (INSN_OP(0) | INSN_COND(COND_A, 0) | INSN_OP2(0x2)) #define ARITH_ADD (INSN_OP(2) | INSN_OP3(0x00)) +#define ARITH_ADDCC (INSN_OP(2) | INSN_OP3(0x10)) #define ARITH_AND (INSN_OP(2) | INSN_OP3(0x01)) #define ARITH_OR (INSN_OP(2) | INSN_OP3(0x02)) #define ARITH_ORCC (INSN_OP(2) | INSN_OP3(0x12)) @@ -238,6 +239,7 @@ static inline int tcg_target_const_match(tcg_target_long val, #define SHIFT_SRLX (INSN_OP(2) | INSN_OP3(0x26) | (1 << 12)) #define SHIFT_SRAX (INSN_OP(2) | INSN_OP3(0x27) | (1 << 12)) +#define RDY (INSN_OP(2) | INSN_OP3(0x28) | INSN_RS1(0)) #define WRY (INSN_OP(2) | INSN_OP3(0x30)) #define JMPL (INSN_OP(2) | INSN_OP3(0x38)) #define SAVE (INSN_OP(2) | INSN_OP3(0x3c)) @@ -410,6 +412,11 @@ static inline void tcg_out_sety(TCGContext *s, tcg_target_long val) fprintf(stderr, "unimplemented sety %ld\n", (long)val); } +static inline void tcg_out_rdy(TCGContext *s, int rd) +{ +tcg_out32(s, RDY | INSN_RD(rd)); +} + static inline void tcg_out_addi(TCGContext *s, int reg, tcg_target_long val) { if (val != 0) { @@ -1132,6 +1139,23 @@ static inline void tcg_out_op(TCGContext *s, int opc, const TCGArg *args, args[2], const_args[2], args[3], const_args[3], args[5]); break; +case INDEX_op_add2_i32: +tcg_out_arithc(s, args[0], args[2], args[4], const_args[4], + ARITH_ADDCC); +tcg_out_arithc(s, args[1], args[3], args[5], const_args[5], + ARITH_ADDX); +break; +case INDEX_op_sub2_i32: +tcg_out_arithc(s, args[0], args[2], args[4], const_args[4], + ARITH_SUBCC); +tcg_out_arithc(s, args[1], args[3], args[5], const_args[5], + ARITH_SUBX); +break; +case INDEX_op_mulu2_i32: +tcg_out_arithc(s, args[0], args[2], args[3], const_args[3], + ARITH_UMUL); +tcg_out_rdy(s, args[1]); +break; #endif case INDEX_op_qemu_ld8u: @@ -1250,6 +1274,9 @@ static const TCGTargetOpDef sparc_op_defs[] = { { INDEX_op_brcond_i32, { "r", "rJ" } }, #if TCG_TARGET_REG_BITS == 32 { INDEX_op_brcond2_i32, { "r", "r", "rJ", "rJ" } }, +{ INDEX_op_add2_i32, { "r", "r", "r", "r", "rJ", "rJ" } }, +{ INDEX_op_sub2_i32, { "r", "r", "r", "r", "rJ", "rJ" } }, +{ INDEX_op_mulu2_i32, { "r", "r", "r", "rJ" } }, #endif { INDEX_op_qemu_ld8u, { "r", "L" } }, -- 1.6.5.2
[Qemu-devel] [PATCH 3/5] tcg-sparc: Do not remove %o[012] from 'r' constraint.
Only the 'L' constraint needs that. Signed-off-by: Richard Henderson r...@twiddle.net --- tcg/sparc/tcg-target.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c index 6934580..8675fce 100644 --- a/tcg/sparc/tcg-target.c +++ b/tcg/sparc/tcg-target.c @@ -143,6 +143,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str) ct_str = *pct_str; switch (ct_str[0]) { case 'r': +ct->ct |= TCG_CT_REG; +tcg_regset_set32(ct->u.regs, 0, 0xffffffff); +break; case 'L': /* qemu_ld/st constraint */ ct->ct |= TCG_CT_REG; tcg_regset_set32(ct->u.regs, 0, 0xffffffff); -- 1.6.5.2
[Qemu-devel] [PATCH 4/5] tcg-sparc: Implement division properly.
The {div,divu}2 opcodes are intended for systems for which the division instruction produces both quotient and remainder. Sparc is not such a system. Indeed, the remainder must be computed as quot = a / b; rem = a - (quot * b). Split out a tcg_out_div32 function that properly initializes Y with the extension of the input to 64-bits. Discard the code that used the 64-bit DIVX on sparc9/sparcv8plus without extending the inputs to 64-bits. Implement remainders in terms of division followed by multiplication. Signed-off-by: Richard Henderson r...@twiddle.net --- tcg/sparc/tcg-target.c | 82 ++- tcg/sparc/tcg-target.h | 3 ++ 2 files changed, 55 insertions(+), 30 deletions(-) diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c index 8675fce..1bef2fb 100644 --- a/tcg/sparc/tcg-target.c +++ b/tcg/sparc/tcg-target.c @@ -243,7 +243,7 @@ static inline int tcg_target_const_match(tcg_target_long val, #define SHIFT_SRAX (INSN_OP(2) | INSN_OP3(0x27) | (1 << 12)) #define RDY (INSN_OP(2) | INSN_OP3(0x28) | INSN_RS1(0)) -#define WRY (INSN_OP(2) | INSN_OP3(0x30)) +#define WRY (INSN_OP(2) | INSN_OP3(0x30) | INSN_RD(0)) #define JMPL (INSN_OP(2) | INSN_OP3(0x38)) #define SAVE (INSN_OP(2) | INSN_OP3(0x3c)) #define RESTORE (INSN_OP(2) | INSN_OP3(0x3d)) @@ -407,12 +407,9 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, int arg, tcg_out_ldst(s, arg, arg1, arg2, STX); } -static inline void tcg_out_sety(TCGContext *s, tcg_target_long val) +static inline void tcg_out_sety(TCGContext *s, int rs) { -if (val == 0 || val == -1) -tcg_out32(s, WRY | INSN_IMM13(val)); -else -fprintf(stderr, "unimplemented sety %ld\n", (long)val); +tcg_out32(s, WRY | INSN_RS1(TCG_REG_G0) | INSN_RS2(rs)); } static inline void tcg_out_rdy(TCGContext *s, int rd) @@ -444,6 +441,21 @@ static inline void tcg_out_andi(TCGContext *s, int reg, tcg_target_long val) } } +static void tcg_out_div32(TCGContext *s, int rd, int rs1, + int val2, int val2const, int uns) +{ +/* Load Y with the sign/zero extension of RS1 to 64-bits. */ +if (uns) { +tcg_out_sety(s, TCG_REG_G0); +} else { +tcg_out_arith(s, TCG_REG_I5, rs1, 31, SHIFT_SRA); +tcg_out_sety(s, TCG_REG_I5); +} + +tcg_out_arithc(s, rd, rs1, val2, val2const, + uns ? ARITH_UDIV : ARITH_SDIV); +} + static inline void tcg_out_nop(TCGContext *s) { tcg_out_sethi(s, TCG_REG_G0, 0); } @@ -1113,24 +1125,22 @@ static inline void tcg_out_op(TCGContext *s, int opc, const TCGArg *args, case INDEX_op_mul_i32: c = ARITH_UMUL; goto gen_arith; -case INDEX_op_div2_i32: -#if defined(__sparc_v9__) || defined(__sparc_v8plus__) -c = ARITH_SDIVX; -goto gen_arith; -#else -tcg_out_sety(s, 0); -c = ARITH_SDIV; -goto gen_arith; -#endif -case INDEX_op_divu2_i32: -#if defined(__sparc_v9__) || defined(__sparc_v8plus__) -c = ARITH_UDIVX; -goto gen_arith; -#else -tcg_out_sety(s, 0); -c = ARITH_UDIV; -goto gen_arith; -#endif + +case INDEX_op_div_i32: +tcg_out_div32(s, args[0], args[1], args[2], const_args[2], 0); +break; +case INDEX_op_divu_i32: +tcg_out_div32(s, args[0], args[1], args[2], const_args[2], 1); +break; + +case INDEX_op_rem_i32: +case INDEX_op_remu_i32: +tcg_out_div32(s, TCG_REG_I5, args[1], args[2], const_args[2], + opc == INDEX_op_remu_i32); +tcg_out_arithc(s, TCG_REG_I5, TCG_REG_I5, args[2], const_args[2], + ARITH_UMUL); +tcg_out_arith(s, args[0], args[1], TCG_REG_I5, ARITH_SUB); +break; case INDEX_op_brcond_i32: tcg_out_brcond_i32(s, args[2], args[0], args[1], const_args[1], @@ -1214,12 +1224,20 @@ static inline void tcg_out_op(TCGContext *s, int opc, const TCGArg *args, case INDEX_op_mul_i64: c = ARITH_MULX; goto gen_arith; -case INDEX_op_div2_i64: +case INDEX_op_div_i64: c = ARITH_SDIVX; goto gen_arith; -case INDEX_op_divu2_i64: +case INDEX_op_divu_i64: c = ARITH_UDIVX; goto gen_arith; +case INDEX_op_rem_i64: +case INDEX_op_remu_i64: +tcg_out_arithc(s, TCG_REG_I5, args[1], args[2], const_args[2], + opc == INDEX_op_rem_i64 ? ARITH_SDIVX : ARITH_UDIVX); +tcg_out_arithc(s, TCG_REG_I5, TCG_REG_I5, args[2], const_args[2], + ARITH_MULX); +tcg_out_arith(s, args[0], args[1], TCG_REG_I5, ARITH_SUB); +break; case INDEX_op_brcond_i64: tcg_out_brcond_i64(s, args[2], args[0], args[1], const_args[1], @@ -1263,8 +1281,10 @@ static const TCGTargetOpDef sparc_op_defs[] = { { INDEX_op_add_i32, { "r", "r", "rJ" } }, { INDEX_op_mul_i32, { "r", "r", "rJ" } }, -
[Qemu-devel] [PATCH 5/5] tcg-sparc: Implement ext32[su]_i64
The 32-bit right-shift instruction is defined to extend the shifted output to 64-bits. A shift count of zero therefore is a simple extension without actually shifting. Signed-off-by: Richard Henderson r...@twiddle.net --- tcg/sparc/tcg-target.c | 16 tcg/sparc/tcg-target.h | 5 + 2 files changed, 21 insertions(+), 0 deletions(-) diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c index 1bef2fb..bc28f43 100644 --- a/tcg/sparc/tcg-target.c +++ b/tcg/sparc/tcg-target.c @@ -1238,6 +1238,20 @@ static inline void tcg_out_op(TCGContext *s, int opc, const TCGArg *args, ARITH_MULX); tcg_out_arith(s, args[0], args[1], TCG_REG_I5, ARITH_SUB); break; +case INDEX_op_ext32s_i64: +if (const_args[1]) { +tcg_out_movi(s, TCG_TYPE_I64, args[0], (int32_t)args[1]); +} else { +tcg_out_arithi(s, args[0], args[1], 0, SHIFT_SRA); +} +break; +case INDEX_op_ext32u_i64: +if (const_args[1]) { +tcg_out_movi_imm32(s, args[0], args[1]); +} else { +tcg_out_arithi(s, args[0], args[1], 0, SHIFT_SRL); +} +break; case INDEX_op_brcond_i64: tcg_out_brcond_i64(s, args[2], args[0], args[1], const_args[1], @@ -1344,6 +1358,8 @@ static const TCGTargetOpDef sparc_op_defs[] = { { INDEX_op_shl_i64, { "r", "r", "rJ" } }, { INDEX_op_shr_i64, { "r", "r", "rJ" } }, { INDEX_op_sar_i64, { "r", "r", "rJ" } }, +{ INDEX_op_ext32s_i64, { "r", "ri" } }, +{ INDEX_op_ext32u_i64, { "r", "ri" } }, { INDEX_op_brcond_i64, { "r", "rJ" } }, #endif diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h index e00707b..d27ed5a 100644 --- a/tcg/sparc/tcg-target.h +++ b/tcg/sparc/tcg-target.h @@ -91,6 +91,11 @@ enum { #define TCG_TARGET_HAS_div_i32 #define TCG_TARGET_HAS_div_i64 +#if TCG_TARGET_REG_BITS == 64 +#define TCG_TARGET_HAS_ext32s_i64 +#define TCG_TARGET_HAS_ext32u_i64 +#endif + //#define TCG_TARGET_HAS_bswap32_i32 //#define TCG_TARGET_HAS_bswap64_i64 //#define TCG_TARGET_HAS_neg_i32 -- 1.6.5.2
Re: [Qemu-devel] [PATCH] [tftp] Handle TFTP ERROR from client
On 01/07/2010 11:01 AM, Thomas Horsten wrote: If a PXE client only wants to find out the size of a file, it will open the file and then abort the transfer by sending a TFTP ERROR packet. The ERROR packet should cause qemu to terminate the session. If not, the sessions will soon run out and cause timeouts in the client. Also, if a TFTP session already exists with the same IP/UDP port, it should be terminated when a new RRQ is received, instead of creating a duplicate (which will never be used). A patch for gPXE to send the ERROR packet is also being submitted to gPXE. Together they resolve slowness/hanging when booting pxegrub from qemu's internal TFTP server. The patch from Milan Plzik to return after sending OACK is also required for a complete fix. Signed-off-by: Thomas Horsten tho...@horsten.com Signed-off-by: Milan Plzik milan.pl...@gmail.com Applied. Thanks. Regards, Anthony Liguori --- slirp/tftp.c | 23 +++ 1 files changed, 23 insertions(+), 0 deletions(-) diff --git a/slirp/tftp.c b/slirp/tftp.c index db869fc..96c0e0c 100644 --- a/slirp/tftp.c +++ b/slirp/tftp.c @@ -264,6 +264,12 @@ static void tftp_handle_rrq(Slirp *slirp, struct tftp_t *tp, int pktlen) size_t prefix_len; char *req_fname; + /* check if a session already exists and if so terminate it */ + s = tftp_session_find(slirp, tp); + if (s >= 0) { +tftp_session_terminate(&slirp->tftp_sessions[s]); + } + s = tftp_session_allocate(slirp, tp); if (s < 0) { @@ -386,6 +392,19 @@ static void tftp_handle_ack(Slirp *slirp, struct tftp_t *tp, int pktlen) } } +static void tftp_handle_error(Slirp *slirp, struct tftp_t *tp, int pktlen) +{ + int s; + + s = tftp_session_find(slirp, tp); + + if (s < 0) { +return; + } + + tftp_session_terminate(&slirp->tftp_sessions[s]); +} + void tftp_input(struct mbuf *m) { struct tftp_t *tp = (struct tftp_t *)m->m_data; @@ -398,5 +417,9 @@ void tftp_input(struct mbuf *m) case TFTP_ACK: tftp_handle_ack(m->slirp, tp, m->m_len); break; + + case TFTP_ERROR: +tftp_handle_error(m->slirp, tp, m->m_len); +break; } }
Re: [Qemu-devel] [PATCH] pc-bios: Update README (SeaBIOS)
On 01/07/2010 12:27 PM, Stefan Weil wrote: The PC BIOS no longer comes from Bochs. This patch updates the related entry. V2 - Modify SeaBIOS description and URL (Thanks to Gleb Natapov for the hint). Signed-off-by: Stefan Weilw...@mail.berlios.de Applied. Thanks. Regards, Anthony Liguori --- pc-bios/README |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/pc-bios/README b/pc-bios/README index 1b7a666..7dfa459 100644 --- a/pc-bios/README +++ b/pc-bios/README @@ -1,4 +1,5 @@ -- The PC BIOS comes from the Bochs project (http://bochs.sourceforge.net/). +- SeaBIOS (bios.bin) is the successor of pc bios. + See http://www.seabios.org/ for more information. - The VGA BIOS and the Cirrus VGA BIOS come from the LGPL VGA bios project (http://www.nongnu.org/vgabios/).
Re: [Qemu-devel] [PATCHv7 1/3] qdev: add bit property type
On 01/10/2010 05:52 AM, Michael S. Tsirkin wrote: This adds a bit property type, which is a boolean stored in a 32 bit integer field, with legal values on and off. Will be used by virtio for feature bits. Signed-off-by: Michael S. Tsirkin m...@redhat.com Acked-by: Gerd Hoffmann kra...@redhat.com Applied. Thanks. Regards, Anthony Liguori --- hw/qdev-properties.c | 62 - hw/qdev.h | 11 + 2 files changed, 66 insertions(+), 7 deletions(-) diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c index 217ddc0..9e123ae 100644 --- a/hw/qdev-properties.c +++ b/hw/qdev-properties.c @@ -9,6 +9,59 @@ void *qdev_get_prop_ptr(DeviceState *dev, Property *prop) return ptr; } +static uint32_t qdev_get_prop_mask(Property *prop) +{ +assert(prop->info->type == PROP_TYPE_BIT); +return 0x1 << prop->bitnr; +} + +static void bit_prop_set(DeviceState *dev, Property *props, bool val) +{ +uint32_t *p = qdev_get_prop_ptr(dev, props); +uint32_t mask = qdev_get_prop_mask(props); +if (val) +*p |= mask; +else +*p &= ~mask; +} + +static void qdev_prop_cpy(DeviceState *dev, Property *props, void *src) +{ +if (props->info->type == PROP_TYPE_BIT) { +bool *defval = src; +bit_prop_set(dev, props, *defval); +} else { +char *dst = qdev_get_prop_ptr(dev, props); +memcpy(dst, src, props->info->size); +} +} + +/* Bit */ +static int parse_bit(DeviceState *dev, Property *prop, const char *str) +{ +if (!strncasecmp(str, "on", 2)) +bit_prop_set(dev, prop, true); +else if (!strncasecmp(str, "off", 3)) +bit_prop_set(dev, prop, false); +else +return -1; +return 0; +} + +static int print_bit(DeviceState *dev, Property *prop, char *dest, size_t len) +{ +uint8_t *p = qdev_get_prop_ptr(dev, prop); +return snprintf(dest, len, (*p & qdev_get_prop_mask(prop)) ? "on" : "off"); +} + +PropertyInfo qdev_prop_bit = { +.name = "on/off", +.type = PROP_TYPE_BIT, +.size = sizeof(uint32_t), +.parse = parse_bit, +.print = print_bit, +}; + /* --- 8bit integer --- */ static int parse_uint8(DeviceState *dev, Property *prop, const char *str) @@ -511,7 +564,6 @@ int qdev_prop_parse(DeviceState *dev, const char *name, const char *value) void qdev_prop_set(DeviceState *dev, const char *name, void *src, enum PropertyType type) { Property *prop; -void *dst; prop = qdev_prop_find(dev, name); if (!prop) { @@ -524,8 +576,7 @@ void qdev_prop_set(DeviceState *dev, const char *name, void *src, enum PropertyT __FUNCTION__, dev->info->name, name); abort(); } -dst = qdev_get_prop_ptr(dev, prop); -memcpy(dst, src, prop->info->size); +qdev_prop_cpy(dev, prop, src); } void qdev_prop_set_uint8(DeviceState *dev, const char *name, uint8_t value) @@ -585,14 +636,11 @@ void qdev_prop_set_ptr(DeviceState *dev, const char *name, void *value) void qdev_prop_set_defaults(DeviceState *dev, Property *props) { -char *dst; - if (!props) return; while (props->name) { if (props->defval) { -dst = qdev_get_prop_ptr(dev, props); -memcpy(dst, props->defval, props->info->size); +qdev_prop_cpy(dev, props, props->defval); } props++; } diff --git a/hw/qdev.h b/hw/qdev.h index bbcdba1..07b9603 100644 --- a/hw/qdev.h +++ b/hw/qdev.h @@ -64,6 +64,7 @@ struct Property { const char *name; PropertyInfo *info; int offset; +int bitnr; void *defval; }; @@ -82,6 +83,7 @@ enum PropertyType { PROP_TYPE_NETDEV, PROP_TYPE_VLAN, PROP_TYPE_PTR, +PROP_TYPE_BIT, }; struct PropertyInfo { @@ -173,6 +175,7 @@ void do_device_del(Monitor *mon, const QDict *qdict); /*** qdev-properties.c ***/ +extern PropertyInfo qdev_prop_bit; extern PropertyInfo qdev_prop_uint8; extern PropertyInfo qdev_prop_uint16; extern PropertyInfo qdev_prop_uint32; @@ -202,6 +205,14 @@ extern PropertyInfo qdev_prop_pci_devfn; + type_check(_type,typeof_field(_state, _field)), \ .defval = (_type[]) { _defval }, \ } +#define DEFINE_PROP_BIT(_name, _state, _field, _bit, _defval) { \ +.name = (_name), \ +.info = &(qdev_prop_bit), \ +.bitnr = (_bit), \ +.offset = offsetof(_state, _field) \ ++ type_check(uint32_t,typeof_field(_state, _field)), \ +.defval = (bool[]) { (_defval) }, \ +} #define DEFINE_PROP_UINT8(_n, _s, _f, _d) \ DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_uint8, uint8_t)
Re: [Qemu-devel] [PATCH] dmg: fix -open failure
On 01/11/2010 07:06 AM, Christoph Hellwig wrote: Currently the dmg image format driver simply opens the images as raw if any kind of failure happens. This is contrary to the behaviour of all other image formats, which just return an error and let the block core deal with it. Signed-off-by: Christoph Hellwig h...@lst.de Applied. Thanks. Regards, Anthony Liguori Index: qemu/block/dmg.c === --- qemu.orig/block/dmg.c 2010-01-11 14:00:25.945021645 +0100 +++ qemu/block/dmg.c 2010-01-11 14:03:03.006036707 +0100 @@ -90,24 +90,21 @@ static int dmg_open(BlockDriverState *bs /* read offset of info blocks */ if(lseek(s->fd,-0x1d8,SEEK_END)<0) { -dmg_close: - close(s->fd); - /* open raw instead */ - bs->drv=bdrv_find_format("raw"); - return bs->drv->bdrv_open(bs, filename, flags); +goto fail; } + info_begin=read_off(s->fd); if(info_begin==0) - goto dmg_close; + goto fail; if(lseek(s->fd,info_begin,SEEK_SET)<0) - goto dmg_close; + goto fail; if(read_uint32(s->fd)!=0x100) - goto dmg_close; + goto fail; if((count = read_uint32(s->fd))==0) - goto dmg_close; + goto fail; info_end = info_begin+count; if(lseek(s->fd,0xf8,SEEK_CUR)<0) - goto dmg_close; + goto fail; /* read offsets */ last_in_offset = last_out_offset = 0; @@ -116,14 +113,14 @@ dmg_close: count = read_uint32(s->fd); if(count==0) - goto dmg_close; + goto fail; type = read_uint32(s->fd); if(type!=0x6d697368 || count<244) lseek(s->fd,count-4,SEEK_CUR); else { int new_size, chunk_count; if(lseek(s->fd,200,SEEK_CUR)<0) - goto dmg_close; + goto fail; chunk_count = (count-204)/40; new_size = sizeof(uint64_t) * (s->n_chunks + chunk_count); s->types = qemu_realloc(s->types, new_size/2); @@ -142,7 +139,7 @@ dmg_close: chunk_count--; i--; if(lseek(s->fd,36,SEEK_CUR)<0) - goto dmg_close; + goto fail; continue; } read_uint32(s->fd); @@ -163,11 +160,14 @@ dmg_close: s->compressed_chunk = qemu_malloc(max_compressed_size+1); s->uncompressed_chunk = qemu_malloc(512*max_sectors_per_chunk); if(inflateInit(&s->zstream) != Z_OK) - goto dmg_close; + goto fail; s->current_chunk = s->n_chunks; return 0; +fail: +close(s->fd); +return -1; } static inline int is_sector_in_chunk(BDRVDMGState* s,
Re: [Qemu-devel] [PATCH-RFC 00/13] vhost-net: preview
On Mon, Jan 11, 2010 at 07:16:42PM +0200, Michael S. Tsirkin wrote: Here's an untested patchset with vhost support for upstream qemu. Note that you should not expect performance gains from vhost unless in-kernel irqchip is enabled (which is not in upstream qemu now). Since adding vhost involves quite a bit of infrastructure, I thought it makes sense to send an RFC already, so that interested parties can review it. In particular, command line and help text need to be finalized early to so that management can start looking on supporting the feature. This patch has all bits besides migration filled in. Also missing is packet socket backend: another team is now working on this. Can you clarify a question about migration for me. Is it possible to live migrate a guest configured with tap + bridge on one machine over to another where it is launched with vhost + bridge, and vice-versa. In other words does this vhost support have any guest visible impact that would cause migraiton compatability problems, or it is purely a host side optimization like vnet_hdr was ? Regards, Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Re: [Qemu-devel] [PATCH 1/8] virtio: Remove duplicate macro definition for max. virtqueues, bump up the max
On 01/07/2010 01:31 AM, Amit Shah wrote: VIRTIO_PCI_QUEUE_MAX is redefined in hw/virtio.c. Let's just keep it in hw/virtio.h. Also, bump up the value of the maximum allowed virtqueues to 64. This is in preparation to allow multiple ports per virtio-console device. Signed-off-by: Amit Shahamit.s...@redhat.com These will need rebasing against master. It gets pretty ugly because of the rename. Wait for me to do a review (today or tomorrow) before hand though. I'll explicitly reply and Ack/Nack. Regards, Anthony Liguori
Re: [Qemu-devel] [PATCH 4/5] tcg-sparc: Implement division properly.
On 01/11/2010 11:09 AM, Richard Henderson wrote: +/* Load Y with the sign/zero extension of RS1 to 64-bits. */ +if (uns) { +tcg_out_sety(s, TCG_REG_G0); +} else { +tcg_out_arith(s, TCG_REG_I5, rs1, 31, SHIFT_SRA); Bah. tcg_out_arithi. Programming by code inspection only sucks. r~
Re: [Qemu-devel] [PATCH-RFC 00/13] vhost-net: preview
On Mon, Jan 11, 2010 at 08:09:10PM +, Daniel P. Berrange wrote: On Mon, Jan 11, 2010 at 07:16:42PM +0200, Michael S. Tsirkin wrote: Here's an untested patchset with vhost support for upstream qemu. Note that you should not expect performance gains from vhost unless in-kernel irqchip is enabled (which is not in upstream qemu now). Since adding vhost involves quite a bit of infrastructure, I thought it makes sense to send an RFC already, so that interested parties can review it. In particular, command line and help text need to be finalized early to so that management can start looking on supporting the feature. This patch has all bits besides migration filled in. Also missing is packet socket backend: another team is now working on this. Can you clarify a question about migration for me. Is it possible to live migrate a guest configured with tap + bridge on one machine over to another where it is launched with vhost + bridge, and vice-versa. Long term, this will be possible without any tweaks. Currently vhost does not support mergeable buffers feature which thus needs to be disabled on both sides for migration to work. Work is underway to add this support. In other words does this vhost support have any guest visible impact that would cause migraiton compatability problems, or it is purely a host side optimization like vnet_hdr was ? Regards, Daniel vnet_hdr has guest visible impact. -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
[Qemu-devel] [PATCH] pci: Add missing 'const' in argument to pci_get_xxx
pci_get_byte, pci_get_word, pci_get_long and pci_get_quad all take a const uint8_t pointer, because they only read the configuration data. Their prototypes should reflect this fact. Signed-off-by: Stefan Weil w...@mail.berlios.de --- hw/pci.h | 14 +++--- 1 files changed, 7 insertions(+), 7 deletions(-) diff --git a/hw/pci.h b/hw/pci.h index 9b5ae97..025eb9f 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -245,7 +245,7 @@ pci_set_byte(uint8_t *config, uint8_t val) } static inline uint8_t -pci_get_byte(uint8_t *config) +pci_get_byte(const uint8_t *config) { return *config; } @@ -257,9 +257,9 @@ pci_set_word(uint8_t *config, uint16_t val) } static inline uint16_t -pci_get_word(uint8_t *config) +pci_get_word(const uint8_t *config) { -return le16_to_cpupu((uint16_t *)config); +return le16_to_cpupu((const uint16_t *)config); } static inline void @@ -269,9 +269,9 @@ pci_set_long(uint8_t *config, uint32_t val) } static inline uint32_t -pci_get_long(uint8_t *config) +pci_get_long(const uint8_t *config) { -return le32_to_cpupu((uint32_t *)config); +return le32_to_cpupu((const uint32_t *)config); } static inline void @@ -281,9 +281,9 @@ pci_set_quad(uint8_t *config, uint64_t val) } static inline uint64_t -pci_get_quad(uint8_t *config) +pci_get_quad(const uint8_t *config) { -return le64_to_cpup((uint64_t *)config); +return le64_to_cpup((const uint64_t *)config); } static inline void -- 1.6.5
[Qemu-devel] Re: [RFC] API change for pci_set_word and related functions
Michael S. Tsirkin schrieb: On Mon, Jan 11, 2010 at 08:38:53PM +0100, Stefan Weil wrote: Michael S. Tsirkin schrieb: On Thu, Jan 07, 2010 at 04:07:26PM +0100, Stefan Weil wrote: Michael S. Tsirkin schrieb: On Thu, Jan 07, 2010 at 12:15:25PM +0100, Stefan Weil wrote: ... - PCI_CONFIG_16(PCI_STATUS, - PCI_STATUS_REC_MASTER_ABORT | PCI_STATUS_SIG_TARGET_ABORT); + PCI_CONFIG_16(PCI_STATUS, PCI_STATUS_DEVSEL_MEDIUM | PCI_STATUS_FAST_BACK); /* PCI Revision ID */ PCI_CONFIG_8(PCI_REVISION_ID, 0x08); BTW if you are not afraid of churn, there's no reason for PCI_CONFIG_8 and friends anymore, because pci.h has much nicer pci_set_byte etc. Hello Michael, I already noticed pci_set_byte, pci_set_word, pci_set_long and the corresponding pci_get_xxx functions and thought about using them. I did not start it because I want to suggest a different API for use in PCI device emulations: instead of pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_CAP_LIST); or pci_set_word(pci_conf[PCI_STATUS], PCI_STATUS_CAP_LIST); it would be better to call pci_set_word(pci_conf, PCI_STATUS, PCI_STATUS_CAP_LIST); The prototypes would look like this: /* Set PCI config value. */ void pci_set_word(PCIDevice *s, uint8_t offset, uint16_t val); /* Set PCI cmask value. */ void pci_set_cmask_word(PCIDevice *s, uint8_t offset, uint16_t val); /* Set PCI wmask value. */ void pci_set_wmask_word(PCIDevice *s, uint8_t offset, uint16_t val); What are the advantages? * strict type checking (the old API takes any uint8_t *) So IMO it's easier to make mistakes with your proposed API because if you confuse offset and value compiler does not complain. More importantly, if you want to pass some other uint8_t pointer to e.g. pci_set_word, you can *and nothing unexpected will happen* it will set the word to the given value. So the current API is more type safe than what you propose. No. The current API takes any uint8_t pointer to read or write a value. This is not safe. Why isn't it? 
It is not safe, because it allows programmers to write silly code like these examples:

    pci_set_word(&gen_opc_cc_op[PCI_STATUS], PCI_STATUS_CAP_LIST);
    pci_set_word(&pci_conf[UINT32_MAX], PCI_STATUS_CAP_LIST);
    for (i = 0; i < sizeof(pci_conf); i++) {
        pci_set_long(&pci_conf[i], 0);
    }

All three will result in runtime failures which can be very difficult to detect. The proposed API only takes a PCIDevice pointer and reads or writes only configuration (or cmask or wmask) values.

Yes, you can mistake the offset for the value. If you are lucky and the value is a uint16_t or uint32_t, your compiler will complain.

Such a compiler will also complain about most of QEMU's code.

Yes, but it's good to see that QEMU's code is improving. By the way, such a compiler is gcc when called with -Wconversion, so it is easy to see how much code is affected.

And even if your compiler does not complain, it is wrong but still safe, because the code will only access the PCI configuration data.

Correct and safe beats wrong and safe every time. See example above.

* many other pci_* functions also have a first parameter of type PCIDevice

So what would make sense IMO is a higher level abstraction, for example similar to what we have with capabilities and msix; I think we could have something like this for e.g. power management. For low-level bit tweaking, the advantage of the current API is that the same thing can be used to set wmask, cmask, the config itself, and whatever else we will come up with.

The low level API can be used where low level is adequate: in pci.c for example. To implement emulated PCI devices, a more robust API would be better. Think of the number of devices which are still missing; think of people who want to write a new PCI device emulation for QEMU without being a QEMU expert.

* calls look nicer (at least in my opinion)

What I value is the fact that it's obvious which data is changed.
Here there is no difference between the current and the proposed API:

    old: pci_set_word(&pci_conf[PCI_STATUS], PCI_STATUS_CAP_LIST);
    new: pci_set_word(pci_conf, PCI_STATUS, PCI_STATUS_CAP_LIST);

Every function call hides what happens. If you really wanted to see which data is changed, you would have to write

    *(uint16_t *)&pci_conf[PCI_STATUS] = cpu_to_le16(PCI_STATUS_CAP_LIST);

That's what we used to have, and it's not all bad, but very verbose and ugly.

Yes. I also prefer a function API.

* strict range checking (offset is limited to 0...255, additional assertions possible - the old API is unsafe because it just takes a pointer)

I don't think we want to add a return status, so there wouldn't be a benefit to range checking as we can't act on it. Anyway, it's very unusual to use anything but a constant as an offset, so range errors are very uncommon.

There is implicit range checking in the proposed API because the offset is a uint8_t, so it cannot exceed the range which is valid for configuration offsets. Oh, btw, this is
[Qemu-devel] Re: [RFC] API change for pci_set_word and related functions
On Mon, Jan 11, 2010 at 09:18:51PM +0100, Stefan Weil wrote:
Michael S. Tsirkin schrieb:
On Mon, Jan 11, 2010 at 08:38:53PM +0100, Stefan Weil wrote:
Michael S. Tsirkin schrieb:
On Thu, Jan 07, 2010 at 04:07:26PM +0100, Stefan Weil wrote:
Michael S. Tsirkin schrieb:
On Thu, Jan 07, 2010 at 12:15:25PM +0100, Stefan Weil wrote:
...

    -    PCI_CONFIG_16(PCI_STATUS,
    -                  PCI_STATUS_REC_MASTER_ABORT | PCI_STATUS_SIG_TARGET_ABORT);
    +    PCI_CONFIG_16(PCI_STATUS, PCI_STATUS_DEVSEL_MEDIUM | PCI_STATUS_FAST_BACK);
         /* PCI Revision ID */
         PCI_CONFIG_8(PCI_REVISION_ID, 0x08);

BTW, if you are not afraid of churn, there's no reason for PCI_CONFIG_8 and friends anymore, because pci.h has the much nicer pci_set_byte etc.

Hello Michael,

I already noticed pci_set_byte, pci_set_word, pci_set_long and the corresponding pci_get_xxx functions and thought about using them. I did not start it because I want to suggest a different API for use in PCI device emulations: instead of

    pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_CAP_LIST);

or

    pci_set_word(&pci_conf[PCI_STATUS], PCI_STATUS_CAP_LIST);

it would be better to call

    pci_set_word(pci_conf, PCI_STATUS, PCI_STATUS_CAP_LIST);

The prototypes would look like this:

    /* Set PCI config value. */
    void pci_set_word(PCIDevice *s, uint8_t offset, uint16_t val);

    /* Set PCI cmask value. */
    void pci_set_cmask_word(PCIDevice *s, uint8_t offset, uint16_t val);

    /* Set PCI wmask value. */
    void pci_set_wmask_word(PCIDevice *s, uint8_t offset, uint16_t val);

What are the advantages?

* strict type checking (the old API takes any uint8_t *)

So IMO it's easier to make mistakes with your proposed API, because if you confuse offset and value the compiler does not complain. More importantly, if you want to pass some other uint8_t pointer to e.g. pci_set_word, you can *and nothing unexpected will happen*: it will set the word to the given value. So the current API is more type safe than what you propose.

No. The current API takes any uint8_t pointer to read or write a value.
This is not safe.

Why isn't it?

It is not safe, because it allows programmers to write silly code like these examples:

    pci_set_word(&gen_opc_cc_op[PCI_STATUS], PCI_STATUS_CAP_LIST);

This might be valid and useful for all I know.

    pci_set_word(&pci_conf[UINT32_MAX], PCI_STATUS_CAP_LIST);
    for (i = 0; i < sizeof(pci_conf); i++) {
        pci_set_long(&pci_conf[i], 0);
    }

All three will result in runtime failures which can be very difficult to detect.

In theory. In practice I expect pci_set_word(pci_dev, 0x0, PCI_STATUS) to be much more common with your proposed API. Just look how common swapping memset parameters is.

The proposed API only takes a PCIDevice pointer and reads or writes only configuration (or cmask or wmask) values.

Yes, you can mistake the offset for the value. If you are lucky and the value is a uint16_t or uint32_t, your compiler will complain.

Such a compiler will also complain about most of QEMU's code.

Yes, but it's good to see that QEMU's code is improving. By the way, such a compiler is gcc when called with -Wconversion, so it is easy to see how much code is affected.

Didn't try it. So it will warn on pci_set_word(pci_dev, 0x0, PCI_STATUS) but not pci_set_word(pci_dev, PCI_STATUS, 0x0)?

And even if your compiler does not complain, it is wrong but still safe, because the code will only access the PCI configuration data.

Correct and safe beats wrong and safe every time. See example above.

The example above tries to show that using pointers in C is bad, and switching to indexes all over is proposed as a solution. Oh well ... but I am surprised you bring up type safety as an argument; it is exactly the reason to use pointers.

* many other pci_* functions also have a first parameter of type PCIDevice

So what would make sense IMO is a higher level abstraction, for example similar to what we have with capabilities and msix; I think we could have something like this for e.g. power management.
For low-level bit tweaking, the advantage of the current API is that the same thing can be used to set wmask, cmask, the config itself, and whatever else we will come up with.

The low level API can be used where low level is adequate: in pci.c for example. To implement emulated PCI devices, a more robust API would be better. Think of the number of devices which are still missing; think of people who want to write a new PCI device emulation for QEMU without being a QEMU expert.

* calls look nicer (at least in my opinion)

What I value is the fact that it's obvious which data is changed. Here there is no difference between the current and the proposed API:

    old: pci_set_word(&pci_conf[PCI_STATUS], PCI_STATUS_CAP_LIST);
    new: pci_set_word(pci_conf, PCI_STATUS, PCI_STATUS_CAP_LIST);

Every function call hides what happens. If you really wanted to see which data is changed, you