[Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Avi Kivity

On 01/11/2010 09:40 AM, Vadim Rozenfeld wrote:

The following patch allows us to improve Windows virtio
block driver performance on small size requests.
Additionally, it leads to reducing of cpu usage on write IOs

   


Note, this is not an improvement for Windows specifically.


diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index a2f0639..0e3a8d5 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -28,6 +28,7 @@ typedef struct VirtIOBlock
  char serial_str[BLOCK_SERIAL_STRLEN + 1];
  QEMUBH *bh;
  size_t config_size;
+unsigned int pending;
  } VirtIOBlock;

  static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev)
@@ -87,6 +88,8 @@ typedef struct VirtIOBlockReq
  struct VirtIOBlockReq *next;
  } VirtIOBlockReq;

+static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue
*vq);
+
  static void virtio_blk_req_complete(VirtIOBlockReq *req, int status)
  {
  VirtIOBlock *s = req-dev;
@@ -95,6 +98,11 @@ static void virtio_blk_req_complete(VirtIOBlockReq
*req, int status)
  virtqueue_push(s-vq,req-elem, req-qiov.size +
sizeof(*req-in));
  virtio_notify(s-vdev, s-vq);

+if(--s-pending == 0) {
+virtio_queue_set_notification(s-vq, 1);
+virtio_blk_handle_output(s-vdev, s-vq);
+}
+
   


Coding style: space after if.  See the CODING_STYLE file.


@@ -340,6 +348,9 @@ static void virtio_blk_handle_output(VirtIODevice
*vdev, VirtQueue *vq)
  exit(1);
  }

+if(++s-pending == 1)
+virtio_queue_set_notification(s-vq, 0);
+
  req-out = (void *)req-elem.out_sg[0].iov_base;
  req-in = (void *)req-elem.in_sg[req-elem.in_num -
1].iov_base;

   


Coding style: space after if, braces after if.

Your patch is word wrapped, please send it correctly.  Easiest using git 
send-email.


The patch has potential to reduce performance on volumes with multiple 
spindles.  Consider two processes issuing sequential reads into a RAID 
array.  With this patch, the reads will be executed sequentially rather 
than in parallel, so I think a follow-on patch to make the minimum depth 
a parameter (set by the guest? the host?) would be helpful.


--
error compiling committee.c: too many arguments to function





Re: [Qemu-devel] [PATCH 5/8] virtio-serial-bus: Add support for buffering guest output, throttling guests

2010-01-11 Thread Amit Shah
On (Fri) Jan 08 2010 [13:35:03], Jamie Lokier wrote:
 
 Since VNC is clearly designed to work over TCP, and is written by
 people who know this, I'm wondering why you think it needs to be
 different for virtio-serial.

For vnc putting stuff from a guest clipboard into vnc client clipboard
using the ServerCutText command, the entire buffer has to be provided
after sending the command and the 'length' values.

In this case, if the data from guest arrives in multiple packets, we
really don't want to call into the write function multiple times. A
single clipboard entry has to be created in the client with the entire
contents, so a single write operation has to be invoked.

For this to happen, there has to be some indication from the guest as to
how much data was written in one write() operation, which will let us
make a single write operation to the vnc client.

Amit




Re: [Qemu-devel] [PATCH 5/8] virtio-serial-bus: Add support for buffering guest output, throttling guests

2010-01-11 Thread Amit Shah
On (Fri) Jan 08 2010 [10:26:59], Anthony Liguori wrote:
 On 01/08/2010 07:35 AM, Jamie Lokier wrote:
 Sometimes it looks like TCP is maintaining write boundaries, but it is
 just an artifact of its behaviour on many systems, and is not reliable
 even on those systems where it seems to happen most of the time.  Even
 when connecting to localhost, you cannot rely on that.  I have seen
 people write code assuming TCP keeps boundaries, and then some weeks
 later they are very confused debugging their code because it is not
 reliable...

 Since VNC is clearly designed to work over TCP, and is written by
 people who know this, I'm wondering why you think it needs to be
 different for virtio-serial.


 I'm confused about why the buffering is needed in the first place.

 I would think that any buffering should be pushed back to the guest.   
 IOW, if there's available data from the char driver, but the guest  
 doesn't have a buffer.  Don't select on the char driver until the guest  
 has a buffer available.  If the guest attempts to write data but the  
 char driver isn't ready to receive data, don't complete the request  
 until the char driver can accept data.

This is a different thing from what Jamie's talking about. A guest or a
host might be interested in communicating data without waiting for the
other end to come up. The other end can just start consuming the data
(even the data that it missed while it wasn't connected) once it's up.

(I can remove this option for now and add it later, if you prefer it
that way.)

Amit




Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Dor Laor

On 01/11/2010 11:03 AM, Dor Laor wrote:

On 01/11/2010 10:30 AM, Avi Kivity wrote:

On 01/11/2010 09:40 AM, Vadim Rozenfeld wrote:

The following patch allows us to improve Windows virtio
block driver performance on small size requests.
Additionally, it leads to reducing of cpu usage on write IOs



Note, this is not an improvement for Windows specifically.


diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index a2f0639..0e3a8d5 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -28,6 +28,7 @@ typedef struct VirtIOBlock
char serial_str[BLOCK_SERIAL_STRLEN + 1];
QEMUBH *bh;
size_t config_size;
+ unsigned int pending;
} VirtIOBlock;

static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev)
@@ -87,6 +88,8 @@ typedef struct VirtIOBlockReq
struct VirtIOBlockReq *next;
} VirtIOBlockReq;

+static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue
*vq);
+
static void virtio_blk_req_complete(VirtIOBlockReq *req, int status)
{
VirtIOBlock *s = req-dev;
@@ -95,6 +98,11 @@ static void virtio_blk_req_complete(VirtIOBlockReq
*req, int status)
virtqueue_push(s-vq,req-elem, req-qiov.size +
sizeof(*req-in));
virtio_notify(s-vdev, s-vq);

+ if(--s-pending == 0) {
+ virtio_queue_set_notification(s-vq, 1);
+ virtio_blk_handle_output(s-vdev, s-vq);


The above line should be moved out of the 'if'.

Attached results with rhel5.4 (qemu0.11) for win2k8 32bit guest. Note
the drastic reduction in cpu consumption.


Attachment did not survive the email server, so you'll have to trust me 
saying that cpu consumption was done from 65% - 40% for reads and from 
80% -- 30% for writes





+ }
+


Coding style: space after if. See the CODING_STYLE file.


@@ -340,6 +348,9 @@ static void virtio_blk_handle_output(VirtIODevice
*vdev, VirtQueue *vq)
exit(1);
}

+ if(++s-pending == 1)
+ virtio_queue_set_notification(s-vq, 0);
+
req-out = (void *)req-elem.out_sg[0].iov_base;
req-in = (void *)req-elem.in_sg[req-elem.in_num -
1].iov_base;



Coding style: space after if, braces after if.

Your patch is word wrapped, please send it correctly. Easiest using git
send-email.

The patch has potential to reduce performance on volumes with multiple
spindles. Consider two processes issuing sequential reads into a RAID
array. With this patch, the reads will be executed sequentially rather
than in parallel, so I think a follow-on patch to make the minimum depth
a parameter (set by the guest? the host?) would be helpful.









Re: [Qemu-devel] [PATCH 5/8] virtio-serial-bus: Add support for buffering guest output, throttling guests

2010-01-11 Thread Jamie Lokier
Amit Shah wrote:
 On (Fri) Jan 08 2010 [13:35:03], Jamie Lokier wrote:
  Since VNC is clearly designed to work over TCP, and is written by
  people who know this, I'm wondering why you think it needs to be
  different for virtio-serial.
 
 For vnc putting stuff from a guest clipboard into vnc client clipboard
 using the ServerCutText command, the entire buffer has to be provided
 after sending the command and the 'length' values.

Are you talking about a VNC protocol command between qemu's VNC server
and the user's VNC client, or a private protocol between the guest and
qemu's VNC server?

 In this case, if the data from guest arrives in multiple packets, we
 really don't want to call into the write function multiple times. A
 single clipboard entry has to be created in the client with the entire
 contents, so a single write operation has to be invoked.

Same question again: *Why do you think the VNC server (in qemu) needs to
see the entire clipboard in a aingle write from the guest?*

You have already told it the total length to expect.  There is no
ambiguity about where it ends.

There is no need to do any more, if the reciever (in qemu) is
implemented correctly with a sane protocol.  That's assuming the guest
sends to qemu's VNC server which then sends it to the user's VNC client.

 For this to happen, there has to be some indication from the guest as to
 how much data was written in one write() operation, which will let us
 make a single write operation to the vnc client.

When it is sent to the user's VNC client, it will be split into
multiple packets by TCP. You *can't* send a single large write over
TCP without getting it split at arbitrary places. It's *impossible*. TCP
doesn't support that. It will split and merge your writes arbitrarily.

So the only interesting part is how it's transmitted from the guest to
qemu's VNC server first. Do you get to design that protocol yourself?

-- Jamie




Re: [Qemu-devel] [PATCH 5/8] virtio-serial-bus: Add support for buffering guest output, throttling guests

2010-01-11 Thread Amit Shah
On (Mon) Jan 11 2010 [10:45:53], Jamie Lokier wrote:
 Amit Shah wrote:
  On (Fri) Jan 08 2010 [13:35:03], Jamie Lokier wrote:
   Since VNC is clearly designed to work over TCP, and is written by
   people who know this, I'm wondering why you think it needs to be
   different for virtio-serial.
  
  For vnc putting stuff from a guest clipboard into vnc client clipboard
  using the ServerCutText command, the entire buffer has to be provided
  after sending the command and the 'length' values.
 
 Are you talking about a VNC protocol command between qemu's VNC server
 and the user's VNC client, or a private protocol between the guest and
 qemu's VNC server?

What happens is:

1. Guest puts something on its clipboard
2. An agent on the guest gets notified of new clipboard contents
3. This agent sends over the entire clipboard contents to qemu via
   virtio-serial
4. virtio-serial sends off this data to the virtio-serial-vnc code
5. ServerCutText message from the vnc backend is sent to the vnc client
6. vnc client's clipboard gets updated
7. You can see guest's clipboard contents in your client's clipboard.

I'm talking about steps 3, 4, 5 here.

  In this case, if the data from guest arrives in multiple packets, we
  really don't want to call into the write function multiple times. A
  single clipboard entry has to be created in the client with the entire
  contents, so a single write operation has to be invoked.
 
 Same question again: *Why do you think the VNC server (in qemu) needs to
 see the entire clipboard in a aingle write from the guest?*
 
 You have already told it the total length to expect.  There is no
 ambiguity about where it ends.

Where does the total length come from? It has to come from the guest.
Otherwise, the vnc code will not know if a byte stream contains two
separate clipboard entries or just one huge clipboard entry.

Earlier, I used to send the length of one write as issued by a guest to
qemu. I just changed that to send a START and END flag so that I don't
have to send the length.

If this doesn't explain it, then I think we're not understanding each
other here.

Amit




[Qemu-devel] Hang up: qemu-system-ppc creates a ppc VM on a x86_64 host with OS Fedora12.

2010-01-11 Thread Zhiyong Wu
HI, guys,

when qemu-system-ppc creates a f12 ppc VM on a f12 x86_64 host,

qemu hangs on the command line of openbios such as:

Welcome to OpenBIOS v1.0 built on  Dec 7 2009 17:11

0 


While qemu creates a RH5.4 ppc VM, the output is such as:


cd:0,\ppc\chrp\yaboot.conf: Unknown or corrupt filesystem


Can anyone also encounter the same issue or know how to solve it?

It will be apprieciated for any help or comment from you.


Cheers,

Zhiyong Wu




[Qemu-devel] Re: Hang up: qemu-system-ppc creates a ppc VM on a x86_64 host with OS Fedora12.

2010-01-11 Thread Alexander Graf

Am 11.01.2010 um 12:22 schrieb Zhiyong Wu zwu.ker...@gmail.com:


HI, guys,

when qemu-system-ppc creates a f12 ppc VM on a f12 x86_64 host,

qemu hangs on the command line of openbios such as:

Welcome to OpenBIOS v1.0 built on  Dec 7 2009 17:11

0 


How do you start qemu? To boot from CD you need to pass -boot d.

Alex





[Qemu-devel] [PATCH] dmg: fix -open failure

2010-01-11 Thread Christoph Hellwig

Currently the dmg image format driver simply opens the images as raw
if any kind of failure happens.  This is contrarty to the behaviour
of all other image formats which just return an error and let the
block core deal with it.

Signed-off-by: Christoph Hellwig h...@lst.de

Index: qemu/block/dmg.c
===
--- qemu.orig/block/dmg.c   2010-01-11 14:00:25.945021645 +0100
+++ qemu/block/dmg.c2010-01-11 14:03:03.006036707 +0100
@@ -90,24 +90,21 @@ static int dmg_open(BlockDriverState *bs
 
 /* read offset of info blocks */
 if(lseek(s-fd,-0x1d8,SEEK_END)0) {
-dmg_close:
-   close(s-fd);
-   /* open raw instead */
-   bs-drv=bdrv_find_format(raw);
-   return bs-drv-bdrv_open(bs, filename, flags);
+goto fail;
 }
+
 info_begin=read_off(s-fd);
 if(info_begin==0)
-   goto dmg_close;
+   goto fail;
 if(lseek(s-fd,info_begin,SEEK_SET)0)
-   goto dmg_close;
+   goto fail;
 if(read_uint32(s-fd)!=0x100)
-   goto dmg_close;
+   goto fail;
 if((count = read_uint32(s-fd))==0)
-   goto dmg_close;
+   goto fail;
 info_end = info_begin+count;
 if(lseek(s-fd,0xf8,SEEK_CUR)0)
-   goto dmg_close;
+   goto fail;
 
 /* read offsets */
 last_in_offset = last_out_offset = 0;
@@ -116,14 +113,14 @@ dmg_close:
 
count = read_uint32(s-fd);
if(count==0)
-   goto dmg_close;
+   goto fail;
type = read_uint32(s-fd);
if(type!=0x6d697368 || count244)
lseek(s-fd,count-4,SEEK_CUR);
else {
int new_size, chunk_count;
if(lseek(s-fd,200,SEEK_CUR)0)
-   goto dmg_close;
+   goto fail;
chunk_count = (count-204)/40;
new_size = sizeof(uint64_t) * (s-n_chunks + chunk_count);
s-types = qemu_realloc(s-types, new_size/2);
@@ -142,7 +139,7 @@ dmg_close:
chunk_count--;
i--;
if(lseek(s-fd,36,SEEK_CUR)0)
-   goto dmg_close;
+   goto fail;
continue;
}
read_uint32(s-fd);
@@ -163,11 +160,14 @@ dmg_close:
 s-compressed_chunk = qemu_malloc(max_compressed_size+1);
 s-uncompressed_chunk = qemu_malloc(512*max_sectors_per_chunk);
 if(inflateInit(s-zstream) != Z_OK)
-   goto dmg_close;
+   goto fail;
 
 s-current_chunk = s-n_chunks;
 
 return 0;
+fail:
+close(s-fd);
+return -1;
 }
 
 static inline int is_sector_in_chunk(BDRVDMGState* s,




Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Christoph Hellwig
On Mon, Jan 11, 2010 at 11:19:21AM +0200, Dor Laor wrote:
 Attached results with rhel5.4 (qemu0.11) for win2k8 32bit guest. Note
 the drastic reduction in cpu consumption.
 
 Attachment did not survive the email server, so you'll have to trust me 
 saying that cpu consumption was done from 65% - 40% for reads and from 
 80% -- 30% for writes

For what kind of workload, and using what qemu parameters, and cpu
consumtion in the host or guest?  Either way this is an awfull lot, did
you use kernel AIO on the host?

Any chance you could publish the benchmark, guest and host configs
so we have meaningfull numbers?

FYI below is the manually applied patch without all the wrapping:


Index: qemu/hw/virtio-blk.c
===
--- qemu.orig/hw/virtio-blk.c   2010-01-11 14:05:09.619254004 +0100
+++ qemu/hw/virtio-blk.c2010-01-11 14:06:54.385013004 +0100
@@ -28,6 +28,7 @@ typedef struct VirtIOBlock
 char serial_str[BLOCK_SERIAL_STRLEN + 1];
 QEMUBH *bh;
 size_t config_size;
+unsigned int pending;
 } VirtIOBlock;
 
 static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev)
@@ -87,6 +88,8 @@ typedef struct VirtIOBlockReq
 struct VirtIOBlockReq *next;
 } VirtIOBlockReq;
 
+static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq);
+
 static void virtio_blk_req_complete(VirtIOBlockReq *req, int status)
 {
 VirtIOBlock *s = req-dev;
@@ -95,6 +98,12 @@ static void virtio_blk_req_complete(Virt
 virtqueue_push(s-vq, req-elem, req-qiov.size + sizeof(*req-in));
 virtio_notify(s-vdev, s-vq);
 
+if (--s-pending == 0) {
+virtio_queue_set_notification(s-vq, 1);
+virtio_blk_handle_output(s-vdev, s-vq);
+}
+
+
 qemu_free(req);
 }
 
@@ -340,6 +349,9 @@ static void virtio_blk_handle_output(Vir
 exit(1);
 }
 
+if (++s-pending == 1)
+virtio_queue_set_notification(s-vq, 0);
+
 req-out = (void *)req-elem.out_sg[0].iov_base;
 req-in = (void *)req-elem.in_sg[req-elem.in_num - 1].iov_base;
 




Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Avi Kivity

On 01/11/2010 03:11 PM, Christoph Hellwig wrote:

FYI below is the manually applied patch without all the wrapping:
  static void virtio_blk_req_complete(VirtIOBlockReq *req, int status)
  {
  VirtIOBlock *s = req-dev;
@@ -95,6 +98,12 @@ static void virtio_blk_req_complete(Virt
  virtqueue_push(s-vq,req-elem, req-qiov.size + sizeof(*req-in));
  virtio_notify(s-vdev, s-vq);

+if (--s-pending == 0) {
+virtio_queue_set_notification(s-vq, 1);
+virtio_blk_handle_output(s-vdev, s-vq);
+}
+
   


As Dor points out, the call to virtio_blk_handle_output() wants to be 
before the test for pending, so we scan the ring as early as possible


--
error compiling committee.c: too many arguments to function





Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Christoph Hellwig
On Mon, Jan 11, 2010 at 03:13:53PM +0200, Avi Kivity wrote:
 As Dor points out, the call to virtio_blk_handle_output() wants to be 
 before the test for pending, so we scan the ring as early as possible

I just reposted the patch in a way that it applies to share the work I
did when starting to review it.





Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-11 Thread Markus Armbruster
Avi Kivity a...@redhat.com writes:

 On 01/07/2010 02:33 PM, Anthony Liguori wrote:

 There's another option.

 Make cpuid information part of live migration protocol, and then
 support something like -cpu Xeon-3550.  We would remember the exact
 cpuid mask we present to the guest and then we could validate that
 we can obtain the same mask on the destination.

 Currently, our policy is to only migrate dynamic (from the guest's
 point of view) state, and specify static state on the command line
 [1].

 I think your suggestion makes a lot of sense, but I'd like to expand
 it to move all guest state, whether dynamic or static.  So '-m 1G'
 would be migrated as well (but not -mem-path).  Similarly, in -drive
 file=...,if=ide,index=1, everything but file=... would be migrated.

Becomes a bit clearer with the new way to configure stuff:

  -drive if=none,id=DRIVE-ID,...--- host, don't migrate
  -device ide=drive,drive=DRIVE-ID,...  --- guest, do migrate

[...]




Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Christoph Hellwig
On Mon, Jan 11, 2010 at 10:30:53AM +0200, Avi Kivity wrote:
 The patch has potential to reduce performance on volumes with multiple 
 spindles.  Consider two processes issuing sequential reads into a RAID 
 array.  With this patch, the reads will be executed sequentially rather 
 than in parallel, so I think a follow-on patch to make the minimum depth 
 a parameter (set by the guest? the host?) would be helpful.

Let's think about the life cycle of I/O requests a bit.

We have an idle virtqueue (aka one virtio-blk device).  The first (read)
request comes in, we get the virtio notify from the guest, which calls
into virtio_blk_handle_output.  With the new code we now disable the
notify once we start processing the first request.  If the second
request hits the queue before we call into virtio_blk_get_request
the second time we're fine even with the new code as we keep picking it
up.  If however it hits after we leave virtio_blk_handle_output, but
before we complete the first request we do indeed introduce additional
latency.

So instead of disabling notify while requests are active we might want
to only disable it while we are inside virtio_blk_handle_output.
Something like the following minimally tested patch:


Index: qemu/hw/virtio-blk.c
===
--- qemu.orig/hw/virtio-blk.c   2010-01-11 14:28:42.896010503 +0100
+++ qemu/hw/virtio-blk.c2010-01-11 14:40:13.535256353 +0100
@@ -328,7 +328,15 @@ static void virtio_blk_handle_output(Vir
 int num_writes = 0;
 BlockDriverState *old_bs = NULL;
 
+/*
+ * While we are processing requests there is no need to get further
+ * notifications from the guest - it'll just burn cpu cycles doing
+ * useless context switches into the host.
+ */
+virtio_queue_set_notification(s-vq, 0);
+
 while ((req = virtio_blk_get_request(s))) {
+handle_request:
 if (req-elem.out_num  1 || req-elem.in_num  1) {
 fprintf(stderr, virtio-blk missing headers\n);
 exit(1);
@@ -358,6 +366,18 @@ static void virtio_blk_handle_output(Vir
 }
 }
 
+/*
+ * Once we're done processing all pending requests re-enable the queue
+ * notification.  If there's an entry pending after we enabled
+ * notification again we hit a race and need to process it before
+ * returning.
+ */
+virtio_queue_set_notification(s-vq, 1);
+req = virtio_blk_get_request(s);
+if (req) {
+goto handle_request;
+}
+
 if (num_writes  0) {
 do_multiwrite(old_bs, blkreq, num_writes);
 }





Re: [Qemu-devel] [PATCH] dmg: fix -open failure

2010-01-11 Thread Kevin Wolf
Am 11.01.2010 14:06, schrieb Christoph Hellwig:
 
 Currently the dmg image format driver simply opens the images as raw
 if any kind of failure happens.  This is contrarty to the behaviour
 of all other image formats which just return an error and let the
 block core deal with it.
 
 Signed-off-by: Christoph Hellwig h...@lst.de

Acked-by: Kevin Wolf kw...@redhat.com

I mean looking at the patched code I see lots of things that are wrong,
but they are all unrelated to your change: There are error cases where
memory is leaked, and it should use bdrv_* functions instead of the
native open/read/etc. And obviously coding style is completely off (most
annoying: tabs!)

Kevin




Re: [Qemu-devel] [PATCH] dmg: fix -open failure

2010-01-11 Thread Christoph Hellwig
On Mon, Jan 11, 2010 at 02:43:59PM +0100, Kevin Wolf wrote:
 Am 11.01.2010 14:06, schrieb Christoph Hellwig:
  
  Currently the dmg image format driver simply opens the images as raw
  if any kind of failure happens.  This is contrarty to the behaviour
  of all other image formats which just return an error and let the
  block core deal with it.
  
  Signed-off-by: Christoph Hellwig h...@lst.de
 
 Acked-by: Kevin Wolf kw...@redhat.com
 
 I mean looking at the patched code I see lots of things that are wrong,
 but they are all unrelated to your change: There are error cases where
 memory is leaked, and it should use bdrv_* functions instead of the
 native open/read/etc. And obviously coding style is completely off (most
 annoying: tabs!)

Yes, the code pretty much is a mess, but I didn't really want to touch
it.  I just looked into picking up your search host_ for raw patches and
was looking for all the block image driver functionality in the tree.

 
 Kevin
---end quoted text---




Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Christoph Hellwig
On Mon, Jan 11, 2010 at 03:13:53PM +0200, Avi Kivity wrote:
 As Dor points out, the call to virtio_blk_handle_output() wants to be 
 before the test for pending, so we scan the ring as early as possible

It could cause a race window where we add an entry to the ring after
we run virtio_blk_handle_output, but before re-enabling the
notification.  But I think my variant of the patch that I just posted
should deal with this in an even better way.





Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Anthony Liguori

On 01/11/2010 07:42 AM, Christoph Hellwig wrote:

On Mon, Jan 11, 2010 at 10:30:53AM +0200, Avi Kivity wrote:
   

The patch has potential to reduce performance on volumes with multiple
spindles.  Consider two processes issuing sequential reads into a RAID
array.  With this patch, the reads will be executed sequentially rather
than in parallel, so I think a follow-on patch to make the minimum depth
a parameter (set by the guest? the host?) would be helpful.
 

Let's think about the life cycle of I/O requests a bit.

We have an idle virtqueue (aka one virtio-blk device).  The first (read)
request comes in, we get the virtio notify from the guest, which calls
into virtio_blk_handle_output.  With the new code we now disable the
notify once we start processing the first request.  If the second
request hits the queue before we call into virtio_blk_get_request
the second time we're fine even with the new code as we keep picking it
up.  If however it hits after we leave virtio_blk_handle_output, but
before we complete the first request we do indeed introduce additional
latency.

So instead of disabling notify while requests are active we might want
to only disable it while we are inside virtio_blk_handle_output.
Something like the following minimally tested patch:
   


I'd suggest that we get even more aggressive and install an idle bottom 
half that checks the queue for newly submitted requests.  If we keep 
getting requests submitted before a new one completes, we'll never take 
an I/O exit.


The same approach is probably a good idea for virtio-net.

Regards,

Anthony Liguori




[Qemu-devel] Re: [ kvm-Bugs-2907597 ] qemu vnc server clips at 2560x1600

2010-01-11 Thread Anthony Liguori

On 01/10/2010 10:30 AM, Avi Kivity wrote:

On 01/10/2010 06:26 PM, SourceForge.net wrote:

Initial Comment:
So I am running using the VESA driver to run an Ubuntu 9.10 guest at 
2560x1600 (I had to modify the xserver-video-vesa package to remove 
an internal screen limit of 2048x2048 in the xorg vesa driver) and 
everything works great except that the qemu vnc server appears to 
clip at this resolution. The problem goes away if I run 1900x1200 and 
it doesn't change if I run 16bit depth or 24bit depth.


I have attached two screenshots, the first is vncing directly into 
qemu (which exhibits the problem) and the second is vncing to a vnc 
server I have running in the guest which doesn't have the problem.


I poked around in vnc.c and couldn't see any limits but I feel like 
its a buffer limit of some kind.


Also if you look very closely at the first image you can see that the 
first row is drawn correctly all the way across but subsequent rows 
are not.


If you need more information doesn't hesitate to ask.



Anthony, can you take a look at this?  Seems like a serious issue, 
could find nothing obvious in vnc.c.


VNC_MAX_WIDTH and VNC_MAX_HEIGHT in vnc.h are currently defined to 
2048.  We do dirty tracking with a bitmap and that bitmap is currently a 
fixed size.


2048 is bigger than any physical screen that I know of so I assume this 
is a multiple monitor scenario.  Long term, I think exposing multiple 
monitors to the guest is a better approach for this kind of functionality.


Since these resolutions for a single screen don't really exist, this is 
largely an untested path within the guest.


Regards,

Anthony Liguori




[Qemu-devel] Re: [RFC 0/7]: Add VNC connect/disconnect events

2010-01-11 Thread Daniel P. Berrange
On Fri, Jan 08, 2010 at 07:47:09PM -0200, Luiz Capitulino wrote:
  Hi there,
 
  This series contains two related changes. First a small cleanup is done in
 the current 'query-vnc' command, then two new QMP asynchronous events are
 introduced: VNC connect and disconnect.
 
  That's, everytime a VNC client connects or disconnects from QEMU, the
 QMP client will get full VNC client and VNC server info.
 
  There's one problem though and that's why this series is a RFC.
 
  The connection is a two step procedure if an authentication mechism is
 enabled. First the client establishes the connection then it authenticates.
 
  Currently, 'info vnc' and 'query-vnc' will show client information as soon
 as it establishes the connection even if the client didn't autheticate yet.
 
  This series changes that. Now, if an authentication mechanism is enabled,
 client information will only be available _after_ it has authenticated. Also,
 the connect/disconnect events are only emitted after the authentication step.
 
  There's a way to fix this and add the old behavior back, but we'll need
 one additional event (say CONNECT_AUTH) and the client will have to look
 at the server info to learn that a disconnection happened before
 authentication.

Hmm, I had not considered the timing problems when I discussed this with
you previously. I think it is important to separate the connect event
from the authentication event, because a mgmt application using QEMU
may want to see clients connecting even if they abort/delay before 
authenticating,

So perhaps we should declare that the lifecycle is

  -  CONNECT   (provide IP / port details)
  -  AUTHENTICATED (provide IP / port details + authenticated ID details
eg x509 dname, or SASL usernsmae)
  -  DISCONNECT(provide IP / port details)


Obviously AUTHENTICATED may be optional if the client goes away
immedaitely before trying auth. The AUTHENTICATED event probably
also ought to allow for an indication of success vs failure so 
the app can see failed login attempts

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|




Re: [Qemu-devel] [PATCH] dmg: fix -open failure

2010-01-11 Thread Kevin Wolf
Am 11.01.2010 14:46, schrieb Christoph Hellwig:
 On Mon, Jan 11, 2010 at 02:43:59PM +0100, Kevin Wolf wrote:
 Am 11.01.2010 14:06, schrieb Christoph Hellwig:

 Currently the dmg image format driver simply opens the images as raw
 if any kind of failure happens.  This is contrarty to the behaviour
 of all other image formats which just return an error and let the
 block core deal with it.

 Signed-off-by: Christoph Hellwig h...@lst.de

 Acked-by: Kevin Wolf kw...@redhat.com

 I mean looking at the patched code I see lots of things that are wrong,
 but they are all unrelated to your change: There are error cases where
 memory is leaked, and it should use bdrv_* functions instead of the
 native open/read/etc. And obviously coding style is completely off (most
 annoying: tabs!)
 
 Yes, the code pretty much is a mess, but I didn't really want to touch
 it.  I just looked into picking up your search host_ for raw patches and
 was looking for all the block image driver functionality in the tree.

Are you going to propose a cleaner patch? I have currently some other
bugs to do first, but I was certainly planning to do so. However, I'll
happily leave it to you if you have the time right now.

For dmg, I'm not even sure if it's worth fixing. Does anybody use this
driver? But I guess it's good enough for another lowest priority task on
my todo list. Right after vvfat or so...

Kevin




Re: [Qemu-devel] [PATCH] dmg: fix -open failure

2010-01-11 Thread Christoph Hellwig
On Mon, Jan 11, 2010 at 02:56:16PM +0100, Kevin Wolf wrote:
 Are you going to propose a cleaner patch? I have currently some other
 bugs to do first, but I was certainly planning to do so. However, I'll
 happily leave it to you if you have the time right now.

I'm looking into doing it in the generic block layer, yes.

 For dmg, I'm not even sure if it's worth fixing. Does anybody use this
 driver? But I guess it's good enough for another lowest priority task on
 my todo list. Right after vvfat or so...

Hehe.  All these odd image formats are extremly low in my todo list
either.  I'd be really interested if there are any users around at all.





Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Anthony Liguori

On 01/11/2010 07:47 AM, Christoph Hellwig wrote:

On Mon, Jan 11, 2010 at 03:13:53PM +0200, Avi Kivity wrote:
   

As Dor points out, the call to virtio_blk_handle_output() wants to be
before the test for pending, so we scan the ring as early as possible
 

It could cause a race window where we add an entry to the ring after
we run virtio_blk_handle_output, but before re-enabling the
notification.  But I think my variant of the patch that I just posted
should deal with this in an even better way.
   


When we've disabled notifications, we should use any opportunity we have 
in userspace to check the rings to see if anything was added.


Bottom halves are run at the very end of the event loop which means that 
they're guaranteed to be the last thing run.  idle bottom halves can be 
rescheduled without causing an infinite loop and do not affect the 
select timeout (which normal bottom halves do).


Regards,

Anthony Liguori



   






Re: [Qemu-devel] [PATCH] dmg: fix -open failure

2010-01-11 Thread Kevin Wolf
Am 11.01.2010 15:00, schrieb Christoph Hellwig:
 On Mon, Jan 11, 2010 at 02:56:16PM +0100, Kevin Wolf wrote:
 Are you going to propose a cleaner patch? I have currently some other
 bugs to do first, but I was certainly planning to do so. However, I'll
 happily leave it to you if you have the time right now.
 
 I'm looking into doing it in the generic block layer, yes.

More or less the same hack, just in cleaner? Or trying to fundamentally
change things? I think you haven't answered yet to what I said in the
thread of my original hack. I'm quoting it here for convenience:

 Ok, if you start talking about layering, we can have a fundamental
 discussion on this topic and why the layering is broken anyway.
 Logically, we have image formats like qcow2, VMDK and raw, and they are
 stored in files, on CD-ROMs or general block devices. From a layering
 perspective, it is wrong to include the latter in the raw format driver
 in the first place.

Actually, I think the differentiation between raw files and host_* is at
the same level as protocols are. Probably they should be implemented
very similarly.

Do you think it's possible/worth the effort to try putting things
straight here?

Kevin




Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Avi Kivity

On 01/11/2010 03:42 PM, Christoph Hellwig wrote:

On Mon, Jan 11, 2010 at 10:30:53AM +0200, Avi Kivity wrote:
   

The patch has potential to reduce performance on volumes with multiple
spindles.  Consider two processes issuing sequential reads into a RAID
array.  With this patch, the reads will be executed sequentially rather
than in parallel, so I think a follow-on patch to make the minimum depth
a parameter (set by the guest? the host?) would be helpful.
 

Let's think about the life cycle of I/O requests a bit.

We have an idle virtqueue (aka one virtio-blk device).  The first (read)
request comes in, we get the virtio notify from the guest, which calls
into virtio_blk_handle_output.  With the new code we now disable the
notify once we start processing the first request.  If the second
request hits the queue before we call into virtio_blk_get_request
the second time we're fine even with the new code as we keep picking it
up.  If however it hits after we leave virtio_blk_handle_output, but
before we complete the first request we do indeed introduce additional
latency.

So instead of disabling notify while requests are active we might want
to only disable it while we are inside virtio_blk_handle_output.
Something like the following minimally tested patch:


Index: qemu/hw/virtio-blk.c
===
--- qemu.orig/hw/virtio-blk.c   2010-01-11 14:28:42.896010503 +0100
+++ qemu/hw/virtio-blk.c2010-01-11 14:40:13.535256353 +0100
@@ -328,7 +328,15 @@ static void virtio_blk_handle_output(Vir
  int num_writes = 0;
  BlockDriverState *old_bs = NULL;

+/*
+ * While we are processing requests there is no need to get further
+ * notifications from the guest - it'll just burn cpu cycles doing
+ * useless context switches into the host.
+ */
+virtio_queue_set_notification(s-vq, 0);
+
  while ((req = virtio_blk_get_request(s))) {
+handle_request:
  if (req-elem.out_num  1 || req-elem.in_num  1) {
  fprintf(stderr, virtio-blk missing headers\n);
  exit(1);
@@ -358,6 +366,18 @@ static void virtio_blk_handle_output(Vir
  }
  }

+/*
+ * Once we're done processing all pending requests re-enable the queue
+ * notification.  If there's an entry pending after we enabled
+ * notification again we hit a race and need to process it before
+ * returning.
+ */
+virtio_queue_set_notification(s-vq, 1);
+req = virtio_blk_get_request(s);
+if (req) {
+goto handle_request;
+}
+
   


I don't think this will have much effect.  First, the time spent in 
virtio_blk_handle_output() is a small fraction of total guest time, so 
the probability of the guest hitting the notifications closed window is 
low.  Second, while we're in that function, the vcpu that kicked us is 
stalled, and other vcpus are likely to be locked out of the queue by the 
guest.


--
error compiling committee.c: too many arguments to function





Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Avi Kivity

On 01/11/2010 03:49 PM, Anthony Liguori wrote:

So instead of disabling notify while requests are active we might want
to only disable it while we are inside virtio_blk_handle_output.
Something like the following minimally tested patch:



I'd suggest that we get even more aggressive and install an idle 
bottom half that checks the queue for newly submitted requests.  If we 
keep getting requests submitted before a new one completes, we'll 
never take an I/O exit.




That has the downside of bouncing a cache line on unrelated exits.  It 
probably doesn't matter with qemu as it is now, since it will bounce 
qemu_mutex, but it will hurt with large guests (especially if they have 
many rings).


IMO we should get things to work well without riding on unrelated exits, 
especially as we're trying to reduce those exits.



The same approach is probably a good idea for virtio-net.


With vhost-net you don't see exits.

--
error compiling committee.c: too many arguments to function





[Qemu-devel] Re: [ kvm-Bugs-2907597 ] qemu vnc server clips at 2560x1600

2010-01-11 Thread Avi Kivity

On 01/11/2010 03:53 PM, Anthony Liguori wrote:
Anthony, can you take a look at this?  Seems like a serious issue, 
could find nothing obvious in vnc.c.



VNC_MAX_WIDTH and VNC_MAX_HEIGHT in vnc.h are currently defined to 
2048.  We do dirty tracking with a bitmap and that bitmap is currently 
a fixed size.


2048 is bigger than any physical screen that I know of so I assume 
this is a multiple monitor scenario.  Long term, I think exposing 
multiple monitors to the guest is a better approach for this kind of 
functionality.


Since these resolutions for a single screen don't really exist, this 
is largely an untested path within the guest.


Google suggests such screens do exist, and -vga std (at least in 
qemu-kvm) supports them.


I've asked the reporter to play with VNC_MAX_WIDTH.

--
error compiling committee.c: too many arguments to function





Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Anthony Liguori

On 01/11/2010 08:29 AM, Avi Kivity wrote:

On 01/11/2010 03:49 PM, Anthony Liguori wrote:

So instead of disabling notify while requests are active we might want
to only disable it while we are inside virtio_blk_handle_output.
Something like the following minimally tested patch:



I'd suggest that we get even more aggressive and install an idle 
bottom half that checks the queue for newly submitted requests.  If 
we keep getting requests submitted before a new one completes, we'll 
never take an I/O exit.




That has the downside of bouncing a cache line on unrelated exits.


The read and write sides of the ring are widely separated in physical 
memory specifically to avoid cache line bouncing.


  It probably doesn't matter with qemu as it is now, since it will 
bounce qemu_mutex, but it will hurt with large guests (especially if 
they have many rings).


IMO we should get things to work well without riding on unrelated 
exits, especially as we're trying to reduce those exits.


A block I/O request can potentially be very, very long lived.  By 
serializing requests like this, there's a high likelihood that it's 
going to kill performance with anything capable of processing multiple 
requests.


OTOH, if we aggressively poll the ring when we have an opportunity to, 
there's very little down side to that and it addresses the serialization 
problem.



The same approach is probably a good idea for virtio-net.


With vhost-net you don't see exits.


The point is, when we've disabled notification, we should poll on the 
ring for additional requests instead of waiting for one to complete 
before looking at another one.


Even with vhost-net, this logic is still applicable although potentially 
harder to achieve.


Regards,

Anthony Liguori





Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Avi Kivity

On 01/11/2010 04:37 PM, Anthony Liguori wrote:

That has the downside of bouncing a cache line on unrelated exits.



The read and write sides of the ring are widely separated in physical 
memory specifically to avoid cache line bouncing.


I meant, exits on random vcpus will cause the cacheline containing the 
notification disable flag to bounce around.  As it is, we read it on the 
vcpu that owns the queue and write it on that vcpu or the I/O thread.


  It probably doesn't matter with qemu as it is now, since it will 
bounce qemu_mutex, but it will hurt with large guests (especially if 
they have many rings).


IMO we should get things to work well without riding on unrelated 
exits, especially as we're trying to reduce those exits.


A block I/O request can potentially be very, very long lived.  By 
serializing requests like this, there's a high likelihood that it's 
going to kill performance with anything capable of processing multiple 
requests.


Right, that's why I suggested having a queue depth at which disabling 
notification kicks in.  The patch hardcodes this depth to 1, unpatched 
qemu is infinite, a good value is probably spindle count + VAT.


OTOH, if we aggressively poll the ring when we have an opportunity to, 
there's very little down side to that and it addresses the 
serialization problem.


But we can't guarantee that we'll get those opportunities, so it doesn't 
address the problem in a general way.  A guest that doesn't use hpet and 
only has a single virtio-blk device will not have any reason to exit to 
qemu.


--
error compiling committee.c: too many arguments to function





[Qemu-devel] [PATCH 1/4] ppc-40x: Get TLB attributes from TLBLO.

2010-01-11 Thread Edgar E. Iglesias
The ZSEL was incorrectly beeing decoded from TLBHI. Decode it from
TLBLO instead.

Signed-off-by: Edgar E. Iglesias edgar.igles...@gmail.com
---
 target-ppc/op_helper.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/target-ppc/op_helper.c b/target-ppc/op_helper.c
index cea27f2..3575b29 100644
--- a/target-ppc/op_helper.c
+++ b/target-ppc/op_helper.c
@@ -3990,7 +3990,6 @@ void helper_4xx_tlbwe_hi (target_ulong entry, 
target_ulong val)
 cpu_abort(env, Little-endian TLB entries are not supported by now\n);
 }
 tlb-PID = env-spr[SPR_40x_PID]; /* PID */
-tlb-attr = val  0xFF;
 LOG_SWTLB(%s: set up TLB %d RPN  TARGET_FMT_plx  EPN  TARGET_FMT_lx
size  TARGET_FMT_lx  prot %c%c%c%c PID %d\n, __func__,
   (int)entry, tlb-RPN, tlb-EPN, tlb-size,
@@ -4016,6 +4015,7 @@ void helper_4xx_tlbwe_lo (target_ulong entry, 
target_ulong val)
   val);
 entry = 0x3F;
 tlb = env-tlb[entry].tlbe;
+tlb-attr = val  0xFF;
 tlb-RPN = val  0xFC00;
 tlb-prot = PAGE_READ;
 if (val  0x200)
-- 
1.6.4.4





[Qemu-devel] [PATCH 4/4] ppc-40x: Correct ESR for zone protection faults.

2010-01-11 Thread Edgar E. Iglesias
Raise the zone protection fault in ESR for TLB faults caused by
zone protection bits.

Signed-off-by: Edgar E. Iglesias edgar.igles...@gmail.com
---
 target-ppc/helper.c |   23 ++-
 1 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index f9b5589..a4fae31 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -1171,6 +1171,8 @@ static int mmu40x_get_physical_address (CPUState *env, 
mmu_ctx_t *ctx,
 break;
 case 0x0:
 if (pr != 0) {
+/* Raise Zone protection fault.  */
+env-spr[SPR_40x_ESR] = 1  22;
 ctx-prot = 0;
 ret = -2;
 break;
@@ -1183,6 +1185,8 @@ static int mmu40x_get_physical_address (CPUState *env, 
mmu_ctx_t *ctx,
 ctx-prot = tlb-prot;
 ctx-prot |= PAGE_EXEC;
 ret = check_prot(ctx-prot, rw, access_type);
+if (ret == -2)
+env-spr[SPR_40x_ESR] = 0;
 break;
 }
 if (ret = 0) {
@@ -1580,11 +1584,20 @@ int cpu_ppc_handle_mmu_fault (CPUState *env, 
target_ulong address, int rw,
 /* Access rights violation */
 env-exception_index = POWERPC_EXCP_DSI;
 env-error_code = 0;
-env-spr[SPR_DAR] = address;
-if (rw == 1)
-env-spr[SPR_DSISR] = 0x0A00;
-else
-env-spr[SPR_DSISR] = 0x0800;
+if (env-mmu_model == POWERPC_MMU_SOFT_4xx
+|| env-mmu_model == POWERPC_MMU_SOFT_4xx_Z) {
+env-spr[SPR_40x_DEAR] = address;
+if (rw) {
+env-spr[SPR_40x_ESR] |= 0x0080;
+}
+} else {
+env-spr[SPR_DAR] = address;
+if (rw == 1) {
+env-spr[SPR_DSISR] = 0x0A00;
+} else {
+env-spr[SPR_DSISR] = 0x0800;
+}
+}
 break;
 case -4:
 /* Direct store exception */
-- 
1.6.4.4





[Qemu-devel] [PATCH 0/4] PPC 40x MMU fixes.

2010-01-11 Thread Edgar E. Iglesias
Hi,

I've been trying to boot linux on an emulated PPC 405 (Xilinx virtex 4)
board but I ran into trouble with the 40x MMU emulation.

Patch 4 is not the nicest but with this set I can now boot into a shell and
get a pretty stable system.

I'm very noob at ppc so comments are very welcome.

Thanks,
Edgar


Edgar E. Iglesias (4):
  ppc-40x: Get TLB attributes from TLBLO.
  ppc-40x: Correct check for Endian swapping TLB entries.
  ppc-40x: Correct decoding of zone protection bits.
  ppc-40x: Correct ESR for zone protection faults.

 target-ppc/helper.c|   25 +++--
 target-ppc/op_helper.c |   14 --
 2 files changed, 27 insertions(+), 12 deletions(-)





[Qemu-devel] [PATCH 3/4] ppc-40x: Correct decoding of zone protection bits.

2010-01-11 Thread Edgar E. Iglesias
The 40x MMU has 15 zones in the ZPR register.

Signed-off-by: Edgar E. Iglesias edgar.igles...@gmail.com
---
 target-ppc/helper.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index b233d4f..f9b5589 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -1155,7 +1155,7 @@ static int mmu40x_get_physical_address (CPUState *env, 
mmu_ctx_t *ctx,
  env-spr[SPR_40x_PID], 0, i)  0)
 continue;
 zsel = (tlb-attr  4)  0xF;
-zpr = (env-spr[SPR_40x_ZPR]  (28 - (2 * zsel)))  0x3;
+zpr = (env-spr[SPR_40x_ZPR]  (30 - (2 * zsel)))  0x3;
 LOG_SWTLB(%s: TLB %d zsel %d zpr %d rw %d attr %08x\n,
 __func__, i, zsel, zpr, rw, tlb-attr);
 /* Check execute enable bit */
-- 
1.6.4.4





Re: [Qemu-devel] Re: [ kvm-Bugs-2907597 ] qemu vnc server clips at 2560x1600

2010-01-11 Thread Jernej Simončič
On Monday, January 11, 2010, 14:53:05, Anthony Liguori wrote:

 2048 is bigger than any physical screen that I know of so I assume this
 is a multiple monitor scenario.

30 TFTs have resolution 2560x1600.

Speaking of large resolutions, I just tried 2360x1770 (the largest
offered by VMWare VGA driver in Win2003 x64), and in my case, the left
side of the screen is clipped, while the right side is mostly fine
(only the mouse pointer is leaving trails behind):
http://img27.imageshack.us/img27/2760/image1ud.png

-- 
 Jernej Simončič  http://eternallybored.org/ 

What the gods get away with, the cows don't.
   -- The Aquinas Axiom





Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Anthony Liguori

On 01/11/2010 08:46 AM, Avi Kivity wrote:

On 01/11/2010 04:37 PM, Anthony Liguori wrote:

That has the downside of bouncing a cache line on unrelated exits.



The read and write sides of the ring are widely separated in physical 
memory specifically to avoid cache line bouncing.


I meant, exits on random vcpus will cause the cacheline containing the 
notification disable flag to bounce around.  As it is, we read it on 
the vcpu that owns the queue and write it on that vcpu or the I/O thread.


Bottom halves are always run from the IO thread.
  It probably doesn't matter with qemu as it is now, since it will 
bounce qemu_mutex, but it will hurt with large guests (especially if 
they have many rings).


IMO we should get things to work well without riding on unrelated 
exits, especially as we're trying to reduce those exits.


A block I/O request can potentially be very, very long lived.  By 
serializing requests like this, there's a high likelihood that it's 
going to kill performance with anything capable of processing 
multiple requests.


Right, that's why I suggested having a queue depth at which disabling 
notification kicks in.  The patch hardcodes this depth to 1, unpatched 
qemu is infinite, a good value is probably spindle count + VAT.


That means we would need a user visible option which is quite unfortunate.

Also, that logic only really makes sense with cache=off.  With 
cache=writethrough, you can get pathological cases whereas you have an 
uncached access followed by cached accesses.  In fact, with read-ahead, 
this is probably not an uncommon scenario.


OTOH, if we aggressively poll the ring when we have an opportunity 
to, there's very little down side to that and it addresses the 
serialization problem.


But we can't guarantee that we'll get those opportunities, so it 
doesn't address the problem in a general way.  A guest that doesn't 
use hpet and only has a single virtio-blk device will not have any 
reason to exit to qemu.


We can mitigate this with a timer but honestly, we need to do perf 
measurements to see.  My feeling is that we will need some more 
aggressive form of polling than just waiting for IO completion.  I don't 
think queue depth is enough because it assumes that all requests are 
equal.  When dealing with cache=off or even just storage with it's own 
cache, that's simply not the case.


Regards,

Anthony Liguori





Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Anthony Liguori

On 01/11/2010 09:19 AM, Avi Kivity wrote:
OTOH, if we aggressively poll the ring when we have an opportunity 
to, there's very little down side to that and it addresses the 
serialization problem.


But we can't guarantee that we'll get those opportunities, so it 
doesn't address the problem in a general way.  A guest that doesn't 
use hpet and only has a single virtio-blk device will not have any 
reason to exit to qemu.


We can mitigate this with a timer but honestly, we need to do perf 
measurements to see.  My feeling is that we will need some more 
aggressive form of polling than just waiting for IO completion.  I 
don't think queue depth is enough because it assumes that all 
requests are equal.  When dealing with cache=off or even just storage 
with it's own cache, that's simply not the case.


Maybe we can adapt behaviour dynamically based on how fast results 
come in.


Based on our experiences with virtio-net, what I'd suggest is to make a 
lot of tunable options (ring size, various tx mitigation schemes, 
timeout durations, etc) and then we can do some deep performance studies 
to see how things interact with each other.


I think we should do that before making any changes because I'm deeply 
concerned that we'll introduce significant performance regressions.


Regards,

Anthony Liguori





Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Avi Kivity

On 01/11/2010 05:13 PM, Anthony Liguori wrote:

On 01/11/2010 08:46 AM, Avi Kivity wrote:

On 01/11/2010 04:37 PM, Anthony Liguori wrote:

That has the downside of bouncing a cache line on unrelated exits.



The read and write sides of the ring are widely separated in 
physical memory specifically to avoid cache line bouncing.


I meant, exits on random vcpus will cause the cacheline containing 
the notification disable flag to bounce around.  As it is, we read it 
on the vcpu that owns the queue and write it on that vcpu or the I/O 
thread.


Bottom halves are always run from the IO thread.


Okay, so that won't be an issue.

  It probably doesn't matter with qemu as it is now, since it will 
bounce qemu_mutex, but it will hurt with large guests (especially 
if they have many rings).


IMO we should get things to work well without riding on unrelated 
exits, especially as we're trying to reduce those exits.


A block I/O request can potentially be very, very long lived.  By 
serializing requests like this, there's a high likelihood that it's 
going to kill performance with anything capable of processing 
multiple requests.


Right, that's why I suggested having a queue depth at which disabling 
notification kicks in.  The patch hardcodes this depth to 1, 
unpatched qemu is infinite, a good value is probably spindle count + 
VAT.


That means we would need a user visible option which is quite 
unfortunate.


We could guess it, perhaps.

Also, that logic only really makes sense with cache=off.  With 
cache=writethrough, you can get pathological cases whereas you have an 
uncached access followed by cached accesses.  In fact, with 
read-ahead, this is probably not an uncommon scenario.


So you'd increase the disable depths when cache!=none.

OTOH, if we aggressively poll the ring when we have an opportunity 
to, there's very little down side to that and it addresses the 
serialization problem.


But we can't guarantee that we'll get those opportunities, so it 
doesn't address the problem in a general way.  A guest that doesn't 
use hpet and only has a single virtio-blk device will not have any 
reason to exit to qemu.


We can mitigate this with a timer but honestly, we need to do perf 
measurements to see.  My feeling is that we will need some more 
aggressive form of polling than just waiting for IO completion.  I 
don't think queue depth is enough because it assumes that all requests 
are equal.  When dealing with cache=off or even just storage with it's 
own cache, that's simply not the case.


Maybe we can adapt behaviour dynamically based on how fast results come in.

--
error compiling committee.c: too many arguments to function





[Qemu-devel] [PATCH 2/4] ppc-40x: Correct check for Endian swapping TLB entries.

2010-01-11 Thread Edgar E. Iglesias
Bailout on 40x TLB entries with endianess swapping only if the entry
is valid.

Signed-off-by: Edgar E. Iglesias edgar.igles...@gmail.com
---
 target-ppc/op_helper.c |   12 +++-
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/target-ppc/op_helper.c b/target-ppc/op_helper.c
index 3575b29..f905c64 100644
--- a/target-ppc/op_helper.c
+++ b/target-ppc/op_helper.c
@@ -3981,13 +3981,15 @@ void helper_4xx_tlbwe_hi (target_ulong entry, 
target_ulong val)
   tlb-size, TARGET_PAGE_SIZE, (int)((val  7)  0x7));
 }
 tlb-EPN = val  ~(tlb-size - 1);
-if (val  0x40)
+if (val  0x40) {
 tlb-prot |= PAGE_VALID;
-else
+if (val  0x20) {
+/* XXX: TO BE FIXED */
+cpu_abort(env,
+  Little-endian TLB entries are not supported by now\n);
+}
+} else {
 tlb-prot = ~PAGE_VALID;
-if (val  0x20) {
-/* XXX: TO BE FIXED */
-cpu_abort(env, Little-endian TLB entries are not supported by now\n);
 }
 tlb-PID = env-spr[SPR_40x_PID]; /* PID */
 LOG_SWTLB(%s: set up TLB %d RPN  TARGET_FMT_plx  EPN  TARGET_FMT_lx
-- 
1.6.4.4





Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Avi Kivity

On 01/11/2010 05:22 PM, Anthony Liguori wrote:


Based on our experiences with virtio-net, what I'd suggest is to make 
a lot of tunable options (ring size, various tx mitigation schemes, 
timeout durations, etc) and then we can do some deep performance 
studies to see how things interact with each other.


I think we should do that before making any changes because I'm deeply 
concerned that we'll introduce significant performance regressions.




I agree.  We can start with this patch, with a tunable depth, defaulting 
to current behaviour.


--
error compiling committee.c: too many arguments to function





Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Anthony Liguori

On 01/11/2010 09:31 AM, Avi Kivity wrote:

On 01/11/2010 05:22 PM, Anthony Liguori wrote:


Based on our experiences with virtio-net, what I'd suggest is to make 
a lot of tunable options (ring size, various tx mitigation schemes, 
timeout durations, etc) and then we can do some deep performance 
studies to see how things interact with each other.


I think we should do that before making any changes because I'm 
deeply concerned that we'll introduce significant performance 
regressions.




I agree.  We can start with this patch, with a tunable depth, 
defaulting to current behaviour.


I wouldn't be opposed to that provided we made it clear that these 
options were not supported long term.  I don't want management tools 
(like libvirt) to start relying on them.


Regards,

Anthony Liguori






Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Avi Kivity

On 01/11/2010 05:32 PM, Anthony Liguori wrote:

On 01/11/2010 09:31 AM, Avi Kivity wrote:

On 01/11/2010 05:22 PM, Anthony Liguori wrote:


Based on our experiences with virtio-net, what I'd suggest is to 
make a lot of tunable options (ring size, various tx mitigation 
schemes, timeout durations, etc) and then we can do some deep 
performance studies to see how things interact with each other.


I think we should do that before making any changes because I'm 
deeply concerned that we'll introduce significant performance 
regressions.




I agree.  We can start with this patch, with a tunable depth, 
defaulting to current behaviour.


I wouldn't be opposed to that provided we made it clear that these 
options were not supported long term.  I don't want management tools 
(like libvirt) to start relying on them.




x-option-name for experimental options?

-device disk,if=virtio,x-queue-depth-suppress-notify=4

--
error compiling committee.c: too many arguments to function





Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Anthony Liguori

On 01/11/2010 09:35 AM, Avi Kivity wrote:

On 01/11/2010 05:32 PM, Anthony Liguori wrote:

On 01/11/2010 09:31 AM, Avi Kivity wrote:

On 01/11/2010 05:22 PM, Anthony Liguori wrote:


Based on our experiences with virtio-net, what I'd suggest is to 
make a lot of tunable options (ring size, various tx mitigation 
schemes, timeout durations, etc) and then we can do some deep 
performance studies to see how things interact with each other.


I think we should do that before making any changes because I'm 
deeply concerned that we'll introduce significant performance 
regressions.




I agree.  We can start with this patch, with a tunable depth, 
defaulting to current behaviour.


I wouldn't be opposed to that provided we made it clear that these 
options were not supported long term.  I don't want management tools 
(like libvirt) to start relying on them.




x-option-name for experimental options?

-device disk,if=virtio,x-queue-depth-suppress-notify=4


Sounds reasonable to me.

Regards,

Anthony Liguori





[Qemu-devel] [PATCH] virtio-pci: thinko fix

2010-01-11 Thread Michael S. Tsirkin
Since patch ed757e140c0ada220f213036e4497315d24ca8bct, virtio will
sometimes clear all status registers on bus master disable, which loses
information such as VIRTIO_CONFIG_S_FAILED bit.  This is a result of
a patch being misapplied: code uses !  instead of ~ for bit
operations as in Yan's original patch.  This obviously does not make
sense.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

 hw/virtio-pci.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index d222ce0..f4f1bc1 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -374,7 +374,7 @@ static void virtio_write_config(PCIDevice *pci_dev, 
uint32_t address,
 
 if (PCI_COMMAND == address) {
 if (!(val  PCI_COMMAND_MASTER)) {
-proxy-vdev-status = !VIRTIO_CONFIG_S_DRIVER_OK;
+proxy-vdev-status = ~VIRTIO_CONFIG_S_DRIVER_OK;
 }
 }
 
-- 
1.6.6.rc1.43.gf55cc




Re: [Qemu-devel] [BUGFIX] MCE: Fix bug of IA32_MCG_STATUS after system reset

2010-01-11 Thread Anthony Liguori

On 01/05/2010 02:32 AM, Huang Ying wrote:

Now, if we inject a fatal MCE into guest OS, for example Linux, Linux
will go panic and then reboot. But if we inject another MCE now,
system will reset directly instead of go panic firstly, because
MCG_STATUS.MCIP is set to 1 and not cleared after reboot. This is does
not follow the behavior in real hardware.

This patch fixes this via set env-mcg_status to 0 during system reset.

Signed-off-by: Huang Yingying.hu...@intel.com
   


Applied.  Thanks.

Regards,

Anthony Liguori

---
  target-i386/helper.c |2 ++
  1 file changed, 2 insertions(+)

--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -617,6 +617,8 @@ void cpu_reset(CPUX86State *env)
  env-dr[7] = DR7_FIXED_1;
  cpu_breakpoint_remove_all(env, BP_CPU);
  cpu_watchpoint_remove_all(env, BP_CPU);
+
+env-mcg_status = 0;
  }

  void cpu_x86_close(CPUX86State *env)





   






Re: [Qemu-devel] [PATCH] finish VPATH - vpath translation

2010-01-11 Thread Anthony Liguori

On 01/04/2010 04:02 AM, Paolo Bonzini wrote:

This adds a few more vpath suffixes and points the remaining two paths
explicitly to $(SRC_PATH) in order to eliminate the VPATH assignment
from config-host.mak.

Signed-off-by: Paolo Bonzinipbonz...@redhat.com
   


Applied.  Thanks.

Regards,

Anthony Liguori


Cc: Juan Quintelaquint...@redhat.com
---
  Makefile|2 +-
  Makefile.target |2 +-
  configure   |3 ---
  rules.mak   |3 ++-
  4 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index 10c04a8..5c4e256 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ ifneq ($(wildcard config-host.mak),)
  all: build-all
  include config-host.mak
  include $(SRC_PATH)/rules.mak
-config-host.mak: configure
+config-host.mak: $(SRC_PATH)/configure
@echo $@ is out-of-date, running configure
@sed -n /.*Configured with/s/[^:]*: //p $@ | sh
  else
diff --git a/Makefile.target b/Makefile.target
index edcd2e5..5999da7 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -316,7 +316,7 @@ $(QEMU_PROG): $(obj-y) $(obj-$(TARGET_BASE_ARCH)-y) 
$(ARLIBS)
$(call LINK,$(obj-y) $(obj-$(TARGET_BASE_ARCH)-y))


-gdbstub-xml.c: $(TARGET_XML_FILES) feature_to_c.sh
+gdbstub-xml.c: $(TARGET_XML_FILES) $(SRC_PATH)/feature_to_c.sh
$(call quiet-command,rm -f $@  $(SHELL) $(SRC_PATH)/feature_to_c.sh $@ 
$(TARGET_XML_FILES),  GEN   $(TARGET_DIR)$@)

  qemu-options.h: $(SRC_PATH)/qemu-options.hx
diff --git a/configure b/configure
index 18aed43..b7c6519 100755
--- a/configure
+++ b/configure
@@ -2022,9 +2022,6 @@ qemu_version=`head $source_path/VERSION`
  echo VERSION=$qemu_version$config_host_mak
  echo PKGVERSION=$pkgversion$config_host_mak
  echo SRC_PATH=$source_path  $config_host_mak
-if [ $source_path_used = yes ]; then
-  echo VPATH=$source_path  $config_host_mak
-fi
  echo TARGET_DIRS=$target_list  $config_host_mak
  if [ $docs = yes ] ; then
echo BUILD_DOCS=yes  $config_host_mak
diff --git a/rules.mak b/rules.mak
index 9cd67f0..82e8e3d 100644
--- a/rules.mak
+++ b/rules.mak
@@ -39,7 +39,8 @@ quiet-command = $(if $(V),$1,$(if $(2),@echo $2  $1, @$1))
  cc-option = $(if $(shell $(CC) $1 $2 -S -o /dev/null -xc /dev/null \
/dev/null 21  echo OK), $2, $3)

-set-vpath = $(if $1,$(foreach PATTERN,%.c %.h %.S, $(eval vpath $(PATTERN) 
$1)))
+VPATH_SUFFIXES = %.c %.h %.S %.m %.mak %.texi
+set-vpath = $(if $1,$(foreach PATTERN,$(VPATH_SUFFIXES), $(eval vpath 
$(PATTERN) $1)))

  # Generate timestamp files for .h include files

   






Re: [Qemu-devel] [PATCH] Fix CPU topology initialization

2010-01-11 Thread Anthony Liguori

On 01/05/2010 10:26 AM, Jiri Denemark wrote:

Late initialization of CPU topology in CPUState prevents KVM guests to
actually see the topology.

Signed-off-by: Jiri Denemarkjdene...@redhat.com
   


Applied.  Thanks.

Regards,

Anthony Liguori

---
  vl.c |8 
  1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/vl.c b/vl.c
index e881e45..a03d7a6 100644
--- a/vl.c
+++ b/vl.c
@@ -3484,10 +3484,10 @@ void qemu_init_vcpu(void *_env)
  {
  CPUState *env = _env;

-if (kvm_enabled())
-kvm_init_vcpu(env);
  env-nr_cores = smp_cores;
  env-nr_threads = smp_threads;
+if (kvm_enabled())
+kvm_init_vcpu(env);
  return;
  }

@@ -3813,12 +3813,12 @@ void qemu_init_vcpu(void *_env)
  {
  CPUState *env = _env;

+env-nr_cores = smp_cores;
+env-nr_threads = smp_threads;
  if (kvm_enabled())
  kvm_start_vcpu(env);
  else
  tcg_init_vcpu(env);
-env-nr_cores = smp_cores;
-env-nr_threads = smp_threads;
  }

  void qemu_notify_event(void)
   






Re: [Qemu-devel] [PATCH resend] vmware_vga: Check cursor dimensions passed from guest to avoid buffer overflow

2010-01-11 Thread Anthony Liguori

On 01/05/2010 10:43 PM, Roland Dreier wrote:

Check that the cursor dimensions passed from the guest for the
DEFINE_CURSOR command don't overflow the available space in the
cursor.image[] or cursor.mask[] arrays before copying data from the
guest into those arrays.

Signed-off-by: Roland Dreierrola...@cisco.com
   


Applied.  Thanks.

Regards,

Anthony Liguori

---
Hi Anthony,

as far as I can tell this seems to have slipped through the cracks.  I
think this is fairly important: it is a guest-triggerable stack smashing
attack in the worst case.

Thanks,
   Roland

  hw/vmware_vga.c |7 +++
  1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/hw/vmware_vga.c b/hw/vmware_vga.c
index 7ab1c79..5e969ae 100644
--- a/hw/vmware_vga.c
+++ b/hw/vmware_vga.c
@@ -562,6 +562,13 @@ static void vmsvga_fifo_run(struct vmsvga_state_s *s)
  cursor.height = y = vmsvga_fifo_read(s);
  vmsvga_fifo_read(s);
  cursor.bpp = vmsvga_fifo_read(s);
+
+   if (SVGA_BITMAP_SIZE(x, y)  sizeof cursor.mask ||
+   SVGA_PIXMAP_SIZE(x, y, cursor.bpp)  sizeof cursor.image) {
+   args = SVGA_BITMAP_SIZE(x, y) + SVGA_PIXMAP_SIZE(x, y, 
cursor.bpp);
+   goto badcmd;
+   }
+
  for (args = 0; args  SVGA_BITMAP_SIZE(x, y); args ++)
  cursor.mask[args] = vmsvga_fifo_read_raw(s);
  for (args = 0; args  SVGA_PIXMAP_SIZE(x, y, cursor.bpp); args ++)



   






Re: [Qemu-devel] [PATCH] remove pending exception on vcpu reset.

2010-01-11 Thread Anthony Liguori

On 01/06/2010 08:30 AM, Gleb Natapov wrote:

Without this qemu can even start on kvm modules with events support
since default value of exception_injected in zero and this is #DE
exception.

Signed-off-by: Gleb Natapovg...@redhat.com
   


Applied.  Thanks.

Regards,

Anthony Liguori

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index de79eb7..4084503 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -227,6 +227,7 @@ int kvm_arch_init_vcpu(CPUState *env)

  void kvm_arch_reset_vcpu(CPUState *env)
  {
+env-exception_injected = -1;
  env-interrupt_injected = -1;
  env-nmi_injected = 0;
  env-nmi_pending = 0;
--
Gleb.



   






[Qemu-devel] Re: [ kvm-Bugs-2907597 ] qemu vnc server clips at 2560x1600

2010-01-11 Thread Avi Kivity

On 01/11/2010 04:30 PM, Avi Kivity wrote:


VNC_MAX_WIDTH and VNC_MAX_HEIGHT in vnc.h are currently defined to 
2048.  We do dirty tracking with a bitmap and that bitmap is 
currently a fixed size.


2048 is bigger than any physical screen that I know of so I assume 
this is a multiple monitor scenario.  Long term, I think exposing 
multiple monitors to the guest is a better approach for this kind of 
functionality.


Since these resolutions for a single screen don't really exist, this 
is largely an untested path within the guest.



Google suggests such screens do exist, and -vga std (at least in 
qemu-kvm) supports them.


I've asked the reporter to play with VNC_MAX_WIDTH.



The reporter confirms that increasing V_M_W fixes the problem.

--
error compiling committee.c: too many arguments to function





Re: [Qemu-devel] mulu2 not implemented for tcg target sparc

2010-01-11 Thread Richard Henderson

On 01/10/2010 01:17 PM, Palle Lyckegaard wrote:

On Sun, 10 Jan 2010, Blue Swirl wrote:



Is it needed somewhere?




I was trying to run qemu-system-mips with a NetBSD malta kernel that
generates a MIPS mult operation. Tracing the code through tcg points at
a missing mulu2 opreration for sparc.

  It will be generated for instance for some muls when the target 
is i386 (not even x86_64) on a 32-bit host.


In that case mulu2 support would be useful for Sparc32 host too.
Palle, are you planning to dig into this?


Yes I will try even if my tcg and SPARC assembly skills and very limited
(so far...)


Try this.  You're also missing double-word addition and subtraction.


r~
diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 8f094e5..003f084 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -215,6 +215,7 @@ static inline int tcg_target_const_match(tcg_target_long 
val,
 #define BA (INSN_OP(0) | INSN_COND(COND_A, 0) | INSN_OP2(0x2))
 
 #define ARITH_ADD  (INSN_OP(2) | INSN_OP3(0x00))
+#define ARITH_ADDCC (INSN_OP(2) | INSN_OP3(0x10))
 #define ARITH_AND  (INSN_OP(2) | INSN_OP3(0x01))
 #define ARITH_OR   (INSN_OP(2) | INSN_OP3(0x02))
 #define ARITH_ORCC (INSN_OP(2) | INSN_OP3(0x12))
@@ -238,6 +239,7 @@ static inline int tcg_target_const_match(tcg_target_long 
val,
 #define SHIFT_SRLX (INSN_OP(2) | INSN_OP3(0x26) | (1  12))
 #define SHIFT_SRAX (INSN_OP(2) | INSN_OP3(0x27) | (1  12))
 
+#define RDY(INSN_OP(2) | INSN_OP3(0x28) | INSN_RS1(0))
 #define WRY(INSN_OP(2) | INSN_OP3(0x30))
 #define JMPL   (INSN_OP(2) | INSN_OP3(0x38))
 #define SAVE   (INSN_OP(2) | INSN_OP3(0x3c))
@@ -403,6 +405,11 @@ static inline void tcg_out_sety(TCGContext *s, 
tcg_target_long val)
 fprintf(stderr, unimplemented sety %ld\n, (long)val);
 }
 
+static inline void tcg_out_rdy(TCGContext *s, int rd)
+{
+tcg_out32(s, RDY | INSN_RD(rd));
+}
+
 static inline void tcg_out_addi(TCGContext *s, int reg, tcg_target_long val)
 {
 if (val != 0) {
@@ -1128,6 +1135,38 @@ static inline void tcg_out_op(TCGContext *s, int opc, 
const TCGArg *args,
 args[2], const_args[2],
 args[3], const_args[3], args[5]);
 break;
+case INDEX_op_add2_i32:
+if (const_args[4]) {
+tcg_out_arithi(s, args[0], args[2], args[4], ARITH_ADDCC);
+} else {
+tcg_out_arith(s, args[0], args[2], args[4], ARITH_ADDCC);
+}
+if (const_args[5]) {
+tcg_out_arithi(s, args[1], args[3], args[5], ARITH_ADDX);
+} else {
+tcg_out_arithi(s, args[1], args[3], args[5], ARITH_ADDX);
+}
+break;
+case INDEX_op_sub2_i32:
+if (const_args[4]) {
+tcg_out_arithi(s, args[0], args[2], args[4], ARITH_SUBCC);
+} else {
+tcg_out_arith(s, args[0], args[2], args[4], ARITH_SUBCC);
+}
+if (const_args[5]) {
+tcg_out_arithi(s, args[1], args[3], args[5], ARITH_SUBX);
+} else {
+tcg_out_arithi(s, args[1], args[3], args[5], ARITH_SUBX);
+}
+break;
+case INDEX_op_mulu2_i32:
+if (const_args[3]) {
+tcg_out_arithi(s, args[0], args[2], args[3], ARITH_UMUL);
+} else {
+tcg_out_arith(s, args[0], args[2], args[3], ARITH_UMUL);
+}
+tcg_out_rdy(s, args[1]);
+break;
 #endif
 
 case INDEX_op_qemu_ld8u:


[Qemu-devel] Re: New kvm-related qemu patch queue

2010-01-11 Thread Anthony Liguori

On 01/10/2010 06:02 AM, Avi Kivity wrote:
In order to improve qemu.git kvm integration quality wrt performance, 
features, and reliability Marcelo and I will begin to maintain a patch 
queue based on qemu.git containing kvm-related patches.  We will 
review and apply patches to this queue, test them using the same test 
suite that is used for qemu-kvm.git, and regularly submit them for 
inclusion in qemu.git, mimicking the relationship between kvm.git and 
Linus' linux-2.6.git.


Thanks for setting this up Avi!

I just want to stress that everyone continue CC'ing qemu-devel on all 
KVM patches.  Even if the patch is qemu-kvm specific for the moment, I 
think it's important for qemu-devel to be exposed to the refactoring work.


It might be good to prefix qemu-kvm.git patches in some manner to make 
it clear which repository they belong to.


Regards,

Anthony Liguori




[Qemu-devel] [PATCH-RFC 00/13] vhost-net: preview

2010-01-11 Thread Michael S. Tsirkin
Here's an untested patchset with vhost support for upstream qemu.  Note
that you should not expect performance gains from vhost unless in-kernel
irqchip is enabled (which is not in upstream qemu now).  Since adding
vhost involves quite a bit of infrastructure, I thought it makes sense
to send an RFC already, so that interested parties can review it.  In
particular, command line and help text need to be finalized early to so
that management can start looking on supporting the feature. This patch
has all bits besides migration filled in. Also missing is packet socket
backend: another team is now working on this.


Michael S. Tsirkin (13):
  virtio: export virtqueue structure
  kvm: add API to set ioeventfd
  virtio: add iofd/irqfd support
  virtio-pci: fill in irqfd/queufd support
  syborg_virtio: add irqfd/eventfd support
  s390-virtio: fill in irqfd support
  virtio: move typedef to qemu-common
  net/tap: add interface to get device fd
  tap: add vhost/vhostfd options
  tap: add API to retrieve vhost net header
  vhost net support
  virtio: add status change callback
  virtio-net: connect to vhost net backend

 Makefile.target  |1 +
 hw/s390-virtio-bus.c |   19 +++
 hw/syborg_virtio.c   |   27 
 hw/vhost.c   |  349 ++
 hw/vhost.h   |   33 +
 hw/vhost_net.c   |  145 +
 hw/vhost_net.h   |   20 +++
 hw/virtio-net.c  |   40 ++
 hw/virtio-pci.c  |   32 +
 hw/virtio.c  |   31 ++---
 hw/virtio.h  |   23 +++-
 kvm-all.c|   24 
 kvm.h|3 +
 net.c|8 +
 net/tap.c|   43 ++
 net/tap.h|5 +
 qemu-common.h|1 +
 qemu-options.hx  |4 +-
 18 files changed, 785 insertions(+), 23 deletions(-)
 create mode 100644 hw/vhost.c
 create mode 100644 hw/vhost.h
 create mode 100644 hw/vhost_net.c
 create mode 100644 hw/vhost_net.h




[Qemu-devel] [PATCH-RFC 01/13] virtio: export virtqueue structure

2010-01-11 Thread Michael S. Tsirkin
vhost needs physical addresses for
ring so expose that structure.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/virtio.c |   18 --
 hw/virtio.h |   17 +
 2 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index 3c609ce..8e3c9ad 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -49,24 +49,6 @@ typedef struct VRingUsed
 VRingUsedElem ring[0];
 } VRingUsed;
 
-typedef struct VRing
-{
-unsigned int num;
-target_phys_addr_t desc;
-target_phys_addr_t avail;
-target_phys_addr_t used;
-} VRing;
-
-struct VirtQueue
-{
-VRing vring;
-target_phys_addr_t pa;
-uint16_t last_avail_idx;
-int inuse;
-uint16_t vector;
-void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq);
-};
-
 #define VIRTIO_PCI_QUEUE_MAX16
 
 /* virt queue functions */
diff --git a/hw/virtio.h b/hw/virtio.h
index 3994cc9..ca840e1 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -180,5 +180,22 @@ void virtio_net_exit(VirtIODevice *vdev);
DEFINE_PROP_BIT(indirect_desc, _state, _field, \
VIRTIO_RING_F_INDIRECT_DESC, true)
 
+typedef struct VRing
+{
+unsigned int num;
+target_phys_addr_t desc;
+target_phys_addr_t avail;
+target_phys_addr_t used;
+} VRing;
+
+struct VirtQueue
+{
+VRing vring;
+target_phys_addr_t pa;
+uint16_t last_avail_idx;
+int inuse;
+uint16_t vector;
+void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq);
+};
 
 #endif
-- 
1.6.6.rc1.43.gf55cc





[Qemu-devel] [PATCH-RFC 02/13] kvm: add API to set ioeventfd

2010-01-11 Thread Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 kvm-all.c |   24 
 kvm.h |3 +++
 2 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index a312654..aa00119 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1113,3 +1113,27 @@ void kvm_remove_all_breakpoints(CPUState *current_env)
 {
 }
 #endif /* !KVM_CAP_SET_GUEST_DEBUG */
+
+#ifdef KVM_IOEVENTFD
+int kvm_set_ioeventfd(uint16_t addr, uint16_t data, int fd, bool assigned)
+{
+struct kvm_ioeventfd kick = {
+.datamatch = data,
+.addr = addr,
+.len = 2,
+.flags = KVM_IOEVENTFD_FLAG_DATAMATCH | KVM_IOEVENTFD_FLAG_PIO,
+.fd = fd,
+};
+if (!assigned)
+kick.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN;
+int r = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, kick);
+if (r  0)
+return r;
+return 0;
+}
+#else
+int kvm_set_ioeventfd(uint16_t data, uint16_t addr, int fd, bool assigned)
+{
+return -ENOSYS;
+}
+#endif
diff --git a/kvm.h b/kvm.h
index 672d511..2d723b8 100644
--- a/kvm.h
+++ b/kvm.h
@@ -14,6 +14,7 @@
 #ifndef QEMU_KVM_H
 #define QEMU_KVM_H
 
+#include stdbool.h
 #include config.h
 #include qemu-queue.h
 
@@ -131,4 +132,6 @@ static inline void cpu_synchronize_state(CPUState *env)
 }
 }
 
+int kvm_set_ioeventfd(uint16_t addr, uint16_t data, int fd, bool assigned);
+
 #endif
-- 
1.6.6.rc1.43.gf55cc





[Qemu-devel] [PATCH-RFC 03/13] virtio: add iofd/irqfd support

2010-01-11 Thread Michael S. Tsirkin
Add binding API to set iofd/irqfd support.
Will be used by vhost.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/virtio.c |   13 ++---
 hw/virtio.h |4 
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index 8e3c9ad..b9ec863 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -572,6 +572,12 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int 
queue_size,
 return vdev-vq[i];
 }
 
+void virtio_irq(VirtIODevice *vdev, VirtQueue *vq)
+{
+vdev-isr |= 0x01;
+virtio_notify_vector(vdev, vq-vector);
+}
+
 void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
 {
 /* Make sure used ring is updated before we check NO_INTERRUPT. */
@@ -582,8 +588,7 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
  (vq-inuse || vring_avail_idx(vq) != vq-last_avail_idx)))
 return;
 
-vdev-isr |= 0x01;
-virtio_notify_vector(vdev, vq-vector);
+virtio_irq(vdev, vq);
 }
 
 void virtio_notify_config(VirtIODevice *vdev)
@@ -696,8 +701,10 @@ VirtIODevice *virtio_common_init(const char *name, 
uint16_t device_id,
 vdev-queue_sel = 0;
 vdev-config_vector = VIRTIO_NO_VECTOR;
 vdev-vq = qemu_mallocz(sizeof(VirtQueue) * VIRTIO_PCI_QUEUE_MAX);
-for(i = 0; i  VIRTIO_PCI_QUEUE_MAX; i++)
+for(i = 0; i  VIRTIO_PCI_QUEUE_MAX; i++) {
 vdev-vq[i].vector = VIRTIO_NO_VECTOR;
+vdev-vq[i].vdev = vdev;
+}
 
 vdev-name = name;
 vdev-config_len = config_size;
diff --git a/hw/virtio.h b/hw/virtio.h
index ca840e1..193b3f9 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -88,6 +88,8 @@ typedef struct {
 int (*load_config)(void * opaque, QEMUFile *f);
 int (*load_queue)(void * opaque, int n, QEMUFile *f);
 unsigned (*get_features)(void * opaque);
+int (*set_irqfd)(void * opaque, int n, int fd, bool assigned);
+int (*set_queuefd)(void * opaque, int n, int fd, bool assigned);
 } VirtIOBindings;
 
 #define VIRTIO_PCI_QUEUE_MAX 16
@@ -196,6 +198,8 @@ struct VirtQueue
 int inuse;
 uint16_t vector;
 void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq);
+VirtIODevice *vdev;
 };
 
+void virtio_irq(VirtIODevice *vdev, VirtQueue *vq);
 #endif
-- 
1.6.6.rc1.43.gf55cc





[Qemu-devel] [PATCH-RFC 04/13] virtio-pci: fill in irqfd/queufd support

2010-01-11 Thread Michael S. Tsirkin
Support irqfd/queuefd. The last one only with kvm, that's
okay because vhost relies on kvm anyway.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/virtio-pci.c |   26 ++
 1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 6d0f9dd..5a3be6b 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -23,6 +23,7 @@
 #include msix.h
 #include net.h
 #include loader.h
+#include kvm.h
 
 /* from Linux's linux/virtio_pci.h */
 
@@ -388,6 +389,29 @@ static unsigned virtio_pci_get_features(void *opaque)
 return proxy-host_features;
 }
 
+static void virtio_pci_irqfd_read(void *opaque)
+{
+VirtQueue *vq = opaque;
+virtio_irq(vq-vdev, vq);
+}
+
+static int virtio_pci_irqfd(void * opaque, int n, int fd, bool assign)
+{
+VirtIOPCIProxy *proxy = opaque;
+VirtQueue *vq = proxy-vdev-vq[n];
+
+qemu_set_fd_handler(fd, assign ? virtio_pci_irqfd_read : NULL, NULL, vq);
+return 0;
+}
+
+static int virtio_pci_queuefd(void * opaque, int n, int fd, bool assign)
+{
+VirtIOPCIProxy *proxy = opaque;
+return kvm_set_ioeventfd(proxy-addr + VIRTIO_PCI_QUEUE_NOTIFY,
+ n, fd, assign);
+}
+
+
 static const VirtIOBindings virtio_pci_bindings = {
 .notify = virtio_pci_notify,
 .save_config = virtio_pci_save_config,
@@ -395,6 +419,8 @@ static const VirtIOBindings virtio_pci_bindings = {
 .save_queue = virtio_pci_save_queue,
 .load_queue = virtio_pci_load_queue,
 .get_features = virtio_pci_get_features,
+.set_irqfd = virtio_pci_irqfd,
+.set_queuefd = virtio_pci_queuefd,
 };
 
 static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev,
-- 
1.6.6.rc1.43.gf55cc





[Qemu-devel] [PATCH-RFC 05/13] syborg_virtio: add irqfd/eventfd support

2010-01-11 Thread Michael S. Tsirkin
No idea if it's right .. and syborg does not support vectors so we won't
get performance gains out of it either - so quite likely it's best to
just keep this patch out of qemu.  But just to show what's possible.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/syborg_virtio.c |   25 +
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/hw/syborg_virtio.c b/hw/syborg_virtio.c
index 65239a0..ef6364f 100644
--- a/hw/syborg_virtio.c
+++ b/hw/syborg_virtio.c
@@ -249,9 +249,34 @@ static unsigned syborg_virtio_get_features(void *opaque)
 return proxy-host_features;
 }
 
+static void syborg_virtio_irqfd_read(void *opaque)
+{
+VirtQueue *vq = opaque;
+virtio_irq(vq-vdev, vq);
+}
+
+static int syborg_virtio_irqfd(void * opaque, int n, int fd, bool assign)
+{
+SyborgVirtIOProxy *proxy = opaque;
+VirtQueue *vq = proxy-vdev-vq[n];
+
+qemu_set_fd_handler(fd, assign ? syborg_virtio_irqfd_read : NULL, NULL, 
vq);
+return 0;
+}
+
+static int syborg_virtio_queuefd(void * opaque, int n, int fd, bool assign)
+{
+SyborgVirtIOProxy *proxy = opaque;
+return kvm_set_ioeventfd(proxy-busdev.mmio[0].addr +
+ SYBORG_VIRTIO_QUEUE_NOTIFY  2,
+ n, fd, assign);
+}
+
 static VirtIOBindings syborg_virtio_bindings = {
 .notify = syborg_virtio_update_irq,
 .get_features = syborg_virtio_get_features,
+.set_irqfd = syborg_virtio_irqfd,
+.set_queuefd = syborg_virtio_queuefd,
 };
 
 static int syborg_virtio_init(SyborgVirtIOProxy *proxy, VirtIODevice *vdev)
-- 
1.6.6.rc1.43.gf55cc





[Qemu-devel] [PATCH-RFC 07/13] virtio: move typedef to qemu-common

2010-01-11 Thread Michael S. Tsirkin
make it possible to use type without header include

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/virtio.h   |1 -
 qemu-common.h |1 +
 2 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/virtio.h b/hw/virtio.h
index 193b3f9..6a7e3ec 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -67,7 +67,6 @@ static inline target_phys_addr_t 
vring_align(target_phys_addr_t addr,
 }
 
 typedef struct VirtQueue VirtQueue;
-typedef struct VirtIODevice VirtIODevice;
 
 #define VIRTQUEUE_MAX_SIZE 1024
 
diff --git a/qemu-common.h b/qemu-common.h
index 8630f8c..8bb1f83 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -217,6 +217,7 @@ typedef struct uWireSlave uWireSlave;
 typedef struct I2SCodec I2SCodec;
 typedef struct DeviceState DeviceState;
 typedef struct SSIBus SSIBus;
+typedef struct VirtIODevice VirtIODevice;
 
 /* CPU save/load.  */
 void cpu_save(QEMUFile *f, void *opaque);
-- 
1.6.6.rc1.43.gf55cc





[Qemu-devel] [PATCH-RFC 08/13] net/tap: add interface to get device fd

2010-01-11 Thread Michael S. Tsirkin
Will be used by vhost to attach/detach to backend.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 net/tap.c |7 +++
 net/tap.h |2 ++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index d3492de..7e9ca79 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -269,6 +269,13 @@ static void tap_poll(VLANClientState *nc, bool enable)
 tap_write_poll(s, enable);
 }
 
+int tap_get_fd(VLANClientState *nc)
+{
+TAPState *s = DO_UPCAST(TAPState, nc, nc);
+assert(nc-info-type == NET_CLIENT_TYPE_TAP);
+return s-fd;
+}
+
 /* fd support */
 
 static NetClientInfo net_tap_info = {
diff --git a/net/tap.h b/net/tap.h
index 538a562..a244b28 100644
--- a/net/tap.h
+++ b/net/tap.h
@@ -48,4 +48,6 @@ int tap_probe_vnet_hdr(int fd);
 int tap_probe_has_ufo(int fd);
 void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int 
ufo);
 
+int tap_get_fd(VLANClientState *vc);
+
 #endif /* QEMU_NET_TAP_H */
-- 
1.6.6.rc1.43.gf55cc





[Qemu-devel] [PATCH-RFC 09/13] tap: add vhost/vhostfd options

2010-01-11 Thread Michael S. Tsirkin
Looks like order got mixed up: vhost_net header
is added by a follow-up patch. Will be fixed
in the next revision.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 net.c   |8 
 net/tap.c   |   29 +
 qemu-options.hx |4 +++-
 3 files changed, 40 insertions(+), 1 deletions(-)

diff --git a/net.c b/net.c
index 6ef93e6..b942d03 100644
--- a/net.c
+++ b/net.c
@@ -976,6 +976,14 @@ static struct {
 .name = vnet_hdr,
 .type = QEMU_OPT_BOOL,
 .help = enable the IFF_VNET_HDR flag on the tap interface
+}, {
+.name = vhost,
+.type = QEMU_OPT_BOOL,
+.help = enable vhost-net network accelerator,
+}, {
+.name = vhostfd,
+.type = QEMU_OPT_STRING,
+.help = file descriptor of an already opened vhost net 
device,
 },
 #endif /* _WIN32 */
 { /* end of list */ }
diff --git a/net/tap.c b/net/tap.c
index 7e9ca79..d9f2e41 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -41,6 +41,8 @@
 
 #include net/tap-linux.h
 
+#include hw/vhost_net.h
+
 /* Maximum GSO packet size (64k) plus plenty of room for
  * the ethernet and virtio_net headers
  */
@@ -57,6 +59,7 @@ typedef struct TAPState {
 unsigned int has_vnet_hdr : 1;
 unsigned int using_vnet_hdr : 1;
 unsigned int has_ufo: 1;
+struct vhost_net *vhost_net;
 } TAPState;
 
 static int launch_script(const char *setup_script, const char *ifname, int fd);
@@ -252,6 +255,10 @@ static void tap_cleanup(VLANClientState *nc)
 {
 TAPState *s = DO_UPCAST(TAPState, nc, nc);
 
+if (s-vhost_net) {
+vhost_net_cleanup(s-vhost_net);
+}
+
 qemu_purge_queued_packets(nc);
 
 if (s-down_script[0])
@@ -307,6 +314,7 @@ static TAPState *net_tap_fd_init(VLANState *vlan,
 s-has_ufo = tap_probe_has_ufo(s-fd);
 tap_set_offload(s-nc, 0, 0, 0, 0, 0);
 tap_read_poll(s, 1);
+s-vhost_net = NULL;
 return s;
 }
 
@@ -456,6 +464,27 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char 
*name, VLANState *vlan
 }
 }
 
+if (qemu_opt_get_bool(opts, vhost, 0)) {
+int vhostfd, r;
+if (qemu_opt_get(opts, vhostfd)) {
+r = net_handle_fd_param(mon, qemu_opt_get(opts, vhostfd));
+if (r == -1) {
+return -1;
+}
+vhostfd = r;
+} else {
+vhostfd = -1;
+}
+s-vhost_net = vhost_net_init(s-nc, vhostfd);
+if (!s-vhost_net) {
+qemu_error(vhost-net requested but could not be initialized\n);
+return -1;
+}
+} else if (qemu_opt_get(opts, vhostfd)) {
+qemu_error(vhostfd= is not valid without vhost\n);
+return -1;
+}
+
 if (vlan) {
 vlan-nb_host_devs++;
 }
diff --git a/qemu-options.hx b/qemu-options.hx
index ecd50eb..ef185dd 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -813,7 +813,7 @@ DEF(net, HAS_ARG, QEMU_OPTION_net,
 -net tap[,vlan=n][,name=str],ifname=name\n
 connect the host TAP network interface to VLAN 'n'\n
 #else
--net 
tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,sndbuf=nbytes][,vnet_hdr=on|off]\n
+-net 
tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h]\n
 connect the host TAP network interface to VLAN 'n' and 
use the\n
 network scripts 'file' (default=%s)\n
 and 'dfile' (default=%s);\n
@@ -823,6 +823,8 @@ DEF(net, HAS_ARG, QEMU_OPTION_net,
 default of 'sndbuf=1048576' can be disabled using 
'sndbuf=0'\n
 use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap 
flag; use\n
 vnet_hdr=on to make the lack of IFF_VNET_HDR support an 
error condition\n
+use vhost=on to enable experimental in kernel 
accelerator\n
+use 'vhostfd=h' to connect to an already opened vhost net 
device\n
 #endif
 -net 
socket[,vlan=n][,name=str][,fd=h][,listen=[host]:port][,connect=host:port]\n
 connect the vlan 'n' to another VLAN using a socket 
connection\n
-- 
1.6.6.rc1.43.gf55cc





[Qemu-devel] [PATCH-RFC 10/13] tap: add API to retrieve vhost net header

2010-01-11 Thread Michael S. Tsirkin
same comment as patch 09.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 net/tap.c |7 +++
 net/tap.h |3 +++
 2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index d9f2e41..166cf05 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -491,3 +491,10 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char 
*name, VLANState *vlan
 
 return 0;
 }
+
+struct vhost_net *tap_get_vhost_net(VLANClientState *nc)
+{
+TAPState *s = DO_UPCAST(TAPState, nc, nc);
+assert(nc-info-type == NET_CLIENT_TYPE_TAP);
+return s-vhost_net;
+}
diff --git a/net/tap.h b/net/tap.h
index a244b28..b8cec83 100644
--- a/net/tap.h
+++ b/net/tap.h
@@ -50,4 +50,7 @@ void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, 
int ecn, int ufo);
 
 int tap_get_fd(VLANClientState *vc);
 
+struct vhost_net;
+struct vhost_net *tap_get_vhost_net(VLANClientState *vc);
+
 #endif /* QEMU_NET_TAP_H */
-- 
1.6.6.rc1.43.gf55cc





[Qemu-devel] [PATCH-RFC 12/13] virtio: add status change callback

2010-01-11 Thread Michael S. Tsirkin
vhost net backend needs to be notified when
frontend status changes. Add a callback.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/s390-virtio-bus.c |3 +++
 hw/syborg_virtio.c   |2 ++
 hw/virtio-pci.c  |6 ++
 hw/virtio.h  |1 +
 4 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/hw/s390-virtio-bus.c b/hw/s390-virtio-bus.c
index b663972..1f8a04f 100644
--- a/hw/s390-virtio-bus.c
+++ b/hw/s390-virtio-bus.c
@@ -243,6 +243,9 @@ void s390_virtio_device_update_status(VirtIOS390Device *dev)
 uint32_t features;
 
 vdev-status = ldub_phys(dev-dev_offs + VIRTIO_DEV_OFFS_STATUS);
+if (vdev-set_status) {
+vdev-set_status(vdev);
+}
 
 /* Update guest supported feature bitmap */
 
diff --git a/hw/syborg_virtio.c b/hw/syborg_virtio.c
index ef6364f..264c1f7 100644
--- a/hw/syborg_virtio.c
+++ b/hw/syborg_virtio.c
@@ -152,6 +152,8 @@ static void syborg_virtio_writel(void *opaque, 
target_phys_addr_t offset,
 vdev-status = value  0xFF;
 if (vdev-status == 0)
 virtio_reset(vdev);
+if (vdev-set_status)
+vdev-set_status(vdev);
 break;
 case SYBORG_VIRTIO_INT_ENABLE:
 s-int_enable = value;
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 5a3be6b..fa0a7a0 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -209,6 +209,9 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
 virtio_reset(proxy-vdev);
 msix_unuse_all_vectors(proxy-pci_dev);
 }
+if (vdev-set_status) {
+vdev-set_status(vdev);
+}
 break;
 case VIRTIO_MSI_CONFIG_VECTOR:
 msix_vector_unuse(proxy-pci_dev, vdev-config_vector);
@@ -376,6 +379,9 @@ static void virtio_write_config(PCIDevice *pci_dev, 
uint32_t address,
 if (PCI_COMMAND == address) {
 if (!(val  PCI_COMMAND_MASTER)) {
 proxy-vdev-status = !VIRTIO_CONFIG_S_DRIVER_OK;
+if (proxy-vdev-set_status) {
+proxy-vdev-set_status(proxy-vdev);
+}
 }
 }
 
diff --git a/hw/virtio.h b/hw/virtio.h
index 6a7e3ec..76734ac 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -112,6 +112,7 @@ struct VirtIODevice
 void (*get_config)(VirtIODevice *vdev, uint8_t *config);
 void (*set_config)(VirtIODevice *vdev, const uint8_t *config);
 void (*reset)(VirtIODevice *vdev);
+void (*set_status)(VirtIODevice *vdev);
 VirtQueue *vq;
 const VirtIOBindings *binding;
 void *binding_opaque;
-- 
1.6.6.rc1.43.gf55cc





[Qemu-devel] [PATCH-RFC 13/13] virtio-net: connect to vhost net backend

2010-01-11 Thread Michael S. Tsirkin
start/stop backend on driver start/stop

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/virtio-net.c |   40 
 1 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index c2a389f..99169e1 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -17,6 +17,7 @@
 #include net/tap.h
 #include qemu-timer.h
 #include virtio-net.h
+#include vhost_net.h
 
 #define VIRTIO_NET_VM_VERSION11
 
@@ -47,6 +48,7 @@ typedef struct VirtIONet
 uint8_t nomulti;
 uint8_t nouni;
 uint8_t nobcast;
+uint8_t vhost_started;
 struct {
 int in_use;
 int first_multi;
@@ -114,6 +116,10 @@ static void virtio_net_reset(VirtIODevice *vdev)
 n-nomulti = 0;
 n-nouni = 0;
 n-nobcast = 0;
+if (n-vhost_started) {
+vhost_net_stop(tap_get_vhost_net(n-nic-nc.peer), vdev);
+n-vhost_started = 0;
+}
 
 /* Flush any MAC and VLAN filter table state */
 n-mac_table.in_use = 0;
@@ -820,6 +826,36 @@ static NetClientInfo net_virtio_info = {
 .link_status_changed = virtio_net_set_link_status,
 };
 
+static void virtio_net_set_status(struct VirtIODevice *vdev)
+{
+VirtIONet *n = to_virtio_net(vdev);
+if (!n-nic-nc.peer) {
+return;
+}
+if (n-nic-nc.peer-info-type != NET_CLIENT_TYPE_TAP) {
+return;
+}
+
+if (!tap_get_vhost_net(n-nic-nc.peer)) {
+return;
+}
+if (!!n-vhost_started == !!(vdev-status  VIRTIO_CONFIG_S_DRIVER_OK)) {
+return;
+}
+if (vdev-status  VIRTIO_CONFIG_S_DRIVER_OK) {
+int r = vhost_net_start(tap_get_vhost_net(n-nic-nc.peer), vdev);
+if (r  0) {
+fprintf(stderr, unable to start vhost net: 
+falling back on userspace virtio\n);
+} else {
+n-vhost_started = 1;
+}
+} else {
+vhost_net_stop(tap_get_vhost_net(n-nic-nc.peer), vdev);
+n-vhost_started = 0;
+}
+}
+
 VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf)
 {
 VirtIONet *n;
@@ -835,6 +871,7 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf 
*conf)
 n-vdev.set_features = virtio_net_set_features;
 n-vdev.bad_features = virtio_net_bad_features;
 n-vdev.reset = virtio_net_reset;
+n-vdev.set_status = virtio_net_set_status;
 n-rx_vq = virtio_add_queue(n-vdev, 256, virtio_net_handle_rx);
 n-tx_vq = virtio_add_queue(n-vdev, 256, virtio_net_handle_tx);
 n-ctrl_vq = virtio_add_queue(n-vdev, 64, virtio_net_handle_ctrl);
@@ -864,6 +901,9 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf 
*conf)
 void virtio_net_exit(VirtIODevice *vdev)
 {
 VirtIONet *n = DO_UPCAST(VirtIONet, vdev, vdev);
+if (n-vhost_started) {
+vhost_net_stop(tap_get_vhost_net(n-nic-nc.peer), vdev);
+}
 
 qemu_purge_queued_packets(n-nic-nc);
 
-- 
1.6.6.rc1.43.gf55cc




[Qemu-devel] [PATCH-RFC 11/13] vhost net support

2010-01-11 Thread Michael S. Tsirkin
This adds vhost net support in qemu. Will be tied to tap device and
virtio later. Raw backend is currently missing, will be worked
on/submitted separately.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 Makefile.target |1 +
 hw/vhost.c  |  349 +++
 hw/vhost.h  |   33 +
 hw/vhost_net.c  |  145 +++
 hw/vhost_net.h  |   20 +++
 5 files changed, 548 insertions(+), 0 deletions(-)
 create mode 100644 hw/vhost.c
 create mode 100644 hw/vhost.h
 create mode 100644 hw/vhost_net.c
 create mode 100644 hw/vhost_net.h

diff --git a/Makefile.target b/Makefile.target
index 7c1f30c..61b7148 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -157,6 +157,7 @@ obj-y = vl.o async.o monitor.o pci.o pci_host.o pcie_host.o 
machine.o gdbstub.o
 # virtio has to be here due to weird dependency between PCI and virtio-net.
 # need to fix this properly
 obj-y += virtio-blk.o virtio-balloon.o virtio-net.o virtio-console.o 
virtio-pci.o
+obj-y += vhost_net.o vhost.o
 obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
 LIBS+=-lz
diff --git a/hw/vhost.c b/hw/vhost.c
new file mode 100644
index 000..d23d94c
--- /dev/null
+++ b/hw/vhost.c
@@ -0,0 +1,349 @@
+#include linux/vhost.h
+#include sys/ioctl.h
+#include sys/eventfd.h
+#include vhost.h
+#include hw/hw.h
+/* For range_get_last */
+#include pci.h
+
+static void vhost_dev_unassign_memory(struct vhost_dev *dev,
+ struct vhost_memory *mem,
+ uint64_t start_addr,
+ uint64_t size)
+{
+   int from, to;
+   for (from = 0, to = 0; from  dev-mem-nregions; ++from, ++to) {
+   struct vhost_memory_region *reg = mem-regions + to;
+   uint64_t reglast;
+   uint64_t memlast;
+   uint64_t change;
+
+   /* clone old region */
+   memcpy(reg, dev-mem-regions + from, sizeof *reg);
+
+   /* No overlap is simple */
+   if (!ranges_overlap(reg-guest_phys_addr, reg-memory_size,
+   start_addr, size)) {
+   continue;
+   }
+   reglast = range_get_last(reg-guest_phys_addr, 
reg-memory_size);
+   memlast = range_get_last(start_addr, size);
+
+   /* Remove whole region */
+   if (start_addr = reg-guest_phys_addr  memlast = reglast) {
+   --to;
+   continue;
+   }
+
+   /* Shrink region */
+   if (memlast = reglast) {
+   reg-memory_size = start_addr - reg-guest_phys_addr;
+   continue;
+   }
+
+   /* Shift region */
+   if (start_addr = reg-guest_phys_addr) {
+   change = memlast + 1 - reg-guest_phys_addr;
+   reg-memory_size -= change;
+   reg-guest_phys_addr += change;
+   reg-userspace_addr += change;
+   continue;
+   }
+
+   /* Split region: shrink first part, shift second part. */
+   memcpy(reg + 1, reg, sizeof *reg);
+   reg[0].memory_size = start_addr - reg-guest_phys_addr;
+   change = memlast + 1 - reg-guest_phys_addr;
+   reg[1].memory_size -= change;
+   reg[1].guest_phys_addr += change;
+   reg[1].userspace_addr += change;
+   ++to;
+   }
+   mem-nregions = to;
+}
+
+/* Called after unassign, so no regions overlap the given range. */
+static void vhost_dev_assign_memory(struct vhost_dev *dev,
+   struct vhost_memory *mem,
+   uint64_t start_addr,
+   uint64_t size,
+   uint64_t uaddr)
+{
+   int from, to;
+   struct vhost_memory_region *merged = NULL;
+   for (from = 0, to = 0; from  dev-mem-nregions; ++from, ++to) {
+   struct vhost_memory_region *reg = mem-regions + to;
+   uint64_t prlast, urlast;
+   uint64_t pmlast, umlast;
+   uint64_t s, e, u;
+
+   /* clone old region */
+   memcpy(reg, dev-mem-regions + from, sizeof *reg);
+   prlast = range_get_last(reg-guest_phys_addr, reg-memory_size);
+   pmlast = range_get_last(start_addr, size);
+   urlast = range_get_last(reg-userspace_addr, reg-memory_size);
+   umlast = range_get_last(uaddr, size);
+
+   /* Not an adjecent region - do not merge. */
+   if ((prlast + 1 != start_addr || urlast + 1 != uaddr) 
+   (pmlast + 1 != reg-guest_phys_addr ||
+umlast + 1 != reg-userspace_addr)) {
+   continue;
+   

[Qemu-devel] [PATCH] vnc_refresh: return if vd-timer is NULL

2010-01-11 Thread Stefano Stabellini
Hi all,
calling vnc_update_client in vnc_refresh might have the unlikely side
effect of setting vd-timer = NULL, if the last vnc client disconnected.
In this case we have to return from vnc_refresh without updating the
timer, otherwise we cause a segfault.

Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com

---

diff --git a/vnc.c b/vnc.c
index c54c6e0..58eac73 100644
--- a/vnc.c
+++ b/vnc.c
@@ -2305,6 +2305,10 @@ static void vnc_refresh(void *opaque)
 rects += vnc_update_client(vs, has_dirty);
 vs = vs-next;
 }
+/* vd-timer could be NULL now if the last client disconnected,
+ * in this case don't update the timer */
+if (vd-timer == NULL)
+return;
 
 if (has_dirty  rects) {
 vd-timer_interval /= 2;




Re: [Qemu-devel] [PATCH] dmg: fix -open failure

2010-01-11 Thread Christoph Hellwig
On Mon, Jan 11, 2010 at 03:11:52PM +0100, Kevin Wolf wrote:
 More or less the same hack, just in cleaner? Or trying to fundamentally
 change things? I think you haven't answered yet to what I said in the
 thread of my original hack. I'm quoting it here for convenience:

Well, not dealing with the format list in raw, but rather in block.c

  Ok, if you start talking about layering, we can have a fundamental
  discussion on this topic and why the layering is broken anyway.
  Logically, we have image formats like qcow2, VMDK and raw, and they are
  stored in files, on CD-ROMs or general block devices. From a layering
  perspective, it is wrong to include the latter in the raw format driver
  in the first place.
 
 Actually, I think the differentiation between raw files and host_* is at
 the same level as protocols are. Probably they should be implemented
 very similarly.
 
 Do you think it's possible/worth the effort to try putting things
 straight here?

So what you want is basically:

 - hdev_* and file as protocols in addition to nbd/ftp/http/..
 - a raw image format that can be used ontop of any protocol instead of
   an image format

That would indeed be a much better, not to say actually logical
layering.  The raw image format would be more or less a no-op just
stacking ontop of the protocol.  If we can find a way to implement this
efficiently it might be the way to go.





[Qemu-devel] [PATCH 1/2] block: clean up bdrv_open2 structure a bit

2010-01-11 Thread Christoph Hellwig
Check the whitelist as early as possible instead of continuing the
setup, and move all the error handling code to the end of the
function.

Signed-off-by: Christoph Hellwig h...@lst.de

Index: qemu/block.c
===
--- qemu.orig/block.c   2010-01-11 17:48:27.811004054 +0100
+++ qemu/block.c2010-01-11 17:52:15.578006158 +0100
@@ -428,10 +428,16 @@ int bdrv_open2(BlockDriverState *bs, con
 drv = find_image_format(filename);
 }
 }
+
 if (!drv) {
 ret = -ENOENT;
 goto unlink_and_fail;
 }
+if (use_bdrv_whitelist  !bdrv_is_whitelisted(drv)) {
+ret = -ENOTSUP;
+goto unlink_and_fail;
+}
+
 bs-drv = drv;
 bs-opaque = qemu_mallocz(drv-instance_size);
 
@@ -452,23 +458,17 @@ int bdrv_open2(BlockDriverState *bs, con
 (flags  (BDRV_O_CACHE_MASK|BDRV_O_NATIVE_AIO));
 else
 open_flags = flags  ~(BDRV_O_FILE | BDRV_O_SNAPSHOT);
-if (use_bdrv_whitelist  !bdrv_is_whitelisted(drv))
-ret = -ENOTSUP;
-else
-ret = drv-bdrv_open(bs, filename, open_flags);
+
+ret = drv-bdrv_open(bs, filename, open_flags);
 if ((ret == -EACCES || ret == -EPERM)  !(flags  BDRV_O_FILE)) {
 ret = drv-bdrv_open(bs, filename, open_flags  ~BDRV_O_RDWR);
 bs-read_only = 1;
 }
+
 if (ret  0) {
-qemu_free(bs-opaque);
-bs-opaque = NULL;
-bs-drv = NULL;
-unlink_and_fail:
-if (bs-is_temporary)
-unlink(filename);
-return ret;
+goto free_and_fail;
 }
+
 if (drv-bdrv_getlength) {
 bs-total_sectors = bdrv_getlength(bs)  BDRV_SECTOR_BITS;
 }
@@ -502,6 +502,15 @@ int bdrv_open2(BlockDriverState *bs, con
 bs-change_cb(bs-change_opaque);
 }
 return 0;
+
+free_and_fail:
+qemu_free(bs-opaque);
+bs-opaque = NULL;
+bs-drv = NULL;
+unlink_and_fail:
+if (bs-is_temporary)
+unlink(filename);
+return ret;
 }
 
 void bdrv_close(BlockDriverState *bs)




[Qemu-devel] [PATCH 2/2] block: untangle open flag manipulation in bdrv_open2

2010-01-11 Thread Christoph Hellwig
Untangle the open flag manipulation in bdrv_open2 and document why we
are clearing the various flags in the different flag combinations.


Signed-off-by: Christoph Hellwig h...@lst.de

Index: qemu/block.c
===
--- qemu.orig/block.c   2010-01-11 17:52:54.273003789 +0100
+++ qemu/block.c2010-01-11 18:19:22.485006278 +0100
@@ -356,7 +356,7 @@ int bdrv_open(BlockDriverState *bs, cons
 int bdrv_open2(BlockDriverState *bs, const char *filename, int flags,
BlockDriver *drv)
 {
-int ret, open_flags, try_rw;
+int ret, open_flags;
 char tmp_filename[PATH_MAX];
 char backing_filename[PATH_MAX];
 
@@ -450,19 +450,39 @@ int bdrv_open2(BlockDriverState *bs, con
 if (flags  (BDRV_O_CACHE_WB|BDRV_O_NOCACHE))
 bs-enable_write_cache = 1;
 
-/* Note: for compatibility, we open disk image files as RDWR, and
-   RDONLY as fallback */
-try_rw = !bs-read_only || bs-is_temporary;
-if (!(flags  BDRV_O_FILE))
-open_flags = (try_rw ? BDRV_O_RDWR : 0) |
-(flags  (BDRV_O_CACHE_MASK|BDRV_O_NATIVE_AIO));
-else
-open_flags = flags  ~(BDRV_O_FILE | BDRV_O_SNAPSHOT);
+/*
+ * Always clear these flags as they are not destined for the drivers.
+ */
+open_flags = flags  ~(BDRV_O_FILE | BDRV_O_SNAPSHOT);
+
+if (!(flags  BDRV_O_FILE)) {
+/*
+ * For historical reasons we always try to open drive images as
+ * read-write first and only fall back to read-only later.
+ *
+ * This can be overriden using readonly suboption for the
+ * drive option.
+ */
+if (bs-read_only  !bs-is_temporary) {
+open_flags = ~BDRV_O_RDWR;
+} else {
+open_flags |= BDRV_O_RDWR;
+}
 
-ret = drv-bdrv_open(bs, filename, open_flags);
-if ((ret == -EACCES || ret == -EPERM)  !(flags  BDRV_O_FILE)) {
-ret = drv-bdrv_open(bs, filename, open_flags  ~BDRV_O_RDWR);
-bs-read_only = 1;
+/*
+ * Currently BDRV_O_CREAT is not supported by any image format,
+ * but I'm not sure that's reason enough to always clear it for
+ * the !BDRV_O_FILE case..
+ */
+open_flags = ~(BDRV_O_CREAT);
+
+ret = drv-bdrv_open(bs, filename, open_flags);
+if (ret == -EACCES || ret == -EPERM) {
+ret = drv-bdrv_open(bs, filename, open_flags  ~BDRV_O_RDWR);
+bs-read_only = 1;
+}
+} else {
+ret = drv-bdrv_open(bs, filename, open_flags);
 }
 
 if (ret  0) {




[Qemu-devel] Re: Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Michael S. Tsirkin
On Mon, Jan 11, 2010 at 08:37:10AM -0600, Anthony Liguori wrote:
 On 01/11/2010 08:29 AM, Avi Kivity wrote:
 On 01/11/2010 03:49 PM, Anthony Liguori wrote:
 So instead of disabling notify while requests are active we might want
 to only disable it while we are inside virtio_blk_handle_output.
 Something like the following minimally tested patch:


 I'd suggest that we get even more aggressive and install an idle  
 bottom half that checks the queue for newly submitted requests.  If  
 we keep getting requests submitted before a new one completes, we'll  
 never take an I/O exit.


 That has the downside of bouncing a cache line on unrelated exits.

 The read and write sides of the ring are widely separated in physical  
 memory specifically to avoid cache line bouncing.

   It probably doesn't matter with qemu as it is now, since it will  
 bounce qemu_mutex, but it will hurt with large guests (especially if  
 they have many rings).

 IMO we should get things to work well without riding on unrelated  
 exits, especially as we're trying to reduce those exits.

 A block I/O request can potentially be very, very long lived.  By  
 serializing requests like this, there's a high likelihood that it's  
 going to kill performance with anything capable of processing multiple  
 requests.

 OTOH, if we aggressively poll the ring when we have an opportunity to,  
 there's very little down side to that and it addresses the serialization  
 problem.

 The same approach is probably a good idea for virtio-net.

 With vhost-net you don't see exits.

 The point is, when we've disabled notification, we should poll on the  
 ring for additional requests instead of waiting for one to complete  
 before looking at another one.

 Even with vhost-net, this logic is still applicable although potentially  
 harder to achieve.

 Regards,

 Anthony Liguori


vhost net does this already: it has a mode where it poll when skbs have
left send queue: for tap this is when they have crossed the bridge, for packet
socket this is when they have been transmitted.

-- 
MST




[Qemu-devel] Re: Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Michael S. Tsirkin
On Mon, Jan 11, 2010 at 09:13:23AM -0600, Anthony Liguori wrote:
 On 01/11/2010 08:46 AM, Avi Kivity wrote:
 On 01/11/2010 04:37 PM, Anthony Liguori wrote:
 That has the downside of bouncing a cache line on unrelated exits.


 The read and write sides of the ring are widely separated in physical 
 memory specifically to avoid cache line bouncing.

 I meant, exits on random vcpus will cause the cacheline containing the  
 notification disable flag to bounce around.  As it is, we read it on  
 the vcpu that owns the queue and write it on that vcpu or the I/O 
 thread.

 Bottom halves are always run from the IO thread.
   It probably doesn't matter with qemu as it is now, since it will  
 bounce qemu_mutex, but it will hurt with large guests (especially 
 if they have many rings).

 IMO we should get things to work well without riding on unrelated  
 exits, especially as we're trying to reduce those exits.

 A block I/O request can potentially be very, very long lived.  By  
 serializing requests like this, there's a high likelihood that it's  
 going to kill performance with anything capable of processing  
 multiple requests.

 Right, that's why I suggested having a queue depth at which disabling  
 notification kicks in.  The patch hardcodes this depth to 1, unpatched  
 qemu is infinite, a good value is probably spindle count + VAT.

 That means we would need a user visible option which is quite unfortunate.

 Also, that logic only really makes sense with cache=off.  With  
 cache=writethrough, you can get pathological cases whereas you have an  
 uncached access followed by cached accesses.  In fact, with read-ahead,  
 this is probably not an uncommon scenario.

 OTOH, if we aggressively poll the ring when we have an opportunity  
 to, there's very little down side to that and it addresses the  
 serialization problem.

 But we can't guarantee that we'll get those opportunities, so it  
 doesn't address the problem in a general way.  A guest that doesn't  
 use hpet and only has a single virtio-blk device will not have any  
 reason to exit to qemu.

 We can mitigate this with a timer but honestly, we need to do perf  
 measurements to see.  My feeling is that we will need some more  
 aggressive form of polling than just waiting for IO completion.  I don't  
 think queue depth is enough because it assumes that all requests are  
 equal.  When dealing with cache=off or even just storage with it's own  
 cache, that's simply not the case.

 Regards,

 Anthony Liguori


BTW this is why vhost net uses queue depth in bytes.

-- 
MST




[Qemu-devel] [PATCH] raw-posix: Detect CDROM via ioctl

2010-01-11 Thread Cole Robinson
Current CDROM detection is hardcoded based on source file name.
Make this smarter by attempting a CDROM specific ioctl.

This makes '-cdrom /dev/sr0' succeed with no media present.

Signed-off-by: Cole Robinson crobi...@redhat.com
---
 block/raw-posix.c |   18 +-
 1 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 5a6a22b..3ec58d0 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1140,9 +1140,25 @@ static int cdrom_open(BlockDriverState *bs, const char 
*filename, int flags)
 
 static int cdrom_probe_device(const char *filename)
 {
+int fd, ret;
+
 if (strstart(filename, /dev/cd, NULL))
 return 100;
-return 0;
+
+fd = open(filename, O_RDONLY | O_NONBLOCK);
+if (fd  0) {
+return 0;
+}
+
+/* Attempt to detect via a CDROM specific ioctl */
+if (ioctl(fd, CDROM_DRIVE_STATUS, CDSL_CURRENT)  0  errno == EINVAL) {
+ret = 0;
+} else {
+ret = 100;
+}
+
+close(fd);
+return ret;
 }
 
 static int cdrom_is_inserted(BlockDriverState *bs)
-- 
1.6.6





[Qemu-devel] [PATCH] raw-posix: Detect IDE floppy via ioctl

2010-01-11 Thread Cole Robinson
Current IDE floppy detection is hardcoded based on source file name.
Make this smarter by attempting a floppy specific ioctl.

Signed-off-by: Cole Robinson crobi...@redhat.com
---
 block/raw-posix.c |   19 ++-
 1 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 3ec58d0..5d6b9fb 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1055,9 +1055,26 @@ static int floppy_open(BlockDriverState *bs, const char 
*filename, int flags)
 
 static int floppy_probe_device(const char *filename)
 {
+int fd, ret;
+struct floppy_struct fdparam;
+
 if (strstart(filename, /dev/fd, NULL))
 return 100;
-return 0;
+
+fd = open(filename, O_RDONLY | O_NONBLOCK);
+if (fd  0) {
+return 0;
+}
+
+/* Attempt to detect via a floppy specific ioctl */
+if (ioctl(fd, FDGETPRM, fdparam)  0  errno == EINVAL) {
+ret = 0;
+} else {
+ret = 100;
+}
+
+close(fd);
+return ret;
 }
 
 
-- 
1.6.6





[Qemu-devel] QMP forward compatibility support

2010-01-11 Thread Luiz Capitulino

 Hi.

 We (Markus and I) are working on getting QMP forward compatibility support,
supported. :)

 We have a plan for it and I'd like to ask the CC'ed people to review it.

 Needless to say, but the objective here is to add new commands, arguments,
async messages and protocol features w/o breaking existing clients.

General Plan


1. QMP should describe itself, ie. it should dump all accepted commands,
their replies and arguments, async messages and protocol features. All in
JSON format

2. Protocol features are advertised by the (already existent) capabilities
array, but are _disabled_ by default. The exception is async messages,
which are _enabled_ by default

3. We should add command(s) to enable/disable protocol features

4. Proper feature negotiation is done in pause mode. That's, clients
interested in enabling new protocol features should start QEMU in
pause mode and enable the features they are interested in using

Work to do
--

1. Markus is already working on getting QMP to describe itself, we
should get a status/proposal soon. This is the most important part,
as we may want to change query-commands output, query-events is
missing and hopefully we can generate user docs from this

2. We need to add commands to enable/disable features. Not high priority,
as the only existing feature is async messages which are enabled by
default




[Qemu-devel] Re: [RFC] API change for pci_set_word and related functions (was Re: [PATCH] eepro100: Fix initial value for PCI_STATUS)

2010-01-11 Thread Michael S. Tsirkin
On Thu, Jan 07, 2010 at 04:07:26PM +0100, Stefan Weil wrote:
 Michael S. Tsirkin schrieb:
  On Thu, Jan 07, 2010 at 12:15:25PM +0100, Stefan Weil wrote:
  ...
  ---
  hw/eepro100.c | 4 +---
  1 files changed, 1 insertions(+), 3 deletions(-)
 
  diff --git a/hw/eepro100.c b/hw/eepro100.c
  index 336ca49..a21c984 100644
  --- a/hw/eepro100.c
  +++ b/hw/eepro100.c
  @@ -420,10 +420,8 @@ static void pci_reset(EEPRO100State * s)
  /* TODO: this is the default, do not override. */
  PCI_CONFIG_16(PCI_COMMAND, 0x);
  /* PCI Status */
  - /* TODO: this seems to make no sense. */
  /* TODO: Value at RST# should be 0. */
 
  So this second todo can go too. I've removed it in my tree.
 
  - PCI_CONFIG_16(PCI_STATUS,
  - PCI_STATUS_REC_MASTER_ABORT | PCI_STATUS_SIG_TARGET_ABORT);
  + PCI_CONFIG_16(PCI_STATUS, PCI_STATUS_DEVSEL_MEDIUM |
  PCI_STATUS_FAST_BACK);
  /* PCI Revision ID */
  PCI_CONFIG_8(PCI_REVISION_ID, 0x08);
 
  BTW if you are not afraid of churn, there's no reason
  for PCI_CONFIG_8 and friends anymore, because pci.h
  has much nicer pci_set_byte etc.
 
 Hello Michael,
 
 I already noticed pci_set_byte, pci_set_word, pci_set_long and
 the corresponding pci_get_xxx functions and thought about using them.
 
 I did not start it because I want to suggest a different API
 for use in PCI device emulations:
 
 instead of
 
 pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_CAP_LIST);
 
 or
 
 pci_set_word(pci_conf[PCI_STATUS], PCI_STATUS_CAP_LIST);
 
 it would be better to call
 
 pci_set_word(pci_conf, PCI_STATUS, PCI_STATUS_CAP_LIST);
 
 
 The prototypes would look like this:
 
 /* Set PCI config value. */
 void pci_set_word(PCIDevice *s, uint8_t offset, uint16_t val);
 
 /* Set PCI cmask value. */
 void pci_set_cmask_word(PCIDevice *s, uint8_t offset, uint16_t val);
 
 /* Set PCI wmask value. */
 void pci_set_wmask_word(PCIDevice *s, uint8_t offset, uint16_t val);
 
 What are the advantages?
 * strict type checking (the old API takes any uint8_t *)

So IMO it's easier to make mistakes with your proposed API because if
you confuse offset and value compiler does not complain.  More
importantly, if you want to pass some other uint8_t pointer to e.g.
pci_set_word, you can *and nothing unexpected will happen*
it will set the word to the given value. So the current
API is more type safe than what you propose.

 * many other pci_* functions also have a first parameter of type PCIDevice

So what would make sense IMO is higer level abstraction,
for example similar to what we have with capabilities
and msix, I think we could have something like this
for e.g. power management.

For low-level bit tweaking, the advantages of current API is that same
thing can be used to set wmask, cmask, config itself, and whatever else
we will come up with.

 * calls look nicer (at least in my opinion)

What I value is the fact that it's obvious which
data is changed.

 * strict range checking (offset is limited to 0...255, additional
   assertions possible - the old API is unsafe because it just takes
   a pointer)

I don't think we want to add return status, so there wouldn't
be a benefit to range checking as we can't act on it.
Anyway, it's very unusual to use anything but a constant
as an offset, so range errors are very uncommon.

 The functions are inline, so the resulting code won't differ.
 
 Instead of _byte, _word and _long I personally prefer something
 like _8, _16, _32 because _word and _long need interpretation.
 But this is only a matter of taste - the API change is more important.
 
 
 Regards,
 
 Stefan Weil



-- 
MST




[Qemu-devel] Re: [PATCH] eepro100: Update ROM file support

2010-01-11 Thread Michael S. Tsirkin
On Thu, Jan 07, 2010 at 05:13:30PM +0100, Stefan Weil wrote:
 Use new way to associate ROM files to devices.
 
 Currently, there is only a ROM file for i82559er
 included in QEMU, so the patch does not add
 .romfile for the other devices.
 
 When flexible mode is fixed in eepro100, adding
 more ROM files will be possible. It should be
 possible to create them from pxe-i82559er.bin,
 because etherboot uses the same driver for all
 eepro100 devices (only PCI ids differ).
 
 Maybe it is even possible to create a single
 pxe-i8255x.bin which supports all eepro100 devices
 (not supported with current etherboot).
 
 Signed-off-by: Stefan Weil w...@mail.berlios.de

Gerd, could you ack this patch please?
Thanks!

 ---
  hw/eepro100.c |   11 +--
  1 files changed, 1 insertions(+), 10 deletions(-)
 
 diff --git a/hw/eepro100.c b/hw/eepro100.c
 index a21c984..b33dbb8 100644
 --- a/hw/eepro100.c
 +++ b/hw/eepro100.c
 @@ -40,7 +40,6 @@
  #include stddef.h /* offsetof */
  #include stdbool.h
  #include hw.h
 -#include loader.h /* rom_add_option */
  #include pci.h
  #include net.h
  #include eeprom93xx.h
 @@ -1894,15 +1893,6 @@ static int nic_init(PCIDevice *pci_dev, uint32_t 
 device)
  s-vmstate-name = s-nic-nc.model;
  vmstate_register(-1, s-vmstate, s);
  
 -if (!pci_dev-qdev.hotplugged) {
 -static int loaded = 0;
 -if (!loaded) {
 -char fname[32];
 -snprintf(fname, sizeof(fname), pxe-%s.bin, s-nic-nc.model);
 -rom_add_option(fname);
 -loaded = 1;
 -}
 -}
  return 0;
  }
  
 @@ -2062,6 +2052,7 @@ static PCIDeviceInfo eepro100_info[] = {
  .qdev.size = sizeof(EEPRO100State),
  .init  = pci_i82559er_init,
  .exit  = pci_nic_uninit,
 +.romfile   = pxe-i82559er.bin,
  .qdev.props = (Property[]) {
  DEFINE_NIC_PROPERTIES(EEPRO100State, conf),
  DEFINE_PROP_END_OF_LIST(),
 -- 
 1.6.5




Re: [Qemu-devel] QMP forward compatibility support

2010-01-11 Thread Anthony Liguori

On 01/11/2010 12:34 PM, Luiz Capitulino wrote:

  Hi.

  We (Markus and I) are working on getting QMP forward compatibility support,
supported. :)

  We have a plan for it and I'd like to ask the CC'ed people to review it.

  Needless to say, but the objective here is to add new commands, arguments,
async messages and protocol features w/o breaking existing clients.

General Plan


1. QMP should describe itself, ie. it should dump all accepted commands,
their replies and arguments, async messages and protocol features. All in
JSON format

2. Protocol features are advertised by the (already existent) capabilities
array, but are _disabled_ by default. The exception is async messages,
which are _enabled_ by default
   


Any reason to have an exception like this?


3. We should add command(s) to enable/disable protocol features

4. Proper feature negotiation is done in pause mode. That's, clients
interested in enabling new protocol features should start QEMU in
pause mode and enable the features they are interested in using
   


Why does this matter?

We should be careful to support connecting to a VM long after it's been 
started so any requirement like this is likely to cause trouble.


Regards,

Anthony Liguori




Re: [Qemu-devel] [PATCH] dmg: fix -open failure

2010-01-11 Thread malc
On Mon, 11 Jan 2010, Christoph Hellwig wrote:

 On Mon, Jan 11, 2010 at 02:56:16PM +0100, Kevin Wolf wrote:
  Are you going to propose a cleaner patch? I have currently some other
  bugs to do first, but I was certainly planning to do so. However, I'll
  happily leave it to you if you have the time right now.
 
 I'm looking into doing it in the generic block layer, yes.
 
  For dmg, I'm not even sure if it's worth fixing. Does anybody use this
  driver? But I guess it's good enough for another lowest priority task on
  my todo list. Right after vvfat or so...
 
 Hehe.  All these odd image formats are extremly low in my todo list
 either.  I'd be really interested if there are any users around at all.

I use vvfat.

-- 
mailto:av1...@comtv.ru




[Qemu-devel] virtio: Add memory statistics reporting to the balloon driver (V6)

2010-01-11 Thread Adam Litke
After some good discussion, V6 of this patch integrates well with the new QMP
support.  When the monitor is in QMP mode, the query-balloon command triggers a
stats refresh request to the guest.  This request is asynchronous.  If the
guest does not respond then nothing further happens.  When stats are updated, a
BALLOON monitor event is raised and the data element will contain the memory
statistics.

For the user monitor, a timer has been added to prevent monitor hangs with
unresponsive guests.  When the timer fires, the most recently collected stats
are returned along with an additional entry 'age' which indicates the number of
host_clock milliseconds that have passed since the stats were collected.

This method for dealing with asynchronous commands may prove useful for other
existing or future commands.  In that case, we may want to consider
incorporating this into the actual monitor API.

Changes since V5:
 - Asynchronous query-balloon mode for QMP
 - Add timeout to prevent hanging the user monitor in synchronous mode

Changes since V4:
 - Virtio spec updated: 
http://ozlabs.org/~rusty/virtio-spec/virtio-spec-0.8.2.pdf
 - Guest-side Linux implementation applied by Rusty
 - Start using the QObject infrastructure
 - All endian conversions done in the host
 - Report stats that reference a quantity of memory in bytes

Changes since V3:
 - Increase stat field size to 64 bits
 - Report all sizes in kb (not pages)
 - Drop anon_pages stat

Changes since V2:
 - Use a virtqueue for communication instead of the device config space

Changes since V1:
 - In the monitor, print all stats on one line with less abbreviated names
 - Coding style changes

When using ballooning to manage overcommitted memory on a host, a system for
guests to communicate their memory usage to the host can provide information
that will minimize the impact of ballooning on the guests.  The current method
employs a daemon running in each guest that communicates memory statistics to a
host daemon at a specified time interval.  The host daemon aggregates this
information and inflates and/or deflates balloons according to the level of
host memory pressure.  This approach is effective but overly complex since a
daemon must be installed inside each guest and coordinated to communicate with
the host.  A simpler approach is to collect memory statistics in the virtio
balloon driver and communicate them directly to the hypervisor.

Signed-off-by: Adam Litke a...@us.ibm.com
To: Anthony Liguori aligu...@us.ibm.com
Cc: Avi Kivity a...@redhat.com
Cc: Luiz Capitulino lcapitul...@redhat.com
Cc: Jamie Lokier ja...@shareable.org
Cc: qemu-devel@nongnu.org

diff --git a/balloon.h b/balloon.h
index 60b4a5d..7e29028 100644
--- a/balloon.h
+++ b/balloon.h
@@ -16,12 +16,22 @@
 
 #include cpu-defs.h
 
-typedef ram_addr_t (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
+/* Timeout for synchronous stats requests (in seconds) */
+#define QEMU_BALLOON_SYNC_TIMEOUT 5
+
+typedef enum {
+QEMU_BALLOON_MODE_NONE = 0,  /* No stats request is active */
+QEMU_BALLOON_MODE_SYNC = 1,  /* Synchronous stats request */
+QEMU_BALLOON_MODE_ASYNC = 2, /* Asynchronous stats request */
+} balloon_mode_t;
+
+typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target,
+balloon_mode_t mode);
 
 void qemu_add_balloon_handler(QEMUBalloonEvent *func, void *opaque);
 
-void qemu_balloon(ram_addr_t target);
+int qemu_balloon(ram_addr_t target);
 
-ram_addr_t qemu_balloon_status(void);
+int qemu_balloon_status(balloon_mode_t mode);
 
 #endif
diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
index cfd3b41..bf67f4d 100644
--- a/hw/virtio-balloon.c
+++ b/hw/virtio-balloon.c
@@ -19,6 +19,10 @@
 #include balloon.h
 #include virtio-balloon.h
 #include kvm.h
+#include monitor.h
+#include qlist.h
+#include qint.h
+#include qstring.h
 
 #if defined(__linux__)
 #include sys/mman.h
@@ -27,9 +31,15 @@
 typedef struct VirtIOBalloon
 {
 VirtIODevice vdev;
-VirtQueue *ivq, *dvq;
+VirtQueue *ivq, *dvq, *svq;
 uint32_t num_pages;
 uint32_t actual;
+uint64_t stats[VIRTIO_BALLOON_S_NR];
+VirtQueueElement stats_vq_elem;
+size_t stats_vq_offset;
+balloon_mode_t stats_request_mode;
+QEMUTimer *stats_timer;
+uint64_t stats_updated;
 } VirtIOBalloon;
 
 static VirtIOBalloon *to_virtio_balloon(VirtIODevice *vdev)
@@ -46,6 +56,50 @@ static void balloon_page(void *addr, int deflate)
 #endif
 }
 
+/*
+ * reset_stats - Mark all items in the stats array as unset
+ *
+ * This function needs to be called at device intialization and before
+ * before updating to a set of newly-generated stats.  This will ensure that no
+ * stale values stick around in case the guest reports a subset of the 
supported
+ * statistics.
+ */
+static inline void reset_stats(VirtIOBalloon *dev)
+{
+int i;
+for (i = 0; i  VIRTIO_BALLOON_S_NR; dev-stats[i++] = -1);
+dev-stats_updated = qemu_get_clock(host_clock);
+}
+
+static void 

Re: [Qemu-devel] [PATCHv2] Add KVM paravirt cpuid leaf

2010-01-11 Thread Anthony Liguori

On 01/07/2010 10:24 AM, Gleb Natapov wrote:

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 4084503..6a841de 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -17,6 +17,7 @@
  #includesys/mman.h

  #includelinux/kvm.h
+#includelinux/kvm_para.h


This breaks the build on a default F12 install because while kvm.h is 
present, kvm_para.h is not.  This is a hard one to fix.


We can default the kvm search path to /lib/modules/$(uname -r)/build, we 
can fix the glibc headers and live with it, or we can pull in the kvm 
headers into qemu.


Avi/Marcelo/Jan, any thoughts from the qemu-kvm side?

Regards,

Anthony Liguori




[Qemu-devel] Registering key events android (qemu based) emulator

2010-01-11 Thread Anwar Ghani
Hi All

guys I am a bit new to this stuff. I want to call a method after user presses a 
combination of keys lets say alt+s or whatever. How can I do it using which 
event handler.

Best Regards



Anwar Ghani

+31 647 344 773

--- On Mon, 1/11/10, Anthony Liguori anth...@codemonkey.ws wrote:

From: Anthony Liguori anth...@codemonkey.ws
Subject: Re: [Qemu-devel] [PATCHv2] Add KVM paravirt cpuid leaf
To: Gleb Natapov g...@redhat.com
Cc: Marcelo Tosatti mtosa...@redhat.com, Jan Kiszka jan.kis...@web.de, 
qemu-devel@nongnu.org, kvm-devel k...@vger.kernel.org, Avi Kivity 
a...@redhat.com
Date: Monday, January 11, 2010, 7:18 PM

On 01/07/2010 10:24 AM, Gleb Natapov wrote:
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 4084503..6a841de 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -17,6 +17,7 @@
   #includesys/mman.h
 
   #includelinux/kvm.h
 +#includelinux/kvm_para.h

This breaks the build on a default F12 install because while kvm.h is present, 
kvm_para.h is not.  This is a hard one to fix.

We can default the kvm search path to /lib/modules/$(uname -r)/build, we can 
fix the glibc headers and live with it, or we can pull in the kvm headers into 
qemu.

Avi/Marcelo/Jan, any thoughts from the qemu-kvm side?

Regards,

Anthony Liguori





  

[Qemu-devel] Re: [RFC] API change for pci_set_word and related functions

2010-01-11 Thread Stefan Weil
Michael S. Tsirkin schrieb:
 On Thu, Jan 07, 2010 at 04:07:26PM +0100, Stefan Weil wrote:
 Michael S. Tsirkin schrieb:
 On Thu, Jan 07, 2010 at 12:15:25PM +0100, Stefan Weil wrote:
 ...
 ---
 hw/eepro100.c | 4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

 diff --git a/hw/eepro100.c b/hw/eepro100.c
 index 336ca49..a21c984 100644
 --- a/hw/eepro100.c
 +++ b/hw/eepro100.c
 @@ -420,10 +420,8 @@ static void pci_reset(EEPRO100State * s)
 /* TODO: this is the default, do not override. */
 PCI_CONFIG_16(PCI_COMMAND, 0x);
 /* PCI Status */
 - /* TODO: this seems to make no sense. */
 /* TODO: Value at RST# should be 0. */
 So this second todo can go too. I've removed it in my tree.

 - PCI_CONFIG_16(PCI_STATUS,
 - PCI_STATUS_REC_MASTER_ABORT | PCI_STATUS_SIG_TARGET_ABORT);
 + PCI_CONFIG_16(PCI_STATUS, PCI_STATUS_DEVSEL_MEDIUM |
 PCI_STATUS_FAST_BACK);
 /* PCI Revision ID */
 PCI_CONFIG_8(PCI_REVISION_ID, 0x08);
 BTW if you are not afraid of churn, there's no reason
 for PCI_CONFIG_8 and friends anymore, because pci.h
 has much nicer pci_set_byte etc.
 Hello Michael,

 I already noticed pci_set_byte, pci_set_word, pci_set_long and
 the corresponding pci_get_xxx functions and thought about using them.

 I did not start it because I want to suggest a different API
 for use in PCI device emulations:

 instead of

 pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_CAP_LIST);

 or

 pci_set_word(pci_conf[PCI_STATUS], PCI_STATUS_CAP_LIST);

 it would be better to call

 pci_set_word(pci_conf, PCI_STATUS, PCI_STATUS_CAP_LIST);


 The prototypes would look like this:

 /* Set PCI config value. */
 void pci_set_word(PCIDevice *s, uint8_t offset, uint16_t val);

 /* Set PCI cmask value. */
 void pci_set_cmask_word(PCIDevice *s, uint8_t offset, uint16_t val);

 /* Set PCI wmask value. */
 void pci_set_wmask_word(PCIDevice *s, uint8_t offset, uint16_t val);

 What are the advantages?
 * strict type checking (the old API takes any uint8_t *)

 So IMO it's easier to make mistakes with your proposed API because if
 you confuse offset and value compiler does not complain. More
 importantly, if you want to pass some other uint8_t pointer to e.g.
 pci_set_word, you can *and nothing unexpected will happen*
 it will set the word to the given value. So the current
 API is more type safe than what you propose.


No. The current API takes any uint8_t pointer to read or write
a value. This is not safe.

The proposed API only takes a PCIDevice pointer
and reads or writes only configuration (or cmask or
wmask) values. Yes, you can take offset for value.
If you are lucky and value is an uint16_t or uint32_t,
your compiler will complain. And even if your compiler
does not complain, it is wrong but still safe, because
the code will only access the PCI configuration data.


 * many other pci_* functions also have a first parameter of type PCIDevice

 So what would make sense IMO is higer level abstraction,
 for example similar to what we have with capabilities
 and msix, I think we could have something like this
 for e.g. power management.

 For low-level bit tweaking, the advantages of current API is that same
 thing can be used to set wmask, cmask, config itself, and whatever else
 we will come up with.

The low level API can be used where low level is
adequate: in pci.c for example.

To implement emulated PCI devices, a more robust API
would be better. Think of the number of devices which
are still missing, think of people who want to write
a new PCI device emulation for QEMU without being
a QEMU expert.


 * calls look nicer (at least in my opinion)

 What I value is the fact that it's obvious which
 data is changed.

Here there is no difference between current and
proposed API:

old: pci_set_word(pci_conf[PCI_STATUS], PCI_STATUS_CAP_LIST);
new: pci_set_word(pci_conf, PCI_STATUS, PCI_STATUS_CAP_LIST);

Every function call hides what happens. If you really wanted
to see which data is changed, you would have to write

*(uint16_t *)pci_conf[PCI_STATUS] = cpu_to_le16(PCI_STATUS_CAP_LIST);


 * strict range checking (offset is limited to 0...255, additional
   assertions possible - the old API is unsafe because it just takes
   a pointer)

 I don't think we want to add return status, so there wouldn't
 be a benefit to range checking as we can't act on it.
 Anyway, it's very unusual to use anything but a constant
 as an offset, so range errors are very uncommon.

There is an implicit range checking in the proposed
API because the offset is uint8_t, so it cannot
exceed the range which is valid for configuration
offsets.

A more elaborated check could require that
configuration byte values are only addressed
using pci_set_byte (not pci_set_long)
and raise a fatal runtime error otherwise.

Runtime checks without return values
are well established in QEMU's code,
and they are very useful for code writers.


 The functions are inline, so the resulting code won't differ.

 Instead of _byte, _word and _long I personally prefer something
 

Re: [Qemu-devel] [PATCH 2/3] use pkg-config for sdl when cross compiling

2010-01-11 Thread Anthony Liguori

On 01/07/2010 08:42 AM, Paolo Bonzini wrote:

Together with the previous patch this enables using the prefixed
pkg-config, thus picking up the correct flags for SDL.  Since
pkg-config has an awful lot of dependencies, I still use sdl-config
when not cross-compiling since some people may only have the latter.
   


Wouldn't it make more sense to try to use pkg-config always and if it's 
not available, fall back on sdl-config?


That means we don't have a separate code path for cross compiling which 
means it's less likely to break.


Regards,

Anthony Liguori




[Qemu-devel] Re: [RFC] API change for pci_set_word and related functions

2010-01-11 Thread Michael S. Tsirkin
On Mon, Jan 11, 2010 at 08:38:53PM +0100, Stefan Weil wrote:
 Michael S. Tsirkin schrieb:
  On Thu, Jan 07, 2010 at 04:07:26PM +0100, Stefan Weil wrote:
  Michael S. Tsirkin schrieb:
  On Thu, Jan 07, 2010 at 12:15:25PM +0100, Stefan Weil wrote:
  ...
  ---
  hw/eepro100.c | 4 +---
  1 files changed, 1 insertions(+), 3 deletions(-)
 
  diff --git a/hw/eepro100.c b/hw/eepro100.c
  index 336ca49..a21c984 100644
  --- a/hw/eepro100.c
  +++ b/hw/eepro100.c
  @@ -420,10 +420,8 @@ static void pci_reset(EEPRO100State * s)
  /* TODO: this is the default, do not override. */
  PCI_CONFIG_16(PCI_COMMAND, 0x);
  /* PCI Status */
  - /* TODO: this seems to make no sense. */
  /* TODO: Value at RST# should be 0. */
  So this second todo can go too. I've removed it in my tree.
 
  - PCI_CONFIG_16(PCI_STATUS,
  - PCI_STATUS_REC_MASTER_ABORT | PCI_STATUS_SIG_TARGET_ABORT);
  + PCI_CONFIG_16(PCI_STATUS, PCI_STATUS_DEVSEL_MEDIUM |
  PCI_STATUS_FAST_BACK);
  /* PCI Revision ID */
  PCI_CONFIG_8(PCI_REVISION_ID, 0x08);
  BTW if you are not afraid of churn, there's no reason
  for PCI_CONFIG_8 and friends anymore, because pci.h
  has much nicer pci_set_byte etc.
  Hello Michael,
 
  I already noticed pci_set_byte, pci_set_word, pci_set_long and
  the corresponding pci_get_xxx functions and thought about using them.
 
  I did not start it because I want to suggest a different API
  for use in PCI device emulations:
 
  instead of
 
  pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_CAP_LIST);
 
  or
 
  pci_set_word(pci_conf[PCI_STATUS], PCI_STATUS_CAP_LIST);
 
  it would be better to call
 
  pci_set_word(pci_conf, PCI_STATUS, PCI_STATUS_CAP_LIST);
 
 
  The prototypes would look like this:
 
  /* Set PCI config value. */
  void pci_set_word(PCIDevice *s, uint8_t offset, uint16_t val);
 
  /* Set PCI cmask value. */
  void pci_set_cmask_word(PCIDevice *s, uint8_t offset, uint16_t val);
 
  /* Set PCI wmask value. */
  void pci_set_wmask_word(PCIDevice *s, uint8_t offset, uint16_t val);
 
  What are the advantages?
  * strict type checking (the old API takes any uint8_t *)
 
  So IMO it's easier to make mistakes with your proposed API because if
  you confuse offset and value compiler does not complain. More
  importantly, if you want to pass some other uint8_t pointer to e.g.
  pci_set_word, you can *and nothing unexpected will happen*
  it will set the word to the given value. So the current
  API is more type safe than what you propose.
 
 
 No. The current API takes any uint8_t pointer to read or write
 a value. This is not safe.

Why isn't it?

 The proposed API only takes a PCIDevice pointer
 and reads or writes only configuration (or cmask or
 wmask) values. Yes, you can take offset for value.
 If you are lucky and value is an uint16_t or uint32_t,
 your compiler will complain.

Such a compiler will also complain over most of qemu code.

 And even if your compiler
 does not complain, it is wrong but still safe, because
 the code will only access the PCI configuration data.
 

Correct and safe beats wrong and safe every time.

  * many other pci_* functions also have a first parameter of type PCIDevice
 
  So what would make sense IMO is higer level abstraction,
  for example similar to what we have with capabilities
  and msix, I think we could have something like this
  for e.g. power management.
 
  For low-level bit tweaking, the advantages of current API is that same
  thing can be used to set wmask, cmask, config itself, and whatever else
  we will come up with.
 
 The low level API can be used where low level is
 adequate: in pci.c for example.
 
 To implement emulated PCI devices, a more robust API
 would be better. Think of the number of devices which
 are still missing, think of people who want to write
 a new PCI device emulation for QEMU without being
 a QEMU expert.
 
 
  * calls look nicer (at least in my opinion)
 
  What I value is the fact that it's obvious which
  data is changed.
 
 Here there is no difference between current and
 proposed API:
 
 old: pci_set_word(pci_conf[PCI_STATUS], PCI_STATUS_CAP_LIST);
 new: pci_set_word(pci_conf, PCI_STATUS, PCI_STATUS_CAP_LIST);
 
 Every function call hides what happens. If you really wanted
 to see which data is changed, you would have to write
 
 *(uint16_t *)pci_conf[PCI_STATUS] = cpu_to_le16(PCI_STATUS_CAP_LIST);

That's what we used to have, and it's not all bad, but very verbose and
ugly.

 
  * strict range checking (offset is limited to 0...255, additional
assertions possible - the old API is unsafe because it just takes
a pointer)
 
  I don't think we want to add return status, so there wouldn't
  be a benefit to range checking as we can't act on it.
  Anyway, it's very unusual to use anything but a constant
  as an offset, so range errors are very uncommon.
 
 There is an implicit range checking in the proposed
 API because the offset is uint8_t, so it cannot
 exceed the range which is valid for configuration
 offsets.

Oh, btw, this is 

Re: [Qemu-devel] [PATCH] virtio-pci: thinko fix

2010-01-11 Thread Anthony Liguori

On 01/11/2010 09:57 AM, Michael S. Tsirkin wrote:

Since patch ed757e140c0ada220f213036e4497315d24ca8bct, virtio will
sometimes clear all status registers on bus master disable, which loses
information such as VIRTIO_CONFIG_S_FAILED bit.  This is a result of
a patch being misapplied: code uses !  instead of ~ for bit
operations as in Yan's original patch.  This obviously does not make
sense.
   


Actually, the original patch used '!'.  It didn't carry a Signed-off-by 
and when it was reposted with a SoB, this was changed to '~'.  There was 
no indication though that the contents of the patch was different.


Moral of the story is, when you make a change to the patch, add a (v2) 
and state in the commit message what change was made.


Thanks for catching this.

Regards,

Anthony Liguori




Re: [Qemu-devel] QMP forward compatibility support

2010-01-11 Thread Daniel P. Berrange
On Mon, Jan 11, 2010 at 12:57:15PM -0600, Anthony Liguori wrote:
 On 01/11/2010 12:34 PM, Luiz Capitulino wrote:
   Hi.
 
   We (Markus and I) are working on getting QMP forward compatibility 
   support,
 supported. :)
 
   We have a plan for it and I'd like to ask the CC'ed people to review it.
 
   Needless to say, but the objective here is to add new commands, 
   arguments,
 async messages and protocol features w/o breaking existing clients.
 
 General Plan
 
 
 1. QMP should describe itself, ie. it should dump all accepted commands,
 their replies and arguments, async messages and protocol features. All in
 JSON format
 
 2. Protocol features are advertised by the (already existent) capabilities
 array, but are _disabled_ by default. The exception is async messages,
 which are _enabled_ by default
 
 Any reason to have an exception like this?

Personally I'd be fine with async events being off by default too.
There's no point in me receiving events I'm not able to handle,
so might as well only give me events I actually request.

 3. We should add command(s) to enable/disable protocol features
 
 4. Proper feature negotiation is done in pause mode. That's, clients
 interested in enabling new protocol features should start QEMU in
 pause mode and enable the features they are interested in using

 
 Why does this matter?
 
 We should be careful to support connecting to a VM long after it's been 
 started so any requirement like this is likely to cause trouble.

Indeed, this won't work for libvirt. If you stop the libvirtd daemon
while a guest is running, and then start the libvirtd daemon again, it
will need to re-negotiate features while the guest is running.

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|




[Qemu-devel] [PATCH 2/5] tcg-sparc: Implement add2, sub2, mulu2.

2010-01-11 Thread Richard Henderson
Add missing 32-bit double-word support opcodes.

Signed-off-by: Richard Henderson r...@twiddle.net
---
 tcg/sparc/tcg-target.c |   27 +++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 067e26e..6934580 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -215,6 +215,7 @@ static inline int tcg_target_const_match(tcg_target_long 
val,
 #define BA (INSN_OP(0) | INSN_COND(COND_A, 0) | INSN_OP2(0x2))
 
 #define ARITH_ADD  (INSN_OP(2) | INSN_OP3(0x00))
+#define ARITH_ADDCC (INSN_OP(2) | INSN_OP3(0x10))
 #define ARITH_AND  (INSN_OP(2) | INSN_OP3(0x01))
 #define ARITH_OR   (INSN_OP(2) | INSN_OP3(0x02))
 #define ARITH_ORCC (INSN_OP(2) | INSN_OP3(0x12))
@@ -238,6 +239,7 @@ static inline int tcg_target_const_match(tcg_target_long 
val,
 #define SHIFT_SRLX (INSN_OP(2) | INSN_OP3(0x26) | (1  12))
 #define SHIFT_SRAX (INSN_OP(2) | INSN_OP3(0x27) | (1  12))
 
+#define RDY(INSN_OP(2) | INSN_OP3(0x28) | INSN_RS1(0))
 #define WRY(INSN_OP(2) | INSN_OP3(0x30))
 #define JMPL   (INSN_OP(2) | INSN_OP3(0x38))
 #define SAVE   (INSN_OP(2) | INSN_OP3(0x3c))
@@ -410,6 +412,11 @@ static inline void tcg_out_sety(TCGContext *s, 
tcg_target_long val)
 fprintf(stderr, unimplemented sety %ld\n, (long)val);
 }
 
+static inline void tcg_out_rdy(TCGContext *s, int rd)
+{
+tcg_out32(s, RDY | INSN_RD(rd));
+}
+
 static inline void tcg_out_addi(TCGContext *s, int reg, tcg_target_long val)
 {
 if (val != 0) {
@@ -1132,6 +1139,23 @@ static inline void tcg_out_op(TCGContext *s, int opc, 
const TCGArg *args,
 args[2], const_args[2],
 args[3], const_args[3], args[5]);
 break;
+case INDEX_op_add2_i32:
+tcg_out_arithc(s, args[0], args[2], args[4], const_args[4],
+   ARITH_ADDCC);
+tcg_out_arithc(s, args[1], args[3], args[5], const_args[5],
+   ARITH_ADDX);
+break;
+case INDEX_op_sub2_i32:
+tcg_out_arithc(s, args[0], args[2], args[4], const_args[4],
+   ARITH_SUBCC);
+tcg_out_arithc(s, args[1], args[3], args[5], const_args[5],
+   ARITH_SUBX);
+break;
+case INDEX_op_mulu2_i32:
+tcg_out_arithc(s, args[0], args[2], args[3], const_args[3],
+   ARITH_UMUL);
+tcg_out_rdy(s, args[1]);
+break;
 #endif
 
 case INDEX_op_qemu_ld8u:
@@ -1250,6 +1274,9 @@ static const TCGTargetOpDef sparc_op_defs[] = {
 { INDEX_op_brcond_i32, { r, rJ } },
 #if TCG_TARGET_REG_BITS == 32
 { INDEX_op_brcond2_i32, { r, r, rJ, rJ } },
+{ INDEX_op_add2_i32, { r, r, r, r, rJ, rJ } },
+{ INDEX_op_sub2_i32, { r, r, r, r, rJ, rJ } },
+{ INDEX_op_mulu2_i32, { r, r, r, rJ } },
 #endif
 
 { INDEX_op_qemu_ld8u, { r, L } },
-- 
1.6.5.2





[Qemu-devel] [PATCH 3/5] tcg-sparc: Do not remove %o[012] from 'r' constraint.

2010-01-11 Thread Richard Henderson
Only 'L' constraint needs that.

Signed-off-by: Richard Henderson r...@twiddle.net
---
 tcg/sparc/tcg-target.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 6934580..8675fce 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -143,6 +143,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, 
const char **pct_str)
 ct_str = *pct_str;
 switch (ct_str[0]) {
 case 'r':
+ct-ct |= TCG_CT_REG;
+tcg_regset_set32(ct-u.regs, 0, 0x);
+break;
 case 'L': /* qemu_ld/st constraint */
 ct-ct |= TCG_CT_REG;
 tcg_regset_set32(ct-u.regs, 0, 0x);
-- 
1.6.5.2





[Qemu-devel] [PATCH 4/5] tcg-sparc: Implement division properly.

2010-01-11 Thread Richard Henderson
The {div,divu}2 opcodes are intended for systems for which the
division instruction produces both quotient and remainder.  Sparc
is not such a system.  Indeed, the remainder must be computed as

  quot = a / b
  rem = a - (quot * b)

Split out a tcg_out_div32 function that properly initializes Y
with the extension of the input to 64-bits.  Discard the code
that used the 64-bit DIVX on sparc9/sparcv8plus without extending
the inputs to 64-bits.  Implement remainders in terms of division
followed by multiplication.

Signed-off-by: Richard Henderson r...@twiddle.net
---
 tcg/sparc/tcg-target.c |   82 ++-
 tcg/sparc/tcg-target.h |3 ++
 2 files changed, 55 insertions(+), 30 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 8675fce..1bef2fb 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -243,7 +243,7 @@ static inline int tcg_target_const_match(tcg_target_long 
val,
 #define SHIFT_SRAX (INSN_OP(2) | INSN_OP3(0x27) | (1  12))
 
 #define RDY(INSN_OP(2) | INSN_OP3(0x28) | INSN_RS1(0))
-#define WRY(INSN_OP(2) | INSN_OP3(0x30))
+#define WRY(INSN_OP(2) | INSN_OP3(0x30) | INSN_RD(0))
 #define JMPL   (INSN_OP(2) | INSN_OP3(0x38))
 #define SAVE   (INSN_OP(2) | INSN_OP3(0x3c))
 #define RESTORE(INSN_OP(2) | INSN_OP3(0x3d))
@@ -407,12 +407,9 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, 
int arg,
 tcg_out_ldst(s, arg, arg1, arg2, STX);
 }
 
-static inline void tcg_out_sety(TCGContext *s, tcg_target_long val)
+static inline void tcg_out_sety(TCGContext *s, int rs)
 {
-if (val == 0 || val == -1)
-tcg_out32(s, WRY | INSN_IMM13(val));
-else
-fprintf(stderr, unimplemented sety %ld\n, (long)val);
+tcg_out32(s, WRY | INSN_RS1(TCG_REG_G0) | INSN_RS2(rs));
 }
 
 static inline void tcg_out_rdy(TCGContext *s, int rd)
@@ -444,6 +441,21 @@ static inline void tcg_out_andi(TCGContext *s, int reg, 
tcg_target_long val)
 }
 }
 
+static void tcg_out_div32(TCGContext *s, int rd, int rs1,
+  int val2, int val2const, int uns)
+{
+/* Load Y with the sign/zero extension of RS1 to 64-bits.  */
+if (uns) {
+tcg_out_sety(s, TCG_REG_G0);
+} else {
+tcg_out_arith(s, TCG_REG_I5, rs1, 31, SHIFT_SRA);
+tcg_out_sety(s, TCG_REG_I5);
+}
+
+tcg_out_arithc(s, rd, rs1, val2, val2const,
+   uns ? ARITH_UDIV : ARITH_SDIV);
+}
+
 static inline void tcg_out_nop(TCGContext *s)
 {
 tcg_out_sethi(s, TCG_REG_G0, 0);
@@ -1113,24 +1125,22 @@ static inline void tcg_out_op(TCGContext *s, int opc, 
const TCGArg *args,
 case INDEX_op_mul_i32:
 c = ARITH_UMUL;
 goto gen_arith;
-case INDEX_op_div2_i32:
-#if defined(__sparc_v9__) || defined(__sparc_v8plus__)
-c = ARITH_SDIVX;
-goto gen_arith;
-#else
-tcg_out_sety(s, 0);
-c = ARITH_SDIV;
-goto gen_arith;
-#endif
-case INDEX_op_divu2_i32:
-#if defined(__sparc_v9__) || defined(__sparc_v8plus__)
-c = ARITH_UDIVX;
-goto gen_arith;
-#else
-tcg_out_sety(s, 0);
-c = ARITH_UDIV;
-goto gen_arith;
-#endif
+
+case INDEX_op_div_i32:
+tcg_out_div32(s, args[0], args[1], args[2], const_args[2], 0);
+break;
+case INDEX_op_divu_i32:
+tcg_out_div32(s, args[0], args[1], args[2], const_args[2], 1);
+break;
+
+case INDEX_op_rem_i32:
+case INDEX_op_remu_i32:
+tcg_out_div32(s, TCG_REG_I5, args[1], args[2], const_args[2],
+  opc == INDEX_op_remu_i32);
+tcg_out_arithc(s, TCG_REG_I5, TCG_REG_I5, args[2], const_args[2],
+   ARITH_UMUL);
+tcg_out_arith(s, args[0], args[1], TCG_REG_I5, ARITH_SUB);
+break;
 
 case INDEX_op_brcond_i32:
 tcg_out_brcond_i32(s, args[2], args[0], args[1], const_args[1],
@@ -1214,12 +1224,20 @@ static inline void tcg_out_op(TCGContext *s, int opc, 
const TCGArg *args,
 case INDEX_op_mul_i64:
 c = ARITH_MULX;
 goto gen_arith;
-case INDEX_op_div2_i64:
+case INDEX_op_div_i64:
 c = ARITH_SDIVX;
 goto gen_arith;
-case INDEX_op_divu2_i64:
+case INDEX_op_divu_i64:
 c = ARITH_UDIVX;
 goto gen_arith;
+case INDEX_op_rem_i64:
+case INDEX_op_remu_i64:
+tcg_out_arithc(s, TCG_REG_I5, args[1], args[2], const_args[2],
+   opc == INDEX_op_rem_i64 ? ARITH_SDIVX : ARITH_UDIVX);
+tcg_out_arithc(s, TCG_REG_I5, TCG_REG_I5, args[2], const_args[2],
+   ARITH_MULX);
+tcg_out_arith(s, args[0], args[1], TCG_REG_I5, ARITH_SUB);
+break;
 
 case INDEX_op_brcond_i64:
 tcg_out_brcond_i64(s, args[2], args[0], args[1], const_args[1],
@@ -1263,8 +1281,10 @@ static const TCGTargetOpDef sparc_op_defs[] = {
 
 { INDEX_op_add_i32, { r, r, rJ } },
 { INDEX_op_mul_i32, { r, r, rJ } },
-  

[Qemu-devel] [PATCH 5/5] tcg-sparc: Implement ext32[su]_i64

2010-01-11 Thread Richard Henderson
The 32-bit right-shift instructions is defined to extend the shifted
output to 64-bits.  A shift count of zero therefore is a simple
extension without actually shifting.

Signed-off-by: Richard Henderson r...@twiddle.net
---
 tcg/sparc/tcg-target.c |   16 
 tcg/sparc/tcg-target.h |5 +
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 1bef2fb..bc28f43 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -1238,6 +1238,20 @@ static inline void tcg_out_op(TCGContext *s, int opc, 
const TCGArg *args,
ARITH_MULX);
 tcg_out_arith(s, args[0], args[1], TCG_REG_I5, ARITH_SUB);
 break;
+case INDEX_op_ext32s_i64:
+if (const_args[1]) {
+tcg_out_movi(s, TCG_TYPE_I64, args[0], (int32_t)args[1]);
+} else {
+tcg_out_arithi(s, args[0], args[1], 0, SHIFT_SRA);
+}
+break;
+case INDEX_op_ext32u_i64:
+if (const_args[1]) {
+tcg_out_movi_imm32(s, args[0], args[1]);
+} else {
+tcg_out_arithi(s, args[0], args[1], 0, SHIFT_SRL);
+}
+break;
 
 case INDEX_op_brcond_i64:
 tcg_out_brcond_i64(s, args[2], args[0], args[1], const_args[1],
@@ -1344,6 +1358,8 @@ static const TCGTargetOpDef sparc_op_defs[] = {
 { INDEX_op_shl_i64, { r, r, rJ } },
 { INDEX_op_shr_i64, { r, r, rJ } },
 { INDEX_op_sar_i64, { r, r, rJ } },
+{ INDEX_op_ext32s_i64, { r, ri } },
+{ INDEX_op_ext32u_i64, { r, ri } },
 
 { INDEX_op_brcond_i64, { r, rJ } },
 #endif
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index e00707b..d27ed5a 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -91,6 +91,11 @@ enum {
 #define TCG_TARGET_HAS_div_i32
 #define TCG_TARGET_HAS_div_i64
 
+#if TCG_TARGET_REG_BITS == 64
+#define TCG_TARGET_HAS_ext32s_i64
+#define TCG_TARGET_HAS_ext32u_i64
+#endif
+
 //#define TCG_TARGET_HAS_bswap32_i32
 //#define TCG_TARGET_HAS_bswap64_i64
 //#define TCG_TARGET_HAS_neg_i32
-- 
1.6.5.2





Re: [Qemu-devel] [PATCH] [tftp] Handle TFTP ERROR from client

2010-01-11 Thread Anthony Liguori

On 01/07/2010 11:01 AM, Thomas Horsten wrote:

If a PXE client only wants to find out the size of a file, it will
open the file and then abort the transfer by sending a TFTP ERROR packet.

The ERROR packet should cause qemu to terminate the session. If not,
the sessions will soon run out and cause timeouts in the client.

Also, if a TFTP session already exists with same IP/UDP port, it
should be terminated when a new RRQ is received, instead of creating a
duplicate (which will never be used).

A patch for gPXE to send the ERROR packet is also being submitted to
gPXE. Together they resolve slowness/hanging when booting pxegrub from
qemu's internal TFTP server. The patch from Milan Plzik to return
after sending OACK is also required for a complete fix.

Signed-off-by: Thomas Horstentho...@horsten.com
Signed-off-by: Milan Plzikmilan.pl...@gmail.com
   


Applied.  Thanks.

Regards,

Anthony Liguori

---
  slirp/tftp.c |   23 +++
  1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/slirp/tftp.c b/slirp/tftp.c
index db869fc..96c0e0c 100644
--- a/slirp/tftp.c
+++ b/slirp/tftp.c
@@ -264,6 +264,12 @@ static void tftp_handle_rrq(Slirp *slirp, struct tftp_t 
*tp, int pktlen)
size_t prefix_len;
char *req_fname;

+  /* check if a session already exists and if so terminate it */
+  s = tftp_session_find(slirp, tp);
+  if (s= 0) {
+tftp_session_terminate(slirp-tftp_sessions[s]);
+  }
+
s = tftp_session_allocate(slirp, tp);

if (s  0) {
@@ -386,6 +392,19 @@ static void tftp_handle_ack(Slirp *slirp, struct tftp_t 
*tp, int pktlen)
}
  }

+static void tftp_handle_error(Slirp *slirp, struct tftp_t *tp, int pktlen)
+{
+  int s;
+
+  s = tftp_session_find(slirp, tp);
+
+  if (s  0) {
+return;
+  }
+
+  tftp_session_terminate(slirp-tftp_sessions[s]);
+}
+
  void tftp_input(struct mbuf *m)
  {
struct tftp_t *tp = (struct tftp_t *)m-m_data;
@@ -398,5 +417,9 @@ void tftp_input(struct mbuf *m)
case TFTP_ACK:
  tftp_handle_ack(m-slirp, tp, m-m_len);
  break;
+
+  case TFTP_ERROR:
+tftp_handle_error(m-slirp, tp, m-m_len);
+break;
}
  }
   






Re: [Qemu-devel] [PATCH] pc-bios: Update README (SeaBIOS)

2010-01-11 Thread Anthony Liguori

On 01/07/2010 12:27 PM, Stefan Weil wrote:

The PC BIOS no longer comes from Bochs.
This patch updates the related entry.

V2 - Modify SeaBIOS description and URL
  (Thanks to Gleb Natapov for the hint).

Signed-off-by: Stefan Weilw...@mail.berlios.de
   


Applied.  Thanks.

Regards,

Anthony Liguori

---
  pc-bios/README |3 ++-
  1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/pc-bios/README b/pc-bios/README
index 1b7a666..7dfa459 100644
--- a/pc-bios/README
+++ b/pc-bios/README
@@ -1,4 +1,5 @@
-- The PC BIOS comes from the Bochs project (http://bochs.sourceforge.net/).
+- SeaBIOS (bios.bin) is the successor of pc bios.
+  See http://www.seabios.org/ for more information.

  - The VGA BIOS and the Cirrus VGA BIOS come from the LGPL VGA bios
project (http://www.nongnu.org/vgabios/).
   






Re: [Qemu-devel] [PATCHv7 1/3] qdev: add bit property type

2010-01-11 Thread Anthony Liguori

On 01/10/2010 05:52 AM, Michael S. Tsirkin wrote:

This adds bit property type, which is a boolean stored in a 32 bit
integer field, with legal values on and off.  Will be used by virtio for
feature bits.

Signed-off-by: Michael S. Tsirkinm...@redhat.com
Acked-by: Gerd Hoffmannkra...@redhat.com
   


Applied.  Thanks.

Regards,

Anthony Liguori

---
  hw/qdev-properties.c |   62 -
  hw/qdev.h|   11 +
  2 files changed, 66 insertions(+), 7 deletions(-)

diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
index 217ddc0..9e123ae 100644
--- a/hw/qdev-properties.c
+++ b/hw/qdev-properties.c
@@ -9,6 +9,59 @@ void *qdev_get_prop_ptr(DeviceState *dev, Property *prop)
  return ptr;
  }

+static uint32_t qdev_get_prop_mask(Property *prop)
+{
+assert(prop-info-type == PROP_TYPE_BIT);
+return 0x1  prop-bitnr;
+}
+
+static void bit_prop_set(DeviceState *dev, Property *props, bool val)
+{
+uint32_t *p = qdev_get_prop_ptr(dev, props);
+uint32_t mask = qdev_get_prop_mask(props);
+if (val)
+*p |= ~mask;
+else
+*p= ~mask;
+}
+
+static void qdev_prop_cpy(DeviceState *dev, Property *props, void *src)
+{
+if (props-info-type == PROP_TYPE_BIT) {
+bool *defval = src;
+bit_prop_set(dev, props, *defval);
+} else {
+char *dst = qdev_get_prop_ptr(dev, props);
+memcpy(dst, src, props-info-size);
+}
+}
+
+/* Bit */
+static int parse_bit(DeviceState *dev, Property *prop, const char *str)
+{
+if (!strncasecmp(str, on, 2))
+bit_prop_set(dev, prop, true);
+else if (!strncasecmp(str, off, 3))
+bit_prop_set(dev, prop, false);
+else
+return -1;
+return 0;
+}
+
+static int print_bit(DeviceState *dev, Property *prop, char *dest, size_t len)
+{
+uint8_t *p = qdev_get_prop_ptr(dev, prop);
+return snprintf(dest, len, (*p  qdev_get_prop_mask(prop)) ? on : off);
+}
+
+PropertyInfo qdev_prop_bit = {
+.name  = on/off,
+.type  = PROP_TYPE_BIT,
+.size  = sizeof(uint32_t),
+.parse = parse_bit,
+.print = print_bit,
+};
+
  /* --- 8bit integer --- */

  static int parse_uint8(DeviceState *dev, Property *prop, const char *str)
@@ -511,7 +564,6 @@ int qdev_prop_parse(DeviceState *dev, const char *name, 
const char *value)
  void qdev_prop_set(DeviceState *dev, const char *name, void *src, enum 
PropertyType type)
  {
  Property *prop;
-void *dst;

  prop = qdev_prop_find(dev, name);
  if (!prop) {
@@ -524,8 +576,7 @@ void qdev_prop_set(DeviceState *dev, const char *name, void 
*src, enum PropertyT
  __FUNCTION__, dev-info-name, name);
  abort();
  }
-dst = qdev_get_prop_ptr(dev, prop);
-memcpy(dst, src, prop-info-size);
+qdev_prop_cpy(dev, prop, src);
  }

  void qdev_prop_set_uint8(DeviceState *dev, const char *name, uint8_t value)
@@ -585,14 +636,11 @@ void qdev_prop_set_ptr(DeviceState *dev, const char 
*name, void *value)

  void qdev_prop_set_defaults(DeviceState *dev, Property *props)
  {
-char *dst;
-
  if (!props)
  return;
  while (props-name) {
  if (props-defval) {
-dst = qdev_get_prop_ptr(dev, props);
-memcpy(dst, props-defval, props-info-size);
+qdev_prop_cpy(dev, props, props-defval);
  }
  props++;
  }
diff --git a/hw/qdev.h b/hw/qdev.h
index bbcdba1..07b9603 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -64,6 +64,7 @@ struct Property {
  const char   *name;
  PropertyInfo *info;
  int  offset;
+int  bitnr;
  void *defval;
  };

@@ -82,6 +83,7 @@ enum PropertyType {
  PROP_TYPE_NETDEV,
  PROP_TYPE_VLAN,
  PROP_TYPE_PTR,
+PROP_TYPE_BIT,
  };

  struct PropertyInfo {
@@ -173,6 +175,7 @@ void do_device_del(Monitor *mon, const QDict *qdict);

  /*** qdev-properties.c ***/

+extern PropertyInfo qdev_prop_bit;
  extern PropertyInfo qdev_prop_uint8;
  extern PropertyInfo qdev_prop_uint16;
  extern PropertyInfo qdev_prop_uint32;
@@ -202,6 +205,14 @@ extern PropertyInfo qdev_prop_pci_devfn;
  + type_check(_type,typeof_field(_state, _field)),   \
  .defval= (_type[]) { _defval }, \
  }
+#define DEFINE_PROP_BIT(_name, _state, _field, _bit, _defval) {  \
+.name  = (_name),\
+.info  =(qdev_prop_bit),   \
+.bitnr= (_bit),  \
+.offset= offsetof(_state, _field)\
++ type_check(uint32_t,typeof_field(_state, _field)), \
+.defval= (bool[]) { (_defval) }, \
+}

  #define DEFINE_PROP_UINT8(_n, _s, _f, _d)   \
  DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_uint8, uint8_t)
   






Re: [Qemu-devel] [PATCH] dmg: fix -open failure

2010-01-11 Thread Anthony Liguori

On 01/11/2010 07:06 AM, Christoph Hellwig wrote:

Currently the dmg image format driver simply opens the images as raw
if any kind of failure happens.  This is contrarty to the behaviour
of all other image formats which just return an error and let the
block core deal with it.

Signed-off-by: Christoph Hellwigh...@lst.de
   


Applied.  Thanks.

Regards,

Anthony Liguori

Index: qemu/block/dmg.c
===
--- qemu.orig/block/dmg.c   2010-01-11 14:00:25.945021645 +0100
+++ qemu/block/dmg.c2010-01-11 14:03:03.006036707 +0100
@@ -90,24 +90,21 @@ static int dmg_open(BlockDriverState *bs

  /* read offset of info blocks */
  if(lseek(s-fd,-0x1d8,SEEK_END)0) {
-dmg_close:
-   close(s-fd);
-   /* open raw instead */
-   bs-drv=bdrv_find_format(raw);
-   return bs-drv-bdrv_open(bs, filename, flags);
+goto fail;
  }
+
  info_begin=read_off(s-fd);
  if(info_begin==0)
-   goto dmg_close;
+   goto fail;
  if(lseek(s-fd,info_begin,SEEK_SET)0)
-   goto dmg_close;
+   goto fail;
  if(read_uint32(s-fd)!=0x100)
-   goto dmg_close;
+   goto fail;
  if((count = read_uint32(s-fd))==0)
-   goto dmg_close;
+   goto fail;
  info_end = info_begin+count;
  if(lseek(s-fd,0xf8,SEEK_CUR)0)
-   goto dmg_close;
+   goto fail;

  /* read offsets */
  last_in_offset = last_out_offset = 0;
@@ -116,14 +113,14 @@ dmg_close:

count = read_uint32(s-fd);
if(count==0)
-   goto dmg_close;
+   goto fail;
type = read_uint32(s-fd);
if(type!=0x6d697368 || count244)
lseek(s-fd,count-4,SEEK_CUR);
else {
int new_size, chunk_count;
if(lseek(s-fd,200,SEEK_CUR)0)
-   goto dmg_close;
+   goto fail;
chunk_count = (count-204)/40;
new_size = sizeof(uint64_t) * (s-n_chunks + chunk_count);
s-types = qemu_realloc(s-types, new_size/2);
@@ -142,7 +139,7 @@ dmg_close:
chunk_count--;
i--;
if(lseek(s-fd,36,SEEK_CUR)0)
-   goto dmg_close;
+   goto fail;
continue;
}
read_uint32(s-fd);
@@ -163,11 +160,14 @@ dmg_close:
  s-compressed_chunk = qemu_malloc(max_compressed_size+1);
  s-uncompressed_chunk = qemu_malloc(512*max_sectors_per_chunk);
  if(inflateInit(s-zstream) != Z_OK)
-   goto dmg_close;
+   goto fail;

  s-current_chunk = s-n_chunks;

  return 0;
+fail:
+close(s-fd);
+return -1;
  }

  static inline int is_sector_in_chunk(BDRVDMGState* s,



   






Re: [Qemu-devel] [PATCH-RFC 00/13] vhost-net: preview

2010-01-11 Thread Daniel P. Berrange
On Mon, Jan 11, 2010 at 07:16:42PM +0200, Michael S. Tsirkin wrote:
 Here's an untested patchset with vhost support for upstream qemu.  Note
 that you should not expect performance gains from vhost unless in-kernel
 irqchip is enabled (which is not in upstream qemu now).  Since adding
 vhost involves quite a bit of infrastructure, I thought it makes sense
 to send an RFC already, so that interested parties can review it.  In
 particular, command line and help text need to be finalized early to so
 that management can start looking on supporting the feature. This patch
 has all bits besides migration filled in. Also missing is packet socket
 backend: another team is now working on this.

Can you clarify a question about migration for me. Is it possible to
live migrate a guest configured with tap + bridge on one machine over
to another where it is launched with vhost + bridge, and vice-versa.
In other words does this vhost support have any guest visible impact
that would cause migraiton compatability problems, or it is purely a
host side optimization like vnet_hdr was ?

Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|




Re: [Qemu-devel] [PATCH 1/8] virtio: Remove duplicate macro definition for max. virtqueues, bump up the max

2010-01-11 Thread Anthony Liguori

On 01/07/2010 01:31 AM, Amit Shah wrote:

VIRTIO_PCI_QUEUE_MAX is redefined in hw/virtio.c. Let's just keep it in
hw/virtio.h.

Also, bump up the value of the maximum allowed virtqueues to 64. This is
in preparation to allow multiple ports per virtio-console device.

Signed-off-by: Amit Shahamit.s...@redhat.com
   


These will need rebasing against master.  It gets pretty ugly because of 
the rename.  Wait for me to do a review (today or tomorrow) before hand 
though.  I'll explicitly reply and Ack/Nack.


Regards,

Anthony Liguori




Re: [Qemu-devel] [PATCH 4/5] tcg-sparc: Implement division properly.

2010-01-11 Thread Richard Henderson

On 01/11/2010 11:09 AM, Richard Henderson wrote:

+/* Load Y with the sign/zero extension of RS1 to 64-bits.  */
+if (uns) {
+tcg_out_sety(s, TCG_REG_G0);
+} else {
+tcg_out_arith(s, TCG_REG_I5, rs1, 31, SHIFT_SRA);


Bah.  tcg_out_arithi.

Programming by code inspection only sucks.


r~




Re: [Qemu-devel] [PATCH-RFC 00/13] vhost-net: preview

2010-01-11 Thread Michael S. Tsirkin
On Mon, Jan 11, 2010 at 08:09:10PM +, Daniel P. Berrange wrote:
 On Mon, Jan 11, 2010 at 07:16:42PM +0200, Michael S. Tsirkin wrote:
  Here's an untested patchset with vhost support for upstream qemu.  Note
  that you should not expect performance gains from vhost unless in-kernel
  irqchip is enabled (which is not in upstream qemu now).  Since adding
  vhost involves quite a bit of infrastructure, I thought it makes sense
  to send an RFC already, so that interested parties can review it.  In
  particular, command line and help text need to be finalized early to so
  that management can start looking on supporting the feature. This patch
  has all bits besides migration filled in. Also missing is packet socket
  backend: another team is now working on this.
 
 Can you clarify a question about migration for me. Is it possible to
 live migrate a guest configured with tap + bridge on one machine over
 to another where it is launched with vhost + bridge, and vice-versa.

Long term, this will be possible without any tweaks.  Currently vhost
does not support mergeable buffers feature which thus needs to be
disabled on both sides for migration to work.  Work is underway to add
this support.

 In other words does this vhost support have any guest visible impact
 that would cause migraiton compatability problems, or it is purely a
 host side optimization like vnet_hdr was ?
 
 Regards,
 Daniel

vnet_hdr has guest visible impact.

 -- 
 |: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
 |: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
 |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
 |: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|




[Qemu-devel] [PATCH] pci: Add missing 'const' in argument to pci_get_xxx

2010-01-11 Thread Stefan Weil
pci_get_byte, pci_get_word, pci_get_long and pci_get_quad
all take a const uint8_t pointer, because they only read
the configuration data.

Their prototypes should reflect this fact.

Signed-off-by: Stefan Weil w...@mail.berlios.de
---
 hw/pci.h |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/pci.h b/hw/pci.h
index 9b5ae97..025eb9f 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -245,7 +245,7 @@ pci_set_byte(uint8_t *config, uint8_t val)
 }
 
 static inline uint8_t
-pci_get_byte(uint8_t *config)
+pci_get_byte(const uint8_t *config)
 {
 return *config;
 }
@@ -257,9 +257,9 @@ pci_set_word(uint8_t *config, uint16_t val)
 }
 
 static inline uint16_t
-pci_get_word(uint8_t *config)
+pci_get_word(const uint8_t *config)
 {
-return le16_to_cpupu((uint16_t *)config);
+return le16_to_cpupu((const uint16_t *)config);
 }
 
 static inline void
@@ -269,9 +269,9 @@ pci_set_long(uint8_t *config, uint32_t val)
 }
 
 static inline uint32_t
-pci_get_long(uint8_t *config)
+pci_get_long(const uint8_t *config)
 {
-return le32_to_cpupu((uint32_t *)config);
+return le32_to_cpupu((const uint32_t *)config);
 }
 
 static inline void
@@ -281,9 +281,9 @@ pci_set_quad(uint8_t *config, uint64_t val)
 }
 
 static inline uint64_t
-pci_get_quad(uint8_t *config)
+pci_get_quad(const uint8_t *config)
 {
-return le64_to_cpup((uint64_t *)config);
+return le64_to_cpup((const uint64_t *)config);
 }
 
 static inline void
-- 
1.6.5





[Qemu-devel] Re: [RFC] API change for pci_set_word and related functions

2010-01-11 Thread Stefan Weil
Michael S. Tsirkin schrieb:
 On Mon, Jan 11, 2010 at 08:38:53PM +0100, Stefan Weil wrote:
 Michael S. Tsirkin schrieb:
 On Thu, Jan 07, 2010 at 04:07:26PM +0100, Stefan Weil wrote:
 Michael S. Tsirkin schrieb:
 On Thu, Jan 07, 2010 at 12:15:25PM +0100, Stefan Weil wrote:
 ...
 - PCI_CONFIG_16(PCI_STATUS,
 - PCI_STATUS_REC_MASTER_ABORT | PCI_STATUS_SIG_TARGET_ABORT);
 + PCI_CONFIG_16(PCI_STATUS, PCI_STATUS_DEVSEL_MEDIUM |
 PCI_STATUS_FAST_BACK);
 /* PCI Revision ID */
 PCI_CONFIG_8(PCI_REVISION_ID, 0x08);
 BTW if you are not afraid of churn, there's no reason
 for PCI_CONFIG_8 and friends anymore, because pci.h
 has much nicer pci_set_byte etc.
 Hello Michael,

 I already noticed pci_set_byte, pci_set_word, pci_set_long and
 the corresponding pci_get_xxx functions and thought about using them.

 I did not start it because I want to suggest a different API
 for use in PCI device emulations:

 instead of

 pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_CAP_LIST);

 or

 pci_set_word(pci_conf[PCI_STATUS], PCI_STATUS_CAP_LIST);

 it would be better to call

 pci_set_word(pci_conf, PCI_STATUS, PCI_STATUS_CAP_LIST);


 The prototypes would look like this:

 /* Set PCI config value. */
 void pci_set_word(PCIDevice *s, uint8_t offset, uint16_t val);

 /* Set PCI cmask value. */
 void pci_set_cmask_word(PCIDevice *s, uint8_t offset, uint16_t val);

 /* Set PCI wmask value. */
 void pci_set_wmask_word(PCIDevice *s, uint8_t offset, uint16_t val);

 What are the advantages?
 * strict type checking (the old API takes any uint8_t *)
 So IMO it's easier to make mistakes with your proposed API because if
 you confuse offset and value compiler does not complain. More
 importantly, if you want to pass some other uint8_t pointer to e.g.
 pci_set_word, you can *and nothing unexpected will happen*
 it will set the word to the given value. So the current
 API is more type safe than what you propose.

 No. The current API takes any uint8_t pointer to read or write
 a value. This is not safe.

 Why isn't it?

It is not safe, because it allows programmers to write silly code
like these examples:

pci_set_word(gen_opc_cc_op[PCI_STATUS], PCI_STATUS_CAP_LIST);
pci_set_word(pci_conf[UINT32_MAX], PCI_STATUS_CAP_LIST);

for (i = 0; i  sizeof(pci_conf); i++) {
pci_set_long(pci_conf[i], 0);
}

All three will result in runtime failures which can be very
difficult to detect.


 The proposed API only takes a PCIDevice pointer
 and reads or writes only configuration (or cmask or
 wmask) values. Yes, you can take offset for value.
 If you are lucky and value is an uint16_t or uint32_t,
 your compiler will complain.

 Such a compiler will also complain over most of qemu code.

Yes, but it's good to see that QEMU's code is
improving.

By the way, such a compiler is gcc when called with
-Wconversion, so it is easy to see how much code
is affected.


 And even if your compiler
 does not complain, it is wrong but still safe, because
 the code will only access the PCI configuration data.


 Correct and safe beats wrong and safe every time.

See example above.


 * many other pci_* functions also have a first parameter of type
 PCIDevice
 So what would make sense IMO is higer level abstraction,
 for example similar to what we have with capabilities
 and msix, I think we could have something like this
 for e.g. power management.

 For low-level bit tweaking, the advantages of current API is that same
 thing can be used to set wmask, cmask, config itself, and whatever else
 we will come up with.
 The low level API can be used where low level is
 adequate: in pci.c for example.

 To implement emulated PCI devices, a more robust API
 would be better. Think of the number of devices which
 are still missing, think of people who want to write
 a new PCI device emulation for QEMU without being
 a QEMU expert.

 * calls look nicer (at least in my opinion)
 What I value is the fact that it's obvious which
 data is changed.
 Here there is no difference between current and
 proposed API:

 old: pci_set_word(pci_conf[PCI_STATUS], PCI_STATUS_CAP_LIST);
 new: pci_set_word(pci_conf, PCI_STATUS, PCI_STATUS_CAP_LIST);

 Every function call hides what happens. If you really wanted
 to see which data is changed, you would have to write

 *(uint16_t *)pci_conf[PCI_STATUS] = cpu_to_le16(PCI_STATUS_CAP_LIST);

 That's what we used to have, and it's not all bad, but very verbose and
 ugly.

Yes. I also prefer a function API.


 * strict range checking (offset is limited to 0...255, additional
 assertions possible - the old API is unsafe because it just takes
 a pointer)
 I don't think we want to add return status, so there wouldn't
 be a benefit to range checking as we can't act on it.
 Anyway, it's very unusual to use anything but a constant
 as an offset, so range errors are very uncommon.
 There is an implicit range checking in the proposed
 API because the offset is uint8_t, so it cannot
 exceed the range which is valid for configuration
 offsets.

 Oh, btw, this is 

[Qemu-devel] Re: [RFC] API change for pci_set_word and related functions

2010-01-11 Thread Michael S. Tsirkin
On Mon, Jan 11, 2010 at 09:18:51PM +0100, Stefan Weil wrote:
 Michael S. Tsirkin schrieb:
  On Mon, Jan 11, 2010 at 08:38:53PM +0100, Stefan Weil wrote:
  Michael S. Tsirkin schrieb:
  On Thu, Jan 07, 2010 at 04:07:26PM +0100, Stefan Weil wrote:
  Michael S. Tsirkin schrieb:
  On Thu, Jan 07, 2010 at 12:15:25PM +0100, Stefan Weil wrote:
  ...
  - PCI_CONFIG_16(PCI_STATUS,
  - PCI_STATUS_REC_MASTER_ABORT | PCI_STATUS_SIG_TARGET_ABORT);
  + PCI_CONFIG_16(PCI_STATUS, PCI_STATUS_DEVSEL_MEDIUM |
  PCI_STATUS_FAST_BACK);
  /* PCI Revision ID */
  PCI_CONFIG_8(PCI_REVISION_ID, 0x08);
  BTW if you are not afraid of churn, there's no reason
  for PCI_CONFIG_8 and friends anymore, because pci.h
  has much nicer pci_set_byte etc.
  Hello Michael,
 
  I already noticed pci_set_byte, pci_set_word, pci_set_long and
  the corresponding pci_get_xxx functions and thought about using them.
 
  I did not start it because I want to suggest a different API
  for use in PCI device emulations:
 
  instead of
 
  pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_CAP_LIST);
 
  or
 
  pci_set_word(pci_conf[PCI_STATUS], PCI_STATUS_CAP_LIST);
 
  it would be better to call
 
  pci_set_word(pci_conf, PCI_STATUS, PCI_STATUS_CAP_LIST);
 
 
  The prototypes would look like this:
 
  /* Set PCI config value. */
  void pci_set_word(PCIDevice *s, uint8_t offset, uint16_t val);
 
  /* Set PCI cmask value. */
  void pci_set_cmask_word(PCIDevice *s, uint8_t offset, uint16_t val);
 
  /* Set PCI wmask value. */
  void pci_set_wmask_word(PCIDevice *s, uint8_t offset, uint16_t val);
 
  What are the advantages?
  * strict type checking (the old API takes any uint8_t *)
  So IMO it's easier to make mistakes with your proposed API because if
  you confuse offset and value compiler does not complain. More
  importantly, if you want to pass some other uint8_t pointer to e.g.
  pci_set_word, you can *and nothing unexpected will happen*
  it will set the word to the given value. So the current
  API is more type safe than what you propose.
 
  No. The current API takes any uint8_t pointer to read or write
  a value. This is not safe.
 
  Why isn't it?
 
 It is not safe, because it allows programmers to write silly code
 like these examples:
 
 pci_set_word(gen_opc_cc_op[PCI_STATUS], PCI_STATUS_CAP_LIST);

This might be valid and useful for all I know.

 pci_set_word(pci_conf[UINT32_MAX], PCI_STATUS_CAP_LIST);
 
 for (i = 0; i  sizeof(pci_conf); i++) {
 pci_set_long(pci_conf[i], 0);
 }
 
 All three will result in runtime failures which can be very
 difficult to detect.

In theory. In practice I expect pci_set_word(pci_dev, 0x0, PCI_STATUS)
to be much more common with your proposed API. Just look how
common swapping memset parameters is.

 
  The proposed API only takes a PCIDevice pointer
  and reads or writes only configuration (or cmask or
  wmask) values. Yes, you can take offset for value.
  If you are lucky and value is an uint16_t or uint32_t,
  your compiler will complain.
 
  Such a compiler will also complain over most of qemu code.
 
 Yes, but it's good to see that QEMU's code is
 improving.
 
 By the way, such a compiler is gcc when called with
 -Wconversion, so it is easy to see how much code
 is affected.

Didn't try it. So it will warn on
pci_set_word(pci_dev, 0x0, PCI_STATUS)
but not
pci_set_word(pci_dev, PCI_STATUS, 0x0)
?

 
  And even if your compiler
  does not complain, it is wrong but still safe, because
  the code will only access the PCI configuration data.
 
 
  Correct and safe beats wrong and safe every time.
 
 See example above.

Example above tries to show that usng pointers in C is bad, switching to
indexes all over is proposed as a solution.  Oh well .. but I am
surpised you bring up type safety as an argument, is is exactly the
reason to use pointers.

 
  * many other pci_* functions also have a first parameter of type
  PCIDevice
  So what would make sense IMO is higer level abstraction,
  for example similar to what we have with capabilities
  and msix, I think we could have something like this
  for e.g. power management.
 
  For low-level bit tweaking, the advantages of current API is that same
  thing can be used to set wmask, cmask, config itself, and whatever else
  we will come up with.
  The low level API can be used where low level is
  adequate: in pci.c for example.
 
  To implement emulated PCI devices, a more robust API
  would be better. Think of the number of devices which
  are still missing, think of people who want to write
  a new PCI device emulation for QEMU without being
  a QEMU expert.
 
  * calls look nicer (at least in my opinion)
  What I value is the fact that it's obvious which
  data is changed.
  Here there is no difference between current and
  proposed API:
 
  old: pci_set_word(pci_conf[PCI_STATUS], PCI_STATUS_CAP_LIST);
  new: pci_set_word(pci_conf, PCI_STATUS, PCI_STATUS_CAP_LIST);
 
  Every function call hides what happens. If you really wanted
  to see which data is changed, you 

  1   2   >