Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
At Fri, 21 May 2010 06:28:42 +0100, Stefan Hajnoczi wrote:
> On Thu, May 20, 2010 at 11:16 PM, Christian Brunner wrote:
> > 2010/5/20 Anthony Liguori :
> >> Both sheepdog and ceph ultimately transmit I/O over a socket to a central
> >> daemon, right?  So could we not standardize a protocol for this that both
> >> sheepdog and ceph could implement?
> >
> > There is no central daemon. The concept is that they talk to many
> > storage nodes at the same time. Data is distributed and replicated
> > over many nodes in the network. The mechanism to do this is quite
> > complex. I don't know about sheepdog, but in Ceph this is called RADOS
> > (reliable autonomic distributed object store). Sheepdog and Ceph may
> > look similar, but this is where they act different. I don't think that
> > it would be possible to implement a common protocol.
>
> I believe Sheepdog has a local daemon on each node.  The QEMU storage
> backend talks to the daemon on the same node, which then does the real
> network communication with the rest of the distributed storage system.

Yes. Sheepdog doesn't have any configuration describing cluster
membership, as I mentioned in another mail, so the driver doesn't know
which node to access other than localhost.

> So I think we're not talking about a network protocol here, we're
> talking about a common interface that can be used by QEMU and other
> programs to take advantage of Ceph, Sheepdog, etc services available
> on the local node.
>
> Haven't looked into your patch enough yet, but does librados talk
> directly over the network or does it connect to a local daemon/driver?

AFAIK, librados talks directly over the network, so I think it is
difficult to define a common interface.

Thanks,

Kazutaka
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
At Fri, 21 May 2010 00:16:46 +0200, Christian Brunner wrote:
> 2010/5/20 Anthony Liguori :
> >> With new approaches like Sheepdog or Ceph, things are getting a lot
> >> cheaper and you can scale your system without disrupting your service.
> >> The concepts are quite similar to what Amazon is doing in their EC2
> >> environment, but they certainly won't publish it as OpenSource anytime
> >> soon.
> >>
> >> Both projects have advantages and disadvantages. Ceph is a bit more
> >> universal as it implements a whole filesystem. Sheepdog is more
> >> feature complete in regards of managing images (e.g. snapshots). Both

I think a major difference is that Sheepdog servers act fully
autonomously. No Sheepdog server has a fixed role such as a monitor
server, and Sheepdog doesn't require any configuration listing the
nodes in the cluster.

> >> projects require some additional work to become stable, but they are
> >> on a good way.
> >>
> >> I would really like to see both drivers in the qemu tree, as they are
> >> the key to a design shift in how storage in the datacenter is being
> >> built.
> >
> > I'd be more interested in enabling people to build these types of storage
> > systems without touching qemu.
>
> You could do this by using Yehuda's rbd kernel driver, but I think
> that it would be better to avoid this additional layer.

I agree. In addition, if the storage client is a qemu driver, the
storage system can support features specific to qemu, such as live
snapshots from the qemu monitor.

Regards,

Kazutaka
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On Thu, May 20, 2010 at 11:16 PM, Christian Brunner wrote:
> 2010/5/20 Anthony Liguori :
>> Both sheepdog and ceph ultimately transmit I/O over a socket to a central
>> daemon, right?  So could we not standardize a protocol for this that both
>> sheepdog and ceph could implement?
>
> There is no central daemon. The concept is that they talk to many
> storage nodes at the same time. Data is distributed and replicated
> over many nodes in the network. The mechanism to do this is quite
> complex. I don't know about sheepdog, but in Ceph this is called RADOS
> (reliable autonomic distributed object store). Sheepdog and Ceph may
> look similar, but this is where they act different. I don't think that
> it would be possible to implement a common protocol.

I believe Sheepdog has a local daemon on each node.  The QEMU storage
backend talks to the daemon on the same node, which then does the real
network communication with the rest of the distributed storage system.

So I think we're not talking about a network protocol here, we're
talking about a common interface that can be used by QEMU and other
programs to take advantage of Ceph, Sheepdog, etc. services available
on the local node.

Haven't looked into your patch enough yet, but does librados talk
directly over the network or does it connect to a local daemon/driver?

Stefan
Re: [KVM-AUTOTEST PATCH] KVM test: use command line option wrapper functions
----- "Lucas Meneghel Rodrigues" wrote:

> From: "Lucas Meneghel Rodrigues"
> To: "Michael Goldish"
> Cc: "Feng Yang", autot...@test.kernel.org, kvm@vger.kernel.org
> Sent: Thursday, May 20, 2010 6:57:23 PM GMT +08:00 Beijing / Chongqing / Hong Kong / Urumqi
> Subject: Re: [KVM-AUTOTEST PATCH] KVM test: use command line option wrapper functions
>
> On Thu, 2010-05-20 at 12:50 +0300, Michael Goldish wrote:
> > On 05/19/2010 11:25 AM, Feng Yang wrote:
> > > Hi, Michael
> > >
> > > Thanks for your patch.
> > > We plan to add "netdev" parameter support in make_qemu_command.
> > > Since you are working on this part, could you add netdev support in
> > > your patch? Hopefully netdev can be supported by default in
> > > make_qemu_command if qemu supports it. Thanks very much!
> >
> > Sure, I'll look into it.
> >
> > > I think the point of this patch is good and we need this kind of
> > > patch. But I think we need not add so many new functions, especially
> > > functions that only return a string and do nothing more.
> > > This will increase the function call overhead.
> >
> > All these helper functions are meant to be extended and modified in the
> > future. They're only there to minimize future effort involved in adding
> > support for new command line syntaxes.
> > Right now add_smp() just returns " -smp %s", but in the future we may
> > have to support different syntaxes for -smp, and then add_smp() will
> > consult the output of 'qemu -help' and return the proper string.
> > What do you mean by function call overhead? I don't think these
> > functions cause a measurable slowdown, and make_qemu_command() is called
> > very few times, so this really isn't a concern IMO.
>
> Agreed, the wrappers are a good strategy in the case we have to support
> different feature sets and syntax.

I know your meaning. Yes, the wrapper is a good strategy for some
parameters: a newly added feature may not work on an old kvm, so we
need to check for support before using it. So I say we need this patch.
But I still think so many one-line functions are not good code style.
For parameters that are already supported by all kvm builds, I do not
expect big changes in syntax later, because mature software should keep
its interface stable. So we need not add one-line functions for those
parameters now; even if those parameters change their interface later,
we can update the code then.

Again, thanks for your patch and comments.
Re: [PATCH qemu-kvm 2/2] device-assignment: Don't use libpci
On Thu, 2010-05-20 at 17:27 -0700, Chris Wright wrote:
> From: Alex Williamson
>
> We've already got an open fd for PCI config space for the device, we
> might as well use it.  This also makes sure that if we're making use of
> a privileged file descriptor opened for us, we use it for all accesses
> to the device.
>
> Signed-off-by: Alex Williamson
> [chrisw: kill pci_dev, configure check for header, narrow header requirement]
> Signed-off-by: Chris Wright

Looks good, Thanks

Alex

> ---
>  configure              |   14 +-
>  hw/device-assignment.c |   68
>  hw/device-assignment.h |    1 -
>  3 files changed, 58 insertions(+), 25 deletions(-)
>
> diff --git a/configure b/configure
> index ed8e17b..8ca9e1e 100755
> --- a/configure
> +++ b/configure
> @@ -1623,20 +1623,20 @@ EOF
>  fi
>
>  ##
> -# libpci probe for kvm_cap_device_assignment
> +# libpci header probe for kvm_cap_device_assignment
>  if test $kvm_cap_device_assignment = "yes" ; then
>    cat > $TMPC << EOF
> -#include
> +#include
>  #ifndef PCI_VENDOR_ID
> -#error NO LIBPCI
> +#error NO LIBPCI HEADER
>  #endif
> -int main(void) { struct pci_access a; pci_init(&a); return 0; }
> +int main(void) { return 0; }
>  EOF
> -  if compile_prog "" "-lpci -lz" ; then
> -    libs_softmmu="-lpci -lz $libs_softmmu"
> +  if compile_prog "" "" ; then
> +    kvm_cap_device_assignment=yes
>    else
>      echo
> -    echo "Error: libpci check failed"
> +    echo "Error: libpci header check failed"
>      echo "Disable KVM Device Assignment capability."
>      echo
>      kvm_cap_device_assignment=no
>
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index fd09ec3..d8e7cb4 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -37,7 +37,7 @@
>  #include "console.h"
>  #include "device-assignment.h"
>  #include "loader.h"
> -#include
> +#include
>
>  /* From linux/ioport.h */
>  #define IORESOURCE_IO 0x0100  /* Resource type */
> @@ -335,24 +335,61 @@ static void assigned_dev_ioport_map(PCIDevice *pci_dev, int region_num,
>                         (r_dev->v_addrs + region_num));
>  }
>
> -static uint8_t pci_find_cap_offset(struct pci_dev *pci_dev, uint8_t cap)
> +static uint32_t assigned_dev_pci_read(PCIDevice *d, int pos, int len)
> +{
> +    AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
> +    uint32_t val;
> +    ssize_t ret;
> +    int fd = pci_dev->real_device.config_fd;
> +
> +again:
> +    ret = pread(fd, &val, len, pos);
> +    if (ret != len) {
> +        if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
> +            goto again;
> +
> +        fprintf(stderr, "%s: pread failed, ret = %zd errno = %d\n",
> +                __func__, ret, errno);
> +
> +        exit(1);
> +    }
> +
> +    return val;
> +}
> +
> +static uint8_t assigned_dev_pci_read_byte(PCIDevice *d, int pos)
> +{
> +    return (uint8_t)assigned_dev_pci_read(d, pos, 1);
> +}
> +
> +static uint16_t assigned_dev_pci_read_word(PCIDevice *d, int pos)
> +{
> +    return (uint16_t)assigned_dev_pci_read(d, pos, 2);
> +}
> +
> +static uint32_t assigned_dev_pci_read_long(PCIDevice *d, int pos)
> +{
> +    return assigned_dev_pci_read(d, pos, 4);
> +}
> +
> +static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap)
>  {
>      int id;
>      int max_cap = 48;
>      int pos = PCI_CAPABILITY_LIST;
>      int status;
>
> -    status = pci_read_byte(pci_dev, PCI_STATUS);
> +    status = assigned_dev_pci_read_byte(d, PCI_STATUS);
>      if ((status & PCI_STATUS_CAP_LIST) == 0)
>          return 0;
>
>      while (max_cap--) {
> -        pos = pci_read_byte(pci_dev, pos);
> +        pos = assigned_dev_pci_read_byte(d, pos);
>          if (pos < 0x40)
>              break;
>
>          pos &= ~3;
> -        id = pci_read_byte(pci_dev, pos + PCI_CAP_LIST_ID);
> +        id = assigned_dev_pci_read_byte(d, pos + PCI_CAP_LIST_ID);
>
>          if (id == 0xff)
>              break;
> @@ -858,7 +895,7 @@ static int assign_irq(AssignedDevice *dev)
>      int irq, r = 0;
>
>      /* Interrupt PIN 0 means don't use INTx */
> -    if (pci_read_byte(dev->pdev, PCI_INTERRUPT_PIN) == 0)
> +    if (assigned_dev_pci_read_byte(&dev->dev, PCI_INTERRUPT_PIN) == 0)
>          return 0;
>
>      irq = pci_map_irq(&dev->dev, dev->intpin);
> @@ -1196,7 +1233,7 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>  #ifdef KVM_CAP_DEVICE_MSI
>      /* Expose MSI capability
>       * MSI capability is the 1st capability in capability config */
> -    if (pci_find_cap_offset(dev->pdev, PCI_CAP_ID_MSI)) {
> +    if (pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI)) {
>          dev->cap.available |= ASSIGNED_DEVICE_CAP_MSI;
>          memset(&pci_dev->config[pci_dev->cap.start + pci_dev->cap.length],
>                 0, PCI_CAPABILITY_CONFIG_MSI_LENGTH);
> @@ -1208,23 +1245,25 @@ static int assigned_device_pci_cap_init(PCIDevice *pc
Re: [PATCH qemu-kvm 1/2] device-assignment: use stdint types
On Thu, 2010-05-20 at 17:25 -0700, Chris Wright wrote:
> Use stdint types to avoid extra reliance on pci/pci.h header.
>
> Cc: Alex Williamson
> Signed-off-by: Chris Wright
> ---
>  hw/device-assignment.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)

Acked-by: Alex Williamson

> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index eb31c78..fd09ec3 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -1024,9 +1024,9 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev, unsigned int ctrl_pos)
>  static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
>  {
>      AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
> -    u16 entries_nr = 0, entries_max_nr;
> +    uint16_t entries_nr = 0, entries_max_nr;
>      int pos = 0, i, r = 0;
> -    u32 msg_addr, msg_upper_addr, msg_data, msg_ctrl;
> +    uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
>      struct kvm_assigned_msix_nr msix_nr;
>      struct kvm_assigned_msix_entry msix_entry;
>      void *va = adev->msix_table_page;
> @@ -1210,7 +1210,7 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>      /* Expose MSI-X capability */
>      if (pci_find_cap_offset(dev->pdev, PCI_CAP_ID_MSIX)) {
>          int pos, entry_nr, bar_nr;
> -        u32 msix_table_entry;
> +        uint32_t msix_table_entry;
>          dev->cap.available |= ASSIGNED_DEVICE_CAP_MSIX;
>          memset(&pci_dev->config[pci_dev->cap.start + pci_dev->cap.length],
>                 0, PCI_CAPABILITY_CONFIG_MSIX_LENGTH);
[PATCH qemu-kvm 2/2] device-assignment: Don't use libpci
From: Alex Williamson

We've already got an open fd for PCI config space for the device, we
might as well use it.  This also makes sure that if we're making use of
a privileged file descriptor opened for us, we use it for all accesses
to the device.

Signed-off-by: Alex Williamson
[chrisw: kill pci_dev, configure check for header, narrow header requirement]
Signed-off-by: Chris Wright
---
 configure              |   14 +-
 hw/device-assignment.c |   68
 hw/device-assignment.h |    1 -
 3 files changed, 58 insertions(+), 25 deletions(-)

diff --git a/configure b/configure
index ed8e17b..8ca9e1e 100755
--- a/configure
+++ b/configure
@@ -1623,20 +1623,20 @@ EOF
 fi

 ##
-# libpci probe for kvm_cap_device_assignment
+# libpci header probe for kvm_cap_device_assignment
 if test $kvm_cap_device_assignment = "yes" ; then
   cat > $TMPC << EOF
-#include
+#include
 #ifndef PCI_VENDOR_ID
-#error NO LIBPCI
+#error NO LIBPCI HEADER
 #endif
-int main(void) { struct pci_access a; pci_init(&a); return 0; }
+int main(void) { return 0; }
 EOF
-  if compile_prog "" "-lpci -lz" ; then
-    libs_softmmu="-lpci -lz $libs_softmmu"
+  if compile_prog "" "" ; then
+    kvm_cap_device_assignment=yes
   else
     echo
-    echo "Error: libpci check failed"
+    echo "Error: libpci header check failed"
     echo "Disable KVM Device Assignment capability."
     echo
     kvm_cap_device_assignment=no

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index fd09ec3..d8e7cb4 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -37,7 +37,7 @@
 #include "console.h"
 #include "device-assignment.h"
 #include "loader.h"
-#include
+#include

 /* From linux/ioport.h */
 #define IORESOURCE_IO 0x0100  /* Resource type */
@@ -335,24 +335,61 @@ static void assigned_dev_ioport_map(PCIDevice *pci_dev, int region_num,
                        (r_dev->v_addrs + region_num));
 }

-static uint8_t pci_find_cap_offset(struct pci_dev *pci_dev, uint8_t cap)
+static uint32_t assigned_dev_pci_read(PCIDevice *d, int pos, int len)
+{
+    AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
+    uint32_t val;
+    ssize_t ret;
+    int fd = pci_dev->real_device.config_fd;
+
+again:
+    ret = pread(fd, &val, len, pos);
+    if (ret != len) {
+        if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
+            goto again;
+
+        fprintf(stderr, "%s: pread failed, ret = %zd errno = %d\n",
+                __func__, ret, errno);
+
+        exit(1);
+    }
+
+    return val;
+}
+
+static uint8_t assigned_dev_pci_read_byte(PCIDevice *d, int pos)
+{
+    return (uint8_t)assigned_dev_pci_read(d, pos, 1);
+}
+
+static uint16_t assigned_dev_pci_read_word(PCIDevice *d, int pos)
+{
+    return (uint16_t)assigned_dev_pci_read(d, pos, 2);
+}
+
+static uint32_t assigned_dev_pci_read_long(PCIDevice *d, int pos)
+{
+    return assigned_dev_pci_read(d, pos, 4);
+}
+
+static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap)
 {
     int id;
     int max_cap = 48;
     int pos = PCI_CAPABILITY_LIST;
     int status;

-    status = pci_read_byte(pci_dev, PCI_STATUS);
+    status = assigned_dev_pci_read_byte(d, PCI_STATUS);
     if ((status & PCI_STATUS_CAP_LIST) == 0)
         return 0;

     while (max_cap--) {
-        pos = pci_read_byte(pci_dev, pos);
+        pos = assigned_dev_pci_read_byte(d, pos);
         if (pos < 0x40)
             break;

         pos &= ~3;
-        id = pci_read_byte(pci_dev, pos + PCI_CAP_LIST_ID);
+        id = assigned_dev_pci_read_byte(d, pos + PCI_CAP_LIST_ID);

         if (id == 0xff)
             break;
@@ -858,7 +895,7 @@ static int assign_irq(AssignedDevice *dev)
     int irq, r = 0;

     /* Interrupt PIN 0 means don't use INTx */
-    if (pci_read_byte(dev->pdev, PCI_INTERRUPT_PIN) == 0)
+    if (assigned_dev_pci_read_byte(&dev->dev, PCI_INTERRUPT_PIN) == 0)
         return 0;

     irq = pci_map_irq(&dev->dev, dev->intpin);
@@ -1196,7 +1233,7 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
 #ifdef KVM_CAP_DEVICE_MSI
     /* Expose MSI capability
      * MSI capability is the 1st capability in capability config */
-    if (pci_find_cap_offset(dev->pdev, PCI_CAP_ID_MSI)) {
+    if (pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI)) {
         dev->cap.available |= ASSIGNED_DEVICE_CAP_MSI;
         memset(&pci_dev->config[pci_dev->cap.start + pci_dev->cap.length],
                0, PCI_CAPABILITY_CONFIG_MSI_LENGTH);
@@ -1208,23 +1245,25 @@ static int assigned_device_pci_cap_init(PCIDevice *pc
[PATCH qemu-kvm 1/2] device-assignment: use stdint types
Use stdint types to avoid extra reliance on pci/pci.h header.

Cc: Alex Williamson
Signed-off-by: Chris Wright
---
 hw/device-assignment.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index eb31c78..fd09ec3 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1024,9 +1024,9 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev, unsigned int ctrl_pos)
 static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
 {
     AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
-    u16 entries_nr = 0, entries_max_nr;
+    uint16_t entries_nr = 0, entries_max_nr;
     int pos = 0, i, r = 0;
-    u32 msg_addr, msg_upper_addr, msg_data, msg_ctrl;
+    uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
     struct kvm_assigned_msix_nr msix_nr;
     struct kvm_assigned_msix_entry msix_entry;
     void *va = adev->msix_table_page;
@@ -1210,7 +1210,7 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
     /* Expose MSI-X capability */
     if (pci_find_cap_offset(dev->pdev, PCI_CAP_ID_MSIX)) {
         int pos, entry_nr, bar_nr;
-        u32 msix_table_entry;
+        uint32_t msix_table_entry;
         dev->cap.available |= ASSIGNED_DEVICE_CAP_MSIX;
         memset(&pci_dev->config[pci_dev->cap.start + pci_dev->cap.length],
                0, PCI_CAPABILITY_CONFIG_MSIX_LENGTH);
--
1.6.6.1
Re: [PATCH] device-assignment: Don't use libpci
* Chris Wright (chr...@redhat.com) wrote:
> * Alex Williamson (alex.william...@redhat.com) wrote:
> > We've already got an open fd for PCI config space for the device, we
> > might as well use it.  This also makes sure that if we're making use of
> > a privileged file descriptor opened for us, we use it for all accesses
> > to the device.
> >
> > Signed-off-by: Alex Williamson
>
> Looks good to me.

Hmm, it's not that straightforward.  We still have a build requirement
for libpci.  Not for the library, but for the headers.  This leaves the
check for the header, drops the unused pci_dev, and narrows our header
requirement.

thanks,
-chris

> Acked-by: Chris Wright
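The header-only probe the revised patch keeps in configure follows the usual pattern: compile a tiny translation unit that fails if the header (or a macro it defines) is missing. A minimal illustrative sketch, not the actual configure code; it probes <stdint.h> instead of the libpci header so it runs anywhere, and assumes a `cc` on PATH:

```shell
# Configure-style header probe: compile a trivial program that
# includes the header, and report yes/no based on the result.
check_header() {
    cat > conftest.c << EOF
#include <$1>
int main(void) { return 0; }
EOF
    if cc -c conftest.c -o conftest.o 2>/dev/null; then
        echo "$1: yes"
    else
        echo "$1: no"
    fi
    rm -f conftest.c conftest.o
}

check_header stdint.h
```

Since only compilation is attempted (no link step), this checks for the header without requiring the library itself, which is exactly the narrowing the patch performs.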
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On Thu, May 20, 2010 at 1:31 PM, Blue Swirl wrote:
> On Wed, May 19, 2010 at 7:22 PM, Christian Brunner wrote:
>> The attached patch is a block driver for the distributed file system
>> Ceph (http://ceph.newdream.net/). This driver uses librados (which
>> is part of the Ceph server) for direct access to the Ceph object
>> store and is running entirely in userspace. Therefore it is
>> called "rbd" - rados block device.
...
> IIRC underscores here may conflict with system header use. Please use
> something like QEMU_BLOCK_RADOS_H.

This header is shared between the linux kernel client and the ceph
userspace servers and client. We can actually get rid of it, as we only
need it to define CEPH_OSD_TMAP_SET. We can move this definition to
librados.h.

>> diff --git a/block/rbd_types.h b/block/rbd_types.h
>> new file mode 100644
>> index 000..dfd5aa0
>> --- /dev/null
>> +++ b/block/rbd_types.h
>> @@ -0,0 +1,48 @@
>> +#ifndef _FS_CEPH_RBD
>> +#define _FS_CEPH_RBD
>
> QEMU_BLOCK_RBD?

This header is shared between the ceph kernel client, the qemu rbd
module, and other ceph utilities. It'd be much easier to maintain it
without having a different implementation for each. The same goes for
the use of __le32/64 and __u32/64 within these headers.

>> +
>> +#include
>
> Can you use standard includes, like or ? Are
> Ceph libraries used in other systems than Linux?

Not at the moment. I guess that we can take this include out.

>> +
>> +/*
>> + * rbd image 'foo' consists of objects
>> + *   foo.rbd      - image metadata
>> + *   foo.
>> + *   foo.0001
>> + *   ...          - data
>> + */
>> +
>> +#define RBD_SUFFIX      ".rbd"
>> +#define RBD_DIRECTORY   "rbd_directory"
>> +
>> +#define RBD_DEFAULT_OBJ_ORDER  22 /* 4MB */
>> +
>> +#define RBD_MAX_OBJ_NAME_SIZE  96
>> +#define RBD_MAX_SEG_NAME_SIZE  128
>> +
>> +#define RBD_COMP_NONE   0
>> +#define RBD_CRYPT_NONE  0
>> +
>> +static const char rbd_text[] = "<<< Rados Block Device Image >>>\n";
>> +static const char rbd_signature[] = "RBD";
>> +static const char rbd_version[] = "001.001";
>> +
>> +struct rbd_obj_snap_ondisk {
>> +    __le64 id;
>> +    __le64 image_size;
>> +} __attribute__((packed));
>> +
>> +struct rbd_obj_header_ondisk {
>> +    char text[64];
>> +    char signature[4];
>> +    char version[8];
>> +    __le64 image_size;
>
> Unaligned? Is the disk format fixed?

This is a packed structure that represents the on-disk format.
Operations on it are done only to read from the disk header or to write
to the disk header.

Yehuda
Re: [PATCH 1/3] cgroups: Add an API to attach a task to current task's cgroup
On Thu, May 20, 2010 at 3:22 PM, Paul Menage wrote:
> On Tue, May 18, 2010 at 5:04 PM, Sridhar Samudrala wrote:
>> Add a new kernel API to attach a task to current task's cgroup
>> in all the active hierarchies.
>>
>> Signed-off-by: Sridhar Samudrala
>
> Reviewed-by: Paul Menage

One other thought on this - this would be the first piece of code
that's attaching a task to a cgroup without holding the cgroup
directory inode i_mutex. I believe that this is probably OK.

Paul
Re: [PATCH 1/3] cgroups: Add an API to attach a task to current task's cgroup
On Tue, May 18, 2010 at 5:04 PM, Sridhar Samudrala wrote:
> Add a new kernel API to attach a task to current task's cgroup
> in all the active hierarchies.
>
> Signed-off-by: Sridhar Samudrala

Reviewed-by: Paul Menage

It would be more efficient to just attach directly to current->cgroups
rather than potentially creating/destroying one css_set for each
hierarchy until we've completely converged on current->cgroups - but
that would require a bunch of refactoring of the guts of
cgroup_attach_task() to ensure that the right can_attach()/attach()
callbacks are made. That doesn't really seem worthwhile right now for
the initial use, which I imagine isn't going to be
performance-sensitive.

Paul
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
2010/5/20 Anthony Liguori :
>> With new approaches like Sheepdog or Ceph, things are getting a lot
>> cheaper and you can scale your system without disrupting your service.
>> The concepts are quite similar to what Amazon is doing in their EC2
>> environment, but they certainly won't publish it as OpenSource anytime
>> soon.
>>
>> Both projects have advantages and disadvantages. Ceph is a bit more
>> universal as it implements a whole filesystem. Sheepdog is more
>> feature complete in regards of managing images (e.g. snapshots). Both
>> projects require some additional work to become stable, but they are
>> on a good way.
>>
>> I would really like to see both drivers in the qemu tree, as they are
>> the key to a design shift in how storage in the datacenter is being
>> built.
>
> I'd be more interested in enabling people to build these types of storage
> systems without touching qemu.

You could do this by using Yehuda's rbd kernel driver, but I think
that it would be better to avoid this additional layer.

> Both sheepdog and ceph ultimately transmit I/O over a socket to a central
> daemon, right?  So could we not standardize a protocol for this that both
> sheepdog and ceph could implement?

There is no central daemon. The concept is that they talk to many
storage nodes at the same time. Data is distributed and replicated
over many nodes in the network. The mechanism to do this is quite
complex. I don't know about sheepdog, but in Ceph this is called RADOS
(reliable autonomic distributed object store). Sheepdog and Ceph may
look similar, but this is where they act differently. I don't think
that it would be possible to implement a common protocol.

Regards,
Christian
Re: [PATCH] device-assignment: Don't use libpci
* Alex Williamson (alex.william...@redhat.com) wrote:
> We've already got an open fd for PCI config space for the device, we
> might as well use it.  This also makes sure that if we're making use of
> a privileged file descriptor opened for us, we use it for all accesses
> to the device.
>
> Signed-off-by: Alex Williamson

Looks good to me.

Acked-by: Chris Wright
[PATCH] device-assignment: Don't use libpci
We've already got an open fd for PCI config space for the device, we might as well use it. This also makes sure that if we're making use of a privileged file descriptor opened for us, we use it for all accesses to the device. Signed-off-by: Alex Williamson --- configure | 21 --- hw/device-assignment.c | 66 2 files changed, 50 insertions(+), 37 deletions(-) diff --git a/configure b/configure index ed8e17b..632d4b0 100755 --- a/configure +++ b/configure @@ -1623,27 +1623,6 @@ EOF fi ## -# libpci probe for kvm_cap_device_assignment -if test $kvm_cap_device_assignment = "yes" ; then - cat > $TMPC << EOF -#include -#ifndef PCI_VENDOR_ID -#error NO LIBPCI -#endif -int main(void) { struct pci_access a; pci_init(&a); return 0; } -EOF - if compile_prog "" "-lpci -lz" ; then -libs_softmmu="-lpci -lz $libs_softmmu" - else -echo -echo "Error: libpci check failed" -echo "Disable KVM Device Assignment capability." -echo -kvm_cap_device_assignment=no - fi -fi - -## # test for vhost net if test "$vhost_net" != "no"; then diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 172f0c9..e6b34ac 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -335,24 +335,61 @@ static void assigned_dev_ioport_map(PCIDevice *pci_dev, int region_num, (r_dev->v_addrs + region_num)); } -static uint8_t pci_find_cap_offset(struct pci_dev *pci_dev, uint8_t cap) +static uint32_t assigned_dev_pci_read(PCIDevice *d, int pos, int len) +{ +AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev); +uint32_t val; +ssize_t ret; +int fd = pci_dev->real_device.config_fd; + +again: +ret = pread(fd, &val, len, pos); +if (ret != len) { + if ((ret < 0) && (errno == EINTR || errno == EAGAIN)) + goto again; + + fprintf(stderr, "%s: pread failed, ret = %zd errno = %d\n", + __func__, ret, errno); + + exit(1); +} + +return val; +} + +static uint8_t assigned_dev_pci_read_byte(PCIDevice *d, int pos) +{ +return (uint8_t)assigned_dev_pci_read(d, pos, 1); +} + +static uint16_t 
assigned_dev_pci_read_word(PCIDevice *d, int pos)
+{
+    return (uint16_t)assigned_dev_pci_read(d, pos, 2);
+}
+
+static uint32_t assigned_dev_pci_read_long(PCIDevice *d, int pos)
+{
+    return assigned_dev_pci_read(d, pos, 4);
+}
+
+static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap)
 {
     int id;
     int max_cap = 48;
     int pos = PCI_CAPABILITY_LIST;
     int status;

-    status = pci_read_byte(pci_dev, PCI_STATUS);
+    status = assigned_dev_pci_read_byte(d, PCI_STATUS);
     if ((status & PCI_STATUS_CAP_LIST) == 0)
         return 0;

     while (max_cap--) {
-        pos = pci_read_byte(pci_dev, pos);
+        pos = assigned_dev_pci_read_byte(d, pos);
         if (pos < 0x40)
             break;

         pos &= ~3;
-        id = pci_read_byte(pci_dev, pos + PCI_CAP_LIST_ID);
+        id = assigned_dev_pci_read_byte(d, pos + PCI_CAP_LIST_ID);
         if (id == 0xff)
             break;

@@ -861,7 +898,7 @@ static int assign_irq(AssignedDevice *dev)
     int irq, r = 0;

     /* Interrupt PIN 0 means don't use INTx */
-    if (pci_read_byte(dev->pdev, PCI_INTERRUPT_PIN) == 0)
+    if (assigned_dev_pci_read_byte(&dev->dev, PCI_INTERRUPT_PIN) == 0)
         return 0;

     irq = pci_map_irq(&dev->dev, dev->intpin);
@@ -1199,7 +1236,7 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
 #ifdef KVM_CAP_DEVICE_MSI
     /* Expose MSI capability
      * MSI capability is the 1st capability in capability config */
-    if (pci_find_cap_offset(dev->pdev, PCI_CAP_ID_MSI)) {
+    if (pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI)) {
         dev->cap.available |= ASSIGNED_DEVICE_CAP_MSI;
         memset(&pci_dev->config[pci_dev->cap.start + pci_dev->cap.length],
                0, PCI_CAPABILITY_CONFIG_MSI_LENGTH);
@@ -1211,23 +1248,25 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
 #endif
 #ifdef KVM_CAP_DEVICE_MSIX
     /* Expose MSI-X capability */
-    if (pci_find_cap_offset(dev->pdev, PCI_CAP_ID_MSIX)) {
+    if (pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX)) {
         int pos, entry_nr, bar_nr;
         u32 msix_table_entry;

         dev->cap.available |= ASSIGNED_DEVICE_CAP_MSIX;
         memset(&pci_dev->config[pci_dev->cap.start + pci_dev->cap.length],
                0, PCI_CAPABILITY_CONFIG_MSIX_LENGTH);
-        pos = pci_find_cap_offset(dev->pdev, PCI_CAP_ID_MSIX);
-        entry_nr = pci_read_word(dev->pdev, pos + 2) & PCI_MSIX_TABSIZE;
+        pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX);
+        entry_nr = assigned_dev_pci_read_word(pci_dev, pos + 2) &
+                   PCI_MSIX_TABSIZE;
         pci_dev->config[pci_dev->cap.start + pci_dev->
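The capability-list walk this hunk converts can be exercised stand-alone. A minimal userspace sketch against a 256-byte config-space snapshot (the function and constants here are written out for illustration; they are not the patch's actual helpers):

```c
#include <stdint.h>

#define PCI_STATUS           0x06  /* low byte of the 16-bit status register */
#define PCI_STATUS_CAP_LIST  0x10  /* capability list present */
#define PCI_CAPABILITY_LIST  0x34  /* offset of the first capability pointer */
#define PCI_CAP_LIST_ID      0     /* capability ID byte */
#define PCI_CAP_LIST_NEXT    1     /* next-capability pointer byte */

/* Walk the linked list of capabilities in a config-space snapshot,
 * mirroring the loop in the patch above.  Returns the offset of the
 * capability, or 0 if it is absent. */
static uint8_t find_cap_offset(const uint8_t *cfg, uint8_t cap)
{
    int max_cap = 48;               /* bound the walk against broken loops */
    uint8_t pos = PCI_CAPABILITY_LIST;
    uint8_t id;

    if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
        return 0;

    while (max_cap--) {
        pos = cfg[pos];             /* follow the next pointer */
        if (pos < 0x40)             /* capabilities live above the header */
            break;
        pos &= ~3;                  /* pointers are dword-aligned */
        id = cfg[pos + PCI_CAP_LIST_ID];
        if (id == 0xff)
            break;
        if (id == cap)
            return pos;
        pos += PCI_CAP_LIST_NEXT;   /* next iteration reads the link byte */
    }
    return 0;
}
```

The only device access is a byte read, which is why the patch can swap `pci_read_byte()` for `assigned_dev_pci_read_byte()` without changing the loop.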
Re: [Qemu-devel] [RFC] Bug Day - June 1st, 2010
20.05.2010 11:15, Andre Przywara wrote:

Michael Tokarev wrote:
[] It'd be nice if we had more flexibility in defining custom machine types so you could just do qemu -M win98.

This is wrong IMHO. win98 and winNT can run on various different machines, including all modern ones (yes, I tried the same winNT on my Athlon X2-64, just had to switch SATA from AHCI to IDE; win95 works too)... just not in kvm :)

Well, not really. You were lucky with your Athlon X2-64; actually it is the last machine not triggering the bug. I tried it on an AthlonII-X4 (which has maxleaf=5, as do all newer AMD machines) and it showed the same bug. On Intel boxes this bug should trigger on every CPU starting with some Pentium4 models, including all Core chips.

Have you tried versions with a newer service pack (SP6)?

I replied in the original discussion -- after upgrading to SP6 there's no need for ,level=1 anymore; any -cpu variant works without crashes. The problem is to set it up, at least for me, since I don't have SP6 integrated into setup. Well, I don't use winNT to start with, actually, so for me it's not a problem at all ;) -- the reason why I asked is because I have a Debian bug report about this very issue, see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=575439 (and because I had a winNT install handy). But this is really interesting information - that winNT fails on other CPUs too. Thank you for that, now I can close the Debian bug report ;)

BTW: Does anyone know what the problem with Windows95/98 on KVM is? I tried some tracing today, but couldn't find a hint.

Um. The bug report(s) come as a surprise for me: I tried to install win98 in kvm several times in the past, but setup always failed - different messages in different versions of kvm, either "unable to emulate" or "real mode trap" or something else, or just a lockup, usually on first reboot. So - the bug reports talk about the mouse not working, but this means win98 itself works somehow...
I dunno :)

I think these bug reports are about plain QEMU. I tried it yesterday; in fact the mouse is non-functional. In KVM, Windows95 gives me a black screen after the welcome screen with the moving bottom row. There are just two lines at the top (translated from the German version): "While initializing device NTKERN: Windows protection fault. Restart the computer."

Yeah, that's what I've seen too; it's exactly how it fails here with modern kvm. KVM caught some #UDs due to ARPL from VM86 mode, but TCG got them too and it survived.

So if anyone has some more hints, I'd be grateful. Thank you!

/mjt

--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/20/2010 04:18 PM, Christian Brunner wrote:

Thanks for your comments. I'll send an updated patch in a few days.

Having a central storage system is quite essential in larger hosting environments: it enables you to move your guest systems from one node to another easily (live migration or dynamic restart). Traditionally this has been done using SAN, iSCSI or NFS. However, most of these systems don't scale very well and the costs for high availability are quite high. With new approaches like Sheepdog or Ceph, things are getting a lot cheaper and you can scale your system without disrupting your service. The concepts are quite similar to what Amazon is doing in their EC2 environment, but they certainly won't publish it as open source anytime soon.

Both projects have advantages and disadvantages. Ceph is a bit more universal, as it implements a whole filesystem. Sheepdog is more feature-complete with regard to managing images (e.g. snapshots). Both projects require some additional work to become stable, but they are well on their way.

I would really like to see both drivers in the qemu tree, as they are the key to a design shift in how storage in the datacenter is being built.

I'd be more interested in enabling people to build these types of storage systems without touching qemu. Both sheepdog and ceph ultimately transmit I/O over a socket to a central daemon, right? So could we not standardize a protocol for this that both sheepdog and ceph could implement?

Regards,

Anthony Liguori

Christian
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
2010/5/20 Blue Swirl :
> On Wed, May 19, 2010 at 7:22 PM, Christian Brunner wrote:
>> The attached patch is a block driver for the distributed file system
>> Ceph (http://ceph.newdream.net/). This driver uses librados (which
>> is part of the Ceph server) for direct access to the Ceph object
>> store and is running entirely in userspace. Therefore it is
>> called "rbd" - rados block device.
>>
>> To compile the driver a recent version of ceph (>= 0.20.1) is needed
>> and you have to "--enable-rbd" when running configure.
>>
>> Additional information is available on the Ceph-Wiki:
>>
>> http://ceph.newdream.net/wiki/Kvm-rbd
>
> I have no idea whether it makes sense to add Ceph (no objection
> either). I have some minor comments below.

Thanks for your comments. I'll send an updated patch in a few days.

Having a central storage system is quite essential in larger hosting environments: it enables you to move your guest systems from one node to another easily (live migration or dynamic restart). Traditionally this has been done using SAN, iSCSI or NFS. However, most of these systems don't scale very well and the costs for high availability are quite high. With new approaches like Sheepdog or Ceph, things are getting a lot cheaper and you can scale your system without disrupting your service. The concepts are quite similar to what Amazon is doing in their EC2 environment, but they certainly won't publish it as open source anytime soon.

Both projects have advantages and disadvantages. Ceph is a bit more universal, as it implements a whole filesystem. Sheepdog is more feature-complete with regard to managing images (e.g. snapshots). Both projects require some additional work to become stable, but they are well on their way.

I would really like to see both drivers in the qemu tree, as they are the key to a design shift in how storage in the datacenter is being built.
Christian
Re: [PATCH] pc: fix segfault introduced by 3d53f5c36ff6
Good catch. Thanks, applied.

On Thu, May 20, 2010 at 6:14 AM, Eduard - Gabriel Munteanu wrote:
> Commit 3d53f5c36ff6 introduced a segfault by erroneously making fw_cfg a
> 'void **' and passing it around in different ways.
>
> Signed-off-by: Eduard - Gabriel Munteanu
> ---
>  hw/pc.c |  4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/hw/pc.c b/hw/pc.c
> index fee08c9..4a4a706 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -822,7 +822,7 @@ void pc_memory_init(ram_addr_t ram_size,
>      ram_addr_t ram_addr, bios_offset, option_rom_offset;
>      ram_addr_t below_4g_mem_size, above_4g_mem_size = 0;
>      int bios_size, isa_bios_size;
> -    void **fw_cfg;
> +    void *fw_cfg;
>
>      if (ram_size >= 0xe000 ) {
>          above_4g_mem_size = ram_size - 0xe000;
> @@ -905,7 +905,7 @@ void pc_memory_init(ram_addr_t ram_size,
>      rom_set_fw(fw_cfg);
>
>      if (linux_boot) {
> -        load_linux(*fw_cfg, kernel_filename, initrd_filename, kernel_cmdline, below_4g_mem_size);
> +        load_linux(fw_cfg, kernel_filename, initrd_filename, kernel_cmdline, below_4g_mem_size);
>      }
>
>      for (i = 0; i < nb_option_roms; i++) {
> --
> 1.6.4.4
>
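The fix boils down to treating the value returned by the BIOS-init helper as an opaque handle. A toy reconstruction of the pattern (the names here are illustrative, not QEMU's actual functions):

```c
#include <stddef.h>

/* Toy model of the fixed pattern: an init helper returns an opaque
 * handle, so the caller stores a plain void* and passes it on as-is.
 * The broken version declared 'void **fw_cfg', assigned the handle to
 * it, and then dereferenced it (*fw_cfg) -- reading through a value
 * that was never a pointer-to-pointer, hence the segfault. */

static int device_state;                /* stand-in for the fw_cfg device */

static void *bios_init_sketch(void)
{
    return &device_state;               /* handle returned by value */
}

static void *memory_init_sketch(void)
{
    void *fw_cfg = bios_init_sketch();  /* correct: void *, no deref */
    return fw_cfg;                      /* passed straight to consumers */
}
```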
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On Wed, May 19, 2010 at 7:22 PM, Christian Brunner wrote:
> The attached patch is a block driver for the distributed file system
> Ceph (http://ceph.newdream.net/). This driver uses librados (which
> is part of the Ceph server) for direct access to the Ceph object
> store and is running entirely in userspace. Therefore it is
> called "rbd" - rados block device.
>
> To compile the driver a recent version of ceph (>= 0.20.1) is needed
> and you have to "--enable-rbd" when running configure.
>
> Additional information is available on the Ceph-Wiki:
>
> http://ceph.newdream.net/wiki/Kvm-rbd

I have no idea whether it makes sense to add Ceph (no objection either). I have some minor comments below.

>
> ---
>  Makefile          |    3 +
>  Makefile.objs     |    1 +
>  block/rados.h     |  376 ++
>  block/rbd.c       |  585 +
>  block/rbd_types.h |   48 +
>  configure         |   27 +++
>  6 files changed, 1040 insertions(+), 0 deletions(-)
>  create mode 100644 block/rados.h
>  create mode 100644 block/rbd.c
>  create mode 100644 block/rbd_types.h
>
> diff --git a/Makefile b/Makefile
> index eb9e02b..b1ab3e9 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -27,6 +27,9 @@ configure: ;
>  $(call set-vpath, $(SRC_PATH):$(SRC_PATH)/hw)
>
>  LIBS+=-lz $(LIBS_TOOLS)
> +ifdef CONFIG_RBD
> +LIBS+=-lrados
> +endif
>
>  ifdef BUILD_DOCS
>  DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8
> diff --git a/Makefile.objs b/Makefile.objs
> index acbaf22..85791ac 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -18,6 +18,7 @@ block-nested-y += parallels.o nbd.o blkdebug.o
>  block-nested-$(CONFIG_WIN32) += raw-win32.o
>  block-nested-$(CONFIG_POSIX) += raw-posix.o
>  block-nested-$(CONFIG_CURL) += curl.o
> +block-nested-$(CONFIG_RBD) += rbd.o
>
>  block-obj-y += $(addprefix block/, $(block-nested-y))
>
> diff --git a/block/rados.h b/block/rados.h
> new file mode 100644
> index 000..6cde9a1
> --- /dev/null
> +++ b/block/rados.h
> @@ -0,0 +1,376 @@
> +#ifndef __RADOS_H
> +#define __RADOS_H

IIRC underscores here may conflict with system header use. Please use something like QEMU_BLOCK_RADOS_H.

> +
> +/*
> + * Data types for the Ceph distributed object storage layer RADOS
> + * (Reliable Autonomic Distributed Object Store).
> + */
> +
> +
> +
> +/*
> + * osdmap encoding versions
> + */
> +#define CEPH_OSDMAP_INC_VERSION     5
> +#define CEPH_OSDMAP_INC_VERSION_EXT 5
> +#define CEPH_OSDMAP_VERSION         5
> +#define CEPH_OSDMAP_VERSION_EXT     5
> +
> +/*
> + * fs id
> + */
> +struct ceph_fsid {
> +       unsigned char fsid[16];

Too large indent, please check also elsewhere.

> +};
> +
> +static inline int ceph_fsid_compare(const struct ceph_fsid *a,
> +                                    const struct ceph_fsid *b)
> +{
> +       return memcmp(a, b, sizeof(*a));
> +}
> +
> +/*
> + * ino, object, etc.
> + */
> +typedef __le64 ceph_snapid_t;

Please use uint64_t and le_to_cpu()/cpu_to_le().

> +#define CEPH_SNAPDIR ((__u64)(-1))  /* reserved for hidden .snap dir */

Likewise, uint64_t is the standard type. Also other places.

> +#define CEPH_NOSNAP  ((__u64)(-2))  /* "head", "live" revision */
> +#define CEPH_MAXSNAP ((__u64)(-3))  /* largest valid snapid */
> +
> +struct ceph_timespec {
> +       __le32 tv_sec;
> +       __le32 tv_nsec;
> +} __attribute__ ((packed));
> +
> +
> +/*
> + * object layout - how objects are mapped into PGs
> + */
> +#define CEPH_OBJECT_LAYOUT_HASH     1
> +#define CEPH_OBJECT_LAYOUT_LINEAR   2
> +#define CEPH_OBJECT_LAYOUT_HASHINO  3
> +
> +/*
> + * pg layout -- how PGs are mapped onto (sets of) OSDs
> + */
> +#define CEPH_PG_LAYOUT_CRUSH   0
> +#define CEPH_PG_LAYOUT_HASH    1
> +#define CEPH_PG_LAYOUT_LINEAR  2
> +#define CEPH_PG_LAYOUT_HYBRID  3
> +
> +
> +/*
> + * placement group.
> + * we encode this into one __le64.
> + */
> +struct ceph_pg {
> +       __le16 preferred; /* preferred primary osd */
> +       __le16 ps;        /* placement seed */
> +       __le32 pool;      /* object pool */
> +} __attribute__ ((packed));
> +
> +/*
> + * pg_pool is a set of pgs storing a pool of objects
> + *
> + * pg_num -- base number of pseudorandomly placed pgs
> + *
> + * pgp_num -- effective number when calculating pg placement. this
> + * is used for pg_num increases. new pgs result in data being "split"
> + * into new pgs. for this to proceed smoothly, new pgs are intiially
> + * colocated with their parents; that is, pgp_num doesn't increase
> + * until the new pgs have successfully split. only _then_ are the new
> + * pgs placed independently.
> + *
> + * lpg_num -- localized pg count (per device). replicas are randomly
> + * selected.
> + *
> + * lpgp_num -- as above.
> + */
> +#define CEPH_PG_TYPE_REP   1
> +#define CEPH_PG_TYPE_RAID4 2
> +#define CEPH_PG_POOL_VERSION 2
> +struct ceph_pg_pool {
> +       __u8 type;        /* CEPH_PG_TYPE_* */
> +       __u8 size;
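Blue Swirl's suggestion above — replace `__le64` fields with plain `uint64_t` plus explicit conversion — can be sketched as a portable little-endian load (this helper is illustrative; it is not QEMU's actual `le64_to_cpu`):

```c
#include <stdint.h>

/* Portable little-endian 64-bit load from a byte buffer.  Unlike a
 * __le64 typedef, this works on any host byte order and performs no
 * unaligned access. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];    /* p[0] ends up least significant */
    return v;
}
```

The on-wire struct then keeps fixed-width `uint64_t`/`uint32_t` fields and converts at the boundary, instead of baking endianness into the type.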
Support for direct inter-VM sockets? Inter-VM shared memory?
I'm interested in moving some research prototypes from Xen to KVM, but there are a few esoteric features I'd need to do this.

First is an efficient mechanism for direct VM-to-VM sockets... something that bypasses the protocol stack and minimizes overhead. Xen has XenSocket, XenLoop, and others. I found a few mentions of this idea dating back to 2006*, mostly saying that a few people have done something like this as a one-off, but nothing official has been released. I haven't found anything like this more recently, though. Has there been any progress on this front? Ideally, I'd want a character device or a special program that acts as a fast pipe to a different VM.

Second, what about inter-VM shared memory?

Apologies if I'm missing some well-known doc... I searched around Google and the wiki to no avail.

Thanks,
Tyler Bletsch

* http://www.mail-archive.com/kvm-de...@lists.sourceforge.net/msg00304.html
Re: computer frozen
On Thursday, May 20, 2010 02:03:31 am magicboiz wrote:
> Hello
>
> since kernel 2.6.28 or 2.6.29, I don't remember exactly, whenever I try to
> run KVM in my laptop, I get my computer totally frozen.
>
> I'd try:
> - "-no-kvm" flag: works, but very slow
> - "-cpu qemu32,-nx": frozen
> - "-no-acpi" flag: frozen
>
> I'd try with several kernels (ubuntu and openssuse kernels), also with
> custom kernels compiled by me (with the minimal options enabled) but
> always the same result: computer frozen

It's been a long time since KVM caused host lockups. That is almost always something hardware/bios/local config related.

> An interesting point: with Sun VirtualBox 3.1, the same frozen result.
>
> My laptop is a TOSHIBA TECRA S4 (europe model only).
>
> magicb...@linux-ue9l:~/> cat /proc/cpuinfo
>
> Anyone can help me?
>
> Thx in advance.
[PATCH][VHOST] fix race with guest on multi-buffer used buffer updates
[for Michael Tsirkin's vhost development git tree]

This patch fixes a race between guest and host when adding used buffers wraps the ring. Without it, guests can see partial packets before num_buffers is set in the vnet header.

Signed-off-by: David L Stevens

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 7f2568d..74790ab 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1065,14 +1065,6 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
 		vq_err(vq, "Failed to write used");
 		return -EFAULT;
 	}
-	/* Make sure buffer is written before we update index. */
-	smp_wmb();
-	if (put_user(vq->last_used_idx + count, &vq->used->idx)) {
-		vq_err(vq, "Failed to increment used idx");
-		return -EFAULT;
-	}
-	if (unlikely(vq->log_used))
-		vhost_log_used(vq, used);
 	vq->last_used_idx += count;
 	return 0;
 }
@@ -1093,7 +1085,17 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 		heads += n;
 		count -= n;
 	}
-	return __vhost_add_used_n(vq, heads, count);
+	r = __vhost_add_used_n(vq, heads, count);
+
+	/* Make sure buffer is written before we update index. */
+	smp_wmb();
+	if (put_user(vq->last_used_idx, &vq->used->idx)) {
+		vq_err(vq, "Failed to increment used idx");
+		return -EFAULT;
+	}
+	if (unlikely(vq->log_used))
+		vhost_log_used(vq, vq->used->ring + start);
+	return r;
 }
 
 /* This actually signals the guest, using eventfd. */
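The ordering rule the patch enforces — fill every used entry first, then barrier, then publish the index exactly once — can be sketched in plain C11 userspace code. The structures and fence below are illustrative stand-ins for the vring layout and the kernel's `smp_wmb()`, not vhost's actual code:

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8              /* power of two, as in the vring */

struct used_elem { uint32_t id; uint32_t len; };

struct used_ring {
    struct used_elem ring[RING_SIZE];
    _Atomic uint16_t idx;        /* consumer reads entries only up to idx */
};

/* Publish 'count' completed buffers in one shot: write every slot
 * first, then fence, then bump idx once.  The consumer can never
 * observe idx covering a slot that is still being filled -- which is
 * the partial-packet race the patch closes. */
static void publish_used(struct used_ring *u, uint16_t last_used_idx,
                         const struct used_elem *heads, int count)
{
    for (int i = 0; i < count; i++)
        u->ring[(uint16_t)(last_used_idx + i) % RING_SIZE] = heads[i];

    atomic_thread_fence(memory_order_release);  /* stand-in for smp_wmb() */
    atomic_store_explicit(&u->idx, (uint16_t)(last_used_idx + count),
                          memory_order_relaxed);
}
```

The buggy shape the patch removes is the inner helper bumping `idx` per batch segment; hoisting the barrier and index write to the caller makes the whole multi-buffer packet visible atomically.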
Re: [PATCH] Print a user-friendly message on failed vmentry
On Thu, May 20, 2010 at 6:53 PM, Avi Kivity wrote: > On 05/20/2010 05:46 PM, Mohammed Gamal wrote: >> >> On Thu, May 20, 2010 at 5:37 PM, Chris Lalancette >> wrote: >> >>> >>> On 05/19/2010 05:16 PM, Mohammed Gamal wrote: >>> This patch address bug report in https://bugs.launchpad.net/qemu/+bug/530077. Failed vmentries were handled with handle_unhandled() which prints a rather unfriendly message to the user. This patch separates handling vmentry failures from unknown exit reasons and prints a friendly message to the user. Signed-off-by: Mohammed Gamal --- qemu-kvm.c | 16 +++- 1 files changed, 15 insertions(+), 1 deletions(-) diff --git a/qemu-kvm.c b/qemu-kvm.c index 35a4c8a..deb4df8 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -106,6 +106,20 @@ static int handle_unhandled(uint64_t reason) return -EINVAL; } +static int handle_failed_vmentry(uint64_t reason) +{ + fprintf(stderr, "kvm: vm entry failed with error 0x%" PRIx64 "\n\n", reason); + fprintf(stderr, "If you're runnning a guest on an Intel machine, it can be\n"); + fprintf(stderr, "most-likely due to the guest going into an invalid state\n"); + fprintf(stderr, "for Intel VT. For example, the guest maybe running in big\n"); + fprintf(stderr, "real mode which is not supported by Intel VT.\n\n"); + fprintf(stderr, "You may want to try enabling real mode emulation in KVM.\n"); + fprintf(stderr, "To Enable it, you may run the following commands as root:\n"); + fprintf(stderr, "# rmmod kvm_intel\n"); + fprintf(stderr, "# rmmod kvm\n"); + fprintf(stderr, "# modprobe kvm_intel emulate_invalid_guest_state=1\n"); + return -EINVAL; +} >>> >>> The thing is, there are other valid reasons for vmentry failure. A while >>> ago I tracked >>> down a bug in the Linux kernel that was causing us to vmenter with >>> invalid segments; >>> this message would have been very misleading in that case. 
I think you'd >>> have to do >>> more complete analysis of the vmentry failure code to be more certain >>> about the reason >>> for failure. >>> >>> >> >> Your point is definitely valid, yet big real mode is usually the most >> likely case, and that's why this message is shown. Note also that it >> says it's _most likely_ a failure caused by an invalid guest state, >> but it doesn't rule out all other reasons. And in any case, it'd be >> better than just printing something along the lines of: >> " kvm: unhandled exit 8021 >> kvm_run returned -22" >> > > However, we're still giving bad advice. Currently > emulate_invalid_guest_state=1 is not going to work well (right?). Once it > does, we'll simply make it the default and the message will never appear. > I already added a warning in the second patch I sent. > -- > Do not meddle in the internals of kernels, for they are subtle and quick to > panic. > >
Re: [PATCH] Print a user-friendly message on failed vmentry
On 05/20/2010 05:46 PM, Mohammed Gamal wrote: On Thu, May 20, 2010 at 5:37 PM, Chris Lalancette wrote: On 05/19/2010 05:16 PM, Mohammed Gamal wrote: This patch address bug report in https://bugs.launchpad.net/qemu/+bug/530077. Failed vmentries were handled with handle_unhandled() which prints a rather unfriendly message to the user. This patch separates handling vmentry failures from unknown exit reasons and prints a friendly message to the user. Signed-off-by: Mohammed Gamal --- qemu-kvm.c | 16 +++- 1 files changed, 15 insertions(+), 1 deletions(-) diff --git a/qemu-kvm.c b/qemu-kvm.c index 35a4c8a..deb4df8 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -106,6 +106,20 @@ static int handle_unhandled(uint64_t reason) return -EINVAL; } +static int handle_failed_vmentry(uint64_t reason) +{ +fprintf(stderr, "kvm: vm entry failed with error 0x%" PRIx64 "\n\n", reason); +fprintf(stderr, "If you're runnning a guest on an Intel machine, it can be\n"); +fprintf(stderr, "most-likely due to the guest going into an invalid state\n"); +fprintf(stderr, "for Intel VT. For example, the guest maybe running in big\n"); +fprintf(stderr, "real mode which is not supported by Intel VT.\n\n"); +fprintf(stderr, "You may want to try enabling real mode emulation in KVM.\n"); +fprintf(stderr, "To Enable it, you may run the following commands as root:\n"); +fprintf(stderr, "# rmmod kvm_intel\n"); +fprintf(stderr, "# rmmod kvm\n"); +fprintf(stderr, "# modprobe kvm_intel emulate_invalid_guest_state=1\n"); +return -EINVAL; +} The thing is, there are other valid reasons for vmentry failure. A while ago I tracked down a bug in the Linux kernel that was causing us to vmenter with invalid segments; this message would have been very misleading in that case. I think you'd have to do more complete analysis of the vmentry failure code to be more certain about the reason for failure. Your point is definitely valid, yet big real mode is usually the most likely case, and that's why this message is shown. 
Note also that it says it's _most likely_ a failure caused by an invalid guest state, but it doesn't rule out all other reasons. And in any case, it'd be better than just printing something along the lines of: " kvm: unhandled exit 8021 kvm_run returned -22" However, we're still giving bad advice. Currently emulate_invalid_guest_state=1 is not going to work well (right?). Once it does, we'll simply make it the default and the message will never appear. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [Qemu-devel] [PATCH RFC] virtio: put last seen used index into ring itself
On 05/20/2010 05:34 PM, Rusty Russell wrote:

Have just one ring, no indexes. The producer places descriptors into the ring and updates the head. The consumer copies out descriptors to be processed and copies back in completed descriptors. Chaining is always linear. The descriptors contain a tag that allows the producer to identify the completion.

This could definitely work. The original reason for the page boundaries was for untrusted inter-guest communication: with appropriate page protections they could see each other's rings and a simple inter-guest copy hypercall could verify that the other guest really exposed that data via virtio ring. But, cute as that is, we never did that. And it's not clear that it wins much over simply having the hypervisor read both rings directly.

AFAICS having separate avail_ring/used_ring/desc_pool is orthogonal to this cuteness.

Can we do better? The obvious idea is to try to get rid of last_used and used, and use the ring itself. We would use an invalid entry to mark the head of the ring.

Interesting! So a peer will read until it hits a wall. But how to update the wall atomically? Maybe we can have a flag in the descriptor indicate headness or tailness. Update looks ugly though: write descriptor with head flag, write next descriptor with head flag, remove flag from previous descriptor.

I was thinking a separate magic "invalid" entry. To publish a 3-descriptor chain, you would write descriptors 2 and 3, write an invalid entry at 4, barrier, write entry 1. It is a bit ugly, yes, but not terrible.

Worth exploring. This amortizes the indexes into the ring, a good thing. Another thing we can do is place the tail a half ring away from the head (and limit ring utilization to 50%), reducing bounces on short kicks. Or equivalently have an avail ring and used ring, but both containing tagged descriptors instead of pointers to descriptors.
I think that a simple simulator for this is worth writing, which tracks cacheline moves under various fullness scenarios... Yup. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
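The "wall" scheme discussed above — write the body of the chain, stamp the slot past it invalid, barrier, then validate the first descriptor last — might look like this in C11. The descriptor layout and the `F_VALID` flag are invented for the sketch; they are not virtio's actual format:

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_LEN 16
#define F_VALID  1u               /* descriptor is safe to consume */

struct desc {
    uint64_t addr;
    uint32_t len;
    _Atomic uint32_t flags;
};

/* Publish an n-descriptor chain at 'head'.  Descriptors 2..n are
 * written first, the slot after the chain is stamped invalid (the new
 * wall), and only then is the first descriptor made valid with a
 * release store.  A consumer scanning forward stops at the first
 * invalid slot, so it can never observe a half-written chain. */
static void publish_chain(struct desc *ring, unsigned head,
                          const struct desc *chain, unsigned n)
{
    for (unsigned i = 1; i < n; i++) {
        struct desc *d = &ring[(head + i) % RING_LEN];
        d->addr = chain[i].addr;
        d->len  = chain[i].len;
        atomic_store_explicit(&d->flags, F_VALID, memory_order_relaxed);
    }

    /* the wall: the entry after the chain stays invalid */
    atomic_store_explicit(&ring[(head + n) % RING_LEN].flags, 0,
                          memory_order_relaxed);

    ring[head % RING_LEN].addr = chain[0].addr;
    ring[head % RING_LEN].len  = chain[0].len;
    atomic_store_explicit(&ring[head % RING_LEN].flags, F_VALID,
                          memory_order_release);   /* publish last */
}
```

The release store on the first descriptor plays the role of the barrier in the thread: everything written before it, including the new wall, is visible by the time the consumer sees entry 1 become valid.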
Re: [PATCH] Print a user-friendly message on failed vmentry
On Thu, May 20, 2010 at 5:37 PM, Chris Lalancette wrote: > On 05/19/2010 05:16 PM, Mohammed Gamal wrote: >> This patch address bug report in https://bugs.launchpad.net/qemu/+bug/530077. >> >> Failed vmentries were handled with handle_unhandled() which prints a rather >> unfriendly message to the user. This patch separates handling vmentry >> failures >> from unknown exit reasons and prints a friendly message to the user. >> >> Signed-off-by: Mohammed Gamal >> --- >> qemu-kvm.c | 16 +++- >> 1 files changed, 15 insertions(+), 1 deletions(-) >> >> diff --git a/qemu-kvm.c b/qemu-kvm.c >> index 35a4c8a..deb4df8 100644 >> --- a/qemu-kvm.c >> +++ b/qemu-kvm.c >> @@ -106,6 +106,20 @@ static int handle_unhandled(uint64_t reason) >> return -EINVAL; >> } >> >> +static int handle_failed_vmentry(uint64_t reason) >> +{ >> + fprintf(stderr, "kvm: vm entry failed with error 0x%" PRIx64 "\n\n", >> reason); >> + fprintf(stderr, "If you're runnning a guest on an Intel machine, it can >> be\n"); >> + fprintf(stderr, "most-likely due to the guest going into an invalid >> state\n"); >> + fprintf(stderr, "for Intel VT. For example, the guest maybe running in >> big\n"); >> + fprintf(stderr, "real mode which is not supported by Intel VT.\n\n"); >> + fprintf(stderr, "You may want to try enabling real mode emulation in >> KVM.\n"); >> + fprintf(stderr, "To Enable it, you may run the following commands as >> root:\n"); >> + fprintf(stderr, "# rmmod kvm_intel\n"); >> + fprintf(stderr, "# rmmod kvm\n"); >> + fprintf(stderr, "# modprobe kvm_intel emulate_invalid_guest_state=1\n"); >> + return -EINVAL; >> +} > > The thing is, there are other valid reasons for vmentry failure. A while ago > I tracked > down a bug in the Linux kernel that was causing us to vmenter with invalid > segments; > this message would have been very misleading in that case. I think you'd > have to do > more complete analysis of the vmentry failure code to be more certain about > the reason > for failure. 
> Your point is definitely valid, yet big real mode is usually the most likely case, and that's why this message is shown. Note also that it says it's _most likely_ a failure caused by an invalid guest state, but it doesn't rule out all other reasons. And in any case, it'd be better than just printing something along the lines of: " kvm: unhandled exit 8021 kvm_run returned -22" > -- > Chris Lalancette >
Re: [PATCH] Print a user-friendly message on failed vmentry
On 05/20/2010 09:37 AM, Chris Lalancette wrote: On 05/19/2010 05:16 PM, Mohammed Gamal wrote: This patch address bug report in https://bugs.launchpad.net/qemu/+bug/530077. Failed vmentries were handled with handle_unhandled() which prints a rather unfriendly message to the user. This patch separates handling vmentry failures from unknown exit reasons and prints a friendly message to the user. Signed-off-by: Mohammed Gamal --- qemu-kvm.c | 16 +++- 1 files changed, 15 insertions(+), 1 deletions(-) diff --git a/qemu-kvm.c b/qemu-kvm.c index 35a4c8a..deb4df8 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -106,6 +106,20 @@ static int handle_unhandled(uint64_t reason) return -EINVAL; } +static int handle_failed_vmentry(uint64_t reason) +{ +fprintf(stderr, "kvm: vm entry failed with error 0x%" PRIx64 "\n\n", reason); +fprintf(stderr, "If you're runnning a guest on an Intel machine, it can be\n"); +fprintf(stderr, "most-likely due to the guest going into an invalid state\n"); +fprintf(stderr, "for Intel VT. For example, the guest maybe running in big\n"); +fprintf(stderr, "real mode which is not supported by Intel VT.\n\n"); +fprintf(stderr, "You may want to try enabling real mode emulation in KVM.\n"); +fprintf(stderr, "To Enable it, you may run the following commands as root:\n"); +fprintf(stderr, "# rmmod kvm_intel\n"); +fprintf(stderr, "# rmmod kvm\n"); +fprintf(stderr, "# modprobe kvm_intel emulate_invalid_guest_state=1\n"); +return -EINVAL; +} The thing is, there are other valid reasons for vmentry failure. A while ago I tracked down a bug in the Linux kernel that was causing us to vmenter with invalid segments; this message would have been very misleading in that case. I think you'd have to do more complete analysis of the vmentry failure code to be more certain about the reason for failure. We should probably only display this message if reason == 0x8021. It may be worth looking at the cpu_state to verify that we're trying to enter 16-bit mode too. 
Then the message can be much more definitive too. Regards, Anthony Liguori
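Anthony's suggestion — only print the advice when the exit reason actually encodes a failed entry for invalid guest state — can be sketched against the VMX encoding, where bit 31 of the exit reason marks a VM-entry failure and basic reason 33 means invalid guest state. The constants are written out here for illustration; the exact value userspace sees (e.g. the "8021" in the bug report) depends on how KVM reports it:

```c
#include <stdint.h>

#define VMX_EXIT_REASONS_FAILED_VMENTRY 0x80000000u /* bit 31: entry failed */
#define EXIT_REASON_INVALID_STATE       33u         /* invalid guest state */

/* Return nonzero only for "VM entry failed due to invalid guest
 * state" -- the one case where suggesting
 * emulate_invalid_guest_state=1 makes sense. */
static int entry_failed_invalid_state(uint32_t exit_reason)
{
    return (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) &&
           (exit_reason & 0xffffu) == EXIT_REASON_INVALID_STATE;
}
```

Gating the message on this predicate (and, as Anthony notes, on the vcpu state showing 16-bit mode) avoids giving the big-real-mode advice for unrelated entry failures like bad segment state injected by the host.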
Re: [PATCH] Print a user-friendly message on failed vmentry
On 05/19/2010 05:16 PM, Mohammed Gamal wrote: > This patch address bug report in https://bugs.launchpad.net/qemu/+bug/530077. > > Failed vmentries were handled with handle_unhandled() which prints a rather > unfriendly message to the user. This patch separates handling vmentry failures > from unknown exit reasons and prints a friendly message to the user. > > Signed-off-by: Mohammed Gamal > --- > qemu-kvm.c | 16 +++- > 1 files changed, 15 insertions(+), 1 deletions(-) > > diff --git a/qemu-kvm.c b/qemu-kvm.c > index 35a4c8a..deb4df8 100644 > --- a/qemu-kvm.c > +++ b/qemu-kvm.c > @@ -106,6 +106,20 @@ static int handle_unhandled(uint64_t reason) > return -EINVAL; > } > > +static int handle_failed_vmentry(uint64_t reason) > +{ > +fprintf(stderr, "kvm: vm entry failed with error 0x%" PRIx64 "\n\n", > reason); > +fprintf(stderr, "If you're runnning a guest on an Intel machine, it can > be\n"); > +fprintf(stderr, "most-likely due to the guest going into an invalid > state\n"); > +fprintf(stderr, "for Intel VT. For example, the guest maybe running in > big\n"); > +fprintf(stderr, "real mode which is not supported by Intel VT.\n\n"); > +fprintf(stderr, "You may want to try enabling real mode emulation in > KVM.\n"); > +fprintf(stderr, "To Enable it, you may run the following commands as > root:\n"); > +fprintf(stderr, "# rmmod kvm_intel\n"); > +fprintf(stderr, "# rmmod kvm\n"); > +fprintf(stderr, "# modprobe kvm_intel emulate_invalid_guest_state=1\n"); > +return -EINVAL; > +} The thing is, there are other valid reasons for vmentry failure. A while ago I tracked down a bug in the Linux kernel that was causing us to vmenter with invalid segments; this message would have been very misleading in that case. I think you'd have to do more complete analysis of the vmentry failure code to be more certain about the reason for failure. 
-- Chris Lalancette
Re: [Qemu-devel] [PATCH RFC] virtio: put last seen used index into ring itself
On Thu, 20 May 2010 04:30:56 pm Avi Kivity wrote: > On 05/20/2010 08:01 AM, Rusty Russell wrote: > > > >> A device with out of order > >> completion (like virtio-blk) will quickly randomize the unused > >> descriptor indexes, so every descriptor fetch will require a bounce. > >> > >> In contrast, if the rings hold the descriptors themselves instead of > >> pointers, we bounce (sizeof(descriptor)/cache_line_size) cache lines for > >> every descriptor, amortized. > >> > > We already have indirect, this would be a logical next step. So let's > > think about it. The avail ring would contain 64 bit values, the used ring > > would contain indexes into the avail ring. > > Have just one ring, no indexes. The producer places descriptors into > the ring and updates the head. The consumer copies out descriptors to > be processed and copies back in completed descriptors. Chaining is > always linear. The descriptors contain a tag that allows the producer to > identify the completion. This could definitely work. The original reason for the page boundaries was for untrusted inter-guest communication: with appropriate page protections they could see each other's rings and a simple inter-guest copy hypercall could verify that the other guest really exposed that data via virtio ring. But, cute as that is, we never did that. And it's not clear that it wins much over simply having the hypervisor read both rings directly. > > Can we do better? The obvious idea is to try to get rid of last_used and > > used, and use the ring itself. We would use an invalid entry to mark the > > head of the ring. > > Interesting! So a peer will read until it hits a wall. But how to > update the wall atomically? > > Maybe we can have a flag in the descriptor indicate headness or > tailness. Update looks ugly though: write descriptor with head flag, > write next descriptor with head flag, remove flag from previous descriptor. I was thinking of a separate magic "invalid" entry. 
To publish a 3-descriptor chain, you would write descriptors 2 and 3, write an invalid entry at 4, barrier, write entry 1. It is a bit ugly, yes, but not terrible. I think that a simple simulator for this is worth writing, which tracks cacheline moves under various fullness scenarios... Cheers, Rusty.
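Rusty's remark that "a simple simulator for this is worth writing" invites a sketch. The toy model below is exactly that kind of sketch, not an implementation: Python instead of C, hypothetical names, and a None sentinel standing in for whatever reserved bit pattern would mark an invalid descriptor. It mimics the publication order described above (later entries first, then the new wall past the chain, then the head entry last); a real implementation would need an explicit write barrier before that final store, and would additionally track cacheline traffic.

```python
INVALID = None  # stand-in for the reserved "invalid" descriptor pattern

class Ring:
    """Toy single-ring model: an INVALID sentinel marks the head (the wall)."""

    def __init__(self, size):
        self.slots = [INVALID] * size
        self.prod = 0   # producer position
        self.cons = 0   # consumer position
        self.size = size

    def publish(self, chain):
        """Publish a descriptor chain; the first entry is exposed last."""
        base = self.prod
        # Write descriptors 2..n first.
        for i, desc in enumerate(chain[1:], start=1):
            self.slots[(base + i) % self.size] = desc
        # Write the new wall just past the chain.
        self.slots[(base + len(chain)) % self.size] = INVALID
        # barrier() would go here; only then expose the head entry.
        self.slots[base % self.size] = chain[0]
        self.prod = (base + len(chain)) % self.size

    def consume(self):
        """Read descriptors until hitting the invalid sentinel (the wall)."""
        out = []
        while self.slots[self.cons] is not INVALID:
            out.append(self.slots[self.cons])
            self.slots[self.cons] = INVALID  # model simplification
            self.cons = (self.cons + 1) % self.size
        return out
```

A consumer that races with publish() can never see a partially written chain: until the final store, slot `base` still holds the wall.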
[RFC PATCH] AMD IOMMU emulation
This is preliminary work for AMD IOMMU emulation support. Signed-off-by: Eduard - Gabriel Munteanu --- Makefile.target |2 + configure |9 + hw/amd_iommu.c | 442 +++ hw/pc.c |2 + hw/pc.h |3 + hw/pci_ids.h|2 + hw/pci_regs.h |1 + 7 files changed, 461 insertions(+), 0 deletions(-) create mode 100644 hw/amd_iommu.c diff --git a/Makefile.target b/Makefile.target index 0bdb184..13f8086 100644 --- a/Makefile.target +++ b/Makefile.target @@ -217,6 +217,8 @@ obj-i386-y += testdev.o obj-i386-$(CONFIG_KVM_PIT) += i8254-kvm.o obj-i386-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += device-assignment.o +obj-i386-$(CONFIG_AMD_IOMMU) += amd_iommu.o + # Hardware support obj-ia64-y += ide.o pckbd.o vga.o $(SOUND_HW) dma.o $(AUDIODRV) obj-ia64-y += fdc.o mc146818rtc.o serial.o i8259.o ipf.o diff --git a/configure b/configure index ed8e17b..34e5194 100755 --- a/configure +++ b/configure @@ -305,6 +305,7 @@ mixemu="no" kvm_trace="no" kvm_cap_pit="" kvm_cap_device_assignment="" +amd_iommu="no" kerneldir="" aix="no" blobs="yes" @@ -603,6 +604,8 @@ for opt do ;; --enable-kvm-device-assignment) kvm_cap_device_assignment="yes" ;; + --enable-amd-iommu-emul) amd_iommu="yes" + ;; --enable-profiler) profiler="yes" ;; --enable-cocoa) @@ -829,6 +832,8 @@ echo " --disable-kvm-pitdisable KVM pit support" echo " --enable-kvm-pit enable KVM pit support" echo " --disable-kvm-device-assignment disable KVM device assignment support" echo " --enable-kvm-device-assignment enable KVM device assignment support" +echo " --disable-amd-iommu-emul disable AMD IOMMU emulation" +echo " --enable-amd-iommu-emul enable AMD IOMMU emulation" echo " --disable-nptl disable usermode NPTL support" echo " --enable-nptlenable usermode NPTL support" echo " --enable-system enable all system emulation targets" @@ -2185,6 +2190,7 @@ echo "KVM support $kvm" echo "KVM PIT support $kvm_cap_pit" echo "KVM device assig. $kvm_cap_device_assignment" echo "KVM trace support $kvm_trace" +echo "AMD IOMMU emul. 
$amd_iommu" echo "fdt support $fdt" echo "preadv support$preadv" echo "fdatasync $fdatasync" @@ -2599,6 +2605,9 @@ case "$target_arch2" in x86_64) TARGET_BASE_ARCH=i386 target_phys_bits=64 +if test "$amd_iommu" = "yes"; then + echo "CONFIG_AMD_IOMMU=y" >> $config_target_mak +fi ;; ia64) target_phys_bits=64 diff --git a/hw/amd_iommu.c b/hw/amd_iommu.c new file mode 100644 index 000..cde90d0 --- /dev/null +++ b/hw/amd_iommu.c @@ -0,0 +1,442 @@ +/* + * AMD IOMMU emulation + * + * Copyright (c) 2010 Eduard - Gabriel Munteanu + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. 
+ */ + +#include "pc.h" +#include "hw.h" +#include "pci.h" + +/* Capability registers */ +#define CAPAB_HEADER0x00 +#define CAPAB_REV_TYPE0x02 +#define CAPAB_FLAGS 0x03 +#define CAPAB_BAR_LOW 0x04 +#define CAPAB_BAR_HIGH 0x08 +#define CAPAB_RANGE 0x0C +#define CAPAB_MISC 0x10 + +#define CAPAB_SIZE 0x14 + +/* Capability header data */ +#define CAPAB_FLAG_IOTLBSUP (1 << 0) +#define CAPAB_FLAG_HTTUNNEL (1 << 1) +#define CAPAB_FLAG_NPCACHE (1 << 2) +#define CAPAB_INIT_REV (1 << 3) +#define CAPAB_INIT_TYPE 3 +#define CAPAB_INIT_REV_TYPE (CAPAB_REV | CAPAB_TYPE) +#define CAPAB_INIT_FLAGS(CAPAB_FLAG_NPCACHE | CAPAB_FLAG_HTTUNNEL) +#define CAPAB_INIT_MISC (64 << 15) | (48 << 8) +#define CAPAB_BAR_MASK ~((1UL << 14) - 1) + +/* MMIO registers */ +#define MMIO_DEVICE_TABLE 0x +#define MMIO_COMMAND_BASE 0x0008 +#define MMIO_EVENT_BASE 0x0010 +#define MMIO_CONTROL0x0018 +#define MMIO_EXCL_BASE 0x0020 +#define MMIO_EXCL_LIMIT
Re: [Qemu-devel] [PATCH 0/2] Fix scsi-generic breakage in upstream qemu-kvm.git
On 17.05.2010 18:45, Nicholas A. Bellinger wrote: > From: Nicholas Bellinger > > Greetings, > > Attached are the updated patches following hch's comments to fix scsi-generic > device breakage with find_image_format() and refresh_total_sectors(). > > These are being resent as the last attachments were in MBOX format from > git-format-patch. > > Signed-off-by: Nicholas A. Bellinger Thanks, applied all to the block branch, even though I forgot to reply here. Kevin
Re: Wiki docs on counting and tracing KVM perf events
On 05/20/2010 03:24 PM, Stefan Hajnoczi wrote: On Thu, May 20, 2010 at 1:14 PM, Avi Kivity wrote: echo 1>/sys/kernel/debug/tracing/events/kvm/enable cat /sys/kernel/debug/tracing/trace_pipe>results/trace& perf will enable the events by itself (no?), so all you need is the perf call in the middle. Yes, it will enable events. However, I am also generating a kvm_trace-like log using trace_pipe. On this box I couldn't get perf trace working so I used trace_pipe for the kvm_trace-equivalent and perf stat for the kvm_stat equivalent. Ah. There's the new shiny trace-cmd as well. If you're getting a (binary) trace, running perf stat is redundant since it contains the same data. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: Wiki docs on counting and tracing KVM perf events
On Thu, May 20, 2010 at 1:14 PM, Avi Kivity wrote: >> echo 1>/sys/kernel/debug/tracing/events/kvm/enable >> cat /sys/kernel/debug/tracing/trace_pipe>results/trace& > > perf will enable the events by itself (no?), so all you need is the perf > call in the middle. Yes, it will enable events. However, I am also generating a kvm_trace-like log using trace_pipe. On this box I couldn't get perf trace working so I used trace_pipe for the kvm_trace-equivalent and perf stat for the kvm_stat equivalent. Stefan
Re: Wiki docs on counting and tracing KVM perf events
On 05/20/2010 02:23 PM, Stefan Hajnoczi wrote: On Thu, May 20, 2010 at 12:16 PM, Jes Sorensen wrote: On 05/20/10 13:10, Avi Kivity wrote: What's wrong with starting perf after the warm-up period and stopping it before it's done? It's pretty hard to script. I use the following. It ain't pretty: #!/bin/bash cleanup() { trap - 2 kill -2 $sleep_pid echo 0>/sys/kernel/debug/tracing/events/kvm/enable kill $cat_pid } perf stat -a -e 'kvm:*' sleep 1h>results/perf_stat 2>&1& sleep_pid=$(sleep 1&& pgrep -x -f "sleep 1h") # sleep 1 is to avoid race with forked perf process trap cleanup 2 echo 1>/sys/kernel/debug/tracing/events/kvm/enable cat /sys/kernel/debug/tracing/trace_pipe>results/trace& cat_pid=$! # ...do stuff here... cleanup perf will enable the events by itself (no?), so all you need is the perf call in the middle. What's missing is vmstat-like or kvm_stat-like output, but that's another thing. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [PATCH qemu-kvm] device-assignment: add config fd qdev property
Anthony Liguori writes: > On 05/19/2010 02:00 PM, Chris Wright wrote: >> When libvirt launches a guest it first chowns the relevenat >> /sys/bus/pci/.../config file for an assigned device then drops privileges. >> >> This causes an issue for device assignment because despite being file >> owner, the sysfs config space file checks for CAP_SYS_ADMIN before >> allowing access to device dependent config space. >> >> This adds a new qdev configfd property which allows libvirt to open the >> sysfs config space file and give qemu an already opened file descriptor. >> Along with a change pending for the 2.6.35 kernel, this allows the >> capability check to compare against privileges from when the file was >> opened. >> >> Signed-off-by: Chris Wright >> > > An fd as a qdev property seems like a bad idea to me. I'm not sure I > have a better suggestion though. Shot from the hip without much thought: could we use monitor command getfd? That associates the fd with a name. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Wiki docs on counting and tracing KVM perf events
On Thu, May 20, 2010 at 12:16 PM, Jes Sorensen wrote: > On 05/20/10 13:10, Avi Kivity wrote: >> What's wrong with starting perf after the warm-up period and stopping it >> before it's done? > > It's pretty hard to script. I use the following. It ain't pretty:

#!/bin/bash

cleanup() {
    trap - 2
    kill -2 $sleep_pid
    echo 0 >/sys/kernel/debug/tracing/events/kvm/enable
    kill $cat_pid
}

perf stat -a -e 'kvm:*' sleep 1h >results/perf_stat 2>&1 &
sleep_pid=$(sleep 1 && pgrep -x -f "sleep 1h")  # sleep 1 is to avoid race with forked perf process
trap cleanup 2
echo 1 >/sys/kernel/debug/tracing/events/kvm/enable
cat /sys/kernel/debug/tracing/trace_pipe >results/trace &
cat_pid=$!

# ...do stuff here...

cleanup

Stefan
Re: Wiki docs on counting and tracing KVM perf events
On 05/20/10 13:10, Avi Kivity wrote: > On 05/20/2010 02:05 PM, Stefan Hajnoczi wrote: >> Jes, you're right, something like "perf stat -e kvm:* --start" and >> "perf stat --stop" would be more usable for system-wide monitoring. I >> wonder if it is possible to support this or whether the perf process >> needs to periodically accumulate the counters (i.e. babysit the kernel >> infrastructure)? >> > > perf needs to be running to pull data out of the kernel (and since > profiling is tied to an fd life cycle). > > What's wrong with starting perf after the warm-up period and stopping it > before it's done? It's pretty hard to script. I have scripts that do 'read_stats() ; run_test() ; read_stats() ; calc_average()' Autotest would like to do this kinda stuff too. Jes
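For the "hard to script" complaint above, one way to scope the collection window from a harness is to start the collector only after the warm-up delay and stop it before teardown, so the counters cover just the steady state. The sketch below is an illustrative helper, not autotest code; the perf invocation in the comment reuses the syntax shown elsewhere in this thread and may need adjusting for your setup.

```python
import subprocess
import sys
import time

def measure(collector_cmd, warmup_s, run_test):
    """Run run_test() with collector_cmd active only for the measured window.

    Assumes the workload (e.g. the guest) is already running; warmup_s
    seconds are skipped before the collector starts, so warm-up activity
    never reaches the counters.
    """
    time.sleep(warmup_s)                      # let things settle first
    collector = subprocess.Popen(collector_cmd)
    try:
        result = run_test()                   # the measured portion
    finally:
        collector.terminate()                 # perf prints its summary on exit
        collector.wait()
    return result

# Typical call (perf syntax as used in this thread):
# measure(["perf", "stat", "-a", "-e", "kvm:*", "sleep", "1h"], 30, my_benchmark)
```

This keeps the read_stats/run_test/read_stats pattern but hides the collector bookkeeping in one place.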
Re: Wiki docs on counting and tracing KVM perf events
On 05/20/2010 02:05 PM, Stefan Hajnoczi wrote: 8330 kvm:kvm_entry# 0.000 M/sec ^--- count since starting perf The 8330 number means that kvm_entry has fired 8330 times since perf was started. Like Avi says, you need to keep the perf process running. I run benchmarks using a script that kills perf after the benchmark completes. Jes, you're right, something like "perf stat -e kvm:* --start" and "perf stat --stop" would be more usable for system-wide monitoring. I wonder if it is possible to support this or whether the perf process needs to periodically accumulate the counters (i.e. babysit the kernel infrastructure)? perf needs to be running to pull data out of the kernel (and since profiling is tied to an fd life cycle). What's wrong with starting perf after the warm-up period and stopping it before it's done? -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: Wiki docs on counting and tracing KVM perf events
8330 kvm:kvm_entry# 0.000 M/sec ^--- count since starting perf The 8330 number means that kvm_entry has fired 8330 times since perf was started. Like Avi says, you need to keep the perf process running. I run benchmarks using a script that kills perf after the benchmark completes. Jes, you're right, something like "perf stat -e kvm:* --start" and "perf stat --stop" would be more usable for system-wide monitoring. I wonder if it is possible to support this or whether the perf process needs to periodically accumulate the counters (i.e. babysit the kernel infrastructure)? Stefan
Re: [KVM-AUTOTEST PATCH] KVM test: use command line option wrapper functions
On Thu, 2010-05-20 at 12:50 +0300, Michael Goldish wrote: > On 05/19/2010 11:25 AM, Feng Yang wrote: > > Hi, Michael > > > > Thanks for your patch. > > We plan to add "netdev" parameter support in make_qemu_command. Since you are > > working on this part, could you add netdev support in your patch? Hopefully > > netdev can be supported by default in make_qemu_command if qemu supports it. > > Thanks very much! > > Sure, I'll look into it. > > > I think the point of this patch is good and we need this kind of patch. > > But I think we need not add so many new functions, especially ones that > > only directly return the string and do nothing more. > > This will increase the function call consumption. > > > All these helper functions are meant to be extended and modified in the > future. They're only there to minimize future effort involved in adding > support for new command line syntaxes. > Right now add_smp() just returns " -smp %s", but in the future we may > have to support different syntaxes for -smp, and then add_smp() will > consult the output of 'qemu -help' and return the proper string. > What do you mean by function call consumption? I don't think these > functions cause a measurable slowdown, and make_qemu_command() is called > very few times, so this really isn't a concern IMO. Agreed, the wrappers are a good strategy in the case we have to support different feature sets and syntax.
[PATCHv2] correctly trace irq injection on SVM.
On SVM interrupts are injected by svm_set_irq() not svm_inject_irq(). The latter is used only to wait for the irq window. Signed-off-by: Gleb Natapov ChangeLog: v1->v2: - fix stupid cut&paste error. diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 58c91f5..69b16a7 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -2831,8 +2831,6 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, int irq) { struct vmcb_control_area *control; - trace_kvm_inj_virq(irq); - ++svm->vcpu.stat.irq_injections; control = &svm->vmcb->control; control->int_vector = irq; @@ -2847,6 +2845,8 @@ static void svm_set_irq(struct kvm_vcpu *vcpu) BUG_ON(!(gif_set(svm))); + trace_kvm_inj_virq(vcpu->arch.interrupt.nr); + svm->vmcb->control.event_inj = vcpu->arch.interrupt.nr | SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR; } -- Gleb.
Re: [PATCH] correctly trace irq injection on SVM.
On Thu, May 20, 2010 at 01:19:45PM +0300, Gleb Natapov wrote: > On SVM interrupts are injected by svm_set_irq() not svm_inject_irq(). > The latter is used only to wait for irq window. > And of course ignore this. I hadn't noticed the compilation error. > Signed-off-by: Gleb Natapov > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > index 58c91f5..cd70109 100644 > --- a/arch/x86/kvm/svm.c > +++ b/arch/x86/kvm/svm.c > @@ -2831,8 +2831,6 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, > int irq) > { > struct vmcb_control_area *control; > > - trace_kvm_inj_virq(irq); > - > ++svm->vcpu.stat.irq_injections; > control = &svm->vmcb->control; > control->int_vector = irq; > @@ -2847,6 +2845,8 @@ static void svm_set_irq(struct kvm_vcpu *vcpu) > > BUG_ON(!(gif_set(svm))); > > + trace_kvm_inj_virq(irq); > + > svm->vmcb->control.event_inj = vcpu->arch.interrupt.nr | > SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR; > } > -- > Gleb. -- Gleb.
[PATCH] correctly trace irq injection on SVM.
On SVM interrupts are injected by svm_set_irq() not svm_inject_irq(). The latter is used only to wait for the irq window. Signed-off-by: Gleb Natapov diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 58c91f5..cd70109 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -2831,8 +2831,6 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, int irq) { struct vmcb_control_area *control; - trace_kvm_inj_virq(irq); - ++svm->vcpu.stat.irq_injections; control = &svm->vmcb->control; control->int_vector = irq; @@ -2847,6 +2845,8 @@ static void svm_set_irq(struct kvm_vcpu *vcpu) BUG_ON(!(gif_set(svm))); + trace_kvm_inj_virq(irq); + svm->vmcb->control.event_inj = vcpu->arch.interrupt.nr | SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR; } -- Gleb.
Re: [PATCH] fix "info cpus" halted state display
On Thu, May 13, 2010 at 04:17:14PM +0300, Gleb Natapov wrote: > When in-kernel irqchip is used env->halted is never used for anything > except "info cpus" command. Halted state is synced in > kvm_arch_save_mpstate() and showed by do_info_cpus() but otherwise never > looked at. Zeroing it here breaks "info cpus" since before > do_info_cpus() outputs env->halted in io thread it is zeroed here when > vcpu thread reenters kernel. > Avi, what about this patch? > Signed-off-by: Gleb Natapov > diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c > index 61d9331..0ec2881 100644 > --- a/qemu-kvm-x86.c > +++ b/qemu-kvm-x86.c > @@ -922,10 +922,6 @@ void kvm_arch_load_regs(CPUState *env, int level) > if (env->kvm_vcpu_update_vapic) > kvm_tpr_enable_vapic(env); > } > -if (kvm_irqchip_in_kernel()) { > -/* Avoid deadlock: no user space IRQ will ever clear it. */ > -env->halted = 0; > -} > > kvm_put_vcpu_events(env, level); > kvm_put_debugregs(env); > -- > Gleb. -- Gleb.
Re: [Qemu-devel] [PATCH RFC] virtio: put last seen used index into ring itself
On Thu, May 20, 2010 at 02:31:50PM +0930, Rusty Russell wrote: > Can we do better? The obvious idea is to try to get rid of last_used and > used, and use the ring itself. We would use an invalid entry to mark the > head of the ring. > > Any other thoughts? > Rusty. We also need a way to avoid interrupts at least while we are processing the ring. -- MST
Re: [PATCH qemu-kvm] device-assignment: add config fd qdev property
On Wed, May 19, 2010 at 02:18:35PM -0500, Anthony Liguori wrote: > On 05/19/2010 02:00 PM, Chris Wright wrote: > >When libvirt launches a guest it first chowns the relevant > >/sys/bus/pci/.../config file for an assigned device then drops privileges. > > > >This causes an issue for device assignment because despite being file > >owner, the sysfs config space file checks for CAP_SYS_ADMIN before > >allowing access to device dependent config space. > > > >This adds a new qdev configfd property which allows libvirt to open the > >sysfs config space file and give qemu an already opened file descriptor. > >Along with a change pending for the 2.6.35 kernel, this allows the > >capability check to compare against privileges from when the file was > >opened. > > > >Signed-off-by: Chris Wright > > > > An fd as a qdev property seems like a bad idea to me. I'm not sure I > have a better suggestion though. The entire requirement to pass the open FD to qemu is a bad idea, but the kernel makes it impossible to do otherwise :-( Personally I wish the kernel just used the file ownership, so we could chown() the sysfs file to 'qemu' user in a normal manner :-( Daniel -- |: Red Hat, Engineering, London-o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org-o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
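The mechanism the thread converges on, stripped to its essence: a privileged launcher opens the file, then hands the already-open descriptor number to a child that never needs the privilege itself. The sketch below is purely illustrative (Python, a stand-in file rather than a sysfs config node, `spawn_with_fd` is a made-up name); qemu's `configfd` qdev property works analogously, with libvirt as the launcher.

```python
import os
import subprocess
import sys

def spawn_with_fd(path, child_source):
    """Open path while (notionally) privileged; pass the open fd number
    to a child process on its command line, as libvirt would to qemu."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # pass_fds keeps the descriptor open across exec for the child;
        # the child receives only the number and reads through it.
        proc = subprocess.run([sys.executable, "-c", child_source, str(fd)],
                              pass_fds=(fd,), capture_output=True, text=True)
    finally:
        os.close(fd)
    return proc.stdout
```

The point of the pattern is that the permission check happened at open() time, in the launcher's security context, which is exactly what the pending 2.6.35 change mentioned above makes the sysfs capability check honor.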
Re: [Autotest] [v3 PATCH] KVM test: Add a helper to search the panic in the log
On 05/19/2010 12:13 PM, Jason Wang wrote: > This checker serves as the post_command to find the panic information > in the file which contains the content of guest serial console. > > Changes from v2: > - Put all things into __main__ > - Fix some typos > > Signed-off-by: Jason Wang > --- > client/tests/kvm/scripts/check_serial.py | 24 > client/tests/kvm/tests_base.cfg.sample |7 +-- > 2 files changed, 29 insertions(+), 2 deletions(-) > create mode 100644 client/tests/kvm/scripts/check_serial.py > > diff --git a/client/tests/kvm/scripts/check_serial.py > b/client/tests/kvm/scripts/check_serial.py > new file mode 100644 > index 000..6432c27 > --- /dev/null > +++ b/client/tests/kvm/scripts/check_serial.py > @@ -0,0 +1,24 @@ > +import os, sys, glob, re > + > + > +class SerialCheckerError(Exception): > +""" > +Simple wrapper for the builtin Exception class. > +""" > +pass > + > + > +if __name__ == "__main__": > +client_dir = os.environ['AUTODIR'] > +pattern = os.environ['KVM_TEST_search_pattern'] > +shortname = os.environ['KVM_TEST_shortname'] > +debugdir = os.path.join(client_dir, "results/default/kvm.%s/debug" > +% shortname) > +serial_files = glob.glob(os.path.join(debugdir, 'serial*')) > + > +fail = [ f for f in serial_files if > + re.findall(pattern, file(f).read(), re.I) ] > +if fail: > +print "%s is found in %s" % (pattern, fail) > +raise SerialCheckerError("Error found during the check, please " > + "check the log") > diff --git a/client/tests/kvm/tests_base.cfg.sample > b/client/tests/kvm/tests_base.cfg.sample > index e85bb4a..c4e522a 100644 > --- a/client/tests/kvm/tests_base.cfg.sample > +++ b/client/tests/kvm/tests_base.cfg.sample > @@ -52,6 +52,10 @@ address_index = 0 > # Misc > profilers = kvm_stat > > +# pattern to search in guest serial console > +search_pattern = panic > +post_command = "python scripts/check_serial.py" > +post_command_noncritical = no > > # Tests > variants: > @@ -1324,10 +1328,9 @@ virtio|virtio_blk|e1000|balloon_check: > variants: > - 
@qcow2: > image_format = qcow2 > -post_command = " python scripts/check_image.py;" > +post_command += " && python scripts/check_image.py" If post_command is empty before executing this line, it'll end up being " && python scripts/..." and bash doesn't like that AFAIK. I think this line should be: post_command += " python scripts/check_image.py;" So if post_command is empty, it becomes " python ...;" which is OK, and if it isn't empty, it becomes "previous command; python ...;" (assuming the previous command ended with ';'). > remove_image = no > post_command_timeout = 600 > -post_command_noncritical = yes > - vmdk: > only Fedora Ubuntu Windows > only smp2
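The concatenation pitfall Michael points out generalizes: appending " && cmd" to an empty command string yields something bash rejects, while appending "; cmd" merely hides the problem. A join that skips empty parts sidesteps both. This helper is illustrative only (the test suite's config files do plain string concatenation, not a function call):

```python
def append_command(existing, new):
    """Combine shell commands with '&&', tolerating an empty existing command."""
    parts = [p.strip() for p in (existing, new) if p and p.strip()]
    return " && ".join(parts)
```

With this, an empty `post_command` plus "python scripts/check_image.py" gives just the new command, and a non-empty one gives "old && new" with the failure-propagating semantics '&&' was chosen for.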
Re: [KVM-AUTOTEST PATCH] KVM test: use command line option wrapper functions
On 05/19/2010 11:25 AM, Feng Yang wrote: > Hi, Michael > > Thanks for your patch. > We plan add "netdev" parameter support in make_qemu_command. Since you are > working on this part. Could you add netdev support in your patch? hopeful > netdev can be default supported in make_qemu_command if qemu support it. > Thanks very much! Sure, I'll look into it. > I think the point of this patch is good and we need this kinds of patch. > But I think we need not add so many new function. Especially some function > only directly return the string and do nothing more. > This will increase the function call consumption. > > > - "Michael Goldish" wrote: > >> From: "Michael Goldish" >> To: autot...@test.kernel.org, kvm@vger.kernel.org >> Cc: "Michael Goldish" >> Sent: Monday, May 17, 2010 9:29:35 PM GMT +08:00 Beijing / Chongqing / Hong >> Kong / Urumqi >> Subject: [KVM-AUTOTEST PATCH] KVM test: use command line option wrapper >> functions >> >> In order to support multiple versions of qemu which use different >> command line >> options or syntaxes, wrap all command line options in small helper >> functions, >> which append text to the command line according to the output of 'qemu >> -help'. >> >> Signed-off-by: Michael Goldish >> --- >> client/tests/kvm/kvm_vm.py | 198 >> ++-- >> 1 files changed, 135 insertions(+), 63 deletions(-) >> >> diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py >> index 047505a..94bacdf 100755 >> --- a/client/tests/kvm/kvm_vm.py >> +++ b/client/tests/kvm/kvm_vm.py >> @@ -186,12 +186,100 @@ class VM: >> nic_model -- string to pass as 'model' parameter for >> this >> NIC (e.g. 
e1000) >> """ >> -if name is None: >> -name = self.name >> -if params is None: >> -params = self.params >> -if root_dir is None: >> -root_dir = self.root_dir >> +# Helper function for command line option wrappers >> +def has_option(help, option): >> +return bool(re.search(r"^-%s(\s|$)" % option, help, >> re.MULTILINE)) >> + >> +# Wrappers for all supported qemu command line parameters. >> +# This is meant to allow support for multiple qemu versions. >> +# Each of these functions receives the output of 'qemu -help' >> as a >> +# parameter, and should add the requested command line >> option >> +# accordingly. >> + >> +def add_name(help, name): >> +return " -name '%s'" % name > > I think we need not add so many new functions. Especially some functions that > only directly return the string and do nothing more. > This will increase the function call consumption. > >> + >> +def add_unix_socket_monitor(help, filename): >> +return " -monitor unix:%s,server,nowait" % filename > Same as above >> + >> +def add_mem(help, mem): >> +return " -m %s" % mem > Same as above >> + >> +def add_smp(help, smp): >> +return " -smp %s" % smp > Same as above. All these helper functions are meant to be extended and modified in the future. They're only there to minimize future effort involved in adding support for new command line syntaxes. Right now add_smp() just returns " -smp %s", but in the future we may have to support different syntaxes for -smp, and then add_smp() will consult the output of 'qemu -help' and return the proper string. What do you mean by function call consumption? I don't think these functions cause a measurable slowdown, and make_qemu_command() is called very few times, so this really isn't a concern IMO.
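The wrapper pattern from the patch, shown in isolation: probe the 'qemu -help' text once and let each wrapper pick a syntax. `has_option` is taken from the patch (with `help` renamed to `help_text` to avoid shadowing the builtin); the fallback branch in `add_smp` is hypothetical and only marks where the version-specific handling Michael describes would live.

```python
import re

def has_option(help_text, option):
    """Return True if 'qemu -help' output lists -option."""
    return bool(re.search(r"^-%s(\s|$)" % option, help_text, re.MULTILINE))

def add_smp(help_text, smp):
    """Append the -smp flag using whichever syntax this qemu supports."""
    if has_option(help_text, "smp"):
        return " -smp %s" % smp
    return ""  # hypothetical fallback: option unknown to this qemu, omit it
```

Because each wrapper consults the probed help text rather than hard-coding a syntax, supporting a new qemu version means touching one function instead of the whole command builder.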
Re: [PATCH][v3] KVM: VMX: Enable XSAVE/XRSTORE for guest
On 05/20/2010 12:16 PM, Sheng Yang wrote: From: Dexuan Cui Enable XSAVE/XRSTORE for guest. Change from V2: Addressed comments from Avi. Change from V1: 1. Use FPU API. 2. Fix CPUID issue. 3. Save/restore all possible guest xstate fields when switching. Because we don't know which fields guest has already touched. diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index d08bb4a..3938bd1 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -302,6 +302,7 @@ struct kvm_vcpu_arch { } update_pte; struct fpu guest_fpu; + u64 xcr0; gva_t mmio_fault_cr2; struct kvm_pio_request pio; diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 9e6779f..346ea66 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -266,6 +266,7 @@ enum vmcs_field { #define EXIT_REASON_EPT_VIOLATION 48 #define EXIT_REASON_EPT_MISCONFIG 49 #define EXIT_REASON_WBINVD54 +#define EXIT_REASON_XSETBV 55 /* * Interruption-information format diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 99ae513..a63f206 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -36,6 +36,8 @@ #include #include #include +#include +#include #include "trace.h" @@ -247,6 +249,9 @@ static const u32 vmx_msr_index[] = { }; #define NR_VMX_MSR ARRAY_SIZE(vmx_msr_index) +#define MERGE_TO_U64(low, high) \ + (((low)& -1u) | ((u64)((high)& -1u)<< 32)) + static inline u64 kvm_read_edx_eax(vcpu) in cache_regs.h +static int handle_xsetbv(struct kvm_vcpu *vcpu) +{ + u64 new_bv = MERGE_TO_U64(kvm_register_read(vcpu, VCPU_REGS_RAX), + kvm_register_read(vcpu, VCPU_REGS_RDX)); + + if (kvm_register_read(vcpu, VCPU_REGS_RCX) != 0) + goto err; + if (vmx_get_cpl(vcpu) != 0) + goto err; + if (!(new_bv& XSTATE_FP)) + goto err; + if ((new_bv& XSTATE_YMM)&& !(new_bv& XSTATE_SSE)) + goto err; What about a check against unknown bits? 
+ vcpu->arch.xcr0 = new_bv; + xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0); + skip_emulated_instruction(vcpu); + return 1; +err: + kvm_inject_gp(vcpu, 0); + return 1; +} + static int handle_apic_access(struct kvm_vcpu *vcpu) { return emulate_instruction(vcpu, 0, 0, 0) == EMULATE_DONE; +static u64 host_xcr0; __read_mostly. + +static void update_cpuid(struct kvm_vcpu *vcpu) +{ + struct kvm_cpuid_entry2 *best; + + best = kvm_find_cpuid_entry(vcpu, 1, 0); + if (!best) + return; + + /* Update OSXSAVE bit */ + if (cpu_has_xsave&& best->function == 0x1) { + best->ecx&= ~(bit(X86_FEATURE_OSXSAVE)); + if (kvm_read_cr4(vcpu)& X86_CR4_OSXSAVE) + best->ecx |= bit(X86_FEATURE_OSXSAVE); + } +} Note: need to update after userspace writes cpuid as well. + int __kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) { unsigned long old_cr4 = kvm_read_cr4(vcpu); @@ -481,6 +513,9 @@ int __kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) if (cr4& CR4_RESERVED_BITS) return 1; + if (!guest_cpuid_has_xsave(vcpu)&& (cr4& X86_CR4_OSXSAVE)) + return 1; + if (is_long_mode(vcpu)) { if (!(cr4& X86_CR4_PAE)) return 1; @@ -497,6 +532,9 @@ int __kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) if ((cr4 ^ old_cr4)& pdptr_bits) kvm_mmu_reset_context(vcpu); + if ((cr4 ^ old_cr4)& X86_CR4_OSXSAVE) + update_cpuid(vcpu); + I think we need to reload the guest's xcr0 at this point. Alternatively, call vmx_load_host_state() to ensure the next entry will reload it. @@ -1931,7 +1964,7 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, switch (function) { case 0: - entry->eax = min(entry->eax, (u32)0xb); + entry->eax = min(entry->eax, (u32)0xd); Do we need any special handling for leaf 0xc?
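The update_cpuid() hunk quoted above simply mirrors CR4.OSXSAVE into the guest-visible CPUID.1:ECX OSXSAVE bit. A toy Python rendering of that single-bit update (bit positions per the x86 architecture: OSXSAVE is CPUID.1 ECX bit 27, CR4.OSXSAVE is bit 18):

```python
X86_FEATURE_OSXSAVE = 1 << 27   # CPUID.1:ECX bit 27
X86_CR4_OSXSAVE     = 1 << 18   # CR4 bit 18

def update_osxsave_bit(ecx, cr4):
    # What update_cpuid() does for leaf 1: clear the OSXSAVE CPUID bit,
    # then set it again only if CR4.OSXSAVE is set, so the guest-visible
    # bit always mirrors the guest's CR4.
    ecx &= ~X86_FEATURE_OSXSAVE
    if cr4 & X86_CR4_OSXSAVE:
        ecx |= X86_FEATURE_OSXSAVE
    return ecx

print(hex(update_osxsave_bit(0, X86_CR4_OSXSAVE)))      # 0x8000000
print(hex(update_osxsave_bit(X86_FEATURE_OSXSAVE, 0)))  # 0x0
```

This is also why the review note about userspace CPUID writes matters: the bit must be recomputed anywhere the guest's CR4 or CPUID table can change, not just on CR4 writes.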
@@ -4567,6 +4616,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) kvm_x86_ops->prepare_guest_switch(vcpu); if (vcpu->fpu_active) kvm_load_guest_fpu(vcpu); + if (kvm_read_cr4(vcpu)& X86_CR4_OSXSAVE) + xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0); Better done in vmx_save_host_state(), so we only do it on context switches or entries from userspace. kvm_read_cr4_bits() is faster - doesn't need a vmcs_readl(). atomic_set(&vcpu->guest_mode, 1); smp_wmb(); @@ -5118,6 +5169,10 @@ void fx_init(struct kvm_vcpu *vcpu) fpu_alloc(&vcpu->arch.guest_fpu); fpu_finit(&vcpu->arch.guest_fpu); + /* Ensure guest xcr0 is valid for loading */ +
Re: [Autotest] [PATCH] KVM Test: Make remote_scp() more robust.
On 05/19/2010 05:14 AM, Feng Yang wrote: > > - "Michael Goldish" wrote: > >> From: "Michael Goldish" >> To: "Feng Yang" >> Cc: autot...@test.kernel.org, kvm@vger.kernel.org >> Sent: Monday, May 17, 2010 11:05:37 PM GMT +08:00 Beijing / Chongqing / Hong >> Kong / Urumqi >> Subject: Re: [Autotest] [PATCH] KVM Test: Make remote_scp() more robust. >> >> On 05/07/2010 01:26 PM, Feng Yang wrote: >>> 1. In remote_scp(), if the SCP connection stalls for some reason, >> the following >>> code will be run. >>> else: # match == None >>> >>> logging.debug("Timeout elapsed or process terminated") >>> status = sub.get_status() >>> sub.close() >>> return status == 0 >>> At this moment, the kvm_subprocess server is still running, which means >>> lock_server_running_filename is still locked. But sub.get_status() >>> tries to lock it again. If the kvm_subprocess server keeps running, >>> a deadlock will happen. This patch will fix this issue by enabling >> >> Agreed. It's a mistake (my mistake) to call get_status() on a >> process >> that's still running and isn't expected to terminate soon. I think >> even >> the docstring of get_status() says that it blocks, so that's expected >> behavior. >> However, there's a simple solution to that, and I don't see why an >> additional timeout is necessary. >> >>> the timeout parameter. Update the default value for timeout to 600; it >> should >>> be enough. >>> >>> 2. Add "-v" to the scp command to catch more information. Also add "Exit >> status" >>> and "stalled" match prompts in remote_scp().
>>> Signed-off-by: Feng Yang >>> --- >>> client/tests/kvm/kvm_utils.py | 36 >> >>> client/tests/kvm/kvm_vm.py|4 ++-- >>> 2 files changed, 30 insertions(+), 10 deletions(-) >>> >>> diff --git a/client/tests/kvm/kvm_utils.py >> b/client/tests/kvm/kvm_utils.py >>> index 25f3c8c..3db4dec 100644 >>> --- a/client/tests/kvm/kvm_utils.py >>> +++ b/client/tests/kvm/kvm_utils.py >>> @@ -524,7 +524,7 @@ def remote_login(command, password, prompt, >> linesep="\n", timeout=10): >>> return None >>> >>> >>> -def remote_scp(command, password, timeout=300, login_timeout=10): >>> +def remote_scp(command, password, timeout=600, login_timeout=10): >>> """ >>> Run the given command using kvm_spawn and provide answers to >> the questions >>> asked. If timeout expires while waiting for the transfer to >> complete , >>> @@ -548,12 +548,18 @@ def remote_scp(command, password, timeout=300, >> login_timeout=10): >>> >>> password_prompt_count = 0 >>> _timeout = login_timeout >>> +end_time = time.time() + timeout >>> +logging.debug("Trying to SCP...") >>> >>> -logging.debug("Trying to login...") >>> >>> while True: >>> +if end_time <= time.time(): >>> +logging.debug("transfer timeout!") >>> +sub.close() >>> +return False >>> (match, text) = sub.read_until_last_line_matches( >>> -[r"[Aa]re you sure", r"[Pp]assword:\s*$", r"lost >> connection"], >>> +[r"[Aa]re you sure", r"[Pp]assword:\s*$", r"lost >> connection", >>> + r"Exit status", r"stalled"], >>> timeout=_timeout, internal_timeout=0.5) >>> if match == 0: # "Are you sure you want to continue >> connecting" >>> logging.debug("Got 'Are you sure...'; sending 'yes'") >>> @@ -574,15 +580,29 @@ def remote_scp(command, password, timeout=300, >> login_timeout=10): >>> logging.debug("Got 'lost connection'") >>> sub.close() >>> return False >>> +elif match == 3: # "Exit status" >> >> This check for "Exit status" is redundant. 
When the process >> terminates, >> read_until_last_line_matches() will return None and get_status() will >> return the exit status. > By checking for "Exit status" here, we can get not only the exit status, but also > some useful debug information when the exit status is not 0, > because we have enabled '-v' in the scp command. > > But read_until_last_line_matches() only returns the exit status. You get the same information from read_until_last_line_matches(). It returns (match, text). If match is None, and sub.get_status() != 0, then something bad happened, and then we can print text. >> >>> +sub.close() >>> +if "Exit status 0" in text: >>> +logging.debug("SCP command completed >> successfully") >>> +return True >>> +else: >>> +logging.debug("SCP command fail with exit status >> %s" % text) >>> +return False >>> +elif match == 4: # "stalled" >>> +logging.debug("SCP connection stalled for some >> reason") >>> +continue >>> + >>> else: # match == None >>> -logging.debug("Timeout elapsed or process terminated") >>> +if sub.is_alive(): >>> +continue >>> +
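The fix under discussion boils down to one pattern: never call a blocking get_status() on a live child; instead keep polling with a per-step timeout until a match, process exit, or an overall deadline. A generic Python sketch of that loop — read_step and is_alive here are hypothetical stand-ins for the kvm_subprocess methods, not the real API:

```python
import time

def wait_with_deadline(read_step, is_alive, timeout=600, step_timeout=10):
    # Deadline-guarded polling loop, the pattern the remote_scp() fix uses:
    # on a per-step timeout, keep looping while the child is still alive,
    # and only give up (without blocking on the child's status) once the
    # overall deadline passes.  read_step(t) returns a match index, or
    # None when the step timed out with no match.
    end_time = time.time() + timeout
    while time.time() < end_time:
        match = read_step(step_timeout)
        if match is not None:
            return match            # caller handles the matched prompt
        if not is_alive():
            return None             # child exited; now it's safe to read status
    return None                     # overall transfer timeout

# Toy driver: the "process" produces a match on the third poll.
polls = iter([None, None, 2])
print(wait_with_deadline(lambda t: next(polls), lambda: True,
                         timeout=5, step_timeout=0))  # 2
```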
[PATCH][v3] KVM: VMX: Enable XSAVE/XRSTORE for guest
From: Dexuan Cui Enable XSAVE/XRSTORE for guest. Change from V2: Addressed comments from Avi. Change from V1: 1. Use FPU API. 2. Fix CPUID issue. 3. Save/restore all possible guest xstate fields when switching. Because we don't know which fields guest has already touched. Signed-off-by: Dexuan Cui Signed-off-by: Sheng Yang --- Avi, could you help to review this kernel patch first? Testcase and LM are in progress now. arch/x86/include/asm/kvm_host.h |1 + arch/x86/include/asm/vmx.h |1 + arch/x86/kvm/vmx.c | 28 arch/x86/kvm/x86.c | 88 +++--- 4 files changed, 111 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index d08bb4a..3938bd1 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -302,6 +302,7 @@ struct kvm_vcpu_arch { } update_pte; struct fpu guest_fpu; + u64 xcr0; gva_t mmio_fault_cr2; struct kvm_pio_request pio; diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 9e6779f..346ea66 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -266,6 +266,7 @@ enum vmcs_field { #define EXIT_REASON_EPT_VIOLATION 48 #define EXIT_REASON_EPT_MISCONFIG 49 #define EXIT_REASON_WBINVD 54 +#define EXIT_REASON_XSETBV 55 /* * Interruption-information format diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 99ae513..a63f206 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -36,6 +36,8 @@ #include #include #include +#include +#include #include "trace.h" @@ -247,6 +249,9 @@ static const u32 vmx_msr_index[] = { }; #define NR_VMX_MSR ARRAY_SIZE(vmx_msr_index) +#define MERGE_TO_U64(low, high) \ + (((low) & -1u) | ((u64)((high) & -1u) << 32)) + static inline bool is_page_fault(u32 intr_info) { return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VECTOR_MASK | @@ -3354,6 +3359,28 @@ static int handle_wbinvd(struct kvm_vcpu *vcpu) return 1; } +static int handle_xsetbv(struct kvm_vcpu *vcpu) +{ + u64 new_bv = 
MERGE_TO_U64(kvm_register_read(vcpu, VCPU_REGS_RAX), + kvm_register_read(vcpu, VCPU_REGS_RDX)); + + if (kvm_register_read(vcpu, VCPU_REGS_RCX) != 0) + goto err; + if (vmx_get_cpl(vcpu) != 0) + goto err; + if (!(new_bv & XSTATE_FP)) + goto err; + if ((new_bv & XSTATE_YMM) && !(new_bv & XSTATE_SSE)) + goto err; + vcpu->arch.xcr0 = new_bv; + xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0); + skip_emulated_instruction(vcpu); + return 1; +err: + kvm_inject_gp(vcpu, 0); + return 1; +} + static int handle_apic_access(struct kvm_vcpu *vcpu) { return emulate_instruction(vcpu, 0, 0, 0) == EMULATE_DONE; @@ -3632,6 +3659,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = { [EXIT_REASON_TPR_BELOW_THRESHOLD] = handle_tpr_below_threshold, [EXIT_REASON_APIC_ACCESS] = handle_apic_access, [EXIT_REASON_WBINVD] = handle_wbinvd, + [EXIT_REASON_XSETBV] = handle_xsetbv, [EXIT_REASON_TASK_SWITCH] = handle_task_switch, [EXIT_REASON_MCE_DURING_VMENTRY] = handle_machine_check, [EXIT_REASON_EPT_VIOLATION] = handle_ept_violation, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 7be1d36..7580d14 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -64,6 +64,7 @@ (~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE\ | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE \ | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR \ + | X86_CR4_OSXSAVE \ | X86_CR4_OSXMMEXCPT | X86_CR4_VMXE)) #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR) @@ -149,6 +150,13 @@ struct kvm_stats_debugfs_item debugfs_entries[] = { { NULL } }; +static u64 host_xcr0; + +static inline u32 bit(int bitno) +{ + return 1 << (bitno & 31); +} + static void kvm_on_user_return(struct user_return_notifier *urn) { unsigned slot; @@ -473,6 +481,30 @@ void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw) } EXPORT_SYMBOL_GPL(kvm_lmsw); +static bool guest_cpuid_has_xsave(struct kvm_vcpu *vcpu) +{ + struct kvm_cpuid_entry2 *best; + + best = kvm_find_cpuid_entry(vcpu, 1, 0); + return best && 
(best->ecx & bit(X86_FEATURE_XSAVE)); +} + +static void update_cpuid(struct kvm_vcpu *vcpu) +{ + struct kvm_cpuid_entry2 *best; + + best = kvm_find_cpuid_entry(vcpu, 1, 0); + if (!best) + return; + + /* Update OSXSAVE bit */ + if (cpu_has_xsave && best->function == 0x1) { + best->ecx
Re: Wiki docs on counting and tracing KVM perf events
On 05/20/10 10:19, Avi Kivity wrote: > On 05/20/2010 11:15 AM, Jes Sorensen wrote: >>> Two things are missing to make this really useful: >>> >>> - a continuously updating difference mode like kvm_stat >>> - subevents; for example kvm:kvm_exit is an aggregate of all exit types >>> that can be split using filters to show individual exit reason >>> statistics >>> >> Third missing item, which I find really useful: >> - run once spit out raw counters >> >> For some operations, like file system benchmarking, it is useful to >> sample the counters before and after and then divide the raw number of >> events by the number of IOPS performed by the benchmark. If perf spits >> out events/sec it's kinda hard to get this. >> > > That's 'perf stat -a sleep 2'

sleep 2 doesn't really cut it. I guess you could do it with perf stat -a sleep 0.1, but to be honest, that is pretty ugly. Something like this would be a lot nicer: perf stat -a -raw

Jes
Re: Wiki docs on counting and tracing KVM perf events
On 05/20/2010 11:15 AM, Jes Sorensen wrote: On 05/13/10 16:57, Avi Kivity wrote: On 05/13/2010 05:35 PM, Stefan Hajnoczi wrote: How to count and trace KVM perf events: http://www.linux-kvm.org/page/Perf_events I want to draw attention to this because traditional kvm_stat and kvm_trace use has been moving over to the debugfs based tracing mechanisms. Perhaps we can flesh out documentation and examples of common perf event usage. Two things are missing to make this really useful: - a continuously updating difference mode like kvm_stat - subevents; for example kvm:kvm_exit is an aggregate of all exit types that can be split using filters to show individual exit reason statistics Third missing item, which I find really useful: - run once spit out raw counters For some operations, like file system benchmarking, it is useful to sample the counters before and after and then divide the raw number of events by the number of IOPS performed by the benchmark. If perf spits out events/sec it's kinda hard to get this. That's 'perf stat -a sleep 2' -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: Wiki docs on counting and tracing KVM perf events
On 05/13/10 16:57, Avi Kivity wrote: > On 05/13/2010 05:35 PM, Stefan Hajnoczi wrote: >> How to count and trace KVM perf events: >> >> http://www.linux-kvm.org/page/Perf_events >> >> I want to draw attention to this because traditional kvm_stat and >> kvm_trace use has been moving over to the debugfs based tracing >> mechanisms. Perhaps we can flesh out documentation and examples of >> common perf event usage. >> >> > > Two things are missing to make this really useful: > > - a continuously updating difference mode like kvm_stat > - subevents; for example kvm:kvm_exit is an aggregate of all exit types > that can be split using filters to show individual exit reason statistics Third missing item, which I find really useful: - run once spit out raw counters For some operations, like file system benchmarking, it is useful to sample the counters before and after and then divide the raw number of events by the number of IOPS performed by the benchmark. If perf spits out events/sec it's kinda hard to get this. Cheers, Jes
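The before/after sampling Jes describes is just a counter delta divided by an operation count. A trivial Python illustration — the numbers are made up for the example:

```python
def events_per_op(counter_before, counter_after, ops_completed):
    # The calculation described in the thread: sample a raw event counter
    # before and after a benchmark run, then divide the delta by the
    # number of I/O operations the benchmark performed.
    return (counter_after - counter_before) / ops_completed

# Hypothetical numbers: 1.2M kvm_exit events across a 50k-IOP benchmark run.
print(events_per_op(3_400_000, 4_600_000, 50_000))  # 24.0 exits per I/O
```

With only an events/sec figure from perf, this per-operation number can't be recovered, which is exactly the argument for a raw-counter mode.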
Re: [Qemu-devel] [PATCH] pc: fix segfault introduced by 3d53f5c36ff6
Thank you for fixing it. Probably I was in too much of a hurry when rebasing the patches. Acked-by: Isaku Yamahata On Thu, May 20, 2010 at 09:14:04AM +0300, Eduard - Gabriel Munteanu wrote: > Commit 3d53f5c36ff6 introduced a segfault by erroneously making fw_cfg a > 'void **' and passing it around in different ways. > > Signed-off-by: Eduard - Gabriel Munteanu > --- > hw/pc.c |4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/hw/pc.c b/hw/pc.c > index fee08c9..4a4a706 100644 > --- a/hw/pc.c > +++ b/hw/pc.c > @@ -822,7 +822,7 @@ void pc_memory_init(ram_addr_t ram_size, > ram_addr_t ram_addr, bios_offset, option_rom_offset; > ram_addr_t below_4g_mem_size, above_4g_mem_size = 0; > int bios_size, isa_bios_size; > -void **fw_cfg; > +void *fw_cfg; > > if (ram_size >= 0xe000 ) { > above_4g_mem_size = ram_size - 0xe000; > @@ -905,7 +905,7 @@ void pc_memory_init(ram_addr_t ram_size, > rom_set_fw(fw_cfg); > > if (linux_boot) { > -load_linux(*fw_cfg, kernel_filename, initrd_filename, > kernel_cmdline, below_4g_mem_size); > +load_linux(fw_cfg, kernel_filename, initrd_filename, kernel_cmdline, > below_4g_mem_size); > } > > for (i = 0; i < nb_option_roms; i++) { > -- > 1.6.4.4 > > -- yamahata
Re: [Qemu-devel] [RFC] Bug Day - June 1st, 2010
On 05/19/10 15:34, Anthony Liguori wrote: > On 05/19/2010 12:04 AM, Aurelien Jarno wrote: >> The idea is nice, but would it be possible to hold this on a weekend? >> I personally won't be able to attend such a thing on a week day. >> >> Or maybe holding it on two days, Friday and Saturday, so that people >> can participate on at least one of the two days, depending on whether they do it >> from work or from home. > > The work week in Israel is Sunday - Thursday. > > It would have to be Sunday and Monday but honestly, I think both days > tend to be bad for this sort of thing. > > I'd much rather do more frequent bug days and alternate between a > weekday and a Saturday. If we settle for the 2nd of June, maybe the people who are unavailable on weekdays could run a pre-bug day on Sunday the 30th. Maybe some of us would be able to stop by the channel briefly on the Sunday even if we plan to do the big bug day on the 2nd? Cheers, Jes
Re: [Qemu-devel] [RFC] Bug Day - June 1st, 2010
On 05/19/10 02:58, Natalia Portillo wrote: > Hi, >> - We'll try to migrate as many confirmable bugs from the Source Forge >> tracker to Launchpad. > I think that part of the bug day should also include retesting OSes that > appear in the OS Support List as having bugs, and confirming whether the bug is still > present and whether it's in Launchpad or not. This would be a great task for people who would like to contribute, but maybe don't feel they have the experience or knowledge to hack on the code itself. Jes
Re: [Qemu-devel] [RFC] Bug Day - June 1st, 2010
Michael Tokarev wrote: 20.05.2010 02:30, Anthony Liguori wrote: On 05/19/2010 05:29 PM, Andre Przywara wrote: Michael Tokarev wrote: ... Also, thanks to Andre Przywara, the whole winNT thing works but it requires -cpu qemu64,level=1 (or level=2 or =3) -- _not_ with the default CPU. This [] It'd be nice if we had more flexibility in defining custom machine types so you could just do qemu -M win98. This is wrong IMHO. win98 and winNT can run on various different machines, including all modern ones (yes, I tried the same winNT on my Athlon X2-64, just had to switch SATA from AHCI to IDE; win95 works too)... just not in kvm :) Well, not really. You were lucky with your Athlon X2-64; actually it is the last machine not triggering the bug. I tried it on an AthlonII-X4 (which has maxleaf=5 like any newer AMD machine) and it showed the same bug. On Intel boxes this bug should trigger on every CPU starting with some Pentium4 models, including all Core chips. Have you tried versions with a newer service pack (SP6)? BTW: Does anyone know what the problem with Windows95/98 on KVM is? I tried some tracing today, but couldn't find a hint. Um. The bug report(s) come as a surprise to me: I tried to install win98 in kvm several times in the past but setup always failed - different messages in different versions of kvm, either "unable to emulate" or "real mode trap" or something else, or just a lockup, usually on first reboot. So - the bug reports talk about the mouse not working, but this means win98 itself works somehow... I dunno :) I think these bug reports are about plain QEMU. I tried it yesterday; in fact the mouse is non-functional. In KVM, Windows95 gives me a black screen after the welcome screen with the moving bottom row. There are just two lines at the top (translated from the German version): While initializing device NTKERN: Windows protection fault. Restart the computer. KVM caught some #UDs due to ARPL from VM86 mode, but TCG got them too and it survived.
So if anyone has some more hints, I'd be grateful. Regards, Andre. -- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany Tel: +49 351 448-3567-12
computer frozen
Hello, since kernel 2.6.28 or 2.6.29 (I don't remember exactly), whenever I try to run KVM on my laptop, the computer freezes completely. I tried:
- the "-no-kvm" flag: works, but very slow
- "-cpu qemu32,-nx": frozen
- the "-no-acpi" flag: frozen

I also tried several kernels (Ubuntu and openSUSE kernels), as well as custom kernels compiled by me (with the minimal options enabled), but always with the same result: computer frozen. An interesting point: with Sun VirtualBox 3.1, the same frozen result. My laptop is a TOSHIBA TECRA S4 (Europe model only).

magicb...@linux-ue9l:~/> cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz
stepping        : 6
cpu MHz         : 2000.000
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow
bogomips        : 3989.99
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz
stepping        : 6
cpu MHz         : 2000.000
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow
bogomips        : 3995.44
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

Can anyone help me? Thanks in advance.
Re: [PATCH 10/12] kvm: enable smp > 1
Avi Kivity schrieb: On 05/19/2010 11:02 PM, Udo Lembke wrote: Unrelated, what are your smp issues? If I use one CPU I get good I/O performance: e.g. over 500 MB/s with the "install" profile of the I/O benchmark h2benchw.exe. (aio=threads | SAS RAID-0 | ftp://ftp.heise.de/pub/ct/ctsi/h2benchw.zip | hwbenchw.exe -p -w iotest 0) The same test with two CPUs gives results between 27 and 298 MB/s! It's also noticeable in real life, not only with a benchmark. I use a Windows VM with two CPUs for PostScript ripping and see a performance drop due to the bad I/O. Hi, What's your block device model? virtio or ide? In the test described before I used virtio, but the same happens with IDE (with slightly different values, of course). What does cpu usage look like on guest or host? On the guest it looks like the I/O process flips between the CPUs. Windows shows both CPUs together at around 65% (more or less), but when one CPU's usage rises, the other's drops. On the host: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 5386 root 20 0 1160m 1.0g 1552 R 109 13.5 1:23.58 kvm The guest is Windows XP, but the same happens in real life on a Windows 2003 guest. Udo
Re: [Qemu-devel] [PATCH RFC] virtio: put last seen used index into ring itself
On 05/20/2010 08:01 AM, Rusty Russell wrote: A device with out of order completion (like virtio-blk) will quickly randomize the unused descriptor indexes, so every descriptor fetch will require a bounce. In contrast, if the rings hold the descriptors themselves instead of pointers, we bounce (sizeof(descriptor)/cache_line_size) cache lines for every descriptor, amortized. We already have indirect, this would be a logical next step. So let's think about it. The avail ring would contain 64 bit values, the used ring would contain indexes into the avail ring. Have just one ring, no indexes. The producer places descriptors into the ring and updates the head. The consumer copies out descriptors to be processed and copies back in completed descriptors. Chaining is always linear. The descriptors contain a tag that allows the producer to identify the completion. Indirect only pays when there are enough descriptors in it to fill a couple of cache lines. Otherwise it's an extra bounce. We will always bounce here, that's what happens when transferring data. The question is how many cache lines per descriptor. A pointer adds 1 bounce, linear descriptors cost 1/4 bounce, chained descriptors cost a bounce. So best is one ring of linearly chained descriptors. Indirect works when you have large requests (like block). So client writes descriptor page and adds to avail ring, then writes to index. Server reads index, avail ring, descriptor page (3). Writes used entry (1). Updates last_used (1). Client reads used (1), derefs avail (1), updates last_used (1), cleans descriptor page (1). That's 9 cacheline transfers, worst case. In the best case of a half-full ring in steady state, assuming 128-byte cache lines, the avail ring costs 1/16, the used entry 1/64. This drops it to 6 and 9/64 transfers. Cache lines are 64 bytes these days. With a single ring, client writes descriptors (ceil(N/4)), updates head (1).
Server reads head (1), copies out descriptors (ceil(N/4)), issues requests, copies back completions (ceil(N/4)), updates tail (1). Client reads back tail and descriptors (1 + ceil(N/4)). Worst case: 4 + 4 * ceil(N/4). Best case I think this drops by half. (Note, the current scheme adds 2 more cacheline transfers, for the descriptor table, worst case. 2 bounces per descriptor due to random access. Assuming indirect, we get 2/8 xfer best case. Either way, it's not the main source of cacheline xfers). Indirect adds a double bounce to get to the descriptor table, but any descriptors there are accessed linearly. It's only good when you have large chains. Can we do better? The obvious idea is to try to get rid of last_used and used, and use the ring itself. We would use an invalid entry to mark the head of the ring. Interesting! So a peer will read until it hits a wall. But how to update the wall atomically? Maybe we can have a flag in the descriptor to indicate headness or tailness. Update looks ugly though: write descriptor with head flag, write next descriptor with head flag, remove flag from previous descriptor. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
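The worst-case tallies in this exchange are easy to reproduce with back-of-the-envelope arithmetic. A small Python sketch under the thread's own assumptions (64-byte cache lines and 16-byte descriptors, so four descriptors per line):

```python
import math

def split_ring_worst_case():
    # Avi's tally for the current split-ring layout: server reads index,
    # avail ring and descriptor page (3), writes a used entry (1), updates
    # last_used (1); client reads used (1), derefs avail (1), updates
    # last_used (1), cleans the descriptor page (1).
    return 3 + 1 + 1 + 1 + 1 + 1 + 1

def single_ring_worst_case(n):
    # Single combined ring, four descriptors per 64-byte cache line:
    # client writes ceil(N/4) descriptor lines plus the head (1); server
    # reads the head (1), copies descriptors out and back (2 * ceil(N/4)),
    # updates the tail (1); client reads back tail and descriptors
    # (1 + ceil(N/4)).  Total: 4 + 4 * ceil(N/4).
    return 4 + 4 * math.ceil(n / 4)

print(split_ring_worst_case())      # 9
print(single_ring_worst_case(4))    # 8
print(single_ring_worst_case(8))    # 12
```

So for short chains the single-ring scheme already beats the split ring's worst case, which is the crossover the thread is arguing about.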