Re: [Qemu-devel] [sheepdog] [PATCH] sheepdog: selectable object size support
Hi Hitoshi, Thanks for your check. (2015/01/15 15:05), Hitoshi Mitake wrote: At Tue, 13 Jan 2015 17:41:12 +0900, Teruaki Ishizaki wrote: Previously, qemu block driver of sheepdog used hard-coded VDI object size. This patch enables users to handle block_size_shift value for calculating VDI object size. When you start qemu, you don't need to specify additional command option. But when you create the VDI which doesn't have default object size with qemu-img command, you specify block_size_shift option. If you want to create a VDI of 8MB(1 23) object size, you need to specify following command option. # qemu-img create -o block_size_shift=23 sheepdog:test1 100M In addition, when you don't specify qemu-img command option, a default value of sheepdog cluster is used for creating VDI. # qemu-img create sheepdog:test2 100M Signed-off-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp --- block/sheepdog.c | 138 +--- include/block/block_int.h |1 + 2 files changed, 117 insertions(+), 22 deletions(-) @@ -1610,7 +1633,8 @@ out_with_err_set: if (bs) { bdrv_unref(bs); } -g_free(buf); +if (buf) The above line has a style problem (white space). I'll change it. @@ -1669,6 +1693,17 @@ static int parse_redundancy(BDRVSheepdogState *s, const char *opt) return 0; } +static int parse_block_size_shift(BDRVSheepdogState *s, const char *opt) +{ +struct SheepdogInode *inode = s-inode; +inode-block_size_shift = (uint8_t)atoi(opt); This patch cannot create VDI with specified block size shift. Below is the reason: Initializing block_size_shift of the inode object here (in parse_block_size_shift()) doesn't make sense. Because the block_size_shift of newly created VDI is specified by block_size_shift of SheepdogVdiReq (in sheepdog source code, sd_req.vdi). You need to synchronize the struct and initialize the parameter. Below is a slack solution: diff --git a/block/sheepdog.c b/block/sheepdog.c index 525254e..3dc3359 100644 --- a/block/sheepdog.c +++ b/block/sheepdog.c @@ -168,7 +168,8 @@ typedef struct SheepdogVdiReq { uint32_t base_vdi_id; uint8_t copies; uint8_t copy_policy; -uint8_t reserved[2]; +uint8_t store_policy; +uint8_t block_size_shift; uint32_t snapid; uint32_t type; uint32_t pad[2]; @@ -1560,6 +1561,7 @@ static int do_sd_create(BDRVSheepdogState *s, uint32_t *vdi_id, int snapshot, hdr.vdi_size = s-inode.vdi_size; hdr.copy_policy = s-inode.copy_policy; hdr.copies = s-inode.nr_copies; +hdr.block_size_shift = s-inode.block_size_shift; ret = do_req(fd, s-aio_context, (SheepdogReq *)hdr, buf, wlen, rlen); Oh, sorry for my miss-edit for the patch I had tested the source code as you sugested fixing. I'll fix the patch like that! # This is a very slack solution. You need to avoid using inode as a # temporal space of the parameter. Originally, do_sd_create() had argument struct BDRVSheepdogState which included inode, and parse_redundancy() substitute copy_policy and copies substituted for inode. So, I created parse_block_size_shift() like parse_redundancy() to handle block_size_shift like copy_policy and copies options. Which point is temporal? Thanks, Teruaki Ishizaki
[Qemu-devel] [PATCH] qtest: Fix deadloop by running main loop AIO context's timers
qemu_clock_run_timers() only takes care of main_loop_tlg, we shouldn't forget aio timer list groups. Currently, the qemu_clock_deadline_ns_all (a few lines above) counts all the timergroups of this clock type, including aio tlg, but we don't fire them, so they are never cleared, which makes a dead loop. For example, this function hangs when trying to drive throttled block request queue with qtest clock_step. Signed-off-by: Fam Zheng f...@redhat.com --- cpus.c | 1 + 1 file changed, 1 insertion(+) diff --git a/cpus.c b/cpus.c index 3a5323b..dd7e595 100644 --- a/cpus.c +++ b/cpus.c @@ -387,6 +387,7 @@ void qtest_clock_warp(int64_t dest) seqlock_write_unlock(timers_state.vm_clock_seqlock); qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL); +timerlistgroup_run_timers(qemu_get_aio_context()-tlg); clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); } qemu_clock_notify(QEMU_CLOCK_VIRTUAL); -- 1.9.3
Re: [Qemu-devel] [RFC PATCH 2/4] pcie-aer: Fix command pcie_aer_inject_error is invalid
On 01/12/2015 09:56 PM, Marcel Apfelbaum wrote: On 01/12/2015 05:04 AM, Chen Fan wrote: in spec PCI Express 3.0 section 6.2.6 Figure 6-3 virtual bridge part, the flowchart showing tell us SERR# enable at Bridge Control register associate with system error at Secondary Status register can send error message. but bridge_control from dev-config is NULL, and SERR# was set in dev-wmask in pcie_aer_init() wmask denotes the register bits that can be written by the guest. If you are referring to: pci_word_test_and_set_mask(dev-wmask + PCI_BRIDGE_CONTROL, PCI_BRIDGE_CTL_SERR); that means that the OS *is able* to turn on/off SERR forwarding. Hi marcel, I saw the OS that turn on SERR# is to via evaluate _HPP (Hot Plug Parameters) method in ACPI. is it the only way to turn on SERR#? Thanks, Chen which was implemented by root port and swith devices, so we should add wmask (for w/r) bit set for bridge control. we can user command like: qemu_system_x86_64: -device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,id=bridge1 -device x3130-upstream,bus=bridge1,id=up.1,addr=00.0 -device xio3130-downstream,bus=up.1,id=down.1,port=1,addr=00.0,chassis=5 (qemu) pcie_aer_inject_error net0 POISON_TLP after that, guest can output the error message. Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com --- hw/pci/pcie_aer.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c index 7ca077a..571dc92 100644 --- a/hw/pci/pcie_aer.c +++ b/hw/pci/pcie_aer.c @@ -231,7 +231,8 @@ pcie_aer_msg_alldev(PCIDevice *dev, const PCIEAERMsg *msg) */ static bool pcie_aer_msg_vbridge(PCIDevice *dev, const PCIEAERMsg *msg) { -uint16_t bridge_control = pci_get_word(dev-config + PCI_BRIDGE_CONTROL); Here we check if the Guest OS/firmware actually turned the #SERR forwarding on. +uint16_t bridge_control = pci_get_word(dev-config + PCI_BRIDGE_CONTROL) | + pci_get_word(dev-wmask + PCI_BRIDGE_CONTROL); I don't think that this check is correct given the above comments. Please correct me if I mislead you, Thanks, Marcel if (pcie_aer_msg_is_uncor(msg)) { /* Received System Error */ .
Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks
From: Paolo Bonzini [mailto:pbonz...@redhat.com] On 13/01/2015 10:21, Pavel Dovgaluk wrote: +/*! Reads next clock event from the input. */ +int64_t replay_read_clock(unsigned int kind) +{ +if (kind = REPLAY_CLOCK_COUNT) { +fprintf(stderr, invalid clock ID %d for replay\n, kind); +exit(1); +} + +replay_exec_instructions(); + +if (replay_file) { +if (skip_async_events(EVENT_CLOCK + kind)) { +replay_read_next_clock(kind); +} +int64_t ret = replay_state.cached_clock[kind]; + +return ret; +} + +fprintf(stderr, REPLAY INTERNAL ERROR %d\n, __LINE__); +exit(1); +} Is this thread safe? It is, because order of main_loop and cpu_exec executions is protected by global mutex. Please document exactly which globals are protected by the rr QemuMutex, and which by the global mutex. But I think as many variables as possible should be protected by the rr QemuMutex, for two reasons: 1) people are working on threaded TCG. While SMP record/replay is a whole different story, even UP TCG will be multithreaded. 2) in the end it makes reviewing the code simpler. There is one problem with protecting log file inside the replay code. We probably should have the following sequence of calls: lock_replay_mutex replay_save_events replay_run_event unlock_replay_mutex But replay_run_event can also generate some log events and we will have deadlock here. That is why we rely on global mutex. Pavel Dovgalyuk
Re: [Qemu-devel] global_mutex and multithread.
On 2015-01-16 08:25, Mark Burton wrote: On 15 Jan 2015, at 22:41, Paolo Bonzini pbonz...@redhat.com wrote: On 15/01/2015 21:53, Mark Burton wrote: Jan said he had it working at least on ARM (MusicPal). yeah - our problem is when we enable multi-threads - which I dont believe Jan did… Multithreaded TCG, or single-threaded TCG with SMP? He mentions SMP, - I assume thats single-threaded …. Yes, I didn't patched anything towards multi-threaded SMP. Main reason: there was no answer on how to emulated the memory models of that target architecture over the host one which is mandatory if you let the emulated CPUs run unsynchronized in parallel. Did this change? One thing I wonder - why do we need to go to the extent of mutexing in the TCG like this? Why can’t you simply put a mutex get/release on the slow path? If the core is going to do ‘fast path’ access to the memory, - even if that memory was IO mapped - would it matter if it didn’t have the mutex? Because there is no guarantee that the memory map isn't changed by a core under the feet of another. The TLB (in particular the iotlb) is only valid with reference to a particular memory map. Changes to the memory map certainly happen in the slow path, but lookups are part of the fast path. Even an rwlocks is too slow for a fast path, hence the plan of going with RCU. Could we arrange the world such that lookups ‘succeed’ (the wheels dont fall off) -ether getting the old value, or the new, but not getting rubbish - and we still only take the mutex if we are going to make alterations to the MM itself? (I have’t looked at the code around that… so sorry if the question is ridiculous). That's the definition of RCU. :) Look at the docs in http://permalink.gmane.org/gmane.comp.emulators.qemu/313929 for more information. :) Ahh - I see ! It's still not trivial to make it 100% correct, but at the same time it's not too hard to prepare something decent to play with. Also, most of the work can be done with KVM so it's more or less independent from what you guys have been doing so far. Yes - the issue is if we end up relying on it. But - I see what you mean - these 2 things can ‘dovetail’ together “independently” - so - Jan’s patch will be good for now, and then later we can use RCU to make it work more generally (and more efficiently). So - our only small problem is getting Jan’s patch to work for multi-thread :-)) See above regarding the potential dimension. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SES-DE Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH v3 3/9] rocker: add register programming guide
On Mon, Jan 12, 2015 at 3:40 AM, Paolo Bonzini pbonz...@redhat.com wrote: On 11/01/2015 04:57, sfel...@gmail.com wrote: +PCI Configuration Space +--- + +Each switch instance registers as a PCI device with PCI configuration space: + + offset width description value + - + 0x0 2 Vendor ID 0x1b36 + 0x2 2 Device ID 0x0006 + 0x4 4 Command/Status + 0x8 1 Revision ID 0x01 + 0x9 3 Class code 0x2800 + 0xC 1 Cache line size + 0xD 1 Latency timer + 0xE 1 Header type + 0xF 1 Built-in self test + 0x104 Base address low + 0x144 Base address high + 0x18-28 Reserved + 0x2C2 Subsystem vendor ID 0x + 0x2E2 Subsystem ID0x This should not be guaranteed to 0, should it? Your're right. Added a note that subsystem implementation will fill this in. + 0x30-38 Reserved + 0x3C1 Interrupt line + 0x3D1 Interrupt pin 0x00 + 0x3E1 Min grant 0x00 + 0x3D1 Max latency 0x00 + 0x401 TRDY timeout + 0x411 Retry count + 0x422 Reserved + + +SECTION 3: Memory-Mapped Register Space +=== + +There are two memory-mapped BARs. BAR0 maps device register space and is +0x2000 in size. BAR1 maps MSI-X vector and PBA tables and is also 0x2000 in +size, allowing for 256 MSI-X vectors. The host BIOS will assign the base +address location. The host driver/OS will map the base address to host memory, +giving the driver mmio access to the device register space. No need for the bits after The host BIOS... since that's just normal PCI. Gone. +All registers are 4 or 8 bytes long. It is assumed host software will access 4 +byte registers with one 4-byte access, and 8 byte registers with either two +4-byte accesses or a single 8-byte access. In the case of two 4-byte accesses, +access must be lower and then upper 4-bytes, in that order. Double 4-byte accesses are not implemented, are they? They are now :) Tested on i386. I'll include changes with v4. +Interrupt credits +^ + +MSI-X vectors used for descriptor ring completions use a credit mechanism for +efficient device, PCIe bus, OS and driver operations. Each descriptor ring has +a credit count which represent the number of outstanding descriptors to be +processed by the driver. As the device marks descriptors complete, the credit +count is incremented. As the driver processes those outstanding descriptors, +it returns credits back to the device. This way, the device knows the driver's +progress and can make decisions about when to fire the next interrupt or not. +When the credit count is zero, and the first descriptors are posted for the +driver, a single interrupt is fired. Once the interrupt is fired, the +interrupt is disabled (auto-masked). In response to the interrupt, the driver +will process descriptors and PIO write a returned credit value for that +descriptor ring. If the driver returns all credits (the driver caught up with +the device and there is no outstanding work), then the interrupt is unmasked, +but not fired. If only partial credits are returned, the interrupt remains +masked but the device generates an interrupt, signaling the driver that more +outstanding work is available. Perhaps mention that this masking is unrelated to the MSI-X interrupt mask register? Done. +SECTION 5: Test Registers += + +Rocker switch has several test registers to support troubleshooting register s/Rocker switch/Rocker/ Done. +access, interrupt generation, and DMA operations: + + TEST_REG, offset 0x0010, 32-bit (R/W) + TEST_REG64, offset 0x0018, 64-bit (R/W) + TEST_IRQ, offset 0x0020, 32-bit (R/W) + TEST_DMA_ADDR, offset 0x0028, 64-bit (R/W) + TEST_DMA_SIZE, offset 0x0030, 32-bit (R/W) + TEST_DMA_CTRL, offset 0x0034, 32-bit (R/W) + +Reads to TEST_REG and TEST_REG64 will read a value 2x the last value written to s/2x/equal to twice/ Done. +the register. The 32-bit and 64-bit versions are for testing 32-bit and 64-bit +host accesses. Right now, as mentioned above, 64-bit registers must be accessed with a single 32-bit host access. Fixed in implementation. In the case of 32-bit host accesses, should TEST_REG64's value be latched until the upper half is written? If so, please mention it and describe that this behavior is shared with the other 64-bit Rocker registers. +Bits written to TEST_IRQ will cause same (unmasked) bits to be written to +IRQ_STAT and an interrupt
Re: [Qemu-devel] [PATCH v7] block/raw-posix.c: Fixes raw_getlength() on Mac OS X so that it reports the correct length of a real CD
Programmingkid programmingk...@gmail.com writes: This patch allows Mac OS X to use a real CDROM disc in QEMU. Testing this patch will require using QEMU v2.2.0 because the current git version has a bug in it that prevents /dev/cdrom from being used. make check did pass and my Debian boot disc did work. Signed-off-by: John Arbuckle programmingk...@gmail.com Subject line is too long, and the body lacks line breaks. Suggest something like: block/raw-posix.c: Fix raw_getlength() on Mac OS X for CD This patch allows Mac OS X to use a real CDROM disc in QEMU. Testing this patch will require using QEMU v2.2.0 because the current git version has a bug in it that prevents /dev/cdrom from being used. make check did pass and my Debian boot disc did work.
Re: [Qemu-devel] global_mutex and multithread.
On 16/01/2015 09:07, Jan Kiszka wrote: On 2015-01-16 08:25, Mark Burton wrote: On 15 Jan 2015, at 22:41, Paolo Bonzini pbonz...@redhat.com wrote: On 15/01/2015 21:53, Mark Burton wrote: Jan said he had it working at least on ARM (MusicPal). yeah - our problem is when we enable multi-threads - which I dont believe Jan did… Multithreaded TCG, or single-threaded TCG with SMP? He mentions SMP, - I assume thats single-threaded …. Yes, I didn't patched anything towards multi-threaded SMP. Main reason: there was no answer on how to emulated the memory models of that target architecture over the host one which is mandatory if you let the emulated CPUs run unsynchronized in parallel. Did this change? Hi Jan, Actually it's what we are trying to do, running emulated CPUs in parallel. I get a double mutex_lock error (eg: no unlock) I must have missed something during the rebase. I'll check. Thanks, Fred One thing I wonder - why do we need to go to the extent of mutexing in the TCG like this? Why can’t you simply put a mutex get/release on the slow path? If the core is going to do ‘fast path’ access to the memory, - even if that memory was IO mapped - would it matter if it didn’t have the mutex? Because there is no guarantee that the memory map isn't changed by a core under the feet of another. The TLB (in particular the iotlb) is only valid with reference to a particular memory map. Changes to the memory map certainly happen in the slow path, but lookups are part of the fast path. Even an rwlocks is too slow for a fast path, hence the plan of going with RCU. Could we arrange the world such that lookups ‘succeed’ (the wheels dont fall off) -ether getting the old value, or the new, but not getting rubbish - and we still only take the mutex if we are going to make alterations to the MM itself? (I have’t looked at the code around that… so sorry if the question is ridiculous). That's the definition of RCU. :) Look at the docs in http://permalink.gmane.org/gmane.comp.emulators.qemu/313929 for more information. :) Ahh - I see ! It's still not trivial to make it 100% correct, but at the same time it's not too hard to prepare something decent to play with. Also, most of the work can be done with KVM so it's more or less independent from what you guys have been doing so far. Yes - the issue is if we end up relying on it. But - I see what you mean - these 2 things can ‘dovetail’ together “independently” - so - Jan’s patch will be good for now, and then later we can use RCU to make it work more generally (and more efficiently). So - our only small problem is getting Jan’s patch to work for multi-thread :-)) See above regarding the potential dimension. Jan
[Qemu-devel] [PATCH v5 1/5] qemu-io: Account IO by aio_read and aio_write
This will enable accounting of aio requests issued from qemu-io aio read/write commands. Signed-off-by: Fam Zheng f...@redhat.com --- qemu-io-cmds.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c index e708552..29377cd 100644 --- a/qemu-io-cmds.c +++ b/qemu-io-cmds.c @@ -13,6 +13,7 @@ #include block/qapi.h #include qemu/main-loop.h #include qemu/timer.h +#include sysemu/block-backend.h #define CMD_NOFILE_OK 0x01 @@ -1337,6 +1338,7 @@ out: } struct aio_ctx { +BlockDriverState *bs; QEMUIOVector qiov; int64_t offset; char *buf; @@ -1344,6 +1346,7 @@ struct aio_ctx { int vflag; int Cflag; int Pflag; +BlockAcctCookie acct; int pattern; struct timeval t1; }; @@ -1361,6 +1364,8 @@ static void aio_write_done(void *opaque, int ret) goto out; } +block_acct_done(ctx-bs-stats, ctx-acct); + if (ctx-qflag) { goto out; } @@ -1398,6 +1403,8 @@ static void aio_read_done(void *opaque, int ret) g_free(cmp_buf); } +block_acct_done(ctx-bs-stats, ctx-acct); + if (ctx-qflag) { goto out; } @@ -1453,6 +1460,7 @@ static int aio_read_f(BlockDriverState *bs, int argc, char **argv) int nr_iov, c; struct aio_ctx *ctx = g_new0(struct aio_ctx, 1); +ctx-bs = bs; while ((c = getopt(argc, argv, CP:qv)) != EOF) { switch (c) { case 'C': @@ -1506,6 +1514,7 @@ static int aio_read_f(BlockDriverState *bs, int argc, char **argv) } gettimeofday(ctx-t1, NULL); +block_acct_start(bs-stats, ctx-acct, ctx-qiov.size, BLOCK_ACCT_READ); bdrv_aio_readv(bs, ctx-offset 9, ctx-qiov, ctx-qiov.size 9, aio_read_done, ctx); return 0; @@ -1549,6 +1558,7 @@ static int aio_write_f(BlockDriverState *bs, int argc, char **argv) int pattern = 0xcd; struct aio_ctx *ctx = g_new0(struct aio_ctx, 1); +ctx-bs = bs; while ((c = getopt(argc, argv, CqP:)) != EOF) { switch (c) { case 'C': @@ -1598,6 +1608,7 @@ static int aio_write_f(BlockDriverState *bs, int argc, char **argv) } gettimeofday(ctx-t1, NULL); +block_acct_start(bs-stats, ctx-acct, ctx-qiov.size, BLOCK_ACCT_WRITE); bdrv_aio_writev(bs, ctx-offset 9, ctx-qiov, ctx-qiov.size 9, aio_write_done, ctx); return 0; -- 1.9.3
[Qemu-devel] [PATCH v5 0/5] block: Add a qemu-iotests case for IO throttling
v5: Rebase and improve the test. Please review again. Patch dependencies: This test depends on the qtest timer fix to run correctly. http://lists.gnu.org/archive/html/qemu-devel/2015-01/msg01865.html Also depends on the os check fix to run at all: http://lists.gnu.org/archive/html/qemu-devel/2015-01/msg01848.html Original cover letter - There is a change in qemu-io sub-commands aio_read and aio_write, which makes the aio requests accounted and the statistics reflected in blockstats. Note that IO throttling implementation allows overcommiting of requests, so the actual IO happened in a time unit may be a bit larger than given limits. In the test case, the stats numbers are compared with a 10% error tolerance, to make room for such flexibility in order to improve determinism. Fam Fam Zheng (5): qemu-io: Account IO by aio_read and aio_write qtest: Add scripts/qtest.py qemu-iotests: Add VM method qtest() to iotests.py qemu-iotests: Allow caller to disable underscore convertion for qmp qemu-iotests: Add 093 for IO throttling qemu-io-cmds.c| 11 + scripts/qtest.py | 71 + tests/qemu-iotests/093| 103 ++ tests/qemu-iotests/093.out| 5 ++ tests/qemu-iotests/group | 1 + tests/qemu-iotests/iotests.py | 23 -- 6 files changed, 210 insertions(+), 4 deletions(-) create mode 100644 scripts/qtest.py create mode 100755 tests/qemu-iotests/093 create mode 100644 tests/qemu-iotests/093.out -- 1.9.3
[Qemu-devel] [PATCH v5 2/5] qtest: Add scripts/qtest.py
This adds scripts/qtest.py as a python library for qtest protocol. This is a skeleton with a basic cmd method to execute a command, reading and parsing of qtest output could be added later on demand. Signed-off-by: Fam Zheng f...@redhat.com --- scripts/qtest.py | 71 1 file changed, 71 insertions(+) create mode 100644 scripts/qtest.py diff --git a/scripts/qtest.py b/scripts/qtest.py new file mode 100644 index 000..25391f5 --- /dev/null +++ b/scripts/qtest.py @@ -0,0 +1,71 @@ +# QEMU qtest library +# +# Copyright (C) 2015 Red Hat Inc. +# +# Authors: +# Fam Zheng f...@redhat.com +# +# This work is licensed under the terms of the GNU GPL, version 2. See +# the COPYING file in the top-level directory. +# +# Based on qmp.py. +# + +import errno +import socket + +class QEMUQtestProtocol(object): +def __init__(self, address, server=False): + +Create a QEMUQtestProtocol object. + +@param address: QEMU address, can be either a unix socket path (string) +or a tuple in the form ( address, port ) for a TCP +connection +@param server: server mode listens on the socket (bool) +@raise socket.error on socket connection errors +@note No connection is established, this is done by the connect() or + accept() methods + +self._address = address +self._sock = self._get_sock() +if server: +self._sock.bind(self._address) +self._sock.listen(1) + +def _get_sock(self): +if isinstance(self._address, tuple): +family = socket.AF_INET +else: +family = socket.AF_UNIX +return socket.socket(family, socket.SOCK_STREAM) + +def connect(self): + +Connect to the qtest socket. + +@raise socket.error on socket connection errors + +self._sock.connect(self._address) + +def accept(self): + +Await connection from QEMU. + +@raise socket.error on socket connection errors + +self._sock, _ = self._sock.accept() + +def cmd(self, qtest_cmd): + +Send a qtest command on the wire. + +@param qtest_cmd: qtest command text to be sent + +self._sock.sendall(qtest_cmd + \n) + +def close(self): +self._sock.close() + +def settimeout(self, timeout): +self._sock.settimeout(timeout) -- 1.9.3
[Qemu-devel] [PATCH v5 4/5] qemu-iotests: Allow caller to disable underscore convertion for qmp
QMP command block_set_io_throttle expects underscores in parameters instead of dashes: {iops,bps}_{rd,wr,max}. Add optional argument conv_keys (defaults to True, backward compatible), it will be used in IO throttling test case. Reviewed-by: Benoit Canet ben...@irqsave.net Signed-off-by: Fam Zheng f...@redhat.com --- tests/qemu-iotests/iotests.py | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index 85cb9a5..1402854 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -185,11 +185,14 @@ class VM(object): self._popen = None underscore_to_dash = string.maketrans('_', '-') -def qmp(self, cmd, **args): +def qmp(self, cmd, conv_keys=True, **args): '''Invoke a QMP command and return the result dict''' qmp_args = dict() for k in args.keys(): -qmp_args[k.translate(self.underscore_to_dash)] = args[k] +if conv_keys: +qmp_args[k.translate(self.underscore_to_dash)] = args[k] +else: +qmp_args[k] = args[k] return self._qmp.cmd(cmd, args=qmp_args) -- 1.9.3
[Qemu-devel] [PATCH v5 5/5] qemu-iotests: Add 093 for IO throttling
This case utilizes qemu-io command aio_{read,write} -q to verify the effectiveness of IO throttling options. It's implemented by driving the vm timer from qtest protocol, so the throttling timers are signaled with determinied time duration. Then we verify the completed IO requests are within 10% error of bps and iops limits. null protocol is used as the disk backend so that no actual disk IO is performed on host, this will make the blockstats much more deterministic. Both null-aio and null-co are covered, which is also a simple cross validation test for the driver code. Signed-off-by: Fam Zheng f...@redhat.com --- tests/qemu-iotests/093 | 103 + tests/qemu-iotests/093.out | 5 +++ tests/qemu-iotests/group | 1 + 3 files changed, 109 insertions(+) create mode 100755 tests/qemu-iotests/093 create mode 100644 tests/qemu-iotests/093.out diff --git a/tests/qemu-iotests/093 b/tests/qemu-iotests/093 new file mode 100755 index 000..d12cc25 --- /dev/null +++ b/tests/qemu-iotests/093 @@ -0,0 +1,103 @@ +#!/usr/bin/env python +# +# Tests for IO throttling +# +# Copyright (C) 2015 Red Hat, Inc. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. +# + +import iotests + +class ThrottleTestCase(iotests.QMPTestCase): +test_img = null-aio:// + +def blockstats(self, device): +result = self.vm.qmp(query-blockstats) +for r in result['return']: +if r['device'] == device: +stat = r['stats'] +return stat['rd_bytes'], stat['rd_operations'], stat['wr_bytes'], stat['wr_operations'] +raise Exception(Device not found for blockstats: %s % device) + +def setUp(self): +self.vm = iotests.VM().add_drive(self.test_img) +self.vm.launch() + +def tearDown(self): +self.vm.shutdown() + +def do_test_throttle(self, seconds, params): +def check_limit(limit, num): +# IO throttling algorithm is discrete, allow 10% error so the test +# is more +return limit == 0 or \ + (num seconds * limit * 1.1 + and num seconds * limit * 0.9) + +nsec_per_sec = 10 + +params['device'] = 'drive0' +params['bps_max'] = 1 +params['iops_max'] = 1 + +result = self.vm.qmp(block_set_io_throttle, conv_keys=False, **params) +self.assert_qmp(result, 'return', {}) + +# Set vm clock to a known value +ns = seconds * nsec_per_sec +self.vm.qtest(clock_step %d % ns) + +# Append many requests into the throttle queue. They will drain bps_max +# and iops_max, but the rest requests won't get executed until we +# advance the virtual clock with qtest interface +for i in range(1000): +self.vm.hmp_qemu_io(drive0, aio_read %d 4096 % (i * 512)) +self.vm.hmp_qemu_io(drive0, aio_write %d 4096 % (i * 512)) + +start_rd_bytes, start_rd_iops, start_wr_bytes, start_wr_iops = self.blockstats('drive0') + +self.vm.qtest(clock_step %d % ns) +end_rd_bytes, end_rd_iops, end_wr_bytes, end_wr_iops = self.blockstats('drive0') + +rd_bytes = end_rd_bytes - start_rd_bytes +rd_iops = end_rd_iops - start_rd_iops +wr_bytes = end_wr_bytes - start_wr_bytes +wr_iops = end_wr_iops - start_wr_iops + +self.assertTrue(check_limit(params['bps'], rd_bytes + wr_bytes)) +self.assertTrue(check_limit(params['bps_rd'], rd_bytes)) +self.assertTrue(check_limit(params['bps_wr'], wr_bytes)) +self.assertTrue(check_limit(params['iops'], rd_iops + wr_iops)) +self.assertTrue(check_limit(params['iops_rd'], rd_iops)) +self.assertTrue(check_limit(params['iops_wr'], wr_iops)) + +def test_all(self): +params = {bps: 128000, + bps_rd: 128000, + bps_wr: 128000, + iops: 64, + iops_rd: 64, + iops_wr: 64 + } +for tk in params: +limits = dict([(k, 0) for k in params]) +limits[tk] = params[tk] +self.do_test_throttle(10, limits) + +class ThrottleTestCoroutine(ThrottleTestCase): +test_img = null-co:// + +if __name__ == '__main__': +iotests.main(supported_fmts=[raw]) diff --git a/tests/qemu-iotests/093.out
[Qemu-devel] [PATCH v5 3/5] qemu-iotests: Add VM method qtest() to iotests.py
This will allow test case to run command in qtest protocol. It's write-only for now. Signed-off-by: Fam Zheng f...@redhat.com --- tests/qemu-iotests/iotests.py | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index 241b5ee..85cb9a5 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -21,8 +21,11 @@ import re import subprocess import string import unittest -import sys; sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'scripts', 'qmp')) +import sys +sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'scripts')) +sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'scripts', 'qmp')) import qmp +import qtest import struct __all__ = ['imgfmt', 'imgproto', 'test_dir' 'qemu_img', 'qemu_io', @@ -81,10 +84,12 @@ class VM(object): def __init__(self): self._monitor_path = os.path.join(test_dir, 'qemu-mon.%d' % os.getpid()) self._qemu_log_path = os.path.join(test_dir, 'qemu-log.%d' % os.getpid()) +self._qtest_path = os.path.join(test_dir, 'qemu-qtest.%d' % os.getpid()) self._args = qemu_args + ['-chardev', 'socket,id=mon,path=' + self._monitor_path, '-mon', 'chardev=mon,mode=control', - '-qtest', 'stdio', '-machine', 'accel=qtest', + '-qtest', 'unix:path=' + self._qtest_path, + '-machine', 'accel=qtest', '-display', 'none', '-vga', 'none'] self._num_drives = 0 @@ -160,9 +165,11 @@ class VM(object): qemulog = open(self._qemu_log_path, 'wb') try: self._qmp = qmp.QEMUMonitorProtocol(self._monitor_path, server=True) +self._qtest = qtest.QEMUQtestProtocol(self._qtest_path, server=True) self._popen = subprocess.Popen(self._args, stdin=devnull, stdout=qemulog, stderr=subprocess.STDOUT) self._qmp.accept() +self._qtest.accept() except: os.remove(self._monitor_path) raise @@ -173,6 +180,7 @@ class VM(object): self._qmp.cmd('quit') self._popen.wait() os.remove(self._monitor_path) +os.remove(self._qtest_path) os.remove(self._qemu_log_path) self._popen = None @@ -185,6 +193,10 @@ class VM(object): return self._qmp.cmd(cmd, args=qmp_args) +def qtest(self, cmd): +'''Send a qtest command to guest''' +return self._qtest.cmd(cmd) + def get_qmp_event(self, wait=False): '''Poll for one queued QMP events and return it''' return self._qmp.pull_event(wait=wait) -- 1.9.3
[Qemu-devel] [PATCH v4 02/10] virtio-net: use qemu_mac_strdup_printf
From: Scott Feldman sfel...@gmail.com Signed-off-by: Scott Feldman sfel...@gmail.com --- hw/net/virtio-net.c | 12 +++- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 45da34a..698156f 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -226,12 +226,6 @@ static void rxfilter_notify(NetClientState *nc) } } -static char *mac_strdup_printf(const uint8_t *mac) -{ -return g_strdup_printf(%.2x:%.2x:%.2x:%.2x:%.2x:%.2x, mac[0], -mac[1], mac[2], mac[3], mac[4], mac[5]); -} - static intList *get_vlan_table(VirtIONet *n) { intList *list, *entry; @@ -284,12 +278,12 @@ static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc) info-multicast_overflow = n-mac_table.multi_overflow; info-unicast_overflow = n-mac_table.uni_overflow; -info-main_mac = mac_strdup_printf(n-mac); +info-main_mac = qemu_mac_strdup_printf(n-mac); str_list = NULL; for (i = 0; i n-mac_table.first_multi; i++) { entry = g_malloc0(sizeof(*entry)); -entry-value = mac_strdup_printf(n-mac_table.macs + i * ETH_ALEN); +entry-value = qemu_mac_strdup_printf(n-mac_table.macs + i * ETH_ALEN); entry-next = str_list; str_list = entry; } @@ -298,7 +292,7 @@ static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc) str_list = NULL; for (i = n-mac_table.first_multi; i n-mac_table.in_use; i++) { entry = g_malloc0(sizeof(*entry)); -entry-value = mac_strdup_printf(n-mac_table.macs + i * ETH_ALEN); +entry-value = qemu_mac_strdup_printf(n-mac_table.macs + i * ETH_ALEN); entry-next = str_list; str_list = entry; } -- 1.7.10.4
[Qemu-devel] [PATCH v4 00/10] rocker: add new rocker ethernet switch device
From: Scott Feldman sfel...@gmail.com v4: - Per Paolo Bonzini review: - move reg_guide.txt to docs/specs/rocker.txt - fix some spelling/grammer mistakes in the rocker.txt doc - fix some misleading/wrong statements in rocker.txt - add double 4-byte access for 64-bit registers - define new ROCKER_Exxx to replace usage of errno.h return codes - Add patch from David Ahern to timestamp debug output v3: - Per Stefan Hajnoczi review: - move HMP rocker cmds to info rocker - prefix QMP rocker cmds with query- - tag QMP cmds as Since 2.3 - convert structs to typedef with CamelCase naming - Remove SDHCI device ID move patch...Paolo Bonzini is addressing SDHCI move is separate patch. v2: - Address reg_guide.txt review comments from Eric Blake - Address QMP review comments from Eric Blake v1: [This is a collaboration between myself and Jiri Pirko]. This patch set adds a new ethernet switch device, called rocker. Rocker is intended to emulate HW features of switch ASICs found in today's data-center-class switch/routers. The original motivation in creating a new device is to accelerate device driver development for ethernet switches in the Linux kernel. A device driver for rocker already exists in the Linux 3.18 kernel and loads against this device. Basic L2 switching (bridging) functionality is offloaded to the device. Work continues to enable offloading of L3 routing functions and ACLs, as well as support for a flow-based modes, such as OpenVSwitch with OpenFlow. Future support for terminating L2-over-L3 tunnels is also planned. The core network processing functions are based on the spec of a real device: Broadcom's OF-DPA. Specifically, rocker borrows OF-DPA's network processing pipeline comprised of flow match and action tables. Only the OF-DPA spec was used in constructing rocker. The rocker developers do not have access to the real OF-DPA's software source code, so this is a clean-room, ground-up development. Each rocker device is a PCI device with a memory-mapped register space and MSI-X interrupts for command and event processing, as well as CPU-bound I/O. Each device can support up to 62 front-panel ports, which present themselves as -netdev attachment points. The device is programmed using OF-DPA flow and group tables to setup the flow pipeline. The programming defines the forwarding path for packets ingressing on 'front-panel' ports. The forwarding path can look at L2/L3/L4 packet header to forward the packet to its destination. For the performance path, packets would ingress and egress only on the device, and not be passed up to the device driver (or host OS). The slow path for control packets will forward packets to the CPU via the device driver for host OS processing. A QMP/HMP interface is added to give inside into the device's internal port configuration and flow/group tables. A test directory is included with some basic sanity tests to verify the device and driver. David Ahern (1): rocker: timestamp on the debug logs helps correlate with events in the VM Scott Feldman (9): net: add MAC address string printer virtio-net: use qemu_mac_strdup_printf rocker: add register programming guide pci: add rocker device ID pci: add network device class 'other' for network switches rocker: add new rocker switch device qmp: add rocker device support rocker: add tests MAINTAINERS: add rocker MAINTAINERS|6 + default-configs/pci.mak|1 + docs/specs/pci-ids.txt |1 + docs/specs/rocker.txt | 960 + hmp-commands.hx| 24 + hmp.c | 303 + hmp.h |4 + hw/net/Makefile.objs |3 + hw/net/rocker/rocker.c | 1511 + hw/net/rocker/rocker.h | 85 ++ hw/net/rocker/rocker_desc.c| 379 ++ hw/net/rocker/rocker_desc.h| 57 + hw/net/rocker/rocker_fp.c | 242 hw/net/rocker/rocker_fp.h | 54 + hw/net/rocker/rocker_hw.h | 491 +++ hw/net/rocker/rocker_of_dpa.c | 2622 hw/net/rocker/rocker_of_dpa.h | 25 + hw/net/rocker/rocker_tlv.h | 244 hw/net/rocker/rocker_world.c | 108 ++ hw/net/rocker/rocker_world.h | 63 + hw/net/rocker/test/README |5 + hw/net/rocker/test/all | 19 + hw/net/rocker/test/bridge | 43 + hw/net/rocker/test/bridge-stp | 52 + hw/net/rocker/test/bridge-vlan | 52 + hw/net/rocker/test/bridge-vlan-stp | 64 + hw/net/rocker/test/port| 22 + hw/net/rocker/test/tut.dot |8 + hw/net/virtio-net.c| 12 +- include/hw/pci/pci.h |1 + include/hw/pci/pci_ids.h |1 + include/net/net.h |1 + monitor.c
[Qemu-devel] [PATCH v4 08/10] rocker: add tests
From: Scott Feldman sfel...@gmail.com Add some basic test for rocker to test L2/L3/L4 functionality. Requires an external test environment, simp, located here: https://github.com/scottfeldman/simp To run tests, simp environment must be installed and a suitable VM image built and installed with a Linux 3.18 (or greater) kernel with rocker driver support enabled. Signed-off-by: Scott Feldman sfel...@gmail.com --- hw/net/rocker/test/README |5 +++ hw/net/rocker/test/all | 19 +++ hw/net/rocker/test/bridge | 43 hw/net/rocker/test/bridge-stp | 52 + hw/net/rocker/test/bridge-vlan | 52 + hw/net/rocker/test/bridge-vlan-stp | 64 hw/net/rocker/test/port| 22 + hw/net/rocker/test/tut.dot |8 + 8 files changed, 265 insertions(+) create mode 100644 hw/net/rocker/test/README create mode 100755 hw/net/rocker/test/all create mode 100755 hw/net/rocker/test/bridge create mode 100755 hw/net/rocker/test/bridge-stp create mode 100755 hw/net/rocker/test/bridge-vlan create mode 100755 hw/net/rocker/test/bridge-vlan-stp create mode 100755 hw/net/rocker/test/port create mode 100644 hw/net/rocker/test/tut.dot diff --git a/hw/net/rocker/test/README b/hw/net/rocker/test/README new file mode 100644 index 000..531e673 --- /dev/null +++ b/hw/net/rocker/test/README @@ -0,0 +1,5 @@ +Tests require simp (simple network simulator) found here: + +https://github.com/scottfeldman/simp + +Run 'all' to run all tests. diff --git a/hw/net/rocker/test/all b/hw/net/rocker/test/all new file mode 100755 index 000..d5ae963 --- /dev/null +++ b/hw/net/rocker/test/all @@ -0,0 +1,19 @@ +echo -n Running port test... +./port +if [ $? -eq 0 ]; then echo pass; else echo FAILED; exit 1; fi + +echo -n Running bridge test... +./bridge +if [ $? -eq 0 ]; then echo pass; else echo FAILED; exit 1; fi + +echo -n Running bridge STP test... +./bridge-stp +if [ $? -eq 0 ]; then echo pass; else echo FAILED; exit 1; fi + +echo -n Running bridge VLAN test... +./bridge-vlan +if [ $? -eq 0 ]; then echo pass; else echo FAILED; exit 1; fi + +echo -n Running bridge VLAN STP test... +./bridge-vlan-stp +if [ $? -eq 0 ]; then echo pass; else echo FAILED; exit 1; fi diff --git a/hw/net/rocker/test/bridge b/hw/net/rocker/test/bridge new file mode 100755 index 000..f1f555a --- /dev/null +++ b/hw/net/rocker/test/bridge @@ -0,0 +1,43 @@ +simp destroy .* +simp create -o sw1:rocker:sw1 tut tut.dot +simp start tut +sleep 10 +while ! simp ssh tut sw1 --cmd ping -c 1 localhost /dev/null; do sleep 1; done +while ! simp ssh tut h1 --cmd ping -c 1 localhost /dev/null; do sleep 1; done +while ! simp ssh tut h2 --cmd ping -c 1 localhost /dev/null; do sleep 1; done + +# configure a 2-port bridge + +simp ssh tut sw1 --cmd sudo /sbin/ip link add name br0 type bridge +simp ssh tut sw1 --cmd sudo /sbin/ip link set dev swp1 master br0 +simp ssh tut sw1 --cmd sudo /sbin/ip link set dev swp2 master br0 + +# turn off vlan default_pvid on br0 + +simp ssh tut sw1 --cmd echo 0 | sudo dd of=/sys/class/net/br0/bridge/default_pvid 2 /dev/null + +# turn off learning and flooding in SW + +simp ssh tut sw1 --cmd sudo /sbin/bridge link set dev swp1 learning off +simp ssh tut sw1 --cmd sudo /sbin/bridge link set dev swp2 learning off + +simp ssh tut sw1 --cmd sudo /sbin/bridge link set dev swp1 flood off +simp ssh tut sw1 --cmd sudo /sbin/bridge link set dev swp2 flood off + +# bring up bridge and ports + +simp ssh tut sw1 --cmd sudo ifconfig br0 up +simp ssh tut sw1 --cmd sudo ifconfig swp1 up +simp ssh tut sw1 --cmd sudo ifconfig swp2 up +simp ssh tut sw1 --cmd sudo ifconfig br0 11.0.0.3/24 + +# config IP on hosts + +simp ssh tut h1 --cmd sudo ifconfig swp1 11.0.0.1/24 +simp ssh tut h2 --cmd sudo ifconfig swp1 11.0.0.2/24 + +# test... + +simp ssh tut h1 --cmd ping -c10 11.0.0.2 /dev/null +if [ $? -ne 0 ]; then exit 1; fi +simp ssh tut h1 --cmd ping -c10 11.0.0.3 /dev/null diff --git a/hw/net/rocker/test/bridge-stp b/hw/net/rocker/test/bridge-stp new file mode 100755 index 000..60f4483 --- /dev/null +++ b/hw/net/rocker/test/bridge-stp @@ -0,0 +1,52 @@ +simp destroy .* +simp create -o sw1:rocker:sw1 tut tut.dot +simp start tut +sleep 10 +while ! simp ssh tut sw1 --cmd ping -c 1 localhost /dev/null; do sleep 1; done +while ! simp ssh tut h1 --cmd ping -c 1 localhost /dev/null; do sleep 1; done +while ! simp ssh tut h2 --cmd ping -c 1 localhost /dev/null; do sleep 1; done + +# configure a 2-port bridge + +simp ssh tut sw1 --cmd sudo /sbin/ip link add name br0 type bridge +simp ssh tut sw1 --cmd sudo brctl stp br0 on +simp ssh tut sw1 --cmd sudo /sbin/ip link set dev swp1 master br0 +simp ssh tut sw1 --cmd sudo /sbin/ip link set dev swp2 master br0 + +# turn off vlan default_pvid on br0 + +simp ssh tut sw1 --cmd echo
[Qemu-devel] [PATCH v4 07/10] qmp: add rocker device support
From: Scott Feldman sfel...@gmail.com Add QMP/HMP support for rocker devices. This is mostly for debugging purposes to see inside the device's tables and port configurations. Some examples: (qemu) info rocker sw1 name: sw1 id: 0x013512005452 ports: 4 (qemu) info rocker-ports sw1 ena/speed/ auto port linkduplex neg? sw1.1 up 10G FD No sw1.2 up 10G FD No sw1.3 !ena 10G FD No sw1.4 !ena 10G FD No (qemu) info rocker-of-dpa-flows sw1 prio tbl hits key(mask) -- actions 260 lport 1 vlan 1 LLDP src 00:02:00:00:02:00 dst 01:80:c2:00:00:0e 260 lport 1 vlan 1 ARP src 00:02:00:00:02:00 dst 00:02:00:00:03:00 260 lport 2 vlan 2 IPv6 src 00:02:00:00:03:00 dst 33:33:ff:00:00:02 proto 58 350 vlan 2 dst 33:33:ff:00:00:02 -- write group 0x3201 goto tbl 60 260 lport 2 vlan 2 IPv6 src 00:02:00:00:03:00 dst 33:33:ff:00:03:00 proto 58 350 1vlan 2 dst 33:33:ff:00:03:00 -- write group 0x3201 goto tbl 60 260 lport 2 vlan 2 ARP src 00:02:00:00:03:00 dst 00:02:00:00:02:00 350 2vlan 2 dst 00:02:00:00:02:00 -- write group 0x0201 goto tbl 60 260 1lport 2 vlan 2 IP src 00:02:00:00:03:00 dst 00:02:00:00:02:00 proto 1 350 2vlan 1 dst 00:02:00:00:03:00 -- write group 0x0102 goto tbl 60 260 1lport 1 vlan 1 IP src 00:02:00:00:02:00 dst 00:02:00:00:03:00 proto 1 260 lport 1 vlan 1 IPv6 src 00:02:00:00:02:00 dst 33:33:ff:00:00:01 proto 58 350 vlan 1 dst 33:33:ff:00:00:01 -- write group 0x3100 goto tbl 60 260 lport 1 vlan 1 IPv6 src 00:02:00:00:02:00 dst 33:33:ff:00:02:00 proto 58 350 1vlan 1 dst 33:33:ff:00:02:00 -- write group 0x3100 goto tbl 60 160 173 lport 2 vlan 2 LLDP src any dst 01:80:c2:00:00:0e -- write group 0x0200 160 6lport 2 vlan 2 IPv6 src any dst any -- write group 0x0200 160 174 lport 1 vlan 1 LLDP src any dst 01:80:c2:00:00:0e -- write group 0x0100 160 174 lport 2 vlan 2 IP src any dst any -- write group 0x0200 160 6lport 1 vlan 1 IPv6 src any dst any -- write group 0x0100 160 181 lport 2 vlan 2 ARP src any dst any -- write group 0x0200 110 715 lport 2 -- apply new vlan 2 goto tbl 20 160 177 lport 1 vlan 1 ARP src any dst any -- write group 0x0100 160 174 lport 1 vlan 1 IP src any dst any -- write group 0x0100 110 717 lport 1 -- apply new vlan 1 goto tbl 20 10 1432 lport 0(0x) -- goto tbl 10 (qemu) info rocker-of-dpa-groups sw1 id (decode) -- buckets 0x3201 (type L2 multicast vlan 2 index 1) -- groups [0x0201,0x0200] 0x0201 (type L2 interface vlan 2 lport 1) -- pop vlan out lport 1 0x0102 (type L2 interface vlan 1 lport 2) -- pop vlan out lport 2 0x0200 (type L2 interface vlan 2 lport 0) -- pop vlan out lport 0 0x0100 (type L2 interface vlan 1 lport 0) -- pop vlan out lport 0 0x3100 (type L2 multicast vlan 1 index 0) -- groups [0x0102,0x0100] Signed-off-by: Scott Feldman sfel...@gmail.com Signed-off-by: Jiri Pirko j...@resnulli.us --- hmp-commands.hx | 24 hmp.c | 303 hmp.h |4 + hw/net/rocker/rocker.c| 45 ++ hw/net/rocker/rocker_fp.c | 10 ++ hw/net/rocker/rocker_fp.h |1 + hw/net/rocker/rocker_of_dpa.c | 308 + monitor.c | 28 qapi-schema.json |3 + qapi/rocker.json | 259 ++ qmp-commands.hx | 97 + 11 files changed, 1082 insertions(+) create mode 100644 qapi/rocker.json diff --git a/hmp-commands.hx b/hmp-commands.hx index e37bc8b..6873a3a 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1787,5 +1787,29 @@ show available trace events and their state ETEXI STEXI +@item rocker @var{name} +@findex rocker +Show Rocker(s) +ETEXI + +STEXI +@item rocker_ports @var{name} +@findex rocker_ports +Show Rocker ports +ETEXI + +STEXI +@item rocker_of_dpa_flows @var{name} [@var{tbl_id}] +@findex rocker_of_dpa_flows +Show Rocker OF-DPA flow tables +ETEXI + +STEXI +@item rocker_of_dpa_groups @var{name} [@var{type}] +@findex rocker_of_dpa_groups +Show Rocker OF-DPA groups +ETEXI + +STEXI @end table ETEXI diff --git a/hmp.c b/hmp.c index 481be80..949d43c 100644 --- a/hmp.c +++ b/hmp.c @@ -15,6 +15,7 @@ #include hmp.h #include net/net.h +#include net/eth.h #include sysemu/char.h #include qemu/option.h #include qemu/timer.h @@ -1813,3 +1814,305 @@ void hmp_info_memory_devices(Monitor *mon, const QDict *qdict) qapi_free_MemoryDeviceInfoList(info_list); } + +void hmp_rocker(Monitor *mon, const QDict *qdict) +{ +const char *name = qdict_get_str(qdict, name); +RockerSwitch *rocker; +Error *errp = NULL; +
[Qemu-devel] [PATCH v4 04/10] pci: add rocker device ID
From: Scott Feldman sfel...@gmail.com Signed-off-by: Scott Feldman sfel...@gmail.com Signed-off-by: Jiri Pirko j...@resnulli.us --- docs/specs/pci-ids.txt |1 + include/hw/pci/pci.h |1 + 2 files changed, 2 insertions(+) diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt index c6732fe..e4a4490 100644 --- a/docs/specs/pci-ids.txt +++ b/docs/specs/pci-ids.txt @@ -45,6 +45,7 @@ PCI devices (other than virtio): 1b36:0003 PCI Dual-port 16550A adapter (docs/specs/pci-serial.txt) 1b36:0004 PCI Quad-port 16550A adapter (docs/specs/pci-serial.txt) 1b36:0005 PCI test device (docs/specs/pci-testdev.txt) +1b36:0006 PCI Rocker Ethernet switch device 1b36:0007 PCI SD Card Host Controller Interface (SDHCI) All these devices are documented in docs/specs. diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index 97a83d3..2c58bf2 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -88,6 +88,7 @@ #define PCI_DEVICE_ID_REDHAT_SERIAL2 0x0003 #define PCI_DEVICE_ID_REDHAT_SERIAL4 0x0004 #define PCI_DEVICE_ID_REDHAT_TEST0x0005 +#define PCI_DEVICE_ID_REDHAT_ROCKER 0x0006 #define PCI_DEVICE_ID_REDHAT_SDHCI 0x0007 #define PCI_DEVICE_ID_REDHAT_QXL 0x0100 -- 1.7.10.4
[Qemu-devel] [PATCH v4 01/10] net: add MAC address string printer
From: Scott Feldman sfel...@gmail.com We can use this in virtio-net code as well as new Rocker driver code, so up-level this. Signed-off-by: Scott Feldman sfel...@gmail.com --- include/net/net.h |1 + net/net.c |7 +++ 2 files changed, 8 insertions(+) diff --git a/include/net/net.h b/include/net/net.h index 008d610..90742f0 100644 --- a/include/net/net.h +++ b/include/net/net.h @@ -97,6 +97,7 @@ typedef struct NICState { bool peer_deleted; } NICState; +char *qemu_mac_strdup_printf(const uint8_t *macaddr); NetClientState *qemu_find_netdev(const char *id); int qemu_find_net_clients_except(const char *id, NetClientState **ncs, NetClientOptionsKind type, int max); diff --git a/net/net.c b/net/net.c index 7acc162..2387616 100644 --- a/net/net.c +++ b/net/net.c @@ -151,6 +151,13 @@ int parse_host_port(struct sockaddr_in *saddr, const char *str) return 0; } +char *qemu_mac_strdup_printf(const uint8_t *macaddr) +{ +return g_strdup_printf(%.2x:%.2x:%.2x:%.2x:%.2x:%.2x, + macaddr[0], macaddr[1], macaddr[2], + macaddr[3], macaddr[4], macaddr[5]); +} + void qemu_format_nic_info_str(NetClientState *nc, uint8_t macaddr[6]) { snprintf(nc-info_str, sizeof(nc-info_str), -- 1.7.10.4
[Qemu-devel] [PATCH v4 05/10] pci: add network device class 'other' for network switches
From: Scott Feldman sfel...@gmail.com Rocker is an ethernet switch device, so add 'other' network device class as defined by PCI to cover these types of devices. Signed-off-by: Scott Feldman sfel...@gmail.com Signed-off-by: Jiri Pirko j...@resnulli.us --- include/hw/pci/pci_ids.h |1 + 1 file changed, 1 insertion(+) diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h index d7be386..c6de710 100644 --- a/include/hw/pci/pci_ids.h +++ b/include/hw/pci/pci_ids.h @@ -23,6 +23,7 @@ #define PCI_CLASS_STORAGE_OTHER 0x0180 #define PCI_CLASS_NETWORK_ETHERNET 0x0200 +#define PCI_CLASS_NETWORK_OTHER 0x0280 #define PCI_CLASS_DISPLAY_VGA0x0300 #define PCI_CLASS_DISPLAY_OTHER 0x0380 -- 1.7.10.4
[Qemu-devel] [PATCH v4 03/10] rocker: add register programming guide
From: Scott Feldman sfel...@gmail.com This is the register programming guide for the Rocker device. It's intended for driver writers and device writers. It covers the device's PCI space, the register set, DMA interface, and interrupts. Signed-off-by: Scott Feldman sfel...@gmail.com Signed-off-by: Jiri Pirko j...@resnulli.us --- docs/specs/rocker.txt | 960 + 1 file changed, 960 insertions(+) create mode 100644 docs/specs/rocker.txt diff --git a/docs/specs/rocker.txt b/docs/specs/rocker.txt new file mode 100644 index 000..fa9db19 --- /dev/null +++ b/docs/specs/rocker.txt @@ -0,0 +1,960 @@ +Rocker Network Switch Register Programming Guide +Copyright (c) Scott Feldman sfel...@gmail.com +Copyright (c) Neil Horman nhor...@tuxdriver.com +Version 0.11, 12/29/2014 + +LICENSE +=== + +This program is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +SECTION 1: Introduction +=== + +Overview + + +This document describes the hardware/software interface for the Rocker switch +device. The intended audience is authors of OS drivers and device emulation +software. + +Notations and Conventions +- + +o In register descriptions, [n:m] indicates a range from bit n to bit m, +inclusive. +o Use of leading 0x indicates a hexadecimal number. +o Use of leading 0b indicates a binary number. +o The use of RSVD or Reserved indicates that a bit or field is reserved for +future use. +o Field width is in bytes, unless otherwise noted. +o Register are (R) read-only, (R/W) read/write, (W) write-only, or (COR) clear +on read +o TLV values in network-byte-order are designated with (N). + + +SECTION 2: PCI Configuration Registers +== + +PCI Configuration Space +--- + +Each switch instance registers as a PCI device with PCI configuration space: + + offset width description value + - + 0x0 2 Vendor ID 0x1b36 + 0x2 2 Device ID 0x0006 + 0x4 4 Command/Status + 0x8 1 Revision ID 0x01 + 0x9 3 Class code 0x2800 + 0xC 1 Cache line size + 0xD 1 Latency timer + 0xE 1 Header type + 0xF 1 Built-in self test + 0x104 Base address low + 0x144 Base address high + 0x18-28 Reserved + 0x2C2 Subsystem vendor ID * + 0x2E2 Subsystem ID* + 0x30-38 Reserved + 0x3C1 Interrupt line + 0x3D1 Interrupt pin 0x00 + 0x3E1 Min grant 0x00 + 0x3D1 Max latency 0x00 + 0x401 TRDY timeout + 0x411 Retry count + 0x422 Reserved + + +* Assigned by sub-system implementation + +SECTION 3: Memory-Mapped Register Space +=== + +There are two memory-mapped BARs. BAR0 maps device register space and is +0x2000 in size. BAR1 maps MSI-X vector and PBA tables and is also 0x2000 in +size, allowing for 256 MSI-X vectors. + +All registers are 4 or 8 bytes long. It is assumed host software will access 4 +byte registers with one 4-byte access, and 8 byte registers with either two +4-byte accesses or a single 8-byte access. In the case of two 4-byte accesses, +access must be lower and then upper 4-bytes, in that order. + +BAR0 device register space is organized as follows: + + offset description + -- + 0x-0x000f Bogus registers to catch misbehaving + drivers. Writes do nothing. Reads + back as 0xDEADBABE. + 0x0010-0x00ff Test registers + 0x0300-0x03ff General purpose registers + 0x1000-0x1fff Descriptor control + +Holes in register space are reserved. Writes to reserved registers do nothing. +Reads to reserved registers read back as 0. + +No fancy stuff like write-combining is enabled on any of the registers. + +BAR1 MSI-X register space is organized as follows: + + offset description + -- + 0x-0x0fff MSI-X vector table (256 vectors total) + 0x1000-0x1fff MSI-X PBA table + + +SECTION 4: Interrupts, DMA, and Endianess
[Qemu-devel] [PATCH v4 09/10] MAINTAINERS: add rocker
From: Scott Feldman sfel...@gmail.com Signed-off-by: Scott Feldman sfel...@gmail.com Signed-off-by: Jiri Pirko j...@resnulli.us --- MAINTAINERS |6 ++ 1 file changed, 6 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 430688d..8b6f8d4 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -736,6 +736,12 @@ S: Maintained F: hw/net/vmxnet* F: hw/scsi/vmw_pvscsi* +Rocker +M: Scott Feldman sfel...@gmail.com +M: Jiri Pirko j...@resnulli.us +S: Maintained +F: hw/net/rocker/ + Subsystems -- Audio -- 1.7.10.4
[Qemu-devel] [PATCH v4 10/10] rocker: timestamp on the debug logs helps correlate with events in the VM
From: David Ahern dsah...@gmail.com Signed-off-by: David Ahern dsah...@gmail.com Signed-off-by: Scott Feldman sfel...@gmail.com --- hw/net/rocker/rocker.h | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/hw/net/rocker/rocker.h b/hw/net/rocker/rocker.h index ef77487..5ae8aff 100644 --- a/hw/net/rocker/rocker.h +++ b/hw/net/rocker/rocker.h @@ -25,7 +25,16 @@ #if defined(DEBUG_ROCKER) # define DPRINTF(fmt, ...) \ -do { fprintf(stderr, ROCKER: fmt, ## __VA_ARGS__); } while (0) +do { \ +struct timeval tv; \ +char timestr[64]; \ +time_t now;\ +gettimeofday(tv, NULL); \ +now = tv.tv_sec; \ +strftime(timestr, sizeof(timestr), %T, localtime(now)); \ +fprintf(stderr, %s.%06ld , timestr, tv.tv_usec); \ +fprintf(stderr, ROCKER: fmt, ## __VA_ARGS__); \ +} while (0) #else static inline GCC_FMT_ATTR(1, 2) int DPRINTF(const char *fmt, ...) { -- 1.7.10.4
Re: [Qemu-devel] global_mutex and multithread.
On 16 Jan 2015, at 09:07, Jan Kiszka jan.kis...@siemens.com wrote: On 2015-01-16 08:25, Mark Burton wrote: On 15 Jan 2015, at 22:41, Paolo Bonzini pbonz...@redhat.com wrote: On 15/01/2015 21:53, Mark Burton wrote: Jan said he had it working at least on ARM (MusicPal). yeah - our problem is when we enable multi-threads - which I dont believe Jan did… Multithreaded TCG, or single-threaded TCG with SMP? He mentions SMP, - I assume thats single-threaded …. Yes, I didn't patched anything towards multi-threaded SMP. Main reason: there was no answer on how to emulated the memory models of that target architecture over the host one which is mandatory if you let the emulated CPUs run unsynchronized in parallel. Did this change? No - we just decided to stop pressing the button… I think this is the ‘x86 on ARM’ issue ? - our plan is to get ARM of x86 working (or that class), and the worry about the other way round. I dont see why SMP ARM on X86 wouldn’t work with your patch? Cheers Mark. One thing I wonder - why do we need to go to the extent of mutexing in the TCG like this? Why can’t you simply put a mutex get/release on the slow path? If the core is going to do ‘fast path’ access to the memory, - even if that memory was IO mapped - would it matter if it didn’t have the mutex? Because there is no guarantee that the memory map isn't changed by a core under the feet of another. The TLB (in particular the iotlb) is only valid with reference to a particular memory map. Changes to the memory map certainly happen in the slow path, but lookups are part of the fast path. Even an rwlocks is too slow for a fast path, hence the plan of going with RCU. Could we arrange the world such that lookups ‘succeed’ (the wheels dont fall off) -ether getting the old value, or the new, but not getting rubbish - and we still only take the mutex if we are going to make alterations to the MM itself? (I have’t looked at the code around that… so sorry if the question is ridiculous). That's the definition of RCU. :) Look at the docs in http://permalink.gmane.org/gmane.comp.emulators.qemu/313929 for more information. :) Ahh - I see ! It's still not trivial to make it 100% correct, but at the same time it's not too hard to prepare something decent to play with. Also, most of the work can be done with KVM so it's more or less independent from what you guys have been doing so far. Yes - the issue is if we end up relying on it. But - I see what you mean - these 2 things can ‘dovetail’ together “independently” - so - Jan’s patch will be good for now, and then later we can use RCU to make it work more generally (and more efficiently). So - our only small problem is getting Jan’s patch to work for multi-thread :-)) See above regarding the potential dimension. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SES-DE Corporate Competence Center Embedded Linux +44 (0)20 7100 3485 x 210 +33 (0)5 33 52 01 77x 210 +33 (0)603762104 mark.burton
Re: [Qemu-devel] [PATCH 1/4] configure: opengl overhaul
Hi, +# opengl probe, used by milkymist-tmu2 Maybe remove this part of the comment about milkymist? Will do. +if test $opengl != no ; then + opengl_pkgs=gl + if $pkg_config $opengl_pkgs; then +opengl_libs=$($pkg_config --libs $opengl_pkgs) -lX11 +opengl=yes `pkg-config gl` might be successful even without X11 available. In this case, $opengl will be set to 'yes', but linking will probably fail. Indeed. Maybe we should keep GLX separated from OpenGL; for SDL we don't need X11, but for milkymist-tmu2 we apparently do. That should be only temporary. There will be infrastructure for device emulations to ask the UI for a opengl context. Once this is in place we should be able to switch over milkymist-tmu2 to that so it doesn't has to initialize opengl on its own. cheers, Gerd
Re: [Qemu-devel] [virtio] virtqueue allocation and thread-safety
Does the guest memory allocation include the descriptor tables allocation? Which part of the structures is the descriptor tables? On 15.01.2015 03:41, Fam Zheng wrote: On Wed, 01/14 14:01, Vasile Catalin-B50542 wrote: Hi, I'm trying to make a new virtio device. I got it running (I made a functional dummy device guest driver). Now I'm trying to build some communication between the device and guest driver. I can't seem to find where the actual allocation of virtqueues are made. I've looked inside virtio_init(), but it just allocates the vq structures and I don't see any pointers inside that structure that might give and idea of something dynamically allocated. There is a member of that structure named vector, but it's type is uint16_t. I've also looked inside the virtio_add_queue(), and it just makes some constant assignments. Where are the vqs buffer space actually allocated? The guest memory is prepared by guest. See the virtio spec Virtual I/O Device (VIRTIO) Version 1.0 section 3.2.1 Supplying Buffers to The Device: quote The driver offers buffers to one of the device’s virtqueues as follows: 1. The driver places the buffer into free descriptor(s) in the descriptor table, chaining as necessary (see 2.4.5 The Virtqueue Descriptor Table). 2. The driver places the index of the head of the descriptor chain into the next ring entry of the available ring. 3. Steps 1 and 2 MAY be performed repeatedly if batching is possible. 4. The driver performs suitable a memory barrier to ensure the device sees the updated descriptor table and available ring before the next step. 5. The available idx is increased by the number of descriptor chain heads added to the available ring. 6. The driver performs a suitable memory barrier to ensure that it updates the idx field before checking for notification suppression. 7. If notifications are not suppressed, the driver notifies the device of the new available buffers. /unquote Fam One more thing. Are the virtqueue functions thread safe, from both point of views (qemu and guest driver API's view)? Also I don't see any dynamic allocations/freeing when pushing and popping, either. Cătă -- CatalinVasile Intern, DN-Software FreescaleSemiconductor, Inc. www.freescale.com phone: 073-021-1938 e-mail: catalin.vas...@freescale.com mailto:your.n...@freescale.com *Freescale_Logo-nosemi_Lh_4c*** This e-mail, and any associated attachments have been classified as: [ ] Public [ ] Freescale Semiconductor Internal Use Only [ ] Freescale Semiconductor Confidential Proprietary
Re: [Qemu-devel] [PATCH v3 6/9] rocker: add new rocker switch device
On 01/11/2015 11:57 AM, sfel...@gmail.com wrote: From: Scott Feldman sfel...@gmail.com Rocker is a simulated ethernet switch device. The device supports up to 62 front-panel ports and supports L2 switching and L3 routing functions, as well as L2/L3/L4 ACLs. The device presents a single PCI device for each switch, with a memory-mapped register space for device driver access. Rocker device is invoked with -device, for example a 4-port switch: -device rocker,name=sw1,len-ports=4,ports[0]=dev0,ports[1]=dev1, \ ports[2]=dev2,ports[3]=dev3 Each port is a netdev and can be paired with using -netdev id=port name. Signed-off-by: Scott Feldman sfel...@gmail.com Signed-off-by: Jiri Pirko j...@resnulli.us --- Looks like the devices does not support state saving. How about tagging it with unmigratable first? default-configs/pci.mak |1 + hw/net/Makefile.objs |3 + hw/net/rocker/rocker.c| 1395 + hw/net/rocker/rocker.h| 76 ++ hw/net/rocker/rocker_desc.c | 379 +++ hw/net/rocker/rocker_desc.h | 57 + hw/net/rocker/rocker_fp.c | 232 + hw/net/rocker/rocker_fp.h | 53 + hw/net/rocker/rocker_hw.h | 475 + hw/net/rocker/rocker_of_dpa.c | 2314 + hw/net/rocker/rocker_of_dpa.h | 25 + hw/net/rocker/rocker_tlv.h| 244 + hw/net/rocker/rocker_world.c | 108 ++ hw/net/rocker/rocker_world.h | 63 ++ 14 files changed, 5425 insertions(+) create mode 100644 hw/net/rocker/rocker.c create mode 100644 hw/net/rocker/rocker.h create mode 100644 hw/net/rocker/rocker_desc.c create mode 100644 hw/net/rocker/rocker_desc.h create mode 100644 hw/net/rocker/rocker_fp.c create mode 100644 hw/net/rocker/rocker_fp.h create mode 100644 hw/net/rocker/rocker_hw.h create mode 100644 hw/net/rocker/rocker_of_dpa.c create mode 100644 hw/net/rocker/rocker_of_dpa.h create mode 100644 hw/net/rocker/rocker_tlv.h create mode 100644 hw/net/rocker/rocker_world.c create mode 100644 hw/net/rocker/rocker_world.h [...] + +typedef struct rocker { +/* private */ +PCIDevice parent_obj; +/* public */ + +MemoryRegion mmio; +MemoryRegion msix_bar; + +/* switch configuration */ +char *name; /* switch name */ +uint32_t fp_ports; /* front-panel port count */ +NICPeers *fp_ports_peers; +MACAddr fp_start_macaddr;/* front-panel port 0 mac addr */ +uint64_t switch_id; /* switch id */ Looks like the function of name ad switch_id are duplicated? + +/* front-panel ports */ +FpPort *fp_port[ROCKER_FP_PORTS_MAX]; + +/* register backings */ +uint32_t test_reg; +uint64_t test_reg64; +dma_addr_t test_dma_addr; +uint32_t test_dma_size; + +/* desc rings */ +DescRing **rings; + +/* switch worlds */ +World *worlds[ROCKER_WORLD_TYPE_MAX]; +World *world_dflt; + +QLIST_ENTRY(rocker) next; +} Rocker; + [...] + +static int tx_consume(Rocker *r, DescInfo *info) +{ +PCIDevice *dev = PCI_DEVICE(r); +char *buf = desc_get_buf(info, true); +RockerTlv *tlv_frag; +RockerTlv *tlvs[ROCKER_TLV_TX_MAX + 1]; +struct iovec iov[ROCKER_TX_FRAGS_MAX] = { { 0, }, }; +uint32_t pport; +uint32_t port; +uint16_t tx_offload = ROCKER_TX_OFFLOAD_NONE; +uint16_t tx_l3_csum_off = 0; +uint16_t tx_tso_mss = 0; +uint16_t tx_tso_hdr_len = 0; +int iovcnt = 0; +int err = 0; +int rem; +int i; + +if (!buf) { +return -ENXIO; +} + +rocker_tlv_parse(tlvs, ROCKER_TLV_TX_MAX, buf, desc_tlv_size(info)); + +if (!tlvs[ROCKER_TLV_TX_FRAGS]) { +return -EINVAL; +} + +pport = rocker_get_pport_by_tx_ring(r, desc_get_ring(info)); +if (!fp_port_from_pport(pport, port)) { +return -EINVAL; +} + +if (tlvs[ROCKER_TLV_TX_OFFLOAD]) { +tx_offload = rocker_tlv_get_u8(tlvs[ROCKER_TLV_TX_OFFLOAD]); +} + +switch (tx_offload) { +case ROCKER_TX_OFFLOAD_L3_CSUM: +if (!tlvs[ROCKER_TLV_TX_L3_CSUM_OFF]) { +return -EINVAL; +} +case ROCKER_TX_OFFLOAD_TSO: +if (!tlvs[ROCKER_TLV_TX_TSO_MSS] || +!tlvs[ROCKER_TLV_TX_TSO_HDR_LEN]) { +return -EINVAL; +} +} + +if (tlvs[ROCKER_TLV_TX_L3_CSUM_OFF]) { +tx_l3_csum_off = rocker_tlv_get_le16(tlvs[ROCKER_TLV_TX_L3_CSUM_OFF]); +} + +if (tlvs[ROCKER_TLV_TX_TSO_MSS]) { +tx_tso_mss = rocker_tlv_get_le16(tlvs[ROCKER_TLV_TX_TSO_MSS]); +} + +if (tlvs[ROCKER_TLV_TX_TSO_HDR_LEN]) { +tx_tso_hdr_len = rocker_tlv_get_le16(tlvs[ROCKER_TLV_TX_TSO_HDR_LEN]); +} + +rocker_tlv_for_each_nested(tlv_frag, tlvs[ROCKER_TLV_TX_FRAGS], rem) { +hwaddr frag_addr; +uint16_t frag_len; + +if
Re: [Qemu-devel] [PATCH] s390x/pci: fix 2 bugs found by coverity
On Wed, 14 Jan 2015 16:20:47 +0100 Frank Blaschka blasc...@linux.vnet.ibm.com wrote: Signed-off-by: Frank Blaschka blasc...@linux.vnet.ibm.com --- hw/s390x/s390-pci-bus.c | 1 + hw/s390x/s390-pci-inst.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) Applied with a slightly tweaked description: s390x/pci: fix two bugs found by coverity Fix a resource leak and a store with the wrong length. Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com Signed-off-by: Frank Blaschka blasc...@linux.vnet.ibm.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
Re: [Qemu-devel] [PATCH 2/4] console: add opengl rendering helper functions
Hi, +bool console_gl_check_format(DisplayChangeListener *dcl, + pixman_format_code_t format) +{ +switch (format) { +case PIXMAN_x8r8g8b8: +case PIXMAN_a8r8g8b8: +case PIXMAN_r5g6b5: +return true; +default: +return false; +} +} What is this function supposed to be used for? Inflight change, there is a patch series on the list adding a check_format callback to DisplayChangeListenerOps, for format negotiation. + +void surface_gl_create_texture(DisplaySurface *surface) +{ +switch (surface-format) { +case PIXMAN_x8r8g8b8: +case PIXMAN_a8r8g8b8: +surface-glformat = GL_BGRA; Why does the format code seem to imply ARGB order but you're using the exact opposite? I could imagine something like endianness being the reason, but you're using RGB below where the format code says exactly that. endianness indeed. pixman formats are native endian, so this actually is bgra ordering in memory. While looking at it: I guess GL_BGRA is fixed byte ordering? So this probably needs fixing to work correctly on bigendian machines ... Discarding the alpha channel is intentional, I suppose? Yes. + surface_width(surface), + surface_height(surface), + 0, surface-glformat, surface-gltype, + surface_data(surface)); Is surface_stride(surface) specified to be surface_width(surface) * bytes_per_pixel (I don't know pixman so I don't know)? Usually this is the case, but there can be exceptions. Hmm, can I explicitly pass the stride? thanks, Gerd
Re: [Qemu-devel] [PATCH v3 7/9] qmp: add rocker device support
On 01/11/2015 11:57 AM, sfel...@gmail.com wrote: From: Scott Feldman sfel...@gmail.com Add QMP/HMP support for rocker devices. This is mostly for debugging purposes to see inside the device's tables and port configurations. Some examples: (qemu) info rocker sw1 name: sw1 id: 0x013512005452 ports: 4 (qemu) info rocker-ports sw1 ena/speed/ auto port linkduplex neg? sw1.1 up 10G FD No sw1.2 up 10G FD No sw1.3 !ena 10G FD No sw1.4 !ena 10G FD No (qemu) info rocker-of-dpa-flows sw1 prio tbl hits key(mask) -- actions 260 lport 1 vlan 1 LLDP src 00:02:00:00:02:00 dst 01:80:c2:00:00:0e 260 lport 1 vlan 1 ARP src 00:02:00:00:02:00 dst 00:02:00:00:03:00 260 lport 2 vlan 2 IPv6 src 00:02:00:00:03:00 dst 33:33:ff:00:00:02 proto 58 350 vlan 2 dst 33:33:ff:00:00:02 -- write group 0x3201 goto tbl 60 260 lport 2 vlan 2 IPv6 src 00:02:00:00:03:00 dst 33:33:ff:00:03:00 proto 58 350 1vlan 2 dst 33:33:ff:00:03:00 -- write group 0x3201 goto tbl 60 260 lport 2 vlan 2 ARP src 00:02:00:00:03:00 dst 00:02:00:00:02:00 350 2vlan 2 dst 00:02:00:00:02:00 -- write group 0x0201 goto tbl 60 260 1lport 2 vlan 2 IP src 00:02:00:00:03:00 dst 00:02:00:00:02:00 proto 1 350 2vlan 1 dst 00:02:00:00:03:00 -- write group 0x0102 goto tbl 60 260 1lport 1 vlan 1 IP src 00:02:00:00:02:00 dst 00:02:00:00:03:00 proto 1 260 lport 1 vlan 1 IPv6 src 00:02:00:00:02:00 dst 33:33:ff:00:00:01 proto 58 350 vlan 1 dst 33:33:ff:00:00:01 -- write group 0x3100 goto tbl 60 260 lport 1 vlan 1 IPv6 src 00:02:00:00:02:00 dst 33:33:ff:00:02:00 proto 58 350 1vlan 1 dst 33:33:ff:00:02:00 -- write group 0x3100 goto tbl 60 160 173 lport 2 vlan 2 LLDP src any dst 01:80:c2:00:00:0e -- write group 0x0200 160 6lport 2 vlan 2 IPv6 src any dst any -- write group 0x0200 160 174 lport 1 vlan 1 LLDP src any dst 01:80:c2:00:00:0e -- write group 0x0100 160 174 lport 2 vlan 2 IP src any dst any -- write group 0x0200 160 6lport 1 vlan 1 IPv6 src any dst any -- write group 0x0100 160 181 lport 2 vlan 2 ARP src any dst any -- write group 0x0200 110 715 lport 2 -- apply new vlan 2 goto tbl 20 160 177 lport 1 vlan 1 ARP src any dst any -- write group 0x0100 160 174 lport 1 vlan 1 IP src any dst any -- write group 0x0100 110 717 lport 1 -- apply new vlan 1 goto tbl 20 10 1432 lport 0(0x) -- goto tbl 10 (qemu) info rocker-of-dpa-groups sw1 id (decode) -- buckets 0x3201 (type L2 multicast vlan 2 index 1) -- groups [0x0201,0x0200] 0x0201 (type L2 interface vlan 2 lport 1) -- pop vlan out lport 1 0x0102 (type L2 interface vlan 1 lport 2) -- pop vlan out lport 2 0x0200 (type L2 interface vlan 2 lport 0) -- pop vlan out lport 0 0x0100 (type L2 interface vlan 1 lport 0) -- pop vlan out lport 0 0x3100 (type L2 multicast vlan 1 index 0) -- groups [0x0102,0x0100] Signed-off-by: Scott Feldman sfel...@gmail.com Signed-off-by: Jiri Pirko j...@resnulli.us --- hmp-commands.hx | 24 hmp.c | 303 hmp.h |4 + hw/net/rocker/rocker.c| 45 ++ hw/net/rocker/rocker_fp.c | 10 ++ hw/net/rocker/rocker_fp.h |1 + hw/net/rocker/rocker_of_dpa.c | 308 + monitor.c | 28 qapi-schema.json |3 + qapi/rocker.json | 259 ++ qmp-commands.hx | 97 + 11 files changed, 1082 insertions(+) create mode 100644 qapi/rocker.json I don't object this method but not sure it was worth (especially consider it needs nearly a thousand lines of codes). Could we just do this through remote controller or agent? Thanks
Re: [Qemu-devel] [PATCH v3 6/9] rocker: add new rocker switch device
On Fri, Jan 16, 2015 at 1:15 AM, Jason Wang jasow...@redhat.com wrote: On 01/11/2015 11:57 AM, sfel...@gmail.com wrote: Each port is a netdev and can be paired with using -netdev id=port name. Signed-off-by: Scott Feldman sfel...@gmail.com Signed-off-by: Jiri Pirko j...@resnulli.us --- Looks like the devices does not support state saving. How about tagging it with unmigratable first? State saving would be nice to add in the future. How do we tag is unmigratable for now? +uint64_t switch_id; /* switch id */ Looks like the function of name ad switch_id are duplicated? Don't follow... +if (iovcnt) { +/* XXX perform Tx offloads */ +/* XXX silence compiler for now */ Need add TSO here (e1000 is a good reference) or disable TSO in driver. TSO is not enabled in driver, currently. This is lower priority now since the I/O path for traffic to/from the guest CPU is not performance critical. The forwarding path offloaded by the device is the hot path, but that doesn't involve I/O to guest CPU, so Tx/Rx offloads aren't in play. But, someday, it would be nice to enable Rx/Tx offloads. +rocker_tlv_put_le32(buf, pos, ROCKER_TLV_EVENT_TYPE, +ROCKER_TLV_EVENT_TYPE_LINK_CHANGED); +nest = rocker_tlv_nest_start(buf, pos, ROCKER_TLV_EVENT_INFO); +rocker_tlv_put_le32(buf, pos, ROCKER_TLV_EVENT_LINK_CHANGED_PPORT, pport); +rocker_tlv_put_u8(buf, pos, ROCKER_TLV_EVENT_LINK_CHANGED_LINKUP, + link_up ? 1 : 0); Looks like those types are not documented. Ok, I'll double check spec to make sure we're covered, for the items you found. +rocker_tlv_put_le16(buf, pos, ROCKER_TLV_RX_FLAGS, rx_flags); +rocker_tlv_put_le16(buf, pos, ROCKER_TLV_RX_CSUM, rx_csum); Note: Some backend (e.g tap) can do offloading, may consider to add the support in the future. Yup, on TODO list for Rx/Tx offloads. Using only 1 queues is ok for multiqueue backend. In the future may consider to use all. When we get to QoS for working, I think we'll want to enable these multiqueues. QoS support is not implemented in rocker, currently, but there is support defined in the OF-DPA spec as part of ACL processing. So this is future work. Thanks Thanks for reviewing.
Re: [Qemu-devel] [PATCH v3 7/9] qmp: add rocker device support
On Fri, Jan 16, 2015 at 1:26 AM, Jason Wang jasow...@redhat.com wrote: On 01/11/2015 11:57 AM, sfel...@gmail.com wrote: From: Scott Feldman sfel...@gmail.com Add QMP/HMP support for rocker devices. This is mostly for debugging purposes to see inside the device's tables and port configurations. Some examples: (qemu) info rocker sw1 name: sw1 id: 0x013512005452 ports: 4 (qemu) info rocker-ports sw1 ena/speed/ auto port linkduplex neg? sw1.1 up 10G FD No sw1.2 up 10G FD No sw1.3 !ena 10G FD No sw1.4 !ena 10G FD No (qemu) info rocker-of-dpa-flows sw1 prio tbl hits key(mask) -- actions 260 lport 1 vlan 1 LLDP src 00:02:00:00:02:00 dst 01:80:c2:00:00:0e 260 lport 1 vlan 1 ARP src 00:02:00:00:02:00 dst 00:02:00:00:03:00 260 lport 2 vlan 2 IPv6 src 00:02:00:00:03:00 dst 33:33:ff:00:00:02 proto 58 350 vlan 2 dst 33:33:ff:00:00:02 -- write group 0x3201 goto tbl 60 260 lport 2 vlan 2 IPv6 src 00:02:00:00:03:00 dst 33:33:ff:00:03:00 proto 58 350 1vlan 2 dst 33:33:ff:00:03:00 -- write group 0x3201 goto tbl 60 260 lport 2 vlan 2 ARP src 00:02:00:00:03:00 dst 00:02:00:00:02:00 350 2vlan 2 dst 00:02:00:00:02:00 -- write group 0x0201 goto tbl 60 260 1lport 2 vlan 2 IP src 00:02:00:00:03:00 dst 00:02:00:00:02:00 proto 1 350 2vlan 1 dst 00:02:00:00:03:00 -- write group 0x0102 goto tbl 60 260 1lport 1 vlan 1 IP src 00:02:00:00:02:00 dst 00:02:00:00:03:00 proto 1 260 lport 1 vlan 1 IPv6 src 00:02:00:00:02:00 dst 33:33:ff:00:00:01 proto 58 350 vlan 1 dst 33:33:ff:00:00:01 -- write group 0x3100 goto tbl 60 260 lport 1 vlan 1 IPv6 src 00:02:00:00:02:00 dst 33:33:ff:00:02:00 proto 58 350 1vlan 1 dst 33:33:ff:00:02:00 -- write group 0x3100 goto tbl 60 160 173 lport 2 vlan 2 LLDP src any dst 01:80:c2:00:00:0e -- write group 0x0200 160 6lport 2 vlan 2 IPv6 src any dst any -- write group 0x0200 160 174 lport 1 vlan 1 LLDP src any dst 01:80:c2:00:00:0e -- write group 0x0100 160 174 lport 2 vlan 2 IP src any dst any -- write group 0x0200 160 6lport 1 vlan 1 IPv6 src any dst any -- write group 0x0100 160 181 lport 2 vlan 2 ARP src any dst any -- write group 0x0200 110 715 lport 2 -- apply new vlan 2 goto tbl 20 160 177 lport 1 vlan 1 ARP src any dst any -- write group 0x0100 160 174 lport 1 vlan 1 IP src any dst any -- write group 0x0100 110 717 lport 1 -- apply new vlan 1 goto tbl 20 10 1432 lport 0(0x) -- goto tbl 10 (qemu) info rocker-of-dpa-groups sw1 id (decode) -- buckets 0x3201 (type L2 multicast vlan 2 index 1) -- groups [0x0201,0x0200] 0x0201 (type L2 interface vlan 2 lport 1) -- pop vlan out lport 1 0x0102 (type L2 interface vlan 1 lport 2) -- pop vlan out lport 2 0x0200 (type L2 interface vlan 2 lport 0) -- pop vlan out lport 0 0x0100 (type L2 interface vlan 1 lport 0) -- pop vlan out lport 0 0x3100 (type L2 multicast vlan 1 index 0) -- groups [0x0102,0x0100] Signed-off-by: Scott Feldman sfel...@gmail.com Signed-off-by: Jiri Pirko j...@resnulli.us --- hmp-commands.hx | 24 hmp.c | 303 hmp.h |4 + hw/net/rocker/rocker.c| 45 ++ hw/net/rocker/rocker_fp.c | 10 ++ hw/net/rocker/rocker_fp.h |1 + hw/net/rocker/rocker_of_dpa.c | 308 + monitor.c | 28 qapi-schema.json |3 + qapi/rocker.json | 259 ++ qmp-commands.hx | 97 + 11 files changed, 1082 insertions(+) create mode 100644 qapi/rocker.json I don't object this method but not sure it was worth (especially consider it needs nearly a thousand lines of codes). I wasn't happy about touching core files to add rocker support. Splitting device-specific stuff out into qapi/rocker.json helped. Seems qmp/hmp aren't really setup for devices to plug-in their stuff in in a modular way. I couldn't find other examples of devices using qmp/hmp to the extent rocker is, but maybe I didn't look hard enough. However, QMP/HMP support is very handy for debugging rocker device since there is tons of state stored in the switch forwarding tables, and it can be quite dynamic depending on what's going on on the guest OS w.r.t networking. Could we just do this through remote controller or agent? I'm open to ideas. qmp/hmp work great for getting an instantaneous snapshot of the device state. The JSON format goodness come for free with qmp, so that's nice. -scott
Re: [Qemu-devel] kvmclock, Migration, and NTP clock jitter
On Thu, Jan 15, 2015 at 06:27:54PM +0100, Paolo Bonzini wrote: On 15/01/2015 17:39, Mohammed Gamal wrote: The increase in the jitter and offset values is well within the 500 ppm frequency tolerance limit, and therefore are easily corrected by subsequent NTP clock sync events, but some live migrations do cause much higher jitter and offset jumps, which can not be corrected by NTP and cause the time to go way off. Any idea why this is the case? It might be fixed in QEMU 2.2. See https://lists.gnu.org/archive/html/qemu-devel/2014-09/msg01239.html Paolo Hi Paolo, I did try to backport these patches to qemu 1.2. However, migrations resulted in *higher* jitter and offset values (i.e. in the order of 100+ ppm). I am not sure if I've done the backporting correctly though. Here are my patches on top of the qemu 1.2 stable tree. diff --git a/cpus.c b/cpus.c index 29aced5..e079ee5 100644 --- a/cpus.c +++ b/cpus.c @@ -187,6 +187,15 @@ void cpu_disable_ticks(void) } } +void cpu_clean_all_dirty(void) +{ +CPUArchState *cpu; + +for (cpu = first_cpu; cpu; cpu = cpu-next_cpu) { +cpu_clean_state(cpu); +} +} + /* Correlation between real and virtual time is always going to be fairly approximate, so ignore small variation. When the guest is idle real and virtual time will be aligned in diff --git a/cpus.h b/cpus.h index 3fc1a4a..1ff166b 100644 --- a/cpus.h +++ b/cpus.h @@ -12,6 +12,7 @@ void unplug_vcpu(void *p); void cpu_synchronize_all_states(void); void cpu_synchronize_all_post_reset(void); void cpu_synchronize_all_post_init(void); +void cpu_clean_all_dirty(void); void qtest_clock_warp(int64_t dest); diff --git a/hw/kvm/clock.c b/hw/kvm/clock.c index 824b978..b2bdda4 100644 --- a/hw/kvm/clock.c +++ b/hw/kvm/clock.c @@ -16,6 +16,8 @@ #include qemu-common.h #include sysemu.h #include kvm.h +#include host-utils.h +#include cpus.h #include hw/sysbus.h #include hw/kvm/clock.h @@ -28,6 +30,46 @@ typedef struct KVMClockState { bool clock_valid; } KVMClockState; +struct pvclock_vcpu_time_info { +uint32_t version; +uint32_t pad0; +uint64_t tsc_timestamp; +uint64_t system_time; +uint32_t tsc_to_system_mul; +int8_t tsc_shift; +uint8_tflags; +uint8_tpad[2]; +} __attribute__((__packed__)); /* 32 bytes */ + +static uint64_t kvmclock_current_nsec(KVMClockState *s) +{ +CPUArchState *env = first_cpu; +uint64_t migration_tsc = env-tsc; +struct pvclock_vcpu_time_info time; +uint64_t delta; +uint64_t nsec_lo; +uint64_t nsec_hi; +uint64_t nsec; + +if (!(env-system_time_msr 1ULL)) { +/* KVM clock not active */ +return 0; +} +cpu_physical_memory_read((env-system_time_msr ~1ULL), time, sizeof(time)); + +assert(time.tsc_timestamp = migration_tsc); +delta = migration_tsc - time.tsc_timestamp; +if (time.tsc_shift 0) { +delta = -time.tsc_shift; +} else { +delta = time.tsc_shift; +} + +mulu64(nsec_lo, nsec_hi, delta, time.tsc_to_system_mul); +nsec = (nsec_lo 32) | (nsec_hi 32); +return nsec + time.system_time; +} + static void kvmclock_pre_save(void *opaque) { KVMClockState *s = opaque; @@ -37,6 +79,23 @@ static void kvmclock_pre_save(void *opaque) if (s-clock_valid) { return; } + +cpu_synchronize_all_states(); +/* In theory, the cpu_synchronize_all_states() call above wouldn't + * affect the rest of the code, as the VCPU state inside CPUArchState + * is supposed to always match the VCPU state on the kernel side. + * + * In practice, calling cpu_synchronize_state() too soon will load the + * kernel-side APIC state into X86CPU.apic_state too early, APIC state + * won't be reloaded later because CPUState.vcpu_dirty==true, and + * outdated APIC state may be migrated to another host. + * + * The real fix would be to make sure outdated APIC state is read + * from the kernel again when necessary. While this is not fixed, we + * need the cpu_clean_all_dirty() call below. + */ +cpu_clean_all_dirty(); + ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, data); if (ret 0) { fprintf(stderr, KVM_GET_CLOCK failed: %s\n, strerror(ret)); @@ -55,6 +114,12 @@ static int kvmclock_post_load(void *opaque, int version_id) { KVMClockState *s = opaque; struct kvm_clock_data data; +uint64_t time_at_migration = kvmclock_current_nsec(s); + +/* We can't rely on the migrated clock value, just discard it */ +if (time_at_migration) { +s-clock = time_at_migration; +} data.clock = s-clock; data.flags = 0; diff --git a/kvm-all.c b/kvm-all.c index cd2ccbe..692944e 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1547,6 +1547,11 @@ void kvm_cpu_synchronize_post_init(CPUArchState *env) env-kvm_vcpu_dirty = 0; } +void kvm_cpu_clean_state(CPUArchState *env) +{ +env-kvm_vcpu_dirty = false; +} +
Re: [Qemu-devel] [PATCH 3/8] rcu: add rcutorture
On 16/01/2015 03:04, Fam Zheng wrote: On Tue, 01/13 18:52, Paolo Bonzini wrote: +int main(int argc, char *argv[]) +{ +int nreaders = 1; +int duration = 1; + +if (argc = 2 argv[1][0] == '-') { +g_test_init(argc, argv, NULL); +g_test_add_func(/rcu/torture/short/1reader, gtest_stress_1_1); +g_test_add_func(/rcu/torture/short/10readers, gtest_stress_10_1); +g_test_add_func(/rcu/torture/long/1reader, gtest_stress_1_5); +g_test_add_func(/rcu/torture/long/10readers, gtest_stress_10_5); Why do we need short tests when we have long tests? What are different other than the durations? Nothing really. I guess long tests could be made optional (-m slow). +return g_test_run(); +} + +if (argc = 2) { +nreaders = strtoul(argv[1], NULL, 0); +} +if (argc 3) { +duration = strtoul(argv[3], NULL, 0); +} +if (argc 3 || strcmp(argv[2], stress) == 0) { +stresstest(nreaders, duration); +} else if (strcmp(argv[2], rperf) == 0) { +rperftest(nreaders, duration); +} else if (strcmp(argv[2], uperf) == 0) { +uperftest(nreaders, duration); +} else if (strcmp(argv[2], perf) == 0) { +perftest(nreaders, duration); +} +usage(argc, argv); +return 0; +} diff --git a/util/rcu.c b/util/rcu.c index f8f52ae..8df16d9 100644 --- a/util/rcu.c +++ b/util/rcu.c @@ -169,5 +169,4 @@ static void __attribute__((__constructor__)) rcu_init(void) { qemu_mutex_init(rcu_gp_lock); qemu_event_init(rcu_gp_event, true); -rcu_register_thread(); Maybe don't add this line in patch 2? More like don't remove it in patch 3. ;) Paolo
Re: [Qemu-devel] [PULL] migration patches
On 16 January 2015 at 07:47, Amit Shah amit.s...@redhat.com wrote: The following changes since commit df58887b20fab8fe8a6dcca4db30cd4e4077d53a: Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2015-01-15' into staging (2015-01-15 10:08:46 +) are available in the git repository at: git://git.kernel.org/pub/scm/virt/qemu/amit/migration.git tags/mig-2.3-1 for you to fetch changes up to ea987c2c21d4326bb58ee28f6888fdcf8fbda067: vmstate: type-check sub-arrays (2015-01-16 13:06:17 +0530) A set of patches collected over the holidays. Mix of optimizations and fixes. Applied, thanks. -- PMM
[Qemu-devel] [PULL 1/2] target-arm: crypto: fix BE host support
From: Ard Biesheuvel ard.biesheu...@linaro.org The crypto emulation code in target-arm/crypto_helper.c never worked correctly on big endian hosts, due to the fact that it uses a union of array types to convert between the native VFP register size (64 bits) and the types used in the algorithms (bytes and 32 bit words) We cannot just swab between LE and BE when reading and writing the registers, as the SHA code performs word additions, so instead, add array accessors for the CRYPTO_STATE type whose LE and BE specific implementations ensure that the correct array elements are referenced. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org Acked-by: Laszlo Ersek ler...@redhat.com Message-id: 1420208303-24111-1-git-send-email-ard.biesheu...@linaro.org Reviewed-by: Peter Maydell peter.mayd...@linaro.org Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- target-arm/crypto_helper.c | 114 + 1 file changed, 63 insertions(+), 51 deletions(-) diff --git a/target-arm/crypto_helper.c b/target-arm/crypto_helper.c index dd60d0b..1fe975d 100644 --- a/target-arm/crypto_helper.c +++ b/target-arm/crypto_helper.c @@ -22,6 +22,14 @@ union CRYPTO_STATE { uint64_t l[2]; }; +#ifdef HOST_WORDS_BIGENDIAN +#define CR_ST_BYTE(state, i) (state.bytes[(15 - (i)) ^ 8]) +#define CR_ST_WORD(state, i) (state.words[(3 - (i)) ^ 2]) +#else +#define CR_ST_BYTE(state, i) (state.bytes[i]) +#define CR_ST_WORD(state, i) (state.words[i]) +#endif + void HELPER(crypto_aese)(CPUARMState *env, uint32_t rd, uint32_t rm, uint32_t decrypt) { @@ -46,7 +54,7 @@ void HELPER(crypto_aese)(CPUARMState *env, uint32_t rd, uint32_t rm, /* combine ShiftRows operation and sbox substitution */ for (i = 0; i 16; i++) { -st.bytes[i] = sbox[decrypt][rk.bytes[shift[decrypt][i]]]; +CR_ST_BYTE(st, i) = sbox[decrypt][CR_ST_BYTE(rk, shift[decrypt][i])]; } env-vfp.regs[rd] = make_float64(st.l[0]); @@ -198,11 +206,11 @@ void HELPER(crypto_aesmc)(CPUARMState *env, uint32_t rd, uint32_t rm, assert(decrypt 2); for (i = 0; i 16; i += 4) { -st.words[i 2] = cpu_to_le32( -mc[decrypt][st.bytes[i]] ^ -rol32(mc[decrypt][st.bytes[i + 1]], 8) ^ -rol32(mc[decrypt][st.bytes[i + 2]], 16) ^ -rol32(mc[decrypt][st.bytes[i + 3]], 24)); +CR_ST_WORD(st, i 2) = +mc[decrypt][CR_ST_BYTE(st, i)] ^ +rol32(mc[decrypt][CR_ST_BYTE(st, i + 1)], 8) ^ +rol32(mc[decrypt][CR_ST_BYTE(st, i + 2)], 16) ^ +rol32(mc[decrypt][CR_ST_BYTE(st, i + 3)], 24); } env-vfp.regs[rd] = make_float64(st.l[0]); @@ -255,24 +263,25 @@ void HELPER(crypto_sha1_3reg)(CPUARMState *env, uint32_t rd, uint32_t rn, switch (op) { case 0: /* sha1c */ -t = cho(d.words[1], d.words[2], d.words[3]); +t = cho(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, 3)); break; case 1: /* sha1p */ -t = par(d.words[1], d.words[2], d.words[3]); +t = par(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, 3)); break; case 2: /* sha1m */ -t = maj(d.words[1], d.words[2], d.words[3]); +t = maj(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, 3)); break; default: g_assert_not_reached(); } -t += rol32(d.words[0], 5) + n.words[0] + m.words[i]; - -n.words[0] = d.words[3]; -d.words[3] = d.words[2]; -d.words[2] = ror32(d.words[1], 2); -d.words[1] = d.words[0]; -d.words[0] = t; +t += rol32(CR_ST_WORD(d, 0), 5) + CR_ST_WORD(n, 0) + + CR_ST_WORD(m, i); + +CR_ST_WORD(n, 0) = CR_ST_WORD(d, 3); +CR_ST_WORD(d, 3) = CR_ST_WORD(d, 2); +CR_ST_WORD(d, 2) = ror32(CR_ST_WORD(d, 1), 2); +CR_ST_WORD(d, 1) = CR_ST_WORD(d, 0); +CR_ST_WORD(d, 0) = t; } } env-vfp.regs[rd] = make_float64(d.l[0]); @@ -286,8 +295,8 @@ void HELPER(crypto_sha1h)(CPUARMState *env, uint32_t rd, uint32_t rm) float64_val(env-vfp.regs[rm + 1]) } }; -m.words[0] = ror32(m.words[0], 2); -m.words[1] = m.words[2] = m.words[3] = 0; +CR_ST_WORD(m, 0) = ror32(CR_ST_WORD(m, 0), 2); +CR_ST_WORD(m, 1) = CR_ST_WORD(m, 2) = CR_ST_WORD(m, 3) = 0; env-vfp.regs[rd] = make_float64(m.l[0]); env-vfp.regs[rd + 1] = make_float64(m.l[1]); @@ -304,10 +313,10 @@ void HELPER(crypto_sha1su1)(CPUARMState *env, uint32_t rd, uint32_t rm) float64_val(env-vfp.regs[rm + 1]) } }; -d.words[0] = rol32(d.words[0] ^ m.words[1], 1); -d.words[1] = rol32(d.words[1] ^ m.words[2], 1); -d.words[2] = rol32(d.words[2] ^ m.words[3], 1); -d.words[3] = rol32(d.words[3] ^ d.words[0],
[Qemu-devel] [PULL 0/2] target-arm queue
A short queue, but I don't want to sit on these fixes any longer. -- PMM The following changes since commit e68cba36360a2ab5bf0576b66df4d0eb0d822f8d: Merge remote-tracking branch 'remotes/amit-migration/tags/mig-2.3-1' into staging (2015-01-16 10:16:14 +) are available in the git repository at: git://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20150116 for you to fetch changes up to 36b62ae6a58f9a588fd33be9386e18a2b90103f5: fw_cfg: fix endianness in fw_cfg_data_mem_read() / _write() (2015-01-16 11:54:30 +) target-arm queue: * fix endianness handling in fwcfg wide registers * fix broken crypto insn emulation on big endian hosts Ard Biesheuvel (1): target-arm: crypto: fix BE host support Laszlo Ersek (1): fw_cfg: fix endianness in fw_cfg_data_mem_read() / _write() hw/nvram/fw_cfg.c | 41 +++- target-arm/crypto_helper.c | 114 + 2 files changed, 70 insertions(+), 85 deletions(-)
[Qemu-devel] [PULL 2/2] fw_cfg: fix endianness in fw_cfg_data_mem_read() / _write()
From: Laszlo Ersek ler...@redhat.com (1) Let's contemplate what device endianness means, for a memory mapped device register (independently of QEMU -- that is, on physical hardware). It determines the byte order that the device will put on the data bus when the device is producing a *numerical value* for the CPU. This byte order may differ from the CPU's own byte order, therefore when software wants to consume the *numerical value*, it may have to swap the byte order first. For example, suppose we have a device that exposes in a 2-byte register the number of sheep we have to count before falling asleep. If the value is decimal 37 (0x0025), then a big endian register will produce [0x00, 0x25], while a little endian register will produce [0x25, 0x00]. If the device register is big endian, but the CPU is little endian, the numerical value will read as 0x2500 (decimal 9472), which software has to byte swap before use. However... if we ask the device about who stole our herd of sheep, and it answers XY, then the byte representation coming out of the register must be [0x58, 0x59], regardless of the device register's endianness for numeric values. And, software needs to copy these bytes into a string field regardless of the CPU's own endianness. (2) QEMU's device register accessor functions work with *numerical values* exclusively, not strings: The emulated register's read accessor function returns the numerical value (eg. 37 decimal, 0x0025) as a *host-encoded* uint64_t. QEMU translates this value for the guest to the endianness of the emulated device register (which is recorded in MemoryRegionOps.endianness). Then guest code must translate the numerical value from device register to guest CPU endianness, before including it in any computation (see (1)). (3) However, the data register of the fw_cfg device shall transfer strings *only* -- that is, opaque blobs. Interpretation of any given blob is subject to further agreement -- it can be an integer in an independently determined byte order, or a genuine string, or an array of structs of integers (in some byte order) and fixed size strings, and so on. Because register emulation in QEMU is integer-preserving, not string-preserving (see (2)), we have to jump through a few hoops. (3a) We defined the memory mapped fw_cfg data register as DEVICE_BIG_ENDIAN. The particular choice is not really relevant -- we picked BE only for consistency with the control register, which *does* transfer integers -- but our choice affects how we must host-encode values from fw_cfg strings. (3b) Since we want the fw_cfg string XY to appear as the [0x58, 0x59] array on the data register, *and* we picked DEVICE_BIG_ENDIAN, we must compose the host (== C language) value 0x5859 in the read accessor function. (3c) When the guest performs the read access, the immediate uint16_t value will be 0x5958 (in LE guests) and 0x5859 (in BE guests). However, the uint16_t value does not matter. The only thing that matters is the byte pattern [0x58, 0x59], which the guest code must copy into the target string *without* any byte-swapping. (4) Now I get to explain where I screwed up. :( When we decided for big endian *integer* representation in the MMIO data register -- see (3a) --, I mindlessly added an indiscriminate byte-swizzling step to the (little endian) guest firmware. This was a grave error -- it violates (3c) --, but I didn't realize it. I only saw that the code I otherwise intended for fw_cfg_data_mem_read(): value = 0; for (i = 0; i size; ++i) { value = (value 8) | fw_cfg_read(s); } didn't produce the expected result in the guest. In true facepalm style, instead of blaming my guest code (which violated (3c)), I blamed my host code (which was correct). Ultimately, I coded ldX_he_p() into fw_cfg_data_mem_read(), because that happened to work. Obviously (...in retrospect) that was wrong. Only because my host happened to be LE, ldX_he_p() composed the (otherwise incorrect) host value 0x5958 from the fw_cfg string XY. And that happened to compensate for the bogus indiscriminate byte-swizzling in my guest code. Clearly the current code leaks the host endianness through to the guest, which is wrong. Any device should work the same regardless of host endianness. The solution is to compose the host-endian representation (2) of the big endian interpretation (3a, 3b) of the fw_cfg string, and to drop the wrong byte-swizzling in the guest (3c). Brown paper bag time for me. Signed-off-by: Laszlo Ersek ler...@redhat.com Message-id: 1420024880-15416-1-git-send-email-ler...@redhat.com Reviewed-by: Peter Maydell peter.mayd...@linaro.org Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- hw/nvram/fw_cfg.c | 41 +++-- 1 file changed, 7 insertions(+), 34 deletions(-) diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c index fcdf821..78a37be 100644 --- a/hw/nvram/fw_cfg.c +++ b/hw/nvram/fw_cfg.c @@ -287,51 +287,24 @@ static
Re: [Qemu-devel] Multi GPU passthrough via VFIO
Hi Alex, Maik Broemme mbroe...@parallels.com wrote: Hi Alex, Maik Broemme mbroe...@parallels.com wrote: Hi Alex, Alex Williamson alex.william...@redhat.com wrote: On Fri, 2014-02-14 at 01:01 +0100, Maik Broemme wrote: Hi Alex, Maik Broemme mbroe...@parallels.com wrote: Hi Alex, Alex Williamson alex.william...@redhat.com wrote: On Fri, 2014-02-07 at 01:22 +0100, Maik Broemme wrote: Interesting is the diff between 1st and 2nd boot, so if I do the lspci prior to the booting. The only difference between 1st start and 2nd start are: --- 001-lspci.290x.before.1st.log 2014-02-07 01:13:41.498827928 +0100 +++ 004-lspci.290x.before.2nd.log 2014-02-07 01:16:50.966611282 +0100 @@ -24,7 +24,7 @@ ClockPM- Surprise- LLActRep- BwNot- LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- - LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- + LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis- @@ -33,13 +33,13 @@ LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+ - Address: Data: + Address: fee0 Data: Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 ? Capabilities: [150 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- - CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr- + CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn- Capabilities: [270 v1] #19 After that if I do suspend-to-ram / resume trick I have again lspci output from before 1st boot. The Link Status change after X is stopped seems the most interesting to me. The MSI change is probably explained by the MSI save/restore of the device, but should be harmless since MSI is disabled. I'm a bit surprised the Correctable Error Status in the AER capability didn't get cleared. I would have thought that a bus reset would have caused the link to retrain back to the original speed/width as well. Let's check that we're actually getting a bus reset, try this in addition to the previous qemu patch. This just enables debug logging for the bus resest function. Thanks, Below are the outputs from 2 boots, VGA, load fglrx and start X. (2nd time X gets killed and oops happened) - 1st boot: vfio: vfio_pci_hot_reset(:01:00.1) multi vfio: :01:00.1: hot reset dependent devices: vfio: :01:00.0 group 1 vfio: :01:00.1 group 1 vfio: :01:00.1 hot reset: Success vfio: vfio_pci_hot_reset(:01:00.1) one vfio: :01:00.1: hot reset dependent devices: vfio: :01:00.0 group 1 vfio: vfio: found another in-use device :01:00.0 vfio: vfio_pci_hot_reset(:01:00.0) one vfio: :01:00.0: hot reset dependent devices: vfio: :01:00.0 group 1 vfio: :01:00.1 group 1 vfio: vfio: found another in-use device :01:00.1 - 2nd boot: vfio: vfio_pci_hot_reset(:01:00.1) multi vfio: :01:00.1: hot reset dependent devices: vfio: :01:00.0 group 1 vfio: :01:00.1 group 1 vfio: :01:00.1 hot reset: Success vfio: vfio_pci_hot_reset(:01:00.1) one vfio: :01:00.1: hot reset dependent devices: vfio: :01:00.0 group 1 vfio: vfio: found another in-use device :01:00.0 vfio:
Re: [Qemu-devel] [PULL 0/2] target-arm queue
On 16 January 2015 at 12:05, Peter Maydell peter.mayd...@linaro.org wrote: A short queue, but I don't want to sit on these fixes any longer. -- PMM The following changes since commit e68cba36360a2ab5bf0576b66df4d0eb0d822f8d: Merge remote-tracking branch 'remotes/amit-migration/tags/mig-2.3-1' into staging (2015-01-16 10:16:14 +) are available in the git repository at: git://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20150116 for you to fetch changes up to 36b62ae6a58f9a588fd33be9386e18a2b90103f5: fw_cfg: fix endianness in fw_cfg_data_mem_read() / _write() (2015-01-16 11:54:30 +) target-arm queue: * fix endianness handling in fwcfg wide registers * fix broken crypto insn emulation on big endian hosts Applied, thanks. -- PMM
Re: [Qemu-devel] [PATCH 7/9] block-migration: remove not needed iothread lock
On 09.01.2015 00:24, John Snow wrote: On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote: Instead of locking iothread, we can just swap these calls. So, if some write to our range occures before resetting the bitmap, then it will get into subsequent aio read, becouse it occures, in any case, after resetting the bitmap. s/occures/occurs/g s/becouse/because/g (I hope you are not annoyed by the spelling corrections: They are in good faith and personally I would hope people would correct any of my spelling mistakes before it goes in the git log!) Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com --- block-migration.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/block-migration.c b/block-migration.c index d0c825f..908a66d 100644 --- a/block-migration.c +++ b/block-migration.c @@ -315,13 +315,11 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds) block_mig_state.submitted++; blk_mig_unlock(); -qemu_mutex_lock_iothread(); +bdrv_reset_dirty_bitmap(bs, bmds-dirty_bitmap, cur_sector, nr_sectors); + blk-aiocb = bdrv_aio_readv(bs, cur_sector, blk-qiov, nr_sectors, blk_mig_read_cb, blk); -bdrv_reset_dirty_bitmap(bs, bmds-dirty_bitmap, cur_sector, nr_sectors); -qemu_mutex_unlock_iothread(); - bmds-cur_sector = cur_sector + nr_sectors; return (bmds-cur_sector = total_sectors); } OK, so the justification here is that by ordering it as reset, read that any writes that occur after the reset will be simply included in the read, and we won't have reset any bits that we didn't actually get a chance to read. OK. But what about losing the ability to clear bits that are set needlessly? e.g.: Sector 1 is dirty. Sector 2 is clean. We reset the bitmap. All sectors are clean. A write occurs to sector 2, marking it dirty. We read sectors one and two. Sector two is now erroneously marked dirty, when it isn't. It's not a data integrity problem, but we're swapping one inefficiency for another. Do you have a justification for why this is better than the lock? I think without lock is always better. Lock-less operations are usually better, aren't they? So, I didn't think about such kind of efficiency, but only about that this lock is needless. If we don't swap these reset and read, lock is necessary of course. Furthermore, If we have a lock, this write (to sector 2) will happen after unlock, and the sector will be dirty anyway. So, this efficiency is the same, but we can remove lock.
Re: [Qemu-devel] [PATCH 7/9] block-migration: remove not needed iothread lock
Best regards, Vladimir On 09.01.2015 01:28, Paolo Bonzini wrote: On 11/12/2014 15:17, Vladimir Sementsov-Ogievskiy wrote: -qemu_mutex_lock_iothread(); +bdrv_reset_dirty_bitmap(bs, bmds-dirty_bitmap, cur_sector, nr_sectors); + blk-aiocb = bdrv_aio_readv(bs, cur_sector, blk-qiov, nr_sectors, blk_mig_read_cb, blk); -bdrv_reset_dirty_bitmap(bs, bmds-dirty_bitmap, cur_sector, nr_sectors); -qemu_mutex_unlock_iothread(); - bdrv_aio_readv is not thread safe, so it needs the iothread lock. bdrv_reset_dirty_bitmap is also not thread safe, because bmds-dirty_bitmap is accessed concurrently by bdrv_reset_dirty (discard) and bdrv_set_dirty (write). So it needs the iothread lock too. Moving bdrv_reset_dirty_bitmap before the read is a good idea. Paolo Ok.
Re: [Qemu-devel] [PATCH 0/8] RCUification of the memory API, part 1
Am 13.01.2015 um 18:52 schrieb Paolo Bonzini: These are the minimal changes to adopt RCU and use it in memory_region_find (and hence in virtio-blk-dataplane). Looks big, but two thirds of it is documentation and tests. Please review! :) Really Nice. This gives me a boost from 1000MB/sec to 1400MB/sec on my fio test. (4k block random reads on null block devices). perf shows me that pthread_mutex_lock and kernel spin locking contention (I have a 32cpu host) goes away. Christian Paolo Jan Kiszka (1): memory: remove assertion on memory_region_destroy Paolo Bonzini (7): tls: require compiler support for __thread rcu: add rcu library rcu: add rcutorture rcu: allow nesting of rcu_read_lock/rcu_read_unlock rcu: add call_rcu memory: protect current_map by RCU memory: avoid ref/unref in memory_region_find configure | 9 +- docs/rcu.txt | 388 +++ exec.c| 2 +- hw/9pfs/virtio-9p-synth.c | 1 + include/exec/memory.h | 5 + include/qemu/atomic.h | 61 +++ include/qemu/queue.h | 13 ++ include/qemu/rcu.h| 155 include/qemu/thread.h | 3 - include/qemu/tls.h| 52 -- include/qom/cpu.h | 4 +- memory.c | 60 +++ tests/Makefile| 11 +- tests/rcutorture.c| 449 ++ tests/test-tls.c | 83 + util/Makefile.objs| 1 + util/rcu.c| 290 ++ 17 files changed, 1485 insertions(+), 102 deletions(-) create mode 100644 docs/rcu.txt create mode 100644 include/qemu/rcu.h delete mode 100644 include/qemu/tls.h create mode 100644 tests/rcutorture.c create mode 100644 tests/test-tls.c create mode 100644 util/rcu.c
Re: [Qemu-devel] [PATCH v2] qemu-iotests: Fix supported_oses check
On Fri, Jan 16, 2015 at 09:38:42AM +0800, Fam Zheng wrote: There is a bug in the recently added sys.platform test, and we no longer run python tests, because linux2 is the value to compare here. So do a prefix match. According to python doc [1], the way to use sys.platform is unless you want to test for a specific system version, it is therefore recommended to use the following idiom: if sys.platform.startswith('freebsd'): # FreeBSD-specific code here... elif sys.platform.startswith('linux'): # Linux-specific code here... [1]: https://docs.python.org/2.7/library/sys.html#sys.platform Signed-off-by: Fam Zheng f...@redhat.com --- v2: Don't use any(). Explain why prefix match is fine. (Thanks, Stefan) --- tests/qemu-iotests/iotests.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Thanks, applied to my block tree: https://github.com/stefanha/qemu/commits/block Stefan pgpMWYxjO6qs6.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH v3 0/2] buildsys: Fix and enable module build
On Fri, Jan 16, 2015 at 09:54:05AM +0800, Fam Zheng wrote: v3: Fix commit message, error message and function name. (Stefan) Fam Zheng (2): configure: Default to enable module build .travis.yml: Add --disable-modules .travis.yml | 3 ++ configure | 95 ++--- 2 files changed, 69 insertions(+), 29 deletions(-) -- 1.9.3 Reviewed-by: Stefan Hajnoczi stefa...@redhat.com pgpdChbdwfbPl.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH] qtest: Fix deadloop by running main loop AIO context's timers
On Fri, Jan 16, 2015 at 04:01:17PM +0800, Fam Zheng wrote: qemu_clock_run_timers() only takes care of main_loop_tlg, we shouldn't forget aio timer list groups. Currently, the qemu_clock_deadline_ns_all (a few lines above) counts all the timergroups of this clock type, including aio tlg, but we don't fire them, so they are never cleared, which makes a dead loop. For example, this function hangs when trying to drive throttled block request queue with qtest clock_step. Signed-off-by: Fam Zheng f...@redhat.com --- cpus.c | 1 + 1 file changed, 1 insertion(+) Reviewed-by: Stefan Hajnoczi stefa...@redhat.com pgp4EqiLM0cJe.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH] Add 'setup' phase to docs for query-migrate QMP command
On 12/12/2014 07:14 AM, Alex Bligh wrote: The QMP command 'query-migrate' returns the state 'setup' during the setup phase. This patch documents it. Signed-off-by: Alex Bligh a...@alex.org.uk --- qmp-commands.hx | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) Reviewed-by: Eric Blake ebl...@redhat.com -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH 1/2] tcg-aarch64: user doesn't need R/W access to exec
On 13 January 2015 at 15:48, Andrew Jones drjo...@redhat.com wrote: Table D4-32 shows that execute access from EL0 doesn't depend on AP[1]. This commit message is a bit sparse, which confused me for a bit. It would be worth beefing it up a bit: target-arm: 64-bit EL0 code can execute from unreadable pages In AArch64 mode, a page can be executable even if it is not readable (a difference from AArch32). Instead of bailing out early if the page is not readable, just add 32 bit and page not readable to the list of conditions that make a page non-executable, and check whether the protections and the access type are compatible once at the end of the function. Signed-off-by: Andrew Jones drjo...@redhat.com --- target-arm/helper.c | 27 --- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git a/target-arm/helper.c b/target-arm/helper.c index 3ef0f1f38eda5..7c30a2669a0f2 100644 --- a/target-arm/helper.c +++ b/target-arm/helper.c @@ -4787,7 +4787,7 @@ static int get_phys_addr_lpae(CPUARMState *env, target_ulong address, hwaddr descaddr, descmask; uint32_t tableattrs; target_ulong page_size; -uint32_t attrs; +uint32_t attrs, ap; int32_t granule_sz = 9; int32_t va_size = 32; int32_t tbi = 0; @@ -4952,14 +4952,20 @@ static int get_phys_addr_lpae(CPUARMState *env, target_ulong address, /* Access flag */ goto do_fault; } + fault_type = permission_fault; -if (is_user !(attrs (1 4))) { -/* Unprivileged access not enabled */ -goto do_fault; +ap = extract32(attrs, 4, 2); /* AP[2:1] */ + +*prot = 0; +if (!is_user || (ap 1)) { +*prot |= PAGE_READ; +*prot |= !(ap 2) ? PAGE_WRITE : 0; Personally I would find if (!(ap 2)) { *prot |= PAGE_WRITE; } clearer. } -*prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC; + +*prot |= PAGE_EXEC; if ((arm_feature(env, ARM_FEATURE_V8) is_user (attrs (1 12))) || (!arm_feature(env, ARM_FEATURE_V8) (attrs (1 12))) || +(!arm_el_is_aa64(env, 1) is_user !(ap 1)) || (!is_user (attrs (1 11 { /* XN/UXN or PXN. Since we only implement EL0/EL1 we unconditionally * treat XN/UXN as UXN for v8. @@ -4969,12 +4975,11 @@ static int get_phys_addr_lpae(CPUARMState *env, target_ulong address, } There is a if access_type == 2 goto do_fault check just above this hunk which you can delete, because we're now doing that check in the code you add below. *prot = ~PAGE_EXEC; } -if (attrs (1 5)) { -/* Write access forbidden */ -if (access_type == 1) { -goto do_fault; -} -*prot = ~PAGE_WRITE; + +if ((*prot == 0) +|| (!(*prot PAGE_WRITE) access_type == 1) +|| (!(*prot PAGE_EXEC) access_type == 2)) { +goto do_fault; Why isn't this just if (!(*prot (1 access_type))) { ? (Or at least, why doesn't it treat PAGE_READ the same way as the other two bits?) As it is I think we'll treat a page that is marked exec-not-readable as if it were readable. thanks -- PMM
Re: [Qemu-devel] [PATCH 2/4] console: add opengl rendering helper functions
On 2015-01-16 at 04:21, Gerd Hoffmann wrote: Hi, +bool console_gl_check_format(DisplayChangeListener *dcl, + pixman_format_code_t format) +{ +switch (format) { +case PIXMAN_x8r8g8b8: +case PIXMAN_a8r8g8b8: +case PIXMAN_r5g6b5: +return true; +default: +return false; +} +} What is this function supposed to be used for? Inflight change, there is a patch series on the list adding a check_format callback to DisplayChangeListenerOps, for format negotiation. OK. + +void surface_gl_create_texture(DisplaySurface *surface) +{ +switch (surface-format) { +case PIXMAN_x8r8g8b8: +case PIXMAN_a8r8g8b8: +surface-glformat = GL_BGRA; Why does the format code seem to imply ARGB order but you're using the exact opposite? I could imagine something like endianness being the reason, but you're using RGB below where the format code says exactly that. endianness indeed. pixman formats are native endian, so this actually is bgra ordering in memory. While looking at it: I guess GL_BGRA is fixed byte ordering? Yes, I think so. So this probably needs fixing to work correctly on bigendian machines ... Probably, right. Discarding the alpha channel is intentional, I suppose? Yes. + surface_width(surface), + surface_height(surface), + 0, surface-glformat, surface-gltype, + surface_data(surface)); Is surface_stride(surface) specified to be surface_width(surface) * bytes_per_pixel (I don't know pixman so I don't know)? Usually this is the case, but there can be exceptions. Hmm, can I explicitly pass the stride? The internet™ tells me about glPixelStorei(GL_UNPACK_ROW_LENGTH, $pixels_per_row) and glPixelStorei(GL_UNPACK_ALIGNMENT, $row_alignment_in_bytes) (which need to be called before using glTexImage2D() or glTexSubImage2D()). You may want to use it for surface_gl_update_texture(), too (if that isn't too much work). Max
Re: [Qemu-devel] [PATCH v11 03/13] qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-remove
On 2015-01-12 at 11:30, John Snow wrote: From: Fam Zheng f...@redhat.com The new command pair is added to manage user created dirty bitmap. The dirty bitmap's name is mandatory and must be unique for the same device, but different devices can have bitmaps with the same names. The granularity is an optional field. If it is not specified, we will choose a default granularity based on the cluster size if available, clamped to between 4K and 64K to mirror how the 'mirror' code was already choosing granularity. If we do not have cluster size info available, we choose 64K. This code has been factored out into a helper shared with block/mirror. This patch also introduces the 'block_dirty_bitmap_lookup' helper, which takes a device name and a dirty bitmap name and validates the lookup, returning NULL and setting errp if there is a problem with either field. This helper will be re-used in future patches in this series. The types added to block-core.json will be re-used in future patches in this series, see: 'qapi: Add transaction support to block-dirty-bitmap-{add, enable, disable}' Signed-off-by: Fam Zheng f...@redhat.com Signed-off-by: John Snow js...@redhat.com --- block.c | 20 ++ block/mirror.c| 10 + blockdev.c| 100 ++ include/block/block.h | 1 + qapi/block-core.json | 55 +++ qmp-commands.hx | 51 + 6 files changed, 228 insertions(+), 9 deletions(-) diff --git a/block.c b/block.c index bfeae6b..3eb77ee 100644 --- a/block.c +++ b/block.c @@ -5417,6 +5417,26 @@ int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, int64_t sector } } +/** + * Chooses a default granularity based on the existing cluster size, + * but clamped between [4K, 64K]. Defaults to 64K in the case that there + * is no cluster size information available. + */ +uint64_t bdrv_get_default_bitmap_granularity(BlockDriverState *bs) +{ +BlockDriverInfo bdi; +uint64_t granularity; + +if (bdrv_get_info(bs, bdi) = 0 bdi.cluster_size != 0) { +granularity = MAX(4096, bdi.cluster_size); +granularity = MIN(65536, granularity); +} else { +granularity = 65536; +} + +return granularity; +} + void bdrv_dirty_iter_init(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, HBitmapIter *hbi) { diff --git a/block/mirror.c b/block/mirror.c index d819952..fc545f1 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -667,15 +667,7 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target, MirrorBlockJob *s; if (granularity == 0) { -/* Choose the default granularity based on the target file's cluster - * size, clamped between 4k and 64k. */ -BlockDriverInfo bdi; -if (bdrv_get_info(target, bdi) = 0 bdi.cluster_size != 0) { -granularity = MAX(4096, bdi.cluster_size); -granularity = MIN(65536, granularity); -} else { -granularity = 65536; -} +granularity = bdrv_get_default_bitmap_granularity(target); } assert ((granularity (granularity - 1)) == 0); diff --git a/blockdev.c b/blockdev.c index 5651a8e..95251c7 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1173,6 +1173,48 @@ out_aio_context: return NULL; } +/** + * Return a dirty bitmap (if present), after validating + * the node reference and bitmap names. Returns NULL on error, + * including when the BDS and/or bitmap is not found. + */ +static BdrvDirtyBitmap *block_dirty_bitmap_lookup(const char *node_ref, + const char *name, + BlockDriverState **pbs, + Error **errp) +{ +BlockDriverState *bs; +BdrvDirtyBitmap *bitmap; + +if (!node_ref) { +error_setg(errp, Node reference cannot be NULL); +return NULL; +} +if (!name) { +error_setg(errp, Bitmap name cannot be NULL); +return NULL; +} + +bs = bdrv_lookup_bs(node_ref, node_ref, errp); +if (!bs) { +error_setg(errp, Node reference '%s' not found, node_ref); No need to throw the (hopefully) perfectly fine Error code returned by bdrv_lookup_bs() away. +return NULL; +} + +/* If caller provided a BDS*, provide the result of that lookup, too. */ +if (pbs) { +*pbs = bs; +} + +bitmap = bdrv_find_dirty_bitmap(bs, name); +if (!bitmap) { +error_setg(errp, Dirty bitmap not found: %s, name); I'd use Dirty bitmap '%s' not found, because foo: bar in an error message looks like foo because bar to me. But it's up to you, dirty bitmap names most likely don't look like error reasons so nobody should be able to confuse it. So with the error_setg() in the fail path for bdrv_lookup_bs() removed:
[Qemu-devel] [PULL 09/16] block/dmg: fix sector data offset calculation
From: Peter Wu pe...@lekensteyn.nl This patch addresses two issues: - The data fork offset was not taken into account, resulting in failure to read an InstallESD.dmg file (5164763151 bytes) which had a non-zero DataForkOffset field. - The offset of the previous block (partition) was unconditionally added to the current block because older files would start the input offset of a new block at zero. Newer files (including vlc-2.1.5.dmg, tuxpaint-0.9.15-macosx.dmg and OS X Yosemite [MAS].dmg) failed in reads because these files have chunk offsets, relative to the begin of a data fork. Now the data offset of the mish is taken into account. While we could check that the data_offset is within the data fork, let's not do that here as it would only result in parse failures on invalid files (rather than gracefully handling such bad files). dmg_read will error out if the offset is incorrect. Signed-off-by: Peter Wu pe...@lekensteyn.nl Reviewed-by: John Snow js...@redhat.com Message-id: 1420566495-13284-9-git-send-email-pe...@lekensteyn.nl Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- block/dmg.c | 26 -- 1 file changed, 20 insertions(+), 6 deletions(-) diff --git a/block/dmg.c b/block/dmg.c index 3f8f1cc..65fc7fe 100644 --- a/block/dmg.c +++ b/block/dmg.c @@ -186,7 +186,7 @@ static int64_t dmg_find_koly_offset(BlockDriverState *file_bs, Error **errp) typedef struct DmgHeaderState { /* used internally by dmg_read_mish_block to remember offsets of blocks * across calls */ -uint64_t last_in_offset; +uint64_t data_fork_offset; uint64_t last_out_offset; /* exported for dmg_open */ uint32_t max_compressed_size; @@ -201,6 +201,8 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds, size_t new_size; uint32_t chunk_count; int64_t offset = 0; +uint64_t data_offset; +uint64_t in_offset = ds-data_fork_offset; type = buff_read_uint32(buffer, offset); /* skip data that is not a valid MISH block (invalid magic or too small) */ @@ -209,8 +211,12 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds, return 0; } -offset += 4; -offset += 200; +/* location in data fork for (compressed) blob (in bytes) */ +data_offset = buff_read_uint64(buffer, offset + 0x18); +in_offset += data_offset; + +/* move to begin of chunk entries */ +offset += 204; chunk_count = (count - 204) / 40; new_size = sizeof(uint64_t) * (s-n_chunks + chunk_count); @@ -226,7 +232,6 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds, if (s-types[i] != 0x8005 s-types[i] != 1 s-types[i] != 2) { if (s-types[i] == 0x i 0) { -ds-last_in_offset = s-offsets[i - 1] + s-lengths[i - 1]; ds-last_out_offset = s-sectors[i - 1] + s-sectorcounts[i - 1]; } @@ -253,7 +258,7 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds, } s-offsets[i] = buff_read_uint64(buffer, offset); -s-offsets[i] += ds-last_in_offset; +s-offsets[i] += in_offset; offset += 8; s-lengths[i] = buff_read_uint64(buffer, offset); @@ -411,7 +416,7 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags, s-n_chunks = 0; s-offsets = s-lengths = s-sectors = s-sectorcounts = NULL; /* used by dmg_read_mish_block to keep track of the current I/O position */ -ds.last_in_offset = 0; +ds.data_fork_offset = 0; ds.last_out_offset = 0; ds.max_compressed_size = 1; ds.max_sectors_per_chunk = 1; @@ -423,6 +428,15 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags, goto fail; } +/* offset of data fork (DataForkOffset) */ +ret = read_uint64(bs, offset + 0x18, ds.data_fork_offset); +if (ret 0) { +goto fail; +} else if (ds.data_fork_offset offset) { +ret = -EINVAL; +goto fail; +} + /* offset of resource fork (RsrcForkOffset) */ ret = read_uint64(bs, offset + 0x28, rsrc_fork_offset); if (ret 0) { -- 2.1.0
Re: [Qemu-devel] [v3 4/5] Qemu-Xen-vTPM: Qemu vTPM xenstubdoms backen.
-Original Message- From: Stefan Berger [mailto:stef...@linux.vnet.ibm.com] Sent: Thursday, January 15, 2015 11:49 PM To: Xu, Quan; qemu-devel@nongnu.org Cc: stefano.stabell...@eu.citrix.com; xen-de...@lists.xen.org Subject: Re: [Qemu-devel] [v3 4/5] Qemu-Xen-vTPM: Qemu vTPM xenstubdoms backen. On 12/30/2014 06:03 PM, Quan Xu wrote: This Patch provides the glue for the TPM_TIS(Qemu frontend) to Xen stubdom vTPM domain that provides the actual TPM functionality. It sends data and TPM commends with xen_vtpm_frontend. It is similar as another two vTPM backens: *vTPM passthrough backen Since QEMU 1.5. *vTPM libtpms-based backen. Some details: This part of the patch provides support for the spawning of a thread that will interact with stubdom vTPM domain by the xen_vtpm_frontend. It expects a signal from the frontend to wake and pick up the TPM command that is supposed to be processed and delivers the response packet using a callback function provided by the frontend. The backend connects itself to the frontend by filling out an interface structure with pointers to the function implementing support for various operations. (QEMU) vTPM XenStubdoms backen is initialized by Qemu command line options, -tpmdev xenstubdoms,id=xenvtpm0 -device tpm-tis,tpmdev=xenvtpm0 --Changes in v3: -Call vtpm_send() and vtpm_recv() directly. Signed-off-by: Quan Xu quan...@intel.com --- hw/tpm/Makefile.objs | 2 +- hw/tpm/tpm_xenstubdoms.c | 245 +++ 2 files changed, 246 insertions(+), 1 deletion(-) create mode 100644 hw/tpm/tpm_xenstubdoms.c diff --git a/hw/tpm/Makefile.objs b/hw/tpm/Makefile.objs index 57919fa..190e776 100644 --- a/hw/tpm/Makefile.objs +++ b/hw/tpm/Makefile.objs @@ -1,3 +1,3 @@ common-obj-$(CONFIG_TPM_TIS) += tpm_tis.o common-obj-$(CONFIG_TPM_PASSTHROUGH) += tpm_passthrough.o -common-obj-$(CONFIG_TPM_XENSTUBDOMS) += xen_vtpm_frontend.o +common-obj-$(CONFIG_TPM_XENSTUBDOMS) += tpm_xenstubdoms.o +xen_vtpm_frontend.o diff --git a/hw/tpm/tpm_xenstubdoms.c b/hw/tpm/tpm_xenstubdoms.c new file mode 100644 index 000..98ea496 --- /dev/null +++ b/hw/tpm/tpm_xenstubdoms.c @@ -0,0 +1,245 @@ +/* + * Xen Stubdom vTPM driver + * + * Copyright (c) 2014 Intel Corporation + * Authors: + *Quan Xu quan...@intel.com + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see +http://www.gnu.org/licenses/ */ + +#include dirent.h +#include qemu-common.h +#include qapi/error.h +#include qemu/sockets.h +#include qemu/log.h +#include sysemu/tpm_backend.h +#include tpm_int.h +#include hw/hw.h +#include hw/i386/pc.h +#include hw/xen/xen_backend.h +#include sysemu/tpm_backend_int.h +#include tpm_tis.h + +#ifdef DEBUG_TPM +#define DPRINTF(fmt, ...) \ +do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0) #else +#define DPRINTF(fmt, ...) \ +do { } while (0) +#endif + +#define TYPE_TPM_XENSTUBDOMS tpm-xenstubdoms +#define TPM_XENSTUBDOMS(obj) \ +OBJECT_CHECK(TPMXenstubdomsState, (obj), TYPE_TPM_XENSTUBDOMS) + +static const TPMDriverOps tpm_xenstubdoms_driver; + +/* data structures */ +typedef struct TPMXenstubdomsThreadParams { +TPMState *tpm_state; +TPMRecvDataCB *recv_data_callback; +TPMBackend *tb; +} TPMXenstubdomsThreadParams; + +struct TPMXenstubdomsState { +TPMBackend parent; +TPMBackendThread tbt; +TPMXenstubdomsThreadParams tpm_thread_params; +bool had_startup_error; +}; + +typedef struct TPMXenstubdomsState TPMXenstubdomsState; + +/* functions */ + +static void tpm_xenstubdoms_cancel_cmd(TPMBackend *tb); + +static int tpm_xenstubdoms_unix_transfer(const TPMLocality +*locty_data) { +size_t rlen; +struct XenDevice *xendev; + +xendev = xen_be_find_xendev(vtpm, xen_domid, 0); +if (xendev == NULL) { +xen_be_printf(xendev, 0, Con not find vtpm device\n); +return -1; +} +vtpm_send(xendev, locty_data-w_buffer.buffer, locty_data-w_offset); +vtpm_recv(xendev, locty_data-r_buffer.buffer, rlen); +return 0; +} + +static void tpm_xenstubdoms_worker_thread(gpointer data, + gpointer
[Qemu-devel] [PULL 00/16] Block patches
The following changes since commit df58887b20fab8fe8a6dcca4db30cd4e4077d53a: Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2015-01-15' into staging (2015-01-15 10:08:46 +) are available in the git repository at: git://github.com/stefanha/qemu.git tags/block-pull-request for you to fetch changes up to d6e2630f17fa40342af80d20c04f279fa23ea189: qemu-iotests: Fix supported_oses check (2015-01-16 15:36:34 +) Fam Zheng (1): qemu-iotests: Fix supported_oses check Francesco Romani (1): block: add event when disk usage exceeds threshold Peter Wu (12): block/dmg: properly detect the UDIF trailer block/dmg: extract mish block decoding functionality block/dmg: extract processing of resource forks block/dmg: process a buffer instead of reading ints block/dmg: validate chunk size to avoid overflow block/dmg: process XML plists block/dmg: set virtual size to a non-zero value block/dmg: fix sector data offset calculation block/dmg: use SectorNumber from BLKX header block/dmg: factor out block type check block/dmg: support bzip2 block entry types block/dmg: improve zeroes handling Stefan Hajnoczi (2): qed: check for header size overflow qemu-iotests: add 116 invalid QED input file tests block/Makefile.objs | 2 + block/dmg.c | 501 ++-- block/qapi.c| 3 + block/qed.c | 5 + block/write-threshold.c | 125 ++ configure | 31 +++ include/block/block_int.h | 4 + include/block/write-threshold.h | 64 + qapi/block-core.json| 51 +++- qmp-commands.hx | 32 +++ tests/Makefile | 3 + tests/qemu-iotests/067.out | 5 + tests/qemu-iotests/116 | 96 tests/qemu-iotests/116.out | 37 +++ tests/qemu-iotests/group| 1 + tests/qemu-iotests/iotests.py | 2 +- tests/test-write-threshold.c| 119 ++ 17 files changed, 962 insertions(+), 119 deletions(-) create mode 100644 block/write-threshold.c create mode 100644 include/block/write-threshold.h create mode 100755 tests/qemu-iotests/116 create mode 100644 tests/qemu-iotests/116.out create mode 100644 tests/test-write-threshold.c -- 2.1.0
[Qemu-devel] [PULL 02/16] block/dmg: properly detect the UDIF trailer
From: Peter Wu pe...@lekensteyn.nl DMG files have a variable length with a UDIF trailer at the end of a file. This UDIF trailer is essential as it describes the contents of the image. At the moment however, the start of this trailer is almost always incorrect as bdrv_getlength() returns a multiple of the block size (rounded up). This results in a failure to recognize DMG files, resulting in Invalid argument (EINVAL) errors. As there is no API to retrieve the real file size, look for the magic header in the last two sectors to find the start of this 512-byte UDIF trailer (the koly block). The resource fork offset (info_begin) has its offset adjusted as the initial value of offset does not mean end of file anymore, but begin of UDIF trailer. [Replaced error_set(errp, ERROR_CLASS_GENERIC_ERROR, ...) with error_setg(errp, ...) as discussed with Peter. --Stefan] Signed-off-by: Peter Wu pe...@lekensteyn.nl Reviewed-by: John Snow js...@redhat.com Reviewed-by: Stefan Hajnoczi stefa...@redhat.com Message-id: 1420566495-13284-2-git-send-email-pe...@lekensteyn.nl Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- block/dmg.c | 47 +++ 1 file changed, 43 insertions(+), 4 deletions(-) diff --git a/block/dmg.c b/block/dmg.c index e455886..cdad28f 100644 --- a/block/dmg.c +++ b/block/dmg.c @@ -131,6 +131,46 @@ static void update_max_chunk_size(BDRVDMGState *s, uint32_t chunk, } } +static int64_t dmg_find_koly_offset(BlockDriverState *file_bs, Error **errp) +{ +int64_t length; +int64_t offset = 0; +uint8_t buffer[515]; +int i, ret; + +/* bdrv_getlength returns a multiple of block size (512), rounded up. Since + * dmg images can have odd sizes, try to look for the koly magic which + * marks the begin of the UDIF trailer (512 bytes). This magic can be found + * in the last 511 bytes of the second-last sector or the first 4 bytes of + * the last sector (search space: 515 bytes) */ +length = bdrv_getlength(file_bs); +if (length 0) { +error_setg_errno(errp, -length, +Failed to get file size while reading UDIF trailer); +return length; +} else if (length 512) { +error_setg(errp, dmg file must be at least 512 bytes long); +return -EINVAL; +} +if (length 511 + 512) { +offset = length - 511 - 512; +} +length = length 515 ? length : 515; +ret = bdrv_pread(file_bs, offset, buffer, length); +if (ret 0) { +error_setg_errno(errp, -ret, Failed while reading UDIF trailer); +return ret; +} +for (i = 0; i length - 3; i++) { +if (buffer[i] == 'k' buffer[i+1] == 'o' +buffer[i+2] == 'l' buffer[i+3] == 'y') { +return offset + i; +} +} +error_setg(errp, Could not locate UDIF trailer in dmg file); +return -EINVAL; +} + static int dmg_open(BlockDriverState *bs, QDict *options, int flags, Error **errp) { @@ -145,15 +185,14 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags, s-n_chunks = 0; s-offsets = s-lengths = s-sectors = s-sectorcounts = NULL; -/* read offset of info blocks */ -offset = bdrv_getlength(bs-file); +/* locate the UDIF trailer */ +offset = dmg_find_koly_offset(bs-file, errp); if (offset 0) { ret = offset; goto fail; } -offset -= 0x1d8; -ret = read_uint64(bs, offset, info_begin); +ret = read_uint64(bs, offset + 0x28, info_begin); if (ret 0) { goto fail; } else if (info_begin == 0) { -- 2.1.0
[Qemu-devel] [PULL 01/16] block: add event when disk usage exceeds threshold
From: Francesco Romani from...@redhat.com Managing applications, like oVirt (http://www.ovirt.org), make extensive use of thin-provisioned disk images. To let the guest run smoothly and be not unnecessarily paused, oVirt sets a disk usage threshold (so called 'high water mark') based on the occupation of the device, and automatically extends the image once the threshold is reached or exceeded. In order to detect the crossing of the threshold, oVirt has no choice but aggressively polling the QEMU monitor using the query-blockstats command. This lead to unnecessary system load, and is made even worse under scale: deployments with hundreds of VMs are no longer rare. To fix this, this patch adds: * A new monitor command `block-set-write-threshold', to set a mark for a given block device. * A new event `BLOCK_WRITE_THRESHOLD', to report if a block device usage exceeds the threshold. * A new `write_threshold' field into the `BlockDeviceInfo' structure, to report the configured threshold. This will allow the managing application to use smarter and more efficient monitoring, greatly reducing the need of polling. [Updated qemu-iotests 067 output to add the new 'write_threshold' property. --Stefan] Signed-off-by: Francesco Romani from...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com Message-id: 1421068273-692-1-git-send-email-from...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- block/Makefile.objs | 1 + block/qapi.c| 3 + block/write-threshold.c | 125 include/block/block_int.h | 4 ++ include/block/write-threshold.h | 64 qapi/block-core.json| 51 +++- qmp-commands.hx | 32 ++ tests/Makefile | 3 + tests/qemu-iotests/067.out | 5 ++ tests/test-write-threshold.c| 119 ++ 10 files changed, 406 insertions(+), 1 deletion(-) create mode 100644 block/write-threshold.c create mode 100644 include/block/write-threshold.h create mode 100644 tests/test-write-threshold.c diff --git a/block/Makefile.objs b/block/Makefile.objs index 04b0e43..010afad 100644 --- a/block/Makefile.objs +++ b/block/Makefile.objs @@ -20,6 +20,7 @@ block-obj-$(CONFIG_GLUSTERFS) += gluster.o block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o block-obj-$(CONFIG_LIBSSH2) += ssh.o block-obj-y += accounting.o +block-obj-y += write-threshold.o common-obj-y += stream.o common-obj-y += commit.o diff --git a/block/qapi.c b/block/qapi.c index a6fd6f7..03abaab 100644 --- a/block/qapi.c +++ b/block/qapi.c @@ -24,6 +24,7 @@ #include block/qapi.h #include block/block_int.h +#include block/write-threshold.h #include qmp-commands.h #include qapi-visit.h #include qapi/qmp-output-visitor.h @@ -89,6 +90,8 @@ BlockDeviceInfo *bdrv_block_device_info(BlockDriverState *bs) info-iops_size = cfg.op_size; } +info-write_threshold = bdrv_write_threshold_get(bs); + return info; } diff --git a/block/write-threshold.c b/block/write-threshold.c new file mode 100644 index 000..c2cd517 --- /dev/null +++ b/block/write-threshold.c @@ -0,0 +1,125 @@ +/* + * QEMU System Emulator block write threshold notification + * + * Copyright Red Hat, Inc. 2014 + * + * Authors: + * Francesco Romani from...@redhat.com + * + * This work is licensed under the terms of the GNU LGPL, version 2 or later. + * See the COPYING.LIB file in the top-level directory. + */ + +#include block/block_int.h +#include block/coroutine.h +#include block/write-threshold.h +#include qemu/notify.h +#include qapi-event.h +#include qmp-commands.h + + +uint64_t bdrv_write_threshold_get(const BlockDriverState *bs) +{ +return bs-write_threshold_offset; +} + +bool bdrv_write_threshold_is_set(const BlockDriverState *bs) +{ +return bs-write_threshold_offset 0; +} + +static void write_threshold_disable(BlockDriverState *bs) +{ +if (bdrv_write_threshold_is_set(bs)) { +notifier_with_return_remove(bs-write_threshold_notifier); +bs-write_threshold_offset = 0; +} +} + +uint64_t bdrv_write_threshold_exceeded(const BlockDriverState *bs, + const BdrvTrackedRequest *req) +{ +if (bdrv_write_threshold_is_set(bs)) { +if (req-offset bs-write_threshold_offset) { +return (req-offset - bs-write_threshold_offset) + req-bytes; +} +if ((req-offset + req-bytes) bs-write_threshold_offset) { +return (req-offset + req-bytes) - bs-write_threshold_offset; +} +} +return 0; +} + +static int coroutine_fn before_write_notify(NotifierWithReturn *notifier, +void *opaque) +{ +BdrvTrackedRequest *req = opaque; +BlockDriverState *bs = req-bs; +uint64_t amount = 0; + +amount = bdrv_write_threshold_exceeded(bs, req); +if (amount 0) { +
[Qemu-devel] [PULL 03/16] block/dmg: extract mish block decoding functionality
From: Peter Wu pe...@lekensteyn.nl Extract the mish block decoder such that this can be used for other formats in the future. A new DmgHeaderState struct is introduced to share state while decoding. The code is kept unchanged as much as possible, a fail label is added for example where a simple return would probably do. In dmg_open, the variable tmp is renamed to rsrc_data_offset for clarity and comments have been added explaining various data. Note that this patch has one subtle difference with the previous version which should not affect functionality. In the previous code, the end of a resource was inferred from the mish block (the offsets would be increased by the fields). In this patch, the resource length is used instead to avoid the need to rely on the previous offsets. Signed-off-by: Peter Wu pe...@lekensteyn.nl Reviewed-by: John Snow js...@redhat.com Reviewed-by: Stefan Hajnoczi stefa...@redhat.com Message-id: 1420566495-13284-3-git-send-email-pe...@lekensteyn.nl Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- block/dmg.c | 228 +++- 1 file changed, 133 insertions(+), 95 deletions(-) diff --git a/block/dmg.c b/block/dmg.c index cdad28f..c571ac9 100644 --- a/block/dmg.c +++ b/block/dmg.c @@ -171,19 +171,138 @@ static int64_t dmg_find_koly_offset(BlockDriverState *file_bs, Error **errp) return -EINVAL; } +/* used when building the sector table */ +typedef struct DmgHeaderState { +/* used internally by dmg_read_mish_block to remember offsets of blocks + * across calls */ +uint64_t last_in_offset; +uint64_t last_out_offset; +/* exported for dmg_open */ +uint32_t max_compressed_size; +uint32_t max_sectors_per_chunk; +} DmgHeaderState; + +static int dmg_read_mish_block(BlockDriverState *bs, DmgHeaderState *ds, + int64_t offset, uint32_t count) +{ +BDRVDMGState *s = bs-opaque; +uint32_t type, i; +int ret; +size_t new_size; +uint32_t chunk_count; + +ret = read_uint32(bs, offset, type); +if (ret 0) { +goto fail; +} + +/* skip data that is not a valid MISH block (invalid magic or too small) */ +if (type != 0x6d697368 || count 244) { +/* assume success for now */ +return 0; +} + +offset += 4; +offset += 200; + +chunk_count = (count - 204) / 40; +new_size = sizeof(uint64_t) * (s-n_chunks + chunk_count); +s-types = g_realloc(s-types, new_size / 2); +s-offsets = g_realloc(s-offsets, new_size); +s-lengths = g_realloc(s-lengths, new_size); +s-sectors = g_realloc(s-sectors, new_size); +s-sectorcounts = g_realloc(s-sectorcounts, new_size); + +for (i = s-n_chunks; i s-n_chunks + chunk_count; i++) { +ret = read_uint32(bs, offset, s-types[i]); +if (ret 0) { +goto fail; +} +offset += 4; +if (s-types[i] != 0x8005 s-types[i] != 1 +s-types[i] != 2) { +if (s-types[i] == 0x i 0) { +ds-last_in_offset = s-offsets[i - 1] + s-lengths[i - 1]; +ds-last_out_offset = s-sectors[i - 1] + + s-sectorcounts[i - 1]; +} +chunk_count--; +i--; +offset += 36; +continue; +} +offset += 4; + +ret = read_uint64(bs, offset, s-sectors[i]); +if (ret 0) { +goto fail; +} +s-sectors[i] += ds-last_out_offset; +offset += 8; + +ret = read_uint64(bs, offset, s-sectorcounts[i]); +if (ret 0) { +goto fail; +} +offset += 8; + +if (s-sectorcounts[i] DMG_SECTORCOUNTS_MAX) { +error_report(sector count % PRIu64 for chunk % PRIu32 + is larger than max (%u), + s-sectorcounts[i], i, DMG_SECTORCOUNTS_MAX); +ret = -EINVAL; +goto fail; +} + +ret = read_uint64(bs, offset, s-offsets[i]); +if (ret 0) { +goto fail; +} +s-offsets[i] += ds-last_in_offset; +offset += 8; + +ret = read_uint64(bs, offset, s-lengths[i]); +if (ret 0) { +goto fail; +} +offset += 8; + +if (s-lengths[i] DMG_LENGTHS_MAX) { +error_report(length % PRIu64 for chunk % PRIu32 + is larger than max (%u), + s-lengths[i], i, DMG_LENGTHS_MAX); +ret = -EINVAL; +goto fail; +} + +update_max_chunk_size(s, i, ds-max_compressed_size, + ds-max_sectors_per_chunk); +} +s-n_chunks += chunk_count; +return 0; + +fail: +return ret; +} + static int dmg_open(BlockDriverState *bs, QDict *options, int flags, Error **errp) { BDRVDMGState *s = bs-opaque; -uint64_t info_begin,
[Qemu-devel] [PULL 14/16] qed: check for header size overflow
Header size is denoted in clusters. The maximum cluster size is 64 MB but there is no limit on header size. Check for uint32_t overflow in case the header size field has a whacky value. Signed-off-by: Stefan Hajnoczi stefa...@redhat.com Message-id: 1421065893-18875-2-git-send-email-stefa...@redhat.com Reviewed-by: Kevin Wolf kw...@redhat.com --- block/qed.c | 5 + 1 file changed, 5 insertions(+) diff --git a/block/qed.c b/block/qed.c index 80f18d8..892b13c 100644 --- a/block/qed.c +++ b/block/qed.c @@ -440,6 +440,11 @@ static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags, s-l2_mask = s-table_nelems - 1; s-l1_shift = s-l2_shift + ffs(s-table_nelems) - 1; +/* Header size calculation must not overflow uint32_t */ +if (s-header.header_size UINT32_MAX / s-header.cluster_size) { +return -EINVAL; +} + if ((s-header.features QED_F_BACKING_FILE)) { if ((uint64_t)s-header.backing_filename_offset + s-header.backing_filename_size -- 2.1.0
[Qemu-devel] [PULL 06/16] block/dmg: validate chunk size to avoid overflow
From: Peter Wu pe...@lekensteyn.nl Previously the chunk size was not checked, allowing for a large memory allocation. This patch checks whether the chunks size is within the resource fork length, and whether the resource fork is below the trailer of the dmg file. Signed-off-by: Peter Wu pe...@lekensteyn.nl Reviewed-by: John Snow js...@redhat.com Message-id: 1420566495-13284-6-git-send-email-pe...@lekensteyn.nl Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- block/dmg.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/block/dmg.c b/block/dmg.c index 4f56227..5c2c2c2 100644 --- a/block/dmg.c +++ b/block/dmg.c @@ -317,7 +317,7 @@ static int dmg_read_resource_fork(BlockDriverState *bs, DmgHeaderState *ds, ret = read_uint32(bs, offset, count); if (ret 0) { goto fail; -} else if (count == 0) { +} else if (count == 0 || count info_end - offset) { ret = -EINVAL; goto fail; } @@ -377,6 +377,11 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags, if (ret 0) { goto fail; } +if (rsrc_fork_offset = offset || +rsrc_fork_length offset - rsrc_fork_offset) { +ret = -EINVAL; +goto fail; +} if (rsrc_fork_length != 0) { ret = dmg_read_resource_fork(bs, ds, rsrc_fork_offset, rsrc_fork_length); -- 2.1.0
[Qemu-devel] [PULL 04/16] block/dmg: extract processing of resource forks
From: Peter Wu pe...@lekensteyn.nl Besides the offset, also read the resource length. This length is now used in the extracted function to verify the end of the resource fork against count from the resource fork. Instead of relying on the value of offset to conclude whether the resource fork is available or not (info_begin==0), check the rsrc_fork_length instead. This would allow a dmg file to begin with a resource fork. This seemingly unnecessary restriction was found while trying to craft a DMG file by hand. Other changes: - Do not require resource data offset to be 0x100 (but check that it is within bounds though). - Further improve boundary checking (resource data must be within the resource fork). - Use correct value for resource data length (spotted by John Snow) - Consider the resource data offset when determining info_end. This fixes an EINVAL on the tuxpaint dmg example. The resource fork format is documented at https://developer.apple.com/legacy/library/documentation/mac/pdf/MoreMacintoshToolbox.pdf#page=151 Signed-off-by: Peter Wu pe...@lekensteyn.nl Reviewed-by: John Snow js...@redhat.com Message-id: 1420566495-13284-4-git-send-email-pe...@lekensteyn.nl Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- block/dmg.c | 104 ++-- 1 file changed, 66 insertions(+), 38 deletions(-) diff --git a/block/dmg.c b/block/dmg.c index c571ac9..04bae72 100644 --- a/block/dmg.c +++ b/block/dmg.c @@ -285,60 +285,38 @@ fail: return ret; } -static int dmg_open(BlockDriverState *bs, QDict *options, int flags, -Error **errp) +static int dmg_read_resource_fork(BlockDriverState *bs, DmgHeaderState *ds, + uint64_t info_begin, uint64_t info_length) { -BDRVDMGState *s = bs-opaque; -DmgHeaderState ds; -uint64_t info_begin, info_end; -uint32_t count, rsrc_data_offset; -int64_t offset; int ret; +uint32_t count, rsrc_data_offset; +uint64_t info_end; +uint64_t offset; -bs-read_only = 1; -s-n_chunks = 0; -s-offsets = s-lengths = s-sectors = s-sectorcounts = NULL; -/* used by dmg_read_mish_block to keep track of the current I/O position */ -ds.last_in_offset = 0; -ds.last_out_offset = 0; -ds.max_compressed_size = 1; -ds.max_sectors_per_chunk = 1; - -/* locate the UDIF trailer */ -offset = dmg_find_koly_offset(bs-file, errp); -if (offset 0) { -ret = offset; -goto fail; -} - -ret = read_uint64(bs, offset + 0x28, info_begin); -if (ret 0) { -goto fail; -} else if (info_begin == 0) { -ret = -EINVAL; -goto fail; -} - +/* read offset from begin of resource fork (info_begin) to resource data */ ret = read_uint32(bs, info_begin, rsrc_data_offset); if (ret 0) { goto fail; -} else if (rsrc_data_offset != 0x100) { +} else if (rsrc_data_offset info_length) { ret = -EINVAL; goto fail; } -ret = read_uint32(bs, info_begin + 4, count); +/* read length of resource data */ +ret = read_uint32(bs, info_begin + 8, count); if (ret 0) { goto fail; -} else if (count == 0) { +} else if (count == 0 || rsrc_data_offset + count info_length) { ret = -EINVAL; goto fail; } -/* end of resource data, ignoring the following resource map */ -info_end = info_begin + count; /* begin of resource data (consisting of one or more resources) */ -offset = info_begin + 0x100; +offset = info_begin + rsrc_data_offset; + +/* end of resource data (there is possibly a following resource map + * which will be ignored). */ +info_end = offset + count; /* read offsets (mish blocks) from one or more resources in resource data */ while (offset info_end) { @@ -352,13 +330,63 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags, } offset += 4; -ret = dmg_read_mish_block(bs, ds, offset, count); +ret = dmg_read_mish_block(bs, ds, offset, count); if (ret 0) { goto fail; } /* advance offset by size of resource */ offset += count; } +return 0; + +fail: +return ret; +} + +static int dmg_open(BlockDriverState *bs, QDict *options, int flags, +Error **errp) +{ +BDRVDMGState *s = bs-opaque; +DmgHeaderState ds; +uint64_t rsrc_fork_offset, rsrc_fork_length; +int64_t offset; +int ret; + +bs-read_only = 1; +s-n_chunks = 0; +s-offsets = s-lengths = s-sectors = s-sectorcounts = NULL; +/* used by dmg_read_mish_block to keep track of the current I/O position */ +ds.last_in_offset = 0; +ds.last_out_offset = 0; +ds.max_compressed_size = 1; +ds.max_sectors_per_chunk = 1; + +/* locate the UDIF trailer */ +offset = dmg_find_koly_offset(bs-file, errp); +if (offset
[Qemu-devel] [PULL 12/16] block/dmg: support bzip2 block entry types
From: Peter Wu pe...@lekensteyn.nl This patch adds support for bzip2-compressed block entries as introduced with OS X 10.4 (source: https://en.wikipedia.org/wiki/Apple_Disk_Image). It was tested against a 5.2G OS X Yosemite installation image which stores the BLXX block in the XML property list (instead of resource forks) and has over 5k chunks. New configure entries are added (--enable-bzip2 / --disable-bzip2) to control inclusion of bzip2 functionality (which requires linking against libbz2). The help message suggests that this option is needed for DMG files, but the tests are generic enough that other parts of QEMU can use bzip2 if needed. The identifiers are based on http://newosxbook.com/DMG.html. The decompression routines are based on the zlib case, but as there is no way to reset the decompression state (unlike zlib), memory is allocated and deallocated for every decompression. This should not be problematic as the decompression takes most of the time and as blocks are typically about/over 1 MiB in size, only one allocation is done every 2000 sectors. Signed-off-by: Peter Wu pe...@lekensteyn.nl Reviewed-by: John Snow js...@redhat.com Acked-by: Paolo Bonzini pbonz...@redhat.com Message-id: 1420566495-13284-12-git-send-email-pe...@lekensteyn.nl Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- block/Makefile.objs | 1 + block/dmg.c | 43 ++- configure | 31 +++ 3 files changed, 74 insertions(+), 1 deletion(-) diff --git a/block/Makefile.objs b/block/Makefile.objs index 010afad..db2933e 100644 --- a/block/Makefile.objs +++ b/block/Makefile.objs @@ -37,5 +37,6 @@ gluster.o-libs := $(GLUSTERFS_LIBS) ssh.o-cflags := $(LIBSSH2_CFLAGS) ssh.o-libs := $(LIBSSH2_LIBS) archipelago.o-libs := $(ARCHIPELAGO_LIBS) +dmg.o-libs := $(BZIP2_LIBS) qcow.o-libs:= -lz linux-aio.o-libs := -laio diff --git a/block/dmg.c b/block/dmg.c index 2bb7078..196c746 100644 --- a/block/dmg.c +++ b/block/dmg.c @@ -26,6 +26,9 @@ #include qemu/bswap.h #include qemu/module.h #include zlib.h +#ifdef CONFIG_BZIP2 +#include bzlib.h +#endif #include glib.h enum { @@ -56,6 +59,9 @@ typedef struct BDRVDMGState { uint8_t *compressed_chunk; uint8_t *uncompressed_chunk; z_stream zstream; +#ifdef CONFIG_BZIP2 +bz_stream bzstream; +#endif } BDRVDMGState; static int dmg_probe(const uint8_t *buf, int buf_size, const char *filename) @@ -123,6 +129,7 @@ static void update_max_chunk_size(BDRVDMGState *s, uint32_t chunk, switch (s-types[chunk]) { case 0x8005: /* zlib compressed */ +case 0x8006: /* bzip2 compressed */ compressed_size = s-lengths[chunk]; uncompressed_sectors = s-sectorcounts[chunk]; break; @@ -198,6 +205,9 @@ static bool dmg_is_known_block_type(uint32_t entry_type) case 0x0001:/* uncompressed */ case 0x0002:/* zeroes */ case 0x8005:/* zlib */ +#ifdef CONFIG_BZIP2 +case 0x8006:/* bzip2 */ +#endif return true; default: return false; @@ -563,13 +573,16 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num) if (!is_sector_in_chunk(s, s-current_chunk, sector_num)) { int ret; uint32_t chunk = search_chunk(s, sector_num); +#ifdef CONFIG_BZIP2 +uint64_t total_out; +#endif if (chunk = s-n_chunks) { return -1; } s-current_chunk = s-n_chunks; -switch (s-types[chunk]) { +switch (s-types[chunk]) { /* block entry type */ case 0x8005: { /* zlib compressed */ /* we need to buffer, because only the chunk as whole can be * inflated. */ @@ -593,6 +606,34 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num) return -1; } break; } +#ifdef CONFIG_BZIP2 +case 0x8006: /* bzip2 compressed */ +/* we need to buffer, because only the chunk as whole can be + * inflated. */ +ret = bdrv_pread(bs-file, s-offsets[chunk], + s-compressed_chunk, s-lengths[chunk]); +if (ret != s-lengths[chunk]) { +return -1; +} + +ret = BZ2_bzDecompressInit(s-bzstream, 0, 0); +if (ret != BZ_OK) { +return -1; +} +s-bzstream.next_in = (char *)s-compressed_chunk; +s-bzstream.avail_in = (unsigned int) s-lengths[chunk]; +s-bzstream.next_out = (char *)s-uncompressed_chunk; +s-bzstream.avail_out = (unsigned int) 512 * s-sectorcounts[chunk]; +ret = BZ2_bzDecompress(s-bzstream); +total_out = ((uint64_t)s-bzstream.total_out_hi32 32) + +s-bzstream.total_out_lo32; +BZ2_bzDecompressEnd(s-bzstream); +if (ret
[Qemu-devel] [PULL 07/16] block/dmg: process XML plists
From: Peter Wu pe...@lekensteyn.nl The format is simple enough to avoid using a full-blown XML parser. It assumes that all BLKX items begin with the mish magic word, therefore it is not a problem if other values get matched which are not a BLKX block. The offsets are based on the description at http://newosxbook.com/DMG.html Signed-off-by: Peter Wu pe...@lekensteyn.nl Reviewed-by: John Snow js...@redhat.com Message-id: 1420566495-13284-7-git-send-email-pe...@lekensteyn.nl Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- block/dmg.c | 74 + 1 file changed, 74 insertions(+) diff --git a/block/dmg.c b/block/dmg.c index 5c2c2c2..d4a7520 100644 --- a/block/dmg.c +++ b/block/dmg.c @@ -26,6 +26,7 @@ #include qemu/bswap.h #include qemu/module.h #include zlib.h +#include glib.h enum { /* Limit chunk sizes to prevent unreasonable amounts of memory being used @@ -343,12 +344,66 @@ fail: return ret; } +static int dmg_read_plist_xml(BlockDriverState *bs, DmgHeaderState *ds, + uint64_t info_begin, uint64_t info_length) +{ +BDRVDMGState *s = bs-opaque; +int ret; +uint8_t *buffer = NULL; +char *data_begin, *data_end; + +/* Have at least some length to avoid NULL for g_malloc. Attempt to set a + * safe upper cap on the data length. A test sample had a XML length of + * about 1 MiB. */ +if (info_length == 0 || info_length 16 * 1024 * 1024) { +ret = -EINVAL; +goto fail; +} + +buffer = g_malloc(info_length + 1); +buffer[info_length] = '\0'; +ret = bdrv_pread(bs-file, info_begin, buffer, info_length); +if (ret != info_length) { +ret = -EINVAL; +goto fail; +} + +/* look for data.../data. The data is 284 (0x11c) bytes after base64 + * decode. The actual data element has 431 (0x1af) bytes which includes tabs + * and line feeds. */ +data_end = (char *)buffer; +while ((data_begin = strstr(data_end, data)) != NULL) { +gsize out_len = 0; + +data_begin += 6; +data_end = strstr(data_begin, /data); +/* malformed XML? */ +if (data_end == NULL) { +ret = -EINVAL; +goto fail; +} +*data_end++ = '\0'; +g_base64_decode_inplace(data_begin, out_len); +ret = dmg_read_mish_block(s, ds, (uint8_t *)data_begin, + (uint32_t)out_len); +if (ret 0) { +goto fail; +} +} +ret = 0; + +fail: +g_free(buffer); +return ret; +} + static int dmg_open(BlockDriverState *bs, QDict *options, int flags, Error **errp) { BDRVDMGState *s = bs-opaque; DmgHeaderState ds; uint64_t rsrc_fork_offset, rsrc_fork_length; +uint64_t plist_xml_offset, plist_xml_length; int64_t offset; int ret; @@ -382,12 +437,31 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags, ret = -EINVAL; goto fail; } +/* offset of property list (XMLOffset) */ +ret = read_uint64(bs, offset + 0xd8, plist_xml_offset); +if (ret 0) { +goto fail; +} +ret = read_uint64(bs, offset + 0xe0, plist_xml_length); +if (ret 0) { +goto fail; +} +if (plist_xml_offset = offset || +plist_xml_length offset - plist_xml_offset) { +ret = -EINVAL; +goto fail; +} if (rsrc_fork_length != 0) { ret = dmg_read_resource_fork(bs, ds, rsrc_fork_offset, rsrc_fork_length); if (ret 0) { goto fail; } +} else if (plist_xml_length != 0) { +ret = dmg_read_plist_xml(bs, ds, plist_xml_offset, plist_xml_length); +if (ret 0) { +goto fail; +} } else { ret = -EINVAL; goto fail; -- 2.1.0
[Qemu-devel] [PULL 08/16] block/dmg: set virtual size to a non-zero value
From: Peter Wu pe...@lekensteyn.nl Right now the virtual size is always reported as zero which makes it impossible to convert between formats. After this patch, the number of sectors will be read from the trailer (koly block). To verify the behavior, the output of `dmg2img foo.dmg foo.img` was compared against `qemu-img convert -f dmg -O raw foo.dmg foo.raw`. The tests showed that the file contents are exactly the same, except that QEMU creates a slightly larger file (it matches the total sectors count). Signed-off-by: Peter Wu pe...@lekensteyn.nl Reviewed-by: John Snow js...@redhat.com Message-id: 1420566495-13284-8-git-send-email-pe...@lekensteyn.nl Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- block/dmg.c | 8 1 file changed, 8 insertions(+) diff --git a/block/dmg.c b/block/dmg.c index d4a7520..3f8f1cc 100644 --- a/block/dmg.c +++ b/block/dmg.c @@ -451,6 +451,14 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags, ret = -EINVAL; goto fail; } +ret = read_uint64(bs, offset + 0x1ec, (uint64_t *)bs-total_sectors); +if (ret 0) { +goto fail; +} +if (bs-total_sectors 0) { +ret = -EINVAL; +goto fail; +} if (rsrc_fork_length != 0) { ret = dmg_read_resource_fork(bs, ds, rsrc_fork_offset, rsrc_fork_length); -- 2.1.0
[Qemu-devel] [PULL 11/16] block/dmg: factor out block type check
From: Peter Wu pe...@lekensteyn.nl In preparation for adding bzip2 support, split the type check into a separate function. Make all offsets relative to the begin of a chunk such that it is easier to recognize the position without having to add up all offsets. Some comments are added to describe the fields. There is no functional change. Signed-off-by: Peter Wu pe...@lekensteyn.nl Reviewed-by: John Snow js...@redhat.com Message-id: 1420566495-13284-11-git-send-email-pe...@lekensteyn.nl Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- block/dmg.c | 36 +++- 1 file changed, 23 insertions(+), 13 deletions(-) diff --git a/block/dmg.c b/block/dmg.c index 0f278c8..2bb7078 100644 --- a/block/dmg.c +++ b/block/dmg.c @@ -192,6 +192,18 @@ typedef struct DmgHeaderState { uint32_t max_sectors_per_chunk; } DmgHeaderState; +static bool dmg_is_known_block_type(uint32_t entry_type) +{ +switch (entry_type) { +case 0x0001:/* uncompressed */ +case 0x0002:/* zeroes */ +case 0x8005:/* zlib */ +return true; +default: +return false; +} +} + static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds, uint8_t *buffer, uint32_t count) { @@ -231,22 +243,19 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds, for (i = s-n_chunks; i s-n_chunks + chunk_count; i++) { s-types[i] = buff_read_uint32(buffer, offset); -offset += 4; -if (s-types[i] != 0x8005 s-types[i] != 1 -s-types[i] != 2) { +if (!dmg_is_known_block_type(s-types[i])) { chunk_count--; i--; -offset += 36; +offset += 40; continue; } -offset += 4; -s-sectors[i] = buff_read_uint64(buffer, offset); +/* sector number */ +s-sectors[i] = buff_read_uint64(buffer, offset + 8); s-sectors[i] += out_offset; -offset += 8; -s-sectorcounts[i] = buff_read_uint64(buffer, offset); -offset += 8; +/* sector count */ +s-sectorcounts[i] = buff_read_uint64(buffer, offset + 0x10); if (s-sectorcounts[i] DMG_SECTORCOUNTS_MAX) { error_report(sector count % PRIu64 for chunk % PRIu32 @@ -256,12 +265,12 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds, goto fail; } -s-offsets[i] = buff_read_uint64(buffer, offset); +/* offset in (compressed) data fork */ +s-offsets[i] = buff_read_uint64(buffer, offset + 0x18); s-offsets[i] += in_offset; -offset += 8; -s-lengths[i] = buff_read_uint64(buffer, offset); -offset += 8; +/* length in (compressed) data fork */ +s-lengths[i] = buff_read_uint64(buffer, offset + 0x20); if (s-lengths[i] DMG_LENGTHS_MAX) { error_report(length % PRIu64 for chunk % PRIu32 @@ -273,6 +282,7 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds, update_max_chunk_size(s, i, ds-max_compressed_size, ds-max_sectors_per_chunk); +offset += 40; } s-n_chunks += chunk_count; return 0; -- 2.1.0
[Qemu-devel] [PULL 05/16] block/dmg: process a buffer instead of reading ints
From: Peter Wu pe...@lekensteyn.nl As the decoded plist XML is not a pointer in the file, dmg_read_mish_block must be able to process a buffer instead of a file pointer. Since the full buffer must be processed, let's change the return value again to just a success flag. Signed-off-by: Peter Wu pe...@lekensteyn.nl Reviewed-by: John Snow js...@redhat.com Message-id: 1420566495-13284-5-git-send-email-pe...@lekensteyn.nl Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- block/dmg.c | 60 ++-- 1 file changed, 30 insertions(+), 30 deletions(-) diff --git a/block/dmg.c b/block/dmg.c index 04bae72..4f56227 100644 --- a/block/dmg.c +++ b/block/dmg.c @@ -100,6 +100,16 @@ static int read_uint32(BlockDriverState *bs, int64_t offset, uint32_t *result) return 0; } +static inline uint64_t buff_read_uint64(const uint8_t *buffer, int64_t offset) +{ +return be64_to_cpu(*(uint64_t *)buffer[offset]); +} + +static inline uint32_t buff_read_uint32(const uint8_t *buffer, int64_t offset) +{ +return be32_to_cpu(*(uint32_t *)buffer[offset]); +} + /* Increase max chunk sizes, if necessary. This function is used to calculate * the buffer sizes needed for compressed/uncompressed chunk I/O. */ @@ -182,20 +192,16 @@ typedef struct DmgHeaderState { uint32_t max_sectors_per_chunk; } DmgHeaderState; -static int dmg_read_mish_block(BlockDriverState *bs, DmgHeaderState *ds, - int64_t offset, uint32_t count) +static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds, + uint8_t *buffer, uint32_t count) { -BDRVDMGState *s = bs-opaque; uint32_t type, i; int ret; size_t new_size; uint32_t chunk_count; +int64_t offset = 0; -ret = read_uint32(bs, offset, type); -if (ret 0) { -goto fail; -} - +type = buff_read_uint32(buffer, offset); /* skip data that is not a valid MISH block (invalid magic or too small) */ if (type != 0x6d697368 || count 244) { /* assume success for now */ @@ -214,10 +220,7 @@ static int dmg_read_mish_block(BlockDriverState *bs, DmgHeaderState *ds, s-sectorcounts = g_realloc(s-sectorcounts, new_size); for (i = s-n_chunks; i s-n_chunks + chunk_count; i++) { -ret = read_uint32(bs, offset, s-types[i]); -if (ret 0) { -goto fail; -} +s-types[i] = buff_read_uint32(buffer, offset); offset += 4; if (s-types[i] != 0x8005 s-types[i] != 1 s-types[i] != 2) { @@ -233,17 +236,11 @@ static int dmg_read_mish_block(BlockDriverState *bs, DmgHeaderState *ds, } offset += 4; -ret = read_uint64(bs, offset, s-sectors[i]); -if (ret 0) { -goto fail; -} +s-sectors[i] = buff_read_uint64(buffer, offset); s-sectors[i] += ds-last_out_offset; offset += 8; -ret = read_uint64(bs, offset, s-sectorcounts[i]); -if (ret 0) { -goto fail; -} +s-sectorcounts[i] = buff_read_uint64(buffer, offset); offset += 8; if (s-sectorcounts[i] DMG_SECTORCOUNTS_MAX) { @@ -254,17 +251,11 @@ static int dmg_read_mish_block(BlockDriverState *bs, DmgHeaderState *ds, goto fail; } -ret = read_uint64(bs, offset, s-offsets[i]); -if (ret 0) { -goto fail; -} +s-offsets[i] = buff_read_uint64(buffer, offset); s-offsets[i] += ds-last_in_offset; offset += 8; -ret = read_uint64(bs, offset, s-lengths[i]); -if (ret 0) { -goto fail; -} +s-lengths[i] = buff_read_uint64(buffer, offset); offset += 8; if (s-lengths[i] DMG_LENGTHS_MAX) { @@ -288,8 +279,10 @@ fail: static int dmg_read_resource_fork(BlockDriverState *bs, DmgHeaderState *ds, uint64_t info_begin, uint64_t info_length) { +BDRVDMGState *s = bs-opaque; int ret; uint32_t count, rsrc_data_offset; +uint8_t *buffer = NULL; uint64_t info_end; uint64_t offset; @@ -330,16 +323,23 @@ static int dmg_read_resource_fork(BlockDriverState *bs, DmgHeaderState *ds, } offset += 4; -ret = dmg_read_mish_block(bs, ds, offset, count); +buffer = g_realloc(buffer, count); +ret = bdrv_pread(bs-file, offset, buffer, count); +if (ret 0) { +goto fail; +} + +ret = dmg_read_mish_block(s, ds, buffer, count); if (ret 0) { goto fail; } /* advance offset by size of resource */ offset += count; } -return 0; +ret = 0; fail: +g_free(buffer); return ret; } -- 2.1.0
[Qemu-devel] [PULL 16/16] qemu-iotests: Fix supported_oses check
From: Fam Zheng f...@redhat.com There is a bug in the recently added sys.platform test, and we no longer run python tests, because linux2 is the value to compare here. So do a prefix match. According to python doc [1], the way to use sys.platform is unless you want to test for a specific system version, it is therefore recommended to use the following idiom: if sys.platform.startswith('freebsd'): # FreeBSD-specific code here... elif sys.platform.startswith('linux'): # Linux-specific code here... [1]: https://docs.python.org/2.7/library/sys.html#sys.platform Signed-off-by: Fam Zheng f...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- tests/qemu-iotests/iotests.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index 87002e0..241b5ee 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -288,7 +288,7 @@ def main(supported_fmts=[], supported_oses=['linux']): if supported_fmts and (imgfmt not in supported_fmts): notrun('not suitable for this image format: %s' % imgfmt) -if sys.platform not in supported_oses: +if True not in [sys.platform.startswith(x) for x in supported_oses]: notrun('not suitable for this OS: %s' % sys.platform) # We need to filter out the time taken from the output so that qemu-iotest -- 2.1.0
[Qemu-devel] [PULL 10/16] block/dmg: use SectorNumber from BLKX header
From: Peter Wu pe...@lekensteyn.nl Previously the sector table parsing relied on the previous offset of the DMG file. Now it uses the sector number from the BLKX header (see http://newosxbook.com/DMG.html). The implementation of dmg2img (from vu1tur) does not base the output sector on the location of the terminator (0x) either so it should be safe to drop this dependency on the previous state. (It makes somehow makes sense, a terminator should halt further processing of a block and is perhaps used to preallocate some space.) Signed-off-by: Peter Wu pe...@lekensteyn.nl Reviewed-by: John Snow js...@redhat.com Message-id: 1420566495-13284-10-git-send-email-pe...@lekensteyn.nl Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- block/dmg.c | 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/block/dmg.c b/block/dmg.c index 65fc7fe..0f278c8 100644 --- a/block/dmg.c +++ b/block/dmg.c @@ -187,7 +187,6 @@ typedef struct DmgHeaderState { /* used internally by dmg_read_mish_block to remember offsets of blocks * across calls */ uint64_t data_fork_offset; -uint64_t last_out_offset; /* exported for dmg_open */ uint32_t max_compressed_size; uint32_t max_sectors_per_chunk; @@ -203,6 +202,7 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds, int64_t offset = 0; uint64_t data_offset; uint64_t in_offset = ds-data_fork_offset; +uint64_t out_offset; type = buff_read_uint32(buffer, offset); /* skip data that is not a valid MISH block (invalid magic or too small) */ @@ -211,6 +211,9 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds, return 0; } +/* chunk offsets are relative to this sector number */ +out_offset = buff_read_uint64(buffer, offset + 8); + /* location in data fork for (compressed) blob (in bytes) */ data_offset = buff_read_uint64(buffer, offset + 0x18); in_offset += data_offset; @@ -231,10 +234,6 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds, offset += 4; if (s-types[i] != 0x8005 s-types[i] != 1 s-types[i] != 2) { -if (s-types[i] == 0x i 0) { -ds-last_out_offset = s-sectors[i - 1] + - s-sectorcounts[i - 1]; -} chunk_count--; i--; offset += 36; @@ -243,7 +242,7 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds, offset += 4; s-sectors[i] = buff_read_uint64(buffer, offset); -s-sectors[i] += ds-last_out_offset; +s-sectors[i] += out_offset; offset += 8; s-sectorcounts[i] = buff_read_uint64(buffer, offset); @@ -417,7 +416,6 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags, s-offsets = s-lengths = s-sectors = s-sectorcounts = NULL; /* used by dmg_read_mish_block to keep track of the current I/O position */ ds.data_fork_offset = 0; -ds.last_out_offset = 0; ds.max_compressed_size = 1; ds.max_sectors_per_chunk = 1; -- 2.1.0
Re: [Qemu-devel] [PATCH v11 04/13] block: Introduce bdrv_dirty_bitmap_granularity()
On 2015-01-12 at 11:30, John Snow wrote: From: Fam Zheng f...@redhat.com This returns the granularity (in bytes) of dirty bitmap, which matches the QMP interface and the existing query interface. Signed-off-by: Fam Zheng f...@redhat.com Signed-off-by: John Snow js...@redhat.com --- block.c | 9 +++-- include/block/block.h | 2 ++ 2 files changed, 9 insertions(+), 2 deletions(-) Reviewed-by: Max Reitz mre...@redhat.com
[Qemu-devel] [PULL 15/16] qemu-iotests: add 116 invalid QED input file tests
These tests exercise error code paths in the QED image format. The tests are very simple, they just prove that the error path exits cleanly. Signed-off-by: Stefan Hajnoczi stefa...@redhat.com Message-id: 1421065893-18875-3-git-send-email-stefa...@redhat.com Reviewed-by: Kevin Wolf kw...@redhat.com --- tests/qemu-iotests/116 | 96 ++ tests/qemu-iotests/116.out | 37 ++ tests/qemu-iotests/group | 1 + 3 files changed, 134 insertions(+) create mode 100755 tests/qemu-iotests/116 create mode 100644 tests/qemu-iotests/116.out diff --git a/tests/qemu-iotests/116 b/tests/qemu-iotests/116 new file mode 100755 index 000..713ed48 --- /dev/null +++ b/tests/qemu-iotests/116 @@ -0,0 +1,96 @@ +#!/bin/bash +# +# Test error code paths for invalid QED images +# +# The aim of this test is to exercise the error paths in qed_open() to ensure +# there are no crashes with invalid input files. +# +# Copyright (C) 2015 Red Hat, Inc. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. +# + +# creator +owner=stefa...@redhat.com + +seq=`basename $0` +echo QA output created by $seq + +here=`pwd` +tmp=/tmp/$$ +status=1 # failure is the default! + +_cleanup() +{ + _cleanup_test_img +} +trap _cleanup; exit \$status 0 1 2 3 15 + +# get standard environment, filters and checks +. ./common.rc +. ./common.filter + +_supported_fmt qed +_supported_proto generic +_supported_os Linux + + +size=128M + +echo +echo == truncated header cluster == +_make_test_img $size +truncate -s 512 $TEST_IMG +$QEMU_IO -f $IMGFMT -c read 0 $size $TEST_IMG 21 | _filter_qemu_io | _filter_testdir + +echo +echo == invalid header magic == +_make_test_img $size +poke_file $TEST_IMG 0 QEDX +$QEMU_IO -f $IMGFMT -c read 0 $size $TEST_IMG 21 | _filter_qemu_io | _filter_testdir + +echo +echo == invalid cluster size == +_make_test_img $size +poke_file $TEST_IMG 4 \xff\xff\xff\xff +$QEMU_IO -f $IMGFMT -c read 0 $size $TEST_IMG 21 | _filter_qemu_io | _filter_testdir + +echo +echo == invalid table size == +_make_test_img $size +poke_file $TEST_IMG 8 \xff\xff\xff\xff +$QEMU_IO -f $IMGFMT -c read 0 $size $TEST_IMG 21 | _filter_qemu_io | _filter_testdir + +echo +echo == invalid header size == +_make_test_img $size +poke_file $TEST_IMG 12 \xff\xff\xff\xff +$QEMU_IO -f $IMGFMT -c read 0 $size $TEST_IMG 21 | _filter_qemu_io | _filter_testdir + +echo +echo == invalid L1 table offset == +_make_test_img $size +poke_file $TEST_IMG 40 \xff\xff\xff\xff\xff\xff\xff\xff +$QEMU_IO -f $IMGFMT -c read 0 $size $TEST_IMG 21 | _filter_qemu_io | _filter_testdir + +echo +echo == invalid image size == +_make_test_img $size +poke_file $TEST_IMG 48 \xff\xff\xff\xff\xff\xff\xff\xff +$QEMU_IO -f $IMGFMT -c read 0 $size $TEST_IMG 21 | _filter_qemu_io | _filter_testdir + +# success, all done +echo *** done +rm -f $seq.full +status=0 diff --git a/tests/qemu-iotests/116.out b/tests/qemu-iotests/116.out new file mode 100644 index 000..b679cee --- /dev/null +++ b/tests/qemu-iotests/116.out @@ -0,0 +1,37 @@ +QA output created by 116 + +== truncated header cluster == +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 +qemu-io: can't open device TEST_DIR/t.qed: Could not open 'TEST_DIR/t.qed': Invalid argument +no file open, try 'help open' + +== invalid header magic == +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 +qemu-io: can't open device TEST_DIR/t.qed: Image not in QED format +no file open, try 'help open' + +== invalid cluster size == +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 +qemu-io: can't open device TEST_DIR/t.qed: Could not open 'TEST_DIR/t.qed': Invalid argument +no file open, try 'help open' + +== invalid table size == +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 +qemu-io: can't open device TEST_DIR/t.qed: Could not open 'TEST_DIR/t.qed': Invalid argument +no file open, try 'help open' + +== invalid header size == +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 +qemu-io: can't open device TEST_DIR/t.qed: Could not open 'TEST_DIR/t.qed': Invalid argument +no file open, try 'help open' + +== invalid L1 table offset == +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 +qemu-io: can't open device TEST_DIR/t.qed: Could not open 'TEST_DIR/t.qed': Invalid argument +no file open, try 'help open' + +== invalid image size == +Formatting
[Qemu-devel] [PULL 13/16] block/dmg: improve zeroes handling
From: Peter Wu pe...@lekensteyn.nl Disk images may contain large all-zeroes gaps (1.66k sectors or 812 MiB is seen in the real world). These blocks (type 2) do not need to be extracted into a temporary buffer, there is no need to allocate memory for these blocks nor to check its length. (For the test image, the maximum uncompressed size is 1054371 bytes, probably for a bzip2-compressed block.) Signed-off-by: Peter Wu pe...@lekensteyn.nl Reviewed-by: John Snow js...@redhat.com Message-id: 1420566495-13284-13-git-send-email-pe...@lekensteyn.nl Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- block/dmg.c | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/block/dmg.c b/block/dmg.c index 196c746..66a4b3f 100644 --- a/block/dmg.c +++ b/block/dmg.c @@ -137,7 +137,9 @@ static void update_max_chunk_size(BDRVDMGState *s, uint32_t chunk, uncompressed_sectors = (s-lengths[chunk] + 511) / 512; break; case 2: /* zero */ -uncompressed_sectors = s-sectorcounts[chunk]; +/* as the all-zeroes block may be large, it is treated specially: the + * sector is not copied from a large buffer, a simple memset is used + * instead. Therefore uncompressed_sectors does not need to be set. */ break; } @@ -267,7 +269,9 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds, /* sector count */ s-sectorcounts[i] = buff_read_uint64(buffer, offset + 0x10); -if (s-sectorcounts[i] DMG_SECTORCOUNTS_MAX) { +/* all-zeroes sector (type 2) does not need to be uncompressed and can + * therefore be unbounded. */ +if (s-types[i] != 2 s-sectorcounts[i] DMG_SECTORCOUNTS_MAX) { error_report(sector count % PRIu64 for chunk % PRIu32 is larger than max (%u), s-sectorcounts[i], i, DMG_SECTORCOUNTS_MAX); @@ -642,7 +646,8 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num) } break; case 2: /* zero */ -memset(s-uncompressed_chunk, 0, 512 * s-sectorcounts[chunk]); +/* see dmg_read, it is treated specially. No buffer needs to be + * pre-filled, the zeroes can be set directly. */ break; } s-current_chunk = chunk; @@ -661,6 +666,13 @@ static int dmg_read(BlockDriverState *bs, int64_t sector_num, if (dmg_read_chunk(bs, sector_num + i) != 0) { return -1; } +/* Special case: current chunk is all zeroes. Do not perform a memcpy as + * s-uncompressed_chunk may be too small to cover the large all-zeroes + * section. dmg_read_chunk is called to find s-current_chunk */ +if (s-types[s-current_chunk] == 2) { /* all zeroes block entry */ +memset(buf + i * 512, 0, 512); +continue; +} sector_offset_in_chunk = sector_num + i - s-sectors[s-current_chunk]; memcpy(buf + i * 512, s-uncompressed_chunk + sector_offset_in_chunk * 512, 512); -- 2.1.0
Re: [Qemu-devel] [PATCH v11 05/13] block: Add bdrv_clear_dirty_bitmap
On 2015-01-12 at 11:30, John Snow wrote: Signed-off-by: Fam Zheng f...@redhat.com Signed-off-by: John Snow js...@redhat.com --- block.c | 24 include/block/block.h | 1 + 2 files changed, 21 insertions(+), 4 deletions(-) Reviewed-by: Max Reitz mre...@redhat.com
Re: [Qemu-devel] [PATCH v11 06/13] hbitmap: add hbitmap_merge
On 2015-01-12 at 11:30, John Snow wrote: We add a bitmap merge operation to assist in error cases where we wish to combine two bitmaps together. This is algorithmically O(bits) provided HBITMAP_LEVELS remains constant. For a full bitmap on a 64bit machine: sum(bits/64^k, k, 0, HBITMAP_LEVELS) ~= 1.01587 * bits We may be able to improve running speed for particularly sparse bitmaps by using iterators, but the running time for dense maps will be worse. We present the simpler solution first, and we can refine it later if needed. Signed-off-by: John Snow js...@redhat.com --- include/qemu/hbitmap.h | 11 +++ util/hbitmap.c | 28 2 files changed, 39 insertions(+) Reviewed-by: Max Reitz mre...@redhat.com
Re: [Qemu-devel] [PATCH 2/2] tcg-arm: more instruction execution control
On 13 January 2015 at 15:48, Andrew Jones drjo...@redhat.com wrote: Cleanup XN/PXN handling in get_phys_addr_lpae, and implement all but EL2 support of the following ARMv8 sections D4.5.1 Memory access control: Access permissions for instruction execution G4.7.2 Execute-never restrictions on instruction fetching G4.7.2 matches the ARMv7 section B3.7.2 when long-descriptors are used. Signed-off-by: Andrew Jones drjo...@redhat.com --- v3: - correct logic for (is_user !(ap 1)) case - base on user doesn't need R/W access to exec patch v2: correct assert in el2 case I didn't test this with secure mode (I didn't even check if that's currently possible), but I did test all other EL10 XN controls with both tcg-arm (cortex-a15) and tcg-aarch64 (cortex-a57) by hacking up some tests with kvm-unit-tests/arm[64]. I also booted Linux (just up to looking for a rootfs) with both. target-arm/helper.c | 80 +++-- 1 file changed, 66 insertions(+), 14 deletions(-) diff --git a/target-arm/helper.c b/target-arm/helper.c index 7c30a2669a0f2..715e0f47bec7d 100644 --- a/target-arm/helper.c +++ b/target-arm/helper.c @@ -4516,6 +4516,69 @@ static inline int check_ap(CPUARMState *env, int ap, int domain_prot, } } +/* Check section/page execution permission. + * + * Returns PAGE_EXEC if execution is permitted, otherwise zero. + * + * Supports AArch64, AArch32, and v7-LPAE. Doesn't yet support + * EL2. Actually, for v7-LPAE, we'd also need to emulate Virt + * Extensions to truly have WXN/UWXN. We don't support checking + * for that yet, so we just assume we have them. If you want to know if the Virtualization Extensions are present, check feature bit ARM_FEATURE_EL2. In general this code seems to be rather short on feature bit checks -- notice that the code it is replacing was doing checks on the V8 feature bit, for instance. + * + * @env : CPUARMState + * @is_user : 0 for privileged access, 1 for user + * @ap : Access permissions (AP[2:1]) from descriptor + * @ns : (NSTable || NS) from descriptors + * @xn : ([U]XNTable || [U]XN) from descriptors + * @pxn : (PXNTable || PXN) from descriptors + */ +static int check_xn_lpae(CPUARMState *env, int is_user, int ap, + int ns, int xn, int pxn) +{ +int wxn; + +switch (arm_current_el(env)) { +case 0: If arm_current_el() is 0 but EL3 is 32 bit then the controlling environment here is the Secure PL1, which is EL3, and falling through to look at sctlr_el[1] is wrong. (SCTLR(S) is sctlr_el[3].) (Also get_phys_addr() can be called via the ats_write() functions, which means that arm_current_el() and the address translation regime we're trying to determine the answer for aren't necessarily the same...) I think we're very soon going to need to bite the bullet and make this code have a concept of the current translation regime, as the ARM ARM terms it... If you wish to avoid that (and I wouldn't object, given this patchset is snowballing in scope already) then do what get_phys_addr() does to get the relevant SCTLR: uint32_t sctlr = A32_BANKED_CURRENT_REG_GET(env, sctlr); This will work for everything we currently support: * AArch64 EL1 (with an EL0 either AArch32 or AArch64) * AArch32 EL1 * TZ for the limited case of EL3 == AArch32 (and so no EL1) +case 1: +wxn = env-cp15.sctlr_el[1] SCTLR_WXN; +if (arm_el_is_aa64(env, 1)) { +if (is_user !(ap 1)) { +wxn = 0; +} I don't understand what we're doing here: as far as I can see the ARM ARM simply defines that WXN is the bit from the relevant SCTLR, and it always applies if the page is writable. I have a feeling you're re-calculating half of is this page writable here, and then doing the other half below. It would be cleaner to pass in the 'prot' bits that have already been calculated. +pxn = pxn || ((ap 1) !(ap 2)); Very weird way to write ap == 1 ? +xn = is_user ? xn : pxn; +} else { +int uwxn = env-cp15.sctlr_el[1] SCTLR_UWXN; +pxn = pxn || (uwxn (ap 1) !(ap 2)); +xn = xn || (!is_user pxn) || (is_user !(ap 1)); +} +if (arm_is_secure(env)) { +xn = xn || (ns (env-cp15.scr_el3 SCR_SIF)); +} +break; +case 2: +/* TODO actually support EL2 */ +assert(false); + +if (arm_el_is_aa64(env, 2)) { +wxn = env-cp15.sctlr_el[2] SCTLR_WXN; +} else { +/* wxn = HSCTLR.WXN */ +} You can just write this as wxn = env-cp15.sctlr_el[2] SCTLR_WXN; because SCTLR_EL2 and HSCTLR are architecturally mapped and so we will store the state in the same env-cp15 field whether EL2 is 32 or 64 bit. +break; +case 3: +if (arm_el_is_aa64(env, 3)) { +
Re: [Qemu-devel] [PATCH v11 07/13] qmp: Add block-dirty-bitmap-enable and block-dirty-bitmap-disable
On 2015-01-12 at 11:30, John Snow wrote: From: Fam Zheng f...@redhat.com This allows to put the dirty bitmap into a disabled state where no more writes will be tracked. It will be used before backup or writing to persistent file. Signed-off-by: Fam Zheng f...@redhat.com Signed-off-by: John Snow js...@redhat.com --- block.c | 24 +++- blockdev.c| 40 include/block/block.h | 3 +++ qapi/block-core.json | 28 qmp-commands.hx | 10 ++ 5 files changed, 104 insertions(+), 1 deletion(-) diff --git a/block.c b/block.c index 7bf7079..bd4b449 100644 --- a/block.c +++ b/block.c @@ -56,6 +56,7 @@ struct BdrvDirtyBitmap { int64_t size; int64_t granularity; char *name; +bool enabled; QLIST_ENTRY(BdrvDirtyBitmap) list; }; @@ -5373,10 +5374,16 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs, bitmap-granularity = granularity; bitmap-bitmap = hbitmap_alloc(bitmap-size, ffs(sector_granularity) - 1); bitmap-name = g_strdup(name); +bitmap-enabled = true; QLIST_INSERT_HEAD(bs-dirty_bitmaps, bitmap, list); return bitmap; } +bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap) +{ +return bitmap-enabled; +} + void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap) { BdrvDirtyBitmap *bm, *next; @@ -5391,6 +5398,16 @@ void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap) } } +void bdrv_disable_dirty_bitmap(BdrvDirtyBitmap *bitmap) +{ +bitmap-enabled = false; +} + +void bdrv_enable_dirty_bitmap(BdrvDirtyBitmap *bitmap) +{ +bitmap-enabled = true; +} + BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs) { BdrvDirtyBitmap *bm; @@ -5458,7 +5475,9 @@ void bdrv_dirty_iter_init(BlockDriverState *bs, void bdrv_set_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, int64_t cur_sector, int nr_sectors) { -hbitmap_set(bitmap-bitmap, cur_sector, nr_sectors); +if (bdrv_dirty_bitmap_enabled(bitmap)) { +hbitmap_set(bitmap-bitmap, cur_sector, nr_sectors); +} Why are you checking whether the bitmap is enabled here in bdrv_set_dirty_bitmap(), but neither in bdrv_reset_dirty_bitmap() nor bdrv_clear_dirty_bitmap()? It seems consistent with the commit message which only states that disabled state means that no more writes (i.e. bdrv_set_dirty_bitmap()) will be tracked, but it still seems strange to me. } void bdrv_reset_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, @@ -5481,6 +5500,9 @@ static void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector, { BdrvDirtyBitmap *bitmap; QLIST_FOREACH(bitmap, bs-dirty_bitmaps, list) { +if (!bdrv_dirty_bitmap_enabled(bitmap)) { +continue; +} Same question as above, only this time it's about bdrv_reset_dirty(). In case the answer to the question is that's intentional: Reviewed-by: Max Reitz mre...@redhat.com
Re: [Qemu-devel] Press Inquiry: Qemu Advent Calendar (German Linux Magazin)
Sounds awesome, will have to order some copies for our office. Sorry about the earlier email not being reply-all, was on mobile email and sent it incorrectly. Matthew Hungerford Pebble Firmware Engineer On Thu, Jan 15, 2015 at 1:18 PM, Tim Schürmann i...@tim-schuermann.de wrote: Hi Stefan, if nothing went wrong the (whole) advent calendar will be on the DVD published with the Linux Magazin issue 03/2015. The release date is the 05. Feb. 2015 (in germany). Best regards, Tim Schürmann i...@tim-schuermann.de Am 15.01.2015 um 15:47 schrieb Stefan Hajnoczi: On Tue, Jan 13, 2015 at 01:35:55PM +0100, Tim Schürmann wrote: Hi Tim, In order to comply with the license on GPL images you will need to offer the source code (see links on http://qemu-advent-calendar.org/ for each image). The Ubuntu Core and Ceph sourcecodes won't fit on our DVD. So we have to provide links to the corresponding websites. Is this Ok with you? (My editor-in-chief said that this conform with the GPL.) I am not the (only) copyright holder of Ubuntu Core or Ceph, and I'm not a lawyer. You need to decide whether you are in compliance or not. My plan is to keep http://qemu-advent-calendar.org/ running in the foreseeable future so the sources should remain accessible. If you mirror the sources it would be best - that way you can continue providing source even if http://qemu-advent-calendar.org/ were to go offline. I'm CCing Matthew Hungerford at Pebble to confirm that the Pebble Smartwatch image can also be distributed. Did you got an answer already? Yes, Matthew responded. It was a Reply instead of Reply-All. I forgot to forward you his response (sorry): Date: Fri, 9 Jan 2015 11:12:52 -0800 From: Matthew Hungerford matthew.hungerf...@getpebble.com Sounds great, One of my german coworkers is a fan of Linux Magazin, and would get a kick out if it being featured there. Thanks for asking, Matthew All the best, Stefan
Re: [Qemu-devel] [PULL 00/16] Block patches
On 16 January 2015 at 15:36, Stefan Hajnoczi stefa...@redhat.com wrote: The following changes since commit df58887b20fab8fe8a6dcca4db30cd4e4077d53a: Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2015-01-15' into staging (2015-01-15 10:08:46 +) are available in the git repository at: git://github.com/stefanha/qemu.git tags/block-pull-request for you to fetch changes up to d6e2630f17fa40342af80d20c04f279fa23ea189: qemu-iotests: Fix supported_oses check (2015-01-16 15:36:34 +) Hi. I'm afraid this doesn't build on all platforms, due to dependencies on newer glib versions than our minimum requirement. OSX: CCtests/test-write-threshold.o /Users/pm215/src/qemu/tests/test-write-threshold.c:21:5: warning: implicit declaration of function 'g_assert_false' is invalid in C99 [-Wimplicit-function-declaration] g_assert_false(bdrv_write_threshold_is_set(bs)); ^ 1 warning generated. LINK tests/test-write-threshold Undefined symbols for architecture x86_64: _g_assert_false, referenced from: _test_threshold_not_set_on_init in test-write-threshold.o ld: symbol(s) not found for architecture x86_64 CentOS5: ../block/dmg.o: In function `dmg_read_plist_xml': /home/petmay01/linaro/qemu-for-merges/block/dmg.c:414: undefined reference to `g_base64_decode_inplace' -- PMM
Re: [Qemu-devel] [PATCH v11 03/13] qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-remove
On 01/16/2015 10:36 AM, Max Reitz wrote: On 2015-01-12 at 11:30, John Snow wrote: From: Fam Zheng f...@redhat.com The new command pair is added to manage user created dirty bitmap. The dirty bitmap's name is mandatory and must be unique for the same device, but different devices can have bitmaps with the same names. The granularity is an optional field. If it is not specified, we will choose a default granularity based on the cluster size if available, clamped to between 4K and 64K to mirror how the 'mirror' code was already choosing granularity. If we do not have cluster size info available, we choose 64K. This code has been factored out into a helper shared with block/mirror. This patch also introduces the 'block_dirty_bitmap_lookup' helper, which takes a device name and a dirty bitmap name and validates the lookup, returning NULL and setting errp if there is a problem with either field. This helper will be re-used in future patches in this series. The types added to block-core.json will be re-used in future patches in this series, see: 'qapi: Add transaction support to block-dirty-bitmap-{add, enable, disable}' Signed-off-by: Fam Zheng f...@redhat.com Signed-off-by: John Snow js...@redhat.com --- block.c | 20 ++ block/mirror.c| 10 + blockdev.c| 100 ++ include/block/block.h | 1 + qapi/block-core.json | 55 +++ qmp-commands.hx | 51 + 6 files changed, 228 insertions(+), 9 deletions(-) diff --git a/block.c b/block.c index bfeae6b..3eb77ee 100644 --- a/block.c +++ b/block.c @@ -5417,6 +5417,26 @@ int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, int64_t sector } } +/** + * Chooses a default granularity based on the existing cluster size, + * but clamped between [4K, 64K]. Defaults to 64K in the case that there + * is no cluster size information available. + */ +uint64_t bdrv_get_default_bitmap_granularity(BlockDriverState *bs) +{ +BlockDriverInfo bdi; +uint64_t granularity; + +if (bdrv_get_info(bs, bdi) = 0 bdi.cluster_size != 0) { +granularity = MAX(4096, bdi.cluster_size); +granularity = MIN(65536, granularity); +} else { +granularity = 65536; +} + +return granularity; +} + void bdrv_dirty_iter_init(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, HBitmapIter *hbi) { diff --git a/block/mirror.c b/block/mirror.c index d819952..fc545f1 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -667,15 +667,7 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target, MirrorBlockJob *s; if (granularity == 0) { -/* Choose the default granularity based on the target file's cluster - * size, clamped between 4k and 64k. */ -BlockDriverInfo bdi; -if (bdrv_get_info(target, bdi) = 0 bdi.cluster_size != 0) { -granularity = MAX(4096, bdi.cluster_size); -granularity = MIN(65536, granularity); -} else { -granularity = 65536; -} +granularity = bdrv_get_default_bitmap_granularity(target); } assert ((granularity (granularity - 1)) == 0); diff --git a/blockdev.c b/blockdev.c index 5651a8e..95251c7 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1173,6 +1173,48 @@ out_aio_context: return NULL; } +/** + * Return a dirty bitmap (if present), after validating + * the node reference and bitmap names. Returns NULL on error, + * including when the BDS and/or bitmap is not found. + */ +static BdrvDirtyBitmap *block_dirty_bitmap_lookup(const char *node_ref, + const char *name, + BlockDriverState **pbs, + Error **errp) +{ +BlockDriverState *bs; +BdrvDirtyBitmap *bitmap; + +if (!node_ref) { +error_setg(errp, Node reference cannot be NULL); +return NULL; +} +if (!name) { +error_setg(errp, Bitmap name cannot be NULL); +return NULL; +} + +bs = bdrv_lookup_bs(node_ref, node_ref, errp); +if (!bs) { +error_setg(errp, Node reference '%s' not found, node_ref); No need to throw the (hopefully) perfectly fine Error code returned by bdrv_lookup_bs() away. I just wanted an error message consistent with the parameter name, in this case. i.e., We couldn't find the Node reference instead of device or node name. Just trying to distinguish the fact that this is an arbitrary reference in the error message. I can still remove it, but I am curious to see what Markus thinks of the names I have chosen before I monkey with the errors too much more. +return NULL; +} + +/* If caller provided a BDS*, provide the result of that lookup, too. */ +if (pbs) { +*pbs = bs; +} + +bitmap =
Re: [Qemu-devel] [PATCH v11 03/13] qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-remove
On 2015-01-16 at 11:48, John Snow wrote: On 01/16/2015 10:36 AM, Max Reitz wrote: On 2015-01-12 at 11:30, John Snow wrote: From: Fam Zheng f...@redhat.com The new command pair is added to manage user created dirty bitmap. The dirty bitmap's name is mandatory and must be unique for the same device, but different devices can have bitmaps with the same names. The granularity is an optional field. If it is not specified, we will choose a default granularity based on the cluster size if available, clamped to between 4K and 64K to mirror how the 'mirror' code was already choosing granularity. If we do not have cluster size info available, we choose 64K. This code has been factored out into a helper shared with block/mirror. This patch also introduces the 'block_dirty_bitmap_lookup' helper, which takes a device name and a dirty bitmap name and validates the lookup, returning NULL and setting errp if there is a problem with either field. This helper will be re-used in future patches in this series. The types added to block-core.json will be re-used in future patches in this series, see: 'qapi: Add transaction support to block-dirty-bitmap-{add, enable, disable}' Signed-off-by: Fam Zheng f...@redhat.com Signed-off-by: John Snow js...@redhat.com --- block.c | 20 ++ block/mirror.c| 10 + blockdev.c| 100 ++ include/block/block.h | 1 + qapi/block-core.json | 55 +++ qmp-commands.hx | 51 + 6 files changed, 228 insertions(+), 9 deletions(-) diff --git a/block.c b/block.c index bfeae6b..3eb77ee 100644 --- a/block.c +++ b/block.c @@ -5417,6 +5417,26 @@ int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, int64_t sector } } +/** + * Chooses a default granularity based on the existing cluster size, + * but clamped between [4K, 64K]. Defaults to 64K in the case that there + * is no cluster size information available. + */ +uint64_t bdrv_get_default_bitmap_granularity(BlockDriverState *bs) +{ +BlockDriverInfo bdi; +uint64_t granularity; + +if (bdrv_get_info(bs, bdi) = 0 bdi.cluster_size != 0) { +granularity = MAX(4096, bdi.cluster_size); +granularity = MIN(65536, granularity); +} else { +granularity = 65536; +} + +return granularity; +} + void bdrv_dirty_iter_init(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, HBitmapIter *hbi) { diff --git a/block/mirror.c b/block/mirror.c index d819952..fc545f1 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -667,15 +667,7 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target, MirrorBlockJob *s; if (granularity == 0) { -/* Choose the default granularity based on the target file's cluster - * size, clamped between 4k and 64k. */ -BlockDriverInfo bdi; -if (bdrv_get_info(target, bdi) = 0 bdi.cluster_size != 0) { -granularity = MAX(4096, bdi.cluster_size); -granularity = MIN(65536, granularity); -} else { -granularity = 65536; -} +granularity = bdrv_get_default_bitmap_granularity(target); } assert ((granularity (granularity - 1)) == 0); diff --git a/blockdev.c b/blockdev.c index 5651a8e..95251c7 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1173,6 +1173,48 @@ out_aio_context: return NULL; } +/** + * Return a dirty bitmap (if present), after validating + * the node reference and bitmap names. Returns NULL on error, + * including when the BDS and/or bitmap is not found. + */ +static BdrvDirtyBitmap *block_dirty_bitmap_lookup(const char *node_ref, + const char *name, + BlockDriverState **pbs, + Error **errp) +{ +BlockDriverState *bs; +BdrvDirtyBitmap *bitmap; + +if (!node_ref) { +error_setg(errp, Node reference cannot be NULL); +return NULL; +} +if (!name) { +error_setg(errp, Bitmap name cannot be NULL); +return NULL; +} + +bs = bdrv_lookup_bs(node_ref, node_ref, errp); +if (!bs) { +error_setg(errp, Node reference '%s' not found, node_ref); No need to throw the (hopefully) perfectly fine Error code returned by bdrv_lookup_bs() away. I just wanted an error message consistent with the parameter name, in this case. i.e., We couldn't find the Node reference instead of device or node name. Just trying to distinguish the fact that this is an arbitrary reference in the error message. I can still remove it, but I am curious to see what Markus thinks of the names I have chosen before I monkey with the errors too much more. Very well then, but you should clean up the error returned by bdrv_lookup_bs() (call error_free()). Feel free to keep my R-b whichever decision you'll make (as long as
Re: [Qemu-devel] [PATCH v11 03/13] qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-remove
On 01/16/2015 11:51 AM, Max Reitz wrote: On 2015-01-16 at 11:48, John Snow wrote: On 01/16/2015 10:36 AM, Max Reitz wrote: On 2015-01-12 at 11:30, John Snow wrote: From: Fam Zheng f...@redhat.com The new command pair is added to manage user created dirty bitmap. The dirty bitmap's name is mandatory and must be unique for the same device, but different devices can have bitmaps with the same names. The granularity is an optional field. If it is not specified, we will choose a default granularity based on the cluster size if available, clamped to between 4K and 64K to mirror how the 'mirror' code was already choosing granularity. If we do not have cluster size info available, we choose 64K. This code has been factored out into a helper shared with block/mirror. This patch also introduces the 'block_dirty_bitmap_lookup' helper, which takes a device name and a dirty bitmap name and validates the lookup, returning NULL and setting errp if there is a problem with either field. This helper will be re-used in future patches in this series. The types added to block-core.json will be re-used in future patches in this series, see: 'qapi: Add transaction support to block-dirty-bitmap-{add, enable, disable}' Signed-off-by: Fam Zheng f...@redhat.com Signed-off-by: John Snow js...@redhat.com --- block.c | 20 ++ block/mirror.c| 10 + blockdev.c| 100 ++ include/block/block.h | 1 + qapi/block-core.json | 55 +++ qmp-commands.hx | 51 + 6 files changed, 228 insertions(+), 9 deletions(-) diff --git a/block.c b/block.c index bfeae6b..3eb77ee 100644 --- a/block.c +++ b/block.c @@ -5417,6 +5417,26 @@ int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, int64_t sector } } +/** + * Chooses a default granularity based on the existing cluster size, + * but clamped between [4K, 64K]. Defaults to 64K in the case that there + * is no cluster size information available. + */ +uint64_t bdrv_get_default_bitmap_granularity(BlockDriverState *bs) +{ +BlockDriverInfo bdi; +uint64_t granularity; + +if (bdrv_get_info(bs, bdi) = 0 bdi.cluster_size != 0) { +granularity = MAX(4096, bdi.cluster_size); +granularity = MIN(65536, granularity); +} else { +granularity = 65536; +} + +return granularity; +} + void bdrv_dirty_iter_init(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, HBitmapIter *hbi) { diff --git a/block/mirror.c b/block/mirror.c index d819952..fc545f1 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -667,15 +667,7 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target, MirrorBlockJob *s; if (granularity == 0) { -/* Choose the default granularity based on the target file's cluster - * size, clamped between 4k and 64k. */ -BlockDriverInfo bdi; -if (bdrv_get_info(target, bdi) = 0 bdi.cluster_size != 0) { -granularity = MAX(4096, bdi.cluster_size); -granularity = MIN(65536, granularity); -} else { -granularity = 65536; -} +granularity = bdrv_get_default_bitmap_granularity(target); } assert ((granularity (granularity - 1)) == 0); diff --git a/blockdev.c b/blockdev.c index 5651a8e..95251c7 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1173,6 +1173,48 @@ out_aio_context: return NULL; } +/** + * Return a dirty bitmap (if present), after validating + * the node reference and bitmap names. Returns NULL on error, + * including when the BDS and/or bitmap is not found. + */ +static BdrvDirtyBitmap *block_dirty_bitmap_lookup(const char *node_ref, + const char *name, + BlockDriverState **pbs, + Error **errp) +{ +BlockDriverState *bs; +BdrvDirtyBitmap *bitmap; + +if (!node_ref) { +error_setg(errp, Node reference cannot be NULL); +return NULL; +} +if (!name) { +error_setg(errp, Bitmap name cannot be NULL); +return NULL; +} + +bs = bdrv_lookup_bs(node_ref, node_ref, errp); +if (!bs) { +error_setg(errp, Node reference '%s' not found, node_ref); No need to throw the (hopefully) perfectly fine Error code returned by bdrv_lookup_bs() away. I just wanted an error message consistent with the parameter name, in this case. i.e., We couldn't find the Node reference instead of device or node name. Just trying to distinguish the fact that this is an arbitrary reference in the error message. I can still remove it, but I am curious to see what Markus thinks of the names I have chosen before I monkey with the errors too much more. Very well then, but you should clean up the error returned by bdrv_lookup_bs() (call error_free()). Feel free to keep my R-b whichever
Re: [Qemu-devel] [PATCH v11 07/13] qmp: Add block-dirty-bitmap-enable and block-dirty-bitmap-disable
On 01/16/2015 11:28 AM, Max Reitz wrote: On 2015-01-12 at 11:30, John Snow wrote: From: Fam Zheng f...@redhat.com This allows to put the dirty bitmap into a disabled state where no more writes will be tracked. It will be used before backup or writing to persistent file. Signed-off-by: Fam Zheng f...@redhat.com Signed-off-by: John Snow js...@redhat.com --- block.c | 24 +++- blockdev.c| 40 include/block/block.h | 3 +++ qapi/block-core.json | 28 qmp-commands.hx | 10 ++ 5 files changed, 104 insertions(+), 1 deletion(-) diff --git a/block.c b/block.c index 7bf7079..bd4b449 100644 --- a/block.c +++ b/block.c @@ -56,6 +56,7 @@ struct BdrvDirtyBitmap { int64_t size; int64_t granularity; char *name; +bool enabled; QLIST_ENTRY(BdrvDirtyBitmap) list; }; @@ -5373,10 +5374,16 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs, bitmap-granularity = granularity; bitmap-bitmap = hbitmap_alloc(bitmap-size, ffs(sector_granularity) - 1); bitmap-name = g_strdup(name); +bitmap-enabled = true; QLIST_INSERT_HEAD(bs-dirty_bitmaps, bitmap, list); return bitmap; } +bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap) +{ +return bitmap-enabled; +} + void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap) { BdrvDirtyBitmap *bm, *next; @@ -5391,6 +5398,16 @@ void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap) } } +void bdrv_disable_dirty_bitmap(BdrvDirtyBitmap *bitmap) +{ +bitmap-enabled = false; +} + +void bdrv_enable_dirty_bitmap(BdrvDirtyBitmap *bitmap) +{ +bitmap-enabled = true; +} + BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs) { BdrvDirtyBitmap *bm; @@ -5458,7 +5475,9 @@ void bdrv_dirty_iter_init(BlockDriverState *bs, void bdrv_set_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, int64_t cur_sector, int nr_sectors) { -hbitmap_set(bitmap-bitmap, cur_sector, nr_sectors); +if (bdrv_dirty_bitmap_enabled(bitmap)) { +hbitmap_set(bitmap-bitmap, cur_sector, nr_sectors); +} Why are you checking whether the bitmap is enabled here in bdrv_set_dirty_bitmap(), but neither in bdrv_reset_dirty_bitmap() nor bdrv_clear_dirty_bitmap()? It seems consistent with the commit message which only states that disabled state means that no more writes (i.e. bdrv_set_dirty_bitmap()) will be tracked, but it still seems strange to me. } void bdrv_reset_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, @@ -5481,6 +5500,9 @@ static void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector, { BdrvDirtyBitmap *bitmap; QLIST_FOREACH(bitmap, bs-dirty_bitmaps, list) { +if (!bdrv_dirty_bitmap_enabled(bitmap)) { +continue; +} Same question as above, only this time it's about bdrv_reset_dirty(). In case the answer to the question is that's intentional: Reviewed-by: Max Reitz mre...@redhat.com Just because it's intentional doesn't make it a good idea. Fam is currently suggesting we might just scrap the enable/disable and frozen/normal modes altogether in favor of a single unified status, but the original way he proposed enabled/disabled is that it simply, and exclusively enabled/disabled new writes to the bitmap -- it never was intended to block clears/resets. So the bitmap is still able to be consumed while disabled, for instance. How useful is this? Perhaps not very useful. Probably more confusing than useful. I will *probably* try to unify these parameters as Fam suggested, but I am really eager to hear from Markus on the QMP interface side before I bother sending out another minor iteration. I'll skip the R-b for now, since Fam had some ideas on how to make it nicer. Thanks! --js
[Qemu-devel] [RFC 03/10] replace spinlock by QemuMutex.
From: KONRAD Frederic fred.kon...@greensocs.com spinlock is only used in two cases: * cpu-exec.c: to protect TranslationBlock * mem_helper.c: for lock helper in target-i386 (which seems broken). It's a pthread_mutex_t in user-mode so better using QemuMutex directly in this case. It allows as well to reuse tb_lock mutex of TBContext in case of multithread TCG. Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com --- cpu-exec.c | 15 +++ include/exec/exec-all.h | 4 ++-- linux-user/main.c| 6 +++--- target-i386/mem_helper.c | 16 +--- tcg/i386/tcg-target.c| 8 5 files changed, 37 insertions(+), 12 deletions(-) diff --git a/cpu-exec.c b/cpu-exec.c index a4f0eff..1e7513c 100644 --- a/cpu-exec.c +++ b/cpu-exec.c @@ -335,7 +335,9 @@ int cpu_exec(CPUArchState *env) SyncClocks sc; /* This must be volatile so it is not trashed by longjmp() */ +#if defined(CONFIG_USER_ONLY) volatile bool have_tb_lock = false; +#endif if (cpu-halted) { if (!cpu_has_work(cpu)) { @@ -451,8 +453,10 @@ int cpu_exec(CPUArchState *env) cpu-exception_index = EXCP_INTERRUPT; cpu_loop_exit(cpu); } -spin_lock(tcg_ctx.tb_ctx.tb_lock); +#if defined(CONFIG_USER_ONLY) +qemu_mutex_lock(tcg_ctx.tb_ctx.tb_lock); have_tb_lock = true; +#endif tb = tb_find_fast(env); /* Note: we do it here to avoid a gcc bug on Mac OS X when doing it in tb_find_slow */ @@ -474,9 +478,10 @@ int cpu_exec(CPUArchState *env) tb_add_jump((TranslationBlock *)(next_tb ~TB_EXIT_MASK), next_tb TB_EXIT_MASK, tb); } +#if defined(CONFIG_USER_ONLY) have_tb_lock = false; -spin_unlock(tcg_ctx.tb_ctx.tb_lock); - +qemu_mutex_unlock(tcg_ctx.tb_ctx.tb_lock); +#endif /* cpu_interrupt might be called while translating the TB, but before it is linked into a potentially infinite loop and becomes env-current_tb. Avoid @@ -549,10 +554,12 @@ int cpu_exec(CPUArchState *env) #ifdef TARGET_I386 x86_cpu = X86_CPU(cpu); #endif +#if defined(CONFIG_USER_ONLY) if (have_tb_lock) { -spin_unlock(tcg_ctx.tb_ctx.tb_lock); +qemu_mutex_unlock(tcg_ctx.tb_ctx.tb_lock); have_tb_lock = false; } +#endif } } /* for(;;) */ diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h index 6a15448..6a450c1 100644 --- a/include/exec/exec-all.h +++ b/include/exec/exec-all.h @@ -173,7 +173,7 @@ struct TranslationBlock { struct TranslationBlock *jmp_first; }; -#include exec/spinlock.h +#include qemu/thread.h typedef struct TBContext TBContext; @@ -183,7 +183,7 @@ struct TBContext { TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE]; int nb_tbs; /* any access to the tbs or the page table must use this lock */ -spinlock_t tb_lock; +QemuMutex tb_lock; /* statistics */ int tb_flush_count; diff --git a/linux-user/main.c b/linux-user/main.c index 67b0231..97e7d50 100644 --- a/linux-user/main.c +++ b/linux-user/main.c @@ -107,7 +107,7 @@ static int pending_cpus; /* Make sure everything is in a consistent state for calling fork(). */ void fork_start(void) { -pthread_mutex_lock(tcg_ctx.tb_ctx.tb_lock); +qemu_mutex_lock(tcg_ctx.tb_ctx.tb_lock); pthread_mutex_lock(exclusive_lock); mmap_fork_start(); } @@ -129,11 +129,11 @@ void fork_end(int child) pthread_mutex_init(cpu_list_mutex, NULL); pthread_cond_init(exclusive_cond, NULL); pthread_cond_init(exclusive_resume, NULL); -pthread_mutex_init(tcg_ctx.tb_ctx.tb_lock, NULL); +qemu_mutex_init(tcg_ctx.tb_ctx.tb_lock); gdbserver_fork((CPUArchState *)thread_cpu-env_ptr); } else { pthread_mutex_unlock(exclusive_lock); -pthread_mutex_unlock(tcg_ctx.tb_ctx.tb_lock); +qemu_mutex_unlock(tcg_ctx.tb_ctx.tb_lock); } } diff --git a/target-i386/mem_helper.c b/target-i386/mem_helper.c index 1aec8a5..7106cc3 100644 --- a/target-i386/mem_helper.c +++ b/target-i386/mem_helper.c @@ -23,17 +23,27 @@ /* broken thread support */ -static spinlock_t global_cpu_lock = SPIN_LOCK_UNLOCKED; +#if defined(CONFIG_USER_ONLY) +QemuMutex global_cpu_lock; void helper_lock(void) { -spin_lock(global_cpu_lock); +qemu_mutex_lock(global_cpu_lock); } void helper_unlock(void) { -spin_unlock(global_cpu_lock); +qemu_mutex_unlock(global_cpu_lock); } +#else +void helper_lock(void) +{ +} + +void helper_unlock(void) +{ +} +#endif void helper_cmpxchg8b(CPUX86State *env, target_ulong a0) { diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c index 4133dcf..cbb8a4f 100644 ---
[Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_*.
From: KONRAD Frederic fred.kon...@greensocs.com This adds a lock to avoid multiple exclusive access at the same time in case of TCG multithread. Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com V1 - V2: Removed qemu_mutex_destroy(). --- target-arm/cpu.c | 14 ++ target-arm/cpu.h | 3 +++ target-arm/helper.h| 3 +++ target-arm/op_helper.c | 10 ++ target-arm/translate.c | 6 ++ 5 files changed, 36 insertions(+) diff --git a/target-arm/cpu.c b/target-arm/cpu.c index 285947f..75bdc5b 100644 --- a/target-arm/cpu.c +++ b/target-arm/cpu.c @@ -31,6 +31,19 @@ #include sysemu/kvm.h #include kvm_arm.h +/* Protect cpu_exclusive_* variable .*/ +QemuMutex cpu_exclusive_lock; + +inline void arm_exclusive_lock(void) +{ +qemu_mutex_lock(cpu_exclusive_lock); +} + +inline void arm_exclusive_unlock(void) +{ +qemu_mutex_unlock(cpu_exclusive_lock); +} + static void arm_cpu_set_pc(CPUState *cs, vaddr value) { ARMCPU *cpu = ARM_CPU(cs); @@ -374,6 +387,7 @@ static void arm_cpu_initfn(Object *obj) cpu-psci_version = 2; /* TCG implements PSCI 0.2 */ if (!inited) { inited = true; +qemu_mutex_init(cpu_exclusive_lock); arm_translate_init(); } } diff --git a/target-arm/cpu.h b/target-arm/cpu.h index 7ba55f0..2101d85 100644 --- a/target-arm/cpu.h +++ b/target-arm/cpu.h @@ -1821,4 +1821,7 @@ enum { QEMU_PSCI_CONDUIT_HVC = 2, }; +void arm_exclusive_lock(void); +void arm_exclusive_unlock(void); + #endif diff --git a/target-arm/helper.h b/target-arm/helper.h index dec3728..ce07711 100644 --- a/target-arm/helper.h +++ b/target-arm/helper.h @@ -529,6 +529,9 @@ DEF_HELPER_2(dc_zva, void, env, i64) DEF_HELPER_FLAGS_2(neon_pmull_64_lo, TCG_CALL_NO_RWG_SE, i64, i64, i64) DEF_HELPER_FLAGS_2(neon_pmull_64_hi, TCG_CALL_NO_RWG_SE, i64, i64, i64) +DEF_HELPER_0(exclusive_lock, void) +DEF_HELPER_0(exclusive_unlock, void) + #ifdef TARGET_AARCH64 #include helper-a64.h #endif diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c index 2bed914..d830fd8 100644 --- a/target-arm/op_helper.c +++ b/target-arm/op_helper.c @@ -33,6 +33,16 @@ static void raise_exception(CPUARMState *env, int tt) cpu_loop_exit(cs); } +void HELPER(exclusive_lock)(void) +{ +arm_exclusive_lock(); +} + +void HELPER(exclusive_unlock)(void) +{ +arm_exclusive_unlock(); +} + uint32_t HELPER(neon_tbl)(CPUARMState *env, uint32_t ireg, uint32_t def, uint32_t rn, uint32_t maxindex) { diff --git a/target-arm/translate.c b/target-arm/translate.c index bdfcdf1..513d151 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -7381,6 +7381,7 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2, abort(); } +gen_helper_exclusive_lock(); if (size == 3) { TCGv_i32 tmp2 = tcg_temp_new_i32(); TCGv_i32 tmp3 = tcg_temp_new_i32(); @@ -7396,11 +7397,14 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2, store_reg(s, rt, tmp); tcg_gen_extu_i32_i64(cpu_exclusive_addr, addr); +gen_helper_exclusive_unlock(); } static void gen_clrex(DisasContext *s) { +gen_helper_exclusive_lock(); tcg_gen_movi_i64(cpu_exclusive_addr, -1); +gen_helper_exclusive_unlock(); } #ifdef CONFIG_USER_ONLY @@ -7431,6 +7435,7 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2, done_label = gen_new_label(); extaddr = tcg_temp_new_i64(); tcg_gen_extu_i32_i64(extaddr, addr); +gen_helper_exclusive_lock(); tcg_gen_brcond_i64(TCG_COND_NE, extaddr, cpu_exclusive_addr, fail_label); tcg_temp_free_i64(extaddr); @@ -7495,6 +7500,7 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2, tcg_gen_movi_i32(cpu_R[rd], 1); gen_set_label(done_label); tcg_gen_movi_i64(cpu_exclusive_addr, -1); +gen_helper_exclusive_unlock(); } #endif -- 1.9.0
[Qemu-devel] [RFC 08/10] Drop global lock during TCG code execution
From: Jan Kiszka jan.kis...@siemens.com This finally allows TCG to benefit from the iothread introduction: Drop the global mutex while running pure TCG CPU code. Reacquire the lock when entering MMIO or PIO emulation, or when leaving the TCG loop. We have to revert a few optimization for the current TCG threading model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not kicking it in qemu_cpu_kick. We also need to disable RAM block reordering until we have a more efficient locking mechanism at hand. I'm pretty sure some cases are still broken, definitely SMP (we no longer perform round-robin scheduling by chance). Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here. These numbers demonstrate where we gain something: 20338 jan 20 0 331m 75m 6904 R 99 0.9 0:50.95 qemu-system-arm 20337 jan 20 0 331m 75m 6904 S 20 0.9 0:26.50 qemu-system-arm The guest CPU was fully loaded, but the iothread could still run mostly independent on a second core. Without the patch we don't get beyond 32206 jan 20 0 330m 73m 7036 R 82 0.9 1:06.00 qemu-system-arm 32204 jan 20 0 330m 73m 7036 S 21 0.9 0:17.03 qemu-system-arm We don't benefit significantly, though, when the guest is not fully loading a host CPU. Note that this patch depends on http://thread.gmane.org/gmane.comp.emulators.qemu/118657 Changes from Fred Konrad: * Rebase on the current HEAD. * Fixes a deadlock in qemu_devices_reset(). --- cpus.c| 19 +++ cputlb.c | 5 + exec.c| 25 + softmmu_template.h| 6 ++ target-i386/misc_helper.c | 27 --- translate-all.c | 2 ++ vl.c | 6 ++ 7 files changed, 75 insertions(+), 15 deletions(-) diff --git a/cpus.c b/cpus.c index 91a48f2..f10c94d 100644 --- a/cpus.c +++ b/cpus.c @@ -1017,7 +1017,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg) qemu_tcg_init_cpu_signals(); qemu_thread_get_self(cpu-thread); -qemu_mutex_lock(qemu_global_mutex); +qemu_mutex_lock_iothread(); CPU_FOREACH(cpu) { cpu-thread_id = qemu_get_thread_id(); cpu-created = true; @@ -1125,17 +1125,7 @@ static bool qemu_in_vcpu_thread(void) void qemu_mutex_lock_iothread(void) { -if (!tcg_enabled()) { -qemu_mutex_lock(qemu_global_mutex); -} else { -iothread_requesting_mutex = true; -if (qemu_mutex_trylock(qemu_global_mutex)) { -qemu_cpu_kick_thread(first_cpu); -qemu_mutex_lock(qemu_global_mutex); -} -iothread_requesting_mutex = false; -qemu_cond_broadcast(qemu_io_proceeded_cond); -} +qemu_mutex_lock(qemu_global_mutex); } void qemu_mutex_unlock_iothread(void) @@ -1356,7 +1346,12 @@ static int tcg_cpu_exec(CPUArchState *env) cpu-icount_decr.u16.low = decr; cpu-icount_extra = count; } + +qemu_mutex_unlock_iothread(); + ret = cpu_exec(env); + +qemu_mutex_lock_iothread(); #ifdef CONFIG_PROFILER qemu_time += profile_getclock() - ti; #endif diff --git a/cputlb.c b/cputlb.c index 3b271d4..4a7e634 100644 --- a/cputlb.c +++ b/cputlb.c @@ -30,6 +30,9 @@ #include exec/ram_addr.h #include tcg/tcg.h +void qemu_mutex_lock_iothread(void); +void qemu_mutex_unlock_iothread(void); + //#define DEBUG_TLB //#define DEBUG_TLB_CHECK @@ -125,8 +128,10 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr) can be detected */ void tlb_protect_code(ram_addr_t ram_addr) { +qemu_mutex_lock_iothread(); cpu_physical_memory_reset_dirty(ram_addr, TARGET_PAGE_SIZE, DIRTY_MEMORY_CODE); +qemu_mutex_unlock_iothread(); } /* update the TLB so that writes in physical page 'phys_addr' are no longer diff --git a/exec.c b/exec.c index 081818e..705d451 100644 --- a/exec.c +++ b/exec.c @@ -1786,6 +1786,7 @@ static void check_watchpoint(int offset, int len, int flags) } wp-hitaddr = vaddr; if (!cpu-watchpoint_hit) { +qemu_mutex_unlock_iothread(); cpu-watchpoint_hit = wp; tb_check_watchpoint(cpu); if (wp-flags BP_STOP_BEFORE_ACCESS) { @@ -2557,6 +2558,7 @@ static inline uint32_t ldl_phys_internal(AddressSpace *as, hwaddr addr, mr = address_space_translate(as, addr, addr1, l, false); if (l 4 || !memory_access_is_direct(mr, false)) { /* I/O case */ +qemu_mutex_lock_iothread(); io_mem_read(mr, addr1, val, 4); #if defined(TARGET_WORDS_BIGENDIAN) if (endian == DEVICE_LITTLE_ENDIAN) { @@ -2567,6 +2569,7 @@ static inline uint32_t ldl_phys_internal(AddressSpace *as, hwaddr addr, val = bswap32(val); } #endif +qemu_mutex_unlock_iothread(); } else { /* RAM case */ ptr =
[Qemu-devel] [RFC 04/10] remove unused spinlock.
From: KONRAD Frederic fred.kon...@greensocs.com This just removes spinlock as it is not used anymore. Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com --- include/exec/spinlock.h | 49 - scripts/checkpatch.pl | 9 ++--- 2 files changed, 2 insertions(+), 56 deletions(-) delete mode 100644 include/exec/spinlock.h diff --git a/include/exec/spinlock.h b/include/exec/spinlock.h deleted file mode 100644 index a72edda..000 --- a/include/exec/spinlock.h +++ /dev/null @@ -1,49 +0,0 @@ -/* - * Copyright (c) 2003 Fabrice Bellard - * - * This library is free software; you can redistribute it and/or - * modify it under the terms of the GNU Lesser General Public - * License as published by the Free Software Foundation; either - * version 2 of the License, or (at your option) any later version. - * - * This library is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * Lesser General Public License for more details. - * - * You should have received a copy of the GNU Lesser General Public - * License along with this library; if not, see http://www.gnu.org/licenses/ - */ - -/* configure guarantees us that we have pthreads on any host except - * mingw32, which doesn't support any of the user-only targets. - * So we can simply assume we have pthread mutexes here. - */ -#if defined(CONFIG_USER_ONLY) - -#include pthread.h -#define spin_lock pthread_mutex_lock -#define spin_unlock pthread_mutex_unlock -#define spinlock_t pthread_mutex_t -#define SPIN_LOCK_UNLOCKED PTHREAD_MUTEX_INITIALIZER - -#else - -/* Empty implementations, on the theory that system mode emulation - * is single-threaded. This means that these functions should only - * be used from code run in the TCG cpu thread, and cannot protect - * data structures which might also be accessed from the IO thread - * or from signal handlers. - */ -typedef int spinlock_t; -#define SPIN_LOCK_UNLOCKED 0 - -static inline void spin_lock(spinlock_t *lock) -{ -} - -static inline void spin_unlock(spinlock_t *lock) -{ -} - -#endif diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 053e432..e97b707 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -2655,11 +2655,6 @@ sub process { WARN(Use of volatile is usually wrong: see Documentation/volatile-considered-harmful.txt\n . $herecurr); } -# SPIN_LOCK_UNLOCKED RW_LOCK_UNLOCKED are deprecated - if ($line =~ /\b(SPIN_LOCK_UNLOCKED|RW_LOCK_UNLOCKED)/) { - ERROR(Use of $1 is deprecated: see Documentation/spinlocks.txt\n . $herecurr); - } - # warn about #if 0 if ($line =~ /^.\s*\#\s*if\s+0\b/) { CHK(if this code is redundant consider removing it\n . @@ -2708,8 +2703,8 @@ sub process { ERROR(exactly one space required after that #$1\n . $herecurr); } -# check for spinlock_t definitions without a comment. - if ($line =~ /^.\s*(struct\s+mutex|spinlock_t)\s+\S+;/ || +# check for mutex definitions without a comment. + if ($line =~ /^.\s*(struct\s+mutex)\s+\S+;/ || $line =~ /^.\s*(DEFINE_MUTEX)\s*\(/) { my $which = $1; if (!ctx_has_comment($first_line, $linenr)) { -- 1.9.0
[Qemu-devel] [RFC 07/10] tcg: remove tcg_halt_cond global variable.
From: KONRAD Frederic fred.kon...@greensocs.com This removes tcg_halt_cond global variable. We need one QemuCond per virtual cpu for multithread TCG. Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com --- cpus.c | 18 +++--- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/cpus.c b/cpus.c index 2edb5cd..91a48f2 100644 --- a/cpus.c +++ b/cpus.c @@ -796,7 +796,6 @@ static bool iothread_requesting_mutex; static QemuThread io_thread; static QemuThread *tcg_cpu_thread; -static QemuCond *tcg_halt_cond; /* cpu creation */ static QemuCond qemu_cpu_cond; @@ -902,15 +901,13 @@ static void qemu_wait_io_event_common(CPUState *cpu) cpu-thread_kicked = false; } -static void qemu_tcg_wait_io_event(void) +static void qemu_tcg_wait_io_event(CPUState *cpu) { -CPUState *cpu; - while (all_cpu_threads_idle()) { /* Start accounting real time to the virtual clock if the CPUs are idle. */ qemu_clock_warp(QEMU_CLOCK_VIRTUAL); -qemu_cond_wait(tcg_halt_cond, qemu_global_mutex); +qemu_cond_wait(cpu-halt_cond, qemu_global_mutex); } while (iothread_requesting_mutex) { @@ -1030,7 +1027,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg) /* wait for initial kick-off after machine start */ while (QTAILQ_FIRST(cpus)-stopped) { -qemu_cond_wait(tcg_halt_cond, qemu_global_mutex); +qemu_cond_wait(QTAILQ_FIRST(cpus)-halt_cond, qemu_global_mutex); /* process any pending work */ CPU_FOREACH(cpu) { @@ -1048,7 +1045,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg) qemu_clock_notify(QEMU_CLOCK_VIRTUAL); } } -qemu_tcg_wait_io_event(); +qemu_tcg_wait_io_event(QTAILQ_FIRST(cpus)); } return NULL; @@ -1214,12 +1211,12 @@ static void qemu_tcg_init_vcpu(CPUState *cpu) tcg_cpu_address_space_init(cpu, cpu-as); +cpu-halt_cond = g_malloc0(sizeof(QemuCond)); +qemu_cond_init(cpu-halt_cond); + /* share a single thread for all cpus with TCG */ if (!tcg_cpu_thread) { cpu-thread = g_malloc0(sizeof(QemuThread)); -cpu-halt_cond = g_malloc0(sizeof(QemuCond)); -qemu_cond_init(cpu-halt_cond); -tcg_halt_cond = cpu-halt_cond; snprintf(thread_name, VCPU_THREAD_NAME_SIZE, CPU %d/TCG, cpu-cpu_index); qemu_thread_create(cpu-thread, thread_name, qemu_tcg_cpu_thread_fn, @@ -1233,7 +1230,6 @@ static void qemu_tcg_init_vcpu(CPUState *cpu) tcg_cpu_thread = cpu-thread; } else { cpu-thread = tcg_cpu_thread; -cpu-halt_cond = tcg_halt_cond; } } -- 1.9.0
[Qemu-devel] [RFC 05/10] extract TBContext from TCGContext.
From: KONRAD Frederic fred.kon...@greensocs.com In order to have one TCGContext per thread and a single TBContext we have to extract TBContext from TCGContext. Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com --- cpu-exec.c| 18 ++--- linux-user/main.c | 6 ++--- tcg/tcg.h | 3 +-- translate-all.c | 79 +++ 4 files changed, 52 insertions(+), 54 deletions(-) diff --git a/cpu-exec.c b/cpu-exec.c index 1e7513c..4d22252 100644 --- a/cpu-exec.c +++ b/cpu-exec.c @@ -237,13 +237,13 @@ static TranslationBlock *tb_find_slow(CPUArchState *env, tb_page_addr_t phys_pc, phys_page1; target_ulong virt_page2; -tcg_ctx.tb_ctx.tb_invalidated_flag = 0; +tb_ctx.tb_invalidated_flag = 0; /* find translated block using physical mappings */ phys_pc = get_page_addr_code(env, pc); phys_page1 = phys_pc TARGET_PAGE_MASK; h = tb_phys_hash_func(phys_pc); -ptb1 = tcg_ctx.tb_ctx.tb_phys_hash[h]; +ptb1 = tb_ctx.tb_phys_hash[h]; for(;;) { tb = *ptb1; if (!tb) @@ -275,8 +275,8 @@ static TranslationBlock *tb_find_slow(CPUArchState *env, /* Move the last found TB to the head of the list */ if (likely(*ptb1)) { *ptb1 = tb-phys_hash_next; -tb-phys_hash_next = tcg_ctx.tb_ctx.tb_phys_hash[h]; -tcg_ctx.tb_ctx.tb_phys_hash[h] = tb; +tb-phys_hash_next = tb_ctx.tb_phys_hash[h]; +tb_ctx.tb_phys_hash[h] = tb; } /* we add the TB in the virtual pc hash table */ cpu-tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb; @@ -454,18 +454,18 @@ int cpu_exec(CPUArchState *env) cpu_loop_exit(cpu); } #if defined(CONFIG_USER_ONLY) -qemu_mutex_lock(tcg_ctx.tb_ctx.tb_lock); +qemu_mutex_lock(tb_ctx.tb_lock); have_tb_lock = true; #endif tb = tb_find_fast(env); /* Note: we do it here to avoid a gcc bug on Mac OS X when doing it in tb_find_slow */ -if (tcg_ctx.tb_ctx.tb_invalidated_flag) { +if (tb_ctx.tb_invalidated_flag) { /* as some TB could have been invalidated because of memory exceptions while generating the code, we must recompute the hash index here */ next_tb = 0; -tcg_ctx.tb_ctx.tb_invalidated_flag = 0; +tb_ctx.tb_invalidated_flag = 0; } if (qemu_loglevel_mask(CPU_LOG_EXEC)) { qemu_log(Trace %p [ TARGET_FMT_lx ] %s\n, @@ -480,7 +480,7 @@ int cpu_exec(CPUArchState *env) } #if defined(CONFIG_USER_ONLY) have_tb_lock = false; -qemu_mutex_unlock(tcg_ctx.tb_ctx.tb_lock); +qemu_mutex_unlock(tb_ctx.tb_lock); #endif /* cpu_interrupt might be called while translating the TB, but before it is linked into a potentially @@ -556,7 +556,7 @@ int cpu_exec(CPUArchState *env) #endif #if defined(CONFIG_USER_ONLY) if (have_tb_lock) { -qemu_mutex_unlock(tcg_ctx.tb_ctx.tb_lock); +qemu_mutex_unlock(tb_ctx.tb_lock); have_tb_lock = false; } #endif diff --git a/linux-user/main.c b/linux-user/main.c index 97e7d50..2a4c948 100644 --- a/linux-user/main.c +++ b/linux-user/main.c @@ -107,7 +107,7 @@ static int pending_cpus; /* Make sure everything is in a consistent state for calling fork(). */ void fork_start(void) { -qemu_mutex_lock(tcg_ctx.tb_ctx.tb_lock); +qemu_mutex_lock(tb_ctx.tb_lock); pthread_mutex_lock(exclusive_lock); mmap_fork_start(); } @@ -129,11 +129,11 @@ void fork_end(int child) pthread_mutex_init(cpu_list_mutex, NULL); pthread_cond_init(exclusive_cond, NULL); pthread_cond_init(exclusive_resume, NULL); -qemu_mutex_init(tcg_ctx.tb_ctx.tb_lock); +qemu_mutex_init(tb_ctx.tb_lock); gdbserver_fork((CPUArchState *)thread_cpu-env_ptr); } else { pthread_mutex_unlock(exclusive_lock); -qemu_mutex_unlock(tcg_ctx.tb_ctx.tb_lock); +qemu_mutex_unlock(tb_ctx.tb_lock); } } diff --git a/tcg/tcg.h b/tcg/tcg.h index 944b877..baf053a 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -529,13 +529,12 @@ struct TCGContext { size_t code_gen_buffer_max_size; void *code_gen_ptr; -TBContext tb_ctx; - /* The TCGBackendData structure is private to tcg-target.c. */ struct TCGBackendData *be; }; extern TCGContext tcg_ctx; +extern TBContext tb_ctx; /* pool based memory allocation */ diff --git a/translate-all.c b/translate-all.c index 0e11c70..e393d30 100644 --- a/translate-all.c +++ b/translate-all.c @@ -127,6 +127,9 @@ static void *l1_map[V_L1_SIZE]; /* code generation context */ TCGContext tcg_ctx; +/*
[Qemu-devel] [RFC 09/10] cpu: remove exit_request global.
From: KONRAD Frederic fred.kon...@greensocs.com This removes exit_request global and adds a variable in CPUState for this. Only the flag for the first cpu is used for the moment as we are still with one TCG thread. Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com --- cpu-exec.c| 4 +--- cpus.c| 12 +--- include/qom/cpu.h | 1 + 3 files changed, 11 insertions(+), 6 deletions(-) diff --git a/cpu-exec.c b/cpu-exec.c index 68654e3..3ba2a7a 100644 --- a/cpu-exec.c +++ b/cpu-exec.c @@ -327,8 +327,6 @@ static void cpu_handle_debug_exception(CPUArchState *env) /* main execution loop */ -volatile sig_atomic_t exit_request; - int cpu_exec(CPUArchState *env) { CPUState *cpu = ENV_GET_CPU(env); @@ -365,7 +363,7 @@ int cpu_exec(CPUArchState *env) * an instruction scheduling constraint on modern architectures. */ smp_mb(); -if (unlikely(exit_request)) { +if (unlikely(cpu-exit_loop_request)) { cpu-exit_request = 1; } diff --git a/cpus.c b/cpus.c index f10c94d..8ae70c2 100644 --- a/cpus.c +++ b/cpus.c @@ -646,10 +646,14 @@ static void cpu_handle_guest_debug(CPUState *cpu) static void cpu_signal(int sig) { +CPUState *cpu; if (current_cpu) { cpu_exit(current_cpu); } -exit_request = 1; + +CPU_FOREACH(cpu) { +cpu-exit_loop_request = 1; +} } #ifdef CONFIG_LINUX @@ -1376,7 +1380,8 @@ static void tcg_exec_all(void) if (next_cpu == NULL) { next_cpu = first_cpu; } -for (; next_cpu != NULL !exit_request; next_cpu = CPU_NEXT(next_cpu)) { +for (; next_cpu != NULL !first_cpu-exit_loop_request; + next_cpu = CPU_NEXT(next_cpu)) { CPUState *cpu = next_cpu; CPUArchState *env = cpu-env_ptr; @@ -1393,7 +1398,8 @@ static void tcg_exec_all(void) break; } } -exit_request = 0; + +first_cpu-exit_loop_request = 0; } void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg) diff --git a/include/qom/cpu.h b/include/qom/cpu.h index 2098f1c..a2e3208 100644 --- a/include/qom/cpu.h +++ b/include/qom/cpu.h @@ -249,6 +249,7 @@ struct CPUState { bool created; bool stop; bool stopped; +volatile sig_atomic_t exit_loop_request; volatile sig_atomic_t exit_request; uint32_t interrupt_request; int singlestep_enabled; -- 1.9.0
[Qemu-devel] [RFC 02/10] use a different translation block list for each cpu.
From: KONRAD Frederic fred.kon...@greensocs.com We need a different TranslationBlock list for each core in case of multithread TCG. Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com --- translate-all.c | 40 ++-- 1 file changed, 22 insertions(+), 18 deletions(-) diff --git a/translate-all.c b/translate-all.c index 8fa4378..0e11c70 100644 --- a/translate-all.c +++ b/translate-all.c @@ -72,10 +72,11 @@ #endif #define SMC_BITMAP_USE_THRESHOLD 10 +#define MAX_CPUS 256 typedef struct PageDesc { /* list of TBs intersecting this ram page */ -TranslationBlock *first_tb; +TranslationBlock *first_tb[MAX_CPUS]; /* in order to optimize self modifying code, we count the number of lookups we do to a given page to use a bitmap */ unsigned int code_write_count; @@ -750,7 +751,7 @@ static inline void invalidate_page_bitmap(PageDesc *p) /* Set to NULL all the 'first_tb' fields in all PageDescs. */ static void page_flush_tb_1(int level, void **lp) { -int i; +int i, j; if (*lp == NULL) { return; @@ -759,7 +760,9 @@ static void page_flush_tb_1(int level, void **lp) PageDesc *pd = *lp; for (i = 0; i V_L2_SIZE; ++i) { -pd[i].first_tb = NULL; +for (j = 0; j MAX_CPUS; j++) { +pd[i].first_tb[j] = NULL; +} invalidate_page_bitmap(pd + i); } } else { @@ -937,12 +940,12 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr) /* remove the TB from the page list */ if (tb-page_addr[0] != page_addr) { p = page_find(tb-page_addr[0] TARGET_PAGE_BITS); -tb_page_remove(p-first_tb, tb); +tb_page_remove(p-first_tb[current_cpu-cpu_index], tb); invalidate_page_bitmap(p); } if (tb-page_addr[1] != -1 tb-page_addr[1] != page_addr) { p = page_find(tb-page_addr[1] TARGET_PAGE_BITS); -tb_page_remove(p-first_tb, tb); +tb_page_remove(p-first_tb[current_cpu-cpu_index], tb); invalidate_page_bitmap(p); } @@ -1012,7 +1015,7 @@ static void build_page_bitmap(PageDesc *p) p-code_bitmap = g_malloc0(TARGET_PAGE_SIZE / 8); -tb = p-first_tb; +tb = p-first_tb[current_cpu-cpu_index]; while (tb != NULL) { n = (uintptr_t)tb 3; tb = (TranslationBlock *)((uintptr_t)tb ~3); @@ -1138,7 +1141,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end, /* we remove all the TBs in the range [start, end[ */ /* XXX: see if in some cases it could be faster to invalidate all the code */ -tb = p-first_tb; +tb = p-first_tb[cpu-cpu_index]; while (tb != NULL) { n = (uintptr_t)tb 3; tb = (TranslationBlock *)((uintptr_t)tb ~3); @@ -1196,7 +1199,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end, } #if !defined(CONFIG_USER_ONLY) /* if no code remaining, no need to continue to use slow writes */ -if (!p-first_tb) { +if (!p-first_tb[cpu-cpu_index]) { invalidate_page_bitmap(p); if (is_cpu_write_access) { tlb_unprotect_code_phys(cpu, start, cpu-mem_io_vaddr); @@ -1224,10 +1227,10 @@ void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len) #if 0 if (1) { qemu_log(modifying code at 0x%x size=%d EIP=%x PC=%08x\n, - cpu_single_env-mem_io_vaddr, len, - cpu_single_env-eip, - cpu_single_env-eip + - (intptr_t)cpu_single_env-segs[R_CS].base); + current_cpu-mem_io_vaddr, len, + current_cpu-eip, + current_cpu-eip + + (intptr_t)current_cpu-segs[R_CS].base); } #endif p = page_find(start TARGET_PAGE_BITS); @@ -1269,7 +1272,7 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr, if (!p) { return; } -tb = p-first_tb; +tb = p-first_tb[current_cpu-cpu_index]; #ifdef TARGET_HAS_PRECISE_SMC if (tb pc != 0) { current_tb = tb_find_pc(pc); @@ -1299,7 +1302,7 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr, tb_phys_invalidate(tb, addr); tb = tb-page_next[n]; } -p-first_tb = NULL; +p-first_tb[current_cpu-cpu_index] = NULL; #ifdef TARGET_HAS_PRECISE_SMC if (current_tb_modified) { /* we generate a block containing just the instruction @@ -1327,11 +1330,12 @@ static inline void tb_alloc_page(TranslationBlock *tb, tb-page_addr[n] = page_addr; p = page_find_alloc(page_addr TARGET_PAGE_BITS, 1); -tb-page_next[n] = p-first_tb; +tb-page_next[n] = p-first_tb[current_cpu-cpu_index]; #ifndef CONFIG_USER_ONLY -page_already_protected = p-first_tb != NULL; +page_already_protected = p-first_tb[current_cpu-cpu_index] != NULL; #endif -p-first_tb = (TranslationBlock *)((uintptr_t)tb | n); +
[Qemu-devel] [RFC 10/10] tcg: switch on multithread.
From: KONRAD Frederic fred.kon...@greensocs.com This switches on multithread. Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com --- cpus.c | 85 +++--- 1 file changed, 30 insertions(+), 55 deletions(-) diff --git a/cpus.c b/cpus.c index 8ae70c2..ab327dd 100644 --- a/cpus.c +++ b/cpus.c @@ -64,7 +64,6 @@ #endif /* CONFIG_LINUX */ -static CPUState *next_cpu; int64_t max_delay; int64_t max_advance; @@ -799,8 +798,6 @@ static bool iothread_requesting_mutex; static QemuThread io_thread; -static QemuThread *tcg_cpu_thread; - /* cpu creation */ static QemuCond qemu_cpu_cond; /* system init */ @@ -907,10 +904,12 @@ static void qemu_wait_io_event_common(CPUState *cpu) static void qemu_tcg_wait_io_event(CPUState *cpu) { -while (all_cpu_threads_idle()) { +while (cpu_thread_is_idle(cpu)) { /* Start accounting real time to the virtual clock if the CPUs are idle. */ -qemu_clock_warp(QEMU_CLOCK_VIRTUAL); +if ((all_cpu_threads_idle()) (cpu-cpu_index == 0)) { + qemu_clock_warp(QEMU_CLOCK_VIRTUAL); +} qemu_cond_wait(cpu-halt_cond, qemu_global_mutex); } @@ -918,9 +917,7 @@ static void qemu_tcg_wait_io_event(CPUState *cpu) qemu_cond_wait(qemu_io_proceeded_cond, qemu_global_mutex); } -CPU_FOREACH(cpu) { -qemu_wait_io_event_common(cpu); -} +qemu_wait_io_event_common(cpu); } static void qemu_kvm_wait_io_event(CPUState *cpu) @@ -1012,7 +1009,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg) #endif } -static void tcg_exec_all(void); +static void tcg_exec_all(CPUState *cpu); static void *qemu_tcg_cpu_thread_fn(void *arg) { @@ -1022,34 +1019,26 @@ static void *qemu_tcg_cpu_thread_fn(void *arg) qemu_thread_get_self(cpu-thread); qemu_mutex_lock_iothread(); -CPU_FOREACH(cpu) { -cpu-thread_id = qemu_get_thread_id(); -cpu-created = true; -cpu-can_do_io = 1; -} -qemu_cond_signal(qemu_cpu_cond); - -/* wait for initial kick-off after machine start */ -while (QTAILQ_FIRST(cpus)-stopped) { -qemu_cond_wait(QTAILQ_FIRST(cpus)-halt_cond, qemu_global_mutex); +cpu-thread_id = qemu_get_thread_id(); +cpu-created = true; +cpu-can_do_io = 1; -/* process any pending work */ -CPU_FOREACH(cpu) { -qemu_wait_io_event_common(cpu); -} -} +qemu_cond_signal(qemu_cpu_cond); while (1) { -tcg_exec_all(); +if (!cpu-stopped) { +tcg_exec_all(cpu); -if (use_icount) { -int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL); +if (use_icount) { +int64_t deadline = +qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL); -if (deadline == 0) { -qemu_clock_notify(QEMU_CLOCK_VIRTUAL); +if (deadline == 0) { +qemu_clock_notify(QEMU_CLOCK_VIRTUAL); +} } } -qemu_tcg_wait_io_event(QTAILQ_FIRST(cpus)); +qemu_tcg_wait_io_event(cpu); } return NULL; @@ -1207,23 +1196,15 @@ static void qemu_tcg_init_vcpu(CPUState *cpu) cpu-halt_cond = g_malloc0(sizeof(QemuCond)); qemu_cond_init(cpu-halt_cond); - -/* share a single thread for all cpus with TCG */ -if (!tcg_cpu_thread) { -cpu-thread = g_malloc0(sizeof(QemuThread)); -snprintf(thread_name, VCPU_THREAD_NAME_SIZE, CPU %d/TCG, - cpu-cpu_index); -qemu_thread_create(cpu-thread, thread_name, qemu_tcg_cpu_thread_fn, - cpu, QEMU_THREAD_JOINABLE); +cpu-thread = g_malloc0(sizeof(QemuThread)); +snprintf(thread_name, VCPU_THREAD_NAME_SIZE, CPU %d/TCG, cpu-cpu_index); +qemu_thread_create(cpu-thread, thread_name, qemu_tcg_cpu_thread_fn, cpu, + QEMU_THREAD_JOINABLE); #ifdef _WIN32 -cpu-hThread = qemu_thread_get_handle(cpu-thread); +cpu-hThread = qemu_thread_get_handle(cpu-thread); #endif -while (!cpu-created) { -qemu_cond_wait(qemu_cpu_cond, qemu_global_mutex); -} -tcg_cpu_thread = cpu-thread; -} else { -cpu-thread = tcg_cpu_thread; +while (!cpu-created) { +qemu_cond_wait(qemu_cpu_cond, qemu_global_mutex); } } @@ -1370,21 +1351,15 @@ static int tcg_cpu_exec(CPUArchState *env) return ret; } -static void tcg_exec_all(void) +static void tcg_exec_all(CPUState *cpu) { int r; +CPUArchState *env = cpu-env_ptr; /* Account partial waits to QEMU_CLOCK_VIRTUAL. */ qemu_clock_warp(QEMU_CLOCK_VIRTUAL); -if (next_cpu == NULL) { -next_cpu = first_cpu; -} -for (; next_cpu != NULL !first_cpu-exit_loop_request; - next_cpu = CPU_NEXT(next_cpu)) { -CPUState *cpu = next_cpu; -CPUArchState *env = cpu-env_ptr; - +while
[Qemu-devel] [RFC 00/10] MultiThread TCG.
From: KONRAD Frederic fred.kon...@greensocs.com Hi everybody, This is the start of our work on TCG multithread. We send it for comment to be sure we are taking the right direction. We already discussed the first patch but we keep it for simplicity. We choice to keep a common tbs array for all VCPU but protect it with the tb_lock from TBContext. Then for each PageDesc we have a tb list per VCPU. Known issues: * Some random deadlock. * qemu_pause_cond is broken we can't quit QEMU. * tb_flush is broken we must make sure no VCPU are executing code. Jan Kiszka (1): Drop global lock during TCG code execution KONRAD Frederic (9): target-arm: protect cpu_exclusive_*. use a different translation block list for each cpu. replace spinlock by QemuMutex. remove unused spinlock. extract TBContext from TCGContext. protect TBContext with tb_lock. tcg: remove tcg_halt_cond global variable. cpu: remove exit_request global. tcg: switch on multithread. cpu-exec.c| 47 +++ cpus.c| 122 +++ cputlb.c | 5 ++ exec.c| 25 ++ include/exec/exec-all.h | 4 +- include/exec/spinlock.h | 49 --- include/qom/cpu.h | 1 + linux-user/main.c | 6 +- scripts/checkpatch.pl | 9 +- softmmu_template.h| 6 ++ target-arm/cpu.c | 14 target-arm/cpu.h | 3 + target-arm/helper.h | 3 + target-arm/op_helper.c| 10 +++ target-arm/translate.c| 6 ++ target-i386/mem_helper.c | 16 +++- target-i386/misc_helper.c | 27 +- tcg/i386/tcg-target.c | 8 ++ tcg/tcg.h | 3 +- translate-all.c | 208 +++--- vl.c | 6 ++ 21 files changed, 335 insertions(+), 243 deletions(-) delete mode 100644 include/exec/spinlock.h -- 1.9.0
[Qemu-devel] [RFC 06/10] protect TBContext with tb_lock.
From: KONRAD Frederic fred.kon...@greensocs.com This protects TBContext with tb_lock to make tb_* thread safe. We can still have issue with tb_flush in case of multithread TCG: An other CPU can be executing code during a flush. This can be fixed later by making all other TCG thread exiting before calling tb_flush(). Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com --- cpu-exec.c | 16 - translate-all.c | 100 +--- 2 files changed, 82 insertions(+), 34 deletions(-) diff --git a/cpu-exec.c b/cpu-exec.c index 4d22252..68654e3 100644 --- a/cpu-exec.c +++ b/cpu-exec.c @@ -237,6 +237,7 @@ static TranslationBlock *tb_find_slow(CPUArchState *env, tb_page_addr_t phys_pc, phys_page1; target_ulong virt_page2; +qemu_mutex_lock(tb_ctx.tb_lock); tb_ctx.tb_invalidated_flag = 0; /* find translated block using physical mappings */ @@ -268,8 +269,14 @@ static TranslationBlock *tb_find_slow(CPUArchState *env, ptb1 = tb-phys_hash_next; } not_found: + /* +* FIXME: We need to release this mutex because tb_gen_code needs it. +* This can be optimised by adding a flag to tb_gen_code? +*/ + qemu_mutex_unlock(tb_ctx.tb_lock); /* if no translated code available, then translate it now */ -tb = tb_gen_code(cpu, pc, cs_base, flags, 0); + tb = tb_gen_code(cpu, pc, cs_base, flags, 0); + qemu_mutex_lock(tb_ctx.tb_lock); found: /* Move the last found TB to the head of the list */ @@ -280,6 +287,7 @@ static TranslationBlock *tb_find_slow(CPUArchState *env, } /* we add the TB in the virtual pc hash table */ cpu-tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb; +qemu_mutex_unlock(tb_ctx.tb_lock); return tb; } @@ -460,6 +468,9 @@ int cpu_exec(CPUArchState *env) tb = tb_find_fast(env); /* Note: we do it here to avoid a gcc bug on Mac OS X when doing it in tb_find_slow */ +#if !defined(CONFIG_USER_ONLY) +qemu_mutex_lock(tb_ctx.tb_lock); +#endif if (tb_ctx.tb_invalidated_flag) { /* as some TB could have been invalidated because of memory exceptions while generating the code, we @@ -467,6 +478,9 @@ int cpu_exec(CPUArchState *env) next_tb = 0; tb_ctx.tb_invalidated_flag = 0; } +#if !defined(CONFIG_USER_ONLY) +qemu_mutex_unlock(tb_ctx.tb_lock); +#endif if (qemu_loglevel_mask(CPU_LOG_EXEC)) { qemu_log(Trace %p [ TARGET_FMT_lx ] %s\n, tb-tc_ptr, tb-pc, lookup_symbol(tb-pc)); diff --git a/translate-all.c b/translate-all.c index e393d30..68505c0 100644 --- a/translate-all.c +++ b/translate-all.c @@ -689,6 +689,7 @@ static inline void code_gen_alloc(size_t tb_size) CODE_GEN_AVG_BLOCK_SIZE; tb_ctx.tbs = g_malloc(tcg_ctx.code_gen_max_blocks * sizeof(TranslationBlock)); +qemu_mutex_init(tb_ctx.tb_lock); } /* Must be called before using the QEMU cpus. 'tb_size' is the size @@ -713,20 +714,23 @@ bool tcg_enabled(void) return tcg_ctx.code_gen_buffer != NULL; } -/* Allocate a new translation block. Flush the translation buffer if - too many translation blocks or too much generated code. */ +/* + * Allocate a new translation block. Flush the translation buffer if + * too many translation blocks or too much generated code. + * tb_alloc is not thread safe but tb_gen_code is protected by a mutex so this + * function is called only by one thread. + */ static TranslationBlock *tb_alloc(target_ulong pc) { -TranslationBlock *tb; +TranslationBlock *tb = NULL; -if (tb_ctx.nb_tbs = tcg_ctx.code_gen_max_blocks || -(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer) = +if (tb_ctx.nb_tbs tcg_ctx.code_gen_max_blocks +(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer) tcg_ctx.code_gen_buffer_max_size) { -return NULL; +tb = tb_ctx.tbs[tb_ctx.nb_tbs++]; +tb-pc = pc; +tb-cflags = 0; } -tb = tb_ctx.tbs[tb_ctx.nb_tbs++]; -tb-pc = pc; -tb-cflags = 0; return tb; } @@ -735,11 +739,16 @@ void tb_free(TranslationBlock *tb) /* In practice this is mostly used for single use temporary TB Ignore the hard cases and just back up if this TB happens to be the last one generated. */ + +qemu_mutex_lock(tb_ctx.tb_lock); + if (tb_ctx.nb_tbs 0 tb == tb_ctx.tbs[tb_ctx.nb_tbs - 1]) { tcg_ctx.code_gen_ptr = tb-tc_ptr; tb_ctx.nb_tbs--; } + +qemu_mutex_unlock(tb_ctx.tb_lock); } static inline void invalidate_page_bitmap(PageDesc *p) @@ -792,6 +801,8 @@ void tb_flush(CPUArchState *env1) { CPUState *cpu = ENV_GET_CPU(env1); +qemu_mutex_lock(tb_ctx.tb_lock); + #if defined(DEBUG_FLUSH) printf(qemu:
Re: [Qemu-devel] [PATCH 2/2] atapi migration: Throw recoverable error to avoid recovery
On 12/09/2014 01:15 PM, Dr. David Alan Gilbert (git) wrote: From: Dr. David Alan Gilbert dgilb...@redhat.com (With the previous atapi_dma flag recovery) If migration happens between the ATAPI command being written and the bmdma being started, the DMA is dropped. Eventually the guest times out and recovers, but that can take many seconds. (This is rare, on a pingpong reading the CD continuously I hit this about ~1/30-1/50 migrates) I don't think we've got enough state to be able to recover safely at this point, so I throw a 'medium error, no seek complete' that I'm assuming guests will try and recover from an apparently dirty CD. OK, it's a hack, the real solution is probably to push a lot of ATAPI state into the migration stream, but this is a fix that works with no stream changes. Tested only on Linux (both RHEL5 (pre-libata) and RHEL7). Signed-off-by: Dr. David Alan Gilbert dgilb...@redhat.com --- hw/ide/atapi.c| 17 + hw/ide/internal.h | 2 ++ hw/ide/pci.c | 11 +++ 3 files changed, 30 insertions(+) diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c index c63b7e5..e17799c 100644 --- a/hw/ide/atapi.c +++ b/hw/ide/atapi.c @@ -394,6 +394,23 @@ static void ide_atapi_cmd_read(IDEState *s, int lba, int nb_sectors, } } + +/* Called by *_restart_bh when the transfer function points + * to ide_atapi_cmd + */ +void ide_atapi_dma_restart(IDEState *s) +{ +/* + * I'm not sure we have enough stored to restart the command + * safely, so give the guest an error it should recover from. + * I'm assuming most guests will try to recover from something + * listed as a medium error on a CD; it seems to work on Linux. + * This would be more of a problem if we did any other type of + * DMA operation. + */ +ide_atapi_cmd_error(s, MEDIUM_ERROR, ASC_NO_SEEK_COMPLETE); +} + static inline uint8_t ide_atapi_set_profile(uint8_t *buf, uint8_t *index, uint16_t profile) { diff --git a/hw/ide/internal.h b/hw/ide/internal.h index 8a3eca4..8b65285 100644 --- a/hw/ide/internal.h +++ b/hw/ide/internal.h @@ -289,6 +289,7 @@ typedef struct IDEDMAOps IDEDMAOps; #define ATAPI_INT_REASON_TAG0xf8 /* same constants as bochs */ +#define ASC_NO_SEEK_COMPLETE 0x02 #define ASC_ILLEGAL_OPCODE 0x20 #define ASC_LOGICAL_BLOCK_OOR0x21 #define ASC_INV_FIELD_IN_CMD_PACKET 0x24 @@ -529,6 +530,7 @@ void ide_dma_error(IDEState *s); void ide_atapi_cmd_ok(IDEState *s); void ide_atapi_cmd_error(IDEState *s, int sense_key, int asc); +void ide_atapi_dma_restart(IDEState *s); void ide_atapi_io_error(IDEState *s, int ret); void ide_ioport_write(void *opaque, uint32_t addr, uint32_t val); diff --git a/hw/ide/pci.c b/hw/ide/pci.c index bee5ad3..e3f2054 100644 --- a/hw/ide/pci.c +++ b/hw/ide/pci.c @@ -235,6 +235,17 @@ static void bmdma_restart_bh(void *opaque) } } else if (error_status IDE_RETRY_FLUSH) { ide_flush_cache(bmdma_active_if(bm)); +} else { +IDEState *s = bmdma_active_if(bm); + +/* + * We've not got any bits to tell us about ATAPI - but + * we do have the end_transfer_func that tells us what + * we're trying to do. + */ +if (s-end_transfer_func == ide_atapi_cmd) { +ide_atapi_dma_restart(s); +} } } I think this workaround is sensible until we develop a more precise fix. Reviewed-by: John Snow js...@redhat.com
Re: [Qemu-devel] [PATCH v11 09/13] qmp: Add support of dirty-bitmap sync mode for drive-backup
On 2015-01-12 at 11:31, John Snow wrote: For dirty-bitmap sync mode, the block job will iterate through the given dirty bitmap to decide if a sector needs backup (backup all the dirty clusters and skip clean ones), just as allocation conditions of top sync mode. Signed-off-by: Fam Zheng f...@redhat.com Signed-off-by: John Snow js...@redhat.com --- block.c | 5 ++ block/backup.c| 120 ++ block/mirror.c| 4 ++ blockdev.c| 14 +- hmp.c | 3 +- include/block/block.h | 1 + include/block/block_int.h | 2 + qapi/block-core.json | 11 +++-- qmp-commands.hx | 7 +-- 9 files changed, 137 insertions(+), 30 deletions(-) Since you seem to be intending to rethink the frozen state, I'm just scanning through the series from patch 8 on. While this patch doesn't seem to have changed much conceptually since the last version I reviewed, with it applied to master, qemu fails to build due to Fam's blockdev-backup series (some new calls to backup_start() which have to be adapted). Max
Re: [Qemu-devel] [PATCH v11 09/13] qmp: Add support of dirty-bitmap sync mode for drive-backup
On 01/16/2015 12:52 PM, Max Reitz wrote: On 2015-01-12 at 11:31, John Snow wrote: For dirty-bitmap sync mode, the block job will iterate through the given dirty bitmap to decide if a sector needs backup (backup all the dirty clusters and skip clean ones), just as allocation conditions of top sync mode. Signed-off-by: Fam Zheng f...@redhat.com Signed-off-by: John Snow js...@redhat.com --- block.c | 5 ++ block/backup.c| 120 ++ block/mirror.c| 4 ++ blockdev.c| 14 +- hmp.c | 3 +- include/block/block.h | 1 + include/block/block_int.h | 2 + qapi/block-core.json | 11 +++-- qmp-commands.hx | 7 +-- 9 files changed, 137 insertions(+), 30 deletions(-) Since you seem to be intending to rethink the frozen state, I'm just scanning through the series from patch 8 on. While this patch doesn't seem to have changed much conceptually since the last version I reviewed, with it applied to master, qemu fails to build due to Fam's blockdev-backup series (some new calls to backup_start() which have to be adapted). Max Understood. Frozen will still exist largely similar to what it doe s now, but the proposal from fam was to simply fold enabled/disabled into the same status, and perhaps differentiate between a simple disabled state and a frozen state by the presence or absence of a successor. As for the rebase onto master, here's a v11.5: https://github.com/jnsnow/qemu/commits/dbm-backup -- —js
[Qemu-devel] [PATCH] cpu_ldst.h: Don't define helpers if MMU_MODE*_SUFFIX not defined
Not all targets define a full set of suffix strings for the NB_MMU_MODES that they have. In this situation, don't define any helper functions for that mode, rather than defining helper functions with no suffix at all. The MMU mode is still functional; it is merely not directly accessible via cpu_ld*_MODE from target helper functions. Also add an NB_MMU_MODES = 2 check to the definition of the mode 1 helpers -- some targets only define one MMU mode. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- The alternative would be to put in guards requiring the suffix macros to be defined, and fix up the targets which are missing them: arm (no 2 or 3) lm32 (no 0) moxie (no 0) openrisc (no 0 1 or 2) tricore (no 0 1 or 2) I'm going to fix up ARM anyway, but I decided that just deal with it was better than trying to look up MMU mode definitions for four CPU types I don't know... This patch applies after the 15-patch cpu_ldst series I sent out the other day. include/exec/cpu_ldst.h | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h index d98ff17..0e825ea 100644 --- a/include/exec/cpu_ldst.h +++ b/include/exec/cpu_ldst.h @@ -132,6 +132,7 @@ uint16_t helper_ldw_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx); uint32_t helper_ldl_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx); uint64_t helper_ldq_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx); +#ifdef MMU_MODE0_SUFFIX #define CPU_MMU_INDEX 0 #define MEMSUFFIX MMU_MODE0_SUFFIX #define DATA_SIZE 1 @@ -147,7 +148,9 @@ uint64_t helper_ldq_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx); #include exec/cpu_ldst_template.h #undef CPU_MMU_INDEX #undef MEMSUFFIX +#endif +#if (NB_MMU_MODES = 2) defined(MMU_MODE1_SUFFIX) #define CPU_MMU_INDEX 1 #define MEMSUFFIX MMU_MODE1_SUFFIX #define DATA_SIZE 1 @@ -163,8 +166,9 @@ uint64_t helper_ldq_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx); #include exec/cpu_ldst_template.h #undef CPU_MMU_INDEX #undef MEMSUFFIX +#endif -#if (NB_MMU_MODES = 3) +#if (NB_MMU_MODES = 3) defined(MMU_MODE2_SUFFIX) #define CPU_MMU_INDEX 2 #define MEMSUFFIX MMU_MODE2_SUFFIX @@ -183,7 +187,7 @@ uint64_t helper_ldq_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx); #undef MEMSUFFIX #endif /* (NB_MMU_MODES = 3) */ -#if (NB_MMU_MODES = 4) +#if (NB_MMU_MODES = 4) defined(MMU_MODE3_SUFFIX) #define CPU_MMU_INDEX 3 #define MEMSUFFIX MMU_MODE3_SUFFIX @@ -202,7 +206,7 @@ uint64_t helper_ldq_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx); #undef MEMSUFFIX #endif /* (NB_MMU_MODES = 4) */ -#if (NB_MMU_MODES = 5) +#if (NB_MMU_MODES = 5) defined(MMU_MODE4_SUFFIX) #define CPU_MMU_INDEX 4 #define MEMSUFFIX MMU_MODE4_SUFFIX @@ -221,7 +225,7 @@ uint64_t helper_ldq_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx); #undef MEMSUFFIX #endif /* (NB_MMU_MODES = 5) */ -#if (NB_MMU_MODES = 6) +#if (NB_MMU_MODES = 6) defined(MMU_MODE5_SUFFIX) #define CPU_MMU_INDEX 5 #define MEMSUFFIX MMU_MODE5_SUFFIX -- 1.9.1
Re: [Qemu-devel] [PATCH v11 08/13] block: Add bitmap successors
On 01/13/2015 04:24 AM, Fam Zheng wrote: On Mon, 01/12 11:31, John Snow wrote: A bitmap successor is an anonymous BdrvDirtyBitmap that is intended to be created just prior to a sensitive operation (e.g. Incremental Backup) that can either succeed or fail, but during the course of which we still want a bitmap tracking writes. On creating a successor, we freeze the parent bitmap which prevents its deletion, enabling, anonymization, or creating a bitmap with the same name. On success, the parent bitmap can abdicate responsibility to the successor, which will inherit its name. The successor will have been tracking writes during the course of the backup operation. The parent will be safely deleted. On failure, we can reclaim the successor from the parent, unifying them such that the resulting bitmap describes all writes occurring since the last successful backup, for instance. Reclamation will thaw the parent, but not explicitly re-enable it. If we compare this with image snapshot and overlay, it fits the current backing chain model very well. Given that we're on the way to persistent dirty bitmap, which will probably be read/written with general block.c code, I'm wondering if it will be any better to reuse the block layer backing file code to handle successor logic? Also with the frozen feature built in dirty bitmaps, is it okay to drop enabled state? I think there are three states for a bitmap: 1) Active, no successor (similar to an read-write top image) 2) Frozen, no successor (similar to an read-only top image) 3) Frozen, with successor (similar to an read-only backing file, with an overlap) Admittedly, more code is needed than this patch, in order to glue hbitmap and block layer together, but it would probably make live migration of dirty bitmap very easy, but I'm not sure without a closer look. Signed-off-by: John Snow js...@redhat.com --- block.c | 123 -- blockdev.c| 14 ++ include/block/block.h | 10 3 files changed, 144 insertions(+), 3 deletions(-) diff --git a/block.c b/block.c index bd4b449..3f33b9d 100644 --- a/block.c +++ b/block.c @@ -53,10 +53,12 @@ struct BdrvDirtyBitmap { HBitmap *bitmap; +BdrvDirtyBitmap *successor; int64_t size; int64_t granularity; char *name; bool enabled; +bool frozen; QLIST_ENTRY(BdrvDirtyBitmap) list; }; @@ -5342,6 +5344,7 @@ BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs, const char *name) void bdrv_dirty_bitmap_make_anon(BlockDriverState *bs, BdrvDirtyBitmap *bitmap) { +assert(!bitmap-frozen); g_free(bitmap-name); bitmap-name = NULL; } @@ -5379,9 +5382,114 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs, return bitmap; } +/** + * A frozen bitmap cannot be renamed, deleted, cleared, + * or set. A frozen bitmap can only abdicate, reclaim, + * or thaw. + */ +static void bdrv_freeze_dirty_bitmap(BdrvDirtyBitmap *bitmap) +{ +bitmap-frozen = true; +} + +static void bdrv_thaw_dirty_bitmap(BdrvDirtyBitmap *bitmap) +{ +bitmap-frozen = false; +} + +bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap) +{ +return bitmap-frozen; +} + bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap) { -return bitmap-enabled; +return bitmap-enabled !bitmap-frozen; An indication that enabled could be replaced by frozen. Otherwise it looks confusing to me. +} + +/** + * Create a successor bitmap destined to replace this bitmap after an operation. + * Requires that the bitmap is not frozen and has no successor. + */ +int bdrv_dirty_bitmap_create_successor(BlockDriverState *bs, + BdrvDirtyBitmap *bitmap, Error **errp) +{ +uint64_t granularity; + +if (bdrv_dirty_bitmap_frozen(bitmap)) { +error_setg(errp, + Cannot create a successor for a bitmap currently in-use.); +return -1; +} else if (bitmap-successor) { +error_setg(errp, Cannot create a successor for a bitmap that + already has one.); +return -1; +} + +/* Create an anonymous successor */ +granularity = bdrv_dirty_bitmap_granularity(bs, bitmap); +bitmap-successor = bdrv_create_dirty_bitmap(bs, granularity, NULL, errp); +if (!bitmap-successor) { +return -1; +} + +/* Successor will be on or off based on our current state. + * Parent will be disabled and frozen. */ +bitmap-successor-enabled = bitmap-enabled; +bdrv_disable_dirty_bitmap(bitmap); +bdrv_freeze_dirty_bitmap(bitmap); +return 0; +} + +/** + * For a bitmap with a successor, yield our name to the successor, + * Delete the old bitmap, and return a handle to the new bitmap. + */ +BdrvDirtyBitmap *bdrv_dirty_bitmap_abdicate(BlockDriverState *bs, +BdrvDirtyBitmap *bitmap, +
Re: [Qemu-devel] [PATCH v7 05/32] target-arm: make arm_current_el() return EL3
On 21 October 2014 at 17:55, Greg Bellows greg.bell...@linaro.org wrote: From: Fabian Aggeler aggel...@ethz.ch Make arm_current_el() return EL3 for secure PL1 and monitor mode. Increase MMU modes since mmu_index is directly inferred from arm_ current_el(). Change assertion in arm_el_is_aa64() to allow EL3. -#define NB_MMU_MODES 2 +#define NB_MMU_MODES 4 So this turns out not to quite be what we want. A QEMU MMU mode index basically defines a (vaddr - paddr,permissions) mapping. This is similar to the ARM ARM concept of a translation regime, with the differences that: * the ARM ARM translation regimes may have split permissions, for user and privileged code, so we need two mmu_idx values for a translation regime that applies to both EL0 and EL1 * stage 1 and stage 2 translations for a VA-IPA-PA lookup for an EL1/EL0 hypervisor guest are two different translation regimes, but for QEMU we can just cache the whole VA-PA and use a single mmu_idx. [We only need to separately do VA-IPA and IPA-VA for the do this address translation system instructions, which don't need to touch the TLB; a combined stage1+stage2 TLB is permitted by the architecture.] The translation regimes are: If EL3 is 64-bit: * Secure EL3 * Secure EL1 EL0 * NonSecure EL2 * NonSecure EL1 0 stage 1 * NonSecure EL1 0 stage 2 If EL3 is 32-bit: * Secure PL0 PL1 * NonSecure PL2 * NonSecure PL1 0 stage 1 * NonSecure PL1 0 stage 2 (reminder: for 32 bit EL3, Secure PL1 is *EL3*, not EL1.) which we can give the following mmu indexes: 64 bit EL3: 0 : NS EL0 stage 1+2 1 : NS EL1 stage 1+2 2 : NS EL2 3 : S EL3 4 : S EL0 5 : S EL1 32 bit EL3: 0 : NS EL0 (aka NS PL0) stage 1+2 1 : NS EL1 (aka NS PL1) stage 1+2 2 : NS EL2 (aka NS PL2) 3 : S EL3 (aka S PL1) 4 : S EL0 (aka S PL0) Notice how they end up being the same, except that with a 64 bit EL3 we need an extra mmu index that 32 bit doesn't have. They aren't simply what is our current EL?, though as you can see I've put them in an order that comes close. So the right answer for NB_MMU_MODES is 6 :-) -- PMM
Re: [Qemu-devel] [PATCH] cpu_ldst.h: Don't define helpers if MMU_MODE*_SUFFIX not defined
On 01/16/2015 10:13 AM, Peter Maydell wrote: Not all targets define a full set of suffix strings for the NB_MMU_MODES that they have. In this situation, don't define any helper functions for that mode, rather than defining helper functions with no suffix at all. The MMU mode is still functional; it is merely not directly accessible via cpu_ld*_MODE from target helper functions. Also add an NB_MMU_MODES = 2 check to the definition of the mode 1 helpers -- some targets only define one MMU mode. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Reviewed-by: Richard Henderson r...@twiddle.net r~
Re: [Qemu-devel] [PATCH 00/15] Clean up cpu-ldst ld/st memory accessors
On 01/15/2015 07:01 AM, Peter Maydell wrote: I was looking at our confusing mess of memory accessor functions, and I realised that partly it was confusing because we have a bunch of unnecessary junk lurking in there :-) This series attempts to clean things up by removing things we weren't using at all or were only using by mistake in a few places: * ldul_*: not used * ld* (ldl, etc): hardly used * ld*_kernel: not used * ld*_raw: hardly used * cpu_{ld,st}{fq,fl}: not used The dull parts of this series are removing the unused macros and fixing uses of the hardly-used macros so those can be deleted too. This series also switches to using inline functions rather than macros for the user-only cpu_ld/st* accessors, bringing them into line with the softmmu configs. This has the nice side effect of letting us get rid of the _raw accessor macros too. I've also thrown in a commit which cleans up the doc comments. Peter Maydell (15): cpu_ldst.h: Remove unused ldul_ macros monitor.c: Use ld*_p() instead of ld*_raw() target-sparc: Don't use {ld,st}*_raw functions linux-user/elfload.c: Don't use _raw accessor functions bsd-user/elfload.c: Don't use ldl() or ldq_raw() linux-user/vm86.c: Use cpu_ldl_data c rather than plain ldl c linux-user/main.c (m68k): Use get_user_u16 rather than lduw in cpu_loop target-mips: Don't use _raw load/store accessors cpu_ldst.h: Drop unused ld/st*_kernel defines cpu_ldst.h: Remove unused very short ld*/st* defines cpu_ldst.h: Use inline functions for usermode cpu_ld/st accessors cpu_ldst_template.h: Use ld*_p directly rather than via ld*_raw macros cpu_ldst.h: Drop unused _raw macros, saddr() and laddr() cpu_ldst_template.h: Drop unused cpu_ldfq/stfq/ldfl/stfl accessors cpu_ldst.h, cpu-all.h, bswap.h: Update documentation on ld/st accessors Reviewed-by: Richard Henderson r...@twiddle.net Nice cleanup. r~
Re: [Qemu-devel] [PATCH v2] qemu-iotests: Fix supported_oses check
On 01/16/2015 08:23 AM, Stefan Hajnoczi wrote: On Fri, Jan 16, 2015 at 09:38:42AM +0800, Fam Zheng wrote: There is a bug in the recently added sys.platform test, and we no longer run python tests, because linux2 is the value to compare here. So do a prefix match. According to python doc [1], the way to use sys.platform is unless you want to test for a specific system version, it is therefore recommended to use the following idiom: if sys.platform.startswith('freebsd'): # FreeBSD-specific code here... elif sys.platform.startswith('linux'): # Linux-specific code here... [1]: https://docs.python.org/2.7/library/sys.html#sys.platform Signed-off-by: Fam Zheng f...@redhat.com --- v2: Don't use any(). Explain why prefix match is fine. (Thanks, Stefan) --- tests/qemu-iotests/iotests.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Thanks, applied to my block tree: https://github.com/stefanha/qemu/commits/block Stefan Thanks for this! fake_internet_karma++