Re: [Qemu-devel] [PATCH v3 4/5] Enable fw_cfg DMA interface for ARM
On 10/28/15 02:12, Gabriel L. Somlo wrote: > On Tue, Oct 27, 2015 at 01:43:39PM +0100, Laszlo Ersek wrote: >> On 10/27/15 12:11, Gerd Hoffmann wrote: >>> Hi, >>> > My hypothesis (which I guess I'm volunteering to verify, unless we > end up rejecting this immediately as a bad idea, for some reason that > I have missed), is that current functionality wouldn't change, given > the way existing callbacks work right now, and that we could run the > callback each time a blob is *selected*, rather than hooking into the > (dma/mmio/pio) read methods. Callback executed on first read only sounds okay to me, callback executed on selection... hm... don't like it. :) >>> >>> Care to explain why? >>> >>> I think callback on selection would be better. Interface is more clear >>> then, I don't like read having different behavior depending on hidden >>> state (current offset). >> >>> And in practice selection and read will always >>> be called together, >> >> This is what I think you cannot guarantee on the host side, without >> auditing all guest code. The behavior of callbacks has been specified >> under fw_cfg_add_file_callback(), in docs/specs/fw_cfg.txt, and guest >> code is allowed to work off that. >> >>> so there shouldn't be a difference in practice ... >> >> I guess I have no choice but to audit all QemuFwCfgSelectItem calls in >> edk2... >> >> Right, here's what I've had in the back of my mind: see the >> DetectSmbiosVersion() function in >> "OvmfPkg/Library/SmbiosVersionLib/DetectSmbiosVersionLib.c". It selects >> the key that belongs to the "etc/smbios/smbios-anchor" fw_cfg file, but >> the switch statement right after it can jump to the "default" label, and >> under that label *nothing* is read from fw_cfg. >> >> This is valid guest code according to the current specs. Its behavior >> would change (however obscurely) if there was a callback on the >> "etc/smbios/smbios-anchor" file, and the callback was executed on >> selection, not read. 
> > OK, but none of "etc/smbios/*" blobs actually have a callback at all. > > After some grepping, the only places inserting callback-equipped blobs > are: > > - hw/i386/acpi-build.c (via rom_add_blob or directly by calling > fw_cfg_add_file_callback) > > - files added are > "/etc/acpi/rsdp", > "/etc/acpi/tables", and > "/etc/table-loader". > > - all using the same callback: acpi_build_update() > > - hw/arm/virt-acpi-build.c (via rom_add_blob only) > > - same three files as on i386 > > - all using the same callback: virt_acpi_build_update() > > Both of these callbacks are a one-shot deal, i.e. they both > contain something along these lines: > > /* No state to update or already patched? Nothing to do. */ > if (!build_state || build_state->patched) { > return; > } > build_state->patched = 1; > > So, they do something *once* before the first byte is ever read, and > never again after that. > >> ... This one instance wouldn't be particularly hard to patch in edk2, >> but in general our specs are useless if we don't stick to them. > > OK, so I was proposing to amend the specs (now, while externally > visible behavior won't be affected), and *THEN* stick to them :) > > We're already giving up on the letter of the specs (right now, they > say once per byte read, but DMA is only doing once per chunk transfered, > which in practice amounts to once each time a whole blob is read). Good point. > Of course, if you (or anyone else with much more clue than me) expect > a future scenario where we'd need the opportunity to run the callback > more than once *before* reading anything from the blob, or (as is the > case with smbios) wish to select (but not read from) blobs, and the > blobs will be callback-enabled, but running the callback will be a bad > thing when no read follows, then by all means, let's stick with > hooking into each individual read operation. 
> > As it is right now, the ammended spec I'm proposing (if set, callback > runs on select, whether a read follows or not) is a NOP w.r.t. currently > visible behavior. It allows simplifying things, at the price of removing > theoretical future flexibility (but also unnecessary slowness as well). Okay. Please go ahead with the change, as far as I'm concerned. Thanks Laszlo > Thanks for helping me think this through ! > > --Gabriel >
Re: [Qemu-devel] [PATCH v3 4/5] Enable fw_cfg DMA interface for ARM
On 10/27/15 12:11, Gerd Hoffmann wrote: > Hi, > >>> My hypothesis (which I guess I'm volunteering to verify, unless we >>> end up rejecting this immediately as a bad idea, for some reason that >>> I have missed), is that current functionality wouldn't change, given >>> the way existing callbacks work right now, and that we could run the >>> callback each time a blob is *selected*, rather than hooking into the >>> (dma/mmio/pio) read methods. >> >> Callback executed on first read only sounds okay to me, callback >> executed on selection... hm... don't like it. :) > > Care to explain why? > > I think callback on selection would be better. Interface is more clear > then, I don't like read having different behavior depending on hidden > state (current offset). > And in practice selection and read will always > be called together, This is what I think you cannot guarantee on the host side, without auditing all guest code. The behavior of callbacks has been specified under fw_cfg_add_file_callback(), in docs/specs/fw_cfg.txt, and guest code is allowed to work off that. > so there shouldn't be a difference in practice ... I guess I have no choice but to audit all QemuFwCfgSelectItem calls in edk2... Right, here's what I've had in the back of my mind: see the DetectSmbiosVersion() function in "OvmfPkg/Library/SmbiosVersionLib/DetectSmbiosVersionLib.c". It selects the key that belongs to the "etc/smbios/smbios-anchor" fw_cfg file, but the switch statement right after it can jump to the "default" label, and under that label *nothing* is read from fw_cfg. This is valid guest code according to the current specs. Its behavior would change (however obscurely) if there was a callback on the "etc/smbios/smbios-anchor" file, and the callback was executed on selection, not read. ... This one instance wouldn't be particularly hard to patch in edk2, but in general our specs are useless if we don't stick to them. Thanks Laszlo
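The select-without-read concern above can be made concrete with a small C model. This is a sketch only, not QEMU or edk2 code: `enum cb_policy`, `struct model`, and `guest_probe()` are hypothetical names, and the guest pattern is a simplified stand-in for "select the item, then read it only if its size is one we recognise".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy switch; not a QEMU type. */
enum cb_policy { CB_ON_FIRST_READ, CB_ON_SELECT };

struct model {
    enum cb_policy policy;
    size_t offset;       /* current read offset within the selected item */
    unsigned cb_calls;   /* how many times the item's callback has run */
};

/* Selecting an item resets the offset; under CB_ON_SELECT it also
 * runs the callback, whether or not a read follows. */
static void model_select(struct model *m)
{
    assert(m != NULL);
    m->offset = 0;
    if (m->policy == CB_ON_SELECT) {
        m->cb_calls++;
    }
}

/* Reading a byte runs the callback only for the first byte, and only
 * under CB_ON_FIRST_READ. */
static void model_read_byte(struct model *m)
{
    assert(m != NULL);
    if (m->policy == CB_ON_FIRST_READ && m->offset == 0) {
        m->cb_calls++;
    }
    m->offset++;
}

/* Guest pattern: select, then read only when the size is the expected
 * one. Returns the number of callback invocations the host performed. */
static unsigned guest_probe(enum cb_policy policy, size_t item_size,
                            size_t expected_size)
{
    struct model m = { policy, 0, 0 };

    model_select(&m);
    if (item_size == expected_size) {
        for (size_t i = 0; i < item_size; i++) {
            model_read_byte(&m);
        }
    }
    return m.cb_calls;
}
```

When the guest does read the item, both policies run the callback once. When the guest selects and then bails out, only the on-select policy runs it, which is the observable difference under discussion.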
Re: [Qemu-devel] [PATCH v3 4/5] Enable fw_cfg DMA interface for ARM
Hi, > > My hypothesis (which I guess I'm volunteering to verify, unless we > > end up rejecting this immediately as a bad idea, for some reason that > > I have missed), is that current functionality wouldn't change, given > > the way existing callbacks work right now, and that we could run the > > callback each time a blob is *selected*, rather than hooking into the > > (dma/mmio/pio) read methods. > > Callback executed on first read only sounds okay to me, callback > executed on selection... hm... don't like it. :) Care to explain why? I think callback on selection would be better. Interface is more clear then, I don't like read having different behavior depending on hidden state (current offset). And in practice selection and read will always be called together, so there shouldn't be a difference in practice ... cheers, Gerd
Re: [Qemu-devel] [PATCH v3 4/5] Enable fw_cfg DMA interface for ARM
On Tue, Oct 27, 2015 at 01:43:39PM +0100, Laszlo Ersek wrote: > On 10/27/15 12:11, Gerd Hoffmann wrote: > > Hi, > > > >>> My hypothesis (which I guess I'm volunteering to verify, unless we > >>> end up rejecting this immediately as a bad idea, for some reason that > >>> I have missed), is that current functionality wouldn't change, given > >>> the way existing callbacks work right now, and that we could run the > >>> callback each time a blob is *selected*, rather than hooking into the > >>> (dma/mmio/pio) read methods. > >> > >> Callback executed on first read only sounds okay to me, callback > >> executed on selection... hm... don't like it. :) > > > > Care to explain why? > > > > I think callback on selection would be better. Interface is more clear > > then, I don't like read having different behavior depending on hidden > > state (current offset). > > > And in practice selection and read will always > > be called together, > > This is what I think you cannot guarantee on the host side, without > auditing all guest code. The behavior of callbacks has been specified > under fw_cfg_add_file_callback(), in docs/specs/fw_cfg.txt, and guest > code is allowed to work off that. > > > so there shouldn't be a difference in practice ... > > I guess I have no choice but to audit all QemuFwCfgSelectItem calls in > edk2... > > Right, here's what I've had in the back of my mind: see the > DetectSmbiosVersion() function in > "OvmfPkg/Library/SmbiosVersionLib/DetectSmbiosVersionLib.c". It selects > the key that belongs to the "etc/smbios/smbios-anchor" fw_cfg file, but > the switch statement right after it can jump to the "default" label, and > under that label *nothing* is read from fw_cfg. > > This is valid guest code according to the current specs. Its behavior > would change (however obscurely) if there was a callback on the > "etc/smbios/smbios-anchor" file, and the callback was executed on > selection, not read. 
OK, but none of the "etc/smbios/*" blobs actually have a callback at all.

After some grepping, the only places inserting callback-equipped blobs
are:

  - hw/i386/acpi-build.c (via rom_add_blob or directly by calling
    fw_cfg_add_file_callback)

      - files added are
          "/etc/acpi/rsdp",
          "/etc/acpi/tables", and
          "/etc/table-loader".

      - all using the same callback: acpi_build_update()

  - hw/arm/virt-acpi-build.c (via rom_add_blob only)

      - same three files as on i386

      - all using the same callback: virt_acpi_build_update()

Both of these callbacks are a one-shot deal, i.e. they both
contain something along these lines:

    /* No state to update or already patched? Nothing to do. */
    if (!build_state || build_state->patched) {
        return;
    }
    build_state->patched = 1;

So, they do something *once* before the first byte is ever read, and
never again after that.

> ... This one instance wouldn't be particularly hard to patch in edk2,
> but in general our specs are useless if we don't stick to them.

OK, so I was proposing to amend the specs (now, while externally
visible behavior won't be affected), and *THEN* stick to them :)

We're already giving up on the letter of the specs (right now, they
say once per byte read, but DMA is only doing once per chunk transferred,
which in practice amounts to once each time a whole blob is read).

Of course, if you (or anyone else with much more clue than me) expect
a future scenario where we'd need the opportunity to run the callback
more than once *before* reading anything from the blob, or (as is the
case with smbios) wish to select (but not read from) blobs, and the
blobs will be callback-enabled, but running the callback will be a bad
thing when no read follows, then by all means, let's stick with
hooking into each individual read operation.

As it is right now, the amended spec I'm proposing (if set, callback
runs on select, whether a read follows or not) is a NOP w.r.t. currently
visible behavior. It allows simplifying things, at the price of removing
theoretical future flexibility (but it also removes unnecessary slowness).

Thanks for helping me think this through!

--Gabriel
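The one-shot argument can be checked with a small stand-alone C model. It is a sketch under assumptions, not the QEMU callbacks themselves: `struct build_state` here carries only the `patched` flag quoted above plus a hypothetical `rebuilds` counter, and `build_update()` stands in for `acpi_build_update()` / `virt_acpi_build_update()`.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the real build state; 'rebuilds' is a
 * hypothetical counter added only to observe the behaviour. */
struct build_state {
    int patched;
    unsigned rebuilds;
};

/* Same guard as quoted in the mail: do the work once, then return early. */
static void build_update(void *opaque)
{
    struct build_state *bs = opaque;

    /* No state to update or already patched? Nothing to do. */
    if (!bs || bs->patched) {
        return;
    }
    bs->patched = 1;
    bs->rebuilds++;
}

/* Callback invoked for every byte of every pass over the blob. */
static unsigned rebuilds_per_byte(size_t blob_len, unsigned passes)
{
    struct build_state bs = { 0, 0 };

    for (unsigned p = 0; p < passes; p++) {
        for (size_t i = 0; i < blob_len; i++) {
            build_update(&bs);
        }
    }
    assert(bs.rebuilds <= 1);   /* one-shot invariant */
    return bs.rebuilds;
}

/* Callback invoked once per selection, regardless of reads. */
static unsigned rebuilds_on_select(unsigned selections)
{
    struct build_state bs = { 0, 0 };

    for (unsigned s = 0; s < selections; s++) {
        build_update(&bs);
    }
    assert(bs.rebuilds <= 1);   /* one-shot invariant */
    return bs.rebuilds;
}
```

With a guard of this shape, the amount of real work is the same under either invocation policy as long as the callback fires at least once; only the number of early-return calls differs.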
Re: [Qemu-devel] [PATCH v3 4/5] Enable fw_cfg DMA interface for ARM
On Thu, Oct 22, 2015 at 05:22:16PM -0400, Gabriel L. Somlo wrote: > I was re-reading the documentation for fw_cfg_add_file_callback(), > and noticed that non-dma read operations check for the presence > of a callback (and call it if present) for *every* *single* *byte*, > even on 64-bit MMIO reads. That's also what the documentation says > (in docs/specs/fw_cfg.txt, being moved into fw_cfg.h as per > http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05315.html). > > During DMA reads, however, the callback is only checked once before > each chunk, effectively once per DMA read operation. > > Now, typical callbacks I found throughout the qemu source tend to return > immediately except for the first time they're invoked, but I wonder if > skipping over all those extra "do I have a callback, if so call it, > mostly so it can return without doing anything" per-byte operations > account in some significant part for the dramatically faster transfers? > > Not sure how I'd test for that -- besides my not having anything > resembling a viable ARM setup, I'm not sure if limiting the callbacks > to only be invoked if (s->cur_offset == 0) would make sense, just as a > test ? I think Marc came to the conclusion that it's safe and therefore made that optimization for DMA. The same can be done for PIO. Stefan
Re: [Qemu-devel] [PATCH v3 4/5] Enable fw_cfg DMA interface for ARM
On Mon, Oct 26, 2015 at 02:38:11PM +0100, Laszlo Ersek wrote: > On 10/26/15 13:49, Gabriel L. Somlo wrote: > > On Mon, Oct 26, 2015 at 10:48:08AM +, Stefan Hajnoczi wrote: > >> On Thu, Oct 22, 2015 at 05:22:16PM -0400, Gabriel L. Somlo wrote: > >>> I was re-reading the documentation for fw_cfg_add_file_callback(), > >>> and noticed that non-dma read operations check for the presence > >>> of a callback (and call it if present) for *every* *single* *byte*, > >>> even on 64-bit MMIO reads. That's also what the documentation says > >>> (in docs/specs/fw_cfg.txt, being moved into fw_cfg.h as per > >>> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05315.html). > >>> > >>> During DMA reads, however, the callback is only checked once before > >>> each chunk, effectively once per DMA read operation. > >>> > >>> Now, typical callbacks I found throughout the qemu source tend to return > >>> immediately except for the first time they're invoked, but I wonder if > >>> skipping over all those extra "do I have a callback, if so call it, > >>> mostly so it can return without doing anything" per-byte operations > >>> account in some significant part for the dramatically faster transfers? > >>> > >>> Not sure how I'd test for that -- besides my not having anything > >>> resembling a viable ARM setup, I'm not sure if limiting the callbacks > >>> to only be invoked if (s->cur_offset == 0) would make sense, just as a > >>> test ? > >> > >> I think Marc came to the conclusion that it's safe and therefore made > >> that optimization for DMA. > >> > >> The same can be done for PIO. > > > > OK, so at the risk of over-reaching here, would it make sense to > > rewrite the fw_cfg spec to say "If present, a callback will be > > executed *once* before each time a blob is read" ? 
> > > > My hypothesis (which I guess I'm volunteering to verify, unless we > > end up rejecting this immediately as a bad idea, for some reason that > > I have missed), is that current functionality wouldn't change, given > > the way existing callbacks work right now, and that we could run the > > callback each time a blob is *selected*, rather than hooking into the > > (dma/mmio/pio) read methods. > > Callback executed on first read only sounds okay to me, callback > executed on selection... hm... don't like it. :) I figured there's different code paths for the different read methods, so instead of checking for (and calling) the callback in each of them, (and additionally looking at whether the current read offset is 0 if we're to only call it on first read only), I could maybe factor it out a bit further. Since the only reason you'd want select something is to then read from it, that sort-of made sense to me, at the time... :) I don't have strong feelings about it, though... :) Thanks, --Gabriel
Re: [Qemu-devel] [PATCH v3 4/5] Enable fw_cfg DMA interface for ARM
On 10/26/15 13:49, Gabriel L. Somlo wrote: > On Mon, Oct 26, 2015 at 10:48:08AM +, Stefan Hajnoczi wrote: >> On Thu, Oct 22, 2015 at 05:22:16PM -0400, Gabriel L. Somlo wrote: >>> I was re-reading the documentation for fw_cfg_add_file_callback(), >>> and noticed that non-dma read operations check for the presence >>> of a callback (and call it if present) for *every* *single* *byte*, >>> even on 64-bit MMIO reads. That's also what the documentation says >>> (in docs/specs/fw_cfg.txt, being moved into fw_cfg.h as per >>> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05315.html). >>> >>> During DMA reads, however, the callback is only checked once before >>> each chunk, effectively once per DMA read operation. >>> >>> Now, typical callbacks I found throughout the qemu source tend to return >>> immediately except for the first time they're invoked, but I wonder if >>> skipping over all those extra "do I have a callback, if so call it, >>> mostly so it can return without doing anything" per-byte operations >>> account in some significant part for the dramatically faster transfers? >>> >>> Not sure how I'd test for that -- besides my not having anything >>> resembling a viable ARM setup, I'm not sure if limiting the callbacks >>> to only be invoked if (s->cur_offset == 0) would make sense, just as a >>> test ? >> >> I think Marc came to the conclusion that it's safe and therefore made >> that optimization for DMA. >> >> The same can be done for PIO. > > OK, so at the risk of over-reaching here, would it make sense to > rewrite the fw_cfg spec to say "If present, a callback will be > executed *once* before each time a blob is read" ? 
> > My hypothesis (which I guess I'm volunteering to verify, unless we > end up rejecting this immediately as a bad idea, for some reason that > I have missed), is that current functionality wouldn't change, given > the way existing callbacks work right now, and that we could run the > callback each time a blob is *selected*, rather than hooking into the > (dma/mmio/pio) read methods. Callback executed on first read only sounds okay to me, callback executed on selection... hm... don't like it. :) Thanks Laszlo > > Thanks, > --Gabriel >
Re: [Qemu-devel] [PATCH v3 4/5] Enable fw_cfg DMA interface for ARM
On Mon, Oct 26, 2015 at 10:48:08AM +, Stefan Hajnoczi wrote: > On Thu, Oct 22, 2015 at 05:22:16PM -0400, Gabriel L. Somlo wrote: > > I was re-reading the documentation for fw_cfg_add_file_callback(), > > and noticed that non-dma read operations check for the presence > > of a callback (and call it if present) for *every* *single* *byte*, > > even on 64-bit MMIO reads. That's also what the documentation says > > (in docs/specs/fw_cfg.txt, being moved into fw_cfg.h as per > > http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05315.html). > > > > During DMA reads, however, the callback is only checked once before > > each chunk, effectively once per DMA read operation. > > > > Now, typical callbacks I found throughout the qemu source tend to return > > immediately except for the first time they're invoked, but I wonder if > > skipping over all those extra "do I have a callback, if so call it, > > mostly so it can return without doing anything" per-byte operations > > account in some significant part for the dramatically faster transfers? > > > > Not sure how I'd test for that -- besides my not having anything > > resembling a viable ARM setup, I'm not sure if limiting the callbacks > > to only be invoked if (s->cur_offset == 0) would make sense, just as a > > test ? > > I think Marc came to the conclusion that it's safe and therefore made > that optimization for DMA. > > The same can be done for PIO. OK, so at the risk of over-reaching here, would it make sense to rewrite the fw_cfg spec to say "If present, a callback will be executed *once* before each time a blob is read" ? My hypothesis (which I guess I'm volunteering to verify, unless we end up rejecting this immediately as a bad idea, for some reason that I have missed), is that current functionality wouldn't change, given the way existing callbacks work right now, and that we could run the callback each time a blob is *selected*, rather than hooking into the (dma/mmio/pio) read methods. Thanks, --Gabriel
Re: [Qemu-devel] [PATCH v3 4/5] Enable fw_cfg DMA interface for ARM
On Sat, 19 Sep 2015, Laszlo Ersek wrote: > Got some good news: with those two fixups in place (register block > size corrected, and dma_enabled set via device property), I could > test the AAVMF / ArmVirtPkg / > patches. > > On my APM Mustang, downloading a decompressed kernel (14,475,776 > bytes), a decompressed initrd (18,177,264), and a cmdline (104 bytes :)), > in total 32,653,144 bytes, takes approx. 24 seconds with the 8-byte wide > MMIO data register. (Yeah, it's *really* slow.) > > Using the DMA interface, the same takes about 52 milliseconds, and > that still includes one progress message per 1 MB downloaded :) > > It's a factor of approx. 450. Not bad. Not bad. :) So I've been catching up (after a several-week-long day-job related detour :) with the latest developments in fw_cfg -- and the DMA stuff looks good, and makes for a very educational read! I was re-reading the documentation for fw_cfg_add_file_callback(), and noticed that non-dma read operations check for the presence of a callback (and call it if present) for *every* *single* *byte*, even on 64-bit MMIO reads. That's also what the documentation says (in docs/specs/fw_cfg.txt, being moved into fw_cfg.h as per http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05315.html). During DMA reads, however, the callback is only checked once before each chunk, effectively once per DMA read operation. Now, typical callbacks I found throughout the qemu source tend to return immediately except for the first time they're invoked, but I wonder if skipping over all those extra "do I have a callback, if so call it, mostly so it can return without doing anything" per-byte operations account in some significant part for the dramatically faster transfers? Not sure how I'd test for that -- besides my not having anything resembling a viable ARM setup, I'm not sure if limiting the callbacks to only be invoked if (s->cur_offset == 0) would make sense, just as a test ? 
Either way, I'll send out a v2 of my fw_cfg function-call doc patch to
additionally say something like:

   * structure residing at key value FW_CFG_FILE_DIR, containing the item name,
   * data size, and assigned selector key value.
   * Additionally, set a callback function (and argument) to be called each
-  * time a byte is read by the guest from this particular item.
+  * time a byte is read by the guest from this particular item, or once per
+  * each DMA guest read operation.
   * NOTE: In addition to the opaque argument set here, the callback function
   * takes the current data offset as an additional argument, allowing it the
   * option of only acting upon specific offset values (e.g., 0, before the

Let me know what you all think...

Thanks much,
--Gabriel
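To get a feel for how many callback invocations each scheme implies, here is a rough C sketch. The function names are hypothetical and the chunk size is a free parameter; the thread does not state what chunk size a guest actually uses for DMA.

```c
#include <assert.h>
#include <stddef.h>

/* Non-DMA path as documented: the callback is invoked for every byte. */
static size_t calls_per_byte(size_t len)
{
    size_t calls = 0;

    for (size_t offset = 0; offset < len; offset++) {
        calls++;
    }
    return calls;
}

/* DMA path: the callback is invoked once before each chunk. */
static size_t calls_per_chunk(size_t len, size_t chunk)
{
    size_t calls = 0;

    assert(chunk > 0);
    while (len > 0) {
        size_t n = len < chunk ? len : chunk;

        calls++;
        len -= n;
    }
    return calls;
}

/* The experiment floated above: invoke only when the offset is 0. */
static size_t calls_at_offset_zero(size_t len)
{
    size_t calls = 0;

    for (size_t offset = 0; offset < len; offset++) {
        if (offset == 0) {
            calls++;
        }
    }
    return calls;
}
```

For the 32,653,144 bytes mentioned earlier and an assumed 1 MiB chunk, `calls_per_chunk()` yields 32 invocations against 32,653,144 for the per-byte path, while the offset-zero variant yields one per blob.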
Re: [Qemu-devel] [PATCH v3 4/5] Enable fw_cfg DMA interface for ARM
On Sat, 19 Sep 2015 01:10:46 +0200 Laszlo Ersekwrote: > On 09/18/15 22:24, Marc Marí wrote: > > On Fri, 18 Sep 2015 22:16:46 +0200 > > Laszlo Ersek wrote: > > > >> On 09/18/15 10:58, Marc Marí wrote: > >>> Enable the fw_cfg DMA interface for the ARM virt machine. > >>> > >>> Based on Gerd Hoffman's initial implementation. > >>> > >>> Signed-off-by: Marc Marí > >>> --- > >>> hw/arm/virt.c | 9 + > >>> 1 file changed, 5 insertions(+), 4 deletions(-) > >>> > >>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c > >>> index 3568107..47f4ad3 100644 > >>> --- a/hw/arm/virt.c > >>> +++ b/hw/arm/virt.c > >>> @@ -113,7 +113,7 @@ static const MemMapEntry a15memmap[] = { > >>> [VIRT_GIC_V2M] ={ 0x0802, 0x1000 }, > >>> [VIRT_UART] = { 0x0900, 0x1000 }, > >>> [VIRT_RTC] ={ 0x0901, 0x1000 }, > >>> -[VIRT_FW_CFG] = { 0x0902, 0x000a }, > >>> +[VIRT_FW_CFG] = { 0x0902, 0x0014 }, > >> > >> Okay, Laszlo is the hateful reviewer. Sorry about that. I'm late, > >> yes. > >> > >> But: this says 0x0014, ie 20 bytes in decimal. I don't think > >> that's correct; it should be 0x18 -- 24 bytes in decimal. From > >> patch #2: "DMA Address address: Base + 16 (8 bytes)". > > > > It's not your problem if I don't know how to count. So don't > > apologize :). > > > > And it's better to catch this stupid little mistakes now. > > Got some good news: with those two fixups in place (register block > size corrected, and dma_enabled set via device property), I could > test the AAVMF / ArmVirtPkg / > patches. > > On my APM Mustang, downloading a decompressed kernel (14,475,776 > bytes), a decompressed initrd (18,177,264), and a cmdline (104 > bytes :)), in total 32,653,144 bytes, takes approx. 24 seconds with > the 8-byte wide MMIO data register. (Yeah, it's *really* slow.) > > Using the DMA interface, the same takes about 52 milliseconds, and > that still includes one progress message per 1 MB downloaded :) > > It's a factor of approx. 450. Not bad. Not bad. :) Not bad. Not bad :). 
In x86 the speedup is high but not so brutal. I'm really happy that it works so well. Thanks Marc > Thanks > Laszlo > > > > Thanks > > Marc > > > >> Thanks (and I'm sorry about being late!) > >> Laszlo > >> > >>> [VIRT_MMIO] = { 0x0a00, 0x0200 }, > >>> /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of > >>> that size */ [VIRT_PLATFORM_BUS] = { 0x0c00, > >>> 0x0200 }, @@ -651,13 +651,13 @@ static void > >>> create_flash(const VirtBoardInfo *vbi) g_free(nodename); > >>> } > >>> > >>> -static void create_fw_cfg(const VirtBoardInfo *vbi) > >>> +static void create_fw_cfg(const VirtBoardInfo *vbi, AddressSpace > >>> *as) { > >>> hwaddr base = vbi->memmap[VIRT_FW_CFG].base; > >>> hwaddr size = vbi->memmap[VIRT_FW_CFG].size; > >>> char *nodename; > >>> > >>> -fw_cfg_init_mem_wide(base + 8, base, 8, 0, NULL); > >>> +fw_cfg_init_mem_wide(base + 8, base, 8, base + 16, as); > >>> > >>> nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base); > >>> qemu_fdt_add_subnode(vbi->fdt, nodename); > >>> @@ -919,6 +919,7 @@ static void machvirt_init(MachineState > >>> *machine) > >>> create_fdt(vbi); > >>> > >>> + > >>> for (n = 0; n < smp_cpus; n++) { > >>> ObjectClass *oc = cpu_class_by_name(TYPE_ARM_CPU, > >>> cpustr[0]); CPUClass *cc = CPU_CLASS(oc); > >>> @@ -984,7 +985,7 @@ static void machvirt_init(MachineState > >>> *machine) */ > >>> create_virtio_devices(vbi, pic); > >>> > >>> -create_fw_cfg(vbi); > >>> +create_fw_cfg(vbi, _space_memory); > >>> rom_set_fw(fw_cfg_find()); > >>> > >>> guest_info->smp_cpus = smp_cpus; > >>> > >> > > >
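The figures quoted above can be re-derived with a few lines of C. This is plain arithmetic on the numbers in the mail (sizes in bytes, 24 seconds, 52 milliseconds); the function names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Kernel + initrd + command line, in bytes, as reported in the mail. */
static uint64_t total_bytes(void)
{
    return 14475776ULL + 18177264ULL + 104ULL;
}

/* Integer ratio between the slow and the fast transfer time. */
static uint64_t speedup_factor(uint64_t slow_ms, uint64_t fast_ms)
{
    assert(fast_ms > 0);
    return slow_ms / fast_ms;
}

/* Average throughput in bytes per second, rounded down. */
static uint64_t bytes_per_second(uint64_t bytes, uint64_t elapsed_ms)
{
    assert(elapsed_ms > 0);
    return bytes * 1000 / elapsed_ms;
}
```

`speedup_factor(24000, 52)` gives 461, in line with the "approx. 450" quoted, and the MMIO path works out to roughly 1.36 MB/s.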
[Qemu-devel] [PATCH v3 4/5] Enable fw_cfg DMA interface for ARM
Enable the fw_cfg DMA interface for the ARM virt machine.

Based on Gerd Hoffman's initial implementation.

Signed-off-by: Marc Marí
---
 hw/arm/virt.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 3568107..47f4ad3 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -113,7 +113,7 @@ static const MemMapEntry a15memmap[] = {
     [VIRT_GIC_V2M] =            { 0x0802, 0x1000 },
     [VIRT_UART] =               { 0x0900, 0x1000 },
     [VIRT_RTC] =                { 0x0901, 0x1000 },
-    [VIRT_FW_CFG] =             { 0x0902, 0x000a },
+    [VIRT_FW_CFG] =             { 0x0902, 0x0014 },
     [VIRT_MMIO] =               { 0x0a00, 0x0200 },
     /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
     [VIRT_PLATFORM_BUS] =       { 0x0c00, 0x0200 },
@@ -651,13 +651,13 @@ static void create_flash(const VirtBoardInfo *vbi)
     g_free(nodename);
 }
 
-static void create_fw_cfg(const VirtBoardInfo *vbi)
+static void create_fw_cfg(const VirtBoardInfo *vbi, AddressSpace *as)
 {
     hwaddr base = vbi->memmap[VIRT_FW_CFG].base;
     hwaddr size = vbi->memmap[VIRT_FW_CFG].size;
     char *nodename;
 
-    fw_cfg_init_mem_wide(base + 8, base, 8, 0, NULL);
+    fw_cfg_init_mem_wide(base + 8, base, 8, base + 16, as);
 
     nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base);
     qemu_fdt_add_subnode(vbi->fdt, nodename);
@@ -919,6 +919,7 @@ static void machvirt_init(MachineState *machine)
 
     create_fdt(vbi);
 
+
     for (n = 0; n < smp_cpus; n++) {
         ObjectClass *oc = cpu_class_by_name(TYPE_ARM_CPU, cpustr[0]);
         CPUClass *cc = CPU_CLASS(oc);
@@ -984,7 +985,7 @@ static void machvirt_init(MachineState *machine)
      */
     create_virtio_devices(vbi, pic);
 
-    create_fw_cfg(vbi);
+    create_fw_cfg(vbi, &address_space_memory);
     rom_set_fw(fw_cfg_find());
 
     guest_info->smp_cpus = smp_cpus;
-- 
2.4.3
Re: [Qemu-devel] [PATCH v3 4/5] Enable fw_cfg DMA interface for ARM
On Fri, 18 Sep 2015 22:16:46 +0200
Laszlo Ersek wrote:

> On 09/18/15 10:58, Marc Marí wrote:
> > Enable the fw_cfg DMA interface for the ARM virt machine.
> >
> > Based on Gerd Hoffman's initial implementation.
> >
> > Signed-off-by: Marc Marí
> > ---
> >  hw/arm/virt.c | 9 +
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 3568107..47f4ad3 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -113,7 +113,7 @@ static const MemMapEntry a15memmap[] = {
> >      [VIRT_GIC_V2M] =            { 0x0802, 0x1000 },
> >      [VIRT_UART] =               { 0x0900, 0x1000 },
> >      [VIRT_RTC] =                { 0x0901, 0x1000 },
> > -    [VIRT_FW_CFG] =             { 0x0902, 0x000a },
> > +    [VIRT_FW_CFG] =             { 0x0902, 0x0014 },
>
> Okay, Laszlo is the hateful reviewer. Sorry about that. I'm late, yes.
>
> But: this says 0x0014, ie 20 bytes in decimal. I don't think
> that's correct; it should be 0x18 -- 24 bytes in decimal. From patch
> #2: "DMA Address address: Base + 16 (8 bytes)".

It's not your problem if I don't know how to count. So don't
apologize :).

And it's better to catch these stupid little mistakes now.

Thanks
Marc

> Thanks (and I'm sorry about being late!)
> Laszlo
>
> >      [VIRT_MMIO] =               { 0x0a00, 0x0200 },
> >      /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
> >      [VIRT_PLATFORM_BUS] =       { 0x0c00, 0x0200 },
> > @@ -651,13 +651,13 @@ static void create_flash(const VirtBoardInfo *vbi)
> >      g_free(nodename);
> >  }
> >
> > -static void create_fw_cfg(const VirtBoardInfo *vbi)
> > +static void create_fw_cfg(const VirtBoardInfo *vbi, AddressSpace *as)
> >  {
> >      hwaddr base = vbi->memmap[VIRT_FW_CFG].base;
> >      hwaddr size = vbi->memmap[VIRT_FW_CFG].size;
> >      char *nodename;
> >
> > -    fw_cfg_init_mem_wide(base + 8, base, 8, 0, NULL);
> > +    fw_cfg_init_mem_wide(base + 8, base, 8, base + 16, as);
> >
> >      nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base);
> >      qemu_fdt_add_subnode(vbi->fdt, nodename);
> > @@ -919,6 +919,7 @@ static void machvirt_init(MachineState *machine)
> >
> >      create_fdt(vbi);
> >
> > +
> >      for (n = 0; n < smp_cpus; n++) {
> >          ObjectClass *oc = cpu_class_by_name(TYPE_ARM_CPU, cpustr[0]);
> >          CPUClass *cc = CPU_CLASS(oc);
> > @@ -984,7 +985,7 @@ static void machvirt_init(MachineState *machine)
> >       */
> >      create_virtio_devices(vbi, pic);
> >
> > -    create_fw_cfg(vbi);
> > +    create_fw_cfg(vbi, &address_space_memory);
> >      rom_set_fw(fw_cfg_find());
> >
> >      guest_info->smp_cpus = smp_cpus;
> >
>
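The size correction (0x18 rather than 0x14) follows from the register offsets. Below is a minimal C sketch of that arithmetic. The data and DMA offsets come from the `fw_cfg_init_mem_wide(base + 8, base, 8, base + 16, as)` call and the "Base + 16 (8 bytes)" quote; the 2-byte width of the control register is an assumption inferred from the pre-DMA block size of 0x0a, and all `SKETCH_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    SKETCH_DATA_OFF = 0,  SKETCH_DATA_SIZE = 8,  /* data register at base      */
    SKETCH_CTL_OFF  = 8,  SKETCH_CTL_SIZE  = 2,  /* control at base + 8 (width assumed) */
    SKETCH_DMA_OFF  = 16, SKETCH_DMA_SIZE  = 8   /* DMA address at base + 16   */
};

/* Smallest register block that covers every register in use. */
static uint64_t block_size(int with_dma)
{
    uint64_t end = SKETCH_CTL_OFF + SKETCH_CTL_SIZE;

    /* The layout is only meaningful if the DMA register does not
     * overlap the control register. */
    assert(SKETCH_DMA_OFF >= SKETCH_CTL_OFF + SKETCH_CTL_SIZE);

    if (with_dma) {
        end = SKETCH_DMA_OFF + SKETCH_DMA_SIZE;
    }
    return end;
}
```

Without DMA the block ends at 0x0a; with the 8-byte DMA register at offset 16 it ends at 0x18, so a size of 0x14 would cut the DMA register in half.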
Re: [Qemu-devel] [PATCH v3 4/5] Enable fw_cfg DMA interface for ARM
On 09/18/15 10:58, Marc Marí wrote:
> Enable the fw_cfg DMA interface for the ARM virt machine.
>
> Based on Gerd Hoffman's initial implementation.
>
> Signed-off-by: Marc Marí
> ---
>  hw/arm/virt.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 3568107..47f4ad3 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -113,7 +113,7 @@ static const MemMapEntry a15memmap[] = {
>      [VIRT_GIC_V2M] = { 0x0802, 0x1000 },
>      [VIRT_UART] = { 0x0900, 0x1000 },
>      [VIRT_RTC] = { 0x0901, 0x1000 },
> -    [VIRT_FW_CFG] = { 0x0902, 0x000a },
> +    [VIRT_FW_CFG] = { 0x0902, 0x0014 },
>      [VIRT_MMIO] = { 0x0a00, 0x0200 },
>      /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
>      [VIRT_PLATFORM_BUS] = { 0x0c00, 0x0200 },
> @@ -651,13 +651,13 @@ static void create_flash(const VirtBoardInfo *vbi)
>      g_free(nodename);
>  }
>
> -static void create_fw_cfg(const VirtBoardInfo *vbi)
> +static void create_fw_cfg(const VirtBoardInfo *vbi, AddressSpace *as)
>  {
>      hwaddr base = vbi->memmap[VIRT_FW_CFG].base;
>      hwaddr size = vbi->memmap[VIRT_FW_CFG].size;
>      char *nodename;
>
> -    fw_cfg_init_mem_wide(base + 8, base, 8, 0, NULL);
> +    fw_cfg_init_mem_wide(base + 8, base, 8, base + 16, as);
>
>      nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base);
>      qemu_fdt_add_subnode(vbi->fdt, nodename);
> @@ -919,6 +919,7 @@ static void machvirt_init(MachineState *machine)
>
>      create_fdt(vbi);
>
> +
>      for (n = 0; n < smp_cpus; n++) {
>          ObjectClass *oc = cpu_class_by_name(TYPE_ARM_CPU, cpustr[0]);
>          CPUClass *cc = CPU_CLASS(oc);
> @@ -984,7 +985,7 @@ static void machvirt_init(MachineState *machine)
>       */
>      create_virtio_devices(vbi, pic);
>
> -    create_fw_cfg(vbi);
> +    create_fw_cfg(vbi, &address_space_memory);
>      rom_set_fw(fw_cfg_find());
>
>      guest_info->smp_cpus = smp_cpus;
>

I got excited that the work got this far (thanks a lot for it, and I
apologize for falling behind on the review), so I wanted to start writing
the edk2 / ArmVirtPkg client code for it.

I applied your v3 series on top of current master
(b12a84ce3c27e42c8f51c436aa196938d5cc2c71). First I wanted to see a new
DTB:

  $ qemu-system-aarch64 -machine virt,dumpdtb=xx.dtb

Unfortunately it crashes with a failed assertion:

  qemu-system-aarch64: hw/core/sysbus.c:130: sysbus_mmio_map_common:
  Assertion `n >= 0 && n < dev->num_mmio' failed.

The problem is that you have a third (conditional) sysbus_mmio_map() in
fw_cfg_init_mem_wide(), from patch #3, which would depend on the
similarly conditional sysbus_init_mmio() call in fw_cfg_mem_realize().

However, that prerequisite sysbus_init_mmio() is never executed in
fw_cfg_mem_realize(), because it would depend on the
(FW_CFG(s)->dma_enabled) field, which at that point *cannot* have been
set at all.

So you have to set it through a property, because that's the only way
you can pass it to the realize method. Please squash the following patch
into patch #3:

> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> index d11d8c5..946abb5 100644
> --- a/hw/nvram/fw_cfg.c
> +++ b/hw/nvram/fw_cfg.c
> @@ -799,9 +799,11 @@ FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr,
>      SysBusDevice *sbd;
>      FWCfgState *s;
>      uint32_t version = FW_CFG_VERSION;
> +    bool dma_enabled = dma_addr && dma_as;
>
>      dev = qdev_create(NULL, TYPE_FW_CFG_MEM);
>      qdev_prop_set_uint32(dev, "data_width", data_width);
> +    qdev_prop_set_bit(dev, "dma_enabled", dma_enabled);
>
>      fw_cfg_init1(dev);
>
> @@ -811,9 +813,8 @@ FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr,
>
>      s = FW_CFG(dev);
>
> -    if (dma_addr && dma_as) {
> +    if (dma_enabled) {
>          s->dma_as = dma_as;
> -        s->dma_enabled = true;
>          s->dma_addr = 0;
>          sysbus_mmio_map(sbd, 2, dma_addr);
>          version |= FW_CFG_VERSION_DMA;
> @@ -891,6 +892,8 @@ static const TypeInfo fw_cfg_io_info = {
>
>  static Property fw_cfg_mem_properties[] = {
>      DEFINE_PROP_UINT32("data_width", FWCfgMemState, data_width, -1),
> +    DEFINE_PROP_BOOL("dma_enabled", FWCfgMemState, parent_obj.dma_enabled,
> +                     false),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>

Thanks
Laszlo
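The ordering problem described above can be illustrated with a small toy model. This is only a sketch and does not use QEMU's real types or functions: `ToyFwCfg`, `toy_realize()` and `toy_can_map()` are hypothetical stand-ins for the device state, the realize step that registers MMIO regions, and the bounds check behind the failed assertion. It assumes, as the report implies, that realize runs before the code that would set the flag directly on the device state.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device state; not QEMU's FWCfgState. */
typedef struct {
    bool dma_enabled; /* must be known by the time realize runs */
    int num_mmio;     /* number of MMIO regions registered at realize */
} ToyFwCfg;

/* Realize registers two regions always, and a third only if DMA is on. */
void toy_realize(ToyFwCfg *dev)
{
    dev->num_mmio = 2;
    if (dev->dma_enabled) {
        dev->num_mmio++;
    }
}

/* Same shape as the check in the failed assertion: 0 <= n < num_mmio. */
bool toy_can_map(const ToyFwCfg *dev, int n)
{
    return n >= 0 && n < dev->num_mmio;
}

/* Broken ordering: the flag is set only after realize has already run. */
bool toy_map_dma_flag_after_realize(void)
{
    ToyFwCfg dev = { false, 0 };
    toy_realize(&dev);
    dev.dma_enabled = true; /* too late, region 2 was never registered */
    return toy_can_map(&dev, 2);
}

/* Fixed ordering: the flag is set up front, the way a property would be. */
bool toy_map_dma_flag_before_realize(void)
{
    ToyFwCfg dev = { true, 0 };
    toy_realize(&dev);
    return toy_can_map(&dev, 2);
}
```

In this model, mapping region index 2 is only valid when the flag was in place before realize, which is the role the "dma_enabled" property plays in the squash patch.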
Re: [Qemu-devel] [PATCH v3 4/5] Enable fw_cfg DMA interface for ARM
On 09/18/15 10:58, Marc Marí wrote:
> Enable the fw_cfg DMA interface for the ARM virt machine.
>
> Based on Gerd Hoffman's initial implementation.
>
> Signed-off-by: Marc Marí
> ---
>  hw/arm/virt.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 3568107..47f4ad3 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -113,7 +113,7 @@ static const MemMapEntry a15memmap[] = {
>      [VIRT_GIC_V2M] = { 0x0802, 0x1000 },
>      [VIRT_UART] = { 0x0900, 0x1000 },
>      [VIRT_RTC] = { 0x0901, 0x1000 },
> -    [VIRT_FW_CFG] = { 0x0902, 0x000a },
> +    [VIRT_FW_CFG] = { 0x0902, 0x0014 },

Okay, Laszlo is the hateful reviewer. Sorry about that. I'm late, yes.

But: this says 0x0014, ie 20 bytes in decimal. I don't think that's
correct; it should be 0x18 -- 24 bytes in decimal. From patch #2: "DMA
Address address: Base + 16 (8 bytes)".

Thanks (and I'm sorry about being late!)
Laszlo

>      [VIRT_MMIO] = { 0x0a00, 0x0200 },
>      /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
>      [VIRT_PLATFORM_BUS] = { 0x0c00, 0x0200 },
> @@ -651,13 +651,13 @@ static void create_flash(const VirtBoardInfo *vbi)
>      g_free(nodename);
>  }
>
> -static void create_fw_cfg(const VirtBoardInfo *vbi)
> +static void create_fw_cfg(const VirtBoardInfo *vbi, AddressSpace *as)
>  {
>      hwaddr base = vbi->memmap[VIRT_FW_CFG].base;
>      hwaddr size = vbi->memmap[VIRT_FW_CFG].size;
>      char *nodename;
>
> -    fw_cfg_init_mem_wide(base + 8, base, 8, 0, NULL);
> +    fw_cfg_init_mem_wide(base + 8, base, 8, base + 16, as);
>
>      nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base);
>      qemu_fdt_add_subnode(vbi->fdt, nodename);
> @@ -919,6 +919,7 @@ static void machvirt_init(MachineState *machine)
>
>      create_fdt(vbi);
>
> +
>      for (n = 0; n < smp_cpus; n++) {
>          ObjectClass *oc = cpu_class_by_name(TYPE_ARM_CPU, cpustr[0]);
>          CPUClass *cc = CPU_CLASS(oc);
> @@ -984,7 +985,7 @@ static void machvirt_init(MachineState *machine)
>       */
>      create_virtio_devices(vbi, pic);
>
> -    create_fw_cfg(vbi);
> +    create_fw_cfg(vbi, &address_space_memory);
>      rom_set_fw(fw_cfg_find());
>
>      guest_info->smp_cpus = smp_cpus;
>
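The size arithmetic in the review comment can be checked with a short sketch. The register offsets come from the quoted `fw_cfg_init_mem_wide(base + 8, base, 8, base + 16, as)` call and the "Base + 16 (8 bytes)" note; the 2-byte selector width is an assumption chosen to match the old 0x0a size, and `RegSpan` and the function names are hypothetical helpers, not QEMU code.

```c
#include <stddef.h>
#include <stdint.h>

/* One register in the block: byte offset from the base, width in bytes. */
typedef struct {
    uint64_t offset;
    uint64_t width;
} RegSpan;

/* Smallest block size that covers every register: max(offset + width). */
uint64_t reg_block_size(const RegSpan *regs, size_t count)
{
    uint64_t size = 0;
    for (size_t i = 0; i < count; i++) {
        uint64_t end = regs[i].offset + regs[i].width;
        if (end > size) {
            size = end;
        }
    }
    return size;
}

/* Before the patch: data register at base (8 bytes), selector at
 * base + 8 (assumed 2 bytes wide). */
uint64_t fw_cfg_size_without_dma(void)
{
    const RegSpan regs[] = { { 0, 8 }, { 8, 2 } };
    return reg_block_size(regs, sizeof(regs) / sizeof(regs[0]));
}

/* With the DMA address register added at base + 16 (8 bytes). */
uint64_t fw_cfg_size_with_dma(void)
{
    const RegSpan regs[] = { { 0, 8 }, { 8, 2 }, { 16, 8 } };
    return reg_block_size(regs, sizeof(regs) / sizeof(regs[0]));
}
```

Under these assumptions the block ends at offset 24 (0x18) once the DMA register is included, so a 20-byte (0x14) region would cut off the last four bytes of that register.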
Re: [Qemu-devel] [PATCH v3 4/5] Enable fw_cfg DMA interface for ARM
On 09/18/15 22:24, Marc Marí wrote:
> On Fri, 18 Sep 2015 22:16:46 +0200
> Laszlo Ersek wrote:
>
>> On 09/18/15 10:58, Marc Marí wrote:
>>> Enable the fw_cfg DMA interface for the ARM virt machine.
>>>
>>> Based on Gerd Hoffman's initial implementation.
>>>
>>> Signed-off-by: Marc Marí
>>> ---
>>>  hw/arm/virt.c | 9 +
>>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>> index 3568107..47f4ad3 100644
>>> --- a/hw/arm/virt.c
>>> +++ b/hw/arm/virt.c
>>> @@ -113,7 +113,7 @@ static const MemMapEntry a15memmap[] = {
>>>      [VIRT_GIC_V2M] = { 0x0802, 0x1000 },
>>>      [VIRT_UART] = { 0x0900, 0x1000 },
>>>      [VIRT_RTC] = { 0x0901, 0x1000 },
>>> -    [VIRT_FW_CFG] = { 0x0902, 0x000a },
>>> +    [VIRT_FW_CFG] = { 0x0902, 0x0014 },
>>
>> Okay, Laszlo is the hateful reviewer. Sorry about that. I'm late, yes.
>>
>> But: this says 0x0014, ie 20 bytes in decimal. I don't think
>> that's correct; it should be 0x18 -- 24 bytes in decimal. From patch
>> #2: "DMA Address address: Base + 16 (8 bytes)".
>
> It's not your problem if I don't know how to count. So don't
> apologize :).
>
> And it's better to catch these stupid little mistakes now.

Got some good news: with those two fixups in place (register block size
corrected, and dma_enabled set via device property), I could test the
AAVMF / ArmVirtPkg / patches.

On my APM Mustang, downloading a decompressed kernel (14,475,776 bytes),
a decompressed initrd (18,177,264), and a cmdline (104 bytes :)), in
total 32,653,144 bytes, takes approx. 24 seconds with the 8-byte wide
MMIO data register. (Yeah, it's *really* slow.)

Using the DMA interface, the same takes about 52 milliseconds, and that
still includes one progress message per 1 MB downloaded :)

It's a factor of approx. 450. Not bad. Not bad. :)

Thanks
Laszlo

> Thanks
> Marc
>
>> Thanks (and I'm sorry about being late!)
>> Laszlo
>>
>>>      [VIRT_MMIO] = { 0x0a00, 0x0200 },
>>>      /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
>>>      [VIRT_PLATFORM_BUS] = { 0x0c00, 0x0200 },
>>> @@ -651,13 +651,13 @@ static void create_flash(const VirtBoardInfo *vbi)
>>>      g_free(nodename);
>>>  }
>>>
>>> -static void create_fw_cfg(const VirtBoardInfo *vbi)
>>> +static void create_fw_cfg(const VirtBoardInfo *vbi, AddressSpace *as)
>>>  {
>>>      hwaddr base = vbi->memmap[VIRT_FW_CFG].base;
>>>      hwaddr size = vbi->memmap[VIRT_FW_CFG].size;
>>>      char *nodename;
>>>
>>> -    fw_cfg_init_mem_wide(base + 8, base, 8, 0, NULL);
>>> +    fw_cfg_init_mem_wide(base + 8, base, 8, base + 16, as);
>>>
>>>      nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base);
>>>      qemu_fdt_add_subnode(vbi->fdt, nodename);
>>> @@ -919,6 +919,7 @@ static void machvirt_init(MachineState *machine)
>>>
>>>      create_fdt(vbi);
>>>
>>> +
>>>      for (n = 0; n < smp_cpus; n++) {
>>>          ObjectClass *oc = cpu_class_by_name(TYPE_ARM_CPU, cpustr[0]);
>>>          CPUClass *cc = CPU_CLASS(oc);
>>> @@ -984,7 +985,7 @@ static void machvirt_init(MachineState *machine)
>>>       */
>>>      create_virtio_devices(vbi, pic);
>>>
>>> -    create_fw_cfg(vbi);
>>> +    create_fw_cfg(vbi, &address_space_memory);
>>>      rom_set_fw(fw_cfg_find());
>>>
>>>      guest_info->smp_cpus = smp_cpus;
>>>
>>
>
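The timing figures quoted in the message can be cross-checked with a few lines of arithmetic. This is only a sketch that reuses the numbers from the report (byte counts, 24 seconds, 52 milliseconds); the function names are hypothetical and nothing here measures anything.

```c
#include <stdint.h>

/* Kernel + initrd + command line, using the byte counts from the report. */
uint64_t transfer_total_bytes(void)
{
    return 14475776ULL + 18177264ULL + 104ULL;
}

/* How many times faster the fast path is than the slow path. */
double speedup_factor(double slow_seconds, double fast_seconds)
{
    return slow_seconds / fast_seconds;
}

/* Throughput in MiB/s for a transfer of the given size and duration. */
double mib_per_second(uint64_t bytes, double seconds)
{
    return (double)bytes / (1024.0 * 1024.0) / seconds;
}
```

With the reported inputs, `speedup_factor(24.0, 0.052)` comes out a little above 460, consistent with the "approx. 450" estimate, and the throughput goes from roughly 1.3 MiB/s over the MMIO data register to roughly 600 MiB/s over DMA.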
Re: [Qemu-devel] [PATCH v3 4/5] Enable fw_cfg DMA interface for ARM
On 18 September 2015 at 09:58, Marc Marí wrote:
> Enable the fw_cfg DMA interface for the ARM virt machine.
>
> Based on Gerd Hoffman's initial implementation.
>
> Signed-off-by: Marc Marí
> ---
>  hw/arm/virt.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 3568107..47f4ad3 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -113,7 +113,7 @@ static const MemMapEntry a15memmap[] = {
>      [VIRT_GIC_V2M] = { 0x0802, 0x1000 },
>      [VIRT_UART] = { 0x0900, 0x1000 },
>      [VIRT_RTC] = { 0x0901, 0x1000 },
> -    [VIRT_FW_CFG] = { 0x0902, 0x000a },
> +    [VIRT_FW_CFG] = { 0x0902, 0x0014 },
>      [VIRT_MMIO] = { 0x0a00, 0x0200 },
>      /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
>      [VIRT_PLATFORM_BUS] = { 0x0c00, 0x0200 },
> @@ -651,13 +651,13 @@ static void create_flash(const VirtBoardInfo *vbi)
>      g_free(nodename);
>  }
>
> -static void create_fw_cfg(const VirtBoardInfo *vbi)
> +static void create_fw_cfg(const VirtBoardInfo *vbi, AddressSpace *as)
>  {
>      hwaddr base = vbi->memmap[VIRT_FW_CFG].base;
>      hwaddr size = vbi->memmap[VIRT_FW_CFG].size;
>      char *nodename;
>
> -    fw_cfg_init_mem_wide(base + 8, base, 8, 0, NULL);
> +    fw_cfg_init_mem_wide(base + 8, base, 8, base + 16, as);
>
>      nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base);
>      qemu_fdt_add_subnode(vbi->fdt, nodename);
> @@ -919,6 +919,7 @@ static void machvirt_init(MachineState *machine)
>
>      create_fdt(vbi);
>
> +

Stray whitespace change.

>      for (n = 0; n < smp_cpus; n++) {
>          ObjectClass *oc = cpu_class_by_name(TYPE_ARM_CPU, cpustr[0]);
>          CPUClass *cc = CPU_CLASS(oc);
> @@ -984,7 +985,7 @@ static void machvirt_init(MachineState *machine)
>       */
>      create_virtio_devices(vbi, pic);
>
> -    create_fw_cfg(vbi);
> +    create_fw_cfg(vbi, &address_space_memory);
>      rom_set_fw(fw_cfg_find());
>
>      guest_info->smp_cpus = smp_cpus;
> --
> 2.4.3

Otherwise:
Reviewed-by: Peter Maydell

thanks
-- PMM