Peter Xu <pet...@redhat.com> writes:
> Hi, Alex,
>
> On Thu, Nov 10, 2022 at 05:55:51PM +0000, Alex Bennée wrote:
>>
>> Alex Bennée <alex.ben...@linaro.org> writes:
>>
>> > Hi,
>> >
>> > I've been trying to remove current_cpu hacks from our hw/ emulation and
>> > replace them with an explicit cpu_index derived from MemTxAttrs. So far
>> > this has been going mostly ok but I've run into problems on x86 due to
>> > the way the apic/ioapic are modelled. It comes down to the function
>> > ioapic_service() which eventually does:
>> >
>> >     /* No matter whether IR is enabled, we translate
>> >      * the IOAPIC message into a MSI one, and its
>> >      * address space will decide whether we need a
>> >      * translation. */
>> >     stl_le_phys(ioapic_as, info.addr, info.data);
>> >
>> > Which essentially calls the memory API to simulate a memory write.
>> > However to generate a properly formed MemTxAttrs we need to know what
>> > CPU we are running on. In the case of ioapic_service() we may well be in
>> > the main thread either for an expiring timer:
>> >
>> >   hpet_timer->qemu_set_irq->ioapic_set_irq
>> >
>> > or for reset handling:
>> >
>> >   pc_machine_reset->hpet_reset->qemu_set_irq->ioapic_set_irq
>> >
>> > neither of which can get an associated CPU. I assume if the actual writes
>> > are triggered we never actually actioned stuff because we had:
>> >
>> >   DeviceState *cpu_get_current_apic(void)
>> >   {
>> >       if (current_cpu) {
>> >           X86CPU *cpu = X86_CPU(current_cpu);
>> >           return cpu->apic_state;
>> >       } else {
>> >           return NULL;
>> >       }
>> >   }
>> >
>> > which basically causes the updates to be dropped on the floor.
>
> I think it shouldn't?  Normally the irq will be in MSI format (IOAPIC will
> translate to an MSI in QEMU, per ioapic_entry_parse()).
>
> I had a feeling that it'll just go the shortcut here (MSI always starts
> with 0xfeeXXXXX so definitely bigger than 0xfff):
>
>     if (addr > 0xfff || !index) {
>         /* MSI and MMIO APIC are at the same memory location,
>          * but actually not on the global bus: MSI is on PCI bus
>          * APIC is connected directly to the CPU.
>          * Mapping them on the global bus happens to work because
>          * MSI registers are reserved in APIC MMIO and vice versa. */
>         MSIMessage msi = { .address = addr, .data = val };
>         apic_send_msi(&msi);
>         return;
>     }
>
> apic_send_msi() doesn't need a cpu context.

Ahh so yes maybe my changes were too quick to reject the MMIO. So I've
made the following tweak to ioapic_service():

    /*
     * No matter whether IR is enabled, we translate
     * the IOAPIC message into a MSI one, and its
     * address space will decide whether we need a
     * translation.
     *
     * As it is an access from something other than the
     * CPU or a PCI device we set its source as machine
     * specific.
     */
    {
        MemTxAttrs attrs = MEMTXATTRS_MACHINE(MEMTX_IOAPIC);
        MemTxResult res;

        address_space_stl_le(ioapic_as, info.addr, info.data, attrs, &res);
        if (res != MEMTX_OK) {
            qemu_log_mask(LOG_GUEST_ERROR,
                          "%s: couldn't write to %"PRIx32"\n",
                          __func__, info.addr);
        }
    }

and I had already tweaked the pci_msi_trigger so:

    static void pci_msi_trigger(PCIDevice *dev, MSIMessage msg)
    {
        MemTxAttrs attrs = { .requester_type = MTRT_PCI,
                             .requester_id = pci_requester_id(dev) };

        address_space_stl_le(&dev->bus_master_as, msg.address, msg.data,
                             attrs, NULL);
    }

(both of these lean on the new requester fields in MemTxAttrs from my
series, sketched below)

> No expert on that, but per my understanding ioapic isn't really bound to
> any apic, so it doesn't need any cpu context.  As a quick reference of
> that, one can look at Intel SDM Vol 3 Chap 10, figure 10.3 will be a
> generic modern x86_64 system APIC structure.
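For reference, the requester_id field already exists in
include/exec/memattrs.h; the new part in my work-in-progress series is
tagging what sort of requester that id refers to. Very roughly it looks
like the following - the MTRT_* ordering, field widths and the
MEMTX_IOAPIC value are all still in flux, so treat this as a sketch
rather than what will finally get posted:

    /* sketch of the additions to include/exec/memattrs.h */
    typedef enum MemTxRequesterType {
        MTRT_UNSPECIFIED = 0, /* zeroed attrs from legacy callers */
        MTRT_CPU,             /* requester_id is a cpu_index */
        MTRT_PCI,             /* requester_id is a PCI requester id */
        MTRT_MACHINE          /* machine internal source, id machine defined */
    } MemTxRequesterType;

    /* machine defined requester ids for MTRT_MACHINE sources */
    #define MEMTX_IOAPIC 1

    /* MemTxAttrs grows a requester_type alongside the existing
     * requester_id field */
    #define MEMTXATTRS_MACHINE(id) ((MemTxAttrs) \
        { .requester_type = MTRT_MACHINE, .requester_id = (id) })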
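The other piece the apic_mem_write() rework further down depends on is
being able to find the right APIC from the cpu_index carried in the
attrs instead of going via current_cpu. Something along these lines -
again only a sketch, the final lookup may well end up looking different:

    /* sketch: look the APIC up by cpu_index rather than current_cpu */
    DeviceState *cpu_get_current_apic(int cpu_index)
    {
        CPUState *cs = qemu_get_cpu(cpu_index);

        if (cs) {
            X86CPU *cpu = X86_CPU(cs);
            return cpu->apic_state;
        }
        return NULL;
    }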
>
> In hardware there should have a 3-wire apic bus that take care of all the
> messaging, including at least not only ioapic irqs to any cores, or IPIs
> between the cores.  The messages coming from the ioapic should not require
> any apic too as it can come from devices, just like what we do with qemu
> when the device does things like pci_set_irq(), iiuc.

<snip>

> AFAICT apic_mem_write() doesn't mean that this cpu will take this IRQ.  The
> target core to respond to the IRQ will be defined in the dest ID field of
> either an MSI message or embeded in the IOAPIC entry being setup by the
> guest driver.  E.g. MSI message format can also be found in SDM Vol 3 chap
> 10.11.1, in QEMU we can refer to "dest" field of apic_send_msi().

So now the start of apic_mem_write checks and validates MSIs first:

    static MemTxResult apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
                                      unsigned int size, MemTxAttrs attrs)
    {
        DeviceState *dev;
        APICCommonState *s;
        int index = (addr >> 4) & 0xff;

        if (size < 4) {
            return MEMTX_ERROR;
        }

        /*
         * MSI and MMIO APIC are at the same memory location, but actually
         * not on the global bus: MSI is on PCI bus APIC is connected
         * directly to the CPU.
         *
         * We can check the MemTxAttrs to see the access is coming from
         * where we expect. Even though the MSI registers are reserved in
         * APIC MMIO and vice versa they shouldn't respond to CPU writes.
         */
        if (addr > 0xfff || !index) {
            switch (attrs.requester_type) {
            case MTRT_MACHINE:
                /* should be from the directly wired IOAPIC */
                if (attrs.requester_id != MEMTX_IOAPIC) {
                    qemu_log_mask(LOG_GUEST_ERROR,
                                  "%s: rejecting machine write from something "
                                  "other than the IOAPIC (%x)\n",
                                  __func__, attrs.requester_id);
                    return MEMTX_ACCESS_ERROR;
                }
                break;
            case MTRT_PCI:
                /* PCI signalled MSI */
                break;
            case MTRT_UNSPECIFIED:
                qemu_log_mask(LOG_GUEST_ERROR,
                              "%s: rejecting unspecified write\n", __func__);
                return MEMTX_ACCESS_ERROR;
            case MTRT_CPU:
                /* can CPU directly trigger MSIs? */
                break;
            }

            MSIMessage msi = { .address = addr, .data = val };
            apic_send_msi(&msi);
            return MEMTX_OK;
        }

        if (attrs.requester_type != MTRT_CPU) {
            return MEMTX_ACCESS_ERROR;
        }

        dev = cpu_get_current_apic(attrs.requester_id);
        s = APIC(dev);

        trace_apic_mem_writel(addr, val);

which at least gets things booting properly. Does this seem like a better
modelling of the APIC behaviour?

-- 
Alex Bennée