On Thu, Apr 26, 2018 at 10:17:13AM +0200, Cédric Le Goater wrote:
> On 04/26/2018 07:36 AM, David Gibson wrote:
> > On Thu, Apr 19, 2018 at 07:40:09PM +0200, Cédric Le Goater wrote:
> >> On 04/16/2018 06:26 AM, David Gibson wrote:
> >>> On Thu, Apr 12, 2018 at 10:18:11AM +0200, Cédric Le Goater wrote:
> >>>> On 04/12/2018 07:07 AM, David Gibson wrote:
> >>>>> On Wed, Dec 20, 2017 at 08:38:41AM +0100, Cédric Le Goater wrote:
> >>>>>> On 12/20/2017 06:09 AM, David Gibson wrote:
> >>>>>>> On Sat, Dec 09, 2017 at 09:43:21AM +0100, Cédric Le Goater wrote:

> > [snip]

> >>>> The XIVE tables are :
> >>>>
> >>>> * IVT
> >>>>
> >>>> associate an interrupt source number with an event queue. the data
> >>>> to be pushed in the queue is stored there also.
> >>>
> >>> Ok, so there would be one of these tables for each IVRE,
> >>
> >> yes. one for each XIVE interrupt controller. That is one per processor
> >> or socket.
> >
> > Ah.. so there can be more than one in a multi-socket system.
> >
> >>> with one entry for each source managed by that IVSE, yes?
> >>
> >> yes. The table is simply indexed by the interrupt number in the
> >> global IRQ number space of the machine.
> >
> > How does that work on a multi-chip machine? Does each chip just have
> > a table for a slice of the global irq number space?

> yes. IRQ allocation is done relative to the chip, each chip having
> a range depending on its block id. XIVE has a concept of block,
> which is used in skiboot in a one-to-one relationship with the chip.

Ok. I'm assuming this block id forms the high(ish) bits of the global
irq number, yes?
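Roughly, the encoding I have in mind is something like the sketch
below (helper names and the 16-bit split are assumptions for
illustration, not actual skiboot or QEMU code):

    #include <stdint.h>

    #define XIVE_IRQ_BLOCK_SHIFT 16    /* assumed block/index split */

    /* global IRQ number = block id (high bits) | per-block index */
    static inline uint32_t xive_girq(uint8_t blk, uint16_t idx)
    {
        return ((uint32_t)blk << XIVE_IRQ_BLOCK_SHIFT) | idx;
    }

    /* recover the owning chip/block from a global IRQ number */
    static inline uint8_t xive_girq_block(uint32_t girq)
    {
        return girq >> XIVE_IRQ_BLOCK_SHIFT;
    }

    /* recover the per-block IVT index */
    static inline uint16_t xive_girq_index(uint32_t girq)
    {
        return girq & ((1u << XIVE_IRQ_BLOCK_SHIFT) - 1);
    }

Each chip would then only ever look up the slice of the IVT selected
by its own block id.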
> >>> Do the XIVE IPIs have entries here, or do they bypass this?
> >>
> >> no. The IPIs have entries also in this table.
> >>
> >>>> * EQDT:
> >>>>
> >>>> describes the queues in the OS RAM, also contains a set of flags,
> >>>> a virtual target, etc.
> >>>
> >>> So on real hardware this would be global, yes? And it would be
> >>> consulted by the IVRE?
> >>
> >> yes. Exactly. The XIVE routing routine :
> >>
> >>   https://github.com/legoater/qemu/blob/xive/hw/intc/xive.c#L706
> >>
> >> gives a good overview of the usage of the tables.
> >>
> >>> For guests, we'd expect one table per-guest?
> >>
> >> yes but only in emulation mode.
> >
> > I'm not sure what you mean by this.
>
> I meant the sPAPR QEMU emulation mode. Linux/KVM relies on the overall
> table allocated in OPAL for the system.

Right.. I'm thinking of this from the point of view of the guest and/or
qemu, rather than from the implementation. Even if the actual storage of
the entries is distributed across the host's global table, we still
logically have a table per guest, right?

> >>> How would those be integrated with the host table?
> >>
> >> Under KVM, this is handled by the host table (setup done in skiboot)
> >> and we are only interested in the state of the EQs for migration.
> >
> > This doesn't make sense to me; the guest is able to alter the IVT
> > entries, so that configuration must be migrated somehow.
>
> yes. The IVE needs to be migrated. We use get/set KVM ioctls to save
> and restore the value which is cached in the KVM irq state struct
> (server, prio, eq data). no OPAL calls are needed though.

Right. Again, at this stage I don't particularly care what the backend
details are - whether the host calls OPAL or whatever. I'm more
concerned with the logical model.

> >> This state is set with the H_INT_SET_QUEUE_CONFIG hcall,
> >
> > "This state" here meaning IVT entries?
>
> no. The H_INT_SET_QUEUE_CONFIG sets the event queue OS page for a
> server/priority couple. That is where the event queue data is
> pushed.

Ah. Doesn't that mean the guest *does* effectively have an EQD table,
updated by this call? We'd need to migrate that data as well, and it's
not part of the IVT, right?

> H_INT_SET_SOURCE_CONFIG does the targeting : irq, server, priority,
> and the eq data to be pushed in case of an event.

Ok - that's the IVT entries, yes?
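For reference, the guest-side sequence being described would look
something like this sketch. The wrapper names and argument layout are
assumptions for illustration only; the authoritative layout is
whatever the hcall specification says:

    #include <stdint.h>

    /* hypothetical wrappers around the two hcalls, for illustration */
    int64_t h_int_set_source_config(uint64_t flags, uint64_t lisn,
                                    uint64_t server, uint64_t prio,
                                    uint64_t eisn);
    int64_t h_int_set_queue_config(uint64_t flags, uint64_t server,
                                   uint64_t prio, uint64_t qpage,
                                   uint64_t qsize);

    /*
     * Target source 'lisn' at (server, prio) with 'eisn' as the data
     * pushed on an event (the IVT side), then register the OS queue
     * page for that server/priority couple (the EQD side).
     */
    static int64_t xive_guest_setup_irq(uint64_t lisn, uint64_t server,
                                        uint64_t prio, uint64_t eisn,
                                        uint64_t qpage, uint64_t qsize)
    {
        int64_t rc = h_int_set_source_config(0, lisn, server, prio, eisn);
        if (rc)
            return rc;
        return h_int_set_queue_config(0, server, prio, qpage, qsize);
    }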
> >> followed
> >> by an OPAL call and then a HW update. It defines the EQ page in which
> >> to push event notification for the couple server/priority.
> >>
> >>>> * VPDT:
> >>>>
> >>>> describe the virtual targets, which can have different natures,
> >>>> a lpar, a cpu. This is for powernv, spapr does not have this
> >>>> concept.
> >>>
> >>> Ok. On hardware that would also be global and consulted by the IVRE,
> >>> yes?
> >>
> >> yes.
> >
> > Except.. is it actually global, or is there one per-chip/socket?
>
> There is a global VP allocator splitting the ids depending on the
> block/chip, but, to be honest, I have not dug into the details.

> > [snip]

> >>>> In the current version I am working on, the XiveFabric interface is
> >>>> more complex :
> >>>>
> >>>> typedef struct XiveFabricClass {
> >>>>     InterfaceClass parent;
> >>>>
> >>>>     XiveIVE *(*get_ive)(XiveFabric *xf, uint32_t lisn);
> >>>
> >>> This does an IVT lookup, I take it?
> >>
> >> yes. It is an interface for the underlying storage, which is different
> >> in sPAPR and PowerNV. The goal is to make the routing generic.
> >
> > Right. So, yes, we definitely want a method *somewhere* to do an IVT
> > lookup. I'm not entirely sure where it belongs yet.
>
> Me either. I have stuffed the XiveFabric with all the abstraction
> needed for the moment.
>
> I am starting to think that there should be an interface to forward
> events and another one to route them. The router being a special case
> of the forwarder, the last one in the chain. The "simple" devices, like
> PSI, should only be forwarders for the sources they own, but the
> interrupt controllers should be forwarders (they have sources) and
> also routers.

I'm not really clear what you mean by "forward" here.

> >>>>     XiveNVT *(*get_nvt)(XiveFabric *xf, uint32_t server);
> >>>
> >>> This one a VPDT lookup, yes?
> >>
> >> yes.
> >>
> >>>>     XiveEQ *(*get_eq)(XiveFabric *xf, uint32_t eq_idx);
> >>>
> >>> And this one an EQDT lookup?
> >>
> >> yes.
> >>
> >>>> } XiveFabricClass;
> >>>>
> >>>> It helps in making the routing algorithm independent of the model.
> >>>> I hope to make powernv converge and use it.
> >>>>
> >>>> - a set of MMIOs for the TIMA. They model the presenter engine.
> >>>>   current_cpu is used to retrieve the NVT object, which holds the
> >>>>   registers for interrupt management.
> >>>
> >>> Right. Now the TIMA is local to a target/server, not an EQ, right?
> >>
> >> The TIMA is the MMIO giving access to the registers, which are per CPU.
> >> The EQs are for routing. They are under the CPU object because it is
> >> convenient.
> >>
> >>> I guess we need at least one of these per-vcpu.
> >>
> >> yes.
> >>
> >>> Do we also need an lpar-global, or other special ones?
> >>
> >> That would be for the host. AFAICT KVM does not use such special
> >> VPs.
> >
> > Um.. "does not use".. don't we get to decide that?
>
> Well, that part of the specs is still a little obscure to me and
> I am not sure it will fit very well in the Linux/KVM model. It should
> be hidden from the guest anyway and can come in later.
>
> >>>> The EQs are stored under the NVT. This saves us an unnecessary EQDT
> >>>> table. But we could add one under the XIVE device model.
> >>>
> >>> I'm not sure of the distinction you're drawing between the NVT and the
> >>> XIVE device model.
> >>
> >> we could add a new table under the XIVE interrupt device model
> >> sPAPRXive to store the EQs and index them like skiboot does.
> >> But it seems unnecessary to me as we can use the object below
> >> 'cpu->intc', which is the XiveNVT object.
> >
> > So, basically assuming a fixed set of EQs (one per priority?)
>
> yes. It's easier to capture the state and dump information from
> the monitor.
>
> > per CPU for a PAPR guest?
>
> yes, that's how it works.
>
> > That makes sense (assuming PAPR doesn't provide guest interfaces to
> > ask for something else).
>
> Yes. All hcalls take prio/server parameters and the reserved prio range
> for the platform is in the device tree. 0xFF is a special case to reset
> targeting.
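So the per-vCPU model implied here would be roughly this minimal
sketch (type and field names are hypothetical, not the actual QEMU
structures):

    #include <stdint.h>

    #define XIVE_PRIO_MAX 8            /* assumed number of priorities */

    typedef struct XiveEQ {
        uint64_t qpage;                /* guest OS page receiving events */
        uint32_t qsize;                /* queue size (order) */
        uint32_t qindex;               /* current offset in the queue */
    } XiveEQ;

    /*
     * One NVT per vCPU, hanging off cpu->intc: the TIMA thread context
     * registers plus a fixed set of EQs indexed directly by priority,
     * which is what lets the sPAPR model do without a separate EQDT.
     */
    typedef struct XiveNVT {
        uint8_t tima[16];              /* thread interrupt mgmt registers */
        XiveEQ  eqt[XIVE_PRIO_MAX];
    } XiveNVT;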
> Thanks,
>
> C.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson